First, when asking a new question, it's best to start a new subject. Your question has nothing to do with the rest of the thread....
That said, you want to create a Reader to pass along. I'd think about doing this by subclassing your MSWord class from the Reader class and providing the necessary implementation of the abstract read method.

Best
Erick

On 6/8/07, jim shirreffs <[EMAIL PROTECTED]> wrote:
I am trying to index msword documents. I've got things working, but I do not think I am doing things properly. To index msword docs I use an extractor to extract the text. Then I write the text to a .txt file and index that using an HTMLDocument object.

Seems to me that since I have the text, I should be able to just do a

    Doc.add("content", the_text_from_the_word_doc, ???, ???);

But looking at Document.java, it seems the field "content" requires a reader. So I write a temporary file to satisfy that requirement.

What I would like to have is an MSWORDDocument class that would take the extracted text as an argument to the constructor and create a Document object that I could get.

If anyone has any idea, please let me know. Here is a code segment. Notice the msword hack,

    /*
     * make a document
     */
    try {
        if (ftype.startsWith("text")) {
            doc = HTMLDocument.Document(f);
        } else if (ftype.equals("application/pdf")) {
            doc = LucenePDFDocument.getDocument(f);
        } else if (ftype.equals("application/msword")) {
            FileInputStream fin = new FileInputStream(f.getAbsolutePath());
            WordExtractor extractor = new WordExtractor(fin);
            String content = extractor.getText();
            if(debug) System.out.println(content);
            String tempFileName=f.getAbsolutePath() + ".txt";
            BufferedWriter bw = new BufferedWriter(new FileWriter(tempFileName, false));
            bw.write((String) content.toString());
            bw.close();
            File df = new File(tempFileName);
            doc = HTMLDocument.Document(df);
            df.delete();
        } else if (ftype.equals("binary")) {
            return null;
        } else {
            if(debug) System.out.println("Unknown file type not ascii or pdf.");
            doc = HTMLDocument.Document(f);
        }
    } catch(java.lang.InterruptedException ie) {
        throw ie;
    } catch(java.io.IOException ioe) {
        throw ioe;
    }

Thanks in advance
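
For what it's worth, here is a rough sketch of the Reader idea from the reply, using only java.io. The class name ExtractedTextReader is made up for illustration, and the sketch assumes the extractor has already handed back the text as a String. Note that java.io.Reader declares two abstract methods, read(char[], int, int) and close(), so both need an implementation. How the resulting Reader is then handed to a Lucene field depends on the Lucene version in use, so that part is deliberately left out.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical class: serves already-extracted text through the Reader API,
// so no temporary .txt file is needed.
class ExtractedTextReader extends Reader {
    private final String text;
    private int pos = 0;          // index of the next character to hand out
    private boolean closed = false;

    ExtractedTextReader(String text) {
        if (text == null) throw new NullPointerException("text");
        this.text = text;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (closed) throw new IOException("Reader closed");
        if (off < 0 || len < 0 || len > buf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) return 0;
        if (pos >= text.length()) return -1;   // end of text
        int n = Math.min(len, text.length() - pos);
        text.getChars(pos, pos + n, buf, off); // copy the next n chars into buf
        pos += n;
        return n;
    }

    @Override
    public void close() {
        closed = true;
    }

    // Small demo: drain the reader in 8-char chunks and print the result.
    public static void main(String[] args) throws IOException {
        try (Reader r = new ExtractedTextReader("text extracted from a Word file")) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            System.out.println(sb);
        }
    }
}
```

If no extra behaviour is needed, the standard java.io.StringReader already does the same job: new StringReader(content) gives a Reader over the extracted text without writing a subclass.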