http://textmining.org/
________________________________

From: Henry Lu [mailto:[EMAIL PROTECTED]
Sent: Fri 5/18/2007 7:22 AM
To: POI Users List
Subject: Re: reading MS word file

Where do I download the org.textmining package?

-Henry

Dmitry Goldenberg wrote:
> Henry,
>
> There are a few things you can try.
>
> 1. Take a look at org.textmining's Word text extractor:
>
>    org.textmining.text.extraction.WordExtractor
>
>    All you have to do is this:
>
>    new WordExtractor().extractText(inputStream)
>
> 2. There is also the POI extractor:
>
>    org.apache.poi.hdf.extractor.WordDocument
>
>    All you do is:
>
>    WordDocument wd = new WordDocument(is);
>    StringWriter docTextWriter = new StringWriter();
>    wd.writeAllText(new PrintWriter(docTextWriter));
>    docTextWriter.close();
>    text = docTextWriter.toString();
>
> 3. I'd also check out the following:
>
>    org.semanticdesktop.aperture.extractor.word.WordExtractor
>
>    here: http://aperture.sourceforge.net/doc/javadoc/index.html
>
> Hope this helps,
> - Dmitry
>
>
> ________________________________
>
> From: Henry Lu [mailto:[EMAIL PROTECTED]
> Sent: Thu 5/17/2007 1:19 PM
> To: [email protected]
> Subject: reading MS word file
>
> Is there an example/code to read a MS Word file for text line by line?
> All I am interested in is the text, regardless of format, style, font...
>
> -Henry
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> Mailing List: http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project: http://jakarta.apache.org/poi/
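Henry's original question asked for the text line by line. Both snippets in Dmitry's reply end with the whole document's text in a single String, so the line splitting itself needs only the JDK. A minimal sketch, assuming the text has already been extracted; the class name `WordTextLines` and the sample string are hypothetical and not part of POI, textmining, or Aperture:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits already-extracted document text into lines.
public class WordTextLines {

    /**
     * BufferedReader.readLine() treats "\n", "\r" and "\r\n" as line
     * terminators, so this also handles text that keeps Word's
     * carriage-return paragraph marks.
     */
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // A StringReader does no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the String returned by an
        // extractor, e.g. new WordExtractor().extractText(inputStream).
        String extracted = "First paragraph\rSecond paragraph\r\nThird";
        for (String line : splitLines(extracted)) {
            System.out.println(line);
        }
    }
}
```

Blank lines are kept as empty strings; filter them out if only non-empty lines matter.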
