Or Tika, Lucene's cousin: http://incubator.apache.org/tika/ (which uses POI under the hood, but goes beyond MS Word parsing)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

________________________________
From: Donna L Gresh <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, November 12, 2008 8:25:43 AM
Subject: Re: AW: Parsing MSWord

Check out POI; that's what I use:
http://poi.apache.org/

"Sertic Mirko, Bedag" <[EMAIL PROTECTED]> wrote on 11/12/2008 03:25:47 AM:

> Hi
>
> You can also use a tool called "antiword" to extract the text from a
> .doc file, and then give the text to Lucene.
>
> See here: http://en.wikipedia.org/wiki/Antiword
>
> Regards
> Mirko
>
> -----Original Message-----
> From: dipesh [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 12 November 2008 04:38
> To: java-user@lucene.apache.org
> Subject: Parsing MSWord
>
> Hello,
> I wanted to know if there are classes in Lucene that support parsing
> MSWord documents.
> Many thanks,
> Dipesh
>
> ----------------------------------------
> "Help Ever Hurt Never"- Baba
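
For the antiword route Mirko describes above, a minimal sketch of the "extract the text, then give it to Lucene" step could look like the following. It uses only the JDK and assumes the antiword binary is installed and on the PATH. The class and method names (AntiwordExtractor, command, extractText) are made up for illustration and are not part of Lucene, POI, or Tika. It also assumes antiword writes its output in the platform default charset, which may not hold on every setup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to antiword and returns the plain text.
final class AntiwordExtractor {

    // Builds the command line. Assumes "antiword" is on the PATH.
    static List<String> command(Path doc) {
        return Arrays.asList("antiword", doc.toString());
    }

    // Runs antiword on the given .doc file and captures its standard output.
    static String extractText(Path doc) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(doc));
        // Pass antiword's error messages through so its stderr pipe cannot fill up.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }

        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("antiword exited with status " + exit);
        }
        // Assumption: output is in the platform default charset.
        return new String(out.toByteArray(), Charset.defaultCharset());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java AntiwordExtractor <file.doc>");
            return;
        }
        System.out.print(extractText(Paths.get(args[0])));
    }
}
```

The returned string can then be added to a Lucene document as the content of an analyzed text field, the same way you would index any other plain text.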