Or Tika, Lucene's cousin: http://incubator.apache.org/tika/
(it uses POI under the hood, but goes beyond MS Word and extracts text from many other formats too)
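Part of what Tika adds on top of POI is that it first works out what kind of file it has been given and only then picks a parser (POI for the Office formats). As a rough, hypothetical illustration of that detection step only (this is not Tika's API; `FormatSniffer` and its methods are made-up names), the leading "magic" bytes are often enough to tell a legacy OLE2 .doc from a ZIP-based .docx or a PDF:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: a crude magic-byte sniffer, NOT Tika's detection API.
public final class FormatSniffer {

    public enum Container { OLE2, ZIP, PDF, UNKNOWN }

    // OLE2 compound-file header, used by legacy .doc (and also .xls/.ppt) files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file header "PK\3\4"; .docx and other OOXML files are ZIP archives.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // ASCII "%PDF".
    private static final byte[] PDF = {0x25, 0x50, 0x44, 0x46};

    private FormatSniffer() {}

    /** Classifies a file header by its leading bytes. */
    public static Container sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Container.OLE2;
        if (startsWith(header, ZIP)) return Container.ZIP;
        if (startsWith(header, PDF)) return Container.PDF;
        return Container.UNKNOWN;
    }

    /** Reads at most the first 8 bytes of a file and classifies them. */
    public static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2.length));
        }
    }

    // True if data begins with every byte of prefix.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + sniff(Path.of(name)));
        }
    }
}
```

An OLE2 match only says "legacy Office container" and a ZIP match could just as well be a .xlsx or a .jar; a real detector such as Tika's looks further than the first few bytes before choosing a parser.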

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

________________________________
From: Donna L Gresh <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, November 12, 2008 8:25:43 AM
Subject: Re: AW: Parsing MSWord

Check out POI; that's what I use

http://poi.apache.org/


"Sertic Mirko, Bedag" <[EMAIL PROTECTED]> wrote on 11/12/2008 03:25:47 AM:

> Hi
> 
> You can also use a tool called "antiword" to extract the text from a
> .doc file, and then give the text to Lucene.
> 
> See here: http://en.wikipedia.org/wiki/Antiword
> 
> Regards
> Mirko
> 
> -----Original Message-----
> From: dipesh [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, November 12, 2008 04:38
> To: java-user@lucene.apache.org
> Subject: Parsing MSWord
> 
> Hello,
> I wanted to know whether Lucene has any classes that support parsing
> MS Word documents.
> Many thanks,
> Dipesh
> 
> ----------------------------------------
> "Help Ever Hurt Never"- Baba
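The antiword route from Mirko's quoted reply can be driven from Java with nothing but the JDK. A minimal sketch, assuming the `antiword` binary is installed and on the PATH and that its output is UTF-8 in your locale (`AntiwordExtractor` is a hypothetical name, not part of Lucene): run the tool, collect its standard output, and treat a non-zero exit status as a failure. The returned string is the plain text you would then add to a Lucene document field.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical wrapper around the external `antiword` command-line tool.
// Assumes antiword is on the PATH and prints the document text to stdout as UTF-8.
public final class AntiwordExtractor {

    private AntiwordExtractor() {}

    /** Builds the command line: antiword <file>. */
    public static List<String> command(Path doc) {
        return List.of("antiword", doc.toString());
    }

    /** Runs antiword on a .doc file and returns its plain-text output. */
    public static String extract(Path doc) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(doc))
                // Send antiword's warnings straight to our stderr so that pipe never fills up.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String text;
        try (InputStream out = p.getInputStream()) {
            // Drain stdout completely before waiting, so the child cannot block on a full pipe.
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("antiword exited with status " + exit + " for " + doc);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        // This string is what you would hand to Lucene as a field value.
        System.out.print(extract(Path.of(args[0])));
    }
}
```

Shelling out keeps the Java side dependency-free, at the cost of one process per document and an external binary to install on every indexing machine; POI or Tika avoid both.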
