Antiword would be hard to integrate into Nutch as it is not Java-based. It would require native calls.
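
A rough sketch of what calling it from Java could look like, assuming the antiword binary is installed and on the PATH. Note that this launches antiword as an external process rather than going through JNI, and that the class and method names (AntiwordExtractor, buildCommand, extractText) are made up for illustration. The output charset is also an assumption; it depends on how antiword is configured on the machine.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, not part of Lucene or Nutch.
final class AntiwordExtractor {

    // Builds the command line; assumes an "antiword" executable is on the PATH.
    static List<String> buildCommand(Path docFile) {
        return List.of("antiword", docFile.toString());
    }

    // Runs antiword on the given .doc file and returns whatever it prints to stdout.
    static String extractText(Path docFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(docFile))
                // Discard stderr so a chatty process cannot block on a full pipe.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();

        // Read all of stdout before waiting, so the child never stalls on output.
        byte[] output = process.getInputStream().readAllBytes();

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("antiword exited with code " + exitCode);
        }

        // Assumption: antiword writes text in the platform default charset.
        return new String(output, Charset.defaultCharset());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: AntiwordExtractor <file.doc>");
            return;
        }
        System.out.println(extractText(Path.of(args[0])));
    }
}
```

The returned string can then be indexed like any other text. The cost is one process start per document, plus a dependency on a binary that has to be installed on every machine doing the parsing.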
Alexander

2008/11/12 Sertic Mirko, Bedag <[EMAIL PROTECTED]>

> Hi
>
> You can also use a tool called "antiword" to extract the text from a .doc
> file, and then give the text to Lucene.
>
> See here: http://en.wikipedia.org/wiki/Antiword
>
> Regards
> Mirko
>
> -----Original Message-----
> From: dipesh [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, November 12, 2008 04:38
> To: java-user@lucene.apache.org
> Subject: Parsing MSWord
>
> Hello,
> I wanted to know if there are classes in Lucene that support parsing MSWord
> documents.
> Many thanks,
> Dipesh
>
> ----------------------------------------
> "Help Ever Hurt Never"- Baba
>

--
Best Regards
Alexander Aristov