Hi Stefan,

Using OpenOffice will enable you to parse 182 file formats, but it's not a pure Java solution, and you still need an alternative solution for PDFs.
I'd be interested in knowing whether anyone is working on a pure Java solution that would give us a single method for handling MS Office documents, PDFs, etc.

Cheers
Pete

----- Original Message -----
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, November 05, 2003 10:26 AM
Subject: Re: Index entire filesystem

> I wrote to this list some days ago to announce a way to
> parse 182 file formats.
> There was a tiny bug report some days ago; I hope I can fix it.
>
> Browse the archive to find out more.
>
> Cheers
> Stefan
>
> Marcel Stor wrote:
>
> >Hi all,
> >
> >I'm thinkin' about writing a search tool for my filesystem. I know such
> >things exist already but programming it myself is much more fun ;-)
> >So, I would have Lucene crawl through my filesystem and pass each file
> >to an appropriate indexer (PDF -> PDFBox, etc.). Yes, I run a Windows
> >system and would depend on the file extension to distinguish the file type.
> >Is this a good idea in general? Is there a list of available indexers for
> >the different file types? Any other comments are also welcome.
> >
> >Regards,
> >Marcel
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
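
For the extension-based dispatch Marcel describes (PDF -> PDFBox, etc.), a minimal sketch in plain Java might look like the following. The names ExtractorRegistry, TextExtractor, register, lookup and extensionOf are hypothetical, made up for illustration. No Lucene, PDFBox or OpenOffice API is used here; the real parsers would have to be plugged in behind the interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry that maps file extensions to text extractors. */
public class ExtractorRegistry {

    /** Hypothetical contract: turn the file at a path into plain text for indexing. */
    public interface TextExtractor {
        String extract(String path) throws Exception;
    }

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers an extractor for an extension such as "pdf" (case-insensitive). */
    public void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    public static String extensionOf(String fileName) {
        // Accept both Windows and Unix separators so a dot in a directory name is ignored.
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int dot = fileName.lastIndexOf('.');
        if (dot <= slash + 1 || dot == fileName.length() - 1) {
            return ""; // no dot, dot-file such as ".profile", or trailing dot
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the extractor registered for this file's extension, or null if unknown. */
    public TextExtractor lookup(String fileName) {
        return byExtension.get(extensionOf(fileName));
    }
}
```

A crawler would call lookup for each file and skip the file when it returns null. Relying on the extension alone can misclassify renamed files, so this is only a starting point.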