Hi all,
I'm thinking about writing a search tool for my filesystem. I know such
things exist already but programming it myself is much more fun ;-)
So, I would have Lucene crawl through my filesystem and pass each file
to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
system and would depend on the file extension to distinguish the file type.
Is this a good idea in general? Is there a list of available indexers for
the different file types? Any other comments are also welcome.
The general idea (intentionally limited to .txt files) is shown in the code from this article:
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html
The Ant <index> task in the jakarta-lucene-sandbox CVS repository has a document handler interface that is designed to allow for pluggability. You already named PDFBox for PDF files, and there is POI for dealing with Office documents.
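
As a rough illustration of the extension-based dispatch being discussed, here is a minimal sketch in plain Java. It is not the sandbox's actual handler interface: TextExtractor, PlainTextExtractor and ExtractorRegistry are hypothetical names made up for this example, and a PDFBox- or POI-backed handler would have to be written separately and registered under "pdf", "doc" and so on. The extracted text would then be handed to Lucene for indexing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical plug-in point: one implementation per file type. */
interface TextExtractor {
    String extractText(Path file) throws IOException;
}

/** Handler for plain-text files; assumes UTF-8 content. */
class PlainTextExtractor implements TextExtractor {
    @Override
    public String extractText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}

/** Hypothetical registry that maps lower-cased file extensions to handlers. */
class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers a handler for an extension given without the dot, e.g. "pdf". */
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the text after the last dot, lower-cased, or "" if there is none. */
    static String extensionOf(Path file) {
        Path fileName = file.getFileName();
        if (fileName == null) {
            return ""; // e.g. a filesystem root has no file name
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Finds the handler for a file, or empty if its type is not supported. */
    Optional<TextExtractor> lookup(Path file) {
        return Optional.ofNullable(byExtension.get(extensionOf(file)));
    }
}
```

A crawler could walk the directory tree (for instance with Files.walk), call lookup for each regular file, and skip anything that comes back empty. Lower-casing the extension matters on Windows, where REPORT.PDF and report.pdf should be treated as the same type.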
Erik