On Wednesday, November 5, 2003, at 03:51 AM, Marcel Stor wrote:
Hi all,

I'm thinkin' about writing a search tool for my filesystem. I know such
things exist already but programming it myself is much more fun ;-)
So, I would have Lucene crawl through my filesystem and pass each file
to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
system and would depend on the file ending to distinguish the file type.
Is this a good idea in general? Is there a list of available indexers for
the different file types? Any other comments are also welcome.

The general idea (intentionally limited to .txt files) is shown in the code from this article:


http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html

The Ant <index> task in the jakarta-lucene-sandbox CVS repository has a document handler interface that is designed to allow for pluggability. You named the PDF pieces, and there is POI for dealing with Office documents.
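
To make the extension-based approach concrete, here is a minimal sketch of how files could be routed to a per-type handler. It is not the sandbox's actual document handler interface: the names ExtensionDispatch, TextExtractor, register, extensionOf and extractorFor are all hypothetical and made up for illustration. Only a plain-text handler is wired in; a PDF or Office handler (for example one built on PDFBox or POI) would be registered the same way, and the extracted text would then be handed to Lucene for indexing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class ExtensionDispatch {

    /** Hypothetical handler: turns one file into plain text ready for indexing. */
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // Maps a lower-case extension (without the dot) to its handler.
    private final Map<String, TextExtractor> extractors = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-case extension, or "" if the name has none. */
    static String extensionOf(Path file) {
        Path fileName = file.getFileName();
        if (fileName == null) {
            return "";
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        // A leading dot (".profile") or a trailing dot ("notes.") is not an extension.
        if (dot <= 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Looks up the handler for a file; empty if the type is not supported. */
    Optional<TextExtractor> extractorFor(Path file) {
        return Optional.ofNullable(extractors.get(extensionOf(file)));
    }

    public static void main(String[] args) throws IOException {
        ExtensionDispatch dispatch = new ExtensionDispatch();
        // Plain text: read the whole file, assuming UTF-8 (an assumption of this sketch).
        dispatch.register("txt", f -> new String(Files.readAllBytes(f), StandardCharsets.UTF_8));

        for (String arg : args) {
            Path file = Paths.get(arg);
            Optional<TextExtractor> extractor = dispatch.extractorFor(file);
            if (extractor.isPresent()) {
                String text = extractor.get().extract(file);
                // This is where the text would be added to a Lucene document.
                System.out.println(file + ": " + text.length() + " characters extracted");
            } else {
                System.out.println(file + ": no handler registered, skipping");
            }
        }
    }
}
```

Keeping the lookup in a map means unsupported types are skipped rather than failing the whole crawl, and the extension is lower-cased because Windows file names are not case sensitive.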

Erik

