I've finally found some time to commit the Ant task I built to the
jakarta-lucene-sandbox area.  It's under projects/ant.

It definitely needs some refactoring and critique from folks who know the
Lucene API better than I do before it becomes a truly great thing.  My
use case: create an index of documentation (mostly HTML files - Ant's own
documentation, in fact) at build time and bundle the index with a
distribution (a WAR in my case).  It worked beautifully for that: it scans
a directory very rapidly and, when indexing incrementally, only indexes new
documents.
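
As a rough illustration of what "only indexes new documents" can mean, here is a minimal sketch.  It assumes the decision is made by comparing each file's last-modified time against a record kept from the previous run.  That mechanism is an assumption, not a description of the task's code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: illustrates one possible incremental check only.
class IncrementalScanSketch {

    /**
     * Return the paths that are absent from the previous run's record, or
     * whose timestamp is newer than the recorded one, in sorted order.
     */
    static List<String> needsIndexing(Map<String, Long> previous,
                                      Map<String, Long> current) {
        List<String> result = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(current).entrySet()) {
            Long seen = previous.get(entry.getKey());
            if (seen == null || entry.getValue() > seen) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps recorded by an earlier (imaginary) indexing run.
        Map<String, Long> previous = new HashMap<>();
        previous.put("docs/a.html", 100L);
        previous.put("docs/b.html", 100L);

        // State of the directory now: b was modified, c is new.
        Map<String, Long> current = new HashMap<>(previous);
        current.put("docs/b.html", 250L);
        current.put("docs/c.txt", 300L);

        System.out.println(needsIndexing(previous, current));
    }
}
```

With overwrite="true" a check like this would not matter, since the whole index is rebuilt anyway.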

Syntax:

    <index index="${index.dir}"
           overwrite="true" mergeFactor="20"
           documentHandler="org.apache.lucene.ant.FileExtensionDocumentHandler">
      <fileset dir="${docs.dir}"/>
    </index>

overwrite, mergeFactor, and documentHandler are optional.  The default
(shown) document handler simply delegates to an appropriate class depending
on the extension (only .txt and .html supported currently).
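
To make the "delegates depending on the extension" idea concrete, here is a minimal, self-contained sketch of that kind of dispatch.  It deliberately does not use the Lucene or sandbox classes: the Handler interface, the field names, and ExtensionDispatchSketch itself are hypothetical stand-ins, and a real handler would parse the file's content instead of just recording its path.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not the sandbox classes.
class ExtensionDispatchSketch {

    /** Hypothetical handler contract: turn a file into named fields. */
    interface Handler {
        Map<String, String> handle(File file);
    }

    private final Map<String, Handler> handlers = new HashMap<>();

    /** Register a handler for an extension such as "txt" or "html". */
    void register(String extension, Handler handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    /**
     * Return the lower-cased text after the last dot, or "" if there is
     * no dot or nothing follows it.
     */
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return "";
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Delegate by extension; null means "no handler, skip this file". */
    Map<String, String> handle(File file) {
        Handler handler = handlers.get(extensionOf(file.getName()));
        return handler == null ? null : handler.handle(file);
    }

    /** Placeholder for real parsing: records only a type and the path. */
    private static Map<String, String> fields(String type, File file) {
        Map<String, String> result = new HashMap<>();
        result.put("type", type);
        result.put("path", file.getPath());
        return result;
    }

    /** Dispatcher covering the two extensions mentioned above. */
    static ExtensionDispatchSketch withDefaults() {
        ExtensionDispatchSketch dispatcher = new ExtensionDispatchSketch();
        dispatcher.register("txt", file -> fields("text", file));
        dispatcher.register("html", file -> fields("html", file));
        return dispatcher;
    }

    public static void main(String[] args) {
        ExtensionDispatchSketch dispatcher = withDefaults();
        for (String name : new String[] {"index.html", "notes.txt", "logo.png"}) {
            Map<String, String> doc = dispatcher.handle(new File(name));
            System.out.println(name + " -> " + (doc == null ? "skipped" : doc.get("type")));
        }
    }
}
```

In a layout like this, supporting another extension is a matter of registering one more handler, which is the sort of extension point a custom documentHandler class could provide.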

I'll eventually get around to adding documentation for this and helping
folks make it more generically useful, whether by supporting other
extensions or by creating some other type of document handler.  This could
even evolve into, or spawn, another Ant task for crawling sites at build
time to index them - at the very least, launching a crawler from an Ant
build file would be nicer than running java with some arguments.

Feedback, criticism, suggestions - all welcome!

    Erik



