Hi All,

I had occasion to move an existing site with Lucene integrated into it from a Tomcat setup to a Jetty one.

In the process I noticed that while Lucene is a great search engine, it can be very difficult to configure under certain circumstances, because of some internal inconsistencies.

Here is a list of _some_ of the aspects that need configuring:

1. The root directory where each Lucene index is stored
2. The actual Lucene index to use or create
3. The Analyzer to use for searching and creation
4. The set of patterns to exclude while crawling
5. The set of fields to store during index creation
6. The cocoon-views to use for content and link extraction



The first problem I came across is with (1) above. The 'index' directory used by Lucene defaults to Jetty's 'work' directory ('/private/tmp/Jetty__8888__/cocoon-files/' on my machine), which gets cleaned out each time Jetty is restarted (Tomcat does not do this), meaning you lose the indexes. So when you are using Jetty, you almost certainly need to change this setting.

Two separate components need this parameter: the Searcher and the Indexer. If you have multiple independently searchable sub-sites in one Servlet, you would need all of them to use the same root directory, differentiating between the indexes via param (2) above.

SimpleLuceneCocoonSearcherImpl reads an optional <directory/> parameter from cocoon.xconf, but it has no effect, because the SearchGenerator resets this during its setup.

SimpleLuceneCocoonIndexerImpl does not pick up configuration from the <directory/> parameter, even though its name is declared as a static variable. This parameter actually gets passed from create-index.xsp, so you need to modify the indexer XSP to set the base location of the indexes.

The only way, it appears, to set a custom location for Lucene's indexes for searching is to put an absolute path to them in the SearchGenerator's <index/> parameter in your sitemap, i.e. in parameter (2) above. This is not good IMHO.
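To make the relationship between (1) and (2) concrete, here is a minimal Java sketch of how a root directory and an index name could be combined, with an absolute index name bypassing the root. This is not the actual Cocoon code: the class IndexLocation and its resolve method are hypothetical names, and the second path in main is made up for illustration.

```java
import java.io.File;

// Hypothetical helper, not part of Cocoon or Lucene.
public class IndexLocation {

    /**
     * Returns indexName as-is when it is an absolute path,
     * otherwise resolves it relative to baseDir.
     */
    public static File resolve(File baseDir, String indexName) {
        File candidate = new File(indexName);
        return candidate.isAbsolute() ? candidate : new File(baseDir, indexName);
    }

    public static void main(String[] args) {
        // Jetty's work directory as seen on my machine (wiped on restart).
        File workDir = new File("/private/tmp/Jetty__8888__/cocoon-files");

        // A relative name ends up inside the work directory.
        System.out.println(resolve(workDir, "index"));

        // An absolute name (illustrative path) ignores the work directory.
        System.out.println(resolve(workDir, "/var/data/lucene/site-a"));
    }
}
```

With behaviour like this, the only way to keep indexes out of a volatile work directory is to make every index name absolute, which is the situation described above.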


The next inconsistency is that the Analyzer class name (parameter (3) above) can be set in cocoon.xconf on both the Searcher and the Indexer, but again it is overridden by SearchGenerator and create-index.xsp. While I am not completely sure who needs to change the Analyzer or why, I strongly suspect it could need to be different for each index in a multi-index site. I do not think this is possible with the current design.


The next set of params, (4) and (5) above, should not IMHO be global. Again, if you are setting up multiple sub-sites, each with its own search index, you would legitimately need separate settings for each of them, as they are likely to have different URLs, document structures, etc.
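As a rough illustration of what per-index settings could look like, the Java sketch below looks a setting up for a named index first and falls back to a global value only when the index does not define its own. Everything in it is hypothetical (the IndexSettings class, the "exclude" key, the index names and the patterns); nothing like it exists in Cocoon today as far as I know.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-index settings with a global fallback.
public class IndexSettings {
    private final Map<String, List<String>> global = new HashMap<>();
    private final Map<String, Map<String, List<String>>> perIndex = new HashMap<>();

    /** Sets a value shared by every index that does not override it. */
    public void setGlobal(String key, String... values) {
        global.put(key, Arrays.asList(values));
    }

    /** Sets a value for one index only. */
    public void setForIndex(String index, String key, String... values) {
        perIndex.computeIfAbsent(index, k -> new HashMap<>())
                .put(key, Arrays.asList(values));
    }

    /** Returns the index's own value if present, else the global one, else an empty list. */
    public List<String> get(String index, String key) {
        Map<String, List<String>> own = perIndex.get(index);
        if (own != null && own.containsKey(key)) {
            return own.get(key);
        }
        return global.getOrDefault(key, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        IndexSettings settings = new IndexSettings();
        settings.setGlobal("exclude", ".*\\.gif", ".*\\.png");
        settings.setForIndex("site-a", "exclude", "/site-a/private/.*");

        System.out.println(settings.get("site-a", "exclude")); // site-a's own patterns
        System.out.println(settings.get("site-b", "exclude")); // falls back to global
    }
}
```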


Param (6) above is less clear-cut: would there be a genuine need to have different view-name settings for separate site indexes?


I do not have a proper proposal yet. I would like to discuss how best to rationalise this situation, but I have no wish to trample on other people's configuration needs. To start with, do you think my analysis is correct?


regards Jeremy


