Quick reply via a PDA... I'd like to add to your list:

7) Ability to crawl a site using the cocoon protocol rather than http. An index could then be created as an offline process (e.g. when the site is statically generated and only the search is dynamic, so http cannot provide the link view).

Upayavira

Jeremy Quinn wrote:
> Hi All,
>
> I had occasion to move an existing site that had Lucene integrated into
> it, from a Tomcat to a Jetty setup.
>
> I noticed during this that while Lucene is a great search engine, it
> can be very difficult to configure under certain circumstances, due to
> some internal inconsistencies.
>
> Here is a list of _some_ of the aspects that need configuring:
>
> 1. The root directory where each Lucene index is stored
> 2. The actual Lucene index to use or create
> 3. The Analyzer to use for searching and creation
> 4. The set of patterns to exclude while crawling
> 5. The set of fields to store during index creation
> 6. The cocoon-views to use for content and link extraction
>
>
> The first problem I came across is with (1) above, the 'index'
> directory used by Lucene, defaults to Jetty's 'work' directory
> '/private/tmp/Jetty__8888__/cocoon-files/' OMM, which gets cleaned out
> each time Jetty is restarted (Tomcat does not do this), meaning you
> lose the indexes. So when you are using Jetty, you almost definitely
> need to re-set this.
>
> Two separate components need this parameter, the Searcher and the
> Indexer. If you have multiple independently searchable sub-sites in one
> Servlet, you would need all of them to use the same config,
> differentiating between multiple indexes via param (2) above.
>
> SimpleLuceneCocoonSearcherImpl reads an optional <directory/> parameter
> from cocoon.xconf, but it has no effect, because the SearchGenerator
> resets this during its setup.
>
> SimpleLuceneCocoonIndexerImpl does not pick up configuration from the
> <directory/> parameter, even though its name is declared as a static
> variable. This parameter actually gets passed from create-index.xsp, so
> you need to modify the indexer XSP to set the base location of the
> indexes.
>
> The only way it appears you can set a custom location for Lucene's
> indexes for searching, is by putting an absolute path to them in the
> SearchGenerator's <index/> parameter, in your SiteMap, i.e. in parameter
> (2) above. This is not good IMHO.
>
>
> The next inconsistency is that the Analyzer classname (parameter (3)
> above) can be set in cocoon.xconf on both the Searcher and the Indexer,
> but again is overridden by SearchGenerator and create-index.xsp. While
> I am not completely sure who needs to change the Analyzer or why, I
> strongly suspect it could need to be different for each index in a
> multi-index site. I do not think this is possible with the current
> design.
>
>
> The next set of params (4) & (5) above, should not IMHO be global. If
> again, you are setting up multiple sub-sites each with their own search
> index, you would legitimately need separate settings for each of these
> as they are likely to have different URLs and document structures etc.
>
>
> Param (6) above, is less clear-cut ..... would there be a genuine need
> to have different settings for view-names for separate site-indexes?
>
>
> I do not have a proper proposal yet ..... I would like to discuss how
> to best rationalise this situation, but have no wish to trample on
> other people's configuration needs ..... to start with, do you think my
> analysis is correct?
>
>
> regards Jeremy
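
To make the discussion a bit more concrete, here is a rough sketch of what "global defaults plus per-index overrides" could look like for params (1) to (3). It is only an illustration: IndexConfig and IndexConfigRegistry are hypothetical names, not existing Cocoon or Lucene classes, and the root path in the usage note is a placeholder.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these classes do not exist in Cocoon or Lucene.

/** The resolved settings for one named index. */
final class IndexConfig {
    final Path directory;       // params (1) + (2): root directory plus index name
    final String analyzerClass; // param (3): Analyzer classname for this index

    IndexConfig(Path directory, String analyzerClass) {
        this.directory = directory;
        this.analyzerClass = analyzerClass;
    }
}

/** Single source of settings that both a searcher and an indexer could query. */
final class IndexConfigRegistry {
    private final Path rootDirectory;          // shared root for all indexes
    private final String defaultAnalyzerClass; // global default Analyzer
    private final Map<String, String> analyzerOverrides = new HashMap<>();

    IndexConfigRegistry(Path rootDirectory, String defaultAnalyzerClass) {
        this.rootDirectory = rootDirectory;
        this.defaultAnalyzerClass = defaultAnalyzerClass;
    }

    /** Registers a different Analyzer classname for one index only. */
    void overrideAnalyzer(String indexName, String analyzerClass) {
        analyzerOverrides.put(indexName, analyzerClass);
    }

    /** Returns the per-index override where one exists, else the global default. */
    IndexConfig resolve(String indexName) {
        Path directory = rootDirectory.resolve(indexName);
        String analyzer = analyzerOverrides.getOrDefault(indexName, defaultAnalyzerClass);
        return new IndexConfig(directory, analyzer);
    }
}
```

With something like this, a registry built with `Paths.get("/var/cocoon/indexes")` as root would resolve the index "docs" to a "docs" subdirectory of that root, and an override registered for "docs-de" would change the Analyzer for that index alone. The same idea could presumably extend to params (4) and (5).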
