[Nutch-dev] Re: IndexOptimizer (Re: Lucene performance bottlenecks)

Andrzej Bialecki Wed, 14 Dec 2005 02:08:04 -0800

Doug Cutting wrote:

Andrzej Bialecki wrote:
Ok, I just tested IndexSorter for now. It appears to work correctly, at least I get exactly the same results, with the same scores and the same explanations, if I run the same queries on the original and on the sorted index.
Here's a more complete version, still mostly untested. This should make searches faster. We'll see how good the results are...
This includes a patch to Lucene to make it easier to write hit collectors that collect TopDocs.
I'll test this on a 38M document index tomorrow.

I'll test it soon - one comment, though. Currently you use a subclass of RuntimeException to stop the collecting. I think we should come up with a better mechanism - throwing exceptions is too costly. Perhaps the HitCollector.collect() method should return a boolean to signal whether the searcher should continue working.
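
To make the suggestion concrete, here is a minimal sketch of a collector whose collect() returns a boolean. None of these names come from Lucene or from the patch: StoppableCollector, LimitedCollector and EarlyStopDemo are hypothetical, and the search() loop is only a stand-in for the searcher's scoring loop, so treat this as an illustration of the idea rather than the actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface (not part of Lucene): collect() returns false
// to ask the searcher to stop, instead of throwing an exception.
interface StoppableCollector {
    boolean collect(int doc, float score);
}

// Hypothetical collector that keeps at most maxHits documents.
final class LimitedCollector implements StoppableCollector {
    private final int maxHits;
    private final List<Integer> docs = new ArrayList<>();

    LimitedCollector(int maxHits) {
        if (maxHits < 1) {
            throw new IllegalArgumentException("maxHits must be >= 1");
        }
        this.maxHits = maxHits;
    }

    @Override
    public boolean collect(int doc, float score) {
        docs.add(doc);
        // Continue only while there is still room for more hits.
        return docs.size() < maxHits;
    }

    List<Integer> docs() {
        return docs;
    }
}

final class EarlyStopDemo {
    // Stand-in for the searcher's loop over matching documents.
    // Returns how many documents were visited before stopping.
    static int search(int[] docIds, float[] scores, StoppableCollector collector) {
        int visited = 0;
        for (int i = 0; i < docIds.length; i++) {
            visited++;
            if (!collector.collect(docIds[i], scores[i])) {
                break; // the collector asked us to stop
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 9, 12, 20};
        float[] scores = {0.9f, 0.8f, 0.5f, 0.4f, 0.1f};
        LimitedCollector collector = new LimitedCollector(2);
        int visited = search(docIds, scores, collector);
        System.out.println(visited + " " + collector.docs());
    }
}
```

With this shape the stop signal is an ordinary return value checked once per hit, so no exception has to be created or unwound. The trade-off is that every caller of collect() has to check the result.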


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




