Re: IndexSorter optimizer

2006-01-04 Thread Doug Cutting
Byron Miller wrote: On optimizing performance, does anyone know if Google is exporting its entire dataset as an index or only somehow indexing the topN % (since they only show the first 1000 or so results anyway)? Both. The highest-scoring pages are kept in separate indexes that are searched

Re: IndexSorter optimizer

2006-01-04 Thread Andrzej Bialecki
Doug Cutting wrote: Byron Miller wrote: On optimizing performance, does anyone know if Google is exporting its entire dataset as an index or only somehow indexing the topN % (since they only show the first 1000 or so results anyway)? Both. The highest-scoring pages are kept in separate

Re: IndexSorter optimizer

2006-01-03 Thread Byron Miller
On optimizing performance, does anyone know if Google is exporting its entire dataset as an index or only somehow indexing the topN % (since they only show the first 1000 or so results anyway)? With this patch and a top result set in the XML file, does that mean it will stop scanning the index at

Re: IndexSorter optimizer

2006-01-02 Thread Doug Cutting
Andrzej Bialecki wrote: I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at least comparable, if not actually

Re: IndexSorter optimizer

2006-01-02 Thread Andrzej Bialecki
Doug Cutting wrote: Andrzej Bialecki wrote: Using the original index, it was possible for pages with high tf/idf of a term, but with a low boost value (the OPIC score), to outrank pages with high boost but lower tf/idf of a term. This phenomenon leads quite often to results that are
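The ranking effect described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the final score is the plain product of a tf/idf value and the document boost; the page names and numbers are hypothetical and are not taken from the thread.

```python
# Minimal sketch: how a low-boost page can outrank a high-boost page.
# ASSUMPTION: score = tfidf * boost (a simplification for illustration only).

def combined_score(tfidf: float, boost: float) -> float:
    """Combine the per-term tf/idf value with the per-document boost."""
    return tfidf * boost

# Hypothetical pages: name -> (tfidf for the query term, boost such as an OPIC score)
pages = {
    "junk_page": (9.0, 0.2),  # term repeated many times, low boost
    "good_page": (1.5, 1.0),  # moderate tf/idf, high boost
}

# Rank pages by combined score, highest first.
ranked = sorted(pages, key=lambda name: combined_score(*pages[name]), reverse=True)
print(ranked)
```

Under these assumed numbers the low-boost page comes first, because its very high tf/idf more than compensates for its low boost.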

Re: IndexSorter optimizer

2006-01-02 Thread Andrzej Bialecki
Doug Cutting wrote: I have committed this, along with the LuceneQueryOptimizer changes. I could only find one place where I was using numDocs() instead of maxDoc(). Right, I confused two bugs from different files - the other bug still exists in the committed version of the

IndexSorter optimizer

2005-12-21 Thread Andrzej Bialecki
Hi, I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at least comparable, if not actually better. The reason

Re: IndexSorter optimizer

2005-12-21 Thread Stefan Groschupf
Hi Andrzej, wow, this is really great news! Using the optimized index, I reported previously that some of the top-scoring results were missing. As it happens, the missing results were typically the junk pages with high tf/idf but low boost. Since we collect up to N hits, going from higher to
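The behaviour discussed in this message can be sketched roughly as follows. This is not the IndexSorter or LuceneQueryOptimizer code; it assumes documents are stored in order of decreasing boost, that scanning stops after the first N matching documents, and that the score is simply tf/idf times boost. All names and numbers are hypothetical.

```python
from typing import List, Tuple

# Hypothetical document record: (name, boost, tfidf for the query term)
Doc = Tuple[str, float, float]

def sort_by_boost(docs: List[Doc]) -> List[Doc]:
    """Order documents by decreasing boost, as a boost-sorted index would."""
    return sorted(docs, key=lambda d: d[1], reverse=True)

def collect_first_n(sorted_docs: List[Doc], n: int) -> List[Doc]:
    """Scan in index order, keep the first n matches, then rank them by score."""
    hits: List[Doc] = []
    for doc in sorted_docs:
        if doc[2] > 0.0:          # document matches the query term
            hits.append(doc)
        if len(hits) == n:        # early termination: stop scanning
            break
    # ASSUMPTION: score = boost * tfidf
    return sorted(hits, key=lambda d: d[1] * d[2], reverse=True)

docs = [
    ("junk", 0.2, 9.0),   # high tf/idf, low boost
    ("a", 1.0, 1.5),
    ("c", 0.6, 0.0),      # does not match the query
    ("b", 0.8, 2.0),
]

index = sort_by_boost(docs)
top_truncated = [d[0] for d in collect_first_n(index, 2)]
top_full = [d[0] for d in collect_first_n(index, len(docs))]
print(top_truncated, top_full)
```

With these assumed values, the full scan ranks the low-boost page first, while the truncated scan never reaches it. That matches the observation that the "missing" results were typically pages with high tf/idf but low boost.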

Re: IndexSorter optimizer

2005-12-21 Thread Byron Miller
I've got a 400 million db I can run this against over the next few days. -byron --- Stefan Groschupf wrote: Hi Andrzej, wow, this is really great news! Using the optimized index, I reported previously that some of the top-scoring results were missing. As it happens, the

Re: IndexSorter optimizer

2005-12-21 Thread American Jeff Bowden
Andrzej Bialecki wrote: Hi, I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at least comparable, if not

Re: IndexSorter optimizer

2005-12-21 Thread Andrzej Bialecki
American Jeff Bowden wrote: Andrzej Bialecki wrote: Hi, I'm happy to report that further tests performed on a larger index seem to show that the overall impact of the IndexSorter is definitely positive: performance improvements are significant, and the overall quality of results seems at