Byron Miller wrote:
On optimizing performance, does anyone know if Google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)?
Both. The highest-scoring pages are kept in separate indexes that are
searched
Doug Cutting wrote:
Byron Miller wrote:
On optimizing performance, does anyone know if Google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)?
Both. The highest-scoring pages are kept in separate
On optimizing performance, does anyone know if Google
is exporting its entire dataset as an index or only
somehow indexing the topN % (since they only show the
first 1000 or so results anyway)?
With this patch and a top result set in the xml file
does that mean it will stop scanning the index at
Andrzej Bialecki wrote:
I'm happy to report that further tests performed on a larger index seem
to show that the overall impact of the IndexSorter is definitely
positive: performance improvements are significant, and the overall
quality of results seems at least comparable, if not actually
Doug Cutting wrote:
Andrzej Bialecki wrote:
Using the original index, it was possible for pages with high tf/idf
of a term, but with a low boost value (the OPIC score), to outrank
pages with high boost but lower tf/idf of a term. This phenomenon
leads quite often to results that are
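A toy illustration of the ranking effect described above. This is only a sketch: it assumes the final score is simply the tf/idf weight multiplied by the page boost (the real Lucene formula has more factors), and the function name, page names and numbers are all hypothetical.

```python
def combined_score(tfidf, boost):
    # Assumption for illustration only: score = tf/idf weight * page boost.
    return tfidf * boost

# Hypothetical pages: one with a high tf/idf but a low boost,
# one with a lower tf/idf but a high boost.
pages = [
    {"url": "junk-page", "tfidf": 9.0, "boost": 0.2},
    {"url": "good-page", "tfidf": 1.5, "boost": 1.0},
]

# Rank by the combined score, best first.
ranked = sorted(
    pages,
    key=lambda p: combined_score(p["tfidf"], p["boost"]),
    reverse=True,
)
print([p["url"] for p in ranked])
```

With these made-up numbers the low-boost page still comes out first, because its tf/idf is large enough to outweigh the boost difference.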
Doug Cutting wrote:
I have committed this, along with the LuceneQueryOptimizer changes.
I could only find one place where I was using numDocs() instead of
maxDoc().
Right, I confused two bugs from different files - the other bug still
exists in the committed version of the
Hi,
I'm happy to report that further tests performed on a larger index seem
to show that the overall impact of the IndexSorter is definitely
positive: performance improvements are significant, and the overall
quality of results seems at least comparable, if not actually better.
The reason
Hi Andrzej,
wow, that is really great news!
Using the optimized index, I reported previously that some of the
top-scoring results were missing. As it happens, the missing
results were typically the junk pages with high tf/idf but low
boost. Since we collect up to N hits, going from higher to
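A rough sketch of why low-boost pages can drop out when only up to N hits are collected. The quoted text is cut off, so the ordering used here (documents visited from highest to lowest boost, with scanning stopped once N matches are found) is an assumption, and every name and value below is hypothetical rather than taken from Nutch or Lucene.

```python
def collect_top_hits(docs_sorted_by_boost, term, n):
    """Scan documents in the given order and stop after n matches."""
    hits = []
    for doc in docs_sorted_by_boost:
        if term in doc["terms"]:
            hits.append(doc["url"])
            if len(hits) == n:
                break  # stop scanning; later (lower-boost) docs are never seen
    return hits

# Hypothetical documents with made-up boost values.
docs = [
    {"url": "junk", "boost": 0.1, "terms": {"nutch"}},
    {"url": "a", "boost": 0.9, "terms": {"nutch", "search"}},
    {"url": "c", "boost": 0.5, "terms": {"nutch"}},
    {"url": "b", "boost": 0.7, "terms": {"lucene"}},
]

# Assumed sort order: descending boost.
docs.sort(key=lambda d: d["boost"], reverse=True)

top = collect_top_hits(docs, "nutch", 2)
print(top)
```

In this toy data the "junk" document matches the term but has the lowest boost, so the scan ends before reaching it.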
I've got a 400-million db I can run this against over the
next few days.
-byron
--- Stefan Groschupf [EMAIL PROTECTED] wrote:
Hi Andrzej,
wow, that is really great news!
Using the optimized index, I reported previously
that some of the
top-scoring results were missing. As it happens,
the
Andrzej Bialecki wrote:
Hi,
I'm happy to report that further tests performed on a larger index
seem to show that the overall impact of the IndexSorter is definitely
positive: performance improvements are significant, and the overall
quality of results seems at least comparable, if not
Jeff Bowden wrote:
Andrzej Bialecki wrote:
Hi,
I'm happy to report that further tests performed on a larger index
seem to show that the overall impact of the IndexSorter is definitely
positive: performance improvements are significant, and the overall
quality of results seems at