On optimizing performance: does anyone know whether Google
is exporting its entire dataset as an index, or only
indexing the top N% somehow (since they only show the
first 1000 or so results anyway)?

With this patch and a top-result limit set in the XML
file, does that mean the search will stop scanning the
index at that point? Is there a methodology to actually
prune the index by some scaling factor, so that a
4 billion page index is searchable only about 1k results
deep on average?

It seems like some method of doing the above would
cut your search processing and index size down fairly
well. But it may be more expensive to post-process the
index to this scale than it is to simply push the whole
thing and let the query optimizer ignore the rest as
needed. After all, disk space is getting rather cheap
compared to CPU processing and memory.
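
To make the "stop scanning at that point" idea concrete, here is
a rough sketch of what I have in mind. It is not Lucene or Nutch
code. The function name, the tuple layout, and the max_hits
parameter are all made up for illustration, and it assumes the
documents are already stored in descending order of some static
score (which is my understanding of what a sorted index gives you).

```python
from typing import List, Set, Tuple

# Hypothetical in-memory stand-in for a sorted index:
# (doc_id, static_score, terms), ordered by descending static_score.
Posting = Tuple[int, float, Set[str]]


def top_n_early_exit(
    postings: List[Posting], query_terms: Set[str], max_hits: int
) -> Tuple[List[int], int]:
    """Collect up to max_hits matching docs, then stop scanning.

    Returns the matching doc ids and how many entries were examined.
    Because the input is pre-sorted by score, the first max_hits
    matches are also the highest-scoring ones under that ordering.
    """
    hits: List[int] = []
    scanned = 0
    for doc_id, _score, terms in postings:
        scanned += 1
        # A doc matches when it contains every query term.
        if query_terms <= terms:
            hits.append(doc_id)
            if len(hits) >= max_hits:
                break  # early exit: the rest of the index is never read
    return hits, scanned


if __name__ == "__main__":
    # Toy data: even-numbered docs mention "lucene", odd ones "nutch".
    index = [
        (i, 1.0 - i * 0.01, {"lucene"} if i % 2 == 0 else {"nutch"})
        for i in range(100)
    ]
    hits, scanned = top_n_early_exit(index, {"lucene"}, 3)
    print(hits, scanned)
```

The point of the toy is only that the cost depends on how deep you
have to go to fill the result list, not on the total index size.
Whether the real patch behaves this way is exactly my question.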



--- Doug Cutting <[EMAIL PROTECTED]> wrote:

> Andrzej Bialecki wrote:
> > I'm happy to report that further tests performed on a
> > larger index seem to show that the overall impact of
> > the IndexSorter is definitely positive: performance
> > improvements are significant, and the overall quality
> > of results seems at least comparable, if not actually
> > better.
> 
> Great news!
> 
> I will submit the Lucene patches ASAP, now that we
> know they're useful.
> 
> Doug
> 
