On optimizing performance: does anyone know whether Google exports its entire dataset as an index, or somehow indexes only the top N% (since they only show the first 1,000 or so results anyway)?
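The "index only the top N%" idea can be made concrete with a toy sketch of document-level static pruning. It assumes every document already carries a query-independent static score (for example a link-analysis score), and that the index is a plain term-to-posting-list map. The function name `prune_by_static_score` and the `keep_fraction` parameter are hypothetical; this says nothing about what Google actually does.

```python
from typing import Dict, List


def prune_by_static_score(postings: Dict[str, List[int]],
                          static_score: Dict[int, float],
                          keep_fraction: float) -> Dict[str, List[int]]:
    """Hypothetical helper: keep only documents whose static score
    falls in the top `keep_fraction` of all documents."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Rank documents by static score, best first.
    ranked = sorted(static_score, key=lambda d: static_score[d], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = set(ranked[:n_keep])
    pruned: Dict[str, List[int]] = {}
    for term, docs in postings.items():
        # Drop postings for pruned documents; drop terms left with none.
        kept_docs = [d for d in docs if d in kept]
        if kept_docs:
            pruned[term] = kept_docs
    return pruned
```

For example, with four documents and `keep_fraction=0.5`, only the two highest-scoring documents survive, and a term that occurred only in pruned documents disappears from the index entirely. The cost is that such pages can never be found, which is the recall trade-off implied by showing only the first 1,000 or so results.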
With this patch and a maximum number of top results set in the XML config file, does that mean it will stop scanning the index at that point? Is there a methodology to actually prune the index by some scaling factor, so that a 4-billion-page index only has to be searchable about 1,000 results deep on average? It seems like some method along those lines would cut search processing and index size down fairly well. But it may be more expensive to post-process the index down to this scale than it is to simply push the full index and let the query optimizer ignore the rest as needed. After all, disk space is getting rather cheap compared to CPU and memory.

--- Doug Cutting <[EMAIL PROTECTED]> wrote:
> Andrzej Bialecki wrote:
> > I'm happy to report that further tests performed on a larger index seem
> > to show that the overall impact of the IndexSorter is definitely
> > positive: performance improvements are significant, and the overall
> > quality of results seems at least comparable, if not actually better.
>
> Great news!
>
> I will submit the Lucene patches ASAP, now that we know they're useful.
>
> Doug
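To illustrate the "stop scanning" question, here is a toy sketch of early termination over a score-sorted index. It assumes the sorted index assigns lower document numbers to pages with higher static scores, so walking a posting list in document order visits the best pages first. The helper names and the `max_hits` parameter are hypothetical and are not the actual IndexSorter or Lucene patch API.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def renumber_by_static_score(static_score: Dict[str, float]) -> Dict[str, int]:
    """Hypothetical helper: assign doc ids 0..n-1 in descending static-score order."""
    ordered = sorted(static_score, key=lambda url: static_score[url], reverse=True)
    return {url: i for i, url in enumerate(ordered)}


def _contains(sorted_list: List[int], doc: int) -> bool:
    # Binary search, so the other posting lists are never scanned in full.
    i = bisect_left(sorted_list, doc)
    return i < len(sorted_list) and sorted_list[i] == doc


def search_with_early_termination(postings: Dict[str, List[int]],
                                  terms: List[str],
                                  max_hits: int) -> Tuple[List[int], bool]:
    """AND query over docid-sorted postings that stops after max_hits matches.

    Returns (hits, truncated); truncated is True if more matches existed."""
    lists = [postings.get(t, []) for t in terms]
    if not lists or any(not pl for pl in lists):
        return [], False
    lists.sort(key=len)  # drive the scan from the shortest posting list
    driver, others = lists[0], lists[1:]
    hits: List[int] = []
    for doc in driver:  # ascending doc id == descending static score
        if all(_contains(pl, doc) for pl in others):
            if len(hits) == max_hits:
                return hits, True  # stop here; the tail is never examined
            hits.append(doc)
    return hits, False
```

Usage, with four pages renumbered so that the best-scoring page gets id 0:

```python
ids = renumber_by_static_score({"a.com": 0.2, "b.com": 0.9, "c.com": 0.5, "d.com": 0.7})
postings = {
    "nutch": sorted(ids[u] for u in ["a.com", "b.com", "c.com", "d.com"]),
    "lucene": sorted(ids[u] for u in ["a.com", "b.com", "c.com"]),
}
hits, truncated = search_with_early_termination(postings, ["nutch", "lucene"], max_hits=2)
```

Because matching stops once `max_hits` documents are collected, postings deep in the lists are never touched, which is where the speedup comes from. The trade-off is that query-dependent relevance can only re-rank the hits that were collected, not the whole match set, so result quality depends on the static score being a good proxy.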