Re: Fast access to a random page of the search results.

Doug Cutting Tue, 01 Mar 2005 14:04:21 -0800

Daniel Naber wrote:

After fixing this I can reproduce the problem with a local index that contains about 220.000 documents (700MB). Fetching the first document takes for example 30ms, fetching the last one takes >100ms. Of course I tested this with a query that returns many results (about 50.000). Actually it happens even with the default sorting, no need to sort by some specific field.

In part this is due to the fact that Hits first searches for the top-scoring 100 documents. Then, if you ask for a hit after that, it must re-query. In part this is also due to the fact that maintaining a queue of the top 50k hits is more expensive than maintaining a queue of the top 100 hits, so the second query is slower. And in part this could be caused by other things, such as that the highest ranking document might tend to be cached and not require disk io.

One could perform profiling to determine which is the largest factor. Of these, only the first is really fixable: if you know you'll need hit 50k then you could tell this to Hits and have it perform only a single query. But the algorithmic cost of keeping the queue of the top 50k is the same as collecting all the hits and sorting them. So, in part, getting hits 49,990 through 50,000 is inherently slower than getting hits 0-10. We can minimize that, but not eliminate it.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Fast access to a random page of the search results.

Reply via email to