On 10-Dec-07, at 12:11 PM, Shai Erera wrote:

Actually, queries on large indexes are not necessarily I/O bound. It depends on how much of the posting list is read into memory at once. I'm not that familiar with the innermost workings of Lucene, but let's assume a posting element takes 4 bytes for the docId and 2 more bytes per position in the document (that's without compression; I'm sure Lucene does some compression on the doc Ids). So I don't think I'll miss by much if I guess that a posting element takes at most 10 bytes, which means that 1M posting elements take 10MB (and that is considered a very long posting list).
Therefore, if you read it into memory in chunks (16, 32, 64 KB), the query spends most of its time in the CPU, computing scores, maintaining the priority queue (PQ), etc. The real I/O operations only involve reading fragments of the posting list into memory, and on today's hardware reading 10MB is pretty fast.
So I wouldn't be surprised here (unless I misunderstood you).
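For reference, that back-of-envelope estimate can be written down directly; the constants below are the assumptions from the paragraph above (uncompressed docIds and positions, a guessed average number of positions per doc), not Lucene's actual on-disk format:

// Back-of-envelope estimate only; the per-entry sizes are assumptions,
// not Lucene's actual (compressed) file format.
public class PostingSizeEstimate {
    public static void main(String[] args) {
        int bytesPerDocId = 4;        // assumed: one uncompressed int per doc
        int bytesPerPosition = 2;     // assumed: per occurrence in a doc
        int positionsPerDoc = 3;      // assumed average occurrences per doc
        long postings = 1000000L;     // a very long posting list

        long bytesPerEntry = bytesPerDocId + positionsPerDoc * bytesPerPosition; // ~10 bytes
        long totalBytes = postings * bytesPerEntry;
        System.out.println(totalBytes / 1e6 + " MB");   // prints 10.0 MB
    }
}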

My experience is that queries against indices which haven't been warmed into the OS disk cache are many times slower (this is especially true if the prox file is used at all).
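One way to do that warm-up is sketched below; it assumes the 2.x-era IndexReader API, a hypothetical "body" field, and a few hand-picked frequent terms, and simply walks their positions so the postings (.frq) and prox (.prx) data gets pulled into the OS cache before any timing:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class WarmIndex {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(args[0]);          // path to the index
        String[] warmTerms = { "lucene", "apache", "search" };   // hypothetical frequent terms
        for (String text : warmTerms) {
            TermPositions tp = reader.termPositions(new Term("body", text));
            while (tp.next()) {
                int freq = tp.freq();
                for (int i = 0; i < freq; i++) {
                    tp.nextPosition();   // forces the prox (.prx) data to be read
                }
            }
            tp.close();
        }
        reader.close();
    }
}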

I initially assumed that you had cleared the OS disk cache between the runs of the two algorithms, and were seeing a difference in uncached query performance. From your comments, though, I gather that this isn't the case at all.
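A minimal timing harness along those lines might look like the sketch below; the field name and term are hypothetical, and the OS cache itself has to be cleared outside the JVM before the cold run for the first measurement to mean anything:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ColdVsWarm {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher(args[0]);     // path to the index
        Query q = new TermQuery(new Term("body", "lucene"));     // hypothetical field/term

        long t0 = System.currentTimeMillis();
        Hits cold = searcher.search(q);   // cold, assuming the OS cache was just cleared
        long t1 = System.currentTimeMillis();
        Hits warm = searcher.search(q);   // repeat, served from the OS cache
        long t2 = System.currentTimeMillis();

        System.out.println("cold: " + (t1 - t0) + " ms, " + cold.length() + " hits");
        System.out.println("warm: " + (t2 - t1) + " ms, " + warm.length() + " hits");
        searcher.close();
    }
}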

-Mike
