No, I haven't done that (to be honest, I don't know how to do that ... :-) ). That's the reason I ran both tests multiple times and reported the last run.
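For reference, on a Linux box one common way to clear the OS disk cache between benchmark runs is the `drop_caches` sysctl (a Linux-only sketch; it requires root, and other platforms need different tricks, e.g. rebooting or reading enough unrelated data to evict the cache):

```shell
# Flush dirty pages to disk first, so dropping the cache doesn't lose writes.
sync
# Writing 3 drops the page cache plus dentries and inodes; 1 drops only the
# page cache. After this, the next query run starts from a cold disk cache.
echo 3 | sudo tee /proc/sys/vm/drop_caches
```

Running the benchmark once after this gives the uncached ("cold") timing; subsequent runs without it give the warmed timing.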
On Dec 10, 2007 10:24 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 10-Dec-07, at 12:11 PM, Shai Erera wrote:
>
> > Actually, queries on large indexes are not necessarily I/O bound. It
> > depends on how much of the posting list is being read into memory at
> > once. I'm not that familiar with the innermost parts of Lucene, but
> > let's assume a posting element takes 4 bytes for the docId and 2 more
> > bytes per position in a document (that's without compression; I'm sure
> > Lucene does some compression on the doc IDs). So I think I won't miss
> > by much by guessing that a posting element takes at most 10 bytes,
> > which means that 1M posting elements take 10MB (this is considered a
> > very long posting list).
> > Therefore, if you read it into memory in chunks (16, 32, 64 KB), most
> > of the time the query spends in the CPU, computing the scores, PQ, etc.
> > The real I/O operations only involve reading fragments of the posting
> > list into memory. On today's hardware, reading 10MB into memory is
> > pretty fast.
> > So I wouldn't be surprised here (unless I misunderstood you).
>
> My experience is that queries against indices which haven't been warmed
> into the OS disk cache are many times slower (this is especially true if
> the prox file is used at all).
>
> I initially assumed that you had cleared the OS disk cache between the
> runs of the two algorithms, and were seeing a difference in uncached
> query performance. I assume though from your comments that this isn't
> the case at all.
>
> -Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

-- 
Regards, Shai Erera
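The back-of-the-envelope estimate quoted above can be written out as a quick sketch. The 4-byte docId, 2-bytes-per-position, and average-positions figures are the thread's assumptions about an *uncompressed* layout, not Lucene's actual on-disk format (which delta-encodes and VInt-compresses doc IDs and positions):

```python
# Posting-list size estimate, per the assumptions in the thread
# (uncompressed; real Lucene postings are considerably smaller).
DOC_ID_BYTES = 4     # assumed uncompressed docId
POSITION_BYTES = 2   # assumed bytes per position entry
AVG_POSITIONS = 3    # hypothetical average term positions per document

bytes_per_posting = DOC_ID_BYTES + POSITION_BYTES * AVG_POSITIONS  # 10 bytes
postings = 1_000_000          # "a very long posting list"
total_mb = postings * bytes_per_posting / 1_000_000
print(total_mb)  # → 10.0

# At a 2007-era sequential read rate of ~50 MB/s, 10 MB costs ~0.2 s cold,
# and effectively nothing once the file sits in the OS page cache -- which
# is why cold- vs. warm-cache runs can differ by a large factor.
```

This supports both points in the thread: the data volume per query is small, yet the cold-cache penalty (seek-heavy reads, especially of the prox file) can still dominate the first run.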