Actually, queries on large indexes are not necessarily I/O bound. It depends
on how much of the posting list is read into memory at once. I'm not that
familiar with Lucene's innermost details, but let's assume a posting element
takes 4 bytes for the docId and 2 more bytes per position in the document
(that's without compression; I'm sure Lucene does some compression on the
docIds). So I don't think I'm far off in guessing that a posting element
takes at most 10 bytes, which means that 1M posting elements take 10MB (and
that is considered a very long posting list).
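
The estimate above is plain arithmetic. Here is a small Java sketch of it that
uses only the assumptions stated (4 bytes per docId, 2 bytes per position, no
compression) and also counts how many 16/32/64 KB chunk reads a list of that
size would need. The class and method names are made up for illustration;
this says nothing about Lucene's real on-disk format.

```java
// Back-of-the-envelope size of an uncompressed posting list, under the
// assumptions in the text: 4 bytes per docId plus 2 bytes per position.
public class PostingSizeEstimate {

    // Bytes for one posting element: a 4-byte docId plus 2 bytes per position.
    static long bytesPerPosting(int positionsPerDoc) {
        return 4L + 2L * positionsPerDoc;
    }

    // Number of fixed-size chunk reads needed to bring totalBytes into memory.
    static long chunkReads(long totalBytes, int chunkBytes) {
        return (totalBytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }

    public static void main(String[] args) {
        long postings = 1_000_000L;
        long perPosting = bytesPerPosting(3); // 4 + 2*3 = 10 bytes, the upper bound guessed above
        long total = postings * perPosting;   // 10,000,000 bytes, i.e. ~10MB
        System.out.println("bytes per posting: " + perPosting);
        System.out.println("total bytes: " + total);
        for (int kb : new int[] {16, 32, 64}) {
            System.out.println(kb + " KB chunks: " + chunkReads(total, kb * 1024) + " reads");
        }
    }
}
```

With these numbers, even a "very long" 10MB list is only 611, 306 or 153
sequential reads at 16, 32 or 64 KB per chunk.
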
Therefore, if you read it into memory in chunks (16, 32, 64 KB), the query
spends most of its time in the CPU, computing the scores, maintaining the
PQ, etc. The real I/O operations only involve reading fragments of the
posting list into memory, and on today's hardware reading 10MB into memory
is pretty fast.
So I wouldn't be surprised by that (unless I misunderstood you).
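
To make the CPU vs. I/O split a bit more concrete, here is a rough Java
sketch: it streams a posting list through a BufferedInputStream whose buffer
is one chunk, and between refills it only decodes, scores and maintains a
top-k PriorityQueue. The record layout (4-byte docId + 2-byte frequency), the
sqrt scoring and the class/method names are all assumptions for
illustration, not how Lucene actually stores or scores postings.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class ChunkedPostingScan {

    // One scored document kept in the top-k queue.
    static final class Scored {
        final int doc;
        final double score;
        Scored(int doc, double score) { this.doc = doc; this.score = score; }
    }

    // Hypothetical uncompressed layout: a 4-byte docId followed by a 2-byte term frequency.
    static byte[] encode(int[] docIds, short[] freqs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < docIds.length; i++) {
                out.writeInt(docIds[i]);
                out.writeShort(freqs[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads postings through a buffer refilled chunkBytes at a time; decoding,
    // scoring and queue maintenance between refills are pure CPU work.
    static List<Integer> topK(InputStream raw, int chunkBytes, int k) {
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Scored> pq = new PriorityQueue<>(k, (a, b) -> Double.compare(a.score, b.score));
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(raw, chunkBytes))) {
            while (true) {
                int doc;
                try {
                    doc = in.readInt();
                } catch (EOFException endOfList) {
                    break; // no more postings
                }
                int freq = in.readShort();
                double score = Math.sqrt(freq); // toy score, not Lucene's similarity
                if (pq.size() < k) {
                    pq.offer(new Scored(doc, score));
                } else if (score > pq.peek().score) {
                    pq.poll();
                    pq.offer(new Scored(doc, score));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Integer> best = new ArrayList<>();
        while (!pq.isEmpty()) {
            best.add(pq.poll().doc); // ascending score
        }
        Collections.reverse(best);   // highest score first
        return best;
    }

    public static void main(String[] args) {
        int[] docs = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        short[] freqs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] data = encode(docs, freqs);
        System.out.println(topK(new ByteArrayInputStream(data), 16, 3)); // [9, 8, 7]
    }
}
```

The tiny 16-byte buffer in main just forces many refills; in practice the
chunk would be 16-64 KB, so refills are rare compared to the per-posting work.
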

> I agree; this is desirable.
I will run the test with 10M documents tomorrow and, if the results are the
same, I will open an issue. Does that sound OK?

On Dec 10, 2007 9:59 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 10-Dec-07, at 11:31 AM, Shai Erera wrote:
>
> > As you can see, the actual allocation time is really negligible and there
> > isn't much difference in the avg. running times of the queries. However,
> > the *current* runs performed a lot worse at the beginning, before the OS
> > cache warmed up.
>
> This surprises me.  I would have thought that the query execution at
> this point would be IO-bound, hence _less_ susceptible to a little
> extra gc.
>
> > The only significant difference is the number of allocations - the
> > modified TDC and PQ allocate ~90% (!) fewer objects. This is
> > significant, especially in heavily loaded systems.
>
> I agree; this is desirable.
>
> nice work,
> -Mike
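
The modified TDC and PQ (presumably Lucene's TopDocCollector and
PriorityQueue) are not included in this thread, so the following is only a
hypothetical Java sketch of one common way to cut allocations in a bounded
top-k collector: once the queue is full, the entry that falls out is reused
for the new hit instead of allocating a fresh object. ReusingTopCollector and
Hit are made-up names, and java.util.PriorityQueue stands in for Lucene's own
queue class.

```java
import java.util.PriorityQueue;

public class ReusingTopCollector {

    // Mutable hit so that an entry evicted from the queue can be refilled in place.
    static final class Hit {
        int doc;
        float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int k;
    private final PriorityQueue<Hit> pq;
    private int allocations = 0;

    ReusingTopCollector(int k) {
        this.k = k;
        // Min-heap: the lowest-scoring hit of the current top k is at the head.
        this.pq = new PriorityQueue<>(k, (a, b) -> Float.compare(a.score, b.score));
    }

    void collect(int doc, float score) {
        if (pq.size() < k) {
            pq.offer(new Hit(doc, score)); // only the first k hits allocate
            allocations++;
        } else if (score > pq.peek().score) {
            Hit evicted = pq.poll();       // remove the current minimum...
            evicted.doc = doc;             // ...and recycle it for the new hit
            evicted.score = score;
            pq.offer(evicted);
        }
        // Hits that do not beat the current minimum are skipped without allocating.
    }

    int allocations() { return allocations; }
    int size() { return pq.size(); }
    float minScore() { return pq.peek().score; }

    public static void main(String[] args) {
        ReusingTopCollector c = new ReusingTopCollector(10);
        for (int doc = 0; doc < 100_000; doc++) {
            c.collect(doc, (float) doc); // ever-increasing scores: every hit evicts
        }
        System.out.println("allocations: " + c.allocations()); // 10
    }
}
```

With ever-increasing scores every hit displaces the current minimum, yet only
10 Hit objects are ever created; a collector that allocated a new object per
competitive hit would create 100,000 here.
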


-- 
Regards,

Shai Erera
