On Thu, Mar 1, 2012 at 8:49 AM, mark harwood <[email protected]> wrote:
> I would have assumed the many int comparisons would cost less than the 
> superfluous disk accesses? (I bow to your considerable experience in this 
> area!)
> What is the worst-case scenario on added disk reads? Could it be as bad 
> as numberOfSegments x numberOfOtherscorers before the query winds up?

Well, it depends -- the disk access is a one-time cost, but the added
check is paid on every hit.  At some point the two will cross over...
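As a back-of-envelope sketch of that crossover (the costs below are illustrative assumptions, not measurements): the per-hit check starts losing once the hit count exceeds the one-time read cost divided by the per-hit cost.

```java
// Toy crossover model: a one-time advance(NO_MORE_DOCS) costs roughly one
// read, while the alternative adds a small int comparison to every hit.
// Numbers are assumed for illustration only.
public class CrossoverSketch {
    // Hit count at which the cumulative per-hit checks start costing more
    // than the single up-front read.
    static long crossoverHits(long oneTimeReadNanos, long perHitCheckNanos) {
        return oneTimeReadNanos / perHitCheckNanos;
    }

    public static void main(String[] args) {
        long readNanos = 100_000; // assumed ~100 usec for a cold skip-list read
        long checkNanos = 2;      // assumed ~2 nsec per int comparison
        System.out.println("crossover at ~"
            + crossoverHits(readNanos, checkNanos) + " hits");
    }
}
```

On those (made-up) numbers the per-hit check only wins for result sets under ~50K hits, which is why it depends so much on the query.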

I think the advance(NO_MORE_DOCS) will likely not hit disk in most
cases: our skipper impl fully pre-buffers (in RAM) the top skip lists,
I think?  Even if we do go to disk, it's likely the OS already has
those bytes cached in its IO buffers.
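For anyone following along, here's a minimal sketch of the iterator contract involved. The class below is a stand-in over a sorted int array, not the real postings implementation; only the advance(target)/NO_MORE_DOCS semantics match Lucene's DocIdSetIterator.

```java
// Stand-in for Lucene's DocIdSetIterator contract: advance(target) positions
// the iterator on the first doc >= target, and NO_MORE_DOCS
// (Integer.MAX_VALUE) exhausts it in one call -- ideally without touching
// disk when the top skip levels are already buffered in RAM.
class ArrayDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;  // sorted doc ids
    private int pos = -1;

    ArrayDocIdIterator(int[] sortedDocs) { this.docs = sortedDocs; }

    int advance(int target) {
        // Linear scan for clarity; the real postings impl jumps via skip lists.
        do { pos++; } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}
```

So advance(NO_MORE_DOCS) just runs the iterator off the end and returns NO_MORE_DOCS; with skip lists that's a handful of in-RAM skips rather than a scan.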

> On the index I tried, it looked like an improvement - the spreadsheet I 
> linked to has the source for the benchmark on a second worksheet if you want 
> to give it a whirl on a different dataset.

Maybe try it on a more balanced case?  I.e., N high-freq terms whose
freqs are "close-ish"?  And on slow queries (the results in your
spreadsheet look like very fast queries, right?  The slowest one was
~0.95 msec per query, if I'm reading it right).

In general I think not slowing down the worst-case queries is much
more important than speeding up the super-fast queries.
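One way to keep benchmarks honest on that point is to report the max per-query latency alongside the mean, so a regression on the slow queries can't hide behind the fast ones. A tiny (hypothetical) harness sketch:

```java
// Toy timing harness: reports mean and max per-query nanos, since averaging
// over mostly-fast queries hides a worst-case regression.
class LatencySketch {
    // Returns {meanNanos, maxNanos} over iters runs of the query.
    static long[] meanAndMaxNanos(Runnable query, int iters) {
        long total = 0, max = 0;
        for (int i = 0; i < iters; i++) {
            long t0 = System.nanoTime();
            query.run();
            long dt = System.nanoTime() - t0;
            total += dt;
            max = Math.max(max, dt);
        }
        return new long[] { total / iters, max };
    }
}
```

(Real benchmarks also need warmup, pinned data, etc.; this just shows the shape of the measurement.)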

Mike
