On Thu, Mar 1, 2012 at 8:49 AM, mark harwood <[email protected]> wrote:

> I would have assumed the many int comparisons would cost less than the
> superfluous disk accesses? (I bow to your considerable experience in this
> area!)
> What is the worst-case scenario on added disk reads? Could it be as bad
> as numberOfSegments x numberOfOtherscorers before the query winds up?
Well, it depends -- the disk access is a one-time thing but the added check is per-hit. At some point it'll cross over... I think it's likely the advance(NO_MORE_DOCS) will not usually hit disk: our skipper impl fully pre-buffers (in RAM) the top skip lists, I think? Even if we do go to disk, it's likely the OS pre-cached those bytes in its IO buffer.

> On the index I tried, it looked like an improvement - the spreadsheet I
> linked to has the source for the benchmark on a second worksheet if you want
> to give it a whirl on a different dataset.

Maybe try it on a more balanced case? Ie, N high-freq terms whose freqs are "close-ish"? And on slow queries (I think the results in your spreadsheet are from very fast queries, right? The slowest one was ~0.95 msec per query, if I'm reading it right?). In general I think not slowing down the worst-case queries is much more important than speeding up the super-fast queries.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
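[Editor's note] The crossover Mike describes can be sketched with a toy cost model: a one-time advance(NO_MORE_DOCS) (which may touch disk) versus an extra int comparison on every collected hit. The constants below are illustrative assumptions, not measured Lucene numbers.

```java
// Toy cost model for the tradeoff above. All constants are
// hypothetical; the point is only that a fixed one-time cost is
// eventually beaten by a small per-hit cost as hit counts grow.
public class CrossoverSketch {
    static final long ADVANCE_COST_NS = 10_000; // assumed one-time skip/seek cost
    static final long PER_HIT_CHECK_NS = 1;     // assumed cost of one int comparison

    // Hit count beyond which the per-hit checks cost more in total
    // than paying the one-time advance up front.
    static long crossoverHits() {
        return ADVANCE_COST_NS / PER_HIT_CHECK_NS;
    }

    public static void main(String[] args) {
        System.out.println("per-hit checks cost more once a query matches > "
                + crossoverHits() + " docs");
    }
}
```

This is why fast queries (few hits) favor the per-hit check while slow, high-hit-count queries favor the eager advance, matching Mike's worry about not slowing down the worst-case queries.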
