No, I haven't done that (to be honest, I don't know how to do that ... :-) ). That's the reason I ran both tests multiple times and reported the last run.
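For reference, on a Linux box one common way to clear the OS disk cache between benchmark runs is the `drop_caches` sysctl (a Linux-only sketch; it requires root, and other platforms need different tricks, e.g. rebooting or reading enough unrelated data to evict the cache):

```shell
# Flush dirty pages to disk first, so dropping the cache doesn't lose writes.
sync
# Writing 3 drops the page cache plus dentries and inodes; 1 drops only the
# page cache. After this, the next query run starts from a cold disk cache.
echo 3 | sudo tee /proc/sys/vm/drop_caches
```

Running the benchmark once after this gives the uncached ("cold") timing; subsequent runs without it give the warmed timing.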
On Dec 10, 2007 10:24 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 10-Dec-07, at 12:11 PM, Shai Erera wrote:
>
> > Actually, queries on large indexes are not necessarily I/O bound. It
> > depends on how much of the posting list is being read into memory at
> > once. I'm not that familiar with the innermost parts of Lucene, but
> > let's assume a posting element takes 4 bytes for the docId and 2 more
> > bytes per position in a document (that's without compression; I'm sure
> > Lucene does some compression on the doc IDs). So I think I won't miss
> > by much by guessing that a posting element takes at most 10 bytes,
> > which means that 1M posting elements take 10MB (this is considered a
> > very long posting list).
> > Therefore, if you read it into memory in chunks (16, 32, 64 KB), most
> > of the time the query spends in the CPU, computing the scores, PQ, etc.
> > The real I/O operations only involve reading fragments of the posting
> > list into memory. On today's hardware, reading 10MB into memory is
> > pretty fast.
> > So I wouldn't be surprised here (unless I misunderstood you).
>
> My experience is that queries against indices which haven't been warmed
> into the OS disk cache are many times slower (this is especially true if
> the prox file is used at all).
>
> I initially assumed that you had cleared the OS disk cache between the
> runs of the two algorithms, and were seeing a difference in uncached
> query performance. I assume though from your comments that this isn't
> the case at all.
>
> -Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

-- 
Regards, Shai Erera
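The back-of-the-envelope estimate quoted above can be written out as a quick sketch. The 4-byte docId, 2-bytes-per-position, and average-positions figures are the thread's assumptions about an *uncompressed* layout, not Lucene's actual on-disk format (which delta-encodes and VInt-compresses doc IDs and positions):

```python
# Posting-list size estimate, per the assumptions in the thread
# (uncompressed; real Lucene postings are considerably smaller).
DOC_ID_BYTES = 4     # assumed uncompressed docId
POSITION_BYTES = 2   # assumed bytes per position entry
AVG_POSITIONS = 3    # hypothetical average term positions per document

bytes_per_posting = DOC_ID_BYTES + POSITION_BYTES * AVG_POSITIONS  # 10 bytes
postings = 1_000_000          # "a very long posting list"
total_mb = postings * bytes_per_posting / 1_000_000
print(total_mb)  # → 10.0

# At a 2007-era sequential read rate of ~50 MB/s, 10 MB costs ~0.2 s cold,
# and effectively nothing once the file sits in the OS page cache -- which
# is why cold- vs. warm-cache runs can differ by a large factor.
```

This supports both points in the thread: the data volume per query is small, yet the cold-cache penalty (seek-heavy reads, especially of the prox file) can still dominate the first run.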