Well, I suspect this behavior is due to the nature of the index: 100K docs
duplicated 10 times. At some point the search therefore hits the same
documents (and the same scores) again. As I said, tomorrow I'll re-run the
test on an index of 10M unique docs.
I agree that 80 allocations is not much, but that's per query. Over the 2000
queries there were roughly 160,000 allocations overall, which does create
some work for the GC.
Why not save those allocations?
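
To make the question concrete, here is a minimal sketch of the idea, not the actual patch. It assumes the saving comes from recycling the entry that falls off the queue instead of creating a new object for every competitive hit. The names ReusingTopHits, Hit and countAllocations are hypothetical, and java.util.PriorityQueue stands in for Lucene's own PriorityQueue.

```java
import java.util.PriorityQueue;

/** Hypothetical illustration; not Lucene's TopDocCollector. */
final class ReusingTopHits {

    /** Hypothetical stand-in for a (doc, score) entry. */
    static final class Hit {
        int doc;
        float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /**
     * Collects the topN highest scores and returns how many Hit objects
     * were created along the way. Document ids are simply array indexes.
     */
    static int countAllocations(float[] scores, int topN, boolean reuse) {
        // Min-heap on score: the head is the weakest hit currently kept.
        PriorityQueue<Hit> pq = new PriorityQueue<Hit>(
                topN, (Hit a, Hit b) -> Float.compare(a.score, b.score));
        int allocations = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            float score = scores[doc];
            if (pq.size() < topN) {
                // Queue not full yet: an allocation is unavoidable.
                pq.add(new Hit(doc, score));
                allocations++;
            } else if (score > pq.peek().score) {
                // Competitive hit: the weakest entry drops out.
                Hit evicted = pq.poll();
                if (reuse) {
                    // Recycle the evicted object instead of allocating.
                    evicted.doc = doc;
                    evicted.score = score;
                    pq.add(evicted);
                } else {
                    pq.add(new Hit(doc, score));
                    allocations++;
                }
            }
            // Non-competitive hits (including equal scores) are skipped.
        }
        return allocations;
    }
}
```

With 1000 strictly rising scores and topN = 10, the non-reusing variant creates 1000 Hit objects and the reusing variant creates 10, because once the queue is full no further allocation is needed.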

On Dec 10, 2007 10:09 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:

>
> Shai Erera wrote:
>
> > Hi
> >
> > Well, I have results from a 1M index for now (the index contains 100K
> > documents duplicated 10 times, so it's not the final test I'll run, but it
> > still shows something). I ran 2000 short queries (2.4 keywords on average)
> > on a 1M docs index, after 50 queries warm-up. Following are the results:
> >
> > Current TopDocCollector:
> > ------------------------------------
> > num queries: 2000
> > numDocs=1000000
> > total time: 15910 ms
> > avg time: 7.955 ms
> > avg. allocations: 79.7445
> > total allocation time: 0
> > avg. num results: 54841.26
>
> Avg number of allocations per query is ~80?  I.e., there were only ~80
> inserts into the PQ, meaning it very quickly accumulated the top
> scoring docs and then didn't change much after that?  This seems
> surprisingly low.  Am I misreading this or something?
>
> > Modified TopDocCollector:
> > -------------------------------------
> > num queries: 2000
> > numDocs=1000000
> > total time: 15909 ms
> > avg time: 7.9545 ms
> > avg. allocations: 9.8565
> > total allocation time: 0
> > avg. num results: 54841.26
> >
> > As you can see, the actual allocation time is really negligible and there
> > isn't much difference in the avg. running times of the queries. However, the
> > *current* runs performed a lot worse at the beginning, before the OS cache
> > warmed up.
>
> I'm also baffled by that difference, especially if we are only
> talking about 80 allocations per query to begin with.  Did you flush
> OS cache before running "Modified" to make sure it wasn't just using
> the cache?
>
> > The only significant difference is the number of allocations: the modified
> > TDC and PQ allocate ~90% (!) fewer objects. This is significant, especially
> > in heavily loaded systems.
>
> But, 80 allocations is really not very many to begin with?
>
> Mike


-- 
Regards,

Shai Erera
