Hi

Back from the experiments lab with more results. I used two indexes (1 and
10 million documents) and ran the same 2000 queries against each. Each run
was executed 4 times, and I paste here the average of the last 3 (to
eliminate any caching done by the OS and to mimic systems that have been
running for a while and therefore already have some data in the OS cache).
Following are the results:

Current TopDocCollector + PQ
----------------------------
Index Size          1M         10M
Avg. Time           8.519ms    289.232ms
Avg. Allocations    77.38      97.35
Avg. # results      51,113     461,019

Modified TopDocCollector + PQ
-----------------------------
Index Size          1M         10M
Avg. Time           9.619ms    298.197ms
Avg. Allocations    9.92       10.12
Avg. # results      51,113     461,019

Basically the results haven't changed from yesterday: there is no
significant difference in execution time between the two versions. The only
difference is the number of allocations.
Although the number of allocations is very small (100 for 461,000 results),
I think it should not be neglected. On systems that rely heavily on memory
(such as powerful machines that can keep entire indexes in RAM), the number
of object allocations may become significant.

The way I see it, we can do one of the following:
1. Add the method to PQ and change TDC's implementation to reuse ScoreDocs.
We gain only in the number of allocations; basically, we don't lose
anything by doing that, we only gain.
2. Add the method to PQ for applications that require it, without changing
TDC's implementation. For example, applications that want to show the 10
most recent documents from a very large collection need to run a
MatchAllDocsQuery with some sorting, and may create many more ScoreDoc
instances.
3. Do nothing.
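To make option 1 concrete, here is a minimal sketch of the reuse idea. The
names (`BoundedQueue`, `insertWithOverflow`) are my own illustration, not
the actual patch: the insert method hands back the rejected or evicted
ScoreDoc so the collector can recycle that instance for the next hit
instead of allocating a fresh one.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class ScoreDoc {
    int doc;
    float score;
    ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

// Illustrative bounded queue; not the real Lucene PriorityQueue.
class BoundedQueue {
    private final int maxSize;
    private final PriorityQueue<ScoreDoc> pq =
        new PriorityQueue<>(Comparator.comparingDouble((ScoreDoc sd) -> sd.score));

    BoundedQueue(int maxSize) { this.maxSize = maxSize; }

    /**
     * Inserts sd. If the queue is full, returns the non-competitive or
     * evicted element so the caller can reuse it; returns null otherwise.
     */
    ScoreDoc insertWithOverflow(ScoreDoc sd) {
        if (pq.size() < maxSize) {
            pq.add(sd);
            return null;                  // nothing to recycle yet
        }
        if (sd.score <= pq.peek().score) {
            return sd;                    // not competitive: hand it back as-is
        }
        ScoreDoc evicted = pq.poll();     // smallest element falls out
        pq.add(sd);
        return evicted;                   // caller recycles this instance
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        BoundedQueue q = new BoundedQueue(2);
        ScoreDoc spare = null;
        int allocations = 0;
        float[] scores = {0.3f, 0.9f, 0.1f, 0.7f, 0.5f};
        for (int doc = 0; doc < scores.length; doc++) {
            ScoreDoc sd;
            if (spare != null) {          // recycle instead of allocating
                sd = spare;
                sd.doc = doc;
                sd.score = scores[doc];
            } else {
                sd = new ScoreDoc(doc, scores[doc]);
                allocations++;
            }
            spare = q.insertWithOverflow(sd);
        }
        System.out.println("allocations=" + allocations); // → allocations=3
    }
}
```

With this pattern, collecting the 5 hits above allocates only 3 ScoreDocs;
without reuse it would be one allocation per collected hit.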

If you think I should run more tests, please let me know - I already have
the two indexes, so any further tests can be run quite quickly.

Thanks,

Shai

On Dec 10, 2007 11:46 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

> On 10-Dec-07, at 1:20 PM, Shai Erera wrote:
>
> > Thanks for the info. Too bad I use Windows ...
>
> Just allocate a bunch of memory and free it.  This is Linux, but
> something similar should work on Windows:
>
> $ vmstat -S M
> procs -----------memory----------
> r  b   swpd   free   buff  cache
> 0  0      0     45    372    786
>
> $ python -c '"a"*2000000000'
>
> $ vmstat -S M
> procs -----------memory----------
> r  b   swpd   free   buff  cache
> 0  0    463   1761      0      6
>
> -Mike
>
>
>


-- 
Regards,

Shai Erera
