Hi,

I'm back from the experiments lab with more results. I used two indexes (1 million and 10 million documents) and ran the same 2000 queries against both. Each run was executed 4 times, and below is the average of the last 3. Dropping the first run removes the effect of the OS cache warming up, and it mimics systems that are already running and therefore have some data in the OS cache. Here are the results:
Current TopDocCollector + PQ
--------------------------------------------
Index Size          1M          10M
Avg. Time           8.519ms     289.232ms
Avg. Allocations    77.38       97.35
Avg. # results      51,113      461,019

Modified TopDocCollector + PQ
--------------------------------------------
Index Size          1M          10M
Avg. Time           9.619ms     298.197ms
Avg. Allocations    9.92        10.12
Avg. # results      51,113      461,019

Basically, the results haven't changed since yesterday. There isn't any significant difference in execution time between the two versions; the only real difference is the number of allocations. Although the number of allocations is already very small (about 100 for 461,000 results), I don't think it should be neglected. On systems that rely solely on memory (such as powerful machines that can keep entire indexes in memory), the number of object allocations may matter.

As I see it, we can do one of the following:

1. Add the method to PQ and change the TDC implementation to reuse ScoreDocs. The only gain is fewer allocations, but we don't lose anything by doing it.
2. Add the method to PQ for applications that need it, without changing TDC's implementation. For example, an application that wants to show the 10 most recent documents from a very large collection has to run a MatchAllDocsQuery with some sorting, which may create many more ScoreDoc instances.
3. Do nothing.

If you think I should run more tests, please let me know. I already have the two indexes, so any further tests can be run almost immediately.

Thanks,
Shai

On Dec 10, 2007 11:46 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 10-Dec-07, at 1:20 PM, Shai Erera wrote:
>
> > Thanks for the info. Too bad I use Windows ...
>
> Just allocate a bunch of memory and free it.
> This is Linux, but
> something similar should work on Windows:
>
> $ vmstat -S M
> procs -----------memory----------
>  r  b   swpd   free   buff  cache
>  0  0      0     45    372    786
>
> $ python -c '"a"*2000000000'
>
> $ vmstat -S M
> procs -----------memory----------
>  r  b   swpd   free   buff  cache
>  0  0    463   1761      0      6
>
> -Mike

--
Regards,
Shai Erera
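Option 1 in the message above boils down to a priority-queue insert that hands back the entry it pushed out, so the collector can recycle that object for the next hit instead of allocating a new one. The following is a self-contained sketch under stated assumptions: `ScoreDoc`, `BoundedPQ` and `insertWithOverflow` here are simplified, hypothetical stand-ins written for illustration, not Lucene's actual classes or API, and the scores are random floats rather than real query results. It only counts how many score objects each collection strategy creates.

```java
import java.util.Random;

// Sketch only: ScoreDoc, BoundedPQ and insertWithOverflow are simplified,
// hypothetical stand-ins, not Lucene's real classes or API.
public class ScoreDocReuseSketch {

    static final class ScoreDoc {
        int doc;
        float score;
        ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Fixed-capacity min-heap (capacity >= 1); heap[1] holds the lowest score.
    static final class BoundedPQ {
        private final ScoreDoc[] heap;
        private int size = 0;

        BoundedPQ(int maxSize) { heap = new ScoreDoc[maxSize + 1]; } // 1-based

        boolean isFull() { return size == heap.length - 1; }
        ScoreDoc top() { return heap[1]; }

        // Returns the entry that no longer fits (the evicted minimum, or e
        // itself if e is not competitive), or null while the queue is filling.
        ScoreDoc insertWithOverflow(ScoreDoc e) {
            if (!isFull()) {
                heap[++size] = e;
                upHeap(size);
                return null;
            }
            if (e.score <= heap[1].score) return e;
            ScoreDoc evicted = heap[1];
            heap[1] = e;
            downHeap(1);
            return evicted;
        }

        private void upHeap(int i) {
            ScoreDoc node = heap[i];
            while (i > 1 && node.score < heap[i >> 1].score) {
                heap[i] = heap[i >> 1];
                i >>= 1;
            }
            heap[i] = node;
        }

        private void downHeap(int i) {
            ScoreDoc node = heap[i];
            while (2 * i <= size) {
                int child = 2 * i;
                if (child < size && heap[child + 1].score < heap[child].score) child++;
                if (heap[child].score >= node.score) break;
                heap[i] = heap[child];
                i = child;
            }
            heap[i] = node;
        }
    }

    // Current style: one new ScoreDoc for every competitive hit.
    static int collectAllocating(float[] scores, int numHits) {
        BoundedPQ pq = new BoundedPQ(numHits);
        int allocations = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            if (!pq.isFull() || scores[doc] > pq.top().score) {
                pq.insertWithOverflow(new ScoreDoc(doc, scores[doc]));
                allocations++;
            }
        }
        return allocations;
    }

    // Option 1 style: recycle whatever insertWithOverflow hands back.
    static int collectReusing(float[] scores, int numHits) {
        BoundedPQ pq = new BoundedPQ(numHits);
        int allocations = 0;
        ScoreDoc spare = null;
        for (int doc = 0; doc < scores.length; doc++) {
            if (!pq.isFull() || scores[doc] > pq.top().score) {
                if (spare == null) {
                    spare = new ScoreDoc(doc, scores[doc]);
                    allocations++;
                } else {
                    spare.doc = doc;
                    spare.score = scores[doc];
                }
                spare = pq.insertWithOverflow(spare);
            }
        }
        return allocations;
    }

    public static void main(String[] args) {
        float[] scores = new float[461_019]; // matches the avg. 10M result count
        Random rnd = new Random(42);
        for (int i = 0; i < scores.length; i++) scores[i] = rnd.nextFloat();
        System.out.println("allocating: " + collectAllocating(scores, 10));
        System.out.println("reusing:    " + collectReusing(scores, 10));
    }
}
```

In this sketch the reusing collector allocates exactly numHits + 1 objects once at least one hit overflows the full queue: numHits to fill it, plus one for the first overflowing hit, after which every evicted entry is recycled. The allocating collector creates one object per competitive hit, which on random scores is far fewer than the number of results but still grows with the result count.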