[ https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll updated LUCENE-2127:
------------------------------------
Attachment: LUCENE-2127.patch
OK, I think this has some legs, assuming I did everything right (especially the
benchmarker stuff).
Here's what I did:
1. Added a postCollect() method to Collector, with an empty default
implementation.
2. Hooked it into IndexSearcher, MultiSearcher and ParallelMultiSearcher. I'm
not sure I have all of the search paths covered yet, but...
3. Hooked in the ability to specify the collector in the benchmarker (see
collector.alg).
4. Added a new LongToEnglishContentSource and QueryMaker to create a
practically unlimited number of docs based on the English.java test util.
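A minimal, self-contained sketch of the postCollect() idea from steps 1 and 2, assuming a heavily simplified collector contract. SimpleCollector, Hit, SortAfterCollectCollector and SearchDriver are hypothetical names for illustration only; they are not the classes in LUCENE-2127.patch or Lucene's actual Collector API. Hits are appended unsorted in collect(), and the single sort happens in the postCollect() hook, which the (hypothetical) search loop calls once after the last hit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for a Collector; only the no-op
// postCollect() hook mirrors the patch description.
abstract class SimpleCollector {
    /** Called once per matching document. */
    abstract void collect(int doc, float score);

    /** Called once after the last collect(); does nothing by default. */
    void postCollect() {
    }
}

/** Hypothetical value holder for one hit, similar in spirit to ScoreDoc. */
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

/** Appends every hit unsorted, then sorts once in postCollect(). */
class SortAfterCollectCollector extends SimpleCollector {
    private final List<Hit> hits = new ArrayList<>();

    @Override
    void collect(int doc, float score) {
        hits.add(new Hit(doc, score)); // amortized O(1), no heap maintenance
    }

    @Override
    void postCollect() {
        // Highest score first; ties broken by ascending doc id.
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
    }

    List<Hit> hits() {
        return hits;
    }
}

// Hypothetical search loop showing where the hook fires: once, after the
// last collect() call.
final class SearchDriver {
    static void search(int[] docs, float[] scores, SimpleCollector c) {
        for (int i = 0; i < docs.length; i++) {
            c.collect(docs[i], scores[i]);
        }
        c.postCollect();
    }
}
```

The tradeoff this illustrates: collect() stays as cheap as possible, at the cost of holding every hit in memory until the one-time sort.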
Prelim results (unvalidated) retrieving up to 1M records (out of 2M):
{quote}
------------> Report sum by Prefix (SearchCollector) and Round (4 about 4 out of 8000034)
||Operation||round||coll||runCnt||recsPerRun||rec/s||elapsedSec||avgUsedMem||avgTotalMem||
|SearchCollector_10|0|org.apache.lucene.search.PostCollectSortCollector|1|10|0.14|73.32|290,371,776|386,625,536|
|SearchCollector_10|1|topDocOrdered|1|10|0.10|98.37|449,582,048|588,189,696|
|SearchCollector_10|2|org.apache.lucene.search.PostCollectSortCollector|1|10|0.14|71.47|964,864,512|1,016,311,808|
|SearchCollector_10|3|topDocOrdered|1|10|0.10|98.73|791,313,664|1,016,311,808|
{quote}
Still lots to do, but I wanted to put it up for people to look at and tell me
what I'm doing wrong. I'd also love to hook in the FieldComparator stuff. Even
a wrapper that took a FieldComparator and exposed it as a regular Comparator
would be cool.
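One way such a wrapper could look, as a sketch under assumptions: SlotComparator is a hypothetical interface loosely modeled on FieldComparator's slot-based shape (copy a doc's value into a slot, then compare two slots), not the real Lucene class, and SlotComparatorAdapter exposes it as a java.util.Comparator over slot numbers so ordinary list sorting can use it. IntFieldSlotComparator is likewise a made-up example implementation backed by a plain array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical slot-based comparator, loosely shaped like FieldComparator. */
interface SlotComparator {
    void copy(int slot, int doc);        // remember doc's sort value in slot
    int compare(int slot1, int slot2);   // compare two remembered values
}

/** Hypothetical example: sorts by an int field held in a per-doc array. */
final class IntFieldSlotComparator implements SlotComparator {
    private final int[] fieldValues; // indexed by doc id
    private final int[] slotValues;  // indexed by slot

    IntFieldSlotComparator(int[] fieldValues, int numSlots) {
        this.fieldValues = fieldValues;
        this.slotValues = new int[numSlots];
    }

    @Override
    public void copy(int slot, int doc) {
        slotValues[slot] = fieldValues[doc];
    }

    @Override
    public int compare(int slot1, int slot2) {
        return Integer.compare(slotValues[slot1], slotValues[slot2]);
    }
}

/** Adapter exposing a SlotComparator as a java.util.Comparator over slots. */
final class SlotComparatorAdapter implements Comparator<Integer> {
    private final SlotComparator delegate;

    SlotComparatorAdapter(SlotComparator delegate) {
        this.delegate = delegate;
    }

    @Override
    public int compare(Integer slot1, Integer slot2) {
        return delegate.compare(slot1, slot2);
    }

    /** Copies docs[i] into slot i, sorts the slots, and maps back to doc ids. */
    static List<Integer> sortDocs(int[] docs, SlotComparator cmp) {
        List<Integer> slots = new ArrayList<>();
        for (int slot = 0; slot < docs.length; slot++) {
            cmp.copy(slot, docs[slot]);
            slots.add(slot);
        }
        slots.sort(new SlotComparatorAdapter(cmp));
        List<Integer> sortedDocs = new ArrayList<>();
        for (int slot : slots) {
            sortedDocs.add(docs[slot]);
        }
        return sortedDocs;
    }
}
```

The catch with this design is that every hit needs its own slot, so the slot arrays grow with the result size; that may or may not be acceptable for the large-result case.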
> Improved large result handling
> ------------------------------
>
> Key: LUCENE-2127
> URL: https://issues.apache.org/jira/browse/LUCENE-2127
> Project: Lucene - Java
> Issue Type: New Feature
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: LUCENE-2127.patch
>
>
> Per
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
> it would be nice to offer some other Collectors that are better at handling
> really large numbers of results. This could be implemented in a variety of
> ways via Collectors. For instance, we could have a raw collector that does
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests
> and have Collectors that have heuristics about memory tradeoffs and only
> heapify when appropriate.
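A rough sketch of the "only heapify when appropriate" heuristic, under assumptions: LazyHeapTopN, ScoredDoc and the bufferLimit threshold are hypothetical, and the switch rule (append raw hits until the buffer exceeds bufferLimit, then keep only the best n in a bounded min-heap) is just one possible memory/CPU tradeoff, not necessarily what Mike had in mind.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit holder. */
final class ScoredDoc {
    final int doc;
    final float score;

    ScoredDoc(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

/** Hypothetical top-n collector that defers heap maintenance. */
final class LazyHeapTopN {
    // Weakest hit first: lower score is weaker; on equal score, higher doc is weaker.
    private static final Comparator<ScoredDoc> WEAKEST_FIRST =
            Comparator.comparingDouble((ScoredDoc d) -> d.score)
                    .thenComparing(Comparator.comparingInt((ScoredDoc d) -> d.doc).reversed());

    private final int n;
    private final int bufferLimit;
    private final List<ScoredDoc> buffer = new ArrayList<>();
    private PriorityQueue<ScoredDoc> heap; // null until we switch to heap mode

    LazyHeapTopN(int n, int bufferLimit) {
        if (n < 1 || bufferLimit < n) {
            throw new IllegalArgumentException("need 1 <= n <= bufferLimit");
        }
        this.n = n;
        this.bufferLimit = bufferLimit;
    }

    void collect(int doc, float score) {
        ScoredDoc hit = new ScoredDoc(doc, score);
        if (heap == null) {
            buffer.add(hit); // cheap append while the result set is small
            if (buffer.size() > bufferLimit) {
                heapify();
            }
        } else if (WEAKEST_FIRST.compare(hit, heap.peek()) > 0) {
            heap.poll(); // evict the weakest kept hit
            heap.add(hit);
        }
    }

    /** Moves the buffer into a bounded min-heap holding only the best n hits. */
    private void heapify() {
        heap = new PriorityQueue<>(n, WEAKEST_FIRST);
        for (ScoredDoc d : buffer) {
            heap.add(d);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        buffer.clear(); // from here on, memory is bounded by n
    }

    boolean heapified() {
        return heap != null;
    }

    /** Top n hits, best first. */
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> result = new ArrayList<>(buffer); // empty in heap mode
        if (heap != null) {
            result.addAll(heap);
        }
        result.sort(WEAKEST_FIRST.reversed());
        if (result.size() > n) {
            return new ArrayList<>(result.subList(0, n));
        }
        return result;
    }
}
```

When the total hit count stays under bufferLimit, no heap is ever built and the results are sorted once at the end; larger result sets pay for a heap but stop growing memory past n hits.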
--
This message is automatically generated by JIRA.