[ https://issues.apache.org/jira/browse/LUCENE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800762#comment-13800762 ]
Shikhar Bhushan commented on LUCENE-5299:
-----------------------------------------

patch and benchmarks to come...

> Refactor Collector API for parallelism
> --------------------------------------
>
>                 Key: LUCENE-5299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Shikhar Bhushan
>
> h2. Motivation
> We should be able to scale up better with Solr/Lucene by utilizing multiple
> CPU cores, and not have to resort to scaling out by sharding (with all the
> associated distributed-system pitfalls) when the index size does not warrant
> it.
> Presently, IndexSearcher has an optional constructor arg for an
> ExecutorService, which gets used for searching in parallel for call paths
> where one of the TopDocCollectors is created internally. The
> per-atomic-reader search happens in parallel and then the
> TopDocs/TopFieldDocs results are merged, with locking around the merge step.
> However, there are some problems with this approach:
> * If arbitrary Collector args come into play, we can't parallelize. Note that
> even if results are ultimately going to a TopDocCollector, it may be wrapped
> inside e.g. an EarlyTerminatingCollector or TimeLimitingCollector or both.
> * The special-casing with parallelism baked on top does not scale: there are
> many Collectors that could potentially lend themselves to parallelism, and
> special-casing means the parallelization has to be re-implemented if a
> different permutation of collectors is to be used.
> h2. Proposal
> A refactoring of collectors that allows for parallelization at the level of
> the collection protocol.
> Some requirements that should guide the implementation:
> * easy migration path for collectors that need to remain serial
> * the parallelization should be composable (when collectors wrap other
> collectors)
> * allow collectors to pick the optimal solution (e.g. there might be memory
> tradeoffs to be made) by advising the collector about whether a search will
> be parallelized, so that the serial use-case is not penalized.
> * encourage use of non-blocking constructs and lock-free parallelism;
> blocking is not advisable in the hot spot of a search, and it also wastes
> pooled threads.
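
To make the requirements above a bit more concrete, here is a rough sketch of what a collection protocol along these lines might look like. It is plain Java with no Lucene dependency, and every name in it (SegmentCollector, ParallelizableCollector, CountingCollector, LimitingCollector, ParallelSearchSketch) is hypothetical; none of it is taken from the patch or from the existing Collector API. Segments are modelled as int[] arrays of matching doc ids purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical: collects hits for one segment on one thread, so it needs no locking. */
interface SegmentCollector<R> {
    void collect(int doc);
    R result();
}

/** Hypothetical: hands out per-segment collectors and merges their partial results. */
interface ParallelizableCollector<R> {
    /** 'parallel' is advice only; a collector may pick a cheaper strategy when it is false. */
    SegmentCollector<R> newSegmentCollector(boolean parallel);
    R merge(List<R> partials);
}

/** Counts hits; this simple collector ignores the 'parallel' advice. */
final class CountingCollector implements ParallelizableCollector<Integer> {
    public SegmentCollector<Integer> newSegmentCollector(boolean parallel) {
        return new SegmentCollector<Integer>() {
            private int count;
            public void collect(int doc) { count++; }
            public Integer result() { return count; }
        };
    }
    public Integer merge(List<Integer> partials) {
        int total = 0;
        for (int p : partials) total += p;
        return total;
    }
}

/** Wrapper that forwards at most maxPerSegment hits per segment; composes with any delegate. */
final class LimitingCollector<R> implements ParallelizableCollector<R> {
    private final ParallelizableCollector<R> delegate;
    private final int maxPerSegment;
    LimitingCollector(ParallelizableCollector<R> delegate, int maxPerSegment) {
        this.delegate = delegate;
        this.maxPerSegment = maxPerSegment;
    }
    public SegmentCollector<R> newSegmentCollector(boolean parallel) {
        final SegmentCollector<R> inner = delegate.newSegmentCollector(parallel);
        return new SegmentCollector<R>() {
            private int seen;
            public void collect(int doc) { if (seen++ < maxPerSegment) inner.collect(doc); }
            public R result() { return inner.result(); }
        };
    }
    public R merge(List<R> partials) { return delegate.merge(partials); }
}

final class ParallelSearchSketch {
    /** Runs serially when executor is null, otherwise one task per segment. */
    static <R> R search(List<int[]> segments, ParallelizableCollector<R> collector,
                        ExecutorService executor) throws Exception {
        List<R> partials = new ArrayList<>();
        if (executor == null) {
            for (int[] seg : segments) partials.add(collectSegment(seg, collector, false));
        } else {
            List<Future<R>> futures = new ArrayList<>();
            for (int[] seg : segments) {
                Callable<R> task = () -> collectSegment(seg, collector, true);
                futures.add(executor.submit(task));
            }
            // Only the calling thread waits here; worker tasks share no state and take no locks.
            for (Future<R> f : futures) partials.add(f.get());
        }
        return collector.merge(partials);
    }

    private static <R> R collectSegment(int[] docs, ParallelizableCollector<R> c, boolean parallel) {
        SegmentCollector<R> sc = c.newSegmentCollector(parallel);
        for (int doc : docs) sc.collect(doc);
        return sc.result();
    }

    public static void main(String[] args) throws Exception {
        List<int[]> segments = Arrays.asList(new int[]{1, 2, 3}, new int[]{4, 5}, new int[]{6, 7, 8, 9});
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(search(segments, new CountingCollector(), pool));
            System.out.println(search(segments, new LimitingCollector<>(new CountingCollector(), 2), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is that the wrapper never needs to know whether the search is parallel: it wraps the per-segment collector and delegates the merge, so any permutation of wrappers parallelizes without special-casing. A collector that must stay serial could be accommodated by having the driver fall back to the serial path, though how the actual patch handles that is not something this sketch claims to show.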