[
https://issues.apache.org/jira/browse/LUCENE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847146#comment-13847146
]
Shikhar Bhushan commented on LUCENE-5299:
-----------------------------------------
Thanks for your comments, Otis. I have certainly run into the situation of not
seeing improvements when search requests are already highly concurrent. So I
want to try to pin down the associated costs (cost of the merge, blocking
operations, context switching, number/size of segments, etc.).
I think this could have real-world applicability, but I don't yet have evidence
from a high-query-concurrency benchmark. Take as an example a 32-core server
that serves 100 QPS at an average latency of 100ms: you'd expect about 10 search
tasks/threads to be active on average, so in theory roughly 22 cores are
available to help parallelize individual searches.
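To spell out that back-of-the-envelope math (Little's law: average concurrency = arrival rate x average latency; the numbers are just the example above, not benchmark results):
{code:java}
public class ConcurrencyEstimate {
    public static void main(String[] args) {
        // Little's law: average concurrency = arrival rate * average latency
        double qps = 100.0;      // queries per second
        double latency = 0.100;  // 100 ms average latency, in seconds
        int cores = 32;

        double activeSearches = qps * latency;       // = 10 search threads busy on average
        double spareCores = cores - activeSearches;  // ~= 22 cores idle, usable for intra-query parallelism
        System.out.printf("active=%.0f spare=%.0f%n", activeSearches, spareCores);
    }
}
{code}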
> If this parallelization is optional and those who choose not to use it don't
> suffer from it, then this may be a good option to have for those with
> multi-core CPUs with low query concurrency, but if that's not the case....
It is optional, and parallelizable collectors can be written in a way that does
not penalize the serial use case. For example, the modifications to
{{TopScoreDocCollector}} use a single {{PriorityQueue}} in the serial case, and
one {{PriorityQueue}} per {{AtomicReaderContext}} plus one for the final merge
when parallelism is used. In the lucene-util benchmarks I ran, I did not see a
penalty on serial search with the patch.
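To sketch the shape of that idea (an illustrative outline, not the patch itself: it uses {{java.util.PriorityQueue}} rather than Lucene's {{PriorityQueue}}, and the class/method names are made up): each leaf fills its own small heap with no shared state, and one extra heap performs the final merge.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch: one queue per leaf (AtomicReaderContext), filled
// independently and without locking, plus one extra queue for the final merge.
public class PerLeafTopScores {

    /** A scored hit; in Lucene this would be a ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Min-heap ordered by score: the weakest hit sits at the head and is evicted first. */
    static PriorityQueue<Hit> newQueue(int numHits) {
        return new PriorityQueue<Hit>(numHits, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Leaf-local collection: each leaf's thread only touches its own queue, so no locking. */
    static void collect(PriorityQueue<Hit> leafQueue, int numHits, int doc, float score) {
        leafQueue.offer(new Hit(doc, score));
        if (leafQueue.size() > numHits) {
            leafQueue.poll(); // drop the weakest hit
        }
    }

    /** The "+1" queue: combine all per-leaf queues into the final top-N. */
    static List<Hit> merge(List<PriorityQueue<Hit>> leafQueues, int numHits) {
        PriorityQueue<Hit> merged = newQueue(numHits);
        for (PriorityQueue<Hit> leafQueue : leafQueues) {
            for (Hit hit : leafQueue) {
                merged.offer(hit);
                if (merged.size() > numHits) {
                    merged.poll();
                }
            }
        }
        List<Hit> topHits = new ArrayList<>(merged);
        topHits.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return topHits;
    }
}
{code}
The point of the extra per-leaf queues is that the hot collection loop stays lock-free; only the final merge touches data from more than one leaf.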
> Refactor Collector API for parallelism
> --------------------------------------
>
> Key: LUCENE-5299
> URL: https://issues.apache.org/jira/browse/LUCENE-5299
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Shikhar Bhushan
> Attachments: LUCENE-5299.patch, LUCENE-5299.patch, LUCENE-5299.patch,
> LUCENE-5299.patch, LUCENE-5299.patch, benchmarks.txt
>
>
> h2. Motivation
> We should be able to scale up better with Solr/Lucene by utilizing multiple
> CPU cores, and not have to resort to scaling out via sharding (with all the
> associated distributed-system pitfalls) when the index size does not warrant
> it.
> Presently, IndexSearcher has an optional constructor arg for an
> ExecutorService, which gets used to search in parallel on call paths
> where one of the TopDocCollectors is created internally. The
> per-atomic-reader search happens in parallel and then the
> TopDocs/TopFieldDocs results are merged, with locking around the merge step.
> However there are some problems with this approach:
> * If arbitrary Collector args come into play, we can't parallelize. Note that
> even if the results are ultimately going to a TopDocCollector, it may be
> wrapped inside e.g. an EarlyTerminatingCollector or a TimeLimitingCollector,
> or both.
> * The special-casing, with parallelism baked on top, does not scale: there are
> many Collectors that could potentially lend themselves to parallelism, and
> special-casing means the parallelization has to be re-implemented for each
> different permutation of collectors.
> h2. Proposal
> A refactoring of collectors that allows for parallelization at the level of
> the collection protocol.
> Some requirements that should guide the implementation:
> * easy migration path for collectors that need to remain serial
> * the parallelization should be composable (when collectors wrap other
> collectors)
> * allow collectors to pick the optimal solution (e.g. there might be memory
> tradeoffs to be made) by advising the collector about whether a search will
> be parallelized, so that the serial use-case is not penalized.
> * encourage use of non-blocking constructs and lock-free parallelism;
> blocking is not advisable in the hot spot of a search, and it also wastes
> pooled threads.
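For reference, the existing ExecutorService hook described in the quoted text above is used roughly like this (a minimal usage sketch; the method name and pool are placeholders, and only the calls where IndexSearcher builds the top-docs collector itself are parallelized):
{code:java}
import java.io.IOException;
import java.util.concurrent.ExecutorService;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class ExecutorSearchExample {

    // Passing an ExecutorService to IndexSearcher makes it search each atomic
    // reader in parallel on the pool and merge the per-reader TopDocs afterwards.
    // A custom Collector passed to search(Query, Collector) stays serial.
    static TopDocs searchWithPool(IndexReader reader, Query query, int numHits,
                                  ExecutorService pool) throws IOException {
        IndexSearcher searcher = new IndexSearcher(reader, pool);
        return searcher.search(query, numHits);
    }
}
{code}
In practice the pool would be a long-lived, fixed-size executor shared across requests rather than created per search.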