[jira] [Commented] (LUCENE-5299) Refactor Collector API for parallelism

Otis Gospodnetic (JIRA) Thu, 12 Dec 2013 19:43:35 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847122#comment-13847122
 ]


Otis Gospodnetic commented on LUCENE-5299:
------------------------------------------

bq. What's a PQ? 

PriorityQueue

bq. parallelism of Lucene Queries is in most cases not the best thing to do (if 
you have many users). It only makes sense if you have very few queries

I heard about this patch at tonight's NYC Search meetup talk by your colleague 
Gregg (http://www.meetup.com/NYC-Search-and-Discovery/events/125548572/ ) and 
immediately had the same reaction.  This parallelization should not improve 
query latency at all in deployments with many concurrent queries.  Correct?  
Does your benchmark test and show this?

If this parallelization is optional and those who choose not to use it don't 
suffer from it, then this may be a good option to have for those with 
multi-core CPUs with low query concurrency, but if that's not the case....

> Refactor Collector API for parallelism
> --------------------------------------
>
>                 Key: LUCENE-5299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5299
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Shikhar Bhushan
>         Attachments: LUCENE-5299.patch, LUCENE-5299.patch, LUCENE-5299.patch, 
> LUCENE-5299.patch, LUCENE-5299.patch, benchmarks.txt
>
>
> h2. Motivation
> We should be able to scale-up better with Solr/Lucene by utilizing multiple 
> CPU cores, and not have to resort to scaling-out by sharding (with all the 
> associated distributed system pitfalls) when the index size does not warrant 
> it.
> Presently, IndexSearcher has an optional constructor arg for an 
> ExecutorService, which gets used for searching in parallel for call paths 
> where one of the TopDocCollector's is created internally. The 
> per-atomic-reader search happens in parallel and then the 
> TopDocs/TopFieldDocs results are merged with locking around the merge bit.
> However there are some problems with this approach:
> * If arbitary Collector args come into play, we can't parallelize. Note that 
> even if ultimately results are going to a TopDocCollector it may be wrapped 
> inside e.g. a EarlyTerminatingCollector or TimeLimitingCollector or both.
> * The special-casing with parallelism baked on top does not scale, there are 
> many Collector's that could potentially lend themselves to parallelism, and 
> special-casing means the parallelization has to be re-implemented if a 
> different permutation of collectors is to be used.
> h2. Proposal
> A refactoring of collectors that allows for parallelization at the level of 
> the collection protocol. 
> Some requirements that should guide the implementation:
> * easy migration path for collectors that need to remain serial
> * the parallelization should be composable (when collectors wrap other 
> collectors)
> * allow collectors to pick the optimal solution (e.g. there might be memory 
> tradeoffs to be made) by advising the collector about whether a search will 
> be parallelized, so that the serial use-case is not penalized.
> * encourage use of non-blocking constructs and lock-free parallelism, 
> blocking is not advisable for the hot-spot of a search, besides wasting 
> pooled threads.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5299) Refactor Collector API for parallelism

Reply via email to