[jira] [Commented] (LUCENE-7588) A parallel DrillSideways implementation

Emmanuel Keller (JIRA) Fri, 16 Dec 2016 09:54:24 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755069#comment-15755069 ]


Emmanuel Keller commented on LUCENE-7588:
-----------------------------------------

bq. Can you add a minimal javadocs to ParallelDrillSideways, and include 
@lucene.experimental?

Done.

bq. Can you fix the indent to 2 spaces, and change your IDE to not use
wildcard imports? (Most of the new classes seem to do so, but at
least one didn't). Or we can fix this up before pushing...

Done.

bq. Should CallableCollector be renamed to CallableCollectorManager?

True, done.

bq. I assume you're using this for your QWAZR search server built on lucene 
(https://github.com/qwazr/QWAZR)? Thank you for giving back!

With pleasure. I think there are a few more contributions to come...

bq. There are quite a few new abstractions here, MultiCollectorManager, 
FacetsCollectorManager; must they be public? Can you explain what they do?

MultiCollectorManager does for CollectorManager what MultiCollector does for 
Collector: it wraps a set of CollectorManagers so that they can be used as if 
they were a single one.
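
To illustrate the wrapping idea, here is a minimal sketch. It is not the code from the patch: Collector and Manager below are simplified stand-ins I made up so the example runs without Lucene (the real CollectorManager is generic over the collector type and its methods can throw IOException), and CountManager and MaxDocManager are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiManagerSketch {

  // Simplified stand-ins for Lucene's Collector and CollectorManager (NOT the real API).
  interface Collector { void collect(int doc); }

  interface Manager<T> {
    Collector newCollector();              // one collector per search thread
    T reduce(List<Collector> collectors);  // merge the per-thread collectors
  }

  /** Forwards every collected doc to all wrapped collectors. */
  static final class MultiCollector implements Collector {
    final List<Collector> children = new ArrayList<>();
    @Override public void collect(int doc) {
      for (Collector child : children) child.collect(doc);
    }
  }

  /** Wraps several managers so that the caller only sees one. */
  static final class MultiManager implements Manager<List<Object>> {
    private final List<? extends Manager<?>> managers;
    MultiManager(List<? extends Manager<?>> managers) { this.managers = managers; }

    @Override public Collector newCollector() {
      MultiCollector multi = new MultiCollector();
      for (Manager<?> m : managers) multi.children.add(m.newCollector());
      return multi;
    }

    @Override public List<Object> reduce(List<Collector> collectors) {
      List<Object> results = new ArrayList<>();
      for (int i = 0; i < managers.size(); i++) {
        // Gather the i-th child of every MultiCollector and let manager i reduce them.
        List<Collector> slice = new ArrayList<>();
        for (Collector c : collectors) slice.add(((MultiCollector) c).children.get(i));
        results.add(managers.get(i).reduce(slice));
      }
      return results;
    }
  }

  // Two hypothetical managers, only here to exercise the wrapper.
  static final class CountCollector implements Collector {
    int count;
    @Override public void collect(int doc) { count++; }
  }

  static final class CountManager implements Manager<Integer> {
    @Override public Collector newCollector() { return new CountCollector(); }
    @Override public Integer reduce(List<Collector> collectors) {
      int total = 0;
      for (Collector c : collectors) total += ((CountCollector) c).count;
      return total;
    }
  }

  static final class MaxDocCollector implements Collector {
    int max = -1;
    @Override public void collect(int doc) { max = Math.max(max, doc); }
  }

  static final class MaxDocManager implements Manager<Integer> {
    @Override public Collector newCollector() { return new MaxDocCollector(); }
    @Override public Integer reduce(List<Collector> collectors) {
      int max = -1;
      for (Collector c : collectors) max = Math.max(max, ((MaxDocCollector) c).max);
      return max;
    }
  }

  /** Simulates two search threads feeding one wrapped manager. */
  static List<Object> demo() {
    Manager<List<Object>> manager =
        new MultiManager(Arrays.asList(new CountManager(), new MaxDocManager()));
    Collector first = manager.newCollector();
    Collector second = manager.newCollector();
    first.collect(1);
    first.collect(2);
    second.collect(7);
    return manager.reduce(Arrays.asList(first, second));
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The caller hands a single manager to the search and gets back one result per wrapped manager, in the order the managers were given.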

{quote}
It seems like this change opens up concurrency in 2 ways; the first
way is it uses the IndexSearcher.search API that takes a
CollectorManager such that if you had created that
IndexSearcher with an executor, you get concurrency across the
segments in the index. In general I'm not a huge fan of this
concurrency since you are at the whim of how the segments are
structured, and, confusingly, running forceMerge(1) on your index
removes all concurrency. But it's better than nothing: progress not
perfection!
{quote}

I agree. That's a first step.
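
To make the segment dependency concrete, here is a toy model of it. It assumes exactly one task per segment, which simplifies what IndexSearcher really does, and it uses plain int arrays as segments; the class and method names are made up for the illustration and nothing here is Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

public class SegmentConcurrencySketch {

  /**
   * Counts matching docs using one task per segment (a simplifying assumption).
   * Returns {matchCount, taskCount}.
   */
  static int[] search(List<int[]> segments, IntPredicate query, ExecutorService executor) {
    List<Future<Integer>> futures = new ArrayList<>();
    for (int[] segment : segments) {
      futures.add(executor.submit(() -> {
        int hits = 0;
        for (int doc : segment) {
          if (query.test(doc)) hits++;
        }
        return hits;
      }));
    }
    int total = 0;
    try {
      for (Future<Integer> future : futures) total += future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return new int[] {total, futures.size()};
  }

  /** Runs the same query on a three-segment index and on a merged, one-segment index. */
  static int[][] demo() {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      IntPredicate evenDocs = doc -> doc % 2 == 0;
      List<int[]> threeSegments =
          Arrays.asList(new int[] {0, 1, 2}, new int[] {3, 4}, new int[] {5, 6, 7, 8});
      List<int[]> merged =
          Collections.singletonList(new int[] {0, 1, 2, 3, 4, 5, 6, 7, 8});
      return new int[][] {
        search(threeSegments, evenDocs, executor),
        search(merged, evenDocs, executor)
      };
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    for (int[] result : demo()) {
      System.out.println("matches=" + result[0] + " tasks=" + result[1]);
    }
  }
}
```

Both runs find the same matches, but the merged index produces a single task, so the executor has nothing to parallelize.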

{quote}
The second way is that the new ParallelDrillSideways takes its own
executor and then runs the N DrillDown queries concurrently (to
compute the sideways counts), which is very different from the current
doc-at-a-time computation. Have you compared the performance, using a
single thread? ... I'm curious how "doc at a time" vs "query at a
time" (which is also Solr's approach) compare. But, still, the fact
that this "query at a time" approach enables concurrency is a big win.
{quote}

I am working on providing a benchmark. What is the recommended practice for 
Lucene? Is it okay to provide a benchmark as a test case?

{quote}
I wonder if we could absorb ParallelDrillSideways under
DrillSideways such that if you pass an executor it uses the
concurrent implementation? It's really an implementation/execution
detail I think? Similar to how IndexSearcher takes an optional
executor.
{quote}

I agree. I think that is the way it should be. I didn't want to be too 
intrusive.
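
A rough sketch of what the optional executor could look like, combined with the "query at a time" execution discussed above. The Sideways class is a hypothetical stand-in, not the DrillSideways API, and queries are modelled as simple predicates over doc ids so the example runs without Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

public class OptionalExecutorSketch {

  /** Hypothetical stand-in for a DrillSideways that takes an optional executor. */
  static final class Sideways {
    private final ExecutorService executor; // null means: run on the calling thread

    Sideways() { this(null); }
    Sideways(ExecutorService executor) { this.executor = executor; }

    /** Computes one count per query; each query is evaluated as a whole. */
    List<Integer> counts(int[] docs, List<IntPredicate> queries) {
      List<Callable<Integer>> tasks = new ArrayList<>();
      for (IntPredicate query : queries) tasks.add(() -> count(docs, query));
      List<Integer> results = new ArrayList<>();
      try {
        if (executor == null) {
          for (Callable<Integer> task : tasks) results.add(task.call());
        } else {
          // invokeAll returns the futures in the same order as the tasks.
          for (Future<Integer> future : executor.invokeAll(tasks)) results.add(future.get());
        }
      } catch (Exception e) {
        if (e instanceof InterruptedException) Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
      return results;
    }

    private static int count(int[] docs, IntPredicate query) {
      int hits = 0;
      for (int doc : docs) {
        if (query.test(doc)) hits++;
      }
      return hits;
    }
  }

  private static final int[] DOCS = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

  private static List<IntPredicate> queries() {
    return Arrays.asList(doc -> doc % 2 == 0, doc -> doc < 3, doc -> doc >= 8);
  }

  static List<Integer> sequential() {
    return new Sideways().counts(DOCS, queries());
  }

  static List<Integer> concurrent() {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    try {
      return new Sideways(executor).counts(DOCS, queries());
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(sequential());
    System.out.println(concurrent());
  }
}
```

With this shape the caller's code is the same either way, and passing an executor only changes how the counts are computed, which matches the "execution detail" view.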

> A parallel DrillSideways implementation
> ---------------------------------------
>
>                 Key: LUCENE-7588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7588
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0), 6.3.1
>            Reporter: Emmanuel Keller
>            Priority: Minor
>              Labels: facet, faceting
>             Fix For: master (7.0), 6.3.1
>
>
> Currently the DrillSideways implementation is based on the single-threaded 
> IndexSearcher.search(Query query, Collector results).
> On a large document set, the single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager) to get the benefits of multithreading on 
> index segments,
> 2. Compute each DrillSideways subquery on a separate thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
