[ 
https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752528#comment-15752528
 ] 

Michael McCandless commented on LUCENE-7588:
--------------------------------------------

Thanks [~ekeller]: this is an impressive change!

Can you add a minimal javadocs to {{ParallelDrillSideways}}, and include 
{{\@lucene.experimental}}?

Can you fix the indent to 2 spaces, and change your IDE to not use
wildcard imports?  (Most of the new classes seem to do so, but at
least one didn't).  Or we can fix this up before pushing...

Should {{CallableCollector}} be renamed to {{CallableCollectorManager}}?

I assume you're using this for your QWAZR search server built on lucene 
(https://github.com/qwazr/QWAZR)?  Thank you for giving back!

There are quite a few new abstractions here,
{{MultiCollectorManager}}, {{FacetsCollectorManager}}; must they be
public?  Can you explain what they do?

It seems like this change opens up concurrency in 2 ways; the first
way is it uses the {{IndexSearcher.search}} API that takes a
{{CollectorManager}} such that if you had created that
{{IndexSearcher}} with an executor, you get concurrency across the
segments in the index.  In general I'm not a huge fan of this
concurrency since you are at the whim of how the segments are
structured, and, confusingly, running {{forceMerge(1)}} on your index
removes all concurrency.  But it's better than nothing: progress not
perfection!

The second way is that the new {{ParallelDrillSideways}} takes its own
executor and then runs the N {{DrillDown}} queries concurrently (to
compute the sideways counts), which is very different from the current
doc-at-a-time computation.  Have you compared the performance, using a
single thread? ... I'm curious how "doc at a time" vs "query at a
time" (which is also Solr's approach) compare.  But, still, the fact
that this "query at a time" approach enables concurrency is a big win.

I wonder if we could absorb {{ParallelDrillSideways}} under
{{DrillSideways}} such that if you pass an executor it uses the
concurrent implementation?  It's really an implementation/execution
detail I think?  Similar to how {{IndexSearcher}} takes an optional
executor.


> A parallel DrillSideways implementation
> ---------------------------------------
>
>                 Key: LUCENE-7588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7588
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0), 6.3.1
>            Reporter: Emmanuel Keller
>            Priority: Minor
>              Labels: facet, faceting
>             Fix For: master (7.0), 6.3.1
>
>         Attachments: LUCENE-7588.patch
>
>
> Currently DrillSideways implementation is based on the single threaded 
> IndexSearcher.search(Query query, Collector results).
> On large document set, the single threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectionManager based method IndexSearcher.search(Query query, 
> CollectorManager collectorManager)  to get the benefits of multithreading on 
> index segments,
> 2. Compute each DrillSideway subquery on a single thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to