[ https://issues.apache.org/jira/browse/LUCENE-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755069#comment-15755069 ]
Emmanuel Keller commented on LUCENE-7588:
-----------------------------------------

bq. Can you add a minimal javadocs to ParallelDrillSideways, and include @lucene.experimental?

Done.

bq. Can you fix the indent to 2 spaces, and change your IDE to not use wildcard imports? (Most of the new classes seem to do so, but at least one didn't). Or we can fix this up before pushing...

Done.

bq. Should CallableCollector be renamed to CallableCollectorManager?

True, done.

bq. I assume you're using this for your QWAZR search server built on lucene (https://github.com/qwazr/QWAZR)? Thank you for giving back!

With pleasure. I think there are a few more contributions to come...

bq. There are quite a few new abstractions here, MultiCollectorManager, FacetsCollectorManager; must they be public? Can you explain what they do?

MultiCollectorManager does for CollectorManager what MultiCollector does for Collector: it wraps a set of CollectorManagers as if they were a single one.

{quote}
It seems like this change opens up concurrency in 2 ways; the first way is it uses the IndexSearcher.search API that takes a CollectorManager such that if you had created that IndexSearcher with an executor, you get concurrency across the segments in the index. In general I'm not a huge fan of this concurrency since you are at the whim of how the segments are structured, and, confusingly, running forceMerge(1) on your index removes all concurrency. But it's better than nothing: progress not perfection!
{quote}

I agree. That's a first step.

{quote}
The second way is that the new ParallelDrillSideways takes its own executor and then runs the N DrillDown queries concurrently (to compute the sideways counts), which is very different from the current doc-at-a-time computation. Have you compared the performance, using a single thread? ... I'm curious how "doc at a time" vs "query at a time" (which is also Solr's approach) compare. But, still, the fact that this "query at a time" approach enables concurrency is a big win.
{quote}

I am working on providing a benchmark. What is the good practice for Lucene? Is it okay to provide a benchmark as a test case?

{quote}
I wonder if we could absorb ParallelDrillSideways under DrillSideways such that if you pass an executor it uses the concurrent implementation? It's really an implementation/execution detail I think? Similar to how IndexSearcher takes an optional executor.
{quote}

I agree. I think that is the way it should be. I didn't want to be too intrusive.

> A parallel DrillSideways implementation
> ---------------------------------------
>
>                 Key: LUCENE-7588
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7588
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0), 6.3.1
>            Reporter: Emmanuel Keller
>            Priority: Minor
>              Labels: facet, faceting
>             Fix For: master (7.0), 6.3.1
>
>
> Currently the DrillSideways implementation is based on the single-threaded
> IndexSearcher.search(Query query, Collector results).
> On large document sets, the single-threaded collection can be really slow.
> The ParallelDrillSideways implementation could:
> 1. Use the CollectorManager-based method IndexSearcher.search(Query query,
> CollectorManager collectorManager) to get the benefits of multithreading on
> index segments,
> 2. Compute each DrillSideways subquery on its own thread.
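
As an illustration of the "MultiCollectorManager wraps a set of CollectorManagers" idea from the comment, here is a minimal sketch. It deliberately does not use Lucene's classes: SimpleCollector and SimpleCollectorManager are hypothetical stand-ins for Collector and CollectorManager, and MultiManagerSketch is an assumption about the general shape of the idea, not the class from the patch. The two "segments" are collected sequentially here; with a real IndexSearcher created with an executor, the per-segment collectors could be driven concurrently.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for Lucene's Collector: receives matching doc ids. */
interface SimpleCollector {
  void collect(int doc);
}

/** Hypothetical stand-in for CollectorManager: one collector per slice, merged by reduce. */
interface SimpleCollectorManager<C extends SimpleCollector, T> {
  C newCollector();
  T reduce(Collection<C> collectors);
}

/** Counts the docs it sees. */
final class CountCollector implements SimpleCollector {
  int count;
  @Override public void collect(int doc) { count++; }
}

final class CountManager implements SimpleCollectorManager<CountCollector, Integer> {
  @Override public CountCollector newCollector() { return new CountCollector(); }
  @Override public Integer reduce(Collection<CountCollector> collectors) {
    int total = 0;
    for (CountCollector c : collectors) total += c.count;
    return total;
  }
}

/** Remembers the largest doc id it sees (-1 if none). */
final class MaxDocCollector implements SimpleCollector {
  int max = -1;
  @Override public void collect(int doc) { max = Math.max(max, doc); }
}

final class MaxDocManager implements SimpleCollectorManager<MaxDocCollector, Integer> {
  @Override public MaxDocCollector newCollector() { return new MaxDocCollector(); }
  @Override public Integer reduce(Collection<MaxDocCollector> collectors) {
    int max = -1;
    for (MaxDocCollector c : collectors) max = Math.max(max, c.max);
    return max;
  }
}

/** Wraps several managers so callers can treat them as a single manager. */
final class MultiManagerSketch
    implements SimpleCollectorManager<MultiManagerSketch.FanOut, Object[]> {

  /** Forwards every doc to one delegate collector per wrapped manager. */
  static final class FanOut implements SimpleCollector {
    final List<SimpleCollector> delegates;
    FanOut(List<SimpleCollector> delegates) { this.delegates = delegates; }
    @Override public void collect(int doc) {
      for (SimpleCollector d : delegates) d.collect(doc);
    }
  }

  private final List<SimpleCollectorManager<SimpleCollector, ?>> managers = new ArrayList<>();

  @SuppressWarnings("unchecked")
  MultiManagerSketch(List<SimpleCollectorManager<? extends SimpleCollector, ?>> wrapped) {
    for (SimpleCollectorManager<? extends SimpleCollector, ?> m : wrapped) {
      // Unchecked: each manager only ever receives the collectors it created itself.
      managers.add((SimpleCollectorManager<SimpleCollector, ?>) m);
    }
  }

  @Override public FanOut newCollector() {
    List<SimpleCollector> delegates = new ArrayList<>();
    for (SimpleCollectorManager<SimpleCollector, ?> m : managers) delegates.add(m.newCollector());
    return new FanOut(delegates);
  }

  /** Result i is the reduce() output of wrapped manager i. */
  @Override public Object[] reduce(Collection<FanOut> fanOuts) {
    Object[] results = new Object[managers.size()];
    for (int i = 0; i < managers.size(); i++) {
      List<SimpleCollector> perManager = new ArrayList<>();
      for (FanOut f : fanOuts) perManager.add(f.delegates.get(i));
      results[i] = managers.get(i).reduce(perManager);
    }
    return results;
  }

  /** Simulates two index segments, each collected by its own FanOut. */
  static Object[] demo() {
    List<SimpleCollectorManager<? extends SimpleCollector, ?>> wrapped = new ArrayList<>();
    wrapped.add(new CountManager());
    wrapped.add(new MaxDocManager());
    MultiManagerSketch multi = new MultiManagerSketch(wrapped);

    int[][] segments = { {0, 1, 2}, {3, 4} }; // matching doc ids per segment
    List<FanOut> perSegment = new ArrayList<>();
    for (int[] segment : segments) {
      FanOut collector = multi.newCollector();
      for (int doc : segment) collector.collect(doc);
      perSegment.add(collector);
    }
    return multi.reduce(perSegment);
  }

  public static void main(String[] args) {
    Object[] results = demo();
    System.out.println("count=" + results[0] + ", maxDoc=" + results[1]);
  }
}
```

Running main prints count=5, maxDoc=4: a single pass over each segment feeds both wrapped collectors, and reduce() returns one merged result per wrapped manager.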