[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889970#comment-16889970 ]

Atri Sharma commented on LUCENE-8727:
-------------------------------------

bq. we will have to skip all these docs with smaller doc Ids even if they have the same scores as docs with higher doc Ids and should be selected instead.

That should be avoidable: we will need a custom PQ implementation anyway if we decide to share the queue, so the PQ can tie-break the other way round on doc IDs. One advantage of sharing the PQ is that we can skip the merge step during the reduce call of the CollectorManager.

I am hesitant to introduce a synchronized block into the collector-level collection mechanism. It has the potential to blow up in our face and become a performance bottleneck.

I wonder whether we should simply have both versions: the shared PQ/min score, and the CollectorManager that supplies callbacks which the dependent Collectors invoke at regular intervals. The former can work well with fewer slices, while the latter can work well with a large number of slices.

> IndexSearcher#search(Query,int) should operate on a shared priority queue
> when configured with an executor
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-8727
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8727
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> If IndexSearcher is configured with an executor, then the top docs for each
> slice are computed separately before being merged once the top docs for all
> slices are computed. With block-max WAND this is a bit of a waste of
> resources: it would be better if an increase of the min competitive score
> could help skip non-competitive hits on every slice and not just the current
> one.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889181#comment-16889181 ]

Mayya Sharipova commented on LUCENE-8727:
-----------------------------------------

Some comments about design option #1.

I think we should share only the min competitive score (it could be an AtomicLong or something) between collectors, and not the top hits.

The reason for not sharing top hits is that Collectors expect leaves in [the sequential order|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java#L240-L242]. If the executor happens to process leaves with higher doc Ids first, we may fill the global priority queue with docs with higher Ids and set the global min competitive score to the next float above the queue's lowest score. When we then process leaves with smaller doc Ids, the global priority queue is already full and the updated global min competitive score applies, so we will have to skip all these docs with smaller doc Ids even if they have the same scores as docs with higher doc Ids and should be selected instead.

If all collectors have their own priority queues, they will first fill them to N and only then set the min competitive score.
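To make the suggestion above concrete, here is a minimal sketch of sharing only the min competitive score while every collector keeps its own queue. It is not taken from any patch on this issue: `SharedMinScore` and `SliceCollector` are hypothetical names, scores are assumed to be non-negative and never NaN, and an `AtomicInteger` over the float bits stands in for the "AtomicLong or something". Hits are skipped only when they score strictly below the shared bound, so equal-scored docs from other slices are still collected.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for a min competitive score shared by all slice collectors. */
final class SharedMinScore {
  // Non-negative floats sort the same way as their raw int bits, so a plain
  // integer max works (assumption: scores are never negative or NaN).
  private final AtomicInteger bits = new AtomicInteger(Float.floatToIntBits(0f));

  /** Raises the shared bound to candidate if it is higher; never lowers it. */
  void raiseTo(float candidate) {
    bits.accumulateAndGet(Float.floatToIntBits(candidate), Math::max);
  }

  float get() {
    return Float.intBitsToFloat(bits.get());
  }
}

/** Hypothetical per-slice collector that keeps its own queue of the best n scores. */
final class SliceCollector {
  private final int n;
  private final SharedMinScore shared;
  private final PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap

  SliceCollector(int n, SharedMinScore shared) {
    this.n = n;
    this.shared = shared;
  }

  /** Returns true if the hit was kept in the local queue. */
  boolean collect(float score) {
    // Skip only hits strictly below the global bound, so ties are kept.
    if (score < shared.get()) {
      return false;
    }
    if (queue.size() < n) {
      queue.add(score);
    } else if (score > queue.peek()) {
      queue.poll();
      queue.add(score);
    } else {
      return false;
    }
    // Publish a bound only once the local queue holds n hits.
    if (queue.size() == n) {
      shared.raiseTo(queue.peek());
    }
    return true;
  }
}
```

Because a collector publishes nothing until its own queue holds n hits, any bound it publishes is backed by n hits that score at least that much, whatever order the leaves are processed in.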
[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888766#comment-16888766 ]

Atri Sharma commented on LUCENE-8727:
-------------------------------------

[~jpountz] Here are two thoughts on implementing this:

1) Shared priority queue: a priority queue held in the parent CollectorManager is used by all Collectors. This flows naturally: once the top N hits have been collected globally, the minimum competitive score can be raised without the Collectors getting involved, and further hits are ranked accordingly. The downside is that the priority queue implementation will have to be synchronized, so there can be a performance hit, since the critical path of segment collection is affected.

2) Alternatively, for N hits, each slice starts with an equal prorated share of the hits (M collectors, so N/M hits each). Each Collector gets a callback supplier, which it calls with the number of hits collected so far and the score of its highest-scoring local hit. The callback returns the minimum competitive score seen globally so far, and the Collector uses that score to filter out the remaining hits. When a Collector invokes the callback can vary; the simplest option is after every N/M hits. The callback is provided by the CollectorManager. The downside of this approach is the communication between the Collectors and the CollectorManager, and some redundant hits can be collected because the callback is only invoked periodically. In contrast, the shared priority queue allows accurate filtering.

WDYT?
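For illustration, option 1 could look roughly like the sketch below. It is a simplified assumption rather than a proposed patch: `Hit` and `SharedTopN` are hypothetical classes, not Lucene APIs, and every method simply takes the queue's monitor. Equal scores are ordered by doc ID (lower doc ID ranks higher, as in Lucene's default top-docs ordering), so the outcome does not depend on which slice reaches the queue first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical hit record; not a Lucene class. */
final class Hit {
  final int doc;
  final float score;

  Hit(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

/** Hypothetical top-N queue shared by the collectors of all slices. */
final class SharedTopN {
  private final int n;
  // Weakest hit at the head: lowest score, then highest doc ID among ties.
  private final PriorityQueue<Hit> queue = new PriorityQueue<>(SharedTopN::weakestFirst);

  SharedTopN(int n) {
    this.n = n;
  }

  private static int weakestFirst(Hit a, Hit b) {
    int byScore = Float.compare(a.score, b.score);
    return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
  }

  /** Offers a hit from any slice; returns false if it is not competitive. */
  synchronized boolean offer(Hit hit) {
    if (queue.size() < n) {
      queue.add(hit);
      return true;
    }
    if (weakestFirst(hit, queue.peek()) <= 0) {
      return false; // no better than the weakest retained hit
    }
    queue.poll();
    queue.add(hit);
    return true;
  }

  /** Score of the weakest retained hit once n hits are held, else 0. */
  synchronized float minCompetitiveScore() {
    return queue.size() < n ? 0f : queue.peek().score;
  }

  /** Final ranking, best hit first; no per-slice merge is needed. */
  synchronized List<Hit> topDocs() {
    List<Hit> out = new ArrayList<>(queue);
    out.sort((a, b) -> weakestFirst(b, a));
    return out;
  }
}
```

Every `offer` contends for the same lock, which is exactly the cost described in option 1; the sketch makes no attempt to reduce it.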
[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844817#comment-16844817 ]

Adrien Grand commented on LUCENE-8727:
--------------------------------------

I mentioned a shared priority queue in the description, but there might be other ways to do this. The main goal is that slices that get collected in parallel can benefit from information that is gathered in other slices.
[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838427#comment-16838427 ]

Atri Sharma commented on LUCENE-8727:
-------------------------------------

Will take a crack at this soon