[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-07-22 Thread Atri Sharma (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889970#comment-16889970 ] 

Atri Sharma commented on LUCENE-8727:
-

bq. we will have to skip all these docs with smaller doc Ids even if they have 
the same scores as docs with higher doc Ids and should be selected instead.

That should be avoidable: if we decide to share the queue, we will need a custom
PQ implementation anyway, so the PQ can break ties on doc IDs the other way
round. One advantage of sharing the PQ is that we can skip the merge step in the
CollectorManager's reduce call.

I am hesitant to introduce a synchronized block into the collector-level
collection mechanism: it has the potential to blow up in our face and become a
performance bottleneck.

I am curious whether we should simply have both versions: sharing the PQ/min
score, and a CollectorManager that provides callbacks which the dependent
Collectors invoke at regular intervals. The former can work well with a smaller
number of slices, while the latter can work well with a large number of slices.

> IndexSearcher#search(Query,int) should operate on a shared priority queue 
> when configured with an executor
> --
>
> Key: LUCENE-8727
> URL: https://issues.apache.org/jira/browse/LUCENE-8727
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> If IndexSearcher is configured with an executor, then the top docs for each 
> slice are computed separately before being merged once the top docs for all 
> slices are computed. With block-max WAND this is a bit of a waste of 
> resources: it would be better if an increase of the min competitive score 
> could help skip non-competitive hits on every slice and not just the current 
> one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-07-19 Thread Mayya Sharipova (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889181#comment-16889181 ] 

Mayya Sharipova commented on LUCENE-8727:
-

Some comments about design option #1.

I think we should share only the min competitive score (it could be an
AtomicLong or something) between collectors, and not the top hits. The reason
for not sharing the top hits is that Collectors expect leaves in [the sequential
order|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java#L240-L242].
If the executor happens to process leaves with higher doc IDs first, we may
populate the global priority queue with docs with higher IDs and set the global
min competitive score to the next float above the lowest queued score. Next,
when we process leaves with smaller doc IDs, since our global priority queue is
full and we use this updated global min competitive score, we will have to skip
all these docs with smaller doc IDs even if they have the same scores as docs
with higher doc IDs and should be selected instead.
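To make this failure mode concrete, here is a small simulation (not Lucene code): a single top-2 queue that, roughly like TopScoreDocCollector, raises its threshold to `Math.nextUp` of the lowest held score once full, fed the same hits once in sequential leaf order and once with the later leaf first. `Hit` and `OutOfOrderDemo` are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit: a score plus a global doc ID.
record Hit(float score, int doc) {}

class OutOfOrderDemo {
  /**
   * Collects hits, in the given order, into a single top-n queue. Once the queue is
   * full, the threshold becomes Math.nextUp(lowest held score), which is only safe if
   * every later hit has a higher doc ID than every held hit.
   */
  static List<Integer> collect(int n, List<Hit> hits) {
    // Weakest hit at the head: lower score, then higher doc ID on equal scores.
    PriorityQueue<Hit> pq = new PriorityQueue<>((a, b) -> {
      int c = Float.compare(a.score(), b.score());
      return c != 0 ? c : Integer.compare(b.doc(), a.doc());
    });
    float minCompetitive = Float.NEGATIVE_INFINITY;
    for (Hit h : hits) {
      if (h.score() < minCompetitive) {
        continue; // skipped without being considered, as block-max WAND would
      }
      if (pq.size() < n) {
        pq.add(h);
      } else if (h.score() > pq.peek().score()) { // a tie loses to the earlier hit
        pq.poll();
        pq.add(h);
      }
      if (pq.size() == n) {
        minCompetitive = Math.nextUp(pq.peek().score());
      }
    }
    List<Integer> docs = new ArrayList<>();
    for (Hit h : pq) {
      docs.add(h.doc());
    }
    Collections.sort(docs);
    return docs;
  }

  public static void main(String[] args) {
    List<Hit> leafA = List.of(new Hit(2f, 1), new Hit(2f, 2));   // lower doc IDs
    List<Hit> leafB = List.of(new Hit(2f, 10), new Hit(2f, 11)); // higher doc IDs
    List<Hit> sequential = new ArrayList<>(leafA);
    sequential.addAll(leafB);
    List<Hit> reversed = new ArrayList<>(leafB);
    reversed.addAll(leafA);
    System.out.println(collect(2, sequential)); // [1, 2]
    System.out.println(collect(2, reversed));   // [10, 11]
  }
}
```

With sequential order the tied docs 1 and 2 win; with the later leaf first, docs 1 and 2 score exactly 2.0, fall below `Math.nextUp(2.0f)`, and are skipped.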

If each collector has its own priority queue, it will first fill it to N hits
and only then set the min competitive score.
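A minimal sketch of this suggestion, under stated assumptions: `SharedMinScore` and `SliceCollector` are hypothetical names; the comment mentions an AtomicLong, while this sketch stores the 32-bit float's bits in an `AtomicInteger`; doc IDs are omitted for brevity; and the shared value is the exact lowest score of a full local queue rather than its next float, so equal-scored hits from other slices are not skipped. In Lucene the value would eventually be handed to `Scorable#setMinCompetitiveScore`, which this sketch does not model.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical shared min competitive score, not a Lucene class. The float is kept
// as its raw bits in an AtomicInteger so it can be updated without a lock.
class SharedMinScore {
  private final AtomicInteger bits =
      new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

  float get() {
    return Float.intBitsToFloat(bits.get());
  }

  /** Raises the shared score to candidate if candidate is higher; never lowers it. */
  void raise(float candidate) {
    bits.accumulateAndGet(
        Float.floatToIntBits(candidate),
        (cur, cand) -> Float.intBitsToFloat(cand) > Float.intBitsToFloat(cur) ? cand : cur);
  }
}

// Hypothetical per-slice collector: keeps its own top-N scores and only publishes
// a threshold once its local queue holds N hits.
class SliceCollector {
  private final int n;
  private final SharedMinScore shared;
  private final PriorityQueue<Float> local = new PriorityQueue<>(); // lowest score at head

  SliceCollector(int n, SharedMinScore shared) {
    this.n = n;
    this.shared = shared;
  }

  /** Returns true if the hit was kept in this slice's local queue. */
  boolean collect(float score) {
    // Some slice already holds N hits scoring >= shared.get(), so strictly lower
    // scores cannot make the global top N. Equal scores are kept because a hit with
    // a lower doc ID could still win the tie when results are merged.
    if (score < shared.get()) {
      return false;
    }
    if (local.size() < n) {
      local.add(score);
    } else if (score > local.peek()) {
      local.poll();
      local.add(score);
    } else {
      return false;
    }
    if (local.size() == n) {
      shared.raise(local.peek()); // exact score, deliberately not Math.nextUp
    }
    return true;
  }
}
```

The per-slice queues still need the usual merge in the CollectorManager's reduce step; only the threshold is shared.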







[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-07-19 Thread Atri Sharma (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888766#comment-16888766 ] 

Atri Sharma commented on LUCENE-8727:
-

[~jpountz] Here are two thoughts on how to implement this:

 

1) Shared Priority Queue: a shared priority queue, held in the parent
CollectorManager, is used by all Collectors. This flows down naturally: once the
top N hits have been collected globally, the minimum competitive score can be
raised without the Collectors getting involved, and further hits will be ranked
accordingly. The downside is that the priority queue implementation will have to
be synchronized, so there may be a performance hit, since the critical path of
segment collection will be affected.
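A minimal sketch of option 1, assuming hits carry a global doc ID and that ties go to the lower doc ID, as in sequential collection. `SharedTopN` and `Hit` are hypothetical names, not Lucene classes. Ordering on the full (score, doc) key makes the result independent of which slice reaches the queue first, which is one way to get the doc ID tie-breaking discussed in this thread; because of that, the published threshold is the exact lowest score rather than its next float.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical hit: a score plus a global (index-wide) doc ID.
record Hit(float score, int doc) {}

// Hypothetical top-N queue shared by all slice collectors; not a Lucene class.
class SharedTopN {
  // Weakest hit first: lower score, then (on equal scores) higher doc ID. The order
  // depends only on (score, doc), never on arrival order.
  static final Comparator<Hit> WORST_FIRST = (a, b) -> {
    int c = Float.compare(a.score(), b.score());
    return c != 0 ? c : Integer.compare(b.doc(), a.doc());
  };

  private final int n;
  private final PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST);
  // Read by collectors without locking; only written while holding the lock.
  private volatile float minCompetitiveScore = Float.NEGATIVE_INFINITY;

  SharedTopN(int n) {
    this.n = n;
  }

  float minCompetitiveScore() {
    return minCompetitiveScore;
  }

  /** Every offer takes the lock: this is the synchronization cost on the critical path. */
  synchronized boolean offer(Hit hit) {
    if (heap.size() < n) {
      heap.add(hit);
    } else if (WORST_FIRST.compare(hit, heap.peek()) > 0) {
      heap.poll();
      heap.add(hit);
    } else {
      return false;
    }
    if (heap.size() == n) {
      // The exact lowest score, not its next float: a lower doc ID with an equal
      // score can still enter the queue.
      minCompetitiveScore = heap.peek().score();
    }
    return true;
  }

  /** The final top hits, best first; no per-slice merge is needed. */
  synchronized List<Hit> topDocs() {
    List<Hit> out = new ArrayList<>(heap);
    out.sort(WORST_FIRST.reversed());
    return out;
  }
}
```

A Collector would read `minCompetitiveScore()` without locking and only call `offer` for hits scoring at least that much, so the lock is taken only for hits that might be competitive.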

 

2) An alternative is that, for N hits, each slice gets an equal prorated share
of hits to start with (M collectors, so N/M hits each). Each Collector gets a
callback supplier, which it calls with the number of hits collected so far and
the score of its highest-scoring local hit. The callback returns the minimum
competitive score seen globally so far, and the Collector uses that score to
filter out the remaining hits. When a Collector invokes the callback can vary,
the simplest choice being after every N/M hits. The callback will be provided by
the CollectorManager. The downside of this approach is that it involves
communication between the Collectors and the CollectorManager, and some
redundant hits can be collected because the callback is only invoked
periodically. In contrast, the shared priority queue mechanism allows for
accurate filtering.
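A minimal sketch of the callback side of option 2, assuming a coordinator that the CollectorManager would hand to each Collector. `MinScoreCoordinator` and `report` are hypothetical names. One deliberate deviation: the comment has each Collector report its highest-scoring local hit, but a safe global bound needs a floor, so in this sketch each Collector reports how many hits it holds and the lowest score among them. Visiting slices from the highest floor down, once at least N hits are accounted for, the current floor is a score that at least N hits already reach.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical coordinator, not a Lucene class.
class MinScoreCoordinator {
  private final int topN;
  private final float[] floorScore; // per slice: lowest score among its held hits
  private final int[] heldHits;     // per slice: number of held hits

  MinScoreCoordinator(int topN, int numSlices) {
    this.topN = topN;
    this.floorScore = new float[numSlices];
    this.heldHits = new int[numSlices];
  }

  /**
   * Records a slice's latest report and returns a score s such that at least topN
   * reported hits are known to score >= s, or -infinity if fewer than topN hits
   * have been reported in total.
   */
  synchronized float report(int slice, int held, float lowestHeldScore) {
    heldHits[slice] = held;
    floorScore[slice] = lowestHeldScore;
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < heldHits.length; i++) {
      if (heldHits[i] > 0) {
        order.add(i);
      }
    }
    // Highest floor first: every hit of an earlier slice scores >= a later floor.
    order.sort(Comparator.comparingDouble((Integer s) -> floorScore[s]).reversed());
    int count = 0;
    for (int slot : order) {
      count += heldHits[slot];
      if (count >= topN) {
        return floorScore[slot];
      }
    }
    return Float.NEGATIVE_INFINITY;
  }
}
```

A Collector would call `report` every N/M hits, as suggested, and skip hits scoring strictly below the returned value. Between calls it keeps filtering with a stale, lower threshold, which is where the redundant hits come from.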

 

WDYT?







[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-05-21 Thread Adrien Grand (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844817#comment-16844817 ] 

Adrien Grand commented on LUCENE-8727:
--

I mentioned a shared priority queue in the description, but there might be 
other ways to do this. The main goal is that slices that get collected in 
parallel can benefit from information that is gathered in other slices.







[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-05-13 Thread Atri Sharma (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838427#comment-16838427 ] 

Atri Sharma commented on LUCENE-8727:
-

Will take a crack at this soon



