[ 
https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861835#comment-16861835
 ] 

Atri Sharma commented on LUCENE-8829:
-------------------------------------

[~simonw] Essentially, allow users to pass in the tie breaker parameter?

While I do agree with the idea, I believe that a generic tie-breaking 
algorithm that works for both cases would give a cleaner interface and could 
reduce the number of parameters users have to tweak. For example, if the user 
were to specify setShardIndex=true and ScoreDoc::shardIndex in a case where 
all hits belong to the same shard, we would run into the same issue of 
inconsistent result ordering (unless I misunderstood your proposal?)

> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8829
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8829
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch, 
> LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I found that the order of results from 
> TopDocs#merge depends indirectly on the number of collectors involved in 
> the merge. This is troubling because 1) the number of collectors involved 
> in a merge is cost based and depends directly on the number of slices 
> created for the parallel searcher case, and 2) the TopN hits code path 
> invokes merge with a single Collector, so running the same TopN query with 
> a single-threaded and a parallel searcher will produce different result 
> orders, which breaks an invariant that should hold.
>  
> The reason this happens is the subtle way TopDocs#merge sets shardIndex on 
> each ScoreDoc while populating the priority queue used for merging. 
> shardIndex is essentially set to the ordinal of the collector that 
> generated the hit. This means that shardIndex depends on the number of 
> collectors, even for the same set of hits.
>  
> When no sort order is specified, shardIndex is used for tie breaking when 
> scores are equal. This translates to different orders for the same hits 
> when their shardIndex values differ.
>  
> I propose that we remove shardIndex from the default tie-breaking mechanism 
> and replace it with docID. DocID order is the de facto order expected 
> during collection, so it makes sense to use the same factor for tie 
> breaking when scores are equal.
>  
> CC: [~ivera]
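
To illustrate the behaviour described in the quoted issue, here is a minimal standalone Java sketch. It does not use Lucene: Hit, merge, BY_SHARD_INDEX and BY_DOC_ID are hypothetical names made up for this example, and the way the four docs are split across two collectors is an assumption chosen only to make the effect visible. All hits share one score, so only the tie breaker decides the order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {
    // Hypothetical stand-in for a hit; this is NOT Lucene's ScoreDoc.
    static final class Hit {
        final int doc;
        final float score;
        final int shardIndex; // ordinal of the collector that produced the hit
        Hit(int doc, float score, int shardIndex) {
            this.doc = doc;
            this.score = score;
            this.shardIndex = shardIndex;
        }
    }

    // Higher score first; ties broken by collector ordinal.
    static final Comparator<Hit> BY_SHARD_INDEX = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.shardIndex, b.shardIndex);
    };

    // Higher score first; ties broken by docID, as the issue proposes.
    static final Comparator<Hit> BY_DOC_ID = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    // Toy merge: tag each hit with its collector ordinal, then sort.
    // Every hit gets the same score so that only the tie breaker matters.
    static List<Integer> merge(List<List<Integer>> perCollectorDocs, Comparator<Hit> order) {
        List<Hit> all = new ArrayList<>();
        for (int shard = 0; shard < perCollectorDocs.size(); shard++) {
            for (int doc : perCollectorDocs.get(shard)) {
                all.add(new Hit(doc, 1.0f, shard));
            }
        }
        all.sort(order); // stable sort: equal elements keep their insertion order
        List<Integer> docs = new ArrayList<>();
        for (Hit h : all) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Same four hits, collected by one collector or split across two.
        List<List<Integer>> one = Arrays.asList(Arrays.asList(0, 1, 2, 3));
        List<List<Integer>> two = Arrays.asList(Arrays.asList(2, 3), Arrays.asList(0, 1));

        System.out.println(merge(one, BY_SHARD_INDEX)); // [0, 1, 2, 3]
        System.out.println(merge(two, BY_SHARD_INDEX)); // [2, 3, 0, 1]
        System.out.println(merge(one, BY_DOC_ID));      // [0, 1, 2, 3]
        System.out.println(merge(two, BY_DOC_ID));      // [0, 1, 2, 3]
    }
}
```

In this toy model the shardIndex tie breaker gives a different order once the hits are split across two collectors, while the docID tie breaker gives the same order in both cases.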



