[ 
https://issues.apache.org/jira/browse/LUCENE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2271:
----------------------------------

    Attachment:     (was: LUCENE-2271.patch)

> Function queries producing scores of -inf or NaN (e.g. 1/x) return incorrect 
> results with TopScoreDocCollector
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2271
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2271
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Priority: Minor
>             Fix For: 2.9.2, 3.0.1, 3.1
>
>         Attachments: LUCENE-2271.patch, TSDC.patch
>
>
> This is a foolowup to LUCENE-2270, where a part of this problem was fixed 
> (boost = 0 leading to NaN scores, which is also un-intuitive), but in 
> general, function queries in Solr can create these invalid scores easily. In 
> previous version of Lucene these scores ordered correct (except NaN, which 
> mixes up results), but never invalid document ids are returned (like 
> Integer.MAX_VALUE).
> The problem is: TopScoreDocCollector pre-fills the HitQueue with sentinel 
> ScoreDocs with a score of -inf and a doc id of Integer.MAX_VALUE. For the HQ 
> to work, this sentinel must be smaller than all posible values, which is not 
> the case:
> - -inf is equal and the document is not inserted into the HQ, as not 
> competitive, but the HQ is not yet full, so the sentinel values keep in the 
> HQ and result is the Integer.MAX_VALUE docs. This problem is solveable (and 
> only affects the Ordered collector) by chaning the exit condition to:
> {code}
> if (score <= pqTop.score && pqTop.doc != Integer.MAX_VALUE) {
>     // Since docs are returned in-order (i.e., increasing doc Id), a document
>     // with equal score to pqTop.score cannot compete since HitQueue favors
>     // documents with lower doc Ids. Therefore reject those docs too.
>     return;
> }
> {code}
> - The NaN case can be fixed in the same way, but then has another problem: 
> all comparisons with NaN result in false (none of these is true): x < NaN, x 
> > NaN, NaN == NaN. This leads to the fact that HQ's lessThan always returns 
> false, leading to unexspected ordering in the PQ and sometimes the sentinel 
> values do not stay at the top of the queue. A later hit then overrides the 
> top of the queue but leaves the incorrect sentinels  unchanged -> invalid 
> results. This can be fixed in two ways in HQ:
> Force all sentinels to the top:
> {code}
> protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
>     if (hitA.doc == Integer.MAX_VALUE)
>       return true;
>     if (hitB.doc == Integer.MAX_VALUE)
>       return false;
>     if (hitA.score == hitB.score)
>       return hitA.doc > hitB.doc; 
>     else
>       return hitA.score < hitB.score;
> }
> {code}
> or alternatively have a defined order for NaN (Float.compare sorts them after 
> +inf):
> {code}
> protected final boolean lessThan(ScoreDoc hitA, ScoreDoc hitB) {
>     if (hitA.score == hitB.score)
>       return hitA.doc > hitB.doc; 
>     else
>       return Float.compare(hitA.score, hitB.score) < 0;
> }
> {code}
> The problem with both solutions is, that we have now more comparisons per hit 
> and the use of sentinels is questionable. I would like to remove the 
> sentinels and use the old pre 2.9 code for comparing and using PQ.add() when 
> a competitive hit arrives. The order of NaN would be unspecified.
> To fix the order of NaN, it would be better to replace all score comparisons 
> by Float.compare() [also in FieldComparator].
> I would like to delay 2.9.2 and 3.0.1 until this problem is discussed and 
> solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to