Re: [PR] Fix LogOddsFusionQuery.rewrite() to filter out MatchNoDocsQuery clauses [lucene]

via GitHub Fri, 29 May 2026 07:05:49 -0700


jaepil commented on PR #16106:
URL: https://github.com/apache/lucene/pull/16106#issuecomment-4576068603


   Confirmed -- the diagnosis in the issue is accurate and this is the right 
fix. A short summary for other reviewers, since the effect is subtle.
   
   **What the dead clause does**
   
   `MatchNoDocsQuery` matches no documents (its `ScorerSupplier` is null), so 
it never contributes to `logitSum`, yet it was still counted in `totalClauses`. 
In the uniform-weight path:
   
   ```
   score = sigmoid( (logitSum / N) * N^alpha )
         = sigmoid( logitSum * N^(alpha - 1) )      // N = totalClauses
   ```
   
   A dead clause raises `N` without changing `logitSum`. For `alpha < 1` 
(default 0.5), `N^(alpha - 1)` shrinks as `N` grows, so every matching doc's 
score is pulled toward `sigmoid(0) = 0.5` -- exactly the dilution described in 
the issue.
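
   To put numbers on the formula above, here is a minimal sketch. It is not 
the Lucene implementation: the class and method names are hypothetical, and 
the `logitSum` value of 2.0 is an arbitrary illustration. It only evaluates 
the quoted uniform-weight expression for two live clauses, with and without 
one dead clause counted in `N`.

```java
/** Hypothetical stand-in for the uniform-weight formula quoted above; not Lucene code. */
final class DilutionSketch {
  private DilutionSketch() {}

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  /** score = sigmoid((logitSum / N) * N^alpha), where N is totalClauses. */
  static double uniformScore(double logitSum, int totalClauses, double alpha) {
    double n = totalClauses;
    return sigmoid((logitSum / n) * Math.pow(n, alpha));
  }

  public static void main(String[] args) {
    double logitSum = 2.0; // assumed value, for illustration only
    double alpha = 0.5;    // the default mentioned in the comment

    // Two live clauses, no dead clause counted: roughly 0.80.
    System.out.println(uniformScore(logitSum, 2, alpha));
    // Same logitSum, but a dead clause inflates N to 3: roughly 0.76.
    System.out.println(uniformScore(logitSum, 3, alpha));
  }
}
```

   With a negative `logitSum` the effect is mirrored: the score rises toward 
0.5 instead of falling toward it.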
   
   **Ranking vs. calibration (why it is worth fixing)**
   
   This can look like a no-op: within a single query `N` is a constant shared 
by every doc, so the ordering is preserved and single-query top-K does not 
change. What changes is the absolute score, which matters whenever it is 
consumed as a value rather than an order: nested in a larger scoring query 
(`BooleanQuery`, another fusion layer, a KNN clause), compared against a 
min-score / probability threshold, or compared across queries. Those are the 
hybrid-search cases this query exists for, and an inflated `N` silently pulls 
their scores toward 0.5, so they no longer read as calibrated probabilities.
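
   The ranking-versus-calibration distinction can be sketched as follows. 
This is illustrative only: the class name, the two `logitSum` values, and the 
0.78 threshold are assumptions chosen for the example, not values from the 
PR. The order of the two docs is the same for either `N`, but one of them 
stops clearing the threshold once the dead clause is counted.

```java
/** Hypothetical illustration of ordering vs. absolute score; not Lucene code. */
final class CalibrationSketch {
  private CalibrationSketch() {}

  /** Uniform-weight score: sigmoid((logitSum / N) * N^alpha). */
  static double score(double logitSum, int totalClauses, double alpha) {
    double n = totalClauses;
    double x = (logitSum / n) * Math.pow(n, alpha);
    return 1.0 / (1.0 + Math.exp(-x));
  }

  /** Mimics a min-score filter applied by some outer consumer. */
  static boolean passes(double score, double minScore) {
    return score >= minScore;
  }

  public static void main(String[] args) {
    double alpha = 0.5;
    double minScore = 0.78;          // assumed threshold
    double docA = 2.0, docB = 1.0;   // assumed logit sums

    for (int n : new int[] {2, 3}) { // 2 = clean, 3 = one dead clause counted
      double a = score(docA, n, alpha);
      double b = score(docB, n, alpha);
      System.out.println("N=" + n
          + " order preserved=" + (a > b)
          + " docA passes=" + passes(a, minScore));
    }
  }
}
```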
   
   **Weight re-normalization is correct**
   
   Re-normalizing the survivors to sum to 1.0 is not only to satisfy the 
constructor check -- it is semantically right. The dead clause's weight mass 
was otherwise lost (active weights summed to < 1), under-counting the live 
signals. Re-normalizing restores the intended relative reliabilities (e.g. 
`{0.4, 0.4}` -> `{0.5, 0.5}`) and the sum-to-1 invariant the scorer assumes.
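
   A minimal sketch of that re-normalization step, under stated assumptions: 
the class and method names are hypothetical, the dead clause is assumed to 
have carried the remaining 0.2 of weight, and this is not the code from the 
PR. It drops the dead entries and rescales the survivors by their own sum.

```java
import java.util.Arrays;

/** Hypothetical helper showing weight re-normalization; not Lucene code. */
final class RenormalizeSketch {
  private RenormalizeSketch() {}

  /** Keeps the weights flagged as live and rescales them to sum to 1.0. */
  static double[] renormalize(double[] weights, boolean[] live) {
    if (weights.length != live.length) {
      throw new IllegalArgumentException("weights and live flags differ in length");
    }
    double liveSum = 0.0;
    int liveCount = 0;
    for (int i = 0; i < weights.length; i++) {
      if (live[i]) {
        liveSum += weights[i];
        liveCount++;
      }
    }
    if (liveCount == 0 || liveSum <= 0.0) {
      throw new IllegalArgumentException("no live weight mass to normalize");
    }
    double[] out = new double[liveCount];
    int j = 0;
    for (int i = 0; i < weights.length; i++) {
      if (live[i]) {
        out[j++] = weights[i] / liveSum; // preserves relative reliabilities
      }
    }
    return out;
  }

  public static void main(String[] args) {
    double[] weights = {0.4, 0.4, 0.2};      // last entry: assumed dead-clause weight
    boolean[] live = {true, true, false};
    System.out.println(Arrays.toString(renormalize(weights, live))); // prints [0.5, 0.5]
  }
}
```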
   
   Returning the bare clause for a single survivor also matches the existing 
`clauses.size() == 1` unwrapping. The filtered query reaches a rewrite 
fixpoint, and the change mirrors the `MatchNoDocsQuery` filtering already in 
`BooleanQuery` / `DisjunctionMaxQuery`.
   
   The new tests pin the fix structurally (the rewritten query `equals` the 
clean version with N-1 clauses). I would recommend adding a scoring test as 
well: nest a fusion query that has a dead clause inside an outer 
`BooleanQuery` and assert that its score equals that of the dead-clause-free 
version. Since single-query ranking is unaffected, the calibration behavior 
is the part most likely to regress silently, so a direct score assertion is 
worth having.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


