iprithv commented on issue #16137: URL: https://github.com/apache/lucene/issues/16137#issuecomment-4567414301
@romseygeek ah, got it. Sharing what I found while digging into this.

The negative score is actually just `-0.0`, coming from `-Math.log(1.0)`. This happens when `(pow - lambda) / (1 - lambda)` becomes exactly `1.0`. One common case is `tfn = 0`: then `q = 0`, so `pow = lambda^0 = 1.0`, the fraction becomes `(1 - lambda) / (1 - lambda) = 1.0`, and we end up with `-log(1.0) = -0.0`. The `nextUp` case can also hit this: if lambda is very close to 1, `nextUp` can push `pow` to exactly `1.0` and the same thing happens.

The bigger issue is the `pow - lambda` subtraction. When the two values are close, that subtraction loses precision (catastrophic cancellation), so the result gets unstable. The current `nextUp`/`nextDown` guards are a workaround but don't fully cover it.

I tried a few options:

1. Rewriting with `Math.expm1` to avoid that subtraction. Since `lambda^q - lambda = lambda * expm1((q - 1) * log(lambda))`, the score becomes `-log(lambda) - log(|expm1(-r * log(lambda))|) + log(|1 - lambda|)`, with `r = 1 - q`. This is much more stable and removes the need for the guards. It worked well in most cases, though I still saw tiny negatives (~1e-15) in some edge cases.
2. Just clamping to 0. `Axiomatic` already does `Math.max(0, score)` for the same kind of reason (its gamma component can produce negatives). Simple, but it doesn't fix the underlying instability.
3. Combining both: use the stable formula and still clamp small negatives as a safety net.

I now think the combined approach is probably the safest. I'll wait for feedback and update the PR accordingly :)
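To make the `tfn = 0` case and the combined option concrete, here is a rough sketch, not the actual PR code. The class and method names (`SplScoreSketch`, `naiveScore`, `stableClampedScore`) are made up for illustration, and it assumes `q = tfn / (tfn + 1)`, so `r = 1 - q = 1 / (tfn + 1)`, which matches the `tfn = 0` → `q = 0` case described above. The naive method leaves out the `nextUp`/`nextDown` guards on purpose so the `-0.0` is easy to see.

```java
// Illustrative sketch only: hypothetical names, not Lucene's implementation.
final class SplScoreSketch {

  private SplScoreSketch() {}

  // Direct form of the expression discussed above, without any guards.
  // Assumption: q = tfn / (tfn + 1), written as 1 - 1 / (tfn + 1).
  static double naiveScore(double tfn, double lambda) {
    double q = 1 - 1 / (tfn + 1);
    double pow = Math.pow(lambda, q);
    // pow - lambda is the subtraction that suffers from cancellation
    // when pow and lambda are close to each other.
    return -Math.log((pow - lambda) / (1 - lambda));
  }

  // Rewrite based on lambda^q - lambda = lambda * expm1((q - 1) * log(lambda)),
  // followed by a clamp as a safety net against tiny negative results.
  static double stableClampedScore(double tfn, double lambda) {
    double r = 1 / (tfn + 1); // assumption: r = 1 - q
    double logLambda = Math.log(lambda);
    double score =
        -logLambda
            - Math.log(Math.abs(Math.expm1(-r * logLambda)))
            + Math.log(Math.abs(1 - lambda));
    // Math.max(0.0, -0.0) is +0.0 in Java, so the clamp also removes a negative zero.
    return Math.max(0.0, score);
  }

  public static void main(String[] args) {
    double lambda = 0.5;
    for (double tfn : new double[] {0.0, 1.0, 10.0}) {
      System.out.println(
          "tfn=" + tfn
              + " naive=" + naiveScore(tfn, lambda)
              + " stable+clamp=" + stableClampedScore(tfn, lambda));
    }
  }
}
```

The clamp is kept even with the `expm1` form because the three logarithms are still rounded independently, so their sum can land slightly below zero when the true score is zero or very close to it.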
