gortiz commented on PR #18729:
URL: https://github.com/apache/pinot/pull/18729#issuecomment-4690381750
Thanks for tackling this, @cristianpop — nice catch on the root cause.
Confirming the diagnosis is spot on: Calcite correctly folds `prop IS NULL OR
prop = 'value'` into a single `SEARCH` and encodes the null semantics in
`Sarg.nullAs`, and the bug was that `handleSearch` was reading the range set
but dropping `nullAs` on the floor. The production fix looks correct to me
across all three branches (IN / NOT IN / ranges) and all three `nullAs` values,
and it's a no-op when null handling is off, so backward compatibility is
preserved. 👍
The CI failure ("Unit Test Set 1") is coming from the new test file though —
I ran `RexExpressionUtilsTest` locally and got 3 failures that need a fix
before this can merge:
1. **`testHandleSearchNotInWithNullAsTrue`** and
**`testHandleSearchNotInWithNullAsFalse`** have inverted `AND`/`OR`
expectations. The subtlety is that `Sarg.negate()` doesn't preserve `nullAs` —
it calls `nullAs.negate()`, which flips `TRUE↔FALSE` (this is correct Calcite
behavior: negating the predicate also negates how unknown is coerced). So
`Sarg.of(TRUE, {1,2}).negate()` actually has `nullAs = FALSE`, and the
production code correctly emits `NOT IN (…) AND IS NOT NULL`. The assertions
just need to be swapped to match — and it'd be worth a one-line comment noting
that `negate()` flips `nullAs`, since it's an easy thing to trip over again.
2. **`testHandleSearchWithStringType`** throws `IllegalArgumentException`
before reaching its assertions — Calcite requires VARCHAR sarg endpoints to be
`NlsString` rather than raw `String`, so `Range.singleton("a")` blows up in
`RexLiteral` digest computation. Building the range from `NlsString` values (or
via `RexBuilder`) should fix it.
One optional suggestion: since #18726 is an end-to-end wrong-result bug, a
query-level regression test (e.g. an MSE null-handling integration test running
`… WHERE prop IS NULL OR prop = 'value'` and asserting the count) would prove
the fix all the way through planning → execution and guard against future
regressions. The unit tests are great for the translation logic, but the
integration test is what really pins the reported behavior.
Thanks again for the fix!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]