mbutrovich commented on PR #2351:
URL: https://github.com/apache/iceberg-rust/pull/2351#issuecomment-4284597538

   Thanks for catching this, @xanderbailey! The `always_true`/`always_false` 
stubs were a real correctness gap, and the fix is well-structured — mirrors the 
`is_null`/`not_null` pattern nicely.
   
   A few things I noticed:
   
   ## NULL propagation semantics
   
   `BooleanArray::from_unary` preserves the null bitmap from the input array 
([source](https://github.com/apache/arrow-rs/blob/main/arrow-array/src/array/boolean_array.rs#L326)).
 This means for a NULL input value, the output is NULL rather than a concrete 
boolean.
   
   The Iceberg spec (and the Java implementation in 
[`NaNUtil.isNaN`](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/util/NaNUtil.java#L25-L37)
 + 
[`Evaluator`](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/Evaluator.java#L95-L101))
 treats these as non-null-propagating:
   
   | Input | `is_nan` | `not_nan` |
   |-------|----------|-----------|
   | NaN   | true     | false     |
   | value | false    | true      |
   | NULL  | **false** | **true** |
   
   With null propagation, `not_nan(NULL)` produces NULL, which gets filtered 
out. But per spec it should produce **true**, meaning NULL rows should pass 
through a `not_nan` filter. This matches Spark's behavior too — `IsNaN` 
[declares `nullable = 
false`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L236)
 and returns `false` for NULL input.
   
   One approach: build the array without null propagation, e.g. iterate and 
treat null slots explicitly as "not NaN". Or apply `from_unary` and then 
override null positions afterward.
   
   ## Non-float fallback
   
   The `_ => Ok(BooleanArray::from(vec![false; array.len()]))` arm silently 
produces all-false for non-float types. Since the Iceberg type system only 
allows `float` and `double`, and `is_nan` can only bind against those types, 
this branch should be unreachable. The codebase uses `unreachable!` elsewhere 
for similar invariants (e.g., `crates/iceberg/src/arrow/schema.rs`). Might be 
worth using `unreachable!("is_nan is only valid for float types")` here to make 
the invariant explicit.
   
   ## Test assertions
   
   The test currently spot-checks indices 0, 1, and 3 but skips the NULL at 
index 2. It might be worth asserting the full truth table from the spec — 
something like:
   
   ```rust
   // Input: [Some(1.0), Some(NaN), None, Some(0.0)]
   
   // is_nan: NaN→true, value→false, NULL→false (not null-propagating per spec)
   let result = apply_predicate_to_batch(Reference::new("qux").is_nan(), 
schema.clone(), batch);
   assert!(!result.is_null(0));
   assert!(!result.value(0));     // 1.0 is not NaN
   assert!(!result.is_null(1));
   assert!(result.value(1));      // NaN is NaN
   assert!(!result.is_null(2));
   assert!(!result.value(2));     // NULL → false, not NULL
   assert!(!result.is_null(3));
   assert!(!result.value(3));     // 0.0 is not NaN
   
   // not_nan: NaN→false, value→true, NULL→true (not null-propagating per spec)
   let result = apply_predicate_to_batch(Reference::new("qux").is_not_nan(), 
schema, batch);
   assert!(!result.is_null(0));
   assert!(result.value(0));      // 1.0 is not NaN
   assert!(!result.is_null(1));
   assert!(!result.value(1));     // NaN is NaN
   assert!(!result.is_null(2));
   assert!(result.value(2));      // NULL → true, not NULL
   assert!(!result.is_null(3));
   assert!(result.value(3));      // 0.0 is not NaN
   ```
   
   This locks in the non-null-propagating semantics from the spec table above, 
and would catch the current `from_unary` null propagation issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to