adriangb opened a new issue, #19092:
URL: https://github.com/apache/datafusion/issues/19092

   *This is a dump of a conversation with Claude Code*
   
   ## Problem
   
   When a predicate like `b IS NOT NULL OR a = 24` is rewritten for a file 
schema where column `b` doesn't exist, the `b IS NOT NULL` becomes `IS NOT 
NULL(NULL literal)`. This should simplify to `false`, allowing the predicate to 
become `a = 24` for proper row group pruning.
   
   Currently:
   1. `PhysicalExprSimplifier` doesn't simplify `IS NULL(literal)` or `IS NOT 
NULL(literal)`
   2. `build_predicate_expression` treats `IS NOT NULL(literal)` as "unhandled" 
→ returns `true`
   3. Result: `true OR (pruning_expr)` = `true` → no pruning possible
   
   ## Solution
   
   Implement both fixes so that `IS NOT NULL(lit(1))` and similar expressions 
benefit from simplification at multiple levels.
   
   ---
   
   ## Change 1: Add IS NULL/IS NOT NULL Simplification to PhysicalExprSimplifier
   
   ### File: `datafusion/physical-expr/src/simplifier/mod.rs`
   
   **Modifications:**
   
   1. Add new module import:
      ```rust
      pub mod is_null;
      ```
   
   2. Update `f_up` method to chain the new simplification:
      ```rust
      fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
          let rewritten = simplify_not_expr(&node, self.schema)?
              .transform_data(|node| is_null::simplify_is_null_expr(&node))?
              .transform_data(|node| {
                  unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
              })?;
          Ok(rewritten)
      }
      ```
   
   ### New File: `datafusion/physical-expr/src/simplifier/is_null.rs`
   
   Create a new module following the pattern of `not.rs`:
   
   ```rust
   //! Simplify IS NULL and IS NOT NULL expressions on literals
   //!
   //! - IS NULL(literal) -> true/false based on literal.value().is_null()
   //! - IS NOT NULL(literal) -> true/false based on !literal.value().is_null()
   
   use std::sync::Arc;
   use datafusion_common::{tree_node::Transformed, Result, ScalarValue};
   use crate::expressions::{lit, IsNullExpr, IsNotNullExpr, Literal};
   use crate::PhysicalExpr;
   
   pub fn simplify_is_null_expr(
       expr: &Arc<dyn PhysicalExpr>,
   ) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
       // Handle IS NULL(literal)
       if let Some(is_null_expr) = expr.as_any().downcast_ref::<IsNullExpr>() {
           if let Some(literal) = 
is_null_expr.arg().as_any().downcast_ref::<Literal>() {
               let result = literal.value().is_null();
               return 
Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(result)))));
           }
       }
   
       // Handle IS NOT NULL(literal)
       if let Some(is_not_null_expr) = 
expr.as_any().downcast_ref::<IsNotNullExpr>() {
           if let Some(literal) = 
is_not_null_expr.arg().as_any().downcast_ref::<Literal>() {
               let result = !literal.value().is_null();
               return 
Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(result)))));
           }
       }
   
       Ok(Transformed::no(Arc::clone(expr)))
   }
   ```
   
   ### Tests to Add (in `mod.rs`)
   
   ```rust
   #[test]
   fn test_simplify_is_null_literal() -> Result<()> {
       let schema = test_schema();
       let mut simplifier = PhysicalExprSimplifier::new(&schema);
   
       // IS NULL(1) -> false
       let is_null_expr = 
Arc::new(IsNullExpr::new(lit(ScalarValue::Int32(Some(1)))));
       let result = simplifier.simplify(is_null_expr)?;
       assert_eq!(as_literal(&result).value(), 
&ScalarValue::Boolean(Some(false)));
   
       // IS NULL(NULL) -> true
       let is_null_expr = 
Arc::new(IsNullExpr::new(lit(ScalarValue::Int32(None))));
       let result = simplifier.simplify(is_null_expr)?;
       assert_eq!(as_literal(&result).value(), 
&ScalarValue::Boolean(Some(true)));
   
       Ok(())
   }
   
   #[test]
   fn test_simplify_is_not_null_literal() -> Result<()> {
       let schema = test_schema();
       let mut simplifier = PhysicalExprSimplifier::new(&schema);
   
       // IS NOT NULL(1) -> true
       let expr = 
Arc::new(IsNotNullExpr::new(lit(ScalarValue::Int32(Some(1)))));
       let result = simplifier.simplify(expr)?;
       assert_eq!(as_literal(&result).value(), 
&ScalarValue::Boolean(Some(true)));
   
       // IS NOT NULL(NULL) -> false
       let expr = Arc::new(IsNotNullExpr::new(lit(ScalarValue::Int32(None))));
       let result = simplifier.simplify(expr)?;
       assert_eq!(as_literal(&result).value(), 
&ScalarValue::Boolean(Some(false)));
   
       Ok(())
   }
   ```
   
   ---
   
   ## Change 2: Handle Literal Arguments in build_predicate_expression
   
   ### File: `datafusion/pruning/src/pruning_predicate.rs`
   
   **Modify `build_predicate_expression` function (~line 1415-1427):**
   
   Before:
   ```rust
   if let Some(is_null) = expr_any.downcast_ref::<phys_expr::IsNullExpr>() {
       return build_is_null_column_expr(is_null.arg(), schema, 
required_columns, false)
           .unwrap_or_else(|| unhandled_hook.handle(expr));
   }
   if let Some(is_not_null) = 
expr_any.downcast_ref::<phys_expr::IsNotNullExpr>() {
       return build_is_null_column_expr(
           is_not_null.arg(),
           schema,
           required_columns,
           true,
       )
       .unwrap_or_else(|| unhandled_hook.handle(expr));
   }
   ```
   
   After:
   ```rust
   if let Some(is_null) = expr_any.downcast_ref::<phys_expr::IsNullExpr>() {
       // If argument is a literal, evaluate directly
       if let Some(literal) = 
is_null.arg().as_any().downcast_ref::<phys_expr::Literal>() {
           let result = literal.value().is_null();
           return 
Arc::new(phys_expr::Literal::new(ScalarValue::Boolean(Some(result))));
       }
       return build_is_null_column_expr(is_null.arg(), schema, 
required_columns, false)
           .unwrap_or_else(|| unhandled_hook.handle(expr));
   }
   if let Some(is_not_null) = 
expr_any.downcast_ref::<phys_expr::IsNotNullExpr>() {
       // If argument is a literal, evaluate directly
       if let Some(literal) = 
is_not_null.arg().as_any().downcast_ref::<phys_expr::Literal>() {
           let result = !literal.value().is_null();
           return 
Arc::new(phys_expr::Literal::new(ScalarValue::Boolean(Some(result))));
       }
       return build_is_null_column_expr(
           is_not_null.arg(),
           schema,
           required_columns,
           true,
       )
       .unwrap_or_else(|| unhandled_hook.handle(expr));
   }
   ```
   
   ---
   
   ## Files to Modify
   
   1. `datafusion/physical-expr/src/simplifier/mod.rs` - Add is_null module 
import and chain simplification
   2. `datafusion/physical-expr/src/simplifier/is_null.rs` - New file for IS 
NULL/IS NOT NULL simplification
   3. `datafusion/pruning/src/pruning_predicate.rs` - Handle literal arguments 
in build_predicate_expression
   
   ## Testing
   
   1. Run the original failing test:
      ```
      cargo test --package datafusion --test core_integration --all-features -- 
schema_adapter::schema_adapter_integration_tests::test_parquet_missing_column 
--exact --nocapture
      ```
   
   2. Run simplifier tests:
      ```
      cargo test --package datafusion-physical-expr --lib simplifier
      ```
   
   3. Run pruning predicate tests:
      ```
      cargo test --package datafusion-pruning --lib
      ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to