adriangb opened a new issue, #19092:
URL: https://github.com/apache/datafusion/issues/19092
*This is a dump of a conversation with Claude Code*
## Problem
When a predicate like `b IS NOT NULL OR a = 24` is rewritten for a file
schema where column `b` doesn't exist, the `b IS NOT NULL` becomes `IS NOT
NULL(NULL literal)`. This should simplify to `false`, allowing the predicate to
become `a = 24` for proper row group pruning.
Currently:
1. `PhysicalExprSimplifier` doesn't simplify `IS NULL(literal)` or `IS NOT
NULL(literal)`
2. `build_predicate_expression` treats `IS NOT NULL(literal)` as "unhandled"
→ returns `true`
3. Result: `true OR (pruning_expr)` = `true` → no pruning possible
## Solution
Implement both fixes so that `IS NOT NULL(lit(1))` and similar expressions
benefit from simplification at multiple levels.
---
## Change 1: Add IS NULL/IS NOT NULL Simplification to PhysicalExprSimplifier
### File: `datafusion/physical-expr/src/simplifier/mod.rs`
**Modifications:**
1. Add new module import:
```rust
pub mod is_null;
```
2. Update `f_up` method to chain the new simplification:
```rust
fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
let rewritten = simplify_not_expr(&node, self.schema)?
.transform_data(|node| is_null::simplify_is_null_expr(&node))?
.transform_data(|node| {
unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
})?;
Ok(rewritten)
}
```
### New File: `datafusion/physical-expr/src/simplifier/is_null.rs`
Create a new module following the pattern of `not.rs`:
```rust
//! Simplify IS NULL and IS NOT NULL expressions on literals
//!
//! - IS NULL(literal) -> true/false based on literal.value().is_null()
//! - IS NOT NULL(literal) -> true/false based on !literal.value().is_null()
use std::sync::Arc;
use datafusion_common::{tree_node::Transformed, Result, ScalarValue};
use crate::expressions::{lit, IsNullExpr, IsNotNullExpr, Literal};
use crate::PhysicalExpr;
pub fn simplify_is_null_expr(
expr: &Arc<dyn PhysicalExpr>,
) -> Result<Transformed<Arc<dyn PhysicalExpr>>> {
// Handle IS NULL(literal)
if let Some(is_null_expr) = expr.as_any().downcast_ref::<IsNullExpr>() {
if let Some(literal) =
is_null_expr.arg().as_any().downcast_ref::<Literal>() {
let result = literal.value().is_null();
return
Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(result)))));
}
}
// Handle IS NOT NULL(literal)
if let Some(is_not_null_expr) =
expr.as_any().downcast_ref::<IsNotNullExpr>() {
if let Some(literal) =
is_not_null_expr.arg().as_any().downcast_ref::<Literal>() {
let result = !literal.value().is_null();
return
Ok(Transformed::yes(lit(ScalarValue::Boolean(Some(result)))));
}
}
Ok(Transformed::no(Arc::clone(expr)))
}
```
### Tests to Add (in `mod.rs`)
```rust
#[test]
fn test_simplify_is_null_literal() -> Result<()> {
let schema = test_schema();
let mut simplifier = PhysicalExprSimplifier::new(&schema);
// IS NULL(1) -> false
let is_null_expr =
Arc::new(IsNullExpr::new(lit(ScalarValue::Int32(Some(1)))));
let result = simplifier.simplify(is_null_expr)?;
assert_eq!(as_literal(&result).value(),
&ScalarValue::Boolean(Some(false)));
// IS NULL(NULL) -> true
let is_null_expr =
Arc::new(IsNullExpr::new(lit(ScalarValue::Int32(None))));
let result = simplifier.simplify(is_null_expr)?;
assert_eq!(as_literal(&result).value(),
&ScalarValue::Boolean(Some(true)));
Ok(())
}
#[test]
fn test_simplify_is_not_null_literal() -> Result<()> {
let schema = test_schema();
let mut simplifier = PhysicalExprSimplifier::new(&schema);
// IS NOT NULL(1) -> true
let expr =
Arc::new(IsNotNullExpr::new(lit(ScalarValue::Int32(Some(1)))));
let result = simplifier.simplify(expr)?;
assert_eq!(as_literal(&result).value(),
&ScalarValue::Boolean(Some(true)));
// IS NOT NULL(NULL) -> false
let expr = Arc::new(IsNotNullExpr::new(lit(ScalarValue::Int32(None))));
let result = simplifier.simplify(expr)?;
assert_eq!(as_literal(&result).value(),
&ScalarValue::Boolean(Some(false)));
Ok(())
}
```
---
## Change 2: Handle Literal Arguments in build_predicate_expression
### File: `datafusion/pruning/src/pruning_predicate.rs`
**Modify `build_predicate_expression` function (~line 1415-1427):**
Before:
```rust
if let Some(is_null) = expr_any.downcast_ref::<phys_expr::IsNullExpr>() {
return build_is_null_column_expr(is_null.arg(), schema,
required_columns, false)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}
if let Some(is_not_null) =
expr_any.downcast_ref::<phys_expr::IsNotNullExpr>() {
return build_is_null_column_expr(
is_not_null.arg(),
schema,
required_columns,
true,
)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}
```
After:
```rust
if let Some(is_null) = expr_any.downcast_ref::<phys_expr::IsNullExpr>() {
// If argument is a literal, evaluate directly
if let Some(literal) =
is_null.arg().as_any().downcast_ref::<phys_expr::Literal>() {
let result = literal.value().is_null();
return
Arc::new(phys_expr::Literal::new(ScalarValue::Boolean(Some(result))));
}
return build_is_null_column_expr(is_null.arg(), schema,
required_columns, false)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}
if let Some(is_not_null) =
expr_any.downcast_ref::<phys_expr::IsNotNullExpr>() {
// If argument is a literal, evaluate directly
if let Some(literal) =
is_not_null.arg().as_any().downcast_ref::<phys_expr::Literal>() {
let result = !literal.value().is_null();
return
Arc::new(phys_expr::Literal::new(ScalarValue::Boolean(Some(result))));
}
return build_is_null_column_expr(
is_not_null.arg(),
schema,
required_columns,
true,
)
.unwrap_or_else(|| unhandled_hook.handle(expr));
}
```
---
## Files to Modify
1. `datafusion/physical-expr/src/simplifier/mod.rs` - Add is_null module
import and chain simplification
2. `datafusion/physical-expr/src/simplifier/is_null.rs` - New file for IS
NULL/IS NOT NULL simplification
3. `datafusion/pruning/src/pruning_predicate.rs` - Handle literal arguments
in build_predicate_expression
## Testing
1. Run the original failing test:
```
cargo test --package datafusion --test core_integration --all-features --
schema_adapter::schema_adapter_integration_tests::test_parquet_missing_column
--exact --nocapture
```
2. Run simplifier tests:
```
cargo test --package datafusion-physical-expr --lib simplifier
```
3. Run pruning predicate tests:
```
cargo test --package datafusion-pruning --lib
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]