friendlymatthew opened a new pull request, #20822:
URL: https://github.com/apache/datafusion/pull/20822
## Which issue does this PR close?
- Related to #20603
## Rationale for this change
This PR enables Parquet row-level filter pushdown for struct field access
expressions. Previously, these filters fell back to a full scan followed by a
separate filtering pass. That is a significant performance penalty for queries
that filter on struct fields in large Parquet files (like Variant types!).
Filters on struct fields such as `WHERE s['foo'] > 67` were not being pushed
into the Parquet decoder. `PushdownChecker` saw that the underlying
`Column("s")` has a `Struct` type and unconditionally rejected it, without
considering that `get_field` resolves to a primitive leaf. With this change,
deeply nested access such as `s['outer']['inner']` is also pushed down,
because the logical simplifier flattens it before it reaches the physical plan.
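To make the before/after decision concrete, here is a minimal, self-contained Rust sketch of the check. It assumes a toy expression model: `DataType`, `Expr`, `flatten`, `resolve_path`, `can_push_down_old`, and `can_push_down_new` are hypothetical names for illustration only, not DataFusion's actual `PhysicalExpr` or `PushdownChecker` API. The multi-name `GetField` path is likewise only an approximation of what the simplifier produces.

```rust
// Hypothetical toy model, NOT DataFusion's real PushdownChecker/PhysicalExpr API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    /// Field access: a base expression plus a path of field names.
    GetField(Box<Expr>, Vec<String>),
    Gt(Box<Expr>, Box<Expr>),
}

type Schema = Vec<(String, DataType)>;

/// Collapse get_field(get_field(s, 'outer'), 'inner') into
/// get_field(s, ['outer', 'inner']), loosely mimicking the simplifier.
fn flatten(expr: Expr) -> Expr {
    match expr {
        Expr::GetField(base, path) => match flatten(*base) {
            Expr::GetField(inner, mut prefix) => {
                prefix.extend(path);
                Expr::GetField(inner, prefix)
            }
            other => Expr::GetField(Box::new(other), path),
        },
        Expr::Gt(l, r) => Expr::Gt(Box::new(flatten(*l)), Box::new(flatten(*r))),
        other => other,
    }
}

/// Walk a field path from a top-level column down to the type it names.
fn resolve_path(schema: &Schema, column: &str, path: &[String]) -> Option<DataType> {
    let mut dt = schema.iter().find(|(n, _)| n == column)?.1.clone();
    for name in path {
        dt = match dt {
            DataType::Struct(fields) => fields.into_iter().find(|(n, _)| n == name)?.1,
            _ => return None, // the path tries to step into a non-struct
        };
    }
    Some(dt)
}

fn is_leaf(dt: &DataType) -> bool {
    !matches!(dt, DataType::Struct(_))
}

/// Old rule: any reference to a struct-typed column blocks pushdown.
fn can_push_down_old(expr: &Expr, schema: &Schema) -> bool {
    match expr {
        Expr::Column(c) => resolve_path(schema, c, &[]).map_or(false, |t| is_leaf(&t)),
        Expr::Literal(_) => true,
        Expr::GetField(base, _) => can_push_down_old(base, schema),
        Expr::Gt(l, r) => can_push_down_old(l, schema) && can_push_down_old(r, schema),
    }
}

/// New rule: get_field directly on a column is fine if it ends at a leaf.
fn can_push_down_new(expr: &Expr, schema: &Schema) -> bool {
    match expr {
        Expr::Column(c) => resolve_path(schema, c, &[]).map_or(false, |t| is_leaf(&t)),
        Expr::Literal(_) => true,
        Expr::GetField(base, path) => match base.as_ref() {
            Expr::Column(c) => resolve_path(schema, c, path).map_or(false, |t| is_leaf(&t)),
            _ => false, // only direct column access is modelled here
        },
        Expr::Gt(l, r) => can_push_down_new(l, schema) && can_push_down_new(r, schema),
    }
}

fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn field(base: Expr, name: &str) -> Expr { Expr::GetField(Box::new(base), vec![name.to_string()]) }
fn gt(l: Expr, r: Expr) -> Expr { Expr::Gt(Box::new(l), Box::new(r)) }

/// Schema: id: Int64, s: Struct { foo: Int64, outer: Struct { inner: Utf8 } }
fn demo_schema() -> Schema {
    let outer = DataType::Struct(vec![("inner".to_string(), DataType::Utf8)]);
    let s = DataType::Struct(vec![("foo".to_string(), DataType::Int64), ("outer".to_string(), outer)]);
    vec![("id".to_string(), DataType::Int64), ("s".to_string(), s)]
}

fn main() {
    let schema = demo_schema();
    let foo = gt(field(col("s"), "foo"), Expr::Literal(67)); // s['foo'] > 67
    println!("old={} new={}", can_push_down_old(&foo, &schema), can_push_down_new(&foo, &schema));
    let nested = field(field(col("s"), "outer"), "inner"); // s['outer']['inner']
    println!("unflattened={}", can_push_down_new(&nested, &schema));
    println!("flattened={}", can_push_down_new(&flatten(nested), &schema));
}
```

In this model the old rule rejects `s['foo'] > 67` because the base column is a struct, while the new rule accepts it because the path ends at an `Int64` leaf. A bare `s['outer']` still resolves to a struct and stays rejected. The unflattened `s['outer']['inner']` is rejected only because this toy checker looks for `get_field` applied directly to a column, which is why the flattening step matters here.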
Note: this PR does not address the projection side, and it should not be
blocked by that work. `SELECT s['foo']` still reads the entire struct rather
than only the needed leaf column. Fixing that requires separate changes to how
the opener builds its projection mask.