adriangb commented on code in PR #22026:
URL: https://github.com/apache/datafusion/pull/22026#discussion_r3190962992
##########
datafusion/datasource-parquet/src/opener.rs:
##########
@@ -807,11 +827,24 @@ impl MetadataLoadedParquetOpen {
let needs_rewrite = prepared.predicate.is_some()
|| prepared.logical_file_schema != physical_file_schema;
if needs_rewrite {
+ // When virtual columns are requested, augment the logical and
+ // physical schemas passed to the rewriter/simplifier with those
+ // fields. The rewriter identity-rewrites references found in both
+ // schemas, keeping virtual-column references as `Column` rather
+ // than replacing them with null literals; the simplifier needs
+ // them present so it can resolve their data types while walking
+ // expression trees. We keep `physical_file_schema` itself as the
+ // pure file schema so downstream predicate pushdown, pruning, and
+ // row filter construction stay unaffected.
Review Comment:
What if there is a filter on, e.g., `row_number % 2 = 0 OR column = 123`? Can
it still run as a row filter? If not, there is a mismatch that can lead to
wrong results (the filter gets dropped). IIRC we've had this bug in the past
with struct columns, which were not allowed to run as row filters.
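To make the concern concrete, here is a minimal, self-contained Rust sketch. It does not use DataFusion's real types: `Expr`, `split_conjunction`, `partition_filters` and the small builder helpers are hypothetical stand-ins. It assumes that row-filter eligibility is decided per top-level `AND` conjunct, by checking that every column the conjunct references exists in the pure file schema. Under that assumption, `row_number % 2 = 0 OR column = 123` is one conjunct that references a virtual column, so it cannot become a row filter. What matters is that it lands in a post-scan bucket rather than being silently dropped.

```rust
use std::collections::HashSet;

/// Toy predicate AST (hypothetical; not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Mod(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

// Small builder helpers (hypothetical) to keep the example readable.
fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
fn lit(v: i64) -> Expr { Expr::Literal(v) }
fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }
fn modulo(l: Expr, r: Expr) -> Expr { Expr::Mod(Box::new(l), Box::new(r)) }
fn and(l: Expr, r: Expr) -> Expr { Expr::And(Box::new(l), Box::new(r)) }
fn or(l: Expr, r: Expr) -> Expr { Expr::Or(Box::new(l), Box::new(r)) }

/// Collect every column name referenced anywhere in `expr`.
fn referenced_columns(expr: &Expr, out: &mut HashSet<String>) {
    match expr {
        Expr::Column(name) => {
            out.insert(name.clone());
        }
        Expr::Literal(_) => {}
        Expr::Eq(l, r) | Expr::Mod(l, r) | Expr::And(l, r) | Expr::Or(l, r) => {
            referenced_columns(l, out);
            referenced_columns(r, out);
        }
    }
}

/// Split on top-level ANDs only; an OR stays a single, indivisible conjunct.
fn split_conjunction(expr: &Expr) -> Vec<Expr> {
    match expr {
        Expr::And(l, r) => {
            let mut parts = split_conjunction(l);
            parts.extend(split_conjunction(r));
            parts
        }
        other => vec![other.clone()],
    }
}

/// Partition conjuncts into (row filters, post-scan filters). A conjunct is a
/// row-filter candidate only if all of its columns are in the file schema;
/// everything else must still be evaluated after the scan, never dropped.
fn partition_filters(predicate: &Expr, file_columns: &HashSet<String>) -> (Vec<Expr>, Vec<Expr>) {
    let mut row_filters = Vec::new();
    let mut post_scan = Vec::new();
    for conjunct in split_conjunction(predicate) {
        let mut cols = HashSet::new();
        referenced_columns(&conjunct, &mut cols);
        if cols.is_subset(file_columns) {
            row_filters.push(conjunct);
        } else {
            post_scan.push(conjunct);
        }
    }
    (row_filters, post_scan)
}

fn main() {
    // The file stores `column`; `row_number` is a virtual column.
    let file_columns: HashSet<String> = HashSet::from(["column".to_string()]);

    // row_number % 2 = 0 OR column = 123: cannot be split, so no row filter.
    let mixed_or = or(eq(modulo(col("row_number"), lit(2)), lit(0)), eq(col("column"), lit(123)));
    let (rf, ps) = partition_filters(&mixed_or, &file_columns);
    println!("OR:  row filters = {}, post-scan = {}", rf.len(), ps.len());

    // The AND form splits: `column = 123` can run as a row filter.
    let mixed_and = and(eq(modulo(col("row_number"), lit(2)), lit(0)), eq(col("column"), lit(123)));
    let (rf, ps) = partition_filters(&mixed_and, &file_columns);
    println!("AND: row filters = {}, post-scan = {}", rf.len(), ps.len());
}
```

The design point the sketch isolates: whatever decides row-filter eligibility has to route ineligible conjuncts somewhere that still evaluates them. If the rewriter keeps virtual-column references as `Column` but row-filter construction only knows the pure file schema, an `OR` mixing the two must fall back to post-scan filtering.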