alamb commented on issue #19049: URL: https://github.com/apache/datafusion/issues/19049#issuecomment-3643344588
> It's not clear to me whether or not this is an actual bug. It seems reasonable to expect metadata to be consistent for field names across union branches. However it could also be problematic for queries that either:

In my mind it is a bug because the query is very reasonable. I think there are other types of plans in queries where the metadata from different inputs needs to be combined (and thus might not be the same at the input and output), for example joins and aggregates.

It seems like there is an implicit assumption that "the schema (including metadata) of a plan should remain the same after an optimizer pass". If this is indeed correct, then by your analysis above

> [optimize_projections](https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/optimize_projections/mod.rs) [calls](https://github.com/apache/datafusion/blob/7b4593f36e880ca1c43746d5c4465fff5a3901c3/datafusion/optimizer/src/optimize_projections/mod.rs#L468) [recompute_schema](https://github.com/apache/arrow-datafusion/blob/7b4593f36e880ca1c43746d5c4465fff5a3901c3/datafusion/expr/src/logical_plan/plan.rs#L624-L756) since the plan has changed. recompute_schema sees that the number of fields has changed and [creates a new Union node](https://github.com/influxdata/arrow-datafusion/blob/82cd7f3cdb8dbe0b63b8b62f54543641598655a0/datafusion/expr/src/logical_plan/plan.rs#L718) with [Union::try_new](https://github.com/influxdata/arrow-datafusion/blob/82cd7f3cdb8dbe0b63b8b62f54543641598655a0/datafusion/expr/src/logical_plan/plan.rs#L718)

It seems like we should fix optimize_projections so it maintains the schema (either by attaching metadata to the NULL literal, or perhaps by simply reusing the previous schema).
For example, @adriangb just added code that does something similar (though during execution) in this PR https://github.com/apache/datafusion/pull/19111 : https://github.com/apache/datafusion/blob/bde16083ad344b7a52db5cb298a15d7434ffde51/datafusion/datasource-parquet/src/opener.rs#L529-L545
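
To make the "reuse the previous schema" idea concrete, here is a toy sketch using only the Rust standard library. It is not DataFusion code: `Field`, `derive_from_inputs`, and `project_previous` are hypothetical names, and taking the fields of the first branch is only an assumed stand-in for re-deriving a union schema from its inputs. It illustrates how metadata could be lost if a pruned branch produces a NULL literal without metadata, and how projecting the prior schema would avoid that.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a schema field: a name plus key/value metadata.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    metadata: BTreeMap<String, String>,
}

fn field(name: &str, meta: &[(&str, &str)]) -> Field {
    Field {
        name: name.to_string(),
        metadata: meta.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}

// Assumed toy rule: re-derive the union schema from the first branch only.
fn derive_from_inputs(branches: &[Vec<Field>]) -> Vec<Field> {
    branches.first().cloned().unwrap_or_default()
}

// Alternative: keep the previous output schema and drop only the pruned columns.
fn project_previous(previous: &[Field], keep: &[usize]) -> Vec<Field> {
    keep.iter().filter_map(|&i| previous.get(i).cloned()).collect()
}

// Returns (re-derived schema matches, reused schema matches).
fn demo() -> (bool, bool) {
    // Schema of the union before the (hypothetical) optimizer pass.
    let previous = vec![
        field("a", &[("unit", "ms")]),
        field("b", &[]),
        field("c", &[]),
    ];
    // What the plan is expected to expose once column "c" is pruned.
    let expected = vec![field("a", &[("unit", "ms")]), field("b", &[])];

    // After pruning, suppose the first branch yields a NULL literal for "a"
    // that carries no metadata, while the second branch still has it.
    let rewritten = vec![
        vec![field("a", &[]), field("b", &[])],
        vec![field("a", &[("unit", "ms")]), field("b", &[])],
    ];

    let rederived = derive_from_inputs(&rewritten);
    let reused = project_previous(&previous, &[0, 1]);
    (rederived == expected, reused == expected)
}

fn main() {
    let (rederived_ok, reused_ok) = demo();
    println!("re-derived schema preserved metadata: {rederived_ok}");
    println!("reused schema preserved metadata: {reused_ok}");
}
```

In this toy model only the reused schema keeps the `unit` metadata on `a`. Attaching the metadata to the NULL literal would be the other way to get the same result.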
