erratic-pattern commented on issue #19049:
URL: https://github.com/apache/datafusion/issues/19049#issuecomment-3643125781

   It's not clear to me whether this is an actual bug. It seems reasonable 
to expect metadata to be consistent for a given field name across union 
branches. However, that expectation could be problematic for queries that either:
    a. Shadow existing column names with new, unrelated columns. Variable 
shadowing is very normal in Rust, but I'm not sure whether it is considered 
reasonable in SQL queries.
   b. Introduce constant literals for fields on one side of a union, as this 
reproducer does.
   
   Perhaps we need to adjust 
[intersect_metadata_for_union](https://github.com/apache/datafusion/blob/7b4593f36e880ca1c43746d5c4465fff5a3901c3/datafusion/expr/src/expr.rs#L506-L520)
 to either:
   a. Avoid intersecting a branch that contains *empty metadata*, and instead 
intersect only the branches that contain non-empty metadata. This avoids 
destructive loss of metadata when one union branch has no metadata at all.
   b. Avoid intersecting a branch that contains *empty metadata* on a field 
that is a *constant literal*. This is a more restrictive version of option a 
that might result in fewer unintended consequences.
   c. *Union* the metadata instead of *intersecting* it. This ensures that no 
metadata is lost, but I am not sure what consequences this might have, since 
it could populate metadata in the output schema where it was not intended 
to be.
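
   To make the trade-offs concrete, here is a rough, standalone sketch of the 
three options next to a plain intersection. It is not the DataFusion code: the 
`Metadata` alias, all function names, and the `is_literal` flag are 
hypothetical, and the "first value wins" rule for conflicting keys in the 
union variant is my own assumption.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a field's metadata map.
type Metadata = HashMap<String, String>;

/// Baseline: keep only key/value pairs that appear in every branch.
fn intersect_all(branches: &[Metadata]) -> Metadata {
    let Some((first, rest)) = branches.split_first() else {
        return Metadata::new();
    };
    let mut out = first.clone();
    for m in rest {
        // Drop any entry the next branch lacks or disagrees on.
        out.retain(|k, v| m.get(k) == Some(&*v));
    }
    out
}

/// Option a: ignore branches whose metadata is empty, intersect the rest.
fn intersect_non_empty(branches: &[Metadata]) -> Metadata {
    let non_empty: Vec<Metadata> = branches
        .iter()
        .filter(|m| !m.is_empty())
        .cloned()
        .collect();
    intersect_all(&non_empty)
}

/// Option b: ignore a branch only if it is empty AND comes from a literal.
/// The `bool` is an assumed "this branch expression is a constant literal" flag.
fn intersect_skip_empty_literals(branches: &[(Metadata, bool)]) -> Metadata {
    let kept: Vec<Metadata> = branches
        .iter()
        .filter(|(m, is_literal)| !(*is_literal && m.is_empty()))
        .map(|(m, _)| m.clone())
        .collect();
    intersect_all(&kept)
}

/// Option c: union all branches; on a key conflict the first value wins
/// (an arbitrary choice made for this sketch).
fn union_all(branches: &[Metadata]) -> Metadata {
    let mut out = Metadata::new();
    for m in branches {
        for (k, v) in m {
            out.entry(k.clone()).or_insert_with(|| v.clone());
        }
    }
    out
}
```

   With one branch carrying `{"unit": "ms"}` and the other carrying empty 
metadata, the baseline returns an empty map, while options a and c both return 
`{"unit": "ms"}`. Option b returns it only when the empty branch is flagged as 
a literal.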
   
   I am curious if anyone has opinions about any of these approaches, or if 
there is another way to look at this.

