kosiew commented on PR #17085: URL: https://github.com/apache/datafusion/pull/17085#issuecomment-3356523033
> Is this a band-aid fix? Is there a root cause we should be looking for instead?

> There's a heavy emphasis on the word "synthesize" throughout this PR but I don't know what it means to "synthesize" a schema from literal expressions 🤔

`AggregateExprBuilder` already captures a `FieldRef` for every argument (including literals) by calling each physical expression's `return_field` during construction, so we retain the full Arrow metadata for those inputs in `input_fields`.

The new `args_schema` helper detects when the physical input schema is empty. That legitimately happens when an aggregate is invoked with literals only, because the child plan has no columns. In that case the helper reconstitutes a `Schema` from the stored `input_fields` so the accumulator can still see that metadata. We then hand that schema to every `AccumulatorArgs` we build, so UDAFs observe the same field information whether their inputs were columns or literals.

In other words, "synthesize" means "wrap the already-computed argument fields in a temporary `Schema` when the physical schema is empty"; there isn't another layer hiding the real root cause.
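To make the fallback described in this comment concrete, here is a minimal sketch of the idea in Rust. It is not the code from the PR: `Field`, `Schema`, and `literal_field` are simplified, hypothetical stand-ins for the Arrow types so the example compiles with the standard library alone, and the signature of `args_schema` is assumed for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-in for an Arrow field (NOT the real arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

pub type FieldRef = Arc<Field>;

// Simplified stand-in for an Arrow schema: just an ordered list of fields.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Schema {
    pub fields: Vec<FieldRef>,
}

/// Hypothetical version of the helper described above.
/// If the physical input schema has no columns (literal-only aggregate),
/// wrap the already-computed argument fields in a temporary schema;
/// otherwise return the physical schema unchanged.
pub fn args_schema(physical: &Schema, input_fields: &[FieldRef]) -> Schema {
    if physical.fields.is_empty() {
        Schema {
            fields: input_fields.to_vec(),
        }
    } else {
        physical.clone()
    }
}

/// Illustrative constructor (an assumption, not a DataFusion API) that
/// builds a field carrying metadata, as a literal argument's field might.
pub fn literal_field(name: &str, data_type: &str, metadata: &[(&str, &str)]) -> FieldRef {
    Arc::new(Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
        nullable: false,
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    })
}
```

With an empty physical schema and a single literal argument field that carries metadata, `args_schema` returns a one-field schema with that metadata intact. With a non-empty physical schema it returns the physical schema as-is, so column-based aggregates are unaffected in this sketch.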
