Dandandan commented on issue #18411:
URL: https://github.com/apache/datafusion/issues/18411#issuecomment-3665233281

   You are totally right of course - we shouldn't make optimizations that are
only useful for some TPC-H query and not in the wild.
   Doing the optimization in general for short strings is super useful (and
would work for all short string / byte views).
   
   > Is there a mechanism in DataFusion already to carry the necessary meta 
information from the table all the way to the aggregation? The Arrow physical 
types alone aren't sufficiently rich to model that.
   
   I do think this might be useful as well, though: there is information in
the table schema (e.g. certain fields are `char(1)`) that could help make the
Parquet read or aggregations on those fields faster and consume less memory
than processing them as variable-width utf8 fields.
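
   To illustrate the general short-string idea, here is a rough sketch in
plain Rust. It is not DataFusion or Arrow code: `pack_short`, `count_groups`
and the 7-byte cutoff are hypothetical choices made only for this example.
Values that fit are packed into a fixed-width `u64` group key, and anything
longer falls back to an ordinary string key.

```rust
use std::collections::HashMap;

/// Hypothetical helper: pack a string of at most 7 bytes into a `u64`.
/// The bytes go into the low 56 bits and the length into the top byte,
/// so "a" and "a\0" still produce different keys.
fn pack_short(s: &str) -> Option<u64> {
    let bytes = s.as_bytes();
    if bytes.len() > 7 {
        return None; // too long for the fixed-width path
    }
    let mut buf = [0u8; 8];
    buf[..bytes.len()].copy_from_slice(bytes);
    buf[7] = bytes.len() as u8;
    Some(u64::from_le_bytes(buf))
}

/// Hypothetical COUNT(*) ... GROUP BY over string values.
/// Short values are grouped by their packed `u64` key; longer values
/// use a regular `String` key as the fallback path.
fn count_groups(values: &[&str]) -> (HashMap<u64, usize>, HashMap<String, usize>) {
    let mut short = HashMap::new();
    let mut long = HashMap::new();
    for &v in values {
        match pack_short(v) {
            Some(key) => *short.entry(key).or_insert(0) += 1,
            None => *long.entry(v.to_string()).or_insert(0) += 1,
        }
    }
    (short, long)
}
```

   The point of the sketch is only that the fixed-width path does not depend
on schema information: it is chosen per value based on length, so it applies
to any short string, while a `char(1)` declaration would additionally let the
fallback path be skipped entirely.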


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

