asolimando commented on issue #21120: URL: https://github.com/apache/datafusion/issues/21120#issuecomment-4138342060
> > This links to paleolimbot's interest in stats propagation for a specific type of stats
>
> I am new to this part of DataFusion but I took a look at the draft PR...we probably would be able to use this (but would need `Statistics` to be able to represent something pluggable, and would need the analyzer trait to be able to calculate something fancier than min/max and NDV). No pressure to cater to that here, but my personal motivating example is to ask for GeoStatistics (we have our own definition) for a specific column output of a `dyn ExecutionPlan` when planning a join.

Thanks for your feedback @paleolimbot.

This issue covers the expression level only, as I tried to keep the scope to a minimum to avoid overloading reviewers. But I am also working on a POC for a similar chain-of-responsibility registry at the operator level, to be able to override the statistics propagation mechanism for individual operators beyond what's implemented in `partition_statistics` today.

The idea is to get the same freedom for statistics that we have today for connectors/data sources and physical rules. Being able to override operators' statistics propagation behavior would also allow supporting custom statistics, which matches your use case exactly (provided there is a mechanism for storing them too, of course).

Concretely, I had in mind [DataSketches](https://datasketches.apache.org/) as custom stats, similar to what I implemented in [HIVE-26221](https://issues.apache.org/jira/browse/HIVE-26221) to add support for histogram-like statistics for range filters based on [KLL sketches](https://datasketches.apache.org/docs/KLL/KLLSketch.html), but the mechanism could just as well be used for any kind of "custom" stats.

I will hopefully have a draft PR/POC ready by the time I open this new issue, but I'd like to keep the two discussions separate for now so as not to broaden the scope too much, as you have mentioned.
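For readers unfamiliar with the pattern, here is a minimal sketch of what a chain-of-responsibility registry for operator-level statistics could look like. It is not the POC mentioned above and does not use DataFusion's API: `Stats`, `Operator`, `StatsProvider`, `StatsRegistry`, `FilterOverride` and `default_stats` are all hypothetical names invented for illustration, and storage of custom statistics is left out entirely.

```rust
/// Hypothetical, simplified statistics; NOT DataFusion's `Statistics` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Stats {
    /// Estimated row count, or `None` when unknown.
    pub num_rows: Option<usize>,
}

/// Hypothetical stand-in for a physical operator.
pub enum Operator {
    Filter { selectivity: f64 },
    Limit { fetch: usize },
    Other,
}

/// One link in the chain: `Some` means "handled", `None` means "ask the next link".
pub trait StatsProvider {
    fn compute(&self, op: &Operator, input: &Stats) -> Option<Stats>;
}

/// Built-in behavior used when no registered provider handles the operator.
pub fn default_stats(op: &Operator, input: &Stats) -> Stats {
    match op {
        // A limit caps a known row count at `fetch`.
        Operator::Limit { fetch } => Stats {
            num_rows: input.num_rows.map(|n| n.min(*fetch)),
        },
        // Everything else is treated as unknown in this sketch.
        _ => Stats { num_rows: None },
    }
}

/// Registry that consults providers in registration order, then falls back.
#[derive(Default)]
pub struct StatsRegistry {
    providers: Vec<Box<dyn StatsProvider>>,
}

impl StatsRegistry {
    pub fn register(&mut self, provider: Box<dyn StatsProvider>) {
        self.providers.push(provider);
    }

    pub fn compute(&self, op: &Operator, input: &Stats) -> Stats {
        self.providers
            .iter()
            // The first provider returning `Some` wins.
            .find_map(|p| p.compute(op, input))
            .unwrap_or_else(|| default_stats(op, input))
    }
}

/// Example user-supplied override: scales the row count by the filter selectivity.
pub struct FilterOverride;

impl StatsProvider for FilterOverride {
    fn compute(&self, op: &Operator, input: &Stats) -> Option<Stats> {
        match op {
            Operator::Filter { selectivity } => Some(Stats {
                num_rows: input
                    .num_rows
                    .map(|n| (n as f64 * *selectivity).round() as usize),
            }),
            // Not a filter: delegate to the next provider or the default.
            _ => None,
        }
    }
}
```

In this sketch, registering `FilterOverride` changes how filters are estimated while limits and all other operators still go through `default_stats`. A provider for custom statistics would plug in the same way, assuming `Stats` were extended with somewhere to store them.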
