Hello community,

With the introduction of the multi-modal index in Hudi, there is a lot of scope for improvement on the query side. There are two major ways to reduce the amount of data scanned at query time: partition pruning and file pruning. With the latest developments in the community, partition pruning is supported for commonly used query engines such as Spark, Presto, and Hive, but file pruning using the column stats index is only supported for Spark and Flink.
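To make the idea of file pruning concrete, here is a minimal sketch of how per-file min/max statistics can be used to skip files for a range predicate. This is only an illustration and not Hudi's actual API: the `ColumnStats` class, the `prune_files` function, and the file names are all hypothetical, and a single integer column is assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-file statistics for one integer column."""
    file_name: str
    min_value: int
    max_value: int


def prune_files(stats: List[ColumnStats], lower: int, upper: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps the query range [lower, upper].

    A file can be skipped safely when its range lies entirely outside
    the predicate, because no row in it can match.
    """
    return [
        s.file_name
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Hypothetical statistics, as a column stats index might record them.
stats = [
    ColumnStats("file_a.parquet", 0, 99),
    ColumnStats("file_b.parquet", 100, 199),
    ColumnStats("file_c.parquet", 200, 299),
]

# Files that must still be read for: WHERE col BETWEEN 120 AND 150
candidates = prune_files(stats, 120, 150)
```

The check is conservative: overlapping ranges only mean a file may contain matching rows, so the engine still has to read and filter the files that remain.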
We intend to support data skipping for the remaining engines as well, namely Hive, Presto, and Trino. I have written a draft RFC here: https://github.com/apache/hudi/pull/6345. Please take a look and let me know what you think. Once we have some feedback from the community, we can decide on the next steps.