[DISCUSS]: Integrate column stats index with all query engines

Pratyaksh Sharma Wed, 10 Aug 2022 09:26:11 -0700

Hello community,

With the introduction of the multi-modal index in Hudi, there is a lot of
scope for improvement on the querying side. There are two major ways of
reducing the amount of data scanned at query time: partition pruning and
file pruning. With the latest developments in the community, partition
pruning is supported for commonly used query engines such as Spark, Presto
and Hive, but file pruning using the column stats index is supported only
for Spark and Flink.
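
To make the idea of file pruning concrete, here is a minimal sketch of
min/max based data skipping. It is only an illustration and not Hudi's
actual implementation: the ColumnStats class, the prune_files function,
the file names and the value ranges are all hypothetical, and I am
assuming a single integer column with an inclusive range predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-file statistics for one column (not Hudi's schema)."""
    file_name: str
    min_value: int
    max_value: int


def prune_files(stats, lower, upper):
    """Return the files whose [min, max] range overlaps [lower, upper].

    Files outside the range cannot contain matching rows, so the query
    engine can skip them without opening them.
    """
    return [
        s.file_name
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Made-up statistics for three files of one partition.
stats = [
    ColumnStats("file_a.parquet", 0, 99),
    ColumnStats("file_b.parquet", 100, 199),
    ColumnStats("file_c.parquet", 200, 299),
]

# Predicate: value BETWEEN 150 AND 220
candidates = prune_files(stats, 150, 220)
print(candidates)  # ['file_b.parquet', 'file_c.parquet']
```

In this sketch file_a.parquet is skipped because its maximum value is
below the lower bound of the predicate. Partition pruning works in a
similar way, but on partition values instead of per-file statistics.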


We intend to support data skipping for the remaining engines as well,
namely Hive, Presto and Trino. I have written a draft RFC here:
https://github.com/apache/hudi/pull/6345.

Please take a look and let me know what you think. Once we have some
feedback from the community, we can decide on the next steps.
