Hi all,

We have a feature proposal for extending the MOR table read performance improvement to any payload class.
Background

In HUDI-3217, Alexey added column pruning support to HoodieMergeOnReadRDD (PR #4888), which is a really nice feature. It can speed up MOR _rt table queries significantly. However, this performance improvement is limited by whitelistedPayloadClasses, so column pruning is only supported for OverwriteWithLatestAvroPayload. If we implement any other payload class, it can't use this feature.

Proposal

After studying Alexey's implementation in HoodieMergeOnReadRDD, we added 2 new methods to the HoodieRecordPayload interface. They tell HoodieMergeOnReadRDD whether column pruning can be applied to a payload class, and whether the payload needs any extra columns to do the merge. We have implemented this feature on the Spark side, and have also started the dev work to support it in Trino.

Code reference: https://gist.github.com/TengHuo/48068bf1810ed771b388862271e53266

Related issues

[HUDI-3217] RFC-46: Optimize Record Payload handling
https://issues.apache.org/jira/browse/HUDI-3217

[HUDI-5158] Add column pruning support to any payload
https://issues.apache.org/jira/browse/HUDI-5158

We plan to share this feature with the Hudi community once it is completed. Are there any suggestions about this feature? We would really appreciate any feedback.
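
For readers who have not opened the gist, the idea of the two new methods can be sketched roughly as below. This is only a minimal illustration and not the code in the gist or the real Hudi API: the names PrunablePayload, canBeColumnPruned, extraColumnsForMerge, EventTimePayload, OpaquePayload and ColumnPruning.requiredColumns are all hypothetical, and the sketch assumes the reader simply unions the query columns with whatever extra columns the payload asks for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the two hooks proposed for HoodieRecordPayload.
// The names are illustrative only; they are not the real Hudi API.
interface PrunablePayload {
    // True if the merge still works when only a subset of columns is read.
    default boolean canBeColumnPruned() { return false; }

    // Columns the merge logic needs even if the query does not select them.
    default List<String> extraColumnsForMerge() { return Collections.emptyList(); }
}

// Example payload that opts in and merges on an (assumed) event-time column.
final class EventTimePayload implements PrunablePayload {
    @Override public boolean canBeColumnPruned() { return true; }
    @Override public List<String> extraColumnsForMerge() {
        return Collections.singletonList("event_ts");
    }
}

// Example payload that keeps the defaults, so the full schema is always read.
final class OpaquePayload implements PrunablePayload { }

final class ColumnPruning {
    // Decide which columns the reader should fetch for a merge-on-read query.
    static List<String> requiredColumns(PrunablePayload payload,
                                        List<String> tableColumns,
                                        List<String> queryColumns) {
        if (!payload.canBeColumnPruned()) {
            return new ArrayList<>(tableColumns); // fall back to the full schema
        }
        // Keep query order, then append merge-only columns without duplicates.
        Set<String> required = new LinkedHashSet<>(queryColumns);
        required.addAll(payload.extraColumnsForMerge());
        return new ArrayList<>(required);
    }
}
```

With a table of (id, name, event_ts, payload_blob) and a query on (id, name), the opt-in payload would cause only id, name and event_ts to be read, while the payload that keeps the defaults would still read all four columns.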
