Hi Teng,
This is definitely a good performance enhancement for payloads that can
leverage it. We do have the RFC-46 initiative under way. It may be worth
aligning with that, since we are heading in that direction for record mergers.
Eagerly looking forward to it.
On Thu, 3 Nov 2022 at 20:43, Teng Huo <[email protected]> wrote:
> Hi all,
>
> We have a feature proposal for improving MOR table read performance for
> any payload.
>
> Background
>
> In HUDI-3217, Alexey added column pruning support in HoodieMergeOnReadRDD,
> which is a really nice feature. It can speed up MOR _rt table queries
> significantly.
>
> However, this performance improvement is gated by
> whitelistedPayloadClasses, so column pruning is only supported for
> OverwriteWithLatestAvroPayload. If we implement any other payload class,
> it can't utilise this feature.
>
>
> Proposal
>
> After studying the HoodieMergeOnReadRDD implementation by Alexey, we added
> 2 new methods to the HoodieRecordPayload interface. They tell
> HoodieMergeOnReadRDD whether column pruning can be applied to a payload
> class, and whether it needs any extra columns for merging.
> We have implemented this feature on the Spark side, and have also started
> the dev work to support it in Trino.
>
> Code reference:
> https://gist.github.com/TengHuo/48068bf1810ed771b388862271e53266
>
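
To make the proposal above concrete, here is a minimal, self-contained sketch
of how a per-payload opt-in could replace a class whitelist. It is not the
code from the gist and not Hudi's API: the interface PrunablePayload, its two
method names, the example payload classes, and the helper requiredColumns are
all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the two methods proposed for HoodieRecordPayload.
// The names are illustrative only; see the gist for the actual proposal.
interface PrunablePayload {
    // Whether the reader may fetch only a subset of columns for this payload.
    default boolean supportsColumnPruning() {
        return false;
    }

    // Columns the payload needs at merge time beyond what the query selects.
    default List<String> extraColumnsForMerge() {
        return List.of();
    }
}

// Example payload (hypothetical): merges by an ordering field "ts",
// so "ts" must be read even when the query does not select it.
class LatestByTimestampPayload implements PrunablePayload {
    @Override
    public boolean supportsColumnPruning() {
        return true;
    }

    @Override
    public List<String> extraColumnsForMerge() {
        return List.of("ts");
    }
}

// Example payload (hypothetical) that does not opt in and keeps the defaults.
class OpaquePayload implements PrunablePayload {
}

public class ColumnPruningSketch {
    // Returns the columns a reader would fetch: the full table schema when the
    // payload does not opt in, otherwise query columns plus merge columns.
    static List<String> requiredColumns(PrunablePayload payload,
                                        List<String> queryColumns,
                                        List<String> tableColumns) {
        if (!payload.supportsColumnPruning()) {
            return new ArrayList<>(tableColumns);
        }
        Set<String> required = new LinkedHashSet<>(queryColumns);
        required.addAll(payload.extraColumnsForMerge());
        return new ArrayList<>(required);
    }

    public static void main(String[] args) {
        List<String> table = List.of("id", "name", "price", "ts");
        List<String> query = List.of("id", "price");
        // prints [id, price, ts]
        System.out.println(requiredColumns(new LatestByTimestampPayload(), query, table));
        // prints [id, name, price, ts]
        System.out.println(requiredColumns(new OpaquePayload(), query, table));
    }
}
```

Defaulting both methods to "no pruning, no extra columns" in this sketch means
existing payload classes would keep reading the full schema until they opt in.
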
>
> Related issues
>
> https://issues.apache.org/jira/browse/HUDI-3217
> [HUDI-3217] RFC-46: Optimize Record Payload handling
> https://issues.apache.org/jira/browse/HUDI-5158
> [HUDI-5158] Add column pruning support to any payload (references PR #4888)
>
>
> We plan to share this feature with the Hudi community once it is completed.
> Are there any suggestions about this feature? We would really appreciate
> any feedback.
--
Regards,
-Sivabalan