Thanks, Lin. It's a great proposal, and this is a very critical code path
to standardize!

On Sun, Jun 29, 2025 at 7:09 PM Lin <linliu.c...@gmail.com> wrote:

> Dear Hudi Community,
>
> I'd like to kick off a discussion around an RFC to *deprecate
> HoodieRecordPayload classes* in Hudi and move toward a *unified,
> config-driven merge semantics* architecture.
>
> 🔍 Motivation
>
> Currently, Hudi supports multiple mechanisms for merging records during
> read:
>
>    - Through the legacy HoodieRecordPayload API
>    - Via merger classes that extend HoodieRecordMerger
>    - Through the merge mode table config (COMMIT_TIME_ORDERING,
>      EVENT_TIME_ORDERING)
>
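For readers less familiar with the two merge modes named above, here is a minimal sketch of how they differ. It is an illustration only, not Hudi code: `Record`, `merge`, and the `event_time` field are hypothetical names, and the tie-breaking rule (the later write wins when ordering values are equal) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for a table row; not a Hudi class."""
    key: str
    value: str
    event_time: int  # hypothetical ordering field


def merge(older: Record, newer: Record, mode: str) -> Record:
    """Pick the surviving record for one key.

    `older` was committed before `newer`. The mode names come from the
    proposal; the logic is a simplified sketch of their usual meaning.
    """
    if mode == "COMMIT_TIME_ORDERING":
        # The most recently committed record always wins.
        return newer
    if mode == "EVENT_TIME_ORDERING":
        # The record with the larger ordering value wins.
        # Assumption: on a tie, the later commit wins.
        return newer if newer.event_time >= older.event_time else older
    raise ValueError(f"unsupported merge mode: {mode}")


stored = Record("k1", "v1", event_time=10)
late_arrival = Record("k1", "v2", event_time=5)

by_commit = merge(stored, late_arrival, "COMMIT_TIME_ORDERING")
by_event = merge(stored, late_arrival, "EVENT_TIME_ORDERING")
```

With a late-arriving update like the one above, commit-time ordering keeps the late record, while event-time ordering keeps the stored record because its ordering value is larger.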
> This fragmented model creates complexity in both read and write paths,
> increases the burden on developers contributing new features, and makes it
> harder for users to reason about table behavior. Our goal is to *simplify
> the codebase, reduce cognitive overhead*, and *unify the merge logic* under
> a well-defined set of standard semantics.
>
> 🧩 General Proposal
>
> This RFC proposes the following:
>
>    1. *Deprecate usage of HoodieRecordPayload* over time, transitioning
>       existing tables and users to rely on merge modes.
>    2. Introduce a new table config hoodie.write.partial.update.mode to
>       handle common partial update behaviors like:
>       - KEEP_VALUES (default)
>       - FILL_DEFAULTS
>       - IGNORE_DEFAULTS
>       - IGNORE_MARKERS
>    3. Provide a *migration guide and automated upgrade paths* for existing
>       built-in payload classes such as:
>       - OverwriteWithLatestAvroPayload
>       - DefaultHoodieRecordPayload
>       - EventTimeAvroPayload
>       - PartialUpdateAvroPayload
>       - etc.
>    4. Continue supporting custom user-defined payloads for backward
>       compatibility, but discourage their use in favor of merger-based
>       logic.
>
> This unification will make merge behavior *explicit and consistent*, help
> improve long-term maintenance, and ease integration with newer components
> like Flink and HoodieStreamer.
>
> 📋 Call for Feedback
>
> We’ve compiled a detailed mapping of each payload class, its intended
> migration path, partial update mode mapping, and compatibility notes. You
> can find the draft in PR: https://github.com/apache/hudi/pull/13499
>
> We’d love your thoughts on:
>
>    - Whether the community agrees with deprecating HoodieRecordPayload
>    - Any concerns from users currently relying on custom payload classes
>    - Suggestions on better supporting partial update semantics
>    - Thoughts on phased rollout and reader/writer compatibility
>
> If there’s agreement, we’ll convert this into a formal RFC on the wiki and
> start implementing it behind a feature flag to ensure smooth upgrades.
>
> Looking forward to hearing your feedback!
>
> Best regards,
> Lin
>


-- 
Best,
Shiyan
