Thanks Lin, it's a great proposal - a very critical code path to be standardized!

On Sun, Jun 29, 2025 at 7:09 PM Lin <linliu.c...@gmail.com> wrote:

> Dear Hudi Community,
>
> I'd like to kick off a discussion around an RFC to *deprecate
> HoodieRecordPayload classes* in Hudi and move toward a *unified,
> config-driven merge semantics* architecture.
>
> 🔍 Motivation
>
> Currently, Hudi supports multiple mechanisms for merging records during
> read:
>
>    - Through the legacy HoodieRecordPayload API
>    - Via merger classes that extend HoodieRecordMerger
>    - Through the merge mode table config (COMMIT_TIME_ORDERING,
>      EVENT_TIME_ORDERING)
>
> This fragmented model creates complexity in both read and write paths,
> increases the burden on developers contributing new features, and makes it
> harder for users to reason about table behavior. Our goal is to *simplify
> the codebase, reduce cognitive overhead*, and *unify the merge logic* under
> a well-defined set of standard semantics.
>
> 🧩 General Proposal
>
> This RFC proposes the following:
>
>    1. *Deprecate usage of HoodieRecordPayload* over time, transitioning
>       existing tables and users to rely on merge modes.
>    2. Introduce a new table config hoodie.write.partial.update.mode to
>       handle common partial update behaviors like:
>       - KEEP_VALUES (default)
>       - FILL_DEFAULTS
>       - IGNORE_DEFAULTS
>       - IGNORE_MARKERS
>    3. Provide a *migration guide and automated upgrade paths* for existing
>       built-in payload classes such as:
>       - OverwriteWithLatestAvroPayload
>       - DefaultHoodieRecordPayload
>       - EventTimeAvroPayload
>       - PartialUpdateAvroPayload
>       - etc.
>    4. Continue supporting custom user-defined payloads for backward
>       compatibility, but discourage their use in favor of merger-based
>       logic.
>
> This unification will make merge behavior *explicit and consistent*, help
> improve long-term maintenance, and ease integration with newer components
> like Flink and HoodieStreamer.
>
> 📋 Call for Feedback
>
> We’ve compiled a detailed mapping of each payload class, its intended
> migration path, partial update mode mapping, and compatibility notes. You
> can find the draft in PR: https://github.com/apache/hudi/pull/13499
>
> We’d love your thoughts on:
>
>    - Whether the community agrees with deprecating HoodieRecordPayload
>    - Any concerns from users currently relying on custom payload classes
>    - Suggestions on better supporting partial update semantics
>    - Thoughts on phased rollout and reader/writer compatibility
>
> If there’s agreement, we’ll convert this into a formal RFC on the wiki and
> start implementing it behind a feature flag to ensure smooth upgrades.
>
> Looking forward to hearing your feedback!
>
> Best regards,
> Lin
>

--
Best,
Shiyan