Dear Hudi Community,

I'd like to kick off a discussion around an RFC to *deprecate
HoodieRecordPayload classes* in Hudi and move toward a *unified,
config-driven merge semantics* architecture.

🔍 Motivation

Currently, Hudi supports multiple mechanisms for merging records during
read:

   - Through the legacy HoodieRecordPayload API
   - Via merger classes that implement HoodieRecordMerger
   - Through the merge mode table config (COMMIT_TIME_ORDERING,
     EVENT_TIME_ORDERING)
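
For readers less familiar with the two merge modes, here is a minimal
sketch of how they might differ when two versions of the same key meet.
It is an illustration in Python, not Hudi code: the Record class, its
commit_seq and ordering_value fields, and the merge function are all
hypothetical names, and the tie-breaking rule (the later commit wins when
ordering values are equal) is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Record:
    """Hypothetical record model used only for this illustration."""
    key: str
    commit_seq: int       # stand-in for commit order (larger = written later)
    ordering_value: int   # stand-in for the event-time / ordering field
    data: Dict[str, Any]


def merge(older: Record, newer: Record, mode: str) -> Record:
    """Return the surviving version of one key.

    `older` and `newer` are ordered by commit: `newer` was written later.
    """
    if mode == "COMMIT_TIME_ORDERING":
        # The most recently written version always wins.
        return newer
    if mode == "EVENT_TIME_ORDERING":
        # The version with the larger ordering value wins.
        # Assumption: on a tie, the later commit wins.
        if newer.ordering_value >= older.ordering_value:
            return newer
        return older
    raise ValueError(f"unknown merge mode: {mode}")


# A late-arriving event: committed second, but with an older event time.
first = Record("k1", commit_seq=1, ordering_value=100, data={"v": "a"})
late = Record("k1", commit_seq=2, ordering_value=90, data={"v": "b"})

by_commit = merge(first, late, "COMMIT_TIME_ORDERING")
by_event = merge(first, late, "EVENT_TIME_ORDERING")
```

In this sketch, the late-arriving record overwrites the stored one under
COMMIT_TIME_ORDERING but is discarded under EVENT_TIME_ORDERING.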

This fragmented model creates complexity in both read and write paths,
increases the burden on developers contributing new features, and makes it
harder for users to reason about table behavior. Our goal is to *simplify
the codebase, reduce cognitive overhead*, and *unify the merge logic* under
a well-defined set of standard semantics.

🧩 General Proposal

This RFC proposes the following:

   1. *Deprecate usage of HoodieRecordPayload* over time, transitioning
      existing tables and users to rely on merge modes.
   2. Introduce a new table config hoodie.write.partial.update.mode to handle
      common partial update behaviors like:
      - KEEP_VALUES (default)
      - FILL_DEFAULTS
      - IGNORE_DEFAULTS
      - IGNORE_MARKERS
   3. Provide a *migration guide and automated upgrade paths* for existing
      built-in payload classes such as:
      - OverwriteWithLatestAvroPayload
      - DefaultHoodieRecordPayload
      - EventTimeAvroPayload
      - PartialUpdateAvroPayload
      - etc.
   4. Continue supporting custom user-defined payloads for backward
      compatibility, but discourage their use in favor of merger-based logic.
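
To make the partial update modes in item 2 more concrete, here is a rough
Python sketch of one plausible reading of each mode. The semantics shown
are my assumptions inferred from the mode names, not the definitions in
the draft, which remains the authoritative source. The schema defaults,
the marker string, and the partial_merge function are hypothetical and
exist only for this illustration.

```python
from typing import Any, Dict

# Hypothetical schema: field name -> default value.
SCHEMA_DEFAULTS: Dict[str, Any] = {"id": None, "name": "", "city": "unknown"}
# Hypothetical marker meaning "value not available in this update".
UNAVAILABLE_MARKER = "__unavailable__"


def partial_merge(old: Dict[str, Any], new: Dict[str, Any], mode: str) -> Dict[str, Any]:
    """Merge an incoming partial record `new` into the stored record `old`."""
    merged: Dict[str, Any] = {}
    for field, default in SCHEMA_DEFAULTS.items():
        stored = old.get(field, default)
        if mode == "KEEP_VALUES":
            # Assumption: a field missing from the update keeps its stored value.
            merged[field] = new[field] if field in new else stored
        elif mode == "FILL_DEFAULTS":
            # Assumption: a field missing from the update takes the schema default.
            merged[field] = new.get(field, default)
        elif mode == "IGNORE_DEFAULTS":
            # Assumption: an incoming value equal to the default is not applied.
            value = new.get(field, default)
            merged[field] = stored if value == default else value
        elif mode == "IGNORE_MARKERS":
            # Assumption: an incoming value equal to the marker is not applied.
            value = new.get(field, UNAVAILABLE_MARKER)
            merged[field] = stored if value == UNAVAILABLE_MARKER else value
        else:
            raise ValueError(f"unknown partial update mode: {mode}")
    return merged


stored_row = {"id": 1, "name": "Ann", "city": "Oslo"}

kept = partial_merge(stored_row, {"id": 1, "name": "Anna"}, "KEEP_VALUES")
filled = partial_merge(stored_row, {"id": 1, "name": "Anna"}, "FILL_DEFAULTS")
no_defaults = partial_merge(
    stored_row, {"id": 1, "name": "", "city": "Rome"}, "IGNORE_DEFAULTS")
no_markers = partial_merge(
    stored_row, {"id": 1, "name": UNAVAILABLE_MARKER, "city": "Rome"}, "IGNORE_MARKERS")
```

Under these assumptions, an update that omits city leaves "Oslo" in place
with KEEP_VALUES and resets it to the default with FILL_DEFAULTS, while the
two IGNORE modes skip incoming values that equal the default or the marker.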

This unification will make merge behavior *explicit and consistent*, help
improve long-term maintenance, and ease integration with newer components
like Flink and HoodieStreamer.

📋 Call for Feedback

We've compiled a detailed mapping of each payload class, its intended
migration path, partial update mode mapping, and compatibility notes. You
can find the draft in PR: https://github.com/apache/hudi/pull/13499

We'd love your thoughts on:

   - Whether the community agrees with deprecating HoodieRecordPayload
   - Any concerns from users currently relying on custom payload classes
   - Suggestions on better supporting partial update semantics
   - Thoughts on phased rollout and reader/writer compatibility

If there's agreement, we'll convert this into a formal RFC on the wiki and
start implementing it behind a feature flag to ensure smooth upgrades.

Looking forward to hearing your feedback!

Best regards,
Lin
