Dear Hudi Community,

I'd like to kick off a discussion around an RFC to *deprecate HoodieRecordPayload classes* in Hudi and move toward a *unified, config-driven merge semantics* architecture.

🔍 Motivation
Currently, Hudi supports multiple mechanisms for merging records during reads:
- Through the legacy HoodieRecordPayload API
- Through merger classes that implement HoodieRecordMerger
- Through the merge mode table config (COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING)

This fragmented model adds complexity to both the read and write paths, increases the burden on developers contributing new features, and makes it harder for users to reason about table behavior. Our goal is to *simplify the codebase, reduce cognitive overhead*, and *unify the merge logic* under a well-defined set of standard semantics.

🧩 General Proposal

This RFC proposes the following:
1. *Deprecate the use of HoodieRecordPayload* over time, transitioning existing tables and users to merge modes.
2. Introduce a new table config, hoodie.write.partial.update.mode, to handle common partial update behaviors such as:
   - KEEP_VALUES (default)
   - FILL_DEFAULTS
   - IGNORE_DEFAULTS
   - IGNORE_MARKERS
3. Provide a *migration guide and automated upgrade paths* for existing built-in payload classes such as:
   - OverwriteWithLatestAvroPayload
   - DefaultHoodieRecordPayload
   - EventTimeAvroPayload
   - PartialUpdateAvroPayload
   - etc.
4. Continue supporting custom user-defined payloads for backward compatibility, but discourage their use in favor of merger-based logic.

This unification will make merge behavior *explicit and consistent*, simplify long-term maintenance, and ease integration with newer components such as Flink and HoodieStreamer.

📋 Call for Feedback

We’ve compiled a detailed mapping of each payload class to its intended migration path, its partial update mode, and compatibility notes.
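To make the proposed semantics concrete, here is a toy Python sketch of how the two merge modes and the four partial update modes might behave on a single record key. It is not Hudi code and not the RFC's definition. The mode names come from this proposal, but the behavior encoded here is our assumption: ties under EVENT_TIME_ORDERING go to the incoming record, IGNORE_DEFAULTS treats an incoming default value as "not set", and IGNORE_MARKERS compares against one fixed marker string. The names pick_winner, merge_columns, MISSING and MARKER are hypothetical. The draft PR remains the authoritative mapping.

```python
from enum import Enum
from typing import Any, Dict

Row = Dict[str, Any]

# Hypothetical sentinel for "column absent from the incoming (partial) record".
MISSING = object()

# Hypothetical marker value: a CDC source may emit such a placeholder for a
# column whose value was not captured (modeled on Debezium's placeholder).
MARKER = "__debezium_unavailable_value"


class MergeMode(Enum):
    COMMIT_TIME_ORDERING = "COMMIT_TIME_ORDERING"
    EVENT_TIME_ORDERING = "EVENT_TIME_ORDERING"


class PartialUpdateMode(Enum):
    KEEP_VALUES = "KEEP_VALUES"
    FILL_DEFAULTS = "FILL_DEFAULTS"
    IGNORE_DEFAULTS = "IGNORE_DEFAULTS"
    IGNORE_MARKERS = "IGNORE_MARKERS"


def pick_winner(old: Row, new: Row, mode: MergeMode, ordering_field: str = "ts") -> Row:
    """Choose which full record survives when two versions of one key meet."""
    if mode is MergeMode.COMMIT_TIME_ORDERING:
        # The record from the later commit always wins.
        return new
    # EVENT_TIME_ORDERING: the larger ordering value wins.
    # Assumption: on a tie, the incoming record wins.
    return new if new[ordering_field] >= old[ordering_field] else old


def merge_columns(old: Row, new: Row, defaults: Row, mode: PartialUpdateMode) -> Row:
    """Apply an incoming (possibly partial) row on top of an existing row.

    `defaults` doubles as the schema: one entry per column with its default.
    """
    merged: Row = {}
    for col, default in defaults.items():
        old_val = old.get(col, default)
        new_val = new.get(col, MISSING)
        if new_val is MISSING:
            # Column absent from the incoming row: keep the old value unless
            # the mode asks for schema defaults instead.
            merged[col] = default if mode is PartialUpdateMode.FILL_DEFAULTS else old_val
        elif mode is PartialUpdateMode.IGNORE_DEFAULTS and new_val == default:
            # An incoming default value is treated as "not set".
            merged[col] = old_val
        elif mode is PartialUpdateMode.IGNORE_MARKERS and new_val == MARKER:
            # An incoming marker is treated as "unchanged".
            merged[col] = old_val
        else:
            merged[col] = new_val
    return merged


if __name__ == "__main__":
    defaults = {"id": 0, "name": "", "score": 0}
    old = {"id": 1, "name": "alice", "score": 10}
    new = {"id": 1, "score": 0}  # partial update: "name" is absent
    for mode in PartialUpdateMode:
        print(mode.value, merge_columns(old, new, defaults, mode))
```

With these sample rows, IGNORE_DEFAULTS keeps the old score of 10 because the incoming 0 equals the column default, while FILL_DEFAULTS resets the absent name to an empty string; KEEP_VALUES and IGNORE_MARKERS keep "alice" and take the incoming score of 0. Keeping the winner choice (merge mode) separate from column filling (partial update mode) mirrors the proposal's split between the two configs.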
You can find the draft in this PR: https://github.com/apache/hudi/pull/13499

We’d love your thoughts on:
- Whether the community agrees with deprecating HoodieRecordPayload
- Any concerns from users currently relying on custom payload classes
- Suggestions for better supporting partial update semantics
- Thoughts on a phased rollout and reader/writer compatibility

If there’s agreement, we’ll convert this into a formal RFC on the wiki and start implementing it behind a feature flag to ensure smooth upgrades.

Looking forward to hearing your feedback!

Best regards,
Lin