hudi-bot opened a new issue, #16691:
URL: https://github.com/apache/hudi/issues/16691
With partial updates becoming a first-class citizen that needs no special
payloads or mergers, we need to move all CDC payloads to transformers in the
Hudi Streamer and SQL write paths, and provide migration instructions for
users.
# Partial update has been implemented for the Spark SQL source as follows:
## The configuration {{hoodie.write.partial.update.schema}} is used for
partial update.
## {{ExpressionPayload}} creates the writer schema based on the
configuration.
## {{HoodieAppendHandle}} creates the log file based on the configuration
and the corresponding partial schema.
## Currently this handle assumes these records are all update records.
## We need to understand whether ExpressionPayload/SQL Merger is needed going
forward.
# For DeltaStreamer, our goal is to remove all siloed CDC payloads, e.g.,
Debezium or AWS DMS, and to provide CDC data as {{InternalRow}} type. Therefore:
## The {{transformer}} in DeltaStreamer prepares the data according to the
types of the sources.
## Initially, it's okay to just support full row updates/deletes/...
# Audit all of them to ensure they properly combine I/U/D into data and delete
blocks, such that U-after-D and D-after-U scenarios are handled as expected.
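
To make the U-after-D and D-after-U scenarios concrete, here is a minimal sketch in plain Python. It is not Hudi code: {{ChangeEvent}}, {{apply_changes}} and the integer {{ordering}} field are hypothetical names made up for illustration. It also assumes one possible reading of "as expected": events are applied in ordering-value order, an update that follows a delete re-creates the record from whatever columns it carries, and a delete that follows an update removes the record.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical normalized CDC event (not a Hudi class)."""
    key: str
    op: str                               # "I", "U" or "D"
    ordering: int                         # assumed ordering value
    row: Optional[Dict[str, Any]] = None  # None for deletes; may be a column subset


def apply_changes(base: Dict[str, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Apply I/U/D events to a copy of `base`, lowest ordering value first."""
    table = {k: dict(v) for k, v in base.items()}
    # sorted() is stable, so events with equal ordering keep arrival order.
    for ev in sorted(events, key=lambda e: e.ordering):
        if ev.op == "D":
            # Delete: drop the record if it is present.
            table.pop(ev.key, None)
        elif ev.op == "I":
            # Insert: assumed to carry the full row and replace any old one.
            table[ev.key] = dict(ev.row or {})
        elif ev.op == "U":
            # Update: overlay the provided columns on the existing row.
            # After a delete there is no existing row, so the record is
            # re-created from the update's columns alone (an assumption).
            merged = dict(table.get(ev.key, {}))
            merged.update(ev.row or {})
            table[ev.key] = merged
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
    return table


base = {"k1": {"id": "k1", "name": "a", "qty": 1}}

# Partial update: only `qty` is supplied, other columns are preserved.
partial = apply_changes(base, [ChangeEvent("k1", "U", 1, {"qty": 5})])

# D after U: the record ends up deleted.
d_after_u = apply_changes(base, [ChangeEvent("k1", "U", 1, {"qty": 5}),
                                 ChangeEvent("k1", "D", 2)])

# U after D: the record is re-created from the update's columns.
u_after_d = apply_changes(base, [ChangeEvent("k1", "D", 1),
                                 ChangeEvent("k1", "U", 2, {"id": "k1", "qty": 7})])
```

Whether an update after a delete should re-create the record, or be rejected, is exactly the kind of behavior the audit would need to pin down; the sketch only fixes one choice so the scenarios can be compared.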
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-8401
- Type: New Feature
- Fix version(s): 1.1.0
---
## Comments
13/Jun/25 00:26;danny0405;FirstValueAvroPayload removed via master:
1876998d22c323222099a0be9105410a51af4ffc;;;