[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinoth Chandar updated HUDI-802:
--------------------------------
    Status: New  (was: Open)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-802
>                 URL: https://issues.apache.org/jira/browse/HUDI-802
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Christopher Weaver
>            Priority: Blocker
>             Fix For: 0.6.0
>
>
> The provided AWSDmsAvroPayload class
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
> currently handles the case where the "Op" column is "D" on the update path, and
> correctly removes the row from the resulting table.
> However, when an insert is quickly followed by a delete of the same row (e.g. DMS
> processes them together and writes both change records to the same
> parquet file), the row incorrectly appears in the resulting table. In this
> case, the record is not yet in the table, so getInsertValue is called rather than
> combineAndGetUpdateValue. Since the delete check lives only in
> combineAndGetUpdateValue, it is skipped and the delete is missed. A change
> along these lines could fix the issue:
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
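To make the failure mode concrete, here is a minimal, self-contained sketch of the idea: run the same "Op == D" check on both the insert path and the update path. It is not Hudi's API. The class `DmsPayloadSketch`, the helper `handleDelete`, and the use of plain `Map`s and `java.util.Optional` in place of Avro records and Hudi's `Option` are all hypothetical simplifications. It also assumes the insert and the delete for the key have already been reduced to the later "D" record before the write, so only that record reaches `getInsertValue`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of a DMS-aware payload. It mirrors the two
 * callbacks named in the issue (getInsertValue / combineAndGetUpdateValue)
 * but uses plain Maps instead of Avro records; it is NOT Hudi's real API.
 */
public class DmsPayloadSketch {
    // Name of the operation column written by AWS DMS ("I", "U" or "D").
    static final String OP_FIELD = "Op";

    private final Map<String, Object> incoming;

    public DmsPayloadSketch(Map<String, Object> incoming) {
        this.incoming = incoming;
    }

    // Shared delete check: an empty result means "drop this row".
    private Optional<Map<String, Object>> handleDelete(Map<String, Object> record) {
        Object op = record.get(OP_FIELD);
        boolean isDelete = op != null && op.toString().equalsIgnoreCase("D");
        return isDelete ? Optional.empty() : Optional.of(record);
    }

    // Called when the key is not yet in the table (the insert -> delete case).
    public Optional<Map<String, Object>> getInsertValue() {
        return handleDelete(incoming);
    }

    // Called when the key already exists; in this sketch the incoming record
    // simply wins and the current value is ignored.
    public Optional<Map<String, Object>> combineAndGetUpdateValue(Map<String, Object> current) {
        return handleDelete(incoming);
    }

    public static void main(String[] args) {
        Map<String, Object> deleteRow = new HashMap<>();
        deleteRow.put("id", 42);
        deleteRow.put(OP_FIELD, "D");

        // With the check on the insert path too, a brand-new key whose only
        // surviving record is a delete produces no row.
        DmsPayloadSketch payload = new DmsPayloadSketch(deleteRow);
        System.out.println(payload.getInsertValue().isPresent()); // prints false
    }
}
```

Factoring the check into one helper keeps the two code paths from drifting apart again, which is exactly how the delete was missed in the first place.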