[jira] [Reopened] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly

2020-09-10 Thread Balaji Varadarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Varadarajan reopened HUDI-802:
-

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch 
> correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: Balaji Varadarajan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
>
>
> The provided AWSDmsAvroPayload class 
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
> currently handles the case where an update to an existing row has "D" in the 
> "Op" column, and correctly removes that row from the resulting table. 
> However, when an insert is quickly followed by a delete of the same row (e.g. 
> DMS processes them together and writes both change records to the same 
> parquet file), the row incorrectly appears in the resulting table. In this 
> case, the record is not yet in the table, so getInsertValue is called rather 
> than combineAndGetUpdateValue. Since the delete check lives only in 
> combineAndGetUpdateValue, it is skipped and the delete is missed. Something 
> like this could fix the issue: 
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
>  
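
The idea described in the report can be illustrated with a minimal sketch. It is not the Hudi implementation: DmsPayloadSketch is a hypothetical class, records are modelled as plain string maps instead of Avro records, and the method signatures are simplified. Only the method names getInsertValue and combineAndGetUpdateValue and the "Op" = "D" convention come from the report. The point is that both paths go through the same delete check, so a delete arriving for a row that is not yet in the table is no longer written as an insert.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified stand-in for a DMS record payload.
 * Assumption: a record is a Map of column name to value; the real
 * Hudi payload works on Avro records and has different signatures.
 */
final class DmsPayloadSketch {
    static final String OP_FIELD = "Op";   // DMS operation column named in the report
    static final String DELETE_OP = "D";   // value that marks a delete

    private final Map<String, String> record;

    DmsPayloadSketch(Map<String, String> record) {
        this.record = record;
    }

    /** Shared delete check: an empty result means "do not write this row". */
    private Optional<Map<String, String>> handleDelete() {
        if (DELETE_OP.equals(record.get(OP_FIELD))) {
            return Optional.empty();
        }
        return Optional.of(record);
    }

    /** Path taken when the key is not yet in the table. */
    Optional<Map<String, String>> getInsertValue() {
        return handleDelete();
    }

    /** Path taken when the key already exists in the table. */
    Optional<Map<String, String>> combineAndGetUpdateValue(Map<String, String> currentValue) {
        // The stored value is ignored in this sketch; the incoming record wins.
        return handleDelete();
    }
}
```

With this shape, a payload built from a record whose "Op" is "D" yields an empty result from getInsertValue as well as from combineAndGetUpdateValue, while records with any other "Op" value pass through unchanged.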



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly

2020-08-24 Thread Bhavani Sudha (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavani Sudha reopened HUDI-802:


> AWSDmsTransformer does not handle insert -> delete of a row in a single batch 
> correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
>
>


