[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Balaji Varadarajan updated HUDI-802:
Fix Version/s: 0.6.1 (was: 0.6.0)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: Balaji Varadarajan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.1
>
>
> The provided AWSDmsAvroPayload class
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
> currently handles the case where the "Op" column is "D" on an update to an
> existing row, and successfully removes that row from the resulting table.
>
> However, when an insert is quickly followed by a delete of the same row (e.g. DMS
> processes both changes together and writes both records to the same
> Parquet file), the row incorrectly appears in the resulting table. In this
> case the record is not yet in the table, so getInsertValue is called rather than
> combineAndGetUpdateValue. Since the delete check lives only in
> combineAndGetUpdateValue, it is skipped and the delete is missed. Something
> like this could fix the issue:
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
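The failure mode in the description can be reproduced with a small, self-contained model. This is a sketch under stated assumptions, not Hudi's code: `DmsPayloadSketch`, `ChangeRecord`, `insertValue`, `combinedValue` and `writeBatch` are hypothetical names that only mirror the roles of getInsertValue and combineAndGetUpdateValue. The sketch assumes that same-key records in one batch are pre-combined by an ordering value (`seq`) so the later delete wins, and that the insert path is taken only for keys not yet in the table. Passing `fixed = true` applies the change the description implies: running the "Op" = "D" check on the insert path too.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of the HUDI-802 write path. NOT Hudi's API:
 * ChangeRecord stands in for an Avro record carrying the DMS "Op" column.
 */
final class DmsPayloadSketch {

  /** One DMS change row: key, "Op" flag (I/U/D), column value, ordering value. */
  record ChangeRecord(String key, String op, String value, long seq) {}

  static boolean isDelete(ChangeRecord r) {
    return r.op() != null && r.op().equalsIgnoreCase("D");
  }

  /** Insert path: key NOT yet in the table (role of getInsertValue). */
  static Optional<String> insertValue(ChangeRecord r, boolean fixed) {
    if (fixed && isDelete(r)) {
      return Optional.empty(); // fixed: a "D" record produces no row
    }
    return Optional.of(r.value()); // buggy: the delete record is written as a row
  }

  /** Update path: key already in the table (role of combineAndGetUpdateValue). */
  static Optional<String> combinedValue(String current, ChangeRecord r) {
    // The current value is ignored (latest record wins); the delete check is
    // present in both variants, which is why update -> delete already works.
    return isDelete(r) ? Optional.empty() : Optional.of(r.value());
  }

  /** Applies one batch to `table` (key -> value) and returns the same map. */
  static Map<String, String> writeBatch(List<ChangeRecord> batch, Map<String, String> table, boolean fixed) {
    // Step 1 (assumed): pre-combine within the batch, keeping the highest seq per key.
    Map<String, ChangeRecord> latest = new LinkedHashMap<>();
    for (ChangeRecord r : batch) {
      latest.merge(r.key(), r, (existing, incoming) -> incoming.seq() >= existing.seq() ? incoming : existing);
    }
    // Step 2: route each surviving record to the insert or the update path.
    for (ChangeRecord r : latest.values()) {
      Optional<String> out = table.containsKey(r.key())
          ? combinedValue(table.get(r.key()), r)
          : insertValue(r, fixed);
      if (out.isPresent()) {
        table.put(r.key(), out.get());
      } else {
        table.remove(r.key());
      }
    }
    return table;
  }

  public static void main(String[] args) {
    // Insert followed by delete of the same key, both in one batch.
    List<ChangeRecord> batch = List.of(
        new ChangeRecord("id-1", "I", "alice", 1),
        new ChangeRecord("id-1", "D", "alice", 2));
    System.out.println("buggy: " + writeBatch(batch, new HashMap<>(), false)); // buggy: {id-1=alice}
    System.out.println("fixed: " + writeBatch(batch, new HashMap<>(), true));  // fixed: {}
  }
}
```

In this model an update followed by a delete of an existing key is dropped in both variants, because it goes through the update path; only the insert -> delete case depends on the extra check.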
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

leesf updated HUDI-802:
Status: Closed (was: Patch Available)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Status: Patch Available (was: In Progress)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Status: Open (was: New)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Status: In Progress (was: Open)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-802:
Labels: pull-request-available (was: )

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Status: New (was: Open)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Priority: Blocker
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Priority: Blocker (was: Major)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Priority: Blocker
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Status: Open (was: New)

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Priority: Major
> Fix For: 0.6.0
[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-802:
Fix Version/s: 0.6.0

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly
> ---
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Priority: Major
> Fix For: 0.6.0