[ https://issues.apache.org/jira/browse/KUDU-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832108#comment-16832108 ]
Mike Percy commented on KUDU-2809:
----------------------------------

+1 on the correct solution here being that diff scan should not return the deleted row at all if the insert of the row was the first operation that happened after the start timestamp of the diff scan and the end state was deleted.

> Incremental backup / diff scan does not handle rows that are inserted and
> deleted between two incrementals correctly
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2809
>                 URL: https://issues.apache.org/jira/browse/KUDU-2809
>             Project: Kudu
>          Issue Type: Bug
>          Components: backup
>    Affects Versions: 1.9.0
>            Reporter: Will Berkeley
>            Priority: Major
>
> I did the following sequence of operations:
> # Insert 100 million rows
> # Update 1 out of every 11 rows
> # Make a full backup
> # Insert 100 million more rows, after the original rows in keyspace
> # Delete 1 out of every 23 rows
> # Make an incremental backup
> Restore failed to apply the incremental backup, failing with an error like
> {noformat}
> java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu;
> sample errors:
> {noformat}
> Due to another bug, there's no sample errors, but after hacking around that
> bug, I found that the incremental contained a row with a DELETE action for a
> key that is not present in the full backup. That's because the row was
> inserted in step 4 and deleted in step 5, between backups.
> We could fix this by
> # Making diff scan not return a DELETE for such a row
> # Implementing and using DELETE IGNORE in the restore job

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
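
For illustration only, the rule proposed in the comment above (skip a row whose first operation after the diff scan's start timestamp is an insert and whose end state is deleted) might be sketched as follows. This is a toy model in Python, not Kudu's actual implementation: the `Mutation` type, the `diff_scan_action` function, the action names, and the choice of an exclusive start bound and inclusive end bound are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mutation:
    """Hypothetical record of one operation applied to a single row."""
    timestamp: int
    op: str  # "INSERT", "UPDATE", or "DELETE"


def diff_scan_action(history: List[Mutation], start_ts: int, end_ts: int) -> Optional[str]:
    """Return the action a diff scan would report for one row, or None to skip it.

    Assumption: the scan window is (start_ts, end_ts], i.e. exclusive start
    and inclusive end. The real boundary semantics may differ.
    """
    # Keep only the operations that fall inside the diff scan window, in time order.
    window = sorted(
        (m for m in history if start_ts < m.timestamp <= end_ts),
        key=lambda m: m.timestamp,
    )
    if not window:
        return None  # row did not change between the two backups

    first, last = window[0], window[-1]
    if last.op == "DELETE":
        if first.op == "INSERT":
            # The row did not exist at start_ts and does not exist at end_ts,
            # so the incremental has nothing to report for it.
            return None
        return "DELETE"  # row existed at start_ts and was deleted afterwards
    return "UPSERT"  # row exists at end_ts with new or changed contents


# Row inserted in step 4 and deleted in step 5, both after the full backup at ts=30.
ghost_row = [Mutation(40, "INSERT"), Mutation(50, "DELETE")]
# Row inserted before the full backup and deleted afterwards.
old_row = [Mutation(10, "INSERT"), Mutation(50, "DELETE")]

print(diff_scan_action(ghost_row, start_ts=30, end_ts=60))
print(diff_scan_action(old_row, start_ts=30, end_ts=60))
```

Under this model the row from the bug report is skipped, while a row that was already in the full backup still produces a DELETE. The second proposed fix, DELETE IGNORE in the restore job, would instead tolerate the stray DELETE at apply time.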