[ 
https://issues.apache.org/jira/browse/KUDU-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170779#comment-15170779
 ] 

Todd Lipcon commented on KUDU-1354:
-----------------------------------

I think this sequence may be another cause of the "out-of-order UNDO" 
CHECK failures.

> MVCC Snapshots chosen during flush can contain out-of-order transactions
> ------------------------------------------------------------------------
>
>                 Key: KUDU-1354
>                 URL: https://issues.apache.org/jira/browse/KUDU-1354
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.7.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>
> I spent a while trying to debug a failure of alter_table-randomized-test and 
> found the following interesting logs:
> - We have two operations in the WAL which arrived in short succession (about 
> 4ms apart) just before an alter table. I've renumbered the txids for 
> readability here:
> {noformat}
> 1.13@2        REPLICATE WRITE_OP
>       op 0: MUTATE (int32 key=1643562) SET c6=1107303203
> 1.14@4        REPLICATE WRITE_OP
>       op 0: MUTATE (int32 key=1643562) DELETE
> {noformat}
> - The flush triggered by the alter table took the following snapshots:
> {noformat}
> ... Phase 1 snapshot:  MvccSnapshot[committed={T|T < 2 or (T in (4))]
> ...
> ... Phase 2 snapshot: MvccSnapshot[committed={T|T < 2 or (T in (4, 2))]
> {noformat}
> Note that the first snapshot considers the 'DELETE' committed but not the 
> 'UPDATE'. We then fill in the 'UPDATE' in the second snapshot. The result 
> is that we end up flushing REDO deltas as follows:
> REDO file 1 (flushed in phase 1): includes only the DELETE
> REDO file 2 (flushed after ReupdateMissedDeltas): includes only the UPDATE
> When we later proceed to compact this rowset, we get "Check failed: 
> !is_deleted Got UPDATE for deleted row."
> Scenarios like this seem to reproduce a few tenths of a percent of the time 
> in this stress test.
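
The sequence described above can be sketched with a toy model. This is not Kudu's code: the Snapshot class, flush_redo_files, and replay are hypothetical names invented for illustration, and the model assumes only what the logs show (a snapshot commits every T below a watermark plus an explicit set, and phase 2 picks up whatever phase 1 missed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Toy MVCC snapshot: T is committed if T < watermark or T is in the set."""
    all_committed_before: int
    committed: frozenset = frozenset()

    def is_committed(self, ts):
        return ts < self.all_committed_before or ts in self.committed


# The two WAL operations from the description, as (txid, kind) pairs.
OPS = [(2, "UPDATE"), (4, "DELETE")]


def flush_redo_files(ops, snap1, snap2):
    """Split ops into the REDO files written in each flush phase."""
    # Phase 1: everything the first snapshot considers committed.
    file1 = [(ts, kind) for ts, kind in ops if snap1.is_committed(ts)]
    # Phase 2: ops committed in the second snapshot that phase 1 missed.
    file2 = [(ts, kind) for ts, kind in ops
             if snap2.is_committed(ts) and not snap1.is_committed(ts)]
    return file1, file2


def replay(files):
    """Apply REDO files in file order; fail on an UPDATE to a deleted row."""
    is_deleted = False
    for redo_file in files:
        for _ts, kind in redo_file:
            if kind == "UPDATE" and is_deleted:
                raise RuntimeError("Got UPDATE for deleted row")
            if kind == "DELETE":
                is_deleted = True
    return is_deleted


# Snapshots as logged: T < 2 or T in (4), then T < 2 or T in (4, 2).
snap1 = Snapshot(2, frozenset({4}))
snap2 = Snapshot(2, frozenset({4, 2}))
file1, file2 = flush_redo_files(OPS, snap1, snap2)
```

In this model file1 holds only the DELETE at txid 4 and file2 holds only the UPDATE at txid 2, so calling replay([file1, file2]) raises, which mirrors the failed compaction check. Replaying the same operations in txid order does not raise.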



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
