[jira] [Commented] (KUDU-1354) MVCC Snapshots chosen during flush can contain out-of-order transactions

Todd Lipcon (JIRA) Sat, 27 Feb 2016 17:19:00 -0800

    [ 
https://issues.apache.org/jira/browse/KUDU-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170783#comment-15170783
 ]


Todd Lipcon commented on KUDU-1354:
-----------------------------------

It seems like we currently clean up transactions in this order:

{noformat}
- TransactionDriver::ApplyTask:
-- transaction_->PreCommit() (calls WriteTransaction::PreCommit)
--- calls release_row_locks()
-- Finalize()
--- calls transaction_->Finish()
---- calls state()->Commit()
----- calls mvcc_tx_->Commit()
{noformat}

Given that we release the row lock before we commit the MVCC transaction, it's 
quite possible that, before we get a chance to mvcc-commit the UPDATE, a DELETE 
comes in for the same row and commits itself, and a compaction captures that 
snapshot.

[~dralves] do you recall why we clean up in this order? It seems like we should 
be holding the row locks until after marking the mvcc transaction committed. 
Maybe it was a case of premature optimization for concurrency?
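
To make the window concrete, here is a minimal, single-threaded sketch of the two cleanup orderings. It is a toy model and not Kudu code: ToyTablet, CleanupOrder and SnapshotInCleanupGap are hypothetical names, and the txids 2 (UPDATE) and 4 (DELETE) are borrowed from the logs in the issue description. It only illustrates what a snapshot taken between the two cleanup steps could see.

```cpp
#include <cassert>
#include <set>

// Toy model, NOT Kudu code: one row lock plus the set of MVCC-committed txids.
struct ToyTablet {
  bool row_locked = false;
  std::set<int> committed;

  bool TryLockRow() {
    if (row_locked) return false;
    row_locked = true;
    return true;
  }
  void ReleaseRowLock() { row_locked = false; }
  void MvccCommit(int txid) { committed.insert(txid); }
};

enum class CleanupOrder { kReleaseLockThenCommit, kCommitThenReleaseLock };

// Runs the UPDATE's two cleanup steps in the given order and lets a DELETE on
// the same row try to run in the gap between them. Returns the committed set
// that a snapshot taken inside that gap would observe.
std::set<int> SnapshotInCleanupGap(CleanupOrder order) {
  const int kUpdateTxid = 2;  // hypothetical ids, taken from the quoted logs
  const int kDeleteTxid = 4;
  ToyTablet tablet;

  // The UPDATE holds the row lock while it applies.
  const bool update_locked = tablet.TryLockRow();
  assert(update_locked);
  (void)update_locked;

  // First cleanup step of the UPDATE.
  const bool release_first = (order == CleanupOrder::kReleaseLockThenCommit);
  if (release_first) {
    tablet.ReleaseRowLock();
  } else {
    tablet.MvccCommit(kUpdateTxid);
  }

  // The gap: the DELETE only makes progress if it can take the row lock.
  // In a real system it would block here; the toy model just skips it.
  if (tablet.TryLockRow()) {
    tablet.MvccCommit(kDeleteTxid);
    tablet.ReleaseRowLock();
  }
  const std::set<int> snapshot = tablet.committed;

  // Second cleanup step of the UPDATE.
  if (release_first) {
    tablet.MvccCommit(kUpdateTxid);
  } else {
    tablet.ReleaseRowLock();
  }
  return snapshot;
}
```

With kReleaseLockThenCommit the gap snapshot contains the DELETE but not the UPDATE, which matches the shape of the problem described above. With kCommitThenReleaseLock the DELETE cannot take the lock until the UPDATE is already committed, so the snapshot never sees the DELETE alone.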

> MVCC Snapshots chosen during flush can contain out-of-order transactions
> ------------------------------------------------------------------------
>
>                 Key: KUDU-1354
>                 URL: https://issues.apache.org/jira/browse/KUDU-1354
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.7.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>
> I spent a while trying to debug a failure of alter_table-randomized-test and 
> found the following interesting logs:
> - We have two operations in the WAL which arrived in short succession (about 
> 4ms apart) just before an alter table. I've renumbered the txids for 
> readability here:
> {noformat}
> 1.13@2        REPLICATE WRITE_OP
>       op 0: MUTATE (int32 key=1643562) SET c6=1107303203
> 1.14@4        REPLICATE WRITE_OP
>       op 0: MUTATE (int32 key=1643562) DELETE
> {noformat}
> - and the flush that was caused by the alter table has the following snapshots:
> {noformat}
> ... Phase 1 snapshot:  MvccSnapshot[committed={T|T < 2 or (T in (4))]
> ...
> ... Phase 2 snapshot: MvccSnapshot[committed={T|T < 2 or (T in (4, 2))]
> {noformat}
> Note that the first snapshot considers the 'DELETE' committed but not the 
> 'UPDATE'. We then fill in the 'UPDATE' in the second snapshot. The end result 
> here is that we end up flushing REDO deltas as follows:
> - REDO file 1 (flushed in phase 1): includes only the DELETE
> - REDO file 2 (flushed after ReupdateMissedDeltas): includes only the UPDATE
> When we later proceed to compact this rowset, we get "Check failed: 
> !is_deleted Got UPDATE for deleted row."
> Scenarios like this seem to reproduce a few tenths of a percent of the time 
> in this stress test.
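
The snapshot arithmetic in the description can be replayed with a small sketch. Everything here is a hypothetical stand-in rather than Kudu's actual classes: ToySnapshot mimics the logged "T < W or T in (...)" form, ToyMutation is one WAL entry for the single row, SelectForFlush picks the mutations a flush phase would write, and ReplayIsConsistent imitates the compaction-time check by rejecting an UPDATE that follows a DELETE.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for a snapshot of the form "T < W or T in (...)".
struct ToySnapshot {
  int all_committed_before;       // W in the logged form
  std::set<int> committed_after;  // the explicit "T in (...)" set

  bool IsCommitted(int txid) const {
    return txid < all_committed_before || committed_after.count(txid) > 0;
  }
};

// One mutation against the single row under discussion.
struct ToyMutation {
  int txid;
  bool is_delete;  // false means UPDATE
};

// Mutations visible in `snap` that were not already visible in `prev`
// (pass nullptr for the first phase). Output keeps WAL order.
std::vector<ToyMutation> SelectForFlush(const std::vector<ToyMutation>& wal,
                                        const ToySnapshot& snap,
                                        const ToySnapshot* prev) {
  std::vector<ToyMutation> out;
  for (const ToyMutation& m : wal) {
    const bool already_flushed = prev != nullptr && prev->IsCommitted(m.txid);
    if (snap.IsCommitted(m.txid) && !already_flushed) out.push_back(m);
  }
  return out;
}

// Replays REDO files in flush order. Returns false if an UPDATE shows up
// after the row has already been deleted.
bool ReplayIsConsistent(const std::vector<std::vector<ToyMutation>>& redo_files) {
  bool is_deleted = false;
  for (const auto& file : redo_files) {
    for (const ToyMutation& m : file) {
      if (m.is_delete) {
        is_deleted = true;
      } else if (is_deleted) {
        return false;  // analogous to "Got UPDATE for deleted row"
      }
    }
  }
  return true;
}
```

Feeding it the WAL from the description (UPDATE at txid 2, DELETE at txid 4) with a phase 1 snapshot of {W=2, in (4)} and a phase 2 snapshot of {W=2, in (4, 2)} puts only the DELETE in the first file and only the UPDATE in the second, and the replay check then fails in the same way the compaction does.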



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
