Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/5055

to look at the new patch set (#4).

Change subject: KUDU-768 (part 1) - Move timestamp assignment out of Tablet
......................................................................

KUDU-768 (part 1) - Move timestamp assignment out of Tablet

Safe time is a timestamp such that all transactions before it are
known and either completed or in-flight. Waiting for the Mvcc snapshot
at "safe time" to be "clean" yields repeatable reads: scans of a
tablet at a snapshot defined by a timestamp will always return the
same results.

Proper "safe time" advancement also improves load balancing: a scan at
a clean timestamp that is lower than "safe time" on a replica is
guaranteed to yield the same results as the same scan on the leader
replica (though maybe with a latency penalty).

Currently this timestamp is advanced within Mvcc, but this is not
natural as it conflates the consensus state (all the operations that
are being replicated and/or replayed) and the mvcc state (all the
operations that have been consensus committed and are being applied).

Furthermore, there is a confusing mixing of concepts in Mvcc between
"safe time" and "clean time", where the latter means a timestamp such
that all operations have completed, whereas the former also includes
the operations that are in-flight, even if they haven't started being
applied to the tablet.

This patch series aims at separating the two concepts and fixing safe
time advancement:

a) - Safe time advancement will be handled by consensus: the leader
     can easily establish which timestamps are safe for a replica by
     looking at which operations that replica knows and what the
     timestamp of the last committed operation is.
b) - Mvcc will only take care of monitoring "clean time" advancement.
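
The distinction between "safe" and "clean" described above can be
sketched with a small toy model. This is only an illustration: the
names Timestamp, OpState, Op, IsSafe, IsClean and IsRepeatable are
hypothetical and are not Kudu's actual classes or APIs, and treating
the "safe time" boundary as inclusive is an assumption made for the
sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy model only: none of these names come from the Kudu code base.
using Timestamp = std::uint64_t;

enum class OpState { kInFlight, kCompleted };

struct Op {
  Timestamp ts;
  OpState state;
};

// "Safe": every operation with a timestamp at or below `ts` is already
// known (completed or in-flight). In this sketch `safe_time` is simply
// an input; the commit message proposes that consensus supplies it.
bool IsSafe(Timestamp ts, Timestamp safe_time) {
  return ts <= safe_time;
}

// "Clean": no known operation with a timestamp at or below `ts` is
// still in-flight.
bool IsClean(Timestamp ts, const std::vector<Op>& known_ops) {
  return std::none_of(known_ops.begin(), known_ops.end(),
                      [ts](const Op& op) {
                        return op.ts <= ts && op.state == OpState::kInFlight;
                      });
}

// A scan at `ts` is repeatable in this model only when `ts` is both
// safe and clean.
bool IsRepeatable(Timestamp ts, Timestamp safe_time,
                  const std::vector<Op>& known_ops) {
  return IsSafe(ts, safe_time) && IsClean(ts, known_ops);
}
```

With a safe time of 10, a completed operation at timestamp 5 and an
in-flight operation at timestamp 8, timestamp 5 is safe and clean,
timestamp 8 is safe but not clean, and timestamp 12 is not safe.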

This makes it simpler to wait for a timestamp to be "safe" and
"clean": the caller will first wait for the timestamp to be "safe",
meaning all operations are known and in-flight, and then wait for it
to be "clean" in mvcc, meaning all the in-flight operations before it
have completed.

This patch in particular takes the first two steps in this direction:

1) It moves timestamp assignment out of the tablet and into the
   TransactionDriver, to be done prior to pushing the operation to
   consensus for replication.
2) It makes all operations be "operations at a timestamp", so that all
   operations have the same behavior within mvcc independently of
   whether they were started at the leader or at a follower.

Follow-up patches will completely remove the Mvcc APIs for automatic
safe time advancement and timestamp assignment, and will introduce the
new entity responsible for "safe time".

Change-Id: I3ba7212f9211f585d4bef00e5ccfc24d5eece224
---
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
7 files changed, 51 insertions(+), 18 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5055/4
--
To view, visit http://gerrit.cloudera.org:8080/5055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba7212f9211f585d4bef00e5ccfc24d5eece224
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot