Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Hao Hao,

I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/16510

to look at the new patch set (#7).

Change subject: KUDU-2612: have MRS iteration account for txn metadata
......................................................................

KUDU-2612: have MRS iteration account for txn metadata

This patch introduces the ability to iterate through the rows of a MRS
taking into account the transaction's commit status, rather than relying
on the apply timestamps of the individual mutations therein. It does so
by adding a reference to the TxnMetadata in the MRS. Upon iteration, if
a commit timestamp exists for the transaction, Kudu uses the transaction
metadata to determine relevancy.

As a refresher, the MvccManager tracks mutations by maintaining a
"current" MvccSnapshot that encapsulates timestamps for ops that have
been applied. Rather than keeping track of every applied timestamp
individually, the MvccManager also keeps track of the currently
in-flight ops and the lower bound on future ops' timestamps, as
guaranteed by the TimeManager. Taken together, these define a watermark
timestamp below which all timestamps can be considered applied, as well
as a set of higher timestamps that are considered applied, but are
higher than the earliest in-flight (not-yet-applied) op's timestamp.

The MvccManager passes out MvccSnapshots that detail whether iterators
should consider certain timestamps as relevant to iteration. These
snapshots are used in the following ways:
- Snapshot scans:
  - The user input is a timestamp, which is used to generate an
    MvccSnapshot defined by that timestamp, i.e. all timestamps before
    are applied, and all timestamps above are not applied.
  - Such a snapshot is defined to be a "clean" snapshot.
  - Before iterating through data, Kudu waits for the safe time to pass
    beyond the given timestamp, and waits for all ops with lower
    timestamps to complete.
    Only then can Kudu safely iterate through mutations with certainty
    that relevancy can be determined via a simple comparison against the
    clean snapshot.
- Diff scans:
  - Similar to the above case, but with a second, lower input timestamp
    to serve as a lower bound on relevant mutation timestamps.
- READ_LATEST scans:
  - Unlike the above two scenarios, no input timestamp is given here.
    Instead, Kudu will use the MvccManager's current MvccSnapshot, which
    isn't guaranteed to be a clean snapshot.
  - If it can, Kudu uses the watermark to determine relevancy (fast
    path, like with clean snapshots), and if not, it falls back on the
    set of higher timestamps that are considered applied (slow path).
- Flushes and compactions:
  - Snapshots are also used in the context of flushes and compactions to
    track ops that get applied in the process of a flush or compaction,
    for the sake of duplicating ops onto new data stores if they were
    missed while swapping in the new data stores.
  - As with READ_LATEST, the snapshots used here aren't necessarily
    clean snapshots.

Based on the above usages, this patch distinguishes between two types of
MvccSnapshots that encapsulate all usage today:
- kTimestamp: we are iterating as of a specific timestamp T. We must
  guarantee that iteration will see all mutations made visible before T
  (i.e. Raft committed before T for non-transaction ops, transaction
  committed before T for transaction ops). We may wait for MVCC ops to
  complete to ensure this is guaranteed. Scans in this mode are
  repeatable. Snapshot and diff scans use kTimestamp snapshots.
- kLatest: we are iterating without waiting for the completion of any
  ops -- instead, we only care about seeing a view of the latest
  completed ops, regardless of whether there are non-applied ops from
  before the latest applied ops. READ_LATEST scans and flushes use
  kLatest snapshots.
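
For illustration only, the watermark-plus-set model described above can
be sketched roughly as follows. This is not the code in mvcc.h:
SketchSnapshot, Clean(), WithApplied(), and the plain integer Timestamp
are hypothetical names made up for this sketch. It also assumes that a
clean snapshot at T treats T itself as not applied.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-in for a timestamp; Kudu has its own Timestamp type.
using Timestamp = std::uint64_t;

// Simplified snapshot: every timestamp below the watermark is applied,
// plus an explicit set of applied timestamps at or above the watermark.
class SketchSnapshot {
 public:
  // "Clean" snapshot: everything below `ts` is applied, nothing else is.
  static SketchSnapshot Clean(Timestamp ts) {
    return SketchSnapshot(ts, {});
  }

  // Possibly non-clean snapshot, as a READ_LATEST scan or flush might see.
  static SketchSnapshot WithApplied(Timestamp watermark,
                                    std::set<Timestamp> applied_above) {
    for (Timestamp t : applied_above) {
      // Anything below the watermark is already covered by it.
      assert(t >= watermark);
    }
    return SketchSnapshot(watermark, std::move(applied_above));
  }

  bool IsApplied(Timestamp ts) const {
    if (ts < watermark_) {
      return true;  // Fast path: a single comparison against the watermark.
    }
    return applied_above_.count(ts) > 0;  // Slow path: set lookup.
  }

  // A snapshot with no individually tracked timestamps is clean.
  bool IsClean() const { return applied_above_.empty(); }

 private:
  SketchSnapshot(Timestamp watermark, std::set<Timestamp> applied_above)
      : watermark_(watermark), applied_above_(std::move(applied_above)) {}

  Timestamp watermark_;
  std::set<Timestamp> applied_above_;
};
```

With a clean snapshot every lookup takes the fast path. With
WithApplied(10, {12, 15}), timestamp 12 is applied while 11 is not,
which is the case an in-flight op below the latest applied op produces.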

In the context of evaluating commit status in transactions, these
snapshot types behave as follows when iterating:
- kTimestamp: since we care about displaying all ops or transactions
  from before T, scanners should wait for T to become safe, and for ops
  before T to complete (including all commit MVCC ops). After waiting,
  all transactions that would have a commit timestamp lower than T will
  have a commit timestamp in their metadata. As such, it's sufficient
  that, while iterating, we look at the commit timestamp of each
  mutation and compare it to T. If no commit timestamp exists for a
  transactional mutation, it must not have committed as of T.
- kLatest: since we don't care about using a clean snapshot, it's
  sufficient to use the current snapshot, which includes transactions'
  commit MVCC ops. If that op is finished for a given transaction, Kudu
  should check whether the transaction was aborted or committed. If the
  op was not finished in the snapshot, it could not have committed.

This patch only adds the APIs to the MvccManager, with some initial
usage for snapshot and diff scans in the memrowset; there is still no
way to exercise these APIs using a real tablet, nor is there a way to
persist the MVCC op timestamp in metadata. These will come in later
patches.
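
As a rough sketch of the two relevancy rules for transactional
mutations described above (C++17, illustration only): SketchTxnMetadata,
SketchLatestSnapshot, and both function names are hypothetical and are
not the APIs added by this patch. The sketch assumes each transaction
records the timestamp of its commit MVCC op separately from its final
commit timestamp, and that "before T" is a strict comparison.

```cpp
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical stand-in for a timestamp.
using Timestamp = std::uint64_t;

// Hypothetical per-transaction metadata.
struct SketchTxnMetadata {
  // Timestamp of the commit MVCC op, set once a commit has begun.
  std::optional<Timestamp> commit_op_timestamp;
  // Final commit timestamp, set only if the transaction committed.
  std::optional<Timestamp> commit_timestamp;
  bool aborted = false;
};

// Hypothetical non-clean snapshot: a watermark plus applied timestamps
// at or above it.
struct SketchLatestSnapshot {
  Timestamp watermark;
  std::set<Timestamp> applied_above;

  bool IsApplied(Timestamp ts) const {
    return ts < watermark || applied_above.count(ts) > 0;
  }
};

// kTimestamp: the caller has already waited for T to become safe and for
// earlier ops to finish, so a missing commit timestamp means the
// transaction had not committed as of T.
inline bool IsTxnRelevantAsOf(const SketchTxnMetadata& txn, Timestamp t) {
  return txn.commit_timestamp.has_value() && *txn.commit_timestamp < t;
}

// kLatest: no waiting. The transaction is visible only if its commit
// MVCC op is finished in the snapshot and it ended in a commit.
inline bool IsTxnRelevantLatest(const SketchTxnMetadata& txn,
                                const SketchLatestSnapshot& snap) {
  if (!txn.commit_op_timestamp.has_value() ||
      !snap.IsApplied(*txn.commit_op_timestamp)) {
    return false;  // Commit op not finished: cannot have committed.
  }
  return !txn.aborted && txn.commit_timestamp.has_value();
}
```

The kTimestamp check never consults a snapshot's applied set, which is
what keeps such scans repeatable once the wait has completed.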

Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
---
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
A src/kudu/tablet/txn_metadata.h
M src/kudu/tablet/txn_participant.cc
12 files changed, 653 insertions(+), 154 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/16510/7
--
To view, visit http://gerrit.cloudera.org:8080/16510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
Gerrit-Change-Number: 16510
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)