Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#34).

Change subject: KUDU-798 (part 5) Correct safe time advancement
......................................................................

KUDU-798 (part 5) Correct safe time advancement

This patch fixes safe time advancement in general and allows for safe time
advancement in the absence of writes in particular. The core of the patch is to
plug the new TimeManager class in wherever needed, but there is also a major
cleanup of the waiting story in TabletService (KUDU-1127) and a few new
integration test features.

There is an instance of a TimeManager per tablet. It's used as follows:

- When replicating messages to other replicas, a leader uses the TimeManager to
  assign timestamps and to obtain a safe time to send, even when there are no
  writes.
- When receiving messages from a leader, consensus uses the TimeManager to
  update the clock and to unblock any waiters that might be waiting for a
  particular timestamp to become safe.
- Before a snapshot scan proceeds, it must first wait for the TimeManager to
  deem its timestamp safe. It then waits for all the transactions in the
  snapshot at that timestamp to be committed and, finally, proceeds with the
  scan.

Put together, these changes make sure that snapshot scans are repeatable in the
large majority of cases. The one remaining "hole" in safe time will be solved by
leader leases; until we have those, this patch takes a conservative approach to
safe time progress.

Fixing safe time broke a bunch of our tests that were expecting broken snapshot
scans. In particular, we would previously return broken snapshots all the time
instead of waiting for the snapshot to be correct, so once these errors were
fixed the tests started failing. In order to address these test failures I
cleaned up our snapshot scan waiting story in
TabletServer::HandleScanAtSnapshot().
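To make the three TimeManager roles above concrete, here is a toy sketch of the
idea. All class, method, and member names below are illustrative assumptions
based on this description, not the actual Kudu TimeManager API:

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Toy model of the per-tablet TimeManager described above (names are
// assumptions, not the real Kudu API). Timestamps are plain integers here;
// Kudu uses hybrid-time timestamps.
class ToyTimeManager {
 public:
  // Leader side: assign a timestamp to a message being replicated.
  int64_t AssignTimestamp() {
    std::lock_guard<std::mutex> l(lock_);
    return ++clock_;
  }

  // Leader side: a safe time that can be sent to followers even when there
  // are no writes in flight.
  int64_t GetSafeTimeToSend() {
    std::lock_guard<std::mutex> l(lock_);
    return clock_;
  }

  // Follower side: on a message from the leader, advance the clock and safe
  // time, and wake any waiters (e.g. snapshot scans).
  void AdvanceSafeTime(int64_t leader_safe_time) {
    std::lock_guard<std::mutex> l(lock_);
    clock_ = std::max(clock_, leader_safe_time);
    safe_time_ = std::max(safe_time_, leader_safe_time);
    cond_.notify_all();
  }

  // Scan side: block until 'ts' is safe, i.e. no transaction can later be
  // assigned a timestamp at or below 'ts'.
  void WaitUntilSafe(int64_t ts) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return safe_time_ >= ts; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  int64_t clock_ = 0;
  int64_t safe_time_ = 0;
};
```

In this model a snapshot scan would call WaitUntilSafe(snapshot_ts) first, and
only then wait for the transactions in the snapshot to commit, mirroring the
two-step wait described above.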
In particular:

- The client's deadline is no longer taken into account when deciding how long
  to wait for a snapshot to be repeatable. There is a hard (configurable)
  maximum that the server will wait for, "clamping" the client's deadline. The
  key here is that, when the client's deadline is clamped, we return
  Status::ServiceUnavailable on timeout instead of Status::TimedOut(). Since
  HandleScanAtSnapshot() is called on KuduScanner::Open() and
  ServiceUnavailable is a retryable status, this enables the client to try
  somewhere else, perhaps where it won't have to wait as long.
- TimeManager now does a pre-flight check before waiting on safe time. In
  particular it checks: i) that it has heard from the leader within a
  configurable amount of time (i.e. that safe time _can_ make progress); and
  ii) that the safe time is not more than a configurable amount of time in the
  past, 30 seconds by default (i.e. that safe time is likely to make progress
  to the required timestamp).

Finally, this patch adds two new integration test workloads that prove the fix
works. It adds a new read workload to TestWorkload that performs snapshot scans
in the present, while writes are happening. This can be enabled anywhere, but
this patch enables it for a bunch of tests in RaftConsensusITest, in particular
the *Churny* and *Crashy* ones with unique keys. This patch also enables
linked_list-test to perform snapshot scans in the present after the
verification.

Results:

I ran raft_consensus-itest with the new snapshot read workload on dist-test,
asan, with the following config:

KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \
  --disable-sharding loop -n 1000 bin/raft_consensus-itest \
  --gtest_filter=*Churny*:*Crashy*-*Duplicate*

I pulled the test back to the commit before this patch. It failed 1000/1000 on
master.
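The deadline clamping and the pre-flight check described above can be sketched
as follows. The function names, flag names, and default values here are
illustrative assumptions drawn from this description, not the actual code in
TabletService or TimeManager:

```cpp
#include <cstdint>

// Sketch of the waiting policy described above (all names are assumptions).
struct WaitDecision {
  int64_t wait_until_ms;  // when we will actually stop waiting
  bool clamped;           // true if the client's deadline was cut short
};

// Clamp the client's deadline to a server-side maximum. When a clamped wait
// times out, the caller returns ServiceUnavailable (retryable) rather than
// TimedOut, so the client can retry elsewhere.
WaitDecision DecideWait(int64_t now_ms, int64_t client_deadline_ms,
                        int64_t max_wait_ms) {
  int64_t hard_max_ms = now_ms + max_wait_ms;
  if (client_deadline_ms > hard_max_ms) {
    return {hard_max_ms, true};
  }
  return {client_deadline_ms, false};
}

// Pre-flight check before waiting on safe time: fail fast when safe time is
// unlikely to reach the requested timestamp.
bool PreflightOk(int64_t now_ms, int64_t last_heard_from_leader_ms,
                 int64_t safe_time_ms, int64_t max_leader_silence_ms,
                 int64_t max_safe_time_lag_ms) {
  // i) Safe time _can_ make progress: we heard from the leader recently.
  if (now_ms - last_heard_from_leader_ms > max_leader_silence_ms) {
    return false;
  }
  // ii) Safe time is _likely_ to make progress: it is not lagging too far
  //     behind the present (30 seconds by default, per the description).
  if (now_ms - safe_time_ms > max_safe_time_lag_ms) {
    return false;
  }
  return true;
}
```

In this sketch, a scan whose pre-flight check fails would be rejected
immediately with a retryable error instead of blocking on a safe time that may
never arrive.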
With this patch it passed 1000/1000:
http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287

I ran linked_list-test with the new snapshot scans on dist-test, asan, with the
following config:

KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \
  loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \
  --gtest_filter=*TestLoadAndVerify*

The test passed 1000/1000, whereas before it would fail 427/1000. Results:
http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665

I also ran the test in client-test that tests fault tolerance. Run config:

KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \
  --disable-sharding loop -n 1000 -- bin/client-test \
  --gtest_filter=*ScanFaultTolerance* --stress_cpu_threads=2

The test passed 1000/1000 times. Results:
http://dist-test.cloudera.org//job?job_id=david.alves.1481171410.4460

Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java
M src/kudu/client/client-test-util.cc
M src/kudu/client/client-test.cc
M src/kudu/consensus/consensus.h
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_queue-test.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/consensus/raft_consensus_quorum-test.cc
M src/kudu/consensus/raft_consensus_state.cc
M src/kudu/consensus/raft_consensus_state.h
M src/kudu/consensus/time_manager.cc
M src/kudu/consensus/time_manager.h
M src/kudu/integration-tests/alter_table-test.cc
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/linked_list-test-util.h
M src/kudu/integration-tests/linked_list-test.cc
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/tablet_history_gc-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/tablet_peer.h
M src/kudu/tablet/transactions/alter_schema_transaction.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
M src/kudu/tablet/transactions/transaction_tracker.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
41 files changed, 591 insertions(+), 274 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5240/34

--
To view, visit http://gerrit.cloudera.org:8080/5240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e
Gerrit-PatchSet: 34
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>