[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has submitted this change and it was merged. Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. But there is also a major cleanup of the waiting story in TabletService (KUDU-1127) and a few new integration test features. There is an instance of a TimeManager per tablet. It's used for: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. The one "hole" in safe time is solved by leader leases. Until we have those this patch takes a conservative approach to safe time progress. Fixing safe time broke a bunch of our tests that were expecting broken snapshot scans. In particular we would return broken snapshots all the time instead of waiting for the snapshot to be correct. Of course when these errors were fixed the tests started failing. In order to address these test failures I cleaned up our snapshot scan waiting story in TabletServer::HandleScanAtSnapshot(). In particular: - The client's deadline in no longer taken into account when deciding how long to wait for a snapshot to be repeatable. There is a hard (configurable) max that the server will wait for, "clamping" the client's deadline. The key here is that, when the client deadline is clamped, we return Status::ServiceUnavailable on time out instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called on KuduScanner::Open() and ServiceUnavailable is a retryable status this enables the client to try somewhere else, perhaps where it won't have to wait as long. - TimeManager now does a pre-flight check before waiting on safe time. In particular it checks that: i) it has heard from the leader within a configurable amount of time (that safe time _can_ make progress). ii) it checks that the safe time is not more that a configurable amount of time in the past, 30 seconds by default (that safe time is likely to make progress to the required timestamp). Finally, this patch adds two new integration test workloads that prove that it works. It adds a new read workload to TestWorkload that performs snapshot scans in the present, while writes are happening. This can be enabled anywhere but this patch enables it for a bunch of tests in RaftConsensusItest, in particular the *Churny* and *Crashy* ones with unique keys. This patch also enables linked_list-test to perform snapshot scans in the present after the verification. Results: I ran raft_consensus-itest with the new snapshot read workload on dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 bin/raft_consensus-itest \ --gtest_filter=*Churny*:*Crashy*-*Duplicate* I pulled the test to before this patch. It failed 1000/1000 on master. With this patch it passed 1000/1000: http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287 I ran linked_list-test with the new snapshot scans in dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \ --gtest_filter=*TestLoadAndVerify* The test passed 1000/1000 whereas before it would fail 427/1000. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665 I also ran the test in client-test that tests fault tolerance. Run config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 -- bin/client-test \ --gtest_filter=*ScanFaultTolerance* --stress_cpu_threads=2 The test passed 1000/1000 times. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481171410.4460 Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Reviewed-on: http://gerrit.cloudera.org:8080/5240 Reviewed-by: Todd LipconTested-by: David Ribeiro Alves --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 34: Verified+1 overriding jenkins, all tests passed but debug build failed due to: 05:06:31 All tests passed, yet some left behind their test output: 05:06:31 krb5kdc-1481173108926 -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 34 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 34: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 34 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#34). Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. But there is also a major cleanup of the waiting story in TabletService (KUDU-1127) and a few new integration test features. There is an instance of a TimeManager per tablet. It's used for: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. The one "hole" in safe time is solved by leader leases. Until we have those this patch takes a conservative approach to safe time progress. Fixing safe time broke a bunch of our tests that were expecting broken snapshot scans. In particular we would return broken snapshots all the time instead of waiting for the snapshot to be correct. Of course when these errors were fixed the tests started failing. In order to address these test failures I cleaned up our snapshot scan waiting story in TabletServer::HandleScanAtSnapshot(). In particular: - The client's deadline in no longer taken into account when deciding how long to wait for a snapshot to be repeatable. There is a hard (configurable) max that the server will wait for, "clamping" the client's deadline. The key here is that, when the client deadline is clamped, we return Status::ServiceUnavailable on time out instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called on KuduScanner::Open() and ServiceUnavailable is a retryable status this enables the client to try somewhere else, perhaps where it won't have to wait as long. - TimeManager now does a pre-flight check before waiting on safe time. In particular it checks that: i) it has heard from the leader within a configurable amount of time (that safe time _can_ make progress). ii) it checks that the safe time is not more that a configurable amount of time in the past, 30 seconds by default (that safe time is likely to make progress to the required timestamp). Finally, this patch adds two new integration test workloads that prove that it works. It adds a new read workload to TestWorkload that performs snapshot scans in the present, while writes are happening. This can be enabled anywhere but this patch enables it for a bunch of tests in RaftConsensusItest, in particular the *Churny* and *Crashy* ones with unique keys. This patch also enables linked_list-test to perform snapshot scans in the present after the verification. Results: I ran raft_consensus-itest with the new snapshot read workload on dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 bin/raft_consensus-itest \ --gtest_filter=*Churny*:*Crashy*-*Duplicate* I pulled the test to before this patch. It failed 1000/1000 on master. With this patch it passed 1000/1000: http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287 I ran linked_list-test with the new snapshot scans in dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \ --gtest_filter=*TestLoadAndVerify* The test passed 1000/1000 whereas before it would fail 427/1000. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665 I also ran the test in client-test that tests fault tolerance. Run config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 -- bin/client-test \ --gtest_filter=*ScanFaultTolerance* --stress_cpu_threads=2 The test passed 1000/1000 times. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481171410.4460 Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M src/kudu/consensus/consensus.proto M
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 33: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 33 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#33). Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. But there is also a major cleanup of the waiting story in TabletService (KUDU-1127) and a few new integration test features. There is an instance of a TimeManager per tablet. It's used for: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. The one "hole" in safe time is solved by leader leases. Until we have those this patch takes a conservative approach to safe time progress. Fixing safe time broke a bunch of our tests that were expecting broken snapshot scans. In particular we would return broken snapshots all the time instead of waiting for the snapshot to be correct. Of course when these errors were fixed the tests started failing. In order to address these test failures I cleaned up our snapshot scan waiting story in TabletServer::HandleScanAtSnapshot(). In particular: - The client's deadline in no longer taken into account when deciding how long to wait for a snapshot to be repeatable. There is a hard (configurable) max that the server will wait for, "clamping" the client's deadline. The key here is that, when the client deadline is clamped, we return Status::ServiceUnavailable on time out instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called on KuduScanner::Open() and ServiceUnavailable is a retryable status this enables the client to try somewhere else, perhaps where it won't have to wait as long. - TimeManager now does a pre-flight check before waiting on safe time. In particular it checks that: i) it has heard from the leader within a configurable amount of time (that safe time _can_ make progress). ii) it checks that the safe time is not more that a configurable amount of time in the past, 30 seconds by default (that safe time is likely to make progress to the required timestamp). Finally, this patch adds two new integration test workloads that prove that it works. It adds a new read workload to TestWorkload that performs snapshot scans in the present, while writes are happening. This can be enabled anywhere but this patch enables it for a bunch of tests in RaftConsensusItest, in particular the *Churny* and *Crashy* ones with unique keys. This patch also enables linked_list-test to perform snapshot scans in the present after the verification. Results: I ran raft_consensus-itest with the new snapshot read workload on dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 bin/raft_consensus-itest \ --gtest_filter=*Churny*:*Crashy*-*Duplicate* I pulled the test to before this patch. It failed 1000/1000 on master. With this patch it passed 1000/1000: http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287 I ran linked_list-test with the new snapshot scans in dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \ --gtest_filter=*TestLoadAndVerify* The test passed 1000/1000 whereas before it would fail 427/1000. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665 I also ran the test in client-test that tests fault tolerance. Run config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 -- bin/client-test \ --gtest_filter=*ScanFaultTolerance* --stress_cpu_threads=2 The test passed 1000/1000 times. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481171410.4460 Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M src/kudu/consensus/consensus.proto M
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 32: (35 comments) http://gerrit.cloudera.org:8080/#/c/5240/32//COMMIT_MSG Commit Message: PS32, Line 31: whole > hole doh PS32, Line 56: it > is Done PS32, Line 52: : - TimeManager now does a pre-flight check before waiting on safe time. : In particular it checks that: i) it has heard from the leader within : a configurable amount of time (that safe time _can_ make progress). : ii) it checks that the safe time it not more that a configurable : amount of time in the past, 30 seconds by default (that safe time : is likely to make progress to the required timestamp). > I haven't looked at the code yet, but one question here: these checks shoul doing this already, yeah :) PS32, Line 65: unique > any particular reason why not the one with dup keys too? because the duplicate keys one only has 20 different rows and the test works with row counts. PS32, Line 71: workloaf > typo Done http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test-util.cc File src/kudu/client/client-test-util.cc: Line 58: ASSERT_OK(scanner.SetSelection(KuduClient::LEADER_ONLY)); > is LEADER_ONLY still relevant here? or should this be changed to set it to left a TODO think we should cleanup this retry code. http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 1147: TEST_F(ClientTest, TestScanFaultTolerance) { > did you loop this test too? I had looped it quite a bit cuz it was flaky, but forgot to keep the results. Just looped it 1000 more. 1000 passes :) Updated the commit message. http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 456: KLOG_EVERY_N_SECS(WARNING, 10) << "Safe time advancement without writes is disabled. " > hrm, maybe FIRST_K is better here? not sure seeing it once every 10 seconds made it minutes Done http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1238: if (request->has_safe_timestamp()) { > I'm still not sure this is in the right spot if the tail of the messages in yeah I agree that we shouldn't take the safe timestamp into account if we failed to prepare. left a todo. http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS32, Line 31: safe_time_max_last_advanced_safe_time > not a big fan of this name. How about: missed_heartbeats_before_rejecting_s Done PS32, Line 33: snapshot scans > scans at snapshots that aren't yet safe Done Line 34: TAG_FLAG(safe_time_max_last_advanced_safe_time, advanced); > let's mark this unstable for now too Done PS32, Line 37: lag > lag behind the requested timestamp? Done Line 39: TAG_FLAG(safe_time_max_lag_ms, advanced); > same Done PS32, Line 151: $0 secs, (max is $1 sec > 'secs' is redundant here since MonoDelta::ToString already has a 's' suffix Done Line 166: safe_time_diff.ToMilliseconds(), > wrong units here (message says secs) Done Line 167: FLAGS_safe_time_max_lag_ms); > same Done Line 173: void TimeManager::MakeWaiterTimeoutMessageUnlocked(Timestamp timestamp, string* error_message) { > probably easier to just return string from this method instead of the out-p yeah, it's ugly so I tried to hide it. moved back. Line 188: if (mode_ == NON_LEADER && timestamp > last_safe_ts_) { > ah ok, so you are doing the thing that I asked about in the commit message. Done http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.h File src/kudu/consensus/time_manager.h: PS32, Line 117: is > if Done http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: Line 2345: "--maintenance_manager_polling_interval_ms=300", > typo? Done Line 2664: //workload.set_num_read_threads(2); > hrm? Done http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: Line 270: start_latch_.Reset((uint64_t)num_write_threads_ + (uint64_t)num_read_threads_); > nit: why these casts? cuz tidy bot complains otherwise. removed http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: Line 100: // and thus cannot do snapshot scans. > I think ORDERED is still useful here. the two were mashed together when we added (or after) fault tolerance. now the the tablet service will return an error if ordered is set
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 32: (35 comments) http://gerrit.cloudera.org:8080/#/c/5240/32//COMMIT_MSG Commit Message: PS32, Line 31: whole hole PS32, Line 56: it is PS32, Line 52: : - TimeManager now does a pre-flight check before waiting on safe time. : In particular it checks that: i) it has heard from the leader within : a configurable amount of time (that safe time _can_ make progress). : ii) it checks that the safe time it not more that a configurable : amount of time in the past, 30 seconds by default (that safe time : is likely to make progress to the required timestamp). I haven't looked at the code yet, but one question here: these checks should be only relevant for 'now' or 'recent past' type snapshot queries. If you queried 30 seconds in the past, for example, then the preflight check is not necessary, right? Put another way, we should only do this check if the requested snapshot is not safe yet (ie if we'd actually have to wait at all). Ignore this comment if you're already doing this :) PS32, Line 65: unique any particular reason why not the one with dup keys too? PS32, Line 71: workloaf typo http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test-util.cc File src/kudu/client/client-test-util.cc: Line 58: ASSERT_OK(scanner.SetSelection(KuduClient::LEADER_ONLY)); is LEADER_ONLY still relevant here? or should this be changed to set it to SNAPSHOT at current time? (feel free to add TODO if it's not a trivial change) http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/client/client-test.cc File src/kudu/client/client-test.cc: Line 1147: TEST_F(ClientTest, TestScanFaultTolerance) { did you loop this test too? http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 456: KLOG_EVERY_N_SECS(WARNING, 10) << "Safe time advancement without writes is disabled. " hrm, maybe FIRST_K is better here? not sure seeing it once every 10 seconds is that useful. (or maybe once every 5 minutes or something?) http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1238: if (request->has_safe_timestamp()) { I'm still not sure this is in the right spot if the tail of the messages in the request failed to prepare. Agree right now it's not relevant, but once the leader starts sending this alongside ops, this behavior isn't right http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS32, Line 31: safe_time_max_last_advanced_safe_time not a big fan of this name. How about: missed_heartbeats_before_rejecting_snapshot_scans or something of that nature? PS32, Line 33: snapshot scans scans at snapshots that aren't yet safe Line 34: TAG_FLAG(safe_time_max_last_advanced_safe_time, advanced); let's mark this unstable for now too PS32, Line 37: lag lag behind the requested timestamp? Line 39: TAG_FLAG(safe_time_max_lag_ms, advanced); same PS32, Line 151: $0 secs, (max is $1 sec 'secs' is redundant here since MonoDelta::ToString already has a 's' suffix Line 166: safe_time_diff.ToMilliseconds(), wrong units here (message says secs) Line 167: FLAGS_safe_time_max_lag_ms); same Line 173: void TimeManager::MakeWaiterTimeoutMessageUnlocked(Timestamp timestamp, string* error_message) { probably easier to just return string from this method instead of the out-param. Or just inline it at the call site (I only see it once) Line 188: if (mode_ == NON_LEADER && timestamp > last_safe_ts_) { ah ok, so you are doing the thing that I asked about in the commit message. nice. http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/consensus/time_manager.h File src/kudu/consensus/time_manager.h: PS32, Line 117: is if http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: Line 2345: "--maintenance_manager_polling_interval_ms=300", typo? Line 2664: //workload.set_num_read_threads(2); hrm? http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/integration-tests/test_workload.cc File src/kudu/integration-tests/test_workload.cc: Line 270: start_latch_.Reset((uint64_t)num_write_threads_ + (uint64_t)num_read_threads_); nit: why these casts? http://gerrit.cloudera.org:8080/#/c/5240/32/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: Line 100: // and thus cannot do snapshot scans. I think ORDERED is still useful here.
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 30: I won't address any more tidy bot nits that aren't part of the changes in this patch. Its big as it is without spurious changes. -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 30 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 28: (12 comments) http://gerrit.cloudera.org:8080/#/c/5240/27//COMMIT_MSG Commit Message: PS27, Line 15: n > nit: add comma or period? Done PS27, Line 15: ins > its Done http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 450: // If we're not sending ops to the follower, set the safe time on the request. > worth a TODO here that we could also set it if we are sending our latest op Done http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/consensus_queue.h File src/kudu/consensus/consensus_queue.h: Line 34: //#include "kudu/consensus/time_manager.h" > nit Done http://gerrit.cloudera.org:8080/#/c/5240/28/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS28, Line 184: // Pre-flight checks, make sure we've heard from the leader recently and that safe time : // isn't lagging too much. > maybe we should do a GetSafeTime() call here so that if we're leader we 're Done http://gerrit.cloudera.org:8080/#/c/5240/28/src/kudu/consensus/time_manager.h File src/kudu/consensus/time_manager.h: Line 145: void MakeWaiterTimeoutMessageUnlocked(Timestamp timestmap, std::string* error_message); > warning: function 'kudu::consensus::TimeManager::MakeWaiterTimeoutMessageUn Done http://gerrit.cloudera.org:8080/#/c/5240/28/src/kudu/integration-tests/consistency-itest.cc File src/kudu/integration-tests/consistency-itest.cc: PS28, Line 639: 3 > What is the reason to have 3 rows inserted instead of 1? It would be nice Think was leftover from a previous patch. removed http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 37: #include "kudu/consensus/time_manager.h" > needed? Done http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/tablet_peer_mm_ops.cc File src/kudu/tablet/tablet_peer_mm_ops.cc: Line 26: #include "kudu/consensus/time_manager.h" > unnecessary Done http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/transactions/alter_schema_transaction.cc File src/kudu/tablet/transactions/alter_schema_transaction.cc: Line 23: #include "kudu/consensus/time_manager.h" > unnecessary Done http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/transactions/transaction_tracker.cc File src/kudu/tablet/transactions/transaction_tracker.cc: Line 24: #include "kudu/consensus/time_manager.h" > unnecessary Done http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/transactions/write_transaction.cc File src/kudu/tablet/transactions/write_transaction.cc: Line 25: #include "kudu/consensus/time_manager.h" > same Done -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#30). Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. But there is also a major cleanup of the waiting story in TabletService (KUDU-1127) and a few new integration test features. There is an instance of a TimeManager per tablet. It's used for: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. The one "whole" in safe time is solved by leader leases. Until we have those this patch takes a conservative approach to safe time progress. Fixing safe time broke a bunch of our tests that were expecting broken snapshot scans. In particular we would return broken snapshots all the time instead of waiting for the snapshot to be correct. Of course when these errors were fixed the tests started failing. In order to address these test failures I cleaned up our snapshot scan waiting story in TabletServer::HandleScanAtSnapshot(). In particular: - The client's deadline in no longer taken into account when deciding how long to wait for a snapshot to be repeatable. There is a hard (configurable) max that the server will wait for, "clamping" the client's deadline. The key here is that, when the client deadline is clamped, we return Status::ServiceUnavailable on time out instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called on KuduScanner::Open() and ServiceUnavailable is a retryable status this enables the client to try somewhere else, perhaps where it won't have to wait as long. - TimeManager now does a pre-flight check before waiting on safe time. In particular it checks that: i) it has heard from the leader within a configurable amount of time (that safe time _can_ make progress). ii) it checks that the safe time it not more that a configurable amount of time in the past, 30 seconds by default (that safe time is likely to make progress to the required timestamp). Finally, this patch adds two new integration test workloads that prove that it works. It adds a new read workload to TestWorkload that performs snapshot scans in the present, while writes are happening. This can be enabled anywhere but this patch enables it for a bunch of tests in RaftConsensusItest, in particular the *Churny* and *Crashy* ones with unique keys. This patch also enables linked_list-test to perform snapshot scans in the present after the verification. Results: I ran raft_consensus-itest with the new snapshot read workloaf on dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 bin/raft_consensus-itest \ --gtest_filter=*Churny*:*Crashy*-*Duplicate* I pulled the test to before this patch. It failed 1000/1000 on master. With this patch it passed 1000/1000: http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287 I ran linked_list-test with the new snapshot scans in dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \ --gtest_filter=*TestLoadAndVerify* The test passed 1000/1000 whereas before it would fail 427/1000. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665 Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#29). Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. But there is also a major cleanup of the waiting story in TabletService (KUDU-1127) and a few new integration test features. There is an instance of a TimeManager per tablet. It's used for: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. The one "whole" in safe time is solved by leader leases. Until we have those this patch takes a conservative approach to safe time progress. Fixing safe time broke a bunch of our tests that were expecting broken snapshot scans. In particular we would return broken snapshots all the time instead of waiting for the snapshot to be correct. Of course when these errors were fixed the tests started failing. In order to address these test failures I cleaned up our snapshot scan waiting story in TabletServer::HandleScanAtSnapshot(). In particular: - The client's deadline in no longer taken into account when deciding how long to wait for a snapshot to be repeatable. There is a hard (configurable) max that the server will wait for, "clamping" the client's deadline. The key here is that, when the client deadline is clamped, we return Status::ServiceUnavailable on time out instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called on KuduScanner::Open() and ServiceUnavailable is a retryable status this enables the client to try somewhere else, perhaps where it won't have to wait as long. - TimeManager now does a pre-flight check before waiting on safe time. In particular it checks that: i) it has heard from the leader within a configurable amount of time (that safe time _can_ make progress). ii) it checks that the safe time it not more that a configurable amount of time in the past, 30 seconds by default (that safe time is likely to make progress to the required timestamp). Finally, this patch adds two new integration test workloads that prove that it works. It adds a new read workload to TestWorkload that performs snapshot scans in the present, while writes are happening. This can be enabled anywhere but this patch enables it for a bunch of tests in RaftConsensusItest, in particular the *Churny* and *Crashy* ones with unique keys. This patch also enables linked_list-test to perform snapshot scans in the present after the verification. Results: I ran raft_consensus-itest with the new snapshot read workloaf on dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 bin/raft_consensus-itest \ --gtest_filter=*Churny*:*Crashy*-*Duplicate* I pulled the test to before this patch. It failed 1000/1000 on master. With this patch it passed 1000/1000: http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287 I ran linked_list-test with the new snapshot scans in dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \ --gtest_filter=*TestLoadAndVerify* The test passed 1000/1000 whereas before it would fail 427/1000. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665 Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Alexey Serbin has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 28: (2 comments) http://gerrit.cloudera.org:8080/#/c/5240/27//COMMIT_MSG Commit Message: PS27, Line 15: n nit: add comma or period? http://gerrit.cloudera.org:8080/#/c/5240/28/src/kudu/integration-tests/consistency-itest.cc File src/kudu/integration-tests/consistency-itest.cc: PS28, Line 639: 3 What is the reason to have 3 rows inserted instead of 1? It would be nice to know. -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 28: (3 comments) http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS27, Line 1198: fail. : // Once we have that functionality we'll have to revisit this. > nit: two sentences Done Line 1204: // If we stopped before reaching the end we failed to prepare some message(s) and need > but in the meantime don't we want to do this only if we prepared all ops? i sure, makes sense and it's defensive though I'm not sure it matters in the current implementation. http://gerrit.cloudera.org:8080/#/c/5240/28/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: PS28, Line 184: // Pre-flight checks, make sure we've heard from the leader recently and that safe time : // isn't lagging too much. maybe we should do a GetSafeTime() call here so that if we're leader we 're less likely to trip the leader check (though we do call it in every heartbeat). -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 28 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#28). Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. But there is also a major cleanup of the waiting story in TabletService (KUDU-1127) and a few new integration test features. There is an instance of a TimeManager per tablet. It's used for: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. The one "whole" in safe time is solved by leader leases. Until we have those this patch takes a conservative approach to safe time progress. Fixing safe time broke a bunch of our tests that were expecting broken snapshot scans. In particular we would return broken snapshots all the time instead of waiting for the snapshot to be correct. Of course when these errors were fixed the tests started failing. In order to address these test failures I cleaned up our snapshot scan waiting story in TabletServer::HandleScanAtSnapshot(). In particular: - The client's deadline in no longer taken into account when deciding how long to wait for a snapshot to be repeatable. There is a hard (configurable) max that the server will wait for, "clamping" the client's deadline. The key here is that, when the client deadline is clamped, we return Status::ServiceUnavailable on time out instead of Status::TimeOut(). Since HandleScanAtSnapshot() is called on KuduScanner::Open() and ServiceUnavailable is a retryable status this enables the client to try somewhere else, perhaps where it won't have to wait as long. - TimeManager now does a pre-flight check before waiting on safe time. In particular it checks that: i) it has heard from the leader within a configurable amount of time (that safe time _can_ make progress). ii) it checks that the safe time it not more that a configurable amount of time in the past, 30 seconds by default (that safe time is likely to make progress to the required timestamp). Finally, this patch adds two new integration test workloads that prove that it works. It adds a new read workload to TestWorkload that performs snapshot scans in the present, while writes are happening. This can be enabled anywhere but this patch enables it for a bunch of tests in RaftConsensusItest, in particular the *Churny* and *Crashy* ones with unique keys. This patch also enables linked_list-test to perform snapshot scans in the present after the verification. Results: I ran raft_consensus-itest with the new snapshot read workloaf on dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ --disable-sharding loop -n 1000 bin/raft_consensus-itest \ --gtest_filter=*Churny*:*Crashy*-*Duplicate* I pulled the test to before this patch. It failed 1000/1000 on master. With this patch it passed 1000/1000: http://dist-test.cloudera.org//job?job_id=david.alves.1481097865.18287 I ran linked_list-test with the new snapshot scans in dist-test, asan, with the following config: KUDU_ALLOW_SLOW_TESTS=1 ../../build-support/dist_test.py \ loop -n 1000 bin/linked_list-test --stress_cpu_threads=2 \ --gtest_filter=*TestLoadAndVerify* The test passed 1000/1000 whereas before it would fail 427/1000. Results: http://dist-test.cloudera.org//job?job_id=david.alves.1481104106.24665 Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 27: (1 comment) http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1204: // All transactions that are going to be prepared were started, advance the safe timestamp. > yeah, the comment is misleading likely left over from the leader leases pat but in the meantime don't we want to do this only if we prepared all ops? i.e move this down to line 1214? -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 27: (1 comment) http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1204: // All transactions that are going to be prepared were started, advance the safe timestamp. > this doesn't smell quite right: imagine the leader sent: yeah, the comment is misleading likely left over from the leader leases patch. see, when we have those, the leader is free to send safe time with actual uncommitted messages, but until then the leader only sets this timestamp when there are no messages to send. this largely reduces the opportunity for anomalies, at the cost of some extra latency. its a bit conservative, but can be helped with "update me" rpcs which I still have some hope to put up as it should be simple -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 5) Correct safe time advancement .. Patch Set 27: (10 comments) http://gerrit.cloudera.org:8080/#/c/5240/27//COMMIT_MSG Commit Message: PS27, Line 15: it's its http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 450: // If we're not sending ops to the follower, set the safe time on the request. worth a TODO here that we could also set it if we are sending our latest op to the follower? http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/consensus_queue.h File src/kudu/consensus/consensus_queue.h: Line 34: //#include "kudu/consensus/time_manager.h" nit http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS27, Line 1198: fail : // once we have that functionality we'll have to revisit this. nit: two sentences Line 1204: // All transactions that are going to be prepared were started, advance the safe timestamp. this doesn't smell quite right: imagine the leader sent: 1.1 @ time 10 1.2 @ time 20 1.3 @ time 30 safetime = 40 and then we failed to prepare 1.3.. in that case, we can't go ahead and set safetime to 40 http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 37: #include "kudu/consensus/time_manager.h" needed? http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/tablet_peer_mm_ops.cc File src/kudu/tablet/tablet_peer_mm_ops.cc: Line 26: #include "kudu/consensus/time_manager.h" unnecessary http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/transactions/alter_schema_transaction.cc File src/kudu/tablet/transactions/alter_schema_transaction.cc: Line 23: #include "kudu/consensus/time_manager.h" unnecessary http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/transactions/transaction_tracker.cc File src/kudu/tablet/transactions/transaction_tracker.cc: Line 24: #include "kudu/consensus/time_manager.h" unnecessary http://gerrit.cloudera.org:8080/#/c/5240/27/src/kudu/tablet/transactions/write_transaction.cc File src/kudu/tablet/transactions/write_transaction.cc: Line 25: #include "kudu/consensus/time_manager.h" same -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 5) Correct safe time advancement
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5240 to look at the new patch set (#27). Change subject: KUDU-798 (part 5) Correct safe time advancement .. KUDU-798 (part 5) Correct safe time advancement This patch fixes safe time advancement in general and allows for safe time advancement in the absence of writes in particular. The core of the patch is to plug in the new TimeManager class wherever needed. There is an instance of a TimeManager per tablet it's main uses are: - When replicating messages to other replicas a leader uses the TimeManager to assign timestamps and to obtain a safe time to send, even when there are no writes. - When receiving messages from a leader consensus uses the TimeManager to update the clock and to unblock any waiters that might waiting for a particular timestamp to be safe. - Before a snapshot scan proceeds to scan, it must first wait for the TimeManager to deem whatever timestamp it has safe. Then it proceeds to wait for the snapshot at timestamp to have all its transactions committed and, finally, proceeds with the scan. Put together, these changes allow to make sure that snapshot scans are repeatable in the large majority of cases. Note: There is a high likely hood that https://gerrit.cloudera.org/#/c/5375/ will be merged with this patch. However I'm currently running some experiments with it pre/post this patch. Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestHybridTime.java M src/kudu/client/client-test-util.cc M src/kudu/client/client-test.cc M src/kudu/consensus/consensus.h M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M src/kudu/integration-tests/alter_table-test.cc M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/tablet/mvcc-test.cc M src/kudu/tablet/mvcc.cc M src/kudu/tablet/mvcc.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_peer.cc M src/kudu/tablet/tablet_peer.h M src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/tablet/transactions/alter_schema_transaction.cc M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h M src/kudu/tablet/transactions/transaction_tracker-test.cc M src/kudu/tablet/transactions/transaction_tracker.cc M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/tablet_service.h 33 files changed, 228 insertions(+), 150 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/5240/27 -- To view, visit http://gerrit.cloudera.org:8080/5240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8532fdb069c8bee7f3e08ffe74cab0273885cc8e Gerrit-PatchSet: 27 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon