Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19681 )
Change subject: [clock] add sanity check to detect wall clock jumps ...................................................................... [clock] add sanity check to detect wall clock jumps There was a case when a timestamp read from system/local clock using the ntp_adjtime() call jumped 40+ years ahead when running kudu-tserver on an Azure VM, while ntp_adjtime() didn't report an error on clock being unsynchronized or any other error at all. That came along with a huge number of kernel messages, and other software (such as the SASL library used by SSSD) detected the strange jump in the local clock as well. My multiple attempts to reproduce the issue on a real hardware node, Dockerized environment run at a real hardware server in a datacenter, and GCE & EC2 VMs were not successful. This patch adds a sanity check to detect such strange jumps in wall clock readings. The idea is to rely on the readings from the CLOCK_MONOTONIC_RAW clock captured along with the wall clock readings. A jump should manifest itself in a big difference between the wall clock delta and the corresponding CLOCK_MONOTONIC_RAW delta. If such a condition is detected, then HybridClock::NowWithErrorUnlocked() dumps diagnostic information about clock NTP synchronisation status and returns Status::ServiceUnavailable() with appropriate error message. This patch also adds a unit test for the newly added functionality. As a part of this changelist, the following new flags are introduced: * --wall_clock_jump_detection This is to control the newly introduced sanity check for readings of the wall clock. Acceptable values are 'auto', 'enabled, and 'disabled'. It is set to 'auto' by default, meaning the sanity check for timestamps is enabled if the process detects that it's running on a VM in Azure cloud. * --wall_clock_jump_threshold_sec This is to control the threshold (in seconds) for the difference in deltas of the wall clock's and CLOCK_MONOTONIC_RAW clock's readings. It is set to 900 (15 minutes) by default. The reasoning behind having --wall_clock_jump_detection=auto by default is to skip an extra check at the majority of nodes out there since NTP-synchronized system clock isn't supposed to jump that much at all. However, since the issue has been observed only at VMs in Azure cloud, it's enabled automatically at Azure nodes to detect the issue. If the clock jump went unnoticed, the timestamp would be persisted with an operation in the WAL and propagated to other replicas. That could lead to crashes during tablet bootstrapping and would require manual intervention to remove the orphaned operations with out-of-whack timestamps from the WALs of affected tablet replicas. To assess the performance hit induced by the jump clock detection, I also added a parameterized HybridClockJumpProtectionTest.BasicPerf scenario to compare how fast HybridClock::NowWithError() runs with and without the newly introduced sanity check. I found that with sanity check enabled the method runs 1.12 times slower than without the check. The details are below. To collect stats on the performance of the HybridClock::NowWithError() method, I used the command below, running it 10 times for RELEASE and DEBUG builds to run the new test scenario. I recorded the timings output by the test scenarios and computed the average time taken to invoke HybridClock::NowWithError() 1000000 times. KUDU_ALLOW_SLOW_TESTS=1 ./bin/hybrid_clock-test --gtest_filter='*Perf*' ------------------------------------------------------------------- RELEASE build: without clock jump detection: 1000000 iterations in 158.553514 ms 1000000 iterations in 159.200062 ms 1000000 iterations in 157.840572 ms 1000000 iterations in 159.079716 ms 1000000 iterations in 161.235647 ms average: 159.181902 ms with clock jump detection: 1000000 iterations in 178.415508 ms 1000000 iterations in 178.102677 ms 1000000 iterations in 177.465617 ms 1000000 iterations in 180.552167 ms 1000000 iterations in 179.644688 ms average: 178.836131 ms ------------------------------------------------------------------- DEBUG build: without clock jump detection: 1000000 iterations in 592.051281 ms 1000000 iterations in 591.087500 ms 1000000 iterations in 597.142767 ms 1000000 iterations in 591.066335 ms 1000000 iterations in 590.717687 ms average: 592.413114 ms with clock jump detection: 1000000 iterations in 682.691815 ms 1000000 iterations in 682.087873 ms 1000000 iterations in 685.602277 ms 1000000 iterations in 683.057099 ms 1000000 iterations in 682.095996 ms average: 683.107012 ms ------------------------------------------------------------------- RELEASE, running under perf stat without clock jump detection: Performance counter stats for './bin/hybrid_clock-test --gtest_filter=Perf/HybridClockJumpProtectionTest.BasicPerf/0' (10 runs): 171.499485 task-clock # 0.999 CPUs utilized ( +- 0.18% ) 3 context-switches # 0.016 K/sec ( +- 4.76% ) 0 cpu-migrations # 0.000 K/sec 2,829 page-faults # 0.016 M/sec ( +- 1.92% ) 562,222,108 cycles # 3.278 GHz ( +- 0.18% ) 698,278,773 instructions # 1.24 insns per cycle ( +- 0.09% ) 122,793,006 branches # 715.996 M/sec ( +- 0.11% ) 319,525 branch-misses # 0.26% of all branches ( +- 0.16% ) 0.171751822 seconds time elapsed ( +- 0.18% ) ------------------------------------------------------------------- RELEASE, running under perf stat with clock jump detection: Performance counter stats for './bin/hybrid_clock-test --gtest_filter=Perf/HybridClockJumpProtectionTest.BasicPerf/1' (10 runs): 193.189743 task-clock # 0.999 CPUs utilized ( +- 0.84% ) 3 context-switches # 0.015 K/sec ( +- 8.05% ) 0 cpu-migrations # 0.000 K/sec 2,741 page-faults # 0.014 M/sec ( +- 2.35% ) 627,522,941 cycles # 3.248 GHz ( +- 0.22% ) 797,617,361 instructions # 1.27 insns per cycle ( +- 0.13% ) 138,636,629 branches # 717.619 M/sec ( +- 0.13% ) 339,705 branch-misses # 0.25% of all branches ( +- 0.31% ) 0.193459251 seconds time elapsed ( +- 0.84% ) ------------------------------------------------------------------- Change-Id: I630783653717d975a9b2ad668e8bd47b7796d275 Reviewed-on: http://gerrit.cloudera.org:8080/19473 Tested-by: Alexey Serbin <ale...@apache.org> Reviewed-by: Yingchun Lai <laiyingc...@apache.org> (cherry picked from commit 555854178b9b498701619f4bb0dbbbbeab8e69e7) Reviewed-on: http://gerrit.cloudera.org:8080/19681 Tested-by: Kudu Jenkins --- M src/kudu/clock/hybrid_clock-test.cc M src/kudu/clock/hybrid_clock.cc M src/kudu/clock/hybrid_clock.h M src/kudu/clock/system_ntp.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 6 files changed, 377 insertions(+), 40 deletions(-) Approvals: Kudu Jenkins: Verified Yingchun Lai: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/19681 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: branch-1.17.x Gerrit-MessageType: merged Gerrit-Change-Id: I630783653717d975a9b2ad668e8bd47b7796d275 Gerrit-Change-Number: 19681 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: Alexey Serbin <ale...@apache.org> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yingchun Lai <laiyingc...@apache.org>