[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12170 to look at the new patch set (#2). Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup .. client: avoid KLOG_EVERY_N_SECS with ScopedCleanup I saw a failure of MasterMigrationTest.TestEndToEndMigration in which the stack watchdog logged the following stacktrace: User stack: @ 0x7fa3e575c330 (unknown) at ??:0 @ 0x52dc09 __sanitizer::internal_read() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_syscall_linux_x86_64.inc:46 (discriminator 7) @ 0x52f79f __sanitizer::ReadFromFile() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_posix.cc:176 @ 0x53acb9 __sanitizer::SymbolizerProcess::ReadFromSymbolizer() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:468 @ 0x53ba76 __sanitizer::SymbolizerProcess::SendCommand() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:445 @ 0x53c3c5 __sanitizer::Symbolizer::SymbolizePC() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:356 @ 0x539673 __sanitizer::StackTrace::Print() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:36 @ 0x541c44 MaybePrintStackTrace() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:48 @ 0x5422ee __ubsan::ScopedReport::~ScopedReport() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:73 @ 0x549608 HandleDynamicTypeCacheMiss() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:81 @ 0x549a43 __ubsan_handle_dynamic_type_cache_miss_abort at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:93 @ 0x7fa3f0086643 _ZZN4kudu6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS3_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEENKUlvE_clEv at ??:0 @ 0x7fa3f0059b8c _ZN4kudu13ScopedCleanupIZNS_6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS4_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEEUlvE_ED2Ev at ??:0 @ 0x7fa3f005945c kudu::client::internal::AsyncLeaderMasterRpc<>::RetryOrReconnectIfNecessary() at ??:0 @ 0x7fa3f0057d0a kudu::client::internal::AsyncLeaderMasterRpc<>::SendRpcCb() at ??:0 @ 0x7fa3f0085a85 _ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EE6__callIvJEJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE at ??:0 #19 0x7fa3ddc84ffc in clone sysdeps/unix/sysv/linux/x86_64/clone.S:111 Upon looking into this and similar failures, it seems like the following series of events is possible: T1: allocate an AsyncLeaderMasterRpc on the stack with Synchronizer.Wait() as the user-specified callback T2: the RPC call results in an error that makes it retry via RetryOrReconnectIfNecessary(); a retry is scheduled T3: the retry completes and calls the user-specified callback to begin execution on T1 T1: with the RPC call completed, destroys RPC object T2: the KLOG messages attempt to access state from the destroyed RPC object and hit a crash, race, undefined behavior, etc. The fix is to make a copy of the RPC's state that was to be logged so there's no chance that T1's destruction of the RPC will affect T2. I tested this by looping a test[1] that repeatedly called Client::IsCreateTableInProgress() on a multi-master cluster configured to have election storms, and observing 5/1000 that yielded TSAN errors around the logging calls in RetryOrReconnectIfNecessary. [1] https://gist.github.com/andrwng/5d552f75a2e0d671b7ed54dd01892c66 Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 --- M src/kudu/client/master_proxy_rpc.cc 1 file changed, 9 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/12170/2 -- To view, visit http://gerrit.cloudera.org:8080/12170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 Gerrit-Change-Number: 12170 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew W
[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12170 ) Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@11 PS1, Line 11: User stack: > this output format looks like from the stack watchdog, not UBSAN. UBSAN its You're right, it's the user stack from the watchdog. The UBSAN error was: SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/kudu/client/master_proxy_rpc.cc:188:13 in And the log cuts off there. Though looking at another, it SIGSEGVed with *** Aborted at 1546650883 (unix time) try "date -d @1546650883" if you are using GNU date *** PC: @ 0x7f48d23ce156 __dynamic_cast *** SIGSEGV (@0xfff0) received by PID 13824 (TID 0x7f48c8fce700) from PID 18446744073709551600; stack trace: *** @ 0x7f48d9865330 (unknown) at ??:0 @ 0x7f48d23ce156 __dynamic_cast at ??:0 @ 0x549184 __ubsan::checkDynamicType() at /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_type_hash_itanium.cc:234 (discriminator 2) @ 0x547083 HandleDynamicTypeCacheMiss() at /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:37 @ 0x5479a3 __ubsan_handle_dynamic_type_cache_miss_abort at /home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:93 @ 0x7f48e3bf76b3 _ZZN4kudu6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS3_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEENKUlvE_clEv at ??:0 @ 0x7f48e3bcabfc _ZN4kudu13ScopedCleanupIZNS_6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS4_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEEUlvE_ED2Ev at ??:0 @ 0x7f48e3bca4cc kudu::client::internal::AsyncLeaderMasterRpc<>::RetryOrReconnectIfNecessary() at ??:0 @ 0x7f48e3bc8d7a kudu::client::internal::AsyncLeaderMasterRpc<>::SendRpcCb() at ??:0 @ 0x7f48e3bf6af5 _ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EE6__callIvJEJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE at ??:0 @ 0x7f48e3bf698f _ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EEclIJEvEET0_DpOT_ at ??:0 @ 0x7f48de83daac boost::function0<>::operator()() at ??:0 @ 0x7f48de838cf2 kudu::rpc::OutboundCall::CallCallback() at ??:0 @ 0x7f48de83a379 kudu::rpc::OutboundCall::SetTimedOut() at ??:0 @ 0x7f48de7e4658 kudu::rpc::Connection::HandleOutboundCallTimeout() at ??:0 @ 0x7f48de7e43ad kudu::rpc::Connection::CallAwaitingResponse::HandleTimeout() at ??:0 @ 0x7f48d4050606 ev_invoke_pending at ??:0 @ 0x7f48de84e2ca kudu::rpc::ReactorThread::InvokePendingCb() at ??:0 @ 0x7f48d40514f8 ev_run at ??:0 @ 0x7f48de84e619 kudu::rpc::ReactorThread::RunThread() at ??:0 @ 0x7f48de865bd3 boost::_bi::bind_t<>::operator()() at ??:0 @ 0x7f48de83daac boost::function0<>::operator()() at ??:0 @ 0x7f48d62b229b kudu::Thread::SuperviseThread() at ??:0 @ 0x7f48d985d184 start_thread at ??:0 @ 0x7f48d1e8fffd clone at ??:0 which also hits HandleDynamicTypeCacheMiss. So seems like it may not be as simple as I described. http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@30 PS1, Line 30: From this, it seems that KLOG_EVERY_N_SECS doesn't pair well with : ScopedCleanup. > But do we know why? Looking into it more. -- To view, visit http://gerrit.cloudera.org:8080/12170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 Gerrit-Change-Number: 12170 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 07 Jan 2019 19:03:11 + Gerrit-HasComments: Yes
[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12170 ) Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@11 PS1, Line 11: User stack: this output format looks like from the stack watchdog, not UBSAN. UBSAN itself doesn't even give stack traces, if I remember correctly. What was the UBSAN error? -- To view, visit http://gerrit.cloudera.org:8080/12170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 Gerrit-Change-Number: 12170 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 07 Jan 2019 17:45:41 + Gerrit-HasComments: Yes
[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12170 ) Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@30 PS1, Line 30: From this, it seems that KLOG_EVERY_N_SECS doesn't pair well with : ScopedCleanup. But do we know why? -- To view, visit http://gerrit.cloudera.org:8080/12170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 Gerrit-Change-Number: 12170 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 07 Jan 2019 17:33:49 + Gerrit-HasComments: Yes
[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12170 Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup .. client: avoid KLOG_EVERY_N_SECS with ScopedCleanup I saw a failure of MasterMigrationTest.TestEndToEndMigration with the following UBSAN stacktrace: User stack: @ 0x7fa3e575c330 (unknown) at ??:0 @ 0x52dc09 __sanitizer::internal_read() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_syscall_linux_x86_64.inc:46 (discriminator 7) @ 0x52f79f __sanitizer::ReadFromFile() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_posix.cc:176 @ 0x53acb9 __sanitizer::SymbolizerProcess::ReadFromSymbolizer() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:468 @ 0x53ba76 __sanitizer::SymbolizerProcess::SendCommand() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:445 @ 0x53c3c5 __sanitizer::Symbolizer::SymbolizePC() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:356 @ 0x539673 __sanitizer::StackTrace::Print() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:36 @ 0x541c44 MaybePrintStackTrace() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:48 @ 0x5422ee __ubsan::ScopedReport::~ScopedReport() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:73 @ 0x549608 HandleDynamicTypeCacheMiss() at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:81 @ 0x549a43 __ubsan_handle_dynamic_type_cache_miss_abort at /data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:93 @ 0x7fa3f0086643 _ZZN4kudu6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS3_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEENKUlvE_clEv at ??:0 @ 0x7fa3f0059b8c _ZN4kudu13ScopedCleanupIZNS_6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS4_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEEUlvE_ED2Ev at ??:0 @ 0x7fa3f005945c kudu::client::internal::AsyncLeaderMasterRpc<>::RetryOrReconnectIfNecessary() at ??:0 @ 0x7fa3f0057d0a kudu::client::internal::AsyncLeaderMasterRpc<>::SendRpcCb() at ??:0 @ 0x7fa3f0085a85 _ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EE6__callIvJEJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE at ??:0 #19 0x7fa3ddc84ffc in clone sysdeps/unix/sysv/linux/x86_64/clone.S:111 >From this, it seems that KLOG_EVERY_N_SECS doesn't pair well with ScopedCleanup. Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 --- M src/kudu/client/master_proxy_rpc.cc 1 file changed, 8 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/12170/1 -- To view, visit http://gerrit.cloudera.org:8080/12170 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6 Gerrit-Change-Number: 12170 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong