[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup

2019-01-11 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12170

to look at the new patch set (#2).

Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup
..

client: avoid KLOG_EVERY_N_SECS with ScopedCleanup

I saw a failure of MasterMigrationTest.TestEndToEndMigration in which
the stack watchdog logged the following stacktrace:

User stack:
@ 0x7fa3e575c330  (unknown) at ??:0
@   0x52dc09  __sanitizer::internal_read() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_syscall_linux_x86_64.inc:46
 (discriminator 7)
@   0x52f79f  __sanitizer::ReadFromFile() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_posix.cc:176
@   0x53acb9  __sanitizer::SymbolizerProcess::ReadFromSymbolizer() 
at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:468
@   0x53ba76  __sanitizer::SymbolizerProcess::SendCommand() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:445
@   0x53c3c5  __sanitizer::Symbolizer::SymbolizePC() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:356
@   0x539673  __sanitizer::StackTrace::Print() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:36
@   0x541c44  MaybePrintStackTrace() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:48
@   0x5422ee  __ubsan::ScopedReport::~ScopedReport() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:73
@   0x549608  HandleDynamicTypeCacheMiss() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:81
@   0x549a43  __ubsan_handle_dynamic_type_cache_miss_abort at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:93
@ 0x7fa3f0086643  
_ZZN4kudu6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS3_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEENKUlvE_clEv
 at ??:0
@ 0x7fa3f0059b8c  
_ZN4kudu13ScopedCleanupIZNS_6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS4_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEEUlvE_ED2Ev
 at ??:0
@ 0x7fa3f005945c  
kudu::client::internal::AsyncLeaderMasterRpc<>::RetryOrReconnectIfNecessary() 
at ??:0
@ 0x7fa3f0057d0a  
kudu::client::internal::AsyncLeaderMasterRpc<>::SendRpcCb() at ??:0
@ 0x7fa3f0085a85  
_ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EE6__callIvJEJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
 at ??:0
#19 0x7fa3ddc84ffc in clone sysdeps/unix/sysv/linux/x86_64/clone.S:111

Upon looking into this and similar failures, it seems like the following
series of events is possible:

T1: allocate an AsyncLeaderMasterRpc on the stack with
Synchronizer.Wait() as the user-specified callback
T2: the RPC call results in an error that makes it retry via
RetryOrReconnectIfNecessary(); a retry is scheduled
T3: the retry completes and calls the user-specified callback to begin
execution on T1
T1: with the RPC call completed, destroys RPC object
T2: the KLOG messages attempt to access state from the destroyed RPC
object and hit a crash, race, undefined behavior, etc.

The fix is to make a copy of the RPC's state that was to be logged so
there's no chance that T1's destruction of the RPC will affect T2. I
tested this by looping a test[1] that repeatedly called
Client::IsCreateTableInProgress() on a multi-master cluster configured
to have election storms, and observing 5/1000 that yielded TSAN errors
around the logging calls in RetryOrReconnectIfNecessary.

[1] https://gist.github.com/andrwng/5d552f75a2e0d671b7ed54dd01892c66

Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
---
M src/kudu/client/master_proxy_rpc.cc
1 file changed, 9 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/12170/2
--
To view, visit http://gerrit.cloudera.org:8080/12170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
Gerrit-Change-Number: 12170
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew W

[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup

2019-01-07 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12170 )

Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@11
PS1, Line 11: User stack:
> this output format looks like from the stack watchdog, not UBSAN. UBSAN its
You're right, it's the user stack from the watchdog.

The UBSAN error was:

 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior 
src/kudu/client/master_proxy_rpc.cc:188:13 in

And the log cuts off there.

Though looking at another, it SIGSEGVed with

*** Aborted at 1546650883 (unix time) try "date -d @1546650883" if you are 
using GNU date ***
PC: @ 0x7f48d23ce156 __dynamic_cast
*** SIGSEGV (@0xfff0) received by PID 13824 (TID 0x7f48c8fce700) 
from PID 18446744073709551600; stack trace: ***
@ 0x7f48d9865330 (unknown) at ??:0
@ 0x7f48d23ce156 __dynamic_cast at ??:0
@   0x549184 __ubsan::checkDynamicType() at 
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_type_hash_itanium.cc:234
 (discriminator 2)
@   0x547083 HandleDynamicTypeCacheMiss() at 
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:37
@   0x5479a3 __ubsan_handle_dynamic_type_cache_miss_abort at 
/home/jenkins-slave/workspace/kudu-master/2/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:93
@ 0x7f48e3bf76b3 
_ZZN4kudu6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS3_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEENKUlvE_clEv
 at ??:0
@ 0x7f48e3bcabfc 
_ZN4kudu13ScopedCleanupIZNS_6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS4_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEEUlvE_ED2Ev
 at ??:0
@ 0x7f48e3bca4cc 
kudu::client::internal::AsyncLeaderMasterRpc<>::RetryOrReconnectIfNecessary() 
at ??:0
@ 0x7f48e3bc8d7a 
kudu::client::internal::AsyncLeaderMasterRpc<>::SendRpcCb() at ??:0
@ 0x7f48e3bf6af5 
_ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EE6__callIvJEJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
 at ??:0
@ 0x7f48e3bf698f 
_ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EEclIJEvEET0_DpOT_
 at ??:0
@ 0x7f48de83daac boost::function0<>::operator()() at ??:0
@ 0x7f48de838cf2 kudu::rpc::OutboundCall::CallCallback() at ??:0
@ 0x7f48de83a379 kudu::rpc::OutboundCall::SetTimedOut() at ??:0
@ 0x7f48de7e4658 kudu::rpc::Connection::HandleOutboundCallTimeout() at 
??:0
@ 0x7f48de7e43ad 
kudu::rpc::Connection::CallAwaitingResponse::HandleTimeout() at ??:0
@ 0x7f48d4050606 ev_invoke_pending at ??:0
@ 0x7f48de84e2ca kudu::rpc::ReactorThread::InvokePendingCb() at ??:0
@ 0x7f48d40514f8 ev_run at ??:0
@ 0x7f48de84e619 kudu::rpc::ReactorThread::RunThread() at ??:0
@ 0x7f48de865bd3 boost::_bi::bind_t<>::operator()() at ??:0
@ 0x7f48de83daac boost::function0<>::operator()() at ??:0
@ 0x7f48d62b229b kudu::Thread::SuperviseThread() at ??:0
@ 0x7f48d985d184 start_thread at ??:0
@ 0x7f48d1e8fffd clone at ??:0

which also hits HandleDynamicTypeCacheMiss. So seems like it may not be as 
simple as I described.


http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@30
PS1, Line 30: From this, it seems that KLOG_EVERY_N_SECS doesn't pair well with
: ScopedCleanup.
> But do we know why?
Looking into it more.



--
To view, visit http://gerrit.cloudera.org:8080/12170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
Gerrit-Change-Number: 12170
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 07 Jan 2019 19:03:11 +
Gerrit-HasComments: Yes


[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup

2019-01-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12170 )

Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@11
PS1, Line 11: User stack:
this output format looks like from the stack watchdog, not UBSAN. UBSAN itself 
doesn't even give stack traces, if I remember correctly.

What was the UBSAN error?



--
To view, visit http://gerrit.cloudera.org:8080/12170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
Gerrit-Change-Number: 12170
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:45:41 +
Gerrit-HasComments: Yes


[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup

2019-01-07 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12170 )

Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12170/1//COMMIT_MSG@30
PS1, Line 30: From this, it seems that KLOG_EVERY_N_SECS doesn't pair well with
: ScopedCleanup.
But do we know why?



--
To view, visit http://gerrit.cloudera.org:8080/12170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
Gerrit-Change-Number: 12170
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:33:49 +
Gerrit-HasComments: Yes


[kudu-CR] client: avoid KLOG EVERY N SECS with ScopedCleanup

2019-01-07 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12170


Change subject: client: avoid KLOG_EVERY_N_SECS with ScopedCleanup
..

client: avoid KLOG_EVERY_N_SECS with ScopedCleanup

I saw a failure of MasterMigrationTest.TestEndToEndMigration with the
following UBSAN stacktrace:
User stack:
@ 0x7fa3e575c330  (unknown) at ??:0
@   0x52dc09  __sanitizer::internal_read() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_syscall_linux_x86_64.inc:46
 (discriminator 7)
@   0x52f79f  __sanitizer::ReadFromFile() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_posix.cc:176
@   0x53acb9  __sanitizer::SymbolizerProcess::ReadFromSymbolizer() 
at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:468
@   0x53ba76  __sanitizer::SymbolizerProcess::SendCommand() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:445
@   0x53c3c5  __sanitizer::Symbolizer::SymbolizePC() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc:356
@   0x539673  __sanitizer::StackTrace::Print() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cc:36
@   0x541c44  MaybePrintStackTrace() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:48
@   0x5422ee  __ubsan::ScopedReport::~ScopedReport() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_diag.cc:73
@   0x549608  HandleDynamicTypeCacheMiss() at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:81
@   0x549a43  __ubsan_handle_dynamic_type_cache_miss_abort at 
/data/8/awong/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/ubsan/ubsan_handlers_cxx.cc:93
@ 0x7fa3f0086643  
_ZZN4kudu6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS3_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEENKUlvE_clEv
 at ??:0
@ 0x7fa3f0059b8c  
_ZN4kudu13ScopedCleanupIZNS_6client8internal20AsyncLeaderMasterRpcINS_6master23GetTableSchemaRequestPBENS4_24GetTableSchemaResponsePBEE27RetryOrReconnectIfNecessaryEPNS_6StatusEEUlvE_ED2Ev
 at ??:0
@ 0x7fa3f005945c  
kudu::client::internal::AsyncLeaderMasterRpc<>::RetryOrReconnectIfNecessary() 
at ??:0
@ 0x7fa3f0057d0a  
kudu::client::internal::AsyncLeaderMasterRpc<>::SendRpcCb() at ??:0
@ 0x7fa3f0085a85  
_ZNSt5_BindIFSt7_Mem_fnIMN4kudu6client8internal20AsyncLeaderMasterRpcINS1_6master23GetTableSchemaRequestPBENS5_24GetTableSchemaResponsePBEEEFvRKNS1_6StatusEEEPS8_S9_EE6__callIvJEJLm0ELm1T_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
 at ??:0
#19 0x7fa3ddc84ffc in clone sysdeps/unix/sysv/linux/x86_64/clone.S:111

>From this, it seems that KLOG_EVERY_N_SECS doesn't pair well with
ScopedCleanup.

Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
---
M src/kudu/client/master_proxy_rpc.cc
1 file changed, 8 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/12170/1
--
To view, visit http://gerrit.cloudera.org:8080/12170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8cefd9613018247a1a25d17adedc021e8be166f6
Gerrit-Change-Number: 12170
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong