Alexey Serbin created KUDU-3688:
-----------------------------------
Summary: Race between CatalogManager::InitCertAuthorityWith() and ServerNegotiation::HandleTlsHandshake() in follower Kudu master
Key: KUDU-3688
URL: https://issues.apache.org/jira/browse/KUDU-3688
Project: Kudu
Issue Type: Bug
Components: master
Reporter: Alexey Serbin
Attachments: tsan-reports.txt.xz

With the blanket suppression of TSAN warnings for everything called from libcrypto.so removed, TSAN reports data races between ongoing RPC connection negotiations and the background thread that runs {{CatalogManager::PrepareFollowerCaInfo()}} in the follower Kudu master.

The essence of the problem is that {{TlsContext::AdoptSignedCert()}} invokes OpenSSL's {{SSL_CTX_use_certificate()}} while a thread in the connection negotiation pool may be performing a TLS handshake concurrently.
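The race pattern can be illustrated without OpenSSL. The following is a minimal sketch in plain C++17, assuming a {{std::string}} stands in for the certificate; {{FakeTlsContext}}, {{AdoptCert()}}, {{CertForHandshake()}} and {{RunDemo()}} are hypothetical names, not Kudu or OpenSSL APIs. It shows one general way to close this kind of race (handshakes take a shared lock, the certificate replacement takes it exclusively); it is not necessarily how Kudu addresses KUDU-3688. With real OpenSSL, the handshake would need to hold such a lock for as long as OpenSSL may touch the certificate, which would make handshakes wait behind a certificate replacement.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TLS context. `cert_` plays the role of the
// certificate that SSL_CTX_use_certificate() replaces inside a real SSL_CTX.
class FakeTlsContext {
 public:
  explicit FakeTlsContext(std::string cert) : cert_(std::move(cert)) {}

  // Writer (cf. adopting the CA-signed cert in the follower master):
  // takes the lock exclusively, so no reader can observe the old cert
  // while it is being replaced and freed.
  void AdoptCert(std::string new_cert) {
    std::unique_lock<std::shared_mutex> l(lock_);
    cert_ = std::move(new_cert);
  }

  // Reader (cf. a TLS handshake serializing the cert): takes the lock in
  // shared mode, so handshakes can still run concurrently with each other.
  std::string CertForHandshake() const {
    std::shared_lock<std::shared_mutex> l(lock_);
    return cert_;  // copied while the lock is held
  }

 private:
  mutable std::shared_mutex lock_;
  std::string cert_;
};

// Runs several "handshake" threads concurrently with one cert replacement
// and returns the cert seen after all threads have finished.
std::string RunDemo() {
  FakeTlsContext ctx("self-signed");
  std::vector<std::thread> threads;
  for (int i = 0; i < 4; ++i) {
    threads.emplace_back([&ctx] {
      for (int j = 0; j < 1000; ++j) {
        std::string c = ctx.CertForHandshake();
        // Every reader sees either the old or the new cert, never a torn one.
        assert(c == "self-signed" || c == "ca-signed");
      }
    });
  }
  threads.emplace_back([&ctx] { ctx.AdoptCert("ca-signed"); });
  for (auto& t : threads) t.join();
  return ctx.CertForHandshake();
}
```

Without the lock, the {{AdoptCert()}} assignment and the copy in {{CertForHandshake()}} would be exactly the kind of unsynchronized free/read pair reported by TSAN.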
Below is a snippet from one of the TSAN reports:
{noformat}
WARNING: ThreadSanitizer: data race (pid=884526)
Write of size 8 at 0x7b1c000033a0 by thread T84 (mutexes: read M3638, write M3818):
#0 free /root/Projects/kudu/thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:708:3 (kudu+0x468f30)
#1 ossl_asn1_string_embed_free /usr/src/debug/openssl-3.2.2-6.el9_5.1.x86_64/crypto/asn1/asn1_lib.c:367:9 (libcrypto.so.3+0xb937c)
#2 ASN1_STRING_free /usr/src/debug/openssl-3.2.2-6.el9_5.1.x86_64/crypto/asn1/asn1_lib.c:376:5 (libcrypto.so.3+0xb937c)
#3 ASN1_STRING_free /usr/src/debug/openssl-3.2.2-6.el9_5.1.x86_64/crypto/asn1/asn1_lib.c:372:6 (libcrypto.so.3+0xb937c)
#4 kudu::master::CatalogManager::InitCertAuthorityWith(std::__1::unique_ptr<kudu::security::PrivateKey, std::__1::default_delete<kudu::security::PrivateKey> >, std::__1::unique_ptr<kudu::security::Cert, std::__1::default_delete<kudu::security::Cert> >) /root/Projects/kudu/src/kudu/master/catalog_manager.cc:1249:5 (libmaster.so+0x323366)
#5 kudu::master::CatalogManager::PrepareFollowerCaInfo()::$_13::operator()() const /root/Projects/kudu/src/kudu/master/catalog_manager.cc:1573:12 (libmaster.so+0x35d1fb)
#6 kudu::Status kudu::Status::AndThen<kudu::master::CatalogManager::PrepareFollowerCaInfo()::$_13>(kudu::master::CatalogManager::PrepareFollowerCaInfo()::$_13) /root/Projects/kudu/src/kudu/util/status.h:241:14 (libmaster.so+0x325e4d)
#7 kudu::master::CatalogManager::PrepareFollowerCaInfo() /root/Projects/kudu/src/kudu/master/catalog_manager.cc:1572:49 (libmaster.so+0x325c59)
#8 kudu::master::CatalogManager::PrepareFollower(kudu::MonoTime*) /root/Projects/kudu/src/kudu/master/catalog_manager.cc:1618:5 (libmaster.so+0x32091d)
#9 kudu::master::CatalogManagerBgTasks::Run() /root/Projects/kudu/src/kudu/master/catalog_manager.cc:873:38 (libmaster.so+0x31e3d1)
#10 kudu::master::CatalogManagerBgTasks::Init()::$_0::operator()() const /root/Projects/kudu/src/kudu/master/catalog_manager.cc:773:3 (libmaster.so+0x352211)
...
Previous read of size 8 at 0x7b1c000033a0 by thread T89:
#0 memcpy sanitizer_common/sanitizer_common_interceptors.inc:808:5 (kudu+0x48b836)
#1 asn1_ex_i2c /usr/include/bits/string_fortified.h:29:10 (libcrypto.so.3+0xc6eca)
#2 kudu::rpc::ServerNegotiation::HandleTlsHandshake(kudu::rpc::NegotiatePB const&) /root/Projects/kudu/src/kudu/rpc/server_negotiation.cc:633:35 (libkrpc.so+0x1e92d1)
#3 kudu::rpc::ServerNegotiation::Negotiate() /root/Projects/kudu/src/kudu/rpc/server_negotiation.cc:244:18 (libkrpc.so+0x1e745a)
#4 kudu::rpc::DoServerNegotiation(kudu::rpc::Connection*, kudu::TriStateFlag, kudu::TriStateFlag, bool, kudu::MonoTime const&) /root/Projects/kudu/src/kudu/rpc/negotiation.cc:293:3 (libkrpc.so+0x188180)
#5 kudu::rpc::Negotiation::RunNegotiation(scoped_refptr<kudu::rpc::Connection> const&, kudu::TriStateFlag, kudu::TriStateFlag, bool, kudu::MonoTime) /root/Projects/kudu/src/kudu/rpc/negotiation.cc:315:9 (libkrpc.so+0x1879b5)
#6 kudu::rpc::ReactorThread::StartConnectionNegotiation(scoped_refptr<kudu::rpc::Connection> const&)::$_1::operator()() const /root/Projects/kudu/src/kudu/rpc/reactor.cc:631:3 (libkrpc.so+0x1a6edc)
...
{noformat}
The log with several instances of the TSAN warning is attached.
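In the trace, thread T84 frees an ASN.1 string while thread T89 is copying it during the handshake, and T89 holds no mutex, so nothing orders the two accesses. Another general way to avoid this free-while-reading pattern is to never mutate state that a handshake may be reading: publish an immutable snapshot and let reference counting defer freeing the old one until the last in-flight handshake releases it. The sketch below makes the same assumptions as a plain C++17 illustration ({{std::string}} in place of a certificate, no OpenSSL); {{CertSnapshot}}, {{SnapshotTlsContext}}, {{Snapshot()}} and {{RunDemo()}} are hypothetical names. Whether this maps cleanly onto Kudu's {{TlsContext}} and OpenSSL's {{SSL_CTX}} is an assumption, not something stated in this issue.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical immutable snapshot of TLS state; a std::string stands in for
// the certificate. Once published, a snapshot is never modified.
struct CertSnapshot {
  std::string cert;
};

class SnapshotTlsContext {
 public:
  explicit SnapshotTlsContext(std::string cert)
      : current_(std::make_shared<CertSnapshot>(CertSnapshot{std::move(cert)})) {}

  // Writer: builds a brand-new snapshot and publishes it. The old snapshot
  // is only destroyed once no in-flight handshake references it.
  void AdoptCert(std::string new_cert) {
    std::shared_ptr<const CertSnapshot> next =
        std::make_shared<CertSnapshot>(CertSnapshot{std::move(new_cert)});
    std::lock_guard<std::mutex> l(mu_);
    current_.swap(next);
  }  // `next` now holds the old snapshot; it drops its reference here

  // Reader: grabs a reference to the current snapshot. The mutex is held
  // only for the pointer copy, not for the whole handshake.
  std::shared_ptr<const CertSnapshot> Snapshot() const {
    std::lock_guard<std::mutex> l(mu_);
    return current_;
  }

 private:
  mutable std::mutex mu_;
  std::shared_ptr<const CertSnapshot> current_;
};

// A "handshake" takes a snapshot, the cert is replaced from another thread,
// and the handshake keeps reading its own (old) snapshot safely.
std::string RunDemo() {
  SnapshotTlsContext ctx("self-signed");
  std::shared_ptr<const CertSnapshot> in_flight = ctx.Snapshot();
  std::thread rotate([&ctx] { ctx.AdoptCert("ca-signed"); });
  rotate.join();
  // The old snapshot is still alive because `in_flight` references it.
  return in_flight->cert + "|" + ctx.Snapshot()->cert;
}
```

Compared with a reader-writer lock, this design never blocks a handshake for the duration of a certificate replacement; the trade-off is that an in-flight handshake finishes with the certificate it started with.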
--
This message was sent by Atlassian Jira
(v8.20.10#820010)