[
https://issues.apache.org/jira/browse/KUDU-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Serbin updated KUDU-3620:
--------------------------------
Attachment: ts_recovery-itest.ubsan.log.xz
> Race condition in OpDriver::ReplicationFinished()
> -------------------------------------------------
>
> Key: KUDU-3620
> URL: https://issues.apache.org/jira/browse/KUDU-3620
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Reporter: Alexey Serbin
> Priority: Major
> Attachments: ts_recovery-itest.asan.txt.xz,
> ts_recovery-itest.sigsegv.txt.xz, ts_recovery-itest.ubsan.log.xz
>
>
> There is a race condition in {{OpDriver::ReplicationFinished}}. With commit
> [1b99da532|https://github.com/apache/kudu/commit/1b99da532f52d143c46440c3903785d642fb45a3]
> applied, it manifests itself in the following ways when running ts_recovery-itest:
> # A tablet server crashes with SIGSEGV (DEBUG builds and probably RELEASE
> builds as well)
> # The sanitizers report errors (ASAN builds)
> ## AddressSanitizer reports a heap-use-after-free error
> ## UndefinedBehaviorSanitizer reports a run-time error due to an invalid vptr
> Full logs are attached.
> The stack trace for item 1:
> {noformat}
> *** Aborted at 1727269462 (unix time) try "date -d @1727269462" if you are
> using GNU date ***
> PC: @ 0x0 (unknown)
> *** SIGSEGV (@0x30) received by PID 14694 (TID 0x7f734f91b700) from PID 48;
> stack trace: ***
> @ 0x7f73830a5980 (unknown) at ??:0
> @ 0x7f73848b3db6 kudu::tablet::OpState::tablet_replica() at ??:0
> @ 0x7f73848d55c3 kudu::tablet::OpDriver::ReplicationFinished() at ??:0
> @ 0x7f73848aa27e
> _ZZN4kudu6tablet13TabletReplica15StartFollowerOpERK13scoped_refptrINS_9consensus14ConsensusRoundEEENKUlRKNS_6StatusEE_clESA_
> at ??:0
> @ 0x7f73848b0f41
> _ZNSt17_Function_handlerIFvRKN4kudu6StatusEEZNS0_6tablet13TabletReplica15StartFollowerOpERK13scoped_refptrINS0_9consensus14ConsensusRoundEEEUlS3_E_E9_M_invokeERKSt9_Any_dataS3_
> at ??:0
> @ 0x7f7386351325 std::function<>::operator()() at ??:0
> @ 0x7f7384407f2b
> kudu::consensus::ConsensusRound::NotifyReplicationFinished() at ??:0
> @ 0x7f73843d774b
> kudu::consensus::PendingRounds::AdvanceCommittedIndex() at ??:0
> @ 0x7f73843f6888 kudu::consensus::RaftConsensus::UpdateReplica() at
> ??:0
> @ 0x7f73843f1ef5 kudu::consensus::RaftConsensus::Update() at ??:0
> @ 0x7f7385467de7
> kudu::tserver::ConsensusServiceImpl::UpdateConsensus() at ??:0
> @ 0x7f7383c95fd2
> _ZZN4kudu9consensus18ConsensusServiceIfC4ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_
> at ??:0
> @ 0x7f7383c9a063
> _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_9consensus18ConsensusServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E0_E9_M_invokeERKSt9_Any_dataOS4_OS5_OS9_
> at ??:0
> @ 0x7f73834af4b8 std::function<>::operator()() at ??:0
> @ 0x7f73834aed6c kudu::rpc::GeneratedServiceIf::Handle() at ??:0
> @ 0x7f73834b1a7d kudu::rpc::ServicePool::RunThread() at ??:0
> @ 0x7f73834b03c7 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv at ??:0
> @ 0x7f73834b1e06
> _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data
> at ??:0
> @ 0x55ab245f526e std::function<>::operator()() at ??:0
> @ 0x7f7382853bb1 kudu::Thread::SuperviseThread() at ??:0
> @ 0x7f738309a6db start_thread at ??:0
> @ 0x7f73805ae71f clone at ??:0
> {noformat}
> A sample output for item 2.1:
> {noformat}
> ==26864==ERROR: AddressSanitizer: heap-use-after-free on address
> 0x617000212830 at pc 0x7fd36dc2c636 bp 0x7fd32f986530 sp 0x7fd32f986528
> READ of size 8 at 0x617000212830 thread T84 (rpc worker-2694)
> #0 0x7fd36dc2c635 in kudu::tablet::OpState::tablet_replica() const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op.h:189:12
> #1 0x7fd36dc70732 in
> kudu::tablet::OpDriver::ReplicationFinished(kudu::Status const&)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.cc:443:37
> #2 0x7fd36dc20493 in
> kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound>
> const&)::$_7::operator()(kudu::Status const&) const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/tablet_replica.cc:857:51
> #3 0x7fd36dc202fc in std::_Function_handler<void (kudu::Status const&),
> kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound>
> const&)::$_7>::_M_invoke(std::_Any_data const&, kudu::Status const&)
> ../../../include/c++/7.5.0/bits/std_function.h:316:2
> #4 0x7fd37460bd0d in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const
> ../../../include/c++/7.5.0/bits/std_function.h:706:14
> #5 0x7fd36c940afc in
> kudu::consensus::ConsensusRound::NotifyReplicationFinished(kudu::Status
> const&)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/raft_consensus.cc:3311:3
> #6 0x7fd36c8cdbbc in
> kudu::consensus::PendingRounds::AdvanceCommittedIndex(long)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/pending_rounds.cc:185:12
> #7 0x7fd36c916f16 in
> kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB
> const*, kudu::consensus::ConsensusResponsePB*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/raft_consensus.cc:1530:5
> #8 0x7fd36c914e57 in
> kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB
> const*, kudu::consensus::ConsensusResponsePB*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/consensus/raft_consensus.cc:1097:14
> #9 0x7fd3705ec7ad in
> kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB
> const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tserver/tablet_service.cc:1764:25
> #10 0x7fd36ace9b56 in
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
> const&, scoped_refptr<kudu::rpc::ResultTracker>
> const&)::$_1::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/build/asan/src/kudu/consensus/consensus.service.cc:299:13
> #11 0x7fd36ace9885 in std::_Function_handler<void
> (google::protobuf::Message const*, google::protobuf::Message*,
> kudu::rpc::RpcContext*),
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
> const&, scoped_refptr<kudu::rpc::ResultTracker>
> const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message
> const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&)
> ../../../include/c++/7.5.0/bits/std_function.h:316:2
> #12 0x7fd367dc924e in std::function<void (google::protobuf::Message
> const*, google::protobuf::Message*,
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> ../../../include/c++/7.5.0/bits/std_function.h:706:14
> #13 0x7fd367dc812e in
> kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/rpc/service_if.cc:137:3
> #14 0x7fd367dce365 in kudu::rpc::ServicePool::RunThread()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/rpc/service_pool.cc:229:15
> #15 0x7fd367dcec8f in
> kudu::rpc::ServicePool::Init(int)::$_0::operator()() const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/rpc/service_pool.cc:92:5
> #16 0x7fd367dceab8 in std::_Function_handler<void (),
> kudu::rpc::ServicePool::Init(int)::$_0>::_M_invoke(std::_Any_data const&)
> ../../../include/c++/7.5.0/bits/std_function.h:316:2
> #17 0xa86d2c in std::function<void ()>::operator()() const
> ../../../include/c++/7.5.0/bits/std_function.h:706:14
> #18 0x7fd36108db5d in kudu::Thread::SuperviseThread(void*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/thread.cc:693:3
> #19 0x7fd36446b6da in start_thread
> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
> #20 0x7fd35d1fa71e in clone (/lib/x86_64-linux-gnu/libc.so.6+0x12171e)
> 0x617000212830 is located 48 bytes inside of 688-byte region
> [0x617000212800,0x617000212ab0)
> freed by thread T140 (apply [worker]-) here:
> #0 0x9557b0 in operator delete(void*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/thirdparty/src/llvm-11.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cpp:160
> #1 0x7fd36dca4f0a in kudu::tablet::WriteOpState::~WriteOpState()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/write_op.cc:665:31
> #2 0x7fd37472bf41 in
> std::default_delete<kudu::tablet::WriteOpState>::operator()(kudu::tablet::WriteOpState*)
> const ../../../include/c++/7.5.0/bits/unique_ptr.h:78:2
> #3 0x7fd37471974b in std::unique_ptr<kudu::tablet::WriteOpState,
> std::default_delete<kudu::tablet::WriteOpState> >::~unique_ptr()
> ../../../include/c++/7.5.0/bits/unique_ptr.h:263:4
> #4 0x7fd36dca9c64 in kudu::tablet::WriteOp::~WriteOp()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/write_op.h:345:7
> #5 0x7fd36dca9ca2 in kudu::tablet::WriteOp::~WriteOp()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/write_op.h:345:7
> #6 0x7fd36dc348d1 in
> std::default_delete<kudu::tablet::Op>::operator()(kudu::tablet::Op*) const
> ../../../include/c++/7.5.0/bits/unique_ptr.h:78:2
> #7 0x7fd36dc2700b in std::unique_ptr<kudu::tablet::Op,
> std::default_delete<kudu::tablet::Op> >::~unique_ptr()
> ../../../include/c++/7.5.0/bits/unique_ptr.h:263:4
> #8 0x7fd36dc44252 in kudu::tablet::OpDriver::~OpDriver()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.h:304:16
> #9 0x7fd36dc4421a in kudu::RefCountedThreadSafe<kudu::tablet::OpDriver,
> kudu::DefaultRefCountedThreadSafeTraits<kudu::tablet::OpDriver>
> >::DeleteInternal(kudu::tablet::OpDriver const*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:153:44
> #10 0x7fd36dc441f0 in
> kudu::DefaultRefCountedThreadSafeTraits<kudu::tablet::OpDriver>::Destruct(kudu::tablet::OpDriver
> const*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:116:5
> #11 0x7fd36dc441be in kudu::RefCountedThreadSafe<kudu::tablet::OpDriver,
> kudu::DefaultRefCountedThreadSafeTraits<kudu::tablet::OpDriver> >::Release()
> const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:144:7
> #12 0x7fd36dc270e7 in
> scoped_refptr<kudu::tablet::OpDriver>::~scoped_refptr()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/gutil/ref_counted.h:266:13
> #13 0x7fd36dc71f53 in kudu::tablet::OpDriver::ApplyTask()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.cc:563:1
> #14 0x7fd36dc74ccb in
> kudu::tablet::OpDriver::ApplyAsync()::$_2::operator()() const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/tablet/ops/op_driver.cc:504:47
> #15 0x7fd36dc74b48 in std::_Function_handler<void (),
> kudu::tablet::OpDriver::ApplyAsync()::$_2>::_M_invoke(std::_Any_data const&)
> ../../../include/c++/7.5.0/bits/std_function.h:316:2
> #16 0xa86d2c in std::function<void ()>::operator()() const
> ../../../include/c++/7.5.0/bits/std_function.h:706:14
> #17 0x7fd3610af604 in kudu::ThreadPool::DispatchThread()
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/threadpool.cc:776:7
> #18 0x7fd3610b2c2b in kudu::ThreadPool::CreateThread()::$_2::operator()()
> const
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/threadpool.cc:849:48
> #19 0x7fd3610b2aa8 in std::_Function_handler<void (),
> kudu::ThreadPool::CreateThread()::$_2>::_M_invoke(std::_Any_data const&)
> ../../../include/c++/7.5.0/bits/std_function.h:316:2
> #20 0xa86d2c in std::function<void ()>::operator()() const
> ../../../include/c++/7.5.0/bits/std_function.h:706:14
> #21 0x7fd36108db5d in kudu::Thread::SuperviseThread(void*)
> /home/jenkins-slave/workspace/build_and_test_flaky@2/src/kudu/util/thread.cc:693:3
> #22 0x7fd36446b6da in start_thread
> (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
> {noformat}
> A sample output for item 2.2:
> {noformat}
> /root/Projects/kudu/src/kudu/tablet/ops/op.h:189:12: runtime error: member
> access within address 0x617000118e80 which does not point to an object of
> type 'const kudu::tablet::OpState'
> 0x617000118e80: note: object has invalid vptr
> 7f 00 80 6c 78 00 00 7e 44 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00
> ^~~~~~~~~~~~~~~~~~~~~~~
> invalid vptr
> #0 0x7f44c3e91f3d in kudu::tablet::OpState::tablet_replica() const
> /root/Projects/kudu/src/kudu/tablet/ops/op.h:189:12
> #1 0x7f44c3ed5762 in
> kudu::tablet::OpDriver::ReplicationFinished(kudu::Status const&)
> /root/Projects/kudu/src/kudu/tablet/ops/op_driver.cc:443:37
> #2 0x7f44c3e85ca3 in
> kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound>
> const&)::$_7::operator()(kudu::Status const&) const
> /root/Projects/kudu/src/kudu/tablet/tablet_replica.cc:857:51
> #3 0x7f44c3e85b0c in std::_Function_handler<void (kudu::Status const&),
> kudu::tablet::TabletReplica::StartFollowerOp(scoped_refptr<kudu::consensus::ConsensusRound>
> const&)::$_7>::_M_invoke(std::_Any_data const&, kudu::Status const&)
> ../../../include/c++/9/bits/std_function.h:300:2
> #4 0x7f44ca5fa80d in std::function<void (kudu::Status
> const&)>::operator()(kudu::Status const&) const
> ../../../include/c++/9/bits/std_function.h:688:14
> #5 0x7f44c2bd051c in
> kudu::consensus::ConsensusRound::NotifyReplicationFinished(kudu::Status
> const&) /root/Projects/kudu/src/kudu/consensus/raft_consensus.cc:3311:3
> #6 0x7f44c2b5cf49 in
> kudu::consensus::PendingRounds::AdvanceCommittedIndex(long)
> /root/Projects/kudu/src/kudu/consensus/pending_rounds.cc:187:12
> #7 0x7f44c2ba6498 in
> kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB
> const*, kudu::consensus::ConsensusResponsePB*)
> /root/Projects/kudu/src/kudu/consensus/raft_consensus.cc:1530:5
> #8 0x7f44c2ba43a7 in
> kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB
> const*, kudu::consensus::ConsensusResponsePB*)
> /root/Projects/kudu/src/kudu/consensus/raft_consensus.cc:1097:14
> #9 0x7f44c675861d in
> kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB
> const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*)
> /root/Projects/kudu/src/kudu/tserver/tablet_service.cc:1764:25
> #10 0x7f44c0fe2086 in
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
> const&, scoped_refptr<kudu::rpc::ResultTracker>
> const&)::$_1::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> /root/Projects/kudu/build/master.asan/src/kudu/consensus/consensus.service.cc:299:13
> #11 0x7f44c0fe1db5 in std::_Function_handler<void
> (google::protobuf::Message const*, google::protobuf::Message*,
> kudu::rpc::RpcContext*),
> kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr<kudu::MetricEntity>
> const&, scoped_refptr<kudu::rpc::ResultTracker>
> const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message
> const*&&, google::protobuf::Message*&&, kudu::rpc::RpcContext*&&)
> ../../../include/c++/9/bits/std_function.h:300:2
> #12 0x7f44be4e5c6e in std::function<void (google::protobuf::Message
> const*, google::protobuf::Message*,
> kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*,
> google::protobuf::Message*, kudu::rpc::RpcContext*) const
> ../../../include/c++/9/bits/std_function.h:688:14
> #13 0x7f44be4e4b4e in
> kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*)
> /root/Projects/kudu/src/kudu/rpc/service_if.cc:137:3
> #14 0x7f44be4eac35 in kudu::rpc::ServicePool::RunThread()
> /root/Projects/kudu/src/kudu/rpc/service_pool.cc:229:15
> #15 0x7f44be4eb55f in
> kudu::rpc::ServicePool::Init(int)::$_0::operator()() const
> /root/Projects/kudu/src/kudu/rpc/service_pool.cc:92:5
> #16 0x7f44be4eb388 in std::_Function_handler<void (),
> kudu::rpc::ServicePool::Init(int)::$_0>::_M_invoke(std::_Any_data const&)
> ../../../include/c++/9/bits/std_function.h:300:2
> #17 0xa097bc in std::function<void ()>::operator()() const
> ../../../include/c++/9/bits/std_function.h:688:14
> #18 0x7f44b8530a9d in kudu::Thread::SuperviseThread(void*)
> /root/Projects/kudu/src/kudu/util/thread.cc:693:3
> #19 0x7f44baff3608 in start_thread
> /build/glibc-LcI20x/glibc-2.31/nptl/pthread_create.c:477:8
> #20 0x7f44b665b352 in clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f352)
> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
> /root/Projects/kudu/src/kudu/tablet/ops/op.h:189:12 in
> {noformat}
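The reports above share one pattern: a callback on an RPC worker thread reads an object that an apply worker thread has already released. The following is a minimal sketch of one common way to rule out that class of bug, namely having the callback co-own the object it touches. It is not Kudu's code and is not claimed to be the fix for KUDU-3620. DriverSketch, OpStateSketch, MakeReplicationCallback and RunInterleaving are hypothetical names, and std::shared_ptr stands in for Kudu's scoped_refptr. The interleaving is replayed on a single thread so that the example stays deterministic.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for per-operation state; NOT Kudu's OpState.
struct OpStateSketch {
  std::string replica_name = "replica-1";
};

// Hypothetical stand-in for a driver that owns the operation state.
class DriverSketch : public std::enable_shared_from_this<DriverSketch> {
 public:
  DriverSketch() : state_(std::make_unique<OpStateSketch>()) {}

  // Returns a callback that co-owns the driver, so the state it reads
  // cannot be destroyed while the callback is still alive.
  std::function<std::string()> MakeReplicationCallback() {
    std::shared_ptr<DriverSketch> self = shared_from_this();
    return [self]() { return self->state_->replica_name; };
  }

 private:
  std::unique_ptr<OpStateSketch> state_;
};

// Replays the problematic ordering on one thread: the "apply" side drops
// its reference before the "replication finished" callback runs.
std::string RunInterleaving() {
  auto driver = std::make_shared<DriverSketch>();
  std::function<std::string()> callback = driver->MakeReplicationCallback();
  std::weak_ptr<DriverSketch> observer = driver;

  driver.reset();               // the apply side releases its reference
  assert(!observer.expired());  // the callback's reference keeps it alive

  std::string name = callback();  // safe: the state is still valid here

  callback = nullptr;          // dropping the callback drops the last reference
  assert(observer.expired());  // only now is the driver destroyed
  return name;
}
```

Calling RunInterleaving() returns the replica name even though the first owner released the driver before the callback ran. If the callback captured a raw pointer instead, the same ordering would read freed memory, which is the kind of access the sanitizer reports above flag.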