[ https://issues.apache.org/jira/browse/KUDU-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Percy updated KUDU-2118:
-----------------------------
    Target Version/s: 1.5.0
         Code Review: https://gerrit.cloudera.org/7883

> Running RaftConsensus instances should not be destroyed by reactor threads
> --------------------------------------------------------------------------
>
>                 Key: KUDU-2118
>                 URL: https://issues.apache.org/jira/browse/KUDU-2118
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.5.0
>            Reporter: Adar Dembo
>            Assignee: Mike Percy
>            Priority: Critical
>         Attachments: 07e4b47e517a4d44b1b8cbdaed95e216.txt, 0_create-table-stress-test.txt.gz
>
>
> RaftConsensus is an object with shared ownership, and one of its invariants
> is that the last ref may be dropped (and thus the object destroyed) by the
> reactor thread, but if that happens, RaftConsensus must already be shut down,
> because the act of shutting down may wait, and reactor threads aren't allowed
> to wait.
> And yet, here's a pre-commit test failure showing otherwise. In it, a reactor
> thread destroys a LeaderElection object, which destroys the embedded
> ElectionDecisionCallback, which had the last ref to RaftConsensus, which then
> destroys it. Normally the Shutdown call in the destructor would no-op, but
> apparently it's going through a full stop sequence instead.
> {noformat}
> thread_restrictions.cc:79] Check failed: LoadTLS()->wait_allowed Waiting is not allowed to be used on this thread to prevent server-wide latency aberrations and deadlocks. Thread 3852 (name: "rpc reactor", category: "reactor")
>     @ 0x7fcfc8864507 kudu::ThreadRestrictions::AssertWaitAllowed() at ??:0
>     @ 0x7fcfc55de12f kudu::consensus::RaftConsensus::Stop() at ??:0
>     @ 0x7fcfc55de6aa kudu::consensus::RaftConsensus::Shutdown() at ??:0
>     @ 0x7fcfc55cdba4 kudu::consensus::RaftConsensus::~RaftConsensus() at ??:0
>     @ 0x7fcfc55fab95 __gnu_cxx::new_allocator<>::destroy<>() at ??:0
>     @ 0x7fcfc55fab47 std::allocator_traits<>::_S_destroy<>() at ??:0
>     @ 0x7fcfc55faae9 std::allocator_traits<>::destroy<>() at ??:0
>     @ 0x7fcfc55fa91b std::_Sp_counted_ptr_inplace<>::_M_dispose() at ??:0
>     @ 0x4304fa std::_Sp_counted_base<>::_M_release() at /usr/include/c++/4.8/bits/shared_ptr_base.h:158
>     @ 0x42e68f std::__shared_count<>::~__shared_count() at /usr/include/c++/4.8/bits/shared_ptr_base.h:547
>     @ 0x7fcfcb8a4032 std::__shared_ptr<>::~__shared_ptr() at ??:0
>     @ 0x7fcfcb8a4072 std::shared_ptr<>::~shared_ptr() at ??:0
>     @ 0x7fcfc55ed4d4 std::_Head_base<>::~_Head_base() at ??:0
>     @ 0x7fcfc55ed4f2 _ZNSt11_Tuple_implILm0EJSt10shared_ptrIN4kudu9consensus13RaftConsensusEENS3_14ElectionReasonESt12_PlaceholderILi1EEEED1Ev at ??:0
>     @ 0x7fcfc55ed50c std::tuple<>::~tuple() at ??:0
>     @ 0x7fcfc55ed52a std::_Bind<>::~_Bind() at ??:0
>     @ 0x7fcfc55f6162 std::_Function_base::_Base_manager<>::_M_destroy() at ??:0
>     @ 0x7fcfc55f34ed std::_Function_base::_Base_manager<>::_M_manager() at ??:0
>     @ 0x7fcfcbe5d5c5 std::_Function_base::~_Function_base() at ??:0
>     @ 0x7fcfc55b0d18 std::function<>::~function() at ??:0
>     @ 0x7fcfc55add9d kudu::consensus::LeaderElection::~LeaderElection() at ??:0
>     @ 0x7fcfc55b699a kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
>     @ 0x7fcfc55b697a kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
>     @ 0x7fcfc55b6960 kudu::RefCountedThreadSafe<>::Release() at ??:0
>     @ 0x7fcfc55b6936 kudu::internal::MaybeRefcount<>::Release() at ??:0
>     @ 0x7fcfc55b68c4 kudu::internal::BindState<>::~BindState() at ??:0
>     @ 0x7fcfc55b6910 kudu::internal::BindState<>::~BindState() at ??:0
>     @ 0x7fcfcb44f23d kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
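
The ownership hazard described in the report can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not Kudu's code: Consensus, g_wait_allowed, g_wait_violations, RunAsReactor, and DropLastRefOnReactor are hypothetical names invented for the sketch, the "reactor thread" is simulated by clearing a thread-local flag, and a counter stands in for the fatal check seen in the trace.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical per-thread "waiting allowed" flag, loosely modeled on the
// wait_allowed check that fires in the trace. Not Kudu's implementation.
thread_local bool g_wait_allowed = true;

// Number of blocking stops attempted where waiting is forbidden.
// A real server would abort at this point; the sketch only counts.
int g_wait_violations = 0;

// Hypothetical stand-in for an object with shared ownership whose
// destructor shuts it down.
class Consensus {
 public:
  ~Consensus() { Shutdown(); }

  // Idempotent: a second call is a no-op.
  void Shutdown() {
    if (shut_down_) return;
    Stop();
    shut_down_ = true;
  }

 private:
  // Stands in for a stop sequence that may block.
  void Stop() {
    if (!g_wait_allowed) ++g_wait_violations;
  }

  bool shut_down_ = false;
};

// Runs `body` with waiting forbidden, imitating a reactor thread.
void RunAsReactor(const std::function<void()>& body) {
  g_wait_allowed = false;
  body();
  g_wait_allowed = true;
}

// Destroys a callback that holds the last reference while on the simulated
// reactor thread, and returns how many violations that caused.
int DropLastRefOnReactor(bool shutdown_first) {
  g_wait_violations = 0;
  auto consensus = std::make_shared<Consensus>();
  // The callback captures a shared_ptr, much like a bound election callback.
  std::function<void()> callback = [consensus]() { (void)consensus; };
  if (shutdown_first) consensus->Shutdown();
  consensus.reset();  // The callback now holds the only reference.
  // Clearing the callback runs ~Consensus() on the "reactor" thread.
  RunAsReactor([&callback]() { callback = nullptr; });
  return g_wait_violations;
}
```

Calling DropLastRefOnReactor(true) models the intended invariant: the destructor's Shutdown call is a no-op because the object was already shut down. Calling DropLastRefOnReactor(false) models the failure in the trace, where the destructor runs the full stop sequence on a thread that must not wait.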