[jira] [Updated] (KUDU-1520) Possible race between alter schema lock release and tablet shutdown
[ https://issues.apache.org/jira/browse/KUDU-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-1520:
--
Target Version/s: (was: 1.5.0)

> Possible race between alter schema lock release and tablet shutdown
> ---
>
> Key: KUDU-1520
> URL: https://issues.apache.org/jira/browse/KUDU-1520
> Project: Kudu
> Issue Type: Bug
> Components: tablet
> Affects Versions: 0.9.1
> Reporter: Adar Dembo
> Priority: Major
>
> I've been running a new stress that hammers a cluster with concurrent alter
> and delete table requests, and one of my test runs failed with the following:
> {noformat}
> F0707 18:59:34.311122 373 rw_semaphore.h:145] Check failed: base::subtle::NoBarrier_Load(&state_) == kWriteFlag (0 vs. 2147483648)
> *** Check failure stack trace: ***
> @ 0x7f86cd37df5d google::LogMessage::Fail() at ??:0
> @ 0x7f86cd37fe5d google::LogMessage::SendToLog() at ??:0
> @ 0x7f86cd37da99 google::LogMessage::Flush() at ??:0
> @ 0x7f86cd3808ff google::LogMessageFatal::~LogMessageFatal() at ??:0
> @ 0x7f86d4f77c78 kudu::rw_semaphore::unlock() at ??:0
> @ 0x7f86d3728de0 std::unique_lock<>::unlock() at ??:0
> @ 0x7f86d3727192 std::unique_lock<>::~unique_lock() at ??:0
> @ 0x7f86d3725582 kudu::tablet::AlterSchemaTransactionState::~AlterSchemaTransactionState() at ??:0
> @ 0x7f86d37255be kudu::tablet::AlterSchemaTransactionState::~AlterSchemaTransactionState() at ??:0
> @ 0x7f86d4f68dce std::default_delete<>::operator()() at ??:0
> @ 0x7f86d4f670b9 std::unique_ptr<>::~unique_ptr() at ??:0
> @ 0x7f86d374510e kudu::tablet::AlterSchemaTransaction::~AlterSchemaTransaction() at ??:0
> @ 0x7f86d374514a kudu::tablet::AlterSchemaTransaction::~AlterSchemaTransaction() at ??:0
> @ 0x7f86d373f532 kudu::DefaultDeleter<>::operator()() at ??:0
> @ 0x7f86d373df4a kudu::internal::gscoped_ptr_impl<>::~gscoped_ptr_impl() at ??:0
> @ 0x7f86d373d552 gscoped_ptr<>::~gscoped_ptr() at ??:0
> @ 0x7f86d373d580 kudu::tablet::TransactionDriver::~TransactionDriver() at ??:0
> @ 0x7f86d3740ab4 kudu::RefCountedThreadSafe<>::DeleteInternal() at ??:0
> @ 0x7f86d3740405 kudu::DefaultRefCountedThreadSafeTraits<>::Destruct() at ??:0
> @ 0x7f86d373f928 kudu::RefCountedThreadSafe<>::Release() at ??:0
> @ 0x7f86d373e769 scoped_refptr<>::~scoped_refptr() at ??:0
> @ 0x7f86d37397cd kudu::tablet::TabletPeer::SubmitAlterSchema() at ??:0
> @ 0x7f86d4f4e070 kudu::tserver::TabletServiceAdminImpl::AlterSchema() at ??:0
> @ 0x7f86d27a4e92 _ZZN4kudu7tserver26TabletServerAdminServiceIfC4ERK13scoped_refptrINS_12MetricEntityEEENKUlPKN6google8protobuf7MessageEPS9_PNS_3rpc10RpcContextEE1_clESB_SC_SF_ at ??:0
> @ 0x7f86d27a5d96 _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver26TabletServerAdminServiceIfC4ERK13scoped_refptrINS6_12MetricEntityEEEUlS4_S5_S9_E1_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
> @ 0x7f86d22ce6e4 std::function<>::operator()() at ??:0
> @ 0x7f86d22ce19b kudu::rpc::GeneratedServiceIf::Handle() at ??:0
> @ 0x7f86d22d0a97 kudu::rpc::ServicePool::RunThread() at ??:0
> @ 0x7f86d22d1d45 boost::_mfi::mf0<>::operator()() at ??:0
> @ 0x7f86d22d1b6c boost::_bi::list1<>::operator()<>() at ??:0
> @ 0x7f86d22d1a61 boost::_bi::bind_t<>::operator()() at ??:0
> @ 0x7f86d22d1998 boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
> {noformat}
> After looking through the code a bit, I suspect this happened because, in the
> event of failure, the AlterSchema transaction releases the tablet's schema
> lock implicitly (i.e. when AlterSchemaTransactionState is destroyed) _after_
> the transaction itself is removed from the driver's TransactionTracker. Thus,
> the WaitForAllToFinish() performed during the tablet shutdown process thinks
> all the transactions are done and proceeds to free tablet state. Later, the
> last reference to the transaction is released (in
> TabletPeer::SubmitAlterSchema), the transaction is destroyed, and we try to
> unlock a lock whose memory has already been freed.
> If this analysis is correct, the broken invariant is: once the transaction
> has been released from the tracker, it may no longer access any tablet state.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
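The invariant stated at the end of the report (once a transaction has been released from the tracker, it may no longer access tablet state) can be illustrated with a small single-threaded sketch. This is not Kudu code and not a proposed patch: TabletState, Tracker, AlterState, FinishFailed and RunFailedAlterThenShutdown are hypothetical stand-ins, std::mutex stands in for kudu::rw_semaphore, and AllFinished() is only a non-blocking approximation of what WaitForAllToFinish() checks. The sketch assumes one possible ordering on the failure path: release the schema lock explicitly, then untrack.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for the tablet state that owns the schema lock.
// std::mutex is used here in place of kudu::rw_semaphore.
struct TabletState {
  std::mutex schema_lock;
};

// Hypothetical stand-in for the driver's TransactionTracker.
class Tracker {
 public:
  void Add(const void* txn) { in_flight_.insert(txn); }
  void Release(const void* txn) { in_flight_.erase(txn); }
  // Approximates the condition that shutdown waits for.
  bool AllFinished() const { return in_flight_.empty(); }

 private:
  std::unordered_set<const void*> in_flight_;
};

// Hypothetical stand-in for AlterSchemaTransactionState: it acquires the
// schema lock on construction and would release it implicitly on destruction.
class AlterState {
 public:
  explicit AlterState(TabletState* tablet) : lock_(tablet->schema_lock) {}

  // Explicit release, so the destructor has nothing left to unlock.
  void ReleaseSchemaLock() {
    if (lock_.owns_lock()) lock_.unlock();
  }
  bool HoldsSchemaLock() const { return lock_.owns_lock(); }

 private:
  std::unique_lock<std::mutex> lock_;
};

// Assumed failure path: drop the schema lock first, untrack second.
void FinishFailed(AlterState* state, Tracker* tracker) {
  state->ReleaseSchemaLock();
  tracker->Release(state);
}

// Replays the sequence from the report in a single thread. Returns true if
// the lock was already released when the transaction was untracked and the
// tablet state was freed by the simulated shutdown.
bool RunFailedAlterThenShutdown() {
  std::unique_ptr<TabletState> tablet(new TabletState);
  Tracker tracker;
  std::unique_ptr<AlterState> state(new AlterState(tablet.get()));
  tracker.Add(state.get());

  FinishFailed(state.get(), &tracker);
  const bool released_before_untrack = !state->HoldsSchemaLock();

  // Simulated shutdown: free tablet state once nothing is tracked.
  if (tracker.AllFinished()) tablet.reset();

  // The last reference goes away after shutdown. The unique_lock no longer
  // owns the mutex, so its destructor does not touch the freed memory.
  state.reset();
  return released_before_untrack && tablet == nullptr;
}
```

Calling RunFailedAlterThenShutdown() from a main() exercises the sequence. If the two calls in FinishFailed were swapped back to the implicit, destructor-time release described in the report, the final state.reset() would unlock a mutex that the simulated shutdown had already destroyed.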
[jira] [Updated] (KUDU-1520) Possible race between alter schema lock release and tablet shutdown
[ https://issues.apache.org/jira/browse/KUDU-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1520:
--
Priority: Major (was: Critical)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)