[ https://issues.apache.org/jira/browse/KUDU-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3149: -------------------------------- Status: In Review (was: Open) > Lock contention between registering ops and computing maintenance op stats > -------------------------------------------------------------------------- > > Key: KUDU-3149 > URL: https://issues.apache.org/jira/browse/KUDU-3149 > Project: Kudu > Issue Type: Bug > Components: perf, tserver > Reporter: Andrew Wong > Priority: Critical > > We saw a bunch of tablets bootstrapping extremely slowly, and many stuck > supposedly bootstrapping, but not showing up in the {{/tablets}} page, i.e. > we could only see INITIALIZED and RUNNING tablets, no BOOTSTRAPPING. > Upon digging into the stacks, we saw a bunch waiting to acquire the MM lock: > {code:java} > TID 46577(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb59b99 kudu::tablet::Tablet::RegisterMaintenanceOps() > @ 0xb855a1 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46574(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb59c74 kudu::tablet::Tablet::RegisterMaintenanceOps() > @ 0xb855a1 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > 7 threads with same stack: > TID 46575(tablet-open [wo): > TID 46576(tablet-open [wo): > TID 46578(tablet-open [wo): > TID 46580(tablet-open [wo): > TID 46581(tablet-open [wo): > TID 46582(tablet-open [wo): > TID 46583(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb85374 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46573(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb854c7 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > 2 threads with same stack: > TID 43795(MaintenanceMgr ): > TID 43796(MaintenanceMgr ): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x239a064 kudu::MaintenanceManager::LaunchOp() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > {code} > A couple more stacks show some work being done by the maintenance manager: > {code:java} > TID 43794(MaintenanceMgr ): > @ 0x7f1dd57147e0 (unknown) > @ 0xba7b41 > kudu::tablet::BudgetedCompactionPolicy::RunApproximation() > @ 0xba8f5d > kudu::tablet::BudgetedCompactionPolicy::PickRowSets() > @ 0xb5b1a1 kudu::tablet::Tablet::PickRowSetsToCompact() > @ 0xb64e93 kudu::tablet::Tablet::Compact() > @ 0xb81071 kudu::tablet::CompactRowSetsOp::Perform() > @ 0x2399c77 kudu::MaintenanceManager::LaunchOp() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46999(maintenance_sch): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0xb51f29 kudu::tablet::Tablet::UpdateCompactionStats() > @ 0xb7f435 kudu::tablet::CompactRowSetsOp::UpdateStats() > @ 0x23956e4 kudu::MaintenanceManager::FindBestOp() > @ 0x2396af9 kudu::MaintenanceManager::FindAndLaunchOp() > @ 0x2397858 kudu::MaintenanceManager::RunSchedulerThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > {code} > So the question is, why is updating stats taking so long? Getting the full > stacks, the scheduler thread is updating compaction stats, but is waiting to > take a lock: > {code:java} > Thread 4 (Thread 0x7f1d7d358700 (LWP 46999)): > #0 0x00007f1dd5713334 in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x00007f1dd570e5d8 in _L_lock_854 () from /lib64/libpthread.so.0 > #2 0x00007f1dd570e4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x0000000000b51f29 in > kudu::tablet::Tablet::UpdateCompactionStats(kudu::MaintenanceOpStats*) () > #4 0x0000000000b7f435 in > kudu::tablet::CompactRowSetsOp::UpdateStats(kudu::MaintenanceOpStats*) () > #5 0x00000000023956e4 in kudu::MaintenanceManager::FindBestOp() () > #6 0x0000000002396af9 in > kudu::MaintenanceManager::FindAndLaunchOp(std::unique_lock<kudu::Mutex>*) () > #7 0x0000000002397858 in kudu::MaintenanceManager::RunSchedulerThread() () > {code} > And a compaction thread running the rowset-picking algorithm, which happens > while the {{compact_select_lock_}} is held: > {code:java} > Thread 125 (Thread 0x7f1dce377700 (LWP 43794)): > #0 0x0000000000ba7b90 in > kudu::tablet::BudgetedCompactionPolicy::RunApproximation(std::vector<kudu::tablet::RowSetInfo, > std::allocator<kudu::tablet::RowSetInfo> > const&, > std::vector<kudu::tablet::RowSetInfo, > std::allocator<kudu::tablet::RowSetInfo> > const&, std::vector<double, > std::allocator<double> >*, > kudu::tablet::BudgetedCompactionPolicy::SolutionAndValue*) const () > #1 0x0000000000ba8f5d in > kudu::tablet::BudgetedCompactionPolicy::PickRowSets(kudu::tablet::RowSetTree > const&, std::unordered_set<kudu::tablet::RowSet const*, > std::hash<kudu::tablet::RowSet const*>, std::equal_to<kudu::tablet::RowSet > const*>, std::allocator<kudu::tablet::RowSet const*> >*, double*, > std::vector<std::basic_string<char, std::char_traits<char>, > std::allocator<char> >, std::allocator<std::basic_string<char, > std::char_traits<char>, std::allocator<char> > > >*) () > #2 0x0000000000b5b1a1 in > kudu::tablet::Tablet::PickRowSetsToCompact(kudu::tablet::RowSetsInCompaction*, > int) const () > #3 0x0000000000b64e93 in kudu::tablet::Tablet::Compact(int) () > #4 0x0000000000b81071 in kudu::tablet::CompactRowSetsOp::Perform() () > #5 0x0000000002399c77 in > kudu::MaintenanceManager::LaunchOp(kudu::MaintenanceOp*) () > #6 0x00000000023f994c in kudu::ThreadPool::DispatchThread() () > #7 0x00000000023f3f8b in kudu::Thread::SuperviseThread(void*) () > #8 0x00007f1dd570caa1 in start_thread () from /lib64/libpthread.so.0 > #9 0x00007f1dd3b18bcd in clone () from /lib64/libc.so.6 > {code} > It seems the computing of stats is being blocked by on-going compactions, > contending with the {{compact_select_lock_}}. Additionally, we're holding the > maintenance manager's {{lock_}} member, for the duration of us computing > stats, which is contending with the registration of maintenance manager ops. > So the compaction selection is blocking the scheduler thread, and is thus > effectively blocking the registration of many tablet replicas' ops, and > blocking further bootstrapping. > A couple things come to mind with regards to the registration issue: > - We could probably take a snapshot of the ops under lock and release the > lock_ when finding the best op to run, so op registration would be somewhat > independent of scheduling. > - Additionally, we may want to consider disabling compactions entirely until > the initial set of tablets finishes bootstrapping. > It's worth noting that it isn't the act of compacting that is contending > directly with registration per se, but rather the computation of the stats. > The computation of stats happens to be contending with on-going compactions. > Improving either contention would likely help. > As a workaround, we used the {{set_flag}} tool to disable compactions on the > node and noted significantly faster bootstrapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)