Andrew Wong created KUDU-2244:
---------------------------------

             Summary: spinlock contention in raft_consensus
                 Key: KUDU-2244
                 URL: https://issues.apache.org/jira/browse/KUDU-2244
             Project: Kudu
          Issue Type: Bug
          Components: consensus
            Reporter: Andrew Wong
I was going through the logs of a cluster that was seeing a bunch of kernel_stack_watchdog traces, and the slowness seemed to be caused by a lot of activity in consensus requests. E.g.

W1214 18:57:29.514219 36138 kernel_stack_watchdog.cc:145] Thread 36317 stuck at /data/jenkins/workspace/generic-package-centos64-7-0-impala/topdir/BUILD/kudu-1.3.0-cdh5.11.0/src/kudu/rpc/outbound_call.cc:192 for 123ms:
Kernel stack:
[<ffffffff810b7ef5>] sys_sched_yield+0x65/0xd0
[<ffffffff81645909>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

User stack:
    @     0x7f72fab92057  __GI___sched_yield
    @          0x19498bf  kudu::Thread::StartThread()
    @          0x1952e7d  kudu::ThreadPool::CreateThreadUnlocked()
    @          0x19534d3  kudu::ThreadPool::Submit()
    @          0x1953a27  kudu::ThreadPool::SubmitFunc()
    @          0x1953ecb  kudu::ThreadPool::SubmitClosure()
    @           0x9c94ec  kudu::consensus::RaftConsensus::ElectionCallback()
    @           0x9e6032  kudu::consensus::LeaderElection::CheckForDecision()
    @           0x9e78c3  kudu::consensus::LeaderElection::VoteResponseRpcCallback()
    @           0xa8b137  kudu::rpc::OutboundCall::CallCallback()
    @           0xa8c2bc  kudu::rpc::OutboundCall::SetResponse()
    @           0xa822c0  kudu::rpc::Connection::HandleCallResponse()
    @           0xa83ffc  ev::base<>::method_thunk<>()
    @          0x198a07f  ev_invoke_pending
    @          0x198af71  ev_run
    @           0xa5e049  kudu::rpc::ReactorThread::RunThread()

So the stalls seemed to be caused by slowness in spawning new threads. Upon perusing the logs a bit more, there were a sizable number of spinlock profiling traces:

W1214 18:54:27.897955 36379 rpcz_store.cc:238] Trace:
1214 18:54:26.766922 (+       0us) service_pool.cc:143] Inserting onto call queue
1214 18:54:26.771135 (+    4213us) service_pool.cc:202] Handling call
1214 18:54:26.771138 (+       3us) raft_consensus.cc:1126] Updating replica for 0 ops
1214 18:54:27.897699 (+ 1126561us) raft_consensus.cc:1165] Early marking committed up to index 0
1214 18:54:27.897700 (+       1us) raft_consensus.cc:1170] Triggering prepare for 0 ops
1214 18:54:27.897701 (+       1us) raft_consensus.cc:1282] Marking committed up to 1766
1214 18:54:27.897702 (+       1us) raft_consensus.cc:1332] Filling consensus response to leader.
1214 18:54:27.897736 (+      34us) spinlock_profiling.cc:255] Waited 991 ms on lock 0x120b3540. stack: 00000000019406c5 00000000009c60d7 00000000009c75f7 00000000007dc628 0000000000a7adfc 0000000000a7b9cd 000000000194d059 00007f72fbcc2dc4 00007f72fabad1cc ffffffffffffffff
1214 18:54:27.897737 (+       1us) raft_consensus.cc:1327] UpdateReplicas() finished
1214 18:54:27.897741 (+       4us) inbound_call.cc:130] Queueing success response
Metrics: {"spinlock_wait_cycles":2478395136}

Each of the traces noted on the order of 500-1000ms of waiting on spinlocks. Looking at raft_consensus.cc, it seems we're holding a spinlock (update_lock_) while we call RaftConsensus::UpdateReplica(), which, according to its header, "won't return until all operations have been stored in the log and all Prepares() have been completed". While some locking may be necessary here, it's worth considering using a different kind of lock.
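For illustration, here is a minimal, self-contained sketch of the pattern described above; it is not the actual Kudu source. The names update_lock_ and UpdateReplica() mirror the code referenced in this report, everything else is simplified. The point is that a test-and-set spinlock held across a long-running call forces every concurrent caller to spin (yielding, much like the sched_yield frames in the stacks above) for the full duration of the critical section.

#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Simple test-and-set spinlock: waiters busy-wait/yield instead of sleeping.
class SimpleSpinlock {
 public:
  void lock() {
    while (locked_.exchange(true, std::memory_order_acquire)) {
      std::this_thread::yield();  // analogous to the sched_yield seen in the watchdog trace
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};

SimpleSpinlock update_lock_;  // hypothetical stand-in for RaftConsensus::update_lock_

void UpdateReplica() {
  // Stand-in for the real UpdateReplica(), which "won't return until all
  // operations have been stored in the log and all Prepares() have been completed".
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
}

void Update() {
  // Every concurrent caller spins here for as long as UpdateReplica() runs,
  // which is consistent with the 500-1000ms spinlock waits in the traces.
  std::lock_guard<SimpleSpinlock> guard(update_lock_);
  UpdateReplica();
}

int main() {
  std::thread t1(Update), t2(Update);
  t1.join();
  t2.join();
  return 0;
}

A blocking lock (e.g. std::mutex in this sketch) would let waiters sleep instead of burning CPU while the long operation runs, which is one possible direction for the "different kind of lock" suggested above.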