[ https://issues.apache.org/jira/browse/KUDU-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774594#comment-16774594 ]
Will Berkeley commented on KUDU-2708:
-------------------------------------

Alexey's probably right that it's mostly about doing I/O under the lock. From assorted investigations, this appears to be the biggest culprit in service queue overflows.

What's not clear is how the initial elections start -- this phenomenon relies on voting in a (non-pre-) election, so that cmeta must be flushed to record the vote. So what triggered the original pre-elections, and what caused their candidates to win, triggering a regular election?

> Possible contention creating temporary files while flushing cmeta during an
> election storm
> ------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2708
>                 URL: https://issues.apache.org/jira/browse/KUDU-2708
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Will Berkeley
>            Priority: Major
>
> Doing investigation into consensus queue overflows that happen under heavy
> write load, I noticed 6/10 service threads at the time of overflow have
> stacks like
> {noformat}
>     0x3b6720f710 <unknown>
>     0x1fb900a base::internal::SpinLockDelay()
>     0x1fb8ea7 base::SpinLock::SlowLock()
>     0xb82e25 kudu::consensus::RaftConsensus::RequestVote()
>     0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
>     0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
>     0x1e2935a kudu::rpc::ServicePool::RunThread()
>     0x1f9bd91 kudu::Thread::SuperviseThread()
>     0x3b672079d1 start_thread
>     0x3b66ee88fd clone
> {noformat}
> They are waiting on some tablet's Raft consensus instance's {{lock_}} in
> order to vote.
> Looking into what might be holding that lock, I see stacks like
> {noformat}
>     0x3b6720f710 <unknown>
>     0x3b66edb2ed __GI_open64
>     0x3b66e63caa __gen_tempname
>     0x1f1cf35 kudu::(anonymous namespace)::PosixEnv::MkTmpFile()
>     0x1f1f662 kudu::(anonymous namespace)::PosixEnv::NewTempRWFile()
>     0x1f8305e kudu::pb_util::WritePBContainerToPath()
>     0xb47932 kudu::consensus::ConsensusMetadata::Flush()
>     0xb74164 kudu::consensus::RaftConsensus::SetVotedForCurrentTermUnlocked()
>     0xb783aa kudu::consensus::RaftConsensus::RequestVoteRespondVoteGranted()
>     0xb836a1 kudu::consensus::RaftConsensus::RequestVote()
>     0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
>     0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
>     0x1e2935a kudu::rpc::ServicePool::RunThread()
>     0x1f9bd91 kudu::Thread::SuperviseThread()
>     0x3b672079d1 start_thread
>     0x3b66ee88fd clone
> {noformat}
> Doing some junior spelunking into glibc code, one hypothesis is that we are
> generating lots of collisions of proposed temporary file names in the cmeta
> folder because many threads are attempting to flush cmeta at once. The glibc
> code looks like
> Maybe we could put the thread id into the temporary file name when a thread
> does a cmeta flush.
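
To make the suggestion in the issue description concrete (putting the thread id into the temporary file name), here is a minimal sketch of what that could look like with plain POSIX `mkstemp()`. It is not Kudu code: `MakeTmpTemplateForThread` and `CreateTmpFileForThread` are hypothetical helpers, the `.tmp.` infix is an assumed naming scheme, and the per-thread token is a hash of `std::this_thread::get_id()` rather than whatever thread id Kudu would actually use.

```cpp
#include <cassert>
#include <functional>  // std::hash
#include <sstream>
#include <string>
#include <thread>      // std::this_thread::get_id
#include <stdlib.h>    // mkstemp (POSIX)
#include <unistd.h>    // close, unlink (used by callers)

// Hypothetical helper: builds a mkstemp() template of the form
//   <path>.tmp.<per-thread token in hex>.XXXXXX
// The token is a hash of the calling thread's id, so two threads in the
// same process normally end up with different name prefixes.
std::string MakeTmpTemplateForThread(const std::string& path) {
  assert(!path.empty());
  const size_t token =
      std::hash<std::thread::id>{}(std::this_thread::get_id());
  std::ostringstream out;
  out << path << ".tmp." << std::hex << token << ".XXXXXX";
  return out.str();
}

// Hypothetical helper: creates the temporary file for the calling thread.
// Returns the open file descriptor, or -1 on failure (errno is set by
// mkstemp). On success, *created holds the name that was actually used.
int CreateTmpFileForThread(const std::string& path, std::string* created) {
  std::string tmpl = MakeTmpTemplateForThread(path);
  // mkstemp() rewrites the trailing "XXXXXX" in place, so it needs a
  // writable, NUL-terminated buffer; std::string provides one since C++11.
  const int fd = mkstemp(&tmpl[0]);
  if (fd >= 0) {
    *created = tmpl;
  }
  return fd;
}
```

If the collision hypothesis holds, a per-thread prefix like this would mean that concurrent flushes from different threads never propose the same candidate name, so the exclusive-create `open()` inside `mkstemp()` should not need to retry because of a sibling thread. A hash of `std::thread::id` is not guaranteed to be unique, so a real change would more likely use the OS-level thread id.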