[ 
https://issues.apache.org/jira/browse/KUDU-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774594#comment-16774594
 ] 

Will Berkeley commented on KUDU-2708:
-------------------------------------

Alexey's probably right that it's mostly about doing I/O under the lock.

From assorted investigations, this appears to be the biggest culprit in 
service queue overflows.

What's not clear is how the initial elections start. This phenomenon relies on 
voting in a real (non-pre-) election, because only then must cmeta be flushed 
to record the vote. So what triggered the original pre-elections, and what 
caused their candidates to win, triggering a regular election?
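
To make the distinction concrete, here is a toy sketch of the behavior described above. It is not Kudu's RaftConsensus code: ToyReplica, its members, and the simplified term checks are all hypothetical, and the "flush" is just a counter standing in for the cmeta write. It only illustrates that a pre-election vote changes nothing durable, while a real vote performs the flush while the lock is still held.

```cpp
#include <mutex>
#include <string>

// Toy model only; every name here is hypothetical and the Raft rules are
// heavily simplified.
class ToyReplica {
 public:
  // Returns true if the vote is granted.
  bool RequestVote(long term, const std::string& candidate,
                   bool is_pre_election) {
    // Every vote request for this replica serializes on this lock.
    std::lock_guard<std::mutex> l(lock_);
    if (term < current_term_) return false;  // stale term
    if (term == current_term_ && !voted_for_.empty() &&
        voted_for_ != candidate) {
      return false;  // already voted for someone else in this term
    }
    if (is_pre_election) {
      // A pre-vote only says "I would vote for you"; nothing is persisted.
      return true;
    }
    current_term_ = term;
    voted_for_ = candidate;
    FlushUnlocked();  // stand-in for the disk write, done under lock_
    return true;
  }

  int flush_count() const {
    std::lock_guard<std::mutex> l(lock_);
    return flush_count_;
  }

 private:
  // Caller must hold lock_. A real flush would create and write a file here.
  void FlushUnlocked() { ++flush_count_; }

  mutable std::mutex lock_;
  long current_term_ = 0;
  std::string voted_for_;
  int flush_count_ = 0;
};
```

In this model, any number of pre-election requests leave flush_count() at zero. Only a granted vote in a regular election pays for the write, and every other request for the same replica waits on lock_ while it does.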

> Possible contention creating temporary files while flushing cmeta during an 
> election storm
> ------------------------------------------------------------------------------------------
>
>                 Key: KUDU-2708
>                 URL: https://issues.apache.org/jira/browse/KUDU-2708
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Will Berkeley
>            Priority: Major
>
> While investigating consensus queue overflows that happen under heavy 
> write load, I noticed that 6 of the 10 service threads at the time of 
> overflow had stacks like
> {noformat}
> 0x3b6720f710 <unknown>
>            0x1fb900a base::internal::SpinLockDelay()
>            0x1fb8ea7 base::SpinLock::SlowLock()
>             0xb82e25 kudu::consensus::RaftConsensus::RequestVote()
>             0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
>            0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
>            0x1e2935a kudu::rpc::ServicePool::RunThread()
>            0x1f9bd91 kudu::Thread::SuperviseThread()
>         0x3b672079d1 start_thread
>         0x3b66ee88fd clone
> {noformat}
> They are waiting on some tablet's Raft consensus instance's {{lock_}} in 
> order to vote. Looking into what might be holding that lock, I see stacks like
> {noformat}
> 0x3b6720f710 <unknown>
>         0x3b66edb2ed __GI_open64
>         0x3b66e63caa __gen_tempname
>            0x1f1cf35 kudu::(anonymous namespace)::PosixEnv::MkTmpFile()
>            0x1f1f662 kudu::(anonymous namespace)::PosixEnv::NewTempRWFile()
>            0x1f8305e kudu::pb_util::WritePBContainerToPath()
>             0xb47932 kudu::consensus::ConsensusMetadata::Flush()
>             0xb74164 kudu::consensus::RaftConsensus::SetVotedForCurrentTermUnlocked()
>             0xb783aa kudu::consensus::RaftConsensus::RequestVoteRespondVoteGranted()
>             0xb836a1 kudu::consensus::RaftConsensus::RequestVote()
>             0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
>            0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
>            0x1e2935a kudu::rpc::ServicePool::RunThread()
>            0x1f9bd91 kudu::Thread::SuperviseThread()
>         0x3b672079d1 start_thread
>         0x3b66ee88fd clone
> {noformat}
> Doing some junior spelunking into glibc code, one hypothesis is that we are 
> generating lots of collisions of proposed temporary file names in the cmeta 
> folder because many threads are attempting to flush cmeta at once. The glibc 
> code looks like
> Maybe we could put the thread id into the temporary file name when a thread 
> does a cmeta flush.
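
As a rough sketch of the thread-id idea suggested above: the helper names, the ".tmp.t" infix, and the use of a hashed std::thread::id are all assumptions made for illustration, not what Kudu's PosixEnv::MkTmpFile() actually does. The only library behavior relied on is that POSIX mkstemp() replaces the trailing "XXXXXX" of its template in place and returns an open file descriptor.

```cpp
#include <stdlib.h>   // mkstemp (POSIX)
#include <unistd.h>   // close, unlink (used by callers for cleanup)

#include <functional>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper: builds a mkstemp() template that embeds a tag derived
// from the calling thread's id, so two threads never propose the same name.
std::string TmpTemplateForThisThread(const std::string& path) {
  std::ostringstream oss;
  oss << path << ".tmp.t"
      << std::hash<std::thread::id>{}(std::this_thread::get_id())
      << ".XXXXXX";  // mkstemp() requires the template to end in six X's
  return oss.str();
}

// Hypothetical helper: creates the temporary file and returns its descriptor,
// or -1 on failure. On success, *created_path holds the name that was chosen.
int CreateTmpFile(const std::string& path, std::string* created_path) {
  std::string tmpl = TmpTemplateForThisThread(path);
  int fd = mkstemp(&tmpl[0]);  // rewrites the XXXXXX suffix in place
  if (fd >= 0) *created_path = tmpl;
  return fd;
}
```

A caller would write the new cmeta contents to the returned descriptor and then rename the file over the real path. Whether this helps depends on name collisions being the real cost, which is still only a hypothesis here.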


