[ 
https://issues.apache.org/jira/browse/KUDU-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16466770#comment-16466770
 ] 

Mike Percy commented on KUDU-2433:
----------------------------------

So after thinking about this more, it appears that the pc for one of the frames 
was NULL, which should be impossible. Therefore it makes me wonder if the cause 
of this was some kind of stack-smashing bug?

> Potential flake in client-negotiation-failover-itest due to negotiation 
> timeout
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-2433
>                 URL: https://issues.apache.org/jira/browse/KUDU-2433
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Mike Percy
>            Priority: Major
>         Attachments: client-negotiation-failover-itest.txt
>
>
> There appears to be a rare flake in client-negotiation-failover-itest in 
> ClientFailoverOnNegotiationTimeoutITest.Kudu1580ConnectToTServer due to a 
> negotiation timeout. This appeared on one of the test dashboards:
> {code:java}
> I0507 20:10:39.140327 1387 raft_consensus.cc:2459] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Leader pre-election lost for term 4. Reason: could not achieve 
> majority
> W0507 20:10:39.140895 296 leader_election.cc:286] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> 5014824b94b64fd49da2c1a645c078aa: Network error: Client connection 
> negotiation failed: client connection to 127.0.52.194:43942: connect: 
> Connection refused (error 111)
> I0507 20:10:39.165484 1389 raft_consensus.cc:435] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election (detected failure of leader 
> b5e887ddd8a14cf3b2c24a0aa14be025)
> I0507 20:10:39.165598 1389 raft_consensus.cc:2704] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.853s (starting election)
> I0507 20:10:39.165699 1389 raft_consensus.cc:457] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election with config: opid_index: -1 OBSOLETE_local: 
> false peers { permanent_uuid: "b5e887ddd8a14cf3b2c24a0aa14be025" member_type: 
> VOTER last_known_addr { host: "127.0.52.195" port: 38453 } } peers { 
> permanent_uuid: "d1bf0f72a88f4483a1906bc8246849dd" member_type: VOTER 
> last_known_addr { host: "127.0.52.193" port: 46057 } } peers { 
> permanent_uuid: "5014824b94b64fd49da2c1a645c078aa" member_type: VOTER 
> last_known_addr { host: "127.0.52.194" port: 43942 } }
> I0507 20:10:39.166122 1389 leader_election.cc:230] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> b5e887ddd8a14cf3b2c24a0aa14be025
> I0507 20:10:39.166318 1389 leader_election.cc:230] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> 5014824b94b64fd49da2c1a645c078aa
> W0507 20:10:39.166895 297 leader_election.cc:286] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> b5e887ddd8a14cf3b2c24a0aa14be025: Timed out: RequestConsensusVote RPC to 
> 127.0.52.195:38453 timed out after 0.633s (SENT)
> I0507 20:10:39.166954 297 leader_election.cc:257] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Election decided. Result: candidate lost.
> W0507 20:10:39.167076 296 leader_election.cc:286] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> 5014824b94b64fd49da2c1a645c078aa: Network error: Client connection 
> negotiation failed: client connection to 127.0.52.194:43942: connect: 
> Connection refused (error 111)
> I0507 20:10:39.167102 1389 raft_consensus.cc:2704] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.685s (election complete)
> I0507 20:10:39.167136 1389 raft_consensus.cc:2459] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Leader pre-election lost for term 4. Reason: could not achieve 
> majority
> W0507 20:10:39.179502 660 negotiation.cc:313] Failed RPC negotiation. Trace:
> 0507 20:10:36.182479 (+ 0us) reactor.cc:577] Submitting negotiation task for 
> client connection to 127.0.52.195:38453
> 0507 20:10:36.182694 (+ 215us) negotiation.cc:98] Waiting for socket to 
> connect
> 0507 20:10:36.182707 (+ 13us) client_negotiation.cc:160] Beginning negotiation
> 0507 20:10:36.182797 (+ 90us) client_negotiation.cc:237] Sending NEGOTIATE 
> NegotiatePB request
> 0507 20:10:39.179166 (+2996369us) negotiation.cc:304] Negotiation complete: 
> Timed out: Client connection negotiation failed: client connection to 
> 127.0.52.195:38453
> Metrics: {"client-negotiator.queue_time_us":63}
> W0507 20:10:39.179760 659 meta_cache.cc:216] Tablet 
> a0376a5e3b9f495aa00d366b35594a27: Replica b5e887ddd8a14cf3b2c24a0aa14be025 
> (127.0.52.195:38453) has failed: Timed out: Client connection negotiation 
> failed: client connection to 127.0.52.195:38453
> W0507 20:10:39.242803 297 leader_election.cc:286] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> b5e887ddd8a14cf3b2c24a0aa14be025: Timed out: RequestConsensusVote RPC to 
> 127.0.52.195:38453 timed out after 0.101s (SENT)
> I0507 20:10:39.242909 297 leader_election.cc:257] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Election decided. Result: candidate lost.
> I0507 20:10:39.243211 1389 raft_consensus.cc:2704] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.097s (election complete)
> I0507 20:10:39.243288 1389 raft_consensus.cc:2459] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Leader pre-election lost for term 4. Reason: could not achieve 
> majority
> W0507 20:10:39.263934 297 leader_election.cc:286] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 5 pre-election: RPC error from VoteRequest() call to peer 
> b5e887ddd8a14cf3b2c24a0aa14be025: Timed out: RequestConsensusVote RPC to 
> 127.0.52.195:38453 timed out after 0.975s (SENT)
> I0507 20:10:39.264031 297 leader_election.cc:257] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 5 pre-election: Election decided. Result: candidate lost.
> I0507 20:10:39.264344 1389 raft_consensus.cc:2704] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.720s (election complete)
> I0507 20:10:39.264427 1389 raft_consensus.cc:2459] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd [term 4 
> FOLLOWER]: Leader pre-election lost for term 5. Reason: could not achieve 
> majority
> I0507 20:10:39.340955 1389 raft_consensus.cc:435] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election (detected failure of leader 
> b5e887ddd8a14cf3b2c24a0aa14be025)
> I0507 20:10:39.341055 1389 raft_consensus.cc:2704] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.311s (starting election)
> I0507 20:10:39.341125 1389 raft_consensus.cc:457] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election with config: opid_index: -1 OBSOLETE_local: 
> false peers { permanent_uuid: "5014824b94b64fd49da2c1a645c078aa" member_type: 
> VOTER last_known_addr { host: "127.0.52.194" port: 43942 } } peers { 
> permanent_uuid: "d1bf0f72a88f4483a1906bc8246849dd" member_type: VOTER 
> last_known_addr { host: "127.0.52.193" port: 46057 } } peers { 
> permanent_uuid: "b5e887ddd8a14cf3b2c24a0aa14be025" member_type: VOTER 
> last_known_addr { host: "127.0.52.195" port: 38453 } }
> I0507 20:10:39.341584 1389 leader_election.cc:230] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> 5014824b94b64fd49da2c1a645c078aa
> I0507 20:10:39.341753 1389 leader_election.cc:230] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> b5e887ddd8a14cf3b2c24a0aa14be025
> W0507 20:10:39.342707 296 leader_election.cc:286] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> 5014824b94b64fd49da2c1a645c078aa: Network error: Client connection 
> negotiation failed: client connection to 127.0.52.194:43942: connect: 
> Connection refused (error 111)
> W0507 20:10:39.443085 659 batcher.cc:348] Timed out: Failed to write batch of 
> 1 ops to tablet a0376a5e3b9f495aa00d366b35594a27 after 261 attempt(s): Failed 
> to write to server: (no server available): Write(tablet: 
> a0376a5e3b9f495aa00d366b35594a27, num_ops: 1, num_attempts: 261) passed its 
> deadline: Illegal state: Replica d1bf0f72a88f4483a1906bc8246849dd is not 
> leader of this config. Role: FOLLOWER. Consensus state: current_term: 3 
> leader_uuid: "b5e887ddd8a14cf3b2c24a0aa14be025" committed_config { 
> opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: 
> "b5e887ddd8a14cf3b2c24a0aa14be025" member_type: VOTER last_known_addr { host: 
> "127.0.52.195" port: 38453 } } peers { permanent_uuid: 
> "d1bf0f72a88f4483a1906bc8246849dd" member_type: VOTER last_known_addr { host: 
> "127.0.52.193" port: 46057 } } peers { permanent_uuid: 
> "5014824b94b64fd49da2c1a645c078aa" member_type: VOTER last_known_addr { host: 
> "127.0.52.194" port: 43942 } } }
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/client-negotiation-failover-itest.cc:186:
>  Failure
> Failed
> Bad status: IO error: Some errors occurred
> ...
> [ FAILED ] ClientFailoverOnNegotiationTimeoutITest.Kudu1580ConnectToTServer 
> (325968 ms){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to