[ 
https://issues.apache.org/jira/browse/KUDU-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Percy updated KUDU-2433:
-----------------------------
    Summary: Possible stack-smashing bug  (was: Potential flake in 
client-negotiation-failover-itest due to negotiation timeout)

> Possible stack-smashing bug
> ---------------------------
>
>                 Key: KUDU-2433
>                 URL: https://issues.apache.org/jira/browse/KUDU-2433
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Mike Percy
>            Priority: Major
>         Attachments: client-negotiation-failover-itest.txt
>
>
> There appears to be a rare flake in client-negotiation-failover-itest in 
> ClientFailoverOnNegotiationTimeoutITest.Kudu1580ConnectToTServer due to a 
> negotiation timeout. This appeared on one of the test dashboards:
> {code:java}
> I0507 20:10:39.140327 1387 raft_consensus.cc:2459] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Leader pre-election lost for term 4. Reason: could not achieve 
> majority
> W0507 20:10:39.140895 296 leader_election.cc:286] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> 5014824b94b64fd49da2c1a645c078aa: Network error: Client connection 
> negotiation failed: client connection to 127.0.52.194:43942: connect: 
> Connection refused (error 111)
> I0507 20:10:39.165484 1389 raft_consensus.cc:435] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election (detected failure of leader 
> b5e887ddd8a14cf3b2c24a0aa14be025)
> I0507 20:10:39.165598 1389 raft_consensus.cc:2704] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.853s (starting election)
> I0507 20:10:39.165699 1389 raft_consensus.cc:457] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election with config: opid_index: -1 OBSOLETE_local: 
> false peers { permanent_uuid: "b5e887ddd8a14cf3b2c24a0aa14be025" member_type: 
> VOTER last_known_addr { host: "127.0.52.195" port: 38453 } } peers { 
> permanent_uuid: "d1bf0f72a88f4483a1906bc8246849dd" member_type: VOTER 
> last_known_addr { host: "127.0.52.193" port: 46057 } } peers { 
> permanent_uuid: "5014824b94b64fd49da2c1a645c078aa" member_type: VOTER 
> last_known_addr { host: "127.0.52.194" port: 43942 } }
> I0507 20:10:39.166122 1389 leader_election.cc:230] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> b5e887ddd8a14cf3b2c24a0aa14be025
> I0507 20:10:39.166318 1389 leader_election.cc:230] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> 5014824b94b64fd49da2c1a645c078aa
> W0507 20:10:39.166895 297 leader_election.cc:286] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> b5e887ddd8a14cf3b2c24a0aa14be025: Timed out: RequestConsensusVote RPC to 
> 127.0.52.195:38453 timed out after 0.633s (SENT)
> I0507 20:10:39.166954 297 leader_election.cc:257] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Election decided. Result: candidate lost.
> W0507 20:10:39.167076 296 leader_election.cc:286] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> 5014824b94b64fd49da2c1a645c078aa: Network error: Client connection 
> negotiation failed: client connection to 127.0.52.194:43942: connect: 
> Connection refused (error 111)
> I0507 20:10:39.167102 1389 raft_consensus.cc:2704] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.685s (election complete)
> I0507 20:10:39.167136 1389 raft_consensus.cc:2459] T 
> a0376a5e3b9f495aa00d366b35594a27 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Leader pre-election lost for term 4. Reason: could not achieve 
> majority
> W0507 20:10:39.179502 660 negotiation.cc:313] Failed RPC negotiation. Trace:
> 0507 20:10:36.182479 (+ 0us) reactor.cc:577] Submitting negotiation task for 
> client connection to 127.0.52.195:38453
> 0507 20:10:36.182694 (+ 215us) negotiation.cc:98] Waiting for socket to 
> connect
> 0507 20:10:36.182707 (+ 13us) client_negotiation.cc:160] Beginning negotiation
> 0507 20:10:36.182797 (+ 90us) client_negotiation.cc:237] Sending NEGOTIATE 
> NegotiatePB request
> 0507 20:10:39.179166 (+2996369us) negotiation.cc:304] Negotiation complete: 
> Timed out: Client connection negotiation failed: client connection to 
> 127.0.52.195:38453
> Metrics: {"client-negotiator.queue_time_us":63}
> W0507 20:10:39.179760 659 meta_cache.cc:216] Tablet 
> a0376a5e3b9f495aa00d366b35594a27: Replica b5e887ddd8a14cf3b2c24a0aa14be025 
> (127.0.52.195:38453) has failed: Timed out: Client connection negotiation 
> failed: client connection to 127.0.52.195:38453
> W0507 20:10:39.242803 297 leader_election.cc:286] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> b5e887ddd8a14cf3b2c24a0aa14be025: Timed out: RequestConsensusVote RPC to 
> 127.0.52.195:38453 timed out after 0.101s (SENT)
> I0507 20:10:39.242909 297 leader_election.cc:257] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Election decided. Result: candidate lost.
> I0507 20:10:39.243211 1389 raft_consensus.cc:2704] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.097s (election complete)
> I0507 20:10:39.243288 1389 raft_consensus.cc:2459] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Leader pre-election lost for term 4. Reason: could not achieve 
> majority
> W0507 20:10:39.263934 297 leader_election.cc:286] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 5 pre-election: RPC error from VoteRequest() call to peer 
> b5e887ddd8a14cf3b2c24a0aa14be025: Timed out: RequestConsensusVote RPC to 
> 127.0.52.195:38453 timed out after 0.975s (SENT)
> I0507 20:10:39.264031 297 leader_election.cc:257] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 5 pre-election: Election decided. Result: candidate lost.
> I0507 20:10:39.264344 1389 raft_consensus.cc:2704] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.720s (election complete)
> I0507 20:10:39.264427 1389 raft_consensus.cc:2459] T 
> d80f4fae4b9f4d238f7f626507290390 P d1bf0f72a88f4483a1906bc8246849dd [term 4 
> FOLLOWER]: Leader pre-election lost for term 5. Reason: could not achieve 
> majority
> I0507 20:10:39.340955 1389 raft_consensus.cc:435] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election (detected failure of leader 
> b5e887ddd8a14cf3b2c24a0aa14be025)
> I0507 20:10:39.341055 1389 raft_consensus.cc:2704] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd: Snoozing 
> failure detection for 0.311s (starting election)
> I0507 20:10:39.341125 1389 raft_consensus.cc:457] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd [term 3 
> FOLLOWER]: Starting pre-election with config: opid_index: -1 OBSOLETE_local: 
> false peers { permanent_uuid: "5014824b94b64fd49da2c1a645c078aa" member_type: 
> VOTER last_known_addr { host: "127.0.52.194" port: 43942 } } peers { 
> permanent_uuid: "d1bf0f72a88f4483a1906bc8246849dd" member_type: VOTER 
> last_known_addr { host: "127.0.52.193" port: 46057 } } peers { 
> permanent_uuid: "b5e887ddd8a14cf3b2c24a0aa14be025" member_type: VOTER 
> last_known_addr { host: "127.0.52.195" port: 38453 } }
> I0507 20:10:39.341584 1389 leader_election.cc:230] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> 5014824b94b64fd49da2c1a645c078aa
> I0507 20:10:39.341753 1389 leader_election.cc:230] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: Requesting pre-vote from peer 
> b5e887ddd8a14cf3b2c24a0aa14be025
> W0507 20:10:39.342707 296 leader_election.cc:286] T 
> 7161aaf1a71d455883ef478337dfaf19 P d1bf0f72a88f4483a1906bc8246849dd 
> [CANDIDATE]: Term 4 pre-election: RPC error from VoteRequest() call to peer 
> 5014824b94b64fd49da2c1a645c078aa: Network error: Client connection 
> negotiation failed: client connection to 127.0.52.194:43942: connect: 
> Connection refused (error 111)
> W0507 20:10:39.443085 659 batcher.cc:348] Timed out: Failed to write batch of 
> 1 ops to tablet a0376a5e3b9f495aa00d366b35594a27 after 261 attempt(s): Failed 
> to write to server: (no server available): Write(tablet: 
> a0376a5e3b9f495aa00d366b35594a27, num_ops: 1, num_attempts: 261) passed its 
> deadline: Illegal state: Replica d1bf0f72a88f4483a1906bc8246849dd is not 
> leader of this config. Role: FOLLOWER. Consensus state: current_term: 3 
> leader_uuid: "b5e887ddd8a14cf3b2c24a0aa14be025" committed_config { 
> opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: 
> "b5e887ddd8a14cf3b2c24a0aa14be025" member_type: VOTER last_known_addr { host: 
> "127.0.52.195" port: 38453 } } peers { permanent_uuid: 
> "d1bf0f72a88f4483a1906bc8246849dd" member_type: VOTER last_known_addr { host: 
> "127.0.52.193" port: 46057 } } peers { permanent_uuid: 
> "5014824b94b64fd49da2c1a645c078aa" member_type: VOTER last_known_addr { host: 
> "127.0.52.194" port: 43942 } } }
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/client-negotiation-failover-itest.cc:186:
>  Failure
> Failed
> Bad status: IO error: Some errors occurred
> ...
> [ FAILED ] ClientFailoverOnNegotiationTimeoutITest.Kudu1580ConnectToTServer 
> (325968 ms){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to