[ 
https://issues.apache.org/jira/browse/KUDU-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daicheng resolved KUDU-3460.
----------------------------
    Fix Version/s: 1.16.0
       Resolution: Not A Problem

> RPC error from VoteRequest() call to peer **: Timed out: RequestConsensusVote 
> RPC to ** timed out after 1.713s [SENT]
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3460
>                 URL: https://issues.apache.org/jira/browse/KUDU-3460
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: daicheng
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: image-2023-03-17-15-27-45-755.png, 
> image-2023-03-17-15-28-13-480.png, image-2023-03-17-15-28-40-361.png, 
> image-2023-03-17-15-38-51-218.png
>
>
> We have 3 kudu_master and 6 kudu_tserver instances. When I created 20,000 
> (2W) tables in Kudu, we got errors and could no longer read any data from 
> Kudu; it threw many errors.
> Here are the errors from the client:
> {code:java}
> Job aborted due to stage failure: Task 0 in stage 35.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 35.0 (TID 9601) (prod-bigdata-mw-159 
> executor 3): java.lang.RuntimeException: 
> org.apache.kudu.client.NonRecoverableException: tablet hasn't heard from 
> leader or there hasn't been a stable leader fo..
> 2023-03-08 09:59:49,198 INFO  org.apache.kudu.client.AsyncKuduClient          
>             [] - Invalidating location master-10.0.2.33:7051(10.0.2.33:7051) 
> for tablet Kudu Master: Service unavailable: ListTables request on 
> kudu.master.MasterService from 10.0.3.82:8764 dropped due to backpressure. 
> The service queue is full; it has 100 items. {code}
> I also found that the Kudu tserver logs contain many errors like:
> {code:java}
> W0307 14:36:57.368008 14759 leader_election.cc:334] T 
> fa2a3b405a87466da7a6b1a962f35d99 P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 1640 pre-election: RPC error from VoteRequest() call to peer 
> d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)
> W0307 14:36:57.368801 14759 leader_election.cc:334] T 
> 5f8d377660aa46f29e3f1595a33d086c P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer 
> dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)
> W0307 14:36:57.368917 14759 leader_election.cc:334] T 
> a32af7dd8af44b47b4b26d7a222c2f6b P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 344 pre-election: RPC error from VoteRequest() call to peer 
> dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.713s (SENT)
> W0307 14:36:57.369045 14759 leader_election.cc:334] T 
> 15e9b550c3274243a5ee923ceda67dc5 P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 1509 pre-election: RPC error from VoteRequest() call to peer 
> d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)
> W0307 14:36:57.369563 14759 leader_election.cc:334] T 
> e5e49b443f71478984162a2eb65d3607 P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 1575 pre-election: RPC error from VoteRequest() call to peer 
> d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.553s (SENT)
> W0307 14:36:57.371872 14759 leader_election.cc:334] T 
> 2ec17c9dd68e47ceb7f572efb9f18fe3 P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 1633 pre-election: RPC error from VoteRequest() call to peer 
> d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.010s (SENT)
> W0307 14:36:57.372673 14759 leader_election.cc:334] T 
> a91cf24cc4c943cbbd041c7e6726d7aa P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 1610 pre-election: RPC error from VoteRequest() call to peer 
> d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.970s (SENT)
> W0307 14:36:57.372789 14759 leader_election.cc:334] T 
> cd667f33abb74afba4b9c510b8f6dfaa P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 3 pre-election: RPC error from VoteRequest() call to peer 
> dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.674s (SENT)
> W0307 14:36:57.373358 14759 leader_election.cc:334] T 
> 39709b52ffe34f81b08d0562e45a7a13 P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 44 pre-election: RPC error from VoteRequest() call to peer 
> d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.636s (SENT)
> W0307 14:36:57.373525 14759 leader_election.cc:334] T 
> 00da9e2c20814ac88e18f7d7220f01c9 P 5ac35cfccaf84228bf6d589501ec533e 
> [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer 
> dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: 
> RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.524s (SENT) 
> {code}
> The disk that holds the WAL directory also looks abnormal:
> !image-2023-03-17-15-27-45-755.png|width=314,height=166!!image-2023-03-17-15-28-40-361.png|width=309,height=135!
> Here is what the WAL file looks like:
> {code:java}
> schema_version: 0
> compression_codec: LZ4
> 1.1@6873507535186497536 REPLICATE NO_OP        id { term: 1 index: 1 } 
> timestamp: 6873507535186497536 op_type: NO_OP noop_request { }
> COMMIT 1.1        op_type: NO_OP commited_op_id { term: 1 index: 1 }
> 1.2@6873839930165628928 REPLICATE CHANGE_CONFIG_OP        id { term: 1 
> index: 2 } timestamp: 6873839930165628928 op_type: CHANGE_CONFIG_OP 
> change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" 
> old_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: 
> "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: 
> "10.0.2.14" port: 7050 } } peers { permanent_uuid: 
> "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: 
> "10.0.2.15" port: 7050 } } peers { permanent_uuid: 
> "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: 
> "10.0.2.20" port: 7050 } } } new_config { opid_index: 2 OBSOLETE_local: false 
> peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER 
> last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: 
> "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: 
> "10.0.2.15" port: 7050 } } peers { permanent_uuid: 
> "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: 
> "10.0.2.20" port: 7050 } } peers { permanent_uuid: 
> "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { 
> host: "10.0.2.19" port: 7050 } attrs { promote: true } } } }
> COMMIT 1.2        op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 2 }
> 1.3@6873841023495979008 REPLICATE CHANGE_CONFIG_OP        id { term: 1 
> index: 3 } timestamp: 6873841023495979008 op_type: CHANGE_CONFIG_OP 
> change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" 
> old_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: 
> "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: 
> "10.0.2.14" port: 7050 } } peers { permanent_uuid: 
> "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: 
> "10.0.2.15" port: 7050 } } peers { permanent_uuid: 
> "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: 
> "10.0.2.20" port: 7050 } } peers { permanent_uuid: 
> "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { 
> host: "10.0.2.19" port: 7050 } attrs { promote: true } } } new_config { 
> opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: 
> "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: 
> "10.0.2.14" port: 7050 } } peers { permanent_uuid: 
> "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: 
> "10.0.2.15" port: 7050 } } peers { permanent_uuid: 
> "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: 
> "10.0.2.20" port: 7050 } } peers { permanent_uuid: 
> "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: 
> "10.0.2.19" port: 7050 } attrs { promote: false } } } }
> COMMIT 1.3        op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 3 }
> 1.4@6873841038243381248 REPLICATE CHANGE_CONFIG_OP        id { term: 1 
> index: 4 } timestamp: 6873841038243381248 op_type: CHANGE_CONFIG_OP 
> change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" 
> old_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: 
> "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: 
> "10.0.2.14" port: 7050 } } peers { permanent_uuid: 
> "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: 
> "10.0.2.15" port: 7050 } } peers { permanent_uuid: 
> "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: 
> "10.0.2.20" port: 7050 } } peers { permanent_uuid: 
> "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: 
> "10.0.2.19" port: 7050 } attrs { promote: false } } } new_config { 
> opid_index: 4 OBSOLETE_local: false peers { permanent_uuid: 
> "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: 
> "10.0.2.14" port: 7050 } } peers { permanent_uuid: 
> "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: 
> "10.0.2.15" port: 7050 } } peers { permanent_uuid: 
> "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: 
> "10.0.2.19" port: 7050 } attrs { promote: false } } } }
> COMMIT 1.4
> {code}
> There are also many Raft worker threads running:
> !image-2023-03-17-15-38-51-218.png|width=704,height=576!
> It seems the system is busy handling consensus votes, and I could not find 
> any more helpful error logs in Kudu. Can anyone explain what happened?
>  
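
The candidate-side warnings quoted above all follow one pattern, so they can
be tallied per peer to see which tablet servers are slow to answer vote
requests. The following is a minimal sketch in Python, assuming the tserver
log text is available as a string. The regular expression and the helper
names strip_quote_prefix and count_vote_timeouts are illustrative
assumptions, not part of Kudu or its tooling.

```python
import re
from collections import Counter

# Illustrative pattern for the leader_election.cc warning quoted in the issue.
# "to\s*peer" also accepts "topeer", which appears where the mail wrap ate a space.
VOTE_TIMEOUT = re.compile(
    r"T\s+(?P<tablet>[0-9a-f]{32})\s+P\s+(?P<candidate>[0-9a-f]{32})\s+"
    r"\[CANDIDATE\]:\s+Term\s+(?P<term>\d+)\s+pre-election:\s+"
    r"RPC\s+error\s+from\s+VoteRequest\(\)\s+call\s+to\s*peer\s+"
    r"(?P<peer>[0-9a-f]{32})\s+\((?P<addr>[\d.]+:\d+)\):\s+Timed\s+out:\s+"
    r"RequestConsensusVote\s+RPC\s+to\s+\S+\s+timed\s+out\s+after\s+"
    r"(?P<secs>\d+\.\d+)s"
)


def strip_quote_prefix(text):
    """Drop a leading '>' (and one following space or tab) from each line."""
    return re.sub(r"^>[ \t]?", "", text, flags=re.MULTILINE)


def count_vote_timeouts(log_text):
    """Return (timeout count per peer address, longest timeout in seconds per peer)."""
    counts = Counter()
    worst = {}
    for match in VOTE_TIMEOUT.finditer(log_text):
        addr = match.group("addr")
        secs = float(match.group("secs"))
        counts[addr] += 1
        worst[addr] = max(worst.get(addr, 0.0), secs)
    return counts, worst


if __name__ == "__main__":
    import sys

    # Reads log text (optionally mail-quoted) from standard input.
    counts, worst = count_vote_timeouts(strip_quote_prefix(sys.stdin.read()))
    for addr, n in counts.most_common():
        print(f"{addr}: {n} timeouts, longest {worst[addr]:.3f}s")
```

Because the pattern is searched with finditer rather than matched line by
line, it should also find entries that were run together on one line, as in
the log originally pasted into the issue.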



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
