[ https://issues.apache.org/jira/browse/KUDU-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-2918:
------------------------------
    Labels: stability supportability  (was: )

> Rebalancer can fail when a service queue is full
> ------------------------------------------------
>
>                 Key: KUDU-2918
>                 URL: https://issues.apache.org/jira/browse/KUDU-2918
>             Project: Kudu
>          Issue Type: Bug
>          Components: CLI, ksck
>    Affects Versions: 1.11.0
>            Reporter: Adar Dembo
>            Priority: Major
>              Labels: stability, supportability
>
> The various low-level RPCs issued by ksck aren't retried if the corresponding
> service queues are full. These include GetConsensusState, GetStatus, and
> ListTablets.
> Without retries, ksck (and the rebalancer) can fail midway:
> {noformat}
> I0812 11:21:10.669682 42799 rebalancer.cc:831] tablet
> d729fb149e804696a0862adacb725d66: a0dca75bbbfb4de69616694834adf930 ->
> 24d0eb73b3c64a0f901ae092186b3439 move is abandoned: Remote error: Service
> unavailable: GetConsensusState request on kudu.consensus.ConsensusService
> from 10.17.182.15:50754 dropped due to backpressure. The service queue is
> full; it has 50 items.
> I0812 11:21:10.871894 42799 rebalancer.cc:239] re-synchronizing cluster state
> Illegal state: tablet server 0d88ff7360b74d1e81cd2ccd41fab8a5
> (foo.bar.com:7050): unacceptable health status UNAVAILABLE
> {noformat}
> The helper classes in rpc/rpc.h may be useful here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
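
For illustration only, the retry behavior the issue description asks for might look roughly like the sketch below: a call that fails because the service queue is full is retried with exponential backoff, while any other result is returned right away. Everything here is an assumption made for the example. RpcCode, RpcResult, and RetryOnBackpressure are hypothetical names, not Kudu's Status class and not the helper classes in rpc/rpc.h, and the attempt count and delays are arbitrary.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for an RPC outcome; this is NOT Kudu's Status class.
enum class RpcCode { kOk, kServiceUnavailable, kOtherError };

struct RpcResult {
  RpcCode code;
  std::string message;
};

// Invokes `call` up to `max_attempts` times. A kServiceUnavailable result
// (for example, a request dropped due to backpressure) triggers a sleep and
// another attempt; the sleep doubles each time, capped at `max_delay`.
// Any other result is returned immediately. If every attempt reports
// kServiceUnavailable, the last such result is returned to the caller.
RpcResult RetryOnBackpressure(const std::function<RpcResult()>& call,
                              int max_attempts,
                              std::chrono::milliseconds initial_delay,
                              std::chrono::milliseconds max_delay) {
  RpcResult result{RpcCode::kOtherError, "no attempts made"};
  std::chrono::milliseconds delay = initial_delay;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    result = call();
    if (result.code != RpcCode::kServiceUnavailable) {
      return result;  // Success, or an error that retrying will not fix.
    }
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(delay);
      delay = std::min(delay * 2, max_delay);
    }
  }
  return result;
}
```

A caller would wrap each low-level request (a lambda that issues the request and maps its outcome to RpcResult) in RetryOnBackpressure, so that a briefly full queue delays the tool instead of aborting a tablet move. A real fix would more likely bound retries by an overall deadline than by a fixed attempt count; the count is used here only to keep the sketch short.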