Jean-Daniel Cryans created KUDU-1934:
----------------------------------------

             Summary: tservers aggressively try to reconnect to masters
                 Key: KUDU-1934
                 URL: https://issues.apache.org/jira/browse/KUDU-1934
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.3.0
            Reporter: Jean-Daniel Cryans


Related to KUDU-1933, I had mismatched 1.3 snapshots between the master and the tservers, which caused the tservers to retry connecting to the master indefinitely. Since they do it as fast as they can, the logs quickly filled up with:

{noformat}
I0307 23:55:21.228502 70832 heartbeater.cc:291] Connected to a master server at ve0120.halxg.cloudera.com:7051
I0307 23:55:21.228528 70832 heartbeater.cc:359] Registering TS with master...
I0307 23:55:21.228865 70832 heartbeater.cc:389] Master ve0120.halxg.cloudera.com:7051 requested a full tablet report, sending...
W0307 23:55:21.346961 70832 heartbeater.cc:499] Failed to heartbeat to ve0120.halxg.cloudera.com:7051: Remote error: Failed to send heartbeat to master: Not authorized: invalid CSR: CSR did not contain expected username. (CSR: '' RPC: 'kudu')
I0307 23:55:22.347733 70832 heartbeater.cc:291] Connected to a master server at ve0120.halxg.cloudera.com:7051
I0307 23:55:22.347757 70832 heartbeater.cc:359] Registering TS with master...
I0307 23:55:22.348042 70832 heartbeater.cc:389] Master ve0120.halxg.cloudera.com:7051 requested a full tablet report, sending...
W0307 23:55:22.467021 70832 heartbeater.cc:499] Failed to heartbeat to ve0120.halxg.cloudera.com:7051: Remote error: Failed to send heartbeat to master: Not authorized: invalid CSR: CSR did not contain expected username. (CSR: '' RPC: 'kudu')
{noformat}

Sounds like we should do backoff retries.
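To make the backoff suggestion concrete, here is a rough sketch of capped exponential backoff with jitter. It is not Kudu's heartbeater code: the names `backoff_delay`, `run_heartbeats`, and `HeartbeatError` are hypothetical, the 1 second base is only a guess taken from the spacing of the log timestamps above, and the 60 second cap and 25% jitter are arbitrary assumptions.

```python
import random


class HeartbeatError(Exception):
    """Hypothetical error type standing in for a failed heartbeat RPC."""


def backoff_delay(consecutive_failures, base_s=1.0, cap_s=60.0,
                  jitter=0.25, rng=random.random):
    """Return the number of seconds to wait before the next heartbeat.

    All defaults are illustrative assumptions, not Kudu settings.
    """
    # Bound the exponent so an endless failure streak cannot overflow a float.
    exponent = min(max(consecutive_failures, 0), 30)
    # Double the wait for each consecutive failure, up to the cap.
    delay = min(cap_s, base_s * 2 ** exponent)
    # Spread the wait by +/- jitter so tservers do not retry in lockstep.
    spread = 1.0 + jitter * (2.0 * rng() - 1.0)
    return delay * spread


def run_heartbeats(send, sleep, attempts, rng=random.random):
    """Call send() up to `attempts` times, sleeping between calls.

    `send` and `sleep` are injected so the loop can be exercised without
    a real master or real waiting.
    """
    failures = 0
    for _ in range(attempts):
        try:
            send()
            failures = 0  # a successful heartbeat resets the backoff
        except HeartbeatError:
            failures += 1
        sleep(backoff_delay(failures, rng=rng))
```

With jitter disabled, a tserver that keeps failing would wait 2 s, 4 s, 8 s, and so on up to the cap, instead of retrying at a fixed short interval, and it would return to the base interval after the first successful heartbeat.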