Juan Yu has posted comments on this change.

Change subject: IMPALA-3575: Add retry to backend connection request and rpc timeout
......................................................................

Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3343/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS15, Line 1490: // Try to send the RPC 3 times before failing.
> Why try 3 times? Have you seen in your testing where there's failure on the

This is to increase the chance that the cancel request reaches the remote nodes, so that we avoid orphaned fragments. If the network is unstable, we could get a "send expire" error on the coordinator-to-remote-node connection, but the report status callback might keep working, so the remote nodes aren't aware that there is a connection issue with the coordinator.

Although DoRpc() always retries once, during a connection storm it might not be able to create a new connection on the first retry. If you are unlucky, you could also get a closed connection from the cache (this can happen if CreateClient() in ClientCacheHelper::ReopenClient() failed for a previous RPC call), and then the cancel request might never get a chance to be sent.

--
To view, visit http://gerrit.cloudera.org:8080/3343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Gerrit-PatchSet: 15
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <j...@cloudera.com>
Gerrit-Reviewer: Alan Choi <a...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <j...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes
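
As a rough illustration of the retry reasoning in the comment above, here is a minimal, self-contained sketch of a bounded retry loop around a send that may fail more than once in a row. It is not the code from this patch: RpcStatus, SendWithRetries and FlakyConnection are hypothetical names made up for the example, and they only stand in for Impala's Status, the cancel RPC call site and the client cache.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for an RPC result; this is not Impala's Status class.
struct RpcStatus {
  bool ok;
  std::string msg;
};

// Calls `send_rpc` up to `max_attempts` times. Returns the first successful
// status, or the last failure if every attempt fails. If `attempts_made` is
// non-null, it receives the number of attempts actually made.
inline RpcStatus SendWithRetries(const std::function<RpcStatus()>& send_rpc,
                                 int max_attempts,
                                 int* attempts_made = nullptr) {
  assert(max_attempts > 0);
  RpcStatus status{false, "not attempted"};
  int attempt = 0;
  while (attempt < max_attempts) {
    ++attempt;
    status = send_rpc();
    if (status.ok) break;  // The request went out; stop retrying.
  }
  if (attempts_made != nullptr) *attempts_made = attempt;
  return status;
}

// Simulates a connection whose first `failures_before_success` sends fail,
// for example a stale cached connection followed by a failed reopen.
struct FlakyConnection {
  int failures_before_success;
  int calls;

  explicit FlakyConnection(int failures)
      : failures_before_success(failures), calls(0) {}

  RpcStatus Send() {
    ++calls;
    if (calls <= failures_before_success) return {false, "send expired"};
    return {true, ""};
  }
};
```

With a connection that fails twice, a single retry (two attempts in total) would give up, while three attempts get the request through. The sketch retries immediately; whether to wait between attempts is a separate design choice that the comment above does not address.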