Juan Yu has posted comments on this change.

Change subject: IMPALA-3575: Add retry to backend connection request and rpc 
timeout
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3343/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS15, Line 1490: // Try to send the RPC 3 times before failing.
> Why try 3 times? Have you seen in your testing where there's failure on the
This is to increase the chance that the cancel request reaches the remote 
nodes, to avoid orphaned fragments. If the network is unstable, we could get a 
"send expire" error on the coordinator-to-remote-node connection while the 
report status callback keeps working, so the remote nodes aren't aware that 
there is a connection issue with the coordinator.
Although DoRpc() always retries once, during a connection storm it might not 
be able to create a new connection on the first retry. 
If you are unlucky, you could also get a closed connection from the cache (this 
can happen if CreateClient() in ClientCacheHelper::ReopenClient() failed for a 
previous RPC call), and then the cancel request might never get a chance to be 
sent.
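
To make the intent concrete, here is a minimal sketch of the retry loop, 
not the actual coordinator.cc code. RpcResult and SendWithRetries are 
hypothetical names used only for illustration, and the send_rpc callback 
stands in for whatever issues the cancel RPC.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for a status type; this is not Impala's Status class.
struct RpcResult {
  bool ok;
  std::string error;
};

// Calls send_rpc up to max_attempts times. Returns the first successful
// result, or the last failure if every attempt fails. The number of attempts
// actually made is written to *attempts_made.
RpcResult SendWithRetries(const std::function<RpcResult()>& send_rpc,
                          int max_attempts, int* attempts_made) {
  assert(max_attempts > 0);
  RpcResult result{false, "not attempted"};
  *attempts_made = 0;
  for (int i = 0; i < max_attempts; ++i) {
    ++*attempts_made;
    result = send_rpc();
    if (result.ok) break;  // Stop as soon as one attempt gets through.
  }
  return result;
}
```

In the patch, each attempt would go through DoRpc(), which retries once on 
its own, so the outer loop only adds extra chances when that internal retry 
also fails to get a usable connection.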


-- 
To view, visit http://gerrit.cloudera.org:8080/3343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id6723cfe58df6217f4a9cdd12facd320cbc24964
Gerrit-PatchSet: 15
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Juan Yu <j...@cloudera.com>
Gerrit-Reviewer: Alan Choi <a...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Juan Yu <j...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes
