Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14677 )
Change subject: IMPALA-9137: Blacklist node if a DataStreamService RPC to the node fails
......................................................................

IMPALA-9137: Blacklist node if a DataStreamService RPC to the node fails

Introduces a new optional field to FragmentInstanceExecStatusPB: AuxErrorInfoPB. AuxErrorInfoPB contains optional metadata associated with a failed fragment instance. Currently, AuxErrorInfoPB only contains one field: RPCErrorInfoPB, which is only set if the fragment failed because an RPC to another impalad failed. The RPCErrorInfoPB contains the destination node of the failed RPC and the POSIX error code of the failed RPC.

Coordinator::UpdateBackendExecStatus(ReportExecStatusRequestPB, ...) uses the information in RPCErrorInfoPB (if one is set) to blacklist the target node. While RPCErrorInfoPB::dest_node can be set to the address of the Coordinator, the Coordinator will not blacklist itself. The Coordinator only blacklists the node if the RPC failed with a specific error code (currently one of ENOTCONN, ECONNREFUSED, or ESHUTDOWN).
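The blacklisting rule described in the commit message can be sketched in C++ as a single decision function. This is a minimal, hypothetical sketch, not Impala's code: NetworkAddress, RpcErrorInfo, AuxErrorInfo, IsBlacklistableRpcError, and NodeToBlacklist are illustrative stand-ins for the protobuf messages and the coordinator logic, and ESHUTDOWN assumes a Linux/POSIX-style <cerrno>.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a node address (host + port).
struct NetworkAddress {
  std::string hostname;
  int32_t port = 0;
  bool operator==(const NetworkAddress& other) const {
    return hostname == other.hostname && port == other.port;
  }
};

// Hypothetical C++ mirror of RPCErrorInfoPB: the destination of the failed
// RPC plus the POSIX error code it failed with.
struct RpcErrorInfo {
  NetworkAddress dest_node;
  int32_t posix_error_code = 0;
};

// Hypothetical C++ mirror of AuxErrorInfoPB; rpc_error_info is only set when
// the fragment failed because an RPC to another impalad failed.
struct AuxErrorInfo {
  std::optional<RpcErrorInfo> rpc_error_info;
};

// True only for the error codes the change lists as blacklist-worthy.
// (Assumption: ESHUTDOWN is provided by the platform's <cerrno>, as on Linux.)
bool IsBlacklistableRpcError(int32_t posix_error_code) {
  switch (posix_error_code) {
    case ENOTCONN:
    case ECONNREFUSED:
    case ESHUTDOWN:
      return true;
    default:
      return false;
  }
}

// Returns the node the coordinator should blacklist, or std::nullopt if the
// status report carries no RPC error, the RPC targeted the coordinator itself,
// or the error code is not one of the blacklist-worthy codes.
std::optional<NetworkAddress> NodeToBlacklist(
    const std::optional<AuxErrorInfo>& aux_error_info,
    const NetworkAddress& coordinator_address) {
  if (!aux_error_info || !aux_error_info->rpc_error_info) return std::nullopt;
  const RpcErrorInfo& rpc_error = *aux_error_info->rpc_error_info;
  // The coordinator never blacklists itself, even if dest_node is its address.
  if (rpc_error.dest_node == coordinator_address) return std::nullopt;
  if (!IsBlacklistableRpcError(rpc_error.posix_error_code)) return std::nullopt;
  return rpc_error.dest_node;
}
```

For example, a report whose RpcErrorInfo names a worker with ECONNREFUSED yields that worker, while the same error aimed at the coordinator's own address, or a worker failing with ETIMEDOUT, yields std::nullopt. Keeping the check as a pure function over the reported metadata makes the three conditions (error info present, not self, allowed error code) easy to test in isolation.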
Testing:
* Ran core tests
* Added new test to test_blacklist.py

Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Reviewed-on: http://gerrit.cloudera.org:8080/14677
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/runtime-state.cc
M be/src/runtime/runtime-state.h
M be/src/util/network-util.cc
M be/src/util/network-util.h
M common/protobuf/common.proto
M common/protobuf/control_service.proto
M tests/custom_cluster/test_blacklist.py
11 files changed, 230 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14677
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I733cca13847fde43c8ea2ae574d3ae04bd06419c
Gerrit-Change-Number: 14677
Gerrit-PatchSet: 15
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Michael Ho <michael...@gmail.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>