Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16369 )

Change subject: IMPALA-9636: Don't run retried query on the blacklisted nodes
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/runtime/query-driver.cc@128
PS8, Line 128:
             :     // Set blacklisted_executor_addresses for 
client_request_state of the original query.
             :     if (blacklisted_executor_addresses != nullptr
             :         && !blacklisted_executor_addresses->empty()) {
             :       client_request_state_->SetBlacklistedExecutorAddresses(
             :           *blacklisted_executor_addresses);
             :     }
> instead of passing blacklisted_executor_addresses into this method, it seem
Good suggestion. It reduces code change. Fixed it as suggested.


http://gerrit.cloudera.org:8080/#/c/16369/4/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16369/4/be/src/scheduling/admission-controller.cc@1572
PS4, Line 1572: reated) delete filtered_executor_group;
> right now, if a query is retried and all executors are blacklisted, the que
Yes, we can fix it by changing the returned status.


http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc@1120
PS8, Line 1120:   if (!request.blacklisted_executor_addresses.empty()) {
              :     // This debug_action is to allow delay to be injected for 
the admission of a retried
              :     // query, which is triggered by blacklisted node, before 
getting membership snapshot.
              :     DebugActionNoFail(FLAGS_debug_actions, 
"ADMISSION_DELAY_FOR_RETRIED_QUERY");
              :   }
> can you use CRS_BEFORE_ADMISSION instead?
No, it does not work for adding new cases which need to trigger blacklist 
timeout for retried query. CRS_BEFORE_ADMISSION apply all queries. If we use 
it, the original attempt of query will be failed due to timeout, instead of RPC 
failure for the new test cases. We need a debug action which only apply to 
retried query.


http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/scheduler.cc@163
PS8, Line 163:       // If exec_at_coord equals false and number of executors 
in the executor group
             :       // equals 0, the overloaded ComputeScanRangeAssignment(), 
which will be called
             :       // below, will return error.
> not sure I fully understand why the DCHECK needs to be removed, could you e
If all executor are blacklisted, the number of executors in filtered executor 
could be 0. This could happen and is already handled in overloaded 
ComputeScanRangeAssignment() so we should remove DCHECK here. Actually the new 
test case  
test_query_retries.py::TestQueryRetries::test_retry_query_failure_no_executor_available
 will hit this DCHECK failure if we don't remove it.


http://gerrit.cloudera.org:8080/#/c/16369/8/tests/custom_cluster/test_query_retries.py
File tests/custom_cluster/test_query_retries.py:

http://gerrit.cloudera.org:8080/#/c/16369/8/tests/custom_cluster/test_query_retries.py@287
PS8, Line 287:  _get_rpc_fail_action(FAILED_KRPC_PORT_2)
> nice! I think this actually solves IMPALA-9227 as well.
Maybe we need to add more test cases for IMPALA-9227.



--
To view, visit http://gerrit.cloudera.org:8080/16369
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105
Gerrit-Change-Number: 16369
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Sep 2020 03:21:56 +0000
Gerrit-HasComments: Yes

Reply via email to