Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16369 )
Change subject: IMPALA-9636: Don't run retried query on the blacklisted nodes ...................................................................... Patch Set 12: (3 comments) http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/16369/8/be/src/scheduling/admission-controller.cc@1120 PS8, Line 1120: cluster_membership_mgr_->GetSnapshot(); : DCHECK(membership_snapshot.get() != nullptr); : string blacklist_str = membership_snapshot->executor_blacklist.BlacklistToString(); : if (!blacklist_str.empty()) { : > why would the original attempt fail due to a timeout instead of an RPC erro I am wrong. The unit test failed due to timeout since I did not set enough waiting time when adding delay for both original and retried queries. I adjusted the heart-beat interval, sleep time with CRS_BEFORE_ADMISSION and total waiting time for the query, the new unit-test is passed. Removed this debug action, and use CRS_BEFORE_ADMISSION instead. http://gerrit.cloudera.org:8080/#/c/16369/11/be/src/scheduling/admission-controller.cc File be/src/scheduling/admission-controller.cc: http://gerrit.cloudera.org:8080/#/c/16369/11/be/src/scheduling/admission-controller.cc@1579 PS11, Line 1579: DCHECK(!group_state->executor_group().empty()); > might want to add "LOG(WARNING) << queue_node->not_admitted_reason;" as wel Added LOG message. http://gerrit.cloudera.org:8080/#/c/16369/11/be/src/scheduling/admission-controller.cc@1575 PS11, Line 1575: Status ret = : ExecEnv::GetInstance()->scheduler()->Schedule(group_config, group_state.get()); : if (created) delete filtered_executor_group; : if (UNLIKELY(!ret.ok())) return ret; : DCHECK(!group_state->executor_group().empty()); : output_schedules->emplace_back(std::move(group_state), *executor_group); : } > couldn't you do this before calling scheduler()->Schedule(...)? if num_exec Right, move the code before schedule() is called. -- To view, visit http://gerrit.cloudera.org:8080/16369 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00bc1b5026efbd0670ffbe57bcebc457d34cb105 Gerrit-Change-Number: 16369 Gerrit-PatchSet: 12 Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Fri, 11 Sep 2020 14:12:39 +0000 Gerrit-HasComments: Yes