Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/16215 )
Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc@554 PS3, Line 554: server->CancelQueriesOnFailedBackends(current_backend_set); > Yeah for this and your other comment below, probably the cleanest thing is I was thinking to reuse the backend set and save a loop with one callback function to handle two tasks. But it seems not clean. Will change to register two separate callback functions as suggested by Thomas. http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc@560 PS3, Line 560: cluster_membership_mgr_->RegisterUpdateCallbackFn( : [server](ClusterMembershipMgr::SnapshotPtr snapshot) { : std::unordered_set<BackendIdPB> current_backend_set; : for (const auto& it : snapshot->current_backends) { : current_backend_set.insert(it.second.backend_id()); : } : server->CancelQueriesOnFailedBackends(current_backend_set); : }); > not necessarily related to your change, but it would be nice to make this c Will fix it. http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc@222 PS3, Line 222: // TODO: create cancellation task queue and working thread to run cancellation tasks : // on a separate thread. If the queue is full, ignore the cancellations since we'll : // be able to process them on the next heartbeat instead. : : for (auto& qs : to_cancel) { : VLOG(1) << "CancelQueriesForFailedCoordinators(): cancel query " << qs->query_id(); : qs->Cancel(); : qs->is_coord_active_.Store(false); : ReleaseQueryState(qs); : } > I had suggested not to put this in ImpalaServer because its primarily the c Will define a new thread pool owned by QueryExecMgr. http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc@222 PS3, Line 222: // TODO: create cancellation task queue and working thread to run cancellation tasks : // on a separate thread. If the queue is full, ignore the cancellations since we'll : // be able to process them on the next heartbeat instead. : : for (auto& qs : to_cancel) { : VLOG(1) << "CancelQueriesForFailedCoordinators(): cancel query " << qs->query_id(); : qs->Cancel(); : qs->is_coord_active_.Store(false); : ReleaseQueryState(qs); : } > if you move this into impala-server.cc you can use the ImpalaServer::Cancel The item type and work_function which runs for each item are set when creating a thread pool. It seems that one thread pool cannot serve two different types of tasks. The class CancellationWork is defined to handle cancellation of remote running fragments for coordinator. Maybe we should define a separate cancellation work for executer to cancel local running fragments and create a separate thread pool with one thread. http://gerrit.cloudera.org:8080/#/c/16215/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/16215/3/common/thrift/ImpalaInternalService.thrift@607 PS3, Line 607: 27: optional Types.TUniqueId coord_backend_id Will change coord_backend_id data type as UniqueIdPB and move it to FragmentInstanceExecStatusPB in control_service.proto. http://gerrit.cloudera.org:8080/#/c/16215/3/tests/custom_cluster/test_restart_services.py File tests/custom_cluster/test_restart_services.py: http://gerrit.cloudera.org:8080/#/c/16215/3/tests/custom_cluster/test_restart_services.py@241 PS3, Line 241: def test_kill_coordinator(self): > This is an okay place for this test (obviously we have a lot of different t Yes, it's better to move it to test_process_failures.py. Will do. -- To view, visit http://gerrit.cloudera.org:8080/16215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd Gerrit-Change-Number: 16215 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com> Gerrit-Comment-Date: Tue, 21 Jul 2020 20:30:51 +0000 Gerrit-HasComments: Yes