Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10855 )
Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC ...................................................................... Patch Set 4: (30 comments) http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16 PS4, Line 16: This patch also introduces a new service pool for all query execution : control related RPCs in the future so that control commands from : coordinators aren't blocked by long-running DataStream services' RPCs. > yep, that's essentially what I meant. Thanks for explaining it better than Thanks for the suggestion. I think it's fine to do it as a follow-up or parallel work on Kudu side. This doesn't necessarily block the merging of this patch IMHO as ReportExecStatus() can already be late today for various reasons such as an overloaded coordinator. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h@31 PS4, Line 31: #include "kudu/util/slice.h" > Not needed? Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc File be/src/runtime/dml-exec-state.cc: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@404 PS4, Line 404: void DmlExecState::ToProto(InsertExecStatusPB* dml_status) { > do you want to clear dml_status first just in case? Similarly for other ToP Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@455 PS4, Line 455: if (!kudu_stats->has_num_row_errors()) { : kudu_stats->set_num_row_errors(0); : } > I don't think this is necessary- it's not illegal to access an "unset" fiel Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h File be/src/runtime/exec-env.h: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h@136 PS4, Line 136: DataStreamService* data_svc() const { return data_svc_.get(); } > Not used, we can remove this. Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@268 PS4, Line 268: rpc_controller.AddOutboundSidecar(move(sidecar), &sidecar_idx).ok > should this be a CHECK_OK? Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@273 PS4, Line 273: "final" > need a space here Oops. Done. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@274 PS4, Line 274: ERROR > should this be a DFATAL? do you expect this to ever actually happen? I don't think this is expected to happen very often but it seems more robust to not crash Impala because of failure to serialize the query profile. The query can still run to completion without the profile. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@283 PS4, Line 283: req.clear_error_log(); > isnt the error log already clear because this is a new instance? Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@284 PS4, Line 284: state->GetUnreportedErrors(req.mutable_error_log()); > it's a shame that this mutates the state, making retries a bit more difficu The new patch creates a single instance of the RPC parameter and reuse it on retry. My understanding is that the RPC layer shouldn't mutate the input RPC request parameter. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@289 PS4, Line 289: rpc_mgr->GetProxy > clang-tidy failure: Missing status check. Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@293 PS4, Line 293: kudu::Status rpc_status = proxy->ReportExecStatus(req, &resp, &rpc_controller); > do you want any timeout on this RPC? Yes. Switched back to the behavior of existing code of using FLAGS_backend_client_rpc_timeout_ms. This is not ideal but I guess it's simplest to just preserve the existing behavior and fix IMPALA-2990 in a separate patch. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@305 PS4, Line 305: // We can retry the RPC if the payload hasn't been sent yet. > are these reports idempotent? seems like it shoudl be easy to make them ide Yes, I believe they are idempotent. So, I was worried about the connection being reset in the middle of transmission so only partial payload was sent. In that case, I think the server side will also drop that connection so it should be fine I suppose. Comments removed. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@306 PS4, Line 306: RpcMgr::IsServerTooBusy(rpc_controller > I'm a bit fuzzy on the context of this code, but if we get TOO_BUSY, it see Restored to the old behavior of retrying for at most 2 times. It will still fail due to IMPALA-2990 which we will work on fixing later (e.g. coordinator cancelling query if certain fragment hasn't sent a report for an extended period of time). http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@32 PS4, Line 32: #include "gen-cpp/control_service.pb.h" > Not needed. Forward declaring ReportExecStatusRequestPB should suffice. Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@35 PS4, Line 35: #include "gen-cpp/RuntimeProfile_types.h" > Not needed. Forward declaring TRuntimeProfileTree should suffice. Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@23 PS4, Line 23: #include "common/status.h" : #include "runtime/mem-tracker.h" > Not required. You can forward declare 'Status', 'MemTracker' and 'MetricGro Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@34 PS4, Line 34: class ImpalaServer; > Not required. Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@53 PS4, Line 53: /// Reference to the singleton ImpalaServer object. Not owned. : ImpalaServer* impala_server_ = nullptr; > This is probably not needed. You can get the ImpalaServer using: Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@60 PS4, Line 60: num_svc_threads > Was there any noticeable slowdown on stress workloads since we now have onl No serious measure on perf yet but yes, there is no point in pushing beyond number of cores. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@71 PS4, Line 71: > Shouldn't we add: Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72 PS4, Line 72: // TODO: implement something more efficient here, we're currently : // acquiring/releasing the map lock and doing a map lookup for : // every report (assign each query a local int32_t id and use that to index into a : // vector of ClientRequestStates, w/o lookup or locking?) > Sounds reasonable to me. TODO removed. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@87 PS4, Line 87: mem_tracker_->Release(rpc_context->GetTransferSize()); : rpc_context->RespondSuccess(); > is everything in this function exception-safe? eg I thought Thrift could oc As far as I know, yes. DeserializeThriftMsg() has a try-catch clause inside to catch exception. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@104 PS4, Line 104: if (LIKELY(sidecar_status.ok())) { > do you want to warn or even CHECK on this? I switched to using a CHECK() for any operations with sidecar as that's unexpected but I kept the error handling code for thrift deserialization failure. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@111 PS4, Line 111: dummy > nit: empty Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h File be/src/util/container-util.h: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h@147 PS4, Line 147: void MergeMapValues(const google::protobuf::Map<K, V>& src, > If you templatized this on the map type instead of the K and V types, you c Good point. Done. http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc File be/src/util/error-util.cc: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@144 PS4, Line 144: for (auto iter : entry.messages()) *stream << iter << "\n"; > Formatting. Also, 'iter' isn't an iterator, it's the element itself, so we' Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@170 PS4, Line 170: for (auto iter : entry.messages()) target->add_messages(iter); > Formatting, we should avoid single line for loops. Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@194 PS4, Line 194: if (target.messages_size() == 0) target.add_messages(e.msg()); > Formatting Done http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h File be/src/util/runtime-profile.h: http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h@28 PS4, Line 28: #include "kudu/util/faststring.h" > Forward declare 'faststring' Done -- To view, visit http://gerrit.cloudera.org:8080/10855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe Gerrit-Change-Number: 10855 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 01 Aug 2018 19:05:01 +0000 Gerrit-HasComments: Yes