Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10855 )

Change subject: IMPALA-7213: Port ReportExecStatus() RPC to use KRPC
......................................................................


Patch Set 4:

(30 comments)

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10855/4//COMMIT_MSG@16
PS4, Line 16: This patch also introduces a new service pool for all query 
execution
            : control related RPCs in the future so that control commands from
            : coordinators aren't blocked by long-running DataStream services' 
RPCs.
> yep, that's essentially what I meant. Thanks for explaining it better than
Thanks for the suggestion. I think it's fine to do it as a follow-up or 
parallel work on Kudu side. This doesn't necessarily block the merging of this 
patch IMHO as ReportExecStatus() can already be late today for various reasons 
such as an overloaded coordinator.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/coordinator.h@31
PS4, Line 31: #include "kudu/util/slice.h"
> Not needed?
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@404
PS4, Line 404: void DmlExecState::ToProto(InsertExecStatusPB* dml_status) {
> do you want to clear dml_status first just in case? Similarly for other ToP
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/dml-exec-state.cc@455
PS4, Line 455:    if (!kudu_stats->has_num_row_errors()) {
             :       kudu_stats->set_num_row_errors(0);
             :     }
> I don't think this is necessary- it's not illegal to access an "unset" fiel
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/exec-env.h@136
PS4, Line 136: DataStreamService* data_svc() const { return data_svc_.get(); }
> Not used, we can remove this.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@268
PS4, Line 268: rpc_controller.AddOutboundSidecar(move(sidecar), &sidecar_idx).ok
> should this be a CHECK_OK?
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@273
PS4, Line 273: "final"
> need a space here
Oops. Done.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@274
PS4, Line 274: ERROR
> should this be a DFATAL? do you expect this to ever actually happen?
I don't think this is expected to happen very often but it seems more robust to 
not crash Impala because of failure to serialize the query profile. The query 
can still run to completion without the profile.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@283
PS4, Line 283:     req.clear_error_log();
> isnt the error log already clear because this is a new instance?
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@284
PS4, Line 284:     state->GetUnreportedErrors(req.mutable_error_log());
> it's a shame that this mutates the state, making retries a bit more difficu
The new patch creates a single instance of the RPC parameter and reuse it on 
retry. My understanding is that the RPC layer shouldn't mutate the input RPC 
request parameter.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@289
PS4, Line 289: rpc_mgr->GetProxy
> clang-tidy failure: Missing status check.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@293
PS4, Line 293:   kudu::Status rpc_status = proxy->ReportExecStatus(req, &resp, 
&rpc_controller);
> do you want any timeout on this RPC?
Yes. Switched back to the behavior of existing code of using 
FLAGS_backend_client_rpc_timeout_ms. This is not ideal but I guess it's 
simplest to just preserve the existing behavior and fix IMPALA-2990 in a 
separate patch.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@305
PS4, Line 305:     // We can retry the RPC if the payload hasn't been sent yet.
> are these reports idempotent? seems like it shoudl be easy to make them ide
Yes, I believe they are idempotent. So, I was worried about the connection 
being reset in the middle of transmission so only partial payload was sent. In 
that case, I think the server side will also drop that connection so it should 
be fine I suppose. Comments removed.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/runtime/query-state.cc@306
PS4, Line 306: RpcMgr::IsServerTooBusy(rpc_controller
> I'm a bit fuzzy on the context of this code, but if we get TOO_BUSY, it see
Restored to the old behavior of retrying for at most 2 times. It will still 
fail due to IMPALA-2990 which we will work on fixing later (e.g. coordinator 
cancelling query if certain fragment hasn't sent a report for an extended 
period of time).


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@32
PS4, Line 32: #include "gen-cpp/control_service.pb.h"
> Not needed. Forward declaring ReportExecStatusRequestPB should suffice.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/client-request-state.h@35
PS4, Line 35: #include "gen-cpp/RuntimeProfile_types.h"
> Not needed. Forward declaring TRuntimeProfileTree should suffice.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@23
PS4, Line 23: #include "common/status.h"
            : #include "runtime/mem-tracker.h"
> Not required. You can forward declare 'Status', 'MemTracker' and 'MetricGro
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@34
PS4, Line 34: class ImpalaServer;
> Not required.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.h@53
PS4, Line 53:    /// Reference to the singleton ImpalaServer object. Not owned.
            :    ImpalaServer* impala_server_ = nullptr;
> This is probably not needed. You can get the ImpalaServer using:
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@60
PS4, Line 60: num_svc_threads
> Was there any noticeable slowdown on stress workloads since we now have onl
No serious measure on perf yet but yes, there is no point in pushing beyond 
number of cores.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@71
PS4, Line 71:
> Shouldn't we add:
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@72
PS4, Line 72:   // TODO: implement something more efficient here, we're 
currently
            :   // acquiring/releasing the map lock and doing a map lookup for
            :   // every report (assign each query a local int32_t id and use 
that to index into a
            :   // vector of ClientRequestStates, w/o lookup or locking?)
> Sounds reasonable to me.
TODO removed.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@87
PS4, Line 87:     mem_tracker_->Release(rpc_context->GetTransferSize());
            :     rpc_context->RespondSuccess();
> is everything in this function exception-safe? eg I thought Thrift could oc
As far as I know, yes. DeserializeThriftMsg() has a try-catch clause inside to 
catch exception.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@104
PS4, Line 104:     if (LIKELY(sidecar_status.ok())) {
> do you want to warn or even CHECK on this?
I switched to using a CHECK() for any operations with sidecar as that's 
unexpected but I kept the error handling code for thrift deserialization 
failure.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/service/control-service.cc@111
PS4, Line 111: dummy
> nit: empty
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h
File be/src/util/container-util.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/container-util.h@147
PS4, Line 147: void MergeMapValues(const google::protobuf::Map<K, V>& src,
> If you templatized this on the map type instead of the K and V types, you c
Good point. Done.


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc
File be/src/util/error-util.cc:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@144
PS4, Line 144: for (auto iter : entry.messages()) *stream << iter << "\n";
> Formatting. Also, 'iter' isn't an iterator, it's the element itself, so we'
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@170
PS4, Line 170: for (auto iter : entry.messages()) target->add_messages(iter);
> Formatting, we should avoid single line for loops.
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/error-util.cc@194
PS4, Line 194: if (target.messages_size() == 0) target.add_messages(e.msg());
> Formatting
Done


http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h
File be/src/util/runtime-profile.h:

http://gerrit.cloudera.org:8080/#/c/10855/4/be/src/util/runtime-profile.h@28
PS4, Line 28: #include "kudu/util/faststring.h"
> Forward declare 'faststring'
Done



--
To view, visit http://gerrit.cloudera.org:8080/10855
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7638583b433dcac066b87198e448743d90415ebe
Gerrit-Change-Number: 10855
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Wed, 01 Aug 2018 19:05:01 +0000
Gerrit-HasComments: Yes

Reply via email to