Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12260 )

Change subject: IMPALA-7985: Port RemoteShutdown() to KRPC.
......................................................................


Patch Set 2:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12260/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12260/1//COMMIT_MSG@13
PS1, Line 13: imapald
typo


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc@653
PS2, Line 653:   // KRPC relies on resolved IP address, so convert hostname.
             :   IpAddr ip_address;
             :   Status ip_status = HostnameToIpAddr(request.backend.hostname, 
&ip_address);
             :   if (!ip_status.ok()) {
             :     VLOG(1) << "Could not convert hostname " << 
request.backend.hostname
             :             << " to ip address, error: " << 
ip_status.GetDetail();
             :     return ip_status;
             :   }
             :   TNetworkAddress addr = MakeNetworkAddress(ip_address, port);
Instead of resolving the IP for the backend, it seems more consistent to re-use 
some of the logic at the scheduler which already contains a map of all the 
backend state:

https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L354-L357

May need some refactoring in the scheduler code to expose the necessary backend 
states. It's unfortunate that the existing 'executor_config_' uses 'hostname, 
be_port' as lookup key. May be it makes sense to implement a new function which 
looks up if there exists a known backend with 'hostname and krpc_address which 
exposes the port specified' and if so, returns the krpc_address_. If there 
isn't a known backend which maps to the specified 'host + krpc port', then we 
should just fail the shutdown request without making the actual RPC.

Please feel free to come up with other ideas too.


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/client-request-state.cc@689
PS2, Line 689: if (msg.find("RemoteShutdown() RPC failed: Timed out: connection 
negotiation to")
             :         != string::npos) {
If we take the route of using existing backend states suggested above, the case 
of invalid port should be impossible so we shouldn't need to call this out like 
below here.


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h@79
PS2, Line 79:   /// Retry the Rpc 'rpc_call' up to 3 times.
            :   /// Pass 'debug_action' to DebugAction() to potentially inject 
errors.
            :   template <typename F>
            :   static Status DoRpcWithRetry(F&& rpc_call, const TQueryCtx& 
query_ctx,
            :       const char* debug_action, const char* error_msg) {
Not specifically related to this change but it may be useful to factor this 
retry function to generic RPC code in rpc-mgr.inline.h.

Also, if the RPC fails due to being overloaded, it may make sense to have some 
sleep time between retries.

FWIW, we had some retry logic in KrpcDataStreamSender too and I believe we also 
had similar retry logic for the status reporting code. It may be worth the 
effort to consider refactoring these code so we don't implement the same retry 
logic at multiple places:

https://github.com/apache/impala/blob/master/be/src/runtime/krpc-data-stream-sender.cc#L374-L386

https://github.com/apache/impala/blob/master/be/src/runtime/query-state.cc#L332-L336


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.h@87
PS2, Line 87: MonoDelta::FromSeconds(10)
Not sure if it makes sense to hard code the timeout here as it seems to be 
utility function used by different RPC calls. Would it be better for the 
timeout to be one of the function parameter ? At the minimum, if we decide that 
10 seconds is the timeout for all RPCs in ControlService, we should at least 
define it as some constant or #define to make it more explicit.

I have a similar question about the number of retries too (which is hardcoded 
to 3).


http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12260/1/be/src/service/control-service.h@81
PS1, Line 81: name F>
typo ? DoRpcWithRetry ?


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.cc@182
PS2, Line 182: class
Why is this needed ?


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/control-service.cc@183
PS2, Line 183: class
Why is this needed ?


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc@2435
PS2, Line 2435:                     "queries registered on coordinator: $2, 
queries executing: $3, "
The original formatting seems fine to me.


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc@2481
PS2, Line 2481:   if (set_deadline)
              :     
shutdown_status->set_deadline_remaining_ms(relative_deadline_s * 1000L);
Coding style nit:

if (set_deadline) {
    ....
}


http://gerrit.cloudera.org:8080/#/c/12260/2/be/src/service/impala-server.cc@2489
PS2, Line 2489: ShutdownStatusPB
const ShutdownStatusPB&



--
To view, visit http://gerrit.cloudera.org:8080/12260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4fd00ee4e638f5e71e27893162fd65501ef9e74e
Gerrit-Change-Number: 12260
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Comment-Date: Tue, 29 Jan 2019 07:56:21 +0000
Gerrit-HasComments: Yes

Reply via email to