Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14824 )

Change subject: IMPALA-9199: Add support for single query retries on cluster 
membership changes
......................................................................


Patch Set 25:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/coordinator.cc@937
PS24, Line 937: parent_query_dri
> This isn't valid. (and you have similar issues elsewhere)
Done


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h
File be/src/runtime/query-driver.h:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@1
PS24, Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> The distinction isn't that clear, but it might make more sense to put this
I think client-request-state probably belongs in the runtime folder. I plan to 
move some of this around in IMPALA-9370. Left a comment to follow up on this in 
IMPALA-9370.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@39
PS24, Line 39: QueryHandle
> I think this name is confusing, esp. since there's already a QueryHandle in
I like referring to this as a handle, since its really just a pointer into 
other query level objects.
I think some of these re-factoring / re-renmaing decisions should be done in 
IMPALA-9370 so that we can tackle them all together and ensure that all class 
names are consistent with one another.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@82
PS24, Line 82: /// *Transparent* Query Retries:
> Not a huge deal in this patch since it's still hidden behind a flag and exp
That is a good point. I added some notes here and in the uber-JIRA as a 
disclaimer. It's a bit hard to change all the places where this is referred to 
as "transparent query retries" and I think many folks are already using the 
phrase. So I think at this point it is a matter of setting proper expectations. 
I didn't explicitly mention the GetRuntimeProfile() changes so we are planning 
to revise the behavior in IMPALA-9229
Yeah - I've tested this with Hue, and it works as expected. There are a few 
observability improvements that could be made to Hue to improve this. The 
uber-JIRA (IMPALA-9124) has some sub-tasks related to impala-shell usability 
improvements.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@136
PS24, Line 136:   /// query option 'retry_failed_queries') and if the query can 
be retried. Queries can
> The way you're both returning a Status and also returning some status infor
Doesn't return a Status anymore.
The input Status is used to set the error message "Max retry limit was hit" 
which does show up in the final error message (see 
TestQueryRetries::test_multiple_retries)


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@189
PS24, Line 189:  owns this QueryDriver.
> ?
Forgot to remove this. Deleted.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@194
PS24, Line 194:
              :   /// The ClientRequestState
> ?
Forgot to remove this. Deleted.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@204
PS24, Line 204:   /// 'RunFrontendPlanner'.
> Now that this is just move()-ed from 'exec_request_' I'm not sure why its s
Do you mean that CreateRetriedClientRequestState could just pass in a pointer 
to exec_request_ rather than retry_exec_request_ when creating the new 
ClientRequestState? I guess we could do that, although I think its a little 
easier to understand the code in its current form. you move the TExecRequest 
out from the original ClientRequestState into the new one.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.cc
File be/src/runtime/query-driver.cc:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.cc@97
PS24, Line 97:     if (client_request_state->fetched_rows()) {
> This function would be more readable if you turned these into:
Done


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.cc@302
PS24, Line 302: try_request
> Instead of this make_unique/move() thing, why not just have SetOriginalId()
Done


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-hs2-server.cc@884
PS24, Line 884:   QueryHandle query_handle;
> I'm not sure that in its current state QueryHandle is really doing much. Th
I think the point of the QueryHandle is to enforce that whenever users use a 
ClientRequestState, that the shared_ptr to the corresponding QueryDriver is in 
scope. QueryHandle might not be the perfect way to do this, but I think its 
better than the previous solution where you had to make sure the 
shared_ptr<QueryDriver> was always on the stack whenever you used a 
ClientRequestState. Now you can just use the QueryHandle and as long as it is 
still in scope, you can use its member variables.
I think the guarantee QueryHandle makes, is that as long as QueryHandle is in 
scope, you can use its member variables (e.g. ClientRequestState), which feels 
pretty similar to other object ownership (e.g. an object owns some number of 
member variables).
I created a GetQueryHandle and a GetActiveQueryHandle as you suggested in your 
other comments and migrated almost all of the usages of Get.*ClientRequestState 
to use these methods. The only exception is a few methods in the 
ImpalaHttpHandler which use the GetQueryDriver::DoFuncForAllEntries method. I 
probably could re-factor that as well, but not sure it is worth the effort. 
Open to doing it though if necessary.


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-http-handler.cc@809
PS24, Line 809:   QueryHandle query_handle;
> Noted elsewhere, but I think if you added an 'active' flag to GetQueryHandl
Done. Added a GetActiveQueryHandle().


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.h@932
PS24, Line 932:   /// ClientRequestStates. See 'GetQueryDriver' for a 
description of the
> Might be worth naming this GetActiveQueryHandle() for clarity.
Done


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc@1160
PS24, Line 1160:   
RETURN_IF_ERROR(query_handle.query_driver->StartUnregister());
> I'm not sure why this is necessary. I guess it has the effect of making Unr
This was based on https://gerrit.cloudera.org/#/c/15821/10 - basically there 
used to be a version of IMPALA-9380 that was written on top of this patch.
I'm not sure I completely follow your comments, but my understanding is that 
this is needed to support async query unregistration (
IMPALA-9380 - the commit message has some details about thread safety).


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc@1167
PS24, Line 1167:   unreg_thread_pool_->Offer(move(query_handle));
> I think you'll want to still call move() in situations like this - otherwis
Done


http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc@1586
PS24, Line 1586:     // cluster. CancellationWorkCause::BACKEND_FAILED 
indicates that a backend running
> I'm wondering why we unregister the query in this case, but only cancel it
This got deleted since TryQueryRetry doesn't return a Status anymore.


http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@642
PS25, Line 642:     Status status = GetQueryHandle(query_id, &query_handle, 
/*return_unregistered=*/ true);
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@644
PS25, Line 644:       ClientRequestState* request_state = 
query_handle.request_state;
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@710
PS25, Line 710:     Status status = GetQueryHandle(query_id, &query_handle, 
/*return_unregistered=*/ true);
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@712
PS25, Line 712:       ClientRequestState* request_state = 
query_handle.request_state;
> line has trailing whitespace
Done



--
To view, visit http://gerrit.cloudera.org:8080/14824
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd
Gerrit-Change-Number: 14824
Gerrit-PatchSet: 25
Gerrit-Owner: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 May 2020 19:16:53 +0000
Gerrit-HasComments: Yes

Reply via email to