Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14824 )
Change subject: IMPALA-9199: Add support for single query retries on cluster membership changes ...................................................................... Patch Set 25: (20 comments) http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/coordinator.cc@937 PS24, Line 937: parent_query_dri > This isn't valid. (and you have similar issues elsewhere) Done http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h File be/src/runtime/query-driver.h: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@1 PS24, Line 1: // Licensed to the Apache Software Foundation (ASF) under one > The distinction isn't that clear, but it might make more sense to put this I think client-request-state probably belongs in the runtime folder. I plan to move some of this around in IMPALA-9370. Left a comment to follow up on this in IMPALA-9370. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@39 PS24, Line 39: QueryHandle > I think this name is confusing, esp. since there's already a QueryHandle in I like referring to this as a handle, since its really just a pointer into other query level objects. I think some of these re-factoring / re-renmaing decisions should be done in IMPALA-9370 so that we can tackle them all together and ensure that all class names are consistent with one another. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@82 PS24, Line 82: /// *Transparent* Query Retries: > Not a huge deal in this patch since it's still hidden behind a flag and exp That is a good point. I added some notes here and in the uber-JIRA as a disclaimer. It's a bit hard to change all the places where this is referred to as "transparent query retries" and I think many folks are already using the phrase. So I think at this point it is a matter of setting proper expectations. I didn't explicitly mention the GetRuntimeProfile() changes so we are planning to revise the behavior in IMPALA-9229 Yeah - I've tested this with Hue, and it works as expected. There are a few observability improvements that could be made to Hue to improve this. The uber-JIRA (IMPALA-9124) has some sub-tasks related to impala-shell usability improvements. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@136 PS24, Line 136: /// query option 'retry_failed_queries') and if the query can be retried. Queries can > The way you're both returning a Status and also returning some status infor Doesn't return a Status anymore. The input Status is used to set the error message "Max retry limit was hit" which does show up in the final error message (see TestQueryRetries::test_multiple_retries) http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@189 PS24, Line 189: owns this QueryDriver. > ? Forgot to remove this. Deleted. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@194 PS24, Line 194: : /// The ClientRequestState > ? Forgot to remove this. Deleted. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.h@204 PS24, Line 204: /// 'RunFrontendPlanner'. > Now that this is just move()-ed from 'exec_request_' I'm not sure why its s Do you mean that CreateRetriedClientRequestState could just pass in a pointer to exec_request_ rather than retry_exec_request_ when creating the new ClientRequestState? I guess we could do that, although I think its a little easier to understand the code in its current form. you move the TExecRequest out from the original ClientRequestState into the new one. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.cc File be/src/runtime/query-driver.cc: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.cc@97 PS24, Line 97: if (client_request_state->fetched_rows()) { > This function would be more readable if you turned these into: Done http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/runtime/query-driver.cc@302 PS24, Line 302: try_request > Instead of this make_unique/move() thing, why not just have SetOriginalId() Done http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-hs2-server.cc@884 PS24, Line 884: QueryHandle query_handle; > I'm not sure that in its current state QueryHandle is really doing much. Th I think the point of the QueryHandle is to enforce that whenever users use a ClientRequestState, that the shared_ptr to the corresponding QueryDriver is in scope. QueryHandle might not be the perfect way to do this, but I think its better than the previous solution where you had to make sure the shared_ptr<QueryDriver> was always on the stack whenever you used a ClientRequestState. Now you can just use the QueryHandle and as long as it is still in scope, you can use its member variables. I think the guarantee QueryHandle makes, is that as long as QueryHandle is in scope, you can use its member variables (e.g. ClientRequestState), which feels pretty similar to other object ownership (e.g. an object owns some number of member variables). I created a GetQueryHandle and a GetActiveQueryHandle as you suggested in your other comments and migrated almost all of the usages of Get.*ClientRequestState to use these methods. The only exception is a few methods in the ImpalaHttpHandler which use the GetQueryDriver::DoFuncForAllEntries method. I probably could re-factor that as well, but not sure it is worth the effort. Open to doing it though if necessary. http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-http-handler.cc@809 PS24, Line 809: QueryHandle query_handle; > Noted elsewhere, but I think if you added an 'active' flag to GetQueryHandl Done. Added a GetActiveQueryHandle(). http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.h@932 PS24, Line 932: /// ClientRequestStates. See 'GetQueryDriver' for a description of the > Might be worth naming this GetActiveQueryHandle() for clarity. Done http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc@1160 PS24, Line 1160: RETURN_IF_ERROR(query_handle.query_driver->StartUnregister()); > I'm not sure why this is necessary. I guess it has the effect of making Unr This was based on https://gerrit.cloudera.org/#/c/15821/10 - basically there used to be a version of IMPALA-9380 that was written on top of this patch. I'm not sure I completely follow your comments, but my understanding is that this is needed to support async query unregistration ( IMPALA-9380 - the commit message has some details about thread safety). http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc@1167 PS24, Line 1167: unreg_thread_pool_->Offer(move(query_handle)); > I think you'll want to still call move() in situations like this - otherwis Done http://gerrit.cloudera.org:8080/#/c/14824/24/be/src/service/impala-server.cc@1586 PS24, Line 1586: // cluster. CancellationWorkCause::BACKEND_FAILED indicates that a backend running > I'm wondering why we unregister the query in this case, but only cancel it This got deleted since TryQueryRetry doesn't return a Status anymore. http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@642 PS25, Line 642: Status status = GetQueryHandle(query_id, &query_handle, /*return_unregistered=*/ true); > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@644 PS25, Line 644: ClientRequestState* request_state = query_handle.request_state; > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@710 PS25, Line 710: Status status = GetQueryHandle(query_id, &query_handle, /*return_unregistered=*/ true); > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/14824/25/be/src/service/impala-server.cc@712 PS25, Line 712: ClientRequestState* request_state = query_handle.request_state; > line has trailing whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/14824 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2e4a0e72a9bf8ec10b91639aefd81bef17886ddd Gerrit-Change-Number: 14824 Gerrit-PatchSet: 25 Gerrit-Owner: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Fri, 08 May 2020 19:16:53 +0000 Gerrit-HasComments: Yes