Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15840 )
Change subject: IMPALA-6984: coordinator cancels backends on EOS ...................................................................... Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/15840/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15840/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-6984 > there seems to be some concern in https://issues.apache.org/jira/browse/IMP Yeah I did see that too and kept that in mind. Sending the cancel RPC without changing the query status will force the status reports to be sent, but still have them applied because IsDone() is still false. This is kinda the cowardly way to solve the problem - I'm not trying to do anything with status reports for otherwise cancelled/failed queries. I.e. not touching IMPALA-539. I think the other weirdness around summaries has been fixed by other changes - I updated that JIRA. http://gerrit.cloudera.org:8080/#/c/15840/3//COMMIT_MSG@28 PS3, Line 28: Manually inspected logs of some queries with limit, : saw that it sent cancellation then waited for backends : as expected. > would be nice to have a test for this, but could be deferred to a follow JI I couldn't think of a reliable way for automated tests to distinguish between this and other means of cancellation, since the user-visible behaviour is the same. There's also multiple ways that cancellation can propagate, so the timing is hard to reason about. I was able to construct an example query that got significantly faster as a result of this change. I added it to targeted-perf. http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.h@121 PS3, Line 121: Returns true if : /// cancellation was attempted, false otherwise. > nit: needs updating Done http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.h@393 PS3, Line 393: bool sent_cancel_rpc_ = false; > is there any chance we intentionally decided to *not* add something like th I don't believe I changed that behaviour, just decoupled this from status_. Previously Cancel() always set status_ to non-OK, then subsequent attempts did an early exit (the code path with "Not cancelling because the backend is already done"). I think there are multiple layers to the question about what happens when an RPC is dropped, since we depend on the cancel RPC reaching the executor, and the status report being sent back to the coordinator. IMPALA-2990 handles that scenario on the coordinator, and we handle it on the executor by returning an error when it sends a status report for a query that's in a terminal state. http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.cc@561 PS3, Line 561: result.became_done = true; > this should only be set if the status_ gets set to CANCELLED above, right? Done -- To view, visit http://gerrit.cloudera.org:8080/15840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I966eceaafdc18a019708b780aee4ee9d70fd3a47 Gerrit-Change-Number: 15840 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 05 May 2020 00:52:49 +0000 Gerrit-HasComments: Yes