Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15840 )

Change subject: IMPALA-6984: coordinator cancels backends on EOS
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15840/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15840/3//COMMIT_MSG@7
PS3, Line 7: IMPALA-6984
> there seems to be some concern in https://issues.apache.org/jira/browse/IMP
Yeah I did see that too and kept that in mind. Sending the cancel RPC without 
changing the query status will force the status reports to be sent, but still 
have them applied because IsDone() is still false.

This is kinda the cowardly way to solve the problem - I'm not trying to do 
anything with status reports for otherwise cancelled/failed queries. I.e. not 
touching IMPALA-539. I think the other weirdness around summaries has been 
fixed by other changes - I updated that JIRA.


http://gerrit.cloudera.org:8080/#/c/15840/3//COMMIT_MSG@28
PS3, Line 28: Manually inspected logs of some queries with limit,
            : saw that it sent cancellation then waited for backends
            : as expected.
> would be nice to have a test for this, but could be deferred to a follow JI
I couldn't think of a reliable way for automated tests to distinguish between 
this and other means of cancellation, since the user-visible behaviour is the 
same. There's also multiple ways that cancellation can propagate, so the timing 
is hard to reason about.

I was able to construct an example query that got significantly faster as a 
result of this change. I added it to targeted-perf.


http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.h@121
PS3, Line 121: Returns true if
             :   /// cancellation was attempted, false otherwise.
> nit: needs updating
Done


http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.h@393
PS3, Line 393:   bool sent_cancel_rpc_ = false;
> is there any chance we intentionally decided to *not* add something like th
I don't believe I changed that behaviour, just decoupled this from status_. 
Previously Cancel() always set status_ to non-OK, then subsequent attempts did 
an early exit (the code path with "Not cancelling because the backend is 
already done").

I think there are multiple layers to the question about what happens when an 
RPC is dropped, since we depend on the cancel RPC reaching the executor, and 
the status report being sent back to the coordinator. IMPALA-2990 handles that 
scenario on the coordinator, and we handle it on the executor by returning an 
error when it sends a status report for a query that's in a terminal state.


http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/15840/3/be/src/runtime/coordinator-backend-state.cc@561
PS3, Line 561:     result.became_done = true;
> this should only be set if the status_ gets set to CANCELLED above, right?
Done



--
To view, visit http://gerrit.cloudera.org:8080/15840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I966eceaafdc18a019708b780aee4ee9d70fd3a47
Gerrit-Change-Number: 15840
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 00:52:49 +0000
Gerrit-HasComments: Yes

Reply via email to