[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 4:

This change did not cherrypick successfully into branch 2.x. To resolve this, 
please do the cherry-pick manually and submit it to Gerrit at refs/for/2.x or 
add an exception to the branch 2.x copy of bin/ignored_commits.json. Thanks, 
your friendly bot at https://jenkins.impala.io/job/cherrypick-2.x-and-test/520/ 
.


--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 19 May 2018 23:10:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed
- loop query_test & failure for 12 hours with no dchecks or crashes
  (This had previously reproduced IMPALA-7030 and IMPALA-7033 with
  the previous version of this change).

Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Reviewed-on: http://gerrit.cloudera.org:8080/10440
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 396 insertions(+), 391 deletions(-)

Approvals:
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 4
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 18 May 2018 04:36:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 18 May 2018 01:22:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2502/


--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 3
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 18 May 2018 01:23:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 17 May 2018 23:26:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@695
PS1, Line 695: is_fragment_failure ? 
> as mentioned in this thread, it will be no worse than before this change. I
sounds good. Thanks for the explanation.



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 17 May 2018 22:30:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Hello Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10440

to look at the new patch set (#2).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed
- loop query_test & failure for 12 hours with no dchecks or crashes
  (This had previously reproduced IMPALA-7030 and IMPALA-7033 with
  the previous version of this change).

Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 396 insertions(+), 391 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/10440/2
--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 2
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.h@351
PS1, Line 351: Cannot finalize execution until exec RPCs are no longer
 :   /// being sent.
> We added a DCHECK, but this comment suggests that this method will ensure t
Yeah, I can see how the comment is confusing. Updated the comments. 
(Ultimately, in a future change I likely will move the wait() but want to hold 
off on that to "prove" that the wait is only needed for the 
UpdateBackendExecStatus() call path)


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@530
PS1, Line 530: // TODO: IMPALA-7011 is this needed? This will also happen on 
the "backend" side of
 :   // cancel rpc. And in the case of EOS, sink already knows this.
> I think we can update this comment, since we figured out that we do need it
Done. (Though I haven't given up on this quite yet -- I think it's needed given 
the current plan-root-sink code, but it might be possible to fix that code to 
make it not needed. But the JIRA can track that work).


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@695
PS1, Line 695: exec_rpcs_complete_barrier_->Wait();
> should we be concerned about adding a possible long wait() call on an RPC c
as mentioned in this thread, it will be no worse than before this change. In 
the old code, we hold the Coordinator::lock_ while sending the exec RPCs. 
Additionally, we take the Coordinator::lock_ inside UpdateStatus() and also at 
line 688 -- so this RPC handler will always block until the exec rpcs are done 
sending in the old code.

In the new code, the blocking only happens when the query hits an error.

I agree it's not ideal, but we're better off. And I plan to improve things with 
IMPALA-6788.



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 17 May 2018 22:00:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.h@351
PS1, Line 351: Cannot finalize execution until exec RPCs are no longer
 :   /// being sent.
> this comment is new
We added a DCHECK, but this comment suggests that this method will ensure that 
finalization fails(like returns a non-ok status) if exec RPCs are pending.
we should either move the wait() call to this method or move this comment to 
UpdateBackendExecStatus() and update the one for this to say that "It should 
not be called if exec RPCs are pending"


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@530
PS1, Line 530: // TODO: IMPALA-7011 is this needed? This will also happen on 
the "backend" side of
 :   // cancel rpc. And in the case of EOS, sink already knows this.
I think we can update this comment, since we figured out that we do need it.


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@695
PS1, Line 695: exec_rpcs_complete_barrier_->Wait();
should we be concerned about adding a possible long wait() call on an RPC 
callback?



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 17 May 2018 21:48:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@693
PS1, Line 693:   // We may start receiving status reports before all exec 
rpcs are complete.
 :   // Can't apply state transition until no more exec rpcs 
will be sent.
 :   exec_rpcs_complete_barrier_->Wait();
> Just to make sure I understand the bug correctly, we only need to wait for
For the bug we were hitting. But we also need to wait for all exec rpcs to 
avoid the case of the cancel rpc passing the exec rpc for the other backends 
(the new comment in HandleExecStateTransition is meant to explain that - lmk if 
it needs clarification).

I think what we'll want to do is preserve this barrier, but be able to skip 
sending exec rpcs when there is an error to unblock this barrier sooner.



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 17 May 2018 19:16:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 1: Code-Review+1

(1 comment)

Bikram may want to take a look too.

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@693
PS1, Line 693:   // We may start receiving status reports before all exec 
rpcs are complete.
 :   // Can't apply state transition until no more exec rpcs 
will be sent.
 :   exec_rpcs_complete_barrier_->Wait();
> Also note that we are still better off than before this patch because previ
Just to make sure I understand the bug correctly, we only need to wait for exec 
RPCs for this backend, right? I.e. waiting for all exec RPCs is sufficient but 
not necessary.

Not suggesting we need to change it.



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 17 May 2018 18:14:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@693
PS1, Line 693:   // We may start receiving status reports before all exec 
rpcs are complete.
 :   // Can't apply state transition until no more exec rpcs 
will be sent.
 :   exec_rpcs_complete_barrier_->Wait();
> this is the fix for IMPALA-7030/IMPALA-7033 race. Also, I plan to add a reg
Also note that we are still better off than before this patch because 
previously, the Coordinator::lock_ was held while sending exec rpcs and also 
taken by this callback, and so all status report RPCs would block until we 
finish exec rpcs. Now, we only block a failure status.

Also, i plan to improve this further with IMPALA-6788 (which will stop sending 
exec rpcs once we get the first failure report or exec rpc fails).



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Comment-Date: Thu, 17 May 2018 17:48:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10440 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 1:

(4 comments)

This is the same as the previous reviewed patch at 
https://gerrit.cloudera.org/#/c/10158/ except where noted in this comment. I 
backed out the change in order to add a fix for IMPALA-7030/7033.

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.h@351
PS1, Line 351: Cannot finalize execution until exec RPCs are no longer
 :   /// being sent.
this comment is new


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@523
PS1, Line 523:   // Can't transition until the exec RPCs are no longer in 
progress. Otherwise, a
 :   // cancel RPC could be missed, and resources freed before a 
backend has had a chance
 :   // to take a resource reference.
 :   DCHECK(exec_rpcs_complete_barrier_ != nullptr &&
 :   exec_rpcs_complete_barrier_->pending() <= 0) << "exec rpcs 
not completed";
this DCHECK is new


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@664
PS1, Line 664: PrintId(query_id())
I folded in the PrintId() into this version of the change.


http://gerrit.cloudera.org:8080/#/c/10440/1/be/src/runtime/coordinator.cc@693
PS1, Line 693:   // We may start receiving status reports before all exec 
rpcs are complete.
 :   // Can't apply state transition until no more exec rpcs 
will be sent.
 :   exec_rpcs_complete_barrier_->Wait();
this is the fix for IMPALA-7030/IMPALA-7033 race. Also, I plan to add a 
regression test specifically for this race with IMPALA-7046.



--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Comment-Date: Thu, 17 May 2018 17:45:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-17 Thread Dan Hecht (Code Review)
Dan Hecht has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10440


Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed
- loop query_test & failure for 12 hours with no dchecks or crashes
  (This had previously reproduced IMPALA-7030 and IMPALA-7033 with
  the previous version of this change).

Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 398 insertions(+), 391 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/10440/1
--
To view, visit http://gerrit.cloudera.org:8080/10440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Gerrit-Change-Number: 10440
Gerrit-PatchSet: 1
Gerrit-Owner: Dan Hecht 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 22:

This change did not cherrypick successfully into branch 2.x. To resolve this, 
please do the cherry-pick manually and submit it to Gerrit at refs/for/2.x or 
add an exception to the branch 2.x copy of bin/ignored_commits.json. Thanks, 
your friendly bot at https://jenkins.impala.io/job/cherrypick-2.x-and-test/486/ 
.


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 22
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 12 May 2018 07:10:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Reviewed-on: http://gerrit.cloudera.org:8080/10158
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 393 insertions(+), 392 deletions(-)

Approvals:
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 22
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 21: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 21
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 12 May 2018 00:46:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@476
PS18, Line 476:   DCHECK(exec_status_.ok()) << exec_status_;
> I was thinking the ($3 -> $4) already gives that info, once you get used to
Agreed. That makes sense.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 21
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 23:59:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2455/


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 21
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 21:18:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 21: Code-Review+2

Rebase


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 21
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 21:18:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@476
PS18, Line 476:   DCHECK(exec_status_.ok()) << exec_status_;
> Maybe we can have a larger discussion later, and leave this as it is now. B
I was thinking the ($3 -> $4) already gives that info, once you get used to it. 
For an error that results in query error, it will look like:

(EXECUTING -> ERROR)

for other errors (that are effectively dropped), it will look like e.g.:

(CANCELLED -> CANCELLED)
(RETURNED_RESULTS -> RETURNED_RESULTS)
(ERROR -> ERROR)

which says that we didn't enter the error state for this status.

I'm definitely open to tweaking the condition for this log in the future, but 
just thinking given the amount of active development in the code that will call 
this path (i.e. IMPALA-2990), it might be helpful to have at least initially.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 18
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 21:16:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 20: Code-Review+2

(4 comments)

Thanks! This LGTM, +2-ing it since it's already gone through multiple rounds of 
review. All comments are just responses for discussion's sake, and don't 
require any code changes.

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@325
PS18, Line 325: rySummary() and
> ExecState is a scalar (enum). We pass scalar by value - no reason to have i
I agree but on the other hand, keeping it const would help by avoiding 
accidental bugs such as overwriting the exec state in future changes, so it's 
better just to define the read only contract upfront. Also agree with no need 
for a ref.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@279
PS14, Line 279:
> Cancel() is not called for (execution) errors. Thats part of the point of t
I think I meant CancelBackends() as the NotifyAll() lives there, but the 
comment you updated it to is much clearer. Thanks.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@457
PS18, Line 457: ret_status
> exec_status_ is protected by the lock, so the copy is necessary.
Fair point, I missed that.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@476
PS18, Line 476:   DCHECK(exec_status_.ok()) << exec_status_;
> My thinking was that it would be good to at least log the error, which is w
Maybe we can have a larger discussion later, and leave this as it is now. But 
just as someone who frequents the logs to debug issues, if I see an error for a 
query, I would assume by default that the query failed, which could lead to 
some confusion, if it in fact didn't fail.

Maybe a solution would be to meet in the middle and tag log messages such as 
this with something like " failed but query succeeded", etc. But we 
don't need to have that convo in this patch.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 20
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 21:11:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@457
PS18, Line 457: ret_status
> Just fyi, there's no copy because return-value-optimisation applies to ret_
RVO eliminates the copy of ret_value into the thing that is accepting it, but 
there's still a copy from exec_status_ while holding the lock. But, yeah, agree 
with your point that it ultimately we end up with one copy from exec_status_ in 
both cases.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 18
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 17:15:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@457
PS18, Line 457: ret_status
> exec_status_ is protected by the lock, so the copy is necessary.
Just fyi, there's no copy because return-value-optimisation applies to 
ret_status.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 18
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 16:57:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 20: Code-Review+1

Carry +1s


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 20
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 16:44:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 20:

(12 comments)

Thanks for taking a look Sailesh.

http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.h@93
PS19, Line 93: /// 2. exec_state_lock_, backend_states_init_lock_, filter_lock_,
> What about backend_states_init_lock_ ?
Done


http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.h@206
PS19, Line 206:
> comment requires an update.
referenced the class level comment.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@325
PS18, Line 325: rySummary() and
> const ref
ExecState is a scalar (enum). We pass scalar by value - no reason to have 
indirection.

I can make them const though, if you prefer, given these are leafs routines. 
Generally, I'm not a huge fan of const scalar args because it leads to const 
creep with little benefit (compiler has to perform analysis anyway, and usually 
easy enough to see when non-pointers are not modified/aliased).


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@350
PS18, Line 350:  in all cases releases resources and cal
> These look like they're read only, so they should be passed as const refs.
Scalar and in registers, so not going to pass by reference.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@357
PS18, Line 357:
> const ref
made it const


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.h@360
PS18, Line 360:
> const ref
made it const


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@127
PS14, Line 127: ();
> You mean Cancel() and UpdateBackendExecStatus()? Maybe this is not worth me
This comment was pre-existing but I'm fine with removing it. It's covered in 
the class comment.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@279
PS14, Line 279:
> To be more explicit, we should say "... or when Cancel() is called."
Cancel() is not called for (execution) errors. Thats part of the point of the 
patch is to distinguish canceled state and error state.
(It's true that the impala server layer can implement an error at that layer 
using Cancel(), but that's not an execution error, and I'd rather not muddle 
the terminology).

Will update this comment to talk about backend cancellation, though.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@457
PS18, Line 457: ret_status
> Preferable to get rid of the automatic variable 'ret_status' and just retur
exec_status_ is protected by the lock, so the copy is necessary.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@462
PS18, Line 462: Status ret_status;
> Same here.
Same locking.


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@476
PS18, Line 476:   DCHECK(exec_status_.ok()) << exec_status_;
> If an error happens after exec_state_ is already set to ExecState::RETURNED
My thinking was that it would be good to at least log the error, which is why I 
wrote the code that way.  Alternatively, we could only log when a state 
transition happens, but it seems like it might be good to know when an backend 
error occurs even if not failing the query, but I'm also not a fan of log spew. 
Are you arguing for one way or the other?


http://gerrit.cloudera.org:8080/#/c/10158/18/be/src/runtime/coordinator.cc@486
PS18, Line 486: !status.ok() && !status.IsCancelled() && e
> (!status.ok() && !status.IsCancelled() && exec_status_.IsCancelled())
This path should be rare, but sure we can avoid the copy in that case.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 20
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 11 May 2018 16:37:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-11 Thread Dan Hecht (Code Review)
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10158

to look at the new patch set (#20).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 393 insertions(+), 392 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/20
--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 20
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/19/be/src/runtime/coordinator.cc@599
PS19, Line 599:   WaitForBackends();
somewhere along the way I broke Kudu DML. Moving this back outside the if-stmt 
fixes that.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 19
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 10 May 2018 23:36:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-10 Thread Dan Hecht (Code Review)
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10158

to look at the new patch set (#19).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 387 insertions(+), 391 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/19
--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 19
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 18: Code-Review+1

Thanks for the reviews! rebase and carry.


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 18
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 10 May 2018 17:10:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-10 Thread Dan Hecht (Code Review)
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10158

to look at the new patch set (#17).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 389 insertions(+), 391 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/17
--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 17
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-10 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@522
PS15, Line 522: query_events_->MarkEvent(exec_state_to_event.at(new_state));
  :   // TODO: is this needed? This will also happen on the 
"backend" side of cancel rpc.
  :   // And in the case of EOS, sink already knows this.
> makes sense.
I filed IMPALA-7011 for that.


http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc@132
PS16, Line 132:
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc@618
PS16, Line 618:
> concurrently
Done



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 14
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 10 May 2018 17:08:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-09 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 16: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@522
PS15, Line 522: // TODO: is this needed? This will also happen on the "backend" 
side of cancel rpc.
  :   // And in the case of EOS, sink already knows this.
  :   if (coord_sink_ != nullptr) coord_sink_->CloseConsumer();
> Yeah, I agree, but would prefer to do it in an independent commit in case t
makes sense.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@943
PS15, Line 943: e : backend_states_) {
> In principle, I agree that's how we'd want it to work. But given the way Cl
sounds good


http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc@132
PS16, Line 132:
nit: extra space



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 16
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 10 May 2018 02:16:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 16: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/16/be/src/runtime/coordinator.cc@618
PS16, Line 618: currently
concurrently



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 16
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 10 May 2018 00:47:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@291
PS14, Line 291:   ExecState exec_state_ = ExecState::EXECUTING;
> I agree it's a little confusing, but there is nothing you can do with this
I think the changes to the header comment made it clearer.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 14
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 10 May 2018 00:31:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-09 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 16:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@322
PS15, Line 322: cu
> double space snuck back in
Done


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@341
PS15, Line 341: ith 'instance_ho
> nit: update to "failed_finstance"
Done


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@414
PS15, Line 414:
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@a115
PS14, Line 115:
  :
> put that back
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@121
PS14, Line 121: may call any
> should we call this a non-OK status, because for cancellation through Cance
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@131
PS14, Line 131:
> do you mean cancels?
I think I meant to delete that.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@147
PS15, Line 147: Status prepare_status = query_state_->WaitForPrepare();
  :   DCHECK(!prepare_status.ok());
  :   return UpdateExecState
> should we call UpdateExecStateIfError() here?
yes, done.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@486
PS15, Line 486: !status.ok() && exec_status_.IsCancelled()
> can we have a case with  =  ?
Ideally no. but I think the backend can sometimes do it due to complications in 
the error propagation path. I would like to get to a point where this can't be 
cancelled, but I don't think we're there yet.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@522
PS15, Line 522: // TODO: is this needed? This will also happen on the "backend" 
side of cancel rpc.
  :   // And in the case of EOS, sink already knows this.
  :   if (coord_sink_ != nullptr) coord_sink_->CloseConsumer();
> I dont think we need this, I looked at why we originally needed this and it
Yeah, I agree, but would prefer to do it in an independent commit in case there 
is some unforeseen issue.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@530
PS15, Line 530:
  :   } else {
> in case of cancelBackends(), should we also wait for resources to be freed?
I think the argument can be made in both directions, but we eventually settled 
on the current behavior. The reasoning is documented in the header comment for 
ReleaseAdmissionControlResources(). I'll update this todo as to not add 
confusion.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@605
PS15, Line 605:
> maybe change the name to something like UpdateExecState, since this is also
The "if error" was meant to say that the exec state is only updated if the 
status is an not-ok and only to move into an error state (as opposed to, say, 
cancell state). But I can see the confusion so i'll change it since you feel 
it'd be clearer. I guess the header comment should clarify that this shouldn't 
be used for cancellation.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@639
PS15, Line 639: DCHECK(exec_rpcs_complete_barrier_ != nullptr &&
  :   exec_rpcs_complete_barrier_->pending() <= 0) << "Exec() 
must be called first";
  :   discard_result(SetNonErrorTerminalSt
> does this mean we can call cancel before Exec() completes? what if cancel i
Yeah, this is bogus. Fixed. The comments in the header already say Exec() needs 
to precede this.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@943
PS15, Line 943: e : backend_states_) {
> Do we expect these methods to be called before Exec() completes? If not, we
In principle, I agree that's how we'd want it to work. But given the way 
ClientRequestState currently works and that things like the webserver just 
reach in and get the coord() object, I think they can currently call these 
webserver support rourtines before Exec() completes. How about I revisit this 
after your patch is merged?



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158

[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-09 Thread Dan Hecht (Code Review)
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10158

to look at the new patch set (#16).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 389 insertions(+), 391 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/16
--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 16
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-09 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 15:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@341
PS15, Line 341: failed_fragment'
nit: update to "failed_finstance"


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@414
PS15, Line 414:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@121
PS14, Line 121: s not thread
should we call this a non-OK status, because for cancellation through Cancel() 
we should not call it error


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@131
PS14, Line 131: tate_
do you mean cancels?


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@147
PS15, Line 147: Status prepare_status = query_state_->WaitForPrepare();
  :   DCHECK(!prepare_status.ok());
  :   return prepare_status;
should we call UpdateExecStateIfError() here?


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@486
PS15, Line 486: !status.ok() && exec_status_.IsCancelled()
can we have a case with  =  ?


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@522
PS15, Line 522: // TODO: is this needed? This will also happen on the "backend" 
side of cancel rpc.
  :   // And in the case of EOS, sink already knows this.
  :   if (coord_sink_ != nullptr) coord_sink_->CloseConsumer();
I dont think we need this, I looked at why we originally needed this and it was 
because the fragment lifecycle was a little different and we needed to to close 
the sink in case the Prepare phase of the fragments failed. After patch 
IMPALA-2550, we handle partially-prepared state failure differently and hence 
dont require this anymore.

Moreover, since this was used to close partially-prepared state failure and now 
if we fail prepare, coord_sink_ is not even set, so this is essentially 
redundant.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@530
PS15, Line 530: ensure resources are
  : // really freed before admitting more queries?
in case of cancelBackends(), should we also wait for resources to be freed?


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@605
PS15, Line 605: UpdateExecStateIfError
maybe change the name to something like UpdateExecState, since this is also 
used to only get the overall status and it may or may not be an error status


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@639
PS15, Line 639: // Wait until backends have started executing, otherwise the 
cancel rpc might arrive
  :   // before the exec rpc.
  :   exec_rpcs_complete_barrier_->Wait();
does this mean we can call cancel before Exec() completes? what if cancel is 
called before exec_rpcs_complete_barrier_ is initialized and is still nullptr.
We should probably just mark this method to only be called after Exec completes 
because currently that how it is used.


http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@943
PS15, Line 943: backend_states_init_lock_
Do we expect these methods to be called before Exec() completes? If not, we can 
get rid of backend_states_init_lock_ and mark these methods to only be called 
after Exec()



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 15
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 09 May 2018 20:55:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@a115
PS14, Line 115:
  :
put that back


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.cc@623
PS14, Line 623: TOOD
> Inspection of the ClientRequestState code. I think it's already true, but i
It's IMPALA-4580. I'm not gonna muck with that now -- will revert part of the 
header comment to keep this behavior documented.



--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 14
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 09 May 2018 03:41:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@322
PS15, Line 322:
double space snuck back in



-- 
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 15
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 09 May 2018 03:24:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 14:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@23
PS14, Line 23: #include 
> Not your change but these accumulators don't look like they're used.
moved to coordinator-backend-state.h, where they are used.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@33
PS14, Line 33: #include 
> Not used?
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@37
PS14, Line 37: #include "common/hdfs.h"
> Move to .cc?
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@42
PS14, Line 42: #include "runtime/query-state.h"
> I think you could move this to the .cc - QueryState is already forward-decl
to .cc and coordinator-backend-state.h


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@43
PS14, Line 43: #include "scheduling/query-schedule.h"
> Could forward-declare QuerySchedule and move this to the .cc.
query_ctx() was relying on that so moved that to .cc (and moved query_id() 
back).


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@87
PS14, Line 87: /// 1. When an error is encountered while still executing, 
backend cancellation is
> Can we define what "executing" means?
In this context it just means hasn't encountered either of these three 
conditions. I've tried to clarify by rewording.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@99
PS14, Line 99: ///
> Should we document the lock order? Coordinator::lock_ is mentioned in the I
I deleted that impala-server.h comment, lmk if you disagree. Added short 
comment about ordering.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@114
PS14, Line 114:
> extra spaces in this comment
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@130
PS14, Line 130:   /// Cancel execution of query and sets the overall query 
status to CANCELLED if the
> Long line
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@191
PS14, Line 191: accesses
> accessing.
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@214
PS14, Line 214:   boost::mutex wait_lock_;
> Switch to SpinLock? The inconsistency might be confusing otherwise. Probabl
Let me do that in a follow up commit. It should be fine but could have some 
weird subtle side effect.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@218
PS14, Line 218:   /// Keeps track of number of completed ranges and total scan 
ranges. Initialized by Exec().
> Long line.
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@219
PS14, Line 219:   ProgressUpdater progress_;
> I don't feel that strongly but it seems like we could PIMPL this and avoid
Given you don't feel strongly (nor do I) and it's not related to this change 
and the change is large enough (in terms of complexity) let's defer that for 
now.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@291
PS14, Line 291:   ExecState exec_state_ = ExecState::EXECUTING;
> It's a little confusing that this is EXECUTING before Exec() is called. The
I agree it's a little confusing, but there is nothing you can do with this 
object between construction and calling Exec() so not sure it's worth making a 
distinct state (the implementation is also simplified given that all 
transformations result in a terminal state).

Maybe I can clarify with comments -- Which part are you referring to about the 
distinction?


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@325
PS14, Line 325:   /// If 'exec_state_' != EXECUTING, does nothing. Otherwise 
sets 'exec_state_' to
> Maybe a brief high-level description would be helpful. E.g. "Called when th
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@326
PS14, Line 326:   /// 'state' (must be either CANCELLED or RETURNED_RESULTS),
> Formatting of this comment seems weird.
not sure what happened there. fixed.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@333
PS14, Line 333:   /// Transitions 'exec_state_' given an execution status:
> It's a little hard to decode why someone would call this with an OK status
I moved the last line up to the beginning. Hopefully that helps clarify.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@351
PS14, Line 351:
> extra space
Done


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@360
PS14, Line 360: void
> void argument list not needed in C++ - () 

[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Dan Hecht (Code Review)
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10158

to look at the new patch set (#15).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
---
M be/src/runtime/coordinator-backend-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/impala-server.h
M be/src/util/counting-barrier.h
6 files changed, 391 insertions(+), 390 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/15
--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 15
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 14:

(33 comments)

I think this makes sense. I really need to do another pass to verify my 
understanding. I have a lot of comments but they're mainly cleanup 
opportunities and about comments.

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@23
PS14, Line 23: #include 
Not your change but these accumulators don't look like they're used.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@33
PS14, Line 33: #include 
Not used?


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@37
PS14, Line 37: #include "common/hdfs.h"
Move to .cc?


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@42
PS14, Line 42: #include "runtime/query-state.h"
I think you could move this to the .cc - QueryState is already forward-declared.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@43
PS14, Line 43: #include "scheduling/query-schedule.h"
Could forward-declare QuerySchedule and move this to the .cc.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@87
PS14, Line 87: /// 1. When an error is encountered while still executing, 
backend cancellation is
Can we define what "executing" means?


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@99
PS14, Line 99: ///
Should we document the lock order? Coordinator::lock_ is mentioned in the 
ImpalaServer header but there are multiple locks in here. Or is it sufficient 
to document that individual locks are terminal?


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@114
PS14, Line 114:
extra spaces in this comment


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@130
PS14, Line 130:   /// Cancel execution of query and sets the overall query 
status to CANCELLED if the
Long line


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@191
PS14, Line 191: accesses
accessing.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@214
PS14, Line 214:   boost::mutex wait_lock_;
Switch to SpinLock? The inconsistency might be confusing otherwise. Probably 
doesn't matter that much...


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@218
PS14, Line 218:   /// Keeps track of number of completed ranges and total scan 
ranges. Initialized by Exec().
Long line.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@219
PS14, Line 219:   ProgressUpdater progress_;
I don't feel that strongly but it seems like we could PIMPL this and avoid 
including the header above.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@291
PS14, Line 291:   ExecState exec_state_ = ExecState::EXECUTING;
It's a little confusing that this is EXECUTING before Exec() is called. There's 
a distinction made above about being in an "Executing" state, which is distinct 
from this being EXECUTING. Maybe there should be a different initial state to 
make that part of the state machine explicit?


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@325
PS14, Line 325:   /// If 'exec_state_' != EXECUTING, does nothing. Otherwise 
sets 'exec_state_' to
Maybe a brief high-level description would be helpful. E.g. "Called when the 
query is done executing in a non-error state."


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@326
PS14, Line 326:   /// 'state' (must be either CANCELLED or RETURNED_RESULTS),
Formatting of this comment seems weird.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@333
PS14, Line 333:   /// Transitions 'exec_state_' given an execution status:
It's a little hard to decode why someone would call this with an OK status 
based on the comment - at least until you get all the ways to the last line of 
the comment. I'm not sure that I have a concrete suggestion since the comment 
is already quite long, but maybe you have an idea.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@351
PS14, Line 351:
extra space


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@360
PS14, Line 360: void
void argument list not needed in C++ - () is equivalent to (void) unlike C.


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@406
PS14, Line 406:
extra space


http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.cc@54
PS14, Line 54: using namespace strings;

[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10158 )

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..


Patch Set 14:

I think this is ready for review now.


--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 14
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 May 2018 18:49:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

2018-05-08 Thread Dan Hecht (Code Review)
Hello Michael Ho, Sailesh Mukil, Tim Armstrong, Bikramjeet Vig,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10158

to look at the new patch set (#14).

Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify 
state
..

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

The is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA_5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed

Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
---
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/util/counting-barrier.h
4 files changed, 368 insertions(+), 370 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/10158/14
--
To view, visit http://gerrit.cloudera.org:8080/10158
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e
Gerrit-Change-Number: 10158
Gerrit-PatchSet: 14
Gerrit-Owner: Dan Hecht 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tim Armstrong