Henry Robinson has posted comments on this change.

Change subject: IMPALA-4014: Introduce query-wide execution state.
......................................................................


Patch Set 6:

(11 comments)

Some more comments about the refcounting, many in response to yours.

http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS6, Line 509: qs = nullptr;
superfluous, since it is never read again.


http://gerrit.cloudera.org:8080/#/c/4418/4/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS4, Line 86: // start new thread to execute instance
            :   Thread::Run("query-exec-mgr",
            :       Substitute("exec-fragment-instance-$0", PrintId(
> i felt both approaches are equally valid.
Having some fragment instance management logic here confuses the 
responsibilities of this class and of QueryState (which I would expect to 
control the lifecycle of the fragment instance as well as contain its 
metadata). There naturally seems to be a hierarchy of responsibilities: 
QueryExecMgr manages QueryStates, QueryStates manage FragmentInstances. That 
encapsulation makes the code easier to reason about.


PS4, Line 115: // always decrement refcount
> i thought about this, but didn't see it as a bug.
Maybe it's not a bug as implemented, but it's counter-intuitive that there 
could be more than one QS over time (it's tripped two reviewers up now). It is 
a simpler invariant that there is one QES over the lifetime of the query. I 
think there are bugs waiting to happen. For example: we add aggregated state / 
metrics to the QES and then wonder why they get reset over time.


PS4, Line 147: if (it == qs_map_.end()) return;
> after decreasing the refcnt to 0 in l138 someone else could have increased 
I think this is again unnecessarily confusing. Why not have the semantics such 
that:

  refcount == 0 -> no more references may be taken && caller to RQS() that sets 
count == 0 performs (or schedules) the GC.

If we stipulate that a query is live for as long as its fragments are running, 
then having the QES start out with refcount == 1 (like you already do) is 
compatible with the above invariant, and much easier to reason about.


http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

PS6, Line 76: qs->refcnt_.Load()
beware that this can change between line 72 and now. Just store the result of 
Add() and log that.


http://gerrit.cloudera.org:8080/#/c/4418/4/be/src/runtime/query-exec-mgr.h
File be/src/runtime/query-exec-mgr.h:

PS4, Line 58: 
> i'll change this comment to fit the existing behavior.
Elsewhere you say that you're happy with a QS getting recreated if the original 
gets GCed. I think that allowing refcount == 0 as a valid live state 
complicates the state machine for a refcounted object. For now, if the refcount 
hits 0, it's not deterministic whether or not a QS will get GC'ed. It's more 
understandable to make that relationship unambiguous, and I've expanded on that 
in other comments.

Who are the clients that you expect to do a second registration? If you're 
concerned about the case where a fragment completes before another fragment 
arrives, that seems like an edge case that's not worth adapting the semantics 
for (given that that case won't be possible once batching happens). Eventually 
I would expect registration to happen once per query, called from one path only 
(the batched RPC handler).


http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

PS6, Line 89: // Helper function to translate between Beeswax and HiveServer2 
type
            : static TOperationState::type QueryStateToTOperationState(
            :     const beeswax::QueryState::type& query_state) {
            :   switch (query_state) {
            :     case beeswax::QueryState::CREATED: return 
TOperationState::INITIALIZED_STATE;
            :     case beeswax::QueryState::RUNNING: return 
TOperationState::RUNNING_STATE;
            :     case beeswax::QueryState::FINISHED: return 
TOperationState::FINISHED_STATE;
            :     case beeswax::QueryState::EXCEPTION: return 
TOperationState::ERROR_STATE;
            :     default: return TOperationState::UKNOWN_STATE;
            :   }
            : }
While you're here, put in anonymous namespace.


http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/util/thread.cc
File be/src/util/thread.cc:

PS6, Line 305: //ThreadMgr* thread_mgr = ExecEnv::GetInstance()->thread_mgr();
?


PS6, Line 313: //thread_mgr_ref->AddThread(this_thread::get_id(), name, 
category, system_tid);
?


http://gerrit.cloudera.org:8080/#/c/4418/6/be/src/util/thread.h
File be/src/util/thread.h:

PS6, Line 146: /// Starts a detached thread running 'functor' and returns 
immediately.
             :   static void StartDetachedThread(con
What's a detached thread? Why doesn't this method register threads so that we 
can monitor them?


PS6, Line 172: SuperviseDetachedThread
Needs a comment. At this moment I don't see why this exists.


-- 
To view, visit http://gerrit.cloudera.org:8080/4418
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I962ae6b7cb7dc0d07fbb8f70317aeb01d88d400b
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to