[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

Michael Ho (Code Review) Thu, 26 Jul 2018 13:28:14 -0700

Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )


Change subject: IMPALA-7163: Implement a state machine for the QueryState class
......................................................................


Patch Set 7:

(9 comments)

Looking good. Some more comments.

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/fragment-instance-state.cc@125
PS7, Line 125: DCHECK(
Does DCHECK_EQ() not work ?


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@67
PS7, Line 67: however,
            : /// if any fragment instance hits an error or cancellation, then 
we immediately change the
            : /// state of the query to the ERROR or CANCELLED state 
accordingly.
This is not true for PREPARING phase, right ? Seems easier to separately 
describe the behavior in PREPARING phase and EXECUTING phase instead of making 
a general statement like this and talking about exception in the next paragraph.


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@70
PS7, Line 70: EXECUTING
PREPARING ?


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@71
PS7, Line 71: all the underlying FISes will be in Prepare()
This is not necessarily true, right ? In PREPARING state, it only means at 
least one fragment instances is running Prepare(), right ? Some fragment 
instances could already be past Prepare() and executing Open(), Exec() etc, 
right ?


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@264
PS7, Line 264: BackendExecState backend_exec_state_ = 
BackendExecState::PREPARING;
May want to document the thread safety of this variable. I suppose it's only 
updated by the "query-state thread" only, right ?


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.h@287
PS7, Line 287: /// ID of first fragment instance to hit an error.
Please consider documenting the thread safety of this variable too.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308
PS6, Line 308:
             :   /// Number of active fragment instances and coordinators for 
this query that may consume
             :   /// resources for query execution (i.e. threads, memory) on 
the Impala daemon.
             :   /// Query-wide execution resources for this query are released 
once this goes to zero.
             :   AtomicInt32 exec_resource_refcnt_;
             :
             :   /// Temporary files for this query (owned by obj_pool_). 
Non-null if spilling is
             :   /// enabled. Set in Pr
> I feel that it's not ideal having the fragment instance threads modify the
Now that the patch has sunk in more, I think the state machine the query status 
and the query state has a slightly counter intuitive relationship which exists 
today already.

query_status_ represents the error status which any fragment instances can hit 
either in PREPARING or EXECUTING phase.

The query state means whether at least one fragment instances is still in that 
said state. So, a query state of PREPARING only means at least one fragment 
instance is still preparing but other fragment instances could already be 
executing.

The problem is that each fragment instance will directly report to the 
coordinator on failure today independent of other fragment instances' states.

In the next patch, the status reporting will be driven solely by the query 
state thread so we may actually track the status of PREPARING phase and 
EXECUTING phase separately like you did here. The change in behavior will be 
that we will report any error hit in PREPARING phase first before any error hit 
in EXECUTING phase. Do you think that will make the system easier to reason 
about ?


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc@230
PS7, Line 230:         BackendExecState::EXECUTING :
             :         BackendExecState::FINISHED;
nit: one line


http://gerrit.cloudera.org:8080/#/c/10813/7/be/src/runtime/query-state.cc@479
PS7, Line 479:   {
             :     unique_lock<SpinLock> l(status_lock_);
             :     query_status_ = Status::CANCELLED;
             :   }
Should this happen after line 483 below ?

Actually, do we need to set the query_status_ here ? Won't the first thread 
which notices the cancellation set it when calling ErrorDuringExec() set it ?

Do we have to worry about overwriting the error status if query_status_ already 
contains some errors ?



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Jul 2018 20:27:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7163: Implement a state machine for the QueryState class

Reply via email to