Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10813 )

Change subject: IMPALA-7163: Implement a state machine for the QueryState class
......................................................................


Patch Set 7:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc@a113
PS6, Line 113:
> May make sense to replace this by checking the state. Doing a racy read of
I'm not sure if there's a way to do a racy read on an AtomicEnum, so I've used 
current_state_.Load() here.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@152
PS6, Line 152:   /// Not idempotent, not thread-safe. Must only be called by a 
single thread.
> update comment to mention that it blocks till a terminal state has been rea
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@156
PS6, Line 156: mentInstanceState* GetFInstanc
> nit: avoid reference to private methods. how about switching the descriptio
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@198
PS6, Line 198:  void DonePreparing() { d
> If you take the suggestion to only keep a single 'query_status_', there doe
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@212
PS6, Line 212:   }
> This is only meaningful when there is an error status, right ? Please docum
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@234
PS6, Line 234: on
> fragment instance
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@271
PS6, Line 271:
> Consider calling it "PREPARING"
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@274
PS6, Line 274:
> Consider calling it "EXECUTING"
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@281
PS6, Line 281: / The ove
> when is the cancelled state set?
I've added a change to Cancel() to make the query_status_ Status::CANCELLED. 
Anytime UpdateBackendExecState() is called after that, the 
BackendExecState::CANCELLED state will be set.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@289
PS6, Line 289:
> if cancelled, will the status be Status::CANCELLED? if yes that is also cou
You're right. I've fixed that in the next patch set.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308
PS6, Line 308:
             :   /// Number of active fragment instances and coordinators for 
this query that may consume
             :   /// resources for query execution (i.e. threads, memory) on 
the Impala daemon.
             :   /// Query-wide execution resources for this query are released 
once this goes to zero.
             :   AtomicInt32 exec_resource_refcnt_;
             :
             :   /// Temporary files for this query (owned by obj_pool_). 
Non-null if spilling is
             :   /// enabled. Set in Pr
> It seems simpler to keep track of a single status for the first error hit i
I feel that it's not ideal having the fragment instance threads modify the 
query wide status. Now the patch looks like this:
The fragment instance threads modify the query wide Status to show that there's 
an error, but they don't modify the query BackendExecState. The 
BackendExecState is modified by the query state thread. This seems a little 
counter intuitive, since the fragment instance thread could itself modify the 
BackendExecState if it wants to as it has all the information to do so when it 
hits an error. But that would mean that sending final reports, etc. would have 
to be driven by the fragment instance thread too, which is not a good idea.

In any case, I'll go ahead with this, as functionally it'll have the same 
behavior, and it's not a huge problem, so we can avoid more back and forth 
about it.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@320
PS6, Line 320:
> Mind documenting the thread safety about this variable ?
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@233
PS6, Line 233: return query_status_;
> unused variable. maybe just pass a status object to this method if the fins
I was printing this to the logs in the previous patchset, but I removed it 
since that was one of the comments. So, in this patch, we aren't using the 
'finst_id_' part of the struct, but as part of a following change, we will 
require it (IMPALA-4063). This patch is primarily meant as a set up to do that 
change, so this state machine isn't really used for anything until that follow 
on patch comes in.

In any case, FInstanceStatus is removed, and we have a query wide 
'failed_finstance_id_' member variable instead now (which is also not used), 
but will be for 4063.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@238
PS6, Line 238:   if (!WaitForPrepare().ok()) return nullptr;
             :   auto it = fis_map_.find(instance_id);
             :   return it != fis_map_.end() ? it->second : nullptr;
             : }
             :
             : void QueryState::ReportExecStatus(bool done, const Status& 
status,
             :     FragmentInstanceState* fis) {
> Can this be merged with TransitionNonTerminalState() above as:
Done. I removed TransitionNonTerminalExecState() and just made all the code 
inline here.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@241
PS6, Line 241:
             :
             : void QueryState::ReportExecStatus(bool done, const Status& 
status,
             :     FragmentInstanceState* fis) {
             :   Repor
> do we have a cancellation path?
I forgot to add a line to Cancel() to update the query_status_, which means 
that previously all CANCELLED statuses would look like errors. I've changed it 
now.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@250
PS6, Line 250:  if we're reporting an error, w
> Under what circumstances would we call QueryState::UpdateBackendExecState()
We won't in this patch. However, after 4063, we'd expect that this can get 
called periodically, so the behavior should be idempotent if the state is 
already terminal.

In any case, I've removed all this code, since we decided we don't want to have 
them around just for DCHECKs.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@379
PS6, Line 379:       DCHECK_LT(fragment_ctx_idx, 
rpc_params_.fragment_ctxs.size());
> unique_lock<SpinLock> l(status_lock_)?
Done


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@399
PS6, Line 399:       // reference counts were both greater than zero before the 
increments, so
Forgot to add "instances_prepared_barrier_->NotifyRemaining()" here too. Added 
in latest patchset.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@457
PS6, Line 457:   << " fragment_idx
> Won't it be a problem if we call ErrorDuringPrepare() here as it only bump
Hrm, yea, this is a problem in this patchset. I guess the only way to get 
around this is to call "instances_prepared_barrier_->NotifyRemaining()" over 
here. Done.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@460
PS6, Line 460:   << "
> Seems wrong to return here. Don't we need to call ReportExecStatusAux() lik
Good catch, done.


http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@463
PS6, Line 463:   
ImpaladMetrics::IMPALA_SERVER_NUM_FRAGMENTS_IN_FLIGHT->Increment(-1L);
             :   VLOG_QUERY << "Instance completed. instance_id=" << Print
> Seems unnecessary to be copying all_fis_status here and having two separate
Done



--
To view, visit http://gerrit.cloudera.org:8080/10813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe
Gerrit-Change-Number: 10813
Gerrit-PatchSet: 7
Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Jul 2018 19:17:54 +0000
Gerrit-HasComments: Yes

Reply via email to