Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10813 )
Change subject: IMPALA-7163: Implement a state machine for the QueryState class ...................................................................... Patch Set 7: (21 comments) http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/fragment-instance-state.cc@a113 PS6, Line 113: > May make sense to replace this by checking the state. Doing a racy read of I'm not sure if there's a way to do a racy read on an AtomicEnum, so I've used current_state_.Load() here. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@152 PS6, Line 152: /// Not idempotent, not thread-safe. Must only be called by a single thread. > update comment to mention that it blocks till a terminal state has been rea Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@156 PS6, Line 156: mentInstanceState* GetFInstanc > nit: avoid reference to private methods. how about switching the descriptio Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@198 PS6, Line 198: void DonePreparing() { d > If you take the suggestion to only keep a single 'query_status_', there doe Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@212 PS6, Line 212: } > This is only meaningful when there is an error status, right ? Please docum Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@234 PS6, Line 234: on > fragment instance Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@271 PS6, Line 271: > Consider calling it "PREPARING" Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@274 PS6, Line 274: > Consider calling it "EXECUTING" Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@281 PS6, Line 281: / The ove > when is the cancelled state set? I've added a change to Cancel() to make the query_status_ Status::CANCELLED. Anytime UpdateBackendExecState() is called after that, the BackendExecState::CANCELLED state will be set. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@289 PS6, Line 289: > if cancelled, will the status be Status::CANCELLED? if yes that is also cou You're right. I've fixed that in the next patch set. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@308 PS6, Line 308: : /// Number of active fragment instances and coordinators for this query that may consume : /// resources for query execution (i.e. threads, memory) on the Impala daemon. : /// Query-wide execution resources for this query are released once this goes to zero. : AtomicInt32 exec_resource_refcnt_; : : /// Temporary files for this query (owned by obj_pool_). Non-null if spilling is : /// enabled. Set in Pr > It seems simpler to keep track of a single status for the first error hit i I feel that it's not ideal having the fragment instance threads modify the query wide status. Now the patch looks like this: The fragment instance threads modify the query wide Status to show that there's an error, but they don't modify the query BackendExecState. The BackendExecState is modified by the query state thread. This seems a little counter intuitive, since the fragment instance thread could itself modify the BackendExecState if it wants to as it has all the information to do so when it hits an error. But that would mean that sending final reports, etc. would have to be driven by the fragment instance thread too, which is not a good idea. In any case, I'll go ahead with this, as functionally it'll have the same behavior, and it's not a huge problem, so we can avoid more back and forth about it. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.h@320 PS6, Line 320: > Mind documenting the thread safety about this variable ? Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@233 PS6, Line 233: return query_status_; > unused variable. maybe just pass a status object to this method if the fins I was printing this to the logs in the previous patchset, but I removed it since that was one of the comments. So, in this patch, we aren't using the 'finst_id_' part of the struct, but as part of a following change, we will require it (IMPALA-4063). This patch is primarily meant as a set up to do that change, so this state machine isn't really used for anything until that follow on patch comes in. In any case, FInstanceStatus is removed, and we have a query wide 'failed_finstance_id_' member variable instead now (which is also not used), but will be for 4063. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@238 PS6, Line 238: if (!WaitForPrepare().ok()) return nullptr; : auto it = fis_map_.find(instance_id); : return it != fis_map_.end() ? it->second : nullptr; : } : : void QueryState::ReportExecStatus(bool done, const Status& status, : FragmentInstanceState* fis) { > Can this be merged with TransitionNonTerminalState() above as: Done. I removed TransitionNonTerminalExecState() and just made all the code inline here. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@241 PS6, Line 241: : : void QueryState::ReportExecStatus(bool done, const Status& status, : FragmentInstanceState* fis) { : Repor > do we have a cancellation path? I forgot to add a line to Cancel() to update the query_status_, which means that previously all CANCELLED statuses would look like errors. I've changed it now. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@250 PS6, Line 250: if we're reporting an error, w > Under what circumstances would we call QueryState::UpdateBackendExecState() We won't in this patch. However, after 4063, we'd expect that this can get called periodically, so the behavior should be idempotent if the state is already terminal. In any case, I've removed all this code, since we decided we don't want to have them around just for DCHECKs. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@379 PS6, Line 379: DCHECK_LT(fragment_ctx_idx, rpc_params_.fragment_ctxs.size()); > unique_lock<SpinLock> l(status_lock_)? Done http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@399 PS6, Line 399: // reference counts were both greater than zero before the increments, so Forgot to add "instances_prepared_barrier_->NotifyRemaining()" here too. Added in latest patchset. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@457 PS6, Line 457: << " fragment_idx > Won't it be a problem if we call ErrorDuringPrepare() here as it only bump Hrm, yea, this is a problem in this patchset. I guess the only way to get around this is to call "instances_prepared_barrier_->NotifyRemaining()" over here. Done. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@460 PS6, Line 460: << " > Seems wrong to return here. Don't we need to call ReportExecStatusAux() lik Good catch, done. http://gerrit.cloudera.org:8080/#/c/10813/6/be/src/runtime/query-state.cc@463 PS6, Line 463: ImpaladMetrics::IMPALA_SERVER_NUM_FRAGMENTS_IN_FLIGHT->Increment(-1L); : VLOG_QUERY << "Instance completed. instance_id=" << Print > Seems unnecessary to be copying all_fis_status here and having two separate Done -- To view, visit http://gerrit.cloudera.org:8080/10813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec5670a7db83ecae4656d7bb2ea372d3767ba7fe Gerrit-Change-Number: 10813 Gerrit-PatchSet: 7 Gerrit-Owner: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Comment-Date: Tue, 24 Jul 2018 19:17:54 +0000 Gerrit-HasComments: Yes