Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/10158 )
Change subject: IMPALA-5384, part 2: Simplify Coordinator locking and clarify state ...................................................................... Patch Set 16: (13 comments) http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@322 PS15, Line 322: cu > double space snuck back in Done http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@341 PS15, Line 341: ith 'instance_ho > nit: update to "failed_finstance" Done http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.h@414 PS15, Line 414: > nit: extra space Done http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@a115 PS14, Line 115: : > put that back Done http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@121 PS14, Line 121: may call any > should we call this a non-OK status, because for cancellation through Cance Done http://gerrit.cloudera.org:8080/#/c/10158/14/be/src/runtime/coordinator.h@131 PS14, Line 131: > do you mean cancels? I think I meant to delete that. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@147 PS15, Line 147: Status prepare_status = query_state_->WaitForPrepare(); : DCHECK(!prepare_status.ok()); : return UpdateExecState > should we call UpdateExecStateIfError() here? yes, done. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@486 PS15, Line 486: !status.ok() && exec_status_.IsCancelled() > can we have a case with <state, status> = <ERROR, Cancelled> ? Ideally no. but I think the backend can sometimes do it due to complications in the error propagation path. I would like to get to a point where this can't be cancelled, but I don't think we're there yet. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@522 PS15, Line 522: // TODO: is this needed? This will also happen on the "backend" side of cancel rpc. : // And in the case of EOS, sink already knows this. : if (coord_sink_ != nullptr) coord_sink_->CloseConsumer(); > I dont think we need this, I looked at why we originally needed this and it Yeah, I agree, but would prefer to do it in an independent commit in case there is some unforeseen issue. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@530 PS15, Line 530: : } else { > in case of cancelBackends(), should we also wait for resources to be freed? I think the argument can be made in both directions, but we eventually settled on the current behavior. The reasoning is documented in the header comment for ReleaseAdmissionControlResources(). I'll update this todo as to not add confusion. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@605 PS15, Line 605: > maybe change the name to something like UpdateExecState, since this is also The "if error" was meant to say that the exec state is only updated if the status is an not-ok and only to move into an error state (as opposed to, say, cancell state). But I can see the confusion so i'll change it since you feel it'd be clearer. I guess the header comment should clarify that this shouldn't be used for cancellation. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@639 PS15, Line 639: DCHECK(exec_rpcs_complete_barrier_ != nullptr && : exec_rpcs_complete_barrier_->pending() <= 0) << "Exec() must be called first"; : discard_result(SetNonErrorTerminalSt > does this mean we can call cancel before Exec() completes? what if cancel i Yeah, this is bogus. Fixed. The comments in the header already say Exec() needs to precede this. http://gerrit.cloudera.org:8080/#/c/10158/15/be/src/runtime/coordinator.cc@943 PS15, Line 943: e : backend_states_) { > Do we expect these methods to be called before Exec() completes? If not, we In principle, I agree that's how we'd want it to work. But given the way ClientRequestState currently works and that things like the webserver just reach in and get the coord() object, I think they can currently call these webserver support rourtines before Exec() completes. How about I revisit this after your patch is merged? -- To view, visit http://gerrit.cloudera.org:8080/10158 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1abdfd02163f9356c59d470fe1c64ebe012a9e8e Gerrit-Change-Number: 10158 Gerrit-PatchSet: 16 Gerrit-Owner: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 10 May 2018 00:22:42 +0000 Gerrit-HasComments: Yes