Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12049 )
Change subject: IMPALA-4555: Make QueryState's status reporting more robust ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc@300 PS2, Line 300: nstance_exec_status : > If we send the final report multiple times, won't report_seq_no == last_rep Yes, but instance_stats->done_ can be set by an intermediate report (if that particular fragment is done but others aren't). So the following sequence could occur: - Some fragment finishes but others are still running, backends sends a report, backend thinks the report failed but actually the coordinator received it. - Backend sends another report. This one has a higher sequence number, but still has the info for the finished fragment as we didn't think that it was received. Coordinator already processed the report for the finished fragment, so we want to skip processing it again. http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/coordinator-backend-state.cc@320 PS2, Line 320: n_ran > const auto& Done http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/fragment-instance-state.h File be/src/runtime/fragment-instance-state.h: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/fragment-instance-state.h@101 PS2, Line 101: received > nit: received Done http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.h File be/src/runtime/query-state.h: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.h@387 PS2, Line 387: The number of failed intermediate reports since the la > This is reset to 0 upon a successful RPC, right ? So, will it be clearer to Done http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc File be/src/runtime/query-state.cc: http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc@386 PS2, Line 386: } else { > const TUniqueId& Done http://gerrit.cloudera.org:8080/#/c/12049/2/be/src/runtime/query-state.cc@563 PS2, Line 563: e all fragment instances finished preparing successfully, start > Some more thoughts on this: Done http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@116 PS2, Line 116: StatefulStatusPB { > Is there a more succinct name for it ? StatefulStatusPB ? Done http://gerrit.cloudera.org:8080/#/c/12049/2/common/protobuf/control_service.proto@138 PS2, Line 138: // Cumulative structural changes made by the table sink of this fragment : // instance. This is sent only when 'done' above is true. Not idempotent. : optional DmlExecStatusPB dml_exec_status = 5; > Why isn't this moved into FInstanceExecStatusStatefulPB Its not quite straight-forward to do, so given this patch is already reasonably large and complicated, I think its better to leave as followup work. If you agree, I'll file a JIRA for it. -- To view, visit http://gerrit.cloudera.org:8080/12049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971 Gerrit-Change-Number: 12049 Gerrit-PatchSet: 3 Gerrit-Owner: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Wed, 23 Jan 2019 00:25:25 +0000 Gerrit-HasComments: Yes