Thomas Tauber-Marshall has uploaded this change for review. (
http://gerrit.cloudera.org:8080/9656 )


Change subject: IMPALA-6338: Ensure successfully completed queries have full 
profile
......................................................................

IMPALA-6338: Ensure successfully completed queries have full profile

test_profile_fragment_instances checks that, once all the results have
been returned, every fragment instance appears in the query profile
for a query that internally cancels fragment instances that are still
executing when the results have been fully returned.

Every fragment instance is guaranteed to send a profile to the
coordinator in Finalize(), but previously the coordinator did not
apply fragment profiles if the backend was 'done', defined as either
all instances having completed or one having entered an error state
(including cancelled).

So, the test could fail by the following sequence:
- Some fragment for a particular backend sends an update to the
  coordinator. 'returned_all_results_' is true, so the coordinator
  responds indicating that the backend should cancel its remaining
  fragments.
- Another fragment from that backend executes Finalize() and reports
  that it was cancelled. This causes the coordinator to consider the
  entire backend to be 'done'.
- A third fragment from the same backend, which had not previously
  sent a report from the reporting thread, executes Finalize(). This
  report will not be applied by the coordinator as the backend is
  considered 'done', so this fragment will not appear in the final
  profile.
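
The sequence above can be sketched with a small toy model. This is not
Impala's coordinator code: the class BackendState, its fields, and the
apply_report() signature are hypothetical names chosen for illustration,
and only the old definition of 'done' is taken from the description.

```python
from enum import Enum


class Code(Enum):
    OK = 0
    CANCELLED = 1


class BackendState:
    """Toy model of per-backend bookkeeping (hypothetical, not Impala code)."""

    def __init__(self, num_instances):
        self.num_instances = num_instances
        self.completed = set()
        self.status = Code.OK
        self.profiles = {}  # instance id -> last applied profile

    def is_done(self):
        # Old definition: every instance has completed, or some instance has
        # reported a non-OK status (cancellation included).
        return (self.status != Code.OK
                or len(self.completed) == self.num_instances)

    def apply_report(self, instance_id, code, profile, final):
        """Apply one report; return False if the report was ignored."""
        if self.is_done():
            return False
        self.profiles[instance_id] = profile
        if code != Code.OK:
            self.status = code
        if final:
            self.completed.add(instance_id)
        return True


backend = BackendState(num_instances=3)
# 1. Instance 0 sends a periodic update; the coordinator already has all
#    results and asks the backend to cancel its remaining instances.
applied_first = backend.apply_report(0, Code.OK, "profile-0", final=False)
# 2. Instance 1 runs Finalize() and reports that it was cancelled.
applied_second = backend.apply_report(1, Code.CANCELLED, "profile-1", final=True)
# 3. Instance 2 sends its first and only report from Finalize().
applied_third = backend.apply_report(2, Code.CANCELLED, "profile-2", final=True)

missing = sorted(set(range(3)) - set(backend.profiles))
print("instances missing from profile:", missing)
```

In this model the third report is dropped because the second one already
marked the backend 'done', which mirrors the missing fragment instance
in the final profile.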

The solution is to change the definition of 'done' to not include a
backend that has been cancelled due to all the results having been
returned but still has fragments that haven't completed.

This patch accomplishes this by introducing a new Status,
CANCELLED_INTERNAL, which indicates a fragment that was cancelled due
to all of the results having been returned. A backend that has this
Status will continue to apply profile updates until all of its
fragments have completed or an error is encountered. For all other
purposes, CANCELLED_INTERNAL is equivalent to CANCELLED, i.e.
Status::IsCancelled() is true for both.
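
As a rough illustration of the intended semantics, the rules in the
paragraph above can be written as two small predicates. This is a
simplified sketch, not the patch itself: the enum Code, the GENERAL
member (a stand-in for any real error), and the function names
is_cancelled() and backend_is_done() are assumptions made for the
example.

```python
from enum import Enum


class Code(Enum):
    OK = 0
    CANCELLED = 1
    CANCELLED_INTERNAL = 2
    GENERAL = 3  # hypothetical stand-in for any real error


def is_cancelled(code):
    # Both kinds of cancellation count as cancelled for all other purposes.
    return code in (Code.CANCELLED, Code.CANCELLED_INTERNAL)


def backend_is_done(status, num_completed, num_instances):
    """Return True once the backend should stop applying profile updates."""
    if num_completed == num_instances:
        return True
    # An internally cancelled backend keeps accepting updates until every
    # instance has completed; any other non-OK status ends it immediately.
    return status not in (Code.OK, Code.CANCELLED_INTERNAL)


# Internally cancelled, one of three instances still running: not done yet.
still_open = backend_is_done(Code.CANCELLED_INTERNAL, 2, 3)
# Same backend after the last instance finishes: done.
closed_when_complete = backend_is_done(Code.CANCELLED_INTERNAL, 3, 3)
# A regular cancellation or an error marks the backend done right away.
closed_on_cancel = backend_is_done(Code.CANCELLED, 1, 3)
closed_on_error = backend_is_done(Code.GENERAL, 1, 3)
```

Under these assumptions, a late Finalize() report from an internally
cancelled backend would still be applied, while a user-initiated
cancellation or an error would behave as before.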

Testing:
- Ran test_profile_fragment_instances in a loop with no failures.
  I can reliably repro the original problem with a few carefully
  placed sleeps.

Change-Id: I97b031548c64ac16d7e2a09b38baac2d30ac3340
---
M be/src/common/status.cc
M be/src/common/status.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_observability.py
11 files changed, 60 insertions(+), 29 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/9656/1
--
To view, visit http://gerrit.cloudera.org:8080/9656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I97b031548c64ac16d7e2a09b38baac2d30ac3340
Gerrit-Change-Number: 9656
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
