Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9656 )
Change subject: IMPALA-6338: Ensure successfully completed queries have full profile
......................................................................

IMPALA-6338: Ensure successfully completed queries have full profile

test_profile_fragment_instances checks that, once all the results have
been returned, every fragment instance appears in the query profile for
a query that internally cancels fragment instances that are still
executing when the results have been fully returned.

Every fragment instance is guaranteed to send a profile to the
coordinator in Finalize(), but previously fragment profiles were not
applied by the coordinator if the backend was 'done', defined as either
all instances having completed or one having entered an error state
(including cancelled).

So, the test could fail by the following sequence:
- Some fragment for a particular backend sends an update to the
  coordinator. 'returned_all_results_' is true, so the coordinator
  responds indicating that the backend should cancel its remaining
  fragments.
- Another fragment from that backend executes Finalize() and reports
  that it was cancelled. This causes the coordinator to consider the
  entire backend to be 'done'.
- A third fragment from the same backend, which had not previously sent
  a report from the reporting thread, executes Finalize(). This report
  will not be applied by the coordinator as the backend is considered
  'done', so this fragment will not appear in the final profile.

The solution is to change the definition of 'done' to not include a
backend that has been cancelled due to all the results having been
returned but still has fragments that haven't completed.

This patch accomplishes this by introducing a new Status,
CANCELLED_INTERNAL, which indicates a fragment that was cancelled due
to all of the results having been returned. A backend that has this
Status will continue to apply profile updates until all of its
fragments have completed or an error is encountered.
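The failing sequence and the changed definition of 'done' can be
replayed with a small standalone model. This is only a sketch in Python,
not the Impala C++ code: BackendModel, apply_report(), is_done(),
replay() and the ERROR code are hypothetical names made up for
illustration. Only CANCELLED, CANCELLED_INTERNAL and the idea behind
Status::IsCancelled() come from the description above.

```python
from enum import Enum


class Code(Enum):
    OK = "OK"
    CANCELLED = "CANCELLED"
    CANCELLED_INTERNAL = "CANCELLED_INTERNAL"  # new status named in this change
    ERROR = "ERROR"  # hypothetical stand-in for any real failure


def is_cancelled(code):
    """Mirrors the stated intent: both cancel codes count as cancelled."""
    return code in (Code.CANCELLED, Code.CANCELLED_INTERNAL)


class BackendModel:
    """Toy model of the coordinator's per-backend bookkeeping (assumed shape)."""

    def __init__(self, num_instances, use_internal_cancel):
        self.num_instances = num_instances
        self.use_internal_cancel = use_internal_cancel
        self.status = Code.OK
        self.finished = set()  # instances whose final report was applied
        self.profiled = set()  # instances that appear in the profile

    def is_done(self):
        all_finished = len(self.finished) == self.num_instances
        if self.use_internal_cancel:
            # New definition: an internal cancel alone does not make the backend done.
            failed = self.status not in (Code.OK, Code.CANCELLED_INTERNAL)
        else:
            # Old definition: any non-OK status, including cancelled, means done.
            failed = self.status is not Code.OK
        return all_finished or failed

    def apply_report(self, instance_id, code, final):
        """Returns True if the report was applied, False if it was dropped."""
        if self.is_done():
            return False
        self.profiled.add(instance_id)
        if final:
            self.finished.add(instance_id)
        # A real error may still replace an internal cancel.
        if code is not Code.OK and self.status in (Code.OK, Code.CANCELLED_INTERNAL):
            self.status = code
        return True


def replay(use_internal_cancel):
    """Replays the three-step sequence; returns the instances in the profile."""
    cancel = Code.CANCELLED_INTERNAL if use_internal_cancel else Code.CANCELLED
    backend = BackendModel(3, use_internal_cancel)
    backend.apply_report(0, Code.OK, final=False)  # periodic update; reply says cancel
    backend.apply_report(1, cancel, final=True)    # Finalize() reports cancelled
    backend.apply_report(2, cancel, final=True)    # first and only report from instance 2
    return sorted(backend.profiled)
```

Under these assumptions, replay(False) leaves instance 2 out of the
profile because the backend is already 'done' after the second report,
while replay(True) keeps applying reports and includes all three
instances.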
For all other purposes, CANCELLED_INTERNAL is equivalent to CANCELLED,
i.e. Status::IsCancelled() is true for both.

Testing:
- Ran test_profile_fragment_instances in a loop with no failures. I can
  reliably repro the original problem with a few carefully placed
  sleeps.

Change-Id: I97b031548c64ac16d7e2a09b38baac2d30ac3340
---
M be/src/common/status.cc
M be/src/common/status.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-state.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_observability.py
11 files changed, 60 insertions(+), 29 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/9656/1
--
To view, visit http://gerrit.cloudera.org:8080/9656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I97b031548c64ac16d7e2a09b38baac2d30ac3340
Gerrit-Change-Number: 9656
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>