Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16763 )
Change subject: IMPALA-10258, IMPALA-10109: Fixed flaky test in test_query_retries.py
......................................................................

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-10258, IMPALA-10109
Since these issues are basically unrelated, could you separate them out into two reviews?

http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG@9
PS2, Line 9: When TestQueryRetries.test_original_query_cancel was ran on s3
I'm not sure I understand what you're saying the issue is:

According to the JIRA, the test was waiting for the query to reach state "RUNNING", but it was already at state "EXCEPTION" (QueryState = 5, see beeswax.thrift).

At that point in the test, the query shouldn't have failed, since the impalad hasn't been killed yet, so I'm really not sure what could have happened, and unfortunately it doesn't look like we have the logs for it.

http://gerrit.cloudera.org:8080/#/c/16763/2//COMMIT_MSG@16
PS2, Line 16: For IMPALA-10109, test_retries_from_cancellation_pool did not
I'm not sure I understand what you're saying the issue is:

According to the JIRA, the query timed out after ~784s, which is a lot longer than the default statestore time-to-detect-failure of heartbeat_frequency x max_missed = 1000ms x 10 = 10s. So it seems like the coordinator should have had plenty of time to get the statestore message, even under the old values.

Looking through the logs, I'm a little confused by what I see - the coordinator says the query was only scheduled on 2 backends, but I think the test assumes that it gets scheduled on all 3 backends in the minicluster (see __kill_random_impalad()).

I also see a reference to CancelFromThreadPool in QueryExecMgr on impalad_node1, but that shouldn't be hit unless the coordinator is killed, which it shouldn't have been.
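
For reference, the timing argument in the last comment can be checked with a few lines of Python. This is only a sketch of the arithmetic quoted above: the variable and function names are hypothetical (they are not Impala flag names or test helpers), and the ~784s figure is the approximate value from the JIRA.

```python
# Values quoted in the review comment; all names here are illustrative only.
heartbeat_frequency_ms = 1000   # default statestore heartbeat interval
max_missed_heartbeats = 10      # default number of missed heartbeats tolerated
observed_timeout_s = 784        # approximate query timeout reported in the JIRA


def time_to_detect_failure_s(frequency_ms, max_missed):
    """Worst-case time for the statestore to declare a subscriber failed."""
    return frequency_ms * max_missed / 1000.0


detect_s = time_to_detect_failure_s(heartbeat_frequency_ms, max_missed_heartbeats)
headroom = observed_timeout_s / detect_s

print("time to detect failure: %.0fs" % detect_s)
print("timeout is about %.0fx longer than the detection time" % headroom)
```

Under these defaults the detection window is a small fraction of the observed timeout, which is why slow failure detection alone does not seem to explain the test failure.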
-- 
To view, visit http://gerrit.cloudera.org:8080/16763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Tue, 24 Nov 2020 20:36:46 +0000
Gerrit-HasComments: Yes