[ https://issues.apache.org/jira/browse/IMPALA-9889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155809#comment-17155809 ]
Wenzhe Zhou edited comment on IMPALA-9889 at 7/11/20, 11:34 PM:
----------------------------------------------------------------

This issue happened randomly in impala-cdpd-master-core-asan builds. According to the query profiles in the log files, it occurred for the following query in TestMinMaxFilters.test_min_max_filters and TestAllRuntimeFilters.test_all_runtime_filters:

select STRAIGHT_JOIN count(*) from decimal_rtf_tbl a join [BROADCAST] decimal_rtf_tiny_tbl b where a.d28_28 = b.d28_28 and b.d28_28 != 0

When the issue happened, the backend took 4 to 6 minutes to finish the query plan. The filters did not arrive at the scan node in time; the arrival delay was about 4 to 5 minutes. Codegen for one fragment took about 3 minutes. The Kudu scan did not take much time (about 25 s).

Here are the messages from the query profile for one of the failures; the others show the same pattern.

impala-ec2-centos74-r5-4xlarge-ondemand-0573.vpc.cloudera.com:22002:
Filter 1 arrival: 5m42s
Fragment F00:
CodeGen:(Total: 3m58s, non-child: 0.000ns, % non-child: 0.00%)
Fragment F01:
CodeGen:(Total: 3m39s, non-child: 0.000ns, % non-child: 0.00%)
KUDU_SCAN_NODE (id=0):(Total: 20.999ms, non-child: 20.999ms, % non-child: 100.00%)
Table Name: functional_kudu.decimal_rtf_tbl
Runtime filters: Not all filters arrived (arrived: [], missing [1]), waited for 0. Arrival delay: 6m14s.

These runtime filter tests run near the end of the whole test run. According to https://gerrit.cloudera.org/#/c/16155/, as the stacks accumulate in an ASAN build, allocations and frees get slower and slower. This degrades test execution time over the course of the run, especially for codegen, and hence increases the delay before the runtime filters arrive at the scan node.

IMPALA-9887 (Add support for sharding end-to-end tests) and IMPALA-5444 (Asynchronous code generation) should help solve the issue. IMPALA-5444 was merged upstream recently, but the query option ASYNC_CODEGEN is off by default.
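To make the comparison of the profile timings concrete, here is a minimal sketch that converts the duration strings quoted from the profile into seconds and checks how much of the filter arrival time codegen accounts for. It is not part of Impala or its test framework: parse_duration and the dictionary keys are hypothetical names chosen for illustration, and the parser only assumes the "<number><unit>" format visible in the excerpt (for example 5m42s or 20.999ms).

```python
import re

# One "<number><unit>" component of a duration such as "5m42s" or "20.999ms".
# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)")
_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_duration(text):
    """Convert a duration string like '5m42s' into seconds (hypothetical helper)."""
    pos = 0
    total = 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:
            raise ValueError("unexpected characters in duration: %r" % text)
        total += float(match.group(1)) * _SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError("not a duration: %r" % text)
    return total


# Values copied from the profile excerpt in this comment; the keys are made up.
profile = {
    "filter_arrival": "5m42s",
    "codegen_f00": "3m58s",
    "codegen_f01": "3m39s",
    "kudu_scan": "20.999ms",
    "arrival_delay": "6m14s",
}
seconds = {name: parse_duration(value) for name, value in profile.items()}

# Share of the filter arrival time spent in the slowest fragment's codegen.
slowest_codegen = max(seconds["codegen_f00"], seconds["codegen_f01"])
codegen_share = slowest_codegen / seconds["filter_arrival"]

print("slowest codegen: %.0f s" % slowest_codegen)
print("filter arrival:  %.0f s" % seconds["filter_arrival"])
print("codegen share of filter arrival: %.0f%%" % (codegen_share * 100))
```

With these numbers the slowest codegen (238 s) accounts for roughly 70% of the 342 s filter arrival time, while the scan node total is on the order of milliseconds.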
We should turn on ASYNC_CODEGEN for runtime filter tests against Kudu tables when the tests run slowly, as in ASAN builds.

was (Author: wzhou):

This issue happened randomly in impala-cdpd-master-core-asan builds. According to the query profiles in the log files, it occurred for the following query in TestMinMaxFilters.test_min_max_filters and TestAllRuntimeFilters.test_all_runtime_filters:

select STRAIGHT_JOIN count(*) from decimal_rtf_tbl a join [BROADCAST] decimal_rtf_tiny_tbl b where a.d28_28 = b.d28_28 and b.d28_28 != 0

When the issue happened, the backend took 4 to 6 minutes to finish the query plan. The filters did not arrive at the scan node in time; the arrival delay was about 4 to 5 minutes. Codegen for one fragment took about 3 minutes. The Kudu scan did not take much time (about 25 s).

Here are the messages from the query profile for one of the failures; the others show the same pattern.

impala-ec2-centos74-r5-4xlarge-ondemand-0573.vpc.cloudera.com:22002:
Filter 1 arrival: 5m42s
Fragment F00:
CodeGen:(Total: 3m58s, non-child: 0.000ns, % non-child: 0.00%)
Fragment F01:
CodeGen:(Total: 3m39s, non-child: 0.000ns, % non-child: 0.00%)
KUDU_SCAN_NODE (id=0):(Total: 20.999ms, non-child: 20.999ms, % non-child: 100.00%)
Table Name: functional_kudu.decimal_rtf_tbl
Runtime filters: Not all filters arrived (arrived: [], missing [1]), waited for 0. Arrival delay: 6m14s.

These runtime filter tests run near the end of the whole test run. According to https://gerrit.cloudera.org/#/c/16155/, as the stacks accumulate in an ASAN build, allocations and frees get slower and slower. This degrades test execution time over the course of the run, especially for codegen, and hence increases the delay before the runtime filters arrive at the scan node.

IMPALA-9887 (Add support for sharding end-to-end tests) and IMPALA-5444 (Asynchronous code generation) should help solve the issue. IMPALA-5444 was merged upstream recently, but the query option ASYNC_CODEGEN is off by default.
We should turn on ASYNC_CODEGEN for runtime filter tests against Kudu in ASAN builds.

> test_runtime_filters flaky on Kudu table format
> -----------------------------------------------
>
>                 Key: IMPALA-9889
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9889
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0
>            Reporter: Vihang Karajgaonkar
>            Assignee: Wenzhe Zhou
>            Priority: Blocker
>              Labels: broken-build, flaky-test
>
> A couple of tests in test_runtime_filters fail randomly on Kudu table formats
> with the stack traces below:
> {noformat}
> query_test/test_runtime_filters.py:208: in test_min_max_filters
>     test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
> E   EXPECTED VALUE:
> E   619
> E
> E   ACTUAL VALUE:
> E   718
> {noformat}
> {noformat}
> query_test/test_runtime_filters.py:277: in test_all_runtime_filters
>     test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS': str(WAIT_TIME_MS)})
> common/impala_test_suite.py:718: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:627: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, actual))
> E   AssertionError: Aggregation of SUM over ProbeRows did not match expected results.
> E   EXPECTED VALUE:
> E   37
> E
> E   ACTUAL VALUE:
> E   718
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)