[ https://issues.apache.org/jira/browse/IMPALA-7422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580145#comment-16580145 ]
Michael Ho commented on IMPALA-7422:
------------------------------------

Actually, the race mentioned in the previous update doesn't quite explain the DCHECK we were hitting above although it's still a race which needs fixing.

> Crash in QueryState::PublishFilter() fragment_map_.count(params.dst_fragment_idx) == 1 (0 vs. 1)
> ------------------------------------------------------------------------------------------------
>
> Key: IMPALA-7422
> URL: https://issues.apache.org/jira/browse/IMPALA-7422
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.1.0
> Reporter: Tim Armstrong
> Assignee: Michael Ho
> Priority: Blocker
> Labels: broken-build, crash
> Attachments:
>   impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17c2.vpc.cloudera.com.jenkins.log.ERROR.20180809-162257.28028.gz,
>   impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17c2.vpc.cloudera.com.jenkins.log.FATAL.20180809-201036.28028.gz,
>   impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17c2.vpc.cloudera.com.jenkins.log.INFO.20180809-162257.28028.gz,
>   impalad.impala-ec2-centos74-m5-4xlarge-ondemand-17c2.vpc.cloudera.com.jenkins.log.WARNING.20180809-162257.28028.gz
>
>
> Ran into this running core tests on one of my patches.
> {noformat}
> query-state.cc:506] Check failed: fragment_map_.count(params.dst_fragment_idx) == 1 (0 vs. 1)
> *** Check failure stack trace: ***
> @ 0x4387b8c
> @ 0x4389431
> @ 0x4387566
> @ 0x438ab2d
> @ 0x1e0ba94
> @ 0x1f3d097
> @ 0x303fe61
> @ 0x303ddef
> @ 0x18fa28f
> @ 0x1d0d1b8
> @ 0x1d054b8
> @ 0x1d06bde
> @ 0x1d06a74
> @ 0x1d067c0
> @ 0x1d066d3
> @ 0x1c2d3c1
> @ 0x2041992
> @ 0x2049a6a
> @ 0x204998e
> @ 0x2049951
> @ 0x32b31d9
> @ 0x7fb7d61a2e24
> @ 0x7fb7d5ed034c
> {noformat}
> {noformat}
> 20:10:21 [gw0] PASSED query_test/test_runtime_filters.py::TestMinMaxFilters::test_min_max_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none]
> 20:10:29 query_test/test_runtime_filters.py::TestMinMaxFilters::test_large_strings
> 20:10:29 [gw0] PASSED query_test/test_runtime_filters.py::TestMinMaxFilters::test_large_strings
> 20:10:30 query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 20:10:30 [gw3] PASSED query_test/test_runtime_filters.py::TestBloomFilters::test_bloom_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record]
> 20:10:32 query_test/test_runtime_filters.py::TestBloomFilters::test_bloom_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record]
> 20:10:32 [gw4] PASSED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block]
> 20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block]
> 20:10:39 [gw5] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/bzip/block]
> 20:10:39 query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/bzip/block]
> 20:10:39 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block]
> 20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none]
> 20:10:39 [gw7] FAILED query_test/test_queries.py::TestQueries::test_sort[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> 20:10:39 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | debug_action: None | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: rc/snap/block]
> 20:10:39 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none]
> 20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none]
> 20:10:39 [gw5] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/bzip/block]
> 20:10:39 query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/snap/block]
> 20:10:39 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none]
> 20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: avro/none]
> 20:10:39 [gw7] FAILED query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | debug_action: None | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: rc/snap/block]
> 20:10:40 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 1 | debug_action: -1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: avro/def/block]
> 20:10:40 [gw5] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/snap/block]
> 20:10:40 query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/snap/block]
> 20:10:45 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: avro/none] Traceback (most recent call last):
> {noformat}
> I looked at the code and I couldn't see how there was any
> synchronization between 'fragment_map_' being modified in
> QueryState::StartFInstances() and it being read in PublishFilter(). It
> looks like before IMPALA-7163, instances_prepared_promise_ functioned as a
> barrier between those two functions, so there was synchronization but it
> wasn't documented.