[ https://issues.apache.org/jira/browse/IMPALA-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796887#comment-17796887 ]
Riza Suminto commented on IMPALA-12630:
---------------------------------------

I'm comparing the data-loading logs from a passing run and a failing run. In the passing run ([^ubuntu-20.04-1158-load-tpch-core-hive-generated-orc-def-block.sql.log]), lineitem is loaded with 12 Map tasks and 1 File Merge. In the failing run ([^ubuntu-20.04-1134-load-tpch-core-hive-generated-orc-def-block.sql.log]), lineitem is loaded with 4 Map tasks and no File Merge.

Using orc-tool, I inspected the file metadata of the lineitem table from my local dataset ([^meta-lineitem.txt]):

{code:java}
Stripe 3:
  Column 0: count: 533501 hasNull: false
  Column 1: count: 533501 hasNull: false bytesOnDisk: 379090 min: 1075520 max: 1609411 sum: 716213269480
...
Stripe 4:
  Column 0: count: 533455 hasNull: false
  Column 1: count: 533455 hasNull: false bytesOnDisk: 378202 min: 1609411 max: 2142211 sum: 1000577484397
{code}

The value l_orderkey = 1609411 used by the test case lies on the boundary between Stripe 3 and Stripe 4: it is both the max of Stripe 3 and the min of Stripe 4, so the number of rows read for this predicate is sensitive to where the stripe boundaries fall.

One way to make this test more deterministic is to change it to count the orders table with o_orderkey = 1. tpch_orc_def.orders is loaded as a single file with 3 stripes in both the 1134 and 1158 runs, and o_orderkey = 1 lies only in the first stripe, as shown by [^meta-orders.txt]:
{code:java}
Stripe Statistics:
  Stripe 1:
    Column 0: count: 591839 hasNull: false
    Column 1: count: 591839 hasNull: false bytesOnDisk: 4629 min: 1 max: 2367335 sum: 700541773200
{code}

> TestOrcStats.test_orc_stats fails in count-start on lineitem with filter
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-12630
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12630
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: load-tpch-core-hive-generated-orc-def-block.sql, meta-lineitem.txt, meta-orders.txt, profile_1134.txt, profile_949.txt, ubuntu-20.04-1134-load-tpch-core-hive-generated-orc-def-block.sql.log, ubuntu-20.04-1158-load-tpch-core-hive-generated-orc-def-block.sql.log
>
>
> Saw the test fail several times recently:
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/949
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1134
> {noformat}
> query_test/test_orc_stats.py:41: in test_orc_stats
>     self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:776: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:683: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over RowsRead did not match expected results.
> E   EXPECTED VALUE:
> E   13501
> E
> E
> E   ACTUAL VALUE:
> E   20000
> E
> E   OP:
> E   :
> {noformat}
> The query is:
> {code:sql}
> select count(*) from tpch_orc_def.lineitem where l_orderkey = 1609411
> {code}
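The expected RowsRead of 13501 can be reconciled with the lineitem stripe statistics in [^meta-lineitem.txt]. The Python sketch below is a back-of-the-envelope model, not Impala's ORC scanner: it assumes ORC's default row index stride of 10,000 rows (orc.row.index.stride), that row groups are pruned by their min/max statistics, and that lineitem is sorted on l_orderkey so only the row groups touching the stripe boundary can match. The helpers row_group_sizes and rows_read_at_boundary are hypothetical.

```python
from typing import List

# Assumed ORC default row index stride (orc.row.index.stride); not confirmed
# for the tpch_orc_def tables in this JIRA.
ROW_INDEX_STRIDE = 10_000


def row_group_sizes(stripe_rows: int, stride: int = ROW_INDEX_STRIDE) -> List[int]:
    """Split a stripe's row count into row-group sizes; the last group may be partial."""
    full, rest = divmod(stripe_rows, stride)
    return [stride] * full + ([rest] if rest else [])


def rows_read_at_boundary(left_rows: int, right_rows: int,
                          stride: int = ROW_INDEX_STRIDE) -> int:
    """Hypothetical RowsRead when the key is the max of one stripe and the min of the next.

    Assumes the column is sorted and the key has only a few rows, so min/max
    pruning keeps just the last row group of the left stripe and the first
    row group of the right stripe.
    """
    return row_group_sizes(left_rows, stride)[-1] + row_group_sizes(right_rows, stride)[0]


# Row counts of Stripe 3 and Stripe 4 taken from meta-lineitem.txt.
print(rows_read_at_boundary(533501, 533455))  # 3501 + 10000 = 13501
```

Under the same assumptions, the actual value of 20000 equals two full row groups, which would fit a file in which the matching rows straddle a row-group boundary inside a stripe rather than a short trailing row group. The failing run's lineitem metadata is not attached, so this remains a hypothesis. By contrast, o_orderkey = 1 is the minimum of orders Stripe 1, so on a sorted file only the first row group would qualify regardless of where later boundaries fall.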