[ 
https://issues.apache.org/jira/browse/IMPALA-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796887#comment-17796887
 ] 

Riza Suminto commented on IMPALA-12630:
---------------------------------------

I compared the data loading logs from a passing run and a failed run.

In the passing run 
([^ubuntu-20.04-1158-load-tpch-core-hive-generated-orc-def-block.sql.log]), 
lineitem is loaded with 12 Map and 1 File Merge.
In the failed run 
([^ubuntu-20.04-1134-load-tpch-core-hive-generated-orc-def-block.sql.log]), 
lineitem is loaded with 4 Map and no File Merge.

Using orc-tool, I inspected the file metadata of the lineitem table from my 
local dataset. [^meta-lineitem.txt] shows:
{code:java}
  Stripe 3:
    Column 0: count: 533501 hasNull: false
    Column 1: count: 533501 hasNull: false bytesOnDisk: 379090 min: 1075520 max: 1609411 sum: 716213269480
...
  Stripe 4:
    Column 0: count: 533455 hasNull: false
    Column 1: count: 533455 hasNull: false bytesOnDisk: 378202 min: 1609411 max: 2142211 sum: 1000577484397
{code}
The l_orderkey = 1609411 used by the test case lies on the boundary between 
Stripe 3 and Stripe 4: it is both the max of Stripe 3 and the min of Stripe 4.
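
To illustrate why a key on a stripe boundary is fragile, here is a minimal 
Python sketch of min/max stripe pruning using the Column 1 statistics quoted 
from meta-lineitem.txt. The function name and the tuple layout are hypothetical, 
and this is not Impala's ORC scanner code; it only shows that a lookup of 
1609411 cannot rule out either Stripe 3 or Stripe 4, while a neighbouring key 
touches a single stripe.

```python
# Column 1 (l_orderkey) stripe statistics copied from meta-lineitem.txt:
# (stripe number, row count, min, max)
LINEITEM_STRIPES = [
    (3, 533501, 1075520, 1609411),
    (4, 533455, 1609411, 2142211),
]


def stripes_that_may_match(stripes, key):
    """Return the stripe numbers whose [min, max] range includes key.

    Hypothetical helper: a stripe can be skipped for an equality filter
    only when the key falls outside its min/max statistics.
    """
    return [n for (n, _count, lo, hi) in stripes if lo <= key <= hi]


print(stripes_that_may_match(LINEITEM_STRIPES, 1609411))  # [3, 4]
print(stripes_that_may_match(LINEITEM_STRIPES, 1609410))  # [3]
```

Only the two stripes quoted in this comment are listed, so the result says 
nothing about the other stripes in the file.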

One way to make this test more deterministic is to change it to count rows in 
the orders table with o_orderkey = 1. tpch_orc_def.orders is loaded as a single 
file with 3 stripes in both the 1134 and 1158 runs, and o_orderkey = 1 lies 
only in the first stripe, as shown by [^meta-orders.txt].
{code:java}
Stripe Statistics:
  Stripe 1:
    Column 0: count: 591839 hasNull: false
    Column 1: count: 591839 hasNull: false bytesOnDisk: 4629 min: 1 max: 2367335 sum: 700541773200
{code}
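
The same check can be run against a metadata dump instead of hand-copied 
numbers. The following is a rough Python sketch that parses the "Stripe 
Statistics" text and reports which stripes may contain a given key. The helper 
names and the regular expressions are assumptions based only on the excerpt 
above (each column on one unwrapped line, integer min/max); the sample text 
contains just Stripe 1, so the full meta-orders.txt would be needed to confirm 
that no other stripe covers o_orderkey = 1.

```python
import re

# Excerpt of the "Stripe Statistics" section from meta-orders.txt.
META_TEXT = """\
Stripe Statistics:
  Stripe 1:
    Column 0: count: 591839 hasNull: false
    Column 1: count: 591839 hasNull: false bytesOnDisk: 4629 min: 1 max: 2367335 sum: 700541773200
"""

# Assumed line shapes, taken from the excerpt above.
STRIPE_RE = re.compile(r"^\s*Stripe (\d+):")
COLUMN_RE = re.compile(r"^\s*Column (\d+):.*\bmin: (-?\d+) max: (-?\d+)")


def column_ranges(meta_text, column):
    """Map stripe number -> (min, max) for one integer column."""
    ranges = {}
    stripe = None
    for line in meta_text.splitlines():
        m = STRIPE_RE.match(line)
        if m:
            stripe = int(m.group(1))
            continue
        m = COLUMN_RE.match(line)
        if m and stripe is not None and int(m.group(1)) == column:
            ranges[stripe] = (int(m.group(2)), int(m.group(3)))
    return ranges


def stripes_containing(meta_text, column, key):
    """Return the sorted stripe numbers whose range includes key."""
    ranges = column_ranges(meta_text, column)
    return sorted(s for s, (lo, hi) in ranges.items() if lo <= key <= hi)


print(column_ranges(META_TEXT, 1))          # {1: (1, 2367335)}
print(stripes_containing(META_TEXT, 1, 1))  # [1]
```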
 

> TestOrcStats.test_orc_stats fails in count-start on lineitem with filter
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-12630
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12630
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Critical
>         Attachments: load-tpch-core-hive-generated-orc-def-block.sql, 
> meta-lineitem.txt, meta-orders.txt, profile_1134.txt, profile_949.txt, 
> ubuntu-20.04-1134-load-tpch-core-hive-generated-orc-def-block.sql.log, 
> ubuntu-20.04-1158-load-tpch-core-hive-generated-orc-def-block.sql.log
>
>
> Saw the test fail several times recently:
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/949
> https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/1134
> {noformat}
> query_test/test_orc_stats.py:41: in test_orc_stats
>     self.run_test_case('QueryTest/orc-stats', vector, use_db=unique_database)
> common/impala_test_suite.py:776: in run_test_case
>     update_section=pytest.config.option.update_results)
> common/test_result_verifier.py:683: in verify_runtime_profile
>     % (function, field, expected_value, actual_value, op, actual))
> E   AssertionError: Aggregation of SUM over RowsRead did not match expected results.
> E   EXPECTED VALUE:
> E   13501
> E   
> E   
> E   ACTUAL VALUE:
> E   20000
> E   
> E   OP:
> E   : {noformat}
> The query is
> {code:sql}
> select count(*) from tpch_orc_def.lineitem where l_orderkey = 1609411
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

