[ https://issues.apache.org/jira/browse/IMPALA-10473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-10473.
-------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> Order by a constant should not be ignored in row_number()
> ---------------------------------------------------------
>
>                 Key: IMPALA-10473
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10473
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>              Labels: correctness
>             Fix For: Impala 4.0
>
>
> [~thundergun] found a bug where row_number() ordered by a constant returns
> wrong results when there is more than one fragment instance:
> {code:sql}
> create table t1(c1 int) stored as textfile;
> -- Insert 3 times to create 3 files
> insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
> insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
> insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
> -- Wrong plan: a sort node is missing after the scan, so the analytic is wrongly performed locally.
> set exec_single_node_rows_threshold=0;
> select row_number() over (order by '1') from t1;
> +------------------------+
> | row_number() OVER(...) |
> +------------------------+
> | 1                      |
> | 2                      |
> | 3                      |
> | 4                      |
> | 5                      |
> | 6                      |
> | 7                      |
> | 8                      |
> | 9                      |
> | 10                     |
> | 1                      |
> | 2                      |
> | 3                      |
> | 4                      |
> | 5                      |
> | 6                      |
> | 7                      |
> | 8                      |
> | 9                      |
> | 10                     |
> | 1                      |
> | 2                      |
> | 3                      |
> | 4                      |
> | 5                      |
> | 6                      |
> | 7                      |
> | 8                      |
> | 9                      |
> | 10                     |
> +------------------------+
> {code}
> In the plan, the ANALYTIC node is placed in the same fragment as the SCAN,
> so row_number() is evaluated locally in each fragment instance and returns
> wrong results.
> {code:java}
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> |  output exprs: row_number()
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 02:EXCHANGE [UNPARTITIONED]
> |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
> |  tuple-ids=0,2 row-size=8B cardinality=15
> |  in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> Per-Host Resources: mem-estimate=36.00MB mem-reservation=4.01MB thread-reservation=2
> 01:ANALYTIC
> |  functions: row_number()
> |  order by: '1' ASC
> |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
> |  tuple-ids=0,2 row-size=8B cardinality=15
> |  in pipelines: 00(GETNEXT)
> |
> 00:SCAN HDFS [default.t1, RANDOM]
>    HDFS partitions=1/1 files=3 size=60B
>    stored statistics:
>      table: rows=unavailable size=unavailable
>      columns: all
>    extrapolated-rows=disabled max-scan-range-rows=unavailable
>    mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
>    tuple-ids=0 row-size=0B cardinality=15
>    in pipelines: 00(GETNEXT)
> {code}
> This is an old issue, dating back to IMPALA-6323 and IMPALA-8069. IMPALA-6323
> allows analytic functions to have a constant ORDER BY clause, and since
> IMPALA-8069 such constant ordering expressions are always ignored. As a result,
> the analytic function is evaluated locally in each fragment instance instead of
> globally, which can produce incorrect results for functions such as row_number().
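The wrong output in the reproduction can be reconstructed with a small Python simulation comparing local and global evaluation of row_number(). This is a sketch, not Impala code. The functions `row_number`, `evaluate_locally` and `evaluate_globally` are hypothetical. Each scan instance is modeled as a plain list of rows, one per data file. The EXCHANGE is modeled as simple concatenation, although a real exchange may interleave row batches from different instances.

```python
from itertools import chain


def row_number(rows):
    # With a constant ORDER BY key every row ties, so row_number()
    # just numbers the rows in the order they arrive: 1, 2, 3, ...
    return [i + 1 for i, _ in enumerate(rows)]


def evaluate_locally(instances):
    # Buggy plan shape (hypothetical model): the ANALYTIC runs inside each
    # scan fragment instance, so every instance restarts numbering at 1
    # before the EXCHANGE merges the results.
    return list(chain.from_iterable(row_number(rows) for rows in instances))


def evaluate_globally(instances):
    # Expected plan shape (hypothetical model): all rows are first gathered
    # into one unpartitioned stream, then row_number() runs once over it.
    merged = list(chain.from_iterable(instances))
    return row_number(merged)


# Three scan instances, one per file created by the three INSERTs,
# each holding ten rows with c1 = 1.
instances = [[1] * 10 for _ in range(3)]

local = evaluate_locally(instances)
global_ = evaluate_globally(instances)
print(local[:12])                # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2]
print(max(local), max(global_))  # 10 30
```

Both functions see the same 30 rows. Only the place where row_number() is evaluated differs, and that is why the buggy plan restarts numbering at 1 for each of the three files.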