[ https://issues.apache.org/jira/browse/IMPALA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha resolved IMPALA-11030. --------------------------------- Fix Version/s: Impala 4.1.0 Resolution: Fixed > Wrong result due to predicate pushdown into inline view with Analytic function > ------------------------------------------------------------------------------ > > Key: IMPALA-11030 > URL: https://issues.apache.org/jira/browse/IMPALA-11030 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 3.4.0 > Reporter: Aman Sinha > Assignee: Aman Sinha > Priority: Major > Fix For: Impala 4.1.0 > > > Table DDL and population: > {noformat} > create table t1( c1 int, c2 char(1)); > insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as > char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as > char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as > char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' > as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' > as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, > cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as > char(1))),(20, cast('N' as char(1))); > {noformat} > {noformat} > default> select * from t1; > ------+ > c1 c2 > ------+ > 11 P > 12 N > 13 P > 14 N > 15 N > 16 N > 17 P > 18 N > 19 P > 20 N > 1 P > 2 P > 3 P > 4 N > 5 P > 6 N > 7 P > 8 N > 9 N > 10 N > ------+ > The following query produces a wrong num_row() for num_ranks column. > default> select * from (select c1, c2 , row_number() over(order by c1) as > num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from > t1) a where c2='P'; > -------------------------+ > c1 c2 num_ranks prime_rank > -------------------------+ > 1 P 1 1 > 2 P 2 2 > 3 P 3 3 > 5 P 4 4 > 7 P 5 5 > 11 P 6 6 > 13 P 7 7 > 17 P 8 8 > 19 P 9 9 > -------------------------+ > {noformat} > The plan indicates that the predicate c2='P' is incorrectly pushed to the > scan and affects the order of operations in the SQL statement. > {noformat} > Query: explain select * from (select c1, c2 , row_number() over(order by c1) > as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank > from t1) a where c2='P' > +------------------------------------------------------------------------------------------+ > | Explain String > | > +------------------------------------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=20.00MB Threads=2 > | > | Per-Host Resource Estimates: Memory=30MB > | > | Codegen disabled by planner > | > | Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY > c1 | > | ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) > prime_rank | > | FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P' > | > | > | > | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | > | | Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB > thread-reservation=2 | > | PLAN-ROOT SINK > | > | | output exprs: c1, c2, row_number(), row_number() > | > | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > thread-reservation=0 | > | | > | > | 04:ANALYTIC > | > | | functions: row_number() > | > | | order by: c1 ASC > | > | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > | > | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > thread-reservation=0 | > | | tuple-ids=7,3 row-size=21B cardinality=10 > | > | | in pipelines: 03(GETNEXT) > | > | | > | > | 03:SORT > | > | | order by: c1 ASC > | > | | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB > thread-reservation=0 | > | | tuple-ids=7 row-size=13B cardinality=10 > | > | | in pipelines: 03(GETNEXT), 01(OPEN) > | > | | > | > | 02:ANALYTIC > | > | | functions: row_number() > | > | | partition by: c2 > | > | | order by: c1 ASC > | > | | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW > | > | | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB > thread-reservation=0 | > | | tuple-ids=5,4 row-size=13B cardinality=10 > | > | | in pipelines: 01(GETNEXT) > | > | | > | > | 01:SORT > | > | | order by: c2 ASC NULLS LAST, c1 ASC > | > | | mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB > thread-reservation=0 | > | | tuple-ids=5 row-size=5B cardinality=10 > | > | | in pipelines: 01(GETNEXT), 00(OPEN) > | > | | > | > | 00:SCAN HDFS [default.t1] > | > | HDFS partitions=1/1 files=2 size=91B > | > | predicates: CAST(default.t1.c2 AS STRING) = 'P' > | > | stored statistics: > | > | table: rows=20 size=91B > | > | columns: all > | > | extrapolated-rows=disabled max-scan-range-rows=10 > | > | mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 > | > | tuple-ids=0 row-size=5B cardinality=10 > | > | in pipelines: 00(GETNEXT) > | > +------------------------------------------------------------------------------------------+ > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)