[ 
https://issues.apache.org/jira/browse/IMPALA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved IMPALA-11030.
---------------------------------
    Fix Version/s: Impala 4.1.0
       Resolution: Fixed

> Wrong result due to predicate pushdown into inline view with Analytic function
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-11030
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11030
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> Table DDL and population:
> {noformat}
> create table t1( c1 int, c2 char(1));
> insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as 
> char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as 
> char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as 
> char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' 
> as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' 
> as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, 
> cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as 
> char(1))),(20, cast('N' as char(1)));
> {noformat}
> {noformat}
> default> select * from t1;
> ------+
> c1    c2
> ------+
> 11    P
> 12    N
> 13    P
> 14    N
> 15    N
> 16    N
> 17    P
> 18    N
> 19    P
> 20    N
> 1     P
> 2     P
> 3     P
> 4     N
> 5     P
> 6     N
> 7     P
> 8     N
> 9     N
> 10    N
> ------+
> The following query produces a wrong num_row() for num_ranks column.
> default> select * from (select c1, c2 , row_number() over(order by c1) as 
> num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from 
> t1) a where c2='P';
> -------------------------+
> c1    c2      num_ranks       prime_rank
> -------------------------+
> 1     P       1       1
> 2     P       2       2
> 3     P       3       3
> 5     P       4       4
> 7     P       5       5
> 11    P       6       6
> 13    P       7       7
> 17    P       8       8
> 19    P       9       9
> -------------------------+
> {noformat}
> The plan indicates that the predicate c2='P' is incorrectly pushed to the 
> scan and affects the order of operations in the SQL statement.
> {noformat}
> Query: explain select * from (select c1, c2 , row_number() over(order by c1) 
> as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank 
> from t1) a where c2='P'
> +------------------------------------------------------------------------------------------+
> | Explain String                                                              
>              |
> +------------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=20.00MB Threads=2                 
>              |
> | Per-Host Resource Estimates: Memory=30MB                                    
>              |
> | Codegen disabled by planner                                                 
>              |
> | Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY 
> c1             |
> | ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) 
> prime_rank           |
> | FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'                         
>              |
> |                                                                             
>              |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
>              |
> | |  Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB 
> thread-reservation=2 |
> | PLAN-ROOT SINK                                                              
>              |
> | |  output exprs: c1, c2, row_number(), row_number()                         
>              |
> | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |                                                                           
>              |
> | 04:ANALYTIC                                                                 
>              |
> | |  functions: row_number()                                                  
>              |
> | |  order by: c1 ASC                                                         
>              |
> | |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                 
>              |
> | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=7,3 row-size=21B cardinality=10                                
>              |
> | |  in pipelines: 03(GETNEXT)                                                
>              |
> | |                                                                           
>              |
> | 03:SORT                                                                     
>              |
> | |  order by: c1 ASC                                                         
>              |
> | |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=7 row-size=13B cardinality=10                                  
>              |
> | |  in pipelines: 03(GETNEXT), 01(OPEN)                                      
>              |
> | |                                                                           
>              |
> | 02:ANALYTIC                                                                 
>              |
> | |  functions: row_number()                                                  
>              |
> | |  partition by: c2                                                         
>              |
> | |  order by: c1 ASC                                                         
>              |
> | |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                 
>              |
> | |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=5,4 row-size=13B cardinality=10                                
>              |
> | |  in pipelines: 01(GETNEXT)                                                
>              |
> | |                                                                           
>              |
> | 01:SORT                                                                     
>              |
> | |  order by: c2 ASC NULLS LAST, c1 ASC                                      
>              |
> | |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
> thread-reservation=0   |
> | |  tuple-ids=5 row-size=5B cardinality=10                                   
>              |
> | |  in pipelines: 01(GETNEXT), 00(OPEN)                                      
>              |
> | |                                                                           
>              |
> | 00:SCAN HDFS [default.t1]                                                   
>              |
> |    HDFS partitions=1/1 files=2 size=91B                                     
>              |
> |    predicates: CAST(default.t1.c2 AS STRING) = 'P'                          
>              |
> |    stored statistics:                                                       
>              |
> |      table: rows=20 size=91B                                                
>              |
> |      columns: all                                                           
>              |
> |    extrapolated-rows=disabled max-scan-range-rows=10                        
>              |
> |    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1         
>              |
> |    tuple-ids=0 row-size=5B cardinality=10                                   
>              |
> |    in pipelines: 00(GETNEXT)                                                
>              |
> +------------------------------------------------------------------------------------------+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to