Aman Sinha created IMPALA-11030:
-----------------------------------

             Summary: Wrong result due to predicate pushdown into inline view 
with Analytic function
                 Key: IMPALA-11030
                 URL: https://issues.apache.org/jira/browse/IMPALA-11030
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


Table DDL and population:
{noformat}
create table t1( c1 int, c2 char(1));
insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as 
char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as 
char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as 
char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' as 
char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' as 
char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, cast('P' as 
char(1))),(18, cast('N' as char(1))),(19, cast('P' as char(1))),(20, cast('N' 
as char(1)));
{noformat}

{noformat}
default> select * from t1;
------+

c1      c2
------+

11      P
12      N
13      P
14      N
15      N
16      N
17      P
18      N
19      P
20      N
1       P
2       P
3       P
4       N
5       P
6       N
7       P
8       N
9       N
10      N
------+

The following query produces a wrong num_row() for num_ranks column.

default> select * from (select c1, c2 , row_number() over(order by c1) as 
num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from 
t1) a where c2='P';

-------------------------+

c1      c2      num_ranks       prime_rank
-------------------------+

1       P       1       1
2       P       2       2
3       P       3       3
5       P       4       4
7       P       5       5
11      P       6       6
13      P       7       7
17      P       8       8
19      P       9       9
-------------------------+

{noformat}

The plan indicates that the predicate c2='P' is incorrectly pushed to the scan 
and affects the order of operations in the SQL statement.

{noformat}
Query: explain select * from (select c1, c2 , row_number() over(order by c1) as 
num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from 
t1) a where c2='P'

+------------------------------------------------------------------------------------------+
| Explain String                                                                
           |
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=20.00MB Threads=2                   
           |
| Per-Host Resource Estimates: Memory=30MB                                      
           |
| Codegen disabled by planner                                                   
           |
| Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY c1  
           |
| ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) 
prime_rank           |
| FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'                           
           |
|                                                                               
           |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                         
           |
| |  Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB 
thread-reservation=2 |
| PLAN-ROOT SINK                                                                
           |
| |  output exprs: c1, c2, row_number(), row_number()                           
           |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
thread-reservation=0   |
| |                                                                             
           |
| 04:ANALYTIC                                                                   
           |
| |  functions: row_number()                                                    
           |
| |  order by: c1 ASC                                                           
           |
| |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                   
           |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
thread-reservation=0   |
| |  tuple-ids=7,3 row-size=21B cardinality=10                                  
           |
| |  in pipelines: 03(GETNEXT)                                                  
           |
| |                                                                             
           |
| 03:SORT                                                                       
           |
| |  order by: c1 ASC                                                           
           |
| |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
thread-reservation=0   |
| |  tuple-ids=7 row-size=13B cardinality=10                                    
           |
| |  in pipelines: 03(GETNEXT), 01(OPEN)                                        
           |
| |                                                                             
           |
| 02:ANALYTIC                                                                   
           |
| |  functions: row_number()                                                    
           |
| |  partition by: c2                                                           
           |
| |  order by: c1 ASC                                                           
           |
| |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                   
           |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB 
thread-reservation=0   |
| |  tuple-ids=5,4 row-size=13B cardinality=10                                  
           |
| |  in pipelines: 01(GETNEXT)                                                  
           |
| |                                                                             
           |
| 01:SORT                                                                       
           |
| |  order by: c2 ASC NULLS LAST, c1 ASC                                        
           |
| |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB 
thread-reservation=0   |
| |  tuple-ids=5 row-size=5B cardinality=10                                     
           |
| |  in pipelines: 01(GETNEXT), 00(OPEN)                                        
           |
| |                                                                             
           |
| 00:SCAN HDFS [default.t1]                                                     
           |
|    HDFS partitions=1/1 files=2 size=91B                                       
           |
|    predicates: CAST(default.t1.c2 AS STRING) = 'P'                            
           |
|    stored statistics:                                                         
           |
|      table: rows=20 size=91B                                                  
           |
|      columns: all                                                             
           |
|    extrapolated-rows=disabled max-scan-range-rows=10                          
           |
|    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1           
           |
|    tuple-ids=0 row-size=5B cardinality=10                                     
           |
|    in pipelines: 00(GETNEXT)                                                  
           |
+------------------------------------------------------------------------------------------+
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to