[Impala-ASF-CR] IMPALA-11701 Part1: Don't push down predicates to scanner if already applied by Iceberg

Zoltan Borok-Nagy (Code Review) Fri, 14 Apr 2023 06:34:48 -0700

Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19534 )


Change subject: IMPALA-11701 Part1: Don't push down predicates to scanner if 
already applied by Iceberg
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19534/12/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test:

http://gerrit.cloudera.org:8080/#/c/19534/12/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@63
PS12, Line 63: #
nit: did you want to add some comments here?


http://gerrit.cloudera.org:8080/#/c/19534/12/testdata/workloads/functional-planner/queries/PlannerTest/iceberg-predicates.test@64
PS12, Line 64: SELECT COUNT(*) FROM functional_parquet.iceberg_partitioned WHERE action = 'click';
Are we sure we get the Parquet count(*) optimization here?

I ask because when the optimization is applied, I see a plan like the following:

  Query: explain SELECT COUNT(*) FROM functional_parquet.iceberg_partitioned where true
 +----------------------------------------------------------------------------------+
 | Explain String                                                                   |
 +----------------------------------------------------------------------------------+
 | Max Per-Host Resource Reservation: Memory=8.00KB Threads=2                       |
 | Per-Host Resource Estimates: Memory=10MB                                         |
 | Codegen disabled by planner                                                      |
 |                                                                                  |
 | PLAN-ROOT SINK                                                                   |
 | |                                                                                |
 | 01:AGGREGATE [FINALIZE]                                                          |
 | |  output: sum_init_zero(functional_parquet.iceberg_partitioned.stats: num_rows) |
 | |  row-size=8B cardinality=1                                                     |
 | |                                                                                |
 | 00:SCAN HDFS [functional_parquet.iceberg_partitioned]                            |
 |    HDFS partitions=1/1 files=20 size=22.90KB                                     |
 |    row-size=8B cardinality=20                                                    |
 +----------------------------------------------------------------------------------+

Please note the sum_init_zero aggregate function in the plan above; also, the
row-size of SCAN HDFS is not 0.

I get a plan similar to yours when I run a select count(*) query on a plain
text table, which doesn't have this optimization, e.g.:
explain SELECT COUNT(*) FROM functional.alltypes;
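
To tell the two plan shapes apart quickly, something like the following Python sketch could help. It only does a text check on the EXPLAIN output for the two markers visible in the plan above (the sum_init_zero aggregate and the "stats: num_rows" slot). The function name and the marker list are my own, chosen for illustration; this is a heuristic under those assumptions, not an Impala API.

```python
import re

# Markers taken from the optimized plan quoted above (assumption: their
# presence in the EXPLAIN text indicates the Parquet count(*) optimization).
OPTIMIZED_MARKERS = ("sum_init_zero(", "stats: num_rows")


def looks_count_star_optimized(explain_text: str) -> bool:
    """Return True if the EXPLAIN text contains both optimization markers.

    Whitespace (including line wraps from mail clients or the shell table)
    is collapsed first so that a wrapped "stats:\nnum_rows" still matches.
    """
    flat = re.sub(r"\s+", " ", explain_text)
    return all(marker in flat for marker in OPTIMIZED_MARKERS)
```

Feeding it the plan above should report the optimization as present, while a plan whose aggregate is a plain count(*) should not match.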



--
To view, visit http://gerrit.cloudera.org:8080/19534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfa80ce469cecfcfbcd0dcb595a6b04b7027285b
Gerrit-Change-Number: 19534
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Apr 2023 13:30:32 +0000
Gerrit-HasComments: Yes
