[ 
https://issues.apache.org/jira/browse/HIVE-28581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017352#comment-18017352
 ] 

Vikram Ahuja commented on HIVE-28581:
-------------------------------------

[~dkuzmenko], thanks for working on this.

Are there any performance benchmark results (TPC-DS, for instance) on some 
dataset showing the change in performance after this patch?

> Support Partition Pruning stats optimization for Iceberg tables
> ---------------------------------------------------------------
>
>                 Key: HIVE-28581
>                 URL: https://issues.apache.org/jira/browse/HIVE-28581
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Add support for Iceberg partition prune stats optimization
> {code}
> create external table ice01 (`i` int, `t` timestamp) 
>     partitioned by (year int, month int, day int) 
> stored by iceberg tblproperties ('format-version'='2', 
> 'write.summary.partition-limit'='10');
> insert into ice01 (i, year, month, day) values
> (1, 2023, 10, 3),
> (2, 2023, 10, 3),
> (2, 2023, 10, 3),
> (3, 2023, 10, 4),
> (4, 2023, 10, 4);
> {code}
> explain
> select i from ice01 where year=2023 and month = 10 and day = 3;
> {code}
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@ice01
> POSTHOOK: Input: default@ice01@year=2023/month=10/day=3
> POSTHOOK: Output: hdfs://### HDFS PATH ###
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
> #### A masked pattern was here ####
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: ice01
>                   filterExpr: ((year = 2023) and (month = 10) and (day = 3)) (type: boolean)
>                   Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: ((year = 2023) and (month = 10) and (day = 3)) (type: boolean)
>                     Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: i (type: int)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                       File Output Operator
>                         compressed: false
>                         Statistics: Num rows: 3 Data size: 48 Basic stats: COMPLETE Column stats: NONE
>                         table:
>                             input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                             output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
