[ 
https://issues.apache.org/jira/browse/HIVE-28765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shohei Okumiya resolved HIVE-28765.
-----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed

> Iceberg: Incorrect partition statistics on time travel + partition evolution
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-28765
>                 URL: https://issues.apache.org/jira/browse/HIVE-28765
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>            Reporter: Shohei Okumiya
>            Assignee: Shohei Okumiya
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
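> HiveIcebergStorageHandler fails to fetch statistics for partitions after 
> partition evolution. As a fallback, Hive estimates statistics from the schema 
> and the file system, which yields unreasonable row counts. In the example 
> below, the tag version1 holds 4 rows, but the plan reports rows=12940.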
> {code:java}
> CREATE TABLE test (key INT, id INT) PARTITIONED BY SPEC (bucket(4, key)) STORED BY ICEBERG;
> INSERT INTO test VALUES (1, 1), (2, 2), (3, 3), (4, 4);
> ALTER TABLE test CREATE TAG version1;
> ALTER TABLE test SET PARTITION SPEC (bucket(256, key));
> INSERT INTO test VALUES (1, 5), (2, 6), (3, 7), (4, 8);
> ALTER TABLE test CREATE TAG version2;
> SET hive.fetch.task.conversion=none;
> EXPLAIN SELECT * FROM default.test.tag_version1;
> +----------------------------------------------------+
> |                      Explain                       |
> +----------------------------------------------------+
> | Plan optimized by CBO.                             |
> |                                                    |
> | Stage-0                                            |
> |   Fetch Operator                                   |
> |     limit:-1                                       |
> |     Stage-1                                        |
> |       Map 1 vectorized                             |
> |       File Output Operator [FS_4]                  |
> |         Select Operator [SEL_3] (rows=12940 width=8) |
> |           Output:["_col0","_col1"]                 |
> |           TableScan [TS_0] (rows=12940 width=8)    |
> |             default@test,test,Tbl:PARTIAL,Col:COMPLETE,Output:["key","id"] |
> |                                                    |
> +----------------------------------------------------+{code}
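>
> As a sanity check on the estimate, and assuming the statements above ran as 
> written, the tags can be counted directly. This is only a sketch for 
> comparison with the plan: version1 was tagged after the first INSERT and 
> version2 after the second, so the counts should match the rows inserted up to 
> each tag rather than the estimated rows=12940.
> {code:sql}
> -- Actual row count at the first tag (created after the first 4-row INSERT).
> SELECT COUNT(*) FROM default.test.tag_version1;
> -- Actual row count at the second tag (created after the second 4-row INSERT).
> SELECT COUNT(*) FROM default.test.tag_version2;
> {code}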



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
