[ 
https://issues.apache.org/jira/browse/HIVE-28120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834511#comment-17834511
 ] 

Denys Kuzmenko commented on HIVE-28120:
---------------------------------------

[~xinmingchang], is this ticket still valid, or can it be closed?

> When insert overwrite is run on an iceberg table, data is lost if the sql contains 
> union all
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-28120
>                 URL: https://issues.apache.org/jira/browse/HIVE-28120
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>    Affects Versions: 4.0.0-beta-1
>         Environment: hadoop version: 3.3.1
> hive version: 4.0.0-beta-1
> iceberg version: 1.3.0
>            Reporter: xinmingchang
>            Priority: Critical
>
> {{(1)}}
> create table tmp.test_iceberg_overwrite_union_all(
> a string
> )
> stored by iceberg
> ;
> {{(2)}}
> insert overwrite table tmp.test_iceberg_overwrite_union_all
> select distinct 'a' union all select distinct 'b';
> {{(3)}}
> select * from tmp.test_iceberg_overwrite_union_all;
>  
> The result has only one record:
> +-------------------------------------+
> | test_iceberg_overwrite_union_all.a  |
> +-------------------------------------+
> | a                                   |
> +-------------------------------------+
> According to the hiveserver log, this query starts two jobs, and each job 
> is committed separately. The problem is that the job committed later is 
> also an overwrite, so it overwrites the result of the first commit, 
> as the log shows:
> 2024-03-05T22:10:12,995 INFO  [iceberg-commit-table-pool-0]: 
> hive.HiveIcebergOutputCommitter () - Committing job has started for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:13,081 INFO  [iceberg-commit-table-pool-1]: 
> hive.HiveIcebergOutputCommitter () - Committing job has started for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all
> 2024-03-05T22:10:15,152 INFO  [iceberg-commit-table-pool-0]: 
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 2157 ms for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
> 2024-03-05T22:10:16,980 INFO  [iceberg-commit-table-pool-1]: 
> hive.HiveIcebergOutputCommitter () - Overwrite commit took 3899 ms for table: 
> default_iceberg.tmp.test_iceberg_overwrite_union_all with 1 file(s)
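
To make the failure mode described above concrete, here is a minimal toy simulation. It does not use Hive or Iceberg; ToyTable and overwrite_commit are hypothetical names invented for illustration, and the assumption that each branch of the union all writes exactly one file comes from the "with 1 file(s)" log lines. Which branch commits last in a real run is not stated in the report, so the order below is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: a list of data files, each a list of rows."""
    files: list = field(default_factory=list)

    def overwrite_commit(self, new_files):
        # Overwrite semantics: discard every file currently in the table,
        # then make the new files the table's entire contents.
        self.files = list(new_files)

    def rows(self):
        return sorted(row for data_file in self.files for row in data_file)


# Assumption: each branch of the UNION ALL runs as its own job and writes one file.
job_outputs = [
    [["a"]],  # files written by the job for: select distinct 'a'
    [["b"]],  # files written by the job for: select distinct 'b'
]

# Reported behaviour: every job issues its own overwrite commit,
# so the later commit discards the file added by the earlier one.
per_job = ToyTable()
for files in job_outputs:
    per_job.overwrite_commit(files)

# Expected behaviour: one overwrite commit that covers the files of all jobs.
combined = ToyTable()
combined.overwrite_commit([f for files in job_outputs for f in files])

print(per_job.rows())   # only the rows of the job that committed last
print(combined.rows())  # rows from both branches
```

In this sketch the per-job table ends up with a single file, matching the single-row result in the report, while the combined commit keeps both rows.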



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
