[ 
https://issues.apache.org/jira/browse/HIVE-22969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora reassigned HIVE-22969:
------------------------------------

    Assignee:     (was: Marta Kuczora)

> Union remove optimisation results incorrect data when inserting to ACID table
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-22969
>                 URL: https://issues.apache.org/jira/browse/HIVE-22969
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Priority: Major
>
> Steps to reproduce the issue:
> {noformat}
> create table input_text(key string, val string) stored as textfile location 
> '/Users/martakuczora/work/hive/warehouse/external/input_text';
> create table output_acid(key string, val string) stored as orc 
> tblproperties('transactional'='true');
> insert into input_text values ('1','1'), ('2','2'),('3','3');
> {noformat}
> {noformat}
> set hive.mapred.mode=nonstrict;
> set hive.stats.autogather=false;
> set hive.optimize.union.remove=true;
> set hive.auto.convert.join=true;
> set hive.exec.submitviachild=false;
> set hive.exec.submit.local.task.via.child=false;
> SELECT * FROM (
> select key, val from input_text
> union all
> select a.key as key, b.val as val FROM input_text a join input_text b on 
> a.key=b.key) c;
> The result of the select:
> 1     1
> 2     2
> 3     3
> 1     1
> 2     2
> 3     3
> {noformat}
> {noformat}
> insert into table output_acid
> SELECT * FROM (
> select key, val from input_text
> union all
> select a.key as key, b.val as val FROM input_text a join input_text b on 
> a.key=b.key) c;
> select * from output_acid;
> The result:
> 1     1
> 2     2
> 3     3
> {noformat}
> The folder of the output_acid table contained the following delta directories:
> {noformat}
> drwxr-xr-x  6 martakuczora  staff  192 Mar  2 16:29 delta_0000000_0000000
> drwxr-xr-x  6 martakuczora  staff  192 Mar  2 16:29 delta_0000001_0000001_0001
> {noformat}
> The name of the first directory is missing the statement ID, so when a 
> select statement runs on the table, this directory is ignored. 
> That is why only half of the data was returned when running the select on the 
> output_acid table.
> If either hive.stats.autogather is set to true or hive.optimize.union.remove 
> is set to false, the result of the insert is correct. In that case there 
> is only one delta directory in the table's folder.
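
The observation above suggests a possible session-level workaround, sketched here in HiveQL. It only restates the settings named in the report and reuses the report's own tables and query; it is not a fix for the underlying problem and has not been checked against other Hive versions. Option B is an alternative to Option A, not something to combine with it.

```sql
-- Workaround sketch derived from the report; assumes the same tables
-- (input_text, output_acid) and session as the reproduction steps.

-- Option A: disable the union remove optimisation for this session.
set hive.optimize.union.remove=false;

-- Option B (alternative to A): keep the optimisation, enable stats autogathering.
-- set hive.stats.autogather=true;

insert into table output_acid
SELECT * FROM (
select key, val from input_text
union all
select a.key as key, b.val as val FROM input_text a join input_text b on
a.key=b.key) c;

-- Per the report, the insert result should then be correct, with a single
-- delta directory in the table's folder.
select * from output_acid;
```

Either setting trades away something (the optimisation itself, or the cost of gathering statistics), so this is best treated as a temporary measure until the issue is resolved.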



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
