[ 
https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504962#comment-13504962
 ] 

Ashutosh Chauhan commented on HIVE-3734:
----------------------------------------

Gang,
I don't see a bug here. You didn't show how you created srcpart, but I 
assume you did something similar to the following: 
{code}
create table srcpart (key string, value string)
  partitioned by (ds string, hr string);
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt'
  overwrite into table srcpart partition (ds='2008-04-08', hr='11');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt'
  overwrite into table srcpart partition (ds='2008-04-08', hr='12');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt'
  overwrite into table srcpart partition (ds='2008-04-09', hr='11');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt'
  overwrite into table srcpart partition (ds='2008-04-09', hr='12');
{code}

If so, your insert statement selects all rows from srcpart with 
ds='2008-04-08', which includes the rows from both hr=11 and hr=12, and 
inserts them into testtable partition ds='2008-04-08', hr='11'. The rows 
from hr=12 in srcpart therefore end up in hr=11 in testtable. When you then 
run select key, value from testtable where ds='2008-04-08' and hr='11' and 
key = "484"; you get two rows, because hr='11' in testtable also holds the 
rows from hr='12' of srcpart. This is expected; it is how partitioning has 
always worked in Hive. To be doubly sure, I also checked hive-0.9, and it 
has the same behavior. 
Though I agree it is a bit confusing.
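
For illustration only, here is a sketch of two ways to keep the hr=12 rows 
out of the hr=11 partition, assuming srcpart was loaded as above. Neither 
statement comes from the test case. The first also filters the source on hr. 
The second keeps ds static and lets hr be assigned from the selected column, 
which relies on hive.exec.dynamic.partition=true (already set in the test 
case).
{code}
-- Option 1: copy only the matching hour into the static partition.
insert overwrite table testtable partition (ds='2008-04-08', hr='11')
select key, value from srcpart where ds='2008-04-08' and hr='11';

-- Option 2: static ds, dynamic hr. The dynamic partition column (hr)
-- must be the last column in the select list.
insert overwrite table testtable partition (ds='2008-04-08', hr)
select key, value, hr from srcpart where ds='2008-04-08';
{code}
With either form, the query on testtable for hr='11' and key = "484" should 
see only rows that came from hr='11' in srcpart.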
                
> Static partition DML create duplicate files and records
> -------------------------------------------------------
>
>                 Key: HIVE-3734
>                 URL: https://issues.apache.org/jira/browse/HIVE-3734
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Gang Tim Liu
>
> Static DML create duplicate files and record.
> Given the following test case, hive will return 2 records:
> 484   val_484
> 484   val_484
> but srcpart returns one record:
> 484   val_484
> If you look at file system, DML generates duplicate file with the same 
> content:
> -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 000000_0
> -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 000001_0
> Test Case
> ===
> set hive.mapred.supports.subdirectories=true;
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set mapred.input.dir.recursive=true;
> create table testtable (key String, value String) partitioned by (ds String, 
> hr String) ;
> explain extended
> insert overwrite table testtable partition (ds='2008-04-08', hr='11') select 
> key, value from srcpart where ds='2008-04-08';
> insert overwrite table testtable partition (ds='2008-04-08', hr='11') select 
> key, value from srcpart where ds='2008-04-08';
> desc formatted testtable partition (ds='2008-04-08', hr='11');
> select count(1) from srcpart where ds='2008-04-08';
> select count(1) from testtable where ds='2008-04-08';
> select key, value from srcpart where ds='2008-04-08' and hr='11' and key = 
> "484";
> explain extended
> select key, value from testtable where ds='2008-04-08' and hr='11' and key = 
> "484";
> select key, value from testtable where ds='2008-04-08' and hr='11' and key = 
> "484";
> ===

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
