[ https://issues.apache.org/jira/browse/HIVE-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504962#comment-13504962 ]
Ashutosh Chauhan commented on HIVE-3734:
----------------------------------------

Gang, I fail to see a bug here. You didn't show how you created srcpart, but I assume you did something similar to the following:
{code}
create table srcpart (key string, value string) partitioned by (ds string, hr string);
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-08', hr='11');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-08', hr='12');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-09', hr='11');
load data local inpath '/home/ashutosh/workspace/hive/data/files/kv1.txt' overwrite into table srcpart partition (ds='2008-04-09', hr='12');
{code}
If so, your insert statement selects all rows from srcpart with ds='2008-04-08', which includes the rows for both hr='11' and hr='12', and inserts them into testtable in partition ds='2008-04-08', hr='11'. This means the rows from hr='12' in srcpart end up in hr='11' in testtable. If you then run
select key, value from testtable where ds='2008-04-08' and hr='11' and key = "484";
you will get two rows, since hr='11' in testtable also holds the rows from hr='12' of srcpart. This is expected. This is how partitioning has always worked in Hive. To be doubly sure, I also checked on hive-0.9; it has the same behavior. Though I agree it is a bit confusing.

> Static partition DML create duplicate files and records
> -------------------------------------------------------
>
>                 Key: HIVE-3734
>                 URL: https://issues.apache.org/jira/browse/HIVE-3734
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Gang Tim Liu
>
> Static DML creates duplicate files and records.
> Given the following test case, hive will return 2 records:
> 484	val_484
> 484	val_484
> but srcpart returns one record:
> 484	val_484
> If you look at file system, DML generates duplicate file with the same content:
> -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 000000_0
> -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 000001_0
> Test Case
> ===
> set hive.mapred.supports.subdirectories=true;
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
> set hive.merge.mapfiles=false;
> set hive.merge.mapredfiles=false;
> set mapred.input.dir.recursive=true;
> create table testtable (key String, value String) partitioned by (ds String, hr String) ;
> explain extended
> insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08';
> insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08';
> desc formatted testtable partition (ds='2008-04-08', hr='11');
> select count(1) from srcpart where ds='2008-04-08';
> select count(1) from testtable where ds='2008-04-08';
> select key, value from srcpart where ds='2008-04-08' and hr='11' and key = "484";
> explain extended
> select key, value from testtable where ds='2008-04-08' and hr='11' and key = "484";
> select key, value from testtable where ds='2008-04-08' and hr='11' and key = "484";
> ===
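
For illustration only: assuming srcpart holds one copy of kv1.txt per (ds, hr) partition, as the comment supposes, the insert from the test case could be rewritten in either of the two ways sketched below so that the hr='11' partition of testtable receives only hr='11' rows. Neither statement appears in the original report, and the second needs dynamic partitioning enabled, which the test case's set statements already do.

```sql
-- Option 1 (static partition): copy only the hr='11' rows of srcpart,
-- so rows from hr='12' are never written into the hr='11' partition.
insert overwrite table testtable partition (ds='2008-04-08', hr='11')
select key, value from srcpart where ds='2008-04-08' and hr='11';

-- Option 2 (static ds, dynamic hr): the last selected column supplies
-- the hr value, so each row lands in the partition matching its own hr.
insert overwrite table testtable partition (ds='2008-04-08', hr)
select key, value, hr from srcpart where ds='2008-04-08';
```

Under that assumption, the final select on testtable for ds='2008-04-08', hr='11' and key = "484" should then return a single row, matching srcpart.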