[ https://issues.apache.org/jira/browse/HIVE-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713722#comment-13713722 ]
Edward Capriolo commented on HIVE-4891:
---------------------------------------

I suspect there are some funny issues when partitions are not all in the same format. This is a newish feature, and we may not have as much coverage as we need.

> Distinct includes duplicate records
> -----------------------------------
>
>                 Key: HIVE-4891
>                 URL: https://issues.apache.org/jira/browse/HIVE-4891
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, HiveServer2, Query Processor
>    Affects Versions: 0.10.0
>            Reporter: Fengdong Yu
>
> I have two partitions: one is a sequence file and the other is an RCFile, but they
> contain the same data (only the file format differs).
> I have the following SQL:
> {code}
> select distinct uid from test where (dt ='20130718' or dt ='20130718_1') and
> cur_url like '%cq.aa.com%';
> {code}
> dt ='20130718' is a sequence file (the default input format, specified when the
> table was created).
>
> dt ='20130718_1' is an RCFile:
> {code}
> ALTER TABLE test ADD IF NOT EXISTS PARTITION (dt='20130718_1') LOCATION
> '/user/test/test-data';
> ALTER TABLE test PARTITION(dt='20130718_1') SET FILEFORMAT RCFILE;
> {code}
> But there are duplicate records in the result.
> If the two partitions have the same input format, there are no duplicate
> records.
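
One way to narrow this down is to compare what each partition reports about its own storage with what the query returns per partition and across both. The statements below are only a sketch and are not part of the original report: they assume the table `test` and the columns `uid`, `cur_url`, and `dt` exactly as described above, and the subquery alias `t` is made up for illustration.

```sql
-- Inspect the storage metadata recorded for each partition
-- (InputFormat, OutputFormat, and SerDe Library may differ between them).
DESCRIBE FORMATTED test PARTITION (dt='20130718');
DESCRIBE FORMATTED test PARTITION (dt='20130718_1');

-- Distinct uid count per partition; if both partitions really hold the
-- same data, the two counts are expected to match.
SELECT dt, COUNT(DISTINCT uid)
FROM test
WHERE (dt = '20130718' OR dt = '20130718_1')
  AND cur_url LIKE '%cq.aa.com%'
GROUP BY dt;

-- List the uid values that the reported DISTINCT query emits more than once.
-- A correct DISTINCT would make this return no rows.
SELECT t.uid, COUNT(*)
FROM (
  SELECT DISTINCT uid
  FROM test
  WHERE (dt = '20130718' OR dt = '20130718_1')
    AND cur_url LIKE '%cq.aa.com%'
) t
GROUP BY t.uid
HAVING COUNT(*) > 1;
```

If the per-partition counts agree but the last query still returns rows, that would point at how values from the two formats are compared rather than at the data itself; this is a guess to be verified, not a confirmed diagnosis.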