[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navis updated HIVE-7248: ------------------------ Attachment: HIVE-7248.3.patch.txt Rebased to trunk > UNION ALL in hive returns incorrect results on Hbase backed table > ----------------------------------------------------------------- > > Key: HIVE-7248 > URL: https://issues.apache.org/jira/browse/HIVE-7248 > Project: Hive > Issue Type: Bug > Components: HBase Handler > Affects Versions: 0.12.0, 0.13.0, 0.13.1 > Reporter: Mala Chikka Kempanna > Assignee: Navis > Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, > HIVE-7248.3.patch.txt > > > The issue can be recreated with following steps > 1) In hbase > create 'TABLE_EMP','default' > 2) On hive > sudo -u hive hive > CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME > string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY > 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH > SERDEPROPERTIES("hbase.columns.mapping" = > "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", > "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) > TBLPROPERTIES("hbase.table.name" = > "TABLE_EMP",'serialization.null.format'=''); > 3) On hbase insert the following data > put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' > put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' > put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' > put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' > put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' > put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' > 4) On hive execute the following query > hive > SELECT * > FROM ( > SELECT CDS_PK > FROM TABLE_EMP > WHERE > CDS_PK >= '0' > AND CDS_PK <= '9' > AND CDS_UPDATED_DATE IS NOT NULL > UNION ALL SELECT CDS_PK > FROM TABLE_EMP > WHERE > CDS_PK >= 'a' > AND CDS_PK <= 'z' > AND CDS_UPDATED_DATE IS NOT NULL > )t ; > 5) Output of the query > 1 > 1 > 2 > 2 > 6) Output of just > SELECT CDS_PK > FROM TABLE_EMP > WHERE > CDS_PK >= '0' > AND CDS_PK <= '9' > AND CDS_UPDATED_DATE IS NOT NULL > is > 1 > 2 > 7) Output of just > SELECT CDS_PK > FROM TABLE_EMP > WHERE > CDS_PK >= 'a' > AND CDS_PK <= 'z' > AND CDS_UPDATED_DATE IS NOT NULL > Empty > 8) UNION is used to combine the result from multiple SELECT statements into a > single result set. Hive currently only supports UNION ALL (bag union), in > which duplicates are not eliminated > Accordingly above query should return output > 1 > 2 > instead it is giving wrong output > 1 > 1 > 2 > 2 -- This message was sent by Atlassian JIRA (v6.3.4#6332)