Mala Chikka Kempanna created HIVE-7248:
------------------------------------------
Summary: UNION ALL in hive returns incorrect results on Hbase
backed table
Key: HIVE-7248
URL: https://issues.apache.org/jira/browse/HIVE-7248
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.1, 0.13.0, 0.12.0
Reporter: Mala Chikka Kempanna
The issue can be reproduced with the following steps:
1) In the HBase shell, create the table:
create 'TABLE_EMP','default'
2) In Hive, create an external table mapped to the HBase table:
sudo -u hive hive
CREATE EXTERNAL TABLE TABLE_EMP(
  FIRST_NAME string,
  LAST_NAME string,
  CDS_UPDATED_DATE string,
  CDS_PK string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES(
  "hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key",
  "hbase.scan.cache" = "500",
  "hbase.scan.cacheblocks" = "false")
TBLPROPERTIES(
  "hbase.table.name" = "TABLE_EMP",
  'serialization.null.format' = '');
3) In the HBase shell, insert the following data:
put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
4) In Hive, execute the following query:
hive
SELECT *
FROM (
  SELECT CDS_PK
  FROM TABLE_EMP
  WHERE
    CDS_PK >= '0'
    AND CDS_PK <= '9'
    AND CDS_UPDATED_DATE IS NOT NULL
  UNION ALL
  SELECT CDS_PK
  FROM TABLE_EMP
  WHERE
    CDS_PK >= 'a'
    AND CDS_PK <= 'z'
    AND CDS_UPDATED_DATE IS NOT NULL
) t;
5) Output of the query:
1
1
2
2
6) Output of just the first SELECT
SELECT CDS_PK
FROM TABLE_EMP
WHERE
CDS_PK >= '0'
AND CDS_PK <= '9'
AND CDS_UPDATED_DATE IS NOT NULL
is
1
2
7) Output of just the second SELECT
SELECT CDS_PK
FROM TABLE_EMP
WHERE
CDS_PK >= 'a'
AND CDS_PK <= 'z'
AND CDS_UPDATED_DATE IS NOT NULL
is empty (no rows).
8) UNION is used to combine the results of multiple SELECT statements into a
single result set. Hive currently supports only UNION ALL (bag union), in which
duplicates are not eliminated.
Because the second SELECT matches no rows, the union should contain only the
rows of the first SELECT. Accordingly, the above query should return:
1
2
Instead, it returns the wrong output:
1
1
2
2
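
For reference, the expected bag-union semantics can be sketched outside Hive.
The snippet below is a toy Python model, not Hive code: the helper names
select_pk and union_all are hypothetical, the two rows mirror the data inserted
in step 3, and it assumes that Python's string comparison matches Hive's for
these ASCII keys.

```python
# Toy model of the expected UNION ALL result; this is not Hive code.
# The rows mirror the HBase data from step 3 (row key mapped to CDS_PK).
rows = [
    {"CDS_PK": "1", "CDS_UPDATED_DATE": "2014-06-16 00:00:00"},
    {"CDS_PK": "2", "CDS_UPDATED_DATE": "2014-06-16 00:00:00"},
]


def select_pk(table, low, high):
    """Return CDS_PK values with low <= CDS_PK <= high and a non-null date."""
    return [
        r["CDS_PK"]
        for r in table
        if low <= r["CDS_PK"] <= high and r["CDS_UPDATED_DATE"] is not None
    ]


def union_all(*branches):
    """Bag union: concatenate branch results without removing duplicates."""
    out = []
    for branch in branches:
        out.extend(branch)
    return out


digits = select_pk(rows, "0", "9")   # first branch of the query
letters = select_pk(rows, "a", "z")  # second branch of the query
expected = union_all(digits, letters)
print(expected)
```

Under these assumptions the second branch contributes nothing, so the model
yields each key once. The doubled keys seen in Hive look as if both branches
returned the rows of the first SELECT.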