[ 
https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7248:
------------------------
    Attachment: HIVE-7248.3.patch.txt

Rebased to trunk

> UNION ALL in hive returns incorrect results on Hbase backed table
> -----------------------------------------------------------------
>
>                 Key: HIVE-7248
>                 URL: https://issues.apache.org/jira/browse/HIVE-7248
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0, 0.13.1
>            Reporter: Mala Chikka Kempanna
>            Assignee: Navis
>         Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, 
> HIVE-7248.3.patch.txt
>
>
> The issue can be reproduced with the following steps:
> 1) In the HBase shell:
> create 'TABLE_EMP','default' 
> 2) In Hive:
> sudo -u hive hive 
> CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME 
> string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
> SERDEPROPERTIES("hbase.columns.mapping" = 
> "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", 
> "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) 
> TBLPROPERTIES("hbase.table.name" = 
> "TABLE_EMP",'serialization.null.format'=''); 
> 3) In the HBase shell, insert the following data:
> put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini' 
> put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P' 
> put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
> put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind' 
> put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K' 
> put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00' 
> 4) In Hive, execute the following query:
> hive 
> SELECT * 
> FROM ( 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= '0' 
> AND CDS_PK <= '9' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> UNION ALL SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= 'a' 
> AND CDS_PK <= 'z' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> )t ; 
> 5) Output of the query 
> 1 
> 1 
> 2 
> 2 
> 6) Output of just 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= '0' 
> AND CDS_PK <= '9' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> is 
> 1 
> 2 
> 7) Output of just 
> SELECT CDS_PK 
> FROM TABLE_EMP 
> WHERE 
> CDS_PK >= 'a' 
> AND CDS_PK <= 'z' 
> AND CDS_UPDATED_DATE IS NOT NULL 
> is empty
> 8) UNION combines the results of multiple SELECT statements into a
> single result set. Hive currently supports only UNION ALL (bag union), in
> which duplicates are not eliminated.
> Accordingly, since the second SELECT matches no rows, the above query
> should return
> 1 
> 2 
> Instead, it returns the incorrect output
> 1 
> 1 
> 2 
> 2
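
The expected result in step 8 can be checked with a small model of bag-union semantics. This is only a sketch in plain Python, not Hive or HBase code: the `rows` list, `select_keys`, and `union_all` are hypothetical names introduced here, and the row data is copied from step 3 of the report.

```python
# Hypothetical in-memory copy of the rows inserted in step 3:
# (row key / CDS_PK, CDS_UPDATED_DATE)
rows = [
    ("1", "2014-06-16 00:00:00"),
    ("2", "2014-06-16 00:00:00"),
]


def select_keys(rows, low, high):
    """Return keys in [low, high] (string comparison) whose date is not null."""
    return [key for key, updated in rows
            if low <= key <= high and updated is not None]


def union_all(*branches):
    """Bag union: concatenate branch outputs without removing duplicates."""
    result = []
    for branch in branches:
        result.extend(branch)
    return result


digits = select_keys(rows, "0", "9")   # first branch of the query in step 4
letters = select_keys(rows, "a", "z")  # second branch of the query in step 4
expected = union_all(digits, letters)
print(expected)
```

Under these assumptions the second branch contributes no rows, so the union holds each key once. The duplicated keys in step 5 would be consistent with both branches returning the first branch's rows, though the report itself does not state the cause.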



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
