Marton Bod created HIVE-24133:
---------------------------------
Summary: Hive query with Hbase storagehandler can give back
incorrect results when predicate contains null check
Key: HIVE-24133
URL: https://issues.apache.org/jira/browse/HIVE-24133
Project: Hive
Issue Type: Bug
Reporter: Marton Bod
It has been observed that when using Hbase storage handler and the table
contains null values, Hive can give back wrong query results, depending on what
columns we select for and whether the where clause predicate contains any null
checks.
For example:
create 'default:hive_test', 'cf'
put 'default:hive_test', '1', 'cf:col1', 'val1'
put 'default:hive_test', '1', 'cf:col2', 'val2'
put 'default:hive_test', '2', 'cf:col1', 'val1_2'
put 'default:hive_test', '2', 'cf:col2', 'val2_2'
put 'default:hive_test', '3', 'cf:col1', 'val1_3'
put 'default:hive_test', '3', 'cf:col2', 'val2_3'
put 'default:hive_test', '3', 'cf:col3', 'val3_3'
put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"
put 'default:hive_test', '4', 'cf:col1', 'val1_4'
put 'default:hive_test', '4', 'cf:col2', 'val2_4'
scan 'default:hive_test'
===== HIVE
CREATE EXTERNAL TABLE hbase_hive_test (
rowkey string,
col1 string,
col2 string,
col3 string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
)
TBLPROPERTIES("hbase.table.name" = "default:hive_test");
query: select * from hbase_hive_test where col3 is null;
result:
Total MapReduce CPU Time Spent: 10 seconds 980 msec
OK
1 val1 val2 NULL
2 val1_2 val2_2 NULL
4 val1_4 val2_4 NULL
query: select rowkey from hbase_hive_test where col3 is null;
This does not produce any records.
However, select rowkey, col2 from hbase_hive_test where col3 is null;
This gives back the correct results again.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)