Jean-Pierre Hoang created HIVE-21936:
----------------------------------------
Summary: Snapshot inconsistency plan execution
Key: HIVE-21936
URL: https://issues.apache.org/jira/browse/HIVE-21936
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 2.3.5, 2.3.4, 3.1.1, 3.1.0, 2.3.2, 2.3.1, 3.0.0, 2.3.0,
2.2.0, 2.1.1, 2.1.0, 2.0.1, 1.2.2, 2.0.0, 1.2.1, 1.1.1
Reporter: Jean-Pierre Hoang
When an HBase snapshot is used from Hive, nothing validates that the snapshot exists, nor that it was taken from the HBase table behind the target Hive table.
How to reproduce:
Create two Hive tables backed by HBase:
{code:java}
CREATE TABLE default.employee(rowkey string, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string",
"hbase.table.name"= "default:employee" );
CREATE TABLE default.work(rowkey string, company string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ( "hbase.columns.mapping"= "cf:string",
"hbase.table.name"= "default:work" ); {code}
Insert some rows into the tables:
{code:java}
INSERT INTO TABLE default.employee values("1", "Dupont");
INSERT INTO TABLE default.work values ("c1", "ACME");{code}
From the HBase shell, create a snapshot:
{code:java}
snapshot 'employee', 'mysnapshot'{code}
From Beeline, run some sanity checks:
{code:java}
SELECT * FROM employee;
SELECT * FROM work;
{code}
Now that the setup is done, the first bug appears when the snapshot name is set in Hive and a different HBase-backed table is queried:
{code:java}
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM work;{code}
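The problem is the condition that selects the snapshot input format. It only checks that hive.hbase.snapshot.name is set, so the query on work goes through the snapshot input format even though mysnapshot was taken from employee: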
{code:java}
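@Override
public Class<? extends InputFormat> getInputFormatClass() {
  // Any non-null hive.hbase.snapshot.name switches every HBase-backed table to the
  // snapshot input format; neither the snapshot's existence nor its source table is checked.
  if (HiveConf.getVar(jobConf, HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) != null) {
    LOG.debug("Using TableSnapshotInputFormat");
    return HiveHBaseTableSnapshotInputFormat.class;
  }
  LOG.debug("Using HiveHBaseTableInputFormat");
  return HiveHBaseTableInputFormat.class;
}
{code}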
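The two missing checks could look roughly like the following. This is a hypothetical, self-contained Java model (Java 9+ for Map.of), not Hive code: SnapshotFormatSelector, Format, choose() and qualify() are illustrative names, and the snapshot-name to source-table map stands in for whatever a real handler would look up from HBase (for example via Admin.listSnapshots()). It also assumes the handler knows the queried table's hbase.table.name at the point where the input format is chosen.

{code:java}
import java.util.Map;

// Hypothetical sketch, not Hive's API: pick the input format only after checking
// that the configured snapshot exists and was taken from the queried HBase table.
public class SnapshotFormatSelector {

    // Stand-ins for HiveHBaseTableSnapshotInputFormat and HiveHBaseTableInputFormat.
    enum Format { SNAPSHOT, LIVE_TABLE }

    // Qualify bare HBase table names with the "default" namespace so that
    // "employee" and "default:employee" compare as equal.
    static String qualify(String hbaseTable) {
        return hbaseTable.contains(":") ? hbaseTable : "default:" + hbaseTable;
    }

    /**
     * @param snapshotName    value of hive.hbase.snapshot.name, or null when unset
     * @param targetTable     hbase.table.name of the Hive table being read
     * @param snapshotSources snapshot name -> source HBase table (assumed to be
     *                        filled from the cluster, e.g. via Admin.listSnapshots())
     */
    static Format choose(String snapshotName, String targetTable,
                         Map<String, String> snapshotSources) {
        if (snapshotName == null) {
            return Format.LIVE_TABLE;                 // no snapshot requested
        }
        String source = snapshotSources.get(snapshotName);
        if (source == null) {                         // check 1: the snapshot must exist
            throw new IllegalStateException(
                "HBase snapshot '" + snapshotName + "' does not exist");
        }
        if (!qualify(source).equals(qualify(targetTable))) {
            return Format.LIVE_TABLE;                 // check 2: snapshot belongs to another table
        }
        return Format.SNAPSHOT;
    }

    public static void main(String[] args) {
        Map<String, String> snapshots = Map.of("mysnapshot", "employee");
        System.out.println(choose("mysnapshot", "default:employee", snapshots)); // SNAPSHOT
        System.out.println(choose("mysnapshot", "default:work", snapshots));     // LIVE_TABLE
        System.out.println(choose(null, "default:work", snapshots));             // LIVE_TABLE
    }
}
{code}

Falling back to the live table when the snapshot belongs to another table is one possible choice; failing the query instead would be stricter, but it would also break a query such as SELECT * FROM work whenever the session-level snapshot setting is active.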
The second problem concerns predicate pushdown when the snapshot is used in a query more complex than a simple SELECT:
{code:java}
set hive.hbase.snapshot.name=mysnapshot;
SELECT * FROM employee a UNION ALL SELECT * FROM employee b;{code}
The result is not what we expect: every column other than rowkey is NULL.
As a result, we cannot really use the snapshot feature for use cases that need analytic computation (full scans).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)