[ https://issues.apache.org/jira/browse/HIVE-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yannik Zuehlke updated HIVE-11327:
----------------------------------
    Tags: hive, predicatepushdown, hbase  (was: hive predicatepushdown)

> HiveQL to HBase - Predicate Pushdown for composite key not working
> ------------------------------------------------------------------
>
>                 Key: HIVE-11327
>                 URL: https://issues.apache.org/jira/browse/HIVE-11327
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler, Hive
>    Affects Versions: 0.14.0
>            Reporter: Yannik Zuehlke
>            Priority: Blocker
>
> I am using Hive 0.14 and HBase 0.98.8. I would like to use HiveQL for
> accessing an HBase "table".
> I created a table with a complex composite rowkey:
> ----
> {quote}
> CREATE EXTERNAL TABLE db.hive_hbase (rowkey struct<p1:string, p2:string, p3:string>, column1 string, column2 string)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> COLLECTION ITEMS TERMINATED BY ';'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:c1,cf:c2")
> TBLPROPERTIES("hbase.table.name"="hbase_table");
> {quote}
> ----
> The table is created successfully, but the HiveQL query takes forever:
> ----
> {quote}
> SELECT * from db.hive_hbase WHERE rowkey.p1 = 'xyz';
> {quote}
> ----
> I am working with 1 TB of data (around 1.5 bn records) and this query takes
> forever (it ran overnight and had not finished by morning).
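For context, an equality predicate on the leading component of a composite key can in principle be answered with a bounded row-range scan rather than a full table scan. The following is only a minimal sketch of that idea, not how Hive's HBase handler works internally. It assumes that row keys are stored as `p1;p2;p3` using the ';' delimiter from the table definition above, and the helper names `prefix_scan_range` and `p1_range` are hypothetical.

```python
from typing import Tuple


def prefix_scan_range(prefix: bytes) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every key that begins with prefix.

    The stop row is exclusive. An empty stop row is used here to mean
    "scan to the end of the table".
    """
    stop = bytearray(prefix)
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        return prefix, b""
    # Incrementing the last remaining byte gives the smallest key
    # that sorts after every key sharing the prefix.
    stop[-1] += 1
    return prefix, bytes(stop)


def p1_range(p1: str, delimiter: str = ";") -> Tuple[bytes, bytes]:
    """Row range for rowkey.p1 = p1, assuming keys look like 'p1;p2;p3'."""
    # Including the delimiter keeps 'xyz' from also matching 'xyzabc;...'.
    return prefix_scan_range((p1 + delimiter).encode("utf-8"))


start_row, stop_row = p1_range("xyz")
print(start_row, stop_row)
```

Under these assumptions, `rowkey.p1 = 'xyz'` corresponds to the range from `xyz;` (inclusive) to `xyz<` (exclusive), which is the kind of bounded scan a direct HBase lookup can perform.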
> I changed the log4j properties to 'DEBUG' and found some interesting
> information:
> ----
> {quote}
> 2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory (OpProcFactory.java:logExpr(823)) - Pushdown Predicates of FIL For Alias : hive_hbase
> 2015-07-15 15:56:41,232 INFO  ppd.OpProcFactory (OpProcFactory.java:logExpr(826)) - (rowkey.p1 = 'xyz')
> {quote}
> ----
> But some lines later:
> ----
> {quote}
> 2015-07-15 15:56:41,430 DEBUG ppd.OpProcFactory (OpProcFactory.java:pushFilterToStorageHandler(1051)) - No pushdown possible for predicate: (rowkey.p1 = 'xyz')
> {quote}
> ----
> So my guess is: HiveQL over HBase does not do any predicate pushdown but
> starts a MapReduce job.
> The normal HBase scan (via the HBase Shell) takes around 5 seconds.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)