Delta or incremental loading for an HBase table
We have an HBase table. Each time we aggregate the table on some columns, we do a full scan of the entire table. What are some ideas for extracting just the delta (the increments) since the last load? Right now I'm following the approach below, but I'd like better ideas.
- Map the HBase table to a Hive table.
- The rowkey of the HBase table is mapped to the key column in the Hive table.
- Extract the timestamp from the rowkey and select yesterday's data.
- There is also a (non-key) timestamp column. I extract the previous day's data and aggregate it.
- Merge the incremental aggregated data into the target aggregate table using a full outer join.
Questions:
1) Any better suggestions for incremental loading?
2) Does using the key column from Hive give any performance benefit? I don't see much change in timing.
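A minimal in-memory sketch of the "aggregate only the delta, then merge" steps described above, assuming the aggregate is a row count per grouping key. The row shape (event_time, col1, col2) and the function names aggregate_window and merge_into_target are hypothetical. The union of keys plays the role of the FULL OUTER JOIN, and .get(key, 0) plays the role of COALESCE(x, 0).

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical row shape: (event_time, col1, col2); the aggregate is a count per (col1, col2).


def aggregate_window(rows, start: datetime, end: datetime) -> Counter:
    """Aggregate only the delta: rows whose event_time falls in [start, end)."""
    counts = Counter()
    for event_time, col1, col2 in rows:
        if start <= event_time < end:
            counts[(col1, col2)] += 1
    return counts


def merge_into_target(target: dict, delta: Counter) -> dict:
    """Full outer join on the grouping key, treating a missing side as 0 (like COALESCE)."""
    return {key: target.get(key, 0) + delta.get(key, 0) for key in target.keys() | delta.keys()}


utc = timezone.utc
rows = [
    (datetime(2014, 1, 23, 5, tzinfo=utc), "b", "y"),  # outside the window: already loaded earlier
    (datetime(2014, 1, 24, 1, tzinfo=utc), "a", "x"),
    (datetime(2014, 1, 24, 5, tzinfo=utc), "a", "x"),
    (datetime(2014, 1, 24, 6, tzinfo=utc), "c", "z"),
]
target = {("a", "x"): 10, ("b", "y"): 3}  # existing aggregate table
delta = aggregate_window(rows, datetime(2014, 1, 24, tzinfo=utc), datetime(2014, 1, 25, tzinfo=utc))
print(sorted(merge_into_target(target, delta).items()))
# [(('a', 'x'), 12), (('b', 'y'), 3), (('c', 'z'), 1)]
```

Merging by addition only works for measures that combine that way (counts, sums); an average has to be kept as a separate sum and count, and exact distinct counts cannot be merged like this.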
Extract a datetime from a reverse timestamp
My rowkey contains a reverse timestamp (max value minus the current timestamp), for example 9223370646332874562. This query:
select FROM_UNIXTIME(unix_timestamp('9223370646332874562', 'MMddHHmmssSSS')) from HiveTest limit 1;
returns
9226-01-07 22:34:42
which is obviously not the right result, because the value is a reverse timestamp (max value minus the current timestamp), not a formatted date.
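The query above parses the number as if it were a date string in 'MMddHHmmssSSS' format, which is why the year comes out as 9226. The value has to be un-reversed arithmetically first. A minimal sketch, assuming "max value" means Java's Long.MAX_VALUE (9223372036854775807) and the original timestamp is in epoch milliseconds; the question does not say either explicitly, and decode_reverse_ts is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Max value" is Java's Long.MAX_VALUE and the stored time is epoch milliseconds.
LONG_MAX = 9223372036854775807
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_reverse_ts(reverse_ts: str) -> datetime:
    """Undo reverse_ts = LONG_MAX - millis and return the UTC datetime."""
    millis = LONG_MAX - int(reverse_ts)
    return EPOCH + timedelta(milliseconds=millis)


print(decode_reverse_ts("9223370646332874562").isoformat())
# 2014-01-24T00:05:01.245000+00:00
```

Under those assumptions the example decodes to a plausible date in January 2014. The same arithmetic should carry over to Hive, for example from_unixtime((9223372036854775807 - cast(key as bigint)) div 1000), assuming the key column holds only the reverse timestamp; this is untested, and from_unixtime formats the result in the server's local time zone rather than UTC.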
Tuning Hive queries that use an underlying HBase table
I am querying a Hive table (mapped to an HBase table). What techniques can I use to tune the Hive query and avoid full HBase table scans? The query uses multiple SPLIT and SUBSTR functions and a WHERE condition something like:
select col1, col2, ..., count(*) from hiveTable where split( col1)[0] > timestamp1 and split( col1)[0]
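As far as we understand Hive's HBase storage handler, a predicate like split(col1)[0] > timestamp1 cannot become a scan range because the key is wrapped in a function, whereas plain comparisons on the column mapped to :key can, in many Hive versions, be turned into scan start/stop rows. A sketch of precomputing those bounds for a time window, assuming a hypothetical rowkey layout: a 19-digit reverse timestamp (Java Long.MAX_VALUE minus epoch milliseconds) followed by any suffix. The names rowkey_range and to_millis are illustrative, not part of any library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the "max value" is Java's Long.MAX_VALUE and timestamps are epoch milliseconds.
LONG_MAX = 9223372036854775807
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Exact epoch milliseconds for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def rowkey_range(start: datetime, end: datetime) -> tuple[str, str]:
    """Return (start_row, stop_row) so that start_row <= key < stop_row selects
    rows with start <= event_time < end.

    Hypothetical layout: each rowkey begins with the reverse timestamp as a
    19-digit, zero-padded decimal string, so string order matches numeric order.
    """
    newest = LONG_MAX - (to_millis(end) - 1)  # reverse ts of the last millisecond in the window
    oldest = LONG_MAX - to_millis(start)      # reverse ts of the first millisecond in the window
    # Reversal flips the order: the newest event has the smallest reverse timestamp.
    # Adding 1 to the upper bound keeps every key whose prefix equals `oldest`.
    return f"{newest:019d}", f"{oldest + 1:019d}"


start_row, stop_row = rowkey_range(
    datetime(2014, 1, 24, tzinfo=timezone.utc),
    datetime(2014, 1, 25, tzinfo=timezone.utc),
)
print(start_row, stop_row)
# 9223370646246775808 9223370646333175808
```

With these strings the filter becomes key >= start_row and key < stop_row on the raw key column. Whether Hive actually pushes that down depends on the Hive version and how the key is mapped, so it is worth confirming with EXPLAIN.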