Delta or incremental loading for an HBase table

2014-03-25 Manjula mohapatra
We have an HBase table.
Each time we aggregate the table on some columns, we do a full
scan of the entire table.
What are some ideas for extracting just the delta (increments) since the
last load?


Right now I'm following this approach, but I want some better ideas:
- Map the HBase table to a Hive table.
- The row key of the HBase table is mapped to the key column of the Hive table.
- Extract the timestamp from the row key and select yesterday's data.
- There is also a (non-key) timestamp column. I use it to extract the previous
day's data and aggregate it.
- Then merge the incremental aggregates into the target aggregate table
using a full outer join.
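The final merge step only works if the aggregated measures are additive, such as counts and sums. A distinct count, for example, cannot be merged this way and would need to be recomputed. Below is a minimal Python sketch of the full-outer-join semantics. It assumes each aggregate row is keyed by its grouping columns and carries one additive measure. `merge_aggregates` is a hypothetical name.

```python
from typing import Dict, Hashable


def merge_aggregates(target: Dict[Hashable, int],
                     delta: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Merge yesterday's partial aggregates into the running totals.

    Mirrors a full outer join on the grouping key:
    - keys only in `target` keep their old total,
    - keys only in `delta` are added as new rows,
    - keys in both get the two measures summed.
    """
    merged = dict(target)  # start from the existing totals
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value  # a missing side counts as 0
    return merged


totals = {"web": 10, "mobile": 5}
yesterday = {"mobile": 2, "api": 1}
print(merge_aggregates(totals, yesterday))
# {'web': 10, 'mobile': 7, 'api': 1}
```

In HiveQL the same shape would be a `FULL OUTER JOIN` on the grouping columns. You would select `coalesce(t.k, d.k)` for each key column and `coalesce(t.cnt, 0) + coalesce(d.cnt, 0)` for the measure (column names hypothetical).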


Questions:
1) Any better suggestions for incremental loading?
2) Does using the key column from Hive give any performance benefit? I
don't see much change in timing.
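On question 2, here is a rough sketch of what the key column could buy you. It assumes, though the thread does not say so, that the reverse timestamp is the leading, fixed-width 19-digit part of the row key, encoded as Java's `Long.MAX_VALUE` minus epoch milliseconds. Under that assumption, one day of data occupies a single contiguous row-key range, because newer rows sort first. A scan bounded to that range reads only that day.

A filter on `split()` or `substr()` output generally cannot be turned into such bounds, so the table is still scanned in full. That may explain why the timings barely change. Whether Hive pushes a comparison on the bare key column down into the HBase scan depends on the Hive version. The code below only computes the bounds; all names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumptions (not from the thread): "Max value" is Java's Long.MAX_VALUE,
# timestamps are epoch milliseconds, and the reverse timestamp is the
# leading component of the row key.
LONG_MAX = 9223372036854775807
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Exact epoch milliseconds for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def reverse_ts(ts_ms: int) -> str:
    """Encode a timestamp the way the row key does: LONG_MAX - ts, 19 digits."""
    return f"{LONG_MAX - ts_ms:019d}"


def rowkey_range(start: datetime, end: datetime) -> Tuple[str, str]:
    """(start_row inclusive, stop_row exclusive) for event times in [start, end).

    Reverse timestamps shrink as time grows, so the newest instant in the
    window (end - 1 ms) gives the start row, and the instant just before
    the window (start - 1 ms) gives the exclusive stop row.
    """
    start_row = reverse_ts(to_millis(end) - 1)
    stop_row = reverse_ts(to_millis(start) - 1)
    return start_row, stop_row


day = datetime(2014, 1, 24, tzinfo=timezone.utc)
start_row, stop_row = rowkey_range(day, day + timedelta(days=1))
print(start_row, stop_row)
# 9223370646246775808 9223370646333175808
```

Because every reverse timestamp has the same width, plain string comparison matches numeric order. Keys that carry a suffix after the timestamp still fall inside the right range. The example key from the reverse-timestamp thread, 9223370646332874562, lies inside the 2014-01-24 (UTC) range above.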


Extract datetime from reverse timestamp

2014-03-14 Manjula mohapatra
My row key contains a reverse timestamp (Max value - current timestamp).
Example: 9223370646332874562



select FROM_UNIXTIME(unix_timestamp('9223370646332874562', 'MMddHHmmssSSS'))
from HiveTest limit 1;

This returns 9226-01-07 22:34:42, which is obviously wrong, because the
value is a reverse timestamp.

The stored value is Max value - current timestamp, so how do I get the original date back?
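Because the row key stores Max value minus the timestamp, the fix is to subtract again rather than parse the digits as a date. Assuming "Max value" is Java's `Long.MAX_VALUE`, the example key gives a difference of 1390521901245. Read as milliseconds since the epoch, that is a date in January 2014, which is plausible for data queried in March 2014. Both the constant and the unit are assumptions. Below is a minimal Python sketch; `decode_reverse_ts` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Max value" is Java's Long.MAX_VALUE (2**63 - 1) and the
# stored value is that minus the event time in epoch milliseconds.
LONG_MAX = 9223372036854775807
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_reverse_ts(reverse: str) -> datetime:
    """Turn a reverse timestamp string back into a UTC datetime."""
    ts_ms = LONG_MAX - int(reverse)               # undo "Max value - timestamp"
    return EPOCH + timedelta(milliseconds=ts_ms)  # exact integer arithmetic


print(decode_reverse_ts("9223370646332874562"))
# 2014-01-24 00:05:01.245000+00:00
```

An untested HiveQL version of the same arithmetic might be `from_unixtime(cast((9223372036854775807 - cast(key as bigint)) / 1000 as bigint))`. Here `key` stands for the reverse-timestamp part of the row key. `from_unixtime` formats the result in the server's local time zone rather than UTC.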


Tuning Hive queries that use an underlying HBase table

2014-02-20 Manjula mohapatra
I am querying a Hive table (mapped to an HBase table).

What techniques can I use to tune the Hive query and avoid full HBase scans?

The query uses multiple SPLIT and SUBSTR functions and a WHERE condition,
something like:

select col1, col2, ..., count(*)
from hiveTable
where split(col1)[0] > timestamp1 and split(col1)[0]