Re: Approach: Incremental data load from HBASE

2017-01-06 Thread Chetan Khatri
Hi Ayan, by incremental load from HBase I mean that a weekly batch job takes rows from an HBase table and dumps them out to Hive. The next time I run the job, it should take only the newly arrived rows. This is the same as using Sqoop for an incremental load from an RDBMS to Hive with the command below: sqoop job --create myssb1
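
The behaviour described in this message (each run exports only the rows that arrived since the previous run) is commonly built around a stored high-water mark, similar in spirit to the last value that Sqoop tracks for an incremental import. What follows is a minimal Python sketch under stated assumptions, not the poster's implementation: every row is assumed to carry an arrival timestamp, the in-memory list stands in for an HBase table scan, and `run_incremental_job`, `load_watermark`, `save_watermark` and the JSON state file are hypothetical names that belong to no HBase or Sqoop API.

```python
import json
import os
import tempfile

def load_watermark(state_path):
    """Return the timestamp of the last exported row, or 0 before the first run."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["last_ts"]

def save_watermark(state_path, last_ts):
    """Persist the high-water mark so the next run can resume from it."""
    with open(state_path, "w") as f:
        json.dump({"last_ts": last_ts}, f)

def run_incremental_job(rows, state_path):
    """Return only rows newer than the stored watermark, then advance it.

    `rows` is a list of (row_key, arrival_ts) tuples standing in for a table scan.
    """
    watermark = load_watermark(state_path)
    new_rows = [row for row in rows if row[1] > watermark]
    if new_rows:
        save_watermark(state_path, max(ts for _, ts in new_rows))
    return new_rows

# Hypothetical data: two rows exist for the first weekly run, one more arrives later.
state_path = os.path.join(tempfile.mkdtemp(), "watermark.json")
table = [("row1", 100), ("row2", 200)]
first_run = run_incremental_job(table, state_path)

table.append(("row3", 300))
second_run = run_incremental_job(table, state_path)
```

In this sketch the first run returns both existing rows and the second run returns only `row3`. The watermark is advanced only after new rows are found, so an empty run leaves the state untouched.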

Re: Approach: Incremental data load from HBASE

2017-01-04 Thread ayan guha
Hi Chetan, what do you mean by incremental load from HBase? There is a timestamp marker for each cell, but not at the row level. On Wed, Jan 4, 2017 at 10:37 PM, Chetan Khatri wrote: > Ted Yu, > > You misunderstood; I said incremental load from HBase to Hive, >
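
Because timestamps exist per cell and not per row, one way to obtain a row-level "last modified" value is to take the newest timestamp among the row's cells. The sketch below illustrates that idea in plain Python with a dictionary standing in for an HBase table; the data layout and the names `row_last_modified` and `rows_changed_since` are assumptions made for illustration. In the real HBase client, `Scan.setTimeRange` applies a comparable restriction at the cell level.

```python
def row_last_modified(cells):
    """Return the newest cell timestamp in a row.

    `cells` maps a column name to a (value, timestamp_ms) pair.
    """
    return max(ts for _, ts in cells.values())

def rows_changed_since(table, since_ts):
    """Return the sorted keys of rows with at least one cell newer than `since_ts`."""
    return sorted(
        row_key
        for row_key, cells in table.items()
        if row_last_modified(cells) > since_ts
    )

# Hypothetical table: row2 had a single column updated after the others were written.
table = {
    "row1": {"cf:name": ("a", 1000), "cf:city": ("x", 1000)},
    "row2": {"cf:name": ("b", 1000), "cf:city": ("y", 2500)},
}
changed = rows_changed_since(table, 2000)
```

Here only `row2` is selected, since its `cf:city` cell was written after the cutoff even though its other cell is older.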

Re: Approach: Incremental data load from HBASE

2017-01-04 Thread Chetan Khatri
Ted Yu, you misunderstood; I said incremental load from HBase to Hive. Taken on its own, you could call it incremental import from HBase. On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu wrote: > Incremental load traditionally means generating HFiles and > using

Re: Approach: Incremental data load from HBASE

2016-12-23 Thread Chetan Khatri
Ted, correct. In my case I want incremental import from HBase and incremental load to Hive. Both approaches discussed earlier, with indexing, seem accurate to me. But just as Sqoop supports incremental import and load for an RDBMS, is there any tool that supports incremental import from HBase? On Wed,

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
Incremental load traditionally means generating HFiles and using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the data into HBase. For your use case, the producer needs to find rows where the flag is 0 or 1. After such rows are obtained, it is up to you how the result of
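
The producer-side selection mentioned in this message (find the rows whose flag is 0 or 1) can be sketched as a simple filter. This is an illustration only: the thread snippet does not say what the flag values mean, and the column name `meta:flag`, the function `select_flagged_rows` and the dictionary standing in for an HBase table are all hypothetical.

```python
# Flag values named in the thread; what each value means is not stated there.
SELECTED_FLAGS = {0, 1}

def select_flagged_rows(table, flag_column="meta:flag"):
    """Return (row_key, columns) pairs whose flag column holds a selected value.

    Rows that lack the flag column are skipped.
    """
    selected = []
    for row_key, columns in sorted(table.items()):
        if columns.get(flag_column) in SELECTED_FLAGS:
            selected.append((row_key, columns))
    return selected

# Hypothetical table contents.
table = {
    "row1": {"meta:flag": 0, "cf:v": "a"},
    "row2": {"meta:flag": 2, "cf:v": "b"},
    "row3": {"meta:flag": 1, "cf:v": "c"},
    "row4": {"cf:v": "d"},
}
to_send = [row_key for row_key, _ in select_flagged_rows(table)]
```

With this data, `row1` and `row3` are selected, while `row2` (flag 2) and `row4` (no flag) are left out. What the producer does with the selected rows afterwards is left open, as in the message.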

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Chetan Khatri
OK, sure, I will ask. But what would be a generic best-practice solution for incremental load from HBase? On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu wrote: > I haven't used Gobblin. > You can consider asking the Gobblin mailing list about the first option. > > The second option would

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
I haven't used Gobblin. You can consider asking the Gobblin mailing list about the first option. The second option would work. On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri wrote: > Hello Guys, > > I would like to understand the different approaches for distributed

Approach: Incremental data load from HBASE

2016-12-21 Thread Chetan Khatri
Hello Guys, I would like to understand the different approaches for distributed incremental load from HBase. Is there any *tool / incubator tool* that satisfies this requirement? *Approach 1:* Write a Kafka producer, manually maintain a column flag for events, and ingest them with LinkedIn Gobblin to HDFS /