Incremental load traditionally means generating HFiles and using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the data into HBase.
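For reference, a minimal sketch of that bulk-load step, assuming the HFiles have already been written by a preceding job (e.g. one configured with HFileOutputFormat2); the table name "my_table" and the HFile directory /tmp/hfiles are placeholders, not taken from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"));
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
          // HFiles under /tmp/hfiles are assumed to have been produced already,
          // e.g. by a MapReduce or Spark job using HFileOutputFormat2.
          LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
          loader.doBulkLoad(new Path("/tmp/hfiles"), conn.getAdmin(), table, locator);
        }
      }
    }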
For your use case, the producer needs to find rows where the flag is 0 or 1.
After such rows are obtained, it is up to you how the result of processing is
delivered to HBase. A rough sketch of such a scan is appended below the quoted
thread.

Cheers

On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Ok, sure, will ask.
>
> But what would be a generic best-practice solution for incremental load
> from HBase?
>
> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I haven't used Gobblin.
>> You can consider asking the Gobblin mailing list about the first option.
>>
>> The second option would work.
>>
>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hello guys,
>>>
>>> I would like to understand the different approaches for distributed
>>> incremental load from HBase. Is there any *tool / incubator tool* which
>>> satisfies the requirement?
>>>
>>> *Approach 1:*
>>>
>>> Write a Kafka producer, manually maintain a column flag for events, and
>>> ingest it with LinkedIn Gobblin to HDFS / S3.
>>>
>>> *Approach 2:*
>>>
>>> Run a scheduled Spark job - read from HBase, do transformations, and
>>> maintain the flag column at the HBase level.
>>>
>>> In both approaches above, I need to maintain column-level flags, such
>>> as 0 - default, 1 - sent, 2 - sent and acknowledged. So next time the
>>> producer will take another batch of 1000 rows where the flag is 0 or 1.
>>>
>>> I am looking for a best-practice approach with any distributed tool.
>>>
>>> Thanks.
>>>
>>> - Chetan Khatri
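As mentioned at the top of this message, here is a rough sketch of scanning for rows whose flag is 0 or 1; the column family "cf", the qualifier "flag", the string encoding of the flag values, and the table name "events" are assumptions, not taken from this thread:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlagScanSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("events"))) {
          byte[] cf = Bytes.toBytes("cf");
          byte[] flag = Bytes.toBytes("flag");
          // Match rows whose flag is 0 (default) or 1 (sent), i.e. not yet acknowledged.
          SingleColumnValueFilter eq0 =
              new SingleColumnValueFilter(cf, flag, CompareFilter.CompareOp.EQUAL, Bytes.toBytes("0"));
          SingleColumnValueFilter eq1 =
              new SingleColumnValueFilter(cf, flag, CompareFilter.CompareOp.EQUAL, Bytes.toBytes("1"));
          eq0.setFilterIfMissing(true); // skip rows that have no flag column at all
          eq1.setFilterIfMissing(true);
          Scan scan = new Scan();
          scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ONE, eq0, eq1));
          scan.setCaching(1000); // rows fetched per RPC; tune to the batch size
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
              // hand each row to the producer / downstream processing here
              System.out.println(Bytes.toString(r.getRow()));
            }
          }
        }
      }
    }

After a batch has been processed, the same column can be updated with a Put
(e.g. to 1 or 2), so the next scheduled run only picks up rows still flagged
0 or 1.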