Incremental load traditionally means generating HFiles and using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the data into HBase.
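For reference, a minimal sketch of that bulk-load step, assuming the HFiles have already been written by a preceding job (e.g. one configured with HFileOutputFormat2); the table name "my_table" and the HFile directory /tmp/hfiles are placeholders, not taken from this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("my_table"));
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
          // HFiles under /tmp/hfiles are assumed to have been produced already,
          // e.g. by a MapReduce or Spark job using HFileOutputFormat2.
          LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
          loader.doBulkLoad(new Path("/tmp/hfiles"), conn.getAdmin(), table, locator);
        }
      }
    }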
For your use case, the producer needs to find rows where the flag is 0 or 1.
After such rows are obtained, it is up to you how the result of processing is
delivered to HBase. A rough sketch of such a scan is appended below the quoted
thread.

Cheers

On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <chetan.opensou...@gmail.com>
wrote:

> Ok, sure, will ask.
>
> But what would be a generic best-practice solution for incremental load
> from HBase?
>
> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I haven't used Gobblin.
>> You can consider asking the Gobblin mailing list about the first option.
>>
>> The second option would work.
>>
>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>> chetan.opensou...@gmail.com> wrote:
>>
>>> Hello guys,
>>>
>>> I would like to understand the different approaches for distributed
>>> incremental load from HBase. Is there any *tool / incubator tool* which
>>> satisfies the requirement?
>>>
>>> *Approach 1:*
>>>
>>> Write a Kafka producer, manually maintain a column flag for events, and
>>> ingest it with LinkedIn Gobblin to HDFS / S3.
>>>
>>> *Approach 2:*
>>>
>>> Run a scheduled Spark job - read from HBase, do transformations, and
>>> maintain the flag column at the HBase level.
>>>
>>> In both approaches above, I need to maintain column-level flags, such
>>> as 0 - default, 1 - sent, 2 - sent and acknowledged. So next time the
>>> producer will take another batch of 1000 rows where the flag is 0 or 1.
>>>
>>> I am looking for a best-practice approach with any distributed tool.
>>>
>>> Thanks.
>>>
>>> - Chetan Khatri
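As mentioned at the top of this message, here is a rough sketch of scanning for rows whose flag is 0 or 1; the column family "cf", the qualifier "flag", the string encoding of the flag values, and the table name "events" are assumptions, not taken from this thread:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlagScanSketch {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("events"))) {
          byte[] cf = Bytes.toBytes("cf");
          byte[] flag = Bytes.toBytes("flag");
          // Match rows whose flag is 0 (default) or 1 (sent), i.e. not yet acknowledged.
          SingleColumnValueFilter eq0 =
              new SingleColumnValueFilter(cf, flag, CompareFilter.CompareOp.EQUAL, Bytes.toBytes("0"));
          SingleColumnValueFilter eq1 =
              new SingleColumnValueFilter(cf, flag, CompareFilter.CompareOp.EQUAL, Bytes.toBytes("1"));
          eq0.setFilterIfMissing(true); // skip rows that have no flag column at all
          eq1.setFilterIfMissing(true);
          Scan scan = new Scan();
          scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ONE, eq0, eq1));
          scan.setCaching(1000); // rows fetched per RPC; tune to the batch size
          try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
              // hand each row to the producer / downstream processing here
              System.out.println(Bytes.toString(r.getRow()));
            }
          }
        }
      }
    }

After a batch has been processed, the same column can be updated with a Put
(e.g. to 1 or 2), so the next scheduled run only picks up rows still flagged
0 or 1.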