I haven't used Gobblin. You could ask on the Gobblin mailing list about the first option.
The second option would work.

On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Hello Guys,
>
> I would like to understand the different approaches for distributed
> incremental load from HBase. Is there any *tool / incubator tool* which
> satisfies this requirement?
>
> *Approach 1:*
>
> Write a Kafka producer, manually maintain a column flag for events, and
> ingest it with LinkedIn Gobblin to HDFS / S3.
>
> *Approach 2:*
>
> Run a scheduled Spark job: read from HBase, do transformations, and
> maintain the flag column at the HBase level.
>
> In both of the above approaches, I need to maintain column-level flags,
> such as 0 - default, 1 - sent, 2 - sent and acknowledged. So next time the
> producer will take another batch of 1000 rows where the flag is 0 or 1.
>
> I am looking for a best-practice approach with any distributed tool.
>
> Thanks.
>
> - Chetan Khatri
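
For what it's worth, the flag bookkeeping described in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions: a plain in-memory dict stands in for the HBase table, and `next_batch`, `process_batch`, and the `send` callback are hypothetical names, not Spark, HBase, or Gobblin APIs. A real job would replace the dict lookups with HBase reads and writes.

```python
# Flag values as described in the quoted message.
PENDING, SENT, ACKED = 0, 1, 2


def next_batch(table, batch_size=1000):
    """Return up to batch_size row keys whose flag is 0 or 1, in key order."""
    keys = [k for k in sorted(table) if table[k]["flag"] in (PENDING, SENT)]
    return keys[:batch_size]


def process_batch(table, send, batch_size=1000):
    """Process one batch: set flag 1 when a row is sent, 2 once it is acknowledged.

    `send` is a hypothetical callback returning True when the sink acknowledges.
    """
    batch = next_batch(table, batch_size)
    for key in batch:
        table[key]["flag"] = SENT
        if send(key, table[key]["value"]):
            table[key]["flag"] = ACKED
    return batch


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the HBase table.
    table = {f"row{i:04d}": {"value": i, "flag": PENDING} for i in range(5)}
    # Hypothetical sink that never acknowledges row0002.
    process_batch(table, send=lambda key, value: key != "row0002", batch_size=3)
    print({key: row["flag"] for key, row in table.items()})
```

Rows left at flag 1 (sent but not acknowledged) are picked up again by the next call, so the downstream consumer would need to tolerate duplicates.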