OK, sure, I will ask on the Gobblin mailing list. But what would be the generic best-practice solution for incremental load from HBase?

On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> I haven't used Gobblin.
> You can consider asking the Gobblin mailing list about the first option.
>
> The second option would work.
>
> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
>> Hello Guys,
>>
>> I would like to understand the different approaches for distributed
>> incremental load from HBase. Is there any *tool / incubator tool* that
>> satisfies this requirement?
>>
>> *Approach 1:*
>>
>> Write a Kafka producer, manually maintain a column flag for events, and
>> ingest them with LinkedIn Gobblin to HDFS / S3.
>>
>> *Approach 2:*
>>
>> Run a scheduled Spark job: read from HBase, do the transformations, and
>> maintain the flag column at the HBase level.
>>
>> In both of the above approaches, I need to maintain column-level flags,
>> such as 0 - default, 1 - sent, 2 - sent and acknowledged. So next time the
>> producer will take another batch of 1000 rows where the flag is 0 or 1.
>>
>> I am looking for a best-practice approach with any distributed tool.
>>
>> Thanks.
>>
>> - Chetan Khatri
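
To make the flag scheme from the quoted message concrete (0 = default, 1 = sent, 2 = sent and acknowledged, with each run taking up to 1000 rows whose flag is 0 or 1), here is a minimal Python sketch. It is only an illustration under assumptions: a plain dict stands in for the HBase flag column, and `next_batch`, `run_once` and the `send` callback are hypothetical names, not part of any HBase, Kafka, Spark or Gobblin API.

```python
from typing import Callable, Dict, List

# Flag values as described in the thread.
PENDING, SENT, ACKED = 0, 1, 2


def next_batch(flags: Dict[str, int], batch_size: int = 1000) -> List[str]:
    """Return up to batch_size row keys whose flag is 0 or 1, in key order."""
    eligible = [key for key in sorted(flags) if flags[key] in (PENDING, SENT)]
    return eligible[:batch_size]


def run_once(
    flags: Dict[str, int],
    send: Callable[[str], bool],
    batch_size: int = 1000,
) -> int:
    """Process one batch and return the number of acknowledged rows.

    `send` is a hypothetical callback that ships one row downstream and
    returns True when the receiver acknowledges it.
    """
    acknowledged = 0
    for key in next_batch(flags, batch_size):
        flags[key] = SENT  # mark as sent before waiting for the ack
        if send(key):
            flags[key] = ACKED
            acknowledged += 1
    return acknowledged
```

Rows that were sent but never acknowledged keep flag 1, so the next scheduled run picks them up again. That means the downstream consumer has to tolerate duplicates (at-least-once delivery).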