+1, good idea to implement the streamer tool.

Regards
Nihal

On 2021/08/31 17:47:35, Akash Nilugal <akashnilu...@gmail.com> wrote: 
> Hi Community,
> 
> OLTP systems like MySQL are used heavily for storing transactional data in
> real time, and the same data is later used for fraud detection and for
> taking various data-driven business decisions. Since OLTP systems are not
> suited for analytical queries due to their row-based storage, there is a
> need to store this primary data in big data storage in such a way that the
> data on DFS is an exact replica of the data present in MySQL. Traditional
> tools for capturing data from primary databases, like Apache Sqoop, use
> pull-based CDC approaches, which put additional load on the primary
> databases. Hence log-based CDC solutions have become increasingly popular.
> However, there are two aspects to this problem: we should be able to
> incrementally capture data changes from the primary databases, and we
> should be able to incrementally ingest those changes into the data lake so
> that the overall latency decreases. The former is taken care of by
> log-based CDC systems like Maxwell and Debezium. Here we are proposing a
> solution for the second aspect using Apache Carbondata.
> 
> The Carbondata streamer tool enables users to incrementally ingest data
> from various sources, like Kafka and DFS, into their data lakes. The tool
> comes with out-of-the-box support for almost all schema evolution use
> cases. Currently, the tool can be launched as a Spark application, either
> in continuous mode or as a one-time job.
> 
> Further details are present in the design document. Please review the
> design and help improve it. I'm attaching the link to the Google Doc;
> you can comment on it directly. Any suggestions and improvements are most
> welcome.
> 
> https://docs.google.com/document/d/1x66X5LU5silp4wLzjxx2Hxmt78gFRLF_8IocapoXxJk/edit?usp=sharing
> 
> Thanks
> 
> Regards,
> Akash R Nilugal
> 
