Re: Duplication Kafka/CDC projects

邓科 Tue, 08 Oct 2024 00:12:35 -0700

Hello, Marton.

It's been a while since you sent this email, and I'm wondering if there has
been any progress.
Based on the commit we made at https://gerrit.cloudera.org/#/c/19909/,
we've implemented a relatively stable version that has been running in
dozens of customer environments for a long time. We are now considering
whether to contribute this feature back to the community. If you could
share your progress and future plans, we can determine whether there is
still a need to submit this part of the code.


Thanks,
Ke Deng

On Tue, Nov 14, 2023 at 23:10, Marton Greber <[email protected]> wrote:

> Devs,
>
>
> We have been tinkering with a proof of concept (POC) to accomplish
> cross-cluster async replication. The use case is to have a backup Kudu
> cluster for disaster recovery. The new replication feature could be treated
> as an alternative to backup/restore, but with finer time granularity.
> Moreover, it would eliminate the need for intermediate storage.
>
>
> For our POC we have been looking for inspiration at YugaByte xCluster
> replication
> <https://docs.yugabyte.com/preview/architecture/docdb-replication/async-replication/>
> (the active-passive part). This would be a CDC based approach, where we
> have Kudu CDC producers/consumers.
>
>
> On the other hand, while looking at https://gerrit.cloudera.org/c/19909/
> "Support write ops to kafka with kafka client" I've found some
> similarities. Here, according to my understanding, the goal is to
> move records from Kudu into Kafka.
>
>
> I think there is an overlap between these two projects, and I wanted to
> start a conversation about potential ways to consolidate them. The aim is
> to figure out what the commonalities are, and thereby avoid pushing in
> changes that are quite similar and would bloat the codebase.
>
>
> —
>
>
> Some initial thoughts:
>
>    - the need for a CDC interface for Kudu emerges as a commonality
>    - https://gerrit.cloudera.org/c/19909/ could avoid adding the Kafka
>    client into Kudu by leveraging the above CDC interface, for example in
>    a Kafka source connector
>       - this would maybe be a better separation of concerns
>    - We could re-think our CDC approach such that it is a generic interface
>    rather than a Kudu:Kudu specific one.
>    - For our purpose, we could maybe initially reuse the Kudu -> Kafka
>    (with connector) way, and implement async replication by implementing
>    the other end: the Kafka sink connector
>
> —
>
>
> Let me know your thoughts on this one!
>
>
> Thanks,
> Marton
>
