Re: Duplication Kafka/CDC projects

邓科 Wed, 09 Oct 2024 20:28:32 -0700

Hello!

Thank you very much for your reply. Regarding the questions you raised in
your email, I will respond one by one.


Firstly, we have not updated the design; the document you pasted is the
latest design document.

Secondly, our internal implementation is based on the fix and improvements
made in this commit: https://gerrit.cloudera.org/#/c/19909/. It primarily
involves enhancements in Kafka connection management, WAL playback, and
fixes for Kerberos scenarios, among other features and scenarios.

Lastly, during implementation, the trickiest part was handling Kudu
after Kafka exceptions. Initially, our approach was to retry requests
continuously, expecting Kafka to recover. However, if Kafka did not
recover, this approach would eventually prevent Kudu from processing new
requests. Later, we modified it to retry for a period or a certain number
of times after Kafka exceptions. If recovery was not achieved, we would let
the tserver process exit. This approach can mitigate the continuous impact
of Kudu exceptions in private deployments but may affect other clients in
SaaS environments. Ultimately, our chosen solution is to unsubscribe from
the table and trigger alerts after a certain number of Kafka write
failures, expecting manual intervention to re-subscribe after resolution.
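
To make the final approach concrete, here is a minimal sketch of the
"unsubscribe and alert after N failures" policy. It is only an illustration,
not the code from the commit: the class, the callbacks, and the threshold are
hypothetical names, and it assumes that failures are counted consecutively
and reset on a successful write, which the description above does not
specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class TableSubscriptions:
    """Hypothetical per-table failure tracker (not Kudu's actual API)."""
    max_failures: int                    # assumed threshold before unsubscribing
    send: Callable[[str, bytes], None]   # stand-in for the Kafka write; raises on failure
    alert: Callable[[str], None]         # stand-in for the alerting hook
    subscribed: Set[str] = field(default_factory=set)
    failures: Dict[str, int] = field(default_factory=dict)

    def subscribe(self, table: str) -> None:
        # Manual (re-)subscription starts with a clean failure count.
        self.subscribed.add(table)
        self.failures[table] = 0

    def replicate(self, table: str, op: bytes) -> bool:
        """Try to forward one op; return True only if it was written."""
        if table not in self.subscribed:
            return False  # unsubscribed tables are skipped, never retried
        try:
            self.send(table, op)
        except Exception:
            self.failures[table] += 1
            if self.failures[table] >= self.max_failures:
                # Give up on this table only; other tables keep working.
                self.subscribed.discard(table)
                self.alert(table)
            return False
        self.failures[table] = 0  # assumption: a success resets the count
        return True
```

The point of this shape is that a Kafka outage affects only the subscribed
table: the write path is never blocked by endless retries, and the process
does not exit.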

These are the responses to your questions. If anything is unclear or if
you're interested in other aspects, please let me know.

Thanks,
Ke Deng

On Wed, Oct 9, 2024 at 18:04, Marton Greber <[email protected]> wrote:

> Hi!
>
> I'm glad to hear updates about this project from your end!
> On our end I did not manage to make much progress, as I was dealing with
> infra, the Python package, etc. However, I have been tinkering with the
> idea of writing a Flink job that does diff scans and writes those results
> to a DR Kudu cluster. This is still in POC state.
> I'm happy to hear that you've managed to use your proposal in production;
> that is great news!
> Internally we've talked about this, and one key point is that it would
> probably be best if we implemented a proper CDC interface for Kudu. This
> could then be used to satisfy your requirement as well, right? (e.g.
> picking changes up in Kafka; please correct me if I'm wrong here)
> Personally I think that if you have promising results from prod
> deployments, it's a great thing, and it benefits the project and the
> project health.
> It's been quite a while and I would like to take a couple of days to
> understand (again) your proposal; for this I have a couple of questions:
> - are there any updates to the design? (is this up to date:
> https://docs.google.com/document/d/1ihqPFO1vulpYDYYKcHmhes0LCOXKQtdAf5ub0y2LiaM/edit
> ?)
> - is https://gerrit.cloudera.org/#/c/19909/ what went to prod, or did you
> apply any other changes, e.g. improvements or fixes?
> - have you experienced any pitfalls with the design/implementation?
>
> Anyway, I will go through the design doc again and post questions, and we
> can continue the discussion in this thread.
>
> Thank you!
> Marton
>
>
> On Tue, Oct 8, 2024 at 9:21, 邓科 <[email protected]> wrote:
>
> > Hello, Marton.
> >
> > It's been a while since you sent this email, and I'm wondering if there
> > has been any progress.
> > Based on the commit we made at https://gerrit.cloudera.org/#/c/19909/,
> > we've implemented a relatively stable version that has been running
> > stably in dozens of customer environments for a long time. We are now
> > considering whether to contribute this feature back to the community. If
> > you could share your progress and future plans, we can determine whether
> > there is still a need to submit this part of the code.
> >
> > Thanks,
> > Ke Deng
> >
> > On Tue, Nov 14, 2023 at 23:10, Marton Greber <[email protected]> wrote:
> >
> > > Devs,
> > >
> > >
> > > We have been tinkering with a proof of concept (POC) to accomplish
> > > cross-cluster async replication. The use case is to have a backup Kudu
> > > cluster for disaster recovery. The new replication feature could be
> > > treated as an alternative to backup/restore, but with finer time
> > > granularity. Moreover, it would eliminate the need for intermediate
> > > storage.
> > >
> > >
> > > For our POC we have been looking for inspiration at YugaByte xCluster
> > > replication
> > > <https://docs.yugabyte.com/preview/architecture/docdb-replication/async-replication/>
> > > (the active-passive part). This would be a CDC based approach, where we
> > > have Kudu CDC producers/consumers.
> > >
> > >
> > > On the other hand, while looking at
> > > https://gerrit.cloudera.org/c/19909/
> > > "Support write ops to kafka with kafka client", I've found some
> > > similarities. Here, according to my understanding, the goal is to
> > > move records from Kudu into Kafka.
> > >
> > >
> > > I think there is an intersection between these two projects, and I
> > > wanted to start the conversation about potential ways to consolidate
> > > them: figuring out what the commonalities are, and thereby avoiding
> > > pushing in pieces of changes which are quite similar (bloating the
> > > codebase).
> > >
> > >
> > > —
> > >
> > >
> > > Some initial thoughts:
> > >
> > >    - the need for a CDC interface for Kudu emerges as a commonality
> > >    - https://gerrit.cloudera.org/c/19909/ could avoid adding the Kafka
> > >    client into Kudu, by leveraging the above CDC interface in a Kafka
> > >    source connector for example
> > >       - this would maybe be a better separation of concerns
> > >    - We could re-think our CDC approach such that it is a generic
> > >    interface rather than the Kudu:Kudu specific one.
> > >    - For our purpose, we could maybe initially reuse the Kudu ->
> > >    Kafka (with connector) way, and implement async replication by
> > >    implementing the other end: the Kafka sink connector
> > >
> > > —
> > >
> > >
> > > Let me know your thoughts on this one!
> > >
> > >
> > > Thanks,
> > > Marton
> > >
> >
>
