Hi David,

Thank you for your response and your interest in this one. I agree
with the main counter-argument: a single control plane for all
tenants would have a greater blast radius. Still, I think it would
also be more cost effective. The idea is to have a multi-region
(3 AZ) controller quorum and a dual-region (2 AZ) broker cluster,
and to share this control plane with all the other Kafka clusters.
This is actually not something new ¹; I have personally
battle-tested this setup and it works as expected.
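
To make the topology concrete, here is a minimal sketch of the
stretched controller quorum configuration, assuming KRaft with one
controller node per region; the node IDs and hostnames
(ctrl-region-a/b/c) are hypothetical placeholders:

    # controller.properties (one such node per region; hostnames are hypothetical)
    process.roles=controller
    node.id=1
    controller.quorum.voters=1@ctrl-region-a:9093,2@ctrl-region-b:9093,3@ctrl-region-c:9093
    controller.listener.names=CONTROLLER
    listeners=CONTROLLER://:9093

Losing any single region still leaves 2 of the 3 voters, so the
metadata quorum stays available.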

With both ZooKeeper- and KRaft-based deployments one can have
different availability levels for the metadata and data planes,
although there is no chroot functionality in KRaft:

- chroot comes in handy as the scale grows and more Kafka clusters
are being provisioned.
- Within restricted network topologies (e.g. a DMZ, or GDPR and
equivalent regulatory requirements), separating the metadata and
data roles helps with compliance, and chroot in this case lets us
reuse the same metadata ensemble for new Kafka clusters (see the
sketch after this list).
- When the data and metadata planes are separated, it is easier to
achieve zero RTO and RPO by distributing the metadata plane across
three regions, since the quorum survives the loss of any single
region; the data plane does not have that requirement.²
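
As an illustration of the chroot point, two Kafka clusters could
reuse the same ZooKeeper ensemble like this (hostnames and paths
are hypothetical):

    # server.properties, cluster A
    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka/cluster-a

    # server.properties, cluster B: same ensemble, separate metadata subtree
    zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka/cluster-b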

The cost effectiveness comes from reduced compute resources and
simpler management: for example, ten broker clusters each with a
dedicated three-node controller quorum need thirty controller
nodes, whereas a shared control plane needs only three.

¹ https://docs.confluent.io/platform/current/multi-dc-deployments/multi-region-architectures.html#stretched-cluster-2-5-data-center-cp-only
² https://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen#:~:text=we%20can%20tolerate%20N%2D1%20Kafka%20node%20failures%2C%20but%20only%20N/2%2D1%20Zookeeper%20failures.

Kind regards,
OSB

On Mon, 1 Apr 2024 at 21:34, David Arthur
<david.art...@confluent.io.invalid> wrote:
>
> Omer,
>
> Thanks for the email. This is an interesting thing to consider.
> Conceptually, there is no reason why the controllers couldn't manage the
> metadata for multiple brokers. The main counter-argument I can think of is
> essentially the same as the motivation -- less isolation. With a shared
> controller, one "noisy" broker cluster that put a lot of load on the
> controller could affect metadata availability/latency for other broker
> clusters. Related to this, having multiple broker clusters share one
> controller cluster means a larger blast radius for controller failures.
>
> The "noisy neighbor" problem could be mitigated with a good implementation,
> but the failure coupling cannot.
>
> In the containerized world, resources are abstracted away, so there is not
> so much overhead to run a set of dedicated controller nodes. Even with
> bare-metal hardware, controller processes can be run on the same nodes as
> broker processes if needed.
>
>
> The 2+1 data center example seems a bit tangential to me.
>
> > This way metadata and data would have different level of availability
> and it enable enterprises to design a more cost effective solution by
> separating metadata and data service layer
>
> Is the idea here to have a multi-region controller quorum and then single
> region broker clusters? Could you achieve the same thing with one large
> Kafka cluster spread across regions but with topics having assignments that
> kept them region local? Is the "cost effectiveness" you're after just
> inter-broker networking costs?
>
> Maybe you could expand on this scenario and help motivate it a bit more?
>
> -David
