Hello Frank,

That's a good question.
I think we all know there is no single "correct" answer to this question,
but I can share what our team did.

Readiness: the controller is listening on the listener(s) defined in
controller.listener.names

The rationale behind it is:
1. The last step of controller node startup is to wait for all the
SocketServer ports to be open and the Acceptors to be started, and the
controller port is one of them.
2. The controller listener is used to talk to the other controllers
(voters) to form the Raft quorum, so if it is not open and listening, the
controller is basically not working at all.
3. The controller listener is also used by brokers (observers) to get
updated Raft quorum info and fetch metadata.
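
To make this concrete, here's a minimal sketch of such a readiness check
in Python: it succeeds once the controller listener accepts TCP
connections. The host and port (localhost:9090) are assumptions; use
whatever your controller.listener.names listener maps to in your config.

# Minimal readiness-check sketch: exit 0 once the controller listener
# accepts TCP connections. localhost:9090 is an assumption -- use the
# host/port that controller.listener.names maps to in your config.
import socket
import sys

def controller_ready(host="localhost", port=9090, timeout=3.0):
    try:
        # create_connection() completes the full TCP handshake, so this
        # only succeeds after the Acceptor has started on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    sys.exit(0 if controller_ready() else 1)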

Compared with a ZooKeeper cluster, which is what the KRaft quorum is
trying to replace, the liveness/readiness probe recommended in the
Kubernetes tutorial
<https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/#testing-for-liveness>
also does a "ruok" check for the pod. And the handler for this "ruok"
command
<https://github.com/apache/zookeeper/blob/d12aba599233b0fcba0b9b945ed3d2f45d4016f0/zookeeper-server/src/main/java/org/apache/zookeeper/server/command/RuokCommand.java#L32>
on the ZooKeeper server side returns "imok" directly, which means it's
effectively just a connection check. So we think this check makes sense.

Here's our design proposal
<https://github.com/strimzi/proposals/blob/main/046-kraft-liveness-readiness.md>
for the Liveness and Readiness probes in a KRaft Kafka cluster, FYI.
But again, I still think there's no "correct" answer for it. If you have
any better ideas, please let us know.

However, I have some suggestions for your readiness probe for brokers:

> our brokers are configured to use a script which marks the containers as
unready if under-replicated partitions exist. With this readiness check and
a pod disruption budget of the minimum in sync replica - 1

I understand it works well, but it has some drawbacks, and the biggest
issue I can think of is that it can cause unavailability for some
partitions.
For example: there are 3 brokers in the cluster: 0, 1, 2, and 10 topic
partitions are hosted on broker 0.
a. Broker 0 shuts down, and all partitions on broker 0 become followers.
b. Broker 0 starts up, and all its followers try to catch up with the
leaders.
c. 9 out of 10 partitions catch up and rejoin the ISR. At this point, the
pod is still unready because 1 partition is still under-replicated.
d. Some of the partitions on broker 0 become leaders, for example, when
auto leader rebalancing is triggered.
e. The leader partitions on broker 0 are now unavailable: because the pod
is not in the ready state, it cannot serve incoming requests.

In our team, we use the brokerState metric value = RUNNING for the
readiness probe. In KRaft mode, the broker enters the RUNNING state after
it has caught up with the controller's metadata and started to serve
requests from clients. We think that makes more sense.
Again, for more details, you can check the design proposal
<https://github.com/strimzi/proposals/blob/main/046-kraft-liveness-readiness.md>
for the Liveness and Readiness probes in a KRaft Kafka cluster.
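
For illustration, here is a rough sketch of that broker readiness check in
Python. It assumes the BrokerState metric is exposed over HTTP, e.g. via
the Prometheus JMX exporter on localhost:9404 under the name
kafka_server_kafkaserver_brokerstate; both the endpoint and the metric
name are assumptions to adapt to your setup. RUNNING has the value 3.

# Rough readiness-check sketch: exit 0 once the BrokerState metric reports
# RUNNING (3). The metrics endpoint and metric name are assumptions -- they
# depend on how you expose the kafka.server:type=KafkaServer,name=BrokerState
# MBean (e.g. via the Prometheus JMX exporter).
import sys
import urllib.request

METRICS_URL = "http://localhost:9404/metrics"  # assumed exporter endpoint
METRIC_NAME = "kafka_server_kafkaserver_brokerstate"  # assumed metric name
RUNNING = 3.0

def broker_running():
    with urllib.request.urlopen(METRICS_URL, timeout=3) as resp:
        for line in resp.read().decode("utf-8").splitlines():
            if line.startswith(METRIC_NAME):
                return float(line.split()[-1]) == RUNNING
    return False

if __name__ == "__main__":
    try:
        sys.exit(0 if broker_running() else 1)
    except OSError:
        sys.exit(1)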

Finally, I saw you don't have an operator for your Kafka clusters.
I don't know how you manage all these Kafka clusters manually, but there
must be some cumbersome operations, like rolling pods.
Let's say you want to roll the pods one by one: which pod do you roll
first?
And which pod goes last?
Will you do any checks before rolling?
How long does each roll take?
...

I'm just listing some of the problems you might run into. So I would
recommend deploying an operator to help manage the Kafka clusters.
This is our design proposal
<https://github.com/strimzi/proposals/blob/main/060-kafka-roller-kraft.md>
for the Kafka roller in the operator for KRaft, FYI.

And now, I'm totally biased, but Strimzi
<https://github.com/strimzi/strimzi-kafka-operator> provides a fully
open-source operator to manage Kafka clusters on Kubernetes.
You're welcome to try it (hopefully it will help you manage your Kafka
clusters), join the community to ask questions, join discussions, or
contribute to it.

Thank you.
Luke

On Fri, Apr 19, 2024 at 4:19 AM Francesco Burato <bur...@adobe.com.invalid>
wrote:

> Hello,
>
> I have a question regarding the deployment of Kafka using Kraft
> controllers in a Kubernetes environment. Our current Kafka cluster is
> deployed on K8S clusters as statefulsets without operators and our brokers
> are configured to use a script which marks the containers as unready if
> under-replicated partitions exist. With this readiness check and a pod
> disruption budget of the minimum in sync replica - 1, we are able to
> perform rollout restarts of our brokers automatically without ever
> producing consumers and producers errors.
>
> We have started the processes of transitioning to Kraft and based on the
> recommended deployment strategy we are going to define dedicated nodes as
> controllers instead of using combined servers. However, defining nodes as
> controller does not seem to allow to use the same strategy for readiness
> check as the kafka-topics.sh does not appear to be executable on controller
> brokers.
>
> The question is: what is a reliable readiness check that can be used for
> Kraft controllers that ensures that rollout restart can be performed safely?
>
> Thanks,
>
> Frank
>
> --
> Francesco Burato | Software Development Engineer | Adobe |
> bur...@adobe.com<mailto:bur...@adobe.com>
>
>
