Does the kafka controller protocol have a backup?

2024-04-18 Thread Dima Brodsky
Hello,

I am in the process of setting up a kafka cluster which is configured to
use KRaft.  There is a set of three controller nodes and a set of six
brokers.  Both the controllers and the brokers are configured to use mTLS
(Mutual TLS). The relevant part of the controller config looks like:

listeners=CONTROLLER://:9097
listener.security.protocol.map=CONTROLLER:SSL
controller.listener.names=CONTROLLER

Now, the certificates were initially missing a SAN corresponding to the
FQDN of the nodes, and the FQDN was what we used in the controller quorum
voters config.

When the controllers started up I did not see any errors or issues. When
the brokers started up I saw a couple of SSL connection errors when they
tried to connect to the controllers, because the controller hostname was
missing from the SAN of the certificate. But the whole cluster seemed to
function normally: no other errors, and everything was in sync. And
kafka-metadata-quorum.sh ... describe --status showed the correct status of
the controllers and the brokers.
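
(For anyone who wants to script that same status check, here is a minimal
sketch using the Java Admin API's describeMetadataQuorum call; the
bootstrap address is a placeholder, and the ssl.* client configs an mTLS
cluster would need are omitted.)

import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class QuorumStatusCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; add ssl.* client configs for mTLS.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "broker-1.example.com:9092");
        try (Admin admin = Admin.create(props)) {
            QuorumInfo quorum =
                    admin.describeMetadataQuorum().quorumInfo().get();
            System.out.println("Leader: " + quorum.leaderId());
            quorum.voters().forEach(v -> System.out.println(
                    "Voter " + v.replicaId()
                    + " logEndOffset=" + v.logEndOffset()));
        }
    }
}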

I fixed the SAN in the cert and the errors went away on the brokers.
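
(In case it helps anyone debugging the same thing: a quick way to see
exactly which SANs a controller certificate presents is to open a TLS
connection and dump them. A minimal sketch below; the host and port are
placeholders, and since the cluster requires mTLS it assumes the key and
trust stores are supplied via the standard javax.net.ssl system
properties.)

import java.security.cert.X509Certificate;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SanCheck {
    public static void main(String[] args) throws Exception {
        // Run with -Djavax.net.ssl.keyStore=... -Djavax.net.ssl.trustStore=...
        SSLSocketFactory factory =
                (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory
                .createSocket("controller-1.example.com", 9097)) {
            socket.startHandshake();
            X509Certificate cert = (X509Certificate)
                    socket.getSession().getPeerCertificates()[0];
            // Each entry is a [type, value] pair, e.g. [2, "host.example.com"].
            System.out.println(cert.getSubjectAlternativeNames());
        }
    }
}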

My question is: if the certs prevented the SSL connection from being
established between the brokers and the controllers, or even between the
controllers, is there some fallback that was used? PLAINTEXT? Or was some
of the validation skipped?

Thanks!
ttyl
Dima


Re: Kraft controller readiness checks

2024-04-18 Thread Luke Chen
Hello Frank,

That's a good question.
I think we all know there is no "correct" answer to this question, but I
can share with you what our team did for it.

Readiness: controller is listening on the controller.listener.names

The rationale behind it is:
1. The last step of controller node startup is to wait until all the
SocketServer ports are open and the Acceptors are started, and the
controller port is one of them.
2. This controller listener is used to talk to the other controllers
(voters) to form the raft quorum, so if it is not open and listening, the
controller is basically not working at all.
3. The controller listener is also used by brokers (observers) to get the
updated raft quorum info and fetch metadata. (A sketch of such a
connectivity check follows below.)
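
To make that concrete, here is a minimal sketch of the kind of check a
readiness probe could exec; the host and port are placeholders for
whatever your controller.listener.names listener binds to.

import java.net.InetSocketAddress;
import java.net.Socket;

public class ControllerListenerCheck {
    public static void main(String[] args) {
        try (Socket socket = new Socket()) {
            // Ready if the controller listener accepts a TCP connection.
            socket.connect(new InetSocketAddress("localhost", 9093), 3000);
            System.exit(0);
        } catch (Exception e) {
            System.exit(1); // not listening yet -> unready
        }
    }
}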

Compared with a Zookeeper cluster, which is what the KRaft quorum is trying
to replace, the liveness/readiness probe recommended in the Kubernetes
tutorial is also doing a "ruok" check on the pod. And the handler for this
"ruok" command on the Zookeeper server side returns "imok" directly, which
means it is doing a connection check only. So we think this check makes
sense.
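
("ruok" is just Zookeeper's four-letter-word protocol over a plain socket;
an equivalent check sketched in Java, assuming Zookeeper's default client
port 2181:)

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RuokCheck {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 2181)) {
            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.UTF_8));
            out.flush();
            byte[] buf = new byte[4];
            int n = socket.getInputStream().read(buf);
            String reply = new String(buf, 0, Math.max(n, 0),
                    StandardCharsets.UTF_8);
            // Zookeeper answers "imok" and then closes the connection.
            System.exit("imok".equals(reply) ? 0 : 1);
        }
    }
}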

Here's our design proposal for the Liveness and Readiness probes in a KRaft
Kafka cluster, FYI.
But again, I still think there's no "correct" answer for it. If you have
any better ideas, please let us know.

However, I have some suggestions for your readiness probe for brokers:

> our brokers are configured to use a script which marks the containers as
unready if under-replicated partitions exist. With this readiness check and
a pod disruption budget of the minimum in sync replica - 1

I understand it works well, but it has some drawbacks, and the biggest
issue I can think of is that it can cause unavailability for some
partitions.
For example: 3 brokers in the cluster (0, 1, 2), and 10 topic partitions
hosted on broker 0.
a. Broker 0 is shutting down; all partitions on broker 0 become followers.
b. Broker 0 is starting up; all the followers try to catch up with the
leaders.
c. 9 out of 10 partitions have caught up and rejoined the ISR. At this
point the pod is still unready, because 1 partition is still
under-replicated.
d. Some of the partitions on broker 0 become leaders, for example when
auto leader rebalance is triggered.
e. The leader partitions on broker 0 are now unavailable: because the pod
is not in the ready state, it cannot serve incoming requests.

In our team, we use the brokerState metric value = RUNNING for the
readiness probe. In KRaft mode, the broker enters the RUNNING state after
it has caught up with the controller on metadata and started to serve
requests from clients. We think that makes more sense.
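
A minimal sketch of that probe, assuming the broker exposes JMX locally
(the JMX port here is a placeholder) and that RUNNING corresponds to
value 3 of the kafka.server BrokerState gauge:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerStateCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn =
                    connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "kafka.server:type=KafkaServer,name=BrokerState");
            Number state = (Number) conn.getAttribute(name, "Value");
            // 3 == RUNNING; exit 0 -> ready, 1 -> unready.
            System.exit(state.intValue() == 3 ? 0 : 1);
        }
    }
}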
Again, for more details, you can check the design proposal for the Liveness
and Readiness probes in a KRaft Kafka cluster.

Finally, I saw you don't have operators for your Kafka clusters.
I don't know how you manage all these Kafka clusters manually, but there
must be some cumbersome operations, like rolling pods.
Let's say you want to roll the pods one by one: which pod goes first?
And which pod goes last?
Will you do any check before rolling?
How much time does it take for each rolling?
...

I'm just listing some of the problems you might face. So I would recommend
deploying an operator to help manage the Kafka clusters.
This is our design proposal for the Kafka roller in an operator for KRaft,
FYI.

And now, I'm totally biased, but Strimzi provides a fully open-source
operator to manage Kafka clusters on Kubernetes.
Welcome to try it (hopefully it will help you manage kafka clusters), join
the community to ask questions, join discussions, or contribute to it.

Thank you.
Luke

On Fri, Apr 19, 2024 at 4:19 AM Francesco Burato wrote:

> Hello,
>
> I have a question regarding the deployment of Kafka using KRaft
> controllers in a Kubernetes environment. Our current Kafka cluster is
> deployed on K8S clusters as statefulsets without operators, and our
> brokers are configured to use a script which marks the containers as
> unready if under-replicated partitions exist. With this readiness check
> and a pod disruption budget of the minimum in-sync replicas - 1, we are
> able to perform rollout restarts of our brokers automatically without
> ever producing consumer or producer errors.
>
> We have started the process of transitioning to KRaft and based on the
> recommended deployment strategy we are 

Kraft controller readiness checks

2024-04-18 Thread Francesco Burato
Hello,

I have a question regarding the deployment of Kafka using KRaft controllers
in a Kubernetes environment. Our current Kafka cluster is deployed on K8S
clusters as statefulsets without operators, and our brokers are configured
to use a script which marks the containers as unready if under-replicated
partitions exist. With this readiness check and a pod disruption budget of
the minimum in-sync replicas - 1, we are able to perform rollout restarts
of our brokers automatically without ever producing consumer or producer
errors.
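
(That under-replicated check can also be done programmatically instead of
via kafka-topics.sh; a minimal sketch with the Java Admin client, assuming
a local bootstrap address. Exit code 0 means ready.)

import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            Set<String> topics = admin.listTopics().names().get();
            long underReplicated = admin.describeTopics(topics)
                    .allTopicNames().get().values().stream()
                    .flatMap(d -> d.partitions().stream())
                    // a partition is under-replicated if ISR < replicas
                    .filter(p -> p.isr().size() < p.replicas().size())
                    .count();
            System.exit(underReplicated == 0 ? 0 : 1);
        }
    }
}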

We have started the process of transitioning to KRaft, and based on the
recommended deployment strategy we are going to define dedicated nodes as
controllers instead of using combined servers. However, defining nodes as
controllers does not seem to allow the same readiness-check strategy, as
kafka-topics.sh does not appear to be executable against controller nodes.

The question is: what is a reliable readiness check for KRaft controllers
that ensures rollout restarts can be performed safely?

Thanks,

Frank

--
Francesco Burato | Software Development Engineer | Adobe | 
bur...@adobe.com



Re: Streams group final result: EmitStrategy vs Suppressed

2024-04-18 Thread Matthias J. Sax
The main difference is the internal implementation. Semantically, both 
are equivalent.


suppress() uses an in-memory buffer, while `emitStrategy()` does not: it
modifies the upstream aggregation operator implementation to wait before
sending results downstream, and thus it is RocksDB based.
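
To make the two variants concrete, here is a sketch of both topologies
(topic names, window size, and serdes are made up for illustration;
emitStrategy() needs Kafka Streams 3.3+):

import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.EmitStrategy;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class FinalResultVariants {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        TimeWindows windows = TimeWindows.ofSizeAndGrace(
                Duration.ofMinutes(5), Duration.ofMinutes(1));

        // Variant 1: suppress() -- the aggregation emits eagerly and a
        // downstream in-memory buffer holds results until the window closes.
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .windowedBy(windows)
                .count()
                .suppress(Suppressed.untilWindowCloses(
                        Suppressed.BufferConfig.unbounded()))
                .toStream((windowedKey, count) -> windowedKey.key())
                .to("output-suppressed",
                        Produced.with(Serdes.String(), Serdes.Long()));

        // Variant 2: emitStrategy() -- the aggregation operator itself
        // (RocksDB backed) holds results until the window closes.
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                .windowedBy(windows)
                .emitStrategy(EmitStrategy.onWindowClose())
                .count()
                .toStream((windowedKey, count) -> windowedKey.key())
                .to("output-emit-final",
                        Produced.with(Serdes.String(), Serdes.Long()));
        // Building and starting the KafkaStreams instance is omitted.
    }
}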



-Matthias


On 4/12/24 10:37 AM, Ayoub wrote:

Hello,

*[Not sure if my email went through as I was not subscribed to this mailing
list. Here is my original email]*

I found that there are two ways to send only the final result of a windowed
groupBy: either using Suppressed.untilWindowCloses on the final KTable, or
EmitStrategy on the windowed stream.

I tried to compare both but didn't find differences in the result they give.

Are there any differences apart from the moment they are defined within the
pipeline? And is there any preference for using one or the other?

Thanks,
Ayoub







Re: Is there any recommendation about header max size?

2024-04-18 Thread Matthias J. Sax
I don't think that there is any specific recommendation. However, there is
an overall max message size config (message.max.bytes) that you need to
keep in mind, and headers count toward the record size.
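
A minimal sketch of that pattern, forwarding the original bytes untouched
and attaching the stack trace as a header (the topic and header names are
made up):

import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterForwarder {
    static void forward(Producer<byte[], byte[]> producer,
                        ConsumerRecord<byte[], byte[]> original,
                        Exception error) {
        StringWriter trace = new StringWriter();
        error.printStackTrace(new PrintWriter(trace));
        // Same key/value bytes, different topic.
        ProducerRecord<byte[], byte[]> out = new ProducerRecord<>(
                "error-topic", original.key(), original.value());
        // Headers are raw bytes and count toward the broker's
        // message.max.bytes limit, so consider truncating huge traces.
        out.headers().add("x-exception",
                trace.toString().getBytes(StandardCharsets.UTF_8));
        producer.send(out);
    }
}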


-Matthias

On 4/16/24 9:42 AM, Gabriel Giussi wrote:

I have logic in my service to capture exceptions being thrown during
message processing and produce a new message to a different topic with
information about the error. The idea is to leave the message unmodified,
i.e., produce the exact same bytes to this new topic; therefore I'm
planning on adding the Java exception as a header.
Looking at the documentation, a header value is just an array of bytes and
nothing is said about a max size, but is there any recommendation about it?
https://kafka.apache.org/documentation/#recordheader



Re: [ANNOUNCE] New Kafka PMC Member: Greg Harris

2024-04-18 Thread Matthias J. Sax

Congrats Greg!

On 4/15/24 10:44 AM, Hector Geraldino (BLOOMBERG/ 919 3RD A) wrote:

Congrats! Well deserved

From: d...@kafka.apache.org At: 04/13/24 14:42:22 UTC-4:00 To:
d...@kafka.apache.org
Subject: [ANNOUNCE] New Kafka PMC Member: Greg Harris

Hi all,

Greg Harris has been a Kafka committer since July 2023. He has remained
very active and instructive in the community since becoming a committer.
It's my pleasure to announce that Greg is now a member of Kafka PMC.

Congratulations, Greg!

Chris, on behalf of the Apache Kafka PMC