Re: Few partitions stuck in under replication

2022-03-03 Thread Dhirendra Singh
Hi Tom,
During the rolling restart we check in the readiness probe that the under-replicated
partition count is zero before restarting the next pod in order.
This issue never occurred before. It started after we upgraded the Kafka version from
2.5.0 to 2.7.1, so I suspect a bug was introduced in a version after 2.5.0.
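For reference, the probe relies on a check along these lines (a sketch only; the
bootstrap address is a placeholder and our actual probe script differs in detail):

$ ./bin/kafka-topics.sh --describe --under-replicated-partitions \
    --bootstrap-server localhost:9092
# The probe passes only when this prints no partitions, i.e. the URP count is zero.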

Thanks,
Dhirendra.

On Thu, Mar 3, 2022 at 11:09 PM Thomas Cooper  wrote:

> I suspect this nightly rolling will have something to do with your issues.
> If you are just rolling the stateful set in order, with no dependence on
> maintaining minISR and other Kafka considerations you are going to hit
> issues.
>
> If you are running on Kubernetes I would suggest using an Operator like
> Strimzi  which will do a lot of the Kafka admin
> tasks like this for you automatically.
>
> Tom
>
> On 03/03/2022 16:28, Dhirendra Singh wrote:
>
> Hi Tom,
> Doing the nightly restart is the decision of the cluster admin. I have no
> control on it.
> We have implementation using stateful set. restart is triggered by
> updating a annotation in the pod.
> Issue is not triggered by kafka cluster restart but the zookeeper servers
> restart.
>
> Thanks,
> Dhirendra.
>
> On Thu, Mar 3, 2022 at 7:19 PM Thomas Cooper  wrote:
>
>> Hi Dhirendra,
>>
>> Firstly, I am interested in why are you restarting the ZK and Kafka
>> cluster every night?
>>
>> Secondly, how are you doing the restarts. For example, in [Strimzi](
>> https://strimzi.io/), when we roll the Kafka cluster we leave the
>> designated controller broker until last. For each of the other brokers we
>> wait until all the partitions they are leaders for are above their minISR
>> and then we roll the broker. In this way we maintain availability and make
>> sure leadership can move off the rolling broker temporarily.
>>
>> Cheers,
>>
>> Tom Cooper
>>
>> [@tomncooper](https://twitter.com/tomncooper) | https://tomcooper.dev
>>
>> On 03/03/2022 07:38, Dhirendra Singh wrote:
>>
>> > Hi All,
>> >
>> > We have kafka cluster running in kubernetes. kafka version we are using
>> is
>> > 2.7.1.
>> > Every night zookeeper servers and kafka brokers are restarted.
>> > After the nightly restart of the zookeeper servers some partitions
>> remain
>> > stuck in under replication. This happens randomly but not at every
>> nightly
>> > restart.
>> > Partitions remain under replicated until kafka broker with the partition
>> > leader is restarted.
>> > For example partition 4 of consumer_offsets topic remain under
>> replicated
>> > and we see following error in the log...
>> >
>> > [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1]
>> > Controller failed to update ISR to PendingExpandIsr(isr=Set(1),
>> > newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying.
>> > (kafka.cluster.Partition)
>> > [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error
>> in
>> > request completion: (org.apache.kafka.clients.NetworkClient)
>> > java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request
>> with
>> > state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2),
>> > zkVersion=4719) for partition __consumer_offsets-4
>> > at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
>> > at
>> >
>> kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
>> > at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
>> > at
>> >
>> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
>> > at
>> >
>> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
>> > at
>> >
>> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
>> > at
>> >
>> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
>> > at scala.collection.immutable.List.foreach(List.scala:333)
>> > at
>> >
>> kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
>> > at
>> >
>> kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
>> > at
>> >
>> kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
>> > at
>> >
>> kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
>> > at
>> >
>> kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
>> > at
>> >
>> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>> > at
>> >
>> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
>> > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
>> > at
>> kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
>> > at
>> >
>> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
>> > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
>> 

[VOTE] 3.0.1 RC0

2022-03-03 Thread Mickael Maison
Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 3.0.1.

Apache Kafka 3.0.1 is a bugfix release and 29 issues have been fixed
since 3.0.0.

Release notes for the 3.0.1 release:
https://home.apache.org/~mimaison/kafka-3.0.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Thursday, March 10, 6pm GMT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~mimaison/kafka-3.0.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~mimaison/kafka-3.0.1-rc0/javadoc/

* Tag to be voted upon (off 3.0 branch) is the 3.0.1-rc0 tag:
https://github.com/apache/kafka/releases/tag/3.0.1-rc0

* Documentation:
https://kafka.apache.org/30/documentation.html

* Protocol:
https://kafka.apache.org/30/protocol.html

* Successful Jenkins builds for the 3.0 branch:
I'll share a link once the build completes.
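For anyone verifying the artifacts, a minimal sketch (the exact tarball name below is
an assumption; use whichever artifact you download from the RC directory):

$ wget https://kafka.apache.org/KEYS && gpg --import KEYS
$ gpg --verify kafka_2.13-3.0.1.tgz.asc kafka_2.13-3.0.1.tgz
$ gpg --print-md SHA512 kafka_2.13-3.0.1.tgz   # compare against kafka_2.13-3.0.1.tgz.sha512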


Thanks,
Mickael


Re: Few partitions stuck in under replication

2022-03-03 Thread Thomas Cooper
I suspect this nightly rolling has something to do with your issues. If 
you are just rolling the StatefulSet in order, with no regard for 
maintaining minISR and other Kafka considerations, you are going to hit issues.
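As a quick check before and after each roll, something like this reports any partitions 
currently below their configured min.insync.replicas (a sketch; the bootstrap address is 
a placeholder):

$ ./bin/kafka-topics.sh --describe --under-min-isr-partitions \
    --bootstrap-server localhost:9092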

If you are running on Kubernetes I would suggest using an Operator like 
[Strimzi](https://strimzi.io/) which will do a lot of the Kafka admin tasks 
like this for you automatically.

Tom

On 03/03/2022 16:28, Dhirendra Singh wrote:

> Hi Tom,
> Doing the nightly restart is the decision of the cluster admin. I have no 
> control on it.
> We have implementation using stateful set. restart is triggered by updating a 
> annotation in the pod.
> Issue is not triggered by kafka cluster restart but the zookeeper servers 
> restart.
> Thanks,
> Dhirendra.
>
> On Thu, Mar 3, 2022 at 7:19 PM Thomas Cooper  wrote:
>
>> Hi Dhirendra,
>>
>> Firstly, I am interested in why are you restarting the ZK and Kafka cluster 
>> every night?
>>
>> Secondly, how are you doing the restarts. For example, in 
>> [Strimzi](https://strimzi.io/), when we roll the Kafka cluster we leave the 
>> designated controller broker until last. For each of the other brokers we 
>> wait until all the partitions they are leaders for are above their minISR 
>> and then we roll the broker. In this way we maintain availability and make 
>> sure leadership can move off the rolling broker temporarily.
>>
>> Cheers,
>>
>> Tom Cooper
>>
>> [@tomncooper](https://twitter.com/tomncooper) | https://tomcooper.dev
>>
>> On 03/03/2022 07:38, Dhirendra Singh wrote:
>>
>>> Hi All,
>>>
>>> We have kafka cluster running in kubernetes. kafka version we are using is
>>> 2.7.1.
>>> Every night zookeeper servers and kafka brokers are restarted.
>>> After the nightly restart of the zookeeper servers some partitions remain
>>> stuck in under replication. This happens randomly but not at every nightly
>>> restart.
>>> Partitions remain under replicated until kafka broker with the partition
>>> leader is restarted.
>>> For example partition 4 of consumer_offsets topic remain under replicated
>>> and we see following error in the log...
>>>
>>> [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1]
>>> Controller failed to update ISR to PendingExpandIsr(isr=Set(1),
>>> newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying.
>>> (kafka.cluster.Partition)
>>> [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in
>>> request completion: (org.apache.kafka.clients.NetworkClient)
>>> java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with
>>> state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2),
>>> zkVersion=4719) for partition __consumer_offsets-4
>>> at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
>>> at
>>> kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
>>> at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
>>> at
>>> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
>>> at
>>> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
>>> at
>>> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
>>> at
>>> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
>>> at scala.collection.immutable.List.foreach(List.scala:333)
>>> at
>>> kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
>>> at
>>> kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
>>> at
>>> kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
>>> at
>>> kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
>>> at
>>> kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
>>> at
>>> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
>>> at
>>> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
>>> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
>>> at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
>>> at
>>> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
>>> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
>>> Looks like some kind of race condition bug...anyone has any idea ?
>>>
>>> Thanks,
>>> Dhirendra

--

Tom Cooper

[@tomncooper](https://twitter.com/tomncooper) | tomcooper.dev

Re: Few partitions stuck in under replication

2022-03-03 Thread Dhirendra Singh
Hi Tom,
Doing the nightly restart is the decision of the cluster admin; I have no
control over it.
Our implementation uses a StatefulSet, and the restart is triggered by updating
an annotation on the pod.
The issue is not triggered by the Kafka cluster restart but by the ZooKeeper server
restarts.
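Roughly, the trigger looks like the following (a sketch only; the annotation key,
namespace and StatefulSet name are placeholders for what our tooling actually uses):

# Bumping a pod-template annotation makes the StatefulSet controller roll the pods in order.
$ kubectl patch statefulset kafka -n kafka --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"restart-trigger":"2022-03-03T00:00:00Z"}}}}}'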

Thanks,
Dhirendra.

On Thu, Mar 3, 2022 at 7:19 PM Thomas Cooper  wrote:

> Hi Dhirendra,
>
> Firstly, I am interested in why are you restarting the ZK and Kafka
> cluster every night?
>
> Secondly, how are you doing the restarts. For example, in [Strimzi](
> https://strimzi.io/), when we roll the Kafka cluster we leave the
> designated controller broker until last. For each of the other brokers we
> wait until all the partitions they are leaders for are above their minISR
> and then we roll the broker. In this way we maintain availability and make
> sure leadership can move off the rolling broker temporarily.
>
> Cheers,
>
> Tom Cooper
>
> [@tomncooper](https://twitter.com/tomncooper) | https://tomcooper.dev
>
> On 03/03/2022 07:38, Dhirendra Singh wrote:
>
> > Hi All,
> >
> > We have kafka cluster running in kubernetes. kafka version we are using
> is
> > 2.7.1.
> > Every night zookeeper servers and kafka brokers are restarted.
> > After the nightly restart of the zookeeper servers some partitions remain
> > stuck in under replication. This happens randomly but not at every
> nightly
> > restart.
> > Partitions remain under replicated until kafka broker with the partition
> > leader is restarted.
> > For example partition 4 of consumer_offsets topic remain under replicated
> > and we see following error in the log...
> >
> > [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1]
> > Controller failed to update ISR to PendingExpandIsr(isr=Set(1),
> > newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying.
> > (kafka.cluster.Partition)
> > [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error
> in
> > request completion: (org.apache.kafka.clients.NetworkClient)
> > java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request
> with
> > state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2),
> > zkVersion=4719) for partition __consumer_offsets-4
> > at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
> > at
> >
> kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
> > at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
> > at
> >
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
> > at
> >
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
> > at
> >
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
> > at
> >
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
> > at scala.collection.immutable.List.foreach(List.scala:333)
> > at
> >
> kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
> > at
> >
> kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
> > at
> >
> kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
> > at
> >
> kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
> > at
> >
> kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
> > at
> >
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> > at
> >
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
> > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
> > at
> kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
> > at
> >
> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
> > at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> > Looks like some kind of race condition bug...anyone has any idea ?
> >
> > Thanks,
> > Dhirendra


RE: Is MirrorMaker 2 horizontally scalable?

2022-03-03 Thread Dmitri Pavlov
Thank you very much Chris!

-Original Message-
From: Chris Egerton 
Sent: Thursday, March 3, 2022 5:30 PM
To: users@kafka.apache.org
Subject: Re: Is MirrorMaker 2 horizontally scalable?

Hi Dmitri,

There's at least one issue with MirrorMaker 2 that impacts horizontal 
scalability and has not yet been addressed:
https://issues.apache.org/jira/browse/KAFKA-9981. There is some work in progress to fix it (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters),
but the effort hasn't received much attention to date.

There may be other issues as well, but until KAFKA-9981 is resolved, running 
MirrorMaker 2 in a multi-node cluster will be at best difficult and at worst, 
impossible.

Cheers,

Chris

On Thu, Mar 3, 2022 at 10:00 AM Dmitri Pavlov  wrote:

> Hi,
>
> A quick question, maybe you can help?
>
> Trying to follow this article
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> -> "Walkthrough: Running MirrorMaker 2.0", and the last lines in the
> paragraph are
>
> ==
> Second, launch one or more MirrorMaker cluster nodes:
> $ ./bin/connect-mirror-maker.sh mm2.properties
> ==
>
> But apparently it does not work this way, one of 2 simultaneously
> started instance will remain idle, confirmed with Jconsole -> Mbeans.
> Setup: Broker A (in cluster A) -> MM2 2 instances -> Broker B (cluster
> B), for simplicity there is only one broker per cluster.
> A simple experiment is, when one instance is configured and started to
> replicate topic A only and another topic B only, only one topic will
> be replicated, when 2 instances are running in parallel. While, if
> only one of the instances is running at a time, each topic will be replicated 
> correctly.
>
> The main question -> is Mirrormaker 2 horizontally scalable? And if
> yes, would be possible to share a link to a document that describes
> the setup process?
>
> Thanks in advance,
> Dmitri.
>
>
> This e-mail may contain information that is privileged or
> confidential. If you are not the intended recipient, please delete the
> e-mail and any attachments and notify us immediately.
>
>





Re: Is MirrorMaker 2 horizontally scalable?

2022-03-03 Thread Chris Egerton
Hi Dmitri,

There's at least one issue with MirrorMaker 2 that impacts horizontal
scalability and has not yet been addressed:
https://issues.apache.org/jira/browse/KAFKA-9981. There is some work in
progress to fix it (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters),
but the effort hasn't received much attention to date.

There may be other issues as well, but until KAFKA-9981 is resolved,
running MirrorMaker 2 in a multi-node cluster will be at best difficult and
at worst, impossible.
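For reference, the single-node launch from the KIP-382 walkthrough referenced in this
thread looks roughly like this (cluster aliases, addresses and the topic filter are
placeholders):

$ cat > mm2.properties <<'EOF'
clusters = A, B
A.bootstrap.servers = broker-a:9092
B.bootstrap.servers = broker-b:9092
A->B.enabled = true
A->B.topics = .*
EOF
$ ./bin/connect-mirror-maker.sh mm2.properties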

Cheers,

Chris

On Thu, Mar 3, 2022 at 10:00 AM Dmitri Pavlov  wrote:

> Hi,
>
> A quick question, maybe you can help?
>
> Trying to follow this article
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> -> "Walkthrough: Running MirrorMaker 2.0", and the last lines in the
> paragraph are
>
> ==
> Second, launch one or more MirrorMaker cluster nodes:
> $ ./bin/connect-mirror-maker.sh mm2.properties
> ==
>
> But apparently it does not work this way, one of 2 simultaneously started
> instance will remain idle, confirmed with Jconsole -> Mbeans.
> Setup: Broker A (in cluster A) -> MM2 2 instances -> Broker B (cluster B),
> for simplicity there is only one broker per cluster.
> A simple experiment is, when one instance is configured and started to
> replicate topic A only and another topic B only, only one topic will be
> replicated, when 2 instances are running in parallel. While, if only one of
> the instances is running at a time, each topic will be replicated correctly.
>
> The main question -> is Mirrormaker 2 horizontally scalable? And if yes,
> would be possible to share a link to a document that describes the setup
> process?
>
> Thanks in advance,
> Dmitri.
>
>
> This e-mail may contain information that is privileged or confidential. If
> you are not the intended recipient, please delete the e-mail and any
> attachments and notify us immediately.
>
>


Is MirrorMaker 2 horizontally scalable?

2022-03-03 Thread Dmitri Pavlov
Hi,

A quick question, maybe you can help?

Trying to follow this article 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0 -> 
"Walkthrough: Running MirrorMaker 2.0", and the last lines in the paragraph are

==
Second, launch one or more MirrorMaker cluster nodes:
$ ./bin/connect-mirror-maker.sh mm2.properties
==

But apparently it does not work this way: one of two simultaneously started 
instances remains idle, confirmed with JConsole -> MBeans.
Setup: Broker A (in cluster A) -> 2 MM2 instances -> Broker B (cluster B); for 
simplicity there is only one broker per cluster.
A simple experiment: when one instance is configured to replicate only topic A and 
another only topic B, only one of the topics is replicated while the two instances 
run in parallel, whereas if only one instance runs at a time each topic is 
replicated correctly.

The main question -> is MirrorMaker 2 horizontally scalable? And if yes, would it 
be possible to share a link to a document that describes the setup process?

Thanks in advance,
Dmitri.





Re: Few partitions stuck in under replication

2022-03-03 Thread Thomas Cooper
Hi Dhirendra,

Firstly, I am interested in why you are restarting the ZK and Kafka cluster 
every night?

Secondly, how are you doing the restarts? For example, in 
[Strimzi](https://strimzi.io/), when we roll the Kafka cluster we leave the 
designated controller broker until last. For each of the other brokers we wait 
until all the partitions they are leaders for are above their minISR and then 
we roll the broker. In this way we maintain availability and make sure 
leadership can move off the rolling broker temporarily.
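For example, finding which broker is currently the controller, so it can be rolled last, 
can be done against ZooKeeper roughly like this (a sketch; the ZooKeeper address is a 
placeholder):

$ ./bin/zookeeper-shell.sh zookeeper:2181 get /controller
# e.g. {"version":1,"brokerid":2,"timestamp":"..."} -> roll broker 2 last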

Cheers,

Tom Cooper

[@tomncooper](https://twitter.com/tomncooper) | https://tomcooper.dev

On 03/03/2022 07:38, Dhirendra Singh wrote:

> Hi All,
>
> We have kafka cluster running in kubernetes. kafka version we are using is
> 2.7.1.
> Every night zookeeper servers and kafka brokers are restarted.
> After the nightly restart of the zookeeper servers some partitions remain
> stuck in under replication. This happens randomly but not at every nightly
> restart.
> Partitions remain under replicated until kafka broker with the partition
> leader is restarted.
> For example partition 4 of consumer_offsets topic remain under replicated
> and we see following error in the log...
>
> [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1]
> Controller failed to update ISR to PendingExpandIsr(isr=Set(1),
> newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying.
> (kafka.cluster.Partition)
> [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in
> request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with
> state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2),
> zkVersion=4719) for partition __consumer_offsets-4
> at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
> at
> kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
> at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
> at
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
> at
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
> at
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
> at
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at
> kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
> at
> kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
> at
> kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
> at
> kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
> at
> kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
> at
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
> at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
> at
> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> Looks like some kind of race condition bug...anyone has any idea ?
>
> Thanks,
> Dhirendra

Re: Few partitions stuck in under replication

2022-03-03 Thread Fares Oueslati
I don't know about the root cause, but if you're trying to solve the issue,
restarting the controller broker pod should do the trick.
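Roughly (a sketch; the namespace and pod name are placeholders):

# Identify the controller first, e.g. via the /controller znode in ZooKeeper or the
# kafka.controller:type=KafkaController,name=ActiveControllerCount JMX metric, then
# delete its pod so the StatefulSet recreates it:
$ kubectl delete pod kafka-1 -n kafka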

Fares

On Thu, Mar 3, 2022 at 8:38 AM Dhirendra Singh 
wrote:

> Hi All,
>
> We have kafka cluster running in kubernetes. kafka version we are using is
> 2.7.1.
> Every night zookeeper servers and kafka brokers are restarted.
> After the nightly restart of the zookeeper servers some partitions remain
> stuck in under replication. This happens randomly but not at every nightly
> restart.
> Partitions remain under replicated until kafka broker with the partition
> leader is restarted.
> For example partition 4 of consumer_offsets topic remain under replicated
> and we see following error in the log...
>
> [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1]
> Controller failed to update ISR to PendingExpandIsr(isr=Set(1),
> newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying.
> (kafka.cluster.Partition)
> [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in
> request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with
> state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2),
> zkVersion=4719) for partition __consumer_offsets-4
> at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
> at
>
> kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
> at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
> at
>
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
> at
>
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
> at
>
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
> at
>
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at
>
> kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
> at
>
> kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
> at
>
> kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
> at
>
> kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
> at
>
> kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
> at
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at
>
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
> at
> kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
> at
>
> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
> Looks like some kind of race condition bug...anyone has any idea ?
>
> Thanks,
> Dhirendra
>