[jira] [Updated] (KAFKA-16296) Broker shrinks ISR when restarting

2024-02-23 Thread Colin Leroy (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Leroy updated KAFKA-16296:

Description: 
We have a rolling-restart problem we don't understand on a 3-node cluster.

When stopping a broker, everything goes fine and the partitions are reassigned 
to the other brokers.

When that broker restarts, it shrinks ISR because of "Out of sync replicas", a 
few minutes after having restarted (here, the restart was at 10:11):
{code:java}
[2024-02-22 10:18:02,069] INFO [Partition OSS.PREPROD.Monitoring.Metric-5 broker=3] Shrinking ISR from 2,1,3 to 3. Leader: (highWatermark: 704389542, endOffset: 704395843). Out of sync replicas: (brokerId: 2, endOffset: -1, lastCaughtUpTimeMs: 1708593437335) (brokerId: 1, endOffset: -1, lastCaughtUpTimeMs: 1708593437335). (kafka.cluster.Partition)

[2024-02-22 10:18:02,124] INFO [Partition OSS.PREPROD.Monitoring.Metric-5 broker=3] ISR updated to 3 (under-min-isr) and version updated to 1075 (kafka.cluster.Partition)
{code}
I do not understand why brokers 1 and 2 would be out of sync: they were not 
restarted, so it seems to me they should be in sync.
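
For reference, a simplified sketch of the leader-side rule that produces the "Shrinking ISR" line, assuming the standard semantics of the broker setting replica.lag.time.max.ms (30000 ms by default in Kafka 3.x). The class and method names are hypothetical and this is not the broker's actual code. It also assumes, since the log does not say, that the log timestamps are local time at UTC+1.

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class IsrLagCheck {
    // Default value of the broker setting replica.lag.time.max.ms in Kafka 3.x.
    static final long REPLICA_LAG_TIME_MAX_MS = 30_000L;

    // Simplified model of the leader-side check: a follower counts as caught up
    // if its log end offset equals the leader's, or if it was last caught up
    // no more than maxLagMs before now. Otherwise it is dropped from the ISR.
    static boolean isCaughtUp(long followerEndOffset, long leaderEndOffset,
                              long lastCaughtUpTimeMs, long nowMs, long maxLagMs) {
        return followerEndOffset == leaderEndOffset
                || nowMs - lastCaughtUpTimeMs <= maxLagMs;
    }

    public static void main(String[] args) {
        // Values copied from the "Shrinking ISR" log line.
        long leaderEndOffset = 704395843L;
        long followerEndOffset = -1L;
        long lastCaughtUpTimeMs = 1708593437335L;

        // Assumption: broker log timestamps are local time at UTC+1.
        long shrinkTimeMs = LocalDateTime.of(2024, 2, 22, 10, 18, 2, 69_000_000)
                .toInstant(ZoneOffset.ofHours(1))
                .toEpochMilli();

        System.out.println("last caught up: " + Instant.ofEpochMilli(lastCaughtUpTimeMs));
        System.out.println("lag at shrink:  " + Duration.ofMillis(shrinkTimeMs - lastCaughtUpTimeMs));
        System.out.println("caught up:      " + isCaughtUp(followerEndOffset, leaderEndOffset,
                lastCaughtUpTimeMs, shrinkTimeMs, REPLICA_LAG_TIME_MAX_MS));
    }
}
{code}

Under that assumption, lastCaughtUpTimeMs decodes to 2024-02-22T09:17:17.335Z and the lag at the time of the shrink is about 44.7 s; if the logs are in UTC instead, the lag is just over an hour. Either way it exceeds 30 s, so this rule alone would drop both followers. It does not explain why they show endOffset: -1 on the restarted broker.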

This, of course, causes problems, as producers reconnect to broker 3 only to 
find that the min ISR requirement is not fulfilled.
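
The "(under-min-isr)" marker in the second log line indicates that the new ISR is smaller than min.insync.replicas. A minimal sketch of the effect on producers, assuming acks=all and a hypothetical min.insync.replicas=2 (the actual value is not stated here); the names are illustrative, not Kafka's API:

{code:java}
import java.util.List;

public class MinIsrCheck {
    // Simplified model: for a produce request with acks=all, the leader only
    // appends the batch when the ISR has at least min.insync.replicas members;
    // otherwise it answers with a NOT_ENOUGH_REPLICAS error.
    static boolean acceptsAcksAll(List<Integer> isr, int minInsyncReplicas) {
        return isr.size() >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        int minInsyncReplicas = 2;                   // hypothetical topic/broker setting
        List<Integer> isrBefore = List.of(2, 1, 3);  // ISR before the shrink
        List<Integer> isrAfter = List.of(3);         // ISR after the shrink
        System.out.println("acks=all accepted before shrink: "
                + acceptsAcksAll(isrBefore, minInsyncReplicas));
        System.out.println("acks=all accepted after shrink:  "
                + acceptsAcksAll(isrAfter, minInsyncReplicas));
    }
}
{code}

Producers using acks=0 or acks=1 are not subject to the min.insync.replicas check; with acks=all, the leader rejects such requests with the retriable NOT_ENOUGH_REPLICAS error until the ISR grows back.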

I have attached the logs for one of the affected partitions, both from broker 3 
(the restarted one) and broker 2 (not restarted).

Thanks in advance,

Colin

> Broker shrinks ISR when restarting
> --
>
> Key: KAFKA-16296
> URL: https://issues.apache.org/jira/browse/KAFKA-16296
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.6.1
>Reporter: Colin Leroy
>Priority: Major
> Attachments: broker2.log, broker3.log



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16296) Broker shrinks ISR when restarting

2024-02-22 Thread Colin Leroy (Jira)



Colin Leroy updated KAFKA-16296:

Description: 
We have a rolling-restart problem we don't understand on a 3-node cluster.

When stopping a broker, everything goes fine and the partitions are reassigned 
to the other brokers.

When that broker restarts, it shrinks ISR because of "Out of sync replicas":
{code:java}
[2024-02-22 10:18:02,069] INFO [Partition OSS.PREPROD.Monitoring.Metric-5 broker=3] Shrinking ISR from 2,1,3 to 3. Leader: (highWatermark: 704389542, endOffset: 704395843). Out of sync replicas: (brokerId: 2, endOffset: -1, lastCaughtUpTimeMs: 1708593437335) (brokerId: 1, endOffset: -1, lastCaughtUpTimeMs: 1708593437335). (kafka.cluster.Partition)

[2024-02-22 10:18:02,124] INFO [Partition OSS.PREPROD.Monitoring.Metric-5 broker=3] ISR updated to 3 (under-min-isr) and version updated to 1075 (kafka.cluster.Partition)
{code}
I do not understand why brokers 1 and 2 would be out of sync: they were not 
restarted, so it seems to me they should be in sync.

This, of course, causes problems, as producers reconnect to broker 3 only to 
find that the min ISR requirement is not fulfilled.

I have attached the logs for one of the affected partitions, both from broker 3 
(the restarted one) and broker 2 (not restarted).

Thanks in advance,

Colin



[jira] [Updated] (KAFKA-16296) Broker shrinks ISR when restarting

2024-02-22 Thread Colin Leroy (Jira)



Colin Leroy updated KAFKA-16296:

Attachment: broker2.log
broker3.log
