[jira] [Commented] (KAFKA-15467) Kafka broker returns offset out of range for topic/partitions on restart from unclean shutdown

2024-02-19 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818572#comment-17818572
 ] 

Steve Jacobs commented on KAFKA-15467:
--

This isn't a Kafka or mm2 bug per se; it's all down to timing. The unclean 
shutdown meant nothing had been flushed to disk on the single-broker setup, so 
when the power came back, MM2 had consumed 'past the end of the offsets on 
disk'. And since we were flushing to disk every minute, it only took one minute 
for the offsets to 'catch up', making it appear that the offsets were on the 
broker when in fact they were not. 
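For anyone hitting the same condition, a minimal client-side check of it might 
look like the sketch below. This is not MM2's own code: the broker address, 
topic, partition, and offset are taken from the logs earlier in this issue, and 
the consumer is only used to compare the mirror task's last position with what 
the broker reports after it comes back.

{code:java}
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class FetchPositionCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka.scbi.eng.neoninternal.org:9094"); // broker from the logs above
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("reading.sensor.hfp01sc", 0);
        long lastMirroredOffset = 4918131L; // position MM2 had reached before the power loss

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));
            long logEndOffset = consumer.endOffsets(List.of(tp)).get(tp);
            long earliest = consumer.beginningOffsets(List.of(tp)).get(tp);
            // Right after an unclean shutdown the log end offset can be *behind*
            // the mirror consumer's position: the unflushed tail of the log is gone.
            if (lastMirroredOffset > logEndOffset) {
                System.out.printf("Position %d is past the log end offset %d "
                        + "(earliest available: %d); the next fetch is out of range "
                        + "and resets to %d, re-replicating the topic.%n",
                        lastMirroredOffset, logEndOffset, earliest, earliest);
            }
        }
    }
}
{code}

If run in the window before the broker's log 'catches up', it shows the 
position/log-end gap directly; a minute later the gap is gone, which is why the 
offsets look fine after the fact.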

[https://github.com/apache/kafka/pull/14567]

This should let me work around the issue when it merges, hopefully in Kafka 3.7.0.

 

> Kafka broker returns offset out of range for topic/partitions on restart from 
> unclean shutdown
> --
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 3.5.1
> Environment: Apache Kafka 3.5.1 with Strimzi on kubernetes.
>Reporter: Steve Jacobs
>Priority: Major
>
> So this started with me thinking this was a mirrormaker2 issue because here 
> are the symptoms I am seeing:
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: 

[jira] [Commented] (KAFKA-15467) Kafka broker returns offset out of range for topic/partitions on restart from unclean shutdown

2024-02-07 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815385#comment-17815385
 ] 

Steve Jacobs commented on KAFKA-15467:
--

The way to reproduce this is an unclean shutdown of the broker. Every time I 
kill or power off a node I can reproduce this problem. 
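
One way to see what the power-off actually does to the log, sketched with the 
stock AdminClient (the broker address and topic are taken from the logs in the 
description; this is only an illustrative probe, not part of any fix): record 
the partition's log end offset before pulling the power, then read it again 
once the broker restarts. On the affected setup the second value comes back 
lower than the offset MM2 had already mirrored.

{code:java}
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class LogEndOffsetProbe {
    // Current log end offset of one partition; call it before and after the
    // broker is power-cycled to show whether the unflushed tail was lost.
    static long logEndOffset(String bootstrap, TopicPartition tp)
            throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        try (Admin admin = Admin.create(props)) {
            ListOffsetsResult result = admin.listOffsets(Map.of(tp, OffsetSpec.latest()));
            return result.partitionResult(tp).get().offset();
        }
    }

    public static void main(String[] args) throws Exception {
        TopicPartition tp = new TopicPartition("reading.sensor.hfp01sc", 0); // partition from the logs
        String bootstrap = "kafka.scbi.eng.neoninternal.org:9094";           // single-node broker
        System.out.println("log end offset = " + logEndOffset(bootstrap, tp));
        // Run once before the hard power-off and once after the broker restarts;
        // a smaller value afterwards means the broker lost messages it had
        // acknowledged but not yet flushed, which is what strands the MM2 position.
    }
}
{code}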

Personally: it is extremely frustrating that no one has looked at or responded 
to this issue. I've reached out on the mailing lists, asked on Slack (both 
Confluent and Apache), and I have not received a single response. Not even an 
"oh, that looks interesting". I feel like a ghost, and it is disheartening to 
say the least.

> Kafka broker returns offset out of range for topic/partitions on restart from 
> unclean shutdown
> --
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 3.5.1
> Environment: Apache Kafka 3.5.1 with Strimzi on kubernetes.
>Reporter: Steve Jacobs
>Priority: Major
>
> So this started with me thinking this was a mirrormaker2 issue because here 
> are the symptoms I am seeing:
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> 

[jira] [Updated] (KAFKA-15467) Kafka broker returns offset out of range for topic/partitions on restart from unclean shutdown

2024-01-03 Thread Steve Jacobs (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Jacobs updated KAFKA-15467:
-
Component/s: log

> Kafka broker returns offset out of range for topic/partitions on restart from 
> unclean shutdown
> --
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: core, log
>Affects Versions: 3.5.1
> Environment: Apache Kafka 3.5.1 with Strimzi on kubernetes.
>Reporter: Steve Jacobs
>Priority: Major
>
> So this started with me thinking this was a mirrormaker2 issue because here 
> are the symptoms I am seeing:
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It 

[jira] [Updated] (KAFKA-15467) Kafka broker returns offset out of range for topic/partitions on restart from unclean shutdown

2024-01-03 Thread Steve Jacobs (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Jacobs updated KAFKA-15467:
-
Component/s: core

> Kafka broker returns offset out of range for topic/partitions on restart from 
> unclean shutdown
> --
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 3.5.1
> Environment: Apache Kafka 3.5.1 with Strimzi on kubernetes.
>Reporter: Steve Jacobs
>Priority: Major
>
> So this started with me thinking this was a mirrormaker2 issue because here 
> are the symptoms I am seeing:
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It almost 

[jira] [Closed] (KAFKA-10133) Cannot compress messages in destination cluster with MM2

2024-01-03 Thread Steve Jacobs (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Jacobs closed KAFKA-10133.


> Cannot compress messages in destination cluster with MM2
> 
>
> Key: KAFKA-10133
> URL: https://issues.apache.org/jira/browse/KAFKA-10133
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
> Environment: kafka 2.5.0 deployed via the strimzi operator 0.18
>Reporter: Steve Jacobs
>Assignee: Ning Zhang
>Priority: Minor
> Fix For: 2.7.0
>
>
> When configuring mirrormaker2 using kafka connect, it is not possible to 
> configure things such that messages are compressed in the destination 
> cluster. Dump Log shows that batching is occurring, but no compression. If 
> this is possible, then this is a documentation bug, because I can find no 
> documentation on how to do this.
> baseOffset: 4208 lastOffset: 4492 count: 285 baseSequence: -1 lastSequence: 
> -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 2 isTransactional: 
> false isControl: false position: 239371 CreateTime: 1591745894859 size: 16362 
> magic: 2 compresscodec: NONE crc: 1811507259 isvalid: true
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15467) Kafka broker returns offset out of range for topic/partitions on restart from unclean shutdown

2023-09-28 Thread Steve Jacobs (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Jacobs updated KAFKA-15467:
-
Component/s: (was: mirrormaker)
Description: 
So this started with me thinking this was a mirrormaker2 issue, because these 
are the symptoms I am seeing:

I'm encountering an odd issue with mirrormaker2 with our remote replication 
setup to high latency remote sites (satellite).

Every few days we get several topics completely re-replicated; this appears to 
happen after a network connectivity outage. It doesn't matter if it's a long 
outage (hours) or a short one (minutes). And it only seems to affect a few 
topics.

I was finally able to track down some logs showing the issue. This was after an 
hour-ish long outage where connectivity went down. There were lots of logs 
about connection timeouts, etc. Here is the relevant part when the connection 
came back up:
{code:java}
2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
[AdminClient 
clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
 Disconnecting from node 0 due to socket connection setup timeout. The timeout 
value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
[kafka-admin-client-thread | 
mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
[AdminClient 
clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
 Metadata update failed 
(org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
[kafka-admin-client-thread | 
mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Disconnecting from node 0 due to socket connection setup 
timeout. The timeout value is 52624 ms. 
(org.apache.kafka.clients.NetworkClient) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Error sending fetch request (sessionId=460667411, epoch=INITIAL) 
to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
[Scheduler for MirrorSourceConnector: 
scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Fetch position FetchPosition{offset=4918131, 
offsetEpoch=Optional[0], 
currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
 (id: 0 rack: null)], epoch=0}} is out of range for partition 
reading.sensor.hfp01sc-0, resetting offset 
(org.apache.kafka.clients.consumer.internals.AbstractFetch) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
(Repeats for 11 more topics)
2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
 (id: 0 rack: null)], epoch=0}}. 
(org.apache.kafka.clients.consumer.internals.SubscriptionState) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
(Repeats for 11 more topics) {code}
The consumer reports that offset 4918131 is out of range for this 
topic/partition, but that offset still exists on the remote cluster. I can go 
pull it up with a consumer right now. The earliest offset in that topic that 
still exists is 3444977 as of yesterday. We have 30-day retention configured, so 
pulling in 30 days of duplicate data is very not good. It almost seems like a 
race condition, as there are 38 topics we replicate but this only affected 12 
(on this occurrence).
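
A minimal sketch of that retention check with a plain consumer, assuming only 
the broker address, topic, and offsets already quoted in the log above:

{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetStillExists {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka.scbi.eng.neoninternal.org:9094"); // broker from the log above
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("reading.sensor.hfp01sc", 0);
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));
            // Earliest offset retention has left on this cluster (3444977 at the time).
            System.out.println("earliest = " + consumer.beginningOffsets(List.of(tp)).get(tp));
            // Seek to the offset the mirror consumer reported as out of range.
            consumer.seek(tp, 4918131L);
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(10));
            // A non-empty poll shows the offset is still within retention here,
            // i.e. the reset was not caused by the data having aged out.
            System.out.println("records fetched at 4918131: " + records.count());
        }
    }
}
{code}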

The number of topics affected seems to vary each time. Today I see one site has 
2 topics it is resending, and another has 13.

But since opening this issue originally, what I have discovered is that this 
error occurs every time I have a power failure at a remote site I mirror to. I 
can reproduce this issue by doing a hard reset on my single-node broker setup. 
The broker comes back up cleanly after processing unflushed messages to 
segments, but while it's coming up I consistently have an issue where I get 
these partition out of range

[jira] [Commented] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-28 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770079#comment-17770079
 ] 

Steve Jacobs commented on KAFKA-15467:
--

It's not mirrormaker2, it's the broker. I'm able to reproduce the problem now. 
Any help on what I could use for debug logs to trace this would be helpful.
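
One low-friction way to get broker-side debug logging without a restart, 
sketched with the stock AdminClient: brokers expose their log4j loggers as a 
BROKER_LOGGER config resource, so levels can be raised to DEBUG at runtime. The 
broker id 0 and address come from the logs in the description; the 
kafka.log.UnifiedLog logger name is an assumption for 3.5.x and should be 
checked against the describeConfigs output first.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class BrokerLoggerDebug {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka.scbi.eng.neoninternal.org:9094"); // broker from the logs

        // Loggers of broker id 0 (the single-node setup described above).
        ConfigResource brokerLoggers = new ConfigResource(ConfigResource.Type.BROKER_LOGGER, "0");

        try (Admin admin = Admin.create(props)) {
            // List the logger names the broker actually registers, then pick one.
            Config current = admin.describeConfigs(List.of(brokerLoggers))
                    .all().get().get(brokerLoggers);
            current.entries().forEach(e -> System.out.println(e.name() + " = " + e.value()));

            // Assumed logger name for the 3.5.x log layer; substitute one from the list above.
            AlterConfigOp toDebug = new AlterConfigOp(
                    new ConfigEntry("kafka.log.UnifiedLog", "DEBUG"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(brokerLoggers, List.of(toDebug))).all().get();
        }
    }
}
{code}

The change is not persisted in the log4j file, so it reverts on the next broker 
restart.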

> On reconnect mm2 cannot find offset that exists in remote cluster and 
> re-syncs the entire topic
> ---
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Steve Jacobs
>Priority: Major
>
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It almost 

[jira] [Commented] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-27 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769707#comment-17769707
 ] 

Steve Jacobs commented on KAFKA-15467:
--

Rolling back to MirrorMaker2 3.4.1 to see if that fixes the issue. No one else 
has any ideas? This seems like a pretty major bug if it's a bug...

> On reconnect mm2 cannot find offset that exists in remote cluster and 
> re-syncs the entire topic
> ---
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Steve Jacobs
>Priority: Major
>
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It almost seems 

[jira] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-20 Thread Steve Jacobs (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-15467 ]


Steve Jacobs deleted comment on KAFKA-15467:
--

was (Author: steveatbat):
This appears to coincide with our upgrade from MM2 version 3.4.1 to version 
3.5.1. I've reverted to 3.4.1 to see if the problem still occurs.

> On reconnect mm2 cannot find offset that exists in remote cluster and 
> re-syncs the entire topic
> ---
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Steve Jacobs
>Priority: Major
>
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It almost seems 
> like a race condition as there are 38 topics we replicate 

[jira] [Commented] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-20 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767115#comment-17767115
 ] 

Steve Jacobs commented on KAFKA-15467:
--

This appears to coincide with our upgrade from MM2 version 3.4.1 to version 
3.5.1. I've reverted to 3.4.1 to see if the problem still occurs.

> On reconnect mm2 cannot find offset that exists in remote cluster and 
> re-syncs the entire topic
> ---
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Steve Jacobs
>Priority: Major
>
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It almost seems 
> 

[jira] [Updated] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-14 Thread Steve Jacobs (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Jacobs updated KAFKA-15467:
-
Description: 
I'm encountering an odd issue with mirrormaker2 with our remote replication 
setup to high latency remote sites (satellite).

Every few days we get several topics completely re-replicated; this appears to 
happen after a network connectivity outage. It doesn't matter if it's a long 
outage (hours) or a short one (minutes). And it only seems to affect a few 
topics.

I was finally able to track down some logs showing the issue. This was after an 
hour-ish long outage where connectivity went down. There were lots of logs 
about connection timeouts, etc. Here is the relevant part when the connection 
came back up:
{code:java}
2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
[AdminClient 
clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
 Disconnecting from node 0 due to socket connection setup timeout. The timeout 
value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
[kafka-admin-client-thread | 
mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
[AdminClient 
clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
 Metadata update failed 
(org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
[kafka-admin-client-thread | 
mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Disconnecting from node 0 due to socket connection setup 
timeout. The timeout value is 52624 ms. 
(org.apache.kafka.clients.NetworkClient) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Error sending fetch request (sessionId=460667411, epoch=INITIAL) 
to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
[Scheduler for MirrorSourceConnector: 
scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Fetch position FetchPosition{offset=4918131, 
offsetEpoch=Optional[0], 
currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
 (id: 0 rack: null)], epoch=0}} is out of range for partition 
reading.sensor.hfp01sc-0, resetting offset 
(org.apache.kafka.clients.consumer.internals.AbstractFetch) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
(Repeats for 11 more topics)
2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
 (id: 0 rack: null)], epoch=0}}. 
(org.apache.kafka.clients.consumer.internals.SubscriptionState) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
(Repeats for 11 more topics) {code}
The consumer reports that offset 4918131 is out of range for this 
topic/partition, but that offset still exists on the remote cluster. I can go 
pull it up with a consumer right now. The earliest offset in that topic that 
still exists is 3444977 as of yesterday. We have 30-day retention configured, so 
pulling in 30 days of duplicate data is very not good. It almost seems like a 
race condition, as there are 38 topics we replicate but this only affected 12 
(on this occurrence).

 

The number of topics affected seems to vary each time. Today I see one site has 
2 topics it is resending, and another has 13.

  was:
I'm encountering an odd issue with mirrormaker2 with our remote replication 
setup to high latency remote sites (satellite).

Every few days we get several topics completely re-replicated, this appears to 
happen after a network connectivity outage. It doesn't matter if it's a long 
outage (hours) or a short one (minutes). And it only seems to affect a few 
topics.

I was finally able to track down some logs showing the issue. This was after an 
hour-ish long outage where connectivity went down. There were lots of logs 
about connection timeouts, etc. 

[jira] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-14 Thread Steve Jacobs (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-15467 ]


Steve Jacobs deleted comment on KAFKA-15467:
--

was (Author: steveatbat):
Ok I've gone through this and the last 3 occurrences I can find, it's 12 topics 
each time that get reset. At different sites. (We do have the same topics at 
each remote site though).

> On reconnect mm2 cannot find offset that exists in remote cluster and 
> re-syncs the entire topic
> ---
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Steve Jacobs
>Priority: Major
>
> I'm encountering an odd issue with mirrormaker2 with our remote replication 
> setup to high latency remote sites (satellite).
> Every few days we get several topics completely re-replicated, this appears 
> to happen after a network connectivity outage. It doesn't matter if it's a 
> long outage (hours) or a short one (minutes). And it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30 day retention configured 
> so pulling in 30 days of duplicate data is very not good. It almost seems 
> like a race 

[jira] [Commented] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-14 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765324#comment-17765324
 ] 

Steve Jacobs commented on KAFKA-15467:
--

OK, I've gone through this and the last 3 occurrences I can find: it's 12 topics 
each time that get reset, at different sites. (We do have the same topics at 
each remote site though.)

> On reconnect mm2 cannot find offset that exists in remote cluster and 
> re-syncs the entire topic
> ---
>
> Key: KAFKA-15467
> URL: https://issues.apache.org/jira/browse/KAFKA-15467
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Steve Jacobs
>Priority: Major
>
> I'm encountering an odd issue with mirrormaker2 in our remote replication 
> setup to high-latency remote sites (satellite).
> Every few days several topics get completely re-replicated; this appears 
> to happen after a network connectivity outage. It doesn't matter whether it's a 
> long outage (hours) or a short one (minutes), and it only seems to affect a 
> few topics.
> I was finally able to track down some logs showing the issue. This was after 
> an hour-ish long outage where connectivity went down. There were lots of logs 
> about connection timeouts, etc. Here is the relevant part when the connection 
> came back up:
> {code:java}
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Disconnecting from node 0 due to socket connection setup timeout. The 
> timeout value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> [AdminClient 
> clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
>  Metadata update failed 
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
> [kafka-admin-client-thread | 
> mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Disconnecting from node 0 due to socket connection setup 
> timeout. The timeout value is 52624 ms. 
> (org.apache.kafka.clients.NetworkClient) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Error sending fetch request (sessionId=460667411, 
> epoch=INITIAL) to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> 2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
> refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
> [Scheduler for MirrorSourceConnector: 
> scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
> 2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Fetch position FetchPosition{offset=4918131, 
> offsetEpoch=Optional[0], 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}} is out of range for partition 
> reading.sensor.hfp01sc-0, resetting offset 
> (org.apache.kafka.clients.consumer.internals.AbstractFetch) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics)
> 2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] 
> [Consumer 
> clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
>  groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
> position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
> currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
>  (id: 0 rack: null)], epoch=0}}. 
> (org.apache.kafka.clients.consumer.internals.SubscriptionState) 
> [task-thread-scbi->gcp.MirrorSourceConnector-1]
> (Repeats for 11 more topics) {code}
> The consumer reports that offset 4918131 is out of range for this 
> topic/partition, but that offset still exists on the remote cluster. I can go 
> pull it up with a consumer right now. The earliest offset in that topic that 
> still exists is 3444977 as of yesterday. We have 30-day retention configured, 
> so pulling in 30 days of duplicate data is far from ideal. It almost seems 
> like a race condition, as there are 38 topics we replicate but only 12 were 
> affected (on this occurrence).

[jira] [Created] (KAFKA-15467) On reconnect mm2 cannot find offset that exists in remote cluster and re-syncs the entire topic

2023-09-14 Thread Steve Jacobs (Jira)
Steve Jacobs created KAFKA-15467:


 Summary: On reconnect mm2 cannot find offset that exists in remote 
cluster and re-syncs the entire topic
 Key: KAFKA-15467
 URL: https://issues.apache.org/jira/browse/KAFKA-15467
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 3.5.1
Reporter: Steve Jacobs


I'm encountering an odd issue with mirrormaker2 in our remote replication 
setup to high-latency remote sites (satellite).

Every few days several topics get completely re-replicated; this appears to 
happen after a network connectivity outage. It doesn't matter whether it's a long 
outage (hours) or a short one (minutes), and it only seems to affect a few 
topics.

I was finally able to track down some logs showing the issue. This was after an 
hour-ish long outage where connectivity went down. There were lots of logs 
about connection timeouts, etc. Here is the relevant part when the connection 
came back up:
{code:java}
2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
[AdminClient 
clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
 Disconnecting from node 0 due to socket connection setup timeout. The timeout 
value is 63245 ms. (org.apache.kafka.clients.NetworkClient) 
[kafka-admin-client-thread | 
mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
2023-09-08 16:52:45,380 INFO [scbi->gcp.MirrorSourceConnector|worker] 
[AdminClient 
clientId=mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
 Metadata update failed 
(org.apache.kafka.clients.admin.internals.AdminMetadataManager) 
[kafka-admin-client-thread | 
mm2-admin-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector|replication-source-admin]
2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Disconnecting from node 0 due to socket connection setup 
timeout. The timeout value is 52624 ms. 
(org.apache.kafka.clients.NetworkClient) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
2023-09-08 16:52:47,029 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Error sending fetch request (sessionId=460667411, epoch=INITIAL) 
to node 0: (org.apache.kafka.clients.FetchSessionHandler) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
2023-09-08 16:52:47,336 INFO [scbi->gcp.MirrorSourceConnector|worker] 
refreshing topics took 67359 ms (org.apache.kafka.connect.mirror.Scheduler) 
[Scheduler for MirrorSourceConnector: 
scbi->gcp|scbi->gcp.MirrorSourceConnector-refreshing topics]
2023-09-08 16:52:48,413 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Fetch position FetchPosition{offset=4918131, 
offsetEpoch=Optional[0], 
currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
 (id: 0 rack: null)], epoch=0}} is out of range for partition 
reading.sensor.hfp01sc-0, resetting offset 
(org.apache.kafka.clients.consumer.internals.AbstractFetch) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
(Repeats for 11 more topics)
2023-09-08 16:52:48,479 INFO [scbi->gcp.MirrorSourceConnector|task-1] [Consumer 
clientId=mm2-consumer-scbi|scbi->gcp|scbi->gcp.MirrorSourceConnector-1|replication-consumer,
 groupId=null] Resetting offset for partition reading.sensor.hfp01sc-0 to 
position FetchPosition{offset=3444977, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[kafka.scbi.eng.neoninternal.org:9094
 (id: 0 rack: null)], epoch=0}}. 
(org.apache.kafka.clients.consumer.internals.SubscriptionState) 
[task-thread-scbi->gcp.MirrorSourceConnector-1]
(Repeats for 11 more topics) {code}
The consumer reports that offset 4918131 is out of range for this 
topic/partition, but that offset still exists on the remote cluster. I can go 
pull it up with a consumer right now. The earliest offset in that topic that 
still exists is 3444977 as of yesterday. We have 30-day retention configured, so 
pulling in 30 days of duplicate data is far from ideal. It almost seems like a 
race condition, as there are 38 topics we replicate but only 12 were affected 
(on this occurrence).
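
A minimal sketch of that check, assuming the remote cluster is reachable at the 
broker address shown in the logs above (the topic, partition, and offset are the 
ones reported there): it asks the broker for the partition's current log-start and 
log-end offsets and compares them with the fetch position MM2 was resetting from.
{code:java}
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class OffsetRangeCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Remote (source) cluster address as reported in the MM2 logs above.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka.scbi.eng.neoninternal.org:9094");

        TopicPartition tp = new TopicPartition("reading.sensor.hfp01sc", 0);
        long fetchPosition = 4918131L; // the position MM2 reported as out of range

        try (Admin admin = Admin.create(props)) {
            long logStart = admin.listOffsets(Map.of(tp, OffsetSpec.earliest()))
                    .partitionResult(tp).get().offset();
            long logEnd = admin.listOffsets(Map.of(tp, OffsetSpec.latest()))
                    .partitionResult(tp).get().offset();

            System.out.printf("log start=%d, log end=%d, fetch position=%d%n",
                    logStart, logEnd, fetchPosition);
            // If logStart <= fetchPosition < logEnd, the offset is still present on
            // the broker and an OFFSET_OUT_OF_RANGE reset would not be expected.
            System.out.println(fetchPosition >= logStart && fetchPosition < logEnd
                    ? "fetch position is within the broker's current log range"
                    : "fetch position is outside the broker's current log range");
        }
    }
}
{code}
Per the description, the reported position is still present on the remote cluster, 
so a check like this is mainly useful for capturing the broker's view of the log 
range at the moment a reset happens.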



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10133) Cannot compress messages in destination cluster with MM2

2020-08-24 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183445#comment-17183445
 ] 

Steve Jacobs commented on KAFKA-10133:
--

That would be incredibly helpful. Just an example of where this setting goes 
would save a lot of trouble!

> Cannot compress messages in destination cluster with MM2
> 
>
> Key: KAFKA-10133
> URL: https://issues.apache.org/jira/browse/KAFKA-10133
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
> Environment: kafka 2.5.0 deployed via the strimzi operator 0.18
>Reporter: Steve Jacobs
>Priority: Minor
>
> When configuring mirrormaker2 using kafka connect, it is not possible to 
> configure things such that messages are compressed in the destination 
> cluster. Dump Log shows that batching is occurring, but no compression. If 
> this is possible, then this is a documentation bug, because I can find no 
> documentation on how to do this.
> baseOffset: 4208 lastOffset: 4492 count: 285 baseSequence: -1 lastSequence: 
> -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 2 isTransactional: 
> false isControl: false position: 239371 CreateTime: 1591745894859 size: 16362 
> magic: 2 compresscodec: NONE crc: 1811507259 isvalid: true
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10133) Cannot compress messages in destination cluster with MM2

2020-08-24 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183300#comment-17183300
 ] 

Steve Jacobs commented on KAFKA-10133:
--

That is how I was confirming compression as well. The example you gave is the 
same as what I said above: it sets the property on the Kafka Connect worker, 
not on the source connector / checkpoint connector, etc.

> Cannot compress messages in destination cluster with MM2
> 
>
> Key: KAFKA-10133
> URL: https://issues.apache.org/jira/browse/KAFKA-10133
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
> Environment: kafka 2.5.0 deployed via the strimzi operator 0.18
>Reporter: Steve Jacobs
>Priority: Minor
>
> When configuring mirrormaker2 using kafka connect, it is not possible to 
> configure things such that messages are compressed in the destination 
> cluster. Dump Log shows that batching is occurring, but no compression. If 
> this is possible, then this is a documentation bug, because I can find no 
> documentation on how to do this.
> baseOffset: 4208 lastOffset: 4492 count: 285 baseSequence: -1 lastSequence: 
> -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 2 isTransactional: 
> false isControl: false position: 239371 CreateTime: 1591745894859 size: 16362 
> magic: 2 compresscodec: NONE crc: 1811507259 isvalid: true
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10133) Cannot compress messages in destination cluster with MM2

2020-06-10 Thread Steve Jacobs (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Jacobs resolved KAFKA-10133.
--
Resolution: Not A Problem

> Cannot compress messages in destination cluster with MM2
> 
>
> Key: KAFKA-10133
> URL: https://issues.apache.org/jira/browse/KAFKA-10133
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
> Environment: kafka 2.5.0 deployed via the strimzi operator 0.18
>Reporter: Steve Jacobs
>Priority: Minor
>
> When configuring mirrormaker2 using kafka connect, it is not possible to 
> configure things such that messages are compressed in the destination 
> cluster. Dump Log shows that batching is occurring, but no compression. If 
> this is possible, then this is a documentation bug, because I can find no 
> documentation on how to do this.
> baseOffset: 4208 lastOffset: 4492 count: 285 baseSequence: -1 lastSequence: 
> -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 2 isTransactional: 
> false isControl: false position: 239371 CreateTime: 1591745894859 size: 16362 
> magic: 2 compresscodec: NONE crc: 1811507259 isvalid: true
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10133) Cannot compress messages in destination cluster with MM2

2020-06-10 Thread Steve Jacobs (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17132712#comment-17132712
 ] 

Steve Jacobs commented on KAFKA-10133:
--

Finally got this working: compression.type has to be set in the cluster config 
(the Kafka Connect worker config), not on the KafkaConnect cluster as a producer 
override (which, from the documentation, seems like it should work, but doesn't). 
Overall this was extremely confusing, and improved documentation would save a 
ton of hassle.
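
A minimal sketch of the worker-level setting described above; the codec is only an 
example, and any supported compression.type should behave the same way:
{code}
# Kafka Connect worker configuration (e.g. connect-distributed.properties, or the
# config block of a Strimzi KafkaConnect resource) -- a sketch, not a complete config.
# Worker-level producer settings use the "producer." prefix; gzip is just an example.
producer.compression.type=gzip
{code}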

> Cannot compress messages in destination cluster with MM2
> 
>
> Key: KAFKA-10133
> URL: https://issues.apache.org/jira/browse/KAFKA-10133
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 2.4.0, 2.5.0, 2.4.1
> Environment: kafka 2.5.0 deployed via the strimzi operator 0.18
>Reporter: Steve Jacobs
>Priority: Minor
>
> When configuring mirrormaker2 using kafka connect, it is not possible to 
> configure things such that messages are compressed in the destination 
> cluster. Dump Log shows that batching is occurring, but no compression. If 
> this is possible, then this is a documentation bug, because I can find no 
> documentation on how to do this.
> baseOffset: 4208 lastOffset: 4492 count: 285 baseSequence: -1 lastSequence: 
> -1 producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 2 isTransactional: 
> false isControl: false position: 239371 CreateTime: 1591745894859 size: 16362 
> magic: 2 compresscodec: NONE crc: 1811507259 isvalid: true
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10133) Cannot compress messages in destination cluster with MM2

2020-06-09 Thread Steve Jacobs (Jira)
Steve Jacobs created KAFKA-10133:


 Summary: Cannot compress messages in destination cluster with MM2
 Key: KAFKA-10133
 URL: https://issues.apache.org/jira/browse/KAFKA-10133
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Affects Versions: 2.4.1, 2.5.0, 2.4.0
 Environment: kafka 2.5.0 deployed via the strimzi operator 0.18
Reporter: Steve Jacobs


When configuring mirrormaker2 using kafka connect, it is not possible to 
configure things such that messages are compressed in the destination cluster. 
Dump Log shows that batching is occurring, but no compression. If this is 
possible, then this is a documentation bug, because I can find no documentation 
on how to do this.

baseOffset: 4208 lastOffset: 4492 count: 285 baseSequence: -1 lastSequence: -1 
producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 2 isTransactional: false 
isControl: false position: 239371 CreateTime: 1591745894859 size: 16362 magic: 
2 compresscodec: NONE crc: 1811507259 isvalid: true
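
For reference, batch metadata like the line above is what the log dump tool prints 
for a segment; a sketch of the invocation, with a placeholder segment path:
{code}
# Point --files at an actual .log segment of the mirrored topic on the destination
# broker; the path below is only a placeholder.
bin/kafka-dump-log.sh --files /var/lib/kafka/data/my-topic-0/00000000000000000000.log
{code}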

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)