[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2024-08-09 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872515#comment-17872515
 ] 

Colin McCabe commented on KAFKA-14302:
--

The ticket is still unassigned so moving to the next release.

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Critical
> Fix For: 3.9.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2024-06-19 Thread Josep Prat (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856153#comment-17856153
 ] 

Josep Prat commented on KAFKA-14302:


Changing target fix version to 3.9 since this is not a blocker and we are past 
code freeze

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Critical
> Fix For: 3.8.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2023-12-26 Thread Stanislav Kozlovski (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800620#comment-17800620
 ] 

Stanislav Kozlovski commented on KAFKA-14302:
-

Changing target fix version to 3.8 since this is not a blocker and we are 
cutting a 3.7 RC

 

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Critical
> Fix For: 3.8.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2023-09-02 Thread Satish Duggana (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761524#comment-17761524
 ] 

Satish Duggana commented on KAFKA-14302:


Moving it to 3.7.0 as it is not a blocker for 3.6.0

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Critical
> Fix For: 3.6.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2023-04-24 Thread Mickael Maison (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715656#comment-17715656
 ] 

Mickael Maison commented on KAFKA-14302:


The ticket is still unassigned so moving to the next release.

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2022-12-14 Thread A. Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647820#comment-17647820
 ] 

A. Sophie Blee-Goldman commented on KAFKA-14302:


I did look into this briefly but it was not the straightforward bug I initially 
assumed from the title. Bumping it to 3.5 since we are past code freeze for 3.4

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Major
> Fix For: 3.5.0
>
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14302) Infinite probing rebalance if a changelog topic got emptied

2022-10-21 Thread A. Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17622499#comment-17622499
 ] 

A. Sophie Blee-Goldman commented on KAFKA-14302:


Oof. That's a silly one – seems easy enough to avoid, too bad we missed it on 
the first go but I imagine it should be easy enough to fix. Will mark it for 
3.4.0 so we don't let it slip

> Infinite probing rebalance if a changelog topic got emptied
> ---
>
> Key: KAFKA-14302
> URL: https://issues.apache.org/jira/browse/KAFKA-14302
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Damien Gasparina
>Priority: Major
> Attachments: image-2022-10-14-12-04-01-190.png, logs.tar.gz2
>
>
> If a store, with a changelog topic, has been fully emptied, it could generate 
> infinite probing rebalance.
>  
> The scenario is the following:
>  * A Kafka Streams application, deployed on many instances, have a store with 
> a changelog
>  * Many entries are pushed into the changelog, thus the Log end Offset is 
> high, let's say 20,000
>  * Then, the store got emptied, either due to data retention (windowing) or 
> tombstone
>  * Then an instance of the application is restarted, and its local disk is 
> deleted (e.g. Kubernetes without Persistent Volume)
>  * After restart, the application restores the store from the changelog, but 
> does not write a checkpoint file as there are no data
>  * As there are no checkpoint entries, this instance specify a taskOffsetSums 
> with offset set to 0 in the subscriptionUserData
>  * The group leader, during the assignment, then compute a lag of 20,000 (end 
> offsets - task offset), which is greater than the default acceptable lag, 
> thus decide to schedule a probing rebalance
>  * In ther next probing rebalance, nothing changed, so... new probing 
> rebalance
>  
> I was able to reproduce locally with a simple topology:
>  
> {code:java}
> var table = streamsBuilder.stream("table");
> streamsBuilder
> .stream("stream")
> .join(table, (eSt, eTb) -> eSt.toString() + eTb.toString(), 
> JoinWindows.of(Duration.ofSeconds(5)))
> .to("output");{code}
>  
>  
>  
> Due to this issue, application having an empty changelog are experiencing 
> frequent rebalance:
> !image-2022-10-14-12-04-01-190.png!
>  
> With assignments similar to:
> {code:java}
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3] INFO 
> org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor - 
> stream-thread 
> [hello-world-8323d214-4c56-470f-bace-e4291cdf10eb-StreamThread-3-consumer] 
> Assigned tasks [0_5, 0_4, 0_3, 0_2, 0_1, 0_0] including stateful [0_5, 0_4, 
> 0_3, 0_2, 0_1, 0_0] to clients as: 
> d0e2d556-2587-48e8-b9ab-43a4e8207be6=[activeTasks: ([]) standbyTasks: ([0_0, 
> 0_1, 0_2, 0_3, 0_4, 0_5])]
> 8323d214-4c56-470f-bace-e4291cdf10eb=[activeTasks: ([0_0, 0_1, 0_2, 0_3, 0_4, 
> 0_5]) standbyTasks: ([])].{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)