[ https://issues.apache.org/jira/browse/KAFKA-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164009#comment-17164009 ]
John Roesler commented on KAFKA-10287: -------------------------------------- >From [~cadonna] 's PR: > In PR [#8962|https://github.com/apache/kafka/pull/8962] we introduced a > sentinel UNKNOWN_OFFSET to mark unknown offsets in checkpoint files. The > sentinel was set to -2 which is the same value used for the sentinel > LATEST_OFFSET that is used in subscriptions to signal that state stores have > been used by an active task. Unfortunately, we missed to skip UNKNOWN_OFFSET > when we compute the sum of the changelog offsets. > If a task had only one state store and it did not restore anything before the > next rebalance, the stream thread wrote -2 (i.e., UNKNOWN_OFFSET) into the > subscription as sum of the changelog offsets. During assignment, the leader > interpreted the -2 as if the stream run the task as active although it might > have run it as standby. This misinterpretation of the sentinel value resulted > in unexpected task assigments. This seems like a blocker to me, [~rhauch] , what do you think? The condition is that the streams assignor would unexpectedly move work to a node that's not ready, resulting in downtime while that node restores from the changelog. > fix flaky streams/streams_standby_replica_test.py > ------------------------------------------------- > > Key: KAFKA-10287 > URL: https://issues.apache.org/jira/browse/KAFKA-10287 > Project: Kafka > Issue Type: Bug > Components: streams, system tests > Reporter: Chia-Ping Tsai > Assignee: Chia-Ping Tsai > Priority: Blocker > Fix For: 2.6.0 > > > {quote} > Module: kafkatest.tests.streams.streams_standby_replica_test > Class: StreamsStandbyTask > Method: test_standby_tasks_rebalance > {quote} > It pass occasionally on my local. -- This message was sent by Atlassian Jira (v8.3.4#803005)