Andy Coates created KAFKA-17120:
-----------------------------------

             Summary: Race condition in KafkaStreams.close can result in 
StreamsException "Failed to shut down while in state"
                 Key: KAFKA-17120
                 URL: https://issues.apache.org/jira/browse/KAFKA-17120
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 3.6.1
            Reporter: Andy Coates


If `KafkaStreams.close` is called while the app is transitioning to an error 
state, i.e. `PENDING_ERROR` or `ERROR`, there appears to be a race condition 
that can cause `close` to throw a `StreamsException` with the message 
"Failed to shut down while in state PENDING_ERROR":

```

[2024-07-02T13:09:05.182Z] 13:09:04.921 
[mktx.com.cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495-StreamThread-1]
 INFO org.apache.kafka.streams.KafkaStreams - stream-client 
[mktx.com.cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495]
 State transition from RUNNING to PENDING_ERROR

[2024-07-02T13:09:05.182Z] 13:09:04.921 [Thread-30] ERROR 
org.apache.kafka.streams.KafkaStreams - stream-client 
[cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495]
 Failed to transition to PENDING_SHUTDOWN, current state is PENDING_ERROR

[2024-07-02T13:09:05.182Z] 13:09:04.922 
[cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495-StreamThread-1]
 INFO com.cdc.observability.LoggingApplicationLifecycleListener - 
{"state":"PENDING_ERROR","message":"Application-lifecycle","oldState":"RUNNING"}

[2024-07-02T13:09:05.182Z] 13:09:04.922 [Thread-30] ERROR 
com.cdc.observability.LoggingApplicationLifecycleListener - {"reason":"Failed 
to stop the Kafka Streams app as streams.close() threw an 
exception","cause":"org.apache.kafka.streams.errors.StreamsException: Failed to 
shut down while in state PENDING_ERROR\n\tat 
org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:1447)\n\tat 
org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:1497)\n\tat 
com.cdc.streams.KafkaStreamsExecutor.shutdownStreams(KafkaStreamsExecutor.java:90)\n\tat
 
com.cdc.streams.KafkaStreamsExecutor.execute(KafkaStreamsExecutor.java:70)\n\tat
 com.cdc.Main.startKafkaStreams(Main.java:148)\n
```

Could this be caused by a lack of synchronisation in the `close` method around 
the state checks?
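
To make the suspected window concrete, here is a minimal, self-contained model of a check-then-act race. It is not the Kafka Streams source: `CloseRaceSketch`, `racyClose`, the `betweenCheckAndSet` hook and the transition rule in `setState` are all hypothetical, and `IllegalStateException` stands in for `StreamsException`. The hook runs deterministically on the calling thread to play the part of a stream thread that moves the state from `RUNNING` to `PENDING_ERROR` between the check and the transition.

```java
/** Simplified, hypothetical model of the suspected race; not the Kafka Streams source. */
class CloseRaceSketch {

    enum State { RUNNING, PENDING_SHUTDOWN, PENDING_ERROR, NOT_RUNNING, ERROR }

    private volatile State state = State.RUNNING;

    /** Atomic transition. Assumed rule: PENDING_SHUTDOWN is only reachable from RUNNING. */
    synchronized boolean setState(State next) {
        if (next == State.PENDING_SHUTDOWN && state != State.RUNNING) {
            return false;
        }
        state = next;
        return true;
    }

    State state() {
        return state;
    }

    private boolean isShuttingDown() {
        return state == State.PENDING_SHUTDOWN || state == State.PENDING_ERROR;
    }

    /**
     * The state check and the transition are two separate steps with no lock
     * held across them. The hook stands in for whatever another thread does
     * inside that window.
     */
    String racyClose(Runnable betweenCheckAndSet) {
        if (isShuttingDown()) {
            return "already shutting down in " + state;
        }
        betweenCheckAndSet.run(); // e.g. a stream thread: RUNNING -> PENDING_ERROR
        if (!setState(State.PENDING_SHUTDOWN)) {
            // Stand-in for the StreamsException seen in the log above.
            throw new IllegalStateException("Failed to shut down while in state " + state);
        }
        return "shutting down";
    }

    public static void main(String[] args) {
        CloseRaceSketch app = new CloseRaceSketch();
        try {
            // The check passes in RUNNING, then the state changes before the transition.
            app.racyClose(() -> app.setState(State.PENDING_ERROR));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real `close` has the same shape, the two log lines from `Thread-30` above would be consistent with the check having passed while the state was still `RUNNING`.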



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
