Andy Coates created KAFKA-17120:
-----------------------------------
Summary: Race condition in KafkaStreams.close can result in
StreamsException "Failed to shut down while in state"
Key: KAFKA-17120
URL: https://issues.apache.org/jira/browse/KAFKA-17120
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 3.6.1
Reporter: Andy Coates
If `KafkaStreams.close` is called while the app is transitioning to an error
state (`PENDING_ERROR` or `ERROR`), there appears to be a race condition that
can cause `close` to throw a `StreamsException` with the message
"Failed to shut down while in state PENDING_ERROR":
```
[2024-07-02T13:09:05.182Z] 13:09:04.921
[mktx.com.cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495-StreamThread-1]
INFO org.apache.kafka.streams.KafkaStreams - stream-client
[mktx.com.cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495]
State transition from RUNNING to PENDING_ERROR
[2024-07-02T13:09:05.182Z] 13:09:04.921 [Thread-30] ERROR
org.apache.kafka.streams.KafkaStreams - stream-client
[cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495]
Failed to transition to PENDING_SHUTDOWN, current state is PENDING_ERROR
[2024-07-02T13:09:05.182Z] 13:09:04.922
[cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495-StreamThread-1]
INFO com.cdc.observability.LoggingApplicationLifecycleListener -
{"state":"PENDING_ERROR","message":"Application-lifecycle","oldState":"RUNNING"}
[2024-07-02T13:09:05.182Z] 13:09:04.922 [Thread-30] ERROR
com.cdc.observability.LoggingApplicationLifecycleListener - {"reason":"Failed
to stop the Kafka Streams app as streams.close() threw an
exception","cause":"org.apache.kafka.streams.errors.StreamsException: Failed to
shut down while in state PENDING_ERROR\n\tat
org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:1447)\n\tat
org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:1497)\n\tat
com.cdc.streams.KafkaStreamsExecutor.shutdownStreams(KafkaStreamsExecutor.java:90)\n\tat
com.cdc.streams.KafkaStreamsExecutor.execute(KafkaStreamsExecutor.java:70)\n\tat
com.cdc.Main.startKafkaStreams(Main.java:148)\n
```
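In the log above, the transition to `PENDING_ERROR` on `StreamThread-1` and the
failed transition to `PENDING_SHUTDOWN` on `Thread-30` are both logged at
13:09:04.921. Could this be caused by a lack of synchronisation in the `close`
method around the state checks?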
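
To make the suspected interleaving concrete, here is a minimal, self-contained sketch of a check-then-act race. It is a toy model only: `ToyClient`, `racyClose`, `retryingClose` and the `betweenCheckAndAct` hook are hypothetical names invented for illustration, and none of this is taken from the actual `KafkaStreams` source. The hook stands in for whatever another thread manages to do between the state check and the state transition, which lets the race be reproduced deterministically without real threads.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: a hypothetical stand-in, NOT the real KafkaStreams code.
class CloseRaceSketch {
    enum State { RUNNING, PENDING_ERROR, PENDING_SHUTDOWN }

    static final class ToyClient {
        private final AtomicReference<State> state =
                new AtomicReference<>(State.RUNNING);

        // What a stream thread might do when it hits a fatal error.
        void onFatalError() {
            state.compareAndSet(State.RUNNING, State.PENDING_ERROR);
        }

        // Check-then-act close: the state is read, then (later) transitioned.
        // The hook runs inside the window between those two steps.
        String racyClose(Runnable betweenCheckAndAct) {
            State seen = state.get();
            if (seen != State.RUNNING) {
                return "already stopping in state " + seen; // benign path
            }
            betweenCheckAndAct.run();
            if (!state.compareAndSet(State.RUNNING, State.PENDING_SHUTDOWN)) {
                throw new IllegalStateException(
                        "Failed to shut down while in state " + state.get());
            }
            return "closing";
        }

        // One possible alternative: if the transition fails, re-check the
        // state instead of throwing, so a concurrent move to PENDING_ERROR
        // ends up on the benign path.
        String retryingClose(Runnable betweenCheckAndAct) {
            while (true) {
                State seen = state.get();
                if (seen != State.RUNNING) {
                    return "already stopping in state " + seen;
                }
                betweenCheckAndAct.run();
                if (state.compareAndSet(State.RUNNING, State.PENDING_SHUTDOWN)) {
                    return "closing";
                }
                // Lost the race: loop and look at the new state.
            }
        }
    }

    // The error transition lands inside the window of the racy close.
    static String demoRacy() {
        ToyClient client = new ToyClient();
        try {
            return client.racyClose(client::onFatalError);
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Same interleaving, but close re-checks after a failed transition.
    static String demoRetrying() {
        ToyClient client = new ToyClient();
        return client.retryingClose(client::onFatalError);
    }

    public static void main(String[] args) {
        System.out.println(demoRacy());
        System.out.println(demoRetrying());
    }
}
```

In this model the racy variant fails with the same message as in the log, while the re-checking variant treats the concurrent error transition as "already stopping". Holding a single lock across both the check and the transition would be another way to close the window; which approach, if either, fits the real `close` method is an open question.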