Andy Coates created KAFKA-17120:
-----------------------------------

Summary: Race condition in KafkaStreams.close can result in StreamsException "Failed to shut down while in state"
Key: KAFKA-17120
URL: https://issues.apache.org/jira/browse/KAFKA-17120
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 3.6.1
Reporter: Andy Coates

If `KafkaStreams.close` is called while the app is transitioning to an error state (`PENDING_ERROR`, then `ERROR`), there seems to be a race condition that can result in `close` throwing a `StreamsException` with the message "Failed to shut down while in state PENDING_ERROR":

```
[2024-07-02T13:09:05.182Z] 13:09:04.921 [mktx.com.cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495-StreamThread-1] INFO org.apache.kafka.streams.KafkaStreams - stream-client [mktx.com.cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495] State transition from RUNNING to PENDING_ERROR
[2024-07-02T13:09:05.182Z] 13:09:04.921 [Thread-30] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495] Failed to transition to PENDING_SHUTDOWN, current state is PENDING_ERROR
[2024-07-02T13:09:05.182Z] 13:09:04.922 [cdc:cdc-processor-b586687d-e969-45eb-949c-09042b0d358b-b76cd6d2-0549-4cf9-b7e5-6f886d408495-StreamThread-1] INFO com.cdc.observability.LoggingApplicationLifecycleListener - {"state":"PENDING_ERROR","message":"Application-lifecycle","oldState":"RUNNING"}
[2024-07-02T13:09:05.182Z] 13:09:04.922 [Thread-30] ERROR com.cdc.observability.LoggingApplicationLifecycleListener - {"reason":"Failed to stop the Kafka Streams app as streams.close() threw an exception","cause":"org.apache.kafka.streams.errors.StreamsException: Failed to shut down while in state PENDING_ERROR\n\tat org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:1447)\n\tat org.apache.kafka.streams.KafkaStreams.close(KafkaStreams.java:1497)\n\tat com.cdc.streams.KafkaStreamsExecutor.shutdownStreams(KafkaStreamsExecutor.java:90)\n\tat com.cdc.streams.KafkaStreamsExecutor.execute(KafkaStreamsExecutor.java:70)\n\tat com.cdc.Main.startKafkaStreams(Main.java:148)\n
```

Could this be caused by a lack of synchronisation around the state checks in the `close` method?

--
This message was sent by Atlassian Jira (v8.20.10#820010)
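
To illustrate the suspected check-then-act race, here is a minimal, self-contained sketch. It is a toy model, not the actual `KafkaStreams` code: the class `CloseRaceSketch`, its reduced state set, the methods `racyClose` and `recheckingClose`, and the `interleaved` hook that stands in for the stream thread are all hypothetical. It assumes that `close` first checks whether the client is already shutting down and only then attempts the transition to `PENDING_SHUTDOWN`, with nothing preventing another thread from moving the state to `PENDING_ERROR` in between. The second method shows one possible mitigation (re-reading the state after a failed transition), offered as an idea rather than the project's fix.

```java
// Toy model of the suspected race; NOT the real KafkaStreams implementation.
class CloseRaceSketch {
    enum State { RUNNING, PENDING_SHUTDOWN, PENDING_ERROR, NOT_RUNNING, ERROR }

    private State state = State.RUNNING;

    synchronized State state() {
        return state;
    }

    // Each transition is atomic on its own, but callers can still interleave.
    synchronized boolean setState(State next) {
        boolean valid =
            (state == State.RUNNING
                && (next == State.PENDING_SHUTDOWN || next == State.PENDING_ERROR))
            || (state == State.PENDING_SHUTDOWN && next == State.NOT_RUNNING)
            || (state == State.PENDING_ERROR && next == State.ERROR);
        if (valid) {
            state = next;
        }
        return valid;
    }

    synchronized boolean isShuttingDown() {
        return state == State.PENDING_SHUTDOWN || state == State.PENDING_ERROR;
    }

    // Check-then-act: the hypothetical 'interleaved' hook simulates another
    // thread changing the state between the check and the transition.
    String racyClose(Runnable interleaved) {
        if (isShuttingDown()) {
            return "already shutting down in " + state();
        }
        interleaved.run();
        if (!setState(State.PENDING_SHUTDOWN)) {
            throw new IllegalStateException("Failed to shut down while in state " + state());
        }
        return "shutdown started";
    }

    // Possible mitigation: if the transition fails, re-read the state and
    // treat a concurrent move to a shutting-down state as non-fatal.
    String recheckingClose(Runnable interleaved) {
        if (!isShuttingDown()) {
            interleaved.run();
            if (setState(State.PENDING_SHUTDOWN)) {
                return "shutdown started";
            }
        }
        State now = state();
        if (now == State.PENDING_SHUTDOWN || now == State.PENDING_ERROR) {
            return "already shutting down in " + now;
        }
        throw new IllegalStateException("Failed to shut down while in state " + now);
    }

    public static void main(String[] args) {
        CloseRaceSketch a = new CloseRaceSketch();
        try {
            a.racyClose(() -> a.setState(State.PENDING_ERROR));
        } catch (IllegalStateException e) {
            System.out.println("racy: " + e.getMessage());
        }
        CloseRaceSketch b = new CloseRaceSketch();
        System.out.println("rechecking: " + b.recheckingClose(() -> b.setState(State.PENDING_ERROR)));
    }
}
```

In this model, the racy variant fails with the same message text as in the log above when the error transition lands between the check and the act, while the re-checking variant reports that shutdown is already in progress. Holding a single lock across both the check and the transition would be another way to close the window, provided the stream thread's transition takes the same lock.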