Evgheni Popusoi created KAFKA-19629:
---------------------------------------
Summary: Deadlock in Kafka Streams when processing Interactive
Queries and state store updates concurrently
Key: KAFKA-19629
URL: https://issues.apache.org/jira/browse/KAFKA-19629
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 3.9.1, 3.8.1
Environment: Kafka Streams, kotlin, linux, docker. JDK 21
Reporter: Evgheni Popusoi
Attachments: thread-dump-1.txt, thread-dump-2.txt
We are using a Kafka Streams topology that continuously writes large volumes of
data into a RocksDB state store with stable throughput. In parallel, another
thread executes Interactive Query (IQ) requests against the same local state
store.
When the number of IQ requests in the queue grows (≈50+), the application
enters a {*}deadlock state{*}.
*Investigation:*
A thread dump shows a lock-order inversion between two {{RocksDBStore}} operations:
* {{RocksDBStore.put}}
** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
* {{RocksDBStore.range}}
** blocked on {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}
This indicates that {*}{{put}} and {{range}} acquire the same locks but in
different order{*}, which leads to deadlock under concurrent load.
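The inversion pattern described above can be reproduced in isolation with plain JDK monitors. The following is a minimal sketch, not Kafka Streams code: the {{store}} and {{position}} objects are hypothetical stand-ins for the two monitors named in the thread dump, and the class and thread names are invented for illustration. It forces each thread to hold its first monitor before requesting the second, then confirms the deadlock with {{ThreadMXBean.findMonitorDeadlockedThreads()}}.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockInversionDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order and
     * returns the ids of the threads the JVM reports as deadlocked
     * (an empty array if nothing is detected within ten seconds).
     */
    static long[] provokeAndDetect() throws InterruptedException {
        // Hypothetical stand-ins for the RocksDBStore and Position monitors.
        final Object store = new Object();
        final Object position = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors the reported "put" side: holds store, then wants position.
        Thread writer = new Thread(() -> {
            synchronized (store) {
                awaitOther(bothHoldFirstLock);
                synchronized (position) { /* never reached */ }
            }
        }, "demo-writer");

        // Mirrors the reported "range" side: holds position, then wants store.
        Thread reader = new Thread(() -> {
            synchronized (position) {
                awaitOther(bothHoldFirstLock);
                synchronized (store) { /* never reached */ }
            }
        }, "demo-reader");

        // Daemon threads so the JVM can still exit while they stay blocked.
        writer.setDaemon(true);
        reader.setDaemon(true);
        writer.start();
        reader.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
        while (System.nanoTime() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when none found
            if (ids != null) {
                return ids;
            }
            Thread.sleep(50);
        }
        return new long[0];
    }

    // Signals that this thread holds its first monitor, then waits for the other.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + provokeAndDetect().length);
    }
}
```

The latch only makes the interleaving deterministic for the demo. In the reported scenario the same interleaving would presumably arise by chance, which fits the observation that the hang appears only once enough IQ requests are queued.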
*Expected Behavior:*
The Kafka Streams API should guarantee deadlock-free operation. Store writes
({{{}put{}}}) and IQ reads ({{{}range{}}}) should not block each other in a way
that leads to lock inversion.
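For illustration, one common remedy for lock-order inversion is to make every code path acquire the monitors in the same fixed order. The sketch below assumes a toy in-memory store; the class {{OrderedLockingSketch}}, its fields, and its methods are hypothetical and do not reflect the actual {{RocksDBStore}} internals or whatever fix the project may choose. Both {{put}} and {{range}} take the store monitor first and the position monitor second, so a cycle between the two cannot form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class OrderedLockingSketch {
    // Hypothetical monitors; the fixed order is storeLock first, positionLock second.
    private final Object storeLock = new Object();
    private final Object positionLock = new Object();

    private final TreeMap<String, String> data = new TreeMap<>();
    private long position = 0;

    // Write path: store monitor, then position monitor.
    void put(String key, String value) {
        synchronized (storeLock) {
            data.put(key, value);
            synchronized (positionLock) {
                position++;
            }
        }
    }

    // Read path: same order as put, so the two paths cannot invert.
    List<String> range(String from, String to) {
        synchronized (storeLock) {
            synchronized (positionLock) {
                long seen = position; // read under the lock; unused in this sketch
            }
            // Copy the inclusive key range so callers never touch the live map.
            return new ArrayList<>(data.subMap(from, true, to, true).values());
        }
    }

    /** Runs a writer and a reader concurrently; true if both finish in time. */
    static boolean runConcurrently(int iterations) throws InterruptedException {
        OrderedLockingSketch store = new OrderedLockingSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                store.put("k" + i, "v" + i);
            }
        }, "sketch-writer");
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                store.range("k10", "k19");
            }
        }, "sketch-reader");
        writer.setDaemon(true);
        reader.setDaemon(true);
        writer.start();
        reader.start();
        writer.join(10_000);
        reader.join(10_000);
        return !writer.isAlive() && !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Both threads finished: " + runConcurrently(2_000));
    }
}
```

A fixed order is the simplest option to reason about, at the cost of serializing reads behind writes. Other approaches, such as not holding one monitor while acquiring the other, may fit the real code better.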
*Steps to Reproduce:*
# Create a Kafka Streams topology with a RocksDB state store receiving
continuous writes.
# In a parallel thread, issue a high number of Interactive Query {{range}}
requests (≈50+ queued).
# Observe that the application eventually deadlocks.
*Impact:*
* Application stops processing data.
* Interactive Queries fail indefinitely.
* Requires manual restart to recover.
*Notes:*
* Appears to be a lock ordering bug in {{{}RocksDBStore{}}}.
* We expected the Streams API to coordinate thread safety and prevent such
deadlocks.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)