[ 
https://issues.apache.org/jira/browse/KAFKA-19853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035625#comment-18035625
 ] 

Colt McNealy edited comment on KAFKA-19853 at 11/5/25 6:22 PM:
---------------------------------------------------------------

KIP-1035 would _help_ with the write stalls but not eliminate them. Note that the 
write stalls are not "global" but rather per-RocksDB-instance, because (in the 
case I observed) they were specifically caused by the number of L0 files in the 
RocksDB instance for the state store under restoration.

Active tasks did not experience stalls or even slowdowns because compaction was 
easily able to keep up with the write rate.
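
To make the per-instance point concrete, here is a toy sketch (not RocksDB or Kafka Streams code) of a stall decision driven only by one instance's own L0 file count. The two thresholds are assumptions modelled on RocksDB's `level0_slowdown_writes_trigger` and `level0_stop_writes_trigger` options; the store names and file counts are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model: each RocksDB instance throttles writes based on its own L0 file count. */
final class L0StallModel {
    enum WriteState { NORMAL, SLOWDOWN, STOPPED }

    // Assumed thresholds, named after RocksDB's level0_slowdown_writes_trigger and
    // level0_stop_writes_trigger options. The effective values depend on your config.
    static final int SLOWDOWN_TRIGGER = 20;
    static final int STOP_TRIGGER = 36;

    /** Classifies a single instance; no other store's state is consulted. */
    static WriteState classify(int l0Files) {
        if (l0Files >= STOP_TRIGGER) {
            return WriteState.STOPPED;
        }
        if (l0Files >= SLOWDOWN_TRIGGER) {
            return WriteState.SLOWDOWN;
        }
        return WriteState.NORMAL;
    }

    public static void main(String[] args) {
        // Hypothetical stores and L0 file counts, for illustration only.
        Map<String, Integer> l0FilesByStore = new LinkedHashMap<>();
        l0FilesByStore.put("active-task-store", 3);
        l0FilesByStore.put("warmup-replica-store", 24);
        l0FilesByStore.put("restoring-task-store", 41);
        l0FilesByStore.forEach((store, files) ->
            System.out.println(store + " -> " + classify(files)));
    }
}
```

In this model the active task's store stays in `NORMAL` while the restoring store is stopped, which matches the observation that only the stores being restored stalled.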

As an experiment, I disabled manual flushing (by raising 
`StateManagerUtil.OFFSET_DELTA_THRESHOLD_FOR_CHECKPOINT` from 10k to 1M). 
The results:
 * I still got write stalls on the restoring tasks and warmup replicas, but they 
were less severe.
 * Restoration throughput improved 3x (from 4MB/s to 12MB/s) under otherwise 
identical conditions, as measured by restore consumer throughput.
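
A rough back-of-the-envelope sketch of why the threshold matters. It assumes, as a simplification, that one checkpoint flush fires per `threshold` restored offsets and that each such flush produces one new L0 file; it ignores commit-interval timing and memtable-size flushes. The 50M offset count is a made-up figure, not a measurement from this experiment.

```java
/** Simplified estimate of checkpoint-triggered flushes during restoration. */
final class CheckpointFlushEstimate {

    /**
     * Assumption: one flush per `threshold` restored offsets.
     * Real behaviour also depends on timing and on RocksDB's own flush triggers.
     */
    static long flushes(long restoredOffsets, long threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        return restoredOffsets / threshold;
    }

    public static void main(String[] args) {
        long restoredOffsets = 50_000_000L; // hypothetical changelog size

        System.out.println("threshold 10k -> " + flushes(restoredOffsets, 10_000L) + " flushes");
        System.out.println("threshold 1M  -> " + flushes(restoredOffsets, 1_000_000L) + " flushes");
    }
}
```

Under these assumptions the 10k threshold yields 5000 small flushes versus 50 with the 1M threshold, so far fewer L0 files pile up for compaction to work through.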



> Transaction Failure when StreamThread blocks on StateUpdater during 
> onAssignment()
> ----------------------------------------------------------------------------------
>
>                 Key: KAFKA-19853
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19853
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.9.0
>            Reporter: Colt McNealy
>            Priority: Major
>         Attachments: image (3).png, image (4).png, image (5).png
>
>
> We've observed that the `StreamThread` blocks waiting for a `Future` from the 
> `StateUpdater` in the `StreamsPartitionAssignor#onAssignment()` method when 
> we are moving a task out of the `StateUpdater` and onto the `StreamThread`.
>  
> This can cause problems because, during restoration or with warmup replicas, 
> the `StateUpdater#runOnce()` method can take a long time (upwards of 20 
> seconds) when RocksDB stalls writes to allow compaction to keep up. Under EOS, 
> this blocking may cause the transaction to time out, which is a big mess, 
> because the `StreamThread` may already have an open transaction when the 
> `StreamsPartitionAssignor#onAssignment()` method is called.
>  
> Some screenshots from the JFR below (credit to [~eduwerc]).
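
The blocking handoff described in the issue can be illustrated with a small stand-alone sketch. This is not Kafka Streams code: the single-threaded executor stands in for the `StateUpdater` thread, the sleep stands in for a loop iteration delayed by a write stall, and the class name, task id, and timings are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of a caller blocking on a future that a busy worker completes late. */
final class BlockingHandoffDemo {

    /** Returns how long the caller was blocked, in milliseconds. */
    static long blockedMillis(long updaterIterationMs) {
        ExecutorService updater = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> removedTask = new CompletableFuture<>();
            updater.execute(() -> {
                try {
                    // Stand-in for a loop iteration delayed by a stalled write.
                    Thread.sleep(updaterIterationMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // Only after the iteration ends is the handoff request served.
                removedTask.complete("task-0_0"); // hypothetical task id
            });
            long start = System.nanoTime();
            removedTask.join(); // the caller cannot make progress until this returns
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            updater.shutdownNow();
        }
    }

    /** True if the caller stayed blocked longer than the (simulated) transaction timeout. */
    static boolean exceedsTimeout(long updaterIterationMs, long transactionTimeoutMs) {
        return blockedMillis(updaterIterationMs) > transactionTimeoutMs;
    }

    public static void main(String[] args) {
        // Scaled-down, made-up timings: a 300 ms "stall" against a 100 ms "timeout".
        System.out.println("timed out: " + exceedsTimeout(300, 100));
    }
}
```

The caller's wait is bounded below by the worker's slowest iteration, so any open transaction on the calling thread sits idle for at least that long.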



