[ https://issues.apache.org/jira/browse/KAFKA-19853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035341#comment-18035341 ]
Lucas Brutschy commented on KAFKA-19853:
----------------------------------------
Yeah, this is a valid ticket, but I am not sure whether we can fix it by making
the state updater task hand-off non-blocking.

Q: In the non-state-updater code path, wouldn't we also stall for 20s and
thereby run into the transaction timeout?

Since rebalances should be fairly infrequent and commits are pretty frequent, I
am actually not sure why we don't just commit all tasks at the beginning of the
rebalance. IIRC, we only commit the ones that are being revoked, but in EOSv2
we need to commit them all anyway. So your solution of just committing every
time there is a rebalance may not be such a bad idea.
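To make the trade-off concrete, here is a rough, self-contained sketch of the timing argument. It does not use any Kafka classes: `ThreadTxn`, `timesOut`, and the 10s timeout are hypothetical names and values chosen for illustration, and the 20s stall is the figure reported in the ticket. It only models "a transaction is open while the thread is blocked" versus "everything was committed before blocking".

```java
// Hypothetical model, not Kafka Streams code: one transaction per thread (as in EOSv2),
// and a simulated clock instead of real sleeping.
class RebalanceCommitSketch {

    /** Stand-in for the single per-thread transaction. */
    static final class ThreadTxn {
        private final long timeoutMs;
        private long openedAtMs = -1; // -1 means no transaction is open

        ThreadTxn(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        void begin(long nowMs) {
            if (openedAtMs < 0) {
                openedAtMs = nowMs;
            }
        }

        void commit() {
            openedAtMs = -1; // committing closes the transaction
        }

        boolean isOpen() {
            return openedAtMs >= 0;
        }

        /** True if a transaction is open and has been open longer than the timeout. */
        boolean expiredAt(long nowMs) {
            return isOpen() && nowMs - openedAtMs > timeoutMs;
        }
    }

    /**
     * Simulates a rebalance in which the thread blocks for stallMs
     * (for example, waiting on the state updater hand-off).
     * Returns true if the open transaction would have timed out by the end of the stall.
     */
    static boolean timesOut(boolean commitAllOnRebalance, long stallMs, long timeoutMs) {
        long nowMs = 0;
        ThreadTxn txn = new ThreadTxn(timeoutMs);
        txn.begin(nowMs);              // records were processed before the rebalance

        if (commitAllOnRebalance) {
            txn.commit();              // commit everything before blocking
        }

        nowMs += stallMs;              // thread is blocked for the whole stall
        return txn.expiredAt(nowMs);
    }

    public static void main(String[] args) {
        // 20s stall from the ticket; 10s timeout is an illustrative value.
        System.out.println(timesOut(false, 20_000L, 10_000L)); // true
        System.out.println(timesOut(true, 20_000L, 10_000L));  // false
    }
}
```

In this model the stall itself is unchanged in both cases; committing first only ensures that no transaction is open while the thread waits, which is the point of the suggestion above.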
> StreamThread blocks on StateUpdater during onAssignment()
> ---------------------------------------------------------
>
> Key: KAFKA-19853
> URL: https://issues.apache.org/jira/browse/KAFKA-19853
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.9.0
> Reporter: Colt McNealy
> Priority: Major
> Attachments: image (3).png, image (4).png, image (5).png
>
>
> We've observed that the `StreamThread` blocks waiting for a `Future` from the
> `StateUpdater` in the `StreamsPartitionAssignor#onAssignment()` method when
> we are moving a task out of the `StateUpdater` and onto the `StreamThread`.
>
> This can cause problems because, during restoration or with warmup replicas,
> the `StateUpdater#runOnce()` method can take a long time (upwards of 20
> seconds) when RocksDB stalls writes to allow compaction to keep up. In EOS
> this blockage may cause the transaction to time out, which is a big mess.
> This is because the `StreamThread` may have an open transaction before the
> `StreamsPartitionAssignor#onAssignment()` method is called.
>
> Some screenshots from the JFR below (credit to [~eduwerc]).