[ 
https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999353#comment-16999353
 ] 

Jendrik Poloczek edited comment on KAFKA-9062 at 12/18/19 5:04 PM:
-------------------------------------------------------------------

I think this is actually a thread deadlocked by the first PUT invocation
(after restoration), which triggers an auto compaction (default Kafka Streams
settings). The auto compaction, however, is triggered inside PUT, and the
method is blocking. We profiled our application with YourKit to identify this
problem. We disabled auto compaction (using the and we don’t have the problem
anymore (the timeout block on PUT). Compaction still happens at some point, but
it seems Kafka Streams is aware of this (is not deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings.


was (Author: jpzk):
I think this is actually a Kafka Streams bug, since it’s a thread deadlocked by
the first PUT invocation (after restoration), which triggers an auto
compaction (default Kafka Streams settings). The auto compaction, however, is
triggered inside PUT, and the method is blocking. We profiled our application
with YourKit to identify this problem. We disabled auto compaction (using the
and we don’t have the problem anymore (the timeout block on PUT). Compaction
still happens at some point, but it seems Kafka Streams is aware of this (is
not deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings.

> Handle stalled writes to RocksDB
> --------------------------------
>
>                 Key: KAFKA-9062
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9062
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> RocksDB may stall writes at times when background compactions or flushes are 
> having trouble keeping up. This means we can effectively end up blocking 
> indefinitely during a StateStore#put call within Streams, and may get kicked 
> from the group if the throttling does not ease up within the max poll 
> interval.
> Example: when restoring large amounts of state from scratch, we use the 
> strategy recommended by RocksDB of turning off automatic compactions and 
> dumping everything into L0. We do batch somewhat, but do not sort these small 
> batches before loading into the db, so we end up with a large number of 
> unsorted L0 files.
> When restoration is complete and we toggle the db back to normal (not bulk 
> loading) settings, a background compaction is triggered to merge all these 
> into the next level. This background compaction can take a long time to merge 
> unsorted keys, especially when the amount of data is quite large.
> Any new writes issued while the number of L0 files exceeds the max will be
> stalled until the compaction can finish, and processing after restoring from
> scratch can block beyond the max poll interval.
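
The stall condition described above can be sketched as a rough
back-of-the-envelope model. This is only an illustration, not RocksDB's or
Kafka Streams' actual logic: the class name, method names and all numbers are
hypothetical, and it assumes a single compaction running at a constant
throughput.

```java
import java.time.Duration;

/**
 * Toy model of the write stall described above. This is NOT RocksDB's actual
 * scheduling logic; every name and number here is a hypothetical illustration.
 */
public final class WriteStallModel {

    private WriteStallModel() {}

    /**
     * Estimates how long a new write stays blocked. Assumption: once the L0
     * file count exceeds the limit, writes wait for one compaction that
     * processes bytesToCompact at a constant compactionBytesPerSec.
     */
    public static Duration estimatedStall(int l0Files, int maxL0Files,
                                          long bytesToCompact, long compactionBytesPerSec) {
        if (compactionBytesPerSec <= 0) {
            throw new IllegalArgumentException("compactionBytesPerSec must be positive");
        }
        if (l0Files <= maxL0Files) {
            return Duration.ZERO; // under the limit: writes are not stalled
        }
        return Duration.ofSeconds(bytesToCompact / compactionBytesPerSec);
    }

    /** True if the stall would outlast the consumer's max poll interval. */
    public static boolean exceedsPollInterval(Duration stall, Duration maxPollInterval) {
        return stall.compareTo(maxPollInterval) > 0;
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 500 L0 files after restoration, a limit of 36,
        // 50 GiB to merge at 100 MiB/s, and a 5 minute max poll interval.
        Duration stall = estimatedStall(500, 36, 50L * 1024 * 1024 * 1024, 100L * 1024 * 1024);
        Duration maxPollInterval = Duration.ofMinutes(5);
        System.out.println("estimated stall: " + stall.getSeconds() + " s");
        System.out.println("exceeds max poll interval: " + exceedsPollInterval(stall, maxPollInterval));
    }
}
```

With inputs like these, the estimated stall is longer than the poll interval,
which is the situation in which the member would get kicked from the group.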



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
