[jira] [Commented] (KAFKA-9062) Handle stalled writes to RocksDB
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17821066#comment-17821066 ] Bruno Cadonna commented on KAFKA-9062:

Closed it. If we run into a similar issue again, let's create a new ticket.

> Handle stalled writes to RocksDB
>
> Key: KAFKA-9062
> URL: https://issues.apache.org/jira/browse/KAFKA-9062
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: A. Sophie Blee-Goldman
> Priority: Major
> Labels: new-streams-runtime-should-fix
>
> RocksDB may stall writes at times when background compactions or flushes are having trouble keeping up. This means we can effectively end up blocking indefinitely during a StateStore#put call within Streams, and may get kicked from the group if the throttling does not ease up within the max poll interval.
>
> Example: when restoring large amounts of state from scratch, we use the strategy recommended by RocksDB of turning off automatic compactions and dumping everything into L0. We do batch somewhat, but do not sort these small batches before loading into the db, so we end up with a large number of unsorted L0 files.
>
> When restoration is complete and we toggle the db back to normal (not bulk loading) settings, a background compaction is triggered to merge all these into the next level. This background compaction can take a long time to merge unsorted keys, especially when the amount of data is quite large.
>
> Any new writes while the number of L0 files exceeds the max will be stalled until the compaction can finish, and processing after restoring from scratch can block beyond the polling interval.
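For context, here is a minimal RocksJava sketch of the bulk-load pattern the description refers to. This is not Streams' actual restore code; the db path and key/value bytes are illustrative.

    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.WriteBatch;
    import org.rocksdb.WriteOptions;

    public class BulkLoadSketch {
        public static void main(final String[] args) throws RocksDBException {
            RocksDB.loadLibrary();
            try (Options options = new Options().setCreateIfMissing(true)) {
                // Disables auto-compactions and raises the L0 file limits.
                options.prepareForBulkLoad();
                try (RocksDB db = RocksDB.open(options, "/tmp/bulk-load-demo");
                     WriteOptions wo = new WriteOptions()) {
                    try (WriteBatch batch = new WriteBatch()) {
                        // Restoration writes accumulate as (unsorted) L0 files.
                        batch.put("key".getBytes(), "value".getBytes());
                        db.write(wo, batch);
                    }
                    // Toggling back to normal settings kicks off the large merge the
                    // description warns about; compactRange() forces it explicitly here.
                    db.compactRange();
                }
            }
        }
    }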
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17820819#comment-17820819 ] Matthias J. Sax commented on KAFKA-9062:

Given that bulk loading was disabled, should we close this ticket?
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432797#comment-17432797 ] James Cheng commented on KAFKA-9062:

[~vvcephei], you said that in 2.6, you removed "bulk loading". Is it this JIRA? https://issues.apache.org/jira/browse/KAFKA-10005
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432794#comment-17432794 ] James Cheng commented on KAFKA-9062:

Are there any metrics that we can measure and look at, to see if we are being impacted by this issue? Anything in RocksDB or Kafka Streams? Maybe PUT latency? Thanks.
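One possible starting point: KIP-471 (Streams 2.4+) exposes RocksDB metrics, including write-stall durations, through the regular metrics registry. A hedged sketch of scanning for them; the "write-stall-duration" metric-name prefix is an assumption based on that KIP, so verify it against your Streams version.

    import java.util.Map;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.streams.KafkaStreams;

    public final class StallMetricsProbe {
        // Logs any write-stall metrics currently registered by the Streams app.
        public static void logWriteStalls(final KafkaStreams streams) {
            for (final Map.Entry<MetricName, ? extends Metric> e : streams.metrics().entrySet()) {
                final MetricName name = e.getKey();
                if (name.name().startsWith("write-stall-duration")) {
                    System.out.printf("%s %s = %s%n",
                        name.group(), name.name(), e.getValue().metricValue());
                }
            }
        }
    }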
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17151399#comment-17151399 ] John Roesler commented on KAFKA-9062:

Hi all, I would like to see KAFKA-10122 fixed ASAP, but we didn't make it a blocker for 2.6 or 2.5.1. Guozhang opened a PR for it, but it's a big change to rush in at the last minute. What we did for 2.6 was remove "bulk loading", so that compactions will happen throughout the restoration instead of in one catastrophically long compaction at the end.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149541#comment-17149541 ] Bradley Peterson commented on KAFKA-9062:

[~ableegoldman] great, thanks. That bug does sound like a match, so I'll keep an eye on it.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149532#comment-17149532 ] Sophie Blee-Goldman commented on KAFKA-9062:

Sorry, I didn't actually answer your question. It _should_ affect only polling, and thus only the max.poll.interval should apply, but there is at least that one known bug where you can actually get kicked out of the group if you fail to poll (and complete the rebalance) within the session timeout.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149530#comment-17149530 ] Sophie Blee-Goldman commented on KAFKA-9062:

[~thebearmayor] I think you could be hitting https://issues.apache.org/jira/browse/KAFKA-10122

[~guozhang] [~vvcephei] I thought we wanted to fix that in 2.5 (and maybe 2.6, but it should hopefully be less severe there given other changes). I know we thought it might be rare, but this isn't the first report I've heard that could potentially be traced back to this. Of course, it could be something else, so I don't want to jump to conclusions. Is there anything else that could explain this?
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17149437#comment-17149437 ] Bradley Peterson commented on KAFKA-9062:

I have just a question -- would these stalls block heartbeats or just polling? We have hit this issue in the past and raised our max.poll.interval.ms very high (12 hours) to work around it. But sometimes after a state restoration we still see instances kicked out of the group by the group coordinator. Our session.timeout.ms is 30 minutes, so that suggests to me that something is blocking heartbeats for over 30 minutes.
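For reference, a hedged sketch of how the timeouts discussed above are set on a Streams app; the values mirror the workaround described in this comment, not recommendations, and the application id and bootstrap server are placeholders.

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public final class TimeoutConfigSketch {
        public static Properties streamsProps() {
            final Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // Both settings are forwarded to the main consumer via the consumer prefix:
            props.put(StreamsConfig.consumerPrefix("max.poll.interval.ms"), 12 * 60 * 60 * 1000); // 12 hours
            props.put(StreamsConfig.consumerPrefix("session.timeout.ms"), 30 * 60 * 1000);        // 30 minutes
            return props;
        }
    }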
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001259#comment-17001259 ] Sophie Blee-Goldman commented on KAFKA-9062:

Just to clarify, the "simple workaround" wouldn't involve turning off batching, just autocompaction.

I get the sense the suggested bulk loading configuration is aimed at the specific case where you would like to dump a large amount of data to rocks and then start querying it. This differs slightly from what Streams actually needs to do, which is restore all that data and then start _writing_ (and also reading) from it. It occurs to me that the bulk loading mode is not targeted at our specific use case, since queries would not be stalled by excessive L0 files/compaction, only writes.

That said, as mentioned above I think we should solve this holistically, since stalled writes can still happen for other reasons. I just want to point out that the bulk loading mode may not be what we think it is.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999467#comment-16999467 ] Sophie Blee-Goldman commented on KAFKA-9062:

[~jpzk] I agree this is a bug in Streams (that's what this ticket is for :) ) but just to clarify, the PUT is stalled due to the manual compaction issued at the end of restoration, which is taking too long to compact the excessive L0 files, not due to an autocompaction triggered by the PUT itself. The reason disabling autocompaction prevents this from happening is that RocksDB just doesn't stall writes due to excessive L0 files when autocompaction is off (which makes sense, as otherwise you could get stuck forever).

As a very simple workaround, we could consider making the "bulk loading" mode optional/configurable (possibly through an augmented RocksDBConfigSetter). Users hitting this issue could simply keep autocompaction enabled during restoration to hopefully keep the L0 file count under control so new writes won't stall. This would in theory slow down the restoration, but I suspect we may not be gaining as much from the bulk loading mode as we think, since the keys are unsorted.
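A hypothetical config setter along the lines suggested here: leave auto-compaction on and relax the L0 stall triggers. The RocksDBConfigSetter interface is the real Streams hook, but the class name and trigger values below are illustrative assumptions, not tuned recommendations.

    import java.util.Map;
    import org.apache.kafka.streams.state.RocksDBConfigSetter;
    import org.rocksdb.Options;

    public class KeepCompactingConfigSetter implements RocksDBConfigSetter {
        @Override
        public void setConfig(final String storeName, final Options options,
                              final Map<String, Object> configs) {
            options.setDisableAutoCompactions(false);   // keep auto-compaction on
            options.setLevel0SlowdownWritesTrigger(40); // throttle writes later than the default
            options.setLevel0StopWritesTrigger(60);     // hard-stall writes later than the default
        }
    }

It would be registered with props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, KeepCompactingConfigSetter.class).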
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999353#comment-16999353 ] Jendrik Poloczek commented on KAFKA-9062:

I think this is actually a Kafka Streams bug, since a thread ends up deadlocked by the invocation of the first PUT (after restoration), which triggers an auto compaction (with default Kafka Streams settings). The auto compaction, however, is triggered inside PUT and the method is blocking. We profiled our application with YourKit to identify this problem. We disabled auto compaction (using the RocksDBConfigSetter) and we don't have the problem anymore (no more blocking PUT hitting the timeout). Compaction still happens at some point, but it seems Kafka Streams is aware of this (it is not deadlocked by it).

P.S. We're using Kafka Streams 2.2 and default RocksDB settings.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957416#comment-16957416 ] Boyang Chen commented on KAFKA-9062:

That's a good suggestion; we should try to build batches as close to max.batch.size as possible, if there is one.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957266#comment-16957266 ] Sophie Blee-Goldman commented on KAFKA-9062:

My understanding is that a lot of what compaction spends its time doing is sorting, so starting with more-sorted runs should make it faster. Also, to really gain a benefit from sorting we'd presumably want to build up bigger batches, so there would be fewer L0 files to compact.
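A sketch of the pre-sorting idea: buffer records in a TreeMap keyed by Bytes (which compares lexicographically on unsigned bytes) so each flushed WriteBatch is a sorted run. The class is illustrative, not the actual Streams restore path.

    import java.util.Map;
    import java.util.TreeMap;
    import org.apache.kafka.common.utils.Bytes;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.WriteBatch;
    import org.rocksdb.WriteOptions;

    public final class SortedBatchWriter {
        private final TreeMap<Bytes, byte[]> buffer = new TreeMap<>();

        public void add(final byte[] key, final byte[] value) {
            buffer.put(Bytes.wrap(key), value); // TreeMap keeps keys in sorted order
        }

        public void flush(final RocksDB db, final WriteOptions writeOptions) throws RocksDBException {
            try (WriteBatch batch = new WriteBatch()) {
                for (final Map.Entry<Bytes, byte[]> entry : buffer.entrySet()) {
                    batch.put(entry.getKey().get(), entry.getValue()); // already sorted
                }
                db.write(writeOptions, batch); // one sorted run lands in L0
            }
            buffer.clear();
        }
    }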
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956439#comment-16956439 ] Boyang Chen commented on KAFKA-9062:

[~ableegoldman] Could you provide more details on why the sorting is going to make the compaction better? What's the sorting key? It should be easy to reproduce by using a RocksDB instance to try it out.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954994#comment-16954994 ] Sophie Blee-Goldman commented on KAFKA-9062:

I think this is challenging to solve in a holistic way without a large-scale architectural change. Personally I'd vote to focus on a short-term patch for the specific case users seem to be hitting, while keeping this in mind when discussing longer-term revamping of Streams.

That's an interesting idea. One other thought I had was just to allow the bulk loading strategy to be more configurable, as it seems to work well for some users and clearly not so well for others. This could actually be related to KAFKA-8627, as I believe pre-sorting may greatly improve the compaction times.
[ https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954893#comment-16954893 ] Guozhang Wang commented on KAFKA-9062:

My current thought, which is a bit hacky, is that upon finishing restoration we could look into the RocksDB compaction stats, and if we find that it is still very hot and hence a write stall is very possible on the first put, we can keep the task idle while continuously calling consumer.poll to stay in the group until compaction cools down. But generally speaking this does not avoid write stalls; in the long run maybe we still need to consider separate threads polling from the consumer vs. threads processing from a queue. Thoughts?
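A rough sketch of this idea, with heavy assumptions: keep polling while an L0-files property suggests a stall is likely. The threshold is hypothetical, "rocksdb.num-files-at-level0" is a standard RocksDB property, and the sketch assumes the task's partitions are paused so poll() returns no records.

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;

    public final class PostRestoreIdler {
        private static final int ASSUMED_STALL_THRESHOLD = 20; // hypothetical L0 trigger

        public static void idleUntilCooledDown(final RocksDB db, final Consumer<?, ?> consumer)
                throws RocksDBException, InterruptedException {
            while (Integer.parseInt(db.getProperty("rocksdb.num-files-at-level0"))
                    >= ASSUMED_STALL_THRESHOLD) {
                consumer.poll(Duration.ZERO); // stay in the group while compaction catches up
                Thread.sleep(100);            // avoid a tight busy-loop
            }
        }
    }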