[ https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020489#comment-17020489 ]
Matthias J. Sax commented on KAFKA-9450:
----------------------------------------

For EOS, we don't write a checkpoint file, and thus we would also not add the metadata as a preserved key in the store – hence, it's unclear to me how changing where we store the offset-metadata would help for this ticket?

This ticket says we don't want to call `innerByteStore#flush` when we call `cachingStore#flush` and `changeloggingStore#flush` if EOS is enabled – however, the stores themselves are agnostic to whether EOS is enabled or not (which is a good thing IMHO). Hence, we can only avoid calling `innerByteStore#flush()` if we decouple the caching/changelog/innerBytesStores from each other. The KS runtime would then no longer call a single #flush() on the outer metered store, which wraps all other stores and implicitly flushes every wrapped store; instead, KS could access each store layer individually and flush each one as needed.

Or do you suggest to never (ie, for both the EOS and the non-EOS case) call `innerByteStore#flush()`? This might be possible, but it would have a negative impact on non-EOS, as it would make the current fault-tolerance mechanism for non-EOS less efficient (we would not have a guarantee on commit that data is flushed to disk, and might need to recover more data from the changelog topic in case of failure).

> Decouple inner state flushing from committing with EOS
> ------------------------------------------------------
>
>                 Key: KAFKA-9450
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9450
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> When EOS is turned on, the commit interval is set quite low (100ms) and all
> the store layers are flushed during a commit. This is necessary for
> forwarding records in the cache to the changelog, but unfortunately also
> forces rocksdb to flush the current memtable before it's full. The result is
> a large number of small writes to disk, losing the benefits of batching, and
> a large number of very small L0 files that are likely to slow compaction.
> Since we have to delete the stores to recreate from scratch anyways during an
> unclean shutdown with EOS, we may as well skip flushing the innermost
> StateStore during a commit and only do so during a graceful shutdown, before
> a rebalance, etc. This is currently blocked on a refactoring of the state
> store layers to allow decoupling the flush of the caching layer from the
> actual state store.
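
For illustration only, the decoupling discussed in this thread might look roughly like the sketch below. It is not Kafka Streams code: `FlushableLayer`, `RecordingLayer`, and `LayeredStoreFlusher` are hypothetical names invented here, and the real store wrappers expose a different API. The sketch assumes the runtime holds a handle to each layer, so the EOS decision lives in the runtime and the stores stay agnostic to it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one layer of a wrapped state store. */
interface FlushableLayer {
    void flush();
}

/** Test double that records each flush so the ordering can be inspected. */
final class RecordingLayer implements FlushableLayer {
    private final String name;
    private final List<String> log;

    RecordingLayer(String name, List<String> log) {
        this.name = name;
        this.log = log;
    }

    @Override
    public void flush() {
        log.add(name);
    }
}

/**
 * Hypothetical runtime-side helper: it knows whether EOS is enabled,
 * while the layers themselves do not.
 */
final class LayeredStoreFlusher {
    private final FlushableLayer caching;
    private final FlushableLayer changelogging;
    private final FlushableLayer inner;
    private final boolean eosEnabled;

    LayeredStoreFlusher(FlushableLayer caching,
                        FlushableLayer changelogging,
                        FlushableLayer inner,
                        boolean eosEnabled) {
        this.caching = caching;
        this.changelogging = changelogging;
        this.inner = inner;
        this.eosEnabled = eosEnabled;
    }

    /** On commit: always push cached records to the changelog layer. */
    void onCommit() {
        caching.flush();
        changelogging.flush();
        // Without EOS, flushing the inner store keeps the on-commit
        // durability guarantee; with EOS the flush is skipped.
        if (!eosEnabled) {
            inner.flush();
        }
    }

    /** On graceful shutdown or before a rebalance: flush every layer. */
    void onCleanClose() {
        caching.flush();
        changelogging.flush();
        inner.flush();
    }
}
```

With `eosEnabled` set to `true`, `onCommit()` touches only the caching and changelogging layers, and the inner store is flushed only by `onCleanClose()`. With `eosEnabled` set to `false`, every commit still flushes all three layers, which preserves the non-EOS guarantee mentioned in the comment.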