[ 
https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020489#comment-17020489
 ] 

Matthias J. Sax commented on KAFKA-9450:
----------------------------------------

For EOS, we don't write a checkpoint file, and thus we would also not add the 
metadata as a preserved key in the store. Hence, it's unclear to me how 
changing where we store the offset metadata would help for this ticket. This 
ticket says we don't want to call `innerByteStore#flush` when we call 
`cachingStore#flush` and `changeloggingStore#flush` if EOS is enabled. 
However, the stores themselves are agnostic to whether EOS is enabled or not 
(which is a good thing IMHO). Hence, we can only avoid calling 
`innerByteStore#flush()` if we decouple the caching, changelogging, and inner 
bytes stores from each other. Instead of the KS runtime calling a single 
`#flush()` on the outer metered store, which wraps all other stores and 
implicitly flushes all wrapped stores, KS would need to access each store 
layer individually and flush them individually as needed.
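
To illustrate the decoupling described above, here is a minimal Java sketch. 
It is not the Kafka Streams API: the `Layer` interface, `RecordingLayer`, and 
`flushOnCommit` are hypothetical names introduced only for this example. The 
point it shows is that the EOS decision lives in the runtime, while the store 
layers stay unaware of EOS.

```java
import java.util.ArrayList;
import java.util.List;

class LayeredFlushSketch {

    /** Hypothetical minimal store-layer interface; NOT the Kafka Streams StateStore API. */
    interface Layer {
        void flush();
    }

    /** Test double that records its name on every flush so call order can be inspected. */
    static final class RecordingLayer implements Layer {
        private final String name;
        private final List<String> log;

        RecordingLayer(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void flush() {
            log.add(name);
        }
    }

    /**
     * Decoupled commit-time flush: the runtime (which knows about EOS) flushes
     * each layer individually and skips the innermost store under EOS.
     * The layers themselves never see the EOS flag.
     */
    static void flushOnCommit(Layer caching, Layer changelogging, Layer innerBytes,
                              boolean eosEnabled) {
        caching.flush();        // forward cached records downstream
        changelogging.flush();  // make sure records reach the changelog
        if (!eosEnabled) {
            innerBytes.flush(); // only force a disk flush when not running EOS
        }
    }

    /** Runs one simulated commit and returns the names of the layers that were flushed. */
    static List<String> commit(boolean eosEnabled) {
        List<String> log = new ArrayList<>();
        flushOnCommit(new RecordingLayer("caching", log),
                      new RecordingLayer("changelogging", log),
                      new RecordingLayer("innerBytes", log),
                      eosEnabled);
        return log;
    }

    public static void main(String[] args) {
        System.out.println("EOS:     " + commit(true));
        System.out.println("non-EOS: " + commit(false));
    }
}
```

In this sketch the innermost layer is flushed on commit only in the non-EOS 
case; a graceful shutdown or rebalance would need its own path that flushes 
all three layers.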

Or do you suggest to never (i.e., for both the EOS and non-EOS case) call 
`innerByteStore#flush()`? This might be possible, but it would have a negative 
impact on non-EOS, as it would make the current fault-tolerance mechanism for 
non-EOS less efficient (we would not have a guarantee on commit that data is 
flushed to disk and might need to recover more data from the changelog topic in 
case of failure).
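
The recovery cost mentioned above can be made concrete with a small Java 
sketch. The class, the method `recordsToReplay`, and all offsets below are 
made up for illustration; the only assumption is that restoration replays the 
changelog from the last offset known to be durable in the local store up to 
the changelog end offset.

```java
class ChangelogReplaySketch {

    /**
     * Number of changelog records to replay after a failure, assuming restoration
     * starts at the last offset known to be durable locally (hypothetical model).
     */
    static long recordsToReplay(long changelogEndOffset, long durableOffset) {
        if (durableOffset < 0 || durableOffset > changelogEndOffset) {
            throw new IllegalArgumentException("durable offset out of range");
        }
        return changelogEndOffset - durableOffset;
    }

    public static void main(String[] args) {
        long endOffset = 1_000_000L; // illustrative changelog end offset

        // Flushing on every commit: the durable offset trails the end by at most
        // one commit interval's worth of records (100 here, an invented figure).
        long withFlushOnCommit = recordsToReplay(endOffset, 999_900L);

        // Never flushing explicitly: durability depends on when the store happened
        // to flush on its own, so the durable offset may lag much further behind.
        long withoutFlushOnCommit = recordsToReplay(endOffset, 600_000L);

        System.out.println("replay with flush on commit:    " + withFlushOnCommit);
        System.out.println("replay without flush on commit: " + withoutFlushOnCommit);
    }
}
```

The specific numbers carry no meaning; the sketch only shows that the replay 
range grows as the durable offset falls further behind the committed position.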

> Decouple inner state flushing from committing with EOS
> ------------------------------------------------------
>
>                 Key: KAFKA-9450
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9450
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Major
>
> When EOS is turned on, the commit interval is set quite low (100ms) and all 
> the store layers are flushed during a commit. This is necessary for 
> forwarding records in the cache to the changelog, but unfortunately also 
> forces RocksDB to flush the current memtable before it's full. The result is 
> a large number of small writes to disk, losing the benefits of batching, and 
> a large number of very small L0 files that are likely to slow compaction.
> Since we have to delete the stores to recreate from scratch anyways during an 
> unclean shutdown with EOS, we may as well skip flushing the innermost 
> StateStore during a commit and only do so during a graceful shutdown, before 
> a rebalance, etc. This is currently blocked on a refactoring of the state 
> store layers to allow decoupling the flush of the caching layer from the 
> actual state store.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
