nicktelford commented on code in PR #15264: URL: https://github.com/apache/kafka/pull/15264#discussion_r1487649620
########## streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java: ########## @@ -1836,6 +1842,36 @@ public void maybeThrowTaskExceptionsFromProcessingThreads() { } } + private boolean transactionBuffersExceedCapacity = false; + + boolean transactionBuffersExceedCapacity() { + return transactionBuffersExceedCapacity; + } + + boolean transactionBuffersWillExceedCapacity() { + final boolean transactionBuffersAreUnbounded = maxUncommittedStateBytes < 0; + if (transactionBuffersAreUnbounded) { + return false; + } + + // force an early commit if the uncommitted bytes exceeds or is *likely to exceed* the configured threshold Review Comment: > I could be wrong about this, but I thought that writing to an existing key in the active memtable -- ie before it is flushed and/or made immutable -- is indeed applied as an overwrite, and does not grow the write buffer's overall size? Of course this only applies to patterns in which keys are updated multiple times between flushes, sure. Still, I feel like this is one of the major points of the write buffer: to deduplicate same-key writes as much as possible before flushing them, since once they're flushed it has to wait for a compaction to deduplicate further. This config controls the maximum memory used by the transaction buffers aka. WriteBatches. These work a bit differently to memtables/write buffers: all writes to a WriteBatch are written serially, even if the same key is touched multiple times. The size of the index it tracks over those writes _may_ vary with cardinality, but it shouldn't be enough to worry about I expect. It's worth remembering that our tracking of RocksDB transaction buffer size will not be perfect either, because WriteBatch has no means to determine its size in memory. We will be simply tracking the aggregate size of the keys and values being written to it. This means that the index and any serialization overhead used by WriteBatchWithIndex will not be included in this size estimate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org