[ https://issues.apache.org/jira/browse/KAFKA-12503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304523#comment-17304523 ]
Guozhang Wang commented on KAFKA-12503:
---------------------------------------

Thanks for the summary [~wcarlson5] [~ableegoldman].

I'm wondering if we should fix this together with https://issues.apache.org/jira/browse/KAFKA-12500 by adding a separate function in the cache that just frees the space corresponding to a thread without triggering eviction, i.e. one that just clears the records in the buffer. What is your current proposal for the fix?

> Resizing the thread cache in a non thread safe way can cause records to be
> redirected throughout the topology
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12503
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.8.0
>            Reporter: Walker Carlson
>            Priority: Blocker
>             Fix For: 2.8.0
>
>
> When a thread is added, removed or replaced, the cache is resized. The
> resize was being performed on the thread initiating these calls rather than
> on the thread that owns the cache. This can cause a record to be redirected
> to the wrong processor via the call to `evict` in the cache. The evict
> flushes records downstream to the next processor after the cache, but if it
> runs on the wrong thread, the wrong processor receives them.
> This can cause 3 problems.
> 1) When the owner finishes processing the record, it sets the current node
> to null in the processor context, and this then causes the other processor
> to throw an exception `StreamsException: Current node is unknown.`.
> 2) Depending on the type, it can cause a class cast exception because the
> record is a different type. Mostly this happened when the value types were
> different inside of the map node from the toStream method.
> 3) A silent issue is that it could cause data to be processed by the wrong
> node and cause data corruption. We have not been able to confirm this last
> one, but it is the most dangerous in many ways.
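To make the failure mode and one possible direction concrete, here is a minimal sketch of a per-thread cache in which a resize requested from another thread only records the new limit, and eviction (which forwards records downstream) only ever runs on the owning thread. This is not Kafka's `ThreadCache`: the class `DeferredResizeCache`, its methods, and the entry-count limit are all hypothetical names and simplifications chosen for illustration, and the actual fix for this ticket may look quite different.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Hypothetical sketch, NOT Kafka's ThreadCache: one cache owned by one stream thread.
class DeferredResizeCache {
    private final Thread owner;                          // only this thread may evict
    private final AtomicLong maxEntries;                 // limit, may be set by any thread
    private final LinkedHashMap<String, String> entries = new LinkedHashMap<>();
    private final BiConsumer<String, String> downstream; // stands in for the next processor

    DeferredResizeCache(Thread owner, long maxEntries, BiConsumer<String, String> downstream) {
        this.owner = owner;
        this.maxEntries = new AtomicLong(maxEntries);
        this.downstream = downstream;
    }

    // Safe to call from any thread: records the new limit but never evicts,
    // so no record is forwarded downstream on the caller's thread.
    void resize(long newMaxEntries) {
        maxEntries.set(newMaxEntries);
    }

    // Owner thread only: insert, then evict down to whatever the limit is now.
    void put(String key, String value) {
        requireOwner();
        entries.put(key, value);
        evictToLimit();
    }

    // Owner thread only: number of entries currently cached.
    int size() {
        requireOwner();
        return entries.size();
    }

    // Evicts oldest entries first and forwards each one downstream.
    private void evictToLimit() {
        Iterator<Map.Entry<String, String>> it = entries.entrySet().iterator();
        while (entries.size() > maxEntries.get() && it.hasNext()) {
            Map.Entry<String, String> eldest = it.next();
            it.remove();
            downstream.accept(eldest.getKey(), eldest.getValue());
        }
    }

    private void requireOwner() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("cache accessed from a non-owner thread");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> forwardedOn = new ArrayList<>();
        DeferredResizeCache cache = new DeferredResizeCache(
            Thread.currentThread(), 3,
            (k, v) -> forwardedOn.add(Thread.currentThread().getName()));
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");

        // Another thread shrinks the cache; nothing is forwarded at this point.
        Thread resizer = new Thread(() -> cache.resize(1), "resizer");
        resizer.start();
        resizer.join();
        System.out.println("forwarded after resize: " + forwardedOn.size());

        // The next write by the owner applies the new limit on the owner's thread.
        cache.put("d", "4");
        System.out.println("forwarded on threads: " + forwardedOn);
        System.out.println("entries left: " + cache.size());
    }
}
```

The design choice here is that the shared state between threads is reduced to a single atomic limit, so the initiating thread can never reach the downstream processor. The alternative raised in the comment above, a function that frees a thread's space by clearing the buffer without evicting, would be a different method on the same kind of class.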