[ https://issues.apache.org/jira/browse/KAFKA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738060#comment-17738060 ]
David Gammon commented on KAFKA-15116:
--------------------------------------

Hi [~mjsax], please see my responses below:

# Message A uses an internal store to hold information about the entity. The store knows that there is a pending event that has not yet been committed, so it blocks until the commit completes. The problem occurs when Message B (whose processor uses the same store) tries to read information about its entity: it blocks and times out because Message A has not been committed.
# I think our scenario occurs specifically *during* a rebalance. I've seen code that skips committing while the taskManager is rebalancing.
# This is more to do with our store and how long it takes to time out. That timeout can then exceed the transaction timeout, producers get fenced, etc.
# The fix is to add a check for rebalancing in the while loop in runOnce. If a rebalance is in progress, it sets numIterations to 0 to stop processing of messages; once the rebalance has completed, it sets numIterations back to 1.
# Again, I think we are talking about *during* a rebalance rather than before it.

Thanks,
David

> Kafka Streams processing blocked during rebalance
> -------------------------------------------------
>
> Key: KAFKA-15116
> URL: https://issues.apache.org/jira/browse/KAFKA-15116
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.5.0
> Reporter: David Gammon
> Priority: Major
>
> We have a Kafka Streams application that simply takes a message, processes it and then produces an event out the other side. The complexity is a requirement that all events with the same partition key must be committed before the next message is processed.
> This works flawlessly most of the time, but we have started to see problems during deployments, where the first message blocks the second during a rebalance because the first message isn't committed before the second message is processed.
> This ultimately results in transactions timing out and more rebalancing.
> We've tried lots of configuration changes to get the behaviour we require, with no luck. We've now put in a temporary fix so that Kafka Streams works with our framework, but it feels like this might be a missing feature or potentially a bug.
> +Example+
> Given:
> * We have two messages (InA and InB).
> * Both messages have the same partition key.
> * A rebalance is in progress, so Streams is no longer able to commit.
> When:
> # Message InA -> processor -> OutA (not committed)
> # Message InB -> processor -> blocked because #1 has not been committed

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
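The blocking behaviour described above can be sketched as a small plain-Java model. This is a hypothetical `PendingStore` class, not part of Kafka Streams or the reporter's framework: one latch per partition key represents an uncommitted event, a later message for the same key waits on the latch, and if no commit arrives (e.g. commits are suspended during a rebalance) the wait times out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the "internal store" in the comment: tracks, per
// partition key, whether an event has been produced but not yet committed.
public class PendingStore {
    private final Map<String, CountDownLatch> pending = new ConcurrentHashMap<>();

    // Message A produced an event for this key; commit is still outstanding.
    public void markPending(String key) {
        pending.put(key, new CountDownLatch(1));
    }

    // A commit for this key completed; release any waiter.
    public void markCommitted(String key) {
        CountDownLatch latch = pending.remove(key);
        if (latch != null) {
            latch.countDown();
        }
    }

    // Message B waits for the prior event on the same key to commit.
    // Returns false on timeout - the situation hit during a rebalance,
    // when Streams stops committing and the waiter can never be released.
    public boolean awaitCommitted(String key, long timeoutMs) throws InterruptedException {
        CountDownLatch latch = pending.get(key);
        return latch == null || latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

In this model, `markPending("k")` followed by `awaitCommitted("k", timeout)` with no intervening `markCommitted("k")` times out, mirroring InB blocking behind the uncommitted InA in the example.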