[ https://issues.apache.org/jira/browse/KAFKA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738060#comment-17738060 ]

David Gammon commented on KAFKA-15116:
--------------------------------------

Hi [~mjsax], please see my responses below:
 # Message A uses an internal store to hold information about the entity. The 
store knows there is a pending event that has not yet been committed, so it 
blocks until the commit happens. The problem occurs when Message B (whose 
processor uses the same store) tries to read information about its entity: it 
blocks and times out because Message A has not been committed.
 # I think our scenario is specifically *during* a rebalance. I've seen code 
that says if the taskManager is rebalancing then do not commit.
 # This is more to do with our store and how long it takes before it times out. 
That timeout can then push past the transaction timeout, and producers get 
fenced.
 # The fix is to add a rebalancing check to the while loop in runOnce. If a 
rebalance is in progress, it sets numIterations to 0 to stop processing of 
messages; once the rebalance has completed, it sets numIterations back to 1.
 # Again I think we are talking about *during* a rebalance rather than before.
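To make the shape of the fix concrete, here is a minimal standalone sketch of the idea (not the actual StreamThread code; the class and method names other than runOnce and numIterations are hypothetical): while a rebalance is in progress, numIterations is dropped to 0 so the processing loop makes no progress, and it is restored to 1 once the rebalance completes.

```java
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of the proposed guard in the runOnce loop.
// In real Kafka Streams this logic would live in StreamThread; here it is
// reduced to a toy record queue so the control flow is visible.
public class RunOnceSketch {
    private int numIterations = 1;

    // Assumed hooks, e.g. wired to a ConsumerRebalanceListener.
    void onRebalanceStart()    { numIterations = 0; }
    void onRebalanceComplete() { numIterations = 1; }

    // Returns how many records were processed in this pass.
    int runOnce(Queue<String> records, List<String> processed) {
        int handled = 0;
        // The guard: with numIterations == 0 the loop body never runs, so no
        // record can block waiting on an uncommitted predecessor mid-rebalance.
        for (int i = 0; i < numIterations && !records.isEmpty(); i++) {
            processed.add(records.poll());
            handled++;
        }
        return handled;
    }
}
```

With this shape, Message B is simply not picked up while the rebalance is in flight, rather than being processed and then blocking on Message A's pending commit.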

Thanks,

David

 

> Kafka Streams processing blocked during rebalance
> -------------------------------------------------
>
>                 Key: KAFKA-15116
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15116
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.5.0
>            Reporter: David Gammon
>            Priority: Major
>
> We have a Kafka Streams application that simply takes a message, processes 
> it and then produces an event out the other side. The complexity is that 
> there is a requirement that all events with the same partition key must be 
> committed before the next message is processed.
> This works most of the time flawlessly but we have started to see problems 
> during deployments where the first message blocks the second message during a 
> rebalance because the first message isn’t committed before the second message 
> is processed. This ultimately results in transactions timing out and more 
> rebalancing.
> We’ve tried lots of configuration to get the behaviour we require with no 
> luck. We’ve now put in a temporary fix so that Kafka Streams works with our 
> framework but it feels like this might be a missing feature or potentially a 
> bug.
> +Example+
> Given:
>  * We have two messages (InA and InB).
>  * Both messages have the same partition key.
>  * A rebalance is in progress so streams is no longer able to commit.
> When:
>  # Message InA -> processor -> OutA (not committed)
>  # Message InB -> processor -> blocked because #1 has not been committed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
