[ 
https://issues.apache.org/jira/browse/KAFKA-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568572#comment-16568572
 ] 

Guozhang Wang commented on KAFKA-4479:
--------------------------------------

My understanding is that, the previous flakiness of 
`QueryableStateIntegrationTest` is not due to the timing issue that a rebalance 
is not triggered in time, but due to the fact that we did not cleanup the 
internal topics and hence the verification itself will fail even before the 
thread shutdown.

If it was indeed not the case, then we should re-open the ticket. 

> Streams tests should pass without hardcoded Time.SYSTEM in GroupCoordinator
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-4479
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4479
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams, unit tests
>            Reporter: Ismael Juma
>            Priority: Minor
>              Labels: newbie, newbie++
>         Attachments: test.zip
>
>
> If we pass `KafkaServer.time` to `GroupCoordinator`[1], some streams tests 
> like QueryableStateIntegrationTest fail sem-regularly. [~damianguy] looked 
> into it and described it as:
> {quote}
> Looking at the sequence of events, one thread is stopped, and hence leaves 
> the group triggering a rebalance, but the other thread doesn’t seem to get 
> the memo, tries to commit, fails, and then game-over.
> So.. the case that it fails the one alive thread is not getting a rebalance. 
> This would happen during  a `poll(..)` right? However i can see the thread is 
> polling many times after the other thread has shutdown.
> It tries to commit every time around the loop, so:
> poll(..)
> process(..)
> maybeCommit(..)
> and there is like < 10ms between calls to `poll`.
> {quote}
> A theory was that the mock time was not advancing enough to trigger a 
> rebalance in the group coordinator. However, the consumer is closed, so that 
> should trigger a `LeaveGroup` request and it's unclear why a rebalance is not 
> triggered for the live consumer.
> PR where this issue was first seen and discussed: 
> https://github.com/apache/kafka/pull/2095
> [1] 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaServer.scala#L222



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to