[ https://issues.apache.org/jira/browse/KAFKA-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995257#comment-15995257 ]

Colin P. McCabe commented on KAFKA-5004:
----------------------------------------

Thanks for filing this, [~mjsax].  I think the severity is mitigated somewhat 
by the fact that there has to be a client-side bug (polling thread dies) to 
trigger the bad behavior.

bq. IMHO, a "clean" solution would be to disable the heartbeat thread if the
client connects to a 0.10.0 broker and to send heartbeats on poll() as the
0.10.0 consumer does. Not sure how complex this would be to do, though.

I think this would be a bit risky since we'd be adding code that only ever gets 
used in a very obscure error path when talking to 0.10.0 brokers.  It's not 
likely to be well-tested.

bq. [~cmccabe] had the idea to set a "flag" on the heartbeat thread each time
poll() is called, and let the heartbeat thread stop if max.poll.interval.ms
passed and the flag was not "renewed".

Yeah, this might be a good option.
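A minimal Java sketch of that flag idea, for illustration only. The class and method names (PollGuardedHeartbeat, onPoll, shouldHeartbeat, simulateStop) are hypothetical and are not Kafka's consumer internals. The polling thread stores a timestamp on every poll(), and the heartbeat thread checks it before each heartbeat and stops once max.poll.interval.ms has passed without a renewal. Once heartbeats stop, even a 0.10.0 broker, which only understands the heartbeat (session) timeout, would eventually expire the member and trigger a rebalance.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch (not Kafka's actual consumer code) of the "flag" idea:
 * the polling thread records the time of every poll() call, and the heartbeat
 * thread stops heartbeating once max.poll.interval.ms has passed without the
 * flag being renewed. Time is passed in explicitly so the behavior is
 * deterministic; a real client would read a clock instead.
 */
public class PollGuardedHeartbeat {
    private final long maxPollIntervalMs;
    // Shared between the polling thread (writer) and the heartbeat thread (reader).
    private final AtomicLong lastPollMs;

    public PollGuardedHeartbeat(long maxPollIntervalMs, long startMs) {
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastPollMs = new AtomicLong(startMs);
    }

    /** Polling thread: "renew" the flag on every poll(). */
    public void onPoll(long nowMs) {
        lastPollMs.set(nowMs);
    }

    /** Heartbeat thread: keep heartbeating only while poll() was called recently. */
    public boolean shouldHeartbeat(long nowMs) {
        return nowMs - lastPollMs.get() <= maxPollIntervalMs;
    }

    /**
     * Simulates a polling thread that calls poll() on every tick up to
     * lastPollAtMs and then dies. Returns the time at which the heartbeat
     * thread first skips a heartbeat (i.e. stops), or -1 if it never does
     * within the horizon.
     */
    public static long simulateStop(long maxPollIntervalMs, long heartbeatIntervalMs,
                                    long lastPollAtMs, long horizonMs) {
        PollGuardedHeartbeat guard = new PollGuardedHeartbeat(maxPollIntervalMs, 0L);
        for (long t = 0; t <= horizonMs; t += heartbeatIntervalMs) {
            if (t <= lastPollAtMs) {
                guard.onPoll(t); // polling thread is still alive
            }
            if (!guard.shouldHeartbeat(t)) {
                // Heartbeats stop here, so the broker's session timeout can now expire.
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 300000 ms and 3000 ms are the defaults for max.poll.interval.ms and
        // heartbeat.interval.ms; the polling thread "dies" after t = 9000 ms.
        long stoppedAt = simulateStop(300_000L, 3_000L, 9_000L, 600_000L);
        System.out.println("heartbeat thread stopped at t=" + stoppedAt + " ms");
    }
}
```

Running main simulates a polling thread that dies after t = 9000 ms; with a 300000 ms poll interval, the heartbeat thread skips its first heartbeat at t = 312000 ms. Passing time in explicitly keeps the sketch deterministic; a real client would read a clock and sleep between heartbeats.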

> poll() timeout not enforced when connecting to 0.10.0 broker
> ------------------------------------------------------------
>
>                 Key: KAFKA-5004
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5004
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>    Affects Versions: 0.10.2.0
>            Reporter: Matthias J. Sax
>
> In 0.10.1, the heartbeat thread and the new poll timeout
> {{max.poll.interval.ms}} were introduced via KIP-62. In 0.10.2, we added
> client-broker backward compatibility.
> Now, if a 0.10.2 client connects to a 0.10.0 broker, the broker only
> understands the heartbeat timeout but not the poll timeout, while the client
> is still using the background heartbeat thread. Thus, the new client config
> {{max.poll.interval.ms}} is ignored.
> In the worst case, the polling thread might die while the heartbeat thread is
> still up. Thus, the broker would not time out the client and no rebalance
> would be triggered, while at the same time the client is effectively dead,
> making no progress on its assigned partitions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
