[
https://issues.apache.org/jira/browse/SAMZA-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Riccomini updated SAMZA-440:
----------------------------------
Attachment: SAMZA-440-0.patch
Attaching patch. No RB since it's a one-liner.
> UnknownTopicOrPartitionCode results in infinite loop in BrokerProxy
> -------------------------------------------------------------------
>
> Key: SAMZA-440
> URL: https://issues.apache.org/jira/browse/SAMZA-440
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.8.0
> Reporter: Chris Riccomini
> Assignee: Chris Riccomini
> Fix For: 0.8.0
>
> Attachments: SAMZA-440-0.patch
>
>
> We have seen several occasions where shifting partitions in a Kafka cluster
> results in some Samza containers getting stuck with:
> {noformat}
> 2014-10-22 15:10:48 BrokerProxy [INFO] Creating new SimpleConsumer for host
> eat1-app582.corp:10251 for system kafka
> 2014-10-22 15:10:48 BrokerProxy [WARN] Got non-recoverable error codes during
> multifetch. Throwing an exception to trigger reconnect. Errors:
> Error([all-service-call-events,10],3,kafka.common.UnknownTopicOrPartitionException)
> 2014-10-22 15:10:48 BrokerProxy [WARN] Restarting consumer due to
> kafka.common.UnknownTopicOrPartitionException. Turn on debugging to get a
> full stack trace.
> 2014-10-22 15:10:58 BrokerProxy [INFO] Creating new SimpleConsumer for host
> eat1-app582.corp:10251 for system kafka
> 2014-10-22 15:10:58 BrokerProxy [WARN] Got non-recoverable error codes during
> multifetch. Throwing an exception to trigger reconnect. Errors:
> Error([all-service-call-events,10],3,kafka.common.UnknownTopicOrPartitionException)
> 2014-10-22 15:10:58 BrokerProxy [WARN] Restarting consumer due to
> kafka.common.UnknownTopicOrPartitionException. Turn on debugging to get a
> full stack trace.
> 2014-10-22 15:11:08 BrokerProxy [INFO] Creating new SimpleConsumer for host
> eat1-app582.corp:10251 for system kafka
> {noformat}
> The problem appears to be a misunderstanding of how Kafka works. If a
> partition is moved to another broker and the BrokerProxy continues fetching
> from the old broker, it will throw an UnknownTopicOrPartitionException and
> try to reconnect to the same broker. It will do this indefinitely.
> Instead, the BrokerProxy should abdicate the TopicAndPartition, and allow the
> new broker to pick it up.
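
The behaviour proposed above can be sketched roughly as follows. This is only an illustration and is not the attached patch: AbdicatingProxy, handleFetchErrors, owned and abdicated are hypothetical names, and partitions are modelled as plain strings rather than TopicAndPartition objects. The only detail taken from the report is that error code 3 corresponds to UnknownTopicOrPartitionException, as the log lines show.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a per-broker fetcher; this is NOT Samza's BrokerProxy.
class AbdicatingProxy {
    // Kafka error code for UnknownTopicOrPartition (the "3" in the log output).
    static final short UNKNOWN_TOPIC_OR_PARTITION = 3;

    // Partitions this proxy still fetches from its broker.
    private final Set<String> owned = new HashSet<>();
    // Partitions given up so that another proxy can claim them.
    private final List<String> abdicated = new ArrayList<>();

    AbdicatingProxy(Set<String> partitions) {
        owned.addAll(partitions);
    }

    // Takes the per-partition error codes from one fetch response.
    // A partition the broker no longer knows is dropped from the fetch set
    // and recorded as abdicated, instead of triggering a reconnect to the
    // same broker.
    void handleFetchErrors(Map<String, Short> errorCodes) {
        for (Map.Entry<String, Short> entry : errorCodes.entrySet()) {
            boolean unknown = entry.getValue() == UNKNOWN_TOPIC_OR_PARTITION;
            if (unknown && owned.remove(entry.getKey())) {
                abdicated.add(entry.getKey());
            }
        }
    }

    Set<String> owned() {
        return new HashSet<>(owned);
    }

    List<String> abdicated() {
        return new ArrayList<>(abdicated);
    }
}
```

In this sketch, a caller would take each entry of abdicated(), look up the partition's current leader, and hand the partition to the proxy for that broker. How the real consumer reassigns abdicated partitions is outside what this report describes.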