[ https://issues.apache.org/jira/browse/SAMZA-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Riccomini updated SAMZA-440:
----------------------------------
    Attachment: SAMZA-440-0.patch

Attaching patch. No RB since it's a one liner.

> UnknownTopicOrPartitionCode results in infinite loop in BrokerProxy
> -------------------------------------------------------------------
>
>                 Key: SAMZA-440
>                 URL: https://issues.apache.org/jira/browse/SAMZA-440
>             Project: Samza
>          Issue Type: Bug
>          Components: kafka
>    Affects Versions: 0.8.0
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>             Fix For: 0.8.0
>
>         Attachments: SAMZA-440-0.patch
>
>
> We have seen several occasions where shifting partitions in a Kafka cluster 
> results in some Samza containers getting stuck with:
> {noformat}
> 2014-10-22 15:10:48 BrokerProxy [INFO] Creating new SimpleConsumer for host 
> eat1-app582.corp:10251 for system kafka
> 2014-10-22 15:10:48 BrokerProxy [WARN] Got non-recoverable error codes during 
> multifetch. Throwing an exception to trigger reconnect. Errors: 
> Error([all-service-call-events,10],3,kafka.common.UnknownTopicOrPartitionException)
> 2014-10-22 15:10:48 BrokerProxy [WARN] Restarting consumer due to 
> kafka.common.UnknownTopicOrPartitionException. Turn on debugging to get a 
> full stack trace.
> 2014-10-22 15:10:58 BrokerProxy [INFO] Creating new SimpleConsumer for host 
> eat1-app582.corp:10251 for system kafka
> 2014-10-22 15:10:58 BrokerProxy [WARN] Got non-recoverable error codes during 
> multifetch. Throwing an exception to trigger reconnect. Errors: 
> Error([all-service-call-events,10],3,kafka.common.UnknownTopicOrPartitionException)
> 2014-10-22 15:10:58 BrokerProxy [WARN] Restarting consumer due to 
> kafka.common.UnknownTopicOrPartitionException. Turn on debugging to get a 
> full stack trace.
> 2014-10-22 15:11:08 BrokerProxy [INFO] Creating new SimpleConsumer for host 
> eat1-app582.corp:10251 for system kafka
> {noformat}
> The problem appears to be a misunderstanding of how Kafka works. If a 
> partition is moved to another broker and the BrokerProxy continues fetching 
> from the old broker, it will throw an UnknownTopicOrPartitionException and 
> try to reconnect to the same broker. It will do this indefinitely. 
> Instead, the BrokerProxy should abdicate the TopicAndPartition and allow the 
> new broker to pick it up.
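
To illustrate the behaviour proposed in the description, here is a minimal Python sketch of "abdicate instead of reconnect". It is not the attached patch and not Samza's code: `ToyBrokerProxy`, `handle_errors`, and the `abdicate` callback are hypothetical names chosen for illustration. The only details taken from Kafka are the numeric error codes (3 for the unknown topic or partition error, as seen in the log above, and 6 for the "not leader for partition" error).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

TopicAndPartition = Tuple[str, int]

# Kafka error codes: 3 is the code shown in the log above; 6 is the
# "not leader for partition" code.
UNKNOWN_TOPIC_OR_PARTITION = 3
NOT_LEADER_FOR_PARTITION = 6

# Assumption for this sketch: both codes mean "this broker no longer
# serves the partition", so reconnecting to the same broker cannot help.
ABDICATE_CODES = {UNKNOWN_TOPIC_OR_PARTITION, NOT_LEADER_FOR_PARTITION}


@dataclass
class ToyBrokerProxy:
    """Hypothetical stand-in for BrokerProxy; not Samza's actual class."""
    host: str
    abdicate: Callable[[TopicAndPartition], None]
    partitions: Set[TopicAndPartition] = field(default_factory=set)

    def handle_errors(self, errors: Dict[TopicAndPartition, int]) -> None:
        """Process the per-partition error codes from one fetch response."""
        fatal = []
        for tp, code in errors.items():
            if code in ABDICATE_CODES:
                # Give the partition up so it can be reassigned to the
                # broker that now hosts it, instead of reconnecting here.
                self.partitions.discard(tp)
                self.abdicate(tp)
            elif code != 0:
                fatal.append((tp, code))
        if fatal:
            # Any other non-zero code still takes the reconnect path.
            raise RuntimeError(f"non-recoverable fetch errors: {fatal}")


# Usage: the partition from the log is released rather than retried.
released = []
proxy = ToyBrokerProxy(
    host="eat1-app582.corp:10251",
    abdicate=released.append,
    partitions={("all-service-call-events", 10)},
)
proxy.handle_errors({("all-service-call-events", 10): UNKNOWN_TOPIC_OR_PARTITION})
```

In this sketch the proxy stops owning the partition after a single failed fetch, so the ten-second reconnect cycle seen in the log would not repeat. Whoever receives the abdicated partition is then responsible for looking up its new broker.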



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
