[ https://issues.apache.org/jira/browse/SAMZA-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367310#comment-14367310 ]
Chris Riccomini commented on SAMZA-607:
---------------------------------------
bq. Would it make sense for the BrokerProxy to abdicate all of its
topic-partitions after getting too many network errors, and possibly shut
itself down if it becomes empty? I think it'd be good to support brokers going
offline temporarily or even permanently.
Yep, this is pretty much what SAMZA-590 does. If there's an error, the
BrokerProxy abdicates everything. I'm going to close this as a dupe unless you
think it's something different, [~gian]?
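
For illustration only, the behaviour discussed here (count network errors, abdicate every topic-partition once a limit is hit, then report being empty so the owner can shut the proxy down) could be sketched roughly as below. This is not Samza's actual BrokerProxy code or the SAMZA-590 patch: the class name ErrorCountingProxy, the maxConsecutiveErrors threshold, the abdicationSink callback, and the use of plain strings for topic-partitions are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; names and structure are assumptions, not Samza's API.
final class ErrorCountingProxy {
    private final int maxConsecutiveErrors;         // assumed threshold, must be >= 1
    private final Consumer<String> abdicationSink;  // receives each abdicated topic-partition
    private final Set<String> owned = new HashSet<>();
    private int consecutiveErrors = 0;

    ErrorCountingProxy(int maxConsecutiveErrors, Consumer<String> abdicationSink) {
        if (maxConsecutiveErrors < 1) {
            throw new IllegalArgumentException("maxConsecutiveErrors must be >= 1");
        }
        this.maxConsecutiveErrors = maxConsecutiveErrors;
        this.abdicationSink = abdicationSink;
    }

    void addTopicPartition(String topicPartition) {
        owned.add(topicPartition);
    }

    // A successful fetch resets the error streak.
    void onFetchSuccess() {
        consecutiveErrors = 0;
    }

    // Records one network error. Returns true once the threshold is reached,
    // at which point every owned topic-partition has been handed to the sink.
    boolean onNetworkError() {
        consecutiveErrors++;
        if (consecutiveErrors < maxConsecutiveErrors) {
            return false;
        }
        List<String> toHandOff = new ArrayList<>(owned);
        owned.clear();
        consecutiveErrors = 0;
        toHandOff.forEach(abdicationSink);
        return true;
    }

    // An empty proxy owns nothing and could be shut down by its owner.
    boolean isEmpty() {
        return owned.isEmpty();
    }
}
```

With a threshold of 1 this reduces to "abdicate everything on the first error"; a larger threshold would tolerate brief blips before handing partitions back for reassignment.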
> BrokerProxy gets stuck on down brokers
> --------------------------------------
>
> Key: SAMZA-607
> URL: https://issues.apache.org/jira/browse/SAMZA-607
> Project: Samza
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Gian Merlino
>
> I took a broker offline for a few hours today and found that a Samza job was
> stuck trying to read from it while it was down, instead of switching to
> another broker in the ISR (this was a replicated topic with some partitions
> under-replicated, but all partitions available). During this time the
> BrokerProxy thread was in a retry loop logging a lot of
> ClosedChannelExceptions.
> The broker had done a clean shutdown, but I think what happened is that the
> BrokerProxy just hadn't made any calls between when that broker stopped being
> leader for its partitions and when that broker went offline. So, it never got
> a NotLeaderForPartitionException and never abdicated.
> Would it make sense for the BrokerProxy to abdicate all of its
> topic-partitions after getting too many network errors, and possibly shut
> itself down if it becomes empty? I think it'd be good to support brokers
> going offline temporarily or even permanently.