[ https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355161#comment-14355161 ]
Guozhang Wang commented on KAFKA-1461:
--------------------------------------
[~junrao] Could you elaborate a bit on "different partitions become active at
slightly different times and the fetcher doesn't actually back off"? Not sure I
understand why the fetcher does not actually back off.
I agree that when an IOException is thrown in SimpleConsumer.fetch, we should
back off the thread as a whole, which covers common case #1 you mentioned
above. At the same time, we should still consider backing off on
partition-specific error codes; otherwise the broker logs will be flooded with
error messages from the continuous retries, as we have seen before. Do you agree?
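To make the two kinds of back-off concrete, here is a rough sketch of how they could coexist. It is only an illustration: the class FetcherBackoffSketch, its method names, and the single shared backoffMs value are all made up for this comment and are not the actual fetcher code or the attached patch. Time is passed in explicitly so the behavior is easy to reason about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of thread-level plus partition-level back-off; not Kafka code. */
final class FetcherBackoffSketch {
    // Assumed: one back-off interval shared by both kinds of error.
    private final long backoffMs;
    // Partition name -> earliest time (ms) at which it may be fetched again.
    private final Map<String, Long> partitionDelayedUntilMs = new HashMap<>();
    // Earliest time (ms) at which the thread may issue any fetch at all.
    private long threadDelayedUntilMs = 0L;

    FetcherBackoffSketch(long backoffMs) {
        this.backoffMs = backoffMs;
    }

    /** Connection-level failure (e.g. an IOException from the fetch call): pause the whole thread. */
    void onConnectionError(long nowMs) {
        threadDelayedUntilMs = nowMs + backoffMs;
    }

    /** Partition-specific error code in a fetch response: pause only that partition. */
    void onPartitionError(String partition, long nowMs) {
        partitionDelayedUntilMs.put(partition, nowMs + backoffMs);
    }

    /** Partitions to include in the next fetch request; empty while the thread is backing off. */
    List<String> fetchable(List<String> partitions, long nowMs) {
        List<String> ready = new ArrayList<>();
        if (nowMs < threadDelayedUntilMs) {
            return ready;
        }
        for (String partition : partitions) {
            Long until = partitionDelayedUntilMs.get(partition);
            if (until == null || nowMs >= until) {
                ready.add(partition);
            }
        }
        return ready;
    }
}
```

With this shape, a partition that keeps returning an error code is retried at most once per back-off interval while healthy partitions keep fetching, and a refused connection stops all fetches from that thread until the interval has passed.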
> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
> Issue Type: Improvement
> Components: replication
> Affects Versions: 0.8.1.1
> Reporter: Sam Meder
> Assignee: Sriharsha Chintalapani
> Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error
> occurs during the fetch call. For example, we've seen cases where the fetch
> continuously throws a connection refused exception leading to several replica
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread,
> although it helps somewhat that erroring partitions are removed so that a
> leader can be re-discovered.