[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355301#comment-14355301
 ] 

Jun Rao commented on KAFKA-1461:
--------------------------------

[~guozhang], my concern is with the implementation of DelayedItem. If you 
create a bunch of DelayedItems with the same timeout, they may time out at 
slightly different moments, since the calculation depends on the current time, 
which can change. In the second case, when the leaders are moved one at a time, 
the controller will tell the broker to move to the right leader right away. 
This typically happens within a few milliseconds. We could optimize this case, 
but I am not sure it's worth the extra complexity in the code. In the first 
case, the remaining shutdown process could take seconds after the socket 
server is shut down, so backing off will definitely help.
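
To make the DelayedItem concern concrete, here is a minimal Java sketch. It is 
not Kafka's actual DelayedItem class: Item, createPerItemClock, 
createSharedBase and the fake clock are hypothetical names used only for 
illustration. It contrasts reading the clock once per item with reading it 
once for the whole batch.

```java
import java.util.function.LongSupplier;

public class DelayedItemSketch {
    /** Hypothetical stand-in for a delayed item; not Kafka's DelayedItem. */
    static final class Item {
        final long dueMs;
        Item(long dueMs) { this.dueMs = dueMs; }
    }

    /** Each item reads the clock itself, so deadlines drift as time advances. */
    static Item[] createPerItemClock(int n, long timeoutMs, LongSupplier clock) {
        Item[] items = new Item[n];
        for (int i = 0; i < n; i++) {
            items[i] = new Item(clock.getAsLong() + timeoutMs);
        }
        return items;
    }

    /** The clock is read once and shared, so every item gets the same deadline. */
    static Item[] createSharedBase(int n, long timeoutMs, LongSupplier clock) {
        long baseMs = clock.getAsLong();
        Item[] items = new Item[n];
        for (int i = 0; i < n; i++) {
            items[i] = new Item(baseMs + timeoutMs);
        }
        return items;
    }

    public static void main(String[] args) {
        // Fake clock that advances 1 ms on every read, to make the drift visible.
        long[] now = {0L};
        LongSupplier fakeClock = () -> now[0]++;

        Item[] drifting = createPerItemClock(3, 100, fakeClock);
        Item[] aligned = createSharedBase(3, 100, fakeClock);

        System.out.println("per-item: " + drifting[0].dueMs + " .. " + drifting[2].dueMs);
        System.out.println("shared:   " + aligned[0].dueMs + " .. " + aligned[2].dueMs);
    }
}
```

With a real clock the per-item deadlines may or may not differ, depending on 
how quickly the items are created. The fake clock just forces the case where 
they do.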

Perhaps we can just do a simple experiment with controlled shutdown and see how 
serious the issue is without backing off.
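
For reference, the kind of back-off being discussed could look roughly like 
the Java sketch below. It is an illustration under assumptions, not the 
attached patch or the replica fetcher's real code: fetchWithBackoff, Sleeper, 
and the fixed backoffMs pause are all hypothetical.

```java
import java.util.concurrent.Callable;

public class FetchBackoffSketch {
    /** Hypothetical hook so the pause can be observed in tests. */
    interface Sleeper { void sleep(long ms) throws InterruptedException; }

    /**
     * Calls fetch until it succeeds or maxAttempts is reached, pausing for
     * backoffMs after each failure instead of retrying immediately.
     * Returns the number of attempts made.
     */
    static int fetchWithBackoff(Callable<Boolean> fetch, int maxAttempts,
                                long backoffMs, Sleeper sleeper)
            throws InterruptedException {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                if (fetch.call()) {
                    return attempts;
                }
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                // e.g. connection refused while the leader is shutting down
            }
            if (attempts < maxAttempts) {
                sleeper.sleep(backoffMs); // back off rather than spin
            }
        }
        return attempts;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulated fetch: fails twice with an exception, then succeeds.
        Callable<Boolean> flaky = () -> {
            if (++calls[0] < 3) {
                throw new java.net.ConnectException("Connection refused");
            }
            return true;
        };
        int attempts = fetchWithBackoff(flaky, 10, 50, Thread::sleep);
        System.out.println("attempts=" + attempts);
    }
}
```

Without the sleep call, the same loop retries as fast as the failing call 
returns, which is the tight loop described in the issue.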

> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
>                 Key: KAFKA-1461
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1461
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.1.1
>            Reporter: Sam Meder
>            Assignee: Sriharsha Chintalapani
>              Labels: newbie++
>             Fix For: 0.8.3
>
>         Attachments: KAFKA-1461.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
