[
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137181#comment-14137181
]
Nicolae Marasoiu commented on KAFKA-1461:
-----------------------------------------
Hi,
So I guess in this block:
    try {
      trace("Issuing to broker %d of fetch request %s".format(sourceBroker.id, fetchRequest))
      response = simpleConsumer.fetch(fetchRequest)
    } catch {
      case t: Throwable =>
        if (isRunning.get) {
          warn("Error in fetch %s. Possible cause: %s".format(fetchRequest, t.toString))
          partitionMapLock synchronized {
            partitionsWithError ++= partitionMap.keys
          }
        }
    }
should I add a case specifically for connection timeout/refused/reset
errors and introduce a backoff on that path?
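
To make the question concrete, here is a rough, standalone sketch of the kind
of change I have in mind. It is not a patch against the fetcher thread:
FetchBackoffSketch, Backoff, isConnectionError and fetchOnce are hypothetical
names, the 100 ms base and 10 s cap are placeholder values, and the choice of
exception types to treat as connection errors is an assumption to be
reviewed.

```scala
import java.net.{SocketException, SocketTimeoutException}

object FetchBackoffSketch {

  // Hypothetical classifier. SocketException covers "connection refused"
  // (ConnectException is a subclass) and "connection reset";
  // SocketTimeoutException covers read/connect timeouts.
  def isConnectionError(t: Throwable): Boolean = t match {
    case _: SocketException | _: SocketTimeoutException => true
    case _ => false
  }

  // Capped exponential backoff; baseMs and maxMs are placeholder defaults.
  final class Backoff(baseMs: Long = 100L, maxMs: Long = 10000L) {
    private var failures = 0

    // Call after a successful fetch so the next failure starts at baseMs.
    def reset(): Unit = failures = 0

    // Returns baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
    def nextDelayMs(): Long = {
      failures += 1
      val shift = math.min(failures - 1, 20) // bound the shift to avoid overflow
      math.min(baseMs << shift, maxMs)
    }
  }

  // One fetch attempt. On a connection error, sleep before returning so the
  // caller's loop does not spin; other errors keep the existing behaviour
  // (no sleep). `sleep` is injectable so the logic can be tested without
  // actually waiting.
  def fetchOnce[T](fetch: () => T,
                   backoff: Backoff,
                   sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)): Option[T] =
    try {
      val response = fetch()
      backoff.reset()
      Some(response)
    } catch {
      case t: Throwable if isConnectionError(t) =>
        sleep(backoff.nextDelayMs())
        None
      case _: Throwable =>
        // Existing path: in the real code this is where the error is logged
        // and the partitions are added to partitionsWithError.
        None
    }
}
```

In the real thread the sleep would presumably also need to be interruptible
on shutdown (for example by waiting on a condition instead of calling
Thread.sleep), which this sketch does not attempt.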
> Replica fetcher thread does not implement any back-off behavior
> ---------------------------------------------------------------
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
> Issue Type: Improvement
> Components: replication
> Affects Versions: 0.8.1.1
> Reporter: Sam Meder
> Assignee: nicu marasoiu
> Labels: newbie++
>
> The current replica fetcher thread will retry in a tight loop if any error
> occurs during the fetch call. For example, we've seen cases where the fetch
> continuously throws a connection refused exception leading to several replica
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread,
> although the fact that erroring partitions are removed so a leader can be
> re-discovered helps some.