Maytee Chinavanichkit created KAFKA-6051:
--------------------------------------------

             Summary: ReplicaFetcherThread should close the 
ReplicaFetcherBlockingSend earlier on shutdown
                 Key: KAFKA-6051
                 URL: https://issues.apache.org/jira/browse/KAFKA-6051
             Project: Kafka
          Issue Type: Bug
            Reporter: Maytee Chinavanichkit


The ReplicaFetcherBlockingSend works as designed and blocks until it is able 
to get data. This becomes a problem when we are gracefully shutting down a 
broker. The controller will attempt to shut down the fetchers and elect new 
leaders. When the last partition is removed from a fetcher, the 
{replicaManager.becomeLeaderOrFollower} call proceeds to shut down any idle 
ReplicaFetcherThread. The shutdown here can block until the last fetch request 
completes. This delay is a big problem because the {replicaStateChangeLock} 
and the {mapLock} in {AbstractFetcherManager} are still held, causing latency 
spikes on multiple brokers.

At this point in time, we do not need the last response as the fetcher is 
shutting down. We should close the leaderEndpoint early during 
{initiateShutdown()} instead of after {super.shutdown()}.
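
The effect of moving the close can be illustrated with a small standalone 
sketch. This is not Kafka code: the classes BlockingEndpoint and Fetcher, the 
closeEarly flag, and the timeouts are hypothetical stand-ins, and the endpoint 
simulates a fetch response that never arrives. It only shows that closing the 
endpoint in initiateShutdown() wakes the blocked thread, while closing it after 
the join leaves the caller waiting.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class EarlyCloseSketch {

    /** Hypothetical stand-in for a blocking leader endpoint (not Kafka's class). */
    static final class BlockingEndpoint {
        private final CountDownLatch closed = new CountDownLatch(1);

        /** Blocks until close() is called, simulating a response that never arrives. */
        void receive() throws InterruptedException {
            closed.await();
            throw new IllegalStateException("endpoint closed");
        }

        void close() {
            closed.countDown();
        }
    }

    /** Hypothetical fetcher thread; closeEarly selects where the endpoint is closed. */
    static final class Fetcher extends Thread {
        private final AtomicBoolean running = new AtomicBoolean(true);
        private final CountDownLatch started = new CountDownLatch(1);
        private final BlockingEndpoint endpoint = new BlockingEndpoint();
        private final boolean closeEarly;

        Fetcher(boolean closeEarly) {
            this.closeEarly = closeEarly;
            setDaemon(true);
        }

        @Override
        public void run() {
            while (running.get()) {
                started.countDown(); // signal that a fetch is about to block
                try {
                    endpoint.receive();
                } catch (IllegalStateException | InterruptedException e) {
                    return; // endpoint closed or thread interrupted: stop fetching
                }
            }
        }

        void initiateShutdown() {
            running.set(false);
            if (closeEarly) {
                endpoint.close(); // proposed behaviour: unblock the in-flight fetch now
            }
        }

        /** Waits up to timeoutMs for the thread to exit, then closes the endpoint. */
        boolean awaitShutdown(long timeoutMs) throws InterruptedException {
            join(timeoutMs);
            boolean stopped = !isAlive();
            endpoint.close(); // late close; has no further effect if already closed
            return stopped;
        }
    }

    /** Returns true if the fetcher thread exited within timeoutMs of shutdown. */
    static boolean stopsPromptly(boolean closeEarly, long timeoutMs) {
        try {
            Fetcher fetcher = new Fetcher(closeEarly);
            fetcher.start();
            fetcher.started.await(); // make sure a fetch is in flight first
            fetcher.initiateShutdown();
            return fetcher.awaitShutdown(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("close early: " + stopsPromptly(true, 2000));
        System.out.println("close late:  " + stopsPromptly(false, 200));
    }
}
```

In this sketch the early-close variant reports true and the late-close variant 
reports false, because in the late case nothing wakes the blocked receive() 
before the join times out. In the broker, the caller would also be holding the 
locks mentioned above for that whole wait.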


For example, in the log below the shutdown blocked the broker from processing 
more replica changes for ~435 ms (the gap between "Shutting down" and 
"Stopped"):

{code}
[2017-09-01 18:11:42,879] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Stopped (kafka.server.ReplicaFetcherThread)
[2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)
{code}
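
The length of the stall can be read directly off the first two timestamps in 
the log. A minimal sketch of that arithmetic follows; the class and method 
names are made up for illustration, and the pattern string assumes the 
"yyyy-MM-dd HH:mm:ss,SSS" layout seen in these log lines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class ShutdownGap {
    // Timestamp layout as it appears in the log lines, e.g. "2017-09-01 18:11:42,879".
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps. */
    static long millisBetween(String start, String end) {
        LocalDateTime from = LocalDateTime.parse(start, FMT);
        LocalDateTime to = LocalDateTime.parse(end, FMT);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        // "Shutting down" vs. "Stopped" timestamps from the log
        System.out.println(millisBetween("2017-09-01 18:11:42,879",
                                         "2017-09-01 18:11:43,314")); // prints 435
    }
}
```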



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
