Zhanxiang (Patrick) Huang created KAFKA-8667:
------------------------------------------------

             Summary: Improve leadership transition time
                 Key: KAFKA-8667
                 URL: https://issues.apache.org/jira/browse/KAFKA-8667
             Project: Kafka
          Issue Type: Improvement
            Reporter: Zhanxiang (Patrick) Huang
            Assignee: Zhanxiang (Patrick) Huang


When a replica fetcher thread processes a fetch response, it holds the 
{{partitionMapLock}}. If a LeaderAndIsr request arrives at the same time, it 
is blocked at the end of its processing when it calls 
{{shutdownIdleFetcherThread}}: to check whether any partition is still 
assigned to each fetcher, the request handler thread must acquire the 
{{partitionMapLock}} of every replica fetcher thread, and it performs this 
check sequentially, one fetcher thread at a time.
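
To make the contention pattern concrete, here is a minimal Java sketch. It is 
not the broker code: the class and method names ({{IdleFetcherCheckSketch}}, 
{{Fetcher}}, {{countIdleSequentially}}, {{runDemo}}) are hypothetical, and the 
fair lock is an assumption made only so the sketch always terminates. Each 
simulated fetcher repeatedly holds its own lock while "processing", and a 
single checker thread has to take every lock in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class IdleFetcherCheckSketch {

    /** Hypothetical stand-in for one replica fetcher thread. */
    static final class Fetcher implements Runnable {
        // Fair lock so the checker cannot be starved in this sketch (an assumption).
        final ReentrantLock partitionMapLock = new ReentrantLock(true);
        final int partitionCount;
        final long holdMillis;
        volatile boolean running = true;

        Fetcher(int partitionCount, long holdMillis) {
            this.partitionCount = partitionCount;
            this.holdMillis = holdMillis;
        }

        @Override
        public void run() {
            while (running) {
                partitionMapLock.lock();
                try {
                    Thread.sleep(holdMillis); // simulated fetch response processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                } finally {
                    partitionMapLock.unlock();
                }
            }
        }

        /** The idle check needs the same lock the fetcher holds while processing. */
        boolean isIdle() {
            partitionMapLock.lock();
            try {
                return partitionCount == 0;
            } finally {
                partitionMapLock.unlock();
            }
        }
    }

    /** Checks fetchers one by one, as a single request handler thread would. */
    static int countIdleSequentially(List<Fetcher> fetchers) {
        int idle = 0;
        for (Fetcher f : fetchers) {
            if (f.isIdle()) {
                idle++;
            }
        }
        return idle;
    }

    /** Returns {idle fetcher count, elapsed milliseconds of the sequential check}. */
    static long[] runDemo(int fetcherCount, long holdMillis) {
        List<Fetcher> fetchers = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < fetcherCount; i++) {
            Fetcher f = new Fetcher(i % 2, holdMillis); // even-indexed fetchers own no partitions
            fetchers.add(f);
            Thread t = new Thread(f, "fetcher-" + i);
            threads.add(t);
            t.start();
        }
        long start = System.nanoTime();
        int idle = countIdleSequentially(fetchers);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        for (Fetcher f : fetchers) {
            f.running = false;
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {idle, elapsedMillis};
    }

    public static void main(String[] args) {
        long[] result = runDemo(8, 20);
        System.out.println("idle fetchers: " + result[0] + ", check took ~" + result[1] + " ms");
    }
}
```

The elapsed time varies from run to run, but it tends to grow with both the 
number of fetchers and the hold time, since the checker can wait on a busy 
lock once per fetcher.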

For example, in a cluster with 20 brokers and num.replica.fetchers set to 32, 
if each fetcher thread holds its lock a little longer, the total time the 
request handler thread needs to finish shutdownIdleFetcherThread grows much 
larger, because the extra wait for the partitionMapLock is paid once per 
fetcher thread. If the LeaderAndIsr request is blocked in the broker for 
longer than request.timeout.ms (defaults to 30s), the request send thread on 
the controller side will time out while waiting for the response, establish a 
new connection to the broker and re-send the request. This breaks in-order 
delivery because more than one channel is then talking to the broker. 
Moreover, the duplicate control requests sent to the broker may worsen the 
lock contention or saturate the request handler threads. In our own testing, 
we saw up to *8 duplicate LeaderAndIsrRequests* sent to the broker during a 
bounce, and the 99th percentile LeaderAndIsr local time went up to ~500s.
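
As a rough back-of-the-envelope check of the example above, the sketch below 
multiplies the number of fetcher threads by an average lock wait. The 50 ms 
per-thread wait is an assumed figure chosen for illustration, not a 
measurement, and the class and method names are hypothetical. It relies on 
the fact that the replica fetcher count applies per source broker, so one 
broker in a 20-broker cluster can run up to 19 * 32 fetcher threads.

```java
public class LeaderAndIsrWaitEstimate {

    /** Upper bound on fetcher threads on one broker: fetchers per source broker times other brokers. */
    static int maxFetcherThreads(int brokers, int fetchersPerSourceBroker) {
        return (brokers - 1) * fetchersPerSourceBroker;
    }

    /** Total wait if the sequential check blocks avgLockWaitMillis on every fetcher thread. */
    static long worstCaseWaitMillis(int fetcherThreads, long avgLockWaitMillis) {
        return (long) fetcherThreads * avgLockWaitMillis;
    }

    public static void main(String[] args) {
        long requestTimeoutMs = 30_000;            // request.timeout.ms default
        int threads = maxFetcherThreads(20, 32);   // 19 * 32 = 608
        long waitMs = worstCaseWaitMillis(threads, 50); // 50 ms is an assumed average wait
        System.out.println(threads + " fetcher threads, ~" + waitMs
                + " ms total wait, exceeds timeout: " + (waitMs > requestTimeoutMs));
        // prints "608 fetcher threads, ~30400 ms total wait, exceeds timeout: true"
    }
}
```

Under these assumptions, an average wait of only about 50 ms per fetcher 
thread is already enough to push the sequential check past the 30s timeout.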



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
