Zhanxiang (Patrick) Huang created KAFKA-8667:
------------------------------------------------
Summary: Improve leadership transition time
Key: KAFKA-8667
URL: https://issues.apache.org/jira/browse/KAFKA-8667
Project: Kafka
Issue Type: Improvement
Reporter: Zhanxiang (Patrick) Huang
Assignee: Zhanxiang (Patrick) Huang
When a replica fetcher thread processes a fetch response, it holds the
{{partitionMapLock}}. If a LeaderAndIsr request arrives at the same time, it
is blocked at the end of its processing when it calls
{{shutdownIdleFetcherThread}}, because the request handler thread must acquire
the {{partitionMapLock}} of each replica fetcher thread to check whether any
partitions are still assigned to that fetcher, and it performs this check
sequentially across the fetcher threads.
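The sequential check described above can be sketched roughly as follows. This is a simplified illustration, not Kafka's actual code: the {{Fetcher}} class, {{findIdleFetchers}} and the partition counts are hypothetical stand-ins, and only the locking pattern is meant to match the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class IdleFetcherCheckSketch {
    /** Hypothetical stand-in for a replica fetcher thread; not Kafka's actual class. */
    static final class Fetcher {
        final ReentrantLock partitionMapLock = new ReentrantLock();
        private final int partitionCount;

        Fetcher(int partitionCount) {
            this.partitionCount = partitionCount;
        }

        /** Blocks until the lock is free, e.g. while a fetch response is being processed. */
        int partitionCount() {
            partitionMapLock.lock();
            try {
                return partitionCount;
            } finally {
                partitionMapLock.unlock();
            }
        }
    }

    /**
     * Visits the fetchers one by one and takes each lock in turn, so the caller's
     * total wait is the sum of the individual waits for every lock.
     */
    static List<Integer> findIdleFetchers(List<Fetcher> fetchers) {
        List<Integer> idle = new ArrayList<>();
        for (int i = 0; i < fetchers.size(); i++) {
            if (fetchers.get(i).partitionCount() == 0) {
                idle.add(i);
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        List<Fetcher> fetchers =
            List.of(new Fetcher(3), new Fetcher(0), new Fetcher(5), new Fetcher(0));
        System.out.println(findIdleFetchers(fetchers)); // prints [1, 3]
    }
}
```

Because the loop cannot move on to the next fetcher until the current lock is released, one slow fetcher delays the whole check, and many slightly slow fetchers delay it cumulatively.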
For example, in a cluster with 20 brokers and num.replica.fetchers set to
32, if each fetcher thread holds its lock slightly longer than usual, the total
time for the request handler thread to finish shutdownIdleFetcherThread can grow
substantially, because the waits for the partitionMapLock of each fetcher
thread add up. If the LeaderAndIsr request is blocked in the broker for longer
than request.timeout.ms (30s by default), the request send thread on the
controller side times out while waiting for the response, establishes a new
connection to the broker, and re-sends the request. This breaks in-order
delivery because more than one channel is then talking to the broker. Moreover,
it can worsen the lock contention or saturate the request handler threads,
because duplicate control requests are sent to the broker multiple times. In
our own testing, we saw up to *8 duplicate LeaderAndIsrRequests* being sent to
the broker during a bounce, and the 99th percentile LeaderAndIsr local time
went up to ~500s.
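To make the scale of the example concrete, here is a back-of-the-envelope estimate. It rests on two assumptions that are not in the report: that a broker can run up to (brokers - 1) x 32 fetcher threads, one set per source broker, and that the handler waits a hypothetical 50 ms for each lock. Neither figure was measured.

```java
public class LeaderAndIsrWaitEstimate {
    /** Assumed upper bound: one set of fetcher threads per other broker in the cluster. */
    static int maxFetcherThreads(int brokers, int numReplicaFetchers) {
        return (brokers - 1) * numReplicaFetchers;
    }

    /** Sequential check: the waits for the individual locks simply add up. */
    static long worstCaseWaitMs(int fetcherThreads, long waitPerLockMs) {
        return (long) fetcherThreads * waitPerLockMs;
    }

    public static void main(String[] args) {
        int threads = maxFetcherThreads(20, 32);        // 19 * 32 = 608
        long waitMs = worstCaseWaitMs(threads, 50L);    // 608 * 50 ms, hypothetical per-lock wait
        long requestTimeoutMs = 30_000L;                // 30s default mentioned above

        System.out.println("fetcher threads: " + threads);
        System.out.println("estimated wait (ms): " + waitMs);
        System.out.println("exceeds request timeout: " + (waitMs > requestTimeoutMs));
    }
}
```

Under these assumptions the check takes 30,400 ms, which already exceeds the 30s timeout, so even a modest per-lock delay is enough to trigger a re-send from the controller.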