Dong Lin created KAFKA-7019:
-------------------------------
Summary: Reduce the contention between metadata update and
metadata read operations
Key: KAFKA-7019
URL: https://issues.apache.org/jira/browse/KAFKA-7019
Project: Kafka
Issue Type: Improvement
Reporter: Dong Lin
Assignee: Radai Rosenblatt
Currently MetadataCache.updateCache() grabs a write lock in order to process
the UpdateMetadataRequest from the controller, and a read lock is needed in
order to handle a MetadataRequest from clients. Thus the handling of
MetadataRequest and UpdateMetadataRequest block each other, and the broker can
process only one such request at a time even if there are multiple request
handler threads. Note that the broker cannot process MetadataRequests in
parallel if there is an UpdateMetadataRequest waiting for the write lock, even
though a MetadataRequest only requires the read lock to be processed.
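
To make the contention concrete, here is a minimal Java sketch of the
lock-based pattern described above. It is a simplified, hypothetical stand-in
and not Kafka's actual MetadataCache code: the class name LockedMetadataCache,
the leaderFor() method, and the partition-to-leader map are all assumptions
made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of a read/write-locked metadata cache.
// Names and payload are illustrative; this is not Kafka's real implementation.
class LockedMetadataCache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Assumed payload: partition name -> leader broker id.
    private final Map<String, Integer> leaders = new HashMap<>();

    // Models UpdateMetadataRequest handling: needs exclusive access,
    // so every reader is shut out for the duration of the update.
    void updateCache(Map<String, Integer> changes) {
        lock.writeLock().lock();
        try {
            leaders.putAll(changes);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Models MetadataRequest handling: shared access, but a reader still
    // waits whenever a writer holds the lock.
    Integer leaderFor(String partition) {
        lock.readLock().lock();
        try {
            return leaders.get(partition);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch every call to leaderFor() has to wait while updateCache() holds
the write lock, which is the blocking behavior the description refers to.
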
For a large cluster with tens of thousands of partitions, it can take e.g.
200 ms to process an UpdateMetadataRequest or a MetadataRequest from a large
client (e.g. MirrorMaker). While a user is rebalancing the cluster, the
leadership changes cause both UpdateMetadataRequests from the controller and
MetadataRequests from clients. If a broker receives 10 MetadataRequests per
second and 2 UpdateMetadataRequests per second on average, and each takes
about 200 ms, that is roughly 12 x 200 ms = 2.4 seconds of work arriving every
second. Since these requests need to be processed one at a time, this can
reduce the request handler thread idle ratio to 0, which makes the broker
unavailable to users.
We can address this problem by removing the read/write lock in MetadataCache.
The idea is that MetadataCache.updateCache() can instantiate a new copy of the
cache as a method-local variable while it is processing the
UpdateMetadataRequest and replace the class-private variable with the newly
instantiated method-local variable at the end of MetadataCache.updateCache().
All of this can be done without grabbing any lock. The handling of a
MetadataRequest only requires access to the read-only class-private variable.
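
The proposal above could look roughly like the following Java sketch. It is
illustrative only and not the actual patch: the class name
CopyOnWriteMetadataCache, the leaderFor() method, and the partition-to-leader
map are hypothetical. Two assumptions go beyond the description: the field is
declared volatile so that reader threads reliably see the new snapshot, and
only one thread calls updateCache() at a time (concurrent updaters could
otherwise overwrite each other's changes).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lock-free, copy-and-swap idea.
// Names and payload are illustrative; this is not Kafka's real implementation.
class CopyOnWriteMetadataCache {
    // Class-private, read-only snapshot. Assumption: volatile, so the
    // reference swap is visible to reader threads.
    private volatile Map<String, Integer> snapshot = Collections.emptyMap();

    // Models UpdateMetadataRequest handling.
    // Assumption: called by one thread at a time.
    void updateCache(Map<String, Integer> changes) {
        // Build the new state in a method-local copy; readers keep using
        // the old snapshot while this runs.
        Map<String, Integer> copy = new HashMap<>(snapshot);
        copy.putAll(changes);
        // Publish the new state with a single reference assignment.
        snapshot = Collections.unmodifiableMap(copy);
    }

    // Models MetadataRequest handling: reads the current snapshot, no lock.
    Integer leaderFor(String partition) {
        return snapshot.get(partition);
    }
}
```

The trade-off in this sketch is that each update copies the whole map, in
exchange for readers never blocking on an update.
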
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)