[ https://issues.apache.org/jira/browse/KAFKA-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-6630.
------------------------------
Resolution: Fixed
Fix Version/s: 1.2.0
> Speed up the processing of TopicDeletionStopReplicaResponseReceived events on the controller
> --------------------------------------------------------------------------------------------
>
> Key: KAFKA-6630
> URL: https://issues.apache.org/jira/browse/KAFKA-6630
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Reporter: Lucas Wang
> Assignee: Lucas Wang
> Priority: Minor
> Fix For: 1.2.0
>
>
> Problem Statement:
> We find that in a large cluster with many partition replicas, it takes a long
> time for a topic deletion to complete.
> Root cause:
> Further analysis shows that for a topic with N replicas, the controller
> receives all N StopReplicaResponses from the brokers within a short time;
> however, handling the N resulting TopicDeletionStopReplicaResponseReceived
> events sequentially, one by one, takes a long time.
> Specifically, handling each TopicDeletionStopReplicaResponseReceived event
> triggers the following call chain:
> TopicDeletionStopReplicaResponseReceived.process calls
> TopicDeletionManager.completeReplicaDeletion, which calls
> TopicDeletionManager.resumeDeletions, which in turn calls several
> inefficient functions.
> The inefficient functions called inside TopicDeletionManager.resumeDeletions
> include:
> ReplicaStateMachine.areAllReplicasForTopicDeleted
> ReplicaStateMachine.isAtLeastOneReplicaInDeletionStartedState
> ReplicaStateMachine.replicasInState
> Each of the three functions above iterates through all the replicas in the
> cluster and selects the ones belonging to the topic in question. In a large
> cluster with many replicas, these scans can be quite slow.
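To make the cost of these scans concrete, here is a minimal Python sketch (not Kafka's Scala code) that models the controller's replica states as a hypothetical dict keyed by (topic, partition, broker id) and counts how many entries a per-topic full scan visits. For contrast it also builds a per-topic index. This issue does not describe how the fix was implemented, so the index is only one illustrative way to avoid the scan. All function names are hypothetical; the state strings borrow Kafka replica-state names but are plain strings here.

```python
from collections import defaultdict

# Hypothetical model: (topic, partition, broker_id) -> replica state string.


def replicas_for_topic_scan(replica_states, topic):
    """Full scan: visits every replica in the cluster and keeps those of `topic`.
    Returns (matching replicas, number of entries visited)."""
    visited = 0
    result = []
    for key, state in replica_states.items():
        visited += 1
        if key[0] == topic:
            result.append((key, state))
    return result, visited


def build_topic_index(replica_states):
    """Group replica keys by topic once, so later per-topic queries skip other topics."""
    index = defaultdict(list)
    for key in replica_states:
        index[key[0]].append(key)
    return index


def replicas_for_topic_indexed(replica_states, index, topic):
    """Indexed lookup: visits only the replicas of `topic`.
    States are read from the live map, so state changes need no index update."""
    keys = index.get(topic, [])
    return [(k, replica_states[k]) for k in keys], len(keys)


def all_replicas_deleted(replicas):
    """Analogue of an 'are all replicas for this topic deleted' check."""
    return all(state == "ReplicaDeletionSuccessful" for _, state in replicas)


if __name__ == "__main__":
    # 1000 topics x 10 partitions x 3 replicas = 30000 replicas in the cluster.
    states = {(f"t{i}", p, b): "OnlineReplica"
              for i in range(1000) for p in range(10) for b in range(3)}
    index = build_topic_index(states)
    _, scanned = replicas_for_topic_scan(states, "t0")
    _, looked_up = replicas_for_topic_indexed(states, index, "t0")
    print(scanned, looked_up)  # prints "30000 30"
```

The scan cost grows with the size of the whole cluster, while the indexed lookup grows only with the size of the topic. The trade-off is that the index must be kept in sync whenever replicas are added or removed.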
> Total deletion time for a topic becomes long under the single-threaded
> controller processing model:
> since the controller must process the queued
> TopicDeletionStopReplicaResponseReceived events sequentially, if processing
> one event takes time t, processing the events for all N replicas of a topic
> takes N * t.
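Combining this with the per-event scans described above gives a rough cost model. As a simplifying assumption, let R be the total number of replicas in the cluster, k the number of full passes over the replica states per event (at least one for each of the three functions listed, so k ≥ 3), and c the cost of visiting one entry; other per-event work is ignored:

```latex
t \approx k \, c \, R, \qquad
T_{\text{delete}} = N \cdot t \approx k \, c \, N R = O(N R)
```

So deletion time grows with the product of the topic's replica count and the cluster's replica count. With illustrative (not measured) values k = 3, N = 1{,}000 and R = 100{,}000, a single topic deletion already implies about kNR = 3 \times 10^{8} replica-state visits.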
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)