[ https://issues.apache.org/jira/browse/KAFKA-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060037#comment-16060037 ]
ASF GitHub Bot commented on KAFKA-5502:
---------------------------------------

GitHub user asfgit closed the pull request at:

    https://github.com/apache/kafka/pull/3413

> read current brokers from zookeeper upon processing broker change
> -----------------------------------------------------------------
>
>                 Key: KAFKA-5502
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5502
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>             Fix For: 0.11.0.0
>
>
> [~lindong]'s testing of the 0.11.0 release revealed a controller-side
> performance regression in clusters with many brokers and many partitions when
> bringing up many brokers simultaneously.
> The regression is caused by KAFKA-5028: a Watcher receives WatchedEvent
> notifications from the raw ZooKeeper client EventThread. A WatchedEvent only
> contains the following information:
> - KeeperState
> - EventType
> - path
> Note that it does not actually contain the current data or current set of
> children associated with the data/child change notification. It is up to the
> user to do this lookup to see the current data or set of children.
> ZkClient is itself a Watcher. When it receives a WatchedEvent, it puts a
> ZkEvent into its own queue, which its own ZkEventThread processes. Users of
> ZkClient interact with these notifications through listeners
> (IZkDataListener, IZkChildListener). IZkDataListener actually expects as
> input the current data of the watched znode, and likewise IZkChildListener
> actually expects as input the current set of children of the watched znode.
> In order to provide this information to the listeners, the ZkEventThread,
> when processing the ZkEvent in its queue, looks up the information (either
> the current data or current set of children), simultaneously sets up the next
> watch, and passes the result to the listener.
> The regression introduced in KAFKA-5028 is the time at which we look up the
> information needed for the event processing.
> In the past, the result of the lookup done by the ZkEventThread during
> ZkEvent processing was passed to the listener, which processed it immediately
> afterwards. For instance in ZkClient.fireChildChangedEvents:
> {code}
> List<String> children = getChildren(path);
> listener.handleChildChange(path, children);
> {code}
> Now, however, there are multiple listeners that pass information looked up by
> the ZkEventThread into a ControllerEvent which gets processed potentially
> much later. For instance in BrokerChangeListener:
> {code}
> class BrokerChangeListener(controller: KafkaController) extends IZkChildListener with Logging {
>   override def handleChildChange(parentPath: String, currentChilds: java.util.List[String]): Unit = {
>     import JavaConverters._
>     controller.addToControllerEventQueue(controller.BrokerChange(currentChilds.asScala))
>   }
> }
> {code}
> In terms of impact, this:
> - increases the odds of working with stale information by the time the
> ControllerEvent gets processed.
> - can cause the cluster to take a long time to stabilize if you bring up many
> brokers simultaneously.
> In terms of how to solve it:
> - (short term) just ignore the ZkClient's information lookup and repeat the
> lookup at the start of the ControllerEvent. This increases reads from 1 read
> per change to 2 reads per change. This is the approach taken in this ticket.
> - (long term) try to remove a queue. This basically means getting rid of
> ZkClient. This is likely the approach that will be taken in KAFKA-5501. Note
> that with KAFKA-5501, we can revert this short term fix so that we reduce the
> reads from 2 reads per change back down to 1 read per change.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
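
The difference between carrying the ZkEventThread's lookup into the queued event and repeating the lookup when the event is processed can be illustrated with a small self-contained sketch. This is not Kafka or ZkClient code: `BrokerChangeSketch`, `zkBrokerIds`, `readBrokerIds` and `run` are hypothetical names, and an in-memory list stands in for the children of the broker znode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Supplier;

class BrokerChangeSketch {
    // In-memory stand-in for the children of the broker ids znode (assumption, not a ZooKeeper client).
    static final List<String> zkBrokerIds = new ArrayList<>();

    // Hypothetical stand-in for a getChildren read against ZooKeeper; returns a copy.
    static List<String> readBrokerIds() {
        return new ArrayList<>(zkBrokerIds);
    }

    // Each queued controller event is modelled as a Supplier of the broker list it will act on.
    static List<String> run(boolean reReadOnProcess) {
        zkBrokerIds.clear();
        Queue<Supplier<List<String>>> controllerEventQueue = new ArrayDeque<>();

        // Broker 1 registers; the listener is handed the children read at notification time.
        zkBrokerIds.add("1");
        List<String> snapshot = readBrokerIds();
        if (reReadOnProcess) {
            // Short-term fix: ignore the snapshot and read again when the event is processed.
            controllerEventQueue.add(BrokerChangeSketch::readBrokerIds);
        } else {
            // Regressed behaviour: the event carries the snapshot taken at enqueue time.
            controllerEventQueue.add(() -> snapshot);
        }

        // More brokers register while the event is still waiting in the controller queue.
        zkBrokerIds.add("2");
        zkBrokerIds.add("3");

        // The controller thread finally processes the event.
        return controllerEventQueue.remove().get();
    }

    public static void main(String[] args) {
        System.out.println("snapshot at enqueue time: " + run(false)); // [1]
        System.out.println("re-read at process time:  " + run(true));  // [1, 2, 3]
    }
}
```

In this sketch the re-read variant performs two reads per change (one for the unused snapshot, one at processing time), which mirrors the 1-read versus 2-reads trade-off described in the short-term fix.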