[ https://issues.apache.org/jira/browse/KAFKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090765#comment-15090765 ]
ASF GitHub Bot commented on KAFKA-3038:
---------------------------------------

GitHub user enothereska opened a pull request:

    https://github.com/apache/kafka/pull/750

    KAFKA-3038: use async ZK calls to speed up leader reassignment

    Updated failure code path to deal specifically with the issue identified as affecting latency most. @fpj could you have a look please?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/enothereska/kafka kafka-3038

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #750

----
commit 3be8bb68c6ccb37b77ed527cf4ff05bc80ee8e99
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2016-01-08T16:09:38Z

    Asynchronous implementation of failure path when updating Zookeeper

commit e288c5e35d151e6e8ce06eaa1076ebb2ceb2db13
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2016-01-08T16:10:07Z

    Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038

commit 3913ab76707a6ad125b4252d88bc3cdf091702ee
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2016-01-09T18:23:33Z

    Implemented top method using a CountDownLatch. Minor code cleanup

commit a40ad4e768f1c626fc6c818c28d22f0a91d33eaf
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2016-01-09T18:24:25Z

    Merge remote-tracking branch 'apache-kafka/trunk' into kafka-3038

----

> Speeding up partition reassignment after broker failure
> -------------------------------------------------------
>
>                 Key: KAFKA-3038
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3038
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, core
>    Affects Versions: 0.9.0.0
>            Reporter: Eno Thereska
>            Assignee: Eno Thereska
>             Fix For: 0.9.0.0
>
>
> After a broker failure, the controller does several writes to Zookeeper for
> each partition on the failed broker. Writes are done one at a time, in a
> closed loop, which is slow, especially on high-latency networks. Zookeeper
> has support for batching operations (the "multi" API). It is expected that
> substituting serial writes with batched ones should reduce failure handling
> time by an order of magnitude.
> This is identified as an issue in
> https://cwiki.apache.org/confluence/display/KAFKA/kafka+Detailed+Replication+Design+V3
> (section End-to-end latency during a broker failure)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
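
For illustration only, the gap between closed-loop serial writes and asynchronous writes awaited with a CountDownLatch can be sketched as below. This is not the code from the pull request and it does not use the ZooKeeper client: `SlowStore`, `writeAsync`, `writeSync`, the paths, and the 20 ms latency are all hypothetical stand-ins chosen so the example runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class AsyncWriteSketch {

    /** Hypothetical stand-in for a remote store: every write completes after a fixed delay. */
    static final class SlowStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        private final long latencyMs;

        SlowStore(long latencyMs) { this.latencyMs = latencyMs; }

        /** Starts a write and returns immediately; onDone runs once the write has landed. */
        void writeAsync(String path, String value, Runnable onDone) {
            timer.schedule(() -> { data.put(path, value); onDone.run(); },
                           latencyMs, TimeUnit.MILLISECONDS);
        }

        /** Closed-loop write: blocks the caller for one full round trip. */
        void writeSync(String path, String value) {
            CountDownLatch done = new CountDownLatch(1);
            writeAsync(path, value, done::countDown);
            awaitQuietly(done);
        }

        int size() { return data.size(); }

        void close() { timer.shutdownNow(); }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writes", e);
        }
    }

    /** One write at a time: total time grows as (number of paths) x (latency). */
    static long serialMillis(SlowStore store, List<String> paths) {
        long start = System.nanoTime();
        for (String path : paths) {
            store.writeSync(path, "leader=1");
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Issue every write first, then wait once on a latch sized to the number of writes. */
    static long pipelinedMillis(SlowStore store, List<String> paths) {
        long start = System.nanoTime();
        CountDownLatch all = new CountDownLatch(paths.size());
        for (String path : paths) {
            store.writeAsync(path, "leader=1", all::countDown);
        }
        awaitQuietly(all);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        SlowStore store = new SlowStore(20); // assumed 20 ms per write
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            first.add("/serial/partition-" + i);
            second.add("/pipelined/partition-" + i);
        }
        System.out.println("serial ms:    " + serialMillis(store, first));
        System.out.println("pipelined ms: " + pipelinedMillis(store, second));
        store.close();
    }
}
```

In this sketch the serial loop pays one simulated round trip per write, while the pipelined version overlaps the waiting and blocks only once. A real controller would also need to check the result of each callback, which this sketch leaves out.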