Github user Parth-Brahmbhatt commented on the pull request: https://github.com/apache/storm/pull/354#issuecomment-85338889

Disclaimer: I did not thoroughly look at the code; I am commenting based on your design description of JStorm.

@longdafeng Did you have a chance to take a look at the current design? We are using Curator for leader election, which seems to be a very well tested library and is not really far from what you have proposed for leader election. As for the length of the code, I don't completely agree that it is a good metric for most things. Because we use an existing library, the actual code for leader election in the current PR is much smaller, 53 lines: https://github.com/Parth-Brahmbhatt/incubator-storm/blob/STORM-166/storm-core/src/clj/backtype/storm/zookeeper.clj#L250

On top of that, as part of this PR several of us had concerns about all clients connecting to zk to identify the leader nimbus, as each new zk connection is a write to zk. We have partially fixed the issue by introducing thrift APIs for nimbus discovery, which should be more efficient than the original approach, and I plan to add caching at the nimbus layer, which should further improve performance.

As @ptgoetz mentioned in the JIRA, we do not want users' topologies getting lost once nimbus accepts them, and we also do not want to force all users to depend on a fully replicated storage layer like HDFS. In the current design, by adding a code replication interface we guarantee that once a topology is in the active state it will be fully replicated, which seems to be another missing feature in your proposal. It's still a choice between availability and initial topology submission time, which users can choose based on their topology.replication.count config setting.
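
To make the trade-off concrete, here is a minimal sketch of the idea, not the code in this PR. The class `ReplicationGate` and its methods are hypothetical names used only for illustration, and `requiredReplicas` stands in for the topology.replication.count setting. A topology is allowed to become active only after enough nimbus hosts report holding a copy of its code, so a higher count means better availability but a longer wait at submission.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not taken from the PR's code. */
public class ReplicationGate {
    // Stands in for the topology.replication.count setting (assumption).
    private final int requiredReplicas;
    // Nimbus hosts that have reported holding a copy of the topology code.
    private final Set<String> hostsWithCode = new HashSet<>();

    public ReplicationGate(int requiredReplicas) {
        if (requiredReplicas < 1) {
            throw new IllegalArgumentException("requiredReplicas must be >= 1");
        }
        this.requiredReplicas = requiredReplicas;
    }

    /** Record that a nimbus host holds a copy; duplicate reports are ignored. */
    public void markReplicated(String nimbusHost) {
        hostsWithCode.add(nimbusHost);
    }

    /** Number of distinct hosts currently holding the code. */
    public int replicaCount() {
        return hostsWithCode.size();
    }

    /** The topology may become active only once enough copies exist. */
    public boolean canActivate() {
        return hostsWithCode.size() >= requiredReplicas;
    }
}
```

With `requiredReplicas` set to 1 the topology can activate as soon as the accepting nimbus has the code; larger values delay activation until more hosts have caught up.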

We also added a few more features: UI improvements, the nimbus summary being stored in zk, a thrift API modification so users can figure out the replication factor of their topologies, and compatibility with the rolling upgrade feature. All of these are, in my opinion, good admin tools, and this feature would be incomplete without them. I would appreciate any feedback you can provide based on your experience of running Nimbus HA in production for a year. Please take some time to review the current design and let us know if you have any concerns.