Github user harshach commented on the pull request: https://github.com/apache/storm/pull/354#issuecomment-72519250 @Parth-Brahmbhatt I apologize for not getting to this earlier. I took a first-pass at the code and tested it in cluster. Here are my thoughts and nitpicks :) 1. nimbus.min.replication.count default should be 1 . Lets consider that leader nimbus is also a replica. Also can we rename this as "topology.replication.count" . This config affects topology replicas and specifying "min" in the config indicates that minimum has to be reached for the success but in this even if the min.replication is not reached topology can be deployed. 2. nimbus.max.replication.wait.time.sec this should be higher than 0 as default. Can be renamed to "topology.max.replication.wait.time.secs" 3. needs doc on how to use nimbus.min.replication.count . Either user can specify as part of toplogy config? another way is to nimbus.min.replication.count via storm jar example: ./bin/storm jar examples/storm-starter/storm-starter-topologies-0.10.0-SNAPSHOT.jar storm.starter.WordCountTopology wordcount3 -c "nimbus.min.replication.count=0" I am not sure if this value being validated anywhere i.e user can pass in -1 and the topology still gets uploaded. 4. If a user kills a topology , topology jars are not being deleted from the non-leader replica. 5. **Improvement** we can do this in a follow-up JIRA. If a user already uploads a topology and wants to increase its replication factor. Not sure if this is required, thought it would be helpful. 6. **Improvement** follow-up JIRA . TopologySummary should also contain what its replication. Helpful for admin/users to check if a topology has enough replicas. 7. renew-credentials should happen even if nimbus is leader or not . This make sures that a non-leader nimbus still have valid ticket. In a scenario where we don't renew ticket for non-leader nimbus and ticket expires. If the leader nimbus dies , this non-leader nimbus becomes leader which is running with a expired ticket. 8. The above should also apply to cleanup-corrupt-topologies . This might be what causing the issue mentioned in 4. 9. Mostly a question. From the code and doc it looks to me all the clients need to call zk-leader-elector to find the leader nimbus. This is ok for clojure or java code. There can be a usecase where users want to submit a topology without using StormSubmitter, is there a way to provide a thrift api or some other mechanism to discover the leader nimbus by passing in zookeeper connect string. Overall the code looks great and tested the failure scenarios during topology deployment and topology opertaions. It looks good to me. Great work @Parth-Brahmbhatt
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---