[GitHub] storm pull request: Storm-166: Nimbus HA design doc and implementa...

harshach Mon, 02 Feb 2015 11:21:12 -0800

Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/354#issuecomment-72519250
  
    @Parth-Brahmbhatt I apologize for not getting to this earlier. I took a 
first-pass at the code and tested it in cluster. Here are my thoughts and 
nitpicks :)
    1. nimbus.min.replication.count default should be 1 . Lets consider that 
leader nimbus is also a replica.
    Also can we rename this as "topology.replication.count" . This config 
affects topology replicas and specifying "min" in the config indicates that 
minimum has to be reached for the success but in this even if the 
min.replication is not reached topology can be deployed.
    2. nimbus.max.replication.wait.time.sec this should be higher than 0 as 
default. Can be renamed to "topology.max.replication.wait.time.secs"
    3. needs doc on how to use nimbus.min.replication.count . Either user can 
specify as part of toplogy config? another way is to 
nimbus.min.replication.count via storm jar
    example: ./bin/storm jar 
examples/storm-starter/storm-starter-topologies-0.10.0-SNAPSHOT.jar 
storm.starter.WordCountTopology wordcount3 -c "nimbus.min.replication.count=0"
    I am not sure if this value being validated anywhere i.e user can pass in 
-1 and the topology still gets uploaded.
    4. If a user kills a topology , topology jars are not being deleted from 
the non-leader replica.
    5. **Improvement** we can do this in a follow-up JIRA. If a user already 
uploads a topology and wants to increase its replication factor. Not sure if 
this is required, thought it would be helpful.
    6. **Improvement** follow-up JIRA .
    TopologySummary should also contain what its replication. Helpful for 
admin/users to check if a topology has enough replicas.
    7. renew-credentials should happen even if nimbus is leader or not . This 
make sures that a non-leader nimbus still have valid ticket. In a scenario 
where we don't renew ticket for non-leader nimbus and ticket expires. If the 
leader nimbus dies , this non-leader nimbus becomes leader which is running 
with a expired ticket.
    8. The above should also apply to cleanup-corrupt-topologies . This might 
be what causing the issue mentioned in 4.
    9. Mostly a question. From the code and doc it looks to me all the clients 
need to call zk-leader-elector to find the leader nimbus. This is ok for 
clojure or java code. There can be a usecase where users want to submit a 
topology without using StormSubmitter, is there a way to provide a thrift api or
    some other mechanism to discover the leader nimbus by passing in zookeeper 
connect string.
    
    Overall the code looks great and tested the failure scenarios during 
topology deployment and topology opertaions. It looks good to me.  Great work 
@Parth-Brahmbhatt




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] storm pull request: Storm-166: Nimbus HA design doc and implementa...

Reply via email to