Github user Parth-Brahmbhatt commented on the pull request:
https://github.com/apache/storm/pull/354#issuecomment-72604552
@harshach Thanks for taking the time to review.
1. nimbus.min.replication.count=1 makes sense. Currently, in the
wait_for_desired_code_replication method I am passing "conf" rather than
"storm-conf" or "total-storm-conf" (which carry the user overrides), so this
configuration cannot be modified per topology. I hadn't considered the
scenario where users have topologies with different replication needs, but I
think it's a good idea to offer this option, so I have modified both the name
of the config and the code.
2. nimbus.max.replication.wait.time.sec=0: again, I will change the name and
code so this becomes a topology-level config. For starters, I am also updating
the default value to 60 seconds. A sketch of setting both values per topology
is below.
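For illustration, here is a minimal sketch of setting both values at topology
submission time. The exact config key strings are my assumption based on the
renaming discussed in points 1 and 2 and may differ in the final patch:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class ReplicationConfigExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... wire up spouts and bolts here ...

        Config conf = new Config();
        // Assumed key: block submission until at least 2 nimbuses have the code.
        conf.put("topology.min.replication.count", 2);
        // Assumed key: stop waiting for replication after 60 seconds (the new default).
        conf.put("topology.max.replication.wait.time.sec", 60);

        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}
```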
3. I have documented the configs and their meaning in the configuration
section of nimbus-ha-design. I will update that section to reflect the changes
made in points 1 and 2.
4. That is a new bug, and I will try to reproduce it locally. The local files
on nimbus hosts are deleted by the "clean-inbox" thread, and that thread is
scheduled on non-leader nimbuses as well, so I am not sure why this would
happen.
5. By default all nimbuses sync code, so even if a user asks for a
replication count of n, that is just a lower bound: if the cluster has m
nimbuses where m > n, all m nimbuses will replicate the topology code. For
example, with a minimum replication count of 2 on a 3-nimbus cluster,
submission unblocks once 2 nimbuses have the code, but all 3 eventually
download it. If by dynamically modifying the replication count you mean some
API like "deactivate my topology and reactivate it only when this replication
count is achieved", then yes, we can discuss it in a separate jira, but my
initial thought is that it won't be very useful, as ideally the admin should
investigate why all nimbuses don't have the topology replicated.
6. Agreed; I can do this as part of this PR itself.
7. The renew-credentials thread only renews credentials per topology and
stores them in zookeeper under /storm/credentials/storm-id. I don't think it
is responsible for renewing nimbus's own credentials. Given that a non-leader
nimbus process will never use a topology's credentials for any purpose and
should have its own TGT for nimbus operations, I am not sure we want to enable
this thread. You already know I don't understand the storm security part well
:-). Can you explain why nimbus would need topology creds?
8. The cleanup-corrupt-topology thread lists the local directory and diffs it
against the active storm-ids. It does not clean up local state; instead it
deletes the zookeeper state for storm-ids missing locally, which means an
out-of-sync nimbus host can delete a perfectly working topology (see the
sketch below). I think you mean to activate the clean-inbox thread, which
clears the local state, and that thread is already activated for non-leader
nimbuses.
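To make the danger concrete, here is a minimal sketch of the diff-and-delete
behavior described above; the real implementation lives in storm's clojure
nimbus code, and ClusterState/removeStorm below are hypothetical stand-ins:

```java
import java.util.HashSet;
import java.util.Set;

public class CorruptTopologyCleanupSketch {

    // Hypothetical stand-in for the zookeeper-backed cluster state.
    interface ClusterState {
        void removeStorm(String stormId);
    }

    static void cleanupCorruptTopologies(Set<String> activeStormIds,
                                         Set<String> localStormIds,
                                         ClusterState clusterState) {
        // storm-ids that are active in zookeeper but absent on this host.
        Set<String> missingLocally = new HashSet<>(activeStormIds);
        missingLocally.removeAll(localStormIds);

        for (String stormId : missingLocally) {
            // On a nimbus that simply has not synced the code yet, this
            // wrongly deletes the zookeeper state of a healthy topology.
            clusterState.removeStorm(stormId);
        }
    }
}
```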
9. As mentioned in the design doc, I think a thrift API is a great idea for
nimbus discovery, and I plan to add one. I felt the changes in this single PR
are already too big to review, and it might be better to do this as part of a
follow-up task. Given that both you and @revans2 have asked for it, I am going
to create a child task and submit a PR against this branch; a rough sketch of
what the API might look like is below.
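As a rough, non-authoritative sketch (the names are my assumptions, not the
actual storm thrift definitions), the discovery API might expose something
like:

```java
import java.util.List;

public interface NimbusDiscovery {
    /** Host:port of the current leader nimbus, if one has been elected. */
    String getLeader();

    /** All nimbus hosts currently participating in leader election. */
    List<String> listNimbuses();
}
```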
Thanks for the vagrant setup. I had tried and failed to get a working
multi-nimbus secure setup; with your secure multi-nimbus vagrant setup I can
now test nimbus HA in a secure environment.