Github user Parth-Brahmbhatt commented on the pull request:
https://github.com/apache/storm/pull/354#issuecomment-72604552
@harshach Thanks for taking the time to review.
1. nimbus.min.replication.count=1 makes sense. Currently, in the
wait_for_desired_code_replication method I am passing "conf" rather than
"storm-conf" or "total-storm-conf" (which carry the user overrides), so this
configuration cannot be modified per topology. I hadn't considered the
scenario where users have topologies with different replication needs, but I
think it's a good idea to offer this option, so I have modified both the name
of the config and the code.
2. nimbus.max.replication.wait.time.sec=0: again, I will change the name and
code so this becomes a topology-level config. For starters, I am also updating
the default value to 60 seconds. A sketch of setting both values per topology
is below.
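For illustration, here is a minimal sketch of setting both values at topology
submission time. The exact config key strings are my assumption based on the
renaming discussed in points 1 and 2 and may differ in the final patch:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class ReplicationConfigExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... wire up spouts and bolts here ...

        Config conf = new Config();
        // Assumed key: block submission until at least 2 nimbuses have the code.
        conf.put("topology.min.replication.count", 2);
        // Assumed key: stop waiting for replication after 60 seconds (the new default).
        conf.put("topology.max.replication.wait.time.sec", 60);

        StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
    }
}
```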
3. I have documented the configs and their meaning in the configuration
section of nimbus-ha-design. I will update that section to reflect the changes
made in points 1 and 2.
4. That is a new bug, and I will try to reproduce it locally. The local files
on nimbus hosts are deleted by the "clean-inbox" thread, and that thread is
scheduled on non-leader nimbuses as well, so I am not sure why this would
happen.
5. By default all nimbuses sync code, so even if a user asks for a
replication count of n, that is just a lower bound: if the cluster has m
nimbuses where m > n, all m nimbuses will replicate the topology code. For
example, with a minimum replication count of 2 on a 3-nimbus cluster,
submission unblocks once 2 nimbuses have the code, but all 3 eventually
download it. If by dynamically modifying the replication count you mean some
API like "deactivate my topology and reactivate it only when this replication
count is achieved", then yes, we can discuss it in a separate jira, but my
initial thought is that it won't be very useful, as ideally the admin should
investigate why all nimbuses don't have the topology replicated.
6. Agreed; I can do this as part of this PR itself.
7. The renew-credentials thread only renews credentials per topology and
stores them in zookeeper under /storm/credentials/storm-id. I don't think it
is responsible for renewing nimbus's own credentials. Given that a non-leader
nimbus process will never use a topology's credentials for any purpose and
should have its own TGT for nimbus operations, I am not sure we want to enable
this thread. You already know I don't understand the storm security part well
:-). Can you explain why nimbus would need topology creds?
8. The cleanup-corrupt-topology thread lists the local directory and diffs it
against the active storm-ids. It does not clean up local state; instead it
deletes the zookeeper state for storm-ids missing locally, which means an
out-of-sync nimbus host can delete a perfectly working topology (see the
sketch below). I think you mean to activate the clean-inbox thread, which
clears the local state, and that thread is already activated for non-leader
nimbuses.
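To make the danger concrete, here is a minimal sketch of the diff-and-delete
behavior described above; the real implementation lives in storm's clojure
nimbus code, and ClusterState/removeStorm below are hypothetical stand-ins:

```java
import java.util.HashSet;
import java.util.Set;

public class CorruptTopologyCleanupSketch {

    // Hypothetical stand-in for the zookeeper-backed cluster state.
    interface ClusterState {
        void removeStorm(String stormId);
    }

    static void cleanupCorruptTopologies(Set<String> activeStormIds,
                                         Set<String> localStormIds,
                                         ClusterState clusterState) {
        // storm-ids that are active in zookeeper but absent on this host.
        Set<String> missingLocally = new HashSet<>(activeStormIds);
        missingLocally.removeAll(localStormIds);

        for (String stormId : missingLocally) {
            // On a nimbus that simply has not synced the code yet, this
            // wrongly deletes the zookeeper state of a healthy topology.
            clusterState.removeStorm(stormId);
        }
    }
}
```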
9. As mentioned in the design doc, I think a thrift API is a great idea for
nimbus discovery, and I plan to add one. I felt the changes in this single PR
are already too big to review, and it might be better to do this as part of a
follow-up task. Given that both you and @revans2 have asked for it, I am going
to create a child task and submit a PR against this branch; a rough sketch of
what the API might look like is below.
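As a rough, non-authoritative sketch (the names are my assumptions, not the
actual storm thrift definitions), the discovery API might expose something
like:

```java
import java.util.List;

public interface NimbusDiscovery {
    /** Host:port of the current leader nimbus, if one has been elected. */
    String getLeader();

    /** All nimbus hosts currently participating in leader election. */
    List<String> listNimbuses();
}
```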
Thanks for the vagrant setup. I had tried and failed to get a working
multi-nimbus secure setup; with your secure multi-nimbus vagrant setup I can
now test nimbus HA in a secure environment.