[GitHub] storm pull request: Storm-166: Nimbus HA design doc and implementa...

Parth-Brahmbhatt Mon, 23 Mar 2015 22:04:04 -0700

Github user Parth-Brahmbhatt commented on the pull request:

https://github.com/apache/storm/pull/354#issuecomment-85338889

Disclaimer: I did not look at the code thoroughly; I am commenting based on
your description of the JStorm design.

@longdafeng Have you had a chance to look at the current design? We are
using Curator for leader election, which is a well-tested library, and the
approach is not far from what you have proposed for leader election.

As for code length, I don't fully agree that it is a good metric for most
things. Because we use an existing library, the leader election code in the
current PR is actually much smaller: 53 lines.
https://github.com/Parth-Brahmbhatt/incubator-storm/blob/STORM-166/storm-core/src/clj/backtype/storm/zookeeper.clj#L250.

On top of that, as part of this PR several of us had concerns about every
client connecting to zk to identify the leader nimbus, as each new zk
connection is a write to zk. We have partially fixed the issue by introducing
thrift APIs for nimbus discovery, which should be more efficient than the
original approach, and I plan to add caching at the nimbus layer, which should
further improve performance.
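A minimal sketch of what nimbus-side caching of the leader lookup could look like, assuming a simple time-to-live (TTL) cache in front of the ZooKeeper read. The class `LeaderAddressCache`, the `zkLookup` supplier and the TTL value are hypothetical illustrations, not code from this PR.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical TTL cache for the leader nimbus address (not the PR's code).
final class LeaderAddressCache {
    private final Supplier<String> zkLookup;  // hypothetical: reads leader host:port from ZooKeeper
    private final long ttlMillis;             // how long a cached answer is trusted
    private final LongSupplier clockMillis;   // injectable clock, e.g. System::currentTimeMillis
    private String cached;                    // last leader address seen, or null if none
    private long fetchedAtMillis;             // when the cached value was read

    LeaderAddressCache(Supplier<String> zkLookup, long ttlMillis, LongSupplier clockMillis) {
        this.zkLookup = zkLookup;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    // Returns the leader address, touching ZooKeeper only when the cache is empty or stale.
    synchronized String leader() {
        long now = clockMillis.getAsLong();
        if (cached == null || now - fetchedAtMillis >= ttlMillis) {
            cached = zkLookup.get();
            fetchedAtMillis = now;
        }
        return cached;
    }

    // Drops the cached value, e.g. after a client fails to reach the address it was given.
    synchronized void invalidate() {
        cached = null;
    }
}
```

With this shape, many discovery requests within one TTL window cost a single ZooKeeper read; the TTL bounds how long a client can be handed a stale leader after a failover, and calling `invalidate()` on a failed connection shortens that window further.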

As @ptgoetz mentioned in the jira, we do not want users' topologies to be
lost once nimbus accepts them, and we also do not want to force all users to
depend on a fully replicated storage layer like HDFS. In the current design,
by adding a code replication interface we guarantee that once a topology is in
the active state it is fully replicated, which seems to be another feature
missing from your proposal. It's still a choice between availability and
initial topology submission time, which users can make per topology through
the topology.replication.count config setting.
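To make the availability versus submission-time trade-off concrete, here is a sketch of the activation rule described above, assuming nimbus tracks which nimbus hosts have downloaded a topology's code. `ReplicationGate` and its methods are hypothetical illustrations, not the PR's implementation; `replicationCount` stands in for `topology.replication.count`.

```java
import java.util.Set;

// Hypothetical activation rule: a submitted topology stays inactive until its
// code is present on at least replicationCount nimbus hosts.
final class ReplicationGate {
    private final int replicationCount; // stands in for topology.replication.count

    ReplicationGate(int replicationCount) {
        if (replicationCount < 1) {
            throw new IllegalArgumentException("replicationCount must be >= 1");
        }
        this.replicationCount = replicationCount;
    }

    // True once enough nimbus hosts report having the topology code.
    boolean canActivate(Set<String> hostsWithCode) {
        return hostsWithCode.size() >= replicationCount;
    }

    // How many more hosts must replicate the code before activation.
    int missingReplicas(Set<String> hostsWithCode) {
        return Math.max(0, replicationCount - hostsWithCode.size());
    }
}
```

Under this model a count of 1 activates as soon as the accepting nimbus has the code (fastest submission), while a count of k delays activation until k hosts have a copy, so the code survives the loss of up to k-1 nimbus hosts.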

We also added a few more features: UI improvements, the nimbus summary
stored in zk, thrift API modifications so users can find out the replication
factor of their topologies, and compatibility with the rolling upgrade
feature. In my opinion these are all useful admin tools, and this feature
would be incomplete without them.

I would appreciate any feedback you can provide based on your experience of
running Nimbus HA in production for a year. Please take some time to review
the current design and let us know if you have any concerns.



