Github user Parth-Brahmbhatt commented on the pull request:

    https://github.com/apache/storm/pull/354#issuecomment-85338889
  
    Disclaimer: I have not looked at the code thoroughly; I am commenting based 
on your design description of JStorm.
    
    @longdafeng Did you have a chance to take a look at the current design? We 
are using Curator for leader election, which seems to be a very well-tested 
library and is not far from what you have proposed for leader election.
    
    As for the length of the code, I don't completely agree that it is a good 
metric for most things. Because we use an existing library, the actual code for 
leader election in the current PR is much smaller, 53 lines: 
https://github.com/Parth-Brahmbhatt/incubator-storm/blob/STORM-166/storm-core/src/clj/backtype/storm/zookeeper.clj#L250
    
    On top of that, as part of this PR several of us had concerns about all 
clients connecting to zk to identify the leader nimbus, as each new zk 
connection is a write to zk. We have partially fixed the issue by introducing 
thrift APIs for nimbus discovery, which should be more efficient than the 
original approach, and I plan to add caching at the nimbus layer, which should 
further improve performance.
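
    The caching idea can be sketched roughly as follows. This is only an 
illustration, not code from the PR: `LeaderCache`, `fetch_leader`, and the TTL 
policy are all hypothetical, with `fetch_leader` standing in for whatever 
discovery call returns the leader's address.

```python
import time
from typing import Callable, Optional, Tuple


class LeaderCache:
    """Remembers the leader address for ttl_seconds to avoid repeated lookups."""

    def __init__(self, fetch_leader: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # fetch_leader is a hypothetical stand-in for the discovery call.
        self._fetch_leader = fetch_leader
        self._ttl = ttl_seconds
        self._clock = clock
        # Holds (address, expiry time) once a lookup has succeeded.
        self._cached: Optional[Tuple[str, float]] = None

    def get(self) -> str:
        now = self._clock()
        if self._cached is not None and now < self._cached[1]:
            return self._cached[0]  # still fresh, no lookup needed
        address = self._fetch_leader()
        self._cached = (address, now + self._ttl)
        return address

    def invalidate(self) -> None:
        """Drop the cached entry, e.g. after a request to the leader fails."""
        self._cached = None
```

    A caller would call `get()` before each request and `invalidate()` when 
the cached leader stops responding, so a stale entry costs at most one failed 
request.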
    
    As @ptgoetz mentioned in the jira, we do not want users' topologies 
getting lost once nimbus accepts them, and we also do not want to force all 
users to have a dependency on a fully replicated storage layer like HDFS. In 
the current design, by adding a code replication interface, we are guaranteeing 
that once a topology is in active state it will be fully replicated, which 
seems to be another missing feature in your proposal. It's still a choice 
between availability and initial topology submission time, which users can 
make through their topology.replication.count config setting.
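
    The trade-off between availability and submission time can be illustrated 
with a small sketch. It is not the PR's implementation: 
`wait_for_replication`, `current_replicas`, and the `max_wait_seconds` timeout 
are hypothetical names and assumptions; only the idea of a required replication 
count comes from the setting mentioned above.

```python
import time
from typing import Callable


def wait_for_replication(current_replicas: Callable[[], int],
                         required_replicas: int,
                         max_wait_seconds: float,
                         poll_seconds: float = 0.1,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until enough replicas exist; return False if the wait times out.

    current_replicas is a hypothetical callable reporting how many hosts
    currently hold a copy of the topology code. The timeout is an assumption
    added for this sketch so the loop always terminates.
    """
    deadline = clock() + max_wait_seconds
    while True:
        if current_replicas() >= required_replicas:
            return True  # safe to mark the topology active
        if clock() >= deadline:
            return False  # not replicated in time; caller should not activate
        sleep(poll_seconds)
```

    A higher required count means a longer wait at submission but more copies 
available if a nimbus host is lost; a count of 1 returns almost immediately.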
    
    We also added a few more features, like UI improvements, nimbus summary 
being stored in zk, thrift API modifications so users can figure out the 
replication factor of their topologies, and compatibility with the rolling 
upgrade feature. All of these are, in my opinion, good admin tools, and this 
feature will be incomplete without them.
    
    I appreciate any feedback you can provide based on your experience of 
running Nimbus HA in production for a year. Please take some time to review the 
current design and let us know if you have any concerns. 
    
     

