Github user HeartSaVioR commented on the issue: https://github.com/apache/storm/pull/1574

@revans2 And the local BlobStore should be designed to achieve high availability, just as the HDFS BlobStore does. But the process the local BlobStore lives behind is Nimbus, which is designed to fail fast, and I think that is just not the same model.

For example, consider the scenario I described in STORM-1977. I tested with these steps:

1. Comment out cleanup-corrupt-topologies! in nimbus.clj (a quick workaround for STORM-1976), and patch the Storm cluster
2. Launch Nimbus 1 (leader)
3. Run topology1
4. Kill Nimbus 1
5. Launch Nimbus 2 on a different node

Without a condition for granting leadership, Nimbus 2 can gain leadership and act as leader. From the BlobStore's point of view this is not a blocker: the replication count for topology1 is 0, but that doesn't crash anything, and reviving Nimbus 1 should eventually replicate topology1 to Nimbus 2. The problem is that the leader Nimbus must also do its own work as Nimbus. In this case, just requesting getClusterInfo can crash Nimbus 2; Nimbus 1 then comes back and gains leadership, but the replication count for topology1 stays at 1 until Nimbus 2 comes back.

With a condition for granting leadership, Nimbus 2 gives up leadership and keeps waiting for a new leader (there is no leader at that time). Then Nimbus 1 comes back, and topology1 is eventually replicated to Nimbus 2 to satisfy the replication count.

Because of this behavior, the crash-and-recovery outcome depends heavily on the order in which the Nimbuses are launched. I don't think that is a good UX.
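To illustrate, the "condition for granting leadership" discussed above could be sketched roughly as follows. This is a hypothetical, simplified model, not Storm's actual API: the class name `LeadershipGate`, the methods `requiredBlobKeys` and `canAcceptLeadership`, and the blob-key naming are all illustrative assumptions. The idea is only that a freshly launched Nimbus with an empty local BlobStore should decline leadership until the active topologies' blobs are replicated to it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a leadership-grant condition: a Nimbus candidate
// accepts leadership only if its local BlobStore already holds the blobs
// for every active topology. Names are illustrative, not Storm's real API.
public class LeadershipGate {

    // Blob keys a topology needs, mirroring the stormcode/stormconf/stormjar naming.
    static Set<String> requiredBlobKeys(String topologyId) {
        Set<String> keys = new HashSet<>();
        keys.add(topologyId + "-stormcode.ser");
        keys.add(topologyId + "-stormconf.ser");
        keys.add(topologyId + "-stormjar.jar");
        return keys;
    }

    // Accept leadership only when every active topology's blobs exist locally.
    static boolean canAcceptLeadership(Set<String> activeTopologyIds, Set<String> localBlobKeys) {
        for (String topologyId : activeTopologyIds) {
            if (!localBlobKeys.containsAll(requiredBlobKeys(topologyId))) {
                return false; // missing blobs: give up leadership and wait for replication
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> active = new HashSet<>();
        active.add("topology1");

        // Nimbus 2 freshly launched on a different node: empty local BlobStore.
        Set<String> nimbus2Blobs = new HashSet<>();
        System.out.println("Nimbus 2 accepts leadership? " + canAcceptLeadership(active, nimbus2Blobs));

        // Nimbus 1 revived: it still holds topology1's blobs.
        Set<String> nimbus1Blobs = requiredBlobKeys("topology1");
        System.out.println("Nimbus 1 accepts leadership? " + canAcceptLeadership(active, nimbus1Blobs));
    }
}
```

Under this gate, the scenario above no longer depends on launch order: whichever Nimbus comes up with the blobs becomes leader, and the other waits until replication catches up.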