Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/1574
@revans2
Also, the local BlobStore should be designed to achieve high availability just the
same as the HDFS BlobStore. But the process the BlobStore sits behind is Nimbus,
which is designed to fail fast, and I think those two designs don't fit together.
For example, take the scenario I addressed in STORM-1977. I tested with the
steps I described there:
1. Comment out cleanup-corrupt-topologies! in nimbus.clj (a quick
workaround for STORM-1976), and patch the Storm cluster
2. Launch Nimbus 1 (leader)
3. Run topology1
4. Kill Nimbus 1
5. Launch Nimbus 2 on a different node
Without a condition for granting leadership, Nimbus 2 can gain
leadership and act as leader. From the BlobStore's point of view this is not a
blocker: the replication count for topology1 is 0, but that doesn't crash
Nimbus 2, and reviving Nimbus 1 should eventually replicate topology1 to Nimbus 2.
The thing is, the leader Nimbus still has to do its normal work as Nimbus. In
this case just requesting getClusterInfo can crash Nimbus 2; Nimbus 1 then
comes back and gains leadership, but the replication count for topology1 stays
at 1 until Nimbus 2 rejoins.
With a condition for granting leadership, Nimbus 2 gives up
leadership and keeps waiting for a new leader (there is no leader at that
time). Then Nimbus 1 comes back, and topology1 is eventually replicated to
Nimbus 2, which restores the replication count.
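To make the condition concrete, here is a minimal sketch (not actual Storm code; the class and method names are hypothetical) of the check I have in mind: a Nimbus only accepts leadership when its local BlobStore already holds the blobs for every active topology, so an out-of-date Nimbus declines instead of becoming a crash-prone leader.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: gate leadership acceptance on local blob availability.
public class LeadershipGuard {
    private final Set<String> localBlobKeys;      // blob keys present in the local BlobStore
    private final Set<String> activeTopologyKeys; // blob keys required by active topologies

    public LeadershipGuard(Set<String> localBlobKeys, Set<String> activeTopologyKeys) {
        this.localBlobKeys = localBlobKeys;
        this.activeTopologyKeys = activeTopologyKeys;
    }

    // Accept leadership only if every required blob is available locally;
    // otherwise give up leadership and wait for replication to catch up.
    public boolean canAcceptLeadership() {
        Set<String> missing = new HashSet<>(activeTopologyKeys);
        missing.removeAll(localBlobKeys);
        return missing.isEmpty();
    }
}
```

In the scenario above, Nimbus 2 would fail this check (topology1's blobs are not yet local), stay a non-leader, and only become eligible once Nimbus 1 has replicated the blobs to it.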
Due to this behavior, the crash-and-recovery outcome depends heavily on the
order in which the Nimbuses are launched. I don't think that is a good UX.