Derek Dagit created STORM-1383:
----------------------------------

             Summary: Supervisors should not crash if nimbus is unavailable
                 Key: STORM-1383
                 URL: https://issues.apache.org/jira/browse/STORM-1383
             Project: Apache Storm
          Issue Type: Improvement
          Components: storm-core
    Affects Versions: 0.11.0
            Reporter: Derek Dagit
            Assignee: Derek Dagit


During maintenance or unexpected downtime of nimbus nodes, supervisors crash 
in a loop.  This causes confusion among users (supervisors crash repeatedly) 
and admins (monitoring/alerting is triggered for the entire cluster).

Supervisors periodically check with nimbus to synchronize blob versions, and as 
part of this, a connection is made to the leader nimbus daemon.  Formerly, 
supervisors did not periodically contact nimbus, and so nimbus downtime did not 
cascade to cluster-wide supervisor failures.

Supervisors should handle the case where nimbus cannot be contacted and 
continue in the normal loop instead of exiting.
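
As a rough illustration of the proposed behavior, here is a minimal sketch in 
Java.  It is not the actual storm-core code: the class SupervisorSyncSketch, 
the BlobSync interface, and the method names are hypothetical stand-ins for 
whatever the supervisor really uses to contact the leader nimbus.  The sketch 
also assumes that an unreachable nimbus surfaces as an IOException.  The idea 
is simply that one iteration of the periodic loop catches the connection 
failure, logs it, and returns, so the next iteration can try again.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class SupervisorSyncSketch {

    /** Hypothetical stand-in for the call that contacts the leader nimbus. */
    public interface BlobSync {
        void syncBlobVersions() throws IOException;
    }

    private final BlobSync blobSync;
    // Counts iterations in which nimbus could not be reached.
    private final AtomicInteger skippedRounds = new AtomicInteger();

    public SupervisorSyncSketch(BlobSync blobSync) {
        this.blobSync = blobSync;
    }

    /**
     * Runs one iteration of the periodic sync.
     * Returns true if the sync succeeded, false if nimbus was unreachable.
     * The failure is logged and swallowed so the caller's loop keeps going.
     */
    public boolean runOnce() {
        try {
            blobSync.syncBlobVersions();
            return true;
        } catch (IOException e) {
            // Assumption: an unavailable nimbus shows up as an IOException.
            skippedRounds.incrementAndGet();
            System.err.println("Nimbus unavailable, skipping blob sync: "
                    + e.getMessage());
            return false;
        }
    }

    public int skippedRounds() {
        return skippedRounds.get();
    }
}
```

A scheduler would call runOnce() on each timer tick.  Only the failure type 
assumed to mean "nimbus is down" is caught here, so other errors would still 
propagate and remain visible.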




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
