[ https://issues.apache.org/jira/browse/STORM-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382291#comment-15382291 ]

ASF GitHub Bot commented on STORM-634:
--------------------------------------

Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/414
  
    @danny0405 like @HeartSaVioR said, it depends on the versions you are 
upgrading between.  Most of the time we have maintained wire and binary 
compatibility, so you can do the upgrade piecemeal.  This should work between 
versions of storm that have the same major version number: 1.0.0 to 1.1.0, or 
1.1.0 to 1.1.2, but not 0.10.x to 1.0.0.
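    
    To make the "same major version" rule concrete, here is a tiny illustrative 
check (a purely hypothetical helper, not anything in Storm's API):
    
        // Illustration only (not part of Storm): the rolling-upgrade rule above
        // boils down to comparing the leading component of the version strings.
        public final class VersionCompat {
            static boolean sameMajor(String from, String to) {
                return major(from) == major(to);
            }

            private static int major(String version) {
                return Integer.parseInt(version.split("\\.")[0]);
            }

            public static void main(String[] args) {
                System.out.println(sameMajor("1.0.0", "1.1.0"));   // true  -> rolling upgrade should work
                System.out.println(sameMajor("1.1.0", "1.1.2"));   // true
                System.out.println(sameMajor("0.10.0", "1.0.0"));  // false -> not supported
            }
        }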
    
    The procedure that we follow when doing an upgrade is as follows (a sketch 
of the per-node steps appears after the list):
    
    1) shut down and upgrade nimbus (we are not currently running HA, but if we 
were, step 1.b would be to upgrade the other nimbus instances one at a time)
    2) pick a single node that is not upgraded yet.
    2.b) install the new version of storm on the node.
    2.c) shoot all the storm processes, supervisor, logviewer, and workers
    2.d) clear out all of the state on the node (NOT needed every time, but we 
are cautious because of bugs in the past)
    2.e) relaunch the supervisor and logviewer.
    3) repeat until all of the nodes are done.
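    
    A rough sketch of steps 2.b through 2.e for a single node, assuming a 
hypothetical runOn(node, command) helper backed by ssh; the install/stop/start 
commands and the local-state paths are deployment-specific placeholders, not 
official Storm tooling:
    
        import java.io.IOException;

        // Sketch only: mirrors the ordering of steps 2.b - 2.e above. The commands
        // and local-state paths are placeholders and vary by deployment.
        public class NodeUpgrade {

            static void upgradeNode(String node, String newVersion) throws IOException, InterruptedException {
                runOn(node, "install-storm " + newVersion);                      // 2.b install the new version
                runOn(node, "stop-storm-daemons --with-workers");                // 2.c shoot supervisor, logviewer, workers
                runOn(node, "rm -rf /var/storm/supervisor /var/storm/workers");  // 2.d clear local state (not always needed)
                runOn(node, "start-storm-daemons");                              // 2.e relaunch supervisor and logviewer
            }

            // Hypothetical helper: run a command on a remote node via ssh.
            static void runOn(String node, String command) throws IOException, InterruptedException {
                Process p = new ProcessBuilder("ssh", node, command).inheritIO().start();
                if (p.waitFor() != 0) {
                    throw new IOException("command failed on " + node + ": " + command);
                }
            }
        }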
    
    For our large clusters we actually do a few nodes at a time, not one.  This 
procedure does have a few issues.  The biggest is churn in the worker 
processes.  We try to avoid doing the upgrade often because it is not truly 
transparent to all topologies.  They recover, but every one of their worker 
processes has been shot at least once, and possibly multiple times.  This can 
cause data issues in non-trident topologies, and can slow down processing in 
trident.
    
    I would recommend that you do it a little differently, and this is what we 
want to move to.
    
    For each node, in parallel as much as possible, install the new version of 
storm and then shoot the supervisor and the logviewer.  Wait for them all to 
come back up, or at least enough of them that you feel good about it.
    
    Then, again as parallel as possible, shoot all of the worker processes on 
all of the nodes.
    
    This still has the disadvantage that all of the worker processes are shot 
and things slow down, but they are guaranteed to be shot only once, and the 
recovery time should be much faster.  The supervisor relaunches them quickly 
instead of nimbus possibly timing them out and rescheduling them on a node that 
has not been upgraded yet.
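    
    A sketch of that recommended flow, again with hypothetical per-node 
commands, just to show the two phases (upgrade and bounce the daemons 
everywhere first, then shoot the workers everywhere):
    
        import java.util.List;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;
        import java.util.function.Consumer;

        // Sketch only: phase 1 installs the new version and restarts supervisor and
        // logviewer on every node in parallel; phase 2 then shoots the worker
        // processes on every node in parallel, so each worker is bounced exactly once.
        public class ParallelUpgrade {

            static void rollingUpgrade(List<String> nodes, String newVersion) throws InterruptedException {
                runPhase(nodes, node -> Ops.upgradeAndRestartDaemons(node, newVersion));
                // ...wait here until supervisors and logviewers report healthy...
                runPhase(nodes, Ops::shootWorkers);
            }

            static void runPhase(List<String> nodes, Consumer<String> step) throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, Math.min(nodes.size(), 32)));
                for (String node : nodes) {
                    pool.submit(() -> step.accept(node));
                }
                pool.shutdown();
                pool.awaitTermination(1, TimeUnit.HOURS);
            }
        }

        // Hypothetical placeholders for the deployment-specific per-node commands.
        class Ops {
            static void upgradeAndRestartDaemons(String node, String version) { /* install + bounce supervisor/logviewer */ }
            static void shootWorkers(String node) { /* kill the worker JVMs on the node */ }
        }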


> Storm should support rolling upgrade/downgrade of storm cluster.
> ----------------------------------------------------------------
>
>                 Key: STORM-634
>                 URL: https://issues.apache.org/jira/browse/STORM-634
>             Project: Apache Storm
>          Issue Type: Dependency upgrade
>          Components: storm-core
>            Reporter: Parth Brahmbhatt
>            Assignee: Parth Brahmbhatt
>             Fix For: 0.10.0
>
>
> Currently, when a new version of storm is released, in order to upgrade 
> existing storm clusters users need to back up their existing topologies, kill 
> all the topologies, perform the upgrade, and resubmit all the topologies. 
> This is painful and results in downtime, which may not be acceptable for 
> "always alive" production systems.
> Storm should support a rolling upgrade/downgrade deployment process to avoid 
> this downtime and to make the transition to a different version effortless. 
> Based on my initial attempt, the primary issue seems to be the Java 
> serialization used to serialize Java classes like StormBase, Assignment, and 
> WorkerHeartbeat, which are then stored in ZooKeeper. When deserializing, if 
> the serial versions do not match, deserialization fails, resulting in 
> processes just getting killed indefinitely. We need to change Utils/serialize 
> and Utils/deserialize so they can support a non-Java serialization mechanism 
> like JSON.
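
A minimal sketch of the kind of change the description asks for, replacing Java 
serialization with a JSON round-trip (Jackson is used here only as an example 
library; this is not Storm's actual Utils implementation):

    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.IOException;

    // Sketch only: JSON-based serialize/deserialize avoids the serialVersionUID
    // mismatches that break cross-version deserialization of state in ZooKeeper.
    // Not Storm's actual Utils/serialize and Utils/deserialize.
    public class JsonSerde {

        private static final ObjectMapper MAPPER = new ObjectMapper();

        public static byte[] serialize(Object obj) {
            try {
                return MAPPER.writeValueAsBytes(obj);
            } catch (IOException e) {
                throw new RuntimeException("serialization failed", e);
            }
        }

        public static <T> T deserialize(byte[] data, Class<T> clazz) {
            try {
                return MAPPER.readValue(data, clazz);
            } catch (IOException e) {
                throw new RuntimeException("deserialization failed", e);
            }
        }
    }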



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
