Re: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Lakmal Warusawithana Thu, 07 May 2015 06:46:04 -0700

+1, shall we have a call sometime next week?

On Thursday, May 7, 2015, Martin Eppel (meppel) <[email protected]> wrote:


>  Hi Imesh, Sandaruwan
>
>
>
> We would like to continue the discussion on this feature as we think this
> could be a useful enhancement to Stratos.
>
>
>
> To get some idea about the effort, and as a first step towards an
> implementation, I identified the areas / components which IMHO need to be
> enhanced (based on Stratos 4.1):
>
> Btw, I also marked some of the items with a “?” - any feedback would be
> appreciated.
>
>
>
> ·        new Rest API to update resource state with maintenance mode:
>
> o   PUT, resource types: application / group / cluster / instance
>
> o   Maintenance mode on / off / restart / replace
>
> §  sub state: autoscaling off / on
>
> §  auto healing on / off
>
> ·        new API in autoscaler to set maintenance mode – not sure whether
> that is necessary, any pointers?
>
> ·        adding new / enhancing existing  topology events : [application
> / group /cluster / member]
>
> o   enhancing messaging domain model to add maintenance state + sub states
>
> o   adding / enhancing event handling in Autoscaler (receiver, monitors,
> etc …)
>
> §  Event receiver / monitor for maintenance event
>
> §  Can we utilize / reuse
>  ClusterMonitor->handleMemberMaintenanceModeEvent for this feature ?
>
>
> ·        Adding maintenance state (In autoscaler e.g.
> ClusterStatusProcessor, GroupStatusProcessor, etc. )
>
> o   application
>
> o   group
>
> o   cluster
>
> o   member – member already has a MAINTENANCE state, can we utilize it
> for this feature ?
>
> ·        enhance / add  drools rule to handle the new maintenance mode to
> turn on / off autoscaling, auto healing
>
> o   scale up / scale down, dependent scaling, min / max
>
> o   logging requirements
>
> ·        AutoscalerHealthStatEventReceiver
>
> o   Handle Fault Events in context of maintenance mode
>
> ·        Persistence of maintenance related states
>
> o   Registry - any pointers on how the maintenance mode should be
> persisted  ?
>
>
>
>
>
> Any thoughts or feedback on this? Do you think other components will be
> affected or need to be reworked?
>
>
>
> The other question is what would be the best or recommended way to
> develop the feature with input from the community and to ensure a
> smooth integration with the Stratos master?
>
>
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Shaheedur Haque (shahhaqu)
> *Sent:* 09 April 2015 14:44
> *To:* [email protected]; Sandaruwan Nanayakkara (JIRA);
> Imesh Gunaratne ([email protected])
> *Subject:* RE: Maintenance modes (was RE: [jira] [Commented]
> (STRATOS-1234) Software Update Management Solution for Stratos)
>
>
>
> Hi Imesh, Sandaruwan,
>
>
>
> Here is a written-up proposal. I **think** it covers the various use
> cases suggested both here and in JIRA STRATOS-1234, but as always, your
> thoughts on the matter are welcome. The write-up has the form of a “spec”
> and a “Q&A”. As a next step, I guess we could do a hang-out or con-call or
> something?
>
>
>
> Thoughts welcome…
>
>
>
> Thanks, Shaheed
>
>
>
> *OPERATIONAL STATE COMMANDS*
>
>
>
> The following commands, with the defined effects, are needed:
>
>
>
> ·        No command *directly* affects what I call the “major state” of
> the Application/Group/Cluster/Cartridge, i.e. the state as reflected in the
> information CURRENTLY returned by the application/{appId}/runtime
> information.
>
> ·        Each command affects what I call the “operational state” only.
> The commands and their operational states are:
>
> o   Autoscaling *on*, *off*. Autoscaling *on* is current behaviour.
>
> o   Autohealing *on*, *off*. Autohealing *on* is current behaviour.
>
> o   Maintenance *off*, *restart*, *replace*. Maintenance *off* is current
> behaviour.
>
> o   (We can add more later if needed)
>
>
>
> Command: Autoscaling *off*.
>
>    Server effect: CEP gathers stats and history as usual. Autoscaler
>    operates as usual, except that no scaling is done. Instead, a cluster
>    state variable tracks the normal, overload or underload state, and a
>    message is logged whenever this state variable changes value.
>
>    Cartridge effect: No effect on running cartridges. No new cartridges
>    are spun up, no existing cartridges are spun down EXCEPT for
>    autohealing.
>
> Command: Autohealing *off*.
>
>    Server effect: CEP ignores any heartbeat timeout other than to log that
>    it happened, and set an instance state variable to track this. When
>    autohealing is turned back on, the timeout will happen again, and the
>    failure will be acted upon normally, except that the log shall make it
>    clear (using the instance state variable) that the autohealing had been
>    delayed.
>
>    Cartridge effect: No new cartridges are spun up until after the
>    autohealing is enabled.
>
> Command: Maintenance *restart*.
>
>    Server effect: Like autohealing *off* except that an extra state
>    variable is set indicating maintenance mode is in effect. Both state
>    variables are cleared when the Cartridge resume event is seen.
>
>    Cartridge effect: Cartridge is signalled with an **event**, not a
>    blocking callout. Cartridge application must be able to reboot or just
>    restart, and have the cartridge agent resume its previous
>    (active/inactive) state. When resuming, the agent signals the server
>    with a resume **event**. Note this implies the cartridge agent is
>    restartable (because the application can choose to reboot).
>
> Command: Maintenance *replace*.
>
>    Server effect: Like maintenance restart except that the cartridge
>    instance is replaced.
>
>    Cartridge effect: The difference between “restart” and “replace” is
>    that the latter is for applications that cannot update themselves, but
>    expect essentially a new VM instance with the new software. In other
>    words, this is the big hammer/most general approach to upgrades (e.g.
>    this is more likely to work than an apt-get downgrade :-)).
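
[Editorial sketch] The autoscaling *off* and autohealing *off* entries above could be modelled roughly as follows. This is only a minimal illustration, not Stratos code: the class names, the load thresholds and the returned strings are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass, field
from typing import List

log = logging.getLogger("operational_state_sketch")


@dataclass
class ClusterState:
    """Hypothetical per-cluster record; not a Stratos class."""
    autoscaling_on: bool = True
    load_state: str = "normal"  # "normal", "overload" or "underload"
    actions: List[str] = field(default_factory=list)

    def on_load_sample(self, load: float, low: float = 0.2, high: float = 0.8) -> None:
        # Stats are always evaluated, whether or not autoscaling is on.
        if load > high:
            new_state = "overload"
        elif load < low:
            new_state = "underload"
        else:
            new_state = "normal"
        # Log only when the tracked state variable changes value.
        if new_state != self.load_state:
            log.info("cluster load state: %s -> %s", self.load_state, new_state)
            self.load_state = new_state
        # Scaling actions are recorded only while autoscaling is on.
        if self.autoscaling_on and new_state == "overload":
            self.actions.append("scale_up")
        elif self.autoscaling_on and new_state == "underload":
            self.actions.append("scale_down")


@dataclass
class InstanceState:
    """Hypothetical per-instance record; not a Stratos class."""
    autohealing_on: bool = True
    healing_deferred: bool = False

    def on_heartbeat_timeout(self) -> str:
        if not self.autohealing_on:
            # Ignore the timeout apart from logging it and remembering it.
            self.healing_deferred = True
            log.info("heartbeat timeout ignored: autohealing is off")
            return "ignored"
        was_deferred = self.healing_deferred
        self.healing_deferred = False
        if was_deferred:
            log.info("healing instance; autohealing had been delayed")
            return "healed_after_delay"
        return "healed"
```

The point of the sketch is that turning a flag off suppresses the action but not the bookkeeping, so the log can still explain what would have happened.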
>
>
>
> ·        Each command referred to here is a REST API call.
>
> ·        Each command can apply to an entire Application, or any nested
> level (group or cartridge) within it.
>
> ·        Arguments for application-wide use case:
>
> o   application={appId}, operationalState={command}
>
> ·        Arguments for nested-level use case:
>
> o   application={appId}, nesting={0}/{1}/{2}/…/{n}, operationalState
> ={command}
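
[Editorial sketch] One way the arguments listed above might be validated on the server side. The proposal does not say how a command is encoded in `operationalState`, so the `kind:value` form (e.g. `maintenance:restart`), the function name and the class name below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple

# Commands and values taken from the proposal's list of operational states.
VALID_COMMANDS = {
    "autoscaling": {"on", "off"},
    "autohealing": {"on", "off"},
    "maintenance": {"off", "restart", "replace"},
}


@dataclass(frozen=True)
class OperationalStateRequest:
    """Hypothetical parsed form of the REST arguments."""
    application: str
    nesting: Tuple[str, ...]  # empty tuple means application-wide
    kind: str
    value: str


def parse_request(application: str, operational_state: str,
                  nesting: str = "") -> OperationalStateRequest:
    # Assumed encoding: "<kind>:<value>", e.g. "autoscaling:off".
    kind, sep, value = operational_state.partition(":")
    if not sep or value not in VALID_COMMANDS.get(kind, set()):
        raise ValueError(f"unsupported operationalState: {operational_state!r}")
    # "{0}/{1}/.../{n}" becomes a path of nested level names.
    path = tuple(part for part in nesting.split("/") if part)
    return OperationalStateRequest(application, path, kind, value)
```

For example, `parse_request("app1", "maintenance:restart", "group1/cluster2")` would address the second nesting level, while omitting `nesting` addresses the whole application.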
>
>
>
> *Q&A*
>
>
>
> 1.      *What’s the point of restart/replace, over and above autoscaling/autohealing off?*
>
>
>
> These are to actually cause the application software in the VM instance to
> take note to do something. Typically, I would expect this to result in an
> internally-managed software update. For example, think of VMs running
> Ubuntu and pointing to a known repository of, say, security patches; they
> could all just do an “apt-get update/upgrade”.
>
>
>
> The Cartridge logic is defined to be event-based rather than blocking,
> because making the thing blocking would be a problem if a reboot was
> involved. (Also, generally, blocking operations in a distributed system
> raise too many edge cases like: can this operation be cancelled? Repeated?
> etc.).
>
>
>
> 2.      *Propagation/inheritance rules*
>
>
>
> I see two options:
>
>
>
> ·        Use hierarchy. If you apply a command at hierarchy level n, and n
> has internal structure (i.e. it is a group not a cartridge), the command
> propagates all the way down (note: this is implied in what I said for the
> application level command).
>
> ·        Do not use hierarchy. The command only applies to the level to
> which it was addressed by the REST call.
>
>
>
> In either case, the effect of contradictory commands is UNDEFINED, i.e.
> toggling the flags in quick succession will likely result in an unhelpful
> outcome.
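
[Editorial sketch] The two propagation options can be compared with a small tree walk. The `Node` type and the `use_hierarchy` flag are hypothetical names introduced for this example; they are not part of the Stratos domain model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Node:
    """Hypothetical application / group / cartridge node."""
    name: str
    children: List["Node"] = field(default_factory=list)
    operational_state: Dict[str, str] = field(default_factory=dict)


def find(root: Node, path: Sequence[str]) -> Node:
    # Walk down from the application root along the nesting path.
    node = root
    for name in path:
        matches = [c for c in node.children if c.name == name]
        if not matches:
            raise KeyError(name)
        node = matches[0]
    return node


def apply_command(root: Node, path: Sequence[str], kind: str, value: str,
                  use_hierarchy: bool) -> List[str]:
    """Set the state on the addressed node and, optionally, all descendants.

    Returns the names of the nodes that were touched.
    """
    touched = []
    stack = [find(root, path)]
    while stack:
        node = stack.pop()
        node.operational_state[kind] = value
        touched.append(node.name)
        if use_hierarchy:
            stack.extend(node.children)
    return touched
```

With `use_hierarchy=False` only the addressed level changes, which matches the "do not use hierarchy" option; with `True` the command reaches every nested group and cartridge below it.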
>
>
>
> I think the normal approach is NOT to use hierarchy; after all, just
> because there is an upgrade to be applied for application code in a given
> set of VMs, there is nothing to say that any elements lower down the
> hierarchy should be upgraded at the same time. Even in the case where (say)
> security patches to a common OS are to be applied, I would doubt the sanity
> of anybody doing this across every VM in the whole system in one go :-).
> OTOH, maybe I am wrong!
>
>
>
> 3.      *Should these commands apply to “deployed” or only to
> “configured” Applications?*
>
>
>
> I think the commands can be applied whether the Application is deployed or
> not….clearly the stuff that sets flags on instances has to set those flags
> on all current and future instances that may spin up under a given
> deployment.
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:[email protected]]
> *Sent:* 27 March 2015 04:21
> *To:* dev
> *Subject:* Re: Maintenance modes (was RE: [jira] [Commented]
> (STRATOS-1234) Software Update Management Solution for Stratos)
>
>
>
> Hi Shaheed,
>
>
>
> A really good suggestion! I think we could manage what you have
> suggested in the same implementation as they overlap. I'm +1 for the idea
> of putting a cluster into the "Maintenance Mode" manually for diagnostic
> purposes and stop autoscaling it. We could introduce new API methods to
> manage this. The only question is whether we could use the same instance
> state for all the scenarios:
>
>
>
> 1. Update platform (we might need to use the term "platform" here, as
> "software" may be confused with the software that runs on the platform)
>
> 2. Apply patches
>
> 3. Pause a cluster for diagnostic purposes
>
>
>
> I would like to suggest changing the updateSoftware API method to
> updatePlatform:
>
> POST /applications/{applicationId}/updatePlatform
>
>
>
> Maybe we could introduce a new API method as follows to put a cluster
> into "Maintenance/Diagnostic Mode":
>
> POST /clusters/{clusterId}/pause
>
>
>
> Thanks
>
> Imesh
>
>
>
> On Thu, Mar 26, 2015 at 3:01 PM, Shaheedur Haque (shahhaqu)
> <[email protected]> wrote:
>
>
> First, let me say that I like a lot of what is proposed in this JIRA, but
> I am forking the thread here because I would like to suggest that we
> generalise just one part of it, the API into Stratos to cover a set of
> related use cases.
>
> In the current version of this JIRA, the proposed API into Stratos looks
> like this:
>
> PUT /api/applications/{applicationId}/updateSoftware
>
> (see the JIRA section 2.3 for the details). I think this is actually one
> of a set of possible runtime states that we would like to put VM instances
> and various parts of Stratos in. Notice that I am deliberately not using
> specific terms such as "cluster" or "Autoscaler" because working that out
> is the point of this email.
>
> So, the sorts of use cases I have in mind are:
>
>    - Updating the cartridge software as per this JIRA
>    - Putting a cluster (or maybe an instance) into a "maintenance mode"
>    for diagnostic reasons. There could be multiple versions of this
>    maintenance mode where (for example)
>       - The instance(s) might still handle traffic and deliver "I'm alive"
>       health stats but no autoscaling is done.
>       - The instance(s) don't deliver health stats but no health stats
>    - Some of these would deliver notifications to the cartridge agent,
>    others might only affect Stratos component(s).
>    - etc...other ideas anybody?
>
> Thus, it might make sense to generalise the API to support a set of
> closely related cases. Is there interest in taking such an approach to
> address this JIRA as well as clarifying and addressing the other use cases?
>
>
>
> Thanks, Shaheed
>
> ________________________________________
> From: Sandaruwan Nanayakkara (JIRA) [[email protected]]
> Sent: 25 March 2015 08:36
> To: [email protected]
> Subject: [jira] [Commented] (STRATOS-1234) Software Update Management
> Solution for Stratos
>
> [
> https://issues.apache.org/jira/browse/STRATOS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379497#comment-14379497
> ]
>
> Sandaruwan Nanayakkara commented on STRATOS-1234:
> -------------------------------------------------
>
> Hi all,
>
> I have updated the Google doc with update scenarios. Please share your
> ideas by commenting; they will be much appreciated.
>
>
> https://docs.google.com/document/d/1Ep2EwLubQnAv0bQGXE2ynwIDrRFCtMnCZ1E52KtzUH4/edit?usp=sharing
>
> After several days I finally deployed almost all of the Stratos samples with
> Kubernetes and OpenStack :)
> Now the main open question is how to trigger updates in different software. Can
> you give a practical example of a piece of software and how its update is
> triggered manually?
> Suppose that I have software in a single-cartridge application. When
> triggering an update via REST, we need a specific way to communicate with
> the software. Is there any way for this update command to be passed to the
> software?
>
> Thanks
> Sandaruwan
>
>
>
> > Software Update Management Solution for Stratos
> > ------------------------------------------------
> >
> > Key: STRATOS-1234
> > URL: https://issues.apache.org/jira/browse/STRATOS-1234
> > Project: Stratos
> > Issue Type: New Feature
> > Reporter: Imesh Gunaratne
> > Labels: gsoc2015, mentor
> >
> > Stratos uses Virtual Machines and Containers for hosting platform
> services on different Infrastructure as a Service (IaaS) solutions. At
> present Puppet is used for orchestration management on Virtual Machine
> based systems and manages all required software in Puppet Master. Container
> based systems create Docker images for each platform service by including
> required software in the Docker image itself.
> > In Virtual Machine use-case VM instances will communicate with Puppet
> master and execute the software installation. The same approach can be used
> for applying software updates.
> > In the Docker use-case we do not use Puppet because a new container with
> required software can be started in a few seconds. This is very efficient
> compared to using Puppet and installing software on demand.
> > The requirement of this project is to implement a core Stratos feature
> to propagate software updates in a live PaaS environment.
> > 1. Puppet based solution:
> > - Push software updates of a cartridge to Puppet Master (might not need
> to automate).
> > - Invoke the software update process via the Stratos API for a given
> application.
> > - Stratos Manager could send a new event to trigger puppet agent in each
> instance to apply the updates.
> > 2. Docker based solution
> > - Create a new docker image (with a new image id) for the cartridge with
> software updates (might not need to automate).
> > - Invoke the software update process via the Stratos API for a given
> application.
> > - Autoscaler can implement a new feature to bring down existing
> instances and create new instances with the new docker image id.
> > Important!
> > - In each scenario, if updates are backward compatible, the software update
> process should execute in phases; it should not bring down the entire
> cluster to apply the updates, as the service would then be unavailable for a
> certain time period. The idea is to apply the updates to a set of members at
> a time.
> > - If the updates are not backward compatible, we could make the entire
> cluster unavailable at once and apply the updates.
> > - Member's state needs to be changed to a new state called "Updating"
> when applying the updates.
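
[Editorial sketch] The phased update described in the "Important!" notes above can be illustrated with a simple batching loop. The function names, the `batch_size` parameter and the "Active" state label are assumptions for the example; only the "Updating" state comes from the issue text.

```python
from typing import Dict, List, Sequence


def plan_batches(members: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the cluster members into consecutive update batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(members[i:i + batch_size])
            for i in range(0, len(members), batch_size)]


def rolling_update(members: Sequence[str], batch_size: int,
                   backward_compatible: bool) -> List[List[str]]:
    """Return, for each phase, the members that are in the Updating state."""
    if backward_compatible:
        # Update a set of members at a time so the service stays available.
        batches = plan_batches(members, batch_size)
    else:
        # Incompatible update: take the whole cluster down at once.
        batches = [list(members)]
    states: Dict[str, str] = {m: "Active" for m in members}
    timeline = []
    for batch in batches:
        for m in batch:
            states[m] = "Updating"
        timeline.append(sorted(m for m, s in states.items() if s == "Updating"))
        # A real system would wait here for each member to report completion.
        for m in batch:
            states[m] = "Active"
    return timeline
```

In the backward-compatible case at most `batch_size` members are out of service in any phase, which is the availability property the notes ask for.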
> > If there is an interest in doing this project please send a mail to
> imesh at apache dot org, copying the Apache Dev mailing list [1]. Please
> refer to the Stratos Wiki [2] for more information on Stratos architecture and how
> it works.
> > [1] http://stratos.apache.org/community/mailing-lists.html
> > [2] https://cwiki.apache.org/confluence/display/STRATOS
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>


-- 
Sent from Gmail Mobile
