+1, shall we have a call sometime next week?

On Thursday, May 7, 2015, Martin Eppel (meppel) <mep...@cisco.com> wrote:
> Hi Imesh, Sandaruwan,
>
> We would like to continue the discussion on this feature, as we think it
> could be a useful enhancement to Stratos.
>
> To get an idea of the effort, and as a first step towards an
> implementation, I identified the areas/components which IMHO need to be
> enhanced (based on Stratos 4.1).
>
> By the way, I also marked some of the items with a "?"; any feedback would
> be appreciated.
>
> - New REST API to update resource state with maintenance mode:
>   - PUT, resource types: application / group / cluster / instance
>   - Maintenance mode on / off / restart / replace
>     - sub-state: autoscaling on / off
>     - sub-state: autohealing on / off
> - New API in the autoscaler to set maintenance mode? Not sure whether that
>   is necessary; any pointers?
> - Adding new / enhancing existing topology events [application / group /
>   cluster / member]:
>   - enhancing the messaging domain model to add the maintenance state and
>     its sub-states
>   - adding / enhancing event handling in the autoscaler (receiver,
>     monitors, etc.):
>     - event receiver / monitor for the maintenance event
>     - can we utilize / reuse ClusterMonitor->handleMemberMaintenanceModeEvent
>       for this feature?
> - Adding the maintenance state in the autoscaler (e.g. ClusterStatusProcessor,
>   GroupStatusProcessor, etc.) for:
>   - application
>   - group
>   - cluster
>   - member: the member already has a MAINTENANCE state; can we utilize it
>     for this feature?
> - Enhancing / adding Drools rules so that the new maintenance mode turns
>   autoscaling and autohealing on / off:
>   - scale up / scale down, dependent scaling, min / max
>   - logging requirements
> - AutoscalerHealthStatEventReceiver:
>   - handle fault events in the context of maintenance mode
> - Persistence of maintenance-related states:
>   - registry: any pointers on how the maintenance mode should be persisted?
>
> Any thoughts or feedback on this? Do you think other components will be
> affected or need to be reworked?
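As a rough illustration of the first item in the list above (the new PUT call with its modes and sub-states), here is a minimal Java sketch of a request model. Every type name, the `/api/{collection}/{id}/maintenance` path and the rule that restart/replace suspend autohealing (borrowed from the operational-state spec quoted below) are assumptions for discussion only; none of this exists in Stratos 4.1.

```java
import java.util.Objects;

// Hypothetical sketch: none of these types exist in Stratos 4.1.
// They only model the maintenance-mode request proposed in the list above.
public class MaintenanceModeSketch {

    // Levels the proposed PUT call could target.
    enum ResourceType { APPLICATION, GROUP, CLUSTER, INSTANCE }

    // Maintenance modes from the proposal; OFF is the current behaviour.
    enum MaintenanceMode { OFF, ON, RESTART, REPLACE }

    // Body of the proposed PUT: the mode plus the two sub-states.
    record MaintenanceRequest(ResourceType type, String resourceId,
                              MaintenanceMode mode,
                              boolean autoscaling, boolean autohealing) {
        MaintenanceRequest {
            Objects.requireNonNull(type, "type");
            Objects.requireNonNull(resourceId, "resourceId");
            Objects.requireNonNull(mode, "mode");
        }
    }

    // Hypothetical URL layout, e.g. /api/clusters/{id}/maintenance.
    static String pathFor(MaintenanceRequest req) {
        String collection = switch (req.type()) {
            case APPLICATION -> "applications";
            case GROUP -> "groups";
            case CLUSTER -> "clusters";
            case INSTANCE -> "instances";
        };
        return "/api/" + collection + "/" + req.resourceId() + "/maintenance";
    }

    // RESTART and REPLACE suspend autohealing (assumption taken from the
    // operational-state spec); for OFF and ON the sub-state alone decides.
    static boolean autohealingActive(MaintenanceRequest req) {
        boolean updating = req.mode() == MaintenanceMode.RESTART
                || req.mode() == MaintenanceMode.REPLACE;
        return req.autohealing() && !updating;
    }

    public static void main(String[] args) {
        MaintenanceRequest req = new MaintenanceRequest(
                ResourceType.CLUSTER, "php-cluster-1",
                MaintenanceMode.RESTART, false, true);
        System.out.println("PUT " + pathFor(req));
        System.out.println("autoscaling=" + req.autoscaling()
                + " autohealing=" + autohealingActive(req));
    }
}
```

Keeping the sub-states as plain flags next to the mode means the Drools rules would only need to read two booleans, whichever resource level the request targeted.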
>
> The other question is: what would be the best or recommended way to
> develop the feature with input from the community, and to ensure a
> smooth integration with the Stratos master branch?
>
> Thanks
>
> Martin
>
> *From:* Shaheedur Haque (shahhaqu)
> *Sent:* 09 April 2015 14:44
> *To:* dev@stratos.apache.org; Sandaruwan Nanayakkara (JIRA); Imesh
> Gunaratne (im...@wso2.com)
> *Subject:* RE: Maintenance modes (was RE: [jira] [Commented]
> (STRATOS-1234) Software Update Management Solution for Stratos)
>
> Hi Imesh, Sandaruwan,
>
> Here is a written-up proposal. I *think* it covers the various use
> cases suggested both here and in JIRA STRATOS-1234, but as always, your
> thoughts on the matter are welcome. The write-up has the form of a "spec"
> and a "Q&A". As a next step, I guess we could do a Hangout, a con-call or
> something similar?
>
> Thoughts welcome...
>
> Thanks, Shaheed
>
> *OPERATIONAL STATE COMMANDS*
>
> The following commands, with the defined effects, are needed:
>
> - No command *directly* affects what I call the "major state" of the
>   Application/Group/Cluster/Cartridge, i.e. the state as reflected in the
>   information CURRENTLY returned by application/{appId}/runtime.
> - Each command affects only what I call the "operational state".
>   The commands and their operational states are:
>   - Autoscaling *on*, *off*. Autoscaling *on* is the current behaviour.
>   - Autohealing *on*, *off*. Autohealing *on* is the current behaviour.
>   - Maintenance *off*, *restart*, *replace*. Maintenance *off* is the
>     current behaviour.
>   - (We can add more later if needed.)
>
> Command: Autoscaling *off*
>   Server effect: CEP gathers stats and history as usual. The Autoscaler
>   operates as usual, except that no scaling is done. Instead, a cluster
>   state variable tracks the normal, overload or underload state, and a
>   message is logged whenever this variable changes value.
>   Cartridge effect: No effect on running cartridges. No new cartridges are
>   spun up and no existing cartridges are spun down, EXCEPT for autohealing.
>
> Command: Autohealing *off*
>   Server effect: CEP ignores any heartbeat timeout, other than to log that
>   it happened and to set an instance state variable to track this. When
>   autohealing is turned back on, the timeout will happen again and the
>   failure will be acted upon normally, except that the log shall make it
>   clear (using the instance state variable) that the autohealing had been
>   delayed.
>   Cartridge effect: No new cartridges are spun up until autohealing is
>   enabled again.
>
> Command: Maintenance *restart*
>   Server effect: Like autohealing *off*, except that an extra state
>   variable is set, indicating that maintenance mode is in effect. Both
>   state variables are cleared when the cartridge resume event is seen.
>   Cartridge effect: The cartridge is signalled with an *event*, not a
>   blocking callout. The cartridge application must be able to reboot or
>   just restart, and have the cartridge agent resume its previous
>   (active/inactive) state. When resuming, the agent signals the server
>   with a resume *event*. Note that this implies the cartridge agent is
>   restartable (because the application can choose to reboot).
>
> Command: Maintenance *replace*
>   Server effect: Like maintenance *restart*, except that the cartridge
>   instance is replaced.
>   Cartridge effect: The difference between "restart" and "replace" is that
>   the latter is for applications that cannot update themselves, but expect
>   essentially a new VM instance with the new software. In other words,
>   this is the big-hammer, most general approach to upgrades (e.g. it is
>   more likely to work than an apt-get downgrade :)).
>
> - Each command referred to here is a REST API call.
> - Each command can apply to an entire Application, or to any nested
>   level (group or cartridge) within it.
> - Arguments for the application-wide use case:
>   - application={appId}, operationalState={command}
> - Arguments for the nested-level use case:
>   - application={appId}, nesting={0}/{1}/{2}/…/{n}, operationalState={command}
>
> *Q&A*
>
> 1. *What's the point of restart/replace, over and above autoscaling/autohealing off?*
>
> These are there to actually make the application software in the VM
> instance take note and do something. Typically, I would expect this to
> result in an internally managed software update. For example, think of
> VMs running Ubuntu and pointing to a known repository of, say, security
> patches: they could all just do an "apt-get update/upgrade".
>
> The cartridge logic is defined to be event-based rather than blocking,
> because making it blocking would be a problem if a reboot were involved.
> (Also, in general, blocking operations in a distributed system raise too
> many edge cases, such as: can this operation be cancelled? Repeated?
> etc.)
>
> 2. *Propagation/inheritance rules*
>
> I see two options:
>
> - Use hierarchy. If you apply a command at hierarchy level n, and n has
>   internal structure (i.e. it is a group, not a cartridge), the command
>   propagates all the way down (note: this is implied in what I said for
>   the application-level command).
> - Do not use hierarchy. The command only applies to the level to which
>   it was addressed by the REST call.
>
> In either case, the effect of contradictory commands is UNDEFINED, i.e.
> toggling the flags in quick succession will likely result in an unhelpful
> outcome.
>
> I think the normal approach is NOT to use hierarchy; after all, just
> because there is an upgrade to be applied to application code in a given
> set of VMs, there is nothing to say that any elements lower down the
> hierarchy should be upgraded at the same time.
> Even in the case where (say) security patches to a common OS are to be
> applied, I would doubt the sanity of anybody doing this across every VM
> in the whole system in one go :). OTOH, maybe I am wrong!
>
> 3. *Should these commands apply to "deployed" or only to "configured"
> Applications?*
>
> I think the commands can be applied whether or not the Application is
> deployed... Clearly, the code that sets flags on instances has to set
> those flags on all current and future instances that may spin up under
> a given deployment.
>
> *From:* Imesh Gunaratne [mailto:im...@apache.org]
> *Sent:* 27 March 2015 04:21
> *To:* dev
> *Subject:* Re: Maintenance modes (was RE: [jira] [Commented]
> (STRATOS-1234) Software Update Management Solution for Stratos)
>
> Hi Shaheed,
>
> A really good suggestion! I think we could manage what you have
> suggested in the same implementation, as they overlap. I'm +1 for the
> idea of manually putting a cluster into "Maintenance Mode" for
> diagnostic purposes and stopping autoscaling for it. We could introduce
> new API methods to manage this. The only question is whether we could
> use the same instance state for all the scenarios:
>
> 1. Update platform (we might need to use the term "platform" here, as
>    "software" may get confused with the software that runs on the
>    platform)
> 2. Apply patches
> 3. Pause a cluster for diagnostic purposes
>
> I would like to suggest changing the updateSoftware API method to
> updatePlatform:
>
> POST /applications/{applicationId}/updatePlatform
>
> Maybe we could introduce a new API method as follows to put a cluster
> into "Maintenance/Diagnostic Mode":
>
> POST /clusters/{clusterId}/pause
>
> Thanks
>
> Imesh
>
> On Thu, Mar 26, 2015 at 3:01 PM, Shaheedur Haque (shahhaqu)
> <shahh...@cisco.com> wrote:
>
> First, let me say that I like a lot of what is proposed in this JIRA,
> but I am forking the thread here because I would like to suggest that we
> generalise just one part of it, the API into Stratos, to cover a set of
> related use cases.
>
> In the current version of this JIRA, the proposed API into Stratos looks
> like this:
>
> PUT /api/applications/{applicationId}/updateSoftware
>
> (see section 2.3 of the JIRA for the details). I think this is actually
> one of a set of possible runtime states that we would like to put VM
> instances and various parts of Stratos in. Notice that I am deliberately
> not using specific terms such as "cluster" or "Autoscaler", because
> working that out is the point of this email.
>
> So, the sorts of use cases I have in mind are:
>
> - Updating the cartridge software as per this JIRA
> - Putting a cluster (or maybe an instance) into a "maintenance mode" for
>   diagnostic reasons. There could be multiple versions of this
>   maintenance mode where (for example):
>   - The instance(s) might still handle traffic and deliver "I'm alive"
>     health stats, but no autoscaling is done.
>   - The instance(s) don't deliver health stats but no health stats
>   - Some of these would deliver notifications to the cartridge agent,
>     others might only affect Stratos component(s).
> - etc... other ideas, anybody?
>
> Thus, it might make sense to generalise the API to support a set of
> closely related cases.
> Is there interest in taking such an approach, to address this JIRA as
> well as to clarify and address the other use cases?
>
> Thanks, Shaheed
>
> ________________________________________
> From: Sandaruwan Nanayakkara (JIRA) [j...@apache.org]
> Sent: 25 March 2015 08:36
> To: d...@stratos.incubator.apache.org
> Subject: [jira] [Commented] (STRATOS-1234) Software Update Management
> Solution for Stratos
>
> [ https://issues.apache.org/jira/browse/STRATOS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379497#comment-14379497 ]
>
> Sandaruwan Nanayakkara commented on STRATOS-1234:
> -------------------------------------------------
>
> Hi all,
>
> I have updated the Google doc with update scenarios; please share your
> ideas by commenting, it will be much appreciated.
>
> https://docs.google.com/document/d/1Ep2EwLubQnAv0bQGXE2ynwIDrRFCtMnCZ1E52KtzUH4/edit?usp=sharing
>
> After days, I finally deployed almost all of the Stratos samples with
> Kubernetes and OpenStack :)
> Now the main fuss is about triggering updates in different software. Can
> you give an example of a piece of software and how an update is
> triggered manually? A practical approach?
> Suppose that I have software in a single-cartridge application. When
> triggering an update via REST, we need a specific way to communicate
> with the software. Is there any way for this update command to be passed
> to the software?
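Sandaruwan's question (how a REST-triggered update reaches the software inside a cartridge) is what the "Maintenance restart" entry of the operational-state spec above addresses: the server signals the agent with an event instead of a blocking call, the agent applies the update and, once it is back, publishes a resume event carrying its previous active/inactive state. The Java sketch below illustrates only that contract. `MaintenanceEvent`, `ResumeEvent`, `CartridgeAgent`, the "restart"/"replace" strings and the update hook are hypothetical names, and the message broker is simulated with a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the event-based update contract; these are not
// Stratos classes. The server publishes an event and never waits on the
// agent; completion is reported separately through a ResumeEvent.
public class CartridgeUpdateSketch {

    enum MemberState { ACTIVE, INACTIVE }

    // Server -> agent: "restart" asks the member to update itself in place;
    // "replace" is handled on the server side by starting a new instance.
    record MaintenanceEvent(String memberId, String mode) {}

    // Agent -> server: lets the server clear its maintenance state variables.
    record ResumeEvent(String memberId, MemberState state) {}

    static class CartridgeAgent {
        private final String memberId;
        private final MemberState stateBeforeMaintenance;
        private final Runnable updateHook;              // e.g. wraps "apt-get update/upgrade"
        private final Consumer<ResumeEvent> publisher;  // stands in for the message broker

        CartridgeAgent(String memberId, MemberState state,
                       Runnable updateHook, Consumer<ResumeEvent> publisher) {
            this.memberId = memberId;
            this.stateBeforeMaintenance = state;
            this.updateHook = updateHook;
            this.publisher = publisher;
        }

        void onMaintenance(MaintenanceEvent event) {
            // Ignore events for other members and modes the agent does not handle.
            if (!memberId.equals(event.memberId()) || !"restart".equals(event.mode())) {
                return;
            }
            updateHook.run();
            // A real agent would persist its state before rebooting and reload it
            // on start-up; here it is simply kept in a field.
            publisher.accept(new ResumeEvent(memberId, stateBeforeMaintenance));
        }
    }

    public static void main(String[] args) {
        List<ResumeEvent> broker = new ArrayList<>();
        CartridgeAgent agent = new CartridgeAgent("member-1", MemberState.ACTIVE,
                () -> System.out.println("applying update..."), broker::add);
        agent.onMaintenance(new MaintenanceEvent("member-1", "restart"));
        agent.onMaintenance(new MaintenanceEvent("member-2", "restart")); // ignored
        System.out.println("resume events: " + broker);
    }
}
```

Because the server only reacts to the resume event, it does not matter whether the update hook restarts a process or reboots the whole VM, which is the reason the spec gives for avoiding a blocking callout.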
>
> Thanks
> Sandaruwan
>
> Software Update Management Solution for Stratos
> ------------------------------------------------
>
>                 Key: STRATOS-1234
>                 URL: https://issues.apache.org/jira/browse/STRATOS-1234
>             Project: Stratos
>          Issue Type: New Feature
>            Reporter: Imesh Gunaratne
>              Labels: gsoc2015, mentor
>
> Stratos uses virtual machines and containers for hosting platform
> services on different Infrastructure as a Service (IaaS) solutions. At
> present, Puppet is used for orchestration management on VM-based systems,
> and all required software is managed in the Puppet Master. Container-based
> systems create a Docker image for each platform service, including the
> required software in the image itself.
>
> In the virtual machine use case, VM instances communicate with the Puppet
> Master and execute the software installation. The same approach can be
> used for applying software updates.
>
> In the Docker use case we do not use Puppet, because a new container with
> the required software can be started in a few seconds. This is very
> efficient compared to using Puppet and installing software on demand.
>
> The requirement of this project is to implement a core Stratos feature
> to propagate software updates in a live PaaS environment.
>
> 1. Puppet-based solution:
>    - Push software updates of a cartridge to the Puppet Master (might not
>      need to be automated).
>    - Invoke the software update process via the Stratos API for a given
>      application.
>    - Stratos Manager could send a new event to trigger the Puppet agent in
>      each instance to apply the updates.
> 2. Docker-based solution:
>    - Create a new Docker image (with a new image ID) for the cartridge with
>      the software updates (might not need to be automated).
>    - Invoke the software update process via the Stratos API for a given
>      application.
>    - The Autoscaler can implement a new feature to bring down existing
>      instances and create new instances with the new Docker image ID.
>
> Important!
>
> - In each scenario, if the updates are backward compatible, the software
>   update process should execute in phases; it should not bring down the
>   entire cluster to apply the updates, as the service would then be
>   unavailable for a certain time period. The idea is to apply the updates
>   to one set of members at a time.
> - If the updates are not backward compatible, we could make the entire
>   cluster unavailable at once and apply the updates.
> - A member's state needs to be changed to a new state called "Updating"
>   while the updates are being applied.
>
> If there is interest in doing this project, please send a mail to
> imesh at apache dot org, copying the Apache Dev mailing list [1]. Please
> refer to the Stratos Wiki [2] for more information on the Stratos
> architecture and how it works.
>
> [1] http://stratos.apache.org/community/mailing-lists.html
> [2] https://cwiki.apache.org/confluence/display/STRATOS
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos

--
Sent from Gmail Mobile