Re: [openstack-dev] [heat] autoscaling across regions and availability zones

2014-07-11 Thread Mike Spreitzer
Zane Bitter zbit...@redhat.com wrote on 07/10/2014 05:57:14 PM:

 On 09/07/14 22:38, Mike Spreitzer wrote:
  Zane Bitter zbit...@redhat.com wrote on 07/01/2014 06:54:58 PM:
 
On 01/07/14 16:23, Mike Spreitzer wrote:
...
 
 Hmm, now that I think about it, CloudFormation provides a Fn::GetAZs 
 function that returns a list of available AZs. That suggests an 
 implementation where you can specify an AZ

If we're whittling down then it would be one or more AZs, right?

when creating the stack and 
 the function returns only that value within that stack (and its 
 children). There's no way in OS::Heat::AutoScalingGroup to specify an 
 intrinsic function that is resolved in the context of the scaling 
 group's nested stack,

I am not sure I understand what you mean.  Is it: there is no way for the 
implementation of a resource type to create or modify an intrinsic 
function?

   but if the default value of the AZ for 
 OS::Nova::Server were calculated the same way then the user would have 
 the option of omitting the AZ (to allow the autoscaling implementation 
 to control it)

I am not sure I get this part.  If the scaling group member type is a 
Compute instance (as well as if it is not) then the template generated by 
the group (to implement the group) wants to put different resources in 
different AZs.  The nested stack that is the scaling group is given a 
whole list of AZs as its list-of-AZs parameter value.

or overriding it explicitly. At that point you don't even
need the intrinsic function.
 
 So don't assign a stack to a particular AZ as such, but allow the list 
 of valid AZs to be whittled down as you move toward the leaves of the 
 tree of templates.

I partially get the suggestion.  Let me repeat it back to see if it sounds 
right.
Let the stack create and update operations gain an optional parameter that
is a list of AZs (noting that a stack operation parameter is something
different from a parameter specified in a template), constrained to be a
subset of the AZs available to the user in Heat's configured region; the
default value is the list of all AZs available to the user in Heat's
configured region.
Redefine the Fn::GetAZs intrinsic to return that new parameter's value.
For each resource type that can be given a list of AZs, we (as plugin
authors) redefine the default to be the list returned by Fn::GetAZs; for
each resource type that can be given a single AZ, we (as plugin authors)
redefine the default to be one (which one?) of the AZs returned by
Fn::GetAZs.
That would probably require some finessing around the schema technology, 
because a parameter's default value is fixed when the resource type is 
registered, right?
A template generated by scaling group code somehow uses that new stack 
operation parameter
to set the member's AZ when the member is a stack and the scaling group is 
spanning AZs.
Would the new stack operation parameter (list of AZs) be reflected as a 
property of OS::Heat::Stack?
How would that list be passed in a scenario like 
https://review.openstack.org/#/c/97366/10/hot/asg_of_stacks.yaml,unified
where the member type is a template filename and the member's properties 
are simply the stack's parameters?
Can the redefinitions mentioned here be a backward compatibility problem?
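
To make the "whittling down" concrete, here is a minimal sketch in plain Python (not Heat code). The function name resolve_azs and its arguments are hypothetical; it only illustrates the rule that the value Fn::GetAZs would return in a stack is either the full list or an explicitly requested subset of what the parent offers.

```python
from typing import List, Optional


def resolve_azs(available: List[str],
                requested: Optional[List[str]] = None) -> List[str]:
    """Return the AZ list a stack would see under the proposal.

    `available` is what the parent (or Heat's configured region) offers;
    `requested` is the optional stack operation parameter.  Both names are
    assumptions made for this sketch.
    """
    if requested is None:
        # Default: every AZ the parent makes available.
        return list(available)
    unknown = [az for az in requested if az not in available]
    if unknown:
        raise ValueError("AZs not available to this stack: %s" % unknown)
    return list(requested)


# Each level can only narrow the list handed down by its parent.
region_azs = ["az1", "az2", "az3"]
group_azs = resolve_azs(region_azs, ["az1", "az2"])
member_azs = resolve_azs(group_azs, ["az2"])
```

Because every level validates against its parent's list, a child stack can never end up with an AZ that an ancestor excluded.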

So yes, the tricky part is how to handle that when the scaling unit is
not a server (or a provider template with the same interface as a
server).

One solution would have been to require that the scaled unit was,
indeed, either an OS::Nova::Server or a provider template with the same
interface as (or a superset of) an OS::Nova::Server, but the consensus
was against that. (Another odd consequence of this decision is that
we'll potentially be overwriting an AZ specified in the launch config
section with one from the list supplied to the scaling group itself.)

For provider templates, we could insert a pseudo-parameter containing
the availability zone. I think that could be marginally better than
taking over one of the user's parameters, but you're basically on the
right track IMO.
 
  I considered a built-in function or pseudo parameter and rejected them
  based on a design principle that was articulated in an earlier
  discussion: no modes.  Making the innermost template explicitly
  declare that it takes an AZ parameter makes it more explicit what is
  going on.  But I agree that this is a relatively minor design point, and
  would be content to go with a pseudo-parameter if the community really
  prefers that.
 
Unfortunately, that is not the end of the story, because we still have
to deal with other types of resources being scaled. I always advocated
for an autoscaling resource where the scaled unit was either a provider
stack (if you provided a template) or an OS::Nova::Server (if you
didn't), but the 

Re: [openstack-dev] [heat] autoscaling across regions and availability zones

2014-07-10 Thread Zane Bitter

On 09/07/14 22:38, Mike Spreitzer wrote:

Zane Bitter zbit...@redhat.com wrote on 07/01/2014 06:54:58 PM:

  On 01/07/14 16:23, Mike Spreitzer wrote:
   An AWS autoscaling group can span multiple availability zones in one
   region.  What is the thinking about how to get analogous functionality
   in OpenStack?
  ...
   Currently, a stack does not have an AZ.  That makes the case of an
   OS::Heat::AutoScalingGroup whose members are nested stacks interesting
   --- how does one of those nested stacks get into the right AZ?  And what
   does that mean, anyway?  The meaning would have to be left up to the
   template author.  But he needs something he can write in his member
   template to reference the desired AZ for the member stack.  I suppose we
   could stipulate that if the member template has a parameter named
   availability_zone and typed string then the scaling group takes care
   of providing the right value to that parameter.
 
  The concept of an availability zone for a stack is not meaningful.
  Servers have availability zones; stacks exist in one region. It is up to
  the *operator*, not the user, to deploy Heat in such a way that it
  remains highly-available assuming the Region is still up.

There are two distinct issues there: (1) making the heat engine HA and
(2) making a scaling group of stacks span across AZs (within a region).
  I agree that (1) is the cloud provider's problem, and never meant to
suggest otherwise.  I think (2) makes sense by analogy: a nested stack
is a way of implementing a particular abstraction (defined by the
template) --- in fact the outer template author might not even be aware
that the group members are stacks, thanks to provider templates --- and
here we suppose the user has chosen to use an abstraction that makes
sense to be considered to be in an AZ.  While a stack in general does
not have an AZ, I think we can suppose that if the outer template author
asked for stacks to be spread across AZs then the stacks in question can
reasonably be considered to each be in one AZ.  For example, the inner
template might contain a Compute instance and a Cinder volume and an
attachment between the two; such a stack makes sense to put in an AZ.
  Heat itself does not even need there to be any particular real meaning
to a stack being in an AZ, all I am proposing is that Heat make this
concept available to the authors of the outer and innermost templates to
use in whatever way they find useful.


Hmm, now that I think about it, CloudFormation provides a Fn::GetAZs 
function that returns a list of available AZs. That suggests an 
implementation where you can specify an AZ when creating the stack and 
the function returns only that value within that stack (and its 
children). There's no way in OS::Heat::AutoScalingGroup to specify an 
intrinsic function that is resolved in the context of the scaling 
group's nested stack, but if the default value of the AZ for 
OS::Nova::Server were calculated the same way then the user would have 
the option of omitting the AZ (to allow the autoscaling implementation 
to control it) or overriding it explicitly. At that point you don't even 
need the intrinsic function.


So don't assign a stack to a particular AZ as such, but allow the list 
of valid AZs to be whittled down as you move toward the leaves of the 
tree of templates.



  So yes, the tricky part is how to handle that when the scaling unit is
  not a server (or a provider template with the same interface as a
  server).
 
  One solution would have been to require that the scaled unit was,
  indeed, either an OS::Nova::Server or a provider template with the same
  interface as (or a superset of) an OS::Nova::Server, but the consensus
  was against that. (Another odd consequence of this decision is that
  we'll potentially be overwriting an AZ specified in the launch config
  section with one from the list supplied to the scaling group itself.)
 
  For provider templates, we could insert a pseudo-parameter containing
  the availability zone. I think that could be marginally better than
  taking over one of the user's parameters, but you're basically on the
  right track IMO.

I considered a built-in function or pseudo parameter and rejected them
based on a design principle that was articulated in an earlier
discussion: no modes.  Making the innermost template explicitly
declare that it takes an AZ parameter makes it more explicit what is
going on.  But I agree that this is a relatively minor design point, and
would be content to go with a pseudo-parameter if the community really
prefers that.

  Unfortunately, that is not the end of the story, because we still have
  to deal with other types of resources being scaled. I always advocated
  for an autoscaling resource where the scaled unit was either a provider
  stack (if you provided a template) or an OS::Nova::Server (if you
  didn't), but the implementation that landed followed the design of
  ResourceGroup by allowing (actually, 

Re: [openstack-dev] [heat] autoscaling across regions and availability zones

2014-07-09 Thread Mike Spreitzer
Zane Bitter zbit...@redhat.com wrote on 07/01/2014 06:54:58 PM:

 On 01/07/14 16:23, Mike Spreitzer wrote:
  An AWS autoscaling group can span multiple availability zones in one
  region.  What is the thinking about how to get analogous functionality
  in OpenStack?
 ...
  Currently, a stack does not have an AZ.  That makes the case of an
  OS::Heat::AutoScalingGroup whose members are nested stacks interesting
  --- how does one of those nested stacks get into the right AZ?  And what
  does that mean, anyway?  The meaning would have to be left up to the
  template author.  But he needs something he can write in his member
  template to reference the desired AZ for the member stack.  I suppose we
  could stipulate that if the member template has a parameter named
  availability_zone and typed string then the scaling group takes care
  of providing the right value to that parameter.
 
 The concept of an availability zone for a stack is not meaningful.
 Servers have availability zones; stacks exist in one region. It is up to
 the *operator*, not the user, to deploy Heat in such a way that it
 remains highly-available assuming the Region is still up.

There are two distinct issues there: (1) making the heat engine HA and (2) 
making a scaling group of stacks span across AZs (within a region).  I 
agree that (1) is the cloud provider's problem, and never meant to suggest 
otherwise.  I think (2) makes sense by analogy: a nested stack is a way of 
implementing a particular abstraction (defined by the template) --- in 
fact the outer template author might not even be aware that the group 
members are stacks, thanks to provider templates --- and here we suppose 
the user has chosen to use an abstraction that makes sense to be 
considered to be in an AZ.  While a stack in general does not have an 
AZ, I think we can suppose that if the outer template author asked for 
stacks to be spread across AZs then the stacks in question can reasonably 
be considered to each be in one AZ.  For example, the inner template might 
contain a Compute instance and a Cinder volume and an attachment between 
the two; such a stack makes sense to put in an AZ.  Heat itself does not 
even need there to be any particular real meaning to a stack being in an 
AZ, all I am proposing is that Heat make this concept available to the 
authors of the outer and innermost templates to use in whatever way they 
find useful.

 So yes, the tricky part is how to handle that when the scaling unit is
 not a server (or a provider template with the same interface as a
 server).
 
 One solution would have been to require that the scaled unit was, 
 indeed, either an OS::Nova::Server or a provider template with the same 
 interface as (or a superset of) an OS::Nova::Server, but the consensus 
 was against that. (Another odd consequence of this decision is that 
 we'll potentially be overwriting an AZ specified in the launch config 
 section with one from the list supplied to the scaling group itself.)
 
 For provider templates, we could insert a pseudo-parameter containing 
 the availability zone. I think that could be marginally better than 
 taking over one of the user's parameters, but you're basically on the 
 right track IMO.

I considered a built-in function or pseudo parameter and rejected them 
based on a design principle that was articulated in an earlier discussion: 
no modes.  Making the innermost template explicitly declare that it 
takes an AZ parameter makes it more explicit what is going on.  But I 
agree that this is a relatively minor design point, and would be content 
to go with a pseudo-parameter if the community really prefers that.

 Unfortunately, that is not the end of the story, because we still have 
 to deal with other types of resources being scaled. I always advocated 
 for an autoscaling resource where the scaled unit was either a provider 
 stack (if you provided a template) or an OS::Nova::Server (if you 
 didn't), but the implementation that landed followed the design of 
 ResourceGroup by allowing (actually, requiring) you to specify an 
 arbitrary resource type.
 
 We could do something fancy here involving tagging the properties schema
 with metadata so that we could allow plugin authors to map the AZ list
 to an arbitrary property. However, I propose that we just raise a
 validation error if the AZ is specified for a resource that is not
 either an OS::Nova::Server or a provider template.

Yes, I agree with limiting ambition.  For OS::Heat::AutoScalingGroup and 
ResourceGroup, I think a pretty simple rule will cover all the cases that 
matter besides nested stack: if the member resource type has a string 
parameter named availability zone (camel or snake case) then this is 
valid.

I just reviewed LaunchConfiguration (
http://docs.openstack.org/developer/heat/template_guide/cfn.html#AWS::AutoScaling::LaunchConfiguration
) and noticed that it does not have an availability_zone.

Since AWS::AutoScalingGroup and 

[openstack-dev] [heat] autoscaling across regions and availability zones

2014-07-01 Thread Mike Spreitzer
An AWS autoscaling group can span multiple availability zones in one 
region.  What is the thinking about how to get analogous functionality in 
OpenStack?

Warmup question: what is the thinking about how to get the levels of 
isolation seen between AWS regions when using OpenStack?  What is the 
thinking about how to get the level of isolation seen between AWS AZs in 
the same AWS Region when using OpenStack?  Do we use OpenStack Region and 
AZ, respectively?  Do we believe that OpenStack AZs can really be as 
independent as we want them (note that this is phrased to not assume we 
only want as much isolation as AWS provides --- they have had high profile 
outages due to lack of isolation between AZs in a region)?

I am going to assume that the answer to the question about ASG spanning 
involves spanning OpenStack regions and/or AZs.  In the case of spanning 
AZs, Heat has already got one critical piece: the OS::Heat::InstanceGroup 
and AWS::AutoScaling::AutoScalingGroup types of resources take a list of 
AZs as an optional parameter.  Presumably all four kinds of scaling group 
(i.e., also OS::Heat::AutoScalingGroup and OS::Heat::ResourceGroup) should 
have such a parameter.  We would need to change the code that generates 
the template for the nested stack that is the group, so that it spreads 
the members across the AZs in a way that is as balanced as is possible at 
the time.
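
As a rough illustration of "as balanced as is possible at the time", here is a minimal sketch in plain Python (not Heat code; the names pick_az and spread_members are hypothetical). It assumes the group knows which AZ each existing member is in, and it places each new member in whichever AZ currently holds the fewest.

```python
from collections import Counter
from typing import Dict, List


def pick_az(azs: List[str], counts: Dict[str, int]) -> str:
    """Return the AZ holding the fewest members.

    Ties go to the AZ listed first, so the choice is deterministic.
    """
    if not azs:
        raise ValueError("at least one availability zone is required")
    return min(azs, key=lambda az: counts.get(az, 0))


def spread_members(azs: List[str], existing: List[str],
                   new_count: int) -> List[str]:
    """Choose an AZ for each of `new_count` new members.

    `existing` lists the AZ of every member already in the group; members
    in AZs that are no longer in `azs` are ignored when counting.
    """
    counts = Counter(az for az in existing if az in azs)
    placement = []
    for _ in range(new_count):
        az = pick_az(azs, counts)
        counts[az] += 1
        placement.append(az)
    return placement
```

Starting from an empty group this degenerates to plain round-robin over the list; starting from an unbalanced group it fills the emptiest AZs first.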

Currently, a stack does not have an AZ.  That makes the case of an 
OS::Heat::AutoScalingGroup whose members are nested stacks interesting --- 
how does one of those nested stacks get into the right AZ?  And what does 
that mean, anyway?  The meaning would have to be left up to the template 
author.  But he needs something he can write in his member template to 
reference the desired AZ for the member stack.  I suppose we could 
stipulate that if the member template has a parameter named 
availability_zone and typed string then the scaling group takes care 
of providing the right value to that parameter.
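
A minimal sketch of that stipulation, in plain Python rather than Heat code: it assumes the member template is available as a parsed HOT document (a dict with a top-level "parameters" section), and the function name member_params is hypothetical.

```python
from typing import Any, Dict

AZ_PARAM = "availability_zone"  # parameter name suggested above


def member_params(template: Dict[str, Any],
                  user_params: Dict[str, Any],
                  az: str) -> Dict[str, Any]:
    """Return the parameters to pass to one member stack.

    The AZ chosen by the group is supplied only when the member template
    declares a string parameter named `availability_zone`; otherwise the
    user's parameters are passed through unchanged.
    """
    params = dict(user_params)  # never mutate the caller's dict
    declared = template.get("parameters", {}).get(AZ_PARAM)
    if declared is not None and declared.get("type") == "string":
        # Note: this overrides any value the user gave for the parameter.
        params[AZ_PARAM] = az
    return params
```

A template that does not declare the parameter is simply left alone, so existing member templates would keep working.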

To spread across regions adds two things.  First, all four kinds of 
scaling group would need the option to be given a list of regions instead 
of a list of AZs.  More likely, a list of contexts as defined in 
https://review.openstack.org/#/c/53313/ --- that would make this handle 
multi-cloud as well as multi-region.  The other thing this adds is a 
concern for context health.  It is not enough to ask Ceilometer to monitor 
member health --- in multi-region or multi-cloud you also have to worry 
about the possibility that Ceilometer itself goes away.  It would have to 
be the scaling group's responsibility to monitor for context health, and 
react properly to failure of a whole context.

Does this sound about right?  If so, I could draft a spec.

Thanks,
Mike


Re: [openstack-dev] [heat] autoscaling across regions and availability zones

2014-07-01 Thread Zane Bitter

On 01/07/14 16:23, Mike Spreitzer wrote:

An AWS autoscaling group can span multiple availability zones in one
region.  What is the thinking about how to get analogous functionality
in OpenStack?


Correct, you specify a list of availability zones (instead of just one), 
and AWS distributes servers across them in some sort of round-robin 
fashion. We should implement this.



Warmup question: what is the thinking about how to get the levels of
isolation seen between AWS regions when using OpenStack?  What is the
thinking about how to get the level of isolation seen between AWS AZs in
the same AWS Region when using OpenStack?  Do we use OpenStack Region
and AZ, respectively?  Do we believe that OpenStack AZs can really be as
independent as we want them (note that this is phrased to not assume we
only want as much isolation as AWS provides --- they have had high
profile outages due to lack of isolation between AZs in a region)?


That seems like a question for individual operators, rather than for 
OpenStack. OpenStack allows you, as an operator, to create AZs and 
Regions... how good a job you do is up to you.



I am going to assume that the answer to the question about ASG spanning
involves spanning OpenStack regions and/or AZs.  In the case of spanning
AZs, Heat has already got one critical piece: the
OS::Heat::InstanceGroup and AWS::AutoScaling::AutoScalingGroup types of
resources take a list of AZs as an optional parameter.


That's technically true, but we don't read the list :(


Presumably all
four kinds of scaling group (i.e., also OS::Heat::AutoScalingGroup and
OS::Heat::ResourceGroup) should have such a parameter.  We would need to
change the code that generates the template for the nested stack that is
the group, so that it spreads the members across the AZs in a way that
is as balanced as is possible at the time.


+1


Currently, a stack does not have an AZ.  That makes the case of an
OS::Heat::AutoScalingGroup whose members are nested stacks interesting
--- how does one of those nested stacks get into the right AZ?  And what
does that mean, anyway?  The meaning would have to be left up to the
template author.  But he needs something he can write in his member
template to reference the desired AZ for the member stack.  I suppose we
could stipulate that if the member template has a parameter named
availability_zone and typed string then the scaling group takes care
of providing the right value to that parameter.


The concept of an availability zone for a stack is not meaningful. 
Servers have availability zones; stacks exist in one region. It is up to 
the *operator*, not the user, to deploy Heat in such a way that it 
remains highly-available assuming the Region is still up.


So yes, the tricky part is how to handle that when the scaling unit is 
not a server (or a provider template with the same interface as a server).


One solution would have been to require that the scaled unit was, 
indeed, either an OS::Nova::Server or a provider template with the same 
interface as (or a superset of) an OS::Nova::Server, but the consensus 
was against that. (Another odd consequence of this decision is that 
we'll potentially be overwriting an AZ specified in the launch config 
section with one from the list supplied to the scaling group itself.)


For provider templates, we could insert a pseudo-parameter containing 
the availability zone. I think that could be marginally better than 
taking over one of the user's parameters, but you're basically on the 
right track IMO.


Unfortunately, that is not the end of the story, because we still have 
to deal with other types of resources being scaled. I always advocated 
for an autoscaling resource where the scaled unit was either a provider 
stack (if you provided a template) or an OS::Nova::Server (if you 
didn't), but the implementation that landed followed the design of 
ResourceGroup by allowing (actually, requiring) you to specify an 
arbitrary resource type.


We could do something fancy here involving tagging the properties schema 
with metadata so that we could allow plugin authors to map the AZ list 
to an arbitrary property. However, I propose that we just raise a 
validation error if the AZ is specified for a resource that is not 
either an OS::Nova::Server or a provider template.
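
The simpler check could look roughly like the following. This is a sketch in plain Python, not Heat's validation code: the exception class, the function name, and the idea of passing in the set of type names that map to provider templates are all assumptions made for illustration.

```python
from typing import Collection, Sequence

SERVER_TYPE = "OS::Nova::Server"


class AZNotSupported(ValueError):
    """An AZ list was given for a member type that cannot use it."""


def validate_az_list(member_type: str,
                     azs: Sequence[str],
                     provider_types: Collection[str]) -> None:
    """Raise AZNotSupported if `azs` is set for an unsupported member type.

    `provider_types` is assumed to hold the type names that resolve to
    provider templates in the current environment.
    """
    if not azs:
        return  # nothing specified, nothing to validate
    if member_type == SERVER_TYPE or member_type in provider_types:
        return
    raise AZNotSupported(
        "availability zones cannot be specified for members of type %s"
        % member_type)
```

Failing at validation time keeps the behaviour explicit: the user finds out before any member is created that the AZ list would have been ignored.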



To spread across regions adds two things.  First, all four kinds of
scaling group would need the option to be given a list of regions
instead of a list of AZs.  More likely, a list of contexts as defined in
https://review.openstack.org/#/c/53313/ --- that would make this handle
multi-cloud as well as multi-region.  The other thing this adds is a
concern for context health.  It is not enough to ask Ceilometer to
monitor member health --- in multi-region or multi-cloud you also have
to worry about the possibility that Ceilometer itself goes away.  It
would have to be the scaling group's responsibility to monitor for
context health, and react properly to failure of a