On 9/24/2015 9:06 AM, Matt Riedemann wrote:


On 9/24/2015 3:19 AM, Sylvain Bauza wrote:


Le 24/09/2015 09:04, Duncan Thomas a écrit :
Hi

I thought I was late on this thread, but looking at the time stamps,
it is just something that escalated very quickly. I am honestly
surprised a cross-project interaction option went from 'we don't seem
to understand this' to 'deprecation merged' in 4 hours, with only a
12-hour discussion on the mailing list, right at the end of a cycle
when we're supposed to be stabilising features.


So, I agree it was maybe a bit too quick, hence the revert. That said,
Nova master is now Mitaka, which means that the deprecation change was
made for the next cycle, not the one currently stabilising.

Anyway, I'm all for discussing why Cinder needs to know about the
Nova AZs.

I proposed a session at the Tokyo summit for a discussion of Cinder
AZs, since there was clear confusion about what they are intended for
and how they should be configured.

Cool, count me in from the Nova standpoint.

Since then I've reached out to, and gotten good feedback from, a number
of operators. There are two distinct configurations for AZ behaviour
in cinder, and both sort of worked until very recently.

1) No AZs in cinder
This is the config where there is a single 'blob' of storage (most of
the operators who responded so far are using Ceph, though that isn't
required). The storage takes care of availability concerns, and any AZ
info from nova should just be ignored.
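
As a rough sketch of what that might look like in config (assuming the
current option names; just a sketch, not a recommendation):

    # nova.conf - don't enforce any AZ coupling between nova and cinder
    [cinder]
    cross_az_attach = True

    # cinder.conf - every cinder-volume service reports the same default AZ
    [DEFAULT]
    storage_availability_zone = nova

With that, nova never compares AZs on attach and the single storage
'blob' simply ignores whichever AZ the instance landed in.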

2) Cinder AZs map to Nova AZs
In this case, some combination of storage / networking / etc couples
storage to nova AZs. It may be that an AZ is used as a unit of
scaling, or it could be a real storage failure domain. Either way,
there are a number of operators who have this configuration and want
to keep it. Storage can certainly have a failure domain, and limiting
the scalability problem of storage to a single compute AZ can have
definite advantages in failure scenarios. These people do not want
cross-az attach.
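
For this second model, the equivalent sketch (again assuming the current
option names, with purely illustrative AZ names) would be roughly:

    # nova.conf - refuse attaches across AZ boundaries
    [cinder]
    cross_az_attach = False

    # cinder.conf for the cinder-volume service(s) backing az-1
    [DEFAULT]
    storage_availability_zone = az-1

    # cinder.conf for the cinder-volume service(s) backing az-2
    [DEFAULT]
    storage_availability_zone = az-2

i.e. each cinder-volume service is configured to report the same AZ name
as the nova AZ it is coupled to.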


Ahem, Nova AZs are not failure domains - I mean in the current
implementation, in the sense in which many people understand a failure
domain, i.e. a physical unit of machines (a bay, a room, a floor, a
datacenter).
All the AZs in Nova share the same control plane, with the same message
queue and database, which means that one failure can propagate to the
other AZs.

To be honest, there is one very specific use case where AZs *are*
failure domains: when cells exactly match AZs (i.e. one AZ grouping all
the hosts behind one cell). That's the very specific use case that Sam
is mentioning in his email, and I certainly understand we need to keep
that.

What AZs in Nova are is pretty well explained in a fairly old blog post:
http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/


We also added a few comments in our developer docs here:
http://docs.openstack.org/developer/nova/aggregates.html#availability-zones-azs


tl;dr: AZs are aggregate metadata that makes those aggregates of compute
nodes visible to the users. Nothing more than that, no magic sauce.
It's just a logical abstraction that can map to your physical
deployment, but, like I said, one which would share the same bus and DB.
Of course, you could still provide distinct networks between AZs, but
that just gives you L2 isolation, not a real failure domain in the
Business Continuity Plan sense.
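
To make that concrete, an AZ is just the availability_zone key set on a
host aggregate, something like (illustrative names):

    nova aggregate-create rack1-agg az-1
    nova aggregate-add-host rack1-agg compute-01
    nova aggregate-add-host rack1-agg compute-02

Nothing about that changes where the control plane lives; it only tags
those hosts so users can target them with --availability-zone.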

What puzzles me is how Cinder manages datacenter-level isolation given
there is no cells concept AFAIK. I assume that cinder-volumes belong to
a specific datacenter, but how is its control plane managed? I can
certainly understand the need for affinity placement between physical
units, but I'm missing that piece, and consequently I wonder why Nova
needs to provide AZs to Cinder in the general case.



My hope for the summit session is to agree on these two configurations,
discuss any scenarios not covered by them, and nail down the changes we
need to get these to work properly. There's definitely been interest
and activity in the operator community in making nova and cinder AZs
interact, and every desired interaction I've gotten details about so
far matches one of the above models.


I'm all with you about providing a way for users to get volume affinity
for Nova. That's a long story I'm trying to work through, and we are
constantly trying to improve the nova scheduler interfaces so that other
projects could provide resources to the nova scheduler for decision
making. I just want to consider whether AZs are the best concept for
that, or whether we should do it some other way (again, because AZs are
not what people expect).

Again, count me in for the Cinder session, and just lemme know when the
session is planned so I can attend it.

-Sylvain





I plan on reverting the deprecation change (which was a mitaka change,
not a liberty change, as Sylvain pointed out).

However, the fact that so many nova and cinder cores were talking about
this yesterday and thought it was the right thing to do speaks to how
poorly understood (and undocumented) this use case is.  So as part of
reverting the deprecation I also want to see improved docs for the
cross_az_attach option itself, and probably a nova devref change
explaining the use cases and issues with this.

I think the volume attach case is pretty straightforward.  You create a
nova instance in some nova AZ x, create a cinder volume in some cinder
AZ y, and try to attach the volume to the server instance.  If
cinder.cross_az_attach=True this is OK; otherwise it fails.
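
In pseudo-code terms (a sketch of what the check amounts to, not the
actual nova code), it's basically:

    # Sketch only: assumes instance_az and volume_az were already looked up.
    def check_az_on_attach(cross_az_attach, instance_az, volume_az):
        # If the operator allows cross-AZ attach, there's nothing to compare.
        if cross_az_attach:
            return
        # Otherwise the instance and the volume must be in the same AZ.
        if instance_az != volume_az:
            raise ValueError('instance AZ %s does not match volume AZ %s'
                             % (instance_az, volume_az))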

The problem I have is with the boot from volume case where
source=(blank/image/snapshot).  In those cases nova is creating the
volume and passing the server instance AZ to the volume create API.  How
are people that are using cinder.cross_az_attach=False handling the BFV
case?
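
To spell out the path I mean (a hypothetical helper, not the actual nova
code), the BFV flow boils down to something like:

    # Sketch only: nova creates the volume on the user's behalf and passes
    # the *instance's* AZ straight through to the cinder volume create call.
    def create_volume_for_bfv(volume_api, context, instance, size, image_id):
        return volume_api.create(context, size, name=None, description=None,
                                 image_id=image_id,
                                 availability_zone=instance.availability_zone)

so if that AZ name doesn't exist on the cinder side, the create blows up.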

Per bug 1496235 that started this, the user is booting a nova instance
in a nova AZ with bdm source=image, and when nova tries to create the
volume it fails because that AZ doesn't exist in cinder.  This fails in
the compute manager when building the instance, so this results in a
NoValidHost error for the user - which we all know and love as a super
useful error.  So how do we handle this case?  If
cinder.cross_az_attach=True in nova we could just not pass the instance
AZ to the volume create, or only pass it if cinder has that AZ available.

But if cinder.cross_az_attach=False when creating the volume, what do we
do?  I guess we can just leave the code as-is and if the AZ isn't in
cinder (or your admin hasn't set allow_availability_zone_fallback=True
in cinder.conf), then it fails and you open a support ticket.  That
seems gross to me.  I'd like to at least see some of this validated in
the nova API layer before it gets to the scheduler and compute so we can
avoid NoValidHost.  My thinking is, in the BFV case where source !=
volume, if cinder.cross_az_attach is False and instance.az is not None,
then we check the list of AZs from the volume API.  If the instance.az
is not in that list, we fail fast (400 response to the user).  However,
if allow_availability_zone_fallback=True in cinder.conf, we'd be
rejecting the request even though the actual volume create would
succeed.  These are just details that we don't have in the nova API,
since it's all policy-driven gorp using config options that the user
doesn't know about, which makes it really hard to write applications
against this - and was part of the reason I moved to deprecate that
option.
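
Roughly what I have in mind for the API-layer check (hypothetical names,
just a sketch of the proposal, not a patch):

    # Sketch only: fail fast with a 400 instead of a NoValidHost later on.
    class BadRequest(Exception):
        """Stand-in for a 400 response back to the user."""

    def validate_bfv_az(cross_az_attach, instance_az, cinder_az_names):
        if cross_az_attach or instance_az is None:
            return  # nothing to validate
        if instance_az not in cinder_az_names:
            # NOTE: if cinder has allow_availability_zone_fallback=True this
            # rejects a request that the volume create would actually have
            # satisfied, which is exactly the wrinkle described above.
            raise BadRequest('availability zone %s is not available in the '
                             'volume service' % instance_az)

where cinder_az_names would come from listing AZs via the volume API.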

Am I off in the weeds?  It sounds like Duncan is going to try and get a
plan together in Tokyo about how to handle this and decouple nova and
cinder in this case, which is the right long-term goal.


Revert is approved: https://review.openstack.org/#/c/227340/

--

Thanks,

Matt Riedemann

