Re: [openstack-dev] [nova][cinder] how to handle AZ bug 1496235?

Sylvain Bauza Thu, 24 Sep 2015 01:30:52 -0700


On 24/09/2015 09:04, Duncan Thomas wrote:

Hi
I thought I was late on this thread, but looking at the time stamps, it is just something that escalated very quickly. I am honestly surprised a cross-project interaction option went from 'we don't seem to understand this' to 'deprecation merged' in 4 hours, with only a 12 hour discussion on the mailing list, right at the end of a cycle when we're supposed to be stabilising features.

So, I agree it was maybe a bit too quick, hence the revert. That said, Nova master is now Mitaka, which means that the deprecation change was meant for the next cycle, not the one currently stabilising.

Anyway, I'm all for discussing why Cinder needs to know the Nova AZs.

I proposed a session at the Tokyo summit for a discussion of Cinder AZs, since there was clear confusion about what they are intended for and how they should be configured.


Cool, count me in from the Nova standpoint.

Since then I've reached out to, and gotten good feedback from, a number of operators. There are two distinct configurations for AZ behaviour in cinder, and both sort-of worked until very recently.
1) No AZs in cinder
This is the config where there is a single 'blob' of storage (most of the operators who responded so far are using Ceph, though that isn't required). The storage takes care of availability concerns, and any AZ info from nova should just be ignored.
2) Cinder AZs map to Nova AZs
In this case, some combination of storage / networking / etc couples storage to nova AZs. It may be that an AZ is used as a unit of scaling, or it could be a real storage failure domain. Either way, there are a number of operators who have this configuration and want to keep it. Storage can certainly have a failure domain, and limiting the scalability problem of storage to a single compute AZ can have definite advantages in failure scenarios. These people do not want cross-az attach.
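The difference between the two configurations can be put as a small sketch. This is not Nova or Cinder code: `Volume`, `can_attach` and the `cross_az_attach` parameter are hypothetical names used only to illustrate the behaviour described above, with a volume AZ of `None` standing in for "no AZs in cinder".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical stand-in for a volume record, not a Cinder object."""
    name: str
    az: Optional[str]  # None models configuration 1: no AZs in cinder


def can_attach(instance_az: str, volume: Volume, cross_az_attach: bool) -> bool:
    """Decide whether an instance may attach a volume.

    cross_az_attach=True models configuration 1: any AZ info is ignored.
    cross_az_attach=False models configuration 2: the AZs must match.
    """
    if cross_az_attach:
        return True
    return volume.az == instance_az


if __name__ == "__main__":
    ceph_blob = Volume("vol-blob", az=None)
    az2_volume = Volume("vol-az2", az="az2")
    print(can_attach("az1", ceph_blob, cross_az_attach=True))
    print(can_attach("az1", az2_volume, cross_az_attach=False))
    print(can_attach("az2", az2_volume, cross_az_attach=False))
```

Under this sketch, an operator in configuration 2 who disables cross-az attach needs every volume to carry the same AZ name as the instance it is attached to, which is exactly the coupling being discussed.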

Ahem, Nova AZs are not failure domains - I mean in the current implementation, and in the sense in which many people understand a failure domain, i.e. a physical unit of machines (a bay, a room, a floor, a datacenter). All the AZs in Nova share the same control plane, with the same message queue and database, which means that one failure can propagate to the other AZs.

To be honest, there is one very specific use case where AZs *are* failure domains: when cells exactly match AZs (i.e. one AZ grouping all the hosts behind one cell). That's the very specific use case that Sam is mentioning in his email, and I certainly understand we need to keep that.

What AZs are in Nova is pretty well explained in a fairly old blog post: http://blog.russellbryant.net/2013/05/21/availability-zones-and-host-aggregates-in-openstack-compute-nova/

We also added a few comments in our developer doc here: http://docs.openstack.org/developer/nova/aggregates.html#availability-zones-azs

tl;dr: AZs are aggregate metadata that makes those aggregates of compute nodes visible to the users. Nothing more than that, no magic sauce. That's just a logical abstraction that can map to your physical deployment but, like I said, the AZs would still share the same bus and DB. Of course, you could still provide distinct networks between AZs, but that just gives you L2 isolation, not a real failure domain in the Business Continuity Plan sense.
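To illustrate "AZs are aggregate metadata", here is a minimal sketch. It is not the Nova implementation: `Aggregate`, `az_for_host` and `DEFAULT_AZ` are hypothetical names, and the fallback AZ name is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Assumption for this sketch: hosts outside any AZ-tagged aggregate
# fall back to a single default AZ name.
DEFAULT_AZ = "nova"


@dataclass
class Aggregate:
    """Hypothetical host aggregate: a named group of hosts plus metadata."""
    name: str
    hosts: Set[str]
    metadata: Dict[str, str] = field(default_factory=dict)


def az_for_host(host: str, aggregates: List[Aggregate]) -> str:
    """Return the AZ of a host: just the 'availability_zone' metadata
    of the first aggregate containing it, or the default otherwise."""
    for agg in aggregates:
        if host in agg.hosts and "availability_zone" in agg.metadata:
            return agg.metadata["availability_zone"]
    return DEFAULT_AZ


if __name__ == "__main__":
    aggregates = [
        Aggregate("rack-a", {"cn1", "cn2"}, {"availability_zone": "az1"}),
        Aggregate("ssd-hosts", {"cn2", "cn3"}, {"ssd": "true"}),
    ]
    for host in ("cn1", "cn2", "cn3"):
        print(host, az_for_host(host, aggregates))
```

Nothing in this model says anything about message queues, databases or power feeds, which is the point: the AZ is a label on a group of hosts, and any isolation has to come from how the deployment is actually built.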

What puzzles me is how Cinder manages datacenter-level isolation, given there is no cells concept AFAIK. I assume that cinder-volume services belong to a specific datacenter, but how is their control plane managed? I can certainly understand the need for affinity placement between physical units, but I'm missing that piece, and consequently I wonder why Nova needs to provide AZs to Cinder in the general case.

My hope at the summit session was to agree on these two configurations, discuss any scenarios not covered by these two configurations, and nail down the changes we need to get these to work properly. There's definitely been interest and activity in the operator community in making nova and cinder AZs interact, and every desired interaction I've gotten details about so far matches one of the above models.

I'm all with you about providing a way for users to get volume affinity for Nova. That's a long story I'm trying to consider, and we are constantly trying to improve the nova scheduler interfaces so that other projects can provide resources to the nova scheduler for decision making. I just want to consider whether AZs are the best concept for that or whether we should do things another way (again, because AZs are not what people expect).

Again, count me in for the Cinder session, and just let me know when the session is planned so I can attend it.


-Sylvain


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
