Re: [Openstack] Instance IDs and Multiple Zones
+1 Great discussion and not anything that should be blocking distributed scheduler.

-S

From: Eric Day [e...@oddments.org]

> Ok. :) The original statement felt like it was written with negative connotations, and I just wanted to say I think it's all been positive.
>
> -Eric

Confidentiality Notice: This e-mail message (including any attached or embedded documents) is intended for the exclusive and confidential use of the individual or entity to which this message is addressed, and unless otherwise expressly indicated, is confidential and privileged information of Rackspace. Any dissemination, distribution or copying of the enclosed material is prohibited. If you receive this transmission in error, please notify us immediately by e-mail at ab...@rackspace.com, and delete the original message. Your cooperation is appreciated.

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
> If we were to go with UUIDs and using XenServer, I should be able to use the uuid that it generates upon VM creation. I would almost ask your above question for XenServer then. When I terminate and launch a VM on the same machine, I should be able to give it the same uuid that I was just using, but I can't. Maybe I can and I'm making it harder on myself :)

Yes, it would be great if you could use the XenServer-generated UUID. The reason this doesn't work is that OpenStack is outside of the XenServer design envelope, and is orchestrating on top of it. If you were using XenServer pools, and only starting, stopping, and migrating VMs within the pool, then you could use our UUID, because the VM retains its identity for the lifetime of all those operations. It's the fact that OpenStack moves beyond that model that breaks this. For OpenStack, it might be a VM move, but for XenServer it's a VM copy + VM destroy + VM start for a completely different VM; we lose track of the identity at that point.

Ewan.
Re: [Openstack] Instance IDs and Multiple Zones
It's early here, but I think it's closer to 200 zones? :)

On Mar 24, 2011, at 5:16 AM, Ed Leafe e...@leafe.com wrote:

> On Mar 23, 2011, at 9:41 PM, Justin Santa Barbara wrote:
>
>> The type of a server @id in CloudServers is xsd:int, which is a 32-bit signed integer: http://docs.rackspacecloud.com/servers/api/v1.0/xsd/server.xsd
>>
>> So if you have 1 billion integers per zone, you only get 2 zones. You can have 4 if you're willing to go negative, but surely it's too early in the campaign.
>
> Yes, you're correct. That always trips me up: why would anyone pick a signed integer for a PK? OK, so I'll slice the ranges down to the current Rackspace practice of 10 million. That will allow for around 2000 zones.
>
> -- Ed Leafe
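The arithmetic in this exchange is easy to check. What follows is only a minimal sketch, assuming that IDs are handed out in contiguous, equal-sized ranges and that the xsd:int ceiling is 2^31 - 1; the name `zones_available` is made up for illustration and is not part of any OpenStack API.

```python
# xsd:int is a 32-bit signed integer, so the largest usable ID is 2**31 - 1.
INT32_MAX = 2**31 - 1


def zones_available(ids_per_zone: int, id_space: int = INT32_MAX) -> int:
    """Count whole, non-overlapping ranges of `ids_per_zone` that fit in the ID space."""
    return id_space // ids_per_zone


if __name__ == "__main__":
    print(zones_available(1_000_000_000))         # positive IDs only, 1 billion per zone
    print(zones_available(1_000_000_000, 2**32))  # whole 32-bit range, negatives included
    print(zones_available(10_000_000))            # 10 million per zone
```

The three calls give 2, 4 and 214, which agrees with the "closer to 200 zones" correction rather than the 2000 in the quoted message.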
Re: [Openstack] Instance IDs and Multiple Zones
Good conversation guys. Certainly something we need to get settled out sooner rather than later.

On naming:

No matter how we shake it out (prefixes, mac address, time, etc), we're essentially fabricating our own form of UUID ... trying to pick some unique qualifier(s) to avoid collisions. I think the real driver is making something that is as short as possible and mnemonic enough that a user could look at it and say "yup, that's mine".

Personally, I find UUIDs to be ugly monsters and think URNs are better for providing a mnemonic for remembering names. Given:

    6373-ba62-9847-feab-b72a-00dd

vs.

    rax:ord:zone3:rack2:cust29:inst383

... give me a URN anytime. However, this does pose security risks by exposing internal layouts.

We currently allow a user-supplied friendly name but under the hood use the instance ID. Since customers use different auth credentials, their instances live in different Projects and there is no conflict. Duplicate names are allowed across customers (even within customers?). The downside is that names give no hints for routing.

On bursting:

Currently, the Instance ID is fabricated in the zone where the create() call was handled. This Instance ID is treated like a Reservation # which is returned to the user for later follow-up (since provisioning can take a while).

The way I currently envision bursting with zones is that the commercial zones would be the leaf zones in a deployment. That is, instances would be provisioned locally first (depending on Server Best Match) due to their low weight scores and ultimately burst through the bottom of the zone tree to the commercial cloud. I think this works well.

If I have a hybrid cloud and issue 'nova list' I would see something like:

    sleepy - com:myco:development:inst1
    dopey - com:myco:development:inst2
    blinky - com:myco:development:inst3
    inky - rax:ord:zone3:rack2:cust293:inst393
    pinky - rax:ord:zone2:rack34:cust293:inst8746
    clyde - bobscloud:basement:shelf2:cust9:inst8

and get a good idea of what's what.
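To make the routing-hint idea in this message concrete, here is a minimal sketch of splitting a colon-separated name such as the ones in the 'nova list' example. The field layout (provider first, instance last, everything else a location path) is an assumption read off those examples, and `parse_instance_urn` and `InstanceName` are hypothetical names, not Nova code.

```python
from typing import NamedTuple, Tuple


class InstanceName(NamedTuple):
    provider: str            # first segment, e.g. "rax" (assumed meaning)
    path: Tuple[str, ...]    # segments in between: region, zone, rack, customer, ...
    instance: str            # last segment, e.g. "inst383"


def parse_instance_urn(urn: str) -> InstanceName:
    """Split a colon-separated instance name into provider, location path and instance."""
    parts = urn.split(":")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a colon-separated instance name: {urn!r}")
    return InstanceName(parts[0], tuple(parts[1:-1]), parts[-1])


if __name__ == "__main__":
    name = parse_instance_urn("rax:ord:zone3:rack2:cust29:inst383")
    print(name.provider, name.path, name.instance)
```

The `path` tuple is exactly the part that exposes the internal layout, which is the security concern raised above, and it is also the part that would have to change if the instance moved.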
Re: [Openstack] Instance IDs and Multiple Zones
We shouldn't keep tainting this argument with concerns about whether the IDs are readable or not. We have UIs and CLIs to make things readable for humans. We have to accept that, on the scales we care about, any unique ID is going to be incomprehensible to a human. Rely on your presentation layer, that's what it's there for!

Ewan.

-----Original Message-----
From: openstack-bounces+ewan.mellor=citrix@lists.launchpad.net [mailto:openstack-bounces+ewan.mellor=citrix@lists.launchpad.net] On Behalf Of Sandy Walsh
Sent: 23 March 2011 12:30
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] Instance IDs and Multiple Zones

[snip]
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 23, 2011, at 8:46 AM, Ewan Mellor wrote:

> We have to accept that, on the scales we care about, any unique ID is going to be incomprehensible to a human. Rely on your presentation layer, that's what it's there for!

+1

-- Ed Leafe
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 23, 2011, at 11:28 AM, Chris Behrens wrote:

> How would the admin API know which ID to work with if there are collisions? Eric's point is that we'd not know where to route the request.

This reflects a fundamental misunderstanding of the way inter-zone communication works. There is no direct routing. Instead, a zone knows about its instances and its child zones. If the zone receives a request for some action involving a particular instance, it checks if it has that instance among its compute nodes; if not, it forwards the request to each of its child zones. That is repeated until the leaf zones are reached, and most of those will respond with something akin to a 404, indicating that they didn't handle the request. The zone that does have the requested instance, though, will carry out the action and return the result of that action.

The child zone responses are then aggregated. If all indicate 404, the zone returns the same. If one child responds that it has handled the request, that response is returned. This repeats back up the zone tree until the zone that originally received the request has heard from all of its child zones (or they timed out).

If there were to be a collision (i.e., two leaf nodes handling the request), there are only two possibilities: either the authenticated user has rights to those nodes, or they do not. If they do not, nothing will happen beyond an authorization failure message. If they do have rights to both instances, then the action will happen to both instances. Since the context of this discussion is deliberate spoofing, my response would be "serves them right". :)

So it seems that spoofing should have no effect, assuming that our authentication/authorization system is sound. If it isn't, then we have bigger issues than just ID spoofing, since I could write a program to send API delete requests for random instance IDs - no spoofing required.

Without spoofing, let's be realistic: the chance of duplicate uuid values colliding is much, much smaller than the chance of a meteorite smashing into our data centers. From Wikipedia: "In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%." I believe that that is well beyond our scalability goals, so we can effectively ignore the impact of non-spoofed collisions.

-- Ed Leafe
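The forwarding scheme described in this message can be sketched in a few lines. This is an illustration only, not the Nova implementation: the `Zone` class and its `handle` method are hypothetical, authorization checks and timeouts are left out, and an empty result list stands in for the "something akin to a 404" reply.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Zone:
    name: str
    instances: Set[str] = field(default_factory=set)
    children: List["Zone"] = field(default_factory=list)

    def handle(self, instance_id: str, action: str) -> List[Tuple[str, str]]:
        """Return (zone name, action) for each zone that acted; an empty list plays the 404 role."""
        if instance_id in self.instances:
            # This zone owns the instance: act on it and stop descending.
            return [(self.name, action)]
        handled: List[Tuple[str, str]] = []
        for child in self.children:
            # No routing information is available, so every child is asked in turn.
            handled.extend(child.handle(instance_id, action))
        return handled


if __name__ == "__main__":
    leaf_a = Zone("leaf-a", instances={"id-1"})
    leaf_b = Zone("leaf-b", instances={"id-2"})
    leaf_c = Zone("leaf-c", instances={"id-2"})  # deliberate collision with leaf-b
    root = Zone("root", children=[Zone("mid", children=[leaf_a, leaf_b]), leaf_c])

    print(root.handle("id-1", "reboot"))  # one zone acts
    print(root.handle("id-2", "reboot"))  # both colliding zones act
    print(root.handle("id-9", "reboot"))  # nobody acts: the 404 case
```

The sketch also makes the cost visible: with no routing hint, a request for an unknown ID touches every zone in the tree.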
Re: [Openstack] Instance IDs and Multiple Zones
You have a fundamental misunderstanding of my fundamental understanding of how inter-zone communication works. :) I understand how it works. I'm asking about an admin API that has privileges for actions for all VMs. As an ISP, I want to disable a particular VM because it's being 'bad'. If someone has injected a collision, I would be sending an action to more than 1 VM, not only the intended target. I don't see how collisions can be made to work at all.

And yes, we're talking about spoofing (or really, purposefully colliding a known UUID). I haven't seen any mention of anything else (although I may have missed it). I'm certainly not worried about machine-generated UUIDs colliding, myself.

But what we're also talking about here is efficient routing. Is it necessary? No. Would it scale? Yes. A zone name or ID needs to be part of the identifier. I prefer the DNS name idea, although prefixing UUIDs or reserving bits in a UUID could also work.

- Chris

On Mar 23, 2011, at 9:01 AM, Ed Leafe wrote:

[snip]
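The Wikipedia figure quoted in this exchange can be sanity-checked with the standard birthday approximation. This is a rough sketch, assuming purely random version-4 UUIDs (122 random bits); `collision_probability` is a name invented for the example.

```python
import math

RANDOM_BITS = 122  # a version-4 UUID has 122 random bits; the other 6 are fixed


def collision_probability(n: float, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits))."""
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * 2.0**bits))


if __name__ == "__main__":
    seconds_per_year = 365.25 * 24 * 3600
    n = 1e9 * seconds_per_year * 100  # 1 billion UUIDs per second for 100 years
    print(f"{collision_probability(n):.2f}")
    print(f"{collision_probability(1e12):.3e}")  # a trillion IDs in total
```

Under these assumptions the 100-year case comes out a little over one half, in the same ballpark as the quoted 50%, while a trillion IDs gives a probability that is negligible for any realistic deployment. None of this addresses deliberate collisions, which is the case Chris is worried about.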
Re: [Openstack] Instance IDs and Multiple Zones
Hi Ed,

On Wed, Mar 23, 2011 at 08:15:54AM -0400, Ed Leafe wrote:

> On Mar 23, 2011, at 1:55 AM, Eric Day wrote:
>
>> If we provide some structure to the IDs, such as DNS names, we not only solve this namespacing problem but we also get a much more efficient routing mechanism.
>
> When I read things like this, the DBA in me winces a little. Meaningful PKs, compound PKs - they always end up being a Very Bad Thing. If you want to add efficient DNS routing, that could be added as additional data about an instance that is periodically updated up the zone structure along with the other capability information, but until now we've passed on that as a premature optimization.

That was one of the major arguments in favor of the global DB design. We're talking about a number of partitioning schemes, reserved bits, URNs, URIs, etc. Because of the namespace issue I believe we will need some structure to our resource names.

>> Let's say you have api.rackspace.com (global aggregation zone), rack1.dfw.rackspace.com (real zone running instances), and bursty.customer.com (private zone). Bursty is a rackspace customer and they want to leverage their private resources alongside the public cloud, so they add bursty.customer.com as a private zone for their Rackspace account. The api.rackspace.com server now gets a terminate request for id x and it needs to know where to route the request. If we have a global namespace for instances (such as UUIDs), rack1.dfw.rackspace.com and bursty.customer.com could both have servers for id x (most likely from bursty spoofing the ID). Now api.rackspace.com doesn't know who to forward the request to.
>
> Even if this scenario were to happen, and nova tried to delete an instance with a spoofed ID that did *not* belong to Bursty, it would fail due to improper auth. Otherwise, even without zones/uuids/whatever, I could send termination requests to the API with random IDs and delete any machines with those IDs, whether I had rights to them or not.

This implies the resource is now uniquely identified along with auth credentials, which means the resource name cannot stand alone. If we do have collisions due to spoofing, we're going to see ambiguity issues crop up in other systems that don't have the auth context. I strongly believe we need unique resource names that stand on their own and don't depend on any other component such as auth.

> In the current zone design, a request to terminate id x would not be handled by the outermost zone, since it wouldn't have instances, so it would be forwarded to each child zone. This would repeat down the zone hierarchy until either there were no more child zones, or a zone found that it had an instance with that ID. In the Bursty example, two zones would find an instance with that ID; one would fail due to auth, and the one owned by Bursty would be terminated as requested. The only way more than one instance would terminate would be if Bursty spoofed their own IDs, which would be their problem, not ours.

I think the "In the current zone design" is my main concern. This discussion is taking into account how things need to work in the near future, not just now. We've punted on routing for now and are simply sending the request to every zone, but this won't work in the long run. If we had a large public cloud with hundreds of zones, and thousands of bursting zones, things will get prohibitively expensive. It's not that they won't function; the response time may just be unreasonable.

-Eric
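As a rough illustration of the routing argument in this message, a resource name that carries its zone's DNS name can be dispatched with a single lookup instead of a broadcast. The `<local id>.<zone DNS name>` layout, the function names and the endpoint URLs below are all assumptions made for the sketch; the thread does not fix a concrete format.

```python
from typing import Dict, Optional, Tuple


def split_resource_name(name: str) -> Tuple[str, str]:
    """Split a hypothetical '<local id>.<zone DNS name>' resource name at the first dot."""
    local_id, _, zone = name.partition(".")
    if not local_id or not zone:
        raise ValueError(f"expected '<id>.<zone>', got {name!r}")
    return local_id, zone


def route(name: str, zone_endpoints: Dict[str, str]) -> Optional[str]:
    """Return the endpoint registered for the zone named in `name`, or None if unknown."""
    _, zone = split_resource_name(name)
    return zone_endpoints.get(zone)


if __name__ == "__main__":
    # Hypothetical registry held by the aggregation zone (api.rackspace.com in the example).
    zones = {
        "rack1.dfw.rackspace.com": "https://rack1.dfw.rackspace.com/",
        "bursty.customer.com": "https://bursty.customer.com/",
    }
    print(route("x.rack1.dfw.rackspace.com", zones))
    print(route("x.bursty.customer.com", zones))
    print(route("x.unknown.example.org", zones))
```

With names like these, the two servers with local id x no longer collide, because each lives in its own zone's namespace; the cost is the "meaningful PK" that Ed objects to.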
Re: [Openstack] Instance IDs and Multiple Zones
Pvo brought up a good use case for naming a little while ago: Migrations. If we use the instance id (assume UNC) to provide hints to the target zone, this means the instance id would need to change should the instance move locations. That's a no-no by everyone's measure. So, now I'm thinking more about Justin's comment about an external registry. Perhaps a glance-like entry with metadata that can change?
Re: [Openstack] Instance IDs and Multiple Zones
I asked for further details in IRC, which started a discussion there. To sum up, most folks agree migrations within a zone won't require a new instance ID. Nothing changes except the compute host it's running on. Migrations outside of a zone would require a new instance ID, but this should be fine, since other things would also change (such as IP, available volumes, ...). A cross-zone migration will be more of a copy+delete than a proper move.

-Eric

On Wed, Mar 23, 2011 at 06:14:31PM +, Sandy Walsh wrote:

> Pvo brought up a good use case for naming a little while ago: Migrations. If we use the instance id (assume UNC) to provide hints to the target zone, this means the instance id would need to change should the instance move locations. That's a no-no by everyone's measure. So, now I'm thinking more about Justin's comment about an external registry. Perhaps a glance-like entry with metadata that can change?
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 23, 2011, at 3:00 PM, Eric Day wrote:

> Migrations outside of a zone would require a new instance ID, but this should be fine, since other things would also change (such as IP, available volumes, ...).

That's probably true in the Rackspace use case, as zones would most likely be physically separate hardware, but nothing about zones makes that mandatory.

-- Ed Leafe
Re: [Openstack] Instance IDs and Multiple Zones
On Wed, Mar 23, 2011 at 07:40:20PM +, Ed Leafe wrote:

>> Migrations outside of a zone would require a new instance ID, but this should be fine, since other things would also change (such as IP, available volumes, ...).
>
> That's probably true in the Rackspace use case, as zones would most likely be physically separate hardware, but nothing about zones makes that mandatory.

It does currently; I wasn't speaking specifically to Rackspace's use case. Right now some network and volume code is not aware of cross-zone issues, and instead assumes it is the authority for things like configured IP ranges. We can certainly change this, and if we do want to allow proper instance migrations between zones, we would need to allow instance IDs to change.

I don't see the importance of enabling cross-zone migrations (backup+restore seems sufficient). I may be wrong, but if we did enable this functionality in the future, I don't see a reason not to allow resource IDs to change. Sounds like a design summit topic if folks feel we should support this.

-Eric
Re: [Openstack] Instance IDs and Multiple Zones
From: openstack-bounces+ewan.mellor=citrix@lists.launchpad.net [mailto:openstack-bounces+ewan.mellor=citrix@lists.launchpad.net] On Behalf Of Justin Santa Barbara
Sent: 23 March 2011 19:22
To: Eric Day
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Instance IDs and Multiple Zones

>> Migrations outside of a zone would require a new instance ID, but this should be fine, since other things would also change (such as IP, available volumes, ...). A cross-zone migration will be more of a copy+delete than a proper move.
>
> +1 on this. If the IP is changing, there's little point in trying to keep the ID the same. Great point.

I don't agree at all. There are many good reasons to preserve the identity of a VM even when its IP or location changes. Billing, for example. Access control. Intrusion detection. Just because I move a VM from one place to another, why would I expect its identity to change?

Ewan.
Re: [Openstack] Instance IDs and Multiple Zones
-----Original Message-----
From: Paul Voccio [mailto:paul.voc...@rackspace.com]
Sent: 23 March 2011 22:19
To: Ewan Mellor; Justin Santa Barbara; Eric Day
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Instance IDs and Multiple Zones

>> I don't agree at all. There are many good reasons to preserve the identity of a VM even when its IP or location changes. Billing, for example. Access control. Intrusion detection. Just because I move a VM from one place to another, why would I expect its identity to change?
>
> Where do we put the boundary on the preservation of id? Within the same deployment? Within the same zone topology? I'm not quite following the billing aspect. If you shut one down and start another that is a problem for billing?
>
> You stated earlier today: "We have to accept that, on the scales we care about, any unique ID is going to be incomprehensible to a human. Rely on your presentation layer, that's what it's there for!"
>
> Is this really different? If the id changes, should the user care if it is presented in the same way with the same data? Am I missing something?

I certainly didn't intend for those statements to be contradictory. I don't think that they are.

My view is that identity should be preserved as long as it's possible to do so. A VM that moves around, gets resized, gets rebooted, etc, should have the same identity. By identity I mean that other pieces of software should be able to tell that it's the same thing. A billing system should be able to say "that's the same VM that I saw before". For example, if I charge my customers for a month of usage, even if they only run the VM for a part of that month, then my billing system needs to be able to say "that VM has moved from here to here, but it's actually the same VM, so I'm charging for one month, not two". This is the current charging scheme for RHEL instances hosted on Rackspace Cloud (http://www.rackspace.com/cloud/blog/2010/08/31/red-hat-license-fee-for-rackspace-cloud-servers-changing-from-hourly-to-monthly/), not just a corner-case example.

You can invent similar arguments for penetration detection systems ("that VM is acting the way that it used to") or any other system for enforcing policy. If you are using some kind of location- or path-based identifier for that VM, then client software has to be notified of and keep track of all the movement of the VM. If you have a unique identifier, then clients don't have to do any of this.

My point about the UI was that we shouldn't worry about how complex these IDs should be. We should make sure that bits of software can talk to each other correctly and simply, and base our ID scheme on those needs. Once we've figured out what ID scheme we're using, it's _trivial_ for a UI or CLI to turn those ugly IDs into "Paul's Apache server" and "Ewan's build machine".

To your point about the boundary of preservation of ID, that's a good question. If you ignore the security / trust issues, then the obvious answer is that IDs should be globally, infinitely, permanently unique. That's what UUIDs are for. We can generate these randomly without any need for a central authority, and with no fear of collisions. It would certainly be nice if my VM can leave my SoftLayer DC and arrive in my Rackspace DC and when it comes back I still know that it's the same VM. That's the OpenStack dream, right?

I'm willing to accept that that's difficult to achieve, and I'd compromise on identity only being preserved within an ownership/trust boundary. I really don't see why I should lose track of my VM when it moves from one zone to another within a given provider though.

Ewan.
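The billing argument can be made concrete with a small sketch. It assumes each zone emits usage records of the form (instance ID, zone, month); that record shape, the fee and the function name `monthly_charges` are all hypothetical and only illustrate why a location-independent ID keeps the billing system simple.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (instance ID, reporting zone, month): a hypothetical usage record emitted by each zone
UsageRecord = Tuple[str, str, str]


def monthly_charges(records: Iterable[UsageRecord], monthly_fee: float) -> Dict[str, float]:
    """Charge each instance once per month it was seen, however many zones reported it."""
    months_seen: Dict[str, Set[str]] = defaultdict(set)
    for instance_id, _zone, month in records:
        months_seen[instance_id].add(month)
    return {iid: monthly_fee * len(months) for iid, months in months_seen.items()}


if __name__ == "__main__":
    records = [
        ("vm-a", "zone1", "2011-03"),
        ("vm-a", "zone2", "2011-03"),  # same VM after a move: same ID, so no second charge
        ("vm-b", "zone1", "2011-03"),
        ("vm-b", "zone1", "2011-04"),
    ]
    print(monthly_charges(records, 20.0))
```

If the ID changed on every cross-zone move, the two "vm-a" records would look like two different machines, and the billing system would need a separate movement log to reconcile them.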
Re: [Openstack] Instance IDs and Multiple Zones
OK, time for everyone to step back and take a deep breath. There are many implications of the earlier design decision to use integer PKs for database entries. Most who have responded here, myself included, have indicated that they would prefer that this be changed to either a string value comprised of several meaningful bits of information, or a UUID approach, or some combination of things that would address various things in the operation of a zoned design. I think that this will make an excellent discussion at next month's design summit! But the reality is that this needs to be developed now, under the current design of integer PKs. Please note that the only concern here is how to reconcile the Rackspace API requirement of globally unique instance IDs with the current design of generating PKs in local databases at the compute node level. To my understanding, there is no other alternative than partitioning the available integer range across zones, so that each zone generates its instance PKs starting from a different number, and spaced far enough apart that they will never overlap. In the first post of this thread, I proposed a simple partitioning system: allocating a range of integers for each zone, and asked for feedback as to what people would think would be a reasonable estimate for the maximum number of instances a zone would ever need to create. Most shared my distaste for this sort of partitioning system, but no one offered an alternative that would be workable given the current constraints. So I'm going to implement a partition of 1 billion integers per zone, which should allow for approximately 1 billion zones, given a 64 bit integer for the PK. This should be workable for now, and after the design summit, when we've come to a consensus on changing the API to accept something other than integer identifiers, it should not be too difficult to retrofit. Unless someone has a better idea... 
;-) -- Ed Leafe
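A minimal sketch of the range partition described in the message above, assuming a signed 64-bit PK and zones numbered from 0. The names `ZONE_RANGE_SIZE`, `zone_id_range` and `zone_for_id` are hypothetical and not taken from Nova. It also works out how many complete 1-billion ranges fit in the integer space; for a signed 64-bit value that comes to roughly 9.2 billion, so the figure quoted in the message is best read as an order-of-magnitude estimate.

```python
# Sketch of the range-partition scheme; names are hypothetical, not Nova code.
ZONE_RANGE_SIZE = 1_000_000_000   # 1 billion IDs per zone, as proposed
MAX_SIGNED_64 = 2**63 - 1         # largest value of a signed 64-bit integer PK


def zone_id_range(zone_index: int) -> tuple[int, int]:
    """Return the inclusive (first, last) instance ID reserved for a zone."""
    if zone_index < 0:
        raise ValueError("zone_index must be non-negative")
    first = zone_index * ZONE_RANGE_SIZE
    last = first + ZONE_RANGE_SIZE - 1
    if last > MAX_SIGNED_64:
        raise ValueError("zone range does not fit in a signed 64-bit integer")
    return first, last


def zone_for_id(instance_id: int) -> int:
    """Recover the zone index that owns an instance ID."""
    return instance_id // ZONE_RANGE_SIZE


# Number of complete 1-billion ranges that fit in a signed 64-bit integer.
max_zones = (MAX_SIGNED_64 + 1) // ZONE_RANGE_SIZE
```

Because the owning zone can be recovered by integer division, a parent zone could route a lookup without any shared table, at the cost of every zone agreeing on the range size up front.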
Re: [Openstack] Instance IDs and Multiple Zones
(sorry Eric, meant to send to the list) -S From: Eric Day [e...@oddments.org] Do we want this namespace per zone, deployment, resource owner, or some other dimension? Good question. We can prevent collisions at the zone level and within a deployment (single provider / multi-zone). But hybrid clusters are a different matter. Regardless of how we delineate it or which ID scheme we use, we have no way of detecting collisions. In the top-level zones of hybrid installations, all instances.get(id) calls issued would have to assume they could get back more than one instance. Ugly, but perhaps this is just the nature of the problem? This includes for 64-bit integer, 1-billion per zone approaches ... but so be it. Let's just get something working. -S
Re: [Openstack] Instance IDs and Multiple Zones
On Thu, Mar 24, 2011 at 12:23:42AM +, Sandy Walsh wrote: From: Eric Day [e...@oddments.org] Do we want this namespace per zone, deployment, resource owner, or some other dimension? Good question. We can prevent collisions at the zone level and within a deployment (single provider / multi-zone). But hybrid clusters are a different matter. Regardless of how we delineate it or which ID scheme we use, we have no way of detecting collisions. Why not? Some schemes, such as the ID.DNS name + SSL cert check I mentioned before, allow us to verify the authenticity of a namespace before it is used. No other peer could register a zone with that name unless the cert checks out. Within that zone Nova will prevent collisions, but if things are really broken (by accident or on purpose) and it starts returning duplicate resource IDs, peer zones can choose to just use one or none. We can document the behavior as undefined. So, sure, you can still have duplicates within a zone (or other namespace), but at least the problem is self-contained, and others peering with that zone don't need to concern themselves with it or worry about spoofing attacks within their own namespaces. In the top-level zones of hybrid installations, all instances.get(id) calls issued would have to assume they could get back more than one instance. Ugly, but perhaps this is just the nature of the problem? If we define the API for that call to only return a single instance, it is up to the child zone to choose which one to send. If it tries to return an array for a single ID, it would just be a protocol error and fail. -Eric
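The filtering rule in the message above can be sketched as follows. This assumes the certificate check has already succeeded for `verified_zone` and that resource IDs are strings of the form `<id>.<zone DNS name>`; the function name and the "keep the first duplicate" choice are hypothetical, since the thread leaves duplicate handling undefined.

```python
def accept_resource_ids(verified_zone: str, resource_ids: list[str]) -> list[str]:
    """Keep only IDs whose suffix matches the certificate-verified zone name."""
    suffix = "." + verified_zone
    accepted = []
    seen = set()
    for rid in resource_ids:
        if not rid.endswith(suffix):
            continue  # outside the peer's own namespace: ignore it
        if rid in seen:
            continue  # duplicate from a broken zone: this sketch keeps the first
        seen.add(rid)
        accepted.append(rid)
    return accepted
```

The point of the design is that a misbehaving peer can only damage its own namespace: anything it returns under another zone's name is dropped before it reaches the caller.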
Re: [Openstack] Instance IDs and Multiple Zones
From: Eric Day [e...@oddments.org] On Thu, Mar 24, 2011 at 12:23:42AM +, Sandy Walsh wrote: Regardless of how we delineate it or which ID scheme we use, we have no way of detecting collisions. Why not? Some schemes such as the ID.DNS name + ssl cert check I mentioned before allow us to verify the authenticity of a namespace before it is used. No other peer could register a zone with that name unless the cert checks out. Hmm, yeah, you're right, the SSL cert approach should work for validating unique zone names. Funny, myself and pvo were talking about that route yesterday. But will it help us with the duplicates problem? ... Within that zone Nova will prevent collisions, but if things are really broken (accident or on purpose) and it starts returning duplicate resource IDs, peer zones can choose to just use one/none. We can document the behavior as undefined. I'm not sure that's a good thing ... the use case I was thinking of is the customer using two providers: The customer has his own Openstack deployment (range 0-1B) and outsources to Provider-A and Provider-B. Sadly, Pro-A and Pro-B both use the default ID ranges for service providers (let's say 10-11B). The customer starts provisioning instances to both provider zones evenly ... pow, duplicates. The customer won't be happy that sometimes he gets status on Instance 10,000,000,001 from Provider-A and sometimes from Provider-B. Or none at all. If we append the DNS name of the provider, we bust RS 1.0 compatibility. Perhaps you can walk me through how you see the Cert check helping here (assuming no prefix on id)? Or are we assuming that bursting is a RS x.0 API feature and things will change then? -S
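The two-provider scenario above can be reproduced in a few lines. The data and zone names here are invented for illustration; the sketch only shows that bare integers from two independently configured providers can collide, and that keying by (zone name, local ID) separates them, which is the change that would break RS 1.0 compatibility.

```python
# Hypothetical data: both providers use the same default range (10-11 billion).
provider_a = {10_000_000_001: "web server at Provider-A"}
provider_b = {10_000_000_001: "database at Provider-B"}


def find_collisions(*zones: dict) -> set:
    """Return the IDs that appear in more than one child zone."""
    seen, collisions = set(), set()
    for zone in zones:
        for instance_id in zone:
            (collisions if instance_id in seen else seen).add(instance_id)
    return collisions


def qualify(zone_name: str, zone: dict) -> dict:
    """Namespaced alternative: key each instance by (zone name, local ID)."""
    return {(zone_name, instance_id): value for instance_id, value in zone.items()}


merged = {**qualify("provider-a.example.com", provider_a),
          **qualify("provider-b.example.com", provider_b)}
```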
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 23, 2011, at 8:59 PM, Eric Day wrote: May I ask what is the point of doing this if it won't make cactus and we're just going to replace it in a month or two? I think we all agree that 64-bit integer IDs are insufficient for multi-zone deployments, so no one will be deploying this until we sort it out and come up with a better ID. Because this is just one part of the process of creating a distributed scheduler. The process for selecting a host for a new instance won't depend on the type of PK used for that instance in a db table. The only reason I brought it up was that Sandy pointed out this uniqueness requirement, and we felt it would be a good idea to ask the list if they had any good ideas about alternatives to range partitions. I prefaced my initial post with a disclaimer that I wasn't looking to re-argue things that had already been discussed and agreed to, but I guess most people missed that part. :) -- Ed Leafe ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
Hi Sandy, On Thu, Mar 24, 2011 at 01:01:18AM +, Sandy Walsh wrote: From: Eric Day [e...@oddments.org] Within that zone Nova will prevent collisions, but if things are really broken (accident or on purpose) and it starts returning duplicate resource IDs, peer zones can choose to just use one/none. We can document the behavior as undefined. I'm not sure that's a good thing ... the use case I was thinking of is the customer using two providers: The customer won't be happy that sometimes he gets status on Instance 10,000,000,001 from Provider-A and sometimes from Provider-B. Or none at all. If we append the DNS name of the provider, we bust RS 1.0 compatibility. I think this is fine. RS 1.0, just like the EC2 API, was not designed with federation in mind. We should not try to jump through hoops to force it if we have the luxury of defining the next API version and supporting it more elegantly there. As for backwards compatibility for RS 1.0/EC2, those APIs could depend on a global mapping server for non-bursting zones to translate nova-internal IDs (id.zone) to what they need (integer, etc.), but this should not be a core component of Nova since it goes against our design tenets. It should be deprecated (along with the APIs) and shut down in a timely manner once the new API and tools are available. Managing resources in bursting zones would only be available through the new API (along with other new features), so there will be plenty of incentive for clients to change. Perhaps you can walk me through how you see the Cert check helping here (assuming no prefix on id)? Or are we assuming that bursting is a RS x.0 API feature and things will change then? Yeah, the cert check verifies that the zone nova.example.com can return resource IDs named *.nova.example.com; all others should be ignored. The IDs need the zone name suffix for it to make sense.
-Eric
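One way to picture the mapping server mentioned above is a small stateful table that hands out legacy integers for nova-internal `id.zone` strings. This is only an in-memory sketch under that assumption; the class and method names are hypothetical, and a real service would need persistent storage.

```python
import itertools


class LegacyIdMap:
    """Stateful two-way map between 'id.zone' strings and legacy integer IDs."""

    def __init__(self):
        self._next = itertools.count(1)   # next unused legacy integer
        self._to_legacy = {}
        self._to_internal = {}

    def legacy_id(self, internal_id: str) -> int:
        """Return the integer for an internal ID, assigning one on first use."""
        if internal_id not in self._to_legacy:
            number = next(self._next)
            self._to_legacy[internal_id] = number
            self._to_internal[number] = internal_id
        return self._to_legacy[internal_id]

    def internal_id(self, legacy_id: int) -> str:
        """Translate a legacy integer back to the internal 'id.zone' form."""
        return self._to_internal[legacy_id]
```

Keeping this outside Nova's core, as the message suggests, means it can be retired together with the legacy APIs.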
Re: [Openstack] Instance IDs and Multiple Zones
So I'm going to implement a partition of 1 billion integers per zone, which should allow for approximately 1 billion zones, given a 64 bit integer for the PK. This should be workable for now, and after the design summit, when we've come to a consensus on changing the API to accept something other than integer identifiers, it should not be too difficult to retrofit. The type of a server @id in CloudServers is xsd:int, which is a 32-bit signed integer: http://docs.rackspacecloud.com/servers/api/v1.0/xsd/server.xsd So if you have 1 billion integers per zone, you only get 2 zones. You can have 4 if you're willing to go negative, but surely it's too early in the campaign. I think the only way long-term we're going to have CloudServers v1.0 compatibility is by having a proxy that bridges between legacy APIs (EC2 and CS) and future APIs (OpenStack). I'm guessing that proxy will have to be stateful to implement mappings of server IDs etc. Yes, this sucks. But at some stage you have to say "you know, maybe 640KB wasn't enough", and we have to make some changes. How about this as a solution: use ranges as you suggest, but let the starting points for the zone-ids that child-zones draw from be customer-configured. We're pushing the problem onto the end-user, but they probably know best anyway, and we don't really expect anyone to use sub-zones in anger until Diablo or later, right? Justin
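The arithmetic behind the "2 zones, or 4 if you go negative" remark can be checked directly. The sketch assumes only that xsd:int is a signed 32-bit integer, as stated above; the constant and function names are hypothetical.

```python
XSD_INT_MAX = 2**31 - 1          # xsd:int is a 32-bit signed integer
XSD_INT_MIN = -2**31
ZONE_RANGE_SIZE = 1_000_000_000  # the 1-billion-per-zone partition

# Complete ranges available when only non-negative IDs are used.
zones_non_negative = (XSD_INT_MAX + 1) // ZONE_RANGE_SIZE
# Complete ranges available if negative IDs are allowed as well.
zones_with_negative = zones_non_negative + (-XSD_INT_MIN) // ZONE_RANGE_SIZE


def fits_xsd_int(instance_id: int) -> bool:
    """True if the ID can be represented as an xsd:int."""
    return XSD_INT_MIN <= instance_id <= XSD_INT_MAX
```

An ID such as 10,000,000,001 from the earlier two-provider example does not fit, which is why a v1.0-compatible front end would need some translation layer.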
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 23, 2011, at 9:54 PM, Eric Day wrote: I don't think anyone is arguing, all the discussion has been very healthy IMHO. Of course we are arguing - presenting evidence for a particular position in an effort to persuade is argument. The arguments have not become heated or personal, if that's what you meant. Differing ideas and opposing POVs are wonderful, IMO. Groupthink is what should be avoided as unhealthy. -- Ed Leafe ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
Ok. :) The original statement felt like it was written with negative connotations, and I just wanted to say I think it's all been positive. -Eric On Wed, Mar 23, 2011 at 10:09:50PM -0400, Ed Leafe wrote: On Mar 23, 2011, at 9:54 PM, Eric Day wrote: I don't think anyone is arguing, all the discussion has been very healthy IMHO. Of course we are arguing - presenting evidence for a particular position in an effort to persuade is argument. The arguments have not become heated or personal, if that's what you meant. Differing ideas and opposing POVs are wonderful, IMO. Groupthink is what should be avoided as unhealthy. -- Ed Leafe ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
the IDs must be strictly numericalish numbers, with nothing smelling of something like a string in there, i take it? ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 22, 2011, at 1:11 PM, Jon Slenk wrote: the IDs must be strictly numericalish numbers, with nothing smelling of something like a string in there, i take it? Well, since they are defined as: `id` int(11) NOT NULL AUTO_INCREMENT, I would say the chance of a stringish thing slipping in is pretty small. :) -- Ed Leafe ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
On Tue, Mar 22, 2011 at 10:41 AM, Ed Leafe e...@leafe.com wrote: Well, since they are defined as: `id` int(11) NOT NULL AUTO_INCREMENT, I would say the chance of a stringish thing slipping in is pretty small. :) if the schema cannot be changed (which might be worth reconsidering since it seems to be a bit of a root cause of trouble) then maybe you have to reserve the last 4 or 5 digits of the id to be the zone id, and then autoincrement on top of that? on the assumption that there would be a limit of or 9 zones ever. but really i'd hazard to suggest that it should somehow be 2 parts, neither of which are super constrained: a zone part and an in-zone-id part. it could even be that the id is left as-is and is semantically required to be joined with the zone name as a prefix before it is a valid interzone id. sincerely. ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
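The "reserve the last few digits for the zone" idea above can be sketched with plain decimal arithmetic. The choice of four digits and the names `make_id` and `split_id` are assumptions for illustration only.

```python
ZONE_DIGITS = 4                  # assumption: last 4 decimal digits hold the zone
ZONE_MODULUS = 10 ** ZONE_DIGITS


def make_id(local_counter: int, zone_id: int) -> int:
    """Combine a zone-local counter with a zone ID in the low decimal digits."""
    if local_counter < 0:
        raise ValueError("local_counter must be non-negative")
    if not 0 <= zone_id < ZONE_MODULUS:
        raise ValueError("zone_id does not fit in the reserved digits")
    return local_counter * ZONE_MODULUS + zone_id


def split_id(instance_id: int) -> tuple[int, int]:
    """Return (local_counter, zone_id) for a combined ID."""
    return divmod(instance_id, ZONE_MODULUS)
```

The trade-off is the one the message points out: the number of reserved digits caps the number of zones forever, which is the argument for a two-part ID with neither part tightly constrained.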
Re: [Openstack] Instance IDs and Multiple Zones
I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) However, I agree with Monsyne: numeric IDs have got to go. Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. I'd still prefer strings though... Justin On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote: I want to get some input from all of you on what you think is the best way to approach this problem: the RS API requires that every instance have a unique ID, and we are currently creating these IDs by use of an auto-increment field in the instances table. The introduction of zones complicates this, as each zone has its own database. The two obvious solutions are a) a single, shared database and b) using a UUID instead of an integer for the ID. 
Both of these approaches have been discussed and rejected, so let's not bring them back up now. Given integer IDs and separate databases, the only obvious choice is partitioning the numeric space so that each zone starts its auto-incrementing at a different point, with enough room between starting ranges to ensure that they would never overlap. This would require some assumptions be made about the maximum number of instances that would ever be created in a single zone in order to determine how much numeric space that zone would need. I'm looking to get some feedback on what would seem to be reasonable guesses to these partition sizes. The other concern is more aesthetic than technical: we can make the numeric spaces big enough to avoid overlap, but then we'll have very large ID values; e.g., 10 or more digits for an instance. Computers won't care, but people might, so I thought I'd at least bring up this potential objection. -- Ed Leafe ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
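Two of the "traditional" choices listed above, skipping and central block allocation, can be sketched briefly. The names are hypothetical, zones are assumed to be numbered from 1, and the allocator is a single in-process object standing in for what would be an RPC to a central service.

```python
import itertools


def skipping_ids(zone_number: int, zone_count: int):
    """Option 1: zone k of n yields k, k+n, k+2n, ... (zones numbered from 1)."""
    return itertools.count(zone_number, zone_count)


class CentralAllocator:
    """Options 3 and 4: a central pool handing out one block of IDs per request."""

    def __init__(self, block_size: int = 1000):
        self.block_size = block_size
        self._next_start = 1

    def allocate_block(self) -> range:
        """Reserve the next contiguous block; block_size=1 is plain option 3."""
        start = self._next_start
        self._next_start += self.block_size
        return range(start, start + self.block_size)
```

Skipping needs the zone count fixed in advance, while block allocation needs a shared authority, which is why neither fits zones under separate administrative control.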
Re: [Openstack] Instance IDs and Multiple Zones
Also, I should note that there seem to be merges pending to make the v1.1 API use URLs as instance identifiers in API calls, rather than integer IDs... I'm not sure of the impact of that on v1.0 compatibility, but it is something to think about. -Monsyne Dragon
Re: [Openstack] Instance IDs and Multiple Zones
On Tue, Mar 22, 2011 at 12:40:21PM -0400, Ed Leafe wrote: The two obvious solutions are a) a single, shared database and b) using a UUID instead of an integer for the ID. Both of these approaches have been discussed and rejected, so let's not bring them back up now. We shouldn't dismiss previous ideas just because we've not chosen them in the past, but let's not have the same discussion. Given integer IDs and separate databases, the only obvious choice is partitioning the numeric space so that each zone starts its auto-incrementing at a different point, with enough room between starting ranges to ensure that they would never overlap. This would require some assumptions be made about the maximum number of instances that would ever be created in a single zone in order to determine how much numeric space that zone would need. I'm looking to get some feedback on what would seem to be reasonable guesses to these partition sizes. I think we need: * No central authority such as a globally shared DB. This also means not partitioning some set and handing the pieces out to zones as offsets (this is just another form of a shared DB). * Ability to seamlessly join existing zones without chance of namespace collisions for peering and bursting. This means a globally unique zone naming scheme, and for this I'll reiterate the idea of using DNS names for zones. If we want to stick with a single DB per zone, as it looks like we are, the ID can simply be the auto-increment value from the instance table combined with the zone name, as: instance.zone. The other concern is more aesthetic than technical: we can make the numeric spaces big enough to avoid overlap, but then we'll have very large ID values; e.g., 10 or more digits for an instance. Computers won't care, but people might, so I thought I'd at least bring up this potential objection. I'm not concerned with aesthetic issues to be honest. We have copy/paste, DNS, and various techniques for presentation layers.
-Eric
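A minimal sketch of the `instance.zone` form proposed above, assuming the local part is the zone's auto-increment integer and the zone part is a DNS name. The `ResourceId` type and its `parse` method are hypothetical names used for illustration.

```python
from typing import NamedTuple


class ResourceId(NamedTuple):
    local_id: int   # auto-increment value from the zone's own instance table
    zone: str       # DNS name of the zone

    def __str__(self) -> str:
        return f"{self.local_id}.{self.zone}"

    @classmethod
    def parse(cls, text: str) -> "ResourceId":
        """Split at the first dot: digits before it, zone DNS name after it."""
        local, _, zone = text.partition(".")
        if not local.isdecimal() or not zone:
            raise ValueError(f"not an instance.zone ID: {text!r}")
        return cls(int(local), zone)
```

Two zones can both hold local ID 42 without conflict, because the zone name is part of the identity and no shared database or pre-agreed offsets are needed.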
Re: [Openstack] Instance IDs and Multiple Zones
On Mar 22, 2011, at 1:45 PM, Jon Slenk wrote: if the schema cannot be changed (which might be worth reconsidering since it seems to be a bit of a root cause of trouble) then maybe you have to reserve the last 4 or 5 digits of the id to be the zone id, and then autoincrement on top of that? on the assumption that there would be a limit of or 9 zones ever. Just to be clear: I would not have been in favor of using integer IDs. However, this was discussed and settled before I was actively involved in the OpenStack code, so I didn't want to have this devolve into a resurrection of what had already been decided. If someone wants to restart that discussion, I'd certainly be interested, but that's not what I'm looking for in this thread. The question before us is: given integer IDs, what is the best way to handle the added complexity of multiple zones? -- Ed Leafe
Re: [Openstack] Instance IDs and Multiple Zones
On Tue, Mar 22, 2011 at 10:48:09AM -0700, Justin Santa Barbara wrote: We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. I'd still prefer strings though... If we use a number/uuid without a zone prefix, then they can collide. What happens when I want to burst to my private cloud and I've fixed my UUIDs to intentionally collide just to cause trouble? Through peering and bursting we have potentially malicious users for some deployments and we need to be sure resource ID spoofing and poisoning is not possible. The simplest way is to have a namespace for every zone, and the most obvious namespace is the zone name. We'll of course need a mechanism to detect authenticity of zone names too (signed certs, etc). Oh, and all this discussion should not be limited to just instance IDs, networks and volumes need to be globally addressed as well and should follow the same mechanism. -Eric ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
Totally agree with Eric. Two questions that I think can help us move forward: 1. Is the decision to stick with integers still valid? Can someone that was there give us the reason for the decision? Is it documented anywhere? 2. If we must have integers means that we get 128 bit 'random' integers, do we still want integers? Justin On Tue, Mar 22, 2011 at 11:25 AM, Eric Day e...@oddments.org wrote: On Tue, Mar 22, 2011 at 12:40:21PM -0400, Ed Leafe wrote: The two obvious solutions are a) a single, shared database and b) using a UUID instead of an integer for the ID. Both of these approaches have been discussed and rejected, so let's not bring them back up now. We shouldn't dismiss previous ideas just because we've not chosen them in the past, but lets not have the same discussion. Given integer IDs and separate databases, the only obvious choice is partitioning the numeric space so that each zone starts its auto-incrementing at a different point, with enough room between starting ranges to ensure that they would never overlap. This would require some assumptions be made about the maximum number of instances that would ever be created in a single zone in order to determine how much numeric space that zone would need. I'm looking to get some feedback on what would seem to be reasonable guesses to these partition sizes. I think we need: * No central authority such as a globally shared DB. This also means not partitioning some set and handing them out to zones as offset (this is just another form of a shared DB). * Ability to seamlessly join existing zones without chance of namespace collisions for peering and bursting. This means a globally unique zone naming scheme, and for this I'll reiterate the idea of using DNS names for zones. If we want to stick with a single DB per zone, as it looks like we are, this can simply be the auto-increment value from the instance table and the zone as: instance.zone. 
The other concern is more aesthetic than technical: we can make the numeric spaces big enough to avoid overlap, but then we'll have very large ID values; e.g., 10 or more digits for an instance. Computers won't care, but people might, so I thought I'd at least bring up this potential objection. I'm not concerned with aesthetic issues to be honest. We have copy/paste, DNS, and various techniques for presentation layers. -Eric ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
I think Dragon got it right. We need a zone identifier prefix on the IDs. I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) However, I agree with Monsyne: numeric IDs have got to go. Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. I'd still prefer strings though... 
Justin On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote: I want to get some input from all of you on what you think is the best way to approach this problem: the RS API requires that every instance have a unique ID, and we are currently creating these IDs by use of an auto-increment field in the instances table. The introduction of zones complicates this, as each zone has its own database. The two obvious solutions are a) a single, shared database and b) using a UUID instead of an integer for the ID. Both of these approaches have been discussed and rejected, so let's not bring them back up now. Given integer IDs and separate databases, the only obvious choice is partitioning the numeric space so that each zone starts its auto-incrementing at a different point, with enough room between starting ranges to ensure that they would never overlap. This would require some assumptions be made about the maximum number of instances that would ever be created in a single zone in order to determine how much numeric space that zone would need. I'm looking to get some feedback on what would seem to be reasonable guesses to these partition sizes. The other concern is more aesthetic than technical: we can make the numeric spaces big enough to avoid overlap, but then we'll have very large ID values; e.g., 10 or more digits for an instance. Computers won't care, but people might, so I thought I'd at least bring up this potential objection. 
-- Ed Leafe
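The suggestion at the top of this message to "reserve some bits for zone ID" can be sketched with shifts and masks. The 16-bit zone field and the function names are assumptions for illustration; the layout fits within a signed 64-bit integer.

```python
ZONE_BITS = 16                       # assumption: top bits reserved for the zone
LOCAL_BITS = 63 - ZONE_BITS          # remaining bits of a signed 64-bit integer
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def pack_id(zone_id: int, local_id: int) -> int:
    """Place the zone ID in the high bits and the local counter in the low bits."""
    if not 0 <= zone_id < (1 << ZONE_BITS):
        raise ValueError("zone_id out of range")
    if not 0 <= local_id <= LOCAL_MASK:
        raise ValueError("local_id out of range")
    return (zone_id << LOCAL_BITS) | local_id


def unpack_id(instance_id: int) -> tuple[int, int]:
    """Return (zone_id, local_id) for a packed ID."""
    return instance_id >> LOCAL_BITS, instance_id & LOCAL_MASK
```

As with decimal prefixes, the split between zone bits and local bits has to be fixed once and shared by every zone, so it does not remove the need for coordination between providers.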
Re: [Openstack] Instance IDs and Multiple Zones
I know you don't want to resurrect a past discussion. But, UUIDs are designed to solve these kind of problems, frankly. The decision to go with integer IDs is a poor one, and will be negatively affecting the scalability and architecture of our systems well into the future. I'd love to see a discussion around moving away from internal integer identifiers and towards UUID internal identifiers at the next summit. Just my 2 cents, -jay ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
Re: [Openstack] Instance IDs and Multiple Zones
I agree with the sentiment that integers aren't the way to go long term. The current spec of the API does introduce some interesting problems to this discussion. All can be solved. The spec calls for the API to return an ID and a password upon instance creation. This means the API isn't asynchronous if it has to wait for the zone to create the ID. Page 46 of the API spec states the following: "Note that when creating a server only the server ID and the admin password are guaranteed to be returned in the request object. Additional attributes may be retrieved by performing subsequent GETs on the server." This creates a problem with bursting if Z1 calls Z2, which is a public cloud, which has to wait for Z3-X to find out where the instance is going to be placed. How would this work? pvo On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote: I think Dragon got it right. We need a zone identifier prefix on the IDs. I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) 
However, I agree with Monsyne: numeric IDs have got to go. Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. I'd still prefer strings though... Justin On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote: I want to get some input from all of you on what you think is the best way to approach this problem: the RS API requires that every instance have a unique ID, and we are currently creating these IDs by use of an auto-increment field in the instances table. The introduction of zones complicates this, as each zone has its own database. The two obvious solutions are a) a single, shared database and b) using a UUID instead of an integer for the ID. Both of these approaches have been discussed and rejected, so let's not bring them back up now. Given integer IDs and separate databases, the only obvious choice is partitioning the numeric space so that each zone starts its auto-incrementing at a different point, with enough room between starting ranges to ensure that they would never overlap. This would require some assumptions be made about the maximum number of instances that would ever be created in a single zone in order to determine how much numeric space that zone would need. I'm looking to get some feedback on what would seem to be reasonable guesses to these partition sizes. 
The other concern is more aesthetic than technical: we can make the numeric spaces big enough to avoid overlap, but then we'll have very large ID values; e.g., 10 or more digits for an instance. Computers won't care, but people might, so I thought I'd at least bring up this potential objection. -- Ed Leafe ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
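For illustration, the "skipping" scheme and the "reserve some bits for zone ID" idea from this thread can be sketched in Python. This is only a sketch under stated assumptions: the split into 8 zone bits and 24 local bits (32 bits in total) is an arbitrary choice, and the helper names make_instance_id, split_instance_id and skipping_ids are hypothetical, not taken from Nova.

```python
# Hypothetical layout: the top 8 bits carry the zone, the low 24 bits carry
# the zone-local auto-increment value. Both widths are assumptions.
ZONE_BITS = 8
LOCAL_BITS = 24


def make_instance_id(zone_id: int, local_id: int) -> int:
    """Combine a zone number and a zone-local counter into one integer ID."""
    if not 0 <= zone_id < (1 << ZONE_BITS):
        raise ValueError("zone_id does not fit in the reserved zone bits")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id exhausted the zone's numeric space")
    return (zone_id << LOCAL_BITS) | local_id


def split_instance_id(instance_id: int) -> tuple:
    """Recover (zone_id, local_id) from a combined ID."""
    return instance_id >> LOCAL_BITS, instance_id & ((1 << LOCAL_BITS) - 1)


def skipping_ids(zone_index: int, zone_count: int, how_many: int) -> list:
    """'Skipping': zone_index is 1-based and zone_count must be known up front."""
    return [zone_index + k * zone_count for k in range(how_many)]


print(make_instance_id(1, 42))   # zone 1, 42nd instance in that zone
print(skipping_ids(1, 2, 3), skipping_ids(2, 2, 3))
```

The ValueError on local_id is where the "how much numeric space does a zone need" guess shows up: with this split each zone can hold about 16.7 million IDs before it runs out.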
Re: [Openstack] Instance IDs and Multiple Zones
With this, are we saying EC2API wouldn't be able to use the child zones in the same way as the OSAPI? From: Vishvananda Ishaya vishvana...@gmail.com Date: Tue, 22 Mar 2011 12:44:21 -0700 To: Justin Santa Barbara jus...@fathomdb.com Cc: Paul Voccio paul.voc...@rackspace.com, openstack@lists.launchpad.net, Chris Behrens chris.behr...@rackspace.com Subject: Re: [Openstack] Instance IDs and Multiple Zones The main issue that drove integers is backwards compatibility to the ec2_api and existing ec2 toolsets. People seemed very opposed to the idea of having two separate ids in the database, one for ec2 and one for the underlying system. If we want to move to another id scheme that doesn't fit in a 32 bit integer we have to provide a way for ec2 style ids to be assigned to instances, perhaps through a central authority that hands out unique ids. Vish On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote: The API spec doesn't seem to preclude us from doing a fully-synchronous method if we want to (it just reserves the option to do an async implementation). Obviously we should make scheduling fast, but I think we're fine doing synchronous scheduling. It's still probably going to be much faster than CloudServers on a bad day anyway :-) Anyone have a link to where we chose to go with integer IDs? I'd like to understand why, because presumably we had a good reason. However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. Justin On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.com wrote: I agree with the sentiment that integers aren't the way to go long term. 
The current spec of the api does introduce some interesting problems to this discussion. All can be solved. The spec calls for the api to return an id and a password upon instance creation. This means the api isn't asynchronous if it has to wait for the zone to create the id. Page 46 of the API Spec states the following: "Note that when creating a server only the server ID and the admin password are guaranteed to be returned in the request object. Additional attributes may be retrieved by performing subsequent GETs on the server." This creates a problem with the bursting if Z1 calls to Z2, which is a public cloud, which has to wait for Z3-X to find out where it is going to be placed. How would this work? pvo On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote: I think Dragon got it right. We need a zone identifier prefix on the IDs. I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) However, I agree with Monsyne: numeric IDs have got to go. 
Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. I'd still prefer strings though... Justin On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote: I want to get some input from all of you on what you think is the best way to approach this problem: the RS API requires that every instance have a unique ID, and we are currently creating these IDs by use of an auto-increment field
Re: [Openstack] Instance IDs and Multiple Zones
EC2 uses xsd:string for their instance id. I can't find any additional guarantees. Here's a (second hand) quote from Amazon: http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever Instance ids are unique. You'll never receive a duplicate id. However, the current format of the instance id is an implementation detail that is subject to change. If you use the instance id as a string, you should be fine. So, strings it is then? :-) On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya vishvana...@gmail.comwrote: The main issue that drove integers is backwards compatibility to the ec2_api and existing ec2 toolsets. People seemed very opposed to the idea of having two separate ids in the database, one for ec2 and one for the underlying system. If we want to move to another id scheme that doesn't fit in a 32 bit integer we have to provide a way for ec2 style ids to be assigned to instances, perhaps through a central authority that hands out unique ids. Vish On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote: The API spec doesn't seem to preclude us from doing a fully-synchronous method if we want to (it just reserves the option to do an async implementation). Obviously we should make scheduling fast, but I think we're fine doing synchronous scheduling. It's still probably going to be much faster than CloudServers on a bad day anyway :-) Anyone have a link to where we chose to go with integer IDs? I'd like to understand why, because presumably we had a good reason. However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. Justin On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.comwrote: I agree with the sentiment that integers aren't the way to go long term. The current spec of the api does introduce some interesting problems to this discussion. All can be solved. 
The spec calls for the api to return an id and a password upon instance creation. This means the api isn't asynchronous if it has to wait for the zone to create the id. From page 46 of the API Spec states the following: Note that when creating a server only the server ID and the admin password are guaranteed to be returned in the request object. Additional attributes may be retrieved by performing subsequent GETs on the server. This creates a problem with the bursting if Z1 calls to Z2, which is a public cloud, which has to wait for Z3-X to find out where it is going be placed. How would this work? pvo On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote: I think Dragon got it right. We need a zone identifier prefix on the IDs. I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) However, I agree with Monsyne: numeric IDs have got to go. 
Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. I'd still prefer strings though... Justin On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote: I want to get some input from all of you on what you think is the best way to approach this problem: the RS API requires that every instance have a unique ID, and we are currently creating these IDs by use of an auto-increment field in the instances table. The introduction of zones complicates this, as each zone has its own database.
Re: [Openstack] Instance IDs and Multiple Zones
However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. This approach might help us in fixing some of the nastier bits of the openstack api images resource, as well. Justin Santa Barbara jus...@fathomdb.com said: The API spec doesn't seem to preclude us from doing a fully-synchronous method if we want to (it just reserves the option to do an async implementation). Obviously we should make scheduling fast, but I think we're fine doing synchronous scheduling. It's still probably going to be much faster than CloudServers on a bad day anyway :-) Anyone have a link to where we chose to go with integer IDs? I'd like to understand why, because presumably we had a good reason. However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. Justin On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.com wrote: I agree with the sentiment that integers aren't the way to go long term. The current spec of the api does introduce some interesting problems to this discussion. All can be solved. The spec calls for the api to return an id and a password upon instance creation. This means the api isn't asynchronous if it has to wait for the zone to create the id. Page 46 of the API Spec states the following: "Note that when creating a server only the server ID and the admin password are guaranteed to be returned in the request object. Additional attributes may be retrieved by performing subsequent GETs on the server." 
This creates a problem with the bursting if Z1 calls to Z2, which is a public cloud, which has to wait for Z3-X to find out where it is going be placed. How would this work? pvo On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote: I think Dragon got it right. We need a zone identifier prefix on the IDs. I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) However, I agree with Monsyne: numeric IDs have got to go. Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice collide. 
I'd still prefer strings though... Justin On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote: I want to get some input from all of you on what you think is the best way to approach this problem: the RS API requires that every instance have a unique ID, and we are currently creating these IDs by use of an auto-increment field in the instances table. The introduction of zones complicates this, as each zone has its own database. The two obvious solutions are a) a single, shared database and b) using a UUID instead of an integer for the ID. Both of these approaches have been discussed and rejected, so let's not bring them back up now. Given integer IDs and separate databases, the only obvious choice is partitioning the numeric space so that each zone starts its auto-incrementing at a different point, with enough room between starting ranges to ensure
Re: [Openstack] Instance IDs and Multiple Zones
Yes, that is what they say. Unfortunately, all of the ec2 tools expect the current format that they are using to various degrees. Some just need the proper prefix (euca2ools). Others need the prefix + hex (elasticfox, iirc). Others allow a string but limit it to 11 chars, etc. So to keep compatibility we are stuck mimicking amazon's string version for now. Vish On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote: EC2 uses xsd:string for their instance id. I can't find any additional guarantees. Here's a (second hand) quote from Amazon: http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever Instance ids are unique. You'll never receive a duplicate id. However, the current format of the instance id is an implementation detail that is subject to change. If you use the instance id as a string, you should be fine. So, strings it is then? :-) On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya vishvana...@gmail.com wrote: The main issue that drove integers is backwards compatibility to the ec2_api and existing ec2 toolsets. People seemed very opposed to the idea of having two separate ids in the database, one for ec2 and one for the underlying system. If we want to move to another id scheme that doesn't fit in a 32 bit integer we have to provide a way for ec2 style ids to be assigned to instances, perhaps through a central authority that hands out unique ids. Vish On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote: The API spec doesn't seem to preclude us from doing a fully-synchronous method if we want to (it just reserves the option to do an async implementation). Obviously we should make scheduling fast, but I think we're fine doing synchronous scheduling. It's still probably going to be much faster than CloudServers on a bad day anyway :-) Anyone have a link to where we chose to go with integer IDs? I'd like to understand why, because presumably we had a good reason. 
However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. Justin On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.com wrote: I agree with the sentiment that integers aren't the way to go long term. The current spec of the api does introduce some interesting problems to this discussion. All can be solved. The spec calls for the api to return an id and a password upon instance creation. This means the api isn't asynchronous if it has to wait for the zone to create the id. From page 46 of the API Spec states the following: Note that when creating a server only the server ID and the admin password are guaranteed to be returned in the request object. Additional attributes may be retrieved by performing subsequent GETs on the server. This creates a problem with the bursting if Z1 calls to Z2, which is a public cloud, which has to wait for Z3-X to find out where it is going be placed. How would this work? pvo On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote: I think Dragon got it right. We need a zone identifier prefix on the IDs. I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 
4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also possible, but requires some trickier protocols.) However, I agree with Monsyne: numeric IDs have got to go. Suppose I'm a customer of Rackspace CloudServers once it is running on OpenStack, and I also have a private cloud that the new Rackspace Cloud Business unit has built for me. I like both, and then I want to do cloud bursting in between them, by putting an aggregating zone in front of them. I think at that stage, we're screwed unless we figure this out now. And this scenario only has one provider (Rackspace) involved! We can square the circle however - if we want numbers, let's use UUIDs - they're 128 bit numbers, and won't in practice
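The format constraints Vish lists for EC2 tools (a type prefix, hexadecimal digits, at most 11 characters) can be illustrated with a short Python sketch. The function names id_to_ec2_id and ec2_id_to_id are hypothetical helpers chosen for this example, and the "prefix, dash, eight hex digits" layout is an assumption based on the description in this thread, not a documented Amazon guarantee.

```python
def id_to_ec2_id(instance_id: int, prefix: str = "i") -> str:
    """Render a 32-bit integer as an EC2-style ID such as 'i-0000002a'."""
    if not 0 <= instance_id < 2 ** 32:
        raise ValueError("instance_id must fit in 32 bits")
    return "%s-%08x" % (prefix, instance_id)


def ec2_id_to_id(ec2_id: str) -> int:
    """Parse the hex part of an EC2-style ID back into an integer."""
    _prefix, _dash, hex_part = ec2_id.partition("-")
    return int(hex_part, 16)


ec2_id = id_to_ec2_id(42)
print(ec2_id, len(ec2_id), ec2_id_to_id(ec2_id))
```

With a one-letter prefix the result is 10 characters long, so it stays under the 11-character limit mentioned above. Any ID scheme wider than 32 bits would not survive this round trip, which is the compatibility problem being discussed.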
Re: [Openstack] Instance IDs and Multiple Zones
+1 Sounds like some IPV6 discussions back when the standards were being debated. We could debate bit-allocation forever. Why can't we use UUIDs? http://tools.ietf.org/html/rfc4122 2. Motivation One of the main reasons for using UUIDs is that no centralized authority is required to administer them (although one format uses IEEE 802 node identifiers, others do not). As a result, generation on demand can be completely automated, and used for a variety of purposes. The UUID generation algorithm described here supports very high allocation rates of up to 10 million per second per machine if necessary, so that they could even be used as transaction IDs. UUIDs are of a fixed size (128 bits) which is reasonably small compared to other alternatives. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general. Since UUIDs are unique and persistent, they make excellent Uniform Resource Names. The unique ability to generate a new UUID without a registration process allows for UUIDs to be one of the URNs with the lowest minting cost. Brian Schott bfsch...@gmail.com On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote: I know you don't want to resurrect a past discussion. But, UUIDs are designed to solve these kind of problems, frankly. The decision to go with integer IDs is a poor one, and will be negatively affecting the scalability and architecture of our systems well into the future. I'd love to see a discussion around moving away from internal integer identifiers and towards UUID internal identifiers at the next summit. 
Just my 2 cents, -jay ___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
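As a small illustration of the "no centralized authority" point in the RFC 4122 excerpt, here is a sketch using Python's standard uuid module. The function name new_instance_id is hypothetical, and choosing random (version 4) UUIDs is an assumption made for this example; the thread does not settle on a UUID version.

```python
import uuid


def new_instance_id() -> str:
    """Mint an ID locally, with no shared database or central allocator."""
    return str(uuid.uuid4())


# Two zones can each call this independently and never coordinate.
zone_a_id = new_instance_id()
zone_b_id = new_instance_id()

parsed = uuid.UUID(zone_a_id)
print(zone_a_id)
print(parsed.version, parsed.int.bit_length() <= 128)
```

The string form is 36 characters, which is the aesthetic cost raised earlier in the thread, and it does not fit the EC2-style format without some mapping layer.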
Re: [Openstack] Instance IDs and Multiple Zones
Seems reasonable. +1 to design summit discussion. Vish On Mar 22, 2011, at 1:06 PM, Justin Santa Barbara wrote: Let's take a leadership position here and go with strings; we're not breaking Amazon's API. AWS will have to make the same changes when they reach our scale and ambition :-) We should also start engaging with client tools, because we're never going to be 100% EC2 compatible. At the least, our endpoints will be different. I think we should discuss this at the Design Summit, and then make an effort on this front as part of Diablo. On Tue, Mar 22, 2011 at 12:58 PM, Vishvananda Ishaya vishvana...@gmail.com wrote: Yes, that is what they say. Unfortunately, all of the ec2 tools expect the current format that they are using to various degrees. Some just need the proper prefix (euca2ools). Others need the prefix + hex (elasticfox, iirc). Others allow a string but limit it to 11 chars, etc. So to keep compatibility we are stuck mimicking amazon's string version for now. Vish On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote: EC2 uses xsd:string for their instance id. I can't find any additional guarantees. Here's a (second hand) quote from Amazon: http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever Instance ids are unique. You'll never receive a duplicate id. However, the current format of the instance id is an implementation detail that is subject to change. If you use the instance id as a string, you should be fine. So, strings it is then? :-) On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya vishvana...@gmail.com wrote: The main issue that drove integers is backwards compatibility to the ec2_api and existing ec2 toolsets. People seemed very opposed to the idea of having two separate ids in the database, one for ec2 and one for the underlying system. 
If we want to move to another id scheme that doesn't fit in a 32 bit integer we have to provide a way for ec2 style ids to be assigned to instances, perhaps through a central authority that hands out unique ids. Vish On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote: The API spec doesn't seem to preclude us from doing a fully-synchronous method if we want to (it just reserves the option to do an async implementation). Obviously we should make scheduling fast, but I think we're fine doing synchronous scheduling. It's still probably going to be much faster than CloudServers on a bad day anyway :-) Anyone have a link to where we chose to go with integer IDs? I'd like to understand why, because presumably we had a good reason. However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. Justin On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.com wrote: I agree with the sentiment that integers aren't the way to go long term. The current spec of the api does introduce some interesting problems to this discussion. All can be solved. The spec calls for the api to return an id and a password upon instance creation. This means the api isn't asynchronous if it has to wait for the zone to create the id. From page 46 of the API Spec states the following: Note that when creating a server only the server ID and the admin password are guaranteed to be returned in the request object. Additional attributes may be retrieved by performing subsequent GETs on the server. This creates a problem with the bursting if Z1 calls to Z2, which is a public cloud, which has to wait for Z3-X to find out where it is going be placed. How would this work? pvo On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote: I think Dragon got it right. We need a zone identifier prefix on the IDs. 
I think we need to get away from numbers. I don't see any reason why they need to be numbers. But, even if they did, you can pick very large numbers and reserve some bits for zone ID. - Chris On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote: I think _if_ we want to stick with straight numbers, the following are the 'traditional' choices: 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6. Requires that you know in advance how many zones there are. 2) Prefixing - so zone0 would get 0xxx, zone1 1xx. 3) Central allocation - each zone would request an ID from a central pool. This might not be a bad thing, if you do want to have a quick lookup table of ID - zone. Doesn't work if the zones aren't under the same administrative control. 4) Block allocation - a refinement of #3, where you get a bunch of IDs. Effectively amortizes the cost of the RPC. Probably not worth the effort here. (If you want central allocation without a shared database, that's also
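The central and block allocation options (3 and 4 in Justin's list) can be sketched as follows. This is a single-process stand-in written for illustration: CentralAllocator and ZoneIdSource are hypothetical names, the block size is arbitrary, and the method call allocate_block takes the place of what would be an RPC between zones in a real deployment.

```python
import threading


class CentralAllocator:
    """Hands out contiguous, non-overlapping blocks of integer IDs."""

    def __init__(self, block_size: int = 1000):
        self.block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def allocate_block(self) -> range:
        # In a real system this would be the (expensive) remote call.
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
        return range(start, start + self.block_size)


class ZoneIdSource:
    """Per-zone ID source that only contacts the allocator when a block runs out."""

    def __init__(self, allocator: CentralAllocator):
        self._allocator = allocator
        self._block = iter(())

    def next_id(self) -> int:
        try:
            return next(self._block)
        except StopIteration:
            self._block = iter(self._allocator.allocate_block())
            return next(self._block)


allocator = CentralAllocator(block_size=3)
zone1, zone2 = ZoneIdSource(allocator), ZoneIdSource(allocator)
print([zone1.next_id(), zone2.next_id(), zone1.next_id()])
```

One round trip to the allocator covers block_size IDs, which is the amortization Justin describes. The sketch still assumes every zone trusts the same allocator, so it does not address zones under different administrative control.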
Re: [Openstack] Instance IDs and Multiple Zones
I remember reading this a while ago. Not saying we have to do this. This is probably why zones are independent and ids are not unique across zones in EC2. This could be handled in the ec2 api service for compatibility. We could just XOR the top half and the bottom half of a UUID and get a unique hash that just the EC2 API needs to keep track of. The only important thing is that the USER doesn't get id collisions. --- http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/ Anatomy of a Resource ID So how were the numbers above calculated? To find out, let’s decompose an EC2 resource ID. After comparing hundreds of IDs, this opaque identifier turned out to be a little more transparent than you’d expect. [inline image: ec2_resource_id.png] Type The most trivial of the fields, the type is one of the following values, depending on the resource type: • i – instance • r – reservation • vol – EBS volume • snap – EBS snapshot • ami – Amazon machine image • aki – Amazon kernel image • ari – Amazon ramdisk image Inner ID The Inner ID is a 16-bit counter of resources allocated. Each time a resource is requested, the Inner ID increments by one. For instance and reservation IDs, it increments by two (i.e., these Inner IDs are always even). Instead of counting from 0000-FFFF as you’d expect, the Inner ID uses the following cycle: • 4000-7FFF • 0000-3FFF • C000-FFFF • 8000-BFFF (This cycle can be easily normalized by XORing with 4000.) When the Inner ID has exhausted its space, a new series begins (see below) and the cycle restarts. Series Marker For a given resource type, there is one active 8-bit Series ID. This Series ID, however, is not embedded directly into the resource ID. Instead, it is XORed to the leftmost 8 bits of the Inner ID. The result, which I call the Series Marker, is embedded in the ID to the left of the Inner ID. For example, on the resource ID above the Series ID would be e5 = a7 XOR 42. 
Series IDs usually decrement by one each time the Inner ID completes a cycle. I say “usually” because while this is the most common behavior, from time to time Series IDs seem to jump around in a pattern which is yet to be explained. UPDATE (Oct 7th 2009): RightScale contributed the missing piece: to normalize a series ID, XOR with E5 – this irons out the “jumps” I noticed perfectly. Superseries Marker For a given resource type, there is one active 8-bit Superseries ID. Like the Series ID, the Superseries ID is not embedded directly into the resource ID. Instead, it is XORed to the rightmost 8 bits of the Inner ID. The result – the Superseries Marker – is the leftmost byte of the resource ID. For example, on the resource ID above the Superseries ID would be 69 = 31 XOR 58. The Superseries ID changes so rarely that originally I had assumed it was some kind of checksum. This would have been odd as it limits the total available IDs to 2^24 = 16.8 million. Up to very recently, the Superseries ID for all resource types – instances, images, volumes, snapshots, etc. – was 69 (in the us-east-1 region; for eu-west-1 the Superseries ID is 74). These days, new instances use the Superseries ID 68. This subtle change, unnoticed by the industry, may hint at an astonishing achievement: 8.4 million instances launched since EC2's debut! (Instance IDs are even so 8.4M = 16.8M / 2.) UPDATE (Oct 7th 2009): RightScale suggested to normalize the Superseries ID by XORing with 69. In this technique, the superseries ID for us-east-1 was 0, and the recent change incremented it to 1. Brian Schott bfsch...@gmail.com On Mar 22, 2011, at 3:44 PM, Vishvananda Ishaya wrote: The main issue that drove integers is backwards compatibility to the ec2_api and existing ec2 toolsets. People seemed very opposed to the idea of having two separate ids in the database, one for ec2 and one for the underlying system. 
If we want to move to another id scheme that doesn't fit in a 32 bit integer we have to provide a way for ec2 style ids to be assigned to instances, perhaps through a central authority that hands out unique ids. Vish On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote: The API spec doesn't seem to preclude us from doing a fully-synchronous method if we want to (it just reserves the option to do an async implementation). Obviously we should make scheduling fast, but I think we're fine doing synchronous scheduling. It's still probably going to be much faster than CloudServers on a bad day anyway :-) Anyone have a link to where we chose to go with integer IDs? I'd like to understand why, because presumably we had a good reason. However, if we don't have documentation of the decision, then I vote that it never happened, and instance ids are strings. We've always been at war with Eastasia, and all ids have always been strings. Justin On
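For anyone who wants to poke at the ID layout from the article quoted earlier in this message, here is a minimal Python sketch of the decomposition. It is an illustration only, not anything from Amazon or nova: the function name decompose_ec2_id is made up, the byte layout (Superseries Marker, Series Marker, 16-bit Inner ID) is taken from the article's description, and the sample ID i-31a74258 is reconstructed from the article's two worked XORs (e5 = a7 XOR 42 and 69 = 31 XOR 58) because the original image did not survive.

```python
def decompose_ec2_id(resource_id: str) -> dict:
    """Split an old-style 32-bit EC2 resource ID into the article's fields.

    Hypothetical helper; layout assumed from the article:
    [superseries marker: 8 bits][series marker: 8 bits][inner id: 16 bits]
    """
    rtype, hex_part = resource_id.split("-", 1)
    if len(hex_part) != 8:
        raise ValueError("expected 8 hex digits after the type prefix")
    value = int(hex_part, 16)

    superseries_marker = (value >> 24) & 0xFF  # leftmost byte of the ID
    series_marker = (value >> 16) & 0xFF       # byte to the left of the Inner ID
    inner_id = value & 0xFFFF                  # 16-bit resource counter

    # The markers are the real IDs XORed with halves of the Inner ID,
    # so XORing again recovers them.
    series_id = series_marker ^ (inner_id >> 8)
    superseries_id = superseries_marker ^ (inner_id & 0xFF)

    return {
        "type": rtype,
        "inner_id": inner_id,
        "inner_id_normalized": inner_id ^ 0x4000,          # undo the odd cycle order
        "series_id": series_id,
        "series_id_normalized": series_id ^ 0xE5,          # RightScale's constant
        "superseries_id": superseries_id,
        "superseries_id_normalized": superseries_id ^ 0x69,
    }


fields = decompose_ec2_id("i-31a74258")  # sample ID is an assumption, see above
print({k: (hex(v) if isinstance(v, int) else v) for k, v in fields.items()})
```

For the sample ID this recovers Series ID e5 and Superseries ID 69, matching the article's arithmetic.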
Re: [Openstack] Instance IDs and Multiple Zones
See my previous response to Justin's email as to why UUIDs alone are not sufficient.

-Eric

On Tue, Mar 22, 2011 at 04:06:14PM -0400, Brian Schott wrote:

+1

Sounds like some IPv6 discussions back when the standards were being debated. We could debate bit-allocation forever. Why can't we use UUIDs?

http://tools.ietf.org/html/rfc4122

2. Motivation

One of the main reasons for using UUIDs is that no centralized authority is required to administer them (although one format uses IEEE 802 node identifiers, others do not). As a result, generation on demand can be completely automated, and used for a variety of purposes. The UUID generation algorithm described here supports very high allocation rates of up to 10 million per second per machine if necessary, so that they could even be used as transaction IDs. UUIDs are of a fixed size (128 bits) which is reasonably small compared to other alternatives. This lends itself well to sorting, ordering, and hashing of all sorts, storing in databases, simple allocation, and ease of programming in general. Since UUIDs are unique and persistent, they make excellent Uniform Resource Names. The unique ability to generate a new UUID without a registration process allows for UUIDs to be one of the URNs with the lowest minting cost.

Brian Schott
bfsch...@gmail.com

On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:

I know you don't want to resurrect a past discussion. But, UUIDs are designed to solve these kinds of problems, frankly. The decision to go with integer IDs is a poor one, and will be negatively affecting the scalability and architecture of our systems well into the future. I'd love to see a discussion around moving away from internal integer identifiers and towards UUID internal identifiers at the next summit.
Just my 2 cents,
-jay

___ Mailing list: https://launchpad.net/~openstack Post to : openstack@lists.launchpad.net Unsubscribe : https://launchpad.net/~openstack More help : https://help.launchpad.net/ListHelp
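For reference, a minimal Python sketch of the two ideas in play in this exchange: minting a UUID with no central authority (RFC 4122, via the standard uuid module) and Brian's earlier suggestion of XORing the top and bottom halves of a UUID. The function name fold_uuid_64 is hypothetical. Note the folded value is 64 bits, so it still would not fit an EC2-style 32-bit id, and folding is lossy, so two different UUIDs can in principle share a folded value.

```python
import uuid


def fold_uuid_64(u: uuid.UUID) -> int:
    """XOR the top 64 bits of a UUID with its bottom 64 bits.

    Hypothetical helper illustrating the 'XOR the two halves' idea;
    the result is a 64-bit integer, not a guaranteed-unique id.
    """
    high = u.int >> 64
    low = u.int & ((1 << 64) - 1)
    return high ^ low


# uuid4() needs no registration or coordination between zones.
instance_uuid = uuid.uuid4()
folded = fold_uuid_64(instance_uuid)
print(instance_uuid, format(folded, "016x"))
```

The service that needs the short form would still have to remember which UUID each folded value came from, since the fold cannot be reversed.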
Re: [Openstack] Instance IDs and Multiple Zones
Good discussion. I need to understand a bit more about how cross-org-boundary bursting is envisioned to work before assessing the implications on server id format.

Say a user hits the http://servers.myos.com api on zone A, which then calls out to http://servers.osprovider.com api in zone B, which calls out to http://dfw.servers.rackspace.com zone C, which calls down to http://zoned.dfw.servers.rackspace.com zone D (which would not be a public endpoint). [We'll exclude authN and the network implications for now :-]

I assume the lowest zone (zone D) is responsible for assigning the id? Does that mean there are now 4 URIs for the same exact resource (I'm assuming a numeric server id here for a moment):

http://zoned.dfw.servers.rackspace.com/v1.1/123/servers/12345 (this would be non-public)
http://dfw.servers.rackspace.com/v1.1/123/servers/12345
http://servers.osprovider.com/v1.1/456/servers/12345
http://servers.myos.com/v1.1/789/servers/12345

I assume then the user is only returned the URI from the high level zone they are hitting (http://servers.myos.com/v1.1/789/servers/12345 in this example)? If so, that means the high level zone defines everything in the URI except the actual server ID, which is assigned by the low level zone. Would users ever get returned a downstream URI they could hit directly?

Pure numeric ids will not work in a federated model at scale. If you have registered zone prefixes/suffixes, you will limit the total zone count based on the number of digits you preallocate and need a registration process to ensure uniqueness. How many zones is enough?

You could use UUID. If the above flow is accurate, I can only see how you create collisions in your OWN OS deployment. For example, if I purposefully create a UUID collision in servers.myos.com (that I run) with dfw.servers.rackspace.com (that Rackspace runs), it would only affect me since the collision would only be seen in the servers.myos.com namespace.
Maybe I'm missing something, but I don't see how you could inject a collision ID downstream - you can just shoot yourself in your own foot. Eric Day, please jump in here if I am off. AFAICT, same applies to dns (which I will discuss more below). I could just make my server ID dns namespace collide with rackspace, but it would still only affect me in my own URI namespace.

The other option apart from UUID is a globally unique string prefix. If Rackspace had 3 global API endpoints (ord, dfw, lon) each with 5 zones, the ID would need to be something like rax:dfw:1:12345 (I would actually want to hash the zone id 1 portion with something unique per customer so people couldn't coordinate info about zones and target attacks, etc.). This is obviously redundant with the Rackspace URI since we are representing Rackspace and the region twice, e.g. http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789. This option also means we need a mechanism for registering unique prefixes. We could use the same one we are proposing for API extensions, or, as Eric pointed out, use dns, but that would REALLY get redundant, e.g. http://dfw.servers.rackspace.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com.

Using strings also means people could make ids whatever they want as long as they obeyed the prefix/suffix. So one provider could be rax:dfw:1:12345 and another could be osprovider:8F792#@*jsn. That is technically not a big deal, but there is something to be said for consistency and simplicity.

The fundamental problem I see here is URI is intended to be the universal resource identifier but since zone federation will create multiple URIs for the same resource, the server id now has to be ANOTHER universal resource identifier.

Another issue is whether you want transparency or opaqueness when you are federating.
If you hit http://servers.myos.com, create two servers, and the ids that come back are (assuming using dns as server ids for a moment):

http://servers.myos.com/v1.1/12345/servers/5678.servers.myos.com
http://servers.myos.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com

It will be obvious in which deployment the servers live. This will effectively prevent whitelabel federating. UUID would be more opaque.

Given all of the above, I think I lean towards UUID. Would love to hear more thought and dialog on this.

Erik

On 3/22/11 3:49 PM, Eric Day e...@oddments.org wrote:

See my previous response to Justin's email as to why UUIDs alone are not sufficient.

-Eric

On Tue, Mar 22, 2011 at 04:06:14PM -0400, Brian Schott wrote:

+1

Sounds like some IPv6 discussions back when the standards were being debated. We could debate bit-allocation forever. Why can't we use UUIDs?

http://tools.ietf.org/html/rfc4122

2. Motivation

One of the main reasons for using UUIDs is that no centralized authority is required to administer them (although one format uses IEEE 802 node identifiers, others do not). As a result, generation on demand can be
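One way to picture the prefixed-ID option from Erik's message (rax:dfw:1:12345, with the zone portion hashed per customer) is the following minimal Python sketch. Everything here is an assumption for illustration: the function names, the use of HMAC-SHA256 keyed with a per-customer secret, and the 8-character truncation are not from the thread, which only says the zone id should be hashed with something unique per customer.

```python
import hashlib
import hmac


def zone_token(zone_id: str, customer_key: bytes, length: int = 8) -> str:
    """Replace the raw zone id with a per-customer keyed hash (hypothetical scheme)."""
    digest = hmac.new(customer_key, zone_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def build_server_id(provider: str, region: str, zone_id: str,
                    local_id: int, customer_key: bytes) -> str:
    """Build an id shaped like provider:region:zone-token:local-id."""
    return ":".join([provider, region, zone_token(zone_id, customer_key), str(local_id)])


def parse_server_id(server_id: str) -> dict:
    """Split the id back into its four colon-separated parts."""
    provider, region, token, local_id = server_id.split(":")
    return {"provider": provider, "region": region,
            "zone_token": token, "local_id": local_id}


# Two customers in the same zone see different zone tokens, so they
# cannot compare notes about which zone a server lives in.
id_a = build_server_id("rax", "dfw", "1", 12345, b"secret-for-customer-a")
id_b = build_server_id("rax", "dfw", "1", 12345, b"secret-for-customer-b")
print(id_a)
print(id_b)
print(parse_server_id(id_a))
```

The provider would still need its own record of which zone a token maps to for each customer, and the registration problem for the provider prefix itself is untouched by this.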
Re: [Openstack] Instance IDs and Multiple Zones
Pure numeric ids will not work in a federated model at scale.

Agreed

Maybe I'm missing something, but I don't see how you could inject a collision ID downstream - you can just shoot yourself in your own foot.

I think that you can get away with it only in simple hierarchical structures. Suppose cloud users are combining multiple public clouds into their own 'megaclouds'. If I'm an evil public cloud operator, I can start handing out UUIDs that match any UUIDs I can discover on the Rackspace cloud, and anyone that has constructed a cloud that combines my cloud and Rackspace would have collisions. Users wouldn't easily know who to blame either.

The other option apart from UUID is a globally unique string prefix. If Rackspace had 3 global API endpoints (ord, dfw, lon) each with 5 zones, the ID would need to be something like rax:dfw:1:12345 (I would actually want to hash the zone id 1 portion with something unique per customer so people couldn't coordinate info about zones and target attacks, etc.). This is obviously redundant with the Rackspace URI since we are representing Rackspace and the region twice, e.g. http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789.

I am in favor of this option, but with a few tweaks:
1) We use DNS, rather than inventing and administering our own scheme
2) I think the server ID looks like dfw.rackspace.com/servers/a282-a6-cj7aks89. It's not necessarily a valid HTTP endpoint, because there's a mapping to a protocol request
3) The client maps it by filling in the http/https protocol (or whatever protocol it is e.g. direct queuing), and it fills in v1.1 because that's the dialect it speaks.
4) Part of the mapping could be to map from a DNS name to an endpoint, perhaps using _srv records (I'm sure I'm mangling all the terminology here)
5) This also allows a form of discovery ... if I tell my cloud controller I want to use rackspace.com, it can then look up the _srv record, find the endpoint (e.g. openstack.rackspace.com), then do a zone listing request and find child zones etc. If I ask my monitoring system to monitor rackspace.com/servers/a6cj7aks89, it knows how to map that to an openstack endpoint. Auth is another story of course.

Using strings also means people could make ids whatever they want as long as they obeyed the prefix/suffix. So one provider could be rax:dfw:1:12345 and another could be osprovider:8F792#@*jsn. That is technically not a big deal, but there is something to be said for consistency and simplicity.

True. We could restrict the character set to A-Z,0-9 and a few other 'safe characters' if this is a real problem. We probably should eliminate difficult-to-encode characters anyway, whether encoding means umlauts or url-encoding.

The fundamental problem I see here is URI is intended to be the universal resource identifier but since zone federation will create multiple URIs for the same resource, the server id now has to be ANOTHER universal resource identifier.

I think the server ID should be the unique identifier, and is more important than the REST representation. I think we should avoid remapping the URI unless we have to... (more later)

It will be obvious in which deployment the servers live. This will effectively prevent whitelabel federating. UUID would be more opaque.

Whitelabel federation for reselling an underlying provider can easily be done by rewriting strings: id.replace("rackspace.com", "a.justinsbcloud.com").replace("citrix.com", "b.justinsbcloud.com"). I suspect the same approach would work for internal implementation zones also. The truly dedicated will discover the underlying structure whatever scheme you put in place.

Would users ever get returned a downstream URI they could hit directly?

So now finally I think I can answer this (with my opinion)... Users should usually get the downstream URI.
Just like DNS, they can either use that URI directly, or - preferably - use a local openstack endpoint, which acts a bit like a local DNS resolver. Your local openstack proxy could also do things like authentication, so - for example - I authenticate to my local proxy, and it then signs my request before forwarding it. This could also deal with the billing issue - the proxy can do charge-back and enforce internal spending limits and policies, and the public clouds can then bill the organization in aggregate. If you need the proxy to sign your requests, then you _can't_ use the downstream URI directly, which is a great control technique. Some clouds will want to use zones for internal operational reasons, and will want to keep the inner zones secret. So there, we need something like NAT: the front-end zone translates between public and private IDs as they travel in and out. How that translation works is deployment-dependent... they could have a mapping database, or could try to figure out a function which is aware of their internal structure to do this algorithmically. Let me try an example: