Re: [Openstack] Instance IDs and Multiple Zones

Good conversation guys. Certainly something we need to get settled out sooner 
than later.

On naming:

No matter how we shake it out (prefixes, mac address, time, etc), we're 
essentially fabricating our own form of UUID ... trying to pick some unique 
qualifier(s) to avoid collisions. 

I think the real driver is making something that is as short as possible and
mnemonic enough that a user could look at it and say "yup, that's mine."
Personally, I find UUIDs to be ugly monsters and think URNs are better for
providing a mnemonic for remembering names.

Given: 6373-ba62-9847-feab-b72a-00dd vs. rax:ord:zone3:rack2:cust29:inst383 
... give me a URN anytime. However, this does pose security risks by exposing 
internal layouts. 

We currently allow a user-supplied friendly name but under the hood use the
instance ID. Since customers use different auth credentials, their instances
live in different Projects and there is no conflict. Duplicate names are
allowed across customers (even within customers?). The downside is that names
carry no hints for routing.

On bursting: 

Currently, the Instance ID is fabricated in the zone where the create() call 
was handled. This Instance ID is treated like a Reservation # which is returned 
to the user for later follow-up (since provisioning can take a while).

The way I currently envision bursting with zones is that the commercial zones 
would be the leaf zones in a deployment. That is, instances would be 
provisioned locally first (depending on Server Best Match) due to their low 
weight scores and ultimately burst through the bottom of the zone tree to the 
commercial cloud. 

I think this works well. If I have a hybrid cloud and issue 'nova list' I would 
see something like:

sleepy - com:myco:development:inst1
dopey - com:myco:development:inst2
blinky - com:myco:development:inst3
inky - rax:ord:zone3:rack2:cust293:inst393
pinky - rax:ord:zone2:rack34:cust293:inst8746
clyde - bobscloud:basement:shelf2:cust9:inst8

and get a good idea of what's what.
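
As a rough illustration of why the URN-style names read well in a listing, the
sketch below splits each ID on ':' and groups the friendly names by the leading
field. The function name, and the idea that the first field tells you who runs
the instance, are assumptions made for this example; none of this is Nova code.

```python
from collections import defaultdict

# Friendly name -> URN-style instance ID, copied from the listing above.
LISTING = {
    "sleepy": "com:myco:development:inst1",
    "dopey": "com:myco:development:inst2",
    "blinky": "com:myco:development:inst3",
    "inky": "rax:ord:zone3:rack2:cust293:inst393",
    "pinky": "rax:ord:zone2:rack34:cust293:inst8746",
    "clyde": "bobscloud:basement:shelf2:cust9:inst8",
}


def group_by_prefix(listing):
    """Group friendly names by the first ':'-separated field of the ID.

    Treating the first field as "who runs this" is an assumption made
    for illustration only.
    """
    groups = defaultdict(list)
    for name, urn in listing.items():
        prefix = urn.split(":", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(LISTING)
```

The same split also shows the security concern raised earlier: every field
after the first one exposes a piece of the provider's internal layout.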



Confidentiality Notice: This e-mail message (including any attached or
embedded documents) is intended for the exclusive and confidential use of the
individual or entity to which this message is addressed, and unless otherwise
expressly indicated, is confidential and privileged information of Rackspace.
Any dissemination, distribution or copying of the enclosed material is 
prohibited.
If you receive this transmission in error, please notify us immediately by 
e-mail
at ab...@rackspace.com, and delete the original message.
Your cooperation is appreciated.


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ewan Mellor

We shouldn't keep tainting this argument with concerns about whether the IDs 
are readable or not.  We have UIs and CLIs to make things readable for humans.

We have to accept that, on the scales we care about, any unique ID is going to 
be incomprehensible to a human.  Rely on your presentation layer, that's what 
it's there for!

Ewan.

 -Original Message-
 From: openstack-bounces+ewan.mellor=citrix@lists.launchpad.net
 [mailto:openstack-bounces+ewan.mellor=citrix@lists.launchpad.net]
 On Behalf Of Sandy Walsh
 Sent: 23 March 2011 12:30
 To: openstack@lists.launchpad.net
 Subject: Re: [Openstack] Instance IDs and Multiple Zones
 

Re: [Openstack] Instance IDs and Multiple Zones

On Mar 23, 2011, at 8:46 AM, Ewan Mellor wrote:

 We have to accept that, on the scales we care about, any unique ID is going 
 to be incomprehensible to a human.  Rely on your presentation layer, that's 
 what it's there for!


+1


-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

On Mar 23, 2011, at 11:28 AM, Chris Behrens wrote:

 How would the admin API know which ID to work with if there are collisions?  
 Eric's point is that we'd not know where to route the request.


This reflects a fundamental misunderstanding of the way inter-zone 
communication works. There is no direct routing. Instead, a zone knows about 
its instances and its child zones. If the zone receives a request for some 
action involving a particular instance, it checks if it has that instance among 
its compute nodes; if not, it forwards the request to each of its child zones. 
That is repeated until the leaf zones are reached, and most of those will 
respond with something akin to a 404, indicating that they didn't handle the 
request. The zone that does have the requested instance, though, will carry out 
the action and return the result of that action.

The child zone responses are then aggregated. If all indicate 404, the 
zone returns the same. If one child responds that it has handled the request, 
that response is returned. This repeats back up the zone tree until the zone 
that originally received the request has heard from all of its child zones (or 
they timed out). 
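
The forwarding and aggregation described above can be sketched roughly as
follows. The Zone class, its method names, and the use of an empty list to
stand in for a "404" are all assumptions for illustration; this is not the Nova
zones code, and timeouts are not modelled.

```python
class Zone:
    """Hypothetical zone node: local instances plus child zones."""

    def __init__(self, name, instances=(), children=()):
        self.name = name
        self.instances = set(instances)
        self.children = list(children)

    def handle(self, instance_id, action):
        """Return a list of (zone_name, result) for zones that acted.

        An empty list plays the role of the "404" response.
        """
        if instance_id in self.instances:
            # This zone owns the instance: act and report back.
            return [(self.name, f"{action} ok")]
        handled = []
        for child in self.children:
            # Forward to every child; most will answer with "404" (empty).
            handled.extend(child.handle(instance_id, action))
        return handled


leaf_a = Zone("leaf-a", instances={"i-1"})
leaf_b = Zone("leaf-b", instances={"i-2"})
top = Zone("top", children=[Zone("mid", children=[leaf_a, leaf_b])])

found = top.handle("i-2", "reboot")    # one leaf zone handles it
missing = top.handle("i-9", "reboot")  # nobody has it: the "404" case
```

Note that if two leaf zones held the same ID, both would act and both results
would come back in the list.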

If there were to be a collision (i.e., two leaf nodes handling the 
request), there are only two possibilities: either the authenticated user has 
rights to those nodes, or they do not. If they do not, nothing will happen 
beyond an authorization failure message. If they do have rights to both 
instances, then the action will happen to both instances. Since the context of
this discussion is deliberate spoofing, my response would be "serves them
right." :)

So it seems that spoofing should have no effect, assuming that our 
authentication/authorization system is sound. If it isn't, then we have bigger 
issues than just ID spoofing, since I could write a program to send API delete 
requests for random instance IDs - no spoofing required.

Without spoofing, let's be realistic: the chance of duplicate UUID
values colliding is much, much smaller than the chance of a meteorite smashing
into our data centers. From Wikipedia: "In other words, only after generating 1
billion UUIDs every second for the next 100 years, the probability of creating
just one duplicate would be about 50%." I believe that that is well beyond our
scalability goals, so we can effectively ignore the impact of non-spoofed
collisions.
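
For anyone who wants to check the order of magnitude, here is a small sketch
using the standard birthday-bound approximation. It assumes random (version 4)
UUIDs with 122 random bits; whether the quoted figure used exactly that
assumption is not stated, so treat the result as a sanity check only.

```python
import math

RANDOM_BITS = 122  # random bits in a version-4 UUID (128 minus 6 fixed bits)


def collision_probability(n, bits=RANDOM_BITS):
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    space = 2.0 ** bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-(n * n) / (2.0 * space))


seconds_per_century = 100 * 365.25 * 24 * 3600
n_generated = 1e9 * seconds_per_century  # one billion UUIDs per second
p_century = collision_probability(n_generated)

# A far more realistic volume: one million instances in total.
p_million = collision_probability(1e6)
```

Under these assumptions p_century comes out in the neighbourhood of one half,
which agrees with the quoted figure in order of magnitude, while p_million is
vanishingly small.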


-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Chris Behrens

You have a fundamental misunderstanding of my fundamental understanding of how 
inter-zone communication works. :)  I understand how it works.  I'm asking 
about an admin API that has privileges for actions for all VMs.  As an ISP, I 
want to disable a particular VM because it's being 'bad'.  If someone has 
injected a collision, I would be sending an action to more than 1 VM, not only 
the intended target.  I don't see how collisions can be made to work at all.

And yes, we're talking about spoofing (or really, purposefully colliding a
known UUID).  I haven't seen any mention of anything else (although I may have
missed it).  I'm certainly not worried about machine-generated UUIDs
colliding, myself.

But what we're also talking about here is efficient routing.  Is it necessary?  
No.  Would it scale?  Yes.  A zone name or ID needs to be part of the 
identifier.  I prefer the DNS name idea, although prefixing UUIDs or reserving 
bits in a UUID could also work.
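
To make the "zone name as part of the identifier" idea concrete, here is a
minimal sketch of one possible shape: a UUID followed by the zone's DNS name.
The '<uuid>.<zone DNS name>' layout and both function names are assumptions for
illustration; nothing in this thread settles on a format.

```python
import uuid


def make_instance_id(zone_dns_name):
    """Build a hypothetical '<uuid>.<zone DNS name>' identifier."""
    return f"{uuid.uuid4()}.{zone_dns_name}"


def split_instance_id(instance_id):
    """Split the identifier back into (local_id, zone_dns_name).

    A UUID's text form contains no dots, so the first dot is the boundary.
    """
    local_id, _, zone = instance_id.partition(".")
    if not local_id or not zone:
        raise ValueError(f"not a zone-qualified ID: {instance_id!r}")
    return local_id, zone


example = make_instance_id("rack1.dfw.rackspace.com")
local_id, zone = split_instance_id(example)
```

With an ID like this, the receiving zone can read the target zone straight from
the identifier instead of asking every child.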

- Chris

On Mar 23, 2011, at 9:01 AM, Ed Leafe wrote:



Re: [Openstack] Instance IDs and Multiple Zones

Hi Ed,

On Wed, Mar 23, 2011 at 08:15:54AM -0400, Ed Leafe wrote:
 On Mar 23, 2011, at 1:55 AM, Eric Day wrote:
 
  If we provide some structure to the IDs, such as DNS names, we not only
  solve this namespacing problem but we also get a much more efficient
  routing mechanism.
 
 
   When I read things like this, the DBA in me winces a little. Meaningful 
 PKs, compound PKs - they always end up being a Very Bad Thing. If you want to 
 add efficient DNS routing, that could be added as additional data about an 
 instance that is periodically updated up the zone structure along with the 
 other capability information, but until now we've passed on that as a 
 premature optimization. That was one of the major arguments in favor of the 
 global DB design.

We're talking about a number of partitioning schemes, reserved bits,
URNs, URIs, etc. Because of the namespace issue I believe we will
need some structure to our resource names.

  Let's say you have api.rackspace.com (global aggregation zone),
  rack1.dfw.rackspace.com (real zone running instances), and
  bursty.customer.com (private zone). Bursty is a rackspace customer
  and they want to leverage their private resources alongside the
  public cloud, so they add bursty.customer.com as a private zone
  for their Rackspace account. The api.rackspace.com server now gets
  a terminate request for id x and it needs to know where to route
  the request. If we have a global namespace for instances (such as
  UUIDs), rack1.dfw.rackspace.com and bursty.customer.com could both
  have servers for id x (most likely from bursty spoofing the ID). Now
  api.rackspace.com doesn't know who to forward the request to.
 
   Even if this scenario were to happen, and nova tried to delete an 
 instance with a spoofed ID that did *not* belong to Bursty, it would fail due 
 to improper auth. Otherwise, even without zones/uuids/whatever, I could send 
 termination requests to the API with random IDs and delete any machines with 
 those IDs, whether I had rights to them or not. 

This implies the resource is now uniquely identified along with auth
credentials, which means the resource name cannot stand alone. If
we do have collisions due to spoofing, we're going to see ambiguity
issues crop up in other systems that don't have the auth context. I
strongly believe we need unique resource names that stand on their own
and don't depend on any other component such as auth.

   In the current zone design, a request to terminate id x would not be 
 handled by the outermost zone, since it wouldn't have instances, so it would 
 be forwarded to each child zone. This would repeat down the zone hierarchy
 until either there were no more child zones, or a zone found that it had an 
 instance with that ID. In the Bursty example, two zones would find an 
 instance with that ID; one would fail due to auth, and the one owned by 
 Bursty would be terminated as requested. The only way more than one instance 
 would terminate would be if Bursty spoofed their own IDs, which would be 
 their problem, not ours.

I think the "In the current zone design" is my main concern. This
discussion is taking into account how things need to work in the
near future, not just now. We've punted on routing for now and are
simply sending the request to every zone, but this won't work in the
long run. If we had a large public cloud with hundreds of zones,
and thousands of bursting zones, things would get prohibitively
expensive. It's not that they wouldn't function; the response time
may just be unreasonable.
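
A back-of-the-envelope sketch of the cost difference, counting how many zones
get contacted per request. The tree shape (one API zone, 10 regions, 20 leaf
zones each) and all names are made up for illustration. Note that path_length
searches the tree only to measure the path; in a routed design the ID itself
would tell each zone which child to use.

```python
def count_zones(tree, root):
    """Zones contacted when a request is sent to every zone under root."""
    return 1 + sum(count_zones(tree, child) for child in tree.get(root, []))


def path_length(tree, root, target):
    """Zones contacted when each zone forwards only toward the target.

    Returns None if the target zone is not under root.
    """
    if root == target:
        return 1
    for child in tree.get(root, []):
        hops = path_length(tree, child, target)
        if hops is not None:
            return 1 + hops
    return None


# Hypothetical deployment: one API zone, 10 regions, 20 leaf zones each.
tree = {"api": [f"region{r}" for r in range(10)]}
for r in range(10):
    tree[f"region{r}"] = [f"region{r}-zone{z}" for z in range(20)]

broadcast = count_zones(tree, "api")                    # 211 zones contacted
routed = path_length(tree, "api", "region7-zone3")      # 3 zones contacted
```

The broadcast count grows with the total number of zones, while the routed
count grows only with the depth of the tree.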

-Eric


Re: [Openstack] Instance IDs and Multiple Zones

Pvo brought up a good use case for naming a little while ago: Migrations.

If we use the instance id (assume UNC) to provide hints to the target zone, 
this means the instance id would need to change should the instance move 
locations. That's a no-no by everyone's measure. 

So, now I'm thinking more about Justin's comment about an external registry.

Perhaps a glance-like entry with metadata that can change?
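
A minimal sketch of what such a registry might look like: the instance ID stays
fixed and only the metadata (here, just the zone) changes on migration. The
class, its methods, and the sample IDs are hypothetical; this is not an existing
Glance or Nova API.

```python
class InstanceRegistry:
    """Hypothetical registry: stable instance ID -> mutable metadata."""

    def __init__(self):
        self._records = {}

    def register(self, instance_id, zone):
        if instance_id in self._records:
            raise KeyError(f"{instance_id} is already registered")
        self._records[instance_id] = {"zone": zone}

    def move(self, instance_id, new_zone):
        # Only the location metadata changes; the ID itself never does.
        self._records[instance_id]["zone"] = new_zone

    def locate(self, instance_id):
        return self._records[instance_id]["zone"]


registry = InstanceRegistry()
registry.register("inst-42", "zone3")
registry.move("inst-42", "zone7")
current_zone = registry.locate("inst-42")
```

The trade-off is an extra lookup per request, and the registry itself becomes
something that has to scale and stay available.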


Re: [Openstack] Instance IDs and Multiple Zones

I asked for further details in IRC, which started a discussion
there. To sum up, most folks agree migrations within a zone won't
require a new instance ID. Nothing changes except the compute host
it's running on. Migrations outside of a zone would require a new
instance ID, but this should be fine, since other things would also
change (such as IP, available volumes, ...). A cross-zone migration
will be more of a copy+delete than a proper move.

-Eric

On Wed, Mar 23, 2011 at 06:14:31PM +, Sandy Walsh wrote:

Re: [Openstack] Instance IDs and Multiple Zones

On Mar 23, 2011, at 3:00 PM, Eric Day wrote:

 Migrations outside of a zone would require a new
 instance ID, but this should be fine, since other things would also
 change (such as IP, available volumes, ...). 

That's probably true in the Rackspace use case, as zones would most 
likely be physically separate hardware, but nothing about zones makes that 
mandatory. 


-- Ed Leafe



Re: [Openstack] Instance IDs and Multiple Zones

On Wed, Mar 23, 2011 at 07:40:20PM +, Ed Leafe wrote:
  Migrations outside of a zone would require a new
  instance ID, but this should be fine, since other things would also
  change (such as IP, available volumes, ...). 
 
   That's probably true in the Rackspace use case, as zones would most 
 likely be physically separate hardware, but nothing about zones makes that 
 mandatory. 

It does currently; I wasn't speaking specifically to Rackspace's
use case. Right now some network and volume code is not aware of
cross-zone issues, and instead assume they are the authority for things
like configured IP ranges. We can certainly change this, and if we
do want to allow proper instance migrations between zones, we would
need to allow instance IDs to change. I don't see the importance of
enabling cross-zone migrations (backup+restore seems sufficient). I
may be wrong, but if we did enable this functionality in the future,
I don't see a reason not to allow resource IDs to change.

Sounds like a design summit topic if folks feel we should support this.

-Eric


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ewan Mellor

 From: openstack-bounces+ewan.mellor=citrix@lists.launchpad.net
 [mailto:openstack-bounces+ewan.mellor=citrix@lists.launchpad.net]
 On Behalf Of Justin Santa Barbara
 Sent: 23 March 2011 19:22
 To: Eric Day
 Cc: openstack@lists.launchpad.net
 Subject: Re: [Openstack] Instance IDs and Multiple Zones

  Migrations outside of a zone would require a new
  instance ID, but this should be fine, since other things would also
  change (such as IP, available volumes, ...). A cross-zone migration
  will be more of a copy+delete than a proper move.

 +1 on this.  If the IP is changing, there's little point in trying to keep
 the ID the same.  Great point.

I don't agree at all.  There are many good reasons to preserve the identity of
a VM even when its IP or location changes.  Billing, for example.  Access
control.  Intrusion detection.

Just because I move a VM from one place to another, why would I expect its 
identity to change?

Ewan.


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Ewan Mellor

-Original Message-
From: Paul Voccio [mailto:paul.voc...@rackspace.com]
Sent: 23 March 2011 22:19
To: Ewan Mellor; Justin Santa Barbara; Eric Day
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Instance IDs and Multiple Zones

  I don't agree at all. There are many good reasons to preserve the
  identity of a VM even when its IP or location changes. Billing, for
  example. Access control. Intrusion detection.

  Just because I move a VM from one place to another, why would I expect
  its identity to change?

 Where do we put the boundary on the preservation of id? Within the same
 deployment? Within the same zone topology? I'm not quite following the
 billing aspect. If you shut one down and start another, is that a
 problem for billing?

 You stated earlier today:
  We have to accept that, on the scales we care about, any unique ID is
  going to be incomprehensible to a human. Rely on your presentation
  layer, that's what it's there for!

 Is this really different? If the id changes, should the user care if it is
 presented in the same way with the same data? Am I missing something?

I certainly didn't intend for those statements to be contradictory. I don't
think that they are.

My view is that identity should be preserved as long as it's possible to do so.
A VM that moves around, gets resized, gets rebooted, etc, should have the same
identity.

By identity I mean that other pieces of software should be able to tell that
it's the same thing. A billing system should be able to say "that's the same
VM that I saw before." For example, if I charge my customers for a month of
usage, even if they only run the VM for a part of that month, then my billing
system needs to be able to say "that VM has moved from here to here, but it's
actually the same VM, so I'm charging for one month, not two." This is the
current charging scheme for RHEL instances hosted on Rackspace Cloud
(http://www.rackspace.com/cloud/blog/2010/08/31/red-hat-license-fee-for-rackspace-cloud-servers-changing-from-hourly-to-monthly/),
not just a corner-case example.

You can invent similar arguments for penetration detection systems ("that VM is
acting the way that it used to") or any other system for enforcing policy.

If you are using some kind of location- or path-based identifier for that VM,
then client software has to be notified of and keep track of all the movement
of the VM. If you have a unique identifier, then clients don't have to do any
of this.

My point about the UI was that we shouldn't worry about how complex these IDs
should be. We should make sure that bits of software can talk to each other
correctly and simply, and base our ID scheme on those needs. Once we've
figured out what ID scheme we're using, it's _trivial_ for a UI or CLI to turn
those ugly IDs into "Paul's Apache server" and "Ewan's build machine".

To your point about the boundary of preservation of ID, that's a good question.
If you ignore the security / trust issues, then the obvious answer is that IDs
should be globally, infinitely, permanently unique. That's what UUIDs are for.
We can generate these randomly without any need for a central authority, and
with no fear of collisions. It would certainly be nice if my VM can leave my
SoftLayer DC and arrive in my Rackspace DC and when it comes back I still know
that it's the same VM. That's the OpenStack dream, right?

I'm willing to accept that that's difficult to achieve, and I'd compromise on
identity only being preserved within an ownership/trust boundary. I really
don't see why I should lose track of my VM when it moves from one zone to
another within a given provider though.

Ewan.


Re: [Openstack] Instance IDs and Multiple Zones

OK, time for everyone to step back and take a deep breath.

There are many implications of the earlier design decision to use 
integer PKs for database entries. Most who have responded here, myself 
included, have indicated that they would prefer that this be changed to either 
a string value comprised of several meaningful bits of information, or a UUID 
approach, or some combination of things that would address various things in 
the operation of a zoned design. I think that this will make an excellent 
discussion at next month's design summit!

But the reality is that this needs to be developed now, under the 
current design of integer PKs. Please note that the only concern here is how to 
reconcile the Rackspace API requirement of globally unique instance IDs with 
the current design of generating PKs in local databases at the compute node 
level. To my understanding, there is no other alternative than partitioning the 
available integer range across zones, so that each zone generates its instance 
PKs starting from a different number, and spaced far enough apart that they 
will never overlap.

In the first post of this thread, I proposed a simple partitioning 
system: allocating a range of integers for each zone, and asked for feedback as 
to what people would think would be a reasonable estimate for the maximum 
number of instances a zone would ever need to create. Most shared my distaste 
for this sort of partitioning system, but no one offered an alternative that 
would be workable given the current constraints. So I'm going to implement a 
partition of 1 billion integers per zone, which should allow for approximately 
1 billion zones, given a 64-bit integer for the PK. This should be workable for
now, and after the design summit, when we've come to a consensus on changing 
the API to accept something other than integer identifiers, it should not be 
too difficult to retrofit.
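
The partitioning proposed above can be sketched as follows. The constant names,
the zero-based zone index, and the choice of a signed 64-bit PK are assumptions
for illustration, not the actual implementation.

```python
ZONE_RANGE = 10**9    # IDs reserved per zone, as proposed above
MAX_ID = 2**63 - 1    # assumes a signed 64-bit integer PK column


def zone_id_range(zone_index):
    """Return the inclusive (first, last) ID for a zero-based zone index."""
    first = zone_index * ZONE_RANGE
    last = first + ZONE_RANGE - 1
    if zone_index < 0 or last > MAX_ID:
        raise ValueError(f"zone index {zone_index} does not fit in 64 bits")
    return first, last


def zone_for_id(instance_id):
    """Recover the zone index that generated a given instance ID."""
    return instance_id // ZONE_RANGE


# Number of full one-billion-wide ranges that fit in the signed 64-bit space.
max_zones = (MAX_ID + 1) // ZONE_RANGE
```

Under these assumptions max_zones works out to a little over 9 billion, so the
"approximately 1 billion zones" estimate is on the conservative side. A side
effect worth noting is that zone_for_id gives a routing hint for free, which is
the kind of meaningful key some in this thread would rather avoid.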

Unless someone has a better idea... ;-)


-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

(sorry Eric, meant to send to the list)
-S

From: Eric Day [e...@oddments.org]
 Do we want this namespace per zone, deployment, resource owner, or some other 
 dimension?

Good question. We can prevent collisions at the zone level and within a 
deployment (single provider / multi-zone). But hybrid clusters are a different 
matter. Regardless of how we delineate it or which ID scheme we use, we have no 
way of detecting collisions.

In the top-level zones of hybrid installations, all instances.get(id) calls 
issued would have to assume they could get back more than one instance. Ugly, 
but perhaps this is just the nature of the problem?

This includes the 64-bit integer, 1-billion-per-zone approach ... but so be
it.

Let's just get something working.

-S



Re: [Openstack] Instance IDs and Multiple Zones

On Thu, Mar 24, 2011 at 12:23:42AM +, Sandy Walsh wrote:
 From: Eric Day [e...@oddments.org]
  Do we want this namespace per zone, deployment, resource owner, or some 
  other dimension?
 
 Good question. We can prevent collisions at the zone level and within a 
 deployment (single provider / multi-zone). But hybrid clusters are a 
 different matter. Regardless of how we delineate it or which ID scheme we 
 use, we have no way of detecting collisions.

Why not? Some schemes such as the ID.DNS name + ssl cert check I
mentioned before allow us to verify the authenticity of a namespace
before it is used. No other peer could register a zone with that
name unless the cert checks out. Within that zone Nova will prevent
collisions, but if things are really broken (accident or on purpose)
and it starts returning duplicate resource IDs, peer zones can choose
to just use one/none. We can document the behavior as undefined.

So, sure, you can still have duplicates within a zone (or other
namespace), but at least it's self contained and others peering with
it don't need to concern themselves or worry about spoofing attacks within
its own namespace.

 In the top-level zones of hybrid installations, all instances.get(id) calls 
 issued would have to assume they could get back more than one instance. Ugly, 
 but perhaps this is just the nature of the problem?

If we define the API for that call to only return a single instance,
it is up to the child zone to choose which one to send. If it tries
to return an array for a single ID, it would just be a protocol error
and fail.
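
A minimal sketch of that contract, assuming a hypothetical parent-zone helper
(get_instance, ProtocolError and the child_zones mapping are illustrative names,
not Nova code): each child answers with one instance or nothing, and a list for
a single ID is rejected as a protocol error.

```python
class ProtocolError(Exception):
    """Raised when a child zone violates the single-instance contract."""


def get_instance(instance_id, child_zones):
    """Ask each child zone for instance_id and return the first match.

    child_zones maps a zone name to a callable that returns a dict
    describing the instance, or None if the zone does not know the ID.
    """
    for zone_name, lookup in child_zones.items():
        result = lookup(instance_id)
        if result is None:
            continue
        if isinstance(result, (list, tuple)):
            # A child must pick one instance itself; an array is an error.
            raise ProtocolError(
                "zone %s returned %d results for one ID"
                % (zone_name, len(result)))
        return result
    return None
```

Which duplicate a child picks stays its own business; the parent only enforces
the shape of the answer.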

-Eric


Re: [Openstack] Instance IDs and Multiple Zones

 From: Eric Day [e...@oddments.org]
  On Thu, Mar 24, 2011 at 12:23:42AM +, Sandy Walsh wrote:
  Regardless of how we delineate it or which ID scheme we use, we have no way 
  of detecting collisions.
 Why not? Some schemes such as the ID.DNS name + ssl cert check I
 mentioned before allow us to verify the authenticity of a namespace
 before it is used. No other peer could register a zone with that
 name unless the cert checks out.

Hmm, yeah, you're right, the SSL cert approach should work for validating 
unique zone names. Funny, pvo and I were talking about that route 
yesterday. 

But will it help us with the duplicates problem? ...

 Within that zone Nova will prevent collisions, but if things are really
 broken (accident or on purpose) and it starts returning duplicate resource
 IDs, peer zones can choose to just use one/none. We can document the
 behavior as undefined.

I'm not sure that's a good thing ... the use case I was thinking of is the 
customer using two providers:

The customer has his own Openstack deployment (range 0-1B) and outsources to 
Provider-A and Provider-B.
Sadly, Pro-A and Pro-B both use the default ID ranges for service providers 
(let's say 10-11B). 
The customer starts provisioning instances to both provider zones evenly ... 
pow, duplicates.

The customer won't be happy that sometimes he gets status on Instance 
10,000,000,001 from Provider-A and sometimes from Provider-B. Or none at all.
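
The scenario can be reproduced in a few lines. This is only a sketch: the
10-billion starting point is the "let's say" default from above, and make_zone
is a made-up stand-in for a zone's auto-increment column.

```python
import itertools

DEFAULT_PROVIDER_START = 10_000_000_000  # assumed default, per "10-11B"


def make_zone(start):
    """Return a function that hands out sequential IDs from start + 1."""
    counter = itertools.count(start + 1)
    return lambda: next(counter)


# Both providers keep the same default range.
provider_a = make_zone(DEFAULT_PROVIDER_START)
provider_b = make_zone(DEFAULT_PROVIDER_START)

ids_a = [provider_a() for _ in range(3)]
ids_b = [provider_b() for _ in range(3)]

# Every ID issued by Provider-A is also issued by Provider-B.
collisions = sorted(set(ids_a) & set(ids_b))
```

Neither provider is misbehaving here; each range is collision-free on its own,
and the overlap only shows up in the customer's parent zone.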

If we append the DNS name of the provider, we bust RS 1.0 compatibility. 

Perhaps you can walk me through how you see the Cert check helping here 
(assuming no prefix on id)?
Or are we assuming that bursting is a RS x.0 API feature and things will change 
then?

-S


Re: [Openstack] Instance IDs and Multiple Zones

On Mar 23, 2011, at 8:59 PM, Eric Day wrote:

 May I ask what is the point of doing this if it won't make cactus and
 we're just going to replace it in a month or two? I think we all agree
 that 64-bit integer IDs are insufficient for multi-zone deployments,
 so no one will be deploying this until we sort it out and come up
 with a better ID.


Because this is just one part of the process of creating a distributed 
scheduler. The process for selecting a host for a new instance won't depend on 
the type of PK used for that instance in a db table.

The only reason I brought it up was that Sandy pointed out this 
uniqueness requirement, and we felt it would be a good idea to ask the list if 
they had any good ideas about alternatives to range partitions. I prefaced my 
initial post with a disclaimer that I wasn't looking to re-argue things that 
had already been discussed and agreed to, but I guess most people missed that 
part. :)


-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

Hi Sandy,

On Thu, Mar 24, 2011 at 01:01:18AM +, Sandy Walsh wrote:
  From: Eric Day [e...@oddments.org]
 Within that zone Nova will prevent collisions, but if things are really 
 broken (accident or on purpose)
 and it starts returning duplicate resource IDs, peer zones can choose to just 
 use one/none. We can document the behavior as undefined.
 
 I'm not sure that's a good thing ... the use case I was thinking of is the 
 customer using two providers:
 
 The customer won't be happy that sometimes he gets status on Instance 
 10,000,000,001 from Provider-A and sometimes from Provider-B. Or none at all.
 
 If we append the DNS name of the provider, we bust RS 1.0 compatibility. 

I think this is fine. RS 1.0, just like the EC2 API, was not designed
with federation in mind. We should not try to jump through hoops to
force it if we have the luxury of defining the next API version and
supporting it more elegantly there.

As for backwards compatibility for RS 1.0/EC2, those APIs could
depend on a global mapping server for non-bursting zones to translate
nova-internal IDs (id.zone) to what they need (integer, etc.), but this
should not be a core component of Nova since it goes against our design
tenets. It should be deprecated (along with the APIs) and shut down
in a timely manner once the new API and tools are available. Managing
resources in bursting zones would only be available through the new API
(along with other new features), so there will be plenty of incentive
for clients to change.
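
As a rough sketch of such a mapping layer (IdMapper and its methods are
hypothetical, and a real one would need persistent storage rather than
in-memory dicts), the legacy API would see small integers while Nova keeps
id.zone strings internally:

```python
class IdMapper:
    """Hypothetical stateful map between 'id.zone' strings and legacy ints."""

    def __init__(self, max_id=2**31 - 1):
        self._max_id = max_id      # legacy clients expect a 32-bit signed int
        self._to_int = {}
        self._to_name = {}

    def to_legacy(self, zone_id):
        """Return the legacy integer for zone_id, assigning one if needed."""
        if zone_id not in self._to_int:
            legacy = len(self._to_int) + 1
            if legacy > self._max_id:
                raise OverflowError("legacy ID space exhausted")
            self._to_int[zone_id] = legacy
            self._to_name[legacy] = zone_id
        return self._to_int[zone_id]

    def to_internal(self, legacy_id):
        """Return the 'id.zone' string behind a legacy integer."""
        return self._to_name[legacy_id]
```

The state lives entirely outside the zones, which is why it can be retired
along with the old APIs.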

 Perhaps you can walk me through how you see the Cert check helping here 
 (assuming no prefix on id)?
 Or are we assuming that bursting is a RS x.0 API feature and things will 
 change then?

Yeah, the cert check verifies the zone nova.example.com can
return resource IDs named *.nova.example.com, all others should be
ignored. The IDs need the zone name suffix for this to make sense.
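
A small sketch of the filtering half of that check. It assumes the zone name
has already been verified against its certificate elsewhere; in_namespace and
filter_resources are illustrative names, not an existing API.

```python
def in_namespace(resource_id, zone_name):
    """True if resource_id is '<local>.<zone_name>' with a non-empty local part."""
    suffix = "." + zone_name.lower()
    rid = resource_id.lower()
    return rid.endswith(suffix) and len(rid) > len(suffix)


def filter_resources(resource_ids, verified_zone_name):
    """Keep only IDs inside the namespace of an already-verified zone."""
    return [r for r in resource_ids if in_namespace(r, verified_zone_name)]
```

Matching on the leading dot matters: an ID such as 7.fakenova.example.com must
not pass as a member of nova.example.com.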

-Eric


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-23 Thread Justin Santa Barbara


  So I'm going to implement a partition of 1 billion integers per zone,
 which should allow for approximately 1 billion zones, given a 64 bit integer
 for the PK. This should be workable for now, and after the design summit,
 when we've come to a consensus on changing the API to accept something other
 than integer identifiers, it should not be too difficult to retrofit.


The type of a server @id in CloudServers is xsd:int, which is a 32-bit
signed integer:
http://docs.rackspacecloud.com/servers/api/v1.0/xsd/server.xsd

So if you have 1 billion integers per zone, you only get 2 zones.  You can
have 4 if you're willing to go negative, but surely it's too early in the
campaign.
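
The arithmetic, spelled out as a quick check (the constant names are just for
illustration). For comparison, a signed 64-bit key would leave room for roughly
9.2 billion partitions of that size.

```python
INT32_MAX = 2**31 - 1          # largest xsd:int value
INT64_MAX = 2**63 - 1          # largest signed 64-bit value
IDS_PER_ZONE = 1_000_000_000   # the proposed partition size

# Whole 1-billion partitions that fit in each ID space.
zones_int32_positive = INT32_MAX // IDS_PER_ZONE   # non-negative IDs only
zones_int32_signed = 2**32 // IDS_PER_ZONE         # negative IDs included
zones_int64_positive = INT64_MAX // IDS_PER_ZONE
```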

I think the only way long-term we're going to have CloudServers
v1.0 compatibility is by having a proxy that bridges between legacy APIs
(EC2 and CS) and future APIs (OpenStack).  I'm guessing that proxy will have
to be stateful to implement mappings of server IDs etc.  Yes, this sucks.
 But at some stage you have to say "you know, maybe 640KB wasn't enough", and
we have to make some changes.

How about this as a solution: use ranges as you suggest, but let the
starting points for the zone-ids that child-zones draw from be
customer-configured.  We're pushing the problem onto the end-user, but they
probably know best anyway, and we don't really expect anyone to use
sub-zones in anger anyway until Diablo or later, right?
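
Something along these lines, as a sketch only: ZoneIdRange is a made-up class,
the start value would come from the operator's configuration, and the overlap
check is something a parent zone could run when a child registers.

```python
class ZoneIdRange:
    """IDs for one zone, drawn from [start, start + size)."""

    def __init__(self, start, size=1_000_000_000):
        self.start = start   # operator-configured starting point
        self.size = size
        self._next = start

    def allocate(self):
        """Hand out the next ID, failing loudly once the range is used up."""
        if self._next >= self.start + self.size:
            raise OverflowError("zone ID range exhausted")
        value = self._next
        self._next += 1
        return value

    def overlaps(self, other):
        """True if the two half-open ranges share at least one ID."""
        return (self.start < other.start + other.size
                and other.start < self.start + self.size)
```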

Justin

Re: [Openstack] Instance IDs and Multiple Zones

On Mar 23, 2011, at 9:54 PM, Eric Day wrote:

 I don't think anyone is arguing, all the discussion has been very
 healthy IMHO.


Of course we are arguing - presenting evidence for a particular 
position in an effort to persuade is argument. The arguments have not become 
heated or personal, if that's what you meant.

Differing ideas and opposing POVs are wonderful, IMO. Groupthink is 
what should be avoided as unhealthy.



-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

Ok. :)  The original statement felt like it was written with negative
connotations, and I just wanted to say I think it's all been positive.

-Eric

On Wed, Mar 23, 2011 at 10:09:50PM -0400, Ed Leafe wrote:
 On Mar 23, 2011, at 9:54 PM, Eric Day wrote:
 
  I don't think anyone is arguing, all the discussion has been very
  healthy IMHO.
 
 
   Of course we are arguing - presenting evidence for a particular 
 position in an effort to persuade is argument. The arguments have not become 
 heated or personal, if that's what you meant.
 
   Differing ideas and opposing POVs are wonderful, IMO. Groupthink is 
 what should be avoided as unhealthy.
 
 
 
 -- Ed Leafe
 
 


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jon Slenk

the IDs must be strictly numericalish numbers, with nothing smelling
of something like a string in there, i take it?


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Ed Leafe

On Mar 22, 2011, at 1:11 PM, Jon Slenk wrote:

 the IDs must be strictly numericalish numbers, with nothing smelling
 of something like a string in there, i take it?


Well, since they are defined as: `id` int(11) NOT NULL AUTO_INCREMENT,
I would say the chance of a stringish thing slipping in is pretty small. :)



-- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jon Slenk

On Tue, Mar 22, 2011 at 10:41 AM, Ed Leafe e...@leafe.com wrote:
        Well, since they are defined as: `id` int(11) NOT NULL AUTO_INCREMENT,
 I would say the chance of a stringish thing slipping in is pretty small. :)

if the schema cannot be changed (which might be worth reconsidering
since it seems to be a bit of a root cause of trouble) then maybe you
have to reserve the last 4 or 5 digits of the id to be the zone id,
and then autoincrement on top of that? on the assumption that there
would be a limit of 9999 or 99999 zones ever.

but really i'd hazard to suggest that it should somehow be 2 parts,
neither of which are super constrained: a zone part and an in-zone-id
part.

it could even be that the id is left as-is and is semantically
required to be joined with the zone name as a prefix before it is a
valid interzone id.
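
Both variants are easy to sketch. Everything here is an assumption made for
illustration: four zone digits, the function names, and the ":" separator in
the prefixed form.

```python
ZONE_DIGITS = 4                  # assumed: last four decimal digits hold the zone
ZONE_MODULUS = 10 ** ZONE_DIGITS


def compose(local_id, zone_id):
    """Pack a per-zone counter and a zone number into one integer."""
    if not 0 <= zone_id < ZONE_MODULUS:
        raise ValueError("zone_id does not fit in %d digits" % ZONE_DIGITS)
    return local_id * ZONE_MODULUS + zone_id


def decompose(instance_id):
    """Return (local_id, zone_id) for an ID built by compose()."""
    return divmod(instance_id, ZONE_MODULUS)


def interzone_id(zone_name, local_id):
    """Alternative: leave the integer alone and prefix the zone name."""
    return "%s:%d" % (zone_name, local_id)
```

The first form stays an integer but caps the number of zones; the second has no
such cap but is no longer an integer.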

sincerely.


Re: [Openstack] Instance IDs and Multiple Zones

I think _if_ we want to stick with straight numbers, the following are the
'traditional' choices:

1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6.
 Requires that you know in advance how many zones there are.
2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
3) Central allocation - each zone would request an ID from a central pool.
 This might not be a bad thing, if you do want to have a quick lookup table
of ID -> zone.  Doesn't work if the zones aren't under the same
administrative control.
4) Block allocation - a refinement of #3, where you get a bunch of IDs.
 Effectively amortizes the cost of the RPC.  Probably not worth the effort
here.
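
For concreteness, a rough sketch of skipping (1) and of central/block
allocation (3 and 4). The names are invented for the example, and the pool is
an in-process object standing in for what would really be an RPC service.

```python
import itertools


def skipping_ids(zone_index, zone_count):
    """Scheme 1: zone k of n yields k, k + n, k + 2n, ... (1-based k)."""
    return itertools.count(zone_index, zone_count)


class CentralPool:
    """Schemes 3 and 4: a single allocator that hands out IDs in blocks.

    block_size=1 is plain central allocation; a larger block amortizes
    the cost of each request over many IDs.
    """

    def __init__(self, block_size=1):
        self.block_size = block_size
        self._next = 1

    def request_block(self):
        """Reserve the next block and return it as a range of IDs."""
        start = self._next
        self._next += self.block_size
        return range(start, start + self.block_size)
```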

(If you want central allocation without a shared database, that's also
possible, but requires some trickier protocols.)

However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm a
customer of Rackspace CloudServers once it is running on OpenStack, and I
also have a private cloud that the new Rackspace Cloud Business unit has
built for me.  I like both, and then I want to do cloud bursting in between
them, by putting an aggregating zone in front of them.  I think at that
stage, we're screwed unless we figure this out now.  And this scenario only
has one provider (Rackspace) involved!

We can square the circle however - if we want numbers, let's use UUIDs -
they're 128 bit numbers, and won't in practice collide.  I'd still prefer
strings though...
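
To make the "UUIDs are numbers" point concrete, here is a short example using
Python's standard uuid module. The variable names are only illustrative.

```python
import uuid

instance_uuid = uuid.uuid4()            # random (version 4) UUID
as_integer = instance_uuid.int          # the same value as a 128-bit number
as_string = str(instance_uuid)          # canonical 36-character text form
round_trip = uuid.UUID(int=as_integer)  # rebuild the UUID from the integer
```

The integer form needs 128 bits, so it would not fit an existing 32-bit or
64-bit ID column.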

Justin



On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote:

I want to get some input from all of you on what you think is the
 best way to approach this problem: the RS API requires that every instance
 have a unique ID, and we are currently creating these IDs by use of an
 auto-increment field in the instances table. The introduction of zones
 complicates this, as each zone has its own database.

The two obvious solutions are a) a single, shared database and b)
 using a UUID instead of an integer for the ID. Both of these approaches have
 been discussed and rejected, so let's not bring them back up now.

Given integer IDs and separate databases, the only obvious choice is
 partitioning the numeric space so that each zone starts its
 auto-incrementing at a different point, with enough room between starting
 ranges to ensure that they would never overlap. This would require some
 assumptions be made about the maximum number of instances that would ever be
 created in a single zone in order to determine how much numeric space that
 zone would need. I'm looking to get some feedback on what would seem to be
 reasonable guesses to these partition sizes.

The other concern is more aesthetic than technical: we can make the
 numeric spaces big enough to avoid overlap, but then we'll have very large
 ID values; e.g., 10 or more digits for an instance. Computers won't care,
 but people might, so I thought I'd at least bring up this potential
 objection.



 -- Ed Leafe





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Monsyne Dragon

Also, I should note that there seem to be merges pending to make the 
v1.1 API use URLs as instance identifiers in API calls, rather than 
integer IDs...
I'm not sure of the impact of that on v1.0 compatibility, but it is 
something to think about.


--

--
-Monsyne Dragon





Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day

On Tue, Mar 22, 2011 at 12:40:21PM -0400, Ed Leafe wrote:
   The two obvious solutions are a) a single, shared database and b) using 
 a UUID instead of an integer for the ID. Both of these approaches have been 
 discussed and rejected, so let's not bring them back up now.

We shouldn't dismiss previous ideas just because we've not chosen
them in the past, but let's not have the same discussion.

   Given integer IDs and separate databases, the only obvious choice is 
 partitioning the numeric space so that each zone starts its auto-incrementing 
 at a different point, with enough room between starting ranges to ensure that 
 they would never overlap. This would require some assumptions be made about 
 the maximum number of instances that would ever be created in a single zone 
 in order to determine how much numeric space that zone would need. I'm 
 looking to get some feedback on what would seem to be reasonable guesses to 
 these partition sizes.

I think we need:

* No central authority such as a globally shared DB. This also
  means not partitioning some set and handing them out to zones as
  offset (this is just another form of a shared DB).

* Ability to seamlessly join existing zones without chance of namespace
  collisions for peering and bursting. This means a globally unique
  zone naming scheme, and for this I'll reiterate the idea of using
  DNS names for zones.

If we want to stick with a single DB per zone, as it looks like we
are, this can simply be the auto-increment value from the instance
table and the zone as: instance.zone.
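
A sketch of what building and parsing such an ID could look like. make_id and
split_id are hypothetical helpers, and splitting on the first dot assumes the
local part is always a plain integer.

```python
def make_id(local_id, zone_name):
    """Join a zone-local integer and a DNS zone name as 'instance.zone'."""
    return "%d.%s" % (local_id, zone_name)


def split_id(global_id):
    """Split on the first dot: the local part is numeric, the rest is the zone."""
    local, _, zone = global_id.partition(".")
    if not local.isdigit() or not zone:
        raise ValueError("not an instance.zone ID: %r" % global_id)
    return int(local), zone
```

The zone half doubles as a routing hint, since a parent can tell from the ID
alone which child to ask.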

   The other concern is more aesthetic than technical: we can make the 
 numeric spaces big enough to avoid overlap, but then we'll have very large ID 
 values; e.g., 10 or more digits for an instance. Computers won't care, but 
 people might, so I thought I'd at least bring up this potential objection.

I'm not concerned with aesthetic issues to be honest. We have
copy/paste, DNS, and various techniques for presentation layers.

-Eric


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Ed Leafe

On Mar 22, 2011, at 1:45 PM, Jon Slenk wrote:

 if the schema cannot be changed (which might be worth reconsidering
 since it seems to be a bit of a root cause of trouble) then maybe you
 have to reserve the last 4 or 5 digits of the id to be the zone id,
 and then autoincrement on top of that? on the assumption that there
 would be a limit of  or 9 zones ever.

Just to be clear: I would not have been in favor of using integer IDs. 
However, this was discussed and settled before I was actively involved in the 
OpenStack code, so I didn't want to have this devolve into a resurrection of 
what had already been decided. If someone wants to restart that discussion, I'd 
certainly be interested, but that's not what I'm looking for in this thread.

The question before us is: given integer IDs, what is the best way to 
handle the added complexity of multiple zones?


-- Ed Leafe




Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day

On Tue, Mar 22, 2011 at 10:48:09AM -0700, Justin Santa Barbara wrote:
We can square the circle however - if we want numbers, let's use UUIDs -
they're 128 bit numbers, and won't in practice collide.  I'd still prefer
strings though...

If we use a number/uuid without a zone prefix, then they can
collide. What happens when I want to burst to my private cloud and
I've fixed my UUIDs to intentionally collide just to cause trouble?

Through peering and bursting we have potentially malicious users
for some deployments and we need to be sure resource ID spoofing and
poisoning is not possible. The simplest way is to have a namespace for
every zone, and the most obvious namespace is the zone name. We'll
of course need a mechanism to detect authenticity of zone names too
(signed certs, etc).

Oh, and all this discussion should not be limited to just instance
IDs, networks and volumes need to be globally addressed as well and
should follow the same mechanism.

-Eric


Re: [Openstack] Instance IDs and Multiple Zones

Totally agree with Eric.

Two questions that I think can help us move forward:


   1. Is the decision to stick with integers still valid?  Can someone who
   was there give us the reason for the decision?  Is it documented anywhere?
   2. If "we must have integers" means that we get 128-bit 'random'
   integers, do we still want integers?



Justin





On Tue, Mar 22, 2011 at 11:25 AM, Eric Day e...@oddments.org wrote:

 On Tue, Mar 22, 2011 at 12:40:21PM -0400, Ed Leafe wrote:
The two obvious solutions are a) a single, shared database and b)
 using a UUID instead of an integer for the ID. Both of these approaches have
 been discussed and rejected, so let's not bring them back up now.

 We shouldn't dismiss previous ideas just because we've not chosen
 them in the past, but lets not have the same discussion.

Given integer IDs and separate databases, the only obvious choice
 is partitioning the numeric space so that each zone starts its
 auto-incrementing at a different point, with enough room between starting
 ranges to ensure that they would never overlap. This would require some
 assumptions be made about the maximum number of instances that would ever be
 created in a single zone in order to determine how much numeric space that
 zone would need. I'm looking to get some feedback on what would seem to be
 reasonable guesses to these partition sizes.

 I think we need:

 * No central authority such as a globally shared DB. This also
  means not partitioning some set and handing them out to zones as
  offset (this is just another form of a shared DB).

 * Ability to seamlessly join existing zones without chance of namespace
  collisions for peering and bursting. This means a globally unique
  zone naming scheme, and for this I'll reiterate the idea of using
  DNS names for zones.

 If we want to stick with a single DB per zone, as it looks like we
 are, this can simply be the auto-increment value from the instance
 table and the zone as: instance.zone.

The other concern is more aesthetic than technical: we can make the
 numeric spaces big enough to avoid overlap, but then we'll have very large
 ID values; e.g., 10 or more digits for an instance. Computers won't care,
 but people might, so I thought I'd at least bring up this potential
 objection.

 I'm not concerned with aesthetic issues to be honest. We have
 copy/paste, DNS, and various techniques for presentation layers.

 -Eric


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Chris Behrens


I think Dragon got it right.  We need a zone identifier prefix on the IDs.  I 
think we need to get away from numbers.  I don't see any reason why they need 
to be numbers.  But, even if they did, you can pick very large numbers and 
reserve some bits for zone ID.
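
Reserving bits could look like the sketch below. The 16/47 split of a positive
64-bit value is an arbitrary assumption for the example, as are the function
names.

```python
ZONE_BITS = 16                      # assumed: high bits carry the zone ID
LOCAL_BITS = 63 - ZONE_BITS         # remaining bits of a positive 64-bit int
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def pack(zone_id, local_id):
    """Place zone_id in the high bits and local_id in the low bits."""
    if not 0 <= zone_id < (1 << ZONE_BITS):
        raise ValueError("zone_id out of range")
    if not 0 <= local_id <= LOCAL_MASK:
        raise ValueError("local_id out of range")
    return (zone_id << LOCAL_BITS) | local_id


def unpack(instance_id):
    """Return (zone_id, local_id) for an ID built by pack()."""
    return instance_id >> LOCAL_BITS, instance_id & LOCAL_MASK
```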

- Chris


On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:

 I think _if_ we want to stick with straight numbers, the following are the 
 'traditional' choices:
 
 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers 2,4,6.  
 Requires that you know in advance how many zones there are.
 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
 3) Central allocation - each zone would request an ID from a central pool.  
 This might not be a bad thing, if you do want to have a quick lookup table of 
 ID - zone.  Doesn't work if the zones aren't under the same administrative 
 control.
 4) Block allocation - a refinement of #3, where you get a bunch of IDs.  
 Effectively amortizes the cost of the RPC.  Probably not worth the effort 
 here.
 
 (If you want central allocation without a shared database, that's also 
 possible, but requires some trickier protocols.)
 
 However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm a 
 customer of Rackspace CloudServers once it is running on OpenStack, and I 
 also have a private cloud that the new Rackspace Cloud Business unit has 
 built for me.  I like both, and then I want to do cloud bursting in between 
 them, by putting an aggregating zone in front of them.  I think at that 
 stage, we're screwed unless we figure this out now.  And this scenario only 
 has one provider (Rackspace) involved!
 
 We can square the circle however - if we want numbers, let's use UUIDs - 
 they're 128 bit numbers, and won't in practice collide.  I'd still prefer 
 strings though...
 
 Justin
 
 
 
 On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote:
I want to get some input from all of you on what you think is the best 
 way to approach this problem: the RS API requires that every instance have a 
 unique ID, and we are currently creating these IDs by use of an 
 auto-increment field in the instances table. The introduction of zones 
 complicates this, as each zone has its own database.
 
The two obvious solutions are a) a single, shared database and b) 
 using a UUID instead of an integer for the ID. Both of these approaches have 
 been discussed and rejected, so let's not bring them back up now.
 
Given integer IDs and separate databases, the only obvious choice is 
 partitioning the numeric space so that each zone starts its auto-incrementing 
 at a different point, with enough room between starting ranges to ensure that 
 they would never overlap. This would require some assumptions be made about 
 the maximum number of instances that would ever be created in a single zone 
 in order to determine how much numeric space that zone would need. I'm 
 looking to get some feedback on what would seem to be reasonable guesses to 
 these partition sizes.
 
The other concern is more aesthetic than technical: we can make the 
 numeric spaces big enough to avoid overlap, but then we'll have very large ID 
 values; e.g., 10 or more digits for an instance. Computers won't care, but 
 people might, so I thought I'd at least bring up this potential objection.
 
 
 
 -- Ed Leafe
 
 
 
 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Jay Pipes

I know you don't want to resurrect a past discussion. But, UUIDs are
designed to solve these kinds of problems, frankly. The decision to go
with integer IDs is a poor one, and will be negatively affecting the
scalability and architecture of our systems well into the future.

I'd love to see a discussion around moving away from internal integer
identifiers and towards UUID internal identifiers at the next summit.

Just my 2 cents,
-jay


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Paul Voccio

I agree with the sentiment that integers aren't the way to go long term.
The current spec of the api does introduce some interesting problems to
this discussion. All can be solved. The spec calls for the api to return
an id and a password upon instance creation. This means the api isn't
asynchronous if it has to wait for the zone to create the id. Page 46
of the API Spec states the following:

Note that when creating a server only the server ID and the admin
password are guaranteed to be returned in the request object. Additional
attributes may be retrieved by performing subsequent GETs on the server.



This creates a problem with the bursting if Z1 calls to Z2, which is a
public cloud, which has to wait for Z3-X to find out where it is going to be
placed. How would this work?

pvo

On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote:


I think Dragon got it right.  We need a zone identifier prefix on the
IDs.  I think we need to get away from numbers.  I don't see any reason
why they need to be numbers.  But, even if they did, you can pick very
large numbers and reserve some bits for zone ID.

- Chris


On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:

 I think _if_ we want to stick with straight numbers, the following are
the 'traditional' choices:
 
 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers
2,4,6.  Requires that you know in advance how many zones there are.
 2) Prefixing - so zone0 would get 0xxx, zone1 1xx.
 3) Central allocation - each zone would request an ID from a central
pool.  This might not be a bad thing, if you do want to have a quick
lookup table of ID - zone.  Doesn't work if the zones aren't under the
same administrative control.
 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
 Effectively amortizes the cost of the RPC.  Probably not worth the
effort here.
 
 (If you want central allocation without a shared database, that's also
possible, but requires some trickier protocols.)
 
 However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
a customer of Rackspace CloudServers once it is running on OpenStack,
and I also have a private cloud that the new Rackspace Cloud Business
unit has built for me.  I like both, and then I want to do cloud
bursting in between them, by putting an aggregating zone in front of
them.  I think at that stage, we're screwed unless we figure this out
now.  And this scenario only has one provider (Rackspace) involved!
 
 We can square the circle however - if we want numbers, let's use UUIDs
- they're 128 bit numbers, and won't in practice collide.  I'd still
prefer strings though...
 
 Justin
 
 
 
 On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote:
I want to get some input from all of you on what you think is
the best way to approach this problem: the RS API requires that every
instance have a unique ID, and we are currently creating these IDs by
use of an auto-increment field in the instances table. The introduction
of zones complicates this, as each zone has its own database.
 
The two obvious solutions are a) a single, shared database and
b) using a UUID instead of an integer for the ID. Both of these
approaches have been discussed and rejected, so let's not bring them
back up now.
 
Given integer IDs and separate databases, the only obvious
choice is partitioning the numeric space so that each zone starts its
auto-incrementing at a different point, with enough room between
starting ranges to ensure that they would never overlap. This would
require some assumptions be made about the maximum number of instances
that would ever be created in a single zone in order to determine how
much numeric space that zone would need. I'm looking to get some
feedback on what would seem to be reasonable guesses to these partition
sizes.
 
The other concern is more aesthetic than technical: we can make
the numeric spaces big enough to avoid overlap, but then we'll have very
large ID values; e.g., 10 or more digits for an instance. Computers
won't care, but people might, so I thought I'd at least bring up this
potential objection.
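As a rough illustration of the partitioning arithmetic, the sketch below assumes a fixed capacity of 10 million instances per zone. That figure, and every name in the code, is a made-up example rather than a proposal from the thread.

```python
# Assumed upper bound on instances ever created in one zone.
ZONE_CAPACITY = 10_000_000


def zone_start(zone_index):
    """First auto-increment value for a zone (zones are numbered from 0)."""
    return zone_index * ZONE_CAPACITY + 1


def zone_for_id(instance_id):
    """Which zone's range an ID falls in."""
    return (instance_id - 1) // ZONE_CAPACITY


def digits_needed(zone_count):
    """Decimal digits in the largest ID once zone_count zones are full."""
    return len(str(zone_count * ZONE_CAPACITY))


def max_zones(bits=31):
    """How many full zone ranges fit in a signed integer of the given width."""
    return ((1 << bits) - 1) // ZONE_CAPACITY
```

Under these assumptions 100 zones already produce 10-digit IDs, and a signed 32-bit column runs out after 214 zones, which is the trade-off between range size and ID length described above.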
 
 
 
 -- Ed Leafe
 
 
 
 
 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp
 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Paul Voccio

With this, are we saying EC2API wouldn't be able to use the child zones in the 
same way as the OSAPI?

From: Vishvananda Ishaya vishvana...@gmail.com
Date: Tue, 22 Mar 2011 12:44:21 -0700
To: Justin Santa Barbara jus...@fathomdb.com
Cc: Paul Voccio paul.voc...@rackspace.com, openstack@lists.launchpad.net,
Chris Behrens chris.behr...@rackspace.com
Subject: Re: [Openstack] Instance IDs and Multiple Zones

The main issue that drove integers is backwards compatibility to the ec2_api 
and existing ec2 toolsets.  People seemed very opposed to the idea of having 
two separate ids in the database, one for ec2 and one for the underlying 
system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
integer we have to provide a way for ec2 style ids to be assigned to instances, 
perhaps through a central authority that hands out unique ids.

Vish

On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:

The API spec doesn't seem to preclude us from doing a fully-synchronous method 
if we want to (it just reserves the option to do an async implementation).  
Obviously we should make scheduling fast, but I think we're fine doing 
synchronous scheduling.  It's still probably going to be much faster than 
CloudServers on a bad day anyway :-)

Anyone have a link to where we chose to go with integer IDs?  I'd like to 
understand why, because presumably we had a good reason.

However, if we don't have documentation of the decision, then I vote that it 
never happened, and instance ids are strings.  We've always been at war with 
Eastasia, and all ids have always been strings.

Justin

On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
paul.voc...@rackspace.com wrote:
I agree with the sentiment that integers aren't the way to go long term.
The current spec of the api does introduce some interesting problems to
this discussion. All can be solved. The spec calls for the api to return
an id and a password upon instance creation. This means the api isn't
asynchronous if it has to wait for the zone to create the id. Page 46
of the API Spec states the following:

Note that when creating a server only the server ID and the admin
password are guaranteed to be returned in the request object. Additional
attributes may be retrieved by performing subsequent GETs on the server.

This creates a problem with the bursting if Z1 calls to Z2, which is a
public cloud, which has to wait for Z3-X to find out where it is going to be
placed. How would this work?

pvo

On 3/22/11 1:39 PM, Chris Behrens 
chris.behr...@rackspace.com wrote:

I think Dragon got it right.  We need a zone identifier prefix on the
IDs.  I think we need to get away from numbers.  I don't see any reason
why they need to be numbers.  But, even if they did, you can pick very
large numbers and reserve some bits for zone ID.

- Chris

On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:

 I think _if_ we want to stick with straight numbers, the following are
the 'traditional' choices:

 1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers
2,4,6.  Requires that you know in advance how many zones there are.
 2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
 3) Central allocation - each zone would request an ID from a central
pool.  This might not be a bad thing, if you do want to have a quick
lookup table of ID - zone.  Doesn't work if the zones aren't under the
same administrative control.
 4) Block allocation - a refinement of #3, where you get a bunch of IDs.
 Effectively amortizes the cost of the RPC.  Probably not worth the
effort here.

 (If you want central allocation without a shared database, that's also
possible, but requires some trickier protocols.)

 However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
a customer of Rackspace CloudServers once it is running on OpenStack,
and I also have a private cloud that the new Rackspace Cloud Business
unit has built for me.  I like both, and then I want to do cloud
bursting in between them, by putting an aggregating zone in front of
them.  I think at that stage, we're screwed unless we figure this out
now.  And this scenario only has one provider (Rackspace) involved!

 We can square the circle however - if we want numbers, let's use UUIDs
- they're 128 bit numbers, and won't in practice collide.  I'd still
prefer strings though...

 Justin

 On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe 
 e...@leafe.com wrote:
I want to get some input from all of you on what you think is
the best way to approach this problem: the RS API requires that every
instance have a unique ID, and we are currently creating these IDs by
use of an auto-increment field

Re: [Openstack] Instance IDs and Multiple Zones

EC2 uses xsd:string for their instance id.  I can't find any additional
guarantees.

Here's a (second hand) quote from Amazon:

http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
Instance ids are unique. You'll never receive a duplicate id. However, the
current format of the instance id is an implementation detail that is
subject to change. If you use the instance id as a string, you should be
fine.

So, strings it is then? :-)



On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya
vishvana...@gmail.com wrote:

 The main issue that drove integers is backwards compatibility to the
 ec2_api and existing ec2 toolsets.  People seemed very opposed to the idea
 of having two separate ids in the database, one for ec2 and one for the
 underlying system.  If we want to move to another id scheme that doesn't fit
 in a 32 bit integer we have to provide a way for ec2 style ids to be
 assigned to instances, perhaps through a central authority that hands out
 unique ids.

 Vish

 On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:

 The API spec doesn't seem to preclude us from doing a fully-synchronous
 method if we want to (it just reserves the option to do an async
 implementation).  Obviously we should make scheduling fast, but I think
 we're fine doing synchronous scheduling.  It's still probably going to be
 much faster than CloudServers on a bad day anyway :-)

 Anyone have a link to where we chose to go with integer IDs?  I'd like to
 understand why, because presumably we had a good reason.

 However, if we don't have documentation of the decision, then I vote that
 it never happened, and instance ids are strings.  We've always been at war
 with Eastasia, and all ids have always been strings.

 Justin




 On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
 paul.voc...@rackspace.com wrote:

 I agree with the sentiment that integers aren't the way to go long term.
 The current spec of the api does introduce some interesting problems to
 this discussion. All can be solved. The spec calls for the api to return
 an id and a password upon instance creation. This means the api isn't
 asynchronous if it has to wait for the zone to create the id. Page 46
 of the API Spec states the following:

 Note that when creating a server only the server ID and the admin
 password are guaranteed to be returned in the request object. Additional
 attributes may be retrieved by performing subsequent GETs on the server.



 This creates a problem with the bursting if Z1 calls to Z2, which is a
 public cloud, which has to wait for Z3-X to find out where it is going to be
 placed. How would this work?

 pvo

 On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote:

 
 I think Dragon got it right.  We need a zone identifier prefix on the
 IDs.  I think we need to get away from numbers.  I don't see any reason
 why they need to be numbers.  But, even if they did, you can pick very
 large numbers and reserve some bits for zone ID.
 
 - Chris
 
 
 On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
 
  I think _if_ we want to stick with straight numbers, the following are
 the 'traditional' choices:
 
  1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers
 2,4,6.  Requires that you know in advance how many zones there are.
  2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
  3) Central allocation - each zone would request an ID from a central
 pool.  This might not be a bad thing, if you do want to have a quick
 lookup table of ID - zone.  Doesn't work if the zones aren't under the
 same administrative control.
  4) Block allocation - a refinement of #3, where you get a bunch of IDs.
  Effectively amortizes the cost of the RPC.  Probably not worth the
 effort here.
 
  (If you want central allocation without a shared database, that's also
 possible, but requires some trickier protocols.)
 
  However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
 a customer of Rackspace CloudServers once it is running on OpenStack,
 and I also have a private cloud that the new Rackspace Cloud Business
 unit has built for me.  I like both, and then I want to do cloud
 bursting in between them, by putting an aggregating zone in front of
 them.  I think at that stage, we're screwed unless we figure this out
 now.  And this scenario only has one provider (Rackspace) involved!
 
  We can square the circle however - if we want numbers, let's use UUIDs
 - they're 128 bit numbers, and won't in practice collide.  I'd still
 prefer strings though...
 
  Justin
 
 
 
  On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote:
 I want to get some input from all of you on what you think is
 the best way to approach this problem: the RS API requires that every
 instance have a unique ID, and we are currently creating these IDs by
 use of an auto-increment field in the instances table. The introduction
 of zones complicates this, as each zone has its own database.

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Mark Washenberger

 However, if we don't have documentation of the decision, then I vote that it
 never happened, and instance ids are strings.  We've always been at war with
 Eastasia, and all ids have always been strings.

This approach might help us in fixing some of the nastier bits of the openstack 
api images resource, as well.

Justin Santa Barbara jus...@fathomdb.com said:

 The API spec doesn't seem to preclude us from doing a fully-synchronous
 method if we want to (it just reserves the option to do an async
 implementation).  Obviously we should make scheduling fast, but I think
 we're fine doing synchronous scheduling.  It's still probably going to be
 much faster than CloudServers on a bad day anyway :-)
 
 Anyone have a link to where we chose to go with integer IDs?  I'd like to
 understand why, because presumably we had a good reason.
 
 However, if we don't have documentation of the decision, then I vote that it
 never happened, and instance ids are strings.  We've always been at war with
 Eastasia, and all ids have always been strings.
 
 Justin
 
 
 
 
 On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio 
 paul.voc...@rackspace.com wrote:
 
 I agree with the sentiment that integers aren't the way to go long term.
 The current spec of the api does introduce some interesting problems to
 this discussion. All can be solved. The spec calls for the api to return
 an id and a password upon instance creation. This means the api isn't
 asynchronous if it has to wait for the zone to create the id. Page 46
 of the API Spec states the following:

 Note that when creating a server only the server ID and the admin
 password are guaranteed to be returned in the request object. Additional
 attributes may be retrieved by performing subsequent GETs on the server.



 This creates a problem with the bursting if Z1 calls to Z2, which is a
 public cloud, which has to wait for Z3-X to find out where it is going to be
 placed. How would this work?

 pvo

 On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote:

 
 I think Dragon got it right.  We need a zone identifier prefix on the
 IDs.  I think we need to get away from numbers.  I don't see any reason
 why they need to be numbers.  But, even if they did, you can pick very
 large numbers and reserve some bits for zone ID.
 
 - Chris
 
 
 On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
 
  I think _if_ we want to stick with straight numbers, the following are
 the 'traditional' choices:
 
  1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers
 2,4,6.  Requires that you know in advance how many zones there are.
  2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
  3) Central allocation - each zone would request an ID from a central
 pool.  This might not be a bad thing, if you do want to have a quick
 lookup table of ID - zone.  Doesn't work if the zones aren't under the
 same administrative control.
  4) Block allocation - a refinement of #3, where you get a bunch of IDs.
  Effectively amortizes the cost of the RPC.  Probably not worth the
 effort here.
 
  (If you want central allocation without a shared database, that's also
 possible, but requires some trickier protocols.)
 
  However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
 a customer of Rackspace CloudServers once it is running on OpenStack,
 and I also have a private cloud that the new Rackspace Cloud Business
 unit has built for me.  I like both, and then I want to do cloud
 bursting in between them, by putting an aggregating zone in front of
 them.  I think at that stage, we're screwed unless we figure this out
 now.  And this scenario only has one provider (Rackspace) involved!
 
  We can square the circle however - if we want numbers, let's use UUIDs
 - they're 128 bit numbers, and won't in practice collide.  I'd still
 prefer strings though...
 
  Justin
 
 
 
  On Tue, Mar 22, 2011 at 9:40 AM, Ed Leafe e...@leafe.com wrote:
 I want to get some input from all of you on what you think is
 the best way to approach this problem: the RS API requires that every
 instance have a unique ID, and we are currently creating these IDs by
 use of an auto-increment field in the instances table. The introduction
 of zones complicates this, as each zone has its own database.
 
 The two obvious solutions are a) a single, shared database and
 b) using a UUID instead of an integer for the ID. Both of these
 approaches have been discussed and rejected, so let's not bring them
 back up now.
 
 Given integer IDs and separate databases, the only obvious
 choice is partitioning the numeric space so that each zone starts its
 auto-incrementing at a different point, with enough room between
 starting ranges to ensure

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Vishvananda Ishaya

Yes, that is what they say. Unfortunately, all of the ec2 tools expect the 
current format that they are using, to various degrees.

Some just need the proper prefix (euca2ools)
Others need the prefix + hex (elasticfox, iirc)
Others allow a string but limit it to 11 chars, etc.

So to keep compatibility we are stuck mimicking amazon's string version for now.
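A minimal sketch of what "prefix + hex" within an 11-character limit could look like. The 'i-' prefix with eight hex digits follows the shape of existing EC2 instance IDs; the helper names are hypothetical and are not taken from nova.

```python
def to_ec2_id(instance_id: int, prefix: str = "i") -> str:
    """Render a 32-bit integer as an EC2-style ID such as 'i-0000017f'."""
    if not 0 <= instance_id < 2 ** 32:
        raise ValueError("instance_id must fit in 32 bits")
    return "%s-%08x" % (prefix, instance_id)


def from_ec2_id(ec2_id: str) -> int:
    """Recover the integer from an EC2-style ID."""
    _prefix, hex_part = ec2_id.split("-", 1)
    return int(hex_part, 16)
```

For example, to_ec2_id(383) returns 'i-0000017f', which is 10 characters and so stays under the 11-character limit some tools impose. Anything wider than 32 bits would not fit this shape, which is the constraint being described here.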

Vish

On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote:

 EC2 uses xsd:string for their instance id.  I can't find any additional 
 guarantees.
 
 Here's a (second hand) quote from Amazon:
 
 http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
 Instance ids are unique. You'll never receive a duplicate id. However, the 
 current format of the instance id is an implementation detail that is subject 
 to change. If you use the instance id as a string, you should be fine.
 
 So, strings it is then? :-)
 
 
 
 On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya vishvana...@gmail.com 
 wrote:
 The main issue that drove integers is backwards compatibility to the ec2_api 
 and existing ec2 toolsets.  People seemed very opposed to the idea of having 
 two separate ids in the database, one for ec2 and one for the underlying 
 system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
 integer we have to provide a way for ec2 style ids to be assigned to 
 instances, perhaps through a central authority that hands out unique ids.
 
 Vish
 
 On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
 
 The API spec doesn't seem to preclude us from doing a fully-synchronous 
 method if we want to (it just reserves the option to do an async 
 implementation).  Obviously we should make scheduling fast, but I think 
 we're fine doing synchronous scheduling.  It's still probably going to be 
 much faster than CloudServers on a bad day anyway :-)
 
 Anyone have a link to where we chose to go with integer IDs?  I'd like to 
 understand why, because presumably we had a good reason.
 
 However, if we don't have documentation of the decision, then I vote that it 
 never happened, and instance ids are strings.  We've always been at war with 
 Eastasia, and all ids have always been strings.
 
 Justin
 
 
 
 
 On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.com 
 wrote:
 I agree with the sentiment that integers aren't the way to go long term.
 The current spec of the api does introduce some interesting problems to
 this discussion. All can be solved. The spec calls for the api to return
 an id and a password upon instance creation. This means the api isn't
 asynchronous if it has to wait for the zone to create the id. Page 46
 of the API Spec states the following:
 
 Note that when creating a server only the server ID and the admin
 password are guaranteed to be returned in the request object. Additional
 attributes may be retrieved by performing subsequent GETs on the server.
 
 
 
 This creates a problem with the bursting if Z1 calls to Z2, which is a
 public cloud, which has to wait for Z3-X to find out where it is going to be
 placed. How would this work?
 
 pvo
 
 On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote:
 
 
 I think Dragon got it right.  We need a zone identifier prefix on the
 IDs.  I think we need to get away from numbers.  I don't see any reason
 why they need to be numbers.  But, even if they did, you can pick very
 large numbers and reserve some bits for zone ID.
 
 - Chris
 
 
 On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
 
  I think _if_ we want to stick with straight numbers, the following are
 the 'traditional' choices:
 
  1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers
 2,4,6.  Requires that you know in advance how many zones there are.
  2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
  3) Central allocation - each zone would request an ID from a central
 pool.  This might not be a bad thing, if you do want to have a quick
 lookup table of ID - zone.  Doesn't work if the zones aren't under the
 same administrative control.
  4) Block allocation - a refinement of #3, where you get a bunch of IDs.
  Effectively amortizes the cost of the RPC.  Probably not worth the
 effort here.
 
  (If you want central allocation without a shared database, that's also
 possible, but requires some trickier protocols.)
 
  However, I agree with Monsyne: numeric IDs have got to go.  Suppose I'm
 a customer of Rackspace CloudServers once it is running on OpenStack,
 and I also have a private cloud that the new Rackspace Cloud Business
 unit has built for me.  I like both, and then I want to do cloud
 bursting in between them, by putting an aggregating zone in front of
 them.  I think at that stage, we're screwed unless we figure this out
 now.  And this scenario only has one provider (Rackspace) involved!
 
  We can square the circle however - if we want numbers, let's use UUIDs
 - they're 128 bit numbers, and won't in practice

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Brian Schott

+1
Sounds like some IPv6 discussions back when the standards were being debated.  
We could debate bit-allocation forever.  Why can't we use UUIDs?

http://tools.ietf.org/html/rfc4122


2.  Motivation


   One of the main reasons for using UUIDs is that no centralized
   authority is required to administer them (although one format uses
   IEEE 802 node identifiers, others do not).  As a result, generation
   on demand can be completely automated, and used for a variety of
   purposes.  The UUID generation algorithm described here supports very
   high allocation rates of up to 10 million per second per machine if
   necessary, so that they could even be used as transaction IDs.

   UUIDs are of a fixed size (128 bits) which is reasonably small
   compared to other alternatives.  This lends itself well to sorting,
   ordering, and hashing of all sorts, storing in databases, simple
   allocation, and ease of programming in general.

   Since UUIDs are unique and persistent, they make excellent Uniform
   Resource Names.  The unique ability to generate a new UUID without a
   registration process allows for UUIDs to be one of the URNs with the
   lowest minting cost.
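For reference, generating such an ID needs no coordination at all. The sketch below uses Python's standard uuid module; the choice of version 4 (random) UUIDs and the function name new_instance_id are assumptions for illustration, not something the thread has settled on.

```python
import uuid


def new_instance_id() -> str:
    """Mint an instance ID locally, with no central authority involved."""
    return str(uuid.uuid4())


def as_urn(instance_id: str) -> str:
    """Render the same ID as a URN, as RFC 4122 suggests is possible."""
    return uuid.UUID(instance_id).urn
```

Each zone can call new_instance_id() independently; the result is a 36-character string, and as_urn() turns it into the 'urn:uuid:...' form.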



Brian Schott
bfsch...@gmail.com



On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:

 I know you don't want to resurrect a past discussion. But, UUIDs are
 designed to solve these kinds of problems, frankly. The decision to go
 with integer IDs is a poor one, and will be negatively affecting the
 scalability and architecture of our systems well into the future.
 
 I'd love to see a discussion around moving away from internal integer
 identifiers and towards UUID internal identifiers at the next summit.
 
 Just my 2 cents,
 -jay
 

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Vishvananda Ishaya

Seems reasonable

+1 to design summit discussion

Vish

On Mar 22, 2011, at 1:06 PM, Justin Santa Barbara wrote:

 Let's take a leadership position here and go with strings; we're not breaking 
 Amazon's API.  AWS will have to make the same changes when they reach our 
 scale and ambition :-)
 
 We should also start engaging with client tools, because we're never going to 
 be 100% EC2 compatible.  At the least, our endpoints will be different.
 
 I think we should discuss this at the Design Summit, and then make an effort 
 on this front as part of Diablo.
 
 
 
 On Tue, Mar 22, 2011 at 12:58 PM, Vishvananda Ishaya vishvana...@gmail.com 
 wrote:
 Yes, that is what they say. Unfortunately, all of the ec2 tools expect the 
 current format that they are using, to various degrees.
 
 Some just need the proper prefix (euca2ools)
 Others need the prefix + hex (elasticfox, iirc)
 Others allow a string but limit it to 11 chars, etc.
 
 So to keep compatibility we are stuck mimicking amazon's string version for 
 now.
 
 Vish
 
 On Mar 22, 2011, at 12:51 PM, Justin Santa Barbara wrote:
 
 EC2 uses xsd:string for their instance id.  I can't find any additional 
 guarantees.
 
 Here's a (second hand) quote from Amazon:
 
 http://serverfault.com/questions/58401/is-the-amazon-ec2-instance-id-unique-forever
 Instance ids are unique. You'll never receive a duplicate id. However, the 
 current format of the instance id is an implementation detail that is 
 subject to change. If you use the instance id as a string, you should be 
 fine.
 
 So, strings it is then? :-)
 
 
 
 On Tue, Mar 22, 2011 at 12:44 PM, Vishvananda Ishaya vishvana...@gmail.com 
 wrote:
 The main issue that drove integers is backwards compatibility to the ec2_api 
 and existing ec2 toolsets.  People seemed very opposed to the idea of having 
 two separate ids in the database, one for ec2 and one for the underlying 
 system.  If we want to move to another id scheme that doesn't fit in a 32 
 bit integer we have to provide a way for ec2 style ids to be assigned to 
 instances, perhaps through a central authority that hands out unique ids.
 
 Vish
 
 On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
 
 The API spec doesn't seem to preclude us from doing a fully-synchronous 
 method if we want to (it just reserves the option to do an async 
 implementation).  Obviously we should make scheduling fast, but I think 
 we're fine doing synchronous scheduling.  It's still probably going to be 
 much faster than CloudServers on a bad day anyway :-)
 
 Anyone have a link to where we chose to go with integer IDs?  I'd like to 
 understand why, because presumably we had a good reason.
 
 However, if we don't have documentation of the decision, then I vote that 
 it never happened, and instance ids are strings.  We've always been at war 
 with Eastasia, and all ids have always been strings.
 
 Justin
 
 
 
 
 On Tue, Mar 22, 2011 at 12:20 PM, Paul Voccio paul.voc...@rackspace.com 
 wrote:
 I agree with the sentiment that integers aren't the way to go long term.
 The current spec of the api does introduce some interesting problems to
 this discussion. All can be solved. The spec calls for the api to return
 an id and a password upon instance creation. This means the api isn't
 asynchronous if it has to wait for the zone to create the id. Page 46
 of the API Spec states the following:
 
 Note that when creating a server only the server ID and the admin
 password are guaranteed to be returned in the request object. Additional
 attributes may be retrieved by performing subsequent GETs on the server.
 
 
 
 This creates a problem with the bursting if Z1 calls to Z2, which is a
 public cloud, which has to wait for Z3-X to find out where it is going to be
 placed. How would this work?
 
 pvo
 
 On 3/22/11 1:39 PM, Chris Behrens chris.behr...@rackspace.com wrote:
 
 
 I think Dragon got it right.  We need a zone identifier prefix on the
 IDs.  I think we need to get away from numbers.  I don't see any reason
 why they need to be numbers.  But, even if they did, you can pick very
 large numbers and reserve some bits for zone ID.
 
 - Chris
 
 
 On Mar 22, 2011, at 10:48 AM, Justin Santa Barbara wrote:
 
  I think _if_ we want to stick with straight numbers, the following are
 the 'traditional' choices:
 
  1) Skipping - so zone1 would allocate numbers 1,3,5, zone2 numbers
 2,4,6.  Requires that you know in advance how many zones there are.
  2) Prefixing - so zone0 would get 0xxx, zone1 1xxx.
  3) Central allocation - each zone would request an ID from a central
 pool.  This might not be a bad thing, if you do want to have a quick
 lookup table of ID - zone.  Doesn't work if the zones aren't under the
 same administrative control.
  4) Block allocation - a refinement of #3, where you get a bunch of IDs.
  Effectively amortizes the cost of the RPC.  Probably not worth the
 effort here.
 
  (If you want central allocation without a shared database, that's also

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Brian Schott

I remember reading this a while ago.  Not saying we have to do this.  This is 
probably why zones are independent and ids are not unique across zones in EC2.  

This could be handled in the ec2 api service for compatibility.  We could just 
XOR the top half and the bottom half of a UUID and get a unique hash that just 
the EC2 API needs to keep track of.  The only important thing is that the USER 
doesn't get id collisions.
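A quick sketch of the XOR-folding idea, assuming "top half" and "bottom half" mean the upper and lower 64 bits of the 128-bit UUID value. The function name is hypothetical. Note that folding cannot be guaranteed collision-free, since it maps 128 bits onto 64; the EC2 API layer would still need to check for, and keep track of, the values it hands out.

```python
import uuid

MASK64 = (1 << 64) - 1


def fold_uuid(u: uuid.UUID) -> int:
    """XOR the upper 64 bits of a UUID with its lower 64 bits."""
    return (u.int >> 64) ^ (u.int & MASK64)
```

The result is a 64-bit integer, which is still wider than the 32-bit hex IDs the EC2 tools expect, so a further mapping step would be needed on top of this.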

---

http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/

Anatomy of a Resource ID

So how were the numbers above calculated? To find out, let’s decompose an EC2 
resource ID. After comparing hundreds of IDs, this opaque identifier turned out 
to be a little more transparent than you’d expect.

[Figure: ec2_resource_id.png]

Type

The most trivial of the fields, the type is one of the following values, 
depending on the resource type:

• i – instance
• r – reservation
• vol – EBS volume
• snap – EBS snapshot
• ami – Amazon machine image
• aki – Amazon kernel image
• ari – Amazon ramdisk image

Inner ID

The Inner ID is a 16-bit counter of resources allocated. Each time a resource 
is requested, the Inner ID increments by one. For instance and reservation IDs, 
it increments by two (i.e., these Inner IDs are always even). Instead of 
counting from 0000-FFFF as you’d expect, the Inner ID uses the following cycle:

• 4000-7FFF
• 0000-3FFF
• C000-FFFF
• 8000-BFFF

(This cycle can be easily normalized by XORing with 4000.) When the Inner ID 
has exhausted its space, a new series begins (see below) and the cycle restarts.

Series Marker

For a given resource type, there is one active 8-bit Series ID. This Series ID, 
however, is not embedded directly into the resource ID. Instead, it is XORed to 
the leftmost 8 bits of the Inner ID. The result, which I call the Series 
Marker, is embedded in the ID to the left of the Inner ID.

For example, on the resource ID above the Series ID would be e5 = a7 XOR 42.

Series IDs usually decrement by one each time the Inner ID completes a cycle. I 
say “usually” because while this is the most common behavior, from time to time 
Series IDs seem to jump around in a pattern which is yet to be explained.

UPDATE (Oct 7th 2009): RightScale contributed the missing piece: to normalize a 
series ID, XOR with E5 – this irons out the “jumps” I noticed perfectly.

Superseries Marker

For a given resource type, there is one active 8-bit Superseries ID. Like the 
Series ID, the Superseries ID is not embedded directly into the resource ID. 
Instead, it is XORed to the rightmost 8 bits of the Inner ID. The result – the 
Superseries Marker – is the leftmost byte of the resource ID.

For example, on the resource ID above the Superseries ID would be 69 = 31 XOR 
58.

The Superseries ID changes so rarely that originally I had assumed it was some 
kind of checksum. This would have been odd as it limits the total available IDs 
to 2^24 = 16.8 million. Up to very recently, the Superseries ID for all resource 
types – instances, images, volumes, snapshots, etc. – was 69 in the us-east-1 
region (for eu-west-1 the Superseries ID is 74). These days, new instances use 
the Superseries ID 68. This subtle change, unnoticed by the industry, may hint 
at an astonishing achievement: 8.4 million instances launched since EC2's 
debut! (Instance IDs are even so 8.4M = 16.8M / 2.)

UPDATE (Oct 7th 2009): RightScale suggested to normalize the Superseries ID by 
XORing with 69. In this technique, the superseries ID for us-east-1 was 0, and 
the recent change incremented it to 1.
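
Putting the fields together, a rough Python decoder for the layout the 
article describes might look like this. The names are invented for 
illustration, and the sample id is reconstructed from the two worked examples 
(markers 31 and a7, Inner ID bytes 42 and 58) because the original image is 
missing; the "i-" type prefix on it is an assumption.

```python
from typing import NamedTuple


class DecodedId(NamedTuple):
    resource_type: str
    superseries_id: int
    series_id: int
    inner_id: int


def decode_resource_id(resource_id: str) -> DecodedId:
    """Split '<type>-<8 hex digits>' into the fields described above."""
    resource_type, hex_part = resource_id.split("-", 1)
    if len(hex_part) != 8:
        raise ValueError("expected 8 hex digits after the type prefix")
    value = int(hex_part, 16)
    superseries_marker = (value >> 24) & 0xFF  # leftmost byte
    series_marker = (value >> 16) & 0xFF       # second byte
    inner_id = value & 0xFFFF                  # low 16 bits
    # Series ID is XORed with the leftmost 8 bits of the Inner ID,
    # Superseries ID with the rightmost 8 bits.
    series_id = series_marker ^ (inner_id >> 8)
    superseries_id = superseries_marker ^ (inner_id & 0xFF)
    return DecodedId(resource_type, superseries_id, series_id, inner_id)


def normalize(decoded: DecodedId) -> DecodedId:
    """Apply the XOR constants from the text (69, E5, 4000)."""
    return DecodedId(
        decoded.resource_type,
        decoded.superseries_id ^ 0x69,
        decoded.series_id ^ 0xE5,
        decoded.inner_id ^ 0x4000,
    )


if __name__ == "__main__":
    print(decode_resource_id("i-31a74258"))
```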

Brian Schott
bfsch...@gmail.com



On Mar 22, 2011, at 3:44 PM, Vishvananda Ishaya wrote:

 The main issue that drove integers is backwards compatibility to the ec2_api 
 and existing ec2 toolsets.  People seemed very opposed to the idea of having 
 two separate ids in the database, one for ec2 and one for the underlying 
 system.  If we want to move to another id scheme that doesn't fit in a 32 bit 
 integer we have to provide a way for ec2 style ids to be assigned to 
 instances, perhaps through a central authority that hands out unique ids.
 
 Vish
 
 On Mar 22, 2011, at 12:30 PM, Justin Santa Barbara wrote:
 
 The API spec doesn't seem to preclude us from doing a fully-synchronous 
 method if we want to (it just reserves the option to do an async 
 implementation).  Obviously we should make scheduling fast, but I think 
 we're fine doing synchronous scheduling.  It's still probably going to be 
 much faster than CloudServers on a bad day anyway :-)
 
 Anyone have a link to where we chose to go with integer IDs?  I'd like to 
 understand why, because presumably we had a good reason.
 
 However, if we don't have documentation of the decision, then I vote that it 
 never happened, and instance ids are strings.  We've always been at war with 
 Eastasia, and all ids have always been strings.
 
 Justin
 
 
 
 
 On

Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Eric Day

See my previous response to Justin's email as to why UUIDs alone are
not sufficient.

-Eric

On Tue, Mar 22, 2011 at 04:06:14PM -0400, Brian Schott wrote:
 +1
 Sounds like some IPV6 discussions back when the standards were being debated. 
  We could debate bit-allocation forever.  Why can't we use UUIDs?
 
 http://tools.ietf.org/html/rfc4122
 
 
 2.  Motivation
 
 
One of the main reasons for using UUIDs is that no centralized
authority is required to administer them (although one format uses
IEEE 802 node identifiers, others do not).  As a result, generation
on demand can be completely automated, and used for a variety of
purposes.  The UUID generation algorithm described here supports very
high allocation rates of up to 10 million per second per machine if
necessary, so that they could even be used as transaction IDs.
 
UUIDs are of a fixed size (128 bits) which is reasonably small
compared to other alternatives.  This lends itself well to sorting,
ordering, and hashing of all sorts, storing in databases, simple
allocation, and ease of programming in general.
 
Since UUIDs are unique and persistent, they make excellent Uniform
Resource Names.  The unique ability to generate a new UUID without a
registration process allows for UUIDs to be one of the URNs with the
lowest minting cost.
 
 
 
 Brian Schott
 bfsch...@gmail.com
 
 
 
 On Mar 22, 2011, at 2:53 PM, Jay Pipes wrote:
 
  I know you don't want to resurrect a past discussion. But, UUIDs are
  designed to solve these kinds of problems, frankly. The decision to go
  with integer IDs is a poor one, and will be negatively affecting the
  scalability and architecture of our systems well into the future.
  
  I'd love to see a discussion around moving away from internal integer
  identifiers and towards UUID internal identifiers at the next summit.
  
  Just my 2 cents,
  -jay
  
  ___
  Mailing list: https://launchpad.net/~openstack
  Post to : openstack@lists.launchpad.net
  Unsubscribe : https://launchpad.net/~openstack
  More help   : https://help.launchpad.net/ListHelp
 
 


Re: [Openstack] Instance IDs and Multiple Zones

2011-03-22 Thread Erik Carlin

Good discussion.  I need to understand a bit more about how cross org
boundary bursting is envisioned to work before assessing the implications
on server id format.

Say a user hits the http://servers.myos.com api on zone A, which then
calls out to http://servers.osprovider.com api in zone B, which calls out
to http://dfw.servers.rackspace.com zone C, which calls down to
http://zoned.dfw.servers.rackspace.com zone D (which would not be a public
endpoint).  

[We'll exclude authN and the network implications for now :-]

I assume the lowest zone (zone D) is responsible for assigning the id?

Does that mean there are now 4 URIs for the same exact resource (I'm
assuming a numeric server id here for a moment):

http://zoned.dfw.servers.rackspace.com/v1.1/123/servers/12345 (this would
be non-public)
http://dfw.servers.rackspace.com/v1.1/123/servers/12345
http://servers.osprovider.com/v1.1/456/servers/12345
http://servers.myos.com/v1.1/789/servers/12345

I assume then the user is only returned the URI from the high level zone
they are hitting (http://servers.myos.com/v1.1/789/servers/12345 in this
example)?  If so, that means the high level zone defines everything in the
URI except the actual server ID, which is assigned by the low level zone.
 Would users ever get returned a downstream URI they could hit directly?
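
To make the four-URI situation concrete, here is a small Python sketch that 
builds one URI per zone for the same server id. The endpoints and account 
numbers are the ones from the example above; the helper names and the idea of 
a fixed zone chain list are assumptions for illustration only.

```python
# (endpoint, account id) for each zone in the example, top level first.
ZONE_CHAIN = [
    ("http://servers.myos.com", "789"),
    ("http://servers.osprovider.com", "456"),
    ("http://dfw.servers.rackspace.com", "123"),
    ("http://zoned.dfw.servers.rackspace.com", "123"),  # non-public
]


def server_uri(endpoint: str, account_id: str, server_id: str,
               version: str = "v1.1") -> str:
    """Everything but server_id is chosen by the zone serving the request."""
    return f"{endpoint}/{version}/{account_id}/servers/{server_id}"


def uris_for(server_id: str) -> list:
    """All URIs that name the same server, one per zone in the chain."""
    return [server_uri(ep, acct, server_id) for ep, acct in ZONE_CHAIN]


if __name__ == "__main__":
    for uri in uris_for("12345"):
        print(uri)
```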

Pure numeric ids will not work in a federated model at scale.  If you have
registered zone prefixes/suffixes, you will limit the total zone count
based on the number of digits you preallocate and need a registration
process to ensure uniqueness.  How many zones is enough?
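
A quick Python sketch of why preallocated digits cap the zone count. The 
3-digit zone prefix and 6-digit local id are arbitrary assumptions, picked so 
the largest id still fits in a signed 32 bit integer; the function names are 
hypothetical.

```python
ZONE_DIGITS = 3   # assumption: digits reserved for the registered zone prefix
LOCAL_DIGITS = 6  # assumption: digits left for the per-zone counter

MAX_ZONES = 10 ** ZONE_DIGITS
MAX_LOCAL = 10 ** LOCAL_DIGITS


def make_server_id(zone: int, local_id: int) -> int:
    """Pack a zone prefix and a per-zone counter into one numeric id."""
    if not 0 <= zone < MAX_ZONES:
        raise ValueError(f"only {MAX_ZONES} zones can ever be registered")
    if not 0 <= local_id < MAX_LOCAL:
        raise ValueError(f"zone is full after {MAX_LOCAL} servers")
    return zone * MAX_LOCAL + local_id


def split_server_id(server_id: int) -> tuple:
    """Recover (zone, local_id) from a packed id."""
    return divmod(server_id, MAX_LOCAL)


if __name__ == "__main__":
    print(make_server_id(42, 12345))
```

With these numbers the scheme tops out at 1000 zones of one million servers 
each, and raising either limit means renumbering or widening the id.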

You could use UUID.  If the above flow is accurate, I can only see how you
create collisions in your OWN OS deployment.  For example, if I
purposefully create a UUID collision in servers.myos.com (that I run) with
dfw.servers.rackspace.com (that Rackspace runs), it would only affect me
since the collision would only be seen in the servers.myos.com namespace.
Maybe I'm missing something, but I don't see how you could inject a
collision ID downstream - you can just shoot yourself in your own foot.
Eric Day, please jump in here if I am off.  AFAICT, same applies to dns
(which I will discuss more below).  I could just make my server ID dns
namespace collide with rackspace, but it would still only affect me in my
own URI namespace.
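
A toy Python model of that argument, assuming each deployment keeps its own 
id registry (the Deployment class is invented for illustration and is not how 
nova stores servers). Reusing someone else's UUID in my own deployment 
only ever conflicts inside my own namespace.

```python
import uuid


class Deployment:
    """Hypothetical stand-in for one OS deployment's server namespace."""

    def __init__(self, name: str):
        self.name = name
        self.servers = {}

    def register(self, server_id: uuid.UUID, description: str) -> None:
        if server_id in self.servers:
            raise ValueError(f"{server_id} already exists in {self.name}")
        self.servers[server_id] = description


if __name__ == "__main__":
    rackspace = Deployment("dfw.servers.rackspace.com")
    myos = Deployment("servers.myos.com")

    victim_id = uuid.uuid4()
    rackspace.register(victim_id, "someone else's server")
    # A deliberate duplicate lands in my own registry only.
    myos.register(victim_id, "my colliding server")
    print(len(rackspace.servers), len(myos.servers))
```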

The other option apart from UUID is a globally unique string prefix.  If
Rackspace had 3 global API endpoints (ord, dfw, lon) each with 5 zones,
the ID would need to be something like rax:dfw:1:12345 (I would actually
want to hash the zone id 1 portion with something unique per customer so
people couldn't coordinate info about zones and target attacks, etc.).
This is obviously redundant with the Rackspace URI since we are
representing Rackspace and the region twice, e.g.
http://dfw.servers.rackspace.com/v1.1/12345/servers/rax:dfw:1:6789.
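
One way the "hash the zone id per customer" idea could look in Python. This 
is only a sketch: the use of HMAC-SHA256, the provider-held secret, the 
8-character truncation, and all function names are assumptions rather than 
anything proposed in the thread.

```python
import hashlib
import hmac


def opaque_zone_token(secret: bytes, customer_id: str, zone_id: str,
                      length: int = 8) -> str:
    """Derive a per-customer token so customers cannot compare zone ids."""
    message = f"{customer_id}:{zone_id}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:length]


def make_prefixed_id(provider: str, region: str, secret: bytes,
                     customer_id: str, zone_id: str, local_id: int) -> str:
    """Build an id shaped like rax:dfw:<zone token>:<local id>."""
    token = opaque_zone_token(secret, customer_id, zone_id)
    return f"{provider}:{region}:{token}:{local_id}"


if __name__ == "__main__":
    key = b"provider-private-key"  # placeholder secret for the example
    print(make_prefixed_id("rax", "dfw", key, "cust29", "1", 12345))
    print(make_prefixed_id("rax", "dfw", key, "cust30", "1", 12345))
```

Two customers in the same zone get different tokens, while the provider can 
still recompute the token for any (customer, zone) pair when routing.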

This option also means we need a mechanism for registering unique
prefixes.  We could use the same one we are proposing for API extensions,
or, as Eric pointed out, use dns, but that would REALLY get redundant,
e.g. 
http://dfw.servers.rackspace.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com.

Using strings also means people could make ids whatever they want as long
as they obeyed the prefix/suffix.  So one provider could be
rax:dfw:1:12345 and another could be osprovider:8F792#@*jsn.  That is
technically not a big deal, but there is something to be said for consistency
and simplicity.


The fundamental problem I see here is URI is intended to be the universal
resource identifier but since zone federation will create multiple URIs
for the same resource, the server id now has to be ANOTHER universal
resource identifier.

Another issue is whether you want transparency or opaqueness when you are
federating.  If you hit http://servers.myos.com, create two servers, and
the ids that come back are (assuming using dns as server ids for a moment):

http://servers.myos.com/v1.1/12345/servers/5678.servers.myos.com

http://servers.myos.com/v1.1/12345/servers/6789.dfw.servers.rackspace.com

It will be obvious in which deployment the servers live.  This will
effectively prevent whitelabel federating.  UUID would be more opaque.

Given all of the above, I think I lean towards UUID.

Would love to hear more thought and dialog on this.

Erik  



On 3/22/11 3:49 PM, Eric Day e...@oddments.org wrote:

See my previous response to Justin's email as to why UUIDs alone are
not sufficient.

-Eric

On Tue, Mar 22, 2011 at 04:06:14PM -0400, Brian Schott wrote:
 +1
 Sounds like some IPV6 discussions back when the standards were being
debated.  We could debate bit-allocation forever.  Why can't we use
UUIDs?
 
 http://tools.ietf.org/html/rfc4122
 
 
 2.  Motivation
 
 
One of the main reasons for using UUIDs is that no centralized
authority is required to administer them (although one format uses
IEEE 802 node identifiers, others do not).  As a result, generation
on demand can be

Re: [Openstack] Instance IDs and Multiple Zones