It's not about the size of the collision space. It's about the scale-killing impact of enforcing sequential integers (auto-increment keys) across multiple nodes. A random 32-bit integer would be fine as long as the generating algorithm produced acceptable collision rates.
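To put rough numbers on "acceptable collision rates", here is a minimal sketch of the two questions Michael raises below. It assumes IDs are drawn uniformly at random, and it uses the standard birthday-bound approximation rather than an exact calculation. The function names are made up for illustration and are not anything in the nova tree.

```python
import math


def p_any_collision(n: int, bits: int) -> float:
    """Approximate chance of at least one duplicate among n uniform random IDs.

    Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


def p_next_collides(n: int, bits: int) -> float:
    """Chance that one more random ID hits one of n distinct IDs already in use."""
    return n / 2 ** bits


if __name__ == "__main__":
    n = 1_000_000  # the instance count mentioned in the thread
    for bits in (32, 64, 128):
        print(bits, p_any_collision(n, bits), p_next_collides(n, bits))
```

Under those assumptions, a million random 32-bit IDs are all but certain to contain at least one duplicate, while the chance that the next ID generated collides is about 1 in 4295 (the "1 in 4000" figure below). So a 32-bit random ID only works if the generator checks for an existing ID and retries, which is a much cheaper guarantee than a globally sequential counter.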
-jay

On Tue, Oct 5, 2010 at 2:39 PM, Michael Gundlach <[email protected]> wrote:
> On Tue, Oct 5, 2010 at 2:31 PM, Ewan Mellor <[email protected]> wrote:
>>
>> Thanks Michael. It would be great to see this in some doccomments in the
>> code somewhere.
>
> Yeah, I agree -- I first touched this code to rename ec2_id to internal_id,
> and it's been a bit of a pain because there's a dearth of comments (and some
> ambiguous naming, like "instance_id" referring to id in some places and to
> ec2_id in other places.) I plan to clean this code up as one of my post-FF
> tasks.
>
>>
>> I agree with Jay's comments elsewhere in this thread -- it seems a better
>> idea to use a UUID for your internal_id, rather than a long int. That way,
>> the ID is an extra 64 bits longer, so you can generate them randomly on
>> independent nodes without worrying about collisions.
>
> I think we discussed this in IRC -- 128 bits turns into a 26 byte EC2 ID vs
> 14 for a 64 bit int, and someone (Soren?) had a strong negative preference.
> Actually, I just checked in code only using a 32 bit integer, and I'd like
> some convincing that this isn't sufficient for the foreseeable future. We
> want to support 1 million instances, right? Which is 1/4000 the keyspace,
> so we have something like a 1 in 4000 chance of a collision once we hit a
> million instances. I know I'm not doing the statistics properly, since the
> proper question is "what is the chance of at least one collision when
> successively generating one million random 32-bit integers?", but it feels
> like we can punt on larger values at least until Bexar.
>
> Thoughts?
> Michael

_______________________________________________
Mailing list: https://launchpad.net/~nova
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~nova
More help   : https://help.launchpad.net/ListHelp

