Re: [GENERAL] UUID column as pimrary key?

2011-01-08 Thread Jasen Betts
On 2011-01-06, Chris Browne cbbro...@acm.org wrote:

 (Sequence + Host Data), and I'd expect there to
 be a considerable risk of Dumb Techs making mistakes there.  It wouldn't
 be difficult for such a scenario to induce systematic key collisions.

I've seen that happen. cleanup of the effected records was tricky,
a few customers were inconvenienced

-- 
⚂⚃ 100% natural

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Alban Hertroys
On 6 Jan 2011, at 24:27, Chris Browne wrote:

 Next to that, UUID's are generated by computers. I have no doubts that
 the numeric space that makes up a UUID allows for collision chances as
 low as described, but are computers capable of generating those
 numbers sufficiently random that they actually achieve that low a
 chance? I think that's pushing it.
 
 RFC 4122 does NOT point to randomness as the only criterion to
 discourage collisions, and treating UUIDs as if they were merely about
 being sufficiently random to achieve low chance of collision is
 insulting to the drafters of the standard, because they were certainly
 NOT so naive as to think that was sufficient.

I'm sure the designers knew what they were getting into. This comment was aimed 
at people claiming things like with this and that huge number of events a 
collision won't occur in 100 billion years, which - to me at least - looks 
like they're only looking at the big number of bits involved without 
understanding statistical analysis.
Let's just say, if the developers of Microsoft Visual Nuclear Power Plant 
Designer Professional were claiming things like that, would you trust their 
product?

The main point with the randomness of UUID's remains that you _can_ have a 
collision at any given moment. It's unlikely to ever happen, but you can't 
predict when it will happen if it does. The possible consequences of a 
collision matter a lot in designing whether and how to handle these collisions. 
Maybe it doesn't matter at all, maybe you should get a really hefty insurance, 
or maybe you need to evacuate the country.

Opposed to that, a sequence isn't random and therefore you can predict when you 
will run into collisions - namely once the sequence wraps. Considering that 
even a 32-bit sequence allows for several billions of rows before collisions 
_can_ occur, you can be certain that your problem is pretty far into the future.
It _will_ be a big problem without an obvious solution if it occurs though, as 
from that point on you will run into a lot of collisions and the resolution to 
the problem is rather dependent on what you're working on.

Now that is not an argument against protecting your application against 
collisions, if there is a chance that you will run into collisions (you won't 
in a 10-record lookup table, for example) then you need to take that into 
consideration in your designs, but there are many (usually obvious) cases in 
which it's safe to omit it. With UUID's that's a little more complicated.

I don't think anyone in this discussion is saying Don't use UUID's!. Just be 
aware of their limitations and the problem domains where they are sensible to 
use. The same goes for sequences.
It would, for example, be (obviously) pretty insane to use UUID's for a 
10-record lookup table. There's plenty of examples in this thread where they 
shine, I don't need to repeat that.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d25777f11541886517442!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Stuart Bishop
 Next to that, UUID's are generated by computers. I have no doubts that
 the numeric space that makes up a UUID allows for collision chances as
 low as described, but are computers capable of generating those
 numbers sufficiently random that they actually achieve that low a
 chance? I think that's pushing it.

 The main point with the randomness of UUID's remains that you _can_ have a 
 collision at any given moment. It's unlikely to ever happen, but you can't 
 predict when it will happen if it does. The possible consequences of a 
 collision matter a lot in designing whether and how to handle these 
 collisions. Maybe it doesn't matter at all, maybe you should get a really 
 hefty insurance, or maybe you need to evacuate the country.

Given the math, I suspect the chance of a UUID collision generated by
a respected library is roughly the same or less than any other method
of choosing a unique id. If you are looking at these extreme
improbabilities, your SERIAL isn't guaranteed unique either when you
take into account cosmic rays flipping the right bits in your ECC
memory or on your disk platter.

So if you are worried about such things, give up now because your
application is going to have to run with hardware and software in this
universe with all its limitations.

Maybe I should start a business in providing UUID collision insurance?

-- 
Stuart Bishop stu...@stuartbishop.net
http://www.stuartbishop.net/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Jasen Betts
On 2011-01-05, Scott Ribe scott_r...@elevated-dev.com wrote:
 On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:

 In practical use I think the odds of a collision are *far* higher than
 you are suggesting, unless the UUID generation is being done with a lot
 more care than is likely if the user takes these sorts of claims at face
 value.

 Eh? The user taking such claims at face value has no bearing whatsoever on 
 the quality of the UUID generation algorithm provided by the OS. So, unless 
 we're talking about users coming up with their own algorithms, it seems 
 reasonable to assume that the generation is done with a great deal of care. 
 (And if we are talking about users coming up with their own algorithms, then 
 all bets are off; feel free to assume the worst.) I know that is the case on 
 OS X  Linux. I would be shocked if it were not the case on Solaris. I would 
 even be surprised if it were not the case on Windows.


 The IETF Network Working Group designed UUIDs to ensure that their
 uniqueness guarantee would be strong enough that no application would
 need to worry about duplicates, ever. Claims that collisions are too
 likely to depend on UUIDs being unique really are claims that the IETF
 Network Working Group didn't know what it was doing, which I find a
 bit ridiculous. 

Who was it that decided on 32 bits for IP addresses?

-- 
⚂⚃ 100% natural

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Scott Ribe
On Jan 6, 2011, at 2:51 AM, Jasen Betts wrote:

 Who was it that decided on 32 bits for IP addresses?

Nice try, but that was rather long before the IETF existed ;-)

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Scott Ribe
On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:

 If you are looking at these extreme
 improbabilities, your SERIAL isn't guaranteed unique either when you
 take into account cosmic rays flipping the right bits in your ECC
 memory or on your disk platter.

Yes, that's rather the point, the probability is so extremely low that it in 
most cases it should be treated as 0. Some people seem to have a problem 
wrapping their heads around relative magnitudes that extreme.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Bill Moran
In response to Scott Ribe scott_r...@elevated-dev.com:

 On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:
 
  If you are looking at these extreme
  improbabilities, your SERIAL isn't guaranteed unique either when you
  take into account cosmic rays flipping the right bits in your ECC
  memory or on your disk platter.
 
 Yes, that's rather the point, the probability is so extremely low that it in 
 most cases it should be treated as 0. Some people seem to have a problem 
 wrapping their heads around relative magnitudes that extreme.

My head is totally wrapped around probability -- yours is not.  It's
amazing the number of people on this list who are confused by a
bunch of x in y #s being thrown around.

If the chance of a duplicate is 1 in a hundred gazillion, then I can
still hit a dupe the VERY FIRST TIME I USE IT.

I'm writing software that is intended to be used to save lives in the
event of an earthquake or flood or cosmic ray flipping bits or any
other massive disaster.  The last thing I need while a bunch of EMTs
are digging bodies out of the rubble trying to save their lives is to
need to resolve a key conflict with a bunch of mobile devices, most of
which are not reachable because the cell network is down because of
the disaster.  There's going to be enough other shit going wrong ...
my software is supposed to be part of the solution.

I don't give a fuck how small the chance of conflict is, the only
viable option for that chance is 0.  Period.  Any argument to the
contrary is stupid, asinine and outright negligent.

Now, if I were writing facebook or twitter or some other pointless
entertainment app, the odd chance of losing a post or whatever to a
key conflict is not something I'm going to worry about for more than
the time it takes to yawn.

And that's been my point all along, despite people trying to dilute it
with nonsense numbers that they don't understand: UUIDs are sufficiently
unique for 99.999% of the applications out there.  However, you
should always take 5 or 10 minutes to consider whether your application
is one of the .001% that can't tolerate the tiny risk.

And also, if your entire solution to that risk is to rollback the
transaction in the event of a conflict, then your application is simple
enough that UUIDs are overkill anyway.  Use them anyway if you want.

As far as statistics are concerned, the chance that someone as batfuck
insane as Hitler would rise to power in a major country is 1 in
1,102,196,287,287,859,992,396,273,293,203 -- yet it happened.

There.  I Godwined the damn thing.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Scott Ribe
On Jan 6, 2011, at 8:14 AM, Bill Moran wrote:

 I don't give a fuck how small the chance of conflict is, the only
 viable option for that chance is 0.  Period.  Any argument to the
 contrary is stupid, asinine and outright negligent.

Do you give a fuck about the chance that bits will flip in the RAM and not be 
detected? Do you give a fuck about the chance that bits will flip in whatever 
persistent storage is in your device and not be detected? Do you give a fuck 
about the chance that bits will be flipped in the network and not be detected? 
Do you give a fuck about the chance that a transistor will go into a metastable 
state instead of flipping, and lock up the device and lose data not yet saved? 
Well, of course you do. But now what are the relative odds? All of these things 
can happen already (and all of them can happen the very first time you use it), 
so already your system is not precisely 100% reliable. Now is the risk of UUID 
collision somewhere near the same as these risks, or orders of magnitude 
higher, or orders of magnitude lower? It does matter.

 However, you
 should always take 5 or 10 minutes to consider whether your application
 is one of the .001% that can't tolerate the tiny risk.

If an application/device truly can't tolerate a risk of an error of 1 in 
10^-16, that's a problem because you can't build a device without risk of error 
below some threshold (in that general neighborhood I think). You can only push 
risks to lower  lower probability, and it makes no sense to focus on a single 
risk and spend time and effort to push it to orders of magnitude lower 
probability than all the other risks in the system. (As long as there are risks 
at orders of magnitude higher priority, those should get the time  expense.)

 And that's been my point all along, despite people trying to dilute it
 with nonsense numbers that they don't understand...

No, it hasn't been your point all along. Your point has shifted twice now as 
you've been shown to be wrong about the odds. And the numbers used are not 
nonsense at all. All of which somewhat contradicts your statement that your 
head is totally wrapped around probability ;-) Added to your apparent 
ignoring of other error sources in order to focus on one extremely unlikely 
one, well...

 And also, if your entire solution to that risk is to rollback the
 transaction in the event of a conflict, then your application is simple
 enough that UUIDs are overkill anyway.

I kind of doubt that the person who posted that intended it as the entire 
solution. It seemed to me that was intended as just the event that triggers 
conflict resolution and the next step would be to inform the device that the 
conflicting record is getting a new UUID, update appropriately, and so on.

Just so you know, I'm done talking to you. Your arrogance, rudeness, insults, 
condescension and personal attacks are not something that I will deal with 
anymore.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Adrian Klaver
On Thursday 06 January 2011 7:14:00 am Bill Moran wrote:
 In response to Scott Ribe scott_r...@elevated-dev.com:
  On Jan 6, 2011, at 1:52 AM, Stuart Bishop wrote:
   If you are looking at these extreme
   improbabilities, your SERIAL isn't guaranteed unique either when you
   take into account cosmic rays flipping the right bits in your ECC
   memory or on your disk platter.
 
  Yes, that's rather the point, the probability is so extremely low that it
  in most cases it should be treated as 0. Some people seem to have a
  problem wrapping their heads around relative magnitudes that extreme.



 There.  I Godwined the damn thing.

 --
 Bill Moran

Maybe a wrap up is in order:) 
As I said earlier this is one of those arguments that could go forever because 
everyone is right, so to summarize:
1) UUIDs can have a very to extremely large namespace but less than infinite.
2) There are other alternatives i.e SERIAL
3) Managing the above is based on the interaction of three components - 
software,hardware,wetware(people). Any one of which can have a weakness and in 
combination their are many permutations.
4) DBAs need to plan for the worse. Worse being somewhat contextual. Real time 
control of a nuclear plant versus Web social media. Choosing a unique number 
generator and dealing with possible collisions is contingent on this context.


-- 
Adrian Klaver
adrian.kla...@gmail.com

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Scott Ribe
On Jan 6, 2011, at 8:19 AM, Michael Satterwhite wrote:

 That would be a matter of incompetent administration. *NOTHING* can protect 
 against that.

Well, no, not necessarily. It might well be a goal (in fact, is a goal with 
some software that I'm developing), that users/admins don't have to worry about 
data caches moving across machines. My primary point, which I stated 
incompletely, was that in order to depend on node ids as part of unique ids, 
requires a degree of control over the administration of nodes, and for a given 
application this might or might not be practical. For instance, if your app 
runs on cell phones, and the OSs you deploy on give you access to the device 
id, and you don't mind using a rather long prefix to form your unique ids, then 
you have an obvious solution that, as far as I know, is guaranteed to be 
unique. (Ignoring the possibility of hacking of the device id, because no 
matter what you choose as a prefix, if an adversary manages to deliberately 
change the prefix, you can get duplicates.) My secondary point was that this is 
rather difficult to detect in time to prevent conflicts.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Anthony
On Wed, Jan 5, 2011 at 3:43 PM, Bill Moran wmo...@potentialtech.com wrote:
 Despite the fact that the chance of a collision is very, very small, there
 is no easy way to fix it if it happens.  Zero.  It can't be done without
 shutting the system down, recalling all the remote devices and manually
 reconciling the problem ... which is not an option.

Sounds like a security problem - an intentional collision is much more
likely than a random one.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Chris Browne
wmo...@potentialtech.com (Bill Moran) writes:
 If the chance of a duplicate is 1 in a hundred gazillion, then I can
 still hit a dupe the VERY FIRST TIME I USE IT.

 I'm writing software that is intended to be used to save lives in the
 event of an earthquake or flood or cosmic ray flipping bits or any
 other massive disaster.  The last thing I need while a bunch of EMTs
 are digging bodies out of the rubble trying to save their lives is to
 need to resolve a key conflict with a bunch of mobile devices, most of
 which are not reachable because the cell network is down because of
 the disaster.  There's going to be enough other shit going wrong ...
 my software is supposed to be part of the solution.

 I don't give a fuck how small the chance of conflict is, the only
 viable option for that chance is 0.  Period.  Any argument to the
 contrary is stupid, asinine and outright negligent.

If your system is sufficiently negligently designed that this particular
conflict causes it to kill people, then I wouldn't be too inclined to
point at this issue with UUIDs being the Real Problem with the system.

This is NOT the only risk that the system faces; you can't get *nearly*
as low probabilities attached to hardware and network issues such as:
 - Disks failing
 - Cosmic rays twiddling bits in memory
 - Network connections failing part way through the work
 - Dumb techs blindly cloning the same host key onto every one of the
   EMTs' data collection devices

That last item is an argument in *FAVOR* of using UUIDs for the EMTs
digging bodies out of rubble scenario...  The typical alternative to
UUIDs would be to have (Sequence + Host Data), and I'd expect there to
be a considerable risk of Dumb Techs making mistakes there.  It wouldn't
be difficult for such a scenario to induce systematic key collisions.

It's never just about one risk; it's about *all the risks.*

 And that's been my point all along, despite people trying to dilute it
 with nonsense numbers that they don't understand: UUIDs are sufficiently
 unique for 99.999% of the applications out there.  However, you
 should always take 5 or 10 minutes to consider whether your application
 is one of the .001% that can't tolerate the tiny risk.

If your application is so fragile that it cannot tolerate *that* tiny
risk, then I have to wonder if your system isn't Just Plain Too Fragile,
because there are other unavoidable risks likely to be of much greater
probability.
-- 
Have  you noticed   that,  when we were   young,  we were   told that
`everybody else   is doing  it' was   a  really  stupid reason  to  do
something,  but now it's the standard  reason for picking a particular
software package? -- Barry Gehm 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Chris Browne
dennis.jenkins...@gmail.com (dennis jenkins) writes:
 The UUID itself is 128 bits.  Some of those bits are pre-determined.
 I don't recall, but I think that a normal UUID has 121 bits of
 randomness.

That doesn't match RFC 4122 very well...

It indicates 5 forms of UUIDs:

1) Time-based, where about 1/2 the data comes from local timestamp data,
   and 48 bits come from MAC address (or similar)

2) DCE Security (about which it says little)

3) Name-based, using MD5 hashing

4) Randomly generated UUIDs (which are quite likely what you're thinking
   about) have 122 bits of random data

5) Name-based, using SHA-1 hashing

The reasonable choices for a would-be artificial primary key seem to be
1 and 3; in a distributed system, I'd expect to prefer 1, as the time +
host data are likely to eliminate the oh, it might just randomly match
problem.

Note the set of functions in the uuid-ossp contrib module fit this, down
to omitting Version 2 :-).

t...@localhost-  \df+ public.uuid_generate*
 List of 
functions
 Schema |Name| Result data type |Argument data types|  
Type  | Volatility |  Owner   | Language |Source code | Description
++--+---+++--+--++-
 public | uuid_generate_v1   | uuid |   | 
normal | volatile   | postgres | c| uuid_generate_v1   |
 public | uuid_generate_v1mc | uuid |   | 
normal | volatile   | postgres | c| uuid_generate_v1mc |
 public | uuid_generate_v3   | uuid | namespace uuid, name text | 
normal | immutable  | postgres | c| uuid_generate_v3   |
 public | uuid_generate_v4   | uuid |   | 
normal | volatile   | postgres | c| uuid_generate_v4   |
 public | uuid_generate_v5   | uuid | namespace uuid, name text | 
normal | immutable  | postgres | c| uuid_generate_v5   |
(5 rows)
-- 
I'm all for advancement.  However  rich text on an ephemeral media is
a totally brain-dead idea.  Nobody  in their right mind would take the
effort to prettyfy documents that are going to be gone in a few days.
-- Jay Denebeim deneb...@deepthot.ml.org

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Scott Ribe
On Jan 6, 2011, at 9:31 AM, Chris Browne wrote:

 The reasonable choices for a would-be artificial primary key seem to be
 1 and 3; in a distributed system, I'd expect to prefer 1, as the time +
 host data are likely to eliminate the oh, it might just randomly match
 problem.

In some contexts, 1 is considered a security weakness, as it reveals 
information about which machine generated it and when, which is why most 
OS-supplied uuid generators now default to 4 (random). This tends to be more of 
a concern with encryption/security uses, and if it's not a concern for your 
db[*], then your are correct that 1 is likely the best choice.

[*] After all, in many dbs we log all sorts of explicit where/who/when for 
auditing purposes. In that case, having ids that provide a clue of where/when 
most certainly does not add any legitimate security concern.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Alban Hertroys
On 6 Jan 2011, at 17:51, Chris Browne wrote:

 wmo...@potentialtech.com (Bill Moran) writes:
 If your system is sufficiently negligently designed that this particular
 conflict causes it to kill people, then I wouldn't be too inclined to
 point at this issue with UUIDs being the Real Problem with the system.
 
 This is NOT the only risk that the system faces; you can't get *nearly*
 as low probabilities attached to hardware and network issues such as:
 - Disks failing
 - Cosmic rays twiddling bits in memory
 - Network connections failing part way through the work
 - Dumb techs blindly cloning the same host key onto every one of the
   EMTs' data collection devices


Let's say that you actually build a mission critical system for which you'd 
need to evacuate the country if it fails. You pick the best ECC RAM you can 
find, the most reliable type of disk storage available, your fallback network 
has a fallback network of it's own, etc. Basically you have done everything you 
could to ensure that the chances of the system failing are as small as 
technically possible.

All those little failure chances add up to a certain number. Using UUID's for 
your ID's is not required for the design of the system, yet you chose to do so. 
You added a nearly infinite chance of UUID collisions to the accumulated chance 
of the system failing.

Now the system miraculously fails and the country needs evacuating. A committee 
is going to investigate why it failed. If the dumb techy above is responsible, 
they just found themselves a scape-goat. If they didn't, but stumble upon your 
unnecessary usage of UUID's instead... Let's just say I don't want to be that 
person.

I have to agree with Bill here, if lives depend on your system then anything 
that adds to the failure chances is very hard to defend. In the end it often 
boils down to responsibility in case of failure, not to mention what it does to 
your own peace of mind.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d260af211541129314545!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Chris Browne
dal...@solfertje.student.utwente.nl (Alban Hertroys) writes:
 On 6 Jan 2011, at 17:51, Chris Browne wrote:

 wmo...@potentialtech.com (Bill Moran) writes:
 If your system is sufficiently negligently designed that this particular
 conflict causes it to kill people, then I wouldn't be too inclined to
 point at this issue with UUIDs being the Real Problem with the system.
 
 This is NOT the only risk that the system faces; you can't get *nearly*
 as low probabilities attached to hardware and network issues such as:
 - Disks failing
 - Cosmic rays twiddling bits in memory
 - Network connections failing part way through the work
 - Dumb techs blindly cloning the same host key onto every one of the
   EMTs' data collection devices

 Let's say that you actually build a mission critical system for which
 you'd need to evacuate the country if it fails. You pick the best ECC
 RAM you can find, the most reliable type of disk storage available,
 your fallback network has a fallback network of it's own,
 etc. Basically you have done everything you could to ensure that the
 chances of the system failing are as small as technically possible.

 All those little failure chances add up to a certain number. Using
 UUID's for your ID's is not required for the design of the system, yet
 you chose to do so. You added a nearly infinite chance of UUID
 collisions to the accumulated chance of the system failing.

Infinite?  The probability can't conceivably exceed 1.

It's scarcely likely to exceed infinitesimal.

I've built clustered systems, and frequently, the resulting Rube
Goldberg apparatus that tries to protect against failures of the other
apparatus trying to protect against failures of further apparatus that
tries to protect against failures introduces a tall and unwieldy stack
of intricately interwoven components such that operators need to be
*mighty* careful not to tip anything over lest the protective apparatus
collapse, knocking over the system it was supposed to protect.

 Now the system miraculously fails and the country needs evacuating. A
 committee is going to investigate why it failed. If the dumb techy
 above is responsible, they just found themselves a scape-goat. If they
 didn't, but stumble upon your unnecessary usage of UUID's
 instead... Let's just say I don't want to be that person.

If the system is that mission critical, then it well and truly warrants
doing enough proper analysis of the risks to *know* the risks of the
various expectable failure conditions, and to do so in rather more
detail than the oversimplification of characterizing them as
infinitesimal or infinite.

 I have to agree with Bill here, if lives depend on your system then
 anything that adds to the failure chances is very hard to defend. In
 the end it often boils down to responsibility in case of failure, not
 to mention what it does to your own peace of mind.

It seems to me that using serially assigned values, along with manually
assigned server IDs, to construct a would-be-unique value, is likely to
introduce quite a lot *more risk* of system failure than would the use
of UUIDs.

So someone that rules out UUIDs based on some fallacious imagined
infinite chance of collisions is jumping away from a small risk, and
accepting one much more likely to take lives.

We haven't seen any indication that would distinguish between infinite
and infinitesimal, beyond the fact that infinite is infinitely
larger than the largest probability that one can find for an event,
which is 1.
-- 
(format nil ~...@~s cbbrowne gmail.com)
But   life  wasn't yes-no,   on-off.   Life was shades   of gray, and
rainbows not in the order of the spectrum.
-- L. E. Modesitt, Jr., _Adiamante_

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Bill Moran
In response to Chris Browne cbbro...@acm.org:
 
 It seems to me that using serially assigned values, along with manually
 assigned server IDs, to construct a would-be-unique value, is likely to
 introduce quite a lot *more risk* of system failure than would the use
 of UUIDs.

First off, server IDs are not randomly assigned.  They are assigned
automatically when the software is activated and tracked by the master
server.

Second, there is a very simple mechanism to prevent attempted device
cloning from causing a problem. (it's not a particularly difficult
problem to solve, actually)

Third, and very important, is the following conversation we had during
planning:
Can we get good quality random data on mobile devices?
We'll have to research it, we don't know yet.
Wait, before we do the research -- if we find that some mobile devices
have good quality random data and others don't -- can we tell the client
which devices they're allowed to use?
No, the client will dictate which devices it must work on.

So basically, we realized that even if our concerns about UUID
collisions we're unreasonably paranoid, we still couldn't guarantee
that the devices would be up to the task of generating UUIDs with
sufficient randomness.

As I tried to point out earlier in the thread, the collision issue was
one of MANY things we considered in this design.  It is, however, the
ONLY one that's directly relevant to the original conversation.  Now
that my description of our design seems to be coming under fire, I
feel the need to at least point out that we thought it through more
than that.  I suppose that's mostly my own fault for trying to bring
in an example that it's impractical to discuss in full detail.

As other people have pointed out, there are concerns about malicious
impersonation, device failure, data corruption ... each of these may
be more or less likely or dangerous than UUID collision, but not all
of them are _intended_ to be solved by not using UUIDs, so claiming
that abandoning UUIDs does not fix these problems is completely
correct, and also a straw man.

 So someone that rules out UUIDs based on some fallacious imagined
 infinite chance of collisions is jumping away from a small risk, and
 accepting one much more likely to take lives.

The possibility of collisions is not fallacious, however, the use of
infinite (I don't remember who wrote that) is obviously not correct.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread A.M.

On Jan 6, 2011, at 3:52 AM, Stuart Bishop wrote:
 Maybe I should start a business in providing UUID collision insurance?

Your ideas are intriguing to me and I wish to subscribe to your newsletter.

-M

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Dennis Gearon
Hey guys and gals,
 As the originator of this topic, I've received a lot of good answers, 
opinions, and advice. Thank  you.

I'm not sure that more conversation on this will go anywhere but down. It 
seems that UUID vs. Integer is one of those 'values' subjects, like:

  Sexual practices and preferences.
  The right way to raise children.
  How to handle money in a family.
  What are the defintions of a woman and a man?
  What religion is best and correct?
  Which political party really has the best interests of the country as a goal, 
and is the better party?
  What's an appropriate lifestyle?

Uh . . . . my mother never taught me this(deprived childhood), but I learned 
the 
hardway. NEVER discuss these things in public, (especially drunk in a bar in a 
part of town at a different socio economic level than you), but learn what any 
possible mate things of these ;-)

And UUID vs. Integer? Seems to be something not to discuss too long in a 
newsgroup. Seriously, there's some emotions here on this subject that as the 
originator of this topic, I'd like to see calm down and get back to productive 
work. Is it a deal, guys and gals?

   This newsgroup I always brag about to my other geek friends. 'Those people 
on 
the postgres news group are the steadiest, most helpful, most logical people I 
know on the Internet. The don't jump on fancy new trends nor hide technology in 
market speak. Their work revolves around the best tested, theory based 
practices. They just make the tech world a better place with some of the most 
reliable, best supported OSS out there.' And you guys and gals ARE that 
description.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-06 Thread Alban Hertroys
On 6 Jan 2011, at 20:36, Chris Browne wrote:

 Infinite?  The probability can't conceivably exceed 1.


Don't start picking om words please, infinitely small or infinitesimal is 
obviously what I meant to write there.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d263d0511543498117537!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Radosław Smogura
On Tue, 4 Jan 2011 11:07:00 -0800 (PST), Dennis Gearon 
gear...@sbcglobal.net wrote:

I haven't been able to find anywhere, easily, in the documentation
using google
where a list of allowed data types for primary keys is.

So, UUIDs can be primary keys?
Any issues wtih them on sorting or paging of index tables, etc.?


Disadvantage
* aren't ordered in way as records are added to DB.
* 128bit length (PSQL stores them as 128bit value)
* slower to generate you need to use random number generator
* ... if you do select * on table with two uuids you will need to 
scroll GUI to see data :)

* ... really unhandy if you want to make manual updates :)

Advantage:
* simple to generate, and 128bit random is almost globally unique,
* you have your id, before executing query, (in contrast to all this 
autoincrement) so you may put it in dependant rows

* almost every platform has UUID generator

Advantage / disadvantage
* depending on UUID generator, UUID can store some privacy 
information e.g. MAC address of your card, such UUID.


Personally I prefer pooled incremental id's. Fast, unique, you have Id 
before query - but you need to write code by self.



Also, the documentation says that UUIDs are 128 bit value, but never
explicitly
says that's how it's stored. Nor does it use one of the nice, blue 
headered

tables for UUID (or ENUM) showing storage and other attributes as it
does for
numeric, character,boolean, date/time, binary, monetary, geometric,
or network
types.



 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is
usually a better
idea to learn from others’ mistakes, so you do not have to make them
yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Craig Ringer

On 01/05/2011 07:31 PM, Radosław Smogura wrote:


* you have your id, before executing query, (in contrast to all this
autoincrement) so you may put it in dependant rows


Do you mean that with a UUID, you don't need to talk to the database at 
all, you can generate an ID with no interaction with / involvement with 
the database at all? Because other than that, there's not much 
difference in how you normally work with them.



With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
);
INSERT INTO x(y) VALUES (1);


With a uuid, you'd:

CREATE TABLE x (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
y integer
);
INSERT INTO x(y) VALUES (1);


In either case, you can explicitly call the generator function for 
seq/uuid - nextval(seqname) or uuid_generate_v4() respectively - or you 
can omit the PK column in your inserts and let the database generate it.



Personally I prefer pooled incremental id's. Fast, unique, you have Id
before query - but you need to write code by self.


Many libraries / ORMs / etc that interact with Pg will happily take care 
of this for you. In fact, I had to fight to convince Hibernate that I 
*didn't* want it to increment all my counters in steps of 50.


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Radosław Smogura
On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer 
cr...@postnewspapers.com.au wrote:

On 01/05/2011 07:31 PM, Radosław Smogura wrote:


* you have your id, before executing query, (in contrast to all this
autoincrement) so you may put it in dependant rows


Do you mean that with a UUID, you don't need to talk to the database
at all, you can generate an ID with no interaction with / involvement
with the database at all? Because other than that, there's not much
difference in how you normally work with them.


With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
);
INSERT INTO x(y) VALUES (1);

I mean situation, when You create e.g. in one transaction, book and 
chapters, in some way You need retrieve book's id, by returning clause 
of insert, or by obtaining id form sequence.

It's simpler to write:
book_id = new uuid();
insert into book values(book_id,);
insert into chapters values(new uuid(), book_id, ...);
isn't it?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Szymon Guz
On 5 January 2011 15:28, Radosław Smogura rsmog...@softperience.eu wrote:

 On Wed, 05 Jan 2011 21:50:11 +1100, Craig Ringer 
 cr...@postnewspapers.com.au wrote:

 On 01/05/2011 07:31 PM, Radosław Smogura wrote:

  * you have your id, before executing query, (in contrast to all this
 autoincrement) so you may put it in dependant rows


 Do you mean that with a UUID, you don't need to talk to the database
 at all, you can generate an ID with no interaction with / involvement
 with the database at all? Because other than that, there's not much
 difference in how you normally work with them.


 With a sequence, you might:

 CREATE SEQUENCE x_id_seq;
 CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
 );
 INSERT INTO x(y) VALUES (1);

  I mean situation, when You create e.g. in one transaction, book and
 chapters, in some way You need retrieve book's id, by returning clause of
 insert, or by obtaining id form sequence.
 It's simpler to write:
 book_id = new uuid();
 insert into book values(book_id,);
 insert into chapters values(new uuid(), book_id, ...);
 isn't it?



For me it is simpler just to write this:

bookid = insert into books(...) values(...) returning book_id;
insert into chapters(book_id, ...) values( bookid, ...);

but it's a matter of taste, I think.

regards
Szymon


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:

 * simple to generate, and 128bit random is almost globally unique,

Almost? Should be totally unique, as long as your random source is decent 
quality.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 7:28 AM, Radosław Smogura wrote:

 It's simpler to write:
...
 isn't it?

Depends on the situation, the libraries you're using, and so on.

Now, if you're generating records in a distributed system, where your node 
might be disconnected when it's creating a record, it is *much* simpler in that 
case ;-)

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Grzegorz Jaśkiewicz
On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe scott_r...@elevated-dev.com wrote:
 On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:

 * simple to generate, and 128bit random is almost globally unique,

 Almost? Should be totally unique, as long as your random source is decent 
 quality.

But I would never rely on that alone. You always have a strategy in
place, in case there's a duplicate.


-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Mike Christensen
2011/1/5 Grzegorz Jaśkiewicz gryz...@gmail.com:
 On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe scott_r...@elevated-dev.com 
 wrote:
 On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:

 * simple to generate, and 128bit random is almost globally unique,

 Almost? Should be totally unique, as long as your random source is decent 
 quality.

 But I would never rely on that alone. You always have a strategy in
 place, in case there's a duplicate.

As long as all your UUIDs are generated with the same algorithm, they
are guaranteed to be unique.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 7:55 AM, Grzegorz Jaśkiewicz wrote:

 But I would never rely on that alone. You always have a strategy in
 place, in case there's a duplicate.

That's really unnecessary, basically a total waste of effort.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Bill Moran
In response to Scott Ribe scott_r...@elevated-dev.com:

 On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:
 
  * simple to generate, and 128bit random is almost globally unique,
 
 Almost? Should be totally unique, as long as your random source is decent 
 quality.

This is going off-topic, but I did some research on this because we
were considering using UUIDs for various keys ...

Fact is, UUIDs are not _guaranteed_ to be unique.  If you use the generating
system that includes a MAC address, then in theory, they are guaranteed
to be unique, but in practice, MAC addresses aren't guaranteed to be
unique either, so that's not 100% either.

Beyond that, the namespace size for a UUID is so incomprehensibly huge
that the chance of two randomly generated UUIDs having the same value
is incomprehensibly unlikely ... it is, however, not a 100% guarantee.

Anyway, in our case, we determined that the chance of UUID collision
for the dataset in question was extremely unlikely, however, the
consequences of such a collision were pretty bad.  We also determined
that we were able to control a unit ID for each independent system
that would generate IDs, which could (a) be part of a unique seed for
UUIDs, or (b) be a prefix to a autonumber ID that would be a lot easier
to read and work with manually.  In the end, we chose b for the human
factor.

Face it, reading, remembering, and typing UUIDs kinda sucks.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 8:03 AM, Mike Christensen wrote:

 As long as all your UUIDs are generated with the same algorithm, they
 are guaranteed to be unique.

There is no requirement that they be generated with the same algorithm in order 
to be unique. A MAC/time-based UUID cannot duplicate a random one, and vice 
versa. (Also applies to the 3rd flavor of UUID whose details I do not remember.)

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:

 Beyond that, the namespace size for a UUID is so incomprehensibly huge
 that the chance of two randomly generated UUIDs having the same value
 is incomprehensibly unlikely

Yes, as in: it is *far* more likely that all of your team members and all of 
your client contacts will be simultaneously struck by lightning and killed in 
their sleep, and it is *far* more likely that all life on earth will be wiped 
out by an asteroid impact, and it is *far* more likely that the solar system 
orbits are not actually stable and earth will fly off into space... If you're 
worried about UUID collisions, then either your priorities are completely 
wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's 
simply true, the chance of a collision is so vanishingly small that it is 
dwarfed by all sorts of risks that we all ignore because the chances are so 
low, including the chance that all drives in all your RAIDs across all your 
replicas will simultaneously fail on the same day that fires start in all the 
locations where your tapes are stored and all the sprinkler systems fail... (By 
far more likely, I mean many many many orders of magnitude...)

 In the end, we chose b for the human
 factor.

A very good decision, in the case where you're actually able to control each 
independent system.

 Face it, reading, remembering, and typing UUIDs kinda sucks.

Lots of copy  paste, or custom GUI tools for devs  DBAs, or abuse like 
'...%', all of them painful in their own way.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Bill Moran
In response to Scott Ribe scott_r...@elevated-dev.com:

 On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
 
  Beyond that, the namespace size for a UUID is so incomprehensibly huge
  that the chance of two randomly generated UUIDs having the same value
  is incomprehensibly unlikely
 
 Yes, as in: it is *far* more likely that all of your team members and all of 
 your client contacts will be simultaneously struck by lightning and killed in 
 their sleep, and it is *far* more likely that all life on earth will be wiped 
 out by an asteroid impact, and it is *far* more likely that the solar system 
 orbits are not actually stable and earth will fly off into space... If you're 
 worried about UUID collisions, then either your priorities are completely 
 wrong, or you live in a bomb shelter--that's not sarcasm by the way, it's 
 simply true, the chance of a collision is so vanishingly small that it is 
 dwarfed by all sorts of risks that we all ignore because the chances are so 
 low, including the chance that all drives in all your RAIDs across all your 
 replicas will simultaneously fail on the same day that fires start in all the 
 locations where your tapes are stored and all the sprinkler systems fail... 
 (By far more likely, I mean many many many orders of magnitude...)

That statement demonstrates a lack of investigation and/or consideration
of the circumstances.

I can't find my math or I'd reproduce it here, but consider this:

If you have 50 devices, each generating 100 UUIDs per hour, and you'll
keep records for 1 year, then your argument above is probably accurate.

However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
be keeping those records for 10+ years, the chances of collisions near
the end of that 10 year span get high enough to actually make developers
nervous.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Tom Lane
Scott Ribe scott_r...@elevated-dev.com writes:
 On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
 Beyond that, the namespace size for a UUID is so incomprehensibly huge
 that the chance of two randomly generated UUIDs having the same value
 is incomprehensibly unlikely

 Yes, as in: it is *far* more likely that all of your team members and all of 
 your client contacts will be simultaneously struck by lightning and killed in 
 their sleep, and it is *far* more likely that all life on earth will be wiped 
 out by an asteroid impact,

I see those sorts of arguments all the time, and I consider them pure BS.
Yes, the namespace is theoretically large.  However, the questions you
really have to answer are (a) how much of the namespace is actually
being used by the UUID generation methods in use in a particular
application; (b) how sure can you be that there is not correlation
between UUIDs generated in different places/sessions.

In practical use I think the odds of a collision are *far* higher than
you are suggesting, unless the UUID generation is being done with a lot
more care than is likely if the user takes these sorts of claims at face
value.  The odds may still be low enough to be a very good risk, but
you need to think about it not just bet your database on it without
thinking.

Being paranoid is a good thing.  It's what DBAs are paid for.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Rob Sargent


On 01/05/2011 08:55 AM, Bill Moran wrote:
 In response to Scott Ribe scott_r...@elevated-dev.com:
 
 On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:

 Beyond that, the namespace size for a UUID is so incomprehensibly huge
 that the chance of two randomly generated UUIDs having the same value
 is incomprehensibly unlikely

 Yes, as in: it is *far* more likely that all of your team members and all of 
 your client contacts will be simultaneously struck by lightning and killed 
 in their sleep, and it is *far* more likely that all life on earth will be 
 wiped out by an asteroid impact, and it is *far* more likely that the solar 
 system orbits are not actually stable and earth will fly off into space... 
 If you're worried about UUID collisions, then either your priorities are 
 completely wrong, or you live in a bomb shelter--that's not sarcasm by the 
 way, it's simply true, the chance of a collision is so vanishingly small 
 that it is dwarfed by all sorts of risks that we all ignore because the 
 chances are so low, including the chance that all drives in all your RAIDs 
 across all your replicas will simultaneously fail on the same day that fires 
 start in all the locations where your tapes are stored and all the sprinkler 
 systems fail... (By far more likely, I mean many many many orders of 
 magnitude...)
 
 That statement demonstrates a lack of investigation and/or consideration
 of the circumstances.
 
 I can't find my math or I'd reproduce it here, but consider this:
 
 If you have 50 devices, each generating 100 UUIDs per hour, and you'll
 keep records for 1 year, then your argument above is probably accurate.
 
 However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
 be keeping those records for 10+ years, the chances of collisions near
 the end of that 10 year span get high enough to actually make developers
 nervous.
 

But we're talking about a primary key.  Postgres guarantees the
uniqueness.  1 transaction  in 10^^100 rolls back due to a second
instance of an (otherwise/almost) uuid.  Big deal.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 8:55 AM, Bill Moran wrote:

 That statement demonstrates a lack of investigation and/or consideration
 of the circumstances.

No, it doesn't.

 However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
 be keeping those records for 10+ years, the chances of collisions near
 the end of that 10 year span get high enough to actually make developers
 nervous.

No, they don't. At the end of your hypothetical 10-year period, you will have 
used about 43,000,000,000 UUIDs, or about 
1/100,000,000,000,000,000,000,000,000th of the UUID space (assuming random 
UUIDs). Leaving you with a chance of a single collision of about 
1/18,000,000,000,000,000.

Assuming of course good entropy. If the generation of random numbers is bad, 
then UUIDs are not so useful ;-)

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Leif Biberg Kristensen
On Wednesday 5. January 2011 16.05.29 Bill Moran wrote:

 Beyond that, the namespace size for a UUID is so incomprehensibly huge
 that the chance of two randomly generated UUIDs having the same value
 is incomprehensibly unlikely ... it is, however, not a 100% guarantee.
 
I can't help thinking of the «Birthday Paradox»:

http://en.wikipedia.org/wiki/Birthday_problem

regards, Leif

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:

 In practical use I think the odds of a collision are *far* higher than
 you are suggesting, unless the UUID generation is being done with a lot
 more care than is likely if the user takes these sorts of claims at face
 value.

Eh? The user taking such claims at face value has no bearing whatsoever on the 
quality of the UUID generation algorithm provided by the OS. So, unless we're 
talking about users coming up with their own algorithms, it seems reasonable to 
assume that the generation is done with a great deal of care. (And if we are 
talking about users coming up with their own algorithms, then all bets are off; 
feel free to assume the worst.) I know that is the case on OS X  Linux. I 
would be shocked if it were not the case on Solaris. I would even be surprised 
if it were not the case on Windows.

The IETF Network Working Group designed UUIDs to ensure that their uniqueness 
guarantee would be strong enough that no application would need to worry about 
duplicates, ever. Claims that collisions are too likely to depend on UUIDs 
being unique really are claims that the IETF Network Working Group didn't know 
what it was doing, which I find a bit ridiculous.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote:

 I can't help thinking of the «Birthday Paradox»:

Yes, the calculation of the probability of a collision is the same for the 
birthday paradox as for random UUID collisions.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Grzegorz Jaśkiewicz
On Wed, Jan 5, 2011 at 3:03 PM, Mike Christensen m...@kitchenpc.com wrote:
 2011/1/5 Grzegorz Jaśkiewicz gryz...@gmail.com:
 On Wed, Jan 5, 2011 at 2:37 PM, Scott Ribe scott_r...@elevated-dev.com 
 wrote:
 On Jan 5, 2011, at 1:31 AM, Radosław Smogura wrote:

 * simple to generate, and 128bit random is almost globally unique,

 Almost? Should be totally unique, as long as your random source is decent 
 quality.

 But I would never rely on that alone. You always have a strategy in
 place, in case there's a duplicate.

 As long as all your UUIDs are generated with the same algorithm, they
 are guaranteed to be unique.


Good luck with that 



-- 
GJ

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Adrian Klaver

On 01/05/2011 08:29 AM, Scott Ribe wrote:

On Jan 5, 2011, at 9:01 AM, Tom Lane wrote:


In practical use I think the odds of a collision are *far* higher than
you are suggesting, unless the UUID generation is being done with a lot
more care than is likely if the user takes these sorts of claims at face
value.


Eh? The user taking such claims at face value has no bearing whatsoever on the 
quality of the UUID generation algorithm provided by the OS. So, unless we're 
talking about users coming up with their own algorithms, it seems reasonable to 
assume that the generation is done with a great deal of care. (And if we are 
talking about users coming up with their own algorithms, then all bets are off; 
feel free to assume the worst.) I know that is the case on OS X  Linux. I 
would be shocked if it were not the case on Solaris. I would even be surprised if 
it were not the case on Windows.

The IETF Network Working Group designed UUIDs to ensure that their uniqueness 
guarantee would be strong enough that no application would need to worry about 
duplicates, ever. Claims that collisions are too likely to depend on UUIDs 
being unique really are claims that the IETF Network Working Group didn't know 
what it was doing, which I find a bit ridiculous.



Maybe or maybe not:)

http://www.ietf.org/rfc/rfc4122.txt

4.3.  Algorithm for Creating a Name-Based UUID

   The version 3 or 5 UUID is meant for generating UUIDs from names
   that are drawn from, and unique within, some name space.  The
   concept of name and name space should be broadly construed, and not
   limited to textual names.  For example, some name spaces are the
   domain name system, URLs, ISO Object IDs (OIDs), X.500 Distinguished
   Names (DNs), and reserved words in a programming language.  The
   mechanisms or conventions used for allocating names and ensuring
   their uniqueness within their name spaces are beyond the scope of
   this specification.

   The requirements for these types of UUIDs are as follows:

   o  The UUIDs generated at different times from the same name in the
  same namespace MUST be equal.

   ..


--
Adrian Klaver
adrian.kla...@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Radosław Smogura
Sorry for not citation...

When I was talking about almost unique, I was meaning that the UUID is 
random so there is no guarantee that you will not generate two indencital 
UUIDs even in subsequent calls, but it has low probability (you have 
greater chances to win in LOTTO).

128bits is huge for now, but what will happen in next 2,3 years? In 20th 
century, people think storing only last two digit of year will be enaugh!!! 
Suppose we are creating gloabl, distributed database for storing information 
about mobile device and base station it logged in.

If we want to guarantee uniquness of UUID across calls, we could talk about 
much more far _pseudo_ random generator, then normal pseudo - randoms, and 
what I think we need to keep state of such random generator, and share and 
lock it for multiple concurrent calls. So it will not be something different 
then ordinal serial column...

Now I think it's clear there is no magical algorithm for UUIDs.

My opinion about all of those UUID with MAC, IP addresses, Microsoft growing 
UUIDs. All of this decrases chance of uniqness of UUID. If we will keep n 
first bits of UUID constant, then we have only 128-n bits random (truly in 
UUIDs we have one decimal field reserved for UUID version). If we want next 
UUID to be greater then previous, at each call we will remove (next-prev) of 
possible values.

Shouldn't this be enaugh for namespace UUIDs
new UUID(namespece.hashCode(), name.hashChode())

or a little joke...
new UUID(1,1) meats this condition
 o  The UUIDs generated at different times from the same name in the
 
same namespace MUST be equal.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Fwd: Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Adrian Klaver



 Original Message 
Subject: Re: [GENERAL] UUID column as pimrary key?
Date: Wed, 5 Jan 2011 10:11:49 -0700
From: Scott Ribe scott_r...@elevated-dev.com
To: Adrian Klaver adrian.kla...@gmail.com

On Jan 5, 2011, at 9:57 AM, Adrian Klaver wrote:


Maybe or maybe not:)


So... If you choose to use a name-based UUID, *and* you do a bad job of 
picking a name, then you have a much higher risk of collision. But it's 
a pretty good bet (as in 100% of all operating systems that I know of) 
that if you simply call the OS's uuid function that won't happen.



--
Scott Ribe


First this is one of those arguments that can go forever because 
everyone is both a bit right and a bit wrong.


Second the original questions was UUID in the context of Postgres and 
the UUID generation algorithm using name is one of the choices in the 
Postgres module:

http://www.postgresql.org/docs/9.0/interactive/uuid-ossp.html
So there is a likelihood that it may be used.

Third the above was a response to your assertion that the ITEF 
guaranteed UUID uniqueness.


Lastly Adrians First Rule applies:
The good will take care of itself, its the bad you have to plan for.

The problem is not if people do the right thing it is if they do the 
wrong thing. As someone else upstream pointed the OP was looking to use 
UUIDs as a PK and in that case the database will enforce another 
namespace and in worst case you will need to retry with a different 
UUID. That covers the bad.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: *****SPAM***** Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 10:30 AM, Radosław Smogura wrote:

 128bits is huge for now, but what will happen in next 2,3 years?

It will still be large. When you get up to around 100 trillion UUIDs, you'll be 
getting up to around a 1 in a billion chance of a single collision. Before you 
claim that we'll outgrow them in a few years, you might want to consider the 
actual numbers, the growth rate of storage density, the number of disks 
required to hold that many records. I have a feeling the answers might surprise 
you ;-)

 If we want to guarantee uniquness of UUID across calls, we could talk about 
 much more far _pseudo_ random generator, then normal pseudo - randoms, and 
 what I think we need to keep state of such random generator, and share and 
 lock it for multiple concurrent calls. So it will not be something different 
 then ordinal serial column...

No, we don't really have to think about it much at all. The IETF and OS 
engineers already did. Random UUIDs are not generated with some sloppy 
amateurish algorithm.

 My opinion about all of those UUID with MAC, IP addresses, Microsoft 
 growing 
 UUIDs. All of this decrases chance of uniqness of UUID.

Well, a decrease from one insanely small chance to another insanely small 
chance is not anything to worry about. After all, you could argue that 128 bits 
is a decrease from 256 bits. It's the same argument. UUIDs were designed to 
avoid collisions, by people who knew what they were doing. More significant 
bits would lower the chance, from already low enough to even more low 
enough.

 Shouldn't this be enaugh for namespace UUIDs
 new UUID(namespece.hashCode(), name.hashChode())
 
 or a little joke...
 new UUID(1,1) meats this condition
o  The UUIDs generated at different times from the same name in the
 
   same namespace MUST be equal.
 

People, people, people, please. *Namespace* UUIDs are intended to map *unique* 
names to UUIDs in the case where you already have *unique* names and just need 
to map them to a more compact form, thus the requirement that the same 
namespace + name always yields the same UUID. This is *not* a weakness in 
UUIDs, nor is it the kind of UUID you get from a simple uuid_gen or similar 
call, nor is it a possible source of collisions for database UUIDs (unless you 
do something enormously stupid, like use database fields to construct a name to 
give to a UUID generator).

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Radosław Smogura
Scott Ribe scott_r...@elevated-dev.com Wednesday 05 January 2011 17:33:51
 On Jan 5, 2011, at 9:19 AM, Leif Biberg Kristensen wrote:
  I can't help thinking of the «Birthday Paradox»:
 Yes, the calculation of the probability of a collision is the same for the
 birthday paradox as for random UUID collisions.

Depends on probability measure we will choose. The true is that probability 
that in two coin drops we will get two reverses is 1/4, but true is, too, as 
Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't 
need to drop again.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Bill Moran
In response to Rob Sargent robjsarg...@gmail.com:

 
 
 On 01/05/2011 08:55 AM, Bill Moran wrote:
  In response to Scott Ribe scott_r...@elevated-dev.com:
  
  On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:
 
  Beyond that, the namespace size for a UUID is so incomprehensibly huge
  that the chance of two randomly generated UUIDs having the same value
  is incomprehensibly unlikely
 
  Yes, as in: it is *far* more likely that all of your team members and all 
  of your client contacts will be simultaneously struck by lightning and 
  killed in their sleep, and it is *far* more likely that all life on earth 
  will be wiped out by an asteroid impact, and it is *far* more likely that 
  the solar system orbits are not actually stable and earth will fly off 
  into space... If you're worried about UUID collisions, then either your 
  priorities are completely wrong, or you live in a bomb shelter--that's not 
  sarcasm by the way, it's simply true, the chance of a collision is so 
  vanishingly small that it is dwarfed by all sorts of risks that we all 
  ignore because the chances are so low, including the chance that all 
  drives in all your RAIDs across all your replicas will simultaneously fail 
  on the same day that fires start in all the locations where your tapes are 
  stored and all the sprinkler systems fail... (By far more likely, I mean 
  many many many orders of magnitude...)
  
  That statement demonstrates a lack of investigation and/or consideration
  of the circumstances.
  
  I can't find my math or I'd reproduce it here, but consider this:
  
  If you have 50 devices, each generating 100 UUIDs per hour, and you'll
  keep records for 1 year, then your argument above is probably accurate.
  
  However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
  be keeping those records for 10+ years, the chances of collisions near
  the end of that 10 year span get high enough to actually make developers
  nervous.
  
 
 But we're talking about a primary key.  Postgres guarantees the
 uniqueness.  1 transaction  in 10^^100 rolls back due to a second
 instance of an (otherwise/almost) uuid.  Big deal.

That doesn't make any sense.  If you're using a single PostgreSQL instance,
then why not just use the built in SERIAL mechanism that guarantees that
you will NEVER have a conflict?

In our case (and I expect it's the case with most people considering UUIDs)
we're talking about independent devices that occasionally synchronize
data between themselves.  These devices need to generate a unique ID
of some sort without having to check with every other device.  This is
one of the problems that UUIDs were intended to fix.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Bill Moran
In response to Scott Ribe scott_r...@elevated-dev.com:

 On Jan 5, 2011, at 8:55 AM, Bill Moran wrote:
 
  That statement demonstrates a lack of investigation and/or consideration
  of the circumstances.
 
 No, it doesn't.
 
  However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
  be keeping those records for 10+ years, the chances of collisions near
  the end of that 10 year span get high enough to actually make developers
  nervous.
 
 No, they don't. At the end of your hypothetical 10-year period, you will have 
 used about 43,000,000,000 UUIDs, or about 
 1/100,000,000,000,000,000,000,000,000th of the UUID space (assuming random 
 UUIDs). Leaving you with a chance of a single collision of about 
 1/18,000,000,000,000,000.

For crying out loud.  If you're going to pick me apart with numbers, then
actually do it with some intelligence.

I could easily counter your argument by upping the numbers to 500,000
mobile devices generating 1 UUIDs per hour over 20 years ... or raise
it even higher if you come back with that same argument ...

But the point (that you are trying to sidestep) is that the UUID namespace
is finite, so therefore you WILL hit a problem with conflicts at some point.
Just because that point is larger than most people have to concern themselves
with isn't an invalidation.

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: *****SPAM***** Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 11:31 AM, Radosław Smogura wrote:

 The true is that probability 
 that in two coin drops we will get two reverses is 1/4, but true is, too, as 
 Newton said, it's 1/3, because if in 1st drop we don't get reverse we don't 
 need to drop again.

Nonsense. You don't stop generating UUIDs just because you haven't yet got a 
collision.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Dennis Gearon
Yes, the ID is generated before the database is contacted, in my case anyway.

The type of UUID that I want is a hybrid, so I would have to write a stored 
procedure and then a trigger upon insert to get the UUID. Not rocket science, I 
just have more on my plate than I can handle. So PHP is my 'main thing', so I 
quickly wrote it in that. Plus, it makes it more database agnostic.

I don't get next/currval behavior, but I don't think I'll need it.

BTW, Switching from Postgres? Not likely anytime soon. I'd have to be up in the 
500M+ rows and be in the  data warehousing/map reducing  arena before I'd 
consider THAT. And there's 'flavors' of Postgres that will do that, anyway.



Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 

idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Craig Ringer cr...@postnewspapers.com.au
To: Radosław Smogura rsmog...@softperience.eu
Cc: Dennis Gearon gear...@sbcglobal.net; pgsql-general@postgresql.org
Sent: Wed, January 5, 2011 2:50:11 AM
Subject: Re: [GENERAL] UUID column as pimrary key?

On 01/05/2011 07:31 PM, Radosław Smogura wrote:

 * you have your id, before executing query, (in contrast to all this
 autoincrement) so you may put it in dependant rows

Do you mean that with a UUID, you don't need to talk to the database at all, 
you 

can generate an ID with no interaction with / involvement with the database at 
all? Because other than that, there's not much difference in how you normally 
work with them.


With a sequence, you might:

CREATE SEQUENCE x_id_seq;
CREATE TABLE x (
id integer PRIMIARY KEY DEFAULT nextval('x_id_seq'),
y integer
);
INSERT INTO x(y) VALUES (1);


With a uuid, you'd:

CREATE TABLE x (
id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
y integer
);
INSERT INTO x(y) VALUES (1);


In either case, you can explicitly call the generator function for seq/uuid - 
nextval(seqname) or uuid_generate_v4() respectively - or you can omit the PK 
column in your inserts and let the database generate it.

 Personally I prefer pooled incremental id's. Fast, unique, you have Id
 before query - but you need to write code by self.

Many libraries / ORMs / etc that interact with Pg will happily take care of 
this 

for you. In fact, I had to fight to convince Hibernate that I *didn't* want it 
to increment all my counters in steps of 50.

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 12:03 PM, Bill Moran wrote:

 For crying out loud.  If you're going to pick me apart with numbers, then
 actually do it with some intelligence.

If you're going to get nasty, at least try to be accurate.

 I could easily counter your argument by upping the numbers to 500,000
 mobile devices generating 1 UUIDs per hour over 20 years ... or raise
 it even higher if you come back with that same argument ...

Yeah, then you get into the realm of 1 in 10s of millions of a chance of 
collision. But you would need to explain to me how you would get that many 
records into the database committed to disk, when the UUIDs alone without any 
other data represent a stream of 22MB/s ;-) Or, looked at another way, 
inserting 1,388,889 rows/second would indeed be difficult to sustain. 

 But the point (that you are trying to sidestep) is that the UUID namespace
 is finite, so therefore you WILL hit a problem with conflicts at some point.
 Just because that point is larger than most people have to concern themselves
 with isn't an invalidation.

I'm not sidestepping the point at all. The point is that the finiteness of the 
space is a red herring. The space is large enough that there's no chance of 
collision in any realistic scenario. In order to get to a point where the 
probability of collision is high enough to worry about, you have to generate 
(and collect) UUIDs at a rate that is simply not realistic--as in your second 
example quoted above. If you just keep raising your numbers, you could go for 
100,000,000,000,000 devices generating 100,000,000,000,000 UUIDs an hour for 
10,000 years. Collisions would be guaranteed, but that does not make it a 
useful scenario to consider.

2^256 is a finite space as well. Would you argue that because it is finite, so 
therefore you WILL hit a problem with conflicts at some point? How about 
2^512? (Bearing in mind that even though finite that space would be large 
enough to assign approximately 10^74 UUIDs to every atom in the observable 
universe, or 10^51 UUIDs to every atom in the total universe using high-end 
estimates of the size of the non-observable universe)?


-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Rob Sargent


On 01/05/2011 11:57 AM, Bill Moran wrote:
 In response to Rob Sargent robjsarg...@gmail.com:
 


 On 01/05/2011 08:55 AM, Bill Moran wrote:
 In response to Scott Ribe scott_r...@elevated-dev.com:

 On Jan 5, 2011, at 8:05 AM, Bill Moran wrote:

 Beyond that, the namespace size for a UUID is so incomprehensibly huge
 that the chance of two randomly generated UUIDs having the same value
 is incomprehensibly unlikely

 Yes, as in: it is *far* more likely that all of your team members and all 
 of your client contacts will be simultaneously struck by lightning and 
 killed in their sleep, and it is *far* more likely that all life on earth 
 will be wiped out by an asteroid impact, and it is *far* more likely that 
 the solar system orbits are not actually stable and earth will fly off 
 into space... If you're worried about UUID collisions, then either your 
 priorities are completely wrong, or you live in a bomb shelter--that's not 
 sarcasm by the way, it's simply true, the chance of a collision is so 
 vanishingly small that it is dwarfed by all sorts of risks that we all 
 ignore because the chances are so low, including the chance that all 
 drives in all your RAIDs across all your replicas will simultaneously fail 
 on the same day that fires start in all the locations where your tapes are 
 stored and all the sprinkler systems fail... (By far more likely, I mean 
 many many many orders of magnitude...)


 That statement demonstrates a lack of investigation and/or consideration
 of the circumstances.

 I can't find my math or I'd reproduce it here, but consider this:

 If you have 50 devices, each generating 100 UUIDs per hour, and you'll
 keep records for 1 year, then your argument above is probably accurate.

 However, if there are 5000 devices generating 100 UUIDs per hour, and you'll
 be keeping those records for 10+ years, the chances of collisions near
 the end of that 10 year span get high enough to actually make developers
 nervous.


 But we're talking about a primary key.  Postgres guarantees the
 uniqueness.  1 transaction  in 10^^100 rolls back due to a second
 instance of an (otherwise/almost) uuid.  Big deal.
 
 That doesn't make any sense.  If you're using a single PostgreSQL instance,
 then why not just use the built in SERIAL mechanism that guarantees that
 you will NEVER have a conflict?
 
 In our case (and I expect it's the case with most people considering UUIDs)
 we're talking about independent devices that occasionally synchronize
 data between themselves.  These devices need to generate a unique ID
 of some sort without having to check with every other device.  This is
 one of the problems that UUIDs were intended to fix.
 
Indeed there is a finite space.  A very large, but finite space, just as
sequence has I suspect.  If your software cannot handle a rollback for
whatever reason, you have much bigger problem on your hand than the the
remote chance of a collision in uuid generation.

From wikipedia, only after generating 1 billion UUIDs every second for
the next 100 years, the probability of creating just one duplicate would
be about 50%. The probability of one duplicate would be about 50% if
every person on earth owns 600 million UUIDs.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: *****SPAM***** Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 1:13 PM, Raymond O'Donnell wrote:

 Is that taking dark matter into account? :-)

It's not clear to me ;-)

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Raymond O'Donnell

On 05/01/2011 19:41, Scott Ribe wrote:

to every atom in the observable universe, or 10^51 UUIDs to every
atom in the total universe using high-end estimates of the size of
the non-observable universe)?


Is that taking dark matter into account? :-)

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Bill Moran
In response to Rob Sargent robjsarg...@gmail.com:
  
  In our case (and I expect it's the case with most people considering UUIDs)
  we're talking about independent devices that occasionally synchronize
  data between themselves.  These devices need to generate a unique ID
  of some sort without having to check with every other device.  This is
  one of the problems that UUIDs were intended to fix.

 Indeed there is a finite space.  A very large, but finite space, just as
 sequence has I suspect.  If your software cannot handle a rollback for
 whatever reason, you have much bigger problem on your hand than the the
 remote chance of a collision in uuid generation.

You missed the point.

Despite the fact that the chance of a collision is very, very small, there
is no easy way to fix it if it happens.  Zero.  It can't be done without
shutting the system down, recalling all the remote devices and manually
reconciling the problem ... which is not an option.

Also, it's hard to DETECT the collision.  If a device creates a new record
with a duplicate UUID, how do we tell that apart, during synchronization,
from an update to the record?  Now I have your SSN or private medical
data included as part of my record, which violates laws, could lead to
incorrect medical care that could kill someone ... etc.

So, despite the chance being very small, the consequences are HUGE.  I
can't just detect it and roll back.

If you have a single DB backend, these problem are easy to solve -- so
easy, in fact, that using UUIDs is overkill.

But when you have mission-critical data that must be correct and available
under every circumstance, you have to consider that UUIDs are not a
guarantee.  And when a method that DOES have a guarantee is one of the
options, why would you take ANY risk at all, no matter how small?

(BTW: I hope that the people who think that the risk is acceptable aren't
writing medical software.  Even if it only kills one person every 10,000
years because they were given the wrong medicine, that's too often in
my opinion)

-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Rob Sargent


On 01/05/2011 01:43 PM, Bill Moran wrote:
 In response to Rob Sargent robjsarg...@gmail.com:

 In our case (and I expect it's the case with most people considering UUIDs)
 we're talking about independent devices that occasionally synchronize
 data between themselves.  These devices need to generate a unique ID
 of some sort without having to check with every other device.  This is
 one of the problems that UUIDs were intended to fix.

 Indeed there is a finite space.  A very large, but finite space, just as
 sequence has I suspect.  If your software cannot handle a rollback for
 whatever reason, you have much bigger problem on your hand than the the
 remote chance of a collision in uuid generation.
 
 You missed the point.
 
 Despite the fact that the chance of a collision is very, very small, there
 is no easy way to fix it if it happens.  Zero.  It can't be done without
 shutting the system down, recalling all the remote devices and manually
 reconciling the problem ... which is not an option.
 
 Also, it's hard to DETECT the collision.  If a device creates a new record
 with a duplicate UUID, how do we tell that apart, during synchronization,
 from an update to the record?  Now I have your SSN or private medical
 data included as part of my record, which violates laws, could lead to
 incorrect medical care that could kill someone ... etc.
 
 So, despite the chance being very small, the consequences are HUGE.  I
 can't just detect it and roll back.
 
 If you have a single DB backend, these problem are easy to solve -- so
 easy, in fact, that using UUIDs is overkill.
 
 But when you have mission-critical data that must be correct and available
 under every circumstance, you have to consider that UUIDs are not a
 guarantee.  And when a method that DOES have a guarantee is one of the
 options, why would you take ANY risk at all, no matter how small?
 
 (BTW: I hope that the people who think that the risk is acceptable aren't
 writing medical software.  Even if it only kills one person every 10,000
 years because they were given the wrong medicine, that's too often in
 my opinion)
 

Seems the software would need to know whether an update or an add was
begin performed: is this a new patient or one all ready in the db.  New
record patient record and new id (of some arbitrary nature) and any new
subsidiary data associated appropriately.


BTW, you're opinion of efficacy of current medical practice is also out
of whack.  From a JAMA artical: Similar to previous studies, almost a
quarter (22.7%) of active-care patient deaths were rated as at least
possibly preventable by optimal care, with 6.0% rated as probably or
definitely preventable.  That's 3 in 50. How d'ya like them odds?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Andrew Sullivan
On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
 I'm not sidestepping the point at all.

You may be missing it, however, because. . .

 The point is that the finiteness of the space is a red herring. The
 space is large enough that there's no chance of collision in any
 realistic scenario.
 In order to get to a point where the probability
 of collision is high enough to worry about, you have to generate
 (and collect) UUIDs at a rate that is simply not realistic--as in
 your second example quoted above.

. . .the example was not that UUIDs are being generated and collected
in one place at that rate, but that they're being generated in several
independent places at a time, and if the cost of the collision is
extremely high, there might be reasons not to use the UUID strategy
but instead to use something else that is generated algorithmically by
the database.  There's a trade-off in having distributed systems
acting completely independently, and while I have lots of confidence
in my colleagues at the IETF (and agree with you that for the
overwhelming majority of cases UUIDs are guaranteed-unique enough),
correctly making these trade-offs still requires thought and
analysis.  It's exactly the kind of of analysis that professional
paranoids like DBAs are for.

A

-- 
Andrew Sullivan
a...@crankycanuck.ca

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Alban Hertroys
 From wikipedia, only after generating 1 billion UUIDs every second for
 the next 100 years, the probability of creating just one duplicate would
 be about 50%. The probability of one duplicate would be about 50% if
 every person on earth owns 600 million UUIDs.


Even if the chances of one person encountering a UUID collision are about once 
in say 100 billion years, that could be tomorrow. It could also not happen at 
all in that time span. That's how chance works. You can never assume it won't 
happen. If the costs incurred by a collision are lower than the costs of 
preventing it, you can choose to just take the hit, but that doesn't go for 
_every_ problem domain. The medical systems mentioned up-thread are an example 
of that.

Next to that, UUID's are generated by computers. I have no doubts that the 
numeric space that makes up a UUID allows for collision chances as low as 
described, but are computers capable of generating those numbers sufficiently 
random that they actually achieve that low a chance? I think that's pushing it.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4d24f31d11541020617287!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Rob Sargent


On 01/05/2011 03:39 PM, Alban Hertroys wrote:
 From wikipedia, only after generating 1 billion UUIDs every second for
 the next 100 years, the probability of creating just one duplicate would
 be about 50%. The probability of one duplicate would be about 50% if
 every person on earth owns 600 million UUIDs.
 
 
 Even if the chances of one person encountering a UUID collision are about 
 once in say 100 billion years, that could be tomorrow. It could also not 
 happen at all in that time span. That's how chance works. You can never 
 assume it won't happen. If the costs incurred by a collision are lower than 
 the costs of preventing it, you can choose to just take the hit, but that 
 doesn't go for _every_ problem domain. The medical systems mentioned 
 up-thread are an example of that.
 
 Next to that, UUID's are generated by computers. I have no doubts that the 
 numeric space that makes up a UUID allows for collision chances as low as 
 described, but are computers capable of generating those numbers sufficiently 
 random that they actually achieve that low a chance? I think that's pushing 
 it.
 
 Alban Hertroys
 
 --
 If you can't see the forest for the trees,
 cut the trees and you'll see there is no forest.
 
 
 !DSPAM:1214,4d24f31711541026711723!
 
 
I'm not talking about chance. I'm talking about robust software.  You
have to expect some sort of rollback for any of a number of reasons and
deal with them.  The potential UUID collision is just one example of a
contingency one might plan for after having taken care of all the more
probable failure points.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread dennis jenkins
On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran wmo...@potentialtech.com wrote:

 But the point (that you are trying to sidestep) is that the UUID namespace
 is finite, so therefore you WILL hit a problem with conflicts at some point.
 Just because that point is larger than most people have to concern themselves
 with isn't an invalidation.

The UUID itself is 128 bits.  Some of those bits are pre-determined.
I don't recall, but I think that a normal UUID has 121 bits of
randomness.

How many would one have to store in a database before a collision
would even be a concern.  Such a database would be freaking huge.
Probably far larger than anything that anyone has.

Lets say (I'm pulling numbers out of my ass here), that you wanted to
store 2^100 rows in a table.  Each row would have a UUID and some
other meaningful data.  Maybe a short string or something.  I don't
recall what the postgresql row overhead is (~20 bytes?), but lets say
that each row in your magic table of death required 64 bytes.  A table
with 2^100 rows would require nearly 10^31 bytes ( = log_10(64 *
2^100)).  How on Earth would you store that much data?  And why would
you ever need to?

I postulate that UUID collisions in Postgresql, using a good source
for UUID generation, is unlikely to have collisions for any reasonable
database.

Food for thought:
http://blogs.sun.com/dcb/entry/zfs_boils_the_ocean_consumes

ps- If my math is off, I apologize.  Its been a long day...

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Radosław Smogura
Pooled id.. child machine connects to main and says give 1 of ids, 
main increments counter by 1, child allocates in given pool.


On Wed, 5 Jan 2011 17:11:35 -0600, Michael Satterwhite 
mich...@weblore.com wrote:

Once multiple machines are linked to maintain the database, this has
a flaw in
it as a series is not guaranteed to be unique between machines 
(actually I

think they're pretty well guaranteed *NOT* to be


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Szymon Guz
On 6 January 2011 00:32, dennis jenkins dennis.jenkins...@gmail.com wrote:

 On Wed, Jan 5, 2011 at 1:03 PM, Bill Moran wmo...@potentialtech.com
 wrote:
 
  But the point (that you are trying to sidestep) is that the UUID
 namespace
  is finite, so therefore you WILL hit a problem with conflicts at some
 point.
  Just because that point is larger than most people have to concern
 themselves
  with isn't an invalidation.

 The UUID itself is 128 bits.  Some of those bits are pre-determined.
 I don't recall, but I think that a normal UUID has 121 bits of
 randomness.

 How many would one have to store in a database before a collision
 would even be a concern.  Such a database would be freaking huge.
 Probably far larger than anything that anyone has.


Not necessarily.

Bad luck: two inserts, and one collision.
Good luck: many many inserts, without any collision.

If you have ten pairs of socks... how many do you need to choose, to have
two from the same pair (to have a collision)?
Good luck: 2
Bad luck: 11

With uuid you can write a program that most of the time works correctly, but
sometimes not.
If you need a program that works correctly all the time, you can use UUID,
with the overhead of a procedure, that checks the collisions, and does
something with that, or you could generate the PK without UUID, and be sure
that the generator won't have collisions, just because it is a normal
sequence.

regards
Szymon


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Chris Browne
dal...@solfertje.student.utwente.nl (Alban Hertroys) writes:
 From wikipedia, only after generating 1 billion UUIDs every second for
 the next 100 years, the probability of creating just one duplicate would
 be about 50%. The probability of one duplicate would be about 50% if
 every person on earth owns 600 million UUIDs.


 Even if the chances of one person encountering a UUID collision are
 about once in say 100 billion years, that could be tomorrow. It could
 also not happen at all in that time span. That's how chance works. You
 can never assume it won't happen. If the costs incurred by a collision
 are lower than the costs of preventing it, you can choose to just take
 the hit, but that doesn't go for _every_ problem domain. The medical
 systems mentioned up-thread are an example of that.

 Next to that, UUID's are generated by computers. I have no doubts that
 the numeric space that makes up a UUID allows for collision chances as
 low as described, but are computers capable of generating those
 numbers sufficiently random that they actually achieve that low a
 chance? I think that's pushing it.

I think I'd rather analyze this by:
a) Reading the RFC, and
b) Looking at some of the common implementations likely to get used

rather than to merely have a doubt.

RFC 4122 does NOT point to randomness as the only criterion to
discourage collisions, and treating UUIDs as if they were merely about
being sufficiently random to achieve low chance of collision is
insulting to the drafters of the standard, because they were certainly
NOT so naive as to think that was sufficient.
-- 
cbbrowne,@,acm.org
What I find most amusing  about com and .NET  is that they are trying
to solve a problem I only had when programming using MS tools.
-- Max M m...@mxm.dk (on comp.lang.python)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Chris Browne
a...@crankycanuck.ca (Andrew Sullivan) writes:
 On Wed, Jan 05, 2011 at 12:41:43PM -0700, Scott Ribe wrote:
 I'm not sidestepping the point at all.

 You may be missing it, however, because. . .

 The point is that the finiteness of the space is a red herring. The
 space is large enough that there's no chance of collision in any
 realistic scenario.
 In order to get to a point where the probability
 of collision is high enough to worry about, you have to generate
 (and collect) UUIDs at a rate that is simply not realistic--as in
 your second example quoted above.

 . . .the example was not that UUIDs are being generated and collected
 in one place at that rate, but that they're being generated in several
 independent places at a time, and if the cost of the collision is
 extremely high, there might be reasons not to use the UUID strategy
 but instead to use something else that is generated algorithmically by
 the database.  There's a trade-off in having distributed systems
 acting completely independently, and while I have lots of confidence
 in my colleagues at the IETF (and agree with you that for the
 overwhelming majority of cases UUIDs are guaranteed-unique enough),
 correctly making these trade-offs still requires thought and
 analysis.  It's exactly the kind of of analysis that professional
 paranoids like DBAs are for.

But it seems to me that some of the analytics are getting a little *too*
paranoid, on the perhaps UUIDs are the wrong answer side of the
column.

There's no panaceas, here; if the process that is using IDs is fragile,
then things can break down whether one is using UUID or SERIAL.

I prefer the probably unique enough side of the fence, myself.

And the process that uses the IDs needs to be robust enough that things
won't just fall apart in tatters if it runs into non-uniqueness.

I'd expect that to not need to be a terribly big deal - if there's a
UNIQUE index on a UUID-based column, then an insert will fail, and the
process can pick between things like:
 - Responding that it had a problem, or
 - Retrying.

And if the system isn't prepared for that sort of condition, then it's
also not prepared for some seemingly more likely error conditions such
as:
 - The DB connection timed out because something fuzzed out on the
   network
 - The DB server fell over and is restarting because (power failed,
   someone kicked the switch, disk ran out, ...)

It seems rather silly to be *totally* paranoid about the
not-infinite-uniqueness of UUIDs when there are plenty of other risks
lurking around that also need erro checking.
-- 
cbbrowne,@,gmail.com
http://linuxdatabases.info/info/slony.html
How can you dream the impossible dream when you can't get any sleep?
-- Sam Robb

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: *****SPAM***** Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 3:03 PM, Andrew Sullivan wrote:

 ...the example was not that UUIDs are being generated and collected
 in one place at that rate, but that they're being generated in several
 independent places at a time...

In order for a collision to matter, really in order for there to *be* a 
collision, the duplicate pair has to be collected in one place.

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: *****SPAM***** Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Scott Ribe
On Jan 5, 2011, at 4:11 PM, Michael Satterwhite wrote:

 Each machine would have a unique machine_id. This would guarantee uniqueness 
 and be very easy to maintain.

And if somebody clones the disk to a new machine, but leaves the old one in 
service? Or do you use the MAC address and hope that's unique?

-- 
Scott Ribe
scott_r...@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Andrew Sullivan
On Wed, Jan 05, 2011 at 06:22:08PM -0500, Chris Browne wrote:
 But it seems to me that some of the analytics are getting a little *too*
 paranoid, on the perhaps UUIDs are the wrong answer side of the
 column.

That could be.  I was simply noting that there are cases where one
could legitimately decide that the UUID is a bad fit.  I was just
objecting to the seeming (to me anyway) claim that UUIDs can always be
used.

 There's no panaceas, here; if the process that is using IDs is fragile,
 then things can break down whether one is using UUID or SERIAL.

Not necessarily.  In a distributed system where some rows will
sometimes (and unpredictably) be shared, guaranteeing that two systems
cannot generate the same ID could be really important.  In those
cases, it's probably better to have a fixed generator ID plus a serial
(or, for that matter, a UUID) because then you can be sure you don't
run into one another.  (Of course, some UUID examples already have
this built in.  It sort of depends on the algorithm one is using.)

 I prefer the probably unique enough side of the fence, myself.

Me too, most of the time.

 And the process that uses the IDs needs to be robust enough that things
 won't just fall apart in tatters if it runs into non-uniqueness.

But that robustness requirement might be impossible in a distributed
system, was all I was trying to point out.

 It seems rather silly to be *totally* paranoid about the
 not-infinite-uniqueness of UUIDs when there are plenty of other risks
 lurking around that also need erro checking.

I fully agree with this.

A
-- 
Andrew Sullivan
a...@crankycanuck.ca

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Andrew Sullivan
On Wed, Jan 05, 2011 at 07:44:29PM -0700, Scott Ribe wrote: 

 In order for a collision to matter, really in order for there to *be* a 
 collision, the duplicate pair has to be collected in one place.

Not at the time of generation, though.  They only have to end up in
the same place at once.  For most cases, the value of the data
compared to the improbably low liklihood of collision means that you
deal with this case by saying, Yuck.  Well, someone loses.  But if
the data is valuable enough (say, unique identifiers for nuclear
warheads), that's just not an acceptable trade-off.

A

-- 
Andrew Sullivan
a...@crankycanuck.ca

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-05 Thread Andrew Sullivan
On Wed, Jan 05, 2011 at 07:46:18PM -0700, Scott Ribe wrote:
 And if somebody clones the disk to a new machine, but leaves the old one in 
 service? Or do you use the MAC address and hope that's unique?
 

Excellent questions, and exactly the sort (as I was arguing elsewhere
in this thread) one has to answer in order to decide what the right
strategy is.

For instance, it may be that your application need is dealing with
incredibly valuable identifiers, that not cloning the disk is a matter
of national security, that the cost of finding and fixing a failed
unique identification case runs into the billions of dollars (not to
mention the millions of deaths), and that the assignment is performed
in real time by distributed systems that only eventually, if at all,
deliver data to a unified system.  In this case, the cost of the
failure of uniqueness is very high, and it would be prudent to arrange
something that guaranteed that no assigner node could possibly create
a duplicate identifier.  There are UUID approaches that can do this;
they mostly work by guaranteeing a fixed and assignable local part.
UUID vs. serial for one of those columns seems to me to be nothing but
an application efficiency problem.

What is far more likely is that ordinary-value identifiers are being
generated, and that you are using them for more or less pedestrian
reasons.  The cost of an undetected duplicate is still maybe millions
of dollars, but the chances of a duplicate going undetected at
creation time (like, say, the MAC address of the creating machine is
used, the useful lifetime of the UUID is on the order of single-digit
years, and there are no more than 10 database back ends involved) are
fairly low.  In this case, using a proven UUID generator seems like a
no-brainer to me (and in fact, I work on an application where we do
this).

I don't understand the number of knees in this conversation that seem
to be jerking against the answer, It depends.  Of course it depends.
If there were one answer for everything, developing good
database-backed systems wouldn't be something people would pay any of
us for.

A

-- 
Andrew Sullivan
a...@crankycanuck.ca

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-04 Thread Joshua D. Drake
On Tue, 2011-01-04 at 11:07 -0800, Dennis Gearon wrote:
 I haven't been able to find anywhere, easily, in the documentation using 
 google 
 where a list of allowed data types for primary keys is.

Anything that can be UNIQUE NOT NULL

 
 So, UUIDs can be primary keys?

Yes.

JD


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-04 Thread David Wall
We're using UUID for primary keys in PG 8.4 without any issues.  I have 
no real insights into the details or performance issues, but always 
figured it was stored as a binary 128-bit value, but with added benefits 
of being able to enter and view them using a standard string format.  We 
don't sort them as they have no real meaning for us.


On 1/4/2011 11:07 AM, Dennis Gearon wrote:

I haven't been able to find anywhere, easily, in the documentation using google
where a list of allowed data types for primary keys is.

So, UUIDs can be primary keys?
Any issues wtih them on sorting or paging of index tables, etc.?

Also, the documentation says that UUIDs are 128 bit value, but never explicitly
says that's how it's stored. Nor does it use one of the nice, blue headered
tables for UUID (or ENUM) showing storage and other attributes as it does for
numeric, character,boolean, date/time, binary, monetary, geometric, or network
types.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] UUID column as pimrary key?

2011-01-04 Thread Chris Browne
d.w...@computer.org (David Wall) writes:
 We're using UUID for primary keys in PG 8.4 without any issues.  I
 have no real insights into the details or performance issues, but
 always figured it was stored as a binary 128-bit value, but with added
 benefits of being able to enter and view them using a standard string
 format.  We don't sort them as they have no real meaning for us.

In principle, this might be a reason to want to do the long-outstanding
work on hash indexes; with UUIDs, it mayn't be useful to sort the
values, but you *do* want to be able to validate that they're unique.
-- 
output = (cbbrowne @ gmail.com)
http://www3.sympatico.ca/cbbrowne/
Computers are  like air conditioners:  They stop working  properly if
you open windows.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general