Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-28 Thread Marti Raudsepp
On Fri, Apr 25, 2014 at 8:58 PM, Josh Berkus j...@agliodbs.com wrote:
 Well, I've already had collisions with UUID-OSSP, in production, with
 only around 20 billion values.  So clearly there aren't 122bits of true
 randomness in OSSP.  I can't speak for other implementations because I
 haven't tried them.

Interesting. The statistical chances of this happening should be
approximately 4e-17. Are you certain that this was due to uuid-ossp
and not an application bug?
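
For reference, the 4e-17 figure follows from the usual birthday approximation, p ~ n^2 / (2 * 2^122), for n values drawn uniformly from 122 random bits. A minimal Python sketch, assuming an ideal random source (the function name is made up for illustration):

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Approximate probability of at least one collision among n uniform values."""
    # Birthday approximation: p = 1 - exp(-n^2 / (2 * 2^bits)).
    # expm1 keeps precision when the exponent is tiny.
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))


p = collision_probability(20_000_000_000)  # "around 20 billion values"
print(f"{p:.1e}")
```

With 20 billion values the result is on the order of 4e-17, so a collision at that scale points to a weak entropy source or an application bug rather than bad luck.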

Can you say what kind of operating system and environment that was? I
skimmed the sources of uuid-ossp 1.6.2 and it seems to be doing the
right thing, using /dev/urandom or /dev/random on Unixes and
CryptGenRandom on Windows. Barring any bugs, of course. However, if
these fail for whatever reason (e.g. out of file descriptors), it
falls back to libc random(), which is clearly broken.

On Fri, Apr 25, 2014 at 6:18 PM, Greg Stark st...@mit.edu wrote:
 The difficulty lies not really in the PRNG implementation (which is
 hard but well enough understood that it's not much of an issue these
 days). The difficulty lies in obtaining enough entropy. There are ways
 of obtaining enough entropy and they are available.

 Obtaining enough entropy requires access to hardware devices which
 means a kernel system call.

This is a solved problem in most environments, too. The kernel
collects entropy from unpredictable events and then seeds a global
CSPRNG with that. This collection happens always regardless of whether
you request random numbers or not, so essentially comes for free.
Applications can then request output from this CSPRNG.
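
As a rough illustration of "requesting output from this CSPRNG", here is a minimal Python sketch that builds a version 4 UUID straight from os.urandom(). It is only a sketch of the idea, not what uuid-ossp does internally, and the helper name is hypothetical:

```python
import os
import uuid


def random_uuid() -> uuid.UUID:
    """Build a version 4 UUID from 16 bytes of kernel CSPRNG output."""
    raw = os.urandom(16)
    # version=4 makes the constructor overwrite the 6 version/variant bits,
    # leaving 122 random bits.
    return uuid.UUID(bytes=raw, version=4)


u = random_uuid()
print(u, u.version)
```

Python's own uuid.uuid4() is built the same way, so in practice one would simply call that.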

Reason being, this infrastructure is necessary for more critical tasks
than generating UUIDs: pretty much all of cryptography requires random
numbers.

 They also deplete
 the available entropy pool for other sources which may mean they have
 security consequences.

This only affects the Linux /dev/random, which is discouraged these
days for that reason. Applications should use urandom instead. To my
knowledge, there are no other operating systems that have this
depletion behavior.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-26 Thread Martijn van Oosterhout
On Fri, Apr 25, 2014 at 04:18:18PM +0100, Greg Stark wrote:
 Which isn't to say they're a bad idea but like everything else in
 engineering there are tradeoffs and no such thing as a free lunch.
 You can avoid depleting the entropy pool by including data you expect
 to be unique as a kind of fake entropy -- which quickly gets you back
 to looking for things like MAC address to avoid duplicates across
 systems.

ISTM you could use the database identifier we already have to at least
produce UUIDs which are unique amongst PostgreSQL instances. That
might be something worth aiming for?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-26 Thread Tom Lane
Martijn van Oosterhout klep...@svana.org writes:
 On Fri, Apr 25, 2014 at 04:18:18PM +0100, Greg Stark wrote:
 Which isn't to say they're a bad idea but like everything else in
 engineering there are tradeoffs and no such thing as a free lunch.
 You can avoid depleting the entropy pool by including data you expect
 to be unique as a kind of fake entropy -- which quickly gets you back
 to looking for things like MAC address to avoid duplicates across
 systems.

 ISTM you could use the database identifier we already have to at least
 produce UUIDs which are unique amongst PostgreSQL instances. That
 might be something worth aiming for?

It's worth noting in this connection that we've never tried hard to ensure
that database identifiers are actually unique.  One potentially serious
issue is that slave servers will have the same identifier as their master.

Also, I think there's a still-open issue that creation of the identifier
has a thinko about using OR instead of XOR, resulting in way fewer bits of
freedom than it should have even with the limited amount of entropy used.
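
To see why OR costs bits of freedom compared with XOR, here is a small Python sketch. It does not reproduce the actual identifier-generation code; it only assumes two independent, uniformly random 8-bit inputs and measures the entropy of the combined value:

```python
import math
from collections import Counter


def entropy_bits(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Enumerate every pair of independent, uniform 8-bit inputs.
pairs = [(a, b) for a in range(256) for b in range(256)]
xor_entropy = entropy_bits(a ^ b for a, b in pairs)
or_entropy = entropy_bits(a | b for a, b in pairs)
print(f"XOR: {xor_entropy:.2f} bits, OR: {or_entropy:.2f} bits")
```

XOR of uniform inputs stays uniform (8 bits here), while OR pushes every bit towards 1 (set with probability 3/4), so the result carries noticeably fewer than 8 bits.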

regards, tom lane




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-26 Thread Josh Berkus
On 04/26/2014 11:18 AM, Tom Lane wrote:
 It's worth noting in this connection that we've never tried hard to ensure
 that database identifiers are actually unique.  One potentially serious
 issue is that slave servers will have the same identifier as their master.

Yeah, this is one of those things I've been thinking about.  The problem
is that we need a node ID, which identifies the PostgreSQL instance,
and a dataset ID, which identifies the chain of data, especially when
combined with the timeline ID.  So a master and replica would have
different node IDs, but the same dataset ID, until the replica is
promoted, at which point its dataset ID + timeline No. would change.
This would allow for relatively easy management of large clusters by
allowing automated identification of databases and their mirrors.

However, there's a fundamental problem with the concept of the dataset
ID in that there's absolutely no way for PostgreSQL to know when it has
a unique dataset.  Consider a downtime database file cloning for
example; the two databases would have the same identifier and yet both
be standalones which quickly diverge.  So I haven't thought of a good
solution to that.

We could implement a NodeID, though, based on some combination of IP/MAC
address and port.  Not entirely reliable, but better than nothing ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-26 Thread Josh Berkus
On 04/25/2014 11:46 AM, David Fetter wrote:
 On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote:
 You may say oh, that's not the job of the identifier, but if it's not,
 WTF is the identifier for, then?
 
 Frequently, it's to provide some kind of opacity in the sense of not
 having an obvious predecessor or successor.

A far better solution to that is to not share the unadorned ID with the
user.

Basically, there are two different reasons to offer UUIDs in PostgreSQL:

1) because they actually serve a useful purpose in providing a globally
unique identifier;

2) because they work well with existing platforms and frameworks.

Given the state of the art, the above two goals are separate and
exclusive, apologists for poorly conceived UUID algorithms
notwithstanding.  So either we provide a UUID type which actually helps
identify unique entities between database servers, OR we supply a UUID
which just works with popular web frameworks, or we supply both *as
two or more different types*.  But claiming that types chosen because
they're popular are also technically sound is misleading at best.

Further, based on our experience with OSSP, if we're going to make a
UUID type in core because it's currently popular, we'd better be pretty
sure that it's still going to be popular 5 or 10 years from now.
Otherwise we're better off keeping it an extension.

I personally am interested in a UUID type which would support doing
multi-master replication of JSON databases built on PostgreSQL, and will
probably write one if nobody else does first, and I don't see existing,
naive randomization-based UUIDs as ever filling that role adequately.
Although, as I said, Andres' work in this area may have already taken
care of this.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-26 Thread Jim Nasby

On 4/25/14, 12:58 PM, Josh Berkus wrote:

Well, I've already had collisions with UUID-OSSP, in production, with
only around 20 billion values.  So clearly there aren't 122bits of true
randomness in OSSP.  I can't speak for other implementations because I
haven't tried them.


Or perhaps you should be buying lottery tickets? ;)

Can you write this up in a blog post? I've argued with people more than once about why
it's a bad idea to rely on 1 in a bazillion odds to protect your data
(though, usually in the context of SHA1), and it'd be good to be able to point at a
real-world example of this failing.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-26 Thread Greg Stark
On Sat, Apr 26, 2014 at 8:58 PM, Josh Berkus j...@agliodbs.com wrote:
 However, there's a fundamental problem with the concept of the dataset
 ID in that there's absolutely no way for PostgreSQL to know when it has
 a unique dataset.  Consider a downtime database file cloning for
 example; the two databases would have the same identifier and yet both
 be standalones which quickly diverge.  So I haven't thought of a good
 solution to that.

If you're content to use random numbers then you could generate one
from system entropy on every startup. If you generated a new timeline
for every startup then the pair of system id and random startup id
(which would be the new timelineid) would let you look at any two
instances and determine if they're related and where they diverged
even if it was from a database clone.

I don't think MAC addresses or other hardware identifiers really save
you from using system entropy anyway. You might very well install a
clone on the same machine and in an environment like Heroku you could
very easily end up restoring a database onto the same VM twice
entirely by accident. I actually think using /dev/urandom is a better
idea than depending on things like MAC address almost always.

-- 
greg




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-25 Thread Greg Stark
On Fri, Apr 25, 2014 at 1:43 AM, Marti Raudsepp ma...@juffo.org wrote:
 Obviously you can't use random(). That's why I talked about
 cryptographic PRNGs, crypto libraries do proper seeding and generate
 reliably random numbers all the time.


The difficulty lies not really in the PRNG implementation (which is
hard but well enough understood that it's not much of an issue these
days). The difficulty lies in obtaining enough entropy. There are ways
of obtaining enough entropy and they are available. But they're not
free.

Obtaining enough entropy requires access to hardware devices which
means a kernel system call. Kernel system calls are relatively slow
when you're talking about generating sequential IDs. They also deplete
the available entropy pool for other sources which may mean they have
security consequences.

Which isn't to say they're a bad idea but like everything else in
engineering there are tradeoffs and no such thing as a free lunch.
You can avoid depleting the entropy pool by including data you expect
to be unique as a kind of fake entropy -- which quickly gets you back
to looking for things like MAC address to avoid duplicates across
systems.

-- 
greg




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-25 Thread Josh Berkus
On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
 On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus j...@agliodbs.com wrote:
 A pseudo-random UUID is frankly pretty
 useless to me because (a) it's not really unique
 
 This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
 long as you can trust the random number generator, the chances of a
 value occurring twice can be estimated using the birthday paradox:
 there's a 50% chance of having *one* collision in a set of 2^61 items.
 Storing this amount of UUIDs alone requires 32 exabytes of storage.
 Factor in the tuple and indexing overheads and you'd be needing close
 to all the hard disk space ever manufactured in the world.

Well, I've already had collisions with UUID-OSSP, in production, with
only around 20 billion values.  So clearly there aren't 122bits of true
randomness in OSSP.  I can't speak for other implementations because I
haven't tried them.

 (b) it doesn't help me route data at all.
 
 That's really out of scope for UUIDs. They're about generating
 identifiers, not describing what the identifier means. UUIDs also
 don't happen to cure cancer.

http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327

On the contrary, I would argue that an object identifier which is
completely random is possibly the worst way to form an ID of all
possible concepts; there's no relationship whatsoever between the ID,
the application stack, and the application data; you don't even get the
pseudo-time indexing you get with Serials.   The only reason to do it is
because you're too lazy to implement a better way.

Or to put it another way: a value which is truly random is no identifier
at all.

Compare this with a composite identifier which carries information about
the node, table, and schema of origin for the tuple.  Not only does this
help ensure uniqueness, but it also supports intelligent sharding and
multi-master replication systems.  I don't speak hypothetically; we've
done this in the past and will do it again in the future.

I would love to have some machinery inside PostgreSQL to make this
easier (for example, a useful unique database ID), but I suspect that
actual implementation will always remain application-specific.

You may say oh, that's not the job of the identifier, but if it's not,
WTF is the identifier for, then?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-25 Thread David Fetter
On Fri, Apr 25, 2014 at 10:58:29AM -0700, Josh Berkus wrote:
 On 04/24/2014 05:23 PM, Marti Raudsepp wrote:
  On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus j...@agliodbs.com wrote:
  A pseudo-random UUID is frankly pretty
  useless to me because (a) it's not really unique
  
  This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
  long as you can trust the random number generator, the chances of a
  value occurring twice can be estimated using the birthday paradox:
  there's a 50% chance of having *one* collision in a set of 2^61 items.
  Storing this amount of UUIDs alone requires 32 exabytes of storage.
  Factor in the tuple and indexing overheads and you'd be needing close
  to all the hard disk space ever manufactured in the world.
 
 Well, I've already had collisions with UUID-OSSP, in production, with
 only around 20 billion values.  So clearly there aren't 122bits of true
 randomness in OSSP.  I can't speak for other implementations because I
 haven't tried them.
 
  (b) it doesn't help me route data at all.
  
  That's really out of scope for UUIDs. They're about generating
  identifiers, not describing what the identifier means. UUIDs also
  don't happen to cure cancer.
 
 http://it.toolbox.com/blogs/database-soup/primary-keyvil-part-i-7327
 
 On the contrary, I would argue that an object identifier which is
 completely random is possibly the worst way to form an ID of all
 possible concepts; there's no relationship whatsoever between the ID,
 the application stack, and the application data; you don't even get the
 pseudo-time indexing you get with Serials.   The only reason to do it is
 because you're too lazy to implement a better way.
 
 Or to put it another way: a value which is truly random is no identifier
 at all.

Not exactly.  It's at least potentially hiding information an attacker
could use, with all the caveats that carries.

 Compare this with a composite identifier which carries information about
 the node, table, and schema of origin for the tuple.  Not only does this
 help ensure uniqueness, but it also supports intelligent sharding and
 multi-master replication systems.  I don't speak hypothetically; we've
 done this in the past and will do it again in the future.

This is an excellent idea, but I don't think it's in scope for UUIDs.

 I would love to have some machinery inside PostgreSQL to make this
 easier (for example, a useful unique database ID), but I suspect that
 actual implementation will always remain application-specific.
 
 You may say oh, that's not the job of the identifier, but if it's not,
 WTF is the identifier for, then?

Frequently, it's to provide some kind of opacity in the sense of not
having an obvious predecessor or successor.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Josh Berkus
Alvaro,

 I don't understand your point.  I'm only replying to Tom's assertion
 that UUID generation might not be all that unique after all (or, in
 other words, AIUI, that the universally unique part of the name is
 wishful thinking and not an actual property of the real thing.)
 
 Oh, I think I see your point: it's that no matter what we do here, there
 would be no way to guarantee that a value we generate does not collide
 with any other value elsewhere (either on other uuidserial columns, or
 on other servers).
 
 Is that it?
 
 Because if it is, then I think the problem is that the UUID concept
 might be flawed yet users still want to use it, and we do no service by
 refusing to provide it on those grounds.

It's more than that:

1) the concept of UUIDs is fundamentally flawed, to the extent that if
we have a UUID type in core its flaws become our flaws, to be handled in
bug reports forever.

2) Because the concept of UUIDs is flawed, there are multiple competing
implementations, none of which is clearly dominant and durable.  As
such, any UUID algorithm we adopt for core stands a significant risk of
being later abandoned by everyone else and becoming a PostgreSQL wart.

3) In general, users who want UUIDs don't want a generic concept of
UUIDs; they want the specific UUIDs which work with their individual
programming languages, web frameworks, or queueing platforms.  So, see
competing implementations above.

As case in point for (2), as I said upthread: uuid-ossp, which has been
our option for UUID in contrib since originally it was the only OSS
implementation, is now abandoned by everyone but us.

Additionally, were I to adopt a UUID scheme for PostgreSQL, I would want
it to be *for postgresql*, with components indicating server, table and
schema of origin for each ID.  A pseudo-random UUID is frankly pretty
useless to me because (a) it's not really unique, and (b) it doesn't
help me route data at all.

Alternatively, what would be *really* useful is to have a way for an
extension to plug into the serial concept, so that it gets all of the
benefits of serial (permissions, dependencies, etc.) while being able to
call a custom generator function.

Oh, and:

4) IIRC, Andres has already worked out a scheme for distributed serials
to support BDR.  So this is a solved problem for the only really
interesting use case ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Christopher Browne
Last year, I built a pl/pgsql generator of version 1-ish UUIDs, which
would combine timestamps with local information to construct data that kind
of emulated the timestamp+MAC address that is version #1 of UUID.

Note that there are several versions of UUIDs:

1.  Combines MAC address, timestamp, random #
2.  DCE Security (replaces some bits with user's UID/GID and others with
POSIX Domain); I don't think this one is much used...
3.  MD5 Hash
4.  Purely Random
5.  SHA-1 Hash

There are merits to each.  The tough one is #1, as that requires pulling
data that can't generally be accessed portably.

I figured out (and could probably donate some code) how to construct the
bits of #1 using the inputs of *my* choice (e.g. - I set up to make up my
own MAC address surrogate, and transformed PostgreSQL timestamp values into
the timestamp, and threw in my own bit of randomness), which provided
well-formed UUIDs with nice enough characteristics.

It wouldn't be out there to do a somewhat PostgreSQL-flavoured version of
this that wouldn't actually use MAC addresses, but rather, would use data
we have:

a) Having a sequence feeding some local uniqueness would fit with the
clock seq bits (e.g. - the octets in RFC 4122 entitled
clock-seq-and-reserved and clock-seq-low)
b) NOW() provides data for time-low, time-mid, time-high-and-version
c) We'd need 6 hex octets for node; I seem to recall there being
something established by initdb that might be usable.

The only piece that's directly troublesome, for UUID Type 1, is the node
value.  I'll observe that it isn't unusual for UUID implementations to
generate random values for that.

Note that for the other UUID versions, there's NO non-portable data needed.

It seems to me that a UUIDserial type, which combined:
  a) A sequence, to be the 'clock';
  b) Possibly another sequence to store local node ID, which might get
seeded from DB internals
would provide a PostgreSQL-flavoured version of UUID Type 1.
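
To make the field layout concrete, here is a minimal Python sketch that packs a timestamp, a clock sequence and a node value into an RFC 4122 version 1 layout. It is not the pl/pgsql generator mentioned above; the function name is hypothetical, and the node and clock-sequence inputs stand in for whatever surrogate values one chooses:

```python
import datetime
import uuid

# RFC 4122 timestamps count 100 ns intervals since 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)


def v1_style_uuid(ts: datetime.datetime, clock_seq: int, node: int) -> uuid.UUID:
    """Pack a timezone-aware timestamp, a 14-bit clock sequence and a 48-bit node."""
    delta = ts - GREGORIAN_EPOCH
    ticks = (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | (1 << 12)    # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low,
                             node & 0xFFFFFFFFFFFF))


# Hypothetical inputs: a sequence value as the clock and a made-up node surrogate.
now = datetime.datetime(2014, 4, 24, 12, 0, tzinfo=datetime.timezone.utc)
u = v1_style_uuid(now, clock_seq=42, node=0x0123456789AB)
print(u, u.version)
```

In a database-side version, clock_seq would come from a sequence and node from some per-instance value, as outlined above.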


Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Marti Raudsepp
On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus j...@agliodbs.com wrote:
 A pseudo-random UUID is frankly pretty
 useless to me because (a) it's not really unique

This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
long as you can trust the random number generator, the chances of a
value occurring twice can be estimated using the birthday paradox:
there's a 50% chance of having *one* collision in a set of 2^61 items.
Storing this amount of UUIDs alone requires 32 exabytes of storage.
Factor in the tuple and indexing overheads and you'd be needing close
to all the hard disk space ever manufactured in the world.
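
A quick way to check these numbers is the standard birthday approximation, p = 1 - exp(-n^2 / (2N)) with N = 2^122. The Python sketch below assumes an ideal random source; under this approximation a set of 2^61 values gives roughly a 39% chance of at least one collision, and 50% is reached slightly above 2^61, so the order of magnitude quoted above holds:

```python
import math

SPACE = 2 ** 122   # distinct version 4 UUIDs
n = 2 ** 61        # set size quoted above

# Birthday approximation for the chance of at least one collision.
p_at_n = 1.0 - math.exp(-(n * n) / (2.0 * SPACE))

# Set size at which the approximation reaches exactly 50%.
n_half = math.sqrt(2.0 * SPACE * math.log(2.0))

# Raw storage for n UUIDs at 16 bytes each, in exbibytes (2^60 bytes).
storage_eib = n * 16 / 2 ** 60

print(f"p(2^61) = {p_at_n:.2f}, 50% at 2^{math.log2(n_half):.2f}, {storage_eib:.0f} EiB")
```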

If you believe there's a chance of ever seeing a pseudorandom UUID
collision in practice, you should be buying lottery tickets.

To the contrary. Combined with the fact that pseudorandom UUID
generation doesn't require any configuration (node ID), doesn't leak
any private data (MAC address) and relies on infrastructure that's
ubiquitous anyway (cryptographic PRNG) it's almost always the right
answer.

 (b) it doesn't help me route data at all.

That's really out of scope for UUIDs. They're about generating
identifiers, not describing what the identifier means. UUIDs also
don't happen to cure cancer.

Regards,
Marti




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Thu, Apr 24, 2014 at 8:40 PM, Josh Berkus j...@agliodbs.com wrote:
 A pseudo-random UUID is frankly pretty
 useless to me because (a) it's not really unique

 This is FUD. A pseudorandom UUID contains 122 bits of randomness. As
 long as you can trust the random number generator, the chances of a
 value occurring twice can be estimated using the birthday paradox:
 there's a 50% chance of having *one* collision in a set of 2^61 items.

Of course, the weak spot in this analysis is the assumption that there
are actually 122 independent bits in the value.  It's not difficult to
imagine that systems with crummy random() implementations might only have
something like 32 bits worth of real randomness.  Or less.  Seeding your
PRNG from gettimeofday(), for instance, is highly likely to lead to
collisions ... no matter how good the PRNG itself is.
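
The seeding problem is easy to reproduce. In the Python sketch below (an illustration only; the helper name and the seed value are made up), two generators seeded with the same clock reading emit the same "random" UUID, however good the generator itself is:

```python
import random
import uuid


def weak_uuid(seed: int) -> uuid.UUID:
    """Version 4-shaped UUID from a non-cryptographic PRNG seeded with `seed`."""
    rng = random.Random(seed)
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two processes that start within the same clock tick pick the same seed.
seed = 1398384000  # hypothetical gettimeofday() value truncated to whole seconds
a = weak_uuid(seed)
b = weak_uuid(seed)
print(a == b)  # → True
```

The number of distinct UUID streams can never exceed the number of distinct seeds, so a seed with 32 bits of variation caps the effective randomness at 32 bits.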

 If you believe there's a chance of ever seeing a pseudorandom UUID
 collision in practice, you should be buying lottery tickets.

Now *that*, I'd call FUD.  The issue here is not whether collisions
are improbable under ideal circumstances.  The issue is how much work
does it take to have some confidence that you're anywhere near the
ideal case.

regards, tom lane




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Marti Raudsepp
On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Of course, the weak spot in this analysis is the assumption that there
 are actually 122 independent bits in the value.  It's not difficult to
 imagine that systems with crummy random() implementations might only have
 something like 32 bits worth of real randomness.

Obviously you can't use random(). That's why I talked about
cryptographic PRNGs, crypto libraries do proper seeding and generate
reliably random numbers all the time.

Regards,
Marti




Re: [HACKERS] UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table

2014-04-24 Thread Christopher Browne
On Thu, Apr 24, 2014 at 8:43 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Fri, Apr 25, 2014 at 3:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Of course, the weak spot in this analysis is the assumption that there
  are actually 122 independent bits in the value.  It's not difficult to
  imagine that systems with crummy random() implementations might only have
  something like 32 bits worth of real randomness.

 Obviously you can't use random(). That's why I talked about
 cryptographic PRNGs, crypto libraries do proper seeding and generate
 reliably random numbers all the time.


... And we can't be certain that there won't be some internal
characteristic weakness.

Cryptography is *hard*; treating it as certainty that things will be gotten
correct tends to be a foolish assumption.

Which is why UUID type 1 resolves this by combining multiple sorts of
anti-correlations, the combination of:
a) Node-specific information (e.g. - in the standard form, parts of the MAC
address), so no collisions between node A and node B.
b) Timestamp, so that things that happen at different times will be kept
unique.
c) An extra sequence, so that if there are multiple events on the same node
at the same time, they *still* don't collide.

I trust the combination to work pretty well, and that's why it was designed
that way.

A RNG, however good, can't provide the same guarantees of lack of conflicts.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?