On Tue, Nov 8, 2016 at 2:58 PM, Craig Ringer <craig.rin...@2ndquadrant.com>
wrote:

> 2ndQuadrant/bdr


That is similar. I'm not clear on the use of the OID for the sequence
(`DirectFunctionCall1(nextval_oid, seqoid)`) ... does that imply a lock
around sequence generation? Another difference is that your sequence doesn't
reset on a time basis; it ascends and wraps independently of the time.

(Also, you appear to take the modulo against the max (2^n-1), not the
cardinality (2^n). Should that be an & instead? E.g. take SEQUENCE_BITS of
1 -> MAX_SEQ_ID of ((1<<1)-1) = 1 -> (seq % 1) = {0}, not {0,1} as expected;
(seq & 1) = {0,1} as expected.)
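
To make the off-by-one concrete, here is a minimal sketch in Python rather than the C of the extension itself. SEQUENCE_BITS and MAX_SEQ_ID are the names used above; the three helper functions are hypothetical and exist only for this illustration.

```python
SEQUENCE_BITS = 1
MAX_SEQ_ID = (1 << SEQUENCE_BITS) - 1  # 2^n - 1, the largest valid value (1 here)


def wrap_modulo_max(seq):
    """Wrap by taking the modulo against the max: only 2^n - 1 distinct values."""
    return seq % MAX_SEQ_ID


def wrap_mask(seq):
    """Wrap by masking with the max: all 2^n distinct values."""
    return seq & MAX_SEQ_ID


def wrap_modulo_cardinality(seq):
    """Wrap by taking the modulo against the cardinality 2^n (same result as the mask)."""
    return seq % (1 << SEQUENCE_BITS)


# Collect the distinct values each approach can produce.
modulo_values = sorted({wrap_modulo_max(s) for s in range(8)})
mask_values = sorted({wrap_mask(s) for s in range(8)})

print(modulo_values)  # the top value is never produced
print(mask_values)
```

For non-negative seq, masking with (2^n - 1) and taking the modulo against 2^n give the same result, so either would restore the full range.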

We tried 64-bit values for ids (based on Twitter's Snowflake), but found
that time replay would cause collisions. An admin corrected the time on one
of our servers, moving it backwards, which led to duplicate ids being
generated, a fun day of debugging, and a hard lesson about our assumption
that time always moves forward. Using node_id doesn't protect against this,
since it is the same node creating the colliding ids as the original ids.
By extending the ids to include a significant amount of randomness, and
requiring a restart of the db for the time value to move backwards (by
latching onto the last seen time), we narrow the window for collisions to
close enough to zero that winning the lottery is far more likely
(http://preshing.com/20110504/hash-collision-probabilities/ has the exact
math). We also increase the time scale for id wraparound to long past the
likely life expectancy of the software we're building today.
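
The latch-plus-randomness idea can be sketched roughly as follows. This is an illustration in Python, not our actual implementation: the bit widths (TIME_BITS, RANDOM_BITS), the millisecond clock, the class name and the injectable clock parameter are all assumptions made for the example. The collision estimate uses the standard birthday approximation, 1 - exp(-k(k-1)/(2N)).

```python
import math
import secrets
import time

TIME_BITS = 48    # assumption: millisecond timestamp field
RANDOM_BITS = 80  # assumption: random field, 128 bits total


class LatchedIdGenerator:
    """Hypothetical generator: time prefix that never moves backwards, plus random bits."""

    def __init__(self, clock=None):
        # The clock is injectable so a backwards time correction can be simulated.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_seen = 0  # held in memory only, so it resets on restart

    def next_id(self):
        now = self._clock()
        # Latch: only ever advance; a clock that steps back is ignored.
        if now > self._last_seen:
            self._last_seen = now
        ts = self._last_seen & ((1 << TIME_BITS) - 1)
        return (ts << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def collision_probability(k, bits):
    """Birthday approximation for k random values drawn from 2**bits possibilities."""
    n = 2 ** bits
    return -math.expm1(-k * (k - 1) / (2 * n))


# Simulate the clock being corrected backwards between the second and third id.
ticks = iter([1000, 2000, 1500])
gen = LatchedIdGenerator(clock=lambda: next(ticks))
ids = [gen.next_id() for _ in range(3)]
print([i >> RANDOM_BITS for i in ids])
```

In this sketch the third id keeps the latched time of 2000 rather than stepping back to 1500, so a collision needs two ids to draw the same random bits within the same time value.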

-- 
Clifford Hammerschmidt, P.Eng.
