> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postg...@jeltef.nl> wrote:
> 
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
> 
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.

Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute 
to global sortability of values generated at frequency less than 1KHz.
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be 
non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated at any given milliseconds. Probability of a 
collision of two UUIDs generated at frequency less than 1KHz is p = 2^-(c2+d)
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)

First, observe that c3 contributes unguessability at exactly same scale as 
decreases counter capacity. There is no difference between using bits in d 
directly, or in c3. There is no point in non-zero c3. Every bit that could be 
given to c3 can equally be given to d.

Second, observe that c2 bits contribute to both collision protection and 
counter capacity! And when the time ticks, c2 also contribute to 
unguessability! So, technically, we should consider using all available bits as 
c2 bits.

How many c1 bits do we need? I've chosen one - to prevent occasional counter 
capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and 
d = 56 as an arbitrary compromise between capacity of one backend per ms and 
prevention of global collision.
This compromise is mostly dictated by maximum frequency of UUID generation by 
one backend, I've chosen 200MHz as a sane value.


This compromise is much easier when you do not have 74 spare bits, this crazy 
amount of information forgives almost any mistake. Imagine you have to 
distribute 10 bits between c2 and d. And you try to prevent collision between 
10 independent devices which need capacity to generate IDs with frequency of 
10KHz each and keep sortability. You would have something like c1=1, c2=3,d=6.

Sorry for this long and vague explanation, if it still seems too uncertain we 
can have a chat or something like that. I don't think this number picking stuff 
deserve to be commented, because it still is quite close to random. RFC gives 
us too much freedom of choice.

Thanks!


Best regards, Andrey Borodin.

Reply via email to