Sorry for this long reply. I was looking on refactoring around pg_strong_random() and could not decide what to do. Finally, I decided to post at least something.
> On 22 Mar 2024, at 19:15, Peter Eisentraut <pe...@eisentraut.org> wrote: > > I have been studying the uuidv() function. > > I find this code extremely hard to follow. > > We don't need to copy all that documentation from the RFC 4122bis document. > People can read that themselves. What I would like to see is easy to find > information what from there we are implementing. Like, > > - UUID version 7 > - fixed-length dedicated counter > - counter is 18 bits > - 4 bits are initialized as zero I've removed table taken from RFC. > That's more or less all I would need to know what is going on. > > That said, I don't understand why you say it's an 18 bit counter, when you > overwrite 6 bits with variant and version. Then it's just a 12 bit counter? > Which is the size of the rand_a field, so that kind of makes sense. But 12 > bits is the recommended minimum, and (in this patch) we don't use > sub-millisecond timestamp precision, so we should probably use more than the > minimum? No, we use 4 bits in data[6], 8 bits in data[7], and 6 bits data[8]. It's 18 total. Essentially, we use both partial bytes and one whole byte between. There was a bug - we used 1 extra byte of random numbers that was not necessary, I think that's what lead you to think that we use 12-bit counter. > Also, you are initializing 4 bits (I think?) to zero to guard against counter > rollovers (so it's really just an 8 bit counter?). But nothing checks > against such rollovers, so I don't understand the use of that. No, there's only one guard rollover bit. Here: uuid->data[6] = (uuid->data[6] & 0xf7); Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized. Rollover is carried into time tick here: ++sequence_counter; if (sequence_counter > 0x3ffff) { /* We only have 18-bit counter */ sequence_counter = 0; previous_timestamp++; } I think we might use 10 bits of microseconds and have 8 bits of a counter. Effect of a counter won't change much. But I'm not sure if this is allowed per RFC. If time source is coarse-grained it still acts like a random initializer. And when it is precise - time is "natural" source of entropy. > The code code be organized better. In the not-increment_counter case, you > could use two separate pg_strong_random calls: One to initialize rand_b, > starting at &uuid->data[8], and one to initialize the counter. Then the > former could be shared between the two branches, and the code to assign the > sequence_counter to the uuid fields could also be shared. Call to pg_strong_random() is very expensive in builds without ssl (and even with ssl too). If we could ammortize random numbers in small buffers - that would save a lot of time (see v8-0002-Buffer-random-numbers.patch upthread). Or, perhaps, we can ignore cost of two pg_string_random() calls. > > I would also prefer if the normal case (not-increment_counter) were the first > if branch. Done. > Some other notes on your patch: > > - Your rebase duplicated the documentation of uuid_extract_timestamp and > uuid_extract_version. > > - PostgreSQL code uses uint64 etc. instead of uint64_t etc. > > - It seems the added includes > > #include "access/xlog.h" > #include "utils/builtins.h" > #include "utils/datetime.h" > > are not needed. > > - The static variables sequence_counter and previous_timestamp could be kept > inside the uuidv7() function. Fixed. Thanks! Best regards, Andrey Borodin.
v24-0001-Implement-UUID-v7.patch
Description: Binary data