On Fri, 26 May 2023 00:13:57 GMT, Andrei Pangin <apan...@openjdk.org> wrote:
>> UUID is the very important class that is used to track identities of objects >> in large scale systems. On some of our systems, `UUID.randomUUID` takes >1% >> of total CPU time, and is frequently a scalability bottleneck due to >> `SecureRandom` synchronization. >> >> The major issue with UUID code itself is that it reads from the single >> `SecureRandom` instance by 16 bytes. So the heavily contended `SecureRandom` >> is bashed with very small requests. This also has a chilling effect on other >> users of `SecureRandom`, when there is a heavy UUID generation traffic. >> >> We can improve this by doing the bulk reads from the backing SecureRandom >> and possibly striping the reads across many instances of it. >> >> >> Benchmark Mode Cnt Score Error Units >> >> ### AArch64 (m6g.4xlarge, Graviton, 16 cores) >> >> # Before >> UUIDRandomBench.single thrpt 15 3.545 ± 0.058 ops/us >> UUIDRandomBench.max thrpt 15 1.832 ± 0.059 ops/us ; negative scaling >> >> # After >> UUIDRandomBench.single thrpt 15 4.421 ± 0.047 ops/us >> UUIDRandomBench.max thrpt 15 6.658 ± 0.092 ops/us ; positive >> scaling, ~1.5x >> >> ### x86_64 (c6.8xlarge, Xeon, 18 cores) >> >> # Before >> UUIDRandomBench.single thrpt 15 2.710 ± 0.038 ops/us >> UUIDRandomBench.max thrpt 15 1.880 ± 0.029 ops/us ; negative >> scaling >> >> # After >> Benchmark Mode Cnt Score Error Units >> UUIDRandomBench.single thrpt 15 3.099 ± 0.022 ops/us >> UUIDRandomBench.max thrpt 15 3.555 ± 0.062 ops/us ; positive >> scaling, ~1.2x >> >> >> Note that there is still a scalability bottleneck in current default random >> (`NativePRNG`), because it synchronizes over a singleton instance for SHA1 >> mixer, then the engine itself, etc. -- it is quite a whack-a-mole to figure >> out the synchronization story there. The scalability fix in current default >> `SecureRandom` would be much more intrusive and risky, since it would change >> a core crypto class with unknown bug fanout. >> >> Using the bulk reads even when the underlying PRNG is heavily synchronized >> is still a win. A more scalable PRNG would benefit from this as well. This >> PR adds a system property to select the PRNG implementation, and there we >> can clearly see the benefit with more scalable PRNG sources: >> >> >> Benchmark Mode Cnt Score Error Units >> >> ### x86_64 (c6.8xlarge, Xeon, 18 cores) >> >> # Before, hacked `new SecureRandom()` to >> `SecureRandom.getInstance("SHA1PRNG")` >> UUIDRandomBench.single thrpt ... > > src/java.base/share/classes/java/util/UUID.java line 224: > >> 222: // Try to pull the UUID from the current buffer at >> current position. >> 223: if (stamp != 0) { >> 224: int p = (int)VH_POS.getAndAdd(this, UUID_CHUNK); > > An atomic update together with an optimistic lock looks non-idiomatic use of > StampedLock to me. Won't a simple CAS loop be simpler? E.g. in pseudocode: > > while ((p = this.pos) + 16) < buf.length) { > long msb = getLong(buf, p); > long lsb = getLong(buf, p + 8); > if (cas(this.pos, p, p + 16)) { > return new UUID(msb, lsb); > } > } > > // refill buffer under lock Nope, I don't think you cannot do this, because there is a race on concurrently-updating `buf`. The StampedLock (RW lock), protects the buffer-reading threads from seeing the `buf` undergoing the initialization by buffer-writing threads. Atomic pos only arbitrates the concurrent buffer-threads. That atomic does not help to resolve the buf races. > src/java.base/share/classes/java/util/UUID.java line 260: > >> 258: buf[c + 8] &= 0x3f; /* clear variant */ >> 259: buf[c + 8] |= (byte) 0x80; /* set to IETF >> variant */ >> 260: } > > I'm not sure I understand the point about initialization. Why not just > setting the required version bits right in UUID constructor without updating > the array? This will be faster and simpler IMO. Right, we can do things straight on locals, without ever touching the heap, let's see... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206430447 PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206433588