Re: RFR: 8308804: Improve UUID.randomUUID performance with bulk/scalable PRNG access [v2]

Aleksey Shipilev Fri, 26 May 2023 02:52:27 -0700

> UUID is a very important class that is used to track identities of objects 
> in large scale systems. On some of our systems, `UUID.randomUUID` takes >1% 
> of total CPU time, and is frequently a scalability bottleneck due to 
> `SecureRandom` synchronization.
> 
> The major issue with the UUID code itself is that it reads from a single 
> `SecureRandom` instance 16 bytes at a time. So the heavily contended 
> `SecureRandom` is bashed with very small requests. This also has a chilling 
> effect on other users of `SecureRandom` when there is heavy UUID generation 
> traffic.
> 
> We can improve this by doing bulk reads from the backing `SecureRandom` and 
> possibly striping the reads across many instances of it. 
> 
> 
> Benchmark               Mode  Cnt  Score   Error   Units
> 
> ### AArch64 (m6g.4xlarge, Graviton, 16 cores)
> 
> # Before
> UUIDRandomBench.single  thrpt   15  3.545 ± 0.058  ops/us
> UUIDRandomBench.max     thrpt   15  1.832 ± 0.059  ops/us ; negative scaling
> 
> # After
> UUIDRandomBench.single  thrpt   15  4.421 ± 0.047  ops/us 
> UUIDRandomBench.max     thrpt   15  6.658 ± 0.092  ops/us ; positive scaling, ~1.5x
> 
> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
> 
> # Before
> UUIDRandomBench.single  thrpt   15  2.710 ± 0.038  ops/us
> UUIDRandomBench.max     thrpt   15  1.880 ± 0.029  ops/us  ; negative scaling 
> 
> # After
> Benchmark                Mode  Cnt  Score   Error   Units
> UUIDRandomBench.single  thrpt   15  3.099 ± 0.022  ops/us
> UUIDRandomBench.max     thrpt   15  3.555 ± 0.062  ops/us  ; positive scaling, ~1.2x
> 
> 
> Note that there is still a scalability bottleneck in the current default 
> random (`NativePRNG`), because it synchronizes over a singleton instance for 
> the SHA1 mixer, then the engine itself, etc. -- it is quite a whack-a-mole to 
> figure out the synchronization story there. A scalability fix in the current 
> default `SecureRandom` would be much more intrusive and risky, since it would 
> change a core crypto class with unknown bug fanout.
> 
> Using bulk reads is still a win even when the underlying PRNG is heavily 
> synchronized. A more scalable PRNG would benefit from this as well. This PR 
> adds a system property to select the PRNG implementation, and there we can 
> clearly see the benefit with more scalable PRNG sources:
> 
> 
> Benchmark               Mode  Cnt   Score   Error   Units
> 
> ### x86_64 (c6.8xlarge, Xeon, 18 cores)
> 
> # Before, hacked `new SecureRandom()` to `SecureRandom.getInstance("SHA1PRNG")`
> UUIDRandomBench.single  thrpt   15  3.661 ± 0.008  ops/us
> UUIDRandomBench...
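
For readers following the thread, the bulk-read and striping idea described in the quoted text can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in the PR: the class name `BufferedUuidSketch`, the buffer size, the stripe count, and the use of plain `synchronized` are all hypothetical choices made for the example (the commit notes mention an optimistic lock section, which this sketch does not attempt to reproduce).

```java
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical sketch: each stripe owns a SecureRandom and a buffer that is
// refilled in bulk, so the PRNG is asked for 1024 bytes once instead of
// 16 bytes sixty-four times.
public class BufferedUuidSketch {
    private static final int UUID_BYTES = 16;
    private static final int BUFFERED_UUIDS = 64; // assumed buffer capacity
    private static final int STRIPE_COUNT = 8;    // assumed number of stripes

    private static final BufferedUuidSketch[] STRIPES =
            new BufferedUuidSketch[STRIPE_COUNT];
    static {
        for (int i = 0; i < STRIPE_COUNT; i++) {
            STRIPES[i] = new BufferedUuidSketch();
        }
    }

    private final SecureRandom random = new SecureRandom();
    private final byte[] buffer = new byte[UUID_BYTES * BUFFERED_UUIDS];
    private int pos = buffer.length; // "empty" marker: forces a refill on first use

    // Picks a stripe from the calling thread's id so that concurrent callers
    // tend to contend on different SecureRandom instances.
    public static UUID randomUUID() {
        int idx = (int) (Thread.currentThread().getId() % STRIPE_COUNT);
        return STRIPES[idx].next();
    }

    public synchronized UUID next() {
        if (pos == buffer.length) {
            random.nextBytes(buffer); // one bulk read for many UUIDs
            pos = 0;
        }
        long msb = readLong(buffer, pos);
        long lsb = readLong(buffer, pos + 8);
        pos += UUID_BYTES;
        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L; // version 4
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L; // IETF variant
        return new UUID(msb, lsb);
    }

    // Big-endian conversion of 8 bytes starting at off.
    private static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }
}
```

Calling `BufferedUuidSketch.randomUUID()` from several threads spreads the requests over the stripes; a single stripe still serializes its callers, which is why the choice of the underlying PRNG continues to matter.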


Aleksey Shipilev has updated the pull request incrementally with two additional 
commits since the last revision:

 - Handle privileged properties
 - Use ByteArray to convert. Do version/variant preparations straight on
   locals. Move init out of optimistic lock section.
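
To illustrate what "version/variant preparations straight on locals" could look like, here is a small sketch that compares masking individual bytes in the array with masking two `long` locals after conversion. It is an assumption-laden illustration rather than the PR's code: the class and method names are hypothetical, and `java.nio.ByteBuffer` is used as a public stand-in for the internal `ByteArray` helper mentioned in the commit note.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical sketch comparing two ways of stamping version 4 and the
// IETF variant onto 16 random bytes.
public class VersionVariantSketch {

    // Byte-array style: patch bytes 6 and 8, then convert to two longs.
    static UUID viaByteMasks(byte[] random16) {
        byte[] data = random16.clone();
        data[6] &= 0x0f; // clear the version nibble
        data[6] |= 0x40; // set version 4
        data[8] &= 0x3f; // clear the variant bits
        data[8] |= 0x80; // set the IETF variant
        ByteBuffer bb = ByteBuffer.wrap(data); // big-endian by default
        return new UUID(bb.getLong(0), bb.getLong(8));
    }

    // Locals style: convert first, then apply the same masks to the longs.
    static UUID viaLongMasks(byte[] random16) {
        ByteBuffer bb = ByteBuffer.wrap(random16);
        long msb = bb.getLong(0);
        long lsb = bb.getLong(8);
        msb = (msb & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L;
        lsb = (lsb & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }
}
```

Both methods produce the same UUID for the same input bytes; the second avoids writing back into the array, which matters when the array is a shared buffer.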

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14135/files
  - new: https://git.openjdk.org/jdk/pull/14135/files/51dc2903..be38dffe

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14135&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14135&range=00-01

  Stats: 211 lines in 2 files changed: 103 ins; 21 del; 87 mod
  Patch: https://git.openjdk.org/jdk/pull/14135.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14135/head:pull/14135

PR: https://git.openjdk.org/jdk/pull/14135
