On Thu, 7 Jan 2021 18:50:05 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> Removing the UUID clone cache and running the microbenchmark along with the 
>> changes in #1933:
>> 
>> Benchmark                                     (size)   Mode  Cnt    Score     Error   Units
>> UUIDBench.fromType3Bytes                       20000  thrpt   12    2.182 ±   0.090  ops/us
>> UUIDBench.fromType3Bytes:·gc.alloc.rate        20000  thrpt   12  439.020 ±  18.241  MB/sec
>> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  264.022 ±   0.003    B/op
>> 
>> The goal now is to simplify the digest code and compare alternatives.
>
> I've run various tests and concluded that the `VarHandle`-based code 
> matches or improves upon the `Unsafe`-riddled code in `ByteArrayAccess`. I 
> then went ahead and consolidated `ByteArrayAccess` to use a similar code 
> pattern for consistency, which amounts to a good cleanup.
> 
> With MD5 intrinsics disabled, I get this baseline:
> 
> Benchmark                                     (size)   Mode  Cnt    Score     Error   Units
> UUIDBench.fromType3Bytes                       20000  thrpt   12    1.245 ±   0.077  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  488.042 ±   0.004    B/op
> 
> With the current patch here (not including #1933):
> 
> Benchmark                                     (size)   Mode  Cnt    Score     Error   Units
> UUIDBench.fromType3Bytes                       20000  thrpt   12    1.431 ±   0.106  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  408.035 ±   0.006    B/op
> 
> If I isolate the `ByteArrayAccess` changes, I get performance-neutral or 
> slightly better numbers compared to the baseline for these tests:
> 
> Benchmark                                     (size)   Mode  Cnt    Score     Error   Units
> UUIDBench.fromType3Bytes                       20000  thrpt   12    1.317 ±   0.092  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm   20000  thrpt   12  488.042 ±   0.004    B/op

Thanks for the performance enhancement, I will take a look.
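
For anyone following the thread, the general `VarHandle` pattern mentioned above can be sketched roughly as below. This is only an illustration, not the code from the patch: the class name `ByteArrayViewSketch` and the helpers `b2iLittle`/`i2bLittle` are hypothetical names chosen for this example. It relies only on `MethodHandles.byteArrayViewVarHandle`, which gives bounds-checked, endian-aware `int` access to a `byte[]` without `Unsafe`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch: names are illustrative, not taken from the patch.
final class ByteArrayViewSketch {
    // View a byte[] as little-endian ints; plain get/set tolerate unaligned offsets.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Copy len bytes (a multiple of 4) from in[inOfs..] into out[outOfs..] as ints.
    static void b2iLittle(byte[] in, int inOfs, int[] out, int outOfs, int len) {
        int end = inOfs + len;
        while (inOfs < end) {
            // The VarHandle performs the bounds check on the byte[] itself.
            out[outOfs++] = (int) INT_LE.get(in, inOfs);
            inOfs += 4;
        }
    }

    // Copy ints from in[inOfs..] into len bytes (a multiple of 4) of out[outOfs..].
    static void i2bLittle(int[] in, int inOfs, byte[] out, int outOfs, int len) {
        int end = outOfs + len;
        while (outOfs < end) {
            INT_LE.set(out, outOfs, in[inOfs++]);
            outOfs += 4;
        }
    }
}
```

Whether this shape matches or beats an `Unsafe`-based version on a given platform is what the benchmark numbers quoted above are meant to show; the sketch itself makes no performance claim.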

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
