On Sun, 20 Dec 2020 20:45:55 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>>> I have to say that introducing a ThreadLocal here seems like a step in the 
>>> wrong direction. With a ThreadLocal, if I read this correctly, a 
>>> MessageDigest will be cached with each thread that ever calls this API, and 
>>> it won't be garbage collected until the thread dies. Some threads live 
>>> indefinitely, e.g., the ones in the fork-join common pool. Is this what we 
>>> want? Is UUID creation so frequent that each thread needs its own, or could 
>>> thread safety be handled using a simple locking protocol?
>> 
>> This is a good point. Significant effort has gone into recent releases to 
>> reduce the use of TLs in the (JDK-8245309, JDK-8245308, JDK-8245304, 
>> JDK-8245302) so adding new usages is a disappointing. So I think this PR 
>> does need good justifications, and probably a lot more exploration of 
>> options.
>
> As Stuart argues a TL approach is likely to look better in a microbenchmark, 
> but make things worse in other regards.
> 
> I started exploring options to this in #1855 (not logged an RFE, yet), and I 
> think there is potential to get most of the gains seen here without the 
> introduction of a `ThreadLocal`. The current patch also speeds up MD5.digest 
> for small chunks (+19% for 16 bytes, but less relative impact on larger 
> chunks, down to being lost in the noise on 16Kb). Speed-up of 
> `UUID.nameUUIDFromBytes` is somewhat modest right now (+4%, -17% 
> allocations), but I think it can be improved further without complicating 
> things too much. The MD5 intrinsic added by JDK-8250902 adds some constraints 
> on the implementation that holds back some polish. This can be fixed, but 
> requires some coordination.

A trick that could be used here instead of `ThreadLocal` is to store an 
instance of MD5 retrieved via MessageDigest.getInstance, but clone it before 
use. See [this 
commit](https://github.com/openjdk/jdk/pull/1855/commits/2f266316d62ca875c38e2f771412d02625414bf4).
 

This maintains thread safety, avoids the allure of TLs, and gives a substantial 
speed-up on the `UUIDBench.fromType3Bytes` micro (comparing it against the 
other optimizations that were already in #1855):

 
Benchmark                                                  (size)   Mode  Cnt   
 Score    Error   Units
UUIDBench.fromType3Bytes                                    20000  thrpt   12   
 1.523 ±  0.066  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  
408.036 ±  0.003    B/op

Benchmark                                                  (size)   Mode  Cnt   
 Score    Error   Units
UUIDBench.fromType3Bytes                                    20000  thrpt   12   
 2.186 ±  0.228  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  
264.023 ±  0.006    B/op

-------------

PR: https://git.openjdk.java.net/jdk/pull/1821

Reply via email to