Hi,

I've filed two RFEs:

- https://bugs.openjdk.java.net/browse/JDK-8196331 for providing a
  fast-path for ASCII/Latin1 input to Character.digit. Patch out
  for review
- https://bugs.openjdk.java.net/browse/JDK-8196334 for tracking
  the optimizations specific to UUID proposed here

Jon, if you want to provide a patch I'd be happy to sponsor. I have
a few ideas on how to retrofit existing behavior that we can discuss
offline. Also, if you haven't filed an OCA then this might be a good
time to do so[1]

Thanks!

/Claes

[1] http://openjdk.java.net/contribute/

On 2018-01-29 00:19, Claes Redestad wrote:
Hi Jon,

On 2018-01-27 17:05, Jon Chambers wrote:
Regardless, I wanted to call this optimization opportunity to your
attention, and would be happy to offer a proper patch if this seems like a
worthwhile change.

this does looks promising - and at least points out there might be
some performance issues lurking here!

Found another compatibility issue, however: The current
UUID.fromString allow use of any unicode digits parseable into
digits. Say, Arabic digits (U+0660 - U+0669):

jshell> UUID.fromString("\u0669-\u0669-\u0669-\u0669-\u0669");
$2 ==> 00000009-0009-0009-0009-000000000009

That you're getting some improvement from avoiding
Character.digit(..) stands out in a way, so I wondered if there's
something going on here... and it does seem that this method could
be optimized by providing a fast-path in the CharacterDataLatin1
plane...

I experimented with a few solutions, and the best I could find was
a 256 element byte[] lookup table for digits in CharacterDataLatin1
(spanning the entire 8-bit space of CharacterDataLatin1 allows us
to avoid a branch):

http://cr.openjdk.java.net/~redestad/scratch/char.digit.00/

This speeds up your micro testing java.util.UUID
(UUIDBenchmark.benchmarkUUIDFromString) by ~1.6x. The
improvement is there on all counters when looking at -prof perfnorm
(less loads, stores, branches, instructions, better CPI etc)[1], so seems
like a pretty straightforward thing to consider independently.

Your FastUUID implementation is still quite a bit faster, though, but
it'd be good to dig around for a bit to see if there are other more
general improvements like this to be had..

Thanks!

/Claes

[1]
Baseline:
Benchmark                                                    Mode Cnt     Score   Error  Units UUIDBenchmark.benchmarkUUIDFromString                        avgt 4     0.505 ± 0.066  us/op
UUIDBenchmark.benchmarkUUIDFromString:CPI avgt 1.024           #/op
UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-load-misses avgt          3.000           #/op UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-loads avgt 211.128           #/op UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-stores avgt         34.894           #/op UUIDBenchmark.benchmarkUUIDFromString:L1-icache-load-misses avgt          0.067           #/op UUIDBenchmark.benchmarkUUIDFromString:LLC-load-misses avgt          0.103           #/op
UUIDBenchmark.benchmarkUUIDFromString:LLC-loads avgt 0.812           #/op
UUIDBenchmark.benchmarkUUIDFromString:LLC-store-misses avgt          0.004           #/op UUIDBenchmark.benchmarkUUIDFromString:LLC-stores avgt 0.054           #/op UUIDBenchmark.benchmarkUUIDFromString:branch-misses avgt 13.269           #/op UUIDBenchmark.benchmarkUUIDFromString:branches avgt 301.931           #/op
UUIDBenchmark.benchmarkUUIDFromString:cycles avgt 1585.057           #/op
UUIDBenchmark.benchmarkUUIDFromString:dTLB-load-misses avgt          0.005           #/op UUIDBenchmark.benchmarkUUIDFromString:dTLB-loads avgt 208.788           #/op UUIDBenchmark.benchmarkUUIDFromString:dTLB-store-misses avgt         ≈ 10⁻⁴           #/op UUIDBenchmark.benchmarkUUIDFromString:dTLB-stores avgt 30.540           #/op UUIDBenchmark.benchmarkUUIDFromString:iTLB-load-misses avgt          0.002           #/op UUIDBenchmark.benchmarkUUIDFromString:iTLB-loads avgt 0.006           #/op UUIDBenchmark.benchmarkUUIDFromString:instructions avgt 1547.568           #/op

Results with http://cr.openjdk.java.net/~redestad/scratch/char.digit.00/ -- : Benchmark                                                    Mode Cnt     Score   Error  Units UUIDBenchmark.benchmarkUUIDFromString                        avgt 4     0.310 ± 0.022  us/op
UUIDBenchmark.benchmarkUUIDFromString:CPI avgt 0.832           #/op
UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-load-misses avgt          2.026           #/op UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-loads avgt 208.227           #/op UUIDBenchmark.benchmarkUUIDFromString:L1-dcache-stores avgt         30.673           #/op UUIDBenchmark.benchmarkUUIDFromString:L1-icache-load-misses avgt          0.052           #/op UUIDBenchmark.benchmarkUUIDFromString:LLC-load-misses avgt          0.042           #/op
UUIDBenchmark.benchmarkUUIDFromString:LLC-loads avgt 1.027           #/op
UUIDBenchmark.benchmarkUUIDFromString:LLC-store-misses avgt          0.007           #/op UUIDBenchmark.benchmarkUUIDFromString:LLC-stores avgt 0.044           #/op UUIDBenchmark.benchmarkUUIDFromString:branch-misses avgt 0.040           #/op UUIDBenchmark.benchmarkUUIDFromString:branches avgt 255.547           #/op
UUIDBenchmark.benchmarkUUIDFromString:cycles avgt 981.466           #/op
UUIDBenchmark.benchmarkUUIDFromString:dTLB-load-misses avgt          0.003           #/op UUIDBenchmark.benchmarkUUIDFromString:dTLB-loads avgt 213.797           #/op UUIDBenchmark.benchmarkUUIDFromString:dTLB-store-misses avgt         ≈ 10⁻⁴           #/op UUIDBenchmark.benchmarkUUIDFromString:dTLB-stores avgt 30.888           #/op UUIDBenchmark.benchmarkUUIDFromString:iTLB-load-misses avgt          0.002           #/op UUIDBenchmark.benchmarkUUIDFromString:iTLB-loads avgt 0.002           #/op UUIDBenchmark.benchmarkUUIDFromString:instructions avgt 1180.129           #/op

Reply via email to