On Fri, 2 Apr 2021 03:10:57 GMT, Dong Bo <don...@openjdk.org> wrote:

>> In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding.
>> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic
>> idea can be found at
>> http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>>
>> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
>> Tests in `test/jdk/java/util/Base64/` and
>> `compiler/intrinsics/base64/TestBase64.java` were run specifically to check the
>> correctness of the implementation.
>>
>> There can be illegal characters at the start of the input if the data is
>> MIME encoded.
>> There would be no benefit from using SIMD in this case, so the stub currently
>> uses non-SIMD instructions for MIME encoded data.
>>
>> A JMH micro, Base64Decode.java, is added for performance testing.
>> With different input lengths (upper-bounded by the parameter `maxNumBytes` in the
>> JMH micro),
>> we see ~2.5x improvements with long inputs and no regression with short
>> inputs for raw base64 decoding, and minor improvements (~10.95%) for MIME on
>> Kunpeng916.
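
For readers unfamiliar with why MIME input can begin with an illegal character, here is a minimal sketch using the public `java.util.Base64` API. It is not taken from the patch; the class name `MimeLeadingCharSketch` and its helper methods are hypothetical and exist only for illustration. It shows that the basic decoder rejects a leading line separator, while the MIME decoder skips characters outside the base64 alphabet.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: this class and its method names are not part of the patch.
class MimeLeadingCharSketch {

    // Returns true if the basic (RFC 4648) decoder rejects the input.
    static boolean basicRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            // The basic decoder throws on any character outside the alphabet.
            return true;
        }
    }

    // Decodes with the MIME decoder, which ignores line separators and
    // other characters that are not in the base64 alphabet.
    static String mimeDecode(String encoded) {
        byte[] decoded = Base64.getMimeDecoder().decode(encoded);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "SGVsbG8=" is the base64 form of "Hello"; the leading CRLF is
        // illegal for the basic decoder but tolerated by the MIME decoder.
        String input = "\r\nSGVsbG8=";
        System.out.println("basic decoder rejects input: " + basicRejects(input));
        System.out.println("MIME decoder yields: " + mimeDecode(input));
    }
}
```

Because the MIME decoder has to tolerate such characters anywhere in the stream, a vectorized loop cannot assume that every input byte maps to a valid 6-bit value, which is one way to read the motivation for the non-SIMD path described above.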
>>
>> The Base64Decode.java JMH micro-benchmark results:
>>
>> Benchmark                          (lineSize)  (maxNumBytes)  Mode  Cnt       Score      Error  Units
>>
>> # Kunpeng916, intrinsic
>> Base64Decode.testBase64Decode               4              1  avgt    5      48.614 ±    0.609  ns/op
>> Base64Decode.testBase64Decode               4              3  avgt    5      58.199 ±    1.650  ns/op
>> Base64Decode.testBase64Decode               4              7  avgt    5      69.400 ±    0.931  ns/op
>> Base64Decode.testBase64Decode               4             32  avgt    5      96.818 ±    1.687  ns/op
>> Base64Decode.testBase64Decode               4             64  avgt    5     122.856 ±    9.217  ns/op
>> Base64Decode.testBase64Decode               4             80  avgt    5     130.935 ±    1.667  ns/op
>> Base64Decode.testBase64Decode               4             96  avgt    5     143.627 ±    1.751  ns/op
>> Base64Decode.testBase64Decode               4            112  avgt    5     152.311 ±    1.178  ns/op
>> Base64Decode.testBase64Decode               4            512  avgt    5     342.631 ±    0.584  ns/op
>> Base64Decode.testBase64Decode               4           1000  avgt    5     573.635 ±    1.050  ns/op
>> Base64Decode.testBase64Decode               4          20000  avgt    5    9534.136 ±   45.172  ns/op
>> Base64Decode.testBase64Decode               4          50000  avgt    5   22718.726 ±  192.070  ns/op
>> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      63.558 ±    0.336  ns/op
>> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.504 ±    0.848  ns/op
>> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     120.591 ±    0.608  ns/op
>> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     324.314 ±    6.236  ns/op
>> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     532.678 ±    4.670  ns/op
>> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     678.126 ±    4.324  ns/op
>> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     771.603 ±    6.393  ns/op
>> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     889.608 ±    0.759  ns/op
>> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    3663.557 ±    3.422  ns/op
>> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7017.784 ±    9.128  ns/op
>> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  128670.660 ± 7951.521  ns/op
>> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  317113.667 ±  161.758  ns/op
>>
>> # Kunpeng916, default
>> Base64Decode.testBase64Decode               4              1  avgt    5      48.455 ±    0.571  ns/op
>> Base64Decode.testBase64Decode               4              3  avgt    5      57.937 ±    0.505  ns/op
>> Base64Decode.testBase64Decode               4              7  avgt    5      73.823 ±    1.452  ns/op
>> Base64Decode.testBase64Decode               4             32  avgt    5     106.484 ±    1.243  ns/op
>> Base64Decode.testBase64Decode               4             64  avgt    5     141.004 ±    1.188  ns/op
>> Base64Decode.testBase64Decode               4             80  avgt    5     156.284 ±    0.572  ns/op
>> Base64Decode.testBase64Decode               4             96  avgt    5     174.137 ±    0.177  ns/op
>> Base64Decode.testBase64Decode               4            112  avgt    5     188.445 ±    0.572  ns/op
>> Base64Decode.testBase64Decode               4            512  avgt    5     610.847 ±    1.559  ns/op
>> Base64Decode.testBase64Decode               4           1000  avgt    5    1155.368 ±    0.813  ns/op
>> Base64Decode.testBase64Decode               4          20000  avgt    5   19751.477 ±   24.669  ns/op
>> Base64Decode.testBase64Decode               4          50000  avgt    5   50046.586 ±  523.155  ns/op
>> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      64.130 ±    0.238  ns/op
>> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.096 ±    0.205  ns/op
>> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     118.849 ±    0.610  ns/op
>> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     331.177 ±    4.732  ns/op
>> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     549.117 ±    0.177  ns/op
>> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     702.951 ±    4.572  ns/op
>> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     799.566 ±    0.301  ns/op
>> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     923.749 ±    0.389  ns/op
>> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    4000.725 ±    2.519  ns/op
>> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7674.994 ±    9.281  ns/op
>> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  142059.001 ±  157.920  ns/op
>> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  355698.369 ±  216.542  ns/op
>
> Dong Bo has updated the pull request with a new target base due to a merge or
> a rebase. The incremental webrev excludes the unrelated changes brought in by
> the merge/rebase. The pull request contains six additional commits since the
> last revision:
>
>  - Merge branch 'master' into aarch64.base64.decode
>  - copyright
>  - trivial fixes
>  - Handling error in SIMD case with loops, combining two non-SIMD cases into
>    one code blob, addressing other comments
>  - Merge branch 'master' into aarch64.base64.decode
>  - 8256245: AArch64: Implement Base64 decoding intrinsic

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5802:

> 5800:     // The 1st character of the input can be illegal if the data is MIME encoded.
> 5801:     // We cannot benefits from SIMD for this case. The max line size of MIME
> 5802:     // encoding is 76, with the PreProcess80B blob, we actually use no-simd

"cannot benefit"

-------------

PR: https://git.openjdk.java.net/jdk/pull/3228
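
As a side note on the "max line size of MIME encoding is 76" remark in the quoted comment, the following is a minimal sketch, not part of the patch, that checks the line length produced by the JDK's default MIME encoder. The class name `MimeLineLengthSketch` and the helper `mimeLines` are hypothetical names chosen for this illustration.

```java
import java.util.Base64;

// Illustrative only: this class and its method are not part of the patch.
class MimeLineLengthSketch {

    // Encodes n zero bytes with the default MIME encoder and splits the
    // result on the CRLF line separator that the encoder inserts.
    static String[] mimeLines(int n) {
        byte[] data = new byte[n];
        String encoded = Base64.getMimeEncoder().encodeToString(data);
        return encoded.split("\r\n");
    }

    public static void main(String[] args) {
        // 100 input bytes encode to 136 base64 characters, which the
        // default MIME encoder breaks into lines of at most 76 characters.
        String[] lines = mimeLines(100);
        for (String line : lines) {
            System.out.println(line.length() + " chars");
        }
    }
}
```

With the CRLF separator included, a full line occupies 78 bytes of input, which fits in the 80-byte window suggested by the `PreProcess80B` name in the comment; that reading is an assumption, since the quoted comment is truncated.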