On Sat, 27 Mar 2021 08:58:03 GMT, Dong Bo <don...@openjdk.org> wrote:
> In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding.
> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic idea can be found at http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
>
> The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build.
> The tests in `test/jdk/java/util/Base64/` and `compiler/intrinsics/base64/TestBase64.java` were also run specifically to check the correctness of the implementation.
>
> There can be illegal characters at the start of the input if the data is MIME encoded.
> SIMD brings no benefit in that case, so the stub currently uses non-SIMD instructions for MIME-encoded data.
>
> A JMH micro-benchmark, Base64Decode.java, is added for performance testing.
> Across input lengths (upper-bounded by the JMH parameter `maxNumBytes`), we observe a ~2.5x improvement for long inputs and no regression for short inputs in raw Base64 decoding, and a minor improvement (~10.95%) for MIME decoding, on Kunpeng916.
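The ld4/tbl/tbx/st3 structure described above can be pictured with a scalar C++ model of one SIMD round. This is a minimal sketch under stated assumptions, not the stub's code: it assumes 16 byte lanes per register, the names `sixBits`, `decodeRound` and `kLanes` are hypothetical, and the tbl/tbx table lookups are replaced by a plain alphabet search so that only the ld4 de-interleave, the 6-bit packing and the st3 re-interleave are shown.

```cpp
#include <cstddef>
#include <cstdint>

using std::uint8_t;

// Hypothetical scalar stand-in for the tbl/tbx lookup: maps one Base64
// character to its 6-bit value, or returns 255 if it is not in the alphabet.
static uint8_t sixBits(uint8_t c) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  for (uint8_t v = 0; v < 64; ++v) {
    if (static_cast<uint8_t>(kAlphabet[v]) == c) return v;
  }
  return 255;  // illegal character
}

constexpr std::size_t kLanes = 16;  // assumed: 16 byte lanes per vector register

// Models one round: consumes 4 * kLanes input bytes and produces 3 * kLanes
// output bytes. Returns false if any input character is illegal.
static bool decodeRound(const uint8_t* src, uint8_t* dst) {
  uint8_t in[4][kLanes];
  // ld4: de-interleave, so in[k][i] holds character k of quartet i.
  for (std::size_t i = 0; i < kLanes; ++i)
    for (std::size_t k = 0; k < 4; ++k) in[k][i] = src[4 * i + k];

  uint8_t out[3][kLanes];
  for (std::size_t i = 0; i < kLanes; ++i) {
    uint8_t d0 = sixBits(in[0][i]), d1 = sixBits(in[1][i]);
    uint8_t d2 = sixBits(in[2][i]), d3 = sixBits(in[3][i]);
    // Any value above 63 sets bit 6 or 7, so one OR covers all four lanes.
    if ((d0 | d1 | d2 | d3) > 63) return false;
    // Pack four 6-bit values into three bytes (casts truncate to 8 bits).
    out[0][i] = static_cast<uint8_t>((d0 << 2) | (d1 >> 4));
    out[1][i] = static_cast<uint8_t>((d1 << 4) | (d2 >> 2));
    out[2][i] = static_cast<uint8_t>((d2 << 6) | d3);
  }
  // st3: re-interleave the three byte planes into the output stream.
  for (std::size_t i = 0; i < kLanes; ++i)
    for (std::size_t k = 0; k < 3; ++k) dst[3 * i + k] = out[k][i];
  return true;
}
```

For example, a round over sixteen copies of `TWFu` yields sixteen copies of `Man`. Writing ld4/st3 as explicit loops shows why the technique vectorizes well: after de-interleaving, every lane applies the same shift/or pattern, so 16 quartets can be handled per iteration.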
>
> The Base64Decode.java JMH micro-benchmark results:
>
> Benchmark                          (lineSize)  (maxNumBytes)  Mode  Cnt       Score      Error  Units
>
> # Kunpeng916, intrinsic
> Base64Decode.testBase64Decode               4              1  avgt    5      48.614 ±    0.609  ns/op
> Base64Decode.testBase64Decode               4              3  avgt    5      58.199 ±    1.650  ns/op
> Base64Decode.testBase64Decode               4              7  avgt    5      69.400 ±    0.931  ns/op
> Base64Decode.testBase64Decode               4             32  avgt    5      96.818 ±    1.687  ns/op
> Base64Decode.testBase64Decode               4             64  avgt    5     122.856 ±    9.217  ns/op
> Base64Decode.testBase64Decode               4             80  avgt    5     130.935 ±    1.667  ns/op
> Base64Decode.testBase64Decode               4             96  avgt    5     143.627 ±    1.751  ns/op
> Base64Decode.testBase64Decode               4            112  avgt    5     152.311 ±    1.178  ns/op
> Base64Decode.testBase64Decode               4            512  avgt    5     342.631 ±    0.584  ns/op
> Base64Decode.testBase64Decode               4           1000  avgt    5     573.635 ±    1.050  ns/op
> Base64Decode.testBase64Decode               4          20000  avgt    5    9534.136 ±   45.172  ns/op
> Base64Decode.testBase64Decode               4          50000  avgt    5   22718.726 ±  192.070  ns/op
> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      63.558 ±    0.336  ns/op
> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.504 ±    0.848  ns/op
> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     120.591 ±    0.608  ns/op
> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     324.314 ±    6.236  ns/op
> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     532.678 ±    4.670  ns/op
> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     678.126 ±    4.324  ns/op
> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     771.603 ±    6.393  ns/op
> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     889.608 ±    0.759  ns/op
> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    3663.557 ±    3.422  ns/op
> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7017.784 ±    9.128  ns/op
> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  128670.660 ± 7951.521  ns/op
> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  317113.667 ±  161.758  ns/op
>
> # Kunpeng916, default
> Base64Decode.testBase64Decode               4              1  avgt    5      48.455 ±    0.571  ns/op
> Base64Decode.testBase64Decode               4              3  avgt    5      57.937 ±    0.505  ns/op
> Base64Decode.testBase64Decode               4              7  avgt    5      73.823 ±    1.452  ns/op
> Base64Decode.testBase64Decode               4             32  avgt    5     106.484 ±    1.243  ns/op
> Base64Decode.testBase64Decode               4             64  avgt    5     141.004 ±    1.188  ns/op
> Base64Decode.testBase64Decode               4             80  avgt    5     156.284 ±    0.572  ns/op
> Base64Decode.testBase64Decode               4             96  avgt    5     174.137 ±    0.177  ns/op
> Base64Decode.testBase64Decode               4            112  avgt    5     188.445 ±    0.572  ns/op
> Base64Decode.testBase64Decode               4            512  avgt    5     610.847 ±    1.559  ns/op
> Base64Decode.testBase64Decode               4           1000  avgt    5    1155.368 ±    0.813  ns/op
> Base64Decode.testBase64Decode               4          20000  avgt    5   19751.477 ±   24.669  ns/op
> Base64Decode.testBase64Decode               4          50000  avgt    5   50046.586 ±  523.155  ns/op
> Base64Decode.testBase64MIMEDecode           4              1  avgt   10      64.130 ±    0.238  ns/op
> Base64Decode.testBase64MIMEDecode           4              3  avgt   10      82.096 ±    0.205  ns/op
> Base64Decode.testBase64MIMEDecode           4              7  avgt   10     118.849 ±    0.610  ns/op
> Base64Decode.testBase64MIMEDecode           4             32  avgt   10     331.177 ±    4.732  ns/op
> Base64Decode.testBase64MIMEDecode           4             64  avgt   10     549.117 ±    0.177  ns/op
> Base64Decode.testBase64MIMEDecode           4             80  avgt   10     702.951 ±    4.572  ns/op
> Base64Decode.testBase64MIMEDecode           4             96  avgt   10     799.566 ±    0.301  ns/op
> Base64Decode.testBase64MIMEDecode           4            112  avgt   10     923.749 ±    0.389  ns/op
> Base64Decode.testBase64MIMEDecode           4            512  avgt   10    4000.725 ±    2.519  ns/op
> Base64Decode.testBase64MIMEDecode           4           1000  avgt   10    7674.994 ±    9.281  ns/op
> Base64Decode.testBase64MIMEDecode           4          20000  avgt   10  142059.001 ±  157.920  ns/op
> Base64Decode.testBase64MIMEDecode           4          50000  avgt   10  355698.369 ±  216.542  ns/op

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5624:

> 5622: __ ld4(in0, in1, in2, in3, arrangement, __ post(src, 4 * size));
> 5623: 
> 5624: // we need unsigned saturationg substract, to make sure all input values

"saturating subtract"

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5649:

> 5647: __ orr(decL3, arrangement, decL3, decH3);
> 5648: 
> 5649: // check iilegal inputs, value larger than 63 (maximum of 6 bits)

"illegal inputs". Are there existing jtreg tests that cover these cases?

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5772:

> 5770: // The value of index 64 is set to 0, so that we know that we already get the
> 5771: // decoded data with the 1st lookup.
> 5772: static const uint8_t fromBase64ForSIMD[128] = {

This table and the one below seem to be identical to the first half of the NoSIMD tables. Can't you just use one set of 256-entry tables for both SIMD and non-SIMD algorithms?

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5803:

> 5801: Register dst = c_rarg3; // dest array
> 5802: Register doff = c_rarg4; // position for writing to dest array
> 5803: Register isURL = c_rarg5; // Base64 or URL chracter set

"character set"

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5830:

> 5828: 
> 5829: // The 1st character of the input can be illegal if the data is MIME encoded.
> 5830: // We can not benefits from SIMD for this case. The max line size of MIME

"cannot benefit"

-------------

PR: https://git.openjdk.java.net/jdk/pull/3228
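The saturating-subtract and two-lookup scheme discussed in the review comments (stub lines 5624-5649 and the `fromBase64ForSIMD` table at line 5772) can be modelled per byte lane in scalar C++. This is a hedged sketch, not the patch's code. It assumes that the 128-entry table has a lower half indexed by the character code and an upper half indexed by the saturated difference `c - 63`, and that illegal characters map to 255. It also relies on the documented AArch64 behaviour that a four-register `tbl` returns 0 for indices of 64 or more, while `tbx` leaves the destination lane unchanged. The names `Table`, `buildTable`, `tbl64`, `tbx64` and `decodeLane` are hypothetical.

```cpp
#include <array>
#include <cstdint>

using std::uint8_t;

// Hypothetical 128-entry table in the layout assumed above.
using Table = std::array<uint8_t, 128>;

static Table buildTable() {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  Table t;
  t.fill(255);  // 255 marks an illegal character
  for (uint8_t v = 0; v < 64; ++v) {
    uint8_t c = static_cast<uint8_t>(kAlphabet[v]);
    if (c < 64) {
      t[c] = v;              // lower half: indexed by the character code
    } else {
      t[64 + (c - 63)] = v;  // upper half: indexed by c - 63
    }
  }
  t[64] = 0;  // saturated index 0: the lower lookup already decoded the lane
  return t;
}

// Emulates a four-register TBL: indices >= 64 produce 0.
static uint8_t tbl64(const uint8_t* table, uint8_t idx) {
  return idx < 64 ? table[idx] : 0;
}

// Emulates a four-register TBX: indices >= 64 leave the destination unchanged.
static uint8_t tbx64(const uint8_t* table, uint8_t idx, uint8_t dst) {
  return idx < 64 ? table[idx] : dst;
}

// Decodes one byte lane; a result above 63 means the character is illegal.
static uint8_t decodeLane(const Table& t, uint8_t c) {
  uint8_t hiIdx = c > 63 ? static_cast<uint8_t>(c - 63) : 0;  // uqsub c, 63
  uint8_t lo = tbl64(t.data(), c);                  // lower-half lookup
  uint8_t hi = tbx64(t.data() + 64, hiIdx, hiIdx);  // upper-half lookup
  return static_cast<uint8_t>(lo | hi);             // orr
}
```

Under these assumptions a character below 64 gets nothing from the upper lookup (entry 64 is 0), a character from 64 to 126 gets nothing from the lower lookup, and a character of 127 or above misses both tables and keeps its saturated value, which is at least 64. A single "larger than 63" comparison therefore flags every illegal lane.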