On Thu, 24 Jun 2021 17:02:03 GMT, Scott Gibbons <sgibb...@openjdk.org> wrote:
>> Add the Base64 Decode intrinsic for x86 to utilize AVX-512 for acceleration. >> Also allows for performance improvement for non-AVX-512 enabled platforms. >> Due to the nature of MIME-encoded inputs, modify the intrinsic signature to >> accept an additional parameter (isMIME) for fast-path MIME decoding. >> >> A change was made to the signature of DecodeBlock in Base64.java to provide >> the intrinsic information as to whether MIME decoding was being done. This >> allows for the intrinsic to bypass the expensive setup of zmm registers from >> AVX tables, knowing there may be invalid Base64 characters every 76 >> characters or so. A change was also made here removing the restriction that >> the intrinsic must return an even multiple of 3 bytes decoded. This >> implementation handles the pad characters at the end of the string and will >> return the actual number of characters decoded. >> >> The AVX portion of this code will decode in blocks of 256 bytes per loop >> iteration, then in chunks of 64 bytes, followed by end fixup decoding. The >> non-AVX code is an assembly-optimized version of the java DecodeBlock and >> behaves identically. >> >> Running the Base64Decode benchmark, this change increases decode performance >> by an average of 2.6x with a maximum 19.7x for buffers > ~20k. The numbers >> are given in the table below. >> >> **Base Score** is without intrinsic support, **Optimized Score** is using >> this intrinsic, and **Gain** is **Base** / **Optimized**. >> >> >> Benchmark Name | Base Score | Optimized Score | Gain >> -- | -- | -- | -- >> testBase64Decode size 1 | 15.36 | 15.32 | 1.00 >> testBase64Decode size 3 | 17.00 | 16.72 | 1.02 >> testBase64Decode size 7 | 20.60 | 18.82 | 1.09 >> testBase64Decode size 32 | 34.21 | 26.77 | 1.28 >> testBase64Decode size 64 | 54.43 | 38.35 | 1.42 >> testBase64Decode size 80 | 66.40 | 48.34 | 1.37 >> testBase64Decode size 96 | 73.16 | 52.90 | 1.38 >> testBase64Decode size 112 | 84.93 | 51.82 | 1.64 >> testBase64Decode size 512 | 288.81 | 32.04 | 9.01 >> testBase64Decode size 1000 | 560.48 | 40.79 | 13.74 >> testBase64Decode size 20000 | 9530.28 | 483.37 | 19.72 >> testBase64Decode size 50000 | 24552.24 | 1735.07 | 14.15 >> testBase64MIMEDecode size 1 | 22.87 | 21.36 | 1.07 >> testBase64MIMEDecode size 3 | 27.79 | 25.32 | 1.10 >> testBase64MIMEDecode size 7 | 44.74 | 43.81 | 1.02 >> testBase64MIMEDecode size 32 | 142.69 | 129.56 | 1.10 >> testBase64MIMEDecode size 64 | 256.90 | 243.80 | 1.05 >> testBase64MIMEDecode size 80 | 311.60 | 310.80 | 1.00 >> testBase64MIMEDecode size... > > Scott Gibbons has updated the pull request incrementally with one additional > commit since the last revision: > > Fixed Windows register stomping. We found this optimization causes https://bugs.openjdk.org/browse/JDK-8321599 ------------- PR Comment: https://git.openjdk.org/jdk/pull/4368#issuecomment-1847357495