On Fri, 25 Feb 2022 09:19:33 GMT, Claes Redestad <[email protected]> wrote:
>> Splitting out these micro changes from #7231
>>
>> - Clean up and simplify setup and code
>> - Add variants with different inputs with varying lengths and encoding
>> weights, but also relevant mixes of each so that we both cover interesting
>> corner cases but also verify that performance behave when there's a
>> multitude of input shapes. Both simple and mixed variants are interesting
>> diagnostic tools.
>> - Drop all charsets from the default run configuration except UTF-8.
>> Motivation: Defaults should give good coverage but try to keep runtimes at
>> bay. Additionally if the charset tested can't encode the higher codepoints
>> used in these micros the results might be misleading. If you for example
>> test using ISO-8859-1 the UTF-16 bytes in StringDecode.decodeUTF16 will have
>> all been replaced by `?`, so the test is effectively the same as testing
>> ASCII-only.
>
> Claes Redestad has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Brent review
test/micro/org/openjdk/bench/java/lang/StringDecode.java line 72:
> 70: public void setup() {
> 71: charset = Charset.forName(charsetName);
> 72: asciiString = LOREM.substring(0, 32).getBytes(charset);
This is problematic IMO in that it's missing short strings such as "Claes".
Average Java strings are about 32 bytes long AFAICR, and people writing
(vectorized) ijntrinsics have a nasty habit of optimizing for long strings, to
the detriment of typical-length ones.
Whether we like it or not, people will optimize for benchmarks, so it's
important that benchmark data is realistic. The shortest here is 15 bytes, as
far as I can see. I'd certainly include a short string of just a few bytes so
that intrinsics don't cause regressions in important cases.
-------------
PR: https://git.openjdk.java.net/jdk/pull/7516