On Wed, 15 Mar 2023 13:42:22 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> Can you check what happens when adding many more inputs to the dataset, including
>> non-Latin chars as well, and use `-prof perfnorm` to check what `perf` reports
>> re branches/branch-misses?
>>
>> You can use `SplittableRandom` to pre-populate an array of inputs whose
>> sequence is "random" but still allows deterministic benchmarking, and feed the
>> benchmark method by cycling through the pre-computed inputs.
>> In the real world I expect `isDigit` to be called on different input types, with
>> C2 laying out both branches based on the previous input distribution and with a
>> confused branch predictor; that would allow comparing against something that
>> looks a bit closer to the real world (TBD, I know).
>> In that case I expect a single cmp + mask to work better, depending on the
>> Latin input distribution/occurrence.
>
> I created a randomized version of `Characters.isDigit` which tests with code
> points picked at random such that every category (Latin1, negative, different
> planes, unassigned) is equally probable.
>
> Baseline:
>
>     Benchmark                 (codePoint)  Mode  Cnt  Score   Error  Units
>     Characters.isDigitRandom         1632  avgt   15  5.503 ± 0.371  ns/op
>
> Current PR:
>
>     Benchmark                 (codePoint)  Mode  Cnt  Score   Error  Units
>     Characters.isDigitRandom         1632  avgt   15  5.393 ± 0.336  ns/op
>
> Using StringLatin1.canEncode:
>
>     Benchmark                 (codePoint)  Mode  Cnt  Score   Error  Units
>     Characters.isDigitRandom         1632  avgt   15  5.377 ± 0.322  ns/op
>
> It seems the PR still gives a small improvement in this scenario, and the
> StringLatin1.canEncode regression disappears.
>
> In the real world ASCII/Latin1 seems to dominate most data, so this scenario
> is perhaps not very realistic.
>
> I'm running this on a Mac, so I cannot try `-prof perfnorm`.

Many thanks for trying it. Yes, I was indeed curious about the "StringLatin1.canEncode regression" case.
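A minimal sketch of the input generation discussed in the quoted comments, in plain Java without JMH. Everything here is an assumption of the sketch and not the `Characters.isDigitRandom` benchmark from the PR: the hypothetical class name `RandomCodePoints`, the fixed seed, the power-of-two size, and the exact category ranges (in particular, "unassigned" is read as `Character.getType(cp) == Character.UNASSIGNED`). It pre-populates an array with `SplittableRandom`, so the sequence looks random but is identical on every run, picks each of five categories with equal probability, and cycles through the array:

```java
import java.util.SplittableRandom;

// Hypothetical helper, not part of the JDK or of the PR: a deterministic,
// "random-looking" sequence of code points for a benchmark to cycle through.
final class RandomCodePoints {

    private final int[] codePoints;
    private int index; // position of the next value returned by next()

    RandomCodePoints(int size, long seed) {
        // Power-of-two size lets next() wrap around with a mask instead of a modulo.
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a positive power of two: " + size);
        }
        SplittableRandom rng = new SplittableRandom(seed); // same seed => same inputs on every run
        codePoints = new int[size];
        for (int i = 0; i < size; i++) {
            codePoints[i] = pick(rng);
        }
    }

    // Chooses one of five categories with equal probability, then a value inside it.
    static int pick(SplittableRandom rng) {
        switch (rng.nextInt(5)) {
            case 0:
                return rng.nextInt(0, 0x100);                              // Latin1
            case 1:
                return rng.nextInt(Integer.MIN_VALUE, 0);                  // negative (invalid)
            case 2:
                return rng.nextInt(0x100, 0x10000);                        // BMP above Latin1
            case 3:
                return rng.nextInt(0x10000, Character.MAX_CODE_POINT + 1); // supplementary planes
            default: {
                // Rejection sampling: retry until the code point is unassigned.
                int cp;
                do {
                    cp = rng.nextInt(0, Character.MAX_CODE_POINT + 1);
                } while (Character.getType(cp) != Character.UNASSIGNED);
                return cp;
            }
        }
    }

    // Returns the next pre-computed input, wrapping around at the end of the array.
    int next() {
        int cp = codePoints[index];
        index = (index + 1) & (codePoints.length - 1);
        return cp;
    }

    // Defensive copy, e.g. for checking determinism.
    int[] toArray() {
        return codePoints.clone();
    }
}
```

In a JMH benchmark, an instance like this would typically be held in a `@State` object with its size exposed as a `@Param`, and each invocation would return something like `Character.isDigit(inputs.next())` so the result is not dead-code eliminated. The mask-based wrap-around is chosen to keep the cycling overhead small next to the few ns/op being measured; the array read itself is what makes the benchmark partly memory bound. Because a large share of the code space is unassigned, the rejection loop ends quickly.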
I would still modify the benchmark to use pre-computed inputs generated "semi-randomly" according to the strategies/biases mentioned above. I know that will sadly make it memory bound, because the inputs have to be read, but the size of the input array can be a benchmark parameter, together with the bias, e.g. "latin", "mix", "non-latin". It would benefit future tests on this, although it could be provided as a separate PR.

-------------

PR: https://git.openjdk.org/jdk/pull/13040