On Mon, 1 Feb 2021 13:11:52 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:
> Since Compact Strings implementation, there are simple methods in String and > StringBuilders: `coder()`, `value()`, `isLatin1()`. They are mostly there to > capture `COMPACT_STRINGS` flag that would fold to "false" when compact > strings are disabled with `-XX:-CompactStrings`. > > This was to guarantee the same performance when Compact String are > unimplemented and/or have surprising performance behaviors. Quite some time > had passed since JDK 9 release, and we can consider simplifying these. For > example, ignoring `COMPACT_STRINGS` and inlining the methods simplifies the > code. Some new code added to `String` after JDK 9 already has these methods > inlined, checking `coder` directly. > > The major drawback is that supplying `-XX:-CompactStrings` would have to go > through the same `coder == UTF16` paths the "normal" Compact Strings are > going. It would not be folded to the most optimal form by the optimizer, > because `coder` field value is generally opaque to optimizers. > > On the flip side, there is an advantage on paths that do not have optimizer > ready (yet), for example at startup. Hello World startup improves for about > 0.3 msec (or ~1.3%). The improvement is quite probably more visible on larger > workloads, i.e. large application servers. (In fact, this is where I spotted > this opportunity: OpenLiberty startup profiles). > > Interpreter-only "Hello World": > > > Performance counter stats for 'taskset -c 13 build/baseline/bin/java > -Xms128m -Xmx128m -Xint Hello' (5000 runs): > > 23.40 msec task-clock # 0.933 CPUs utilized > ( +- 0.01% ) > 86,133,190 cycles # 3.680 GHz > ( +- 0.01% ) > 79,854,588 instructions # 0.93 insn per cycle > ( +- 0.00% ) > > 0.02507105 +- 0.00000343 seconds time elapsed ( +- 0.01% ) > > Performance counter stats for 'taskset -c 13 > build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m -Xint > Hello' (5000 runs): > > 23.07 msec task-clock # 0.935 CPUs utilized > ( +- 0.01% ) > 84,877,118 cycles # 3.679 GHz > ( +- 0.01% ) > 79,231,598 instructions # 0.93 insn per cycle > ( +- 0.00% ) > > 0.02466467 +- 0.00000382 seconds time elapsed ( +- 0.02% ) > > > > OpenLiberty startup: > > > Before: 2.296, 2.281, 2.291 (seconds) > After: 2.254, 2.267, 2.272 (seconds) > > > Additional tests: > - [x] Linux `tier1` default (passes) > - [x] Linux `tier1` `-XX:-CompactStrings` Was this abandoned out of concern there'd be noticeable regressions when `COMPACT_STRING` is `false`? > > It would be useful to drive a discussion of removing fallback provided > > by COMPACT_STRINGS? entirely. > > It was retained initially as a hedge against bugs in the new implementation. > > It adds complexity to the implementation and can hide implementation bugs. > > Isn't COMPACT_STRING inherently slower if you are processing predominantly > Chinese/Japanese/Korean text? I'm curious how common it is to deploy with `-XX:-CompactStrings` in such locales. Does anyone have any real world data on this on JDK 11 or later? In most apps I'd expect a large number of strings created and retained to be latin1-encodable independent of locale (think: XML/JSON/HTML tags/attributes, many of the Strings created and passed around by the JDK and libraries etc..), so it's not clear-cut that `+CompactStrings` is always inherently slower there. And if it is I think the slowness might stem mainly from things like speculatively assuming latin1 in some places when decoding character data etc. Since JDK 9 there has seen several useful optimizations that might have removed some rough edges here, e.g., https://bugs.openjdk.java.net/browse/JDK-8231717 Academics aside, if `-CompactStrings` is commonly used and this patch would harm performance in that mode (would it, really?) then we shouldn't do this for a small startup win. ------------- PR: https://git.openjdk.java.net/jdk/pull/2334