On Mon, 1 Feb 2021 13:11:52 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:

> Since Compact Strings implementation, there are simple methods in String and 
> StringBuilders: `coder()`, `value()`, `isLatin1()`. They are mostly there to 
> capture `COMPACT_STRINGS` flag that would fold to "false" when compact 
> strings are disabled with `-XX:-CompactStrings`.
> 
> This was to guarantee the same performance when Compact String are 
> unimplemented and/or have surprising performance behaviors. Quite some time 
> had passed since JDK 9 release, and we can consider simplifying these. For 
> example, ignoring `COMPACT_STRINGS` and inlining the methods simplifies the 
> code. Some new code added to `String` after JDK 9 already has these methods 
> inlined, checking `coder` directly.
> 
> The major drawback is that supplying `-XX:-CompactStrings` would have to go 
> through the same `coder == UTF16` paths the "normal" Compact Strings are 
> going. It would not be folded to the most optimal form by the optimizer, 
> because `coder` field value is generally opaque to optimizers.
> 
> On the flip side, there is an advantage on paths that do not have optimizer 
> ready (yet), for example at startup. Hello World startup improves for about 
> 0.3 msec (or ~1.3%). The improvement is quite probably more visible on larger 
> workloads, i.e. large application servers. (In fact, this is where I spotted 
> this opportunity: OpenLiberty startup profiles).
> 
> Interpreter-only "Hello World":
> 
> 
>  Performance counter stats for 'taskset -c 13 build/baseline/bin/java 
> -Xms128m -Xmx128m -Xint Hello' (5000 runs):
> 
>              23.40 msec task-clock                #    0.933 CPUs utilized    
>         ( +-  0.01% )
>         86,133,190      cycles                    #    3.680 GHz              
>         ( +-  0.01% )
>         79,854,588      instructions              #    0.93  insn per cycle   
>         ( +-  0.00% )
> 
>         0.02507105 +- 0.00000343 seconds time elapsed  ( +-  0.01% )
> 
>  Performance counter stats for 'taskset -c 13 
> build/linux-x86_64-server-release/images/jdk/bin/java -Xms128m -Xmx128m -Xint 
> Hello' (5000 runs):
> 
>              23.07 msec task-clock                #    0.935 CPUs utilized    
>         ( +-  0.01% )
>         84,877,118      cycles                    #    3.679 GHz              
>         ( +-  0.01% )
>         79,231,598      instructions              #    0.93  insn per cycle   
>         ( +-  0.00% )
> 
>         0.02466467 +- 0.00000382 seconds time elapsed  ( +-  0.02% )
> 
> 
> 
> OpenLiberty startup:
> 
> 
>  Before: 2.296, 2.281, 2.291 (seconds)
>  After:  2.254, 2.267, 2.272 (seconds)
> 
> 
> Additional tests:
>  - [x] Linux `tier1` default (passes)
>  - [x] Linux `tier1` `-XX:-CompactStrings`

Was this abandoned out of concern there'd be noticeable regressions when 
`COMPACT_STRING` is `false`? 

> > It would be useful to drive a discussion of removing fallback provided
> > by COMPACT_STRINGS? entirely.
> > It was retained initially as a hedge against bugs in the new implementation.
> > It adds complexity to the implementation and can hide implementation bugs.
> 
> Isn't COMPACT_STRING inherently slower if you are processing predominantly 
> Chinese/Japanese/Korean text?

I'm curious how common it is to deploy with `-XX:-CompactStrings` in such 
locales. Does anyone have any real world data on this on JDK 11 or later?

In most apps I'd expect a large number of strings created and retained to be 
latin1-encodable independent of locale (think: XML/JSON/HTML tags/attributes, 
many of the Strings created and passed around by the JDK and libraries etc..), 
so it's not clear-cut that `+CompactStrings` is always inherently slower there. 
And if it is I think the slowness might stem mainly from things like 
speculatively assuming latin1 in some places when decoding character data etc. 
Since JDK 9 there has seen several useful optimizations that might have removed 
some rough edges here, e.g., https://bugs.openjdk.java.net/browse/JDK-8231717 

Academics aside, if `-CompactStrings` is commonly used and this patch would 
harm performance in that mode (would it, really?) then we shouldn't do this for 
a small startup win.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2334

Reply via email to