Hi, while investigating an issue I've found out that performance of expression (1) can be significantly improved by turning it into (2)
---------------------------------------------------------------------------------------------------- //str is an instance of java.lang.String (1) return new StringBuilder().append(str, beginIndex, endIndex).toString(); ----> (2) String substring = str.substring(beginIndex, endIndex); return new StringBuilder().append(substring).toString(); ---------------------------------------------------------------------------------------------------- Using benchmark [1] I've got the following results: Benchmark (length) Mode Cnt Score Error Units StringBuilderAppendBenchmark.appendBounds 10 avgt 20 21,161 ± 0,545 ns/op StringBuilderAppendBenchmark.appendBounds 100 avgt 20 72,230 ± 3,452 ns/op StringBuilderAppendBenchmark.appendBounds 1000 avgt 20 476,256 ± 12,044 ns/op StringBuilderAppendBenchmark.appendSubString 10 avgt 20 26,023 ± 0,871 ns/op StringBuilderAppendBenchmark.appendSubString 100 avgt 20 38,824 ± 1,185 ns/op StringBuilderAppendBenchmark.appendSubString 1000 avgt 20 214,072 ± 3,372 ns/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 10 avgt 20 80,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 100 avgt 20 320,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 1000 avgt 20 2120,001 ± 0,001 B/op StringBuilderAppendBenchmark.appendSubString:·gc.alloc.rate.norm 10 avgt 20 96,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendSubString:·gc.alloc.rate.norm 100 avgt 20 192,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendSubString:·gc.alloc.rate.norm 1000 avgt 20 1088,000 ± 0,001 B/op The reason for such a difference seems to be @HotSpotIntrinsicCandidate over StringBuilder.append(String), while StringBuilder.append(CharSerquence, int, int) copies chars/bytes one by one. I've prepared a patch [2] and applied it to fresh sources of JDK 11. After rebuild the same benchmark demonstrated severe performance degradation: - before patch Benchmark (length) Mode Cnt Score Error Units StringBuilderAppendBenchmark.appendBounds 10 avgt 20 21,592 ± 0,763 ns/op StringBuilderAppendBenchmark.appendBounds 100 avgt 20 72,241 ± 2,799 ns/op StringBuilderAppendBenchmark.appendBounds 1000 avgt 20 488,106 ± 43,042 ns/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 10 avgt 20 80,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 100 avgt 20 320,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 1000 avgt 20 2132,001 ± 10,691 B/op - after patch Benchmark (length) Mode Cnt Score Error Units StringBuilderAppendBenchmark.appendBounds 10 avgt 20 40,186 ± 1,828 ns/op StringBuilderAppendBenchmark.appendBounds 100 avgt 20 89,031 ± 6,664 ns/op StringBuilderAppendBenchmark.appendBounds 1000 avgt 20 669,934 ± 76,888 ns/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 10 avgt 20 152,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 100 avgt 20 440,000 ± 0,001 B/op StringBuilderAppendBenchmark.appendBounds:·gc.alloc.rate.norm 1000 avgt 20 2688,001 ± 0,002 B/op As I see the reason of degradation is that patched code now dodges intrinsified StringBuilder.append(String) and calls directly AbstractStringBuilder.append(String) which in spite of utilizing System::arraycopy performes worse than intrinsic. So my question is should I rework the patch somehow to make it perform better or I need to apply the transformation mentioned above ad hoc? Appendix: 1) benchmark @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"}) public class StringBuilderAppendBenchmark { @Benchmark @SuppressWarnings("StringBufferReplaceableByString") public String appendSubString(Data data) { String str = data.str; int beginIndex = data.beginIndex; int endIndex = data.endIndex; String substring = str.substring(beginIndex, endIndex); return new StringBuilder().append('L').append(substring).append(';').toString(); } @Benchmark public String appendBounds(Data data) { String str = data.str; int beginIndex = data.beginIndex; int endIndex = data.endIndex; return new StringBuilder().append('L').append(str, beginIndex, endIndex).append(';').toString(); } @State(Scope.Thread) public static class Data { String str; @Param({"10", "100", "1000"}) private int length; private int beginIndex; private int endIndex; @Setup public void setup() { str = randomString(); beginIndex = length / 4; endIndex = length / 4* 3; } private String randomString() { ThreadLocalRandom random = ThreadLocalRandom.current(); char[] chars = "abcdefghijklmnopqrstuvwxyz".toCharArray(); StringBuilder sb = new StringBuilder(length); for (int i = 0; i < length; i++) { char c = chars[random.nextInt(chars.length)]; sb.append(c); } return sb.toString(); } } } 2) patch diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java @@ -626,10 +626,7 @@ s = "null"; } checkRange(start, end, s.length()); - int len = end - start; - ensureCapacityInternal(count + len); - appendChars(s, start, end); - return this; + return append(s.subSequence(start, end).toString()); } /** @@ -1686,27 +1683,6 @@ this.count = count + end - off; } - private final void appendChars(CharSequence s, int off, int end) { - if (isLatin1()) { - byte[] val = this.value; - for (int i = off, j = count; i < end; i++) { - char c = s.charAt(i); - if (StringLatin1.canEncode(c)) { - val[j++] = (byte)c; - } else { - count = j; - inflate(); - StringUTF16.putCharsSB(this.value, j, s, i, end); - count += end - i; - return; - } - } - } else { - StringUTF16.putCharsSB(this.value, count, s, off, end); - } - count += end - off; - } - /* IndexOutOfBoundsException, if out of bounds */ private static void checkRange(int start, int end, int len) { if (start < 0 || start > end || end > len) {