On Sat, 29 Jun 2024 11:53:31 GMT, Shaojin Wen <d...@openjdk.org> wrote:
> We need a String format solution with good performance. String Template was > once expected, but it has been removed. j.u.Formatter is powerful, but its > performance is not good enough. > > This PR implements a subset of j.u.Formatter capabilities. The performance is > good enough that it is a fastpath for commonly used functions. When the > supported functions are exceeded, it will fall back to using j.u.Formatter. > > The performance of this implementation is good enough, the fastpath has low > detection cost, There is no noticeable performance degradation when falling > back to j.u.Formatter via fastpath. > > Below is a comparison of String.format and concat-based and StringBuilder: > > * benchmark java code > > public class StringFormat { > @Benchmark > public String stringIntFormat() { > return "%s %d".formatted(s, i); > } > > @Benchmark > public String stringIntConcat() { > return s + " " + i; > } > > @Benchmark > public String stringIntStringBuilder() { > return new StringBuilder(s).append(" ").append(i).toString(); > } > } > > > * benchmark number on macbook m1 pro > > Benchmark Mode Cnt Score Error Units > StringFormat.stringIntConcat avgt 15 6.541 ? 0.056 ns/op > StringFormat.stringIntFormat avgt 15 17.399 ? 0.133 ns/op > StringFormat.stringIntStringBuilder avgt 15 8.004 ? 0.063 ns/op > > > From the above data, we can see that the implementation of fastpath reduces > the performance difference between String.format and StringBuilder from 10 > times to 2~3 times. > > The implementation of fastpath supports the following four specifiers, which > can appear at most twice and support a width of 1 to 9. > > d > x > X > s > > If necessary, we can add a few more. > > > Below is a comparison of performance numbers running on a MacBook M1, showing > a significant performance improvement. > > -Benchmark Mode Cnt Score Error Units > (baseline) > -StringFormat.complexFormat avgt 15 895.954 ? 52.541 ns/op > -StringFormat.decimalFormat avgt 15 277.420 ? 18.254 ns/op > -StringFormat.stringFormat avgt 15 66.787 ? 2.715 ns/op > -StringFormat.stringIntFormat avgt 15 81.046 ? 1.879 ns/op > -StringFormat.widthStringFormat avgt 15 38.897 ? 0.114 ns/op > -StringFormat.widthStringIntFormat avgt 15 109.841 ? 1.028 ns/op > > +Benchmark Mode Cnt Score Error Units > (current f925339e93fdf7a281462554ce5d73139bd0f0cd) > +StringFormat.complexF... src/java.base/share/classes/java/lang/StringFormat.java line 65: > 63: int max = format.length(); > 64: int off = format.indexOf('%'); > 65: if (off == -1 || off + 1 == max) { If there is no `%` in the format string, should this just return the format string as result? Suggestion: int off = format.indexOf('%'); if (off == -1) { // no formatting to be done return format; } int max = format.length(); if (off + 1 == max) { src/java.base/share/classes/java/lang/StringFormat.java line 82: > 80: conv = format.charAt(off + 2); > 81: } > 82: } is it worth handling `width > 9`? Suggestion: for (int i = 2; conv >= '1' && conv <= '9' && off + i < max; i++) { width = (width * 10) + (conv - '0'); if (off + i < max) { conv = format.charAt(off + i); } } src/java.base/share/classes/java/lang/StringFormat.java line 90: > 88: if (conv == STRING) { > 89: arg = String.valueOf(arg); > 90: } should these conditions be combined? Suggestion: if (conv == STRING) { if (isBigInt(arg)) { conv = DECIMAL_INTEGER; } else { arg = String.valueOf(arg); } } src/java.base/share/classes/java/lang/StringFormat.java line 102: > 100: int max = format.length(); > 101: int off0 = format.indexOf('%'); > 102: if (off0 == -1 || off0 + 1 == max) { similar to above, if `%` is not found, return `format` Suggestion: int off0 = format.indexOf('%'); if (off0 == -1) { // no formatting to be done return format; } int max = format.length(); if (off0 + 1 == max) { src/java.base/share/classes/java/lang/StringFormat.java line 110: > 108: width0 = conv0 - '0'; > 109: if (off0 + 2 < max) { > 110: conv0 = format.charAt(off0 + 2); is it worth handling `width0 > 9`? Suggestion: for (int i = 2; conv0 >= '1' && conv0 <= '9' && off0 + i < max; i++) { width0 = (width0 * 10) + (conv0 - '0'); if (off0 + i < max) { conv0 = format.charAt(off0 + i); src/java.base/share/classes/java/lang/StringFormat.java line 123: > 121: width1 = conv1 - '0'; > 122: if (off1 + 2 < max) { > 123: conv1 = format.charAt(off1 + 2); is it worth handling `width1 > 9`? Suggestion: for (int i = 2; conv1 >= '1' && conv1 <= '9' && off1 + i < max; i++) { width1 = (width1 * 10) + (conv1 - '0'); if (off1 + i < max) { conv1 = format.charAt(off1 + i); src/java.base/share/classes/java/lang/StringFormat.java line 151: > 149: coder |= str.coder(); > 150: arg1 = String.valueOf(str); > 151: } similar to above, should these conditions be consolidated? Suggestion: String str; byte coder = format.coder(); if (conv0 == STRING) { if (isBigInt(arg0)) { conv0 = DECIMAL_INTEGER; } else { str = String.valueOf(arg0); coder |= str.coder(); arg0 = str; } } if (conv1 == STRING) { if (isBigInt(arg1)) { conv1 = DECIMAL_INTEGER; } else { str = String.valueOf(arg1); coder |= str.coder(); arg1 = String.valueOf(str); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659880719 PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659888420 PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659881435 PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659882380 PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659888524 PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659888863 PR Review Comment: https://git.openjdk.org/jdk/pull/19956#discussion_r1659882487