Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]
On Wed, 27 Sep 2023 19:51:25 GMT, Raffaello Giulietti wrote: >> 温绍锦 has updated the pull request incrementally with one additional commit >> since the last revision: >> >> fix : the exception thrown when the input does not include conversion is >> different from baselne. > > You might consider this alternative, which IMO is simpler and more readable. > > > int parse() { > // %[argument_index$][flags][width][.precision][t]conversion > // %(\d+$)?([-#+ 0,(<]*)?(\d+)?(.\d+)?([tT])?([a-zA-Z%]) > parseArgument(); > parseFlag(); > parseWidth(); > int precisionSize = parsePrecision(); > if (precisionSize < 0) { > return 0; > } > > // ([tT])?([a-zA-Z%]) > char t = '\0', conversion = '\0'; > if ((c == 't' || c == 'T') && off + 1 < max) { > char c1 = s.charAt(off + 1); > if (isConversion(c1)) { > t = c; > conversion = c1; > off += 2; > } > } > if (conversion == '\0' && isConversion(c)) { > conversion = c; > ++off; > } > > if (argSize + flagSize + widthSize + precisionSize + t + > conversion != 0) { > if (al != null) { > FormatSpecifier formatSpecifier > = new FormatSpecifier(s, start, argSize, > flagSize, widthSize, precisionSize, t, conversion); > al.add(formatSpecifier); > } > return off - start; > } > return 0; > } > > private void parseArgument() { > // (\d+$)? > int i = off; > for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body > if (i == max || c != '$') { > c = first; > return; > } > ++i; // skip '$' > if (i < max) { > c = s.charAt(i); > } > argSize = i - off; > off = i; > } > > private void parseFlag() { > // ([-#+ 0,(<]*)? > int i = off; > for (; i < max && Flags.isFlag(c = s.charAt(i)); ++i); // empty > body > flagSize = i - off; > off = i; > } > > private void parseWidth() { > // (\d+)? > int i = off; > for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body > widthSize = i - off; > off = i; > } > > private int parsePrecision() { > // (.\d+)? > if (c != '.') { > return 0; > } > int i = off + 1; > for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty bod... @rgiulietti @cl4es Sorry, I plan to make a change, Now there is part of the parser code in the constructor of FormatSpecifier. This is not clear. I plan to move the parser part of the constructor of FormatSpecifier into FormatSpecifierParser. This will be clearer and the performance will be faster. * now FormatSpecifier( String s, int i, int argSize, int flagSize, int widthSize, int precisionSize, char t, char conversion ) { int argEnd = i + argSize; int flagEnd = argEnd + flagSize; int widthEnd = flagEnd + widthSize; int precesionEnd = widthEnd + precisionSize; if (argSize > 0) { index(s, i, argEnd); } if (flagSize > 0) { flags(s, argEnd, flagEnd); } if (widthSize > 0) { width(s, flagEnd, widthEnd); } if (precisionSize > 0) { precision(s, widthEnd, precesionEnd); } if (t != '\0') { dt = true; if (t == 'T') { flags = Flags.add(flags, Flags.UPPERCASE); } } conversion(conversion); check(); } * plan to: FormatSpecifier(int index, int flags, int width, int precision, char t, char conversion) { if (index > 0) { this.index = index; } if (flags > 0) { this.flags = flags; if (Flags.contains(flags, Flags.PREVIOUS)) this.index = -1; } if (width > 0) { this.width = width; } this.precision = precision; if (t != '\0') { dt = true; if (t == 'T') { this.flags = Flags.add(flags, Flags.UPPERCASE);
Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]
On Wed, 27 Sep 2023 19:13:02 GMT, 温绍锦 wrote: >> @cl4es made performance optimizations for the simple specifiers of >> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the >> same idea, I continued to make improvements. I made patterns like %2d %02d >> also be optimized. >> >> The following are the test results based on MacBookPro M1 Pro: >> >> >> -Benchmark Mode Cnt Score Error Units >> -StringFormat.complexFormat avgt 15 1862.233 ? 217.479 ns/op >> -StringFormat.int02Format avgt 15 312.491 ? 26.021 ns/op >> -StringFormat.intFormat avgt 1584.432 ? 4.145 ns/op >> -StringFormat.longFormatavgt 1587.330 ? 6.111 ns/op >> -StringFormat.stringFormat avgt 1563.985 ? 11.366 ns/op >> -StringFormat.stringIntFormat avgt 1587.422 ? 0.147 ns/op >> -StringFormat.widthStringFormat avgt 15 250.740 ? 32.639 ns/op >> -StringFormat.widthStringIntFormat avgt 15 312.474 ? 16.309 ns/op >> >> +Benchmark Mode CntScoreError Units >> +StringFormat.complexFormat avgt 15 740.626 ? 66.671 ns/op >> (+151.45) >> +StringFormat.int02Format avgt 15 131.049 ? 0.432 ns/op >> (+138.46) >> +StringFormat.intFormat avgt 15 67.229 ? 4.155 ns/op >> (+25.59) >> +StringFormat.longFormatavgt 15 66.444 ? 0.614 ns/op >> (+31.44) >> +StringFormat.stringFormat avgt 15 62.619 ? 4.652 ns/op >> (+2.19) >> +StringFormat.stringIntFormat avgt 15 89.606 ? 13.966 ns/op >> (-2.44) >> +StringFormat.widthStringFormat avgt 15 52.462 ? 15.649 ns/op >> (+377.95) >> +StringFormat.widthStringIntFormat avgt 15 101.814 ? 3.147 ns/op >> (+206.91) > > 温绍锦 has updated the pull request incrementally with one additional commit > since the last revision: > > fix : the exception thrown when the input does not include conversion is > different from baselne. src/java.base/share/classes/java/util/Formatter.java line 3136: > 3134: int flagEnd = argEnd + flagSize; > 3135: int widthEnd = flagEnd + widthSize; > 3136: int precesionEnd = widthEnd + precisionSize; Suggestion: int precisionEnd = widthEnd + precisionSize; - PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339135086
Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]
On Wed, 27 Sep 2023 19:13:02 GMT, 温绍锦 wrote: >> @cl4es made performance optimizations for the simple specifiers of >> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the >> same idea, I continued to make improvements. I made patterns like %2d %02d >> also be optimized. >> >> The following are the test results based on MacBookPro M1 Pro: >> >> >> -Benchmark Mode Cnt Score Error Units >> -StringFormat.complexFormat avgt 15 1862.233 ? 217.479 ns/op >> -StringFormat.int02Format avgt 15 312.491 ? 26.021 ns/op >> -StringFormat.intFormat avgt 1584.432 ? 4.145 ns/op >> -StringFormat.longFormatavgt 1587.330 ? 6.111 ns/op >> -StringFormat.stringFormat avgt 1563.985 ? 11.366 ns/op >> -StringFormat.stringIntFormat avgt 1587.422 ? 0.147 ns/op >> -StringFormat.widthStringFormat avgt 15 250.740 ? 32.639 ns/op >> -StringFormat.widthStringIntFormat avgt 15 312.474 ? 16.309 ns/op >> >> +Benchmark Mode CntScoreError Units >> +StringFormat.complexFormat avgt 15 740.626 ? 66.671 ns/op >> (+151.45) >> +StringFormat.int02Format avgt 15 131.049 ? 0.432 ns/op >> (+138.46) >> +StringFormat.intFormat avgt 15 67.229 ? 4.155 ns/op >> (+25.59) >> +StringFormat.longFormatavgt 15 66.444 ? 0.614 ns/op >> (+31.44) >> +StringFormat.stringFormat avgt 15 62.619 ? 4.652 ns/op >> (+2.19) >> +StringFormat.stringIntFormat avgt 15 89.606 ? 13.966 ns/op >> (-2.44) >> +StringFormat.widthStringFormat avgt 15 52.462 ? 15.649 ns/op >> (+377.95) >> +StringFormat.widthStringIntFormat avgt 15 101.814 ? 3.147 ns/op >> (+206.91) > > 温绍锦 has updated the pull request incrementally with one additional commit > since the last revision: > > fix : the exception thrown when the input does not include conversion is > different from baselne. You might consider this alternative, which IMO is simpler and more readable. int parse() { // %[argument_index$][flags][width][.precision][t]conversion // %(\d+$)?([-#+ 0,(<]*)?(\d+)?(.\d+)?([tT])?([a-zA-Z%]) parseArgument(); parseFlag(); parseWidth(); int precisionSize = parsePrecision(); if (precisionSize < 0) { return 0; } // ([tT])?([a-zA-Z%]) char t = '\0', conversion = '\0'; if ((c == 't' || c == 'T') && off + 1 < max) { char c1 = s.charAt(off + 1); if (isConversion(c1)) { t = c; conversion = c1; off += 2; } } if (conversion == '\0' && isConversion(c)) { conversion = c; ++off; } if (argSize + flagSize + widthSize + precisionSize + t + conversion != 0) { if (al != null) { FormatSpecifier formatSpecifier = new FormatSpecifier(s, start, argSize, flagSize, widthSize, precisionSize, t, conversion); al.add(formatSpecifier); } return off - start; } return 0; } private void parseArgument() { // (\d+$)? int i = off; for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body if (i == max || c != '$') { c = first; return; } ++i; // skip '$' if (i < max) { c = s.charAt(i); } argSize = i - off; off = i; } private void parseFlag() { // ([-#+ 0,(<]*)? int i = off; for (; i < max && Flags.isFlag(c = s.charAt(i)); ++i); // empty body flagSize = i - off; off = i; } private void parseWidth() { // (\d+)? int i = off; for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body widthSize = i - off; off = i; } private int parsePrecision() { // (.\d+)? if (c != '.') { return 0; } int i = off + 1; for (; i < max && isDigit(c = s.charAt(i)); ++i); // empty body if (i == off + 1) { // a '.' but no digits return -1; } int size = i - off; off = i; return size; } - PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1737986458
Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]
> @cl4es made performance optimizations for the simple specifiers of > String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the > same idea, I continued to make improvements. I made patterns like %2d %02d > also be optimized. > > The following are the test results based on MacBookPro M1 Pro: > > > -Benchmark Mode Cnt Score Error Units > -StringFormat.complexFormat avgt 15 1862.233 ? 217.479 ns/op > -StringFormat.int02Format avgt 15 312.491 ? 26.021 ns/op > -StringFormat.intFormat avgt 1584.432 ? 4.145 ns/op > -StringFormat.longFormatavgt 1587.330 ? 6.111 ns/op > -StringFormat.stringFormat avgt 1563.985 ? 11.366 ns/op > -StringFormat.stringIntFormat avgt 1587.422 ? 0.147 ns/op > -StringFormat.widthStringFormat avgt 15 250.740 ? 32.639 ns/op > -StringFormat.widthStringIntFormat avgt 15 312.474 ? 16.309 ns/op > > +Benchmark Mode CntScoreError Units > +StringFormat.complexFormat avgt 15 740.626 ? 66.671 ns/op > (+151.45) > +StringFormat.int02Format avgt 15 131.049 ? 0.432 ns/op > (+138.46) > +StringFormat.intFormat avgt 15 67.229 ? 4.155 ns/op > (+25.59) > +StringFormat.longFormatavgt 15 66.444 ? 0.614 ns/op > (+31.44) > +StringFormat.stringFormat avgt 15 62.619 ? 4.652 ns/op (+2.19) > +StringFormat.stringIntFormat avgt 15 89.606 ? 13.966 ns/op (-2.44) > +StringFormat.widthStringFormat avgt 15 52.462 ? 15.649 ns/op > (+377.95) > +StringFormat.widthStringIntFormat avgt 15 101.814 ? 3.147 ns/op > (+206.91) 温绍锦 has updated the pull request incrementally with one additional commit since the last revision: fix : the exception thrown when the input does not include conversion is different from baselne. - Changes: - all: https://git.openjdk.org/jdk/pull/15776/files - new: https://git.openjdk.org/jdk/pull/15776/files/eafac656..8a6fe0e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk=15776=11 - incr: https://webrevs.openjdk.org/?repo=jdk=15776=10-11 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15776.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15776/head:pull/15776 PR: https://git.openjdk.org/jdk/pull/15776