On Sun, 17 Sep 2023 16:01:33 GMT, Shaojin Wen <d...@openjdk.org> wrote:

> @cl4es made performance optimizations for the simple specifiers of 
> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the 
> same idea, I continued to make improvements. I made patterns like %2d %02d 
> also be optimized.
> 
> The following are the test results based on MacBookPro M1 Pro: 
> 
> 
> -Benchmark                          Mode  Cnt     Score     Error  Units
> -StringFormat.complexFormat         avgt   15  1862.233 ? 217.479  ns/op
> -StringFormat.int02Format           avgt   15   312.491 ?  26.021  ns/op
> -StringFormat.intFormat             avgt   15    84.432 ?   4.145  ns/op
> -StringFormat.longFormat            avgt   15    87.330 ?   6.111  ns/op
> -StringFormat.stringFormat          avgt   15    63.985 ?  11.366  ns/op
> -StringFormat.stringIntFormat       avgt   15    87.422 ?   0.147  ns/op
> -StringFormat.widthStringFormat     avgt   15   250.740 ?  32.639  ns/op
> -StringFormat.widthStringIntFormat  avgt   15   312.474 ?  16.309  ns/op
> 
> +Benchmark                          Mode  Cnt    Score    Error  Units
> +StringFormat.complexFormat         avgt   15  740.626 ? 66.671  ns/op 
> (+151.45)
> +StringFormat.int02Format           avgt   15  131.049 ?  0.432  ns/op 
> (+138.46)
> +StringFormat.intFormat             avgt   15   67.229 ?  4.155  ns/op 
> (+25.59)
> +StringFormat.longFormat            avgt   15   66.444 ?  0.614  ns/op 
> (+31.44)
> +StringFormat.stringFormat          avgt   15   62.619 ?  4.652  ns/op (+2.19)
> +StringFormat.stringIntFormat       avgt   15   89.606 ? 13.966  ns/op (-2.44)
> +StringFormat.widthStringFormat     avgt   15   52.462 ? 15.649  ns/op 
> (+377.95)
> +StringFormat.widthStringIntFormat  avgt   15  101.814 ?  3.147  ns/op 
> (+206.91)

For maintainability purposes, we should strive for simplicity and readability.

The parser looks good (but see the small proposed enhancements).

I'll continue with the rest tomorrow.

You might consider this alternative, which IMO is simpler and more readable.


        int parse() {
            // %[argument_index$][flags][width][.precision][t]conversion
            // %(\d+$)?([-#+ 0,(<]*)?(\d+)?(.\d+)?([tT])?([a-zA-Z%])
            parseArgument();
            parseFlag();
            parseWidth();
            int precisionSize = parsePrecision();
            if (precisionSize < 0) {
                return 0;
            }

            // ([tT])?([a-zA-Z%])
            char t = '\0', conversion = '\0';
            if ((c == 't' || c == 'T') && off + 1 < max) {
                char c1 = s.charAt(off + 1);
                if (isConversion(c1)) {
                    t = c;
                    conversion = c1;
                    off += 2;
                }
            }
            if (conversion == '\0' && isConversion(c)) {
                conversion = c;
                ++off;
            }

            if (argSize + flagSize + widthSize + precisionSize + t + conversion 
!= 0) {
                if (al != null) {
                    FormatSpecifier formatSpecifier
                            = new FormatSpecifier(s, start, argSize, flagSize, 
widthSize, precisionSize, t, conversion);
                    al.add(formatSpecifier);
                }
                return off - start;
            }
            return 0;
        }

        private void parseArgument() {
            // (\d+$)?
            int i = off;
            for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
            if (i == max || c != '$') {
                c = first;
                return;
            }
            ++i;  // skip '$'
            if (i < max) {
                c = s.charAt(i);
            }
            argSize = i - off;
            off = i;
        }

        private void parseFlag() {
            // ([-#+ 0,(<]*)?
            int i = off;
            for (; i < max && Flags.isFlag(c = s.charAt(i)); ++i);  // empty 
body
            flagSize = i - off;
            off = i;
        }

        private void parseWidth() {
            // (\d+)?
            int i = off;
            for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
            widthSize = i - off;
            off = i;
        }

        private int parsePrecision() {
            // (.\d+)?
            if (c != '.') {
                return 0;
            }
            int i = off + 1;
            for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
            if (i == off + 1) {  // a '.' but no digits
                return -1;
            }
            int size = i - off;
            off = i;
            return size;
        }

Meaningful external reviews take a _lot_ of engineering time on such a popular 
platform as the JDK.
They make sense only on somehow stable code, otherwise they are a waste of 
human resources.

So sure, take your time to stabilize your code, make your tests, but please 
refrain from keep on changing too much once you publish a PR. If reviewers have 
to face continuous changes, they might become uninterested in reviewing.

src/java.base/share/classes/java/util/Formatter.java line 2899:

> 2897:             if (c == '.') {
> 2898:                 // (\.\d+)?
> 2899:                 precisionSize = parsePrecision();

`precisionSize == 0` has two different meanings: (1) there's a '.' not followed 
by digits and (2) there's no precision field at all.

In the proposed 
[alternative](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458),
 case (1) is indicated by `-1`, while case (2) is indicated by `0`. Besides, 
the decision is taken in the `parsePrecision()` method, not partly here and 
partly in the method.

src/java.base/share/classes/java/util/Formatter.java line 2918:

> 2916:                 conversion = c;
> 2917:                 ++off;
> 2918:             }

Suggestion:

            } else if (isConversion(c)) {
                conversion = c;
                ++off;
            }

src/java.base/share/classes/java/util/Formatter.java line 2933:

> 2931:         private void parseArgument() {
> 2932:             // (\d+\$)?
> 2933:             for (int size = 0; off < max; c = s.charAt(++off), size++) {

This argument_index parser is incorrect.

Say the invalid "specifier" is `"%12"`.
The parser would throw a `StringIndexOutOfBoundsException` on `s.charAt(++off)`.

src/java.base/share/classes/java/util/Formatter.java line 2934:

> 2932:             // (\d+\$)?
> 2933:             int i = off;
> 2934:             for (; i < max && isDigit(c); c = next(++i));  // empty body

No need for `next()` if done as in the proposed alternative.

src/java.base/share/classes/java/util/Formatter.java line 2947:

> 2945:                     c = first;
> 2946:                 }
> 2947:             }

I think the logic in the proposed alternative is more readable.

src/java.base/share/classes/java/util/Formatter.java line 2953:

> 2951:             // ([-#+ 0,(\<]*)?
> 2952:             int i = off;
> 2953:             for (; i < max && Flags.isFlag(c); c = next(++i));  // 
> empty body

No need for `next()` if done as in the proposed alternative.

src/java.base/share/classes/java/util/Formatter.java line 2961:

> 2959:             // (\d+)?
> 2960:             int i = off;
> 2961:             for (; i < max && isDigit(c); c = next(++i));  // empty body

You are using `next()` here, but the simpler solution in `parsePrecision()`.
I think consistency simplifies understanding.

src/java.base/share/classes/java/util/Formatter.java line 2966:

> 2964: 
> 2965:         private int parsePrecision() {
> 2966:             // (\.\d+)?

Since processing of `'.'` is done by the caller, the regex pattern in this 
comment is confusing.
Either prefer handling `'.'` here (see 
[this](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458)), or 
adjust the regex in the comment.

src/java.base/share/classes/java/util/Formatter.java line 2978:

> 2976:         }
> 2977: 
> 2978:         private char next(int off) {

There's no real need for this more expensive method, compared with a simple 
`s.charAt(i)`, in the alternative proposed 
[before](https://github.com/openjdk/jdk/pull/15776#issuecomment-1737986458).

src/java.base/share/classes/java/util/Formatter.java line 3136:

> 3134:             int flagEnd = argEnd + flagSize;
> 3135:             int widthEnd = flagEnd + widthSize;
> 3136:             int precesionEnd = widthEnd + precisionSize;

Suggestion:

            int precisionEnd = widthEnd + precisionSize;

-------------

PR Review: https://git.openjdk.org/jdk/pull/15776#pullrequestreview-1647449239
PR Review: https://git.openjdk.org/jdk/pull/15776#pullrequestreview-1660250320
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1737986458
PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1739462969
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339852931
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1347671680
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1338962488
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339856693
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339856725
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339857403
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339844252
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1347679848
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339842414
PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339135086

Reply via email to