Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]

2023-09-28 Thread 温绍锦
On Wed, 27 Sep 2023 19:51:25 GMT, Raffaello Giulietti  
wrote:

>> 温绍锦 has updated the pull request incrementally with one additional commit 
>> since the last revision:
>> 
>>   fix : the exception thrown when the input does not include conversion is 
>> different from baselne.
>
> You might consider this alternative, which IMO is simpler and more readable.
> 
> 
> int parse() {
> // %[argument_index$][flags][width][.precision][t]conversion
> // %(\d+$)?([-#+ 0,(<]*)?(\d+)?(.\d+)?([tT])?([a-zA-Z%])
> parseArgument();
> parseFlag();
> parseWidth();
> int precisionSize = parsePrecision();
> if (precisionSize < 0) {
> return 0;
> }
> 
> // ([tT])?([a-zA-Z%])
> char t = '\0', conversion = '\0';
> if ((c == 't' || c == 'T') && off + 1 < max) {
> char c1 = s.charAt(off + 1);
> if (isConversion(c1)) {
> t = c;
> conversion = c1;
> off += 2;
> }
> }
> if (conversion == '\0' && isConversion(c)) {
> conversion = c;
> ++off;
> }
> 
> if (argSize + flagSize + widthSize + precisionSize + t + 
> conversion != 0) {
> if (al != null) {
> FormatSpecifier formatSpecifier
> = new FormatSpecifier(s, start, argSize, 
> flagSize, widthSize, precisionSize, t, conversion);
> al.add(formatSpecifier);
> }
> return off - start;
> }
> return 0;
> }
> 
> private void parseArgument() {
> // (\d+$)?
> int i = off;
> for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
> if (i == max || c != '$') {
> c = first;
> return;
> }
> ++i;  // skip '$'
> if (i < max) {
> c = s.charAt(i);
> }
> argSize = i - off;
> off = i;
> }
> 
> private void parseFlag() {
> // ([-#+ 0,(<]*)?
> int i = off;
> for (; i < max && Flags.isFlag(c = s.charAt(i)); ++i);  // empty 
> body
> flagSize = i - off;
> off = i;
> }
> 
> private void parseWidth() {
> // (\d+)?
> int i = off;
> for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
> widthSize = i - off;
> off = i;
> }
> 
> private int parsePrecision() {
> // (.\d+)?
> if (c != '.') {
> return 0;
> }
> int i = off + 1;
> for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty bod...

@rgiulietti @cl4es Sorry, I plan to make a change, Now there is part of the 
parser code in the constructor of FormatSpecifier. This is not clear. I plan to 
move the parser part of the constructor of FormatSpecifier into 
FormatSpecifierParser. This will be clearer and the performance will be faster.

* now

FormatSpecifier(
String s,
int i,
int argSize,
int flagSize,
int widthSize,
int precisionSize,
char t,
char conversion
) {
int argEnd = i + argSize;
int flagEnd = argEnd + flagSize;
int widthEnd = flagEnd + widthSize;
int precesionEnd = widthEnd + precisionSize;

if (argSize > 0) {
index(s, i, argEnd);
}
if (flagSize > 0) {
flags(s, argEnd, flagEnd);
}
if (widthSize > 0) {
width(s, flagEnd, widthEnd);
}
if (precisionSize > 0) {
precision(s, widthEnd, precesionEnd);
}
if (t != '\0') {
dt = true;
if (t == 'T') {
flags = Flags.add(flags, Flags.UPPERCASE);
}
}
conversion(conversion);
check();
}


* plan to:

FormatSpecifier(int index, int flags, int width, int precision, char t, 
char conversion) {
if (index > 0) {
this.index = index;
}
if (flags > 0) {
this.flags = flags;
if (Flags.contains(flags, Flags.PREVIOUS))
this.index = -1;
}
if (width > 0) {
this.width = width;
}
this.precision = precision;
if (t != '\0') {
dt = true;
if (t == 'T') {
this.flags = Flags.add(flags, Flags.UPPERCASE);

Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]

2023-09-28 Thread Raffaello Giulietti
On Wed, 27 Sep 2023 19:13:02 GMT, 温绍锦  wrote:

>> @cl4es made performance optimizations for the simple specifiers of 
>> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the 
>> same idea, I continued to make improvements. I made patterns like %2d %02d 
>> also be optimized.
>> 
>> The following are the test results based on MacBookPro M1 Pro: 
>> 
>> 
>> -Benchmark  Mode  Cnt Score Error  Units
>> -StringFormat.complexFormat avgt   15  1862.233 ? 217.479  ns/op
>> -StringFormat.int02Format   avgt   15   312.491 ?  26.021  ns/op
>> -StringFormat.intFormat avgt   1584.432 ?   4.145  ns/op
>> -StringFormat.longFormatavgt   1587.330 ?   6.111  ns/op
>> -StringFormat.stringFormat  avgt   1563.985 ?  11.366  ns/op
>> -StringFormat.stringIntFormat   avgt   1587.422 ?   0.147  ns/op
>> -StringFormat.widthStringFormat avgt   15   250.740 ?  32.639  ns/op
>> -StringFormat.widthStringIntFormat  avgt   15   312.474 ?  16.309  ns/op
>> 
>> +Benchmark  Mode  CntScoreError  Units
>> +StringFormat.complexFormat avgt   15  740.626 ? 66.671  ns/op 
>> (+151.45)
>> +StringFormat.int02Format   avgt   15  131.049 ?  0.432  ns/op 
>> (+138.46)
>> +StringFormat.intFormat avgt   15   67.229 ?  4.155  ns/op 
>> (+25.59)
>> +StringFormat.longFormatavgt   15   66.444 ?  0.614  ns/op 
>> (+31.44)
>> +StringFormat.stringFormat  avgt   15   62.619 ?  4.652  ns/op 
>> (+2.19)
>> +StringFormat.stringIntFormat   avgt   15   89.606 ? 13.966  ns/op 
>> (-2.44)
>> +StringFormat.widthStringFormat avgt   15   52.462 ? 15.649  ns/op 
>> (+377.95)
>> +StringFormat.widthStringIntFormat  avgt   15  101.814 ?  3.147  ns/op 
>> (+206.91)
>
> 温绍锦 has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   fix : the exception thrown when the input does not include conversion is 
> different from baselne.

src/java.base/share/classes/java/util/Formatter.java line 3136:

> 3134: int flagEnd = argEnd + flagSize;
> 3135: int widthEnd = flagEnd + widthSize;
> 3136: int precesionEnd = widthEnd + precisionSize;

Suggestion:

int precisionEnd = widthEnd + precisionSize;

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15776#discussion_r1339135086


Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]

2023-09-27 Thread Raffaello Giulietti
On Wed, 27 Sep 2023 19:13:02 GMT, 温绍锦  wrote:

>> @cl4es made performance optimizations for the simple specifiers of 
>> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the 
>> same idea, I continued to make improvements. I made patterns like %2d %02d 
>> also be optimized.
>> 
>> The following are the test results based on MacBookPro M1 Pro: 
>> 
>> 
>> -Benchmark  Mode  Cnt Score Error  Units
>> -StringFormat.complexFormat avgt   15  1862.233 ? 217.479  ns/op
>> -StringFormat.int02Format   avgt   15   312.491 ?  26.021  ns/op
>> -StringFormat.intFormat avgt   1584.432 ?   4.145  ns/op
>> -StringFormat.longFormatavgt   1587.330 ?   6.111  ns/op
>> -StringFormat.stringFormat  avgt   1563.985 ?  11.366  ns/op
>> -StringFormat.stringIntFormat   avgt   1587.422 ?   0.147  ns/op
>> -StringFormat.widthStringFormat avgt   15   250.740 ?  32.639  ns/op
>> -StringFormat.widthStringIntFormat  avgt   15   312.474 ?  16.309  ns/op
>> 
>> +Benchmark  Mode  CntScoreError  Units
>> +StringFormat.complexFormat avgt   15  740.626 ? 66.671  ns/op 
>> (+151.45)
>> +StringFormat.int02Format   avgt   15  131.049 ?  0.432  ns/op 
>> (+138.46)
>> +StringFormat.intFormat avgt   15   67.229 ?  4.155  ns/op 
>> (+25.59)
>> +StringFormat.longFormatavgt   15   66.444 ?  0.614  ns/op 
>> (+31.44)
>> +StringFormat.stringFormat  avgt   15   62.619 ?  4.652  ns/op 
>> (+2.19)
>> +StringFormat.stringIntFormat   avgt   15   89.606 ? 13.966  ns/op 
>> (-2.44)
>> +StringFormat.widthStringFormat avgt   15   52.462 ? 15.649  ns/op 
>> (+377.95)
>> +StringFormat.widthStringIntFormat  avgt   15  101.814 ?  3.147  ns/op 
>> (+206.91)
>
> 温绍锦 has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   fix : the exception thrown when the input does not include conversion is 
> different from baselne.

You might consider this alternative, which IMO is simpler and more readable.


int parse() {
// %[argument_index$][flags][width][.precision][t]conversion
// %(\d+$)?([-#+ 0,(<]*)?(\d+)?(.\d+)?([tT])?([a-zA-Z%])
parseArgument();
parseFlag();
parseWidth();
int precisionSize = parsePrecision();
if (precisionSize < 0) {
return 0;
}

// ([tT])?([a-zA-Z%])
char t = '\0', conversion = '\0';
if ((c == 't' || c == 'T') && off + 1 < max) {
char c1 = s.charAt(off + 1);
if (isConversion(c1)) {
t = c;
conversion = c1;
off += 2;
}
}
if (conversion == '\0' && isConversion(c)) {
conversion = c;
++off;
}

if (argSize + flagSize + widthSize + precisionSize + t + conversion 
!= 0) {
if (al != null) {
FormatSpecifier formatSpecifier
= new FormatSpecifier(s, start, argSize, flagSize, 
widthSize, precisionSize, t, conversion);
al.add(formatSpecifier);
}
return off - start;
}
return 0;
}

private void parseArgument() {
// (\d+$)?
int i = off;
for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
if (i == max || c != '$') {
c = first;
return;
}
++i;  // skip '$'
if (i < max) {
c = s.charAt(i);
}
argSize = i - off;
off = i;
}

private void parseFlag() {
// ([-#+ 0,(<]*)?
int i = off;
for (; i < max && Flags.isFlag(c = s.charAt(i)); ++i);  // empty 
body
flagSize = i - off;
off = i;
}

private void parseWidth() {
// (\d+)?
int i = off;
for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
widthSize = i - off;
off = i;
}

private int parsePrecision() {
// (.\d+)?
if (c != '.') {
return 0;
}
int i = off + 1;
for (; i < max && isDigit(c = s.charAt(i)); ++i);  // empty body
if (i == off + 1) {  // a '.' but no digits
return -1;
}
int size = i - off;
off = i;
return size;
}

-

PR Comment: https://git.openjdk.org/jdk/pull/15776#issuecomment-1737986458


Re: RFR: 8316704: Regex-free parsing of Formatter and FormatProcessor specifiers [v12]

2023-09-27 Thread 温绍锦
> @cl4es made performance optimizations for the simple specifiers of 
> String.format in PR https://github.com/openjdk/jdk/pull/2830. Based on the 
> same idea, I continued to make improvements. I made patterns like %2d %02d 
> also be optimized.
> 
> The following are the test results based on MacBookPro M1 Pro: 
> 
> 
> -Benchmark  Mode  Cnt Score Error  Units
> -StringFormat.complexFormat avgt   15  1862.233 ? 217.479  ns/op
> -StringFormat.int02Format   avgt   15   312.491 ?  26.021  ns/op
> -StringFormat.intFormat avgt   1584.432 ?   4.145  ns/op
> -StringFormat.longFormatavgt   1587.330 ?   6.111  ns/op
> -StringFormat.stringFormat  avgt   1563.985 ?  11.366  ns/op
> -StringFormat.stringIntFormat   avgt   1587.422 ?   0.147  ns/op
> -StringFormat.widthStringFormat avgt   15   250.740 ?  32.639  ns/op
> -StringFormat.widthStringIntFormat  avgt   15   312.474 ?  16.309  ns/op
> 
> +Benchmark  Mode  CntScoreError  Units
> +StringFormat.complexFormat avgt   15  740.626 ? 66.671  ns/op 
> (+151.45)
> +StringFormat.int02Format   avgt   15  131.049 ?  0.432  ns/op 
> (+138.46)
> +StringFormat.intFormat avgt   15   67.229 ?  4.155  ns/op 
> (+25.59)
> +StringFormat.longFormatavgt   15   66.444 ?  0.614  ns/op 
> (+31.44)
> +StringFormat.stringFormat  avgt   15   62.619 ?  4.652  ns/op (+2.19)
> +StringFormat.stringIntFormat   avgt   15   89.606 ? 13.966  ns/op (-2.44)
> +StringFormat.widthStringFormat avgt   15   52.462 ? 15.649  ns/op 
> (+377.95)
> +StringFormat.widthStringIntFormat  avgt   15  101.814 ?  3.147  ns/op 
> (+206.91)

温绍锦 has updated the pull request incrementally with one additional commit since 
the last revision:

  fix : the exception thrown when the input does not include conversion is 
different from baselne.

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/15776/files
  - new: https://git.openjdk.org/jdk/pull/15776/files/eafac656..8a6fe0e9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk=15776=11
 - incr: https://webrevs.openjdk.org/?repo=jdk=15776=10-11

  Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/15776.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15776/head:pull/15776

PR: https://git.openjdk.org/jdk/pull/15776