Re: RFR: 8333396: Performance regression of new DecimalFormat and DecimalFormat.format [v2]

2024-06-03 Thread lingjun-cg
On Mon, 3 Jun 2024 06:13:35 GMT, lingjun-cg  wrote:

>> ### Performance regression of DecimalFormat.format
>> From the output of perf, we can see the hottest regions contain atomic 
>> instructions.  But when run with JDK 11, there is no such problem. The 
>> reason is the removed biased locking.  
>> The DecimalFormat uses StringBuffer everywhere, and StringBuffer itself 
>> contains many synchronized methods.
>> So I added support for some new methods that accept StringBuilder which is 
>> lock-free.
>> 
>> ### Performance regression of new DecimalFormat
>> After comparing the flame graph between current jdk and jdk 11,  the method 
>> java.text.DecimalFormatSymbols#findNonFormatChar takes a significant time.  
>> The performance becomes as good as jdk11 after replacing it with a simple 
>> loop implementation.
>> 
>> 
>> 
>> ### Test result
>> 
>> @BenchmarkMode(Mode.AverageTime)
>> @Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
>> @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> public class JmhDecimalFormat {
>> 
>> private DecimalFormat format;
>> 
>> @Setup(Level.Trial)
>> public void setup() {
>> format = new DecimalFormat("#0.0");
>> }
>> 
>> @Benchmark
>> public void testNewAndFormat() throws InterruptedException {
>> new DecimalFormat("#0.0").format(9524234.1236457);
>> }
>> 
>> @Benchmark
>> public void testNewOnly() throws InterruptedException {
>> new DecimalFormat("#0.0");
>> }
>> 
>> @Benchmark
>> public void testFormatOnly() throws InterruptedException {
>> format.format(9524234.1236457);
>> }
>> }
>> 
>>  Current JDK before optimize
>> 
>>  Benchmark Mode  CntScore   Error  Units
>> JmhDecimalFormat.testFormatOnly   avgt   50  642.099 ? 1.253  ns/op
>> JmhDecimalFormat.testNewAndFormat avgt   50  989.307 ? 3.676  ns/op
>> JmhDecimalFormat.testNewOnly  avgt   50  303.381 ? 5.252  ns/op
>> 
>> 
>> 
>>  Current JDK after optimize
>> 
>> Benchmark  Mode  CntScore   Error  Units
>> JmhDecimalFormat.testFormatOnlyavgt   50  351.499 ? 0.761  ns/op
>> JmhDecimalFormat.testNewAndFormat  avgt   50  615.145 ? 2.478  ns/op
>> JmhDecimalFormat.testNewOnly   avgt   50  209.874 ? 9.951  ns/op
>> 
>> 
>> ### JDK 11 
>> 
>> Benchmark  Mode  CntScore   Error  Units
>> JmhDecimalFormat.testFormatOnlyavgt   50  364.214 ? 1.191  ns/op
>> JmhDecimalFormat.testNewAndForma...
>
> lingjun-cg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   896: Performance regression of new DecimalFormat and 
> DecimalFormat.format

Hi naotaoj,
What I mean "performance regression" is compare to JDK 11. We have an 
server side application that use DecimalFormat.format API seriously. When 
migrate it from JDK 11 to JDK 21, we found a performance  degradation.
So I write the JMH test case "JmhDecimalFormat".  It show that there a 
performance regression since JDK 21.

These are the perfasm output for running JMH on both JDK 11 and JDK21. There 
are some  hot regions around the atomic instructions in JDK 21, but no such 
problem in JDK 11.
[jdk11.txt](https://github.com/user-attachments/files/15541278/jdk11.txt)
[jdk21.txt](https://github.com/user-attachments/files/15541279/jdk21.txt)

Maybe the [JEP 374: Deprecate and Disable Biased 
Locking](https://openjdk.org/jeps/374) is the reason?
So I run the benchmark on JDK 11 again but with option '-XX:-UseBiasedLocking', 
 there only a minor gap between jdk11 and jdk 21.

OK, return to my patch. java.text use StringBuffer internally,  but nearly all 
methods in StringBuffer are synchronized:

@Override
public synchronized StringBuffer append(Object obj) {

}

@Override
@IntrinsicCandidate
public synchronized StringBuffer append(String str) {
...
}


>From the above analysis, the atomic instructions slow down 
>DecimalFormat.format, and StringBuffer's synchronized methods generate there 
>atomic instructions. So If remove these synchronized methods, it will get a 
>better performance.
So I replace StringBuffer with StringBuilder in  java.text.NumberFormat.

public final String format(double number) {
// Use fast-path for double result if that works
String result = fastFormat(number);
if (result != null)
return result;

-return format(number, new StringBuffer(),
+return format(number, new StringBuilder(),
  DontCareFieldPosition.INSTANCE).toString();
}

@@ -367,7 +367,7 @@ public final String format(double number) {
 * @see java.text.Format#format
 */
public final String format(long number) {
   - return format(number, new StringBuffer(),
   + return 

Re: RFR: 8333396: Performance regression of new DecimalFormat and DecimalFormat.format [v2]

2024-06-03 Thread Naoto Sato
On Mon, 3 Jun 2024 06:13:35 GMT, lingjun-cg  wrote:

>> ### Performance regression of DecimalFormat.format
>> From the output of perf, we can see the hottest regions contain atomic 
>> instructions.  But when run with JDK 11, there is no such problem. The 
>> reason is the removed biased locking.  
>> The DecimalFormat uses StringBuffer everywhere, and StringBuffer itself 
>> contains many synchronized methods.
>> So I added support for some new methods that accept StringBuilder which is 
>> lock-free.
>> 
>> ### Performance regression of new DecimalFormat
>> After comparing the flame graph between current jdk and jdk 11,  the method 
>> java.text.DecimalFormatSymbols#findNonFormatChar takes a significant time.  
>> The performance becomes as good as jdk11 after replacing it with a simple 
>> loop implementation.
>> 
>> 
>> 
>> ### Test result
>> 
>> @BenchmarkMode(Mode.AverageTime)
>> @Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
>> @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
>> @State(Scope.Thread)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> public class JmhDecimalFormat {
>> 
>> private DecimalFormat format;
>> 
>> @Setup(Level.Trial)
>> public void setup() {
>> format = new DecimalFormat("#0.0");
>> }
>> 
>> @Benchmark
>> public void testNewAndFormat() throws InterruptedException {
>> new DecimalFormat("#0.0").format(9524234.1236457);
>> }
>> 
>> @Benchmark
>> public void testNewOnly() throws InterruptedException {
>> new DecimalFormat("#0.0");
>> }
>> 
>> @Benchmark
>> public void testFormatOnly() throws InterruptedException {
>> format.format(9524234.1236457);
>> }
>> }
>> 
>>  Current JDK before optimize
>> 
>>  Benchmark Mode  CntScore   Error  Units
>> JmhDecimalFormat.testFormatOnly   avgt   50  642.099 ? 1.253  ns/op
>> JmhDecimalFormat.testNewAndFormat avgt   50  989.307 ? 3.676  ns/op
>> JmhDecimalFormat.testNewOnly  avgt   50  303.381 ? 5.252  ns/op
>> 
>> 
>> 
>>  Current JDK after optimize
>> 
>> Benchmark  Mode  CntScore   Error  Units
>> JmhDecimalFormat.testFormatOnlyavgt   50  351.499 ? 0.761  ns/op
>> JmhDecimalFormat.testNewAndFormat  avgt   50  615.145 ? 2.478  ns/op
>> JmhDecimalFormat.testNewOnly   avgt   50  209.874 ? 9.951  ns/op
>> 
>> 
>> ### JDK 11 
>> 
>> Benchmark  Mode  CntScore   Error  Units
>> JmhDecimalFormat.testFormatOnlyavgt   50  364.214 ? 1.191  ns/op
>> JmhDecimalFormat.testNewAndForma...
>
> lingjun-cg has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   896: Performance regression of new DecimalFormat and 
> DecimalFormat.format

Hi,

Can you please provide more details? As to StringBuffer, I think it is being 
used since those classes in `java.text` package have been created. I am not 
sure why that contributes to what you described as the "performance regression".

Separately, please split this PR into two, as combining two different issues 
into a single JBS issue/PR is not right. The second issue is likely due to 
loading stream classes for the first time at JVM startup.

-

PR Comment: https://git.openjdk.org/jdk/pull/19513#issuecomment-2146264799


Re: RFR: 8333396: Performance regression of new DecimalFormat and DecimalFormat.format [v2]

2024-06-02 Thread lingjun-cg
> ### Performance regression of DecimalFormat.format
> From the output of perf, we can see the hottest regions contain atomic 
> instructions.  But when run with JDK 11, there is no such problem. The reason 
> is the removed biased locking.  
> The DecimalFormat uses StringBuffer everywhere, and StringBuffer itself 
> contains many synchronized methods.
> So I added support for some new methods that accept StringBuilder which is 
> lock-free.
> 
> ### Performance regression of new DecimalFormat
> After comparing the flame graph between current jdk and jdk 11,  the method 
> java.text.DecimalFormatSymbols#findNonFormatChar takes a significant time.  
> The performance becomes as good as jdk11 after replacing it with a simple 
> loop implementation.
> 
> 
> 
> ### Test result
> 
> @BenchmarkMode(Mode.AverageTime)
> @Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
> @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
> @State(Scope.Thread)
> @OutputTimeUnit(TimeUnit.NANOSECONDS)
> public class JmhDecimalFormat {
> 
> private DecimalFormat format;
> 
> @Setup(Level.Trial)
> public void setup() {
> format = new DecimalFormat("#0.0");
> }
> 
> @Benchmark
> public void testNewAndFormat() throws InterruptedException {
> new DecimalFormat("#0.0").format(9524234.1236457);
> }
> 
> @Benchmark
> public void testNewOnly() throws InterruptedException {
> new DecimalFormat("#0.0");
> }
> 
> @Benchmark
> public void testFormatOnly() throws InterruptedException {
> format.format(9524234.1236457);
> }
> }
> 
>  Current JDK before optimize
> 
>  Benchmark Mode  CntScore   Error  Units
> JmhDecimalFormat.testFormatOnly   avgt   50  642.099 ? 1.253  ns/op
> JmhDecimalFormat.testNewAndFormat avgt   50  989.307 ? 3.676  ns/op
> JmhDecimalFormat.testNewOnly  avgt   50  303.381 ? 5.252  ns/op
> 
> 
> 
>  Current JDK after optimize
> 
> Benchmark  Mode  CntScore   Error  Units
> JmhDecimalFormat.testFormatOnlyavgt   50  351.499 ? 0.761  ns/op
> JmhDecimalFormat.testNewAndFormat  avgt   50  615.145 ? 2.478  ns/op
> JmhDecimalFormat.testNewOnly   avgt   50  209.874 ? 9.951  ns/op
> 
> 
> ### JDK 11 
> 
> Benchmark  Mode  CntScore   Error  Units
> JmhDecimalFormat.testFormatOnlyavgt   50  364.214 ? 1.191  ns/op
> JmhDecimalFormat.testNewAndFormat  avgt   50  658.699 ? 2.311  ns/op
> JmhDecimalFormat.testNewOnly   avgt   50  248.300 ? 5.158  ns/op

lingjun-cg has updated the pull request incrementally with one additional 
commit since the last revision:

  896: Performance regression of new DecimalFormat and DecimalFormat.format

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/19513/files
  - new: https://git.openjdk.org/jdk/pull/19513/files/a6755f8f..b48962b5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19513&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19513&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/19513.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19513/head:pull/19513

PR: https://git.openjdk.org/jdk/pull/19513


RFR: 8333396: Performance regression of new DecimalFormat and DecimalFormat.format

2024-06-02 Thread lingjun-cg
### Performance regression of DecimalFormat.format
>From the output of perf, we can see the hottest regions contain atomic 
>instructions.  But when run with JDK 11, there is no such problem. The reason 
>is the removed biased locking.  
The DecimalFormat uses StringBuffer everywhere, and StringBuffer itself 
contains many synchronized methods.
So I added support for some new methods that accept StringBuilder which is 
lock-free.

### Performance regression of new DecimalFormat
After comparing the flame graph between current jdk and jdk 11,  the method 
java.text.DecimalFormatSymbols#findNonFormatChar takes a significant time.  The 
performance becomes as good as jdk11 after replacing it with a simple loop 
implementation.

-

Commit messages:
 - 896: Performance regression of new DecimalFormat and DecimalFormat.format

Changes: https://git.openjdk.org/jdk/pull/19513/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19513&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-896
  Stats: 318 lines in 11 files changed: 268 ins; 0 del; 50 mod
  Patch: https://git.openjdk.org/jdk/pull/19513.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19513/head:pull/19513

PR: https://git.openjdk.org/jdk/pull/19513