Re: JDK 9 RFR of 6375303: Review use of caching in BigDecimal

Peter Levart Thu, 20 Mar 2014 00:07:21 -0700

On 03/20/2014 12:32 AM, Peter Levart wrote:

On 03/19/2014 11:01 PM, Brian Burkhalter wrote:
On Mar 14, 2014, at 7:17 AM, Brian Burkhalter wrote:
On Mar 14, 2014, at 3:39 AM, Peter Levart wrote:
But in general it would be better to just use "ThreadLocalRandom.current()" everywhere you use the "rnd" variable. This is precisely its purpose: a random number generator that is never contended. The overhead of the ThreadLocalRandom.current() call is hardly measurable by itself.
I'll update that and re-run some of the benchmarks later.
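For readers following along, here is a minimal sketch of what that suggestion could look like when generating benchmark operands. The class name RandomOperands, the 128-bit width and the scale range are illustrative assumptions; they are not taken from Bench6375303.java.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of the posted benchmark source.
final class RandomOperands {
    // Builds a random non-negative BigDecimal using the calling thread's own generator.
    static BigDecimal next() {
        // Obtain the generator at the point of use rather than caching it in a
        // shared field, so each thread always works with its own instance.
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        // ThreadLocalRandom extends java.util.Random, so it can be passed to
        // the BigInteger(int numBits, Random rnd) constructor.
        BigInteger unscaled = new BigInteger(128, rnd);
        int scale = rnd.nextInt(0, 32); // assumed range, for illustration only
        return new BigDecimal(unscaled, scale);
    }
}
```

Calling RandomOperands.next() from several benchmark threads involves no shared generator state, which is the point of the advice above.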
Following up on the content above and this earlier message in the thread:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-March/025676.html
I have posted a revised patch (NB: I know lines 2897-2906 should be elsewhere)
http://cr.openjdk.java.net/~bpb/6375303/webrev.01/
and updated benchmark source (using only ThreadLocalRandom.current())
http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java
and updated benchmark results for three different variations
http://cr.openjdk.java.net/~bpb/6375303/6375303-bench-2.html
This version of toString() is from Peter and dispenses with the volatile qualifier on stringCache. At least on my system, there is no statistically significant micro-performance difference among the three versions tested, viz., baseline, toString() change only, toString() change plus other cleanup.
Any comments appreciated.

Thanks,

Brian
Hi Brian,
Here's my promised run of your latest webrev and microbenchmark on the ARM platform (Raspberry Pi) with the just-released JDK 8 for ARM (-client compiler, since -server does not work on Raspberry Pi):
org.openjdk.jmh.Main parameters: ".*" -i 10 -r 5 -wi 5 -w 1 -f 1 -t 1

--- Baseline, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testFirstToString     avgt        10   330618.266     2211.637    ns/op
o.s.Bench6375303.testToString          avgt        10       80.546        0.134    ns/op

--- Proposed webrev, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testFirstToString     avgt        10   326588.284     1714.892    ns/op
o.s.Bench6375303.testToString          avgt        10      102.582        0.295    ns/op

--- Previous variant with volatile stringCache field, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testFirstToString     avgt        10   328795.783     2508.173    ns/op
o.s.Bench6375303.testToString          avgt        10      105.741        0.316    ns/op


So both variants seem to be more or less the same, but in testToString they are slower than baseline (about 103-106 ns/op versus about 81 ns/op).

Why would they be slower? Answer: they have more bytecodes.

If I run with the following JVM options: -XX:+UnlockDiagnosticVMOptions -XX:MaxInlineSize=100 (and only the "testToString" benchmark), I get:


--- Baseline, 1-thread ---

Benchmark                         Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testToString     avgt        10       80.839        0.742    ns/op

--- Proposed webrev, 1-thread ---

Benchmark                         Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testToString     avgt        10       80.851        0.771    ns/op

--- Previous variant with volatile stringCache field, 1-thread ---

Benchmark                         Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testToString     avgt        10       80.834        0.749    ns/op
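
Since the difference disappears once MaxInlineSize is raised, it can be useful to confirm which value a given JVM is actually running with. The following is only a sketch: it assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean, and the class name InlineLimit is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; HotSpot-specific, may fail on other JVMs.
final class InlineLimit {
    // Returns the MaxInlineSize value in effect for the running JVM.
    static int maxInlineSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag is unknown to this VM.
        return Integer.parseInt(bean.getVMOption("MaxInlineSize").getValue());
    }
}
```

Printing InlineLimit.maxInlineSize() with and without -XX:MaxInlineSize=100 shows whether the option was picked up; the reported value depends on the JVM build and flags.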

Hi,

The answer I was thinking about last night to the question "Why is this double-checked non-volatile-then-volatile trick not any faster than the pure volatile variant, even on the ARM platform where a volatile read should have some penalty compared to a normal read?" might lie in the fact that the Raspberry Pi is a single-core/single-thread "machine". Would anyone with JVM JIT compiler expertise care to share some insight? I suspect that on such a platform, the compiler optimizes volatile accesses so that they are performed without the otherwise necessary memory fences...


Regards, Peter



So the solution is to "reduce the number of bytecodes in toString()". For example, the following:


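     public String toString() {
         // Fast path: plain (non-volatile) read of the cached string.
         String sc = stringCache;
         if (sc == null) {
             // Rare path kept in a separate method so toString() stays small.
             sc = toStringSlow();
         }
         return sc;
     }

     private String toStringSlow() {
         // Re-check with a volatile read in case another thread already published a value.
         String sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
         if (sc == null) {
             sc = layoutChars(true);
             // Publish with CAS; if another thread won the race, use its string instead.
             if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
                 sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
             }
         }
         return sc;
     }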


...gives good results even without special JVM options:

Benchmark                         Mode   Samples         Mean   Mean error    Units
o.s.Bench6375303.testToString     avgt        10       80.925        0.313    ns/op
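
For anyone who wants to experiment with this fast-path/slow-path split outside of java.math.BigDecimal, here is a self-contained sketch of the same idea. It swaps sun.misc.Unsafe for java.lang.invoke.VarHandle (available from Java 9), so it is not the code from the webrev; the class CachedText and its layoutChars() stand-in are assumptions made purely for illustration.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in class; not BigDecimal and not the webrev code.
final class CachedText {
    private static final VarHandle STRING_CACHE;
    static {
        try {
            STRING_CACHE = MethodHandles.lookup()
                    .findVarHandle(CachedText.class, "stringCache", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final long value;
    private String stringCache; // deliberately not declared volatile

    CachedText(long value) {
        this.value = value;
    }

    @Override
    public String toString() {
        String sc = stringCache; // plain read on the fast path
        if (sc == null) {
            sc = toStringSlow(); // rare path lives in its own method
        }
        return sc;
    }

    private String toStringSlow() {
        // Volatile re-read: another thread may already have published a string.
        String sc = (String) STRING_CACHE.getVolatile(this);
        if (sc == null) {
            sc = layoutChars();
            // Publish with compare-and-set; on failure, adopt the winner's string.
            if (!STRING_CACHE.compareAndSet(this, (String) null, sc)) {
                sc = (String) STRING_CACHE.getVolatile(this);
            }
        }
        return sc;
    }

    // Stand-in for the expensive formatting step.
    private String layoutChars() {
        return Long.toString(value);
    }
}
```

Repeated calls to toString() on the same CachedText instance return the same String object, since only one computed value is ever published to the cache.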


Regards, Peter
