On 03/20/2014 12:32 AM, Peter Levart wrote:
On 03/19/2014 11:01 PM, Brian Burkhalter wrote:
On Mar 14, 2014, at 7:17 AM, Brian Burkhalter
<brian.burkhal...@oracle.com> wrote:
On Mar 14, 2014, at 3:39 AM, Peter Levart wrote:
But in general it would be better to just use
"ThreadLocalRandom.current()" everywhere you use the "rnd" variable.
This is precisely its purpose: a random number generator that is
never contended. The overhead of the ThreadLocalRandom.current() call
is hardly measurable by itself.
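A minimal sketch of that suggestion, for illustration only. The class and method names (RandomDigits, randomDigits) are hypothetical and are not taken from the benchmark source; the point is only that the generator is fetched with ThreadLocalRandom.current() at the point of use instead of being kept in a shared "rnd" field.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of Bench6375303.java.
final class RandomDigits {
    // Builds a string of `length` random decimal digits.
    static String randomDigits(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // Fetch the calling thread's own generator at the point of use,
            // so no generator instance is shared between threads.
            sb.append((char) ('0' + ThreadLocalRandom.current().nextInt(10)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomDigits(20));
    }
}
```

Because current() returns the generator of the calling thread, the result should not be stored in a static field or handed to another thread.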
I'll update that and re-run some of the benchmarks later.
Following up on the content above and this earlier message in the thread:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2014-March/025676.html
I have posted a revised patch (NB: I know lines 2897-2906 should be
elsewhere)
http://cr.openjdk.java.net/~bpb/6375303/webrev.01/
and updated benchmark source (using only ThreadLocalRandom.current())
http://cr.openjdk.java.net/~bpb/6375303/Bench6375303.java
and updated benchmark results for three different variations
http://cr.openjdk.java.net/~bpb/6375303/6375303-bench-2.html
This version of toString() is from Peter and dispenses with the
volatile qualifier on stringCache. At least on my system, there is no
statistically significant micro-performance difference among the
three versions tested, viz., baseline, toString() change only,
toString() change plus other cleanup.
Any comments appreciated.
Thanks,
Brian
Hi Brian,
Here's my promised run of your latest webrev and microbenchmark on the ARM
platform (Raspberry Pi) with the just-released JDK 8 for ARM (-client
compiler, since -server does not work on the Raspberry Pi):
org.openjdk.jmh.Main parameters: ".*" -i 10 -r 5 -wi 5 -w 1 -f 1 -t 1
--- Baseline, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testFirstToString   avgt       10  330618.266    2211.637  ns/op
o.s.Bench6375303.testToString        avgt       10      80.546       0.134  ns/op
--- Proposed webrev, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testFirstToString   avgt       10  326588.284    1714.892  ns/op
o.s.Bench6375303.testToString        avgt       10     102.582       0.295  ns/op
--- Previous variant with volatile stringCache field, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testFirstToString   avgt       10  328795.783    2508.173  ns/op
o.s.Bench6375303.testToString        avgt       10     105.741       0.316  ns/op
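So both variants seem to be more or less the same as each other, but slower
than baseline on testToString.
Why would they be slower? Answer: they have more bytecodes, which presumably
puts toString() over the JIT compiler's inlining size limit.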
If I run with the following JVM options: -XX:+UnlockDiagnosticVMOptions -XX:MaxInlineSize=100
(and only the "testToString" benchmark), I get:
--- Baseline, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testToString        avgt       10      80.839       0.742  ns/op
--- Proposed webrev, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testToString        avgt       10      80.851       0.771  ns/op
--- Previous variant with volatile stringCache field, 1-thread ---
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testToString        avgt       10      80.834       0.749  ns/op
Hi,
Last night I was thinking about the question: "Why is this double-checked
non-volatile-then-volatile trick not any faster than the pure volatile
variant, even on the ARM platform, where a volatile read should carry some
penalty compared to a normal read?" The answer might lie in the fact that
the Raspberry Pi is a single-core/single-thread "machine". Would anyone with
JVM JIT compiler expertise care to share some insight? I suspect that on
such a platform, the compiler optimizes volatile accesses so that they are
performed without the otherwise necessary memory fences...
Regards, Peter
So the solution is to "reduce the number of bytecodes in toString()". For
example, the following:
    public String toString() {
        String sc = stringCache;
        if (sc == null) {
            sc = toStringSlow();
        }
        return sc;
    }

    private String toStringSlow() {
        String sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
        if (sc == null) {
            sc = layoutChars(true);
            if (!U.compareAndSwapObject(this, STRING_CACHE_OFFSET, null, sc)) {
                sc = (String) U.getObjectVolatile(this, STRING_CACHE_OFFSET);
            }
        }
        return sc;
    }
...gives the good results even without special JVM options:
Benchmark                            Mode  Samples        Mean  Mean error  Units
o.s.Bench6375303.testToString        avgt       10      80.925       0.313  ns/op
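For readers who want to try the fast-path/slow-path split outside the JDK, here is a self-contained sketch of the same shape. It is not the code from the webrev: it swaps sun.misc.Unsafe for a java.lang.invoke.VarHandle (which assumes Java 9 or later), and the class name CachedLabel and its simplified layoutChars() are hypothetical stand-ins.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for the class under discussion; not the JDK source.
final class CachedLabel {
    // VarHandle used in place of Unsafe plus a field offset (an assumption of this sketch).
    private static final VarHandle STRING_CACHE;
    static {
        try {
            STRING_CACHE = MethodHandles.lookup()
                    .findVarHandle(CachedLabel.class, "stringCache", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final int value;
    private String stringCache; // deliberately not declared volatile

    CachedLabel(int value) {
        this.value = value;
    }

    // Fast path: one plain read and a null check, so the method stays small.
    @Override
    public String toString() {
        String sc = stringCache;
        if (sc == null) {
            sc = toStringSlow();
        }
        return sc;
    }

    // Slow path: volatile re-read, compute, then publish with compare-and-set.
    private String toStringSlow() {
        String sc = (String) STRING_CACHE.getVolatile(this);
        if (sc == null) {
            sc = layoutChars();
            if (!STRING_CACHE.compareAndSet(this, (String) null, sc)) {
                // Another thread published first; return its value instead.
                sc = (String) STRING_CACHE.getVolatile(this);
            }
        }
        return sc;
    }

    // Simplified placeholder for the expensive formatting step.
    private String layoutChars() {
        return "CachedLabel(" + value + ")";
    }

    public static void main(String[] args) {
        CachedLabel label = new CachedLabel(42);
        String first = label.toString();
        String second = label.toString();
        System.out.println(first + " cached=" + (first == second));
    }
}
```

The plain read on the fast path relies on String being immutable, so a thread that observes a non-null reference also observes a fully constructed string; a cache of a mutable type would need a different scheme.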
Regards, Peter