Re: RFR: 8007806: Need a Throwables performance counter

David Holmes Sun, 24 Feb 2013 14:18:55 -0800

We've not-so-slightly hijacked Nils' thread here - apologies for that.


On 25/02/2013 8:05 AM, Peter Levart wrote:


Just looked at one way jstat accesses the counters. It runs in a
separate VM and maps-in a file that is already mapped in the observing
VM in the direct buffer. It then accesses it via a LongBuffer view (for
long counters). So there's no synchronization between counter updater
and counter reader. On ARM v6 jstat could see a "torn" long counter
then, right?

Right. The current implementation of PerfLongCounter uses simple stores
(not atomic ops).
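
To make the "torn" read concrete, here is a minimal single-threaded sketch that simulates it deterministically. It assumes a little-endian layout and a writer that stores the low half first. The class and method names are hypothetical, and it is not how jstat or the VM are actually implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class TornReadDemo {
    // Stores 'value' as two separate 32-bit writes (low half first) and
    // returns what an observer would see if it read all 64 bits between
    // the two writes. Assumes 'bb' uses little-endian byte order.
    static long readBetweenHalfStores(ByteBuffer bb, long value) {
        bb.putInt(0, (int) value);            // first store: low 32 bits
        long observed = bb.getLong(0);        // unsynchronized 64-bit read
        bb.putInt(4, (int) (value >>> 32));   // second store: high 32 bits
        return observed;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN);
        bb.putLong(0, 0x00000000FFFFFFFFL);   // counter just before a carry
        long torn = readBetweenHalfStores(bb, 0x0000000100000000L);
        // The observed value is neither the old nor the new counter value.
        System.out.printf("torn = 0x%016X, final = 0x%016X%n", torn, bb.getLong(0));
    }
}
```

With these inputs the observer sees 0, which is neither the value before the increment nor the value after it.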

The double-32bit-CAS updater that I presented would not make it any
worse on such platforms, I suppose.


No change in tearing ability.

On the platforms that support 64bit atomic stores, there are no such
problems. And I assume those same platforms also support 64bit CAS, or
are there platforms with 64bit atomic stores and no 64bit CAS?

Most of them actually :) All Java platforms must support atomic
load/store of 64-bit values to support volatile long and double
variables. On 32-bit platforms this is done via a range of techniques -
for example on x86 it is done via the FPU. But these atomic accesses are
currently restricted to Java volatile field accesses via bytecode - they
are not exposed via the Unsafe methods, nor are they made available via
the Atomic:: class in the VM.

Some of these 32-bit platforms also support the 64-bit CAS, which is
what supports_cx8() is intended to indicate.

If the PerfCounters were supposed to be thread-safe then they might use
these alternate atomic access operations.
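
At the Java level, the distinction between atomic 64-bit load/store and 64-bit CAS can be sketched as follows. This is only an illustration using a volatile field and java.util.concurrent.atomic.AtomicLong; the class and method names are hypothetical, and it says nothing about how PerfCounter or the VM's Atomic:: class would do it.

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicAccessSketch {
    // Reads and writes of a volatile long are atomic on every Java
    // platform (JLS 17.7), so an observer never sees a half-written value.
    private volatile long published;

    // AtomicLong exposes a 64-bit compare-and-set; how it is implemented
    // underneath is platform-specific.
    private final AtomicLong counter = new AtomicLong();

    void publish(long v) { published = v; }

    long read() { return published; }

    // Classic CAS retry loop: re-read and retry until no other thread
    // changed the value between the read and the update.
    long addWithCas(long delta) {
        long cur, next;
        do {
            cur = counter.get();
            next = cur + delta;
        } while (!counter.compareAndSet(cur, next));
        return next;
    }

    public static void main(String[] args) {
        AtomicAccessSketch s = new AtomicAccessSketch();
        s.publish(0x0000000100000000L);
        System.out.println(s.read() == 0x0000000100000000L);
        System.out.println(s.addWithCas(5) + s.addWithCas(-2));
    }
}
```

Atomic load/store alone protects observers from torn values but does not make a read-modify-write such as add() safe; that needs the CAS loop or a lock.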


David

Regards, Peter



David
-----

If this is true and it is not that important, then instead of a
synchronized update of 64bit counter, a 32bit CAS could be used,
optionally (rarely) followed by a second 32bit CAS, like for example:

http://dl.dropbox.com/u/101777488/jdk8-tl/PerfCounter/webrev.01/index.html



I tried this on ARM v6 and it works much better than synchronized
access, but I don't know if it's acceptable. It guarantees eventual
correctness of the summed value if the only operation performed is add()
(no set() intermingled) and has the same possibility of incorrect
half-half reads by observers as the current PerfCounter has for
unsynchronized observers.
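
The scheme described above can be sketched roughly as below. This is not the code from the webrev; it is a minimal illustration that keeps the two 32-bit halves in two AtomicInteger fields (an assumption made so the example is self-contained) instead of in native memory, and the names SplitCounter and runThreads are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SplitCounter {
    private final AtomicInteger lo = new AtomicInteger(); // low 32 bits
    private final AtomicInteger hi = new AtomicInteger(); // high 32 bits

    void add(long value) {
        int addLo = (int) value;
        int addHi = (int) (value >>> 32);
        int oldLo, newLo;
        do {                                   // first 32-bit CAS: low half
            oldLo = lo.get();
            newLo = oldLo + addLo;
        } while (!lo.compareAndSet(oldLo, newLo));
        // Unsigned wrap-around of the low half means a carry into the high half.
        int carry = Integer.compareUnsigned(newLo, oldLo) < 0 ? 1 : 0;
        int hiDelta = addHi + carry;
        if (hiDelta != 0) {                    // rare second atomic update
            hi.getAndAdd(hiDelta);
        }
    }

    // Not atomic across the two halves: a concurrent reader may see a
    // half-updated value, but the result is exact once all adds finish.
    long get() {
        return ((long) hi.get() << 32) | (lo.get() & 0xFFFFFFFFL);
    }

    static long runThreads(SplitCounter c, int threads, int perThread) {
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) c.add(1L);
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return c.get();
    }

    public static void main(String[] args) {
        SplitCounter c = new SplitCounter();
        c.add(0xFFFFFFF0L);                    // start just below a carry
        long total = runThreads(c, 4, 1000);
        System.out.println(total == 0xFFFFFFF0L + 4000L);
    }
}
```

Because both halves are only ever updated by atomic additions, the order in which threads apply them does not matter for the final sum, which is why the guarantee only holds while no set() is intermingled.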

Here's the comparison of unpatched/patched PerfCounter.increment()
micro-benchmark on single-core ARM v6 (Raspberry Pi):

*** Original PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
            1 threads, Tavg =    269.34 ns/op (σ =   0.00 ns/op) [269.34]
            2 threads, Tavg =  7,170.48 ns/op (σ = 410.77 ns/op) [6,783.73, 7,603.95]
            3 threads, Tavg = 12,034.82 ns/op (σ = 418.99 ns/op) [11,792.33, 11,714.67, 12,639.26]
            4 threads, Tavg = 16,029.76 ns/op (σ = 1,411.44 ns/op) [15,592.04, 18,511.52, 15,642.52, 14,818.16]


*** Patched PerfCounter, ARM v6

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 1
#
            1 threads, Tavg =    166.21 ns/op (σ =   0.00 ns/op) [166.21]
            2 threads, Tavg =    332.58 ns/op (σ =   0.12 ns/op) [332.45, 332.70]
            3 threads, Tavg =    500.30 ns/op (σ =   0.22 ns/op) [500.04, 500.29, 500.58]
            4 threads, Tavg =    667.95 ns/op (σ =   2.11 ns/op) [665.22, 667.18, 668.40, 671.04]


Regards, Peter


On 02/24/2013 11:31 AM, David Holmes wrote:

On 24/02/2013 6:50 PM, Peter Levart wrote:

Hi David,

I thought it was ok to pass null, but I don't know the "portability"
issues in-depth. The javadoc for Unsafe says:

/"This method refers to a variable by means of two parameters, and so it
provides (in effect) a double-register addressing mode for Java
variables. When the object reference is null, this method uses its
offset as an absolute address. This is similar in operation to methods
such as getInt(long), which provide (in effect) a single-register
addressing mode for non-Java variables. However, because Java variables
may have a different layout in memory from non-Java variables,
programmers should not assume that these two addressing modes are ever
equivalent. Also, programmers should remember that offsets from the
double-register addressing mode cannot be portably confused with longs
used in the single-register addressing mode."/


That is the doc for getXXX but not for getAndAddXXX or
compareAndSwapXXX. You can't have null here:

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapLong(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jlong e, jlong x))
  UnsafeWrapper("Unsafe_CompareAndSwapLong");
  Handle p (THREAD, JNIHandles::resolve(obj));
  jlong* addr = (jlong*)(index_oop_from_field_offset_long(p(), offset));
  if (VM_Version::supports_cx8())
    return (jlong)(Atomic::cmpxchg(x, addr, e)) == e;
  else {
    jboolean success = false;
    ObjectLocker ol(p, THREAD);
    if (*addr == e) { *addr = x; success = true; }
    return success;
  }
UNSAFE_END
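
As a loose Java analogy of the non-cx8 branch quoted above (not a description of what the VM itself does with a null handle), the following sketch emulates a 64-bit CAS by locking an owning object. The names LockedCas, owner and cell are hypothetical. It shows why such a fallback needs a real object to lock on.

```java
class LockedCas {
    // Emulates compare-and-swap on cell[0] under the monitor of 'owner'.
    // In Java, synchronizing on a null reference throws NullPointerException,
    // so this fallback cannot work without an object to lock.
    static boolean compareAndSwapLong(Object owner, long[] cell, long expected, long x) {
        synchronized (owner) {
            if (cell[0] == expected) {
                cell[0] = x;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        long[] cell = { 10L };
        Object owner = new Object();
        System.out.println(compareAndSwapLong(owner, cell, 10L, 11L)); // value matched, swapped
        System.out.println(compareAndSwapLong(owner, cell, 10L, 12L)); // stale expectation, no swap
        try {
            compareAndSwapLong(null, cell, 11L, 12L);
        } catch (NullPointerException e) {
            System.out.println("no object to lock");
        }
    }
}
```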

David
-----

Does anybody know the in-depth interpretation of the above? Is it only
the particular Java/native type differences (for example, endianness of
variables) that these two addressing modes might interpret differently
or something else too?

Regards, Peter


On 02/24/2013 12:39 AM, David Holmes wrote:

Peter,

In your use of Unsafe you pass "null" as the object. I'm pretty
certain you can't pass null here. Unsafe operates on fields or array
elements.

David

On 24/02/2013 5:39 AM, Peter Levart wrote:

Hi Nils,

If the counters are updated frequently from multiple threads, there
might be contention/scalability issues. Instead of synchronization on
updates, you might consider using atomic updates provided by
sun.misc.Unsafe, like for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===================================================================

--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

  package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
  import java.nio.ByteBuffer;
  import java.nio.ByteOrder;
  import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
  public class PerfCounter {
      private static final Perf perf =
          AccessController.doPrivileged(new Perf.GetPerfAction());
+    private static final Unsafe unsafe =
+        Unsafe.getUnsafe();

      // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp
      private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

      private final String name;
      private final LongBuffer lb;
+    private final DirectBuffer db;

      private PerfCounter(String name, int type) {
          this.name = name;
          ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
          bb.order(ByteOrder.nativeOrder());
          this.lb = bb.asLongBuffer();
+        this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
      }

      static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
      /**
       * Returns the current value of the perf counter.
       */
-    public synchronized long get() {
+    public long get() {
+        if (db != null) {
+            return unsafe.getLongVolatile(null, db.address());
+        }
+        else {
+            synchronized (this) {
-        return lb.get(0);
-    }
+                return lb.get(0);
+            }
+        }
+    }

      /**
       * Sets the value of the perf counter to the given newValue.
       */
-    public synchronized void set(long newValue) {
+    public void set(long newValue) {
+        if (db != null) {
+            unsafe.putOrderedLong(null, db.address(), newValue);
+        }
+        else {
+            synchronized (this) {
-        lb.put(0, newValue);
-    }
+                lb.put(0, newValue);
+            }
+        }
+    }

      /**
       * Adds the given value to the perf counter.
       */
-    public synchronized void add(long value) {
-        long res = get() + value;
+    public void add(long value) {
+        if (db != null) {
+            unsafe.getAndAddLong(null, db.address(), value);
+        }
+        else {
+            synchronized (this) {
+                long res = lb.get(0) + value;
-        lb.put(0, res);
+                lb.put(0, res);
+            }
+        }
      }

      /**



Testing the PerfCounter.increment() method in a loop on multiple threads
sharing the same PerfCounter instance, for example, on a 4-core Intel i7
machine produces the following results:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
            1 threads, Tavg =     19.02 ns/op (σ = 0.00 ns/op)
            2 threads, Tavg =    109.93 ns/op (σ = 6.17 ns/op)
            3 threads, Tavg =    136.64 ns/op (σ = 2.99 ns/op)
            4 threads, Tavg =    293.26 ns/op (σ = 5.30 ns/op)
            5 threads, Tavg =    316.94 ns/op (σ = 6.28 ns/op)
            6 threads, Tavg =    686.96 ns/op (σ = 7.09 ns/op)
            7 threads, Tavg =    793.28 ns/op (σ = 10.57 ns/op)
            8 threads, Tavg =    898.15 ns/op (σ = 14.63 ns/op)


With the presented patch, the results are a little better:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
# Measure:
            1 threads, Tavg =      5.22 ns/op (σ = 0.00 ns/op)
            2 threads, Tavg =     34.51 ns/op (σ = 0.60 ns/op)
            3 threads, Tavg =     54.85 ns/op (σ = 1.42 ns/op)
            4 threads, Tavg =     74.67 ns/op (σ = 1.71 ns/op)
            5 threads, Tavg =     94.71 ns/op (σ = 41.68 ns/op)
            6 threads, Tavg =    114.80 ns/op (σ = 32.10 ns/op)
            7 threads, Tavg =    136.70 ns/op (σ = 26.80 ns/op)
            8 threads, Tavg =    158.48 ns/op (σ = 9.93 ns/op)


The scalability is not much better, but the raw speed is, so it might
present less contention when used in real-world code. If you wanted even
better scalability, there is a new class in JDK8, the
java.util.concurrent.LongAdder. But that doesn't buy atomic "set()" -
only "add()". And it can't update native-memory variables, so it could
only be used for add-only counters and in conjunction with a background
thread that would periodically flush the sum to the native memory....
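
The LongAdder idea above could look roughly like this. It is a minimal sketch, assuming an add-only counter whose sum is copied into a direct buffer by a periodic task. FlushedCounter and its methods are hypothetical names, and the direct ByteBuffer merely stands in for the perf-data memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class FlushedCounter {
    private final LongAdder adder = new LongAdder(); // scalable add-only sum
    private final LongBuffer lb;                     // view of the native memory

    FlushedCounter(ByteBuffer bb) {
        this.lb = bb.order(ByteOrder.nativeOrder()).asLongBuffer();
    }

    void add(long value) { adder.add(value); }

    // Copies the current sum to native memory. Intended to be called from
    // a single background thread, so the store itself is uncontended.
    void flush() { lb.put(0, adder.sum()); }

    // What an external observer of the buffer would currently read.
    long published() { return lb.get(0); }

    public static void main(String[] args) throws InterruptedException {
        FlushedCounter c = new FlushedCounter(ByteBuffer.allocateDirect(8));
        ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
        flusher.scheduleAtFixedRate(c::flush, 10, 10, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 1000; i++) c.add(1);
        flusher.shutdown();
        flusher.awaitTermination(1, TimeUnit.SECONDS);
        c.flush();                                   // final flush after the last add
        System.out.println(c.published());
    }
}
```

The trade-off is staleness: observers only see the value as of the last flush, and the store into the buffer is still a plain 64-bit write with the same tearing caveat discussed earlier in the thread.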

Regards, Peter


On 02/08/2013 06:10 PM, Nils Loodin wrote:

It would be interesting to know the number of thrown throwables in the
JVM, to be able to do some high level application diagnostics /
statistics. A good way to put this number would be a performance
counter, since it is accessible both from Java and from the VM.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

Regards,
Nils Loodin
