Hi Nils,

If the counters are updated frequently from multiple threads, the synchronized methods may become a contention and scalability bottleneck. Instead of synchronizing on updates, you might consider using the atomic operations provided by sun.misc.Unsafe, for example:


Index: jdk/src/share/classes/sun/misc/PerfCounter.java
===================================================================
--- jdk/src/share/classes/sun/misc/PerfCounter.java
+++ jdk/src/share/classes/sun/misc/PerfCounter.java
@@ -25,6 +25,8 @@

 package sun.misc;

+import sun.nio.ch.DirectBuffer;
+
 import java.nio.ByteBuffer;
 import java.nio.ByteOrder;
 import java.nio.LongBuffer;
@@ -50,6 +52,8 @@
 public class PerfCounter {
     private static final Perf perf =
         AccessController.doPrivileged(new Perf.GetPerfAction());
+    private static final Unsafe unsafe =
+        Unsafe.getUnsafe();

     // Must match values defined in hotspot/src/share/vm/runtime/perfdata.hpp
     private final static int V_Constant  = 1;
@@ -59,12 +63,14 @@

     private final String name;
     private final LongBuffer lb;
+    private final DirectBuffer db;

     private PerfCounter(String name, int type) {
         this.name = name;
         ByteBuffer bb = perf.createLong(name, U_None, type, 0L);
         bb.order(ByteOrder.nativeOrder());
         this.lb = bb.asLongBuffer();
+        this.db = bb instanceof DirectBuffer ? (DirectBuffer) bb : null;
     }

     static PerfCounter newPerfCounter(String name) {
@@ -79,23 +85,44 @@
     /**
      * Returns the current value of the perf counter.
      */
-    public synchronized long get() {
+    public long get() {
+        if (db != null) {
+            return unsafe.getLongVolatile(null, db.address());
+        }
+        else {
+            synchronized (this) {
-        return lb.get(0);
-    }
+                return lb.get(0);
+            }
+        }
+    }

     /**
      * Sets the value of the perf counter to the given newValue.
      */
-    public synchronized void set(long newValue) {
+    public void set(long newValue) {
+        if (db != null) {
+            unsafe.putOrderedLong(null, db.address(), newValue);
+        }
+        else {
+            synchronized (this) {
-        lb.put(0, newValue);
-    }
+                lb.put(0, newValue);
+            }
+        }
+    }

     /**
      * Adds the given value to the perf counter.
      */
-    public synchronized void add(long value) {
-        long res = get() + value;
+    public void add(long value) {
+        if (db != null) {
+            unsafe.getAndAddLong(null, db.address(), value);
+        }
+        else {
+            synchronized (this) {
+                long res = lb.get(0) + value;
-        lb.put(0, res);
+                lb.put(0, res);
+            }
+        }
     }

     /**
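
For readers who want to try the idea outside the JDK sources, here is a minimal sketch of the same lock-free approach. It is not the patch above: it swaps sun.misc.Unsafe for a byte-buffer view VarHandle, which assumes JDK 9 or later, and it uses a plain direct ByteBuffer as a stand-in for the buffer returned by perf.createLong(). The class name DirectCounterSketch and the helper hammer() are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DirectCounterSketch {
    // Views a ByteBuffer as an array of longs in native byte order.
    private static final VarHandle LONG_VIEW =
        MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Assumption: a plain direct buffer stands in for the perf counter's memory.
    private final ByteBuffer bb = ByteBuffer.allocateDirect(Long.BYTES);

    // Volatile read, comparable to Unsafe.getLongVolatile in the patch.
    long get() {
        return (long) LONG_VIEW.getVolatile(bb, 0);
    }

    // Release (ordered) write, comparable to Unsafe.putOrderedLong.
    void set(long newValue) {
        LONG_VIEW.setRelease(bb, 0, newValue);
    }

    // Atomic add, comparable to Unsafe.getAndAddLong; returns the previous value.
    long add(long value) {
        return (long) LONG_VIEW.getAndAdd(bb, 0, value);
    }

    // Hypothetical helper: several threads add 1 to one shared counter.
    static long hammer(int threads, int perThread) {
        DirectCounterSketch counter = new DirectCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 100_000));
    }
}
```

Because every add is a single atomic read-modify-write, no update is lost even without a lock, which is the property the patch relies on.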



Testing the PerfCounter.increment() method in a loop on multiple threads sharing the same PerfCounter instance, for example, on a 4-core Intel i7 machine produces the following results:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
           1 threads, Tavg =     19.02 ns/op (σ =   0.00 ns/op)
           2 threads, Tavg =    109.93 ns/op (σ =   6.17 ns/op)
           3 threads, Tavg =    136.64 ns/op (σ =   2.99 ns/op)
           4 threads, Tavg =    293.26 ns/op (σ =   5.30 ns/op)
           5 threads, Tavg =    316.94 ns/op (σ =   6.28 ns/op)
           6 threads, Tavg =    686.96 ns/op (σ =   7.09 ns/op)
           7 threads, Tavg =    793.28 ns/op (σ =  10.57 ns/op)
           8 threads, Tavg =    898.15 ns/op (σ =  14.63 ns/op)


With the presented patch, the results are better, taking roughly 2.5x to 6x less time per operation:

#
# PerfCounter_increment: run duration:  5,000 ms, #of logical CPUS: 8
#
# Measure:
           1 threads, Tavg =      5.22 ns/op (σ =   0.00 ns/op)
           2 threads, Tavg =     34.51 ns/op (σ =   0.60 ns/op)
           3 threads, Tavg =     54.85 ns/op (σ =   1.42 ns/op)
           4 threads, Tavg =     74.67 ns/op (σ =   1.71 ns/op)
           5 threads, Tavg =     94.71 ns/op (σ =  41.68 ns/op)
           6 threads, Tavg =    114.80 ns/op (σ =  32.10 ns/op)
           7 threads, Tavg =    136.70 ns/op (σ =  26.80 ns/op)
           8 threads, Tavg =    158.48 ns/op (σ =   9.93 ns/op)
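
The harness that produced these numbers is not included in this mail. A loop of the kind described might look roughly like the sketch below. Everything in it is an assumption: SyncCounter is a stand-in for the JDK-internal PerfCounter, the run uses a fixed number of operations per thread rather than a fixed 5,000 ms duration, there is no warm-up phase, and Tavg is taken to be the mean of the per-thread averages.

```java
import java.util.concurrent.CountDownLatch;

final class IncrementBench {
    // Hypothetical stand-in for the counter under test.
    static final class SyncCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Returns the mean ns/op over all threads for one thread count.
    static double run(int threads, int opsPerThread) {
        SyncCounter counter = new SyncCounter();
        CountDownLatch start = new CountDownLatch(1);
        long[] elapsed = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long t0 = System.nanoTime();
                for (int j = 0; j < opsPerThread; j++) {
                    counter.increment();
                }
                elapsed[id] = System.nanoTime() - t0;
            });
            workers[i].start();
        }
        start.countDown();
        double sum = 0;
        for (int i = 0; i < threads; i++) {
            try {
                workers[i].join(); // join makes elapsed[i] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            sum += elapsed[i] / (double) opsPerThread;
        }
        if (counter.get() != (long) threads * opsPerThread) {
            throw new IllegalStateException("lost updates");
        }
        return sum / threads;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        for (int n = 1; n <= cpus; n++) {
            System.out.printf("%12d threads, Tavg = %9.2f ns/op%n", n, run(n, 1_000_000));
        }
    }
}
```

Absolute numbers from such a loop depend heavily on the machine and on JIT warm-up, so only the trend across thread counts is meaningful.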


Scalability is not much better (per-operation time still grows with the thread count), but raw speed is, so the patched code may see less contention in real-world use. If you want even better scalability, JDK 8 adds a new class, java.util.concurrent.atomic.LongAdder. But it only provides an atomic add(), not an atomic set(), and it cannot update native-memory variables, so it could only be used for add-only counters, in conjunction with a background thread that periodically flushes the sum to native memory.
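
To make the LongAdder idea concrete, here is a minimal sketch of an add-only counter whose sum is flushed periodically by a background thread. It assumes JDK 8 or later; the class name FlushedAddOnlyCounter, its methods, and the flush period are hypothetical, and a plain direct buffer stands in for the perf counter's native memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class FlushedAddOnlyCounter implements AutoCloseable {
    private final LongAdder adder = new LongAdder();
    // Assumption: stands in for the native memory backing a perf counter.
    private final LongBuffer published;
    private final ScheduledExecutorService flusher;

    FlushedAddOnlyCounter(long periodMillis) {
        ByteBuffer bb = ByteBuffer.allocateDirect(Long.BYTES).order(ByteOrder.nativeOrder());
        this.published = bb.asLongBuffer();
        this.flusher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "counter-flusher");
            t.setDaemon(true); // do not keep the JVM alive for the flusher
            return t;
        });
        flusher.scheduleAtFixedRate(this::flush, periodMillis, periodMillis,
                                    TimeUnit.MILLISECONDS);
    }

    // Hot path: no lock, contention is spread over LongAdder's internal cells.
    void add(long value) {
        adder.add(value);
    }

    // Copies the current sum into the published slot.
    synchronized void flush() {
        published.put(0, adder.sum());
    }

    // What an external reader of the published slot would see.
    synchronized long publishedValue() {
        return published.get(0);
    }

    @Override
    public void close() {
        flusher.shutdownNow();
        flush(); // publish the final sum
    }
}
```

The trade-off is that readers of the published value may lag behind the true sum by up to one flush period.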

Regards, Peter


On 02/08/2013 06:10 PM, Nils Loodin wrote:
It would be interesting to know the number of thrown throwables in the JVM, to be able to do some high-level application diagnostics / statistics. A good place to expose this number would be a performance counter, since it is accessible both from Java and from the VM.

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8007806
http://cr.openjdk.java.net/~nloodin/8007806/webrev.00/

Regards,
Nils Loodin
