Hi, Claes,

Thanks for starting to look into this.

On 06/12/2016 06:28 PM, Claes Redestad wrote:
Hi,

this seems quite promising.

A bit too much to review in one go, but let me start with some musings while I digest it more fully:

  49     /**
  50      * A special key that is used to overwrite a key when the entry is removed.
  51      */
  52     private static final class Tombstone {}
  53
  54     private static final Tombstone TOMBSTONE = new Tombstone();

Since you're only checking identity for TOMBSTONE, couldn't it simply be an Object?

Yes, it could be an Object. I initially had stateful Tombstones, but now it is just a special reference.

Similarly, could the class Removed you introduced in ClassValue be replaced by a similar Object TOMBSTONE?

Removed instances in ClassValue are different. Two distinct instances of Removed are never identical, and that is what avoids the ABA problem: if, while the first thread is computing some value, another thread computes a value, installs it and then removes it before the first thread tries to install its own result, the first thread sees a different Removed instance and its install fails.
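For illustration only, here is a toy sketch of that identity argument (not the actual ClassValue code; the class and member names below are made up):

    import java.util.concurrent.atomic.AtomicReference;

    // Toy model of a single slot - it only illustrates why a fresh
    // Removed instance per removal defeats ABA where a shared
    // singleton sentinel would not.
    class AbaSketch<V> {
        static final class Removed {}                  // fresh instance per removal

        final AtomicReference<Object> slot =
                new AtomicReference<>(new Removed());  // initially "no value"

        void removeValue() {
            slot.set(new Removed());                   // new identity every time
        }

        // Called by a thread that observed 'observedRemoved' before it
        // started computing 'computed'. If another thread installed *and*
        // removed a value in the meantime, the slot now holds a different
        // Removed instance, so the CAS fails and the stale result is not
        // installed. With a shared singleton sentinel the CAS would
        // wrongly succeed - the classic ABA problem.
        boolean tryInstall(Object observedRemoved, V computed) {
            return slot.compareAndSet(observedRemoved, computed);
        }
    }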


Thanks!

In the meantime I simplified some methods of LinearProbeHashtable (including get()) so that they no longer have to spin-wait for a removing thread to install a tombstone. I did that after realizing that I could reverse the order of writes in the remove() method(s): they now first overwrite the key with a tombstone and only then clear the value. Here's a webrev with just that simplification plus your comment applied:

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.4/
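For illustration only (a rough sketch, not the actual patch), the change amounts to swapping the two writes shown in the webrev.04.3 remove() snippet quoted further down in this mail (bookkeeping such as tombstoned++ left out):

    // webrev.04.3: clear the value first, then publish the tombstone
    tab[i + 1] = null;
    putVolatile(tab, i, TOMBSTONE);

    // webrev.04.4: publish the tombstone first, then clear the value
    putVolatile(tab, i, TOMBSTONE);
    tab[i + 1] = null;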

Separately, and not included in the above webrev, I managed to improve the average lookup time for MethodType keys by optimizing the locations of keys when the table is rehashed. This does not affect ClassValue lookup performance, since ClassValue keys hash so perfectly that there are no collisions at all. But if LinearProbeHashtable is to be used for interning MethodType(s) too, then this is good news. I plan to publish that change separately, with benchmark results, when it is ready...

Regards, Peter




/Claes

Note: I'd argue that this might help with some footprint regressions in JDK 9 (due to increased use of MHs, which in turn use ClassValues), and if it gets through review in a reasonable time frame I think this should be targeted for JDK 9 as a regression bug fix.

On 2016-06-09 16:42, Peter Levart wrote:
Hi,

The patch for this issue is a result of the following discussion on
mlvm-dev list:

http://mail.openjdk.java.net/pipermail/mlvm-dev/2016-May/006602.html

I propose the following improvement to the implementation of ClassValue
API:

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3/


This is a re-implementation of the ClassValue API that leverages a
ConcurrentMap for storage. ConcurrentHashMap was tested initially, but
using it regressed lookup performance by about 10%. So a simple
ConcurrentMap implementation (LinearProbeHashtable) was created which
has a smaller footprint, better CPU cache locality and consequently
faster lookup performance for the task at hand. The goal of this task
was to shrink the memory footprint of the ClassValue API. To see the
end effect, let's first look at the results of a benchmark that
simulates redeployment of an application in an app server where
several apps are deployed and each uses its own set of classes and
ClassValue(s):

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueExpungeBench.java


Benchmark                                 (classValuesPerPart) (classesPerPart)  (impl) (partitions) Mode Cnt     Score    Error Units
ClassValueExpungeBench.redeployPartition                     8             1024    jdk9           16   ss  16    65.682 ±  1.485 ms/op
ClassValueExpungeBench.redeployPartition                     8             4096    jdk9           16   ss  16   247.040 ±  7.684 ms/op
ClassValueExpungeBench.redeployPartition                    64             1024    jdk9           16   ss  16   302.536 ± 27.750 ms/op
ClassValueExpungeBench.redeployPartition                    64             4096    jdk9           16   ss  16  1174.002 ± 77.183 ms/op

ClassValueExpungeBench.redeployPartition                     8             1024  pl04.3           16   ss  16    47.179 ±  1.436 ms/op
ClassValueExpungeBench.redeployPartition                     8             4096  pl04.3           16   ss  16   163.067 ±  8.118 ms/op
ClassValueExpungeBench.redeployPartition                    64             1024  pl04.3           16   ss  16    67.581 ±  1.718 ms/op
ClassValueExpungeBench.redeployPartition                    64             4096  pl04.3           16   ss  16   240.458 ±  6.616 ms/op

A by-product of this simulation is a heap dump histogram taken after the
last warmup iteration when all "applications" are deployed:

top-10 classes heap dump for 64 ClassValue(s) x 4096 Class(es) x 16
partitions

jdk9:

  num     #instances         #bytes  class name (module)
 -------------------------------------------------------
    1:       4194309      167772360  java.util.WeakHashMap$Entry (java.base@9-internal)
    2:       4195329      134250528  java.lang.ClassValue$Entry (java.base@9-internal)
    3:         65539       34603248  [Ljava.util.WeakHashMap$Entry; (java.base@9-internal)
    4:         65537       34603032  [Ljava.lang.ClassValue$Entry; (java.base@9-internal)
    5:         67301        7552448  java.lang.Class (java.base@9-internal)
    6:         65536        4194304  java.lang.ClassValue$ClassValueMap (java.base@9-internal)
    7:         65552        2097664  java.lang.ref.ReferenceQueue (java.base@9-internal)
    8:         67736        1676344  [Ljava.lang.Object; (java.base@9-internal)
    9:         66349        1116848  [I (java.base@9-internal)
   10:         65554        1048864  java.lang.ref.ReferenceQueue$Lock (java.base@9-internal)
 -------------------------------------------------------
                          388915640 == 370 MiB

pl04.3:

  num     #instances         #bytes  class name (module)
 -------------------------------------------------------
    1:        133274       69833848  [Ljava.lang.Object; (java.base@9-internal)
    2:         67300        7552376  java.lang.Class (java.base@9-internal)
    3:         65536        2097152  java.lang.ClassValue$ClassValueMap (java.base@9-internal)
    4:         65536        2097152  java.lang.ClassValue$ClassValueMap$WeakNode (java.base@9-internal)
    5:         66349        1116848  [I (java.base@9-internal)
    6:         65991        1055856  java.lang.Object (java.base@9-internal)
    7:          7434         696584  [B (java.base@9-internal)
    8:          1551         299680  [Ljava.lang.Class; (java.base@9-internal)
    9:          2447         176184  java.lang.reflect.Field (java.base@9-internal)
   10:          7154         171696  java.lang.String (java.base@9-internal)
 -------------------------------------------------------
                           85097376 == 81 MiB


Footprint is reduced by more than 4 times. The main improvement of this
implementation is that, to support M Class(es) x N ClassValue(s), the
only part of the data structure with O(M*N) space footprint is the
backing array(s) of LinearProbeHashtable. No less than 3 and no more
than 6 Object slots are used per Class x ClassValue association (4 on
average). The other supporting parts of the data structure are either
O(M), O(N) or O(1).

The lookup time has also improved slightly with this patch:

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueBench.java


http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.3.bench_results.pdf



As for LinearProbeHashtable, it has been updated from the initial
version to implement the whole ConcurrentMap interface. Not much was
needed for that (~150 lines of heavily commented code) compared to the
implementation in webrev.04.2
(http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04.2/).
The end effect is that I was able to verify the correctness of the
implementation using MOAT tests. Other changes to LinearProbeHashtable
since webrev.04.2 (for those following the discussion on mlvm-dev) are:

     - The lookup logic that was repeated in each and every method was
factored out into a private lookup() method.
     - Further benchmarking revealed that it was not beneficial to have
a 'skip' field in tombstone objects. Reading this field to optimize
skipping over a consecutive run of tombstones reduces the cache
locality of the algorithm. On average it is faster to just skip one
tombstone at a time and use a singleton tombstone object so that only
the reference needs to be compared (see the toy sketch after the
remove() snippet below).
     - The following spin loop was added to the get() method (and
similar ones to some other methods) to preserve the SC (sequential
consistency) properties of the methods:

     public V get(Object key) {
         Object[] tab = table;
         int i = lookup(tab, key);
         if (i >= 0) {
             Object value = getVolatile(tab, i + 1);
             if (value != null) {
                 return (V) value;
             }
             // possible race with removing an entry can cause oldValue
             // to be observed cleared. If this happens, we know the key
             // has/will be tombstoned just after that but we must wait
             // for that to happen in order to preserve SC properties of
             // this method; without this wait one could even observe the
             // following in one thread:
             //
             //      table.get(k) == null && table.containsKey(k);
             //
             while (getVolatile(tab, i) != TOMBSTONE) {
                 Thread.onSpinWait();
             }
         }
         return null;
     }

...the write this spin-loop is waiting on is in the remove() method(s):

                         tab[i + 1] = null; // clear the value...
                         tombstoned++;
                         putVolatile(tab, i, TOMBSTONE); // ...publish tombstone
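
For readers who want a feel for the layout without opening the webrev,
here is a toy sketch (not the actual LinearProbeHashtable code; all
names are made up, and volatile accesses, insertion, resizing and
size/tombstone accounting are omitted) of the slot layout and tombstone
handling described above - keys at even indexes, values at the
following odd indexes, removed keys overwritten with the singleton
TOMBSTONE and skipped one entry at a time by reference comparison:

     final class ToyLinearProbeTable<K, V> {
         private static final Object TOMBSTONE = new Object();

         // keys at even indexes, values at the following odd indexes
         private final Object[] table;

         ToyLinearProbeTable(int capacityPowerOfTwo) {
             table = new Object[capacityPowerOfTwo * 2];
         }

         private int firstSlot(Object key) {
             int h = key.hashCode();
             h ^= (h >>> 16);                       // spread the hash a bit
             return (h << 1) & (table.length - 1);  // always an even index
         }

         @SuppressWarnings("unchecked")
         V get(Object key) {
             Object[] tab = table;
             int i = firstSlot(key);
             for (int probed = 0; probed < tab.length; probed += 2) {
                 Object k = tab[i];                 // real code: getVolatile
                 if (k == null) {
                     return null;                   // empty slot - key absent
                 }
                 // a removed entry leaves TOMBSTONE in its key slot; a single
                 // reference comparison is enough to step over it
                 if (k != TOMBSTONE && key.equals(k)) {
                     return (V) tab[i + 1];         // real code: getVolatile
                 }
                 i = (i + 2) & (tab.length - 1);    // next entry, wrap around
             }
             return null;
         }

         void remove(Object key) {
             Object[] tab = table;
             int i = firstSlot(key);
             for (int probed = 0; probed < tab.length; probed += 2) {
                 Object k = tab[i];
                 if (k == null) {
                     return;                        // key not present
                 }
                 if (k != TOMBSTONE && key.equals(k)) {
                     tab[i] = TOMBSTONE;            // 1st: tombstone the key
                     tab[i + 1] = null;             // 2nd: clear the value
                     return;
                 }
                 i = (i + 2) & (tab.length - 1);
             }
         }
     }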


The lookup performance of LinearProbeHashtable was put to the test by
comparing it with ConcurrentHashMap in the following benchmark:

http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_src.tgz

Results:

http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_bench.pdf



In this benchmark, several variations are tested. The one from
webrev.04.2 with stateful tombstones is called LinearProbeHashTable,
and the one with a singleton stateless tombstone, presented in this RFR
as webrev.04.3, is called LinearProbeHashTable1. The benchmark shows
that even during concurrent removals and insertions, the successful
lookup performance (GetExistent) does not degrade much and is on par
with ConcurrentHashMap. This benchmark also explores the suitability of
using LinearProbeHashTable for keys derived from MethodType.

A note to reviewers: it is not very useful to compare the old
implementation with the new one side by side, as they do not have much
in common (just the public API and javadocs). Although the net line
count increases with this patch, the complexity of the ClassValue
implementation is reduced, because the implementation is layered: the
ClassValue API is implemented in terms of the well-known ConcurrentMap
API, and the correctness of the ConcurrentMap implementation is
verified by MOAT tests.

Regards, Peter

