Hi Peter,

thank you. As some background info, the machine I'm running the benchmarks on is a 32-core Xeon.

Results of running your latest benchmarks with unmodified 9-dev and your two patches (02, 03) are attached. Overall, your solution 03 seems a bit slower than 02. 02 shines especially in the expunging benchmark, and also for random access with large numbers of classes and class values. Otherwise, 02 appears to be somewhat slower than the unmodified case, though.

It may well be that running the benchmark so few times does not deliver a stable enough result. I'd like Aleksey to comment on this: is adopting Peter's code worthwhile given that it improves footprint and reduces code size and complexity?

I agree that it is questionable whether there's a point in optimising for single-value storage whilst maintaining full flexibility. In a scenario where it is known that only one value will be associated with a class, it's better to use static fields.
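
To make the static-field point concrete, here is a minimal sketch. All names in it (SingleValueSketch, Widget, DISPLAY_NAME, viaClassValue) are made up for illustration and do not come from the patches; it only contrasts a lazily computed ClassValue with a plain static field holding the one value that belongs to a known class.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SingleValueSketch {
    // Counts how often computeValue runs (illustration only).
    static final AtomicInteger COMPUTATIONS = new AtomicInteger();

    // Flexible variant: works for any class, value is computed lazily and cached.
    static final ClassValue<String> DISPLAY_NAME = new ClassValue<String>() {
        @Override
        protected String computeValue(Class<?> type) {
            COMPUTATIONS.incrementAndGet();
            return "type:" + type.getSimpleName();
        }
    };

    // Single-value variant: the class is known up front, so a static field is enough.
    static final class Widget {
        static final String DISPLAY_NAME = "type:Widget";
    }

    static String viaClassValue(Class<?> type) {
        return DISPLAY_NAME.get(type);
    }

    public static void main(String[] args) {
        // Two lookups through ClassValue; the second one hits the cached value.
        System.out.println(viaClassValue(Widget.class));
        System.out.println(viaClassValue(Widget.class));
        // Direct static field access needs no lookup at all.
        System.out.println(Widget.DISPLAY_NAME);
        System.out.println("computeValue calls: " + COMPUTATIONS.get());
    }
}
```

The static field avoids the lookup entirely, which is why a dedicated single-value fast path inside ClassValue seems of limited use when the class is known in advance.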

Best,

Michael

Benchmark                                 (classCount)  (classValueCount)  (classValuesPerPart)  (classesPerPart)   (impl)  (partitions)  Mode  Cnt     Score    Error  Units
ClassValueBench.randomAccess                       128                  1                   N/A               N/A  unknown           N/A  avgt   10    10.190 ±  0.014  ns/op
ClassValueBench.randomAccess                       128                  4                   N/A               N/A  unknown           N/A  avgt   10    12.000 ±  0.164  ns/op
ClassValueBench.randomAccess                       128                 16                   N/A               N/A  unknown           N/A  avgt   10    16.131 ±  0.026  ns/op
ClassValueBench.randomAccess                       128                256                   N/A               N/A  unknown           N/A  avgt   10    24.267 ±  0.065  ns/op
ClassValueBench.randomAccess                      1024                  1                   N/A               N/A  unknown           N/A  avgt   10    18.375 ±  0.046  ns/op
ClassValueBench.randomAccess                      1024                  4                   N/A               N/A  unknown           N/A  avgt   10    26.755 ±  0.018  ns/op
ClassValueBench.randomAccess                      1024                 16                   N/A               N/A  unknown           N/A  avgt   10    26.263 ±  0.024  ns/op
ClassValueBench.randomAccess                      1024                256                   N/A               N/A  unknown           N/A  avgt   10    53.543 ±  0.419  ns/op
ClassValueBench.sequentialAccess                   128                  1                   N/A               N/A  unknown           N/A  avgt   10    11.063 ±  0.077  ns/op
ClassValueBench.sequentialAccess                   128                  4                   N/A               N/A  unknown           N/A  avgt   10     9.384 ±  0.033  ns/op
ClassValueBench.sequentialAccess                   128                 16                   N/A               N/A  unknown           N/A  avgt   10    10.534 ±  0.036  ns/op
ClassValueBench.sequentialAccess                   128                256                   N/A               N/A  unknown           N/A  avgt   10    18.038 ±  0.119  ns/op
ClassValueBench.sequentialAccess                  1024                  1                   N/A               N/A  unknown           N/A  avgt   10    14.862 ±  0.013  ns/op
ClassValueBench.sequentialAccess                  1024                  4                   N/A               N/A  unknown           N/A  avgt   10    11.586 ±  0.027  ns/op
ClassValueBench.sequentialAccess                  1024                 16                   N/A               N/A  unknown           N/A  avgt   10     8.949 ±  0.116  ns/op
ClassValueBench.sequentialAccess                  1024                256                   N/A               N/A  unknown           N/A  avgt   10    43.170 ±  0.074  ns/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                     8              1024  unknown            16    ss   16   130.911 ± 10.815  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                     8              4096  unknown            16    ss   16   435.190 ± 32.679  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                    64              1024  unknown            16    ss   16   569.942 ± 68.902  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                    64              4096  unknown            16    ss   16  1485.027 ± 91.200  ms/op
Benchmark                                 (classCount)  (classValueCount)  (classValuesPerPart)  (classesPerPart)   (impl)  (partitions)  Mode  Cnt     Score    Error  Units
ClassValueBench.randomAccess                       128                  1                   N/A               N/A  unknown           N/A  avgt   10    11.488 ±  0.050  ns/op
ClassValueBench.randomAccess                       128                  4                   N/A               N/A  unknown           N/A  avgt   10    13.497 ±  0.038  ns/op
ClassValueBench.randomAccess                       128                 16                   N/A               N/A  unknown           N/A  avgt   10    15.256 ±  1.289  ns/op
ClassValueBench.randomAccess                       128                256                   N/A               N/A  unknown           N/A  avgt   10    25.620 ±  0.076  ns/op
ClassValueBench.randomAccess                      1024                  1                   N/A               N/A  unknown           N/A  avgt   10    18.020 ±  0.022  ns/op
ClassValueBench.randomAccess                      1024                  4                   N/A               N/A  unknown           N/A  avgt   10    26.993 ±  0.023  ns/op
ClassValueBench.randomAccess                      1024                 16                   N/A               N/A  unknown           N/A  avgt   10    31.862 ±  0.019  ns/op
ClassValueBench.randomAccess                      1024                256                   N/A               N/A  unknown           N/A  avgt   10    42.897 ±  0.121  ns/op
ClassValueBench.sequentialAccess                   128                  1                   N/A               N/A  unknown           N/A  avgt   10    10.776 ±  0.068  ns/op
ClassValueBench.sequentialAccess                   128                  4                   N/A               N/A  unknown           N/A  avgt   10    10.084 ±  0.041  ns/op
ClassValueBench.sequentialAccess                   128                 16                   N/A               N/A  unknown           N/A  avgt   10     9.572 ±  0.052  ns/op
ClassValueBench.sequentialAccess                   128                256                   N/A               N/A  unknown           N/A  avgt   10    17.200 ±  0.056  ns/op
ClassValueBench.sequentialAccess                  1024                  1                   N/A               N/A  unknown           N/A  avgt   10    12.623 ±  0.019  ns/op
ClassValueBench.sequentialAccess                  1024                  4                   N/A               N/A  unknown           N/A  avgt   10    11.008 ±  0.007  ns/op
ClassValueBench.sequentialAccess                  1024                 16                   N/A               N/A  unknown           N/A  avgt   10    11.021 ±  0.059  ns/op
ClassValueBench.sequentialAccess                  1024                256                   N/A               N/A  unknown           N/A  avgt   10    38.818 ±  0.023  ns/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                     8              1024  unknown            16    ss   16   103.340 ±  8.328  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                     8              4096  unknown            16    ss   16   354.411 ± 25.083  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                    64              1024  unknown            16    ss   16   268.129 ± 35.009  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                    64              4096  unknown            16    ss   16   709.350 ± 58.439  ms/op
Benchmark                                 (classCount)  (classValueCount)  (classValuesPerPart)  (classesPerPart)   (impl)  (partitions)  Mode  Cnt     Score    Error  Units
ClassValueBench.randomAccess                       128                  1                   N/A               N/A  unknown           N/A  avgt   10    12.538 ±  0.015  ns/op
ClassValueBench.randomAccess                       128                  4                   N/A               N/A  unknown           N/A  avgt   10    13.910 ±  0.033  ns/op
ClassValueBench.randomAccess                       128                 16                   N/A               N/A  unknown           N/A  avgt   10    16.351 ±  0.021  ns/op
ClassValueBench.randomAccess                       128                256                   N/A               N/A  unknown           N/A  avgt   10    26.028 ±  0.027  ns/op
ClassValueBench.randomAccess                      1024                  1                   N/A               N/A  unknown           N/A  avgt   10    23.315 ±  0.053  ns/op
ClassValueBench.randomAccess                      1024                  4                   N/A               N/A  unknown           N/A  avgt   10    28.323 ±  0.053  ns/op
ClassValueBench.randomAccess                      1024                 16                   N/A               N/A  unknown           N/A  avgt   10    29.514 ±  0.070  ns/op
ClassValueBench.randomAccess                      1024                256                   N/A               N/A  unknown           N/A  avgt   10    45.339 ±  0.035  ns/op
ClassValueBench.sequentialAccess                   128                  1                   N/A               N/A  unknown           N/A  avgt   10    12.847 ±  0.041  ns/op
ClassValueBench.sequentialAccess                   128                  4                   N/A               N/A  unknown           N/A  avgt   10     9.992 ±  0.029  ns/op
ClassValueBench.sequentialAccess                   128                 16                   N/A               N/A  unknown           N/A  avgt   10     9.424 ±  0.003  ns/op
ClassValueBench.sequentialAccess                   128                256                   N/A               N/A  unknown           N/A  avgt   10    18.729 ±  0.101  ns/op
ClassValueBench.sequentialAccess                  1024                  1                   N/A               N/A  unknown           N/A  avgt   10    14.755 ±  0.040  ns/op
ClassValueBench.sequentialAccess                  1024                  4                   N/A               N/A  unknown           N/A  avgt   10    12.609 ±  0.044  ns/op
ClassValueBench.sequentialAccess                  1024                 16                   N/A               N/A  unknown           N/A  avgt   10    10.160 ±  0.032  ns/op
ClassValueBench.sequentialAccess                  1024                256                   N/A               N/A  unknown           N/A  avgt   10    41.481 ±  0.051  ns/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                     8              1024  unknown            16    ss   16   105.299 ±  6.092  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                     8              4096  unknown            16    ss   16   356.506 ± 25.686  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                    64              1024  unknown            16    ss   16   284.888 ± 42.546  ms/op
ClassValueExpungeBench.redeployPartition           N/A                N/A                    64              4096  unknown            16    ss   16   825.090 ± 95.535  ms/op
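
For a quick comparison of the redeployPartition scores, the ratios can be computed as sketched below. This rests on an assumption that is not stated in the attachments themselves: that the three tables appear in the order unmodified 9-dev, patch 02, patch 03. The class and array names are made up for illustration. Note that the error margins on these single-shot measurements are large, so the ratios are indicative at best.

```java
import java.util.Locale;

public class ExpungeRatioSketch {
    // redeployPartition scores in ms/op, copied from the three result tables.
    // ASSUMPTION: the tables are ordered unmodified 9-dev, patch 02, patch 03.
    // Row order: (8, 1024), (8, 4096), (64, 1024), (64, 4096)
    // for (classValuesPerPart, classesPerPart).
    static final double[] UNMODIFIED = {130.911, 435.190, 569.942, 1485.027};
    static final double[] WEBREV_02 = {103.340, 354.411, 268.129, 709.350};
    static final double[] WEBREV_03 = {105.299, 356.506, 284.888, 825.090};

    // Element-wise baseline / candidate; values above 1.0 mean the candidate is faster.
    static double[] ratios(double[] baseline, double[] candidate) {
        double[] result = new double[baseline.length];
        for (int i = 0; i < baseline.length; i++) {
            result[i] = baseline[i] / candidate[i];
        }
        return result;
    }

    static void print(String label, double[] values) {
        StringBuilder line = new StringBuilder(label);
        for (double v : values) {
            line.append(String.format(Locale.ROOT, " %.2f", v));
        }
        System.out.println(line);
    }

    public static void main(String[] args) {
        print("unmodified / 02:", ratios(UNMODIFIED, WEBREV_02));
        print("unmodified / 03:", ratios(UNMODIFIED, WEBREV_03));
        print("03 / 02:        ", ratios(WEBREV_03, WEBREV_02));
    }
}
```

Under that ordering assumption, the rows with 64 class values per partition come out at a bit over 2x for unmodified versus 02, while the rows with 8 class values per partition show a much smaller gap.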

On 08.05.2016 at 17:18, Peter Levart <peter.lev...@gmail.com> wrote:

Hi Michael,


On 05/06/2016 04:48 PM, Michael Haupt wrote:
Hi Peter,

thank you. I've run the full benchmark in my setup and uploaded the updated cumulative results to http://cr.openjdk.java.net/~mhaupt/8031043/.

The benchmark indeed shows that this latest addition to the group slows down random and sequential access, especially for small numbers of values and classes. The OpenJDK tests are fine; I'm running a batch of internal tests as well.

Given that one concern with this issue, next to reducing footprint, was to optimise for the single-value case, I'm still a bit hesitant even though the sheer amount of code reduction is impressive. I'll evaluate further.

Interesting. I observed quite the opposite on my machine (i7-4771, 8 MiB cache). For the sequential access pattern, or for random access with a small number of CV(s) and Class(es), the results are comparable. Only for 256 CV(s) x 1024 Class(es) with the random access pattern did I observe a performance drop of about 20%, which I attributed to the difference in design between CHM and the 'cache' of the JDK 9 ClassValue (worse CPU cache locality for CHM):

    http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueBench.java

I doubt that a single value per Class instance is a case worth optimizing for. Such an optimization would be very fragile, so it would not be something to rely on. Typical or even worst-case performance is more important in my opinion.

The fast-path lookup performance is the most important performance aspect of ClassValue, but it is not the only one that can be observed. Footprint and the consequent GC / expunging overhead are also something to consider. The implementation presented in my webrev.02 maintains a linked list of weakly-referenced ClassValueMap(s). For each stale dequeued key, it probes each map and removes the key from any live map(s) containing it. This works optimally when the matrix of (ClassValue, Class) pairs is not sparse. I did an experiment with an alternative expunging design where I maintain, on each key, an array of the weakly-referenced ClassValueMap(s) that the key has been inserted into. This has approx. 10% additional footprint overhead compared to the original expunging design (but still just half the footprint overhead of the JDK 9 ClassValue design):

    http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.03/

The situation I envisioned was when a single JVM hosts multiple (say N) isolated applications (in an app server for example) and when one such application is re-deployed.

In the original design (webrev.02), each dequeued ClassValue.key is probed against all class maps that remain and belong to the other N-1 applications. In the alternative expunging design (webrev.03), the dequeued key just scans the array of weakly-referenced maps that it was inserted into.
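
The difference between the two expunging strategies can be sketched with a deliberately simplified model. This is not the code from webrev.02 or webrev.03: the Key class, the HashMap-based per-class maps and the method names are all stand-ins chosen for illustration, and the stale key is passed in explicitly instead of being dequeued from a ReferenceQueue.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpungeSketch {
    /** Hypothetical stand-in for a ClassValue key; it records the maps it was put into. */
    static final class Key {
        final List<WeakReference<Map<Key, Object>>> insertedInto = new ArrayList<>();
    }

    // Every per-class map, weakly referenced (used by the "probe them all" strategy).
    private final List<WeakReference<Map<Key, Object>>> allMaps = new ArrayList<>();

    Map<Key, Object> newClassMap() {
        Map<Key, Object> map = new HashMap<>();
        allMaps.add(new WeakReference<>(map));
        return map;
    }

    void put(Map<Key, Object> map, Key key, Object value) {
        if (!map.containsKey(key)) {
            key.insertedInto.add(new WeakReference<>(map)); // back-reference for strategy B
        }
        map.put(key, value);
    }

    /** Strategy A: probe every live map for the stale key; returns the number of maps probed. */
    int expungeByProbingAll(Key stale) {
        int probed = 0;
        for (WeakReference<Map<Key, Object>> ref : allMaps) {
            Map<Key, Object> map = ref.get();
            if (map != null) {
                map.remove(stale);
                probed++;
            }
        }
        return probed;
    }

    /** Strategy B: visit only the maps recorded on the key; returns the number of maps visited. */
    int expungeByBackReferences(Key stale) {
        int visited = 0;
        for (WeakReference<Map<Key, Object>> ref : stale.insertedInto) {
            Map<Key, Object> map = ref.get();
            if (map != null) {
                map.remove(stale);
                visited++;
            }
        }
        stale.insertedInto.clear();
        return visited;
    }

    /** Builds 16 maps, puts two keys into 2 of them, and returns {probed, visited, entriesLeft}. */
    static int[] demo() {
        ExpungeSketch registry = new ExpungeSketch();
        List<Map<Key, Object>> live = new ArrayList<>(); // strong refs keep the maps alive
        for (int i = 0; i < 16; i++) {
            live.add(registry.newClassMap());
        }
        Key a = new Key();
        Key b = new Key();
        for (int i = 0; i < 2; i++) {
            registry.put(live.get(i), a, "a" + i);
            registry.put(live.get(i), b, "b" + i);
        }
        int probed = registry.expungeByProbingAll(a);
        int visited = registry.expungeByBackReferences(b);
        int entriesLeft = 0;
        for (Map<Key, Object> map : live) {
            entriesLeft += map.size();
        }
        return new int[] {probed, visited, entriesLeft};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("probe-all: " + r[0] + ", back-references: " + r[1]
                + ", entries left: " + r[2]);
    }
}
```

In this model the probe-all strategy touches every live map regardless of where the key was stored, while the back-reference strategy touches only the maps the key was inserted into, at the cost of the extra per-key bookkeeping.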

I created a benchmark to exercise such situation(s):

    http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/ClassValueExpungeBench.java

It measures the time of a hypothetical redeployment of one application in an app server where there are 16 such running applications. The measurement includes class-loading, GC time and initialization of ClassValue(s). Results show that the alternative expunging design (webrev.03) doesn't bring any improvement (or that the original, supposedly sub-optimal expunging design (webrev.02) doesn't show any weaknesses) for the range of parameters exercised in the benchmark.

What this benchmark also shows is that the original JDK 9 ClassValue has at least 2x the cleanup overhead of my designs (note that the benchmark includes class-loading time too).

 Regards, Peter


Best,

Michael

On 05.05.2016 at 17:21, Peter Levart <peter.lev...@gmail.com> wrote:

Hi Michael,


On 05/04/2016 06:02 PM, Michael Haupt wrote:
Hi Peter,

thank you for chiming in again! :-) I'll look at this in depth on Friday.

Good. Because I found bugs in the expunging logic and a discrepancy in behavior when a value is installed concurrently by some other thread and then later removed while the 1st thread is still calculating the value. The current ClassValue retries the computation until it can make sure there were no concurrent changes to the entry during its computation. I fixed both things and verified that the behavior is now the same:

    http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.02/
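
As a reminder of the contract that the retry logic has to preserve, here is a minimal single-threaded sketch using only the public ClassValue API. It does not reproduce the concurrent race itself; it only shows that a value is cached after the first get and recomputed after remove. The class and method names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RemoveRecomputeSketch {
    /** Returns {first, cached, recomputed}: the values seen before and after remove(). */
    static int[] demo() {
        final AtomicInteger calls = new AtomicInteger();

        // Each computeValue call returns the running call count, so a
        // recomputation is visible as a changed value.
        ClassValue<Integer> generation = new ClassValue<Integer>() {
            @Override
            protected Integer computeValue(Class<?> type) {
                return calls.incrementAndGet();
            }
        };

        int first = generation.get(String.class);      // computes the value
        int cached = generation.get(String.class);     // served from the cache
        generation.remove(String.class);               // discards the association
        int recomputed = generation.get(String.class); // computes again
        return new int[] {first, cached, recomputed};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("first=" + r[0] + ", cached=" + r[1] + ", recomputed=" + r[2]);
    }
}
```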

Regards, Peter


-- 

Oracle
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG |
 Schiffbauergasse 14 | 14467 Potsdam, Germany

ORACLE Deutschland B.V. & Co. KG | Headquarters: Riesstraße 25, D-80992 München
Registration court: Amtsgericht München, HRA 95603

General partner: ORACLE Deutschland Verwaltung B.V. | Hertogswetering 163/167, 3543 AS Utrecht, Netherlands
Commercial register of the Chamber of Commerce Midden-Nederland, No. 30143697
Managing directors: Alexander van der Ven, Jan Schultheiss, Val Maher
Oracle is committed to developing practices and products that help protect the environment



_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


