Re: ClassValue perf?

2016-05-26 Thread Michael Haupt
Hi Peter,

thank you for this wonderful piece of work.

On 26.05.2016 at 10:59, Peter Levart wrote:
> How does this implementation compare on your hardware, Michael?

Results attached. It improves on the unpatched version in all cases, is in
most cases even faster than the "simple solution" (reduce initial size to 1),
and reduces complexity of ClassValue. It passes all open and closed
jli-related tests as well as the Nashorn tests. Looking really good.

Let me run the full internal test suite across platforms.

Best,

Michael

Benchmark               (CC)  (CVC)     plain    twisti  plevart2  plevart4
CVB.randomAccess         128      1    10.277     9.905    11.574     9.788
CVB.randomAccess         128      4    12.081    11.445    13.758    11.476
CVB.randomAccess         128     16    16.352    16.461    15.201    12.588
CVB.randomAccess         128    256    24.486    24.365    26.177    21.532
CVB.randomAccess        1024      1    18.951    16.691    19.439    14.674
CVB.randomAccess        1024      4    27.497    24.634    27.348    22.818
CVB.randomAccess        1024     16    26.988    26.522    32.034    25.353
CVB.randomAccess        1024    256    54.643    51.415    45.496    35.947
CVB.sequentialAccess     128      1    11.276     9.370    10.724     8.290
CVB.sequentialAccess     128      4     9.302     9.434    10.343     8.577
CVB.sequentialAccess     128     16    10.723    10.734     9.576     8.427
CVB.sequentialAccess     128    256    17.721    17.947    17.351    15.646
CVB.sequentialAccess    1024      1    15.313    16.217    12.763     9.835
CVB.sequentialAccess    1024      4    11.737    11.779    10.992     9.752
CVB.sequentialAccess    1024     16     8.820     8.983    10.062     8.776
CVB.sequentialAccess    1024    256    44.024    43.792    39.478    32.867
CVEB.redeployPartition   N/A    N/A   144.797   151.230   118.095   104.374
CVEB.redeployPartition   N/A    N/A   392.969   445.776   370.319   345.316
CVEB.redeployPartition   N/A    N/A   464.723   419.487   252.764   146.739
CVEB.redeployPartition   N/A    N/A  1646.825  1553.961   773.508   428.923

-- 
Dr. Michael Haupt | Principal Member of Technical Staff
Phone: +49 331 200 7277 | Fax: +49 331 200 7561
Oracle Java Platform Group | LangTools Team | Nashorn
Oracle Deutschland B.V. & Co. KG | Schiffbauergasse 14 | 14467 Potsdam, Germany

ORACLE Deutschland B.V. & Co. KG | Headquarters: Riesstraße 25, D-80992 München
Registration court: Amtsgericht München, HRA 95603
General partner: ORACLE Deutschland Verwaltung B.V. | Hertogswetering 163/167, 3543 AS Utrecht, Netherlands
Commercial register of the Chamber of Commerce Midden-Nederland, No. 30143697
Managing directors: Alexander van der Ven, Jan Schultheiss, Val Maher

Oracle is committed to developing practices and products that help protect the environment

___
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


LinearProbeHashtable Re: ClassValue perf?

2016-05-26 Thread Paul Sandoz
Hi Peter,

Opportunistically, if your LinearProbeHashtable works out, then I am wondering 
if we could replace the use of CHM within MethodType.ConcurrentWeakInternSet, 
which only uses get/putIfAbsent/remove.

Thereby CHM can use VarHandles without inducing a circular dependency.
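
To make the "only uses get/putIfAbsent/remove" point concrete, here is a 
minimal sketch of a weak intern set. It is an illustration only, not the JDK 
source: the names WeakInternSetSketch, Entry and expunge are hypothetical, and 
the backing map is still a ConcurrentHashMap. The point is that the set touches 
the map through just those three operations, so any table offering them could 
in principle be substituted.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a weak intern set; not the JDK implementation. */
final class WeakInternSetSketch<T> {
    // The only map operations used below are get, putIfAbsent and remove.
    private final ConcurrentMap<Entry<T>, Entry<T>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<T> stale = new ReferenceQueue<>();

    /** Returns the interned instance equal to elem, or null if there is none. */
    T get(T elem) {
        expunge();
        Entry<T> e = map.get(new Entry<>(elem, null));
        return e == null ? null : e.get();
    }

    /** Interns elem and returns the canonical instance. */
    T add(T elem) {
        expunge();
        Entry<T> fresh = new Entry<>(elem, stale);
        T interned;
        do {
            Entry<T> existing = map.putIfAbsent(fresh, fresh);
            // If the existing referent was cleared in the meantime, retry.
            interned = (existing == null) ? elem : existing.get();
        } while (interned == null);
        return interned;
    }

    /** Drops entries whose referents have been garbage collected. */
    private void expunge() {
        Reference<? extends T> r;
        while ((r = stale.poll()) != null) {
            map.remove(r);
        }
    }

    private static final class Entry<T> extends WeakReference<T> {
        private final int hash; // cached so a cleared entry can still be removed

        Entry(T referent, ReferenceQueue<T> queue) {
            super(referent, queue);
            this.hash = referent.hashCode();
        }

        @Override public int hashCode() { return hash; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Entry)) return false;
            Object mine = get();
            Object theirs = ((Entry<?>) o).get();
            // A cleared entry is equal only to itself.
            return (mine == null || theirs == null) ? this == o : mine.equals(theirs);
        }
    }
}
```

Adding two equal but distinct objects returns the first one both times; a 
later get with either object returns that same canonical instance.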

Paul.

> On 26 May 2016, at 10:59, Peter Levart  wrote:
> 
> Hi Michael,
> 
> On 05/23/2016 03:56 PM, Michael Haupt wrote:
>> I've run the unpatched version and Peter's two patches once more. The 
>> results are attached (results.txt). They confirm Aleksey's observation.
>> 
>> Regarding the 03 patch (plevart3 column in the results), perfasm output (see 
>> http://cr.openjdk.java.net/~mhaupt/8031043/perfasm.zip) suggests the cost is 
>> mainly accrued in ConcurrentHashMap. The same is the case for the 02 patch 
>> (plevart2 column).
>> 
>> As things stand, I think we can even focus on Peter's 02 patch, as this is 
>> the faster of his two proposals (plevart2 column in the results), reduces 
>> the footprint, and reduces the implementation complexity. Can anything be 
>> done to improve on its performance? (It has slight performance slowdowns for 
>> the single-value case as well.)
> 
> I can't think of anything else short of improving performance of CHM itself.
> 
> Or replacing CHM with a "better" implementation:
> 
> 
> http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04/
> 
> This webrev is similar to webrev.02. Its only difference is in ClassValueMap 
> which extends LinearProbeHashtable instead of ConcurrentHashMap. 
> LinearProbeHashtable is a simple implementation of a linear-probe hash table. 
> It's not a full Map implementation. It only implements methods needed in 
> ClassValue. With this implementation I get a slight boost compared to JDK 9 
> ClassValue implementation for all sizes and counts:
> 
> Benchmark                         (classCount)  (classValueCount)  (impl)  Mode  Cnt   Score   Error  Units
> ClassValueBench.randomAccess               128                  1    jdk9  avgt   10   9.079 ± 0.092  ns/op
> ClassValueBench.randomAccess               128                  4    jdk9  avgt   10  10.615 ± 0.102  ns/op
> ClassValueBench.randomAccess               128                 16    jdk9  avgt   10  11.665 ± 0.012  ns/op
> ClassValueBench.randomAccess               128                256    jdk9  avgt   10  19.151 ± 0.219  ns/op
> ClassValueBench.randomAccess              1024                  1    jdk9  avgt   10  14.642 ± 0.425  ns/op
> ClassValueBench.randomAccess              1024                  4    jdk9  avgt   10  22.577 ± 0.093  ns/op
> ClassValueBench.randomAccess              1024                 16    jdk9  avgt   10  19.864 ± 0.736  ns/op
> ClassValueBench.randomAccess              1024                256    jdk9  avgt   10  60.470 ± 0.285  ns/op
> ClassValueBench.sequentialAccess           128                  1    jdk9  avgt   10   9.741 ± 0.033  ns/op
> ClassValueBench.sequentialAccess           128                  4    jdk9  avgt   10   8.252 ± 0.029  ns/op
> ClassValueBench.sequentialAccess           128                 16    jdk9  avgt   10   7.888 ± 1.249  ns/op
> ClassValueBench.sequentialAccess           128                256    jdk9  avgt   10  16.493 ± 0.415  ns/op
> ClassValueBench.sequentialAccess          1024                  1    jdk9  avgt   10  13.376 ± 0.452  ns/op
> ClassValueBench.sequentialAccess          1024                  4    jdk9  avgt   10  10.023 ± 0.020  ns/op
> ClassValueBench.sequentialAccess          1024                 16    jdk9  avgt   10   8.029 ± 0.178  ns/op
> ClassValueBench.sequentialAccess          1024                256    jdk9  avgt   10  33.472 ± 0.058  ns/op
> 
> Benchmark                         (classCount)  (classValueCount)  (impl)  Mode  Cnt   Score   Error  Units
> ClassValueBench.randomAccess               128                  1    pl04  avgt   10   8.955 ± 0.055  ns/op
> ClassValueBench.randomAccess               128                  4    pl04  avgt   10   9.999 ± 0.017  ns/op
> ClassValueBench.randomAccess               128                 16    pl04  avgt   10  11.615 ± 1.928  ns/op
> ClassValueBench.randomAccess               128                256    pl04  avgt   10  17.063 ± 0.460  ns/op
> ClassValueBench.randomAccess              1024                  1    pl04  avgt   10  12.553 ± 0.086  ns/op
> ClassValueBench.randomAccess              1024                  4    pl04  avgt   10  16.766 ± 0.221  ns/op
> ClassValueBench.randomAccess              1024                 16    pl04  avgt   10  18.496 ± 0.051  ns/op
> ClassValueBench.randomAccess              1024                256    pl04  avgt   10  41.390 ± 0.321  ns/op
> ClassValueBench.sequentialAccess           128                  1    pl04  avgt   10   7.854 ± 0.381  ns/op
> ClassValueBench.sequentialAccess           128                  4    pl04  avgt   10   7.498 ± 0

Re: ClassValue perf?

2016-05-26 Thread Peter Levart

Hi Michael,


On 05/23/2016 03:56 PM, Michael Haupt wrote:
> I've run the unpatched version and Peter's two patches once more. The 
> results are attached (results.txt). They confirm Aleksey's observation.
> 
> Regarding the 03 patch (plevart3 column in the results), perfasm 
> output (see http://cr.openjdk.java.net/~mhaupt/8031043/perfasm.zip) 
> suggests the cost is mainly accrued in ConcurrentHashMap. The same is 
> the case for the 02 patch (plevart2 column).
> 
> As things stand, I think we can even focus on Peter's 02 patch, as 
> this is the faster of his two proposals (plevart2 column in the 
> results), reduces the footprint, and reduces the implementation 
> complexity. Can anything be done to improve on its performance? (It 
> has slight performance slowdowns for the single-value case as well.)


I can't think of anything else short of improving performance of CHM itself.

Or replacing CHM with a "better" implementation:

http://cr.openjdk.java.net/~plevart/misc/ClassValue.Alternative2/webrev.04/

This webrev is similar to webrev.02. Its only difference is in 
ClassValueMap which extends LinearProbeHashtable instead of 
ConcurrentHashMap. LinearProbeHashtable is a simple implementation of a 
linear-probe hash table. It's not a full Map implementation. It only 
implements methods needed in ClassValue. With this implementation I get 
a slight boost compared to JDK 9 ClassValue implementation for all sizes 
and counts:


Benchmark                         (classCount)  (classValueCount)  (impl)  Mode  Cnt   Score   Error  Units
ClassValueBench.randomAccess               128                  1    jdk9  avgt   10   9.079 ± 0.092  ns/op
ClassValueBench.randomAccess               128                  4    jdk9  avgt   10  10.615 ± 0.102  ns/op
ClassValueBench.randomAccess               128                 16    jdk9  avgt   10  11.665 ± 0.012  ns/op
ClassValueBench.randomAccess               128                256    jdk9  avgt   10  19.151 ± 0.219  ns/op
ClassValueBench.randomAccess              1024                  1    jdk9  avgt   10  14.642 ± 0.425  ns/op
ClassValueBench.randomAccess              1024                  4    jdk9  avgt   10  22.577 ± 0.093  ns/op
ClassValueBench.randomAccess              1024                 16    jdk9  avgt   10  19.864 ± 0.736  ns/op
ClassValueBench.randomAccess              1024                256    jdk9  avgt   10  60.470 ± 0.285  ns/op
ClassValueBench.sequentialAccess           128                  1    jdk9  avgt   10   9.741 ± 0.033  ns/op
ClassValueBench.sequentialAccess           128                  4    jdk9  avgt   10   8.252 ± 0.029  ns/op
ClassValueBench.sequentialAccess           128                 16    jdk9  avgt   10   7.888 ± 1.249  ns/op
ClassValueBench.sequentialAccess           128                256    jdk9  avgt   10  16.493 ± 0.415  ns/op
ClassValueBench.sequentialAccess          1024                  1    jdk9  avgt   10  13.376 ± 0.452  ns/op
ClassValueBench.sequentialAccess          1024                  4    jdk9  avgt   10  10.023 ± 0.020  ns/op
ClassValueBench.sequentialAccess          1024                 16    jdk9  avgt   10   8.029 ± 0.178  ns/op
ClassValueBench.sequentialAccess          1024                256    jdk9  avgt   10  33.472 ± 0.058  ns/op


Benchmark                         (classCount)  (classValueCount)  (impl)  Mode  Cnt   Score   Error  Units
ClassValueBench.randomAccess               128                  1    pl04  avgt   10   8.955 ± 0.055  ns/op
ClassValueBench.randomAccess               128                  4    pl04  avgt   10   9.999 ± 0.017  ns/op
ClassValueBench.randomAccess               128                 16    pl04  avgt   10  11.615 ± 1.928  ns/op
ClassValueBench.randomAccess               128                256    pl04  avgt   10  17.063 ± 0.460  ns/op
ClassValueBench.randomAccess              1024                  1    pl04  avgt   10  12.553 ± 0.086  ns/op
ClassValueBench.randomAccess              1024                  4    pl04  avgt   10  16.766 ± 0.221  ns/op
ClassValueBench.randomAccess              1024                 16    pl04  avgt   10  18.496 ± 0.051  ns/op
ClassValueBench.randomAccess              1024                256    pl04  avgt   10  41.390 ± 0.321  ns/op
ClassValueBench.sequentialAccess           128                  1    pl04  avgt   10   7.854 ± 0.381  ns/op
ClassValueBench.sequentialAccess           128                  4    pl04  avgt   10   7.498 ± 0.055  ns/op
ClassValueBench.sequentialAccess           128                 16    pl04  avgt   10   9.218 ± 1.000  ns/op
ClassValueBench.sequentialAccess           128                256    pl04  avgt   10  13.593 ± 0.275  ns/op
ClassValueBench.sequentialAccess          1024                  1    pl04  avgt   10   8.774 ± 0.037  ns/op
ClassValueBench.sequentialAccess          1024                  4    pl04  avgt   10   8.562 ± 0.014  ns/op
ClassValueBench.sequentialAccess          1024                 16    pl04  avgt   10   7.596 ± 0.027  ns/op
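
For readers unfamiliar with the technique named above, the following is a 
minimal sketch of a linear-probe hash table offering only get and 
putIfAbsent. It is not the LinearProbeHashtable from the webrev: the class 
name LinearProbeSketch, the flat key/value array layout, the 2/3 load factor 
and the hash spreading are all assumptions made for illustration, and the 
sketch is single-threaded with no removal support.

```java
import java.util.Objects;

/** Hypothetical single-threaded linear-probe table; illustration only. */
final class LinearProbeSketch<K, V> {
    // Keys live at even indexes, their values at the following odd index.
    // The array length is always a power of two.
    private Object[] table = new Object[16];
    private int size;

    /** First (even) slot to probe for the given key. */
    private static int firstSlot(Object key, int length) {
        int h = key.hashCode();
        h ^= h >>> 16;                  // mix high bits into low bits
        return (h << 1) & (length - 1); // even index within the array
    }

    /** Next (even) slot, wrapping around at the end of the array. */
    private static int nextSlot(int i, int length) {
        return (i + 2) & (length - 1);
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        Object[] t = table;
        // Probe until the key is found or an empty slot ends the run.
        for (int i = firstSlot(key, t.length); t[i] != null; i = nextSlot(i, t.length)) {
            if (t[i].equals(key)) {
                return (V) t[i + 1];
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    V putIfAbsent(K key, V value) {
        Objects.requireNonNull(key);
        Objects.requireNonNull(value);
        Object[] t = table;
        int i = firstSlot(key, t.length);
        while (t[i] != null) {
            if (t[i].equals(key)) {
                return (V) t[i + 1];    // already present: keep the old value
            }
            i = nextSlot(i, t.length);
        }
        t[i] = key;
        t[i + 1] = value;
        if (++size * 3 > t.length) {    // more than 2/3 of the pair slots used
            resize();
        }
        return null;
    }

    /** Doubles the array and reinserts every key/value pair. */
    private void resize() {
        Object[] old = table;
        Object[] bigger = new Object[old.length * 2];
        for (int j = 0; j < old.length; j += 2) {
            if (old[j] != null) {
                int i = firstSlot(old[j], bigger.length);
                while (bigger[i] != null) {
                    i = nextSlot(i, bigger.length);
                }
                bigger[i] = old[j];
                bigger[i + 1] = old[j + 1];
            }
        }
        table = bigger;
    }
}
```

Keeping key and value in adjacent array slots means a successful lookup 
usually touches a single cache line, which is one plausible reason a 
structure of this kind can beat a node-based map on small working sets.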
Clas