On Wed, 17 Mar 2021 20:41:47 GMT, Alexander Zvegintsev <azveg...@openjdk.org> 
wrote:

>> FYI: probably is better/simpler to review it via webrev.
>> 
>> After migration to the lcms from the kcms the performance of some operations 
>> was regressed. One possible workaround was to split the operation into 
>> multiple threads. But unfortunately, we have too many bottlenecks which 
>> prevent using multithreading. This is the request to remove/minimize such 
>> bottlenecks(at least some of them), but it does not affect the 
>> single-threaded performance it should be handled separately.
>> 
>> The main code pattern optimized here is this:
>>     activate();
>>     byte[] theHeader = getData(cmmProfile, icSigHead);
>>     ---->  CMSManager.getModule().getTagData(p, tagSignature);
>> Notes about the code above:
>> 
>> 1. Before the change "activate()" method checked that the "cmmProfile" field 
>> was not null. After that we usually used the "cmmProfile" as the parameter 
>> to some other method. This included two volatile reads, and also required to 
>> check when we need to call the "activate()" method before usage of the 
>> "cmmProfile" field.
>> Solution: "activate()" renamed to the "cmmProfile()" which became an 
>> accessor for the field, so we will get one volatile read and can easily 
>> monitor the usage of the field itself(it is used directly only in this 
>> method).
>> 
>> 2. The synchronized static method "CMSManager.getModule()" reimplemented to 
>> have only one volatile read.
>> 
>> 3. The usage of locking in the "getTagData()" is changed. Instead of the 
>> synchronized instance methods, we now use the mix of "ConcurrentHashMap" and 
>> StampedLock.
>> 
>> See some comments inline.
>> 
>> Some numbers(small numbers are better):
>> 
>> 1. Performance of ((ICC_ProfileRGB) 
>> ICC_Profile.getInstance(ColorSpace.CS_sRGB)).getMatrix();
>> 
>> jdk 15.0.2
>> Benchmark                              Mode  Cnt    Score      Error  Units
>> CMMPerf.ThreadsMAX.testGetMatrix       avgt    5   19,624 ±    0,059  us/op
>> CMMPerf.testGetMatrix                  avgt    5    0,154 ±    0,001  us/op
>> 
>> jdk - before the fix
>> Benchmark                              Mode  Cnt    Score      Error  Units
>> CMMPerf.ThreadsMAX.testGetMatrix       avgt    5   12,935 ±    0,042  us/op
>> CMMPerf.testGetMatrix                  avgt    5    0,127 ±    0,007  us/op
>> 
>> jdk - after the fix
>> Benchmark                              Mode  Cnt    Score      Error  Units
>> CMMPerf.ThreadsMAX.testGetMatrix       avgt    5    0,561 ±    0,005  us/op
>> CMMPerf.testGetMatrix                  avgt    5    0,092 ±    0,001  us/op
>> 
>> 2. Part of performance gain in jdk17 is from some other fixes, for example
>>     Performance of ICC_Profile.getInstance(ColorSpace.CS_sRGB); and 
>> ColorSpace.getInstance(ColorSpace.CS_sRGB);
>> 
>> jdk 15.0.2
>> Benchmark                              Mode  Cnt    Score      Error  Units
>> CMMPerf.ThreadsMAX.testGetSRGBProfile  avgt    5    2,299 ±    0,032  us/op
>> CMMPerf.ThreadsMAX.testGetSRGBSpace    avgt    5    2,210 ±    0,051  us/op
>> CMMPerf.testGetSRGBProfile             avgt    5    0,019 ±    0,001  us/op
>> CMMPerf.testGetSRGBSpace               avgt    5    0,018 ±    0,001  us/op
>> 
>> jdk - same before/ after the fix
>> Benchmark                              Mode  Cnt    Score      Error  Units
>> CMMPerf.ThreadsMAX.testGetSRGBProfile  avgt    5    0,005 ±    0,001  us/op
>> CMMPerf.ThreadsMAX.testGetSRGBSpace    avgt    5    0,005 ±    0,001  us/op
>> CMMPerf.testGetSRGBProfile             avgt    5    0,005 ±    0,001  us/op
>> CMMPerf.testGetSRGBSpace               avgt    5    0,005 ±    0,001  us/op
>> 
>> note "ThreadsMAX" is 32 threads.
>
> src/java.desktop/share/native/liblcms/LCMS.c line 644:
> 
>> 642:         return cmmProfile;
>> 643:     }
>> 644:     return NULL;
> 
> Why do we need to do this from native code? (except easing of access to a 
> private method of a class in another package.)
> Will it give some noticeable performance boost if we implement it on java 
> side?

Yes, this is the only reason.
I have a todo to check what access will be better, 
AWTAccessor/methodhandle/reflection vs jni.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2957

Reply via email to