On Wed, 17 Mar 2021 20:41:47 GMT, Alexander Zvegintsev <azveg...@openjdk.org> wrote:
>> FYI: probably is better/simpler to review it via webrev. >> >> After migration to the lcms from the kcms the performance of some operations >> was regressed. One possible workaround was to split the operation into >> multiple threads. But unfortunately, we have too many bottlenecks which >> prevent using multithreading. This is the request to remove/minimize such >> bottlenecks(at least some of them), but it does not affect the >> single-threaded performance it should be handled separately. >> >> The main code pattern optimized here is this: >> activate(); >> byte[] theHeader = getData(cmmProfile, icSigHead); >> ----> CMSManager.getModule().getTagData(p, tagSignature); >> Notes about the code above: >> >> 1. Before the change "activate()" method checked that the "cmmProfile" field >> was not null. After that we usually used the "cmmProfile" as the parameter >> to some other method. This included two volatile reads, and also required to >> check when we need to call the "activate()" method before usage of the >> "cmmProfile" field. >> Solution: "activate()" renamed to the "cmmProfile()" which became an >> accessor for the field, so we will get one volatile read and can easily >> monitor the usage of the field itself(it is used directly only in this >> method). >> >> 2. The synchronized static method "CMSManager.getModule()" reimplemented to >> have only one volatile read. >> >> 3. The usage of locking in the "getTagData()" is changed. Instead of the >> synchronized instance methods, we now use the mix of "ConcurrentHashMap" and >> StampedLock. >> >> See some comments inline. >> >> Some numbers(small numbers are better): >> >> 1. Performance of ((ICC_ProfileRGB) >> ICC_Profile.getInstance(ColorSpace.CS_sRGB)).getMatrix(); >> >> jdk 15.0.2 >> Benchmark Mode Cnt Score Error Units >> CMMPerf.ThreadsMAX.testGetMatrix avgt 5 19,624 ± 0,059 us/op >> CMMPerf.testGetMatrix avgt 5 0,154 ± 0,001 us/op >> >> jdk - before the fix >> Benchmark Mode Cnt Score Error Units >> CMMPerf.ThreadsMAX.testGetMatrix avgt 5 12,935 ± 0,042 us/op >> CMMPerf.testGetMatrix avgt 5 0,127 ± 0,007 us/op >> >> jdk - after the fix >> Benchmark Mode Cnt Score Error Units >> CMMPerf.ThreadsMAX.testGetMatrix avgt 5 0,561 ± 0,005 us/op >> CMMPerf.testGetMatrix avgt 5 0,092 ± 0,001 us/op >> >> 2. Part of performance gain in jdk17 is from some other fixes, for example >> Performance of ICC_Profile.getInstance(ColorSpace.CS_sRGB); and >> ColorSpace.getInstance(ColorSpace.CS_sRGB); >> >> jdk 15.0.2 >> Benchmark Mode Cnt Score Error Units >> CMMPerf.ThreadsMAX.testGetSRGBProfile avgt 5 2,299 ± 0,032 us/op >> CMMPerf.ThreadsMAX.testGetSRGBSpace avgt 5 2,210 ± 0,051 us/op >> CMMPerf.testGetSRGBProfile avgt 5 0,019 ± 0,001 us/op >> CMMPerf.testGetSRGBSpace avgt 5 0,018 ± 0,001 us/op >> >> jdk - same before/ after the fix >> Benchmark Mode Cnt Score Error Units >> CMMPerf.ThreadsMAX.testGetSRGBProfile avgt 5 0,005 ± 0,001 us/op >> CMMPerf.ThreadsMAX.testGetSRGBSpace avgt 5 0,005 ± 0,001 us/op >> CMMPerf.testGetSRGBProfile avgt 5 0,005 ± 0,001 us/op >> CMMPerf.testGetSRGBSpace avgt 5 0,005 ± 0,001 us/op >> >> note "ThreadsMAX" is 32 threads. > > src/java.desktop/share/native/liblcms/LCMS.c line 644: > >> 642: return cmmProfile; >> 643: } >> 644: return NULL; > > Why do we need to do this from native code? (except easing of access to a > private method of a class in another package.) > Will it give some noticeable performance boost if we implement it on java > side? Yes, this is the only reason. I have a todo to check what access will be better, AWTAccessor/methodhandle/reflection vs jni. ------------- PR: https://git.openjdk.java.net/jdk/pull/2957