On Fri, 12 Mar 2021 05:10:25 GMT, Sergey Bylokhov <s...@openjdk.org> wrote:
> FYI: probably is better/simpler to review it via webrev. > > After migration to the lcms from the kcms the performance of some operations > was regressed. One possible workaround was to split the operation into > multiple threads. But unfortunately, we have too many bottlenecks which > prevent using multithreading. This is the request to remove/minimize such > bottlenecks(at least some of them), but it does not affect the > single-threaded performance it should be handled separately. > > The main code pattern optimized here is this: > activate(); > byte[] theHeader = getData(cmmProfile, icSigHead); > ----> CMSManager.getModule().getTagData(p, tagSignature); > Notes about the code above: > > 1. Before the change "activate()" method checked that the "cmmProfile" field > was not null. After that we usually used the "cmmProfile" as the parameter to > some other method. This included two volatile reads, and also required to > check when we need to call the "activate()" method before usage of the > "cmmProfile" field. > Solution: "activate()" renamed to the "cmmProfile()" which became an accessor > for the field, so we will get one volatile read and can easily monitor the > usage of the field itself(it is used directly only in this method). > > 2. The synchronized static method "CMSManager.getModule()" reimplemented to > have only one volatile read. > > 3. The usage of locking in the "getTagData()" is changed. Instead of the > synchronized instance methods, we now use the mix of "ConcurrentHashMap" and > StampedLock. > > See some comments inline. > > Some numbers(small numbers are better): > > 1. Performance of ((ICC_ProfileRGB) > ICC_Profile.getInstance(ColorSpace.CS_sRGB)).getMatrix(); > > jdk 15.0.2 > Benchmark Mode Cnt Score Error Units > CMMPerf.ThreadsMAX.testGetMatrix avgt 5 19,624 ± 0,059 us/op > CMMPerf.testGetMatrix avgt 5 0,154 ± 0,001 us/op > > jdk - before the fix > Benchmark Mode Cnt Score Error Units > CMMPerf.ThreadsMAX.testGetMatrix avgt 5 12,935 ± 0,042 us/op > CMMPerf.testGetMatrix avgt 5 0,127 ± 0,007 us/op > > jdk - after the fix > Benchmark Mode Cnt Score Error Units > CMMPerf.ThreadsMAX.testGetMatrix avgt 5 0,561 ± 0,005 us/op > CMMPerf.testGetMatrix avgt 5 0,092 ± 0,001 us/op > > 2. Part of performance gain in jdk17 is from some other fixes, for example > Performance of ICC_Profile.getInstance(ColorSpace.CS_sRGB); and > ColorSpace.getInstance(ColorSpace.CS_sRGB); > > jdk 15.0.2 > Benchmark Mode Cnt Score Error Units > CMMPerf.ThreadsMAX.testGetSRGBProfile avgt 5 2,299 ± 0,032 us/op > CMMPerf.ThreadsMAX.testGetSRGBSpace avgt 5 2,210 ± 0,051 us/op > CMMPerf.testGetSRGBProfile avgt 5 0,019 ± 0,001 us/op > CMMPerf.testGetSRGBSpace avgt 5 0,018 ± 0,001 us/op > > jdk - same before/ after the fix > Benchmark Mode Cnt Score Error Units > CMMPerf.ThreadsMAX.testGetSRGBProfile avgt 5 0,005 ± 0,001 us/op > CMMPerf.ThreadsMAX.testGetSRGBSpace avgt 5 0,005 ± 0,001 us/op > CMMPerf.testGetSRGBProfile avgt 5 0,005 ± 0,001 us/op > CMMPerf.testGetSRGBSpace avgt 5 0,005 ± 0,001 us/op > > note "ThreadsMAX" is 32 threads. src/java.desktop/share/native/liblcms/LCMS.c line 644: > 642: return cmmProfile; > 643: } > 644: return NULL; Why do we need to do this from native code? (except easing of access to a private method of a class in another package.) Will it give some noticeable performance boost if we implement it on java side? ------------- PR: https://git.openjdk.java.net/jdk/pull/2957