Hi Laurent,

You are looking at the wrong loop.  It's tough to explain...

The vis_*.c files are compiled and used only on Solaris, where they convince the compiler to emit VIS instructions, SPARC's counterpart to MMX. No other build compiles them.

You were probably confused because they look like implementations of the function you were looking for, and you never saw any other implementation of it. That's because all of the software loops are actually constructed using a very complicated system of macros. If you look at loops/IntArgbPre.c, you will see a bunch of macro calls at the top which expand to declarations of functions such as "IntArgbPreSrcMaskFill". Then you will see a structure with a bunch of macro invocations in it, one per loop function, which expand to entries describing the loops. Then you will see a bunch more macro invocations, one per line, each of which surprisingly expands to an entire function.

You'd have to do some serious tracing of macros to see what the code looks like, but most of the macros expand from either IntArgb.h or LoopMacros.h...
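
To give a rough feel for the pattern, here is a toy sketch in C. It is not the real code: every name in it (ToyArgb, ToyXrgb, DECLARE_TOY_FILL, REGISTER_TOY_FILL, DEFINE_TOY_FILL, toy_loops) is made up for illustration, and the real macros are far more involved. It only shows the three steps described above: declarations, a table describing the loops, and one-line invocations that expand to whole functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-format macros: how one pixel of each toy format is stored.
 * ToyArgb keeps the color as given; ToyXrgb forces the alpha byte to 0xFF. */
#define ToyArgbStore(dst, argb)  (*(dst) = (argb))
#define ToyXrgbStore(dst, argb)  (*(dst) = (argb) | 0xFF000000u)

/* Step 1: a macro that expands to a function declaration. */
#define DECLARE_TOY_FILL(FMT) \
    void FMT##Fill(uint32_t *pix, size_t n, uint32_t argb);

/* Step 2: a macro that expands to one table entry describing a loop. */
#define REGISTER_TOY_FILL(FMT)  { #FMT "Fill", FMT##Fill }

/* Step 3: a macro that expands to an entire function definition.
 * FMT##Store pastes the format name onto "Store", which selects the
 * per-format macro defined above. */
#define DEFINE_TOY_FILL(FMT) \
    void FMT##Fill(uint32_t *pix, size_t n, uint32_t argb) { \
        assert(pix != NULL); \
        for (size_t i = 0; i < n; i++) { \
            FMT##Store(&pix[i], argb); \
        } \
    }

typedef void ToyFillFunc(uint32_t *pix, size_t n, uint32_t argb);

typedef struct {
    const char  *name;  /* generated function name, as a string */
    ToyFillFunc *func;  /* pointer to the generated function */
} ToyLoop;

/* Declarations at the top of the file. */
DECLARE_TOY_FILL(ToyArgb)
DECLARE_TOY_FILL(ToyXrgb)

/* A structure describing the loops, one entry per loop function. */
static const ToyLoop toy_loops[] = {
    REGISTER_TOY_FILL(ToyArgb),
    REGISTER_TOY_FILL(ToyXrgb),
};

/* One line each, yet each line expands to a complete function. */
DEFINE_TOY_FILL(ToyArgb)
DEFINE_TOY_FILL(ToyXrgb)
```

With this sketch, calling ToyXrgbFill(buf, n, 0x00112233u) stores 0xFF112233 into each pixel, even though no function of that name appears literally in the source. Running only the preprocessor over such a file (for example with gcc -E) is one way to see what the macros expand to.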

                        ...jim

On 9/24/15 7:59 AM, Laurent Bourgès wrote:
Sergey,

I managed to create a new benchmark with JMH + perfasm profiler:
http://cr.openjdk.java.net/~lbourges/jmh/ellipse_fill/

See MyBenchMark.java that fills an ellipse with radius in {"100", "500",
"900", "1400"}

I tested with both Oracle JDK8 and Oracle JDK9 EA b81, i.e. using the
ductus rendering engine:
http://cr.openjdk.java.net/~lbourges/jmh/ellipse_fill/bench_jdk8.log
http://cr.openjdk.java.net/~lbourges/jmh/ellipse_fill/bench_jdk9.log

JDK8:
Benchmark                     (size)  Mode  Cnt  Score   Error  Units
MyBenchmark.fillEllipse          100  avgt    3  0,207 ± 0,034  ms/op
MyBenchmark.fillEllipse          500  avgt    3  1,931 ± 0,112  ms/op
MyBenchmark.fillEllipse          900  avgt    3  5,158 ± 0,346  ms/op
MyBenchmark.fillEllipse         1400  avgt    3  9,628 ± 1,321  ms/op

JDK9:
Benchmark                     (size)  Mode  Cnt   Score   Error  Units
MyBenchmark.fillEllipse          100  avgt    3   0,223 ± 0,005  ms/op
MyBenchmark.fillEllipse          500  avgt    3   2,069 ± 0,044  ms/op
MyBenchmark.fillEllipse          900  avgt    3   5,393 ± 0,285  ms/op
MyBenchmark.fillEllipse         1400  avgt    3  12,305 ± 0,104  ms/op

JDK9 is slower in this test: about 5-8% for sizes 100 to 900 and about
28% for size 1400.


I tried to interpret the profiler info but I just noticed the hotspots
are located in native code (libawt.so):

JDK8:

....[Hottest Regions]...............................................................................
  48,53%   51,78%  [0x7f78197f9ae1:0x7f78197f9b27] in IntArgbPreSrcMaskFill (libawt.so)
  11,27%   11,68%  [0x7f78197f9900:0x7f78197f9aa6] in IntArgbPreSrcMaskFill (libawt.so)
   9,91%   11,58%  [0x7f7813bc6527:0x7f7813bc65bd] in writeAlpha8 (libdcpr.so)
   6,51%    2,73%  [0x7f7813bc5471:0x7f7813bc560a] in processJumpBuffer; processSubBufferInTile (libdcpr.so)
   2,13%    2,16%  [0x7f7813bc6436:0x7f7813bc6506] in writeAlpha8 (libdcpr.so)


JDK9:
...[Hottest Regions]...............................................................................
  61,90%   66,72%  [0x7f71ae7f5678:0x7f71ae7f5837] in IntArgbPreSrcMaskFill (libawt.so)
  10,06%    5,40%  [0x7f71acb0aa77:0x7f71acb0afa9] in processJumpBuffer; processSubBufferInTile; reset.isra.4 (libdcpr.so)
   9,23%   10,45%  [0x7f71acb0bb68:0x7f71acb0bc7d] in writeAlpha8 (libdcpr.so)

So this test is using the software pixel loop [IntArgbPreSrcMaskFill].

I looked at the source code and compared
libawt/java2d/loops/vis_IntArgbPre_Mask.c from openjdk8 and openjdk9,
but the two versions are the same!

Could it be a JNI issue or a compilation issue (gcc settings ...) with
that native code?

Any idea, Sergey ?

Thanks for the tips,
Laurent

2015-09-24 4:17 GMT+02:00 Sergey Bylokhov:

    On 22.09.15 0:15, Laurent Bourgès wrote:

        Conclusion:
        The new patch seems promising as it is very close to ductus
        performance.
        Filling ellipse seems slower on OpenJDK9 (492 / 437 = 12%
        slower) ! Any
        MaskFill changes ?


    For such checks I suggest using JMH + "prof perfasm". It provides
    really good info per Java method (before/after compilation),
    including the assembly, and the log also includes the native methods.
    An example looks like this:
    http://cr.openjdk.java.net/~shade/jmh/perfasm-sample.log

    http://openjdk.java.net/projects/code-tools/jmh

    It is really useful in java2d because sometimes it is unclear where
    the problem occurs (Java, native code, new objects, etc.), and any
    Java profiler can change the behavior of the application.

    --
    Best regards, Sergey.

