Re: [OpenJDK Rasterizer] RFR: Marlin renderer #2

Jim Graham Fri, 19 Jun 2015 15:35:32 -0700

Hi Laurent,

I still have no idea what you mean when you say "arrays = [0]".


Does that mean "new foo[0]"?  Or "new foo[getBucketSize of bucket #0]"?

The latter is what I was envisioning using...

                        ...jim

On 6/19/15 5:08 AM, Laurent Bourgès wrote:

Jim,

Here are the benchmark results:
- REF: Marlin reference = initial capacity tuned for arrays and
OffHeapEdgeArray
- NO_INITIALS: initial arrays = [0]
- NO_INITIALS_OFFHEAP_16: initial arrays = [0] and OffHeapEdgeArray(16)

I pushed all details (stats & benchmarks):
http://cr.openjdk.java.net/~lbourges/marlin/bench_initial_arrays/


1/ Benchmark results:

The initial OffHeapEdgeArray size is the more critical setting: with
OffHeapEdgeArray(16), the test is about 5% slower than the previous test
(initial arrays = [0]).

Renderer                  Test count   30        10        10        10
                          Threads      4         1         2         4
REF                       Pct95        237.848   233.887   238.43    241.226
NO_INITIALS               Pct95        244.261   241.116   244.028   247.639
NO_INITIALS_OFF_HEAP_16   Pct95        257.091   253.211   256.13    261.93


For the complex map, it is more pronounced: ~20% slower than the
reference test:

REF:
dc_shp_alllayers_2013-00-30-07-00-47.ser  4  100  770.511  775.448  770.448  4.668  765.125  787.473  100  100.00%

NO_INITIALS_OFF_HEAP_16:
dc_shp_alllayers_2013-00-30-07-00-47.ser  4  100  902.238  934.679  910.759  14.478  898.332  956.92  100  120.53%

NO_INITIALS:
dc_shp_alllayers_2013-00-30-07-00-47.ser  4  100  815.775  823.593  817.352  6.752  813.031  872.658  100  106.21%
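
The trailing percentages in the rows above can be reproduced from the rows themselves: dividing the value 934.679 (or 823.593) by the corresponding REF value 775.448 gives 120.53% (or 106.21%). The meaning of that column is not stated in the output, so the following Java sketch only illustrates the arithmetic; the class and method names are made up for this example and are not part of Marlin or its benchmark tool.

```java
import java.util.Locale;

// Illustrative helper (hypothetical, not part of Marlin or its benchmarks):
// expresses a timing as a percentage of a reference timing.
final class RelativeTiming {

    // Returns value as a percentage of reference (100.0 means "equal").
    static double percentOfReference(double value, double reference) {
        return 100.0 * value / reference;
    }

    // Formats a percentage with two decimals, independent of the default locale.
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    public static void main(String[] args) {
        double ref = 775.448;                 // REF row
        double noInitialsOffHeap16 = 934.679; // NO_INITIALS_OFF_HEAP_16 row
        double noInitials = 823.593;          // NO_INITIALS row

        System.out.println(format(percentOfReference(noInitialsOffHeap16, ref)));
        System.out.println(format(percentOfReference(noInitials, ref)));
    }
}
```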



2/ Statistics: the number of cache accesses (and the array sizes per
bucket) is very large.

For example:
- stats_NO_INITIALS.log:
Loading DrawingCommands: ../maps/dc_shp_alllayers_2013-00-30-07-00-47.ser
Loaded DrawingCommands: DrawingCommands{width=1400, height=800,
commands=*135213*}
...
INFO: ArrayCache: int resize: 0 - dirty int resize: 140612 - dirty float
resize: 104025 - dirty byte resize: 103966 - oversize: 0
...
INFO: Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 281224 created: 2 - returned: 281224 ::
cache size: 2
INFO: Dirty Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 562448 created: 4 - returned: 562448 ::
cache size: 4
INFO: FloatArrayCache[4096]: get: 104025 created: 2 - returned: 104025
:: cache size: 2
INFO: ByteArrayCache[65536]: get: 103966 created: 1 - returned: 103966
:: cache size: 1

- stats_NO_INITIALS_OFFHEAP_16.log:
INFO: renderer.edges.resize[*483598*] sum: 86874016 avg: 179.64 [32 | 4096]

The OffHeapEdgeArray is resized a lot for this map: 4096 is the right
capacity for this test case.
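
To make the ArrayCache counters in stats_NO_INITIALS.log easier to read (get / created / returned / cache size), here is a minimal Java sketch of a single-bucket array cache that keeps the same kind of counters. It is an assumption-laden illustration, not Marlin's IntArrayCache: the class name, the method names and the absence of any zero-filling on return are all choices made for this example.

```java
import java.util.ArrayDeque;

// Hypothetical single-bucket cache of int[] arrays of one fixed size.
// NOT Marlin's IntArrayCache; it only mimics the counters shown in the log.
final class IntArrayBucket {
    private final int arraySize;
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private int getOps;
    private int createOps;
    private int returnOps;

    IntArrayBucket(int arraySize) {
        this.arraySize = arraySize;
    }

    // Hands out a cached array if one is available, else allocates a new one.
    int[] get() {
        getOps++;
        int[] array = free.pollLast();
        if (array == null) {
            createOps++;
            array = new int[arraySize];
        }
        return array;
    }

    // Gives an array back to the bucket (contents are left dirty).
    void put(int[] array) {
        if (array.length != arraySize) {
            throw new IllegalArgumentException("wrong array size: " + array.length);
        }
        returnOps++;
        free.addLast(array);
    }

    String stats() {
        return "IntArrayCache[" + arraySize + "]: get: " + getOps
                + " created: " + createOps + " - returned: " + returnOps
                + " :: cache size: " + free.size();
    }

    public static void main(String[] args) {
        IntArrayBucket bucket = new IntArrayBucket(4096);
        for (int i = 0; i < 1000; i++) {
            int[] a = bucket.get();
            int[] b = bucket.get();
            bucket.put(a);
            bucket.put(b);
        }
        System.out.println(bucket.stats());
    }
}
```

With this pattern, many get/return operations are served by very few created arrays, which matches the shape of the log lines above (hundreds of thousands of accesses, 1 to 4 arrays created).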

Several test cases need a lot more memory: 32K, 64K or 128K.
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[15915] sum: 16182208 avg: 1016.789 [32 | 131072]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[7807] sum: 6053440 avg: 775.386 [32 | 65536]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[2231] sum: 4420224 avg: 1981.274 [32 | 131072]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[4696] sum: 1284224 avg: 273.471 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[1655] sum: 520224 avg: 314.334 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[794] sum: 1068960 avg: 1346.297 [32 | 16384]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[852] sum: 938048 avg: 1100.995 [32 | 32768]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[22] sum: 134217696 avg: 6100804.363 [32 | 67108864]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[62062] sum: 9914976 avg: 159.759 [32 | 65536]

The spiral test needs up to 67,108,864 bytes (64 MiB)!
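
Each renderer.edges.resize line above has the shape name[count] sum: S avg: S/count [min | max]. The Java sketch below shows one way such a statistic could be accumulated, driven by a growable buffer that starts at 16 and doubles until it reaches 4096. The doubling policy, the class names and the method names are assumptions made for illustration; the logs do not state how Marlin actually grows the OffHeapEdgeArray.

```java
import java.util.Locale;

// Hypothetical accumulator mirroring the fields of the log lines:
// name[count] sum: .. avg: .. [min | max]
final class ResizeStat {
    private final String name;
    private long count;
    private long sum;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    ResizeStat(String name) {
        this.name = name;
    }

    void add(long value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long count() {
        return count;
    }

    @Override
    public String toString() {
        double avg = (count == 0) ? 0.0 : (double) sum / count;
        return String.format(Locale.ROOT, "%s[%d] sum: %d avg: %.3f [%d | %d]",
                name, count, sum, avg, min, max);
    }
}

final class GrowthDemo {
    // Assumed policy: double the capacity until it fits, recording each new size.
    static int grow(int capacity, int needed, ResizeStat stat) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        while (capacity < needed) {
            capacity *= 2;
            stat.add(capacity);
        }
        return capacity;
    }

    public static void main(String[] args) {
        ResizeStat stat = new ResizeStat("renderer.edges.resize");
        grow(16, 4096, stat); // 16 -> 32 -> ... -> 4096
        System.out.println(stat);
    }
}
```

Under this assumed policy, a single shape needing 4096 costs 8 resizes when starting from 16 and none when the initial capacity is already 4096, which is the kind of overhead the initial-capacity tuning avoids.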


To conclude, I have already tuned the initial capacities according to my
benchmarks without consuming too much memory (~512K). However, I agree
these capacities can be adjusted again depending on the workload, or if
you have any preference.


3/ Heap size:

I ran the NO_INITIALS test again with only a 512m heap:

==> marlin_NO_INITIALS_Xmx512m.log <==
Threads    4    1    2    4
Pct95    250.374    240.754    250.038    260.331

==> marlin_NO_INITIALS.log <==
Threads    4    1    2    4
Pct95    244.261    241.116    244.028    247.639

So the smaller the heap, the bigger the impact of the weak cache!
Indeed, adding more threads implies more renderer contexts, each with its
own caches, which creates more (weakly referenced) garbage.

Typically, the weak cache impacts small-memory applications or web
servers (i.e. many concurrent map requests)!
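
The effect described here follows from how weak references behave: an object reachable only through a WeakReference may be reclaimed at any garbage collection, and collections happen more often with a small heap, so the cached arrays have to be recreated more often. The Java sketch below illustrates the mechanism with a hypothetical holder class; it is not Marlin's renderer context code, and simulateCollection() merely stands in for the garbage collector clearing the reference.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

// Hypothetical holder that keeps its value reachable only through a
// WeakReference, so the GC may discard it once no strong reference remains.
final class WeakHolder<T> {
    private final Supplier<T> factory;
    private WeakReference<T> ref = new WeakReference<T>(null);
    private int created;

    WeakHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns the cached value, recreating it if the reference was cleared.
    T get() {
        T value = ref.get();
        if (value == null) {
            value = factory.get();
            created++;
            ref = new WeakReference<T>(value);
        }
        return value;
    }

    // Stands in for the garbage collector clearing the weak reference.
    void simulateCollection() {
        ref.clear();
    }

    int created() {
        return created;
    }

    public static void main(String[] args) {
        WeakHolder<int[]> holder = new WeakHolder<>(() -> new int[4096]);
        int[] first = holder.get();
        int[] second = holder.get();   // reused: 'first' is still strongly held
        holder.simulateCollection();   // what a GC cycle may do at any time
        int[] third = holder.get();    // has to be allocated again
        System.out.println((first == second) + " " + (first == third)
                + " created=" + holder.created());
    }
}
```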

To conclude, the less garbage Marlin produces, the better its performance.

To be fair, I should also rerun the reference test with 512m, but
let's stop here for now.


I hope these new results will give you an overview of the memory / array
cache issue that Marlin has to deal with.

Laurent
