Hi Laurent,
I still have no idea what you mean when you say "arrays = [0]".
Does that mean "new foo[0]"? Or "new foo[getBucketSize of bucket #0]"?
The latter is what I was envisioning using...
...jim
On 6/19/15 5:08 AM, Laurent Bourgès wrote:
Jim,
here are the benchmark results:
- REF: Marlin reference = initial capacity tuned for arrays and
OffHeapEdgeArray
- NO_INITIALS: initial arrays = [0]
- NO_INITIALS_OFFHEAP_16: initial arrays = [0] and OffHeapEdgeArray(16)
I pushed all details (stats & benchmarks):
http://cr.openjdk.java.net/~lbourges/marlin/bench_initial_arrays/
1/ Benchmark results:
The OffHeapEdgeArray size is more critical: with OffHeapEdgeArray(16),
results are about 5% slower than the previous test (initial arrays = [0]).
Renderer                 Test count  30       10       10       10
                         Threads     4        1        2        4
REF                      Pct95       237.848  233.887  238.43   241.226
NO_INITIALS              Pct95       244.261  241.116  244.028  247.639
NO_INITIALS_OFF_HEAP_16  Pct95       257.091  253.211  256.13   261.93
For the complex map, it is more pronounced: ~20% slower than the
reference test:
REF:
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 770.511 775.448 770.448 4.668 765.125 787.473 100 100.00%
NO_INITIALS_OFF_HEAP_16:
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 902.238 934.679 910.759 14.478 898.332 956.92 100 120.53%
NO_INITIALS:
dc_shp_alllayers_2013-00-30-07-00-47.ser 4 100 815.775 823.593 817.352 6.752 813.031 872.658 100 106.21%
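As a quick check of the percentages above: the last column appears to be the ratio of the second timing value on each row to the same value on the REF row (775.448). The column names are not given here, so that reading is an inference from the numbers, and the class and method names below are only illustrative.

```java
import java.util.Locale;

// Illustrative helper (hypothetical names): recomputes the "percent of
// reference" figures from the second timing value of each row above.
class RelativeTimes {

    /** Returns time / reference as a percentage (120.53 means 20.53% slower). */
    static double percentOfReference(double time, double reference) {
        return 100.0 * time / reference;
    }

    public static void main(String[] args) {
        double ref = 775.448;                 // REF row
        double noInitials = 823.593;          // NO_INITIALS row
        double noInitialsOffHeap16 = 934.679; // NO_INITIALS_OFF_HEAP_16 row

        System.out.println(String.format(Locale.ROOT,
                "NO_INITIALS: %.2f%%", percentOfReference(noInitials, ref)));
        System.out.println(String.format(Locale.ROOT,
                "NO_INITIALS_OFF_HEAP_16: %.2f%%",
                percentOfReference(noInitialsOffHeap16, ref)));
    }
}
```

Both ratios round to the 106.21% and 120.53% reported in the rows above.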
2/ Statistics: the number of cache accesses (and the array sizes per
bucket) is very large.
For example:
- stats_NO_INITIALS.log:
Loading DrawingCommands: ../maps/dc_shp_alllayers_2013-00-30-07-00-47.ser
Loaded DrawingCommands: DrawingCommands{width=1400, height=800, commands=135213}
...
INFO: ArrayCache: int resize: 0 - dirty int resize: 140612 - dirty float resize: 104025 - dirty byte resize: 103966 - oversize: 0
...
INFO: Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 281224 created: 2 - returned: 281224 :: cache size: 2
INFO: Dirty Array caches for thread: ctx1
INFO: IntArrayCache[4096]: get: 562448 created: 4 - returned: 562448 :: cache size: 4
INFO: FloatArrayCache[4096]: get: 104025 created: 2 - returned: 104025 :: cache size: 2
INFO: ByteArrayCache[65536]: get: 103966 created: 1 - returned: 103966 :: cache size: 1
- stats_NO_INITIALS_OFFHEAP_16.log:
INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]
The OffHeapEdgeArray is resized a lot for this map: 4096 is the right
capacity for this test case.
Several test cases need a lot more memory: 32K, 64K or 128K.
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[15915] sum: 16182208 avg: 1016.789 [32 | 131072]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[7807] sum: 6053440 avg: 775.386 [32 | 65536]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[2231] sum: 4420224 avg: 1981.274 [32 | 131072]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[483598] sum: 86874016 avg: 179.64 [32 | 4096]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[4696] sum: 1284224 avg: 273.471 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[1655] sum: 520224 avg: 314.334 [32 | 8192]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[794] sum: 1068960 avg: 1346.297 [32 | 16384]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[852] sum: 938048 avg: 1100.995 [32 | 32768]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[22] sum: 134217696 avg: 6100804.363 [32 | 67108864]
stats_NO_INITIAL_OFFHEAP_16.log:INFO: renderer.edges.resize[62062] sum: 9914976 avg: 159.759 [32 | 65536]
The spiral test needs up to 67,108,864 bytes!
To conclude, I have already tuned the initial capacities according to my
benchmarks without consuming too much memory (~512K). However, I agree
these capacities can be adjusted again depending on the workload, or if
you have any preference.
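The spiral entry (resize[22], sum 134217696, range [32 | 67108864]) is consistent with a buffer that starts at 16 and doubles on every resize. The doubling policy is only inferred from those numbers, not quoted from the Marlin sources, and the names below are hypothetical. A minimal simulation under that assumption:

```java
// Sketch only: simulates a buffer that doubles until it can hold `needed`
// bytes. The doubling policy is an assumption inferred from the log line
// for the spiral test, not taken from the Marlin code.
class GrowthSimulation {

    /** Returns { number of resizes, sum of all resized capacities, final capacity }. */
    static long[] simulate(long initialCapacity, long needed) {
        long capacity = initialCapacity;
        long resizes = 0;
        long sum = 0;
        while (capacity < needed) {
            capacity *= 2;   // one resize = one reallocation (and copy)
            resizes++;
            sum += capacity; // accumulate the new capacity, as the stats do
        }
        return new long[] { resizes, sum, capacity };
    }

    public static void main(String[] args) {
        long[] r = simulate(16, 67_108_864L);
        System.out.println("resizes=" + r[0] + " sum=" + r[1] + " final=" + r[2]);
    }
}
```

Starting from 16, reaching 67,108,864 takes 22 doublings whose capacities sum to 134217696, which matches the count and sum logged for the spiral test. A larger initial capacity removes the first doublings, which are also the ones repeated for every small shape.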
3/ Heap size:
I re-ran the NO_INITIALS test with only a 512m heap:
==> marlin_NO_INITIALS_Xmx512m.log <==
Threads 4 1 2 4
Pct95 250.374 240.754 250.038 260.331
==> marlin_NO_INITIALS.log <==
Threads 4 1 2 4
Pct95 244.261 241.116 244.028 247.639
So the smaller the heap, the bigger the impact of the weak cache!
Actually, adding more threads implies more renderer contexts, each with
its own caches, which creates more (weakly referenced) garbage.
Typically the weak cache impacts small-memory applications or web
servers = many concurrent map requests!
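To illustrate why a weak cache costs more on a small heap, here is a minimal sketch, assuming "weak cache" means the pooled arrays are only reachable through a WeakReference. The class and field names are hypothetical and this is not the actual Marlin ArrayCache.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;

// Sketch only (hypothetical names): a fixed-size int[] pool held through a
// WeakReference, so the GC may drop the whole pool at any collection.
class WeakIntArrayCache {
    private final int arraySize;
    private WeakReference<ArrayDeque<int[]>> poolRef = new WeakReference<>(null);

    // Counters in the spirit of the "get / created / returned" statistics.
    int getCount;
    int createdCount;
    int returnedCount;

    WeakIntArrayCache(int arraySize) {
        this.arraySize = arraySize;
    }

    /** Takes an array from the pool, or allocates one if the pool is empty or was collected. */
    int[] get() {
        getCount++;
        ArrayDeque<int[]> pool = poolRef.get();
        int[] array = (pool != null) ? pool.pollLast() : null;
        if (array == null) {
            createdCount++;
            array = new int[arraySize];
        }
        return array;
    }

    /** Gives an array back; arrays of another size are ignored. */
    void put(int[] array) {
        if (array.length != arraySize) {
            return;
        }
        returnedCount++;
        ArrayDeque<int[]> pool = poolRef.get();
        if (pool == null) {
            // The pool was never created or has been garbage collected.
            pool = new ArrayDeque<>();
            poolRef = new WeakReference<>(pool);
        }
        pool.addLast(array);
    }
}
```

With a small heap the GC runs more often, so the pool is cleared more often and get() falls back to allocating. With one such cache per renderer context, more threads multiply that effect.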
To conclude, the less garbage Marlin produces, the better it performs.
To be fair, I should also re-run the reference test with 512m; but
let's stop here for now.
I hope these new results will give you an overview of the memory / array
cache issue that Marlin has to deal with.
Laurent