Jim,

First, here are both updated webrev and benchmark results:
- results: http://jmmc.fr/~bourgesl/share/java2d-pisces/patch_opt_night.log
- webrev: http://jmmc.fr/~bourgesl/share/java2d-pisces/webrev-2/

Note: the webrev is partially "cleaner", but it is still work in progress!

Changes made:
- optimized cleanup of alpha / edges arrays
- TileState HARD reference stored in SunGraphics2D to avoid repeated ThreadLocal or ConcurrentQueue accesses
- TileState propagated through RenderingEngine to PiscesRenderingEngine (warning: interface compatibility issues)
- minor tuning

Now the ArrayCache classes (IntArrayCache, Dirty... and FloatArrayCache) are totally useless during MapBench tests, as the RendererContext stores large arrays (16K int or float arrays) + rowAARLE (2Mb). However, I am keeping the array caching for very high workloads ... to be discussed later.

Comparison (open office format):
http://jmmc.fr/~bourgesl/share/java2d-pisces/compareRef_Patch.ods

Patch2 vs ductus:

threads   gain
1         102.11%
2         144.49%
4         263.13%

On average, patch2 is equal to or better than ductus: about 44% better with 2 threads and 2.6 times better with 4 threads!

In the following table, you can see how the gain varies with the test (work load): with 1 thread, my patch performs better than ductus for the complex test case only.

test               threads   Tavg      Tmed      Med+Stddev
boulder_17         1         82.54%    77.68%    76.99%
boulder_17         2         119.57%   120.24%   128.56%
boulder_17         4         149.95%   150.39%   161.98%
shp_alllayers_47   1         107.26%   107.18%   107.02%
shp_alllayers_47   2         144.24%   144.18%   147.00%
shp_alllayers_47   4         288.05%   289.10%   286.04%

Secondly, here are my comments:

2013/4/24 Jim Graham <james.gra...@oracle.com>

> Originally the version that was used in embedded used RLE because it
> stored the results in the shape itself. On desktop I never found that to
> be a necessary optimization especially because it actually wastes memory
> for no gain during animations, but that was why they used RLE as a storage
> format. Would it speed up the code to use a different storage format?

It could be a very good idea: compressing the alpha array to RLE and then decompressing it to fill the byte[] tile array seems a bad idea.
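
To make that round trip concrete, here is a minimal sketch of what I mean. It is not the actual Pisces code: the class AlphaRle, its method names and the (value, runLength) int-pair layout are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Pisces): RLE round trip of one alpha row. */
public final class AlphaRle {

    /** Encodes the first 'length' alpha values as (value, runLength) pairs. */
    public static int[] encode(byte[] alpha, int length) {
        int[] out = new int[2 * length]; // worst case: every pixel differs
        int n = 0;
        int i = 0;
        while (i < length) {
            byte v = alpha[i];
            int run = 1;
            // extend the run while the next pixel has the same coverage
            while (i + run < length && alpha[i + run] == v) {
                run++;
            }
            out[n++] = v & 0xFF; // store alpha as unsigned 0..255
            out[n++] = run;
            i += run;
        }
        return Arrays.copyOf(out, n);
    }

    /** Expands the pairs back into a tile row; returns the pixel count written. */
    public static int decode(int[] rle, byte[] tileRow) {
        int pos = 0;
        for (int k = 0; k < rle.length; k += 2) {
            Arrays.fill(tileRow, pos, pos + rle[k + 1], (byte) rle[k]);
            pos += rle[k + 1];
        }
        return pos;
    }

    public static void main(String[] args) {
        byte[] row = {0, 0, 0, (byte) 128, (byte) 255, (byte) 255, 0, 0};
        int[] rle = encode(row, row.length);
        byte[] tile = new byte[row.length];
        int written = decode(rle, tile);
        System.out.println(Arrays.toString(rle));
        System.out.println(written + " pixels, identical: " + Arrays.equals(row, tile));
    }
}
```

Every row is walked twice (once to encode, once to decode) only to end up with the same bytes, which is the cost a direct storage format would avoid.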
However, keeping the RLE encoding may help to have smaller arrays to store a complete tile line as I want: width = 4096 (or more) x height = 32. As memory is cheap nowadays, I could try having a large 1D array to store the alpha values for a complete tile line: 512K only (4096 x 32 entries, assuming 4 bytes each)!

> Also, in the version we use in JavaFX we removed the tiling altogether and
> return one alpha array for the entire rasterization. We might consider
> doing that for this code as well if it allows us to get rid of Ductus - it
> was a Ductus design constraint that forced the tiling (it was again based
> on the expected size of the hardware AA engine)...

I think tiling is still interesting, as such small arrays stay in the CPU cache! However, I could try tuning the tile width to be larger: 256x32 tiles instead of 32x32 tiles.

Finally, who could help me work on Pisces? Could we form a tiger team? Or at least, could Denis and you find some time to help me?

Laurent