On Fri, 24 Jan 2020 16:55:47 GMT, Frederic Thevenet <github.com+7450507+ftheve...@openjdk.org> wrote:
>>> Here are the results when running JavaFX 14-ea+7.
>>> The columns of the table correspond to the width of the target snapshot, while
>>> the rows correspond to the height, and the content of each cell is the
>>> average time* spent (in ms) in `Node::snapshot`.
>>> (*) Each test is run 10 times and the elapsed time is averaged after
>>> pruning outliers, using Grubbs' test.
>>>
>>> H \ W       1024       2048       3072       4096       5120       6144       7168       8192
>>> 1024    6.304272  10.303935  15.052336  35.929304  23.860095  28.828812  35.315288  27.867205
>>> 2048   11.544367  21.156326  28.368750  28.887164  47.134738  54.354708  55.480251  56.722649
>>> 3072   15.503187  30.215269  41.304645  39.789648  82.255091  82.576379  96.618722 106.586547
>>> 4096   20.928336  38.768648  64.255423  52.608217 101.797347 132.516816 158.525192 166.872889
>>> 5120   28.693431  67.275306  68.090280  76.208412 133.974510 157.120373 182.329784 210.069066
>>> 6144   29.972591  54.751002  88.171906 104.489291 147.788597 185.185643 213.562819 228.643761
>>> 7168   33.668398  63.088490  98.756212 130.502678 196.367121 225.166481 239.328794 260.162501
>>> 8192   40.961901  87.067460 128.230351 178.127225 198.479068 225.806211 266.170239 325.967840
>>
>> Any idea why 4096x1024 and 1024x4096 are so different? Same for 8192x1024
>> and 1024x8192.
>
> I don't, to be honest.
> The results for some dimensions (not always the same) can vary pretty widely
> from one run to another, despite all my efforts to repeat results and remove
> outliers.
> Out of curiosity, I also tried to eliminate the GC as a possible culprit by
> running it with epsilon, but it seems to make a significant difference.
> I ran that test on a laptop with integrated Intel graphics and no dedicated
> VRAM (Intel UHD Graphics 620), though, so this might be why.
> Maybe someone could try and run the bench on hardware with a discrete GPU?
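The measurement procedure in the quoted benchmark description can be sketched roughly as follows: repeat each case, prune outliers with Grubbs' test, then average the rest. This is a minimal illustration, not the benchmark's actual code. `GrubbsAverage`, `prunedMean`, and the `criticalValue` lookup are hypothetical names. The critical value G_crit(n) is assumed to come from a published Grubbs table for whatever significance level is chosen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

final class GrubbsAverage {

    /** Arithmetic mean of the samples. */
    static double mean(List<Double> xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    /** Sample standard deviation (n - 1 denominator), as used by Grubbs' statistic. */
    static double stdDev(List<Double> xs, double mean) {
        double sq = 0;
        for (double x : xs) sq += (x - mean) * (x - mean);
        return Math.sqrt(sq / (xs.size() - 1));
    }

    /**
     * Repeatedly removes the single most extreme sample while Grubbs' statistic
     * G = max|x_i - mean| / s exceeds criticalValue(n), then returns the mean of
     * the remaining samples. criticalValue is a hypothetical caller-supplied lookup
     * (e.g. from a two-sided Grubbs critical-value table).
     */
    public static double prunedMean(double[] samples, IntToDoubleFunction criticalValue) {
        List<Double> xs = new ArrayList<>();
        for (double x : samples) xs.add(x);
        while (xs.size() > 2) {              // Grubbs' test needs at least 3 points
            double m = mean(xs);
            double s = stdDev(xs, m);
            if (s == 0) break;               // all samples identical: nothing to prune
            int worst = 0;
            for (int i = 1; i < xs.size(); i++) {
                if (Math.abs(xs.get(i) - m) > Math.abs(xs.get(worst) - m)) worst = i;
            }
            double g = Math.abs(xs.get(worst) - m) / s;
            if (g <= criticalValue.applyAsDouble(xs.size())) break;
            xs.remove(worst);                // int argument: removes by index
        }
        return mean(xs);
    }

    public static void main(String[] args) {
        // Ten hypothetical timings (ms) with one obvious outlier, e.g. a run hit by a pause.
        double[] timings = {10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 50.0};
        // Illustrative constant threshold only; a real run would look G_crit(n) up in a table.
        System.out.println(prunedMean(timings, n -> 2.0));
    }
}
```

Grubbs' test is designed to detect a single outlier. Applying it repeatedly, one point at a time with the mean and standard deviation recomputed, is a common way to use it when there may be more than one. The constant threshold in `main` is only there to keep the example self-contained.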
With regard to why the tiling version is significantly slower, though, I do have a pretty good idea: as Kevin hinted, the pixel copy into a temporary buffer before copying into the final image is where most of the extra time is spent. The reason it is so much slower is a little bit of a pity, though. Profiling a run of the benchmark shows that a lot of time is spent in `IntTo4ByteSameConverter::doConvert`. It turns out this is because, under Windows and the D3D pipeline at least, the `WritableImage` used to collate the tiles and the tiles returned from the RTTexture have different pixel formats (IntARGB for the tiles and ByteBGRA for the `WritableImage`). So if we could use a `WritableImage` with an IntARGB pixel format as the recipient for the snapshot (at least as long as no image was provided by the caller), I suspect that the copy would be much faster.

Unfortunately, it seems the only way to choose the pixel format of a `WritableImage` is to initialize it with a `PixelBuffer`. But then one can no longer use a `PixelWriter` to update it, and it doesn't seem to me that there is a way to safely access the `PixelBuffer` from an image's reference alone. I'm pretty new to this code base, though (which is quite large; I haven't read it all quite yet... ;-), so hopefully there's a way to do that which has eluded me so far.

-------------

PR: https://git.openjdk.java.net/jfx/pull/68
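The pixel-format argument in the reply above can be illustrated with a small, hypothetical sketch in plain Java. It uses no JavaFX or Prism classes; `TileCollation` and its methods are invented names, and this is not the actual `IntTo4ByteSameConverter` code. It assumes both sides use the same alpha premultiplication, so going from IntARGB to ByteBGRA is a pure byte reorder. When the destination stores one int per pixel like the tile, each tile row is a single `System.arraycopy`. When it stores ByteBGRA, every pixel has to be unpacked individually.

```java
import java.util.Arrays;

final class TileCollation {

    /** Same-format path: destination is also one int per pixel, so copy row by row in bulk. */
    public static void copyTileSameFormat(int[] tile, int tileW, int tileH,
                                          int[] dst, int dstW, int dstX, int dstY) {
        for (int row = 0; row < tileH; row++) {
            System.arraycopy(tile, row * tileW, dst, (dstY + row) * dstW + dstX, tileW);
        }
    }

    /** Mismatched path: destination is 4 bytes per pixel in B, G, R, A order (ByteBGRA). */
    public static void copyTileIntArgbToByteBgra(int[] tile, int tileW, int tileH,
                                                 byte[] dst, int dstW, int dstX, int dstY) {
        for (int row = 0; row < tileH; row++) {
            int src = row * tileW;
            int out = ((dstY + row) * dstW + dstX) * 4;
            for (int col = 0; col < tileW; col++, out += 4) {
                int argb = tile[src + col];             // 0xAARRGGBB
                dst[out]     = (byte) argb;             // blue
                dst[out + 1] = (byte) (argb >> 8);      // green
                dst[out + 2] = (byte) (argb >> 16);     // red
                dst[out + 3] = (byte) (argb >>> 24);    // alpha
            }
        }
    }

    public static void main(String[] args) {
        int[] tile = {0x11223344, 0xAABBCCDD};  // a 2x1 tile
        int[] intImage = new int[2 * 2];         // 2x2 int-per-pixel destination
        byte[] byteImage = new byte[2 * 2 * 4];  // 2x2 ByteBGRA destination
        // Place the tile on the second row of each destination.
        copyTileSameFormat(tile, 2, 1, intImage, 2, 0, 1);
        copyTileIntArgbToByteBgra(tile, 2, 1, byteImage, 2, 0, 1);
        System.out.println(Arrays.toString(intImage));
        System.out.println(Arrays.toString(byteImage));
    }
}
```

How much faster the bulk path is in practice depends on the JIT and the hardware. The sketch only shows why matching formats remove the per-pixel work from tile collation.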