On Fri, 24 Jan 2020 16:55:47 GMT, Frederic Thevenet 
<github.com+7450507+ftheve...@openjdk.org> wrote:

>>> Here are the results when running JavaFX 14-ea+7.
>>> The columns of the table correspond to the width of the target snapshot, while 
>>> the rows correspond to the height; each cell holds the 
>>> average time* (in ms) spent in `Node::snapshot`.
>>> (*) Each test is run 10 times and the elapsed time is averaged after 
>>> pruning outliers using Grubbs' test.
>>> 
>>> h \ w   1024        2048        3072        4096        5120        6144        7168        8192
>>> 1024    6.304272    10.303935   15.052336   35.929304   23.860095   28.828812   35.315288   27.867205
>>> 2048    11.544367   21.156326   28.368750   28.887164   47.134738   54.354708   55.480251   56.722649
>>> 3072    15.503187   30.215269   41.304645   39.789648   82.255091   82.576379   96.618722   106.586547
>>> 4096    20.928336   38.768648   64.255423   52.608217   101.797347  132.516816  158.525192  166.872889
>>> 5120    28.693431   67.275306   68.090280   76.208412   133.974510  157.120373  182.329784  210.069066
>>> 6144    29.972591   54.751002   88.171906   104.489291  147.788597  185.185643  213.562819  228.643761
>>> 7168    33.668398   63.088490   98.756212   130.502678  196.367121  225.166481  239.328794  260.162501
>>> 8192    40.961901   87.067460   128.230351  178.127225  198.479068  225.806211  266.170239  325.967840
>> 
>> Any idea why 4096x1024 and 1024x4096 are so different? Same for 8192x1024 
>> and 1024x8192.
> 
> I don't, to be honest. 
> The results for some dimensions (not always the same) can vary pretty widely 
> from one run to another, despite all my efforts to repeat results and remove 
> outliers.
> Out of curiosity, I also tried to eliminate the GC as possible culprit by 
> running it with epsilon, but it seems to make a significant difference.
> I ran that test on a laptop with integrated Intel graphics and no dedicated 
> VRAM (Intel UHD Graphics 620), though, so this might be why. 
> Maybe someone could try and run the benchmark on hardware with a discrete GPU?

With regard to why the tiling version is significantly slower, though, I do 
have a pretty good idea: as Kevin hinted, the pixel copy into a temporary 
buffer before copying into the final image is where most of the extra time is 
spent.
The reason why it is so much slower is a bit of a pity, though: 
profiling a run of the benchmark shows that a lot of time is spent in 
`IntTo4ByteSameConverter::doConvert`. It turns out that, under Windows and 
the D3D pipeline anyway, the `WritableImage` used to collate the tiles and 
the tiles returned from the RTTexture have different pixel formats (IntARGB 
for the tile and byteBGRA for the `WritableImage`).
So if we could use a `WritableImage` with an IntARGB pixel format as the 
recipient for the snapshot (at least as long as no image was provided by the 
caller), I suspect that the copy would be much faster.
Unfortunately, it seems the only way to choose the pixel format for a 
`WritableImage` is to initialize it with a `PixelBuffer`, but then one can no 
longer use a `PixelWriter` to update it, and it doesn't seem to me that there 
is a way to safely access the `PixelBuffer` from an image's reference alone.
I'm pretty new to this code base though (which is quite large; I haven't read 
it all quite yet... ;-), so hopefully there's a way to do that that has eluded 
me so far.
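
To illustrate why the format mismatch matters, here is a minimal stand-alone sketch in plain Java. It is not the JavaFX code path: the class `PixelCopySketch` and its methods are hypothetical names made up for this example, and it assumes the conversion only repacks the four channels of each pixel (which is what the "Same" in the converter's name suggests to me). It contrasts a per-pixel IntARGB to byteBGRA repack with a bulk copy between two buffers that share the same format.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the JavaFX code base.
final class PixelCopySketch {

    /** Repacks IntARGB pixels (0xAARRGGBB) into B, G, R, A bytes, one pixel at a time. */
    static void intArgbToByteBgra(int[] src, byte[] dst) {
        for (int i = 0, j = 0; i < src.length; i++) {
            int argb = src[i];
            dst[j++] = (byte) argb;           // blue
            dst[j++] = (byte) (argb >> 8);    // green
            dst[j++] = (byte) (argb >> 16);   // red
            dst[j++] = (byte) (argb >>> 24);  // alpha
        }
    }

    /** Same-format copy: one bulk array copy, no per-pixel work. */
    static void copySameFormat(int[] src, int[] dst) {
        System.arraycopy(src, 0, dst, 0, src.length);
    }

    public static void main(String[] args) {
        int width = 1024, height = 1024;      // arbitrary tile size for the demo
        int[] tile = new int[width * height];
        Arrays.fill(tile, 0x80FF8040);
        byte[] bgra = new byte[tile.length * 4];
        int[] argb = new int[tile.length];

        long t0 = System.nanoTime();
        intArgbToByteBgra(tile, bgra);
        long t1 = System.nanoTime();
        copySameFormat(tile, argb);
        long t2 = System.nanoTime();

        // Single-shot timings, only indicative; a real measurement needs warm-up and repetition.
        System.out.printf("convert: %.2f ms, same-format copy: %.2f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6);
    }
}
```

The timings printed by `main` are single-shot and should not be read as a benchmark; the point is only that the first path touches every pixel individually, while the second is a single `System.arraycopy`.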

-------------

PR: https://git.openjdk.java.net/jfx/pull/68
