Re: Optimised, high-performance, multi-threaded rendering pipeline

Michael Paus Sun, 27 Nov 2016 23:11:42 -0800

I am interested too, although I have only been listening quietly so far due to lack of time.

Cheers
Michael

On 28.11.16 at 06:54, Felix Bembrick wrote:

Sorry Gerrit - you did indeed.


Maybe you'd also like to participate in the offline discussion (especially now 
that you don't work for Oracle)?

On 28 Nov. 2016, at 16:07, han.s...@icloud.com wrote:

Well I mentioned before that I'm interested too :)

Cheers,

Gerrit


On 27 Nov. 2016, 22:58 +0100, Felix Bembrick <felix.bembr...@gmail.com> wrote:

Well, given that you and Benjamin seem to be the only people interested in it,
perhaps we should discuss it offline (so as not to bother Oracle or spam this
list)...

On 28 Nov. 2016, at 06:57, Tobias Bley <b...@jpro.io> wrote:

Where can we read more about your HPR renderer?

On 25.11.2016 at 16:45, Felix Bembrick <felix.bembr...@gmail.com> wrote:

Short answer? Maybe.

But exactly one more word than any from Oracle ;-)

On 26 Nov. 2016, at 00:07, Tobias Bley <b...@jpro.io> wrote:

A very short answer ;) ….

Do you have any URL?

On 25.11.2016 at 12:19, Felix Bembrick <felix.bembr...@gmail.com> wrote:

Yes.

On 25 Nov. 2016, at 21:45, Tobias Bley <b...@jpro.io> wrote:

Hi,

@Felix: Is there any Github project, demo video or trial to test HPR with 
JavaFX?

Best regards,
Tobi

On 11.11.2016 at 12:08, Felix Bembrick <felix.bembr...@gmail.com> wrote:

Thanks Laurent,

That's another thing we discovered: using Java itself in the most performant 
way can help a lot.

It can be tricky, but profiling can often highlight patterns of object
instantiation that show up as red flags and lead you directly to regions of
the code that can be refactored to be significantly more efficient.

Also, the often overlooked GC log analysis can lead to similar discoveries and 
remedies.
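
As a small illustration of the point about watching GC behaviour: the sketch below does not parse GC logs. Instead it reads the same kind of figures (collection count and accumulated collection time) in-process through the standard java.lang.management beans, before and after a deliberately wasteful loop. The class name GcSnapshot and the workload are hypothetical and only serve the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: sums the counters of all collectors in this JVM. */
final class GcSnapshot {
    final long collections; // total number of collections so far
    final long millis;      // approximate accumulated collection time

    GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report a value.
            count += Math.max(0L, gc.getCollectionCount());
            time += Math.max(0L, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        long checksum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            int[] tmp = new int[64]; // deliberately wasteful: one short-lived array per iteration
            tmp[0] = i;
            checksum += tmp[0];
        }
        GcSnapshot after = take();
        System.out.println("checksum: " + checksum);
        System.out.println("collections during loop: " + (after.collections - before.collections));
        System.out.println("GC time during loop (ms): " + (after.millis - before.millis));
    }
}
```

The numbers vary with the JVM, the collector and the heap size, and the JIT may remove such short-lived allocations altogether, so treat the output as a hint about where to look rather than as a benchmark.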

Blessings,

Felix

On 11 Nov. 2016, at 21:55, Laurent Bourgès <bourges.laur...@gmail.com> wrote:

Hi,

To optimize Pisces, which became the Marlin rasterizer, I carefully avoided any
array allocation (using byte/int/float array pools) and also reduced array
copies and clean-up, i.e. only the dirty parts are cleared.
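
A minimal sketch of the pooling idea described above, assuming a fixed array length and a caller that knows which range it wrote to. It is not taken from Marlin; the class IntArrayPool and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical pool of fixed-length int arrays; not Marlin's actual code. */
final class IntArrayPool {
    private final ArrayDeque<int[]> free = new ArrayDeque<>();
    private final int length;
    private int allocations; // how many arrays were really allocated

    IntArrayPool(int length) {
        this.length = length;
    }

    /** Returns a zero-filled array, reusing a pooled one when available. */
    int[] acquire() {
        int[] a = free.poll();
        if (a == null) {
            a = new int[length];
            allocations++;
        }
        return a;
    }

    /** Gives the array back, clearing only the dirty range [dirtyFrom, dirtyTo). */
    void release(int[] a, int dirtyFrom, int dirtyTo) {
        Arrays.fill(a, dirtyFrom, dirtyTo, 0);
        free.push(a);
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        IntArrayPool pool = new IntArrayPool(1024);
        for (int frame = 0; frame < 3; frame++) {
            int[] buf = pool.acquire();
            buf[10] = 42;              // the "rendering" only touches indices 10 and 11
            buf[11] = 7;
            pool.release(buf, 10, 12); // so only that range is cleared
        }
        System.out.println("allocations = " + pool.allocations()); // prints "allocations = 1"
    }
}
```

The pool trades a little bookkeeping (the caller must track its dirty range) for the absence of per-frame allocation and full-array clears.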

This approach is generic and could be applied in other critical places of the 
rendering pipelines.

FYI here are my fosdem 2016 slides on the Marlin renderer:
https://bourgesl.github.io/fosdem-2016/slides/fosdem-2016-Marlin.pdf

Of course I would be happy to share my experience and work with a tiger team on 
optimizing JavaFX graphics.

However, I would like to get some sort of sponsorship for my potential
contributions...

Cheers,
Laurent

On 11 Nov. 2016 11:29, "Tobi" <t...@ultramixer.com> wrote:

Hi,

thanks Felix, Laurent and Chris for sharing your stuff with the community!

I am happy to see a discussion starting about boosting the JavaFX rendering
performance. I can confirm that the performance of the JavaFX scene graph is
not where it should be. So multithreading would be an excellent, but
difficult approach.

Felix, concerning your research of other toolkits: Do they all use 
multithreading or are there any toolkits which use single threading but are 
faster than JavaFX?

So maybe there are other points than multithreading where we can boost the 
performance?

2) your HPR sounds great. Did you already try DemoFX (part 3) benchmark with 
your HPR?


Best regards,
Tobi

On 10.11.2016 at 19:11, Felix Bembrick <felix.bembr...@gmail.com> wrote:

(Thanks to Kevin for lifting my "awaiting moderation" impasse).

So, with all the recent discussions regarding the great contribution by
Laurent Bourgès of MarlinFX, it was suggested that a separate thread be
started to discuss parallelisation of the JavaFX rendering pipeline in
general.

As has been correctly pointed out, converting or modifying the existing
rendering pipeline into a fully multi-threaded and performant beast is
indeed quite a complex task.

But that's exactly what my colleagues and I have been working on for
about 2 years.

The result is what we call the Hyper Rendering Pipeline (HPR).

Work on HPR started when we developed FXMark and were (bitterly)
disappointed with the performance of the JavaFX scene graph. Many JavaFX
developers have blogged about the need to dramatically minimise the number
of nodes (especially on embedded devices) in order to achieve even
"acceptable" performance. Often it is the case that most (if not all)
rendering is eventually done in a single Canvas node.

Now, as we all already know, the JavaFX Canvas does perform very well and the
recent awesome work (DemoFX) by Chris Newland, just for example, shows what
can be done with this one node.

But, the majority of the animation plumbing in JavaFX is related to the
scene graph itself and is designed to make use of multiple nodes and node
types. At the moment, the performance of this scene graph is the Achilles
Heel of JavaFX (or at least one of them).

Enter HPR.

I personally have worked with a number of hardware-accelerated toolkits
over the years and am astounded by just how sluggish the rendering pipeline
for JavaFX is. When I am animating just a couple of hundred nodes using
JavaFX and transitions, I am lucky to get more than about 30 FPS, but on
the same (very powerful) machine, I can use other toolkits to render
thousands of "objects" and achieve frame rates well over 1000 FPS.

So, we refactored the entire scene graph rendering pipeline with the
following goals and principles:

1. It is written using JavaFX 9 and Java 9 (but could theoretically be
back-ported to JavaFX 8 though I see no reason to).

2. We analysed how other toolkits had optimised their own rendering
pipelines (especially Qt which has made some significant advances in this
area in recent years). We also analysed recent examples of multi-threaded
rendering using the new Vulkan API.

3. We carefully analysed and determined which parts of the pipeline should
best utilise the CPU and which parts should best utilise the GPU.

4. For those parts most suited to the CPU, we use the advanced concurrency
features of Java 8/9 to maximise parallelisation and throughput by
utilising multiple cores & threads in as efficient a manner as possible.

5. We devoted a large amount of time to optimising the "communication"
between the CPU and GPU to be far less "chatty" and this alone led to some
huge performance gains.

6. We also looked at the structure of the scene graph itself and after
studying products such as OpenSceneGraph, we refactored the JavaFX scene
graph in such a way that it lends itself to optimised rendering much more
easily.

7. This is clearly not a "small" patch. In fact to refer to it as a
"patch" is probably rather inappropriate.

The end result is that we now have a fully-functional prototype of HPR and,
already, we are seeing very significant performance improvements.
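
To make goals 4 and 5 from the list above a little more concrete, here is a minimal sketch under stated assumptions. It is not HPR code (none has been published in this thread): the types Item and DrawCommand and the method prepare are hypothetical. The CPU stage transforms all items with a parallel stream, and the resulting commands are grouped by texture so that the GPU side could receive one submission per texture instead of one per item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of parallel CPU preparation plus batching. */
final class BatchingSketch {

    /** Hypothetical scene item: a position plus the texture it is drawn with. */
    static final class Item {
        final double x;
        final double y;
        final int textureId;

        Item(double x, double y, int textureId) {
            this.x = x;
            this.y = y;
            this.textureId = textureId;
        }
    }

    /** Hypothetical draw command produced by the CPU stage. */
    static final class DrawCommand {
        final int textureId;
        final double screenX;
        final double screenY;

        DrawCommand(int textureId, double screenX, double screenY) {
            this.textureId = textureId;
            this.screenX = screenX;
            this.screenY = screenY;
        }
    }

    /** Transforms every item in parallel, then groups the commands by texture. */
    static Map<Integer, List<DrawCommand>> prepare(List<Item> items, double scale) {
        return items.parallelStream()
                .map(it -> new DrawCommand(it.textureId, it.x * scale, it.y * scale))
                .collect(Collectors.groupingBy(c -> c.textureId, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add(new Item(i, i * 0.5, i % 4)); // four distinct textures
        }
        Map<Integer, List<DrawCommand>> batches = prepare(items, 2.0);
        // One submission per batch instead of one per item.
        System.out.println("items = " + items.size() + ", submissions = " + batches.size());
    }
}
```

With 1000 items spread over four textures this prints "items = 1000, submissions = 4". A real pipeline would also have to respect draw order and blending, which this sketch ignores.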

At the minimum, scene graph rendering performance has improved by 500% and,
with judicious and sometimes "tricky" use of caching, we have seen
improvements in performance of 10x or more.

And... we are only just *starting* with the performance optimisation phase.

The potential for HPR is massive as it opens up the possibility for the
JavaFX scene graph and the animation/transition infrastructure to be used
for a whole new class of applications including games, advanced
visualisations etc., without having to rely on imperative programming of a
single Canvas node.

I believe that HPR, along with tremendous recent developments like JPro and
the outstanding work by Gluon on mobiles and embedded devices, could
position JavaFX to be the best graphics toolkit of any kind in any language
and be the ONLY *truly* cross-platform graphics technology available.

WORA for graphics and UIs is finally within reach!

Blessings,

Felix
