Hi Anders,

On Fri, 9 Apr 2021 at 07:02, Anders Backman <[email protected]> wrote:

> It looks like for each MatrixTransform you add to the scene, you get TWO
> calls to glUniformMatrix4fv, one for the osg_ModelViewMatrix and one for the
> osg_ModelViewProjectionMatrix
> Compared to the fixed pipeline I have 1500 gl calls during one frame, and
> in fixed only 450.
> That should make for some of the decrease in performance.
>

Interesting stats.  I guess if we added more complexity to the OSG state
tracking one could figure out whether both osg_ModelViewMatrix and
osg_ModelViewProjectionMatrix are required, but this would require quite a
few changes to the core OSG to juggle this.  It would also increase the OSG
CPU overhead, so it's not a free addition even if it was a net gain for some
applications.
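
To make the trade-off concrete, here is a rough sketch of the idea.  It is
not OSG code: ProgramInfo, uploadsAlways and uploadsTracked are names made up
for illustration, and it only counts uploads rather than calling OpenGL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of which built-in matrix uniforms the shader
// program bound for a given transform actually declares. Not an OSG type.
struct ProgramInfo
{
    bool usesModelView;
    bool usesModelViewProjection;
};

// Matrix uploads per frame when both matrices are sent for every transform.
std::size_t uploadsAlways(std::size_t numTransforms)
{
    return numTransforms * 2;
}

// Matrix uploads per frame when each transform first checks what the
// active program needs. The two checks per transform are the extra CPU
// work that the tracking would add.
std::size_t uploadsTracked(const std::vector<ProgramInfo>& programPerTransform)
{
    std::size_t uploads = 0;
    for (const ProgramInfo& p : programPerTransform)
    {
        if (p.usesModelView) ++uploads;
        if (p.usesModelViewProjection) ++uploads;
    }
    return uploads;
}
```

With 500 transforms whose shader only reads osg_ModelViewProjectionMatrix,
the tracked version issues 500 uploads instead of 1000, at the price of the
per-transform checks.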



> Other than that a profiling does not give anything specific CPU related
> when running the application.
> It is just that the draw call can be 2-4 times slower (perhaps due to the
> number of gl calls).
>

A CPU bottleneck is an ever-present problem with OpenGL, and it has got
worse with the use of shaders.  There's a reason why Vulkan was created and
why I started to work on the VulkanSceneGraph.


> But the GPU time is also higher, about 100% with the same complexity (and
> a really simple shader).
>

The GPU is only as fast as the pipe that feeds it, so if the CPU side is
bogged down the GPU will stall and take more time.  Changing state on the
GPU also has a significant impact on performance.

Batching geometry and state helps with CPU and GPU overheads but can't fix
them completely.  The biggest performance gain with using shaders is that
you can start batching some scenes much more aggressively using techniques
like instancing.  In your scene this is probably the way to go.
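
As a rough sketch of the CPU side of instancing: rather than setting one
matrix uniform per object, you pack all the per-object matrices into one
contiguous buffer that can be uploaded once and read per instance in the
vertex shader.  Matrix4, translation and packInstanceMatrices are made-up
names for illustration, not OSG API, and the actual buffer upload and
instanced draw call are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrix, the layout OpenGL expects.
using Matrix4 = std::array<float, 16>;

// Translation-only matrix, just to have something to pack.
Matrix4 translation(float x, float y, float z)
{
    return Matrix4{{1.0f, 0.0f, 0.0f, 0.0f,
                    0.0f, 1.0f, 0.0f, 0.0f,
                    0.0f, 0.0f, 1.0f, 0.0f,
                    x,    y,    z,    1.0f}};
}

// Pack one matrix per instance into a single contiguous float buffer.
// The result would be uploaded once (for example as a per-instance
// vertex attribute buffer) and followed by a single instanced draw,
// instead of one uniform update and one draw per object.
std::vector<float> packInstanceMatrices(const std::vector<Matrix4>& matrices)
{
    std::vector<float> packed;
    packed.reserve(matrices.size() * 16);
    for (const Matrix4& m : matrices)
        packed.insert(packed.end(), m.begin(), m.end());
    return packed;
}
```

The number of GL calls then stays roughly constant as the number of objects
grows, which is what removes the per-MatrixTransform cost.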

Or... just try the VulkanSceneGraph.  If you just have a bunch of small
geometries that you are controlling with matrices set on the CPU then the
VSG will blow the OSG out of the water.  Both Vulkan and the VSG are very
well optimized for this type of load.  Vulkan has "Push Constants", which
are a very lightweight way to pass regularly changing values; they have
significantly lower overhead than using uniforms to do the same.  The VSG
uses push constants to send modelview matrices to the GPU.

The VSG helps by making culling an explicit task - you place CullGroups
above any subgraphs that you want to enable view frustum culling for rather
than being enabled for all nodes all the time unless explicitly disabled.
This dramatically cuts the number of conditionals during traversal as well
as focusing the culling task on just the nodes where it's known that it will
be important.  The VSG also allows you to say that a subgraph below a
MatrixTransform doesn't require any culling so the view frustum doesn't
need to be transformed into the local coordinate frame - this is another
important optimization that lowers the CPU overhead.
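
A toy illustration of the difference, counting cull tests during a
traversal.  This is not the VSG API: Node, Stats, makeSubgraph and traverse
are made up for the example, and the `visible` flag stands in for a real
bounding-volume test against the view frustum.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node
{
    bool cullTest = false; // true = acts like an explicitly placed cull node
    bool visible = true;   // stand-in for the result of a frustum test
    std::vector<std::unique_ptr<Node>> children;
};

struct Stats
{
    std::size_t cullTests = 0; // frustum tests performed
    std::size_t visited = 0;   // nodes that passed through to rendering
};

// Build a group with `leaves` children; optionally mark the group itself
// as the one place where culling is done for the whole subgraph.
std::unique_ptr<Node> makeSubgraph(std::size_t leaves, bool cullAtRoot)
{
    std::unique_ptr<Node> group(new Node);
    group->cullTest = cullAtRoot;
    for (std::size_t i = 0; i < leaves; ++i)
        group->children.push_back(std::unique_ptr<Node>(new Node));
    return group;
}

// testEveryNode = true models culling enabled on all nodes all the time;
// testEveryNode = false tests only nodes explicitly marked for culling.
void traverse(const Node& node, bool testEveryNode, Stats& stats)
{
    if (testEveryNode || node.cullTest)
    {
        ++stats.cullTests;
        if (!node.visible) return;
    }
    ++stats.visited;
    for (const auto& child : node.children)
        traverse(*child, testEveryNode, stats);
}
```

For a root with one marked subgraph of 100 leaves, testing every node does
102 tests while the explicit version does 1, and both visit the same nodes
when everything is in view.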

A final important part of the performance puzzle is that Vulkan has all the
command and data preparation done in the application user thread, and then
passed as a block (command buffer) with a single submission call.  You can
prepare multiple command buffers in parallel and submit them together.

I could go on... Vulkan and the VSG have lots of tricks that radically
change how much performance you can get out of the whole CPU/GPU system.

I guess I need to write a short VSG vs OSG example that illustrates this
type of task.  It's an example of a worst case scenario for OpenGL/OSG that
won't even make Vulkan/VSG break a sweat.

Cheers,
Robert.
