Hi Anders,

On Fri, 9 Apr 2021 at 07:02, Anders Backman <[email protected]> wrote:
> It looks like for each MatrixTransform you add to the scene, you get TWO
> calls to glUniformMatrix4fv, one for the osg_ModelViewMatrix and one for the
> osg_ModelViewProjectionMatrix.
> With shaders I have 1500 GL calls during one frame, compared to only 450
> with the fixed pipeline.
> That should account for some of the decrease in performance.
>

Interesting stats. I guess if we added more complexity to the OSG state tracking, one could figure out whether both osg_ModelViewMatrix and osg_ModelViewProjectionMatrix are required, but this would require quite a few changes to the core OSG to juggle this. It would also increase the OSG CPU overhead, so it's not a free addition even if it were a net gain for some applications.

> Other than that, profiling does not show anything specifically CPU-related
> when running the application.
> It is just that the draw call can be 2-4 times slower (perhaps due to the
> number of GL calls).
>

CPU bottlenecks are an ever-present problem with OpenGL, and they have got worse with the use of shaders. There's a reason why Vulkan was created and why I started to work on the VulkanSceneGraph.

> But the GPU time is also about 100% higher with the same complexity (and
> a really simple shader).
>

The GPU is only as fast as the pipe that feeds it, so if the CPU side is bogged down the GPU will stall and take more time. Changing state on the GPU also has a significant impact on performance. Batching geometry and state helps with both CPU and GPU overheads but can't fix them completely.

The biggest performance gain from using shaders is that you can start batching some scenes much more aggressively using techniques like instancing. In your scene this is probably the way to go.

Or... just try the VulkanSceneGraph. If you just have a bunch of small geometries that you are controlling with matrices set on the CPU, then the VSG will blow the OSG out of the water. Both Vulkan and the VSG are very well optimized for this type of load.
Vulkan has "push constants", a very lightweight way to pass regularly changing values; they have significantly lower overhead than using uniforms to do the same. The VSG uses push constants to send modelview matrices to the GPU.

The VSG also helps by making culling an explicit task: you place CullGroups above any subgraphs for which you want view frustum culling, rather than culling being enabled for all nodes all the time unless explicitly disabled. This dramatically cuts the number of conditionals during traversal and focuses the culling work on just the nodes where it's known to be important.

The VSG also allows you to say that a subgraph below a MatrixTransform doesn't require any culling, so the view frustum doesn't need to be transformed into the local coordinate frame. This is another important optimization that lowers the CPU overhead.

A final important part of the performance puzzle is that in Vulkan all the command and data preparation is done in the application's user thread, and the result is then passed as a block (a command buffer) with a single submission call. You can prepare multiple command buffers in parallel and submit them together.

I could go on... Vulkan and the VSG have lots of tricks that radically change how much performance you can get out of the whole CPU/GPU system. I guess I need to write a short VSG vs OSG example that illustrates this type of task; it's a worst-case scenario for OpenGL/OSG that won't even make Vulkan/VSG break a sweat.

Cheers,
Robert.

-- 
You received this message because you are subscribed to the Google Groups "OpenSceneGraph Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/osg-users/CAFN7Y%2BX8uw1Q4OV4gaARROBHv10c8H3sUmy3z%2BvG2nNe9Fxy0w%40mail.gmail.com.
