Hi All, I've now installed 18.04 on my new AMD 2700 + Geforce 2060 system, run a wider range of tests and learnt a few things along the way.
First up I tried out the open source graphics drivers that come with 18.04 and they do a really poor job of supporting the 2060: the screen resolution was pegged at 1024x768, and while the OSG compiled and ran just fine I only got 39fps on my small city test model. I couldn't work out how to get the Vulkan drivers working so didn't run any Vulkan tests.

Second I installed the NVidia drivers. Ubuntu/Kubuntu now requires a few more steps than it did a few years back; it seems they have a strong preference for the open source drivers, but as the performance and support for modern cards really sucks I don't feel this is a great move. Once I installed the NVidia drivers the frame rate for the small city scene and standard path jumped to 368fps at 1920x1080, way more than an order of magnitude better, and my dual monitors now work fine too.

While exploring the different display options in the GUI I came across the toggle for switching off the compositor. This used to sit alongside the desktop effects settings, but has now moved to the display settings. Switching off the compositor took the hand brake off and my new system started pushing frame rates higher than my older Intel+Geforce 1060 system. Curiously the old system had the compositor switched on and didn't see the same capping of frame rates with the VSG/Vulkan. I don't know why this is happening as they both now have 18.04 installed; perhaps it's the hardware, perhaps it's the later NVidia drivers - I'll look at upgrading the NVidia drivers on the old system next. Switching off the compositor on the Intel system helps max performance as well, but only by around 25% rather than the 200% I saw on the new system.

Now that I've switched off the compositor on the AMD 2700+Geforce 2060 system I'm seeing more predictable results between the two systems, and patterns are emerging:

                    Intel Core i7 4770S    AMD Ryzen7 2700
                    Geforce 1060           Geforce 2060
    OSG@1920x1080   484fps                 369fps (28% slower)
    VSG@1920x1080   2168fps                2697fps (23% faster)
    VSG@192x108     2712fps                2842fps (4% faster)

So here we finally see the Geforce 2060 stretch its legs and beat the 1060 thanks to its better fill rate. The OSG's slow performance on the AMD chip, though, more than wipes out that gain - overall it is significantly slower. For users that rely on OSG applications and are weighing up Intel vs AMD or investing in a new GPU, the choice of CPU is going to be the far more critical one. The results also show that with the different approaches I've used in the VSG for reducing node size and traversal complexity, along with Vulkan, there just isn't the same AMD penalty that we see with the OSG; instead we see the scaling we should expect when upgrading the graphics hardware.

The Intel vs AMD difference isn't just down to OpenGL vs Vulkan either. In developing the VSG I wrote two test programs, osggroups and vsggroups, that both create a quad tree graph (11 levels deep by default), traverse it 10 times and then destruct it (a sketch of this kind of test is included below). Here we can compare like for like on pure CPU scene graph operations:

                Intel Core i7 4770S    AMD Ryzen7 2700
    osggroups   3.77 secs              4.91 secs (30% slower)
    vsggroups   0.55 secs              0.55 secs (almost identical!)

The results of the osggroups CPU test mirror the speed difference in the osgviewer test with the small city model I've been using, so this indicates that the differences in performance aren't just down to OpenGL vs Vulkan. The vsggroups results being nearly identical doesn't quite tell the full story though.
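For anyone curious what those CPU tests roughly look like, here's a minimal sketch of a quad tree build/traverse/destruct timing test. This is illustrative code only, not the actual osggroups/vsggroups source - the node layout, smart pointers and timing are all simplified stand-ins:

    // Build a quad tree, traverse it a number of times, then destruct it,
    // timing each phase - a rough stand-in for what osggroups/vsggroups do.
    #include <chrono>
    #include <iostream>
    #include <memory>
    #include <vector>

    struct Node
    {
        virtual ~Node() = default;
        std::vector<std::shared_ptr<Node>> children;
    };

    // create a quad tree 'levels' deep, 4 children per internal node
    std::shared_ptr<Node> build(unsigned int levels)
    {
        auto node = std::make_shared<Node>();
        if (levels > 1)
        {
            node->children.reserve(4);
            for (int i = 0; i < 4; ++i) node->children.push_back(build(levels - 1));
        }
        return node;
    }

    // simple recursive traversal that just counts the nodes visited
    void traverse(const Node& node, size_t& count)
    {
        ++count;
        for (auto& child : node.children) traverse(*child, count);
    }

    int main()
    {
        using clock = std::chrono::steady_clock;
        auto secs = [](clock::time_point a, clock::time_point b)
            { return std::chrono::duration<double>(b - a).count(); };

        auto t0 = clock::now();
        auto root = build(11);                               // 11 levels deep by default

        auto t1 = clock::now();
        size_t count = 0;
        for (int i = 0; i < 10; ++i) traverse(*root, count); // traverse 10x

        auto t2 = clock::now();
        root.reset();                                        // destruct the whole graph

        auto t3 = clock::now();
        std::cout << "construct " << secs(t0, t1) << "s, traverse " << secs(t1, t2)
                  << "s (" << count << " node visits), destruct " << secs(t2, t3) << "s\n";
        return 0;
    }

The real test programs of course use the OSG's and VSG's own node classes, smart pointers and visitors, which is exactly where the per-node size and traversal overhead differences come from.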
I've run more VSG related tests comparing double dispatch visitor & traversal vs single dispatch visitor & traversal, and find that the Intel chip sees a bigger penalty with double dispatch than the AMD chip (a sketch of the two dispatch styles is included below). The AMD tests, though, show that the cost of destructing the scene graph is higher than on the Intel chip. These two effects tend to balance out in the vsggroups test, but that's more fluke than anything important. The key take away is that when you use the CPUs more efficiently, as the VSG does compared to the OSG, the two chips perform in a broadly similar way w.r.t. work done per cycle.

Are the efficiencies that I've achieved with the VSG possible with the OSG? Unfortunately not without breaking key features. osg::Nodes are significantly bigger than their vsg::Node counterparts as the OSG nodes hold more optional data. The OSG traversal also checks more settings - like the NodeMask, or the presence of the optional StateSet that can be stored on every Node. The osg::NodeVisitor has more options controlling its behaviour, so it adds more work to every traversal through the scene graph. All these extra checks and the extra memory usage cause more cache misses, more branch prediction failures and less parallelism. To illustrate the difference, when I run perf stat I see that the OSG's osggroups run achieves ~0.7 instructions per cycle while the VSG's vsggroups run achieves ~2.2 instructions per cycle. To close the gap we'd need to look at getting rid of the NodeMask on all Nodes and changing NodeVisitor to be less flexible, moving more responsibility onto subclasses to do the work. Such changes would break a lot of end user applications.
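To make the double vs single dispatch point a bit more concrete, here's a generic illustration - again my own sketch, not OSG or VSG source - of the two dispatch styles, including the kind of per-node mask/StateSet checks mentioned above:

    #include <cstdint>
    #include <memory>
    #include <vector>

    struct Visitor;

    // "heavy" style: double dispatch (accept -> apply) plus optional per-node
    // data that has to be checked on every visit.
    struct HeavyNode
    {
        virtual ~HeavyNode() = default;
        virtual void accept(Visitor& v);                    // first virtual call
        uint32_t nodeMask = 0xffffffff;                     // tested for every child
        std::shared_ptr<int> stateSet;                      // stand-in for an optional StateSet
        std::vector<std::shared_ptr<HeavyNode>> children;
    };

    struct Visitor
    {
        virtual ~Visitor() = default;
        uint32_t traversalMask = 0xffffffff;
        virtual void apply(HeavyNode& node)                 // second virtual call
        {                                                   // (a real scene graph has an
            for (auto& child : node.children)               //  apply() per node type)
            {
                if ((child->nodeMask & traversalMask) == 0) continue; // mask test
                if (child->stateSet) { /* push state */ }
                child->accept(*this);                       // double dispatch again
                if (child->stateSet) { /* pop state */ }
            }
        }
    };

    void HeavyNode::accept(Visitor& v) { v.apply(*this); }

    struct LeanVisitor;

    // "lean" style: one virtual call per node, no optional per-node data to test.
    struct LeanNode
    {
        virtual ~LeanNode() = default;
        virtual void traverse(LeanVisitor& v);
        std::vector<std::shared_ptr<LeanNode>> children;
    };

    struct LeanVisitor
    {
        void visit(LeanNode& node) { node.traverse(*this); }
    };

    void LeanNode::traverse(LeanVisitor& v)
    {
        for (auto& child : children) v.visit(*child);
    }

    int main()
    {
        auto heavyRoot = std::make_shared<HeavyNode>();
        heavyRoot->children.push_back(std::make_shared<HeavyNode>());
        Visitor visitor;
        heavyRoot->accept(visitor);

        auto leanRoot = std::make_shared<LeanNode>();
        leanRoot->children.push_back(std::make_shared<LeanNode>());
        LeanVisitor leanVisitor;
        leanVisitor.visit(*leanRoot);
        return 0;
    }

The extra virtual call, mask test and state checks per node in the first style are the kind of extra per-node work that shows up as the cache misses and branch prediction failures mentioned above.

Cheers,
Robert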