J-S,

You have not mentioned which LispSM flavour is used, but if it is DrawBounds, it does 2 extra cull & render passes. The shadow map pass is preceded by a depth buffer render pass used to compute the DrawBounds. I should also mention that this computation is made using ReadPixels and scanning the picture on the CPU. But the picture is small (64x64), and I once tested the performance penalty of ReadPixels on a GF 8800, and it did not seem to be a bottleneck. I turned off reading after the first successful ReadPixels and the framerate did not change. But I guess this situation may be different on other GPUs.
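
For illustration only, the CPU-side scan of the read-back picture might look roughly like the Python sketch below. It is not the osgShadow code: the function name, the flat row-major depth list, and the assumption that the scan yields the screen-space extent and depth range of the covered pixels are all hypothetical.

```python
def scan_depth_bounds(depth, width, height, far=1.0):
    """Scan a row-major depth image and return the bounds of covered pixels.

    Returns (min_x, min_y, max_x, max_y, min_depth, max_depth), or None when
    every pixel is still at the far value (nothing was rendered).
    """
    min_x, min_y, max_x, max_y = width, height, -1, -1
    min_d, max_d = far, 0.0
    for y in range(height):
        for x in range(width):
            d = depth[y * width + x]
            if d >= far:  # untouched pixel: still at the clear value
                continue
            min_x, max_x = min(min_x, x), max(max_x, x)
            min_y, max_y = min(min_y, y), max(max_y, y)
            min_d, max_d = min(min_d, d), max(max_d, d)
    if max_x < 0:
        return None
    return (min_x, min_y, max_x, max_y, min_d, max_d)


# Hypothetical 64x64 read-back with only two covered pixels.
W = H = 64
image = [1.0] * (W * H)
image[20 * W + 10] = 0.3
image[50 * W + 40] = 0.7
print(scan_depth_bounds(image, W, H))
```

A 64x64 picture is only 4096 pixels, so the loop itself is cheap; the open question in the paragraph above is the cost of the ReadPixels call, not of the scan.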

Our DBs are also made of small batches. My observation was that not only the size of the primitive sets but also the small state attribute changes between them were hitting hard. I once did an experiment. We had a DB that was suffering from the small-batch problem. I built texture atlases, then put all the scene textures into a single Texture2DArray (yes, it was huge). Then I removed the StateSets and created only one StateSet at the scene root, with shaders that use the Texture2DArray I built. I don't think I did anything to the primitive sets; they were the same as before, only the StateSets were gone. And the framerate went up 2 times.
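
A rough sketch of the bookkeeping behind that kind of experiment, in Python rather than OSG code. The texture file names, the data layout, and the idea of storing the layer as a third texture coordinate for the shader to read are assumptions made for illustration. The point is only that per-object texture state becomes per-vertex data, so one StateSet can serve the whole scene.

```python
def assign_layers(objects):
    """Map each distinct texture to an array layer and rewrite texcoords.

    objects: list of (texture_name, [(u, v), ...]) pairs (hypothetical layout).
    Returns (layers, rewritten): layers maps texture name -> layer index, and
    rewritten holds [(u, v, layer), ...] per object with no texture name left.
    """
    layers = {}
    rewritten = []
    for texture, texcoords in objects:
        # A texture seen for the first time gets the next free layer.
        layer = layers.setdefault(texture, len(layers))
        rewritten.append([(u, v, float(layer)) for u, v in texcoords])
    return layers, rewritten


scene = [
    ("brick.png", [(0.0, 0.0), (1.0, 0.0)]),
    ("grass.png", [(0.5, 0.5)]),
    ("brick.png", [(0.0, 1.0)]),
]
layers, coords = assign_layers(scene)
print(layers)
print(coords)
```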

Cheers,
Wojtek

-----Original Message----- From: Jean-Sébastien Guay
Sent: Friday, January 21, 2011 10:13 PM
To: OpenSceneGraph Users
Subject: [osg-users] Optimizing scene structure and geometry

Hi all,

I thought I had a pretty firm grasp on what to optimize given a certain
set of scene stats, but I've optimized what I can and I'm still getting
little improvement in results. So I'll explain my situation here and
hope you guys have some good suggestions. Sorry if this is a long
message, but I prefer to give all the relevant data now rather than get
asked later.

The whole scene is about a 200m x 200m square (apart from the ocean and
skydome but these are not significant, I have removed them and confirmed
that the situation is the same). The worst case viewpoint is a flying
view where the whole scene could be visible at once. So I need to
balance culling cost with draw cost, since in some views we will see
only part of the scene (so we should be able to cull away at least part
of what's not visible) and in the flying view everything is visible so
we shouldn't waste too much time doing cull tests which we know will not
cull anything.

The other thing is that there are a lot of dynamic objects, so there are
a lot of transforms. But I can't change this, it's part of our simulation.

So, after doing some optimization (removing redundant groups, building
texture atlases where possible, merging geodes and geometry, generating
triangle strips, most of which I did with the osgUtil::Optimizer), I get
the following stats, which I'll talk about a bit later:

Scene stats:
StateSets     1345
Groups         392
Transforms     672
Geodes         992
Geometry       992
Vertices    139859
Primitives   87444

Camera stats:
State graphs       1282
Drawables          2151
PrimitiveSets     73953
Triangles          3538
Tri. Strips      211091
Tri. Fans            16
Quads             11526
Quad Strips         534
Total primitives 226705

And, both in our simulator and in osgViewer, for the same scene and same
viewpoint, I get:

FPS: ~35
Cull: 5.4ms
Draw: 19ms
GPU: 19ms

This is on a pretty good machine: Core i7 920, GeForce GTX 260.

First of all, the stats above tell me that the "Primitives" part of the
scene stats refers to primitive sets, not just primitives... Since the
camera stats tell me there are over 226000 primitives in the current view.

As you can see, the number of primitiveSets is very high. If I
understand correctly, each PrimitiveSet will result in an OpenGL draw
call, and since my draw time is what's high now, I would want to reduce
that (since I'm currently at about 3 primitives per primitiveSet on
average). If I remove triangle strip generation from the optimizer
options, the stats become:

Scene stats:
StateSets     1345
Groups         392
Transforms     672
Geodes         992
Geometry       992
Vertices    190392
Primitives   51197

Camera stats:
State graphs       1254
Drawables          2117
PrimitiveSets      4899
Triangles         17122
Tri. Strips         191
Tri. Fans          7212
Quads            106464
Quad Strips         534
Total primitives 131523

This indicates to me that the tristrip visitor in the optimizer does a
pretty bad job. I looked at an .osg dump, and it seems to generate a
separate strip for each quad (so one strip for 4 vertices) which is
ridiculous... But that's a subject for another day.

When I disabled the tristripper, you can see a massive decrease in the
number of primitiveSets (and even in the number of primitives), however
there was no significant change in the frame rate and timings. I don't
understand this. I would have expected, with more primitives per
primitiveSet (I'm now at about 26 prims per primSet on average, as
opposed to around 3 before) and far fewer draw calls, that the draw time
would have been much lower. That's not what happens in practice.
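
The averages quoted above can be rechecked from the two sets of camera stats. This is a small Python sketch that assumes, as the message does, one draw call per PrimitiveSet:

```python
def prims_per_set(total_primitives, primitive_sets):
    """Average number of primitives issued per PrimitiveSet."""
    return total_primitives / primitive_sets


# Camera stats from the two runs (with and without the tristripper).
with_strips = prims_per_set(226705, 73953)
without_strips = prims_per_set(131523, 4899)

# How many times fewer PrimitiveSets the second run has.
set_reduction = 73953 / 4899

print(f"with tristrips:    {with_strips:.1f} prims per set")
print(f"without tristrips: {without_strips:.1f} prims per set")
print(f"PrimitiveSets reduced by a factor of {set_reduction:.1f}")
```

So the second run has roughly 15 times fewer PrimitiveSets, which is what makes the unchanged draw time surprising.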

My previous attempts at optimizing (using the osgUtil::Optimizer) were
also centered around lowering the number of primitives (by creating
texture atlases and sharing state so the merging of geodes and geometry
objects gave good results). And even though that also lowered the
numbers (I started at around 2215 Geodes and 2521 Geometry objects in
the same scene, compare that to 992 each now), it also had underwhelming
results in practice.

Clearly there is more than one primitiveSet per Geometry in the above
stats. What I see in the dumped .osg file is that there are often things like:

          PrimitiveSets 4
          {
            DrawArrays TRIANGLES 0 12
            DrawArrays QUADS 12 152
            DrawArrays TRIANGLES 164 12
            DrawArrays QUADS 176 152
          }

I would expect, by reordering the vertex/color/normal/texCoord data, I
would be able to get only 2 primitiveSets there, one TRIANGLES and one
QUADS. Am I wrong? Why does the osgUtil::Optimizer not do this already
when merging Geometry objects? I expect because it's easier not to do
it, but still, it gives sub-optimal results...
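
A minimal Python sketch of the reordering idea, not of anything the osgUtil::Optimizer does. It assumes non-indexed DrawArrays and per-vertex bindings, so the same reordering would have to be applied to the color, normal and texCoord arrays as well. The function name and the tuple layout are hypothetical.

```python
def merge_draw_arrays(vertices, primitive_sets, mergeable=("TRIANGLES", "QUADS")):
    """Reorder vertices so each mergeable mode needs one DrawArrays.

    primitive_sets: list of (mode, first, count) tuples.
    Returns (new_vertices, new_primitive_sets). Strips and fans are not in
    `mergeable`, so each of them keeps its own primitive set.
    """
    groups = []         # [mode, [(first, count), ...]] in first-seen order
    index_by_mode = {}  # mergeable mode -> position in groups
    for mode, first, count in primitive_sets:
        if mode in mergeable and mode in index_by_mode:
            groups[index_by_mode[mode]][1].append((first, count))
        else:
            if mode in mergeable:
                index_by_mode[mode] = len(groups)
            groups.append([mode, [(first, count)]])

    new_vertices, new_sets = [], []
    for mode, ranges in groups:
        start = len(new_vertices)
        for first, count in ranges:
            new_vertices.extend(vertices[first:first + count])
        new_sets.append((mode, start, len(new_vertices) - start))
    return new_vertices, new_sets


# The four primitive sets from the dump above; integers stand in for vertices.
verts = list(range(328))
sets = [("TRIANGLES", 0, 12), ("QUADS", 12, 152),
        ("TRIANGLES", 164, 12), ("QUADS", 176, 152)]
merged_verts, merged_sets = merge_draw_arrays(verts, sets)
print(merged_sets)
```

For this input the result is one TRIANGLES set of 24 vertices followed by one QUADS set of 304 vertices. Note that the draw order within the Geometry changes, which matters only if the result depends on that order (for example with blending).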

Of course I can't do that for strips or fans, unless I insert new
vertices to restart the strip. Again this is something that could be
done, but might bring diminishing returns in my case given that my own
scene contains many more triangles and quads than strips and fans (when
I turn off tristripping).

So, first of all, am I on the right track trying to reduce the number of
primitiveSets? Do you think on current hardware, disabling tristripping
is a good idea?

Why, when disabling tristripping which reduced the number of
primitiveSets from 73953 to 4899, didn't I see an increase in performance?

Is there some other way to find out what's going on and see what I
can improve to increase the performance? I've tried running our app in
gDEBugger, which tipped me off that I was batching poorly when using
triangle strips (about 3 prims per primitiveSet as I said above).
Turning off triangle strips improved the situation (as gDEBugger sees
it), but not by that much, which is probably consistent with what I'm
seeing in practice, but I'm no closer to finding out what to improve
next. What is not mergeable now is like that because of different
settings in StateSets (backface culling on vs off, can't use texture
atlas because the wrap mode is set to REPEAT, etc.), so I don't think
osgUtil::Optimizer can help me improve the situation further...

I have looked at video memory usage by the way, and I'm fine in that
respect, so I don't think I'm getting any thrashing or paging between
video RAM and main RAM at runtime. Also, I'm using display lists for
most of the objects in the scene; I tried using Vertex Buffer Objects
and that actually slowed things down.

I should also mention that these results are obtained using
osgShadow::LightSpacePerspectiveShadowMap. I can run the dumped .osg
file with

  osgshadow --lispsm --noUpdate --mapres 2048 <dumped_file>.osg

and I get the results above, which are pretty similar to our simulator.
If I run the same data file in plain osgViewer without shadows, it runs
at a solid 60Hz, with stats and timings:

Scene stats:
StateSets     1345
Groups         392
Transforms     672
Geodes         992
Geometry       992
Vertices    190392
Primitives   51197

Camera stats:
State graphs        321
Drawables           810
PrimitiveSets      1774
Triangles          7243
Tri. Strips          85
Tri. Fans          2508
Quads             39370
Quad Strips         178
Total primitives  49384

FPS: 60
Cull: 1.7ms
Draw: 8ms
GPU: 6.8ms

(that's the no tristrips version, so compare these stats to the second
set of stats from the top, not the first)

I would have expected most numbers there to be half what they were with
shadows enabled, but as you can see they're consistently less than half,
so shadows added more than a 100% overhead... Note that even if it added
exactly 100% overhead, I would still be at 16ms draw, which is too much,
but I'm just mentioning it in case it may prompt some other suggestions.
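
To put numbers on "more than 100% overhead", here is a small Python sketch that divides the shadowed figures by the unshadowed ones. One assumption: the shadowed timings (5.4 / 19 / 19 ms) were reported for the tristrip run, and the message says the no-tristrip run showed no significant change, so they are reused here alongside the no-tristrip counts.

```python
with_shadows = {
    "cull_ms": 5.4, "draw_ms": 19.0, "gpu_ms": 19.0,
    "state_graphs": 1254, "drawables": 2117, "primitive_sets": 4899,
}
without_shadows = {
    "cull_ms": 1.7, "draw_ms": 8.0, "gpu_ms": 6.8,
    "state_graphs": 321, "drawables": 810, "primitive_sets": 1774,
}


def overhead_ratios(shadowed, plain):
    """Ratio of each shadowed figure to its unshadowed counterpart."""
    return {key: shadowed[key] / plain[key] for key in shadowed}


ratios = overhead_ratios(with_shadows, without_shadows)
for key, value in ratios.items():
    print(f"{key:15s} x{value:.2f}")

# Draw time if shadows cost exactly 100% on top of the plain render.
doubled_draw_ms = 2 * without_shadows["draw_ms"]
print(f"draw at exactly 100% overhead: {doubled_draw_ms:.0f} ms")
```

Every ratio comes out between roughly 2.4 and 3.9, which is at least compatible with the reply at the top of this thread saying that the DrawBounds flavour adds 2 extra cull & render passes.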

I'm not sure I could send my whole scene to everyone on the list, but I
might be able to send it to someone if they want to see firsthand. Just
the bare .osg file without any textures and without ocean and skydome
shows the problem adequately well.

Thanks in advance for any suggestions you might have. I really need to
improve this, and I've been working for a while already with only a
small improvement to show for my time...

J-S
--
______________________________________________________
Jean-Sebastien Guay    jean-sebastien.g...@cm-labs.com
                               http://www.cm-labs.com/
                        http://whitestar02.webhop.org/
_______________________________________________
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org