Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-14 Thread Stuart Buchanan
On Mon, Dec 12, 2011 at 8:19 AM, Erik Hofman wrote:
> This reminds me that vegetation uses a texture strip with 8 different
> trees at a size of 256x64 pixels whereas clouds use one texture for
> every cloud (puff) using 256x256 pixels. Maybe that makes a difference?

It varies by cloud definition, but most of the global clouds use a single
texture containing 16 different cloud types, so we're pretty efficient.

> By the way, both use transparency.
In slightly different ways. The trees use two passes - one with
alpha-testing to draw the opaque parts, and another with alpha-blending
to handle the edges.

The clouds just use alpha-blending. I've tried using two passes with
the clouds, but the performance and visual impact was not good.

-Stuart

--
Cloud Computing - Latest Buzzword or a Glimpse of the Future?
This paper surveys cloud computing today: What are the benefits? 
Why are businesses embracing it? What are its payoffs and pitfalls?
http://www.accelacomm.com/jaw/sdnl/114/51425149/
___
Flightgear-devel mailing list
Flightgear-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/flightgear-devel


Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-13 Thread Mathias Fröhlich

Hi,

On Tuesday, December 13, 2011 15:31:43 Csaba Halász wrote:
> 2011/12/12 Mathias Fröhlich :
> > As an answer to the previous mail, point sprites may help here too. You
> > will get the billboard effect for free.
> > 
> > We have a queryable limit on the maximum supported point size which
> > nobody guarantees to be really high. But in reality point sprites can
> > get up to render buffer size for almost any GPU I know. The open source
> > radeon driver does glClear by drawing a screen sized point sprite...
> 
> When using the binary fglrx driver there are known problems with point
> sprites (not sure if anybody ever figured out the real cause) so if we
> switch to point sprites we should be careful to keep the current
> method as an alternative for the benefit of fglrx users.

Well, is that the runway lighting problem?

The reason is well known to me. We draw triangles in point mode with back face
culling to get the directional lights. Enabling point sprites in this mode is
something fglrx has not liked for some time. The open source driver copes
with this without any fallbacks. It has other problems though, so just
suggesting the oss drivers is not yet a real option.

I don't think this kind of problem also applies to a possible cloud
implementation. What you would do for clouds is much closer to the particles
in osg. And these also work with fglrx.

Mathias



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-13 Thread Csaba Halász
2011/12/12 Mathias Fröhlich :
>
>
> As an answer to the previous mail, point sprites may help here too. You will
> get the billboard effect for free.
>
> We have a queryable limit on the maximum supported point size which nobody
> guarantees to be really high. But in reality point sprites can get up to
> render buffer size for almost any GPU I know. The open source radeon driver
> does glClear by drawing a screen sized point sprite...

When using the binary fglrx driver there are known problems with point
sprites (not sure if anybody ever figured out the real cause) so if we
switch to point sprites we should be careful to keep the current
method as an alternative for the benefit of fglrx users.

-- 
Csaba/Jester



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-12 Thread Emilian Huminiuc
On Sunday 11 December 2011 22:04:02 Stuart Buchanan wrote:
> On Thu, Dec 8, 2011 at 10:47 AM, I wrote:
> > 2011/12/8 Mathias Fröhlich wrote:
> >> If I do not respond to list mails when you need some response, feel
> >> free to contact me directly. I just miss some mails every now and
> >> then ...
> > 
> > Thanks for the offer. Will do.
> 
> I've had a look, and I think I can change the code to create a single
> PrimitiveSet for each cloud fairly easily.
> 
> On thinking about this a bit more, one thing that I don't quite understand
> is why the behaviour for clouds should differ so much from our random
> vegetation.
> 
> The random vegetation code we have is very similar - a small number
> of geometries being used again and again. Yet, the performance is far,
> far better, even with much higher numbers of objects.
> 
> I had thought that the main difference was the use of transparency,
> where the clouds are larger and generally more transparent than
> the trees.
> 
> If so, and the alpha blending of the textures has the most impact on
> framerate, will changing the geometry help significantly? Or is it the
> case that the transparency _within_ a geometry is much more effectively
> handled by OSG than the transparency between different geometries?
> 
> -Stuart
> 
On a side note to this discussion: all cloud*.eff files force render bin 10.
Moving them back to render bin 9 (where they were before, according to this:
http://mapserver.flightgear.org/git/?p=fgdata;a=commitdiff;h=f94af651aecc63ea1989529f0114b28b4bcef48f
and also matching the setup in simgear/scene/util/RenderConstants.hxx) gives me
back a lot of performance (roughly from 15 fps to >30 fps with fair weather,
and a much smaller frame rate drop in cloud-heavy configs, on an 8600gt).
Maybe there's some performance to be gained back from that too?



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-12 Thread Erik Hofman
On Mon, 2011-12-12 at 10:56 +0100, Frederic Bouvier wrote:
> > From: "Erik Hofman"
> > 
> > On Mon, 2011-12-12 at 08:38 +0100, Mathias Fröhlich wrote:
> > 
> > > Also textures are just handed over to OpenGL.
> > 
> > This reminds me that vegetation uses a texture strip with 8 different
> > trees at a size of 256x64 pixels whereas clouds use one texture for
> > every cloud (puff) using 256x256 pixels. Maybe that makes a
> > difference?
> > 
> > By the way, both use transparency.
> 
> Do they use alpha blending or alpha testing?

That, I don't know. I'm not familiar with the shaders.

Erik




Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-12 Thread Frederic Bouvier
> From: "Erik Hofman"
> 
> On Mon, 2011-12-12 at 08:38 +0100, Mathias Fröhlich wrote:
> 
> > Also textures are just handed over to OpenGL.
> 
> This reminds me that vegetation uses a texture strip with 8 different
> trees at a size of 256x64 pixels whereas clouds use one texture for
> every cloud (puff) using 256x256 pixels. Maybe that makes a
> difference?
> 
> By the way, both use transparency.

Do they use alpha blending or alpha testing?

Regards,
-Fred



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-12 Thread Erik Hofman
On Mon, 2011-12-12 at 08:38 +0100, Mathias Fröhlich wrote:

> Also textures are just handed over to OpenGL.

This reminds me that vegetation uses a texture strip with 8 different
trees at a size of 256x64 pixels whereas clouds use one texture for
every cloud (puff) using 256x256 pixels. Maybe that makes a difference?

By the way, both use transparency.

Erik




Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-11 Thread Mathias Fröhlich

Hi Stuart,

On Sunday, December 11, 2011 23:04:02 you wrote:
> I've had a look, and I think I can change the code to create a single
> PrimitiveSet for each cloud fairly easily.
I think you can try.

As an answer to the previous mail, point sprites may help here too. You will
get the billboard effect for free.

We have a queryable limit on the maximum supported point size, which nobody
guarantees to be really high. But in reality point sprites can get up to
render buffer size on almost any GPU I know. The open source radeon driver
does glClear by drawing a screen sized point sprite...

So, I am not 100% sure that just switching to point sprites is a good idea,
but I think this could be reasonable. Maybe about 99% ...
Thoughts? ... anybody listening?

> On thinking about this a bit more, one thing that I don't quite understand
> is why the behaviour for clouds should differ so much from our random
> vegetation.
> 
> The random vegetation code we have is very similar - a small number
> of geometries being used again and again. Yet, the performance is far,
> far better, even with much higher numbers of objects.
Hmm, is there really a higher number of objects?
I would guess that the number of trees actually drawn on each frame is
lower?
Can you verify this? Maybe a simple counter temporarily hacked into the
vegetation and cloud code could provide harder numbers?

But yes, if this is the same, then we should find out. In the end this is also
driver dependent. But what I see here on my setup is, with very high
probability, just draw limited.

If I understand the clouds right, there is only one cloud drawable that
accounts for all the quad sprites in the scene. Then you draw a separate quad
for each sprite. True?
Since I assume that there is one drawable issuing several thousand single
quad draws, this will not show up in osg's depth sorting at all. All osg has
to sort is when this single drawable needs to be drawn with respect to itself.
Sorting a single element is relatively cheap :)

If this is the case, transparency on or off should not show up in the CPU time.
I agree that transparency costs a little more on the GPU. But still, today's
GPUs should really do that fast enough. Think of the particle systems and how
many particles you can draw before you see a measurable reaction from the GPU.
There are visualization techniques out there that draw geometry with several
10^x point sprites. So, the GPU is really designed to do that.

If we have many cloud drawables, putting them into the depth sorted render bin
will increase the cull times. But again, having more than one, but only a few,
will not show up significantly in sorting.

You can also try to play with osg's frame statistics. I guess you know that 
you can switch that on from the debug(?) menu.
I expect transparency to show up on the orange GPU bar. Being CPU and draw 
limited means the yellow bar is long. And the blue one grows when cull happens 
to be a problem.
... just a rule of thumb for our problem.

For comparison, here I see about the same length for the yellow and orange bars
with traditional clouds. Switching on 3d clouds leaves the orange bar mostly
untouched and raises the yellow bar by about the same factor that I see in the
frame rate reduction. This is on my notebook with a medium fast gpu.
So, I conclude that the GPU does not care at all about the clouds. It is the CPU
that has to do so much to make that geometry happen on the GPU.
How does this look on your machine?

> I had thought that the main difference was the use of transparency,
> where the clouds are larger and generally more transparent than
> the trees.
Hmm, see above. Do you see a long orange bar with the clouds? Much longer than
without?
I am sure the fill rate needs to be high with the clouds. My feeling is that
transparency on or off only makes this worse by, say, a factor of two?!
But I see a frame rate drop with 3d clouds by a factor of 10 or more.
You can experiment with switching blending on and off in the clouds. Since you
still draw them back to front you should still occupy the same fill rate on the
GPU. But the read-modify-write cycle needed for blending is then gone, in
favour of a cheaper path that just produces a color and writes it if the depth
test passes, which should happen almost every time in the clouds because of
drawing back to front.
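
The difference between the two paths can be sketched per pixel. This is only
an illustration, assuming the usual "source over" blend (source alpha, one
minus source alpha); the Color type and both function names are made up for
this example and are not taken from the simgear code.

```cpp
#include <cassert>

// Hypothetical RGB colour type, used only for this illustration.
struct Color { float r, g, b; };

// Blending enabled: the old framebuffer value has to be read, combined
// with the incoming fragment and written back (read-modify-write).
Color writeBlended(const Color& dst, const Color& src, float srcAlpha) {
    assert(srcAlpha >= 0.0f && srcAlpha <= 1.0f);
    const float k = 1.0f - srcAlpha;
    return Color{src.r * srcAlpha + dst.r * k,
                 src.g * srcAlpha + dst.g * k,
                 src.b * srcAlpha + dst.b * k};
}

// Blending disabled: the fragment colour simply replaces whatever was
// stored, so the previous framebuffer value never has to be read.
Color writeOpaque(const Color& /*dst*/, const Color& src) {
    return src;
}
```

Both variants touch the same number of pixels, which is why the fill rate
should stay the same while the cost per pixel changes.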

> If so, and the alpha blending of the textures has the most impact on
> framerate, will changing the geometry help significantly? Or is it the
> case that the transparency _within_ a geometry is much more effectively
> handled by OSG than the transparency between different geometries?
Well, there is nothing to handle for transparency within a geometry. The
geometries are atomic for osg's transparency. If you implement something non
atomic in the draw routine, like you do for the clouds, it's your cpu time. But
the only thing that osg does is to sort drawables that are in the depth sorted 
render bin so that they are dra

Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-11 Thread Stuart Buchanan
On Thu, Dec 8, 2011 at 10:47 AM, I wrote:
> 2011/12/8 Mathias Fröhlich wrote:
>> If I do not respond to list mails when you need some response, feel free to
>> contact me directly. I just miss some mails every now and then ...
>
> Thanks for the offer. Will do.

I've had a look, and I think I can change the code to create a single
PrimitiveSet for each cloud fairly easily.

On thinking about this a bit more, one thing that I don't quite understand is
why the behaviour for clouds should differ so much from our random
vegetation.

The random vegetation code we have is very similar - a small number
of geometries being used again and again. Yet, the performance is far,
far better, even with much higher numbers of objects.

I had thought that the main difference was the use of transparency,
where the clouds are larger and generally more transparent than
the trees.

If so, and the alpha blending of the textures has the most impact on
framerate, will changing the geometry help significantly? Or is it the
case that the transparency _within_ a geometry is much more effectively
handled by OSG than the transparency between different geometries?

-Stuart



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-08 Thread Stuart Buchanan
2011/12/8 Mathias Fröhlich wrote:
> The cloud code changes its internal data structures in the draw call?
> Or are these lists per context data structures?
> This is kind of problematic for two reasons:
> 1. draw happens in parallel in all configured multi viewer contexts. So I fear
> concurrent modifications on a shared data structure.
> 2. The eyepoint on different cameras can be different, which might give
> different results in sorting I guess?
> While the second point might not happen in current flightgear implementations,
> I still think that we should try to keep this in mind.

Good point, and one I'd not thought about before.

> And, more performance related.
> I see that single quad that is used to draw the clouds. Then we have this loop
> over all the sprites. So what the driver does is on every new quad there is a
> new geometry setup that happens. And this is one of the expensive things in a
> driver. Mostly this is the best optimized code path since this happens often,
> but still it is relatively much work on the CPU to teach the GPU that there is
> a new bunch of geometry to render. A rule of thumb that really depends on the
> kind of shaders that are used and how many fragments are rendered, but for our
> average models, including the clouds I think, it will take about the same time
> to render one quad or a single bunch of 1000 quads. It's just that the GPU is so
> fast and the geometry setup on the CPU dominates.
>
> So having a larger array of vertex attributes drawn with one draw command
> consisting of only one osg::PrimitiveSet could improve rendering speeds.
> Sure this needs some more infrastructure to change this at the right time in
> the frame. Also this competes with the depth sorting we have discussed
> before. So the question is what is the right compromise then? Having
> relatively huge bunches of geometry helps rendering and osg's sorting speed.
> But huge bunches might give problems with the sort order blending in the wrong
> order. Which is BTW the kind of artefact that somebody pointed out on a recent
> X-Plane screenshot. So we are currently better, but still, we need to get back
> to 'real time' ...

So it sounds like I should create a single geometry per cloud, rather
than per-sprite?

If I'm doing that I might as well scale the geometry when I create it,
rather than relying on the shader to do so for me.

The major problem I can see with this is performing the billboard
rotation. To do that in the vertex shader, I'll need to pass in the
center of each sprite. However, I wonder if I can make this more
efficient by doing it outside of the shader, and
a) performing the same rotation on each sprite within the cloud
b) only performing rotation when required, rather than per-frame.
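
A rough sketch of what a) and b) could look like on the CPU: the billboard
axes are computed once per cloud from the cloud center and the eye point, and
every sprite quad is then built from those shared axes. All names here (Vec3,
BillboardBasis, makeBasis, spriteCorners) are hypothetical and not part of the
existing cloud code, and the sketch assumes the view direction is never
parallel to the chosen world up vector.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalized(Vec3 v) {
    const double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0);
    return v * (1.0 / len);
}

// Billboard axes shared by every sprite of one cloud.
struct BillboardBasis { Vec3 right, up; };

// Computed once per cloud, and only again when the view direction has
// changed enough to matter.
BillboardBasis makeBasis(Vec3 cloudCenter, Vec3 eye, Vec3 worldUp) {
    const Vec3 toEye = normalized(eye - cloudCenter);
    const Vec3 right = normalized(cross(worldUp, toEye));
    const Vec3 up = cross(toEye, right);
    return {right, up};
}

// Four corners of one sprite quad, counter-clockwise as seen from the eye.
std::array<Vec3, 4> spriteCorners(Vec3 center, double width, double height,
                                  const BillboardBasis& b) {
    const Vec3 r = b.right * (0.5 * width);
    const Vec3 u = b.up * (0.5 * height);
    return {{center - r - u, center + r - u, center + r + u, center - r + u}};
}
```

Because the basis is the same for all sprites of a cloud, the per-sprite work
is reduced to a few additions, and the sprite centers no longer have to be
passed to the vertex shader.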

Presumably I could retain the manual sorting we do at present, and simply
sort the geometry itself. Or is that better left to OSG?

In fact, this may remove most of the function from the shader itself!

> How many ShaderGeometry objects do we have? One per cloud?
> For me this also raises the question of reproducible clouds. If we have
> multiple independent viewers in the future, we need to draw the same clouds on
> each with a bare minimum of communication. So, what is needed to generate
> exactly the same cloud? Maybe an initial seed for the random number generator, a
> position and a size?

I need to check, but I'm fairly sure we use an initial seed already.

> We may need to identify such a set of parameters and maybe
> we should have a pseudo loader for osg producing this kind of clouds from
> these parameters. Then you would be able to load and use these clouds from
> fgviewer and see isolated statistics about the draw/cull whatever steps. This
> might also help in understanding what is going on.

We already have export/import methods, though I've never used them in anger!
Certainly using fgviewer would be an excellent idea.

> If I do not respond to list mails when you need some response, feel free to
> contact me directly. I just miss some mails every now and then ...

Thanks for the offer. Will do.

-Stuart



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-07 Thread Mathias Fröhlich

Hi,

On Wednesday, December 07, 2011 23:05:06 Stuart Buchanan wrote:
> 2011/12/6 Mathias Fröhlich wrote:
> > As usual, I did not take the time to really look into the code. So please
> > excuse more or less obvious questions about the current implementation.
> 
> The relevant code is newcloud.[c|h]xx and CloudShaderGeometry.[c|h]xx
> in simgear/scene/sky/
> 
> It's entirely possible that there are some inefficiencies there. The only
> people who have done anything with it are Tim and myself, and I'm
> certainly not a good graphics programmer.

Still just a quick look from my side:

The cloud code changes its internal data structures in the draw call?
Or are these lists per context data structures?
This is kind of problematic for two reasons:
1. draw happens in parallel in all configured multi viewer contexts. So I fear
concurrent modifications on a shared data structure.
2. The eyepoint on different cameras can be different, which might give
different results in sorting I guess?
While the second point might not happen in current flightgear implementations,
I still think that we should try to keep this in mind.

And, more performance related.
I see that single quad that is used to draw the clouds. Then we have this loop
over all the sprites. So what the driver does is a new geometry setup on every
new quad. And this is one of the expensive things in a
driver. Mostly this is the best optimized code path since this happens often,
but still it is relatively much work on the CPU to teach the GPU that there is
a new bunch of geometry to render. As a rule of thumb that really depends on the
kind of shaders that are used and how many fragments are rendered, but for our
average models, including the clouds I think, it will take about the same time
to render one quad or a single bunch of 1000 quads. It's just that the GPU is so
fast and the geometry setup on the CPU dominates.

So having a larger array of vertex attributes drawn with one draw command
consisting of only one osg::PrimitiveSet could improve rendering speeds.
Sure this needs some more infrastructure to change this at the right time in
the frame. Also this competes with the depth sorting we have discussed
before. So the question is what is the right compromise then? Having
relatively huge bunches of geometry helps rendering and osg's sorting speed.
But huge bunches might give problems with the sort order, blending in the wrong
order. Which is BTW the kind of artefact that somebody pointed out on a recent
X-Plane screenshot. So we are currently better, but still, we need to get back
to 'real time' ...
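
The batching idea can be sketched without any OSG dependency. This is a
minimal illustration, not the simgear implementation: Sprite, Vertex and
buildCloudVertices are hypothetical names, and the quads are left unrotated
in the x/z plane to keep the example short.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sprite description; the real cloud code stores more per sprite.
struct Sprite { float x, y, z, width, height; };

// One interleaved vertex: position plus texture coordinate.
struct Vertex { float x, y, z, u, v; };

// Append the four corners of every sprite to a single array, so that the
// whole cloud can be submitted with one draw command instead of one per quad.
std::vector<Vertex> buildCloudVertices(const std::vector<Sprite>& sprites) {
    std::vector<Vertex> vertices;
    vertices.reserve(sprites.size() * 4);
    for (const Sprite& s : sprites) {
        assert(s.width > 0.0f && s.height > 0.0f);
        const float hw = 0.5f * s.width;
        const float hh = 0.5f * s.height;
        vertices.push_back({s.x - hw, s.y, s.z - hh, 0.0f, 0.0f});
        vertices.push_back({s.x + hw, s.y, s.z - hh, 1.0f, 0.0f});
        vertices.push_back({s.x + hw, s.y, s.z + hh, 1.0f, 1.0f});
        vertices.push_back({s.x - hw, s.y, s.z + hh, 0.0f, 1.0f});
    }
    return vertices;
}
```

With OSG, an array like this would typically be attached to an osg::Geometry
and drawn through a single quad primitive set; that wiring is left out here.
The price is that the order of the sprites inside the array is fixed until the
array is rebuilt, which is exactly the sorting compromise discussed above.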

How many ShaderGeometry objects do we have? One per cloud?
For me this also raises the question of reproducible clouds. If we have
multiple independent viewers in the future, we need to draw the same clouds on
each with a bare minimum of communication. So, what is needed to generate
exactly the same cloud? Maybe an initial seed for the random number generator, a
position and a size? We may need to identify such a set of parameters and
maybe we should have a pseudo loader for osg producing this kind of clouds from
these parameters. Then you would be able to load and use these clouds from
fgviewer and see isolated statistics about the draw/cull/whatever steps. This
might also help in understanding what is going on.
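
As a sketch of what such a parameter set might look like: the cloud below is
generated purely from a seed, a position, a size and a puff count, so two
viewers holding the same parameters get the same puffs. CloudParams, Puff and
generateCloud are hypothetical names for illustration only; the real cloud
generation takes more inputs than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Puff { double x, y, z; };

// Hypothetical parameter set that would have to be shared between viewers.
struct CloudParams {
    std::uint32_t seed;
    double cx, cy, cz;  // cloud centre
    double size;        // edge length of the bounding cube
    int puffCount;
};

// Map raw generator output to [0, 1). std::mt19937 is specified to produce
// the same sequence everywhere, whereas the standard distribution classes
// may differ between library implementations, so they are avoided here.
double unitRandom(std::mt19937& rng) {
    return static_cast<double>(rng()) / 4294967296.0;
}

// Generate the puff positions of one cloud purely from its parameters.
std::vector<Puff> generateCloud(const CloudParams& p) {
    assert(p.puffCount >= 0 && p.size >= 0.0);
    std::mt19937 rng(p.seed);
    std::vector<Puff> puffs;
    puffs.reserve(static_cast<std::size_t>(p.puffCount));
    for (int i = 0; i < p.puffCount; ++i) {
        const double dx = unitRandom(rng) - 0.5;
        const double dy = unitRandom(rng) - 0.5;
        const double dz = unitRandom(rng) - 0.5;
        puffs.push_back({p.cx + dx * p.size, p.cy + dy * p.size, p.cz + dz * p.size});
    }
    return puffs;
}
```

Only the small CloudParams record would need to be communicated; everything
else is recomputed locally on each viewer.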

Also, integrating this all with atmospheric scattering would be nice. I guess
we really need to move the clouds into a post processing step doing all the
atmospheric computations. But that also raises the idea of having the
scattering at a completely different place, which should improve
maintainability and execution speed ...

> We use the Effects code to pick up the Texture (see newcloud.cxx line 108).
> I guess it's possible that is not doing the right thing.
OK, I don't know offhand. But at least we have no state change in between the
distinct quads.

> We sort all the sprites within a cloud to minimize the state changes, and
> then use heuristics to minimize the amount of re-sorting
> (see CloudShaderGeometry.cxx line 97)
> 
> Is that what you mean?
Well, kind of yes. Sorting happens due to depth but there is no state change 
at all. Which is good.
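
For reference, a plain back to front sort of sprites by distance from the eye
can be sketched as follows. This is a generic illustration with hypothetical
names (SpritePos, sortBackToFront); it does not reproduce the re-sorting
heuristics in CloudShaderGeometry.cxx.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct SpritePos { double x, y, z; };

// Squared distance is enough for ordering and avoids the square root.
double distanceSquared(const SpritePos& s, double ex, double ey, double ez) {
    const double dx = s.x - ex, dy = s.y - ey, dz = s.z - ez;
    return dx * dx + dy * dy + dz * dz;
}

// Order sprites so that the one farthest from the eye is drawn first.
void sortBackToFront(std::vector<SpritePos>& sprites,
                     double ex, double ey, double ez) {
    auto farther = [=](const SpritePos& a, const SpritePos& b) {
        return distanceSquared(a, ex, ey, ez) > distanceSquared(b, ex, ey, ez);
    };
    std::sort(sprites.begin(), sprites.end(), farther);
    assert(std::is_sorted(sprites.begin(), sprites.end(), farther));
}
```

Since all sprites share the same state, the sort only changes the order of
the quads and never forces a state change between them.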

> We already use array textures so we have multiple different textures in a
> single image file. Is that what you mean?
Yes. Good!

> Thanks very much for the explanations - very useful as always. If you have
> the time to take a look at the code and see if there are any really
> obvious mistakes, I would be very grateful.
Thanks for your work in doing this all!
If I do not respond to list mails when you need some response, feel free to
contact me directly. I just miss some mails every now and then ...

Greetings

Mathias


Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-07 Thread Stuart Buchanan
2011/12/6 Mathias Fröhlich wrote:
> As usual, I did not take the time to really look into the code. So please
> excuse more or less obvious questions about the current implementation.

The relevant code is newcloud.[c|h]xx and CloudShaderGeometry.[c|h]xx
in simgear/scene/sky/

It's entirely possible that there are some inefficiencies there. The only people
who have done anything with it are Tim and myself, and I'm certainly not a
good graphics programmer.

> I assume that you have a few textures that are used very often?
> Or do these textures really have different content?

For the global clouds we typically use a single texture for all the clouds
in the layer, and there are usually between 1 and 3 layers.

> If they are the same content wise, you can help first osg and then also the
> GL driver if they are also the same on the osg level. That means the
> osg::Texture* state attribute must be the same for all uses of the same
> texture picture.
>
> Maybe you already know this, but the description pretty much sounds like an
> effect originating from this.

We use the Effects code to pick up the Texture (see newcloud.cxx line 108). I
guess it's possible that is not doing the right thing.


> Really what matters much more are state changes and geometry setup.
> I still assume that you have very small bunches of geometry that the clouds
> are made of. This really hurts both osg and the driver.
> Sure to get transparency right they must be distinct. But I guess the problem
> to be solved is how to get maximum sized bunches of geometry with as little
> state changes as possible and still have the transparency right ...
>
> For osg this means that there must be relatively huge atomic
> Geometry/PrimitiveSets that are probably pre sorted to order the draw in the
> depth. Probably an octree like structure holding the cloud subscene helps
> here.

We sort all the sprites within a cloud to minimize the state changes, and then
use heuristics to minimize the amount of re-sorting
(see CloudShaderGeometry.cxx line 97).

Is that what you mean?

> There is also a technique called 'pre integration' available for clouds.

I'll look into this.

> All together I still think that the number of draws and state changes must be
> cut down.
> In terms of state changes, an exact depth sorted draw order may cause a
> huge number of texture state changes. Maybe it is possible to collapse the
> different textures into one and pick out the appropriate subrange of the
> texture? Maybe an array texture, maybe a 3d texture with different layers in
> each discrete z dimension?

We already use array textures so we have multiple different textures in a single
image file. Is that what you mean?

Thanks very much for the explanations - very useful as always. If you have the
time to take a look at the code and see if there are any really obvious
mistakes, I would be very grateful.

-Stuart



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-05 Thread Mathias Fröhlich

Hi,


On Monday, December 05, 2011 12:07:31 Stuart Buchanan wrote:
> I had come to the conclusion that the only way to get a significant increase
> in performance would be to move to using Impostors. That's a big change,
> particularly as the OSG implementation appears to be broken/bit-rotted.
> I've been strenuously avoiding having to think about implementing it
> myself, but I may have to just bite the bullet. Either way, it isn't going
> to happen for 2.6.0!

There is also a technique called 'pre integration' available for clouds. So
you are really computing the rendering equation's integrals, but the
space filling volume elements have their contributions to the integral value
already precomputed in a texture map on the volume element's surface.
The problem to solve is still how to draw this fast in the right back to front
order.

And I agree that this is also something like the impostors, in the sense that
there are precomputed areas having a fixed picture for some time.

All together I still think that the number of draws and state changes must be
cut down.
In terms of state changes, an exact depth sorted draw order may cause a
huge number of texture state changes. Maybe it is possible to collapse the
different textures into one and pick out the appropriate subrange of the
texture? Maybe an array texture, maybe a 3d texture with different layers in
each discrete z dimension?
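
Picking a subrange out of one combined texture comes down to a small texture
coordinate calculation. The sketch below assumes a regular grid of equally
sized tiles numbered row by row from the lower left corner; that layout, and
the names UvRect and atlasTile, are assumptions made for this example only.

```cpp
#include <cassert>

// Texture-coordinate rectangle of one tile inside a combined texture.
struct UvRect { float u0, v0, u1, v1; };

// cols and rows describe the grid layout of the combined texture,
// for example 4 x 4 when 16 cloud pictures share one image.
UvRect atlasTile(int index, int cols, int rows) {
    assert(cols > 0 && rows > 0);
    assert(index >= 0 && index < cols * rows);
    const int col = index % cols;
    const int row = index / cols;
    const float du = 1.0f / static_cast<float>(cols);
    const float dv = 1.0f / static_cast<float>(rows);
    return {col * du, row * dv, (col + 1) * du, (row + 1) * dv};
}
```

With this, every sprite can reference the same texture object and differs
only in its texture coordinates, so a depth sorted draw order no longer
implies any texture state change.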

Greetings

Mathias



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-05 Thread Mathias Fröhlich

Hi,

As usual, I did not take the time to really look into the code. So please 
excuse more or less obvious questions about the current implementation.

On Monday, December 05, 2011 09:26:09 thorsten.i.r...@jyu.fi wrote:
> Since according to the newsletter Stuart's current ongoing quest is to get
> better performance for 3d clouds, here are some of my observation:
> 
> * I've noticed that when I use the relatively lowres Altocumulus texture
> sheet (3x3 on one sheet) I can basically use a ridiculous number of
> sprites without performance deterioration, whereas when I use the hires
> Cumulus sheets (1x2 plus 1x3) the number of sprites I can show before
> performance takes a nosedive goes down substantially.
> 
> The high resolution is however only needed for the small amount of clouds
> which are relatively close, but what makes a real difference is the amount
> of distant clouds, because there are so much more. So my guess is that
> using lowres textures for distant clouds would do just fine and improve
> performance. I've been wondering if dds sheets with the mipmaps would not
> automatically address that problem.
>
> The other option to test would be to scale down the resolution of the Cu
> cloud textures and see if the result is still acceptable (I know it isn't
> perfect, there was a reason I went to high resolution in the first place,
> but maybe the flaws can be hidden by the right mixture with other texture
> types).

I assume that you have a few textures that are used very often?
Or do these textures really have different content?

If they have the same content, you can help first osg and then also the
GL driver by making sure they are also the same at the osg level. That means
the osg::Texture* state attribute must be the same object for all uses of the
same texture image.

Maybe you already know this, but the description sounds very much like an
effect originating from this.
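
A minimal sketch of what "the same state attribute for all uses" can look
like: a cache that hands out one shared object per file name. The Texture
struct below is only a stand-in so that the example compiles without OSG, and
TextureCache is a hypothetical name. In real code the shared object would be
the osg texture itself, held in a reference-counted pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for a texture state attribute; not the real OSG class.
struct Texture {
    explicit Texture(const std::string& file) : fileName(file) {}
    std::string fileName;
};

// Hypothetical cache: every request for the same file name returns the
// same Texture object, so all users share one state attribute.
class TextureCache {
public:
    std::shared_ptr<Texture> get(const std::string& fileName)
    {
        assert(!fileName.empty());
        auto it = _textures.find(fileName);
        if (it != _textures.end())
            return it->second;          // reuse the existing object
        auto tex = std::make_shared<Texture>(fileName);
        _textures.emplace(fileName, tex);
        return tex;
    }

    std::size_t size() const { return _textures.size(); }

private:
    std::map<std::string, std::shared_ptr<Texture>> _textures;
};
```

If two clouds ask for the same sheet, they get pointers that compare equal,
which is what lets the scene graph recognise that no texture change is needed
between them.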


> * There seems still to be stuff computed in the shaders per vertex that is
> actually an uniform per frame - eyepos for instance. I wonder if the
> computations could be speeded up significantly  by consequently pulling
> all things that are really uniforms out of the shaders.

It is always better to do as little as possible in an inner loop, but in
this case:

It really does not matter at all for a GPU. The geometry setup, which wires
together the shaders from the different stages and connects uniforms, is about
the same amount of CPU work in the driver whether you connect an external
uniform or a varying from the vertex stage. It only gets complicated
for the driver once you hit the maximum number of varyings for the hardware.
Maybe there are some other corner cases too ...
The work that the vertex shader does is really minimal. You would probably
only be able to measure this effect with *huge* numbers of vertices. But huge
means on the order of 1e6-1e8 vertices, as you can get in CAD tools.

What really matters much more are state changes and geometry setup.
I still assume that the clouds are made of very small bunches of geometry.
This really hurts both osg and the driver.
Sure, to get transparency right they must be distinct. But I guess the problem
to be solved is how to get maximally sized bunches of geometry with as few
state changes as possible and still have the transparency right ...

For osg this means that there must be relatively large atomic
Geometry/PrimitiveSets that are probably pre-sorted so that they draw in
depth order. An octree-like structure holding the cloud subscene would
probably help here.

Maybe you can also look into osg's fast geometry path. If you use index arrays
in osg, the draw stage will fall back to glVertex3f calls, which are slow in
any driver. Make sure that the geometry is drawn by glDrawElements and the like.
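
As a rough sketch of the "one large, pre-sorted batch" idea: sort the sprites
far-to-near once on the CPU and emit a single element array for all quads.
The element array here is the kind that would feed a glDrawElements-style
call (in osg, a DrawElements primitive set), not the per-attribute index
arrays mentioned above. Vec3, backToFront and buildIndices are names made up
for this example, and four vertices per sprite is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance between two points; enough for ordering.
float distance2(const Vec3& a, const Vec3& b)
{
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns sprite numbers ordered far-to-near as seen from `eye`.
std::vector<unsigned> backToFront(const std::vector<Vec3>& centres,
                                  const Vec3& eye)
{
    std::vector<unsigned> order(centres.size());
    std::iota(order.begin(), order.end(), 0u);
    std::stable_sort(order.begin(), order.end(),
        [&](unsigned a, unsigned b) {
            return distance2(centres[a], eye) > distance2(centres[b], eye);
        });
    return order;
}

// Builds one element array for all quads (4 vertices per sprite, two
// triangles each), so the whole batch can go out in one indexed draw.
std::vector<unsigned> buildIndices(const std::vector<unsigned>& order)
{
    std::vector<unsigned> indices;
    indices.reserve(order.size() * 6);
    for (unsigned sprite : order) {
        const unsigned base = sprite * 4;
        const unsigned quad[6] = {base, base + 1, base + 2,
                                  base, base + 2, base + 3};
        indices.insert(indices.end(), quad, quad + 6);
    }
    return indices;
}
```

Only the element array changes when the order changes; the vertex data can
stay where it is, and the whole cloud field still needs just one state setup.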

Greetings

Mathias



Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!

2011-12-05 Thread Stuart Buchanan
On Mon, Dec 5, 2011 at 8:26 AM, Thorsten Renk wrote:
> Since according to the newsletter Stuart's current ongoing quest is to get
> better performance for 3d clouds, here are some of my observation:

Thanks very much for the observations. Lots of food for thought :)

As an FYI, the investigations I've been doing haven't borne much fruit.
In fact, I've been thinking that my "quest" is a bit like that of the Holy
Grail - something you never actually attain :)

I had come to the conclusion that the only way to get a significant increase
in performance would be to move to using Impostors. That's a big change,
particularly as the OSG implementation appears to be broken/bit-rotted. I've
been strenuously avoiding having to think about implementing it myself,
but I may have to just bite the bullet. Either way, it isn't going to happen
for 2.6.0!

> * I've noticed that when I use the relatively lowres Altocumulus texture
> sheet (3x3 on one sheet) I can basically use a ridiculous number of
> sprites without performance deterioration, whereas when I use the hires
> Cumulus sheets (1x2 plus 1x3) the number of sprites I can show before
> performance takes a nosedive goes down substantially.

That's very interesting information indeed. I will do some like-for-like
experiments.

One contributing factor may be differences in the amount of transparency
in the different textures.

> * There seems still to be stuff computed in the shaders per vertex that is
> actually an uniform per frame - eyepos for instance. I wonder if the
> computations could be speeded up significantly  by consequently pulling
> all things that are really uniforms out of the shaders.

I'll take a look. Vector and matrix calculations should be very efficient on
the GPU, and we're "only" performing these per-vertex rather than
per-fragment, so there may not be much benefit.

> * We're likewise fond of computing stuff per frame that changes more like
> per minute. The orientation of faraway clouds doesn't have to be computed
> per frame, because it can't change much per frame. If there'd be a way to
> store the value used last time, then (based on a distance criterion), one
> could assign clouds into n task groups and recompute a task group only
> every nth frame and use the last stored value otherwise. Back when I
> rotated clouds from Nasal, this did work and improved performance by a
> factor 5 or 6 - not sure how much it could do with a Shader setup, not
> sure how to do it technically, but my guess is that it would speed things
> up.

We already use this technique for sorting the sprites within the cloud, by
using a heuristic that if the sprites were already sorted the previous time
we checked, they probably still are.
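
One way such a heuristic can be sketched (this is an illustration under
assumptions, not the code in the FlightGear tree): check the order only
every few frames, double the interval each time the sprites turn out to be
still sorted, and fall back to checking every frame after a real sort was
needed. SpriteSorter, the far-to-near float distances and the cap of 16
frames are all invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-cloud sorter working on far-to-near sprite distances.
class SpriteSorter {
public:
    // Call once per frame. Returns true if the order was checked this
    // frame, false if the check was skipped.
    bool update(std::vector<float>& distances)
    {
        if (_framesToSkip > 0) {
            --_framesToSkip;
            return false;
        }
        if (std::is_sorted(distances.begin(), distances.end(),
                           std::greater<float>())) {
            // Still in order: assume it stays that way, check less often.
            _interval = std::min(_interval * 2, _maxInterval);
        } else {
            std::sort(distances.begin(), distances.end(),
                      std::greater<float>());
            _interval = 1;  // order is changing, check again next frame
        }
        _framesToSkip = _interval - 1;
        assert(_framesToSkip >= 0);
        return true;
    }

private:
    int _interval = 1;       // frames between checks
    int _framesToSkip = 0;   // frames left until the next check
    int _maxInterval = 16;   // assumed upper bound on the interval
};
```

The cost of a wrong guess is a few frames with slightly mis-ordered sprites,
which is usually hard to see for clouds that are far from the viewer.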

We could do something similar for calculating the eyepoint outside of the
shader, but as pointed out above, I'm not sure this is the main perf
limitation.

-Stuart
