Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
On Mon, Dec 12, 2011 at 8:19 AM, Erik Hofman wrote:
> This reminds me that vegetation uses a texture strip with 8 different trees at a size of 256x64 pixels whereas clouds use one texture for every cloud (puff) using 256x256 pixels. Maybe that makes a difference?

It varies by cloud definition, but most of the global clouds use a single texture containing 16 different cloud types, so we're pretty efficient.

> By the way, both use transparency.

In slightly different ways. The trees use two passes: one with alpha-testing to draw the opaque parts, and another with alpha-blending to handle the edges. The clouds just use alpha-blending. I've tried using two passes with the clouds, but neither the performance nor the visual result was good.

-Stuart

___
Flightgear-devel mailing list
Flightgear-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/flightgear-devel
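As a rough illustration of the two-pass approach described for the trees, the sketch below splits texels by alpha: those at or above a cut-off would go to the alpha-tested pass, and the remaining partly transparent ones to the blended pass. It is written in Python purely for illustration; `split_passes` and the 0.5 threshold are assumptions, not values taken from the FlightGear or SimGear code.

```python
def split_passes(alphas, threshold=0.5):
    """Split texel alpha values into an alpha-tested pass and a blended pass.

    threshold is a hypothetical cut-off; the value used for the trees is not
    stated in this thread.
    """
    # Pass 1: texels that pass the alpha test are drawn as opaque.
    opaque_pass = [a for a in alphas if a >= threshold]
    # Pass 2: the remaining, partly transparent texels are alpha-blended
    # to soften the edges. Fully transparent texels contribute nothing.
    blended_pass = [a for a in alphas if 0.0 < a < threshold]
    return opaque_pass, blended_pass


opaque, blended = split_passes([0.0, 0.2, 0.5, 0.9, 1.0])
```

For a cloud puff, where almost every texel is partly transparent, most of the work would land in the blended pass, which may be one reason a second pass brings little benefit there.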
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
2011/12/12 Mathias Fröhlich mathias.froehl...@gmx.net:
> As an answer to the previous mail, point sprites may help here too. You will get the billboard effect for free. We have a queryable limit on the maximum supported point size, which nobody guarantees to be really high. But in reality point sprites can get up to render buffer size on almost any GPU I know. The open source radeon driver does glClear by drawing a screen-sized point sprite...

With the binary fglrx driver there are known problems with point sprites (not sure if anybody ever figured out the real cause), so if we switch to point sprites we should be careful to keep the current method as an alternative for the benefit of fglrx users.

--
Csaba/Jester
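To make the size-limit and fallback concerns concrete, here is a small sketch that estimates how large a billboard appears on screen under a perspective projection and falls back to quads when that exceeds the supported point size, or when the driver is known to misbehave. The function names, the `driver_ok` flag and the sample numbers are hypothetical; the real limit would have to be queried from the driver at run time.

```python
import math


def sprite_size_pixels(world_size, distance, fov_y_deg, viewport_height):
    """Approximate on-screen height, in pixels, of a billboard facing the eye."""
    # Pixels covered by one world unit at the given distance.
    pixels_per_unit = viewport_height / (
        2.0 * distance * math.tan(math.radians(fov_y_deg) / 2.0)
    )
    return world_size * pixels_per_unit


def choose_primitive(world_size, distance, fov_y_deg, viewport_height,
                     max_point_size, driver_ok=True):
    """Pick point sprites when they fit the limit, otherwise keep quads."""
    size = sprite_size_pixels(world_size, distance, fov_y_deg, viewport_height)
    if driver_ok and size <= max_point_size:
        return "point_sprite"
    return "quad"
```

With these assumptions a 100 m puff seen from 1000 m fits a 64 pixel limit, while the same puff from 10 m does not, so a quad path would still be needed for nearby clouds on hardware with a small limit.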
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
Hi,

On Tuesday, December 13, 2011 15:31:43 Csaba Halász wrote:
> 2011/12/12 Mathias Fröhlich mathias.froehl...@gmx.net:
> > As an answer to the previous mail, point sprites may help here too. You will get the billboard effect for free. We have a queryable limit on the maximum supported point size, which nobody guarantees to be really high. But in reality point sprites can get up to render buffer size on almost any GPU I know. The open source radeon driver does glClear by drawing a screen-sized point sprite...
>
> With the binary fglrx driver there are known problems with point sprites (not sure if anybody ever figured out the real cause), so if we switch to point sprites we should be careful to keep the current method as an alternative for the benefit of fglrx users.

Well, are those the runway lighting problems? The reason is well known to me. We draw triangles in point mode, using back face culling to get the directional lights. Enabling point sprites in this mode is something fglrx has not liked for some time. The open source driver copes with this without any fallbacks. That one has other problems, so just suggesting the OSS drivers is not yet a real option.

I don't think this kind of problem also applies to a possible cloud implementation. What you would do for clouds is much closer to the particles in osg, and these also work with fglrx.

Mathias
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
On Mon, 2011-12-12 at 08:38 +0100, Mathias Fröhlich wrote:
> Also textures are just handed over to OpenGL.

This reminds me that vegetation uses a texture strip with 8 different trees at a size of 256x64 pixels whereas clouds use one texture for every cloud (puff) using 256x256 pixels. Maybe that makes a difference?

By the way, both use transparency.

Erik
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
From: Erik Hofman
> On Mon, 2011-12-12 at 08:38 +0100, Mathias Fröhlich wrote:
> > Also textures are just handed over to OpenGL.
>
> This reminds me that vegetation uses a texture strip with 8 different trees at a size of 256x64 pixels whereas clouds use one texture for every cloud (puff) using 256x256 pixels. Maybe that makes a difference?
>
> By the way, both use transparency.

Do they use alpha blending or alpha testing?

Regards,
-Fred
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
On Mon, 2011-12-12 at 10:56 +0100, Frederic Bouvier wrote:
> From: Erik Hofman
> > On Mon, 2011-12-12 at 08:38 +0100, Mathias Fröhlich wrote:
> > > Also textures are just handed over to OpenGL.
> >
> > This reminds me that vegetation uses a texture strip with 8 different trees at a size of 256x64 pixels whereas clouds use one texture for every cloud (puff) using 256x256 pixels. Maybe that makes a difference?
> >
> > By the way, both use transparency.
>
> Do they use alpha blending or alpha testing?

That, I don't know. I'm not familiar with the shaders.

Erik
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
On Thu, Dec 8, 2011 at 10:47 AM, I wrote:
> 2011/12/8 Mathias Fröhlich wrote:
> > If I do not respond to list mails when you need some response, feel free to contact me directly. I just miss some mails every now and then ...
>
> Thanks for the offer. Will do.
>
> I've had a look, and I think I can change the code to create a single PrimitiveSet for each cloud fairly easily.

On thinking about this a bit more, one thing that I don't quite understand is why the behaviour for clouds should differ so much from our random vegetation. The random vegetation code we have is very similar: a small number of geometries being used again and again. Yet the performance is far, far better, even with much higher numbers of objects.

I had thought that the main difference was the use of transparency, where the clouds are larger and generally more transparent than the trees.

If so, and the alpha blending of the textures has the most impact on framerate, will changing the geometry help significantly? Or is it the case that the transparency _within_ a geometry is much more effectively handled by OSG than the transparency between different geometries?

-Stuart
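The reason draw order matters for blended geometry at all can be shown with a few lines of arithmetic. The sketch below assumes the common "source over" blend (source colour times alpha, plus destination times one minus alpha); the colours and the helper names are made up for illustration and are not taken from the cloud shaders.

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend one translucent colour over what is already in the framebuffer."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha)
                 for s, d in zip(src_rgb, dst_rgb))


def composite(layers, background):
    """Apply layers in draw order; each layer is (rgb, alpha)."""
    colour = background
    for rgb, alpha in layers:
        colour = blend_over(rgb, alpha, colour)
    return colour


sky = (0.0, 0.0, 1.0)                  # hypothetical background colour
white_puff = ((1.0, 1.0, 1.0), 0.5)    # hypothetical sprite colours
grey_puff = ((0.2, 0.2, 0.2), 0.5)

grey_then_white = composite([grey_puff, white_puff], sky)
white_then_grey = composite([white_puff, grey_puff], sky)
```

The two results differ, which is why blended sprites have to be drawn back to front, whether the ordering is done by the scene graph between drawables or by the drawable's own code between its sprites.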
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
Hi Stuart,

On Sunday, December 11, 2011 23:04:02 you wrote:
> I've had a look, and I think I can change the code to create a single PrimitiveSet for each cloud fairly easily.

I think you can try.

As an answer to the previous mail, point sprites may help here too. You will get the billboard effect for free. We have a queryable limit on the maximum supported point size, which nobody guarantees to be really high. But in reality point sprites can get up to render buffer size on almost any GPU I know. The open source radeon driver does glClear by drawing a screen-sized point sprite...

So, I am not 100% sure that just switching to point sprites is a good idea, but I think this could be reasonable. Maybe by about 99% ... Thoughts? ... anybody listening?

> On thinking about this a bit more, one thing that I don't quite understand is why the behaviour for clouds should differ so much from our random vegetation. The random vegetation code we have is very similar: a small number of geometries being used again and again. Yet the performance is far, far better, even with much higher numbers of objects.

Hmm, is there really a higher number of objects? I would guess that the number of trees actually drawn on each frame is lower. Can you verify this? Maybe a simple counter temporarily hacked into the vegetation and cloud code could provide harder numbers? But yes, if this is the same, then we should find out. In the end this is also driver dependent. But what I see here on my setup is, with very high probability, just draw limited.

If I understand the clouds right, there is only one cloud drawable that accounts for all the quad sprites in the scene. Then you draw a separate quad for each sprite. True? Since I assume that there is one drawable issuing several thousand single quad draws, this will not show up in osg's depth sorting at all. All osg has to sort is when this single drawable needs to be drawn with respect to itself.
Sorting a single element is relatively cheap :) If this is the case, transparency on or off should not show up in the CPU time.

I agree that transparency costs a little more on the GPU. But still, today's GPUs should really do that fast enough. Think of the particle systems and how many particles you can draw before you see a measurable reaction from the GPU. There are visualization techniques out there that draw geometry with several 10^x point sprites. So, the GPU is really designed to do that.

If we have many cloud drawables, putting them into the depth sorted render bin will increase the cull times. But again, multiple ones but only a few will not show up significantly in sorting.

You can also try to play with osg's frame statistics. I guess you know that you can switch that on from the debug(?) menu. I expect transparency to show up in the orange GPU bar. Being CPU and draw limited means the yellow bar is long. And the blue one grows when cull happens to be a problem. ... just a rule of thumb for our problem.

For comparison, here I see about the same length for the yellow and orange bars with traditional clouds. Switching on 3d clouds leaves the orange bar mostly untouched and raises the yellow bar by about the factor I see in the frame rate reduction. This is on my notebook with a medium fast GPU. So, I conclude that the GPU does not care at all about the clouds. It is the CPU that needs to do so much to make that geometry happen on the GPU. How does this look on your machine?

> I had thought that the main difference was the use of transparency, where the clouds are larger and generally more transparent than the trees.

Hmm, see above. Do you see a long orange bar with the clouds? Much longer than without? I am sure the fill rate needs to be high with the clouds. My feeling is that transparency on or off only makes this worse by, say, a factor of two?! But I see a frame rate drop with 3d clouds by a factor of 10 or more.
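The rule of thumb for reading the frame statistics can be written down as a tiny helper. This is only a sketch of the reasoning above: the function name and the millisecond figures are invented, and the bar colours are the ones named in this mail.

```python
def likely_bottleneck(cull_ms, draw_ms, gpu_ms):
    """Name the stage with the longest per-frame time (the longest bar)."""
    timings = {
        "cull (blue)": cull_ms,      # grows when depth sorting and culling hurt
        "draw (yellow)": draw_ms,    # CPU time spent issuing draw calls
        "gpu (orange)": gpu_ms,      # fill rate and blending show up here
    }
    return max(timings, key=timings.get)


# Hypothetical readings resembling the notebook described above:
# the draw bar grows with 3d clouds while the GPU bar barely moves.
without_clouds = likely_bottleneck(cull_ms=1.0, draw_ms=3.0, gpu_ms=3.5)
with_clouds = likely_bottleneck(cull_ms=1.5, draw_ms=30.0, gpu_ms=4.0)
```

If the longest bar switches from orange to yellow when 3d clouds are enabled, the cost is on the CPU side of the draw, and cheaper blending alone is unlikely to recover the frame rate.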
You can experiment with switching blending on and off in the clouds. Since you still draw them back to front you should still occupy the same fill rate on the GPU. But the read-modify-write cycle needed for blending is then gone, in favour of a cheaper "just produce a color and write it if the depth test passes", which should pass almost every time in the clouds because of drawing back to front.

> If so, and the alpha blending of the textures has the most impact on framerate, will changing the geometry help significantly? Or is it the case that the transparency _within_ a geometry is much more effectively handled by OSG than the transparency between different geometries?

Well, there is nothing to handle for transparency within a geometry. The geometries are atomic for osg's transparency. If you implement something non-atomic in the draw routine, like you do for the clouds, it's your CPU time. But the only thing that osg does is to sort drawables that are in the depth sorted render bin so that they are drawn back to
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
2011/12/6 Mathias Fröhlich wrote:
> As usual, I did not take the time to really look into the code. So please excuse more or less obvious questions about the current implementation.

The relevant code is newcloud.[c|h]xx and CloudShaderGeometry.[c|h]xx in simgear/scene/sky/

It's entirely possible that there are some inefficiencies there. The only people who have done anything with it are Tim and myself, and I'm certainly not a good graphics programmer.

> I assume that you have a few textures that are used very often? Or do these textures really have different content?

For the global clouds we typically use a single texture for all the clouds in the layer, and there are usually between 1 and 3 layers.

> If they are the same content-wise, you can help first osg and then also the GL driver if these are also the same at the osg level. That means the osg::Texture* state attribute must be the same for all uses of the same texture picture. Maybe you already know this, but the description pretty much sounds like an effect originating from this.

We use the Effects code to pick up the Texture (see newcloud.cxx line 108). I guess it's possible that it is not doing the right thing.

snip

> Really what matters much more are state changes and geometry setup. I still assume that you have very small bunches of geometry that the clouds are made of. This really hurts both osg and the driver. Sure, to get transparency right they must be distinct. But I guess the problem to be solved is how to get maximum sized bunches of geometry with as few state changes as possible and still have the transparency right ... For osg this means that there must be relatively huge atomic Geometry/PrimitiveSets that are probably pre-sorted to order the draw in depth. Probably an octree-like structure holding the cloud subscene helps here.

We sort all the sprites within a cloud to minimize the state changes, and then use heuristics to minimize the amount of re-sorting (see CloudShaderGeometry.cxx line 97). Is that what you mean?

> There is also a technique called 'pre integration' available for clouds.

I'll look into this.

> Together I still think that the number of draws and state changes must be cut down. In terms of state changes, an exact depth sorted draw order may cause a huge number of texture state changes. Maybe it is possible to collapse the different textures into one and pick out the appropriate subrange of the texture? Maybe an array texture, maybe a 3d texture with different layers in each discrete z dimension?

We already use array textures so we have multiple different textures in a single image file. Is that what you mean?

Thanks very much for the explanations - very useful as always. If you have the time to take a look at the code and see if there are any really obvious mistakes, I would be very grateful.

-Stuart
[Flightgear-devel] Trying to get more performance out of the 3D clouds!
Since according to the newsletter Stuart's current ongoing quest is to get better performance for 3d clouds, here are some of my observations:

* I've noticed that when I use the relatively lowres Altocumulus texture sheet (3x3 on one sheet) I can basically use a ridiculous number of sprites without performance deterioration, whereas when I use the hires Cumulus sheets (1x2 plus 1x3) the number of sprites I can show before performance takes a nosedive goes down substantially. The high resolution is however only needed for the small number of clouds which are relatively close, but what makes a real difference is the number of distant clouds, because there are so many more. So my guess is that using lowres textures for distant clouds would do just fine and improve performance. I've been wondering if dds sheets with mipmaps would not automatically address that problem. The other option to test would be to scale down the resolution of the Cu cloud textures and see if the result is still acceptable (I know it isn't perfect, there was a reason I went to high resolution in the first place, but maybe the flaws can be hidden by the right mixture with other texture types).

* There still seems to be stuff computed in the shaders per vertex that is actually a uniform per frame - eyepos for instance. I wonder if the computations could be sped up significantly by consistently pulling all things that are really uniforms out of the shaders.

* We're likewise fond of computing stuff per frame that changes more like per minute. The orientation of faraway clouds doesn't have to be computed per frame, because it can't change much per frame. If there were a way to store the value used last time, then (based on a distance criterion) one could assign clouds to n task groups and recompute a task group only every nth frame and use the last stored value otherwise. Back when I rotated clouds from Nasal, this did work and improved performance by a factor of 5 or 6 - not sure how much it could do with a shader setup, not sure how to do it technically, but my guess is that it would speed things up.

Maybe some of this helps!

Cheers,

* Thorsten
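The task-group idea from the last point can be sketched as follows. This is a simplified illustration, not the earlier Nasal code: clouds are assigned to groups round-robin instead of by a distance criterion, and `CloudOrientationCache`, `compute` and the counts used are all hypothetical.

```python
class CloudOrientationCache:
    """Recompute one task group per frame and reuse stored values otherwise."""

    def __init__(self, cloud_ids, n_groups):
        self.n_groups = n_groups
        # Simplification: round-robin assignment instead of a distance criterion.
        self.group_of = {cid: i % n_groups for i, cid in enumerate(cloud_ids)}
        self.stored = {}       # last computed value per cloud
        self.recomputed = 0    # how many recomputations were actually done

    def update(self, frame, compute):
        """Refresh the group that is due on this frame; return all values."""
        due = frame % self.n_groups
        for cid, group in self.group_of.items():
            # A cloud with no stored value yet is always computed once.
            if group == due or cid not in self.stored:
                self.stored[cid] = compute(cid, frame)
                self.recomputed += 1
        return self.stored


cache = CloudOrientationCache(cloud_ids=range(8), n_groups=4)
for frame in range(4):
    # The stand-in compute function just records the frame it ran on.
    values = cache.update(frame, compute=lambda cid, f: f)
```

After the first frame, in which every cloud is computed once, each later frame touches only a quarter of the clouds in this example, at the price of values that can be up to n frames old.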
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
On Mon, Dec 5, 2011 at 8:26 AM, Thorsten Renk wrote:
> Since according to the newsletter Stuart's current ongoing quest is to get better performance for 3d clouds, here are some of my observations:

Thanks very much for the observations. Lots of food for thought :)

As an FYI, the investigations I've been doing haven't borne much fruit. In fact, I've been thinking that my quest is a bit like that of the Holy Grail - something you never actually attain :)

I had come to the conclusion that the only way to get a significant increase in performance would be to move to using Impostors. That's a big change, particularly as the OSG implementation appears to be broken/bit-rotted. I've been strenuously avoiding having to think about implementing it myself, but I may have to just bite the bullet. Either way, it isn't going to happen for 2.6.0!

> * I've noticed that when I use the relatively lowres Altocumulus texture sheet (3x3 on one sheet) I can basically use a ridiculous number of sprites without performance deterioration, whereas when I use the hires Cumulus sheets (1x2 plus 1x3) the number of sprites I can show before performance takes a nosedive goes down substantially.

That's very interesting information indeed. I will do some like-for-like experiments. One contributing factor may be differences in the amount of transparency in the different textures.

> * There still seems to be stuff computed in the shaders per vertex that is actually a uniform per frame - eyepos for instance. I wonder if the computations could be sped up significantly by consistently pulling all things that are really uniforms out of the shaders.

I'll take a look. Vector and matrix calculations should be very efficient on the GPU, and we're only performing these per-vertex rather than per-fragment, so there may not be much benefit.

> * We're likewise fond of computing stuff per frame that changes more like per minute. The orientation of faraway clouds doesn't have to be computed per frame, because it can't change much per frame. If there were a way to store the value used last time, then (based on a distance criterion) one could assign clouds to n task groups and recompute a task group only every nth frame and use the last stored value otherwise. Back when I rotated clouds from Nasal, this did work and improved performance by a factor of 5 or 6 - not sure how much it could do with a shader setup, not sure how to do it technically, but my guess is that it would speed things up.

We already use this technique for sorting the sprites within the cloud, by using a heuristic that if the sprites were already sorted the previous time we checked, they probably still are. We could do something similar for calculating the eyepoint outside of the shader, but as pointed out above, I'm not sure this is the main perf limitation.

-Stuart
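A heuristic of the kind described for the sprite sorting might look like the sketch below. It is not the code in CloudShaderGeometry.cxx: the back-off scheme (doubling the number of skipped frames while the order stays valid), the class name and `max_skip` are assumptions made for illustration.

```python
class SpriteSorter:
    """Re-sort sprites back to front, checking less often while they stay sorted."""

    def __init__(self, max_skip=32):      # max_skip is a hypothetical cap
        self.skip = 1                     # frames to wait after a check
        self.frames_until_check = 0
        self.max_skip = max_skip

    def maybe_sort(self, sprites, eye):
        """Sort the list of (x, y, z) sprites in place if a check is due.

        Returns True when the order was checked on this frame.
        """
        if self.frames_until_check > 0:
            self.frames_until_check -= 1
            return False

        def key(p):
            # Negative squared distance: farthest sprite sorts first.
            return -sum((a - b) ** 2 for a, b in zip(p, eye))

        already_sorted = all(key(sprites[i]) <= key(sprites[i + 1])
                             for i in range(len(sprites) - 1))
        if already_sorted:
            # Probably still sorted next time too, so wait longer.
            self.skip = min(self.skip * 2, self.max_skip)
        else:
            sprites.sort(key=key)
            self.skip = 1
        self.frames_until_check = self.skip
        return True


sorter = SpriteSorter()
sprites = [(0, 0, 1), (0, 0, 5), (0, 0, 3)]
checked = [sorter.maybe_sort(sprites, eye=(0, 0, 0)) for _ in range(6)]
```

The linear "is it still sorted" check is much cheaper than a full sort, so a scheme like this trades a possibly stale order for a few frames against less CPU work per frame.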
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
Hi,

As usual, I did not take the time to really look into the code. So please excuse more or less obvious questions about the current implementation.

On Monday, December 05, 2011 09:26:09 thorsten.i.r...@jyu.fi wrote:
> Since according to the newsletter Stuart's current ongoing quest is to get better performance for 3d clouds, here are some of my observations:
>
> * I've noticed that when I use the relatively lowres Altocumulus texture sheet (3x3 on one sheet) I can basically use a ridiculous number of sprites without performance deterioration, whereas when I use the hires Cumulus sheets (1x2 plus 1x3) the number of sprites I can show before performance takes a nosedive goes down substantially. The high resolution is however only needed for the small number of clouds which are relatively close, but what makes a real difference is the number of distant clouds, because there are so many more. So my guess is that using lowres textures for distant clouds would do just fine and improve performance. I've been wondering if dds sheets with mipmaps would not automatically address that problem. The other option to test would be to scale down the resolution of the Cu cloud textures and see if the result is still acceptable (I know it isn't perfect, there was a reason I went to high resolution in the first place, but maybe the flaws can be hidden by the right mixture with other texture types).

I assume that you have a few textures that are used very often? Or do these textures really have different content? If they are the same content-wise, you can help first osg and then also the GL driver if these are also the same at the osg level. That means the osg::Texture* state attribute must be the same for all uses of the same texture picture. Maybe you already know this, but the description pretty much sounds like an effect originating from this.

> * There still seems to be stuff computed in the shaders per vertex that is actually a uniform per frame - eyepos for instance. I wonder if the computations could be sped up significantly by consistently pulling all things that are really uniforms out of the shaders.

While it is always better to do as little as possible in an inner loop, in this case it really does not matter at all for a GPU. The geometry setup, which wires together the shaders from the different stages and connects uniforms, is about the same amount of CPU work in the driver whether you connect an external uniform or a varying from the vertex stage. It gets complicated for the driver once you hit the maximum number of varyings for the hardware. Maybe some other corner cases also ...

The work that the vertex shader does is really minimal. You would probably be able to measure this effect when you have *huge* numbers of vertices. But huge means on the order of 1e6-1e8 vertices, which is what you can get in CAD tools.

Really what matters much more are state changes and geometry setup. I still assume that you have very small bunches of geometry that the clouds are made of. This really hurts both osg and the driver. Sure, to get transparency right they must be distinct. But I guess the problem to be solved is how to get maximum sized bunches of geometry with as few state changes as possible and still have the transparency right ... For osg this means that there must be relatively huge atomic Geometry/PrimitiveSets that are probably pre-sorted to order the draw in depth. Probably an octree-like structure holding the cloud subscene helps here.

Maybe you can also look into osg's fast geometry. If you use index arrays in osg, the draw stage will resort to glVertex3f calls, which is slow in any driver. Make sure that the geometry is drawn by glDrawElements and the like.

Greetings

Mathias
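The point about sharing one texture object can be illustrated with a toy model. The sketch below assumes that a renderer recognises "the same texture" by object identity and re-binds whenever the object changes; `TextureCache`, the file name and the bind counting are illustrative stand-ins, not OSG or SimGear code.

```python
class TextureCache:
    """Hand out one shared object per image path instead of one per use."""

    def __init__(self):
        self._by_path = {}

    def get(self, path):
        if path not in self._by_path:
            self._by_path[path] = object()   # stand-in for a texture object
        return self._by_path[path]


def count_texture_binds(draw_list):
    """Count how often the bound texture object changes along a draw list."""
    binds = 0
    current = None
    for texture in draw_list:
        if texture is not current:           # compared by identity, not content
            binds += 1
            current = texture
    return binds


cache = TextureCache()
shared = [cache.get("cloud_sheet.png") for _ in range(100)]   # hypothetical file
separate = [object() for _ in range(100)]                     # same picture, 100 objects

shared_binds = count_texture_binds(shared)
separate_binds = count_texture_binds(separate)
```

In this model the shared case needs a single bind for a hundred draws, while a hundred separate objects holding identical pixels cost a hundred binds, which is the kind of effect the paragraph on osg::Texture* warns about.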
Re: [Flightgear-devel] Trying to get more performance out of the 3D clouds!
Hi,

On Monday, December 05, 2011 12:07:31 Stuart Buchanan wrote:
> I had come to the conclusion that the only way to get a significant increase in performance would be to move to using Impostors. That's a big change, particularly as the OSG implementation appears to be broken/bit-rotted. I've been strenuously avoiding having to think about implementing it myself, but I may have to just bite the bullet. Either way, it isn't going to happen for 2.6.0!

There is also a technique called 'pre integration' available for clouds. So you're actually really computing the rendering equation's integrals, but the space-filling volume elements have their contributions to the integral value already precomputed on a texture map on the volume element's surface. The problem to solve is still how to draw this fast in the right back to front order. And I agree that this is also something like the impostors, in the sense that there are precomputed areas having a fixed picture for some time.

Together I still think that the number of draws and state changes must be cut down. In terms of state changes, an exact depth sorted draw order may cause a huge number of texture state changes. Maybe it is possible to collapse the different textures into one and pick out the appropriate subrange of the texture? Maybe an array texture, maybe a 3d texture with different layers in each discrete z dimension?

Greetings

Mathias
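Picking a subrange out of one collapsed texture comes down to a little coordinate arithmetic. The sketch below assumes the tiles sit in a regular grid, numbered row by row from the bottom-left corner; that layout and the function name are assumptions for illustration, not a description of the actual cloud sheets.

```python
def atlas_uv_rect(index, columns, rows):
    """Return (u0, v0, u1, v1) of tile `index` in a columns x rows texture atlas."""
    if not 0 <= index < columns * rows:
        raise IndexError("tile index outside the atlas")
    col = index % columns
    row = index // columns
    tile_w = 1.0 / columns
    tile_h = 1.0 / rows
    return (col * tile_w, row * tile_h, (col + 1) * tile_w, (row + 1) * tile_h)


# Sixteen tiles in a hypothetical 4x4 sheet, and eight tiles in a 1-row strip.
first_tile = atlas_uv_rect(0, 4, 4)
last_tile = atlas_uv_rect(15, 4, 4)
strip_tile = atlas_uv_rect(7, 8, 1)
```

With every sprite sampling its own rectangle of the same texture, an exact depth sorted draw order no longer forces a texture change between neighbouring sprites.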