Re: [osg-users] Question about views, contexts and threading
Robert, I tried your suggestion, but it didn't have any effect. It's probably a driver issue then (nvidia 180.06 beta). I should receive a dual GTX260 system any day now; I'll try and see if that works better.

--
Regards,

Ferdi Smit
INS3 Visualization and 3D Interfaces
CWI Amsterdam, The Netherlands
Re: [osg-users] Question about views, contexts and threading
I tried the osgviewer changes; not much difference on 4 systems, except that the texture on the cow now appears in the copies (the Optimizer previously bleached the cow). My traces for system curly look the same. On the other systems I just looked at behavior. -Don

Hi All,

On Thu, Nov 20, 2008 at 4:01 PM, Robert Osfield <[EMAIL PROTECTED]> wrote: I think the best lead would be that perhaps the texture object/display list buffer_value containers aren't being resized to fit the new number of contexts when the app is running single threaded. In theory addView() should be stopping all threads and then issuing Node::resizeGLObjectBuffers() on the scene graph to handle this situation, but perhaps this isn't happening.

I've looked into the CompositeViewer::addView()/View::setSceneData()/Viewer::setSceneData() methods and only Viewer::setSceneData() has a call to resize the GL objects. The actual code looks like:

    void Viewer::setSceneData(osg::Node* node)
    {
        setReferenceTime(0.0);

        View::setSceneData(node);

        if (_threadingModel!=SingleThreaded && getSceneData())
        {
            // make sure that existing scene graph objects are allocated with thread safe ref/unref
            getSceneData()->setThreadSafeRefUnref(true);

            // update the scene graph so that it has enough GL object buffer memory for the graphics contexts that will be using it.
            getSceneData()->resizeGLObjectBuffers(osg::DisplaySettings::instance()->getMaxNumberOfGraphicsContexts());
        }
    }

My guess is that we need to move the resize/setThreadSafeRefUnref() calls up into the View::setSceneData() method. Viewer::setSceneData() is on the viewer, so it has access to ViewerBase members that View doesn't have. Another issue is whether we are setting the View up prior to any call to stopThreading(), as the resize isn't thread safe.

As we don't know whether this is the cause of the problem yet, I've modified J-S's osgviewer.cpp to do the resize. Could users who've seen problems try this version out? If it works then we have a workaround that end users can apply to existing apps, and we can figure out a solution to fix it permanently in svn/trunk.

Robert.
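For existing apps, the same workaround can be applied from user code right after assigning the scene data, before the viewer is realized. A minimal sketch, assuming the standard osgViewer API (the model file and window setup are placeholders; the explicit resize and ref/unref calls are the only addition):

    #include <osg/DisplaySettings>
    #include <osgDB/ReadFile>
    #include <osgViewer/CompositeViewer>
    #include <osgViewer/View>

    int main()
    {
        // Application-side workaround sketch: after assigning scene data to a
        // view, resize the scene graph's per-context GL object buffers ourselves
        // so texture objects/display lists have a slot for every context.
        osg::ref_ptr<osg::Node> scene = osgDB::readNodeFile("cow.osg");

        osgViewer::CompositeViewer viewer;
        osg::ref_ptr<osgViewer::View> view = new osgViewer::View;
        view->setSceneData(scene.get());
        viewer.addView(view.get());

        // Make existing objects safe for ref/unref from multiple threads, then
        // resize the GL object buffers for the maximum number of contexts.
        scene->setThreadSafeRefUnref(true);
        scene->resizeGLObjectBuffers(
            osg::DisplaySettings::instance()->getMaxNumberOfGraphicsContexts());

        return viewer.run();
    }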
Re: [osg-users] Question about views, contexts and threading
Hi Ferdi,

Could you try the same tests but with the following env var set:

    set OSG_SERIALIZE_DRAW_DISPATCH=OFF

This will disable the mutex that serializes the draw dispatch. Have a search through the archives, as I've written lots about this topic and about the fact that serializing draw dispatch curiously improves performance on the systems that I've tested on. I still haven't had feedback from the community on this, as it's likely to be something affected by hardware/drivers and OS.

Robert.
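If it's more convenient to flip this switch from code rather than from the environment, osg::DisplaySettings appears to expose a matching accessor (a sketch; verify the accessor exists in your OSG version, and set it before the viewer starts its threads):

    #include <osg/DisplaySettings>
    #include <osgViewer/Viewer>

    int main()
    {
        // Equivalent of OSG_SERIALIZE_DRAW_DISPATCH=OFF, done programmatically.
        // Disables the mutex that serializes draw dispatch across the
        // per-context graphics threads; must happen before realize()/run().
        osg::DisplaySettings::instance()->setSerializeDrawDispatch(false);

        osgViewer::Viewer viewer;
        // ... scene and context setup as usual ...
        return viewer.run();
    }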
Re: [osg-users] Question about views, contexts and threading
Thank you, that at least explains some of the drawing times I've been seeing.

I ran more tests on our dual-GPU system, summarized below. Not strictly OSG related, but they may be interesting nonetheless...

- Scene of 25 copies of a 1-million-polygon model, all visible. Culling etc. negligible.
- Stand-alone refers to one rendering context only; normal, non-parallel rendering.
- Frame rates in FPS.

CPU affinity on different cores
OSG_THREADING=SingleThreaded
(1 core shows heavy use, 2nd core shows moderate use, 2 cores idle)

                                    Quadro 5600    8800GTX
    Single-GPU / Stand-alone        16             15
    Single-GPU / Multi-Threaded     7.5            7.5
    Single-GPU / Multi-Processing   7.5            7.5
    Multi-GPU / Multi-Threaded      6.5            6.5
    Multi-GPU / Multi-Processing    16             15

OSG_THREADING=ThreadPerContext
(CPU affinity is set but appears to be ignored: 1 core shows heavy use, others idle)

                                    Quadro 5600    8800GTX
    Single-GPU / Stand-alone        16             15
    Single-GPU / Multi-Threaded     7.5            7.5
    Single-GPU / Multi-Processing   7.5            7.5
    Multi-GPU / Multi-Threaded      3.5            11
    Multi-GPU / Multi-Processing    11             14

Baseline:
                                    Quadro 5600    8800GTX
    Multi-GPU / Multi-Threaded      6.5            6.5

Speeding up one card by rendering an empty scene (*), effect on the other card:

    Multi-GPU / Multi-Threaded      6000*          15
    Multi-GPU / Multi-Threaded      7              14*

All results are reasonable, except:

    Single-GPU / Multi-Processing   7.5            7.5
    Multi-GPU / Multi-Threaded      6.5            6.5
    Multi-GPU / Multi-Processing    16             15

Which is very strange; using two distinct GPUs simultaneously in a threaded way in the same address space is slower than sharing a single GPU. I can only conclude that OpenGL drivers cannot handle multi-threading with different contexts on different devices. It also seems that the Quadro is the culprit, locking the driver or something. If you let the Quadro render fast, the 8800 also renders fast. However, if you allow the 8800 to render fast, both will remain slow.
Re: [osg-users] Question about views, contexts and threading
Hi Ferdi,

To understand what is happening with draw in the two instances, you need to understand how OpenGL operates. For each graphics context OpenGL maintains a FIFO that is filled by the application's graphics thread for that context, and is drained by the driver, which batches the commands/data in the FIFO up into a form that can be pushed to the graphics card.

Now if this FIFO has plenty of room then the application can keep filling the FIFO without OpenGL ever blocking the application's graphics thread - in this case the draw dispatch times (the OSG side) are relatively low. If however you fill the FIFO then OpenGL will block the application's graphics thread until enough room has been made by the GPU consuming commands/data at the other end. When you get to this point you'll often find draw dispatch times suddenly jump up, and it's not because the thread is suddenly doing more work - in fact the app's graphics thread is just sitting there idle, waiting for the graphics driver/GPU to do its stuff.

Now drivers may have different sized FIFOs, and different GPUs will work at different speeds and possibly have other features that affect the FIFO filling/emptying. One would expect slower GPUs to empty the FIFO more slowly and so be more likely to block, but the driver can also have an effect. The architecture of the overall hardware, what other threads are running, how contended the various parts of the hardware are, etc. can all have an effect. The fact that one GPU's draw dispatch is far longer than another's might simply be that it's pushed just hard enough to fill the FIFO; it might still hit frame rate just fine, but the draw times will be drastically higher because of the blocking due to the filled FIFO, while a slightly lower load could lead to the FIFO not blocking and a huge drop in draw dispatch times. It's very non-linear: small differences can result in large observed differences, but often the long draw time might not be anything to worry about - it's just an early warning sign; you might still hit your target frame rate just fine.

Robert.
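One way to make this blocking visible is to force the dispatch thread to wait for the FIFO to drain and compare timings. A diagnostic sketch (the callback type and its use here are our own illustration, not from the OSG examples): with glFinish() installed as a final draw callback, the HUD's Draw time includes the full FIFO drain, so if Draw jumps dramatically with the callback installed, the time is going into waiting on the driver/GPU rather than into CPU-side dispatch.

    #include <osg/Camera>
    #include <osg/GL>
    #include <osgViewer/Viewer>

    // Hypothetical diagnostic: make the draw thread wait for the GPU at the
    // end of the frame, so reported draw time includes the FIFO drain.
    struct FinishCallback : public osg::Camera::DrawCallback
    {
        virtual void operator()(osg::RenderInfo& /*renderInfo*/) const
        {
            glFinish(); // blocks until the driver/GPU has consumed the FIFO
        }
    };

    // usage:
    //   viewer.getCamera()->setFinalDrawCallback(new FinishCallback);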
Re: [osg-users] Question about views, contexts and threading
Hi Robert,

I ran some more tests with a realistic scene of ~25M polygons (25 times the same 1M model). Stand-alone this is rendered at ~15 FPS on one GPU (8800GTX or Quadro FX5600 + Intel Quad Core). Multi-_processing_ with two contexts on two GPUs, both rendering this scene, the 8800 stays at 15 but the Quadro drops to 12. Multi-_threading_ with two contexts on two GPUs, the 8800 drops to 9.5 and the Quadro to 4.5 FPS. This is weird. Also, the 8800 reports (in the OSG performance HUD) that GPU=65 and Draw=10. Draw is always much lower than GPU. But the Quadro in multi-threading goes to GPU=210 and Draw=210; GPU and Draw are suddenly equal now. What does this Draw statistic represent? Is it time spent in driver draw calls?

I suspect buggy Quadro drivers, but I'm not sure. It's the only system I can test on. I'm sorry if this diverts from a pure OSG discussion; perhaps I should take it to an nvidia forum.
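For reference, the performance HUD quoted above is osgViewer's stats handler. To our understanding, Draw is the CPU-side time of the draw traversal (dispatch), while GPU is measured with GL timer queries where the driver supports them - consistent with Robert's FIFO explanation: when dispatch blocks on a full FIFO, Draw climbs until it meets GPU. Enabling the HUD is one line ('s' cycles the stats pages):

    #include <osgDB/ReadFile>
    #include <osgViewer/Viewer>
    #include <osgViewer/ViewerEventHandlers>

    int main()
    {
        osgViewer::Viewer viewer;
        viewer.setSceneData(osgDB::readNodeFile("cow.osg")); // any scene
        // On-screen stats HUD reporting the Draw and GPU timings discussed above.
        viewer.addEventHandler(new osgViewer::StatsHandler);
        return viewer.run();
    }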
Re: [osg-users] Question about views, contexts and threading
Hi Ferdi,

W.r.t. performance and stability of multi-threading the graphics: as long as you have two GPUs, the most efficient way to drive them should be multi-threaded. There is a caveat though - hardware and drivers aren't always up to scratch, and even where they should be able to manage the multiple threads and multiple GPUs seamlessly, they fail to.

I'm poised to build a new machine based on the new Intel Core i7 and X58 motherboard; it'll be interesting to see how well it scales.

W.r.t. PBO readback - it's very, very sensitive to the pixel formats you use. See the osgscreencapture example.

Robert.
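The pixel-format sensitivity usually comes down to whether the requested format matches the framebuffer's native layout, so the driver can copy rather than convert per pixel. A small sketch of a readback image set up that way (the helper is hypothetical, and GL_BGRA as the fast path is an assumption that tends to hold on NVIDIA hardware; profile on your own system):

    #include <osg/Camera>
    #include <osg/Image>

    #ifndef GL_BGRA
    #define GL_BGRA 0x80E1 // GL 1.2 token; may be missing from old gl.h headers
    #endif

    // Hypothetical helper: allocate a readback image in the framebuffer's
    // native layout so glReadPixels (and PBO transfers) can avoid conversion.
    osg::ref_ptr<osg::Image> makeReadbackImage(int width, int height)
    {
        osg::ref_ptr<osg::Image> image = new osg::Image;
        image->allocateImage(width, height, 1, GL_BGRA, GL_UNSIGNED_BYTE);
        return image;
    }

    // usage: attaching the image makes OSG read the color buffer back into it
    // at the end of that camera's draw:
    //   camera->attach(osg::Camera::COLOR_BUFFER, makeReadbackImage(1024, 768).get());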
Re: [osg-users] Question about views, contexts and threading
Thanks Robert. I did a quick test with two viewers from two threads and it appears to be working. Btw, from my experience, PBO doesn't seem to be any faster than glReadPixels for downloading textures to the host (and on some hardware it is much slower), while for uploads it is almost consistently faster. Anyway, that should not be a problem, even if I code it manually.

One question about the OpenGL driver: are you by any chance aware of any threading issues? Is it completely re-entrant from two different contexts and threads? With this two-thread setup, I see some occasional erratic fluctuation in drawing time in the OSG performance HUD for a completely still scene. The GPU performance is very stable, regardless of the load on the other card, but the drawing time (software) sometimes goes from something like 0.4 to 2.6 or 1.5 for a couple of frames. I do not notice this, or not as much, when using two separate processes instead of two threads. The only difference I can think of here is that the OpenGL driver part is in the same address space and maybe internally locks occasionally? Or is this nonsense?

Anyway, the osg part seems to be fairly straightforward and simple like this. Thanks.
Re: [osg-users] Question about views, contexts and threading
Hi Ferdi,

osgViewer::CompositeViewer runs all of the views synchronously - one frame() call dispatches update, event, cull and draw traversals for all the views. So for your case, where you want them to run async, this isn't supported. Supporting it within CompositeViewer would really complicate the API, so it's not something I've gone for.

What you will be able to do is use two separate Viewers, and you are likely to want to run two threads, one for each viewer's frame loop, as well. To get the render-to-image result to the second viewer, all you need to do is assign the same osg::Image to the first viewer's Camera for it to copy to, and then attach the same osg::Image to a texture in the scene of the second viewer. The OSG should automatically do the glReadPixels to the image data, dirty the Image, and then the texture will automatically update in the second viewer. You could potentially optimize things by using a PBO, but the off-the-shelf osg::PixelBufferObject isn't suitable for reads in this way, so you'll need to roll your own support for this.

It's worth noting that I've never written an app like the above, so you are rather working on the bleeding edge. I "think" it should work, or at least I can't spot any major problems that might appear.

Robert.
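A minimal sketch of the arrangement Robert describes, under the assumption that it works as outlined (model file, window sizes and screen numbers are placeholders; error handling omitted): one osg::Image is attached to the first viewer's camera for readback and used as the texture image in the second viewer's scene, with each viewer's frame loop driven from its own thread.

    #include <osg/Camera>
    #include <osg/Geode>
    #include <osg/Geometry>
    #include <osg/Image>
    #include <osg/Texture2D>
    #include <osgDB/ReadFile>
    #include <osgViewer/Viewer>
    #include <OpenThreads/Thread>

    // Runs one viewer's frame loop so the two viewers proceed at independent rates.
    class ViewerThread : public OpenThreads::Thread
    {
    public:
        ViewerThread(osgViewer::Viewer* v) : _viewer(v) {}
        virtual void run() { _viewer->run(); }
    private:
        osg::ref_ptr<osgViewer::Viewer> _viewer;
    };

    int main()
    {
        // Shared image: written by viewer1's readback, read by viewer2's texture.
        osg::ref_ptr<osg::Image> image = new osg::Image;
        image->allocateImage(1024, 768, 1, GL_RGBA, GL_UNSIGNED_BYTE);

        // Viewer 1 (slow scene, e.g. on screen :0.0): attaching the image makes
        // OSG glReadPixels into it and dirty() it at the end of each draw.
        osg::ref_ptr<osgViewer::Viewer> viewer1 = new osgViewer::Viewer;
        viewer1->setUpViewInWindow(50, 50, 1024, 768, /*screenNum=*/0);
        viewer1->setSceneData(osgDB::readNodeFile("cow.osg")); // stand-in scene
        viewer1->getCamera()->attach(osg::Camera::COLOR_BUFFER, image.get());

        // Viewer 2 (fast scene, e.g. on screen :0.1): a quad textured with the
        // shared image; the dirtied image triggers a texture re-upload here.
        osg::ref_ptr<osg::Texture2D> texture = new osg::Texture2D(image.get());
        osg::ref_ptr<osg::Geode> geode = new osg::Geode;
        geode->addDrawable(osg::createTexturedQuadGeometry(
            osg::Vec3(0,0,0), osg::Vec3(1,0,0), osg::Vec3(0,0,1)));
        geode->getOrCreateStateSet()->setTextureAttributeAndModes(0, texture.get());

        osg::ref_ptr<osgViewer::Viewer> viewer2 = new osgViewer::Viewer;
        viewer2->setUpViewInWindow(50, 50, 1024, 768, /*screenNum=*/1);
        viewer2->setSceneData(geode.get());

        // Independent frame loops: one thread per viewer.
        ViewerThread t1(viewer1.get()), t2(viewer2.get());
        t1.start(); t2.start();
        t1.join();  t2.join();
        return 0;
    }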
[osg-users] Question about views, contexts and threading
I'm looking to do the following in OSG, and I wonder if I'm on the right track (before wasting time needlessly): have two render processes run in parallel on two different GPUs, have one render a scene to texture, and let this texture be read by the other process and mapped to an object in a different scene. Problem: the rendering of the first scene to texture is very slow and the rendering of the second scene is very fast.

I intend to solve it in the following way, in pseudo-code (see the context-setup sketch after this list):

- new CompositeViewer
- Add two Views
- Construct two contexts, one on localhost:0.0, one on localhost:0.1
- Attach contexts to cameras of corresponding Views
- Set composite viewer threading mode to thread-per-context

--- First process
- Set view camera mode to FBO and pre-render
- Add post-draw callback and render textures
- Download texture to host memory in post-draw callback
- (possibly add post-render camera to render textured screen quad as output)

--- Second process
- Add update callback and regular texture
- Upload host memory to texture in update callback (if available, non-blocking)

The downloading and uploading of textures uses multiple slots and regular threaded locking, to ensure we never read and write the same memory at the same time. The second process doesn't block if no new texture is available; it just continues using the old one.

Some questions. Will the two processes now run at independent frame rates, or will the composite viewer synchronize them? I need them to run independently. I read that OSG does not support multi-threaded updating of the scene graph. However, if I use two distinct scene graphs with two contexts, I can _pull_ updates in an update callback from another thread, right? What I can not do is push updates at arbitrary times; that would make sense. How do I make the TrackballManipulator work for only the first process? It seems that as soon as I set that camera to FBO it just doesn't respond to events (or maybe something else is wrong... I added another orthogonal camera to the view1->getCamera() that renders the screen quad in post-render mode). Also, the second process camera is affected when I move the mouse in the first process window. Is it sufficient to call view2->getCamera()->setAllowEventFocus(false); to disable this behavior? Finally, can I do this the same way with a shared context on a single GPU (i.e. both on :0.0), sharing texture data directly on the GPU in different textures? Ignoring the slow context switching issues for the time being.

Am I on the right track here, or should this be done differently? I know all this is possible because I have the manual OpenGL code for it working, both using shared contexts and with up/downloading of texture data.

--
Regards,

Ferdi Smit
INS3 Visualization and 3D Interfaces
CWI Amsterdam, The Netherlands
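For the context-construction steps in the list above, a sketch of how the two contexts might be created and attached (a minimal sketch assuming the standard osg::GraphicsContext::Traits fields; the helper, window sizes and screen numbers are placeholders - and note Robert's reply above: CompositeViewer will still run both views synchronously per frame() call):

    #include <osg/GraphicsContext>
    #include <osgViewer/CompositeViewer>
    #include <osgViewer/View>

    // Illustrative helper: create a context on a given screen of display :0,
    // i.e. localhost:0.<screenNum>.
    osg::ref_ptr<osg::GraphicsContext> createContext(unsigned int screenNum)
    {
        osg::ref_ptr<osg::GraphicsContext::Traits> traits =
            new osg::GraphicsContext::Traits;
        traits->displayNum = 0;
        traits->screenNum  = screenNum;
        traits->x = 0; traits->y = 0;
        traits->width = 1024; traits->height = 768;
        traits->windowDecoration = true;
        traits->doubleBuffer = true;
        return osg::GraphicsContext::createGraphicsContext(traits.get());
    }

    int main()
    {
        osgViewer::CompositeViewer viewer;
        viewer.setThreadingModel(osgViewer::ViewerBase::ThreadPerContext);

        // One View per context: screen 0 -> :0.0, screen 1 -> :0.1.
        for (unsigned int screen = 0; screen < 2; ++screen)
        {
            osg::ref_ptr<osgViewer::View> view = new osgViewer::View;
            osg::ref_ptr<osg::GraphicsContext> gc = createContext(screen);
            view->getCamera()->setGraphicsContext(gc.get());
            view->getCamera()->setViewport(0, 0, 1024, 768);
            // view->setSceneData(...); // each view gets its own scene graph
            viewer.addView(view.get());
        }

        return viewer.run();
    }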