Here is a set of specific commands I have just run with the example code John posted.

    export __GL_SYNC_TO_VBLANK=0
    export OSG_SERIALIZE_DRAW_DISPATCH=OFF
    export OSG_THREADING=CullThreadPerCameraDrawThreadPerContext

    ./multiWindows 0       testex.ive
    ./multiWindows 0 1     testex.ive
    ./multiWindows 0 1 2   testex.ive
    ./multiWindows 0 1 2 3 testex.ive

    Using the 's' key to display FPS, I get the following for each case:

      case        FPS
      ----        ---
      0           146
      0 1          52
      0 1 2        26
      0 1 2 3      15


I am still working on making the testex.ive available on an FTP site.

-Steve


On Tue, 14 Dec 2010, John Kelso wrote:

Hi all,

As Tim and Robert requested, attached is the OSG program I've been using to
show the problem with threading.  It's called multiWindows.cpp.

Tim, I'd be very interested if you could run it and see what happens.
Anybody else out there have a system with more than one graphics card that
can give it a try?

To run it, specify 1 or more screen numbers, then a file to load.

For example:

  multiWindows 0 1 2 3 bigHonkingModelFile.ive

will create windows on four displays (:0.0 :0.1 :0.2 :0.3, or :0.0 :1.0 :2.0
:3.0 -- see the #if in the source for how to choose which one) and also
set processor affinity to processors 0, 1, 2, and 3.
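
In case it helps while the attachment makes the rounds, the per-window setup
amounts to roughly the following.  This is a hand-written approximation, not
the actual multiWindows.cpp; the window size, argument checking, and the
processor affinity handling are all simplified away here:

    // One slave camera / graphics context per requested screen number.
    #include <osgViewer/Viewer>
    #include <osgDB/ReadFile>
    #include <cstdlib>

    int main(int argc, char** argv)
    {
        osgViewer::Viewer viewer;
        viewer.setSceneData(osgDB::readNodeFile(argv[argc-1]));

        for (int i = 1; i < argc-1; ++i)
        {
            osg::ref_ptr<osg::GraphicsContext::Traits> traits =
                new osg::GraphicsContext::Traits;
            traits->screenNum = atoi(argv[i]);    // window on :0.N
            // traits->displayNum = atoi(argv[i]); // or :N.0 -- the #if choice
            traits->x = 0;  traits->y = 0;
            traits->width = 1024;  traits->height = 768;
            traits->windowDecoration = true;
            traits->doubleBuffer = true;

            osg::ref_ptr<osg::GraphicsContext> gc =
                osg::GraphicsContext::createGraphicsContext(traits.get());

            osg::ref_ptr<osg::Camera> camera = new osg::Camera;
            camera->setGraphicsContext(gc.get());
            camera->setViewport(0, 0, traits->width, traits->height);
            viewer.addSlave(camera.get());
        }

        viewer.setThreadingModel(
            osgViewer::ViewerBase::CullThreadPerCameraDrawThreadPerContext);
        return viewer.run();
    }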

As Steve mentioned, we have been using a pretty big file to show the drop in
frame rate.  Steve's working on getting it onto an FTP server.

As for the non-OSG program that doesn't show the problem, it uses a package
called "DGL", which is the OpenGL component of the DIVERSE package.  In
brief, DGL lets you run an OpenGL program as a callback.  The program I
wrote was the basic OpenGL helix program, modified to spew enough
triangles to give a frame rate of less than 60 Hz on our system.
I always got the same frame rate whether I ran on 1, 2, 3, or 4 cards.
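
To give an idea of what that test actually does, the draw callback is nothing
more exotic than immediate-mode OpenGL pushing a pile of triangles, roughly
like this (the function name and the triangle count are made up for
illustration; the real callback is registered through DGL):

    #include <GL/gl.h>
    #include <cmath>

    // Hypothetical stand-in for the DGL draw callback: a helix drawn with
    // enough triangles to push the frame time past 1/60 s on our hardware.
    void drawHelix()
    {
        glBegin(GL_TRIANGLE_STRIP);
        for (int i = 0; i < 200000; ++i)
        {
            float t = i * 0.01f;
            glVertex3f(std::cos(t),        std::sin(t),        t * 0.001f);
            glVertex3f(0.9f*std::cos(t),   0.9f*std::sin(t),   t * 0.001f);
        }
        glEnd();
    }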

--> I think the following might be important <---

DGL has its own threading and draw code.  It uses OpenThreads for threading.
The OpenGL calls generated by draw() are sent to the defined windows using
OSG's SceneView class and Producer.  So it's not completely OSG-free, but
since its threading works correctly, perhaps this indicates that the OSG
problem is not in SceneView.
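
To make that concrete, the per-window structure is roughly the following, as
I understand it.  This is only a sketch: makeContextCurrent() and
swapBuffers() are stand-ins for whatever Producer actually does with the
window, and the real code also synchronizes the threads each frame:

    #include <OpenThreads/Thread>
    #include <osgUtil/SceneView>
    #include <osg/Node>

    // One thread per window: each thread pins itself to a core, then
    // culls/draws its own SceneView into its own GL context every frame.
    class WindowDrawThread : public OpenThreads::Thread
    {
    public:
        WindowDrawThread(osg::Node* scene, unsigned int cpu) : _cpu(cpu)
        {
            _sceneView = new osgUtil::SceneView;
            _sceneView->setDefaults();
            _sceneView->setSceneData(scene);
        }

        virtual void run()
        {
            setProcessorAffinity(_cpu);  // pin this draw thread to one core
            // makeContextCurrent();     // hypothetical: bind this window's context
            while (!testCancel())
            {
                _sceneView->update();
                _sceneView->cull();
                _sceneView->draw();
                // swapBuffers();        // hypothetical: swap this window's buffers
            }
        }

    private:
        osg::ref_ptr<osgUtil::SceneView> _sceneView;
        unsigned int _cpu;
    };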

If anyone wants to install DGL, I can send details on how to get and install
it, along with the modified helix test file.  The DIVERSE home page is
http://diverse.sourceforge.net/diverse/

I hope this is helpful.

Many thanks,

John

On Tue, 14 Dec 2010, Steve Satterfield wrote:

Hi Tim,

I have pulled your questions out of the body of the text and am responding to
them up front.

Are you using Linux?

Yes, we are running CentOS and our sys admin keeps it very much up to date.

Could you share the source of this program?

  Yes, we can post the source code. John Kelso did the actual work and
  he will follow up with the code and details in a separate
  message. There are actually two test programs.

  The first test is a straight OSG-only test. It is the primary code
  used for most of the tests. It reads any OSG-loadable file. We have
  an .ive test case that I need to make available via FTP. Details will
  follow.

  The second test does not use OSG and does the graphics directly with
  OpenGL. It does require some additional software to download and install.
  John will provide details.

It is paradoxical. That it works at all is due to the fact that, with
vsync enabled, all GPU activity is buffered up until after the
next SwapBuffers call.

  I am not entirely clear what you mean in this statement. I will say
  that for the majority of our testing, we have the Nvidia environment
  variable "__GL_SYNC_TO_VBLANK" set to 0, so the swap is not tied to
  vblank. I believe this variable is specific to the Nvidia driver. For
  normal production it's set to 1. The X/N performance is observed in
  both cases.


I put together a multicard system specifically to look at these
issues, and I too am very interested in getting it to work.

  Does this mean you are seeing performance problems like the ones I
  have described on your system? We would certainly be interested in
  hearing how our test program(s) run on your multi-card system.

  I will add that we had Nvidia contacts interested in determining
  whether the problem is related to the Nvidia drivers. They got the X/N
  performance on a non-Nvidia machine, and that's what prompted me to
  build a dual ATI-based machine, as I reported in the original
  message. It's always useful to demonstrate a problem on multiple
  platforms.


-Steve


On Mon, 13 Dec 2010, Tim Moore wrote:



On Mon, Dec 13, 2010 at 9:51 PM, Steve Satterfield <st...@nist.gov> wrote:

Hi,

I would like to update the discussion we started back in October
regarding an apparent problem scaling OSG to multiple windows on
multiple graphics cards. For full details on the previous discussion
search the email archive for "problem scaling to multiple cards".

Summary of the problem:

  We have a multi-threaded OSG application using version 2.8.3.  (We also
  ran the same tests using version 2.9.9 and got the same results.)  We
  have a system with four Nvidia FX5800 cards (an immersive CAVE-like
  config), 8 cores, and 32 GB of memory.

Are you using Linux?

  Since the application draws in parallel to independent cards using
  different cores, we expect the frame rate to be independent of the number
  of cards in use.  However, the frame rate is actually X/N, where X is the
  single-card frame rate and N is the number of cards being used.

  For example, if the frame rate is 100 using one card, it drops to 50
  for 2 cards and 25 for 4 cards.  If the application worked properly,
  the FPS would be 100 in all cases.


We have tried a number of things to isolate the problem:

...
   * We have created a pure OpenGL threaded application that draws to
     1, 2, 3, or 4 cards. There is no OSG involved. This application
     runs properly, with no degradation in FPS for multiple cards.

Could you share the source of this program?
   * When we set OSG_SERIALIZE_DRAW_DISPATCH to OFF (the default is ON),
     the total FPS actually drops. Watching the graphical statistics,
     the DRAW is clearly running in parallel, but it is actually a bit
     slower than when the DRAW is serialized.

     While this behavior is consistent with the messages posted by
     Robert in August 2007 (search for OSG_SERIALIZE_DRAW_DISPATCH),
     it's not what one would think should happen. Specifically, it
     seems counter-intuitive that serialized DRAW is faster than
     parallel DRAW.

It is paradoxical. That it works at all is due to the fact that, with vsync
enabled, all GPU activity is buffered up until after the next SwapBuffers
call.
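
As an aside, OSG_SERIALIZE_DRAW_DISPATCH is just read into
osg::DisplaySettings, so the same switch can be flipped in code if that is
more convenient for testing. A minimal sketch, assuming it runs before the
viewer is realized:

    #include <osg/DisplaySettings>

    // Equivalent to OSG_SERIALIZE_DRAW_DISPATCH=OFF: let the draw threads
    // dispatch GL commands in parallel instead of one context at a time.
    void disableSerializedDrawDispatch()
    {
        osg::DisplaySettings::instance()->setSerializeDrawDispatch(false);
    }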
...
OSG is a critical component of our immersive virtual reality
environment.  Our scientific visualization applications are continuing
to demand more and more performance, and we need multi-threading to work
properly.

What experiences are others with multiple cards having regarding
multi-threading? If anyone is interested, we can send our test program.

We would very much appreciate any help or suggestions on solving this
problem that anyone can offer.

I put together a multicard system specifically to look at these issues, and
I too am very interested in getting it to work.

Tim


