Sorry, I was going off your post that read "This leads me to believe that bus contention was causing the lack of scalability." I was just trying to outline a way to validate that suspicion. If you've already ruled out bus contention, then disregard my post.
   -Paul

On 1/27/2012 11:40 AM, Jason Daly wrote:

Hi, Paul,

As Tim pointed out earlier, the more limiting resource is likely to be lock contention in the NVidia driver instead of bus contention.

Note that Bb is going to be 4000 MB/sec (500 MB/sec per lane * 8 lanes) per Quadroplex (PCI-e 2.0 8x) on the NIST machine. On my system I have an X58 chipset, so it's even wider (500 MB/sec * 16) per card. Also, the NIST machine's FSB is limited to 1.6 GT/sec, which is less than the 5 GT/sec that PCI-e provides.

Given that the data is static (we're not transferring any vertex data over the bus, just making glDrawElements calls to VBOs in STATIC_DRAW mode), I doubt that the OpenGL command stream is going to be getting anywhere near the actual bus bandwidth.
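
To put a rough number on that, here is a small sketch using the example values from your message below (Nw = 4, Nd = 5500, Sd = 64, O = 2048). The 60 FPS target and the helper name bus_fraction are made up for illustration, "MB" is taken as 2^20 bytes, and all four windows are assumed to share a single 8x link, which is pessimistic if the windows are spread across Quadroplexes.

```python
def bus_fraction(bb, fps, nw, nd, sd, o):
    """Fraction of bus bandwidth used by the command stream at a given FPS."""
    bytes_per_sec = (nw * nd * sd + o) * fps
    return bytes_per_sec / bb

MB = 2 ** 20  # assumption: binary megabytes

# Hypothetical 60 FPS target over one PCI-e 2.0 8x link (4000 MB/sec).
frac = bus_fraction(4000 * MB, fps=60, nw=4, nd=5500, sd=64, o=2048)
print(f"{frac:.1%} of the link")  # roughly 2%
```

Even with these guessed sizes, the command stream would use only a small slice of the link.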



On 01/27/2012 09:17 AM, Paul Martz wrote:
Hi Jason -- I agree that the system bus is a likely source of the scaling issue.
It is certainly a single resource that has to be shared by all the GPUs in the
system.

To make a case for bus bandwidth limitation, you might work through the math.

              Bb
FPS = ----------------
        Nw * Nd * Sd + O

Where Bb is the bus bandwidth in bytes per second, Nw is the number of windows
you're rendering to, Nd is the number of drawing commands per window, Sd is the
size of each drawing command in bytes, and O is the per-frame OpenGL command
overhead in bytes.

The knowns:
Bb = 500 MB/sec (PCIe 2.0)
Nw = 4
Nd = 5500

Sd is a little harder to compute. It'll depend on the draw technology you're
using (buffer objects or display lists) and the underlying OpenGL
implementation. You could make a very rough guess here by figuring a fullword
per OpenGL command, and a fullword per OpenGL command parameter. Just for the
sake of example, let's say Sd = 64 (16 fullwords to draw a single osg::Geometry).

O encompasses all the per-frame OpenGL commands that OSG emits: glClear,
glClearColor, glColorMask, glDepthMask, matrices, light sources, swap, etc. You
could plug in a rough guess like you would for Sd. Again, just as an example,
let's use O = 2048.

Plugging all that into a calculator, I get FPS = 371. But if Nw dropped to a
single window, then FPS would jump to over 1400 -- or, more likely, you'd become
limited by some other single resource in the system.
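
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as stated; the function name estimate_fps is made up for illustration, and it assumes "MB" means 2^20 bytes, which is the reading that reproduces the 371 figure.

```python
def estimate_fps(bb, nw, nd, sd, o):
    """Upper bound on frame rate if the bus is the only limit.

    bb: bus bandwidth in bytes/sec, nw: number of windows,
    nd: draw commands per window, sd: bytes per draw command,
    o: per-frame OpenGL command overhead in bytes.
    """
    bytes_per_frame = nw * nd * sd + o
    return bb / bytes_per_frame

MB = 2 ** 20  # assumption: binary megabytes

four_windows = estimate_fps(500 * MB, nw=4, nd=5500, sd=64, o=2048)
one_window = estimate_fps(500 * MB, nw=1, nd=5500, sd=64, o=2048)
print(int(four_windows), int(one_window))  # prints "371 1480"
```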

The nice thing about algebra is that you can solve for the unknown. You already
have the measured FPS, and if you have a pretty good guess for O (which can
actually be pretty sloppy anyhow), then you ought to be able to solve for Sd and
ask yourself if that result makes sense.
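
Rearranging the formula gives Sd = (Bb / FPS - O) / (Nw * Nd). A minimal sketch of that step follows; solve_sd is a made-up helper name, the measured frame rate of 60 is purely hypothetical, and "MB" is again assumed to be 2^20 bytes.

```python
def solve_sd(bb, fps, nw, nd, o):
    """Solve FPS = Bb / (Nw * Nd * Sd + O) for Sd, in bytes per draw command."""
    return (bb / fps - o) / (nw * nd)

MB = 2 ** 20  # assumption: binary megabytes

measured_fps = 60.0  # hypothetical measurement, substitute your own
implied_sd = solve_sd(500 * MB, measured_fps, nw=4, nd=5500, o=2048)
print(f"implied Sd = {implied_sd:.0f} bytes per draw command")
```

If the implied Sd comes out far larger than any plausible encoding of a single draw call, that suggests something other than the bus is limiting the frame rate.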

I hope this helps.
     -Paul



_______________________________________________
osg-users mailing list
osg-users@lists.openscenegraph.org
http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org



--
  -Paul Martz      Skew Matrix Software
                   http://www.skew-matrix.com/
