Karl Czajkowski wrote:
> 
> We have one particularly brute-force application that draws large
> batches of points (millions per frame) with low alpha value using an
> accumulative blend mode and no z-buffer.  I am curious what kind of
> performance is expected through the DRI in the long run.
> 
> Using an Nvidia Quadro2Pro and their drivers, the application can
> sustain 30 Mpts/s draw rate into a 512x512 window.  Using the DRI
> radeon driver it sustains only about 1.5 Mpts/s. Using a hand-coded
> software benchmark we get about 10 Mpts/s on both an 800 MHz PIII and a
> 1.7 GHz Xeon using MMX and SSE instructions.
> 
> The OpenGL application draws the points in direct mode without vertex
> arrays or anything (we saw performance degradation when we tried
> vertex arrays a while back).  The software benchmark is tuned to
> process small batches of about 1000 points to get good cache behavior
> and loop benefits. It uses SSE to do the perspective vector transform
> on the 4x1 vertex, and MMX to do the saturating RGB blend into a
> 32-bpp pixel array. It also uses the SSE prefetch t1/nta instructions
> to good effect.
> 
> The bottleneck in our benchmark appears to be the rasterization loop
> that draws an array of screen-coordinate points into the
> image. Turning off the MMX rasterization pass yields SSE vertex
> transform rates of 20 Mpts/s on the 800 MHz PIII and 30 Mpts/s on the
> 1.7 GHz Xeon.
> 
> Would the DRI ever be expected to be as efficient as the Nvidia driver
> or our software benchmark for handling batches of primitives like
> this?  Right now it looks like we would get a big win using our
> software pass and dumping the image into a GL window for display/GUI
> if we want to run the application on a radeon-equipped notebook.
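
For readers following the thread, the rasterization pass described in the
quoted message (saturating RGB blend of screen-coordinate points into a 32-bpp
pixel array) can be sketched in portable C. This is only an illustration, not
the poster's benchmark code: the names `screen_point`, `sat_add_pixel` and
`draw_points` are hypothetical, the color is assumed to be already scaled by
its alpha, and the per-byte clamp is written in scalar form where an MMX
version would use a packed saturating add such as PADDUSB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 8-bit channels, clamping at 255 instead of wrapping around. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Saturating add applied independently to each byte of a 32-bit pixel. */
static uint32_t sat_add_pixel(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t d = (uint8_t)(dst >> shift);
        uint8_t s = (uint8_t)(src >> shift);
        out |= (uint32_t)sat_add_u8(d, s) << shift;
    }
    return out;
}

/* Hypothetical point type: already transformed to integer screen coordinates. */
struct screen_point {
    int x;
    int y;
};

/*
 * Accumulate `count` points of one alpha-premultiplied color into `image`
 * (width * height pixels, row-major). Points outside the image are skipped.
 * Returns the number of points actually written.
 */
static size_t draw_points(uint32_t *image, int width, int height,
                          const struct screen_point *pts, size_t count,
                          uint32_t color)
{
    size_t drawn = 0;

    assert(image != NULL && pts != NULL);
    for (size_t i = 0; i < count; ++i) {
        int x = pts[i].x;
        int y = pts[i].y;
        if (x < 0 || x >= width || y < 0 || y >= height)
            continue; /* clip: point is off-screen */
        uint32_t *px = &image[(size_t)y * (size_t)width + (size_t)x];
        *px = sat_add_pixel(*px, color);
        ++drawn;
    }
    return drawn;
}
```

Calling `draw_points` on batches of roughly 1000 points, as the message
describes, keeps the point array cache-resident; the writes into the image
remain scattered, which is consistent with this loop being the reported
bottleneck.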

While we will have a T&L driver for the Radeon 7500 before too long, Nvidia
does have a good head start and a lot of money to apply to the problem.

At the moment, I'd ask if you've tried the mesa-4-0-branch version of the
radeon driver - the code there should be a bit more efficient than on the
trunk.  If you wanted to post a simple GL program that exercises the
functionality, I can check if any bad code paths are being triggered
accidentally.  1.5M vertices/second sounds low compared to the results I'm
getting elsewhere, so there may be something going wrong.

Keith
