Karl Czajkowski wrote:
>
> We have one particularly brute-force application that draws large
> batches of points (millions per frame) with low alpha value using an
> accumulative blend mode and no z-buffer.  I am curious what kind of
> performance is expected through the DRI in the long run.
>
> Using an Nvidia Quadro2Pro and their drivers, the application can
> sustain 30 Mpts/s draw rate into a 512x512 window.  Using the DRI
> radeon driver it sustains only about 1.5 Mpts/s.  Using a hand-coded
> software benchmark we get about 10 Mpts/s on both an 800 MHz PIII and a
> 1.7 GHz Xeon using MMX and SSE instructions.
>
> The OpenGL application draws the points in direct mode without vertex
> arrays or anything (we saw performance degradation when we tried
> vertex arrays a while back).  The software benchmark is tuned to
> process small batches of about 1000 points to get good cache behavior
> and loop benefits.  It uses SSE to do the perspective vector transform
> on the 4x1 vertex, and MMX to do the saturating RGB blend into a
> 32-bpp pixel array.  It also uses the SSE prefetch t1/nta instructions
> to good effect.
>
> The bottleneck in our benchmark appears to be the rasterization loop
> that draws an array of screen-coordinate points into the
> image.  Turning off the MMX rasterization pass yields SSE vertex
> transform rates of 20 Mpts/s on the 800 MHz PIII and 30 Mpts/s on the
> 1.7 GHz Xeon.
>
> Would the DRI ever be expected to be as efficient as the Nvidia driver
> or our software benchmark for handling batches of primitives like
> this?  Right now it looks like we would get a big win using our
> software pass and dumping the image into a GL window for display/GUI
> if we want to run the application on a radeon-equipped notebook.
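
For anyone following along, the rasterization pass described in the quoted message (a saturating RGB blend of screen-coordinate points into a 32-bpp pixel array) might look roughly like the scalar C below. This is only a sketch: it is not the poster's MMX code, and the names `ScreenPoint`, `sat_add_u8` and `blend_points`, as well as the 0x00RRGGBB pixel layout, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical screen-space point: integer pixel coordinates. */
typedef struct { int x, y; } ScreenPoint;

/* Add two 8-bit channel values, clamping the result at 255. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/*
 * Accumulate the colour `rgb` (assumed layout 0x00RRGGBB) into a 32-bpp
 * image at every point in `pts`.  Points outside the image are skipped.
 * The top byte of each destination pixel is left unchanged.
 */
static void blend_points(uint32_t *pixels, int width, int height,
                         const ScreenPoint *pts, size_t count, uint32_t rgb)
{
    for (size_t i = 0; i < count; ++i) {
        int x = pts[i].x;
        int y = pts[i].y;
        if (x < 0 || x >= width || y < 0 || y >= height)
            continue;

        uint32_t *p = &pixels[(size_t)y * (size_t)width + (size_t)x];
        uint32_t dst = *p;

        uint32_t r = sat_add_u8((uint8_t)(dst >> 16), (uint8_t)(rgb >> 16));
        uint32_t g = sat_add_u8((uint8_t)(dst >> 8),  (uint8_t)(rgb >> 8));
        uint32_t b = sat_add_u8((uint8_t)dst,         (uint8_t)rgb);

        *p = (dst & 0xFF000000u) | (r << 16) | (g << 8) | b;
    }
}
```

Calling `blend_points` once per batch of about 1000 points would mirror the batching described in the quoted message. An MMX version would presumably do the three channel additions in one packed saturating add rather than per channel as here.
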

While we will have a T&L driver for the Radeon 7500 before too long,
nvidia do have a good head start and a lot of money to apply to the
problem.

At the moment, I'd ask whether you've tried the mesa-4-0-branch version
of the radeon driver; the code there should be a bit more efficient than
on the trunk.

If you want to post a simple GL program that exercises the
functionality, I can check whether any bad code paths are being
triggered accidentally.  1.5M vertices/second sounds low compared to the
results I'm getting elsewhere, so there may be something going wrong.

Keith