Ian Romanick wrote:
Here's a simple patch that gives about a 50% speed boost (on my box) to glReadPixels performance in 24-bit. I measured using the benchmark built into progs/demos/readpix. The interesting thing is that the core MMX & SSE2 routines can be used for other cards as well. For example, it looks like MGA, Unichrome, and others can use the same code for 24-bit.

Before pursuing this too far, I'd like to look at ways to make the *compiled* code from spantmp.h more device-independent. That would make it easier to generate a bunch of these generic routines and just plug them in.

Here's version 3 of the patch. This is *probably* the last version that will circulate as a patch. Here are the changes from the last version of the patch:

- Fixes the problem where the R200 driver would only use the MMX version.
- Numerous little optimizations to all 3 versions. The SSE version is still crap. :(
- Trivially optimized the "C" version. ;)

I'm thinking that a lot of this will actually get pulled into spantmp.h when I commit it. My thinking is to have the driver define which pixel format it uses (e.g., "#define SPANTMP_USE_BGRA8888_REV") and have spantmp.h automatically generate the optimized versions (based on the existence of USE_MMX_ASM, etc.). Since there are just a handful of pixel formats that appear in practice, this should be pretty easy to do.
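
To make the idea concrete, here is a rough sketch of how that selection might look. This is not the code in the patch. The function names (read_rgba_span_c, read_rgba_span_mmx) and the READ_RGBA_SPAN macro are made up for illustration, and I'm assuming BGRA8888_REV means a packed 32-bit 0xAARRGGBB pixel. Only SPANTMP_USE_BGRA8888_REV and USE_MMX_ASM come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* The driver would define its pixel format before including the template.
 * It is defined here only so the sketch compiles on its own. */
#define SPANTMP_USE_BGRA8888_REV

#if defined(SPANTMP_USE_BGRA8888_REV)

/* Portable "C" version: unpack 0xAARRGGBB words (assumed layout) into
 * R, G, B, A bytes. Shifting the whole word keeps this independent of
 * the host byte order. */
static void read_rgba_span_c(const uint32_t *src, uint8_t (*dst)[4], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const uint32_t p = src[i];
        dst[i][0] = (uint8_t)(p >> 16); /* red   */
        dst[i][1] = (uint8_t)(p >> 8);  /* green */
        dst[i][2] = (uint8_t)p;         /* blue  */
        dst[i][3] = (uint8_t)(p >> 24); /* alpha */
    }
}

# if defined(USE_MMX_ASM)
/* Hypothetical assembly routine with the same contract as the C version. */
extern void read_rgba_span_mmx(const uint32_t *src, uint8_t (*dst)[4], size_t n);
#  define READ_RGBA_SPAN read_rgba_span_mmx
# else
#  define READ_RGBA_SPAN read_rgba_span_c
# endif

#endif /* SPANTMP_USE_BGRA8888_REV */
```

With something like this, the driver only names its format, and callers go through READ_RGBA_SPAN without knowing which version was picked at build time.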

My only concern is big-endian machines. I should be able to try this out on a Rage128 in a Power Mac. Maybe there will be another version as a patch...ugh...

Attachment: r200_readpixels-03.tar.bz2
Description: BZip2 compressed data
