On Saturday, 9 October 2004 03:33, Ian Romanick wrote:
> Ian Romanick wrote:
> > Here's a simple patch that gives about a 50% (on my box) speed boost to
> > glReadPixels performance in 24-bit.  I measured using the benchmark
> > built into progs/demos/readpix.  The interesting thing is that the core
> > MMX & SSE2 routines can be used for other cards as well.  For example,
> > it looks like MGA, Unichrome, and others can use the same code for
> > 24-bit.
> >
> > Before pursuing this too far, I'd like to look at ways to make the
> > *compiled* code from spantmp.h be more device-independent.  That would
> > make it easier to generate a bunch of these generic routines and just
> > plug them in.
>
> Here's version 3 of the patch.  This is *probably* the last version that
> will circulate as a patch.  Here are the changes from the last version
> of the patch:
>
> - Fixes the problem where the R200 driver would only use the MMX version.
> - Numerous little optimizations to all 3 versions.  The SSE version is
> still crap. :(
> - Trivially optimized the "C" version. ;)
>
> I'm thinking that a lot of this will actually get pulled into spantmp.h
> when I commit it.  My thinking is to have the driver define which pixel
> format it uses (e.g., "#define SPANTMP_USE_BGRA8888_REV") and have
> spantmp.h automatically generate the optimized versions (based on the
> existence of USE_MMX_ASM, etc.).  Since there are just a handful of pixel
> formats that appear in practice, this should be pretty easy to do.
>
> My only concern is big-endian machines.  I should be able to try this
> out on a Rage128 in a Power Mac.  Maybe there will be another version as
> a patch...ugh...
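
As a rough illustration of the scheme described in the quoted text, the driver defines its pixel format and the template picks a routine based on what was compiled in. This is only a sketch under assumptions: the function names read_rgba_span_c, read_rgba_span_mmx and choose_read_rgba_span are hypothetical and not taken from the patch, and the pixel is assumed to be a packed 0xAARRGGBB word.

```c
#include <stddef.h>
#include <stdint.h>

/* Driver-side selection, using the macro name proposed in the mail. */
#define SPANTMP_USE_BGRA8888_REV 1

typedef void (*read_rgba_span_fn)(const uint32_t *src,
                                  uint8_t (*dst)[4], size_t n);

#if defined(SPANTMP_USE_BGRA8888_REV)
/* Portable C version: each source pixel is assumed to be 0xAARRGGBB. */
static void read_rgba_span_c(const uint32_t *src,
                             uint8_t (*dst)[4], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t p = src[i];
        dst[i][0] = (uint8_t)((p >> 16) & 0xff); /* R */
        dst[i][1] = (uint8_t)((p >> 8) & 0xff);  /* G */
        dst[i][2] = (uint8_t)(p & 0xff);         /* B */
        dst[i][3] = (uint8_t)((p >> 24) & 0xff); /* A */
    }
}

#if defined(USE_MMX_ASM)
/* Hypothetical assembly routine; only referenced when MMX asm is built. */
extern void read_rgba_span_mmx(const uint32_t *src,
                               uint8_t (*dst)[4], size_t n);
#endif

/* Pick a routine that is both compiled in and supported by the CPU. */
static read_rgba_span_fn choose_read_rgba_span(int cpu_has_mmx)
{
#if defined(USE_MMX_ASM)
    if (cpu_has_mmx)
        return read_rgba_span_mmx;
#else
    (void)cpu_has_mmx; /* no asm built: always fall back to C */
#endif
    return read_rgba_span_c;
}
#endif /* SPANTMP_USE_BGRA8888_REV */
```

Because the C version works on the 32-bit value with shifts rather than on individual bytes in memory, it does not depend on host byte order as long as the pixel word itself is read in the expected order; whether that holds on a big-endian machine is exactly the open question raised above.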

Ian,

NONE of your three patch versions gives me direct rendering?!
I've tested both with and without your TLS patch (progress?).

The symbols are in:
DRI-Mesa/Patches> nm /usr/X11R6-NO-TLS/lib/modules/dri/r200_dri.so | grep r200ReadRGBASpan_ARGB8888
00175714 t r200ReadRGBASpan_ARGB8888
00175be4 t r200ReadRGBASpan_ARGB8888_MMX
00175ad4 t r200ReadRGBASpan_ARGB8888_SSE
001759c4 t r200ReadRGBASpan_ARGB8888_SSE2

But the generic routines show up as undefined (U):
DRI-Mesa/Patches> nm /usr/X11R6-NO-TLS/lib/modules/dri/r200_dri.so | grep _generic_read_RGBA_span_BGRA8888
         U _generic_read_RGBA_span_BGRA8888_REV_MMX
         U _generic_read_RGBA_span_BGRA8888_REV_SSE
         U _generic_read_RGBA_span_BGRA8888_REV_SSE2
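
In nm output, "U" marks a symbol that the object references but does not define. One way to see whether unresolved symbols stop the module from loading at all is to dlopen() it with RTLD_NOW, which forces every symbol to be resolved immediately. The helper below is a minimal sketch; the name check_module_loads is made up for illustration. Note that a DRI driver also expects symbols from the process that normally loads it, so a standalone check like this may report other unresolved names as well.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Try to load a shared object with immediate symbol resolution.
 * Returns 0 if it loads cleanly, -1 otherwise (the loader's error
 * message, including any undefined symbol name, goes to stderr).
 */
int check_module_loads(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);

    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    dlclose(handle);
    return 0;
}
```

Calling check_module_loads() on the r200_dri.so path shown above would print the first symbol the loader cannot resolve, if any. On older glibc versions the program has to be linked with -ldl.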

I'm staying on an XFree86 DRI CVS build for as long as my distro is based on it ;-)

Any ideas?

-Dieter

BTW, the old indirect mode is way faster than direct rendering for me:

progs/demos> ./readpix
Mesa: software DXTn compression/decompression available
GL_VERSION = 1.3 Mesa 6.3
GL_RENDERER = Mesa DRI R200 20040929 AGP 4x x86/MMX+/3DNow!+/SSE TCL
Loaded 194 by 188 image

Benchmarking...
Result:  348 reads in 4.009000 seconds = 3165940.633574 pixels/sec
Benchmarking...
Result:  344 reads in 4.007000 seconds = 3131112.553032 pixels/sec
Benchmarking...
Result:  346 reads in 4.001000 seconds = 3154039.490127 pixels/sec
Benchmarking...
Result:  278 reads in 4.007000 seconds = 2530375.842276 pixels/sec
Benchmarking...
Result:  275 reads in 4.003000 seconds = 2505570.821884 pixels/sec
Benchmarking...
Result:  272 reads in 4.001000 seconds = 2479476.130967 pixels/sec
glDrawBuffer(GL_FRONT)
Benchmarking...
Result:  342 reads in 4.004000 seconds = 3115240.759241 pixels/sec
Benchmarking...
Result:  352 reads in 4.010000 seconds = 3201532.169576 pixels/sec
Benchmarking...
Result:  342 reads in 4.004000 seconds = 3115240.759241 pixels/sec
Benchmarking...
Result:  269 reads in 4.011000 seconds = 2446015.457492 pixels/sec
Benchmarking...
Result:  268 reads in 4.000000 seconds = 2443624.000000 pixels/sec
Benchmarking...
Result:  270 reads in 4.010000 seconds = 2455720.698254 pixels/sec


Mesa indirect:
progs/demos> ./readpix
GL_VERSION = 1.2 (1.5 Mesa 6.3)
GL_RENDERER = Mesa GLX Indirect
Loaded 194 by 188 image
Benchmarking...
Result:  1793 reads in 4.002000 seconds = 16340403.798101 pixels/sec
Benchmarking...
Result:  1797 reads in 4.000000 seconds = 16385046.000000 pixels/sec
Benchmarking...
Result:  1792 reads in 4.000000 seconds = 16339456.000000 pixels/sec
Benchmarking...
Result:  800 reads in 4.003000 seconds = 7288933.300025 pixels/sec
Benchmarking...
Result:  799 reads in 4.004000 seconds = 7278003.996004 pixels/sec
Benchmarking...
Result:  797 reads in 4.004000 seconds = 7259786.213786 pixels/sec
glDrawBuffer(GL_FRONT)
Benchmarking...
Result:  294 reads in 4.007000 seconds = 2676008.984278 pixels/sec
Benchmarking...
Result:  290 reads in 4.002000 seconds = 2642898.550725 pixels/sec
Benchmarking...
Result:  291 reads in 4.008000 seconds = 2648041.916168 pixels/sec
Benchmarking...
Result:  241 reads in 4.009000 seconds = 2192504.864056 pixels/sec
Benchmarking...
Result:  240 reads in 4.015000 seconds = 2180144.458281 pixels/sec
Benchmarking...
Result:  240 reads in 4.014000 seconds = 2180687.593423 pixels/sec


_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
