Alan Cox wrote:
On Mer, 2004-10-06 at 19:36, Ian Romanick wrote:

from video RAM to system RAM. It has to convert the pixel data from its native, on-card format to RGBA8888. In the case of my patch, it converts from BGRA to RGBA while doing the copy. That's why it needs the SSE2 shift instructions.
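
To make the BGRA-to-RGBA step concrete, here is a minimal scalar sketch in C of the shift-and-mask swizzle on one packed 32-bit pixel. It is not the code from the patch: the names bgra_to_rgba and convert_span are hypothetical, and a little-endian byte order is assumed. An SSE2 version would presumably apply the same shifts and masks to four pixels per register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch.
 * Input:  bytes B,G,R,A in memory, i.e. 0xAARRGGBB as a little-endian word.
 * Output: bytes R,G,B,A in memory, i.e. 0xAABBGGRR as a little-endian word. */
static uint32_t bgra_to_rgba(uint32_t p)
{
    return (p & 0xFF00FF00u)            /* keep A and G where they are */
         | ((p >> 16) & 0x000000FFu)    /* move R down to byte 0 */
         | ((p << 16) & 0x00FF0000u);   /* move B up to byte 2 */
}

/* Convert n pixels from src to dst, one pixel per iteration. */
static void convert_span(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = bgra_to_rgba(src[i]);
}
```

For example, bgra_to_rgba(0x11223344) yields 0x11443322: alpha and green stay in place while red and blue trade positions.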

From the data Soreen posted it seems to come down to "how many bytes can
you pull at once", the rest is noise to the PCI latency.

That matches what I saw. I tested both routines (MMX & SSE2) outside Mesa with the source and destination buffers in main memory. Both routines were, unsurprisingly, much faster there. However, I noticed that the SSE2 version took less of a hit than the MMX version when going from video RAM to system RAM.


Here's my question. Is there any way to "trick" it into doing back-to-back reads as a single PCI transfer? So, if I did something like:

        movaps  (%ebx), %xmm0
        movaps  16(%ebx), %xmm1

Would that be done as a single 32-byte PCI transfer? I /assume/ there isn't any way to do so. When I unrolled the inner loop of the SSE2 version once (so that it contained code like the above), the performance increase was on the order of 1%.
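
For readers who want to see the shape of the once-unrolled loop, here is a rough portable C sketch. It is an illustration only, not the Mesa routine: chunk16 and copy_unrolled are hypothetical names, and each 16-byte memcpy merely stands in for one movaps load or store. Nothing in it controls whether the bus combines the two reads; that is decided by the hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte XMM register. */
typedef struct { uint8_t b[16]; } chunk16;

/* Copy nbytes from src to dst, 32 bytes per loop iteration. */
static void copy_unrolled(uint8_t *dst, const uint8_t *src, size_t nbytes)
{
    size_t i = 0;

    for (; i + 32 <= nbytes; i += 32) {
        chunk16 c0, c1;
        memcpy(&c0, src + i, 16);        /* like movaps (%ebx), %xmm0 */
        memcpy(&c1, src + i + 16, 16);   /* like movaps 16(%ebx), %xmm1 */
        memcpy(dst + i, &c0, 16);        /* store first 16 bytes */
        memcpy(dst + i + 16, &c1, 16);   /* store second 16 bytes */
    }

    /* Copy any tail shorter than 32 bytes one byte at a time. */
    for (; i < nbytes; i++)
        dst[i] = src[i];
}
```

The two loads are issued back to back before either store, which is the pattern the question above is asking about.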

