Hi,

On Sat, Oct 15, 2011 at 7:01 AM, Ronald S. Bultje <[email protected]> wrote:
> On Sat, Oct 15, 2011 at 2:53 AM, Loren Merritt <[email protected]> wrote:
>> On Fri, 14 Oct 2011, Ronald S. Bultje wrote:
>>
>>> +    packusdw        m0, m1
>>> +    packusdw        m2, m3
>>
>> sse4
>
> Ah, that's why Kieran's assembly was marked sse4. I'll make an
> sse2 version as well, which then also needs a pmaxsw x, zero.
>
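
For reference, one reading of that sse2 fallback, written as C intrinsics
rather than the yasm used in the patch. The helper name is made up for
illustration, and the sketch assumes the inputs never exceed 32767, which
is the only range where packssdw followed by pmaxsw against zero gives the
same result as packusdw:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: pack eight 32-bit values
 * into eight 16-bit values using SSE2 only. */
static void pack_dwords_sse2(const int32_t in[8], uint16_t out[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)in);
    __m128i hi = _mm_loadu_si128((const __m128i *)(in + 4));

    /* packssdw: signed saturation to [-32768, 32767] */
    __m128i w = _mm_packs_epi32(lo, hi);

    /* pmaxsw x, zero: clamp the negative results to 0 */
    w = _mm_max_epi16(w, _mm_setzero_si128());

    _mm_storeu_si128((__m128i *)out, w);
}
```

Inputs above 32767 come out as 32767 here, whereas packusdw would keep
them up to 65535, so this only substitutes for packusdw when the
intermediate values are known to fit in 15 bits.
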
>> Are things usually unaligned?
>
> No, I'm a little too pessimistic in this patch. In fact, the src in
> this function is always aligned, so these should be mova. I'm not sure
> about dest: in my tests it tends to be aligned, but I don't think the
> API guarantees that. I can test for alignment at function start and
> split the loop into two copies, one for aligned dest and one for
> unaligned dest.

New patch attached. If the aligned memory move is important to you,
I'll add an alignment test on dest for the >=sse2 versions and use a
mova version of the same loop when dest is aligned.

Ronald

Attachment: 0001-swscale-write-yuv2plane1-MMX-SSE2-SSE4-functions.patch