On 12/05/2012 04:46 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Wed, Dec 5, 2012 at 9:53 AM, Justin Ruggles wrote:
>> ---
>> libavutil/x86/float_dsp.asm | 3 +++
>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
>> index 4a1742f..dc75532 100644
>> --- a/libavutil/x86/float_dsp.asm
>> +++ b/libavutil/x86/float_dsp.asm
>> @@ -127,6 +127,9 @@ cglobal vector_dmul_scalar, 3,3,3, dst, src, len
>> cglobal vector_dmul_scalar, 4,4,3, dst, src, mul, len
>> %endif
>> %if ARCH_X86_32
>> +; PROLOGUE loads len from the wrong stack address because mul is an 8-byte
>> +; parameter and PROLOGUE assumes all parameters are 4-byte
>> +mov lenq, [esp+0x18]
>> VBROADCASTSD m0, mulm
>
> That loads len twice. Why not:
>
> %if ARCH_X86_32
> cglobal vector_dmul_scalar, 3,4,3, dst, src, mul, len, lenaddr
> mov lenq, lenaddrm
> %else
> cglobal vector_dmul_scalar, 4,4,3, dst, src, mul, len
> %endif
>
> which is more functionally correct?
That might be ok. At the least, I could reduce the number of loaded
arguments to 3 to avoid the extra incorrect load. I'll test various options.
Thanks,
Justin