Re: [libav-devel] [PATCH 2/2] x86: float_dsp: fix loading of the len parameter on x86-32

2012-12-06 Thread Justin Ruggles
On 12/05/2012 04:46 PM, Ronald S. Bultje wrote:
> Hi,
> 
> On Wed, Dec 5, 2012 at 9:53 AM, Justin Ruggles wrote:
>> ---
>>  libavutil/x86/float_dsp.asm |3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
>> index 4a1742f..dc75532 100644
>> --- a/libavutil/x86/float_dsp.asm
>> +++ b/libavutil/x86/float_dsp.asm
>> @@ -127,6 +127,9 @@ cglobal vector_dmul_scalar, 3,3,3, dst, src, len
>>  cglobal vector_dmul_scalar, 4,4,3, dst, src, mul, len
>>  %endif
>>  %if ARCH_X86_32
>> +; PROLOGUE loads len from the wrong stack address because mul is an 8-byte
>> +; parameter and PROLOGUE assumes all parameters are 4-byte
>> +mov  lenq, [esp+0x18]
>>  VBROADCASTSD   m0, mulm
> 
> That loads len twice. Why not:
> 
> %if ARCH_X86_32
> cglobal vector_dmul_scalar, 3,4,3, dst, src, mul, len, lenaddr
> mov lenq, lenaddrm
> %else
> cglobal vector_dmul_scalar, 4,4,3, dst, src, mul, len
> %endif
> 
> which is more functionally correct?

That might be ok. At least I could reduce the loaded registers to 3 to
avoid the extra incorrect load. I'll test various options.
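
To make the offsets concrete, here is a small C sketch of the arithmetic
(not part of the patch). It assumes the 32-bit cdecl stack layout, an
argument list of (dst, src, mul, len) with mul being a double, and that
PROLOGUE has pushed exactly one 4-byte register before the argument loads
happen. The helper names are made up for illustration.

```c
#include <stddef.h>

/* Stack sizes in bytes of (dst, src, mul, len) on x86-32:
 * two pointers, one double, one int. */
static const size_t arg_size[] = { 4, 4, 8, 4 };

enum {
    RET_ADDR = 4, /* return address pushed by the call instruction   */
    PUSHED   = 4  /* assumption: one callee-saved register was pushed */
};

/* Actual offset from esp of argument n, using the real argument sizes. */
static size_t real_offset(int n)
{
    size_t off = PUSHED + RET_ADDR;
    for (int i = 0; i < n; i++)
        off += arg_size[i];
    return off;
}

/* Offset of argument slot n if every argument is assumed to be 4 bytes,
 * which is the assumption the patch comment attributes to PROLOGUE. */
static size_t assumed_offset(int n)
{
    return PUSHED + RET_ADDR + 4 * (size_t)n;
}
```

Under these assumptions real_offset(3) is 24 (0x18), matching the
[esp+0x18] in the patch, while assumed_offset(3) is 20, which points at
the upper half of mul. assumed_offset(4) is 24 again, which is why naming
a fifth slot (lenaddr) and reading it through lenaddrm lands on len.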

Thanks,
Justin
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH 2/2] x86: float_dsp: fix loading of the len parameter on x86-32

2012-12-05 Thread Ronald S. Bultje
Hi,

On Wed, Dec 5, 2012 at 9:53 AM, Justin Ruggles wrote:
> ---
>  libavutil/x86/float_dsp.asm |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/libavutil/x86/float_dsp.asm b/libavutil/x86/float_dsp.asm
> index 4a1742f..dc75532 100644
> --- a/libavutil/x86/float_dsp.asm
> +++ b/libavutil/x86/float_dsp.asm
> @@ -127,6 +127,9 @@ cglobal vector_dmul_scalar, 3,3,3, dst, src, len
>  cglobal vector_dmul_scalar, 4,4,3, dst, src, mul, len
>  %endif
>  %if ARCH_X86_32
> +; PROLOGUE loads len from the wrong stack address because mul is an 8-byte
> +; parameter and PROLOGUE assumes all parameters are 4-byte
> +mov  lenq, [esp+0x18]
>  VBROADCASTSD   m0, mulm

That loads len twice. Why not:

%if ARCH_X86_32
cglobal vector_dmul_scalar, 3,4,3, dst, src, mul, len, lenaddr
mov lenq, lenaddrm
%else
cglobal vector_dmul_scalar, 4,4,3, dst, src, mul, len
%endif

which is more functionally correct?

Ronald