On Thu, 5 Jan 2012, Vitor Sessak wrote:

>+; input  %1={x1,x2,x3,x4}, %2={y1,y2,y3,y4}
>+; output %3={x4,y1,y2,y3}
>+%macro ROTLEFT_SSE 3
>+    BUILDINVHIGHLOW %1, %2, %3
>+    shufps  %3, %3, %2, 0x99
>+%endmacro
(and other such macros)

If some macro args can be described as output and some as input, then
output should come first, to match the order of instruction arguments.
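For example (a sketch of the signature change only; it assumes
BUILDINVHIGHLOW gets the same output-first reordering, which may not
match your actual helper):

; output %1={x4,y1,y2,y3}
; input  %2={x1,x2,x3,x4}, %3={y1,y2,y3,y4}
%macro ROTLEFT_SSE 3
    BUILDINVHIGHLOW %1, %2, %3
    shufps  %1, %1, %3, 0x99
%endmacro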

>+%macro PSHUFD_SSE_AVX 3
>+    shufps %1, %2, %2, %3
>+%endmacro
>+%macro PSHUFD_SSE2 3
>+    pshufd %1, %2, %3
>+%endmacro

The recommended way to write such things has changed since you previously
posted this patch.

%macro PSHUFD 3
%if cpuflag(sse2) && notcpuflag(avx)
    pshufd %1, %2, %3
%else
    shufps %1, %2, %2, %3
%endif
%endmacro

This eliminates the top-level defines that used to be needed to select
an implementation.
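That is, a per-target selection along the lines of

%define PSHUFD PSHUFD_SSE2 ; or PSHUFD_SSE_AVX

(whatever exact form your patch uses) goes away; callers just write PSHUFD.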

>+%macro SPILL 2 ; xmm#, mempos
>+    movaps [tmpq+(%2-8)*16 + 32*4], m%1
>+%endmacro
>+%macro UNSPILL 2
>+    movaps m%1, [tmpq+(%2-8)*16 + 32*4]
>+%endmacro
>+%define SPILLED(x) [tmpq+(x-8)*16 + 32*4]

Use SPILLED in defining SPILL.
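i.e. something like:

%macro SPILL 2 ; xmm#, mempos
    movaps SPILLED(%2), m%1
%endmacro
%macro UNSPILL 2
    movaps m%1, SPILLED(%2)
%endmacro

so the address arithmetic lives in one place.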

>+%define mova movaps
>+%define movu movups

cglobal undoes this. But it becomes unnecessary with cpuflags if you only
have an sse1 version.

> AVX_INSTR movsd, 1, 0, 0
> AVX_INSTR movss, 1, 0, 0
> AVX_INSTR mpsadbw, 0, 1, 0
>+AVX_INSTR movhlps, 1, 0, 0
>+AVX_INSTR movlhps, 1, 0, 0
> AVX_INSTR mulpd, 1, 0, 1
> AVX_INSTR mulps, 1, 0, 1
> AVX_INSTR mulsd, 1, 0, 1

Alphabetize.
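i.e. the new entries sort in just before movsd:

AVX_INSTR movhlps, 1, 0, 0
AVX_INSTR movlhps, 1, 0, 0
AVX_INSTR movsd, 1, 0, 0
AVX_INSTR movss, 1, 0, 0
AVX_INSTR mpsadbw, 0, 1, 0
AVX_INSTR mulpd, 1, 0, 1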

>int align_end = count - (count & 3);

How much faster is ff_four_imdct36_float_sse? If you have 3 trailing
blocks, should you round up? Caveat: make sure any unused space that is
processed by SIMD float arithmetic contains valid floats, because NaNs
are slow.
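If rounding up is worth it (only valid if the buffer is actually padded
to a multiple of 4 blocks and the padding holds benign floats, e.g.
zeros), it would just be:

int align_end = (count + 3) & ~3; /* trailing 1-3 blocks take the SIMD path too */

Otherwise keep rounding down and let the scalar code handle the tail.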

--Loren Merritt