Hi,

On Sat, Dec 27, 2014 at 11:31 AM, Clément Bœsch <u...@pkh.me> wrote:

> On Sat, Dec 27, 2014 at 11:02:37AM -0500, Ronald S. Bultje wrote:
> > ---
> >  libavcodec/x86/vp9lpf.asm | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > diff --git a/libavcodec/x86/vp9lpf.asm b/libavcodec/x86/vp9lpf.asm
> > index e0f7386..c62ac46 100644
> > --- a/libavcodec/x86/vp9lpf.asm
> > +++ b/libavcodec/x86/vp9lpf.asm
> > @@ -307,7 +307,20 @@ SECTION .text
> >  %endif
> >  %endmacro
> >
> > -%macro LOOPFILTER 2 ; %1=v/h %2=size1
> > +%macro LOOPFILTER 3 ; %1=v/h %2=size1 %3=stack
> > +%if UNIX64
> > +cglobal vp9_loop_filter_%1_%2_16, 5, 9, 16, %3, dst, stride, E, I, H, mstride, dst2, stride3, mstride3
> > +%else
> > +%if WIN64
> > +cglobal vp9_loop_filter_%1_%2_16, 4, 8, 16, %3, dst, stride, E, I, mstride, dst2, stride3, mstride3
> > +%else
> > +cglobal vp9_loop_filter_%1_%2_16, 2, 6, 16, %3, dst, stride, mstride, dst2, stride3, mstride3
> > +%define Ed dword r2m
> > +%define Id dword r3m
> > +%endif
> > +%define Hd dword r4m
>
> So every 32-bit arch end up here, right?

Well, rather, both win64 and x86-32. Unix64 passes the first 6 arguments in
registers, so Hd is already in a register upon function entry; win64 passes
4, so Hd is on the stack; x86-32 passes all arguments on the stack, so
everything is on the stack. There we load only dst/stride and keep the rest
where it is, to avoid using up registers.

> > +%endif
> > +
> >  mov mstrideq, strideq
> >  neg mstrideq
> >
> > @@ -795,10 +808,8 @@ SECTION .text
> >
> >  %macro LPF_16_VH 2
> >  INIT_XMM %2
> > -cglobal vp9_loop_filter_v_%1_16, 5,10,16, dst, stride, E, I, H, mstride, dst2, stride3, mstride3
> > - LOOPFILTER v, %1
> > -cglobal vp9_loop_filter_h_%1_16, 5,10,16, 256, dst, stride, E, I, H, mstride, dst2, stride3, mstride3
> > - LOOPFILTER h, %1
> > +LOOPFILTER v, %1, 0
> > +LOOPFILTER h, %1, 256
>
> Should be OK assuming 0 is indeed the default stack size (x86inc seems to
> suggest it to be set to 16 or 32 somehow).

That's alignment if there's stack usage at all.
0 means no stack usage in the function at all, so we skip allocation, and
none of the internal logic applies.

Ronald
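
For readers following the register-versus-stack point in the reply above, here is a minimal C sketch of where each of the five filter arguments (dst, stride, E, I, H) arrives under the three calling conventions discussed. It is an illustration only: the enum and the function names `reg_args` and `arg_in_register` are hypothetical and are not part of x86inc or the patch. It models only how many leading integer arguments each ABI passes in registers, not which ones `cglobal` actually loads.

```c
/* Hypothetical model, not taken from x86inc or the patch. */
enum abi { ABI_UNIX64, ABI_WIN64, ABI_X86_32 };

/* Number of leading integer/pointer arguments passed in registers. */
static int reg_args(enum abi a)
{
    switch (a) {
    case ABI_UNIX64: return 6; /* System V AMD64: rdi, rsi, rdx, rcx, r8, r9 */
    case ABI_WIN64:  return 4; /* Microsoft x64: rcx, rdx, r8, r9 */
    default:         return 0; /* x86-32 cdecl: every argument is on the stack */
    }
}

/* Returns 1 if the zero-based argument index arrives in a register,
 * 0 if it has to be read from the stack. */
static int arg_in_register(enum abi a, int index)
{
    return index < reg_args(a);
}
```

With the arguments indexed dst=0, stride=1, E=2, I=3, H=4, `arg_in_register(ABI_WIN64, 4)` is 0, which matches the `%define Hd dword r4m` fallback shared by win64 and x86-32, while on x86-32 E and I (indices 2 and 3) are also read from the stack via `r2m` and `r3m`.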