Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> From: Ronald S. Bultje rsbul...@gmail.com
>
> Use this in VP8/H264-8bit loopfilter functions so they can be used if
> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
> ---
>  libavcodec/x86/h264_deblock.asm |  27 ++
>  libavcodec/x86/h264dsp_init.c   |   4 +-
>  libavcodec/x86/vp8dsp.asm       |  68 ---
>  libavcodec/x86/vp8dsp_init.c    |   8 --
>  libavutil/x86/x86inc.asm        | 185
>  5 files changed, 191 insertions(+), 101 deletions(-)

How is this different from the patch you sent yesterday?

--
Måns Rullgård
m...@mansr.com
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Sat, Dec 8, 2012 at 8:41 AM, Måns Rullgård m...@mansr.com wrote:
> Ronald S. Bultje rsbul...@gmail.com writes:
>> From: Ronald S. Bultje rsbul...@gmail.com
>>
>> Use this in VP8/H264-8bit loopfilter functions so they can be used if
>> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
>> ---
>>  libavcodec/x86/h264_deblock.asm |  27 ++
>>  libavcodec/x86/h264dsp_init.c   |   4 +-
>>  libavcodec/x86/vp8dsp.asm       |  68 ---
>>  libavcodec/x86/vp8dsp_init.c    |   8 --
>>  libavutil/x86/x86inc.asm        | 185
>>  5 files changed, 191 insertions(+), 101 deletions(-)
>
> How is this different from the patch you sent yesterday?

It adds a missing %endrep for win64.

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Sat, Dec 8, 2012 at 4:12 PM, Ronald S. Bultje rsbul...@gmail.com wrote:
> From: Ronald S. Bultje rsbul...@gmail.com
>
> Use this in VP8/H264-8bit loopfilter functions so they can be used if
> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
> ---
>  libavcodec/x86/h264_deblock.asm |  27 ++
>  libavcodec/x86/h264dsp_init.c   |   4 +-
>  libavcodec/x86/vp8dsp.asm       |  68 ---
>  libavcodec/x86/vp8dsp_init.c    |   8 --
>  libavutil/x86/x86inc.asm        | 185
>  5 files changed, 191 insertions(+), 101 deletions(-)

One more fix for invalid stack free'ing if a YMM function on win64
used 6 registers, but no stack.

Any more reviews, or can this be applied?

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
On 12/9/12 1:14 AM, Ronald S. Bultje wrote:
> Hi,
>
> On Sat, Dec 8, 2012 at 4:12 PM, Ronald S. Bultje rsbul...@gmail.com wrote:
>> From: Ronald S. Bultje rsbul...@gmail.com
>>
>> Use this in VP8/H264-8bit loopfilter functions so they can be used if
>> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
>> ---
>>  libavcodec/x86/h264_deblock.asm |  27 ++
>>  libavcodec/x86/h264dsp_init.c   |   4 +-
>>  libavcodec/x86/vp8dsp.asm       |  68 ---
>>  libavcodec/x86/vp8dsp_init.c    |   8 --
>>  libavutil/x86/x86inc.asm        | 185
>>  5 files changed, 191 insertions(+), 101 deletions(-)
>
> One more fix for invalid stack free'ing if a YMM function on win64
> used 6 registers, but no stack.
>
> Any more reviews, or can this be applied?

I was waiting for the local x86 experts to chip in. I take it you
tested it on win64 and mac; I can try it on linux if nobody has
already.

lu

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
On 12/09/2012 02:05 AM, Luca Barbato wrote:
> I was waiting for the local x86 experts to chip in. I take it you
> tested it on win64 and mac; I can try it on linux if nobody has
> already.

On amd64 it seems fine as well. I guess it can be pushed tomorrow.

lu

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
On Sat, Dec 08, 2012 at 08:42:36AM -0800, Ronald S. Bultje wrote:
> On Sat, Dec 8, 2012 at 8:41 AM, Måns Rullgård m...@mansr.com wrote:
>> Ronald S. Bultje rsbul...@gmail.com writes:
>>> Use this in VP8/H264-8bit loopfilter functions so they can be used if
>>> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
>>> ---
>>>  libavcodec/x86/h264_deblock.asm |  27 ++
>>>  libavcodec/x86/h264dsp_init.c   |   4 +-
>>>  libavcodec/x86/vp8dsp.asm       |  68 ---
>>>  libavcodec/x86/vp8dsp_init.c    |   8 --
>>>  libavutil/x86/x86inc.asm        | 185
>>>  5 files changed, 191 insertions(+), 101 deletions(-)
>>
>> How is this different from the patch you sent yesterday?
>
> It adds a missing %endrep for win64.

... extra good karma for using --annotate with git-send-email ...

Diego

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> +%if mmsize = 16 HAVE_ALIGNED_STACK

How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
> Ronald S. Bultje rsbul...@gmail.com writes:
>> +%if mmsize = 16 HAVE_ALIGNED_STACK
>
> How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

Well, for now, we still have a ton of functions that don't use the
cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
nothing else.

But anyway, more generally, it's 4-5 instructions per function. For
typical functions with an inner loop, that's negligible, but for a
select small set of functions, it may be significant.

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> Hi,
>
> On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
>> Ronald S. Bultje rsbul...@gmail.com writes:
>>> +%if mmsize = 16 HAVE_ALIGNED_STACK
>>
>> How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?
>
> Well, for now, we still have a ton of functions that don't use the
> cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
> nothing else.
>
> But anyway, more generally, it's 4-5 instructions per function. For
> typical functions with an inner loop, that's negligible, but for a
> select small set of functions, it may be significant.

The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.

Besides, does anyone still use 32-bit where performance is that
critical?

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Fri, Dec 7, 2012 at 2:01 PM, Måns Rullgård m...@mansr.com wrote:
> Ronald S. Bultje rsbul...@gmail.com writes:
>> On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
>>> Ronald S. Bultje rsbul...@gmail.com writes:
>>>> +%if mmsize = 16 HAVE_ALIGNED_STACK
>>>
>>> How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?
>>
>> Well, for now, we still have a ton of functions that don't use the
>> cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
>> nothing else.
>>
>> But anyway, more generally, it's 4-5 instructions per function. For
>> typical functions with an inner loop, that's negligible, but for a
>> select small set of functions, it may be significant.
>
> The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
> ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.

lavr also.

> Besides, does anyone still use 32-bit where performance is that
> critical?

This is used for YMM (e.g. avx float) stack alignment (to 32-byte)
also, so it will affect 64-bit also. My personal point of view is that
the code to take advantage of an actual feature of the compiler/system
(alignment) is there. I don't see why we'd remove it.

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> Hi,
>
> On Fri, Dec 7, 2012 at 2:01 PM, Måns Rullgård m...@mansr.com wrote:
>> Ronald S. Bultje rsbul...@gmail.com writes:
>>> On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
>>>> Ronald S. Bultje rsbul...@gmail.com writes:
>>>>> +%if mmsize = 16 HAVE_ALIGNED_STACK
>>>>
>>>> How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?
>>>
>>> Well, for now, we still have a ton of functions that don't use the
>>> cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
>>> nothing else.
>>>
>>> But anyway, more generally, it's 4-5 instructions per function. For
>>> typical functions with an inner loop, that's negligible, but for a
>>> select small set of functions, it may be significant.
>>
>> The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
>> ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.
>
> lavr also.

There are no references to HAVE_ALIGNED_STACK there.

>> Besides, does anyone still use 32-bit where performance is that
>> critical?
>
> This is used for YMM (e.g. avx float) stack alignment (to 32-byte)
> also, so it will affect 64-bit also. My personal point of view is that
> the code to take advantage of an actual feature of the compiler/system
> (alignment) is there. I don't see why we'd remove it.

Tracking how different compilers align the stack is a pain.

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Fri, Dec 7, 2012 at 2:08 PM, Måns Rullgård m...@mansr.com wrote:
> Ronald S. Bultje rsbul...@gmail.com writes:
>> Hi,
>>
>> On Fri, Dec 7, 2012 at 2:01 PM, Måns Rullgård m...@mansr.com wrote:
>>> Ronald S. Bultje rsbul...@gmail.com writes:
>>>> On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
>>>>> Ronald S. Bultje rsbul...@gmail.com writes:
>>>>>> +%if mmsize = 16 HAVE_ALIGNED_STACK
>>>>>
>>>>> How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?
>>>>
>>>> Well, for now, we still have a ton of functions that don't use the
>>>> cglobal-method of allocating stack. I only ported h264/vp8
>>>> loopfilter, nothing else.
>>>>
>>>> But anyway, more generally, it's 4-5 instructions per function. For
>>>> typical functions with an inner loop, that's negligible, but for a
>>>> select small set of functions, it may be significant.
>>>
>>> The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
>>> ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.
>>
>> lavr also.
>
> There are no references to HAVE_ALIGNED_STACK there.

It crashes on x86-32 icc10.x and msvc.

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> From: Ronald S. Bultje rsbul...@gmail.com
>
> Use this in VP8/H264-8bit loopfilter functions so they can be used if
> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
> ---
>  libavcodec/x86/h264_deblock.asm |  27 ++-
>  libavcodec/x86/h264dsp_init.c   |   4 +-
>  libavcodec/x86/vp8dsp.asm       |  68
>  libavcodec/x86/vp8dsp_init.c    |   8 --
>  libavutil/x86/x86inc.asm        | 167 +---
>  5 files changed, 181 insertions(+), 93 deletions(-)

What happened to this? Is there something wrong with the patch?

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Wed, Oct 24, 2012 at 10:07 AM, Måns Rullgård m...@mansr.com wrote:
> Ronald S. Bultje rsbul...@gmail.com writes:
>> From: Ronald S. Bultje rsbul...@gmail.com
>>
>> Use this in VP8/H264-8bit loopfilter functions so they can be used if
>> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
>> ---
>>  libavcodec/x86/h264_deblock.asm |  27 ++-
>>  libavcodec/x86/h264dsp_init.c   |   4 +-
>>  libavcodec/x86/vp8dsp.asm       |  68
>>  libavcodec/x86/vp8dsp_init.c    |   8 --
>>  libavutil/x86/x86inc.asm        | 167 +---
>>  5 files changed, 181 insertions(+), 93 deletions(-)
>
> What happened to this? Is there something wrong with the patch?

I am addressing reviews from the x264 people and am somewhat slow at
testing new revisions because of other work...

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
On Fri, Oct 05, 2012 at 01:38:44PM -0700, Ronald S. Bultje wrote:
> From: Ronald S. Bultje rsbul...@gmail.com
>
> Use this in VP8/H264-8bit loopfilter functions so they can be used if
> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
> ---
>  libavcodec/x86/h264_deblock.asm |  27 ++-
>  libavcodec/x86/h264dsp_init.c   |   4 +-
>  libavcodec/x86/vp8dsp.asm       |  68 -
>  libavcodec/x86/vp8dsp_init.c    |   8 --
>  libavutil/x86/x86inc.asm        | 160 +---
>  5 files changed, 175 insertions(+), 92 deletions(-)

This has tabs in many places - please fix your editor.

Diego

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Fri, Oct 5, 2012 at 1:46 PM, Diego Biurrun di...@biurrun.de wrote:
> On Fri, Oct 05, 2012 at 01:38:44PM -0700, Ronald S. Bultje wrote:
>> From: Ronald S. Bultje rsbul...@gmail.com
>>
>> Use this in VP8/H264-8bit loopfilter functions so they can be used if
>> there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
>> ---
>>  libavcodec/x86/h264_deblock.asm |  27 ++-
>>  libavcodec/x86/h264dsp_init.c   |   4 +-
>>  libavcodec/x86/vp8dsp.asm       |  68 -
>>  libavcodec/x86/vp8dsp_init.c    |   8 --
>>  libavutil/x86/x86inc.asm        | 160 +---
>>  5 files changed, 175 insertions(+), 92 deletions(-)
>
> This has tabs in many places - please fix your editor.

Fixed.

Ronald

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> Hi,
>
> On Fri, Sep 14, 2012 at 6:19 PM, Måns Rullgård m...@mansr.com wrote:
>> Ronald S. Bultje rsbul...@gmail.com writes:
>>> From: Ronald S. Bultje rsbul...@gmail.com
>>>
>>> Use this in VP8 DSP functions so they can be used if there is no
>>> aligned stack (e.g. MSVC 32bit). Note: it is currently slightly ugly,
>>> we need a register to store the unaligned stack; there may be better
>>> solutions for this. Please comment.
>>
>> You could store the original stack pointer at a fixed offset from the
>> aligned stack pointer.
>
> I actually do that; the problem is that it means I can't directly use
> the original stack arguments (on x86-32: all of them), and I need to
> align stack before loading arguments off the stack (so I can share the
> instructions to reserve stack space with win64 xmm backup), so we're
> stuck in a catch-22 then. I can indeed use the last argument until I
> have loaded the args off the stack (final one clobbering itself by
> loading onto itself), but then I can't use stack arguments in the
> middle of the function (sws uses that in a few places).

Copy the stack arguments to the proper offset on the aligned stack.

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Ronald S. Bultje rsbul...@gmail.com writes:

> From: Ronald S. Bultje rsbul...@gmail.com
>
> Use this in VP8 DSP functions so they can be used if there is no
> aligned stack (e.g. MSVC 32bit). Note: it is currently slightly ugly,
> we need a register to store the unaligned stack; there may be better
> solutions for this. Please comment.

You could store the original stack pointer at a fixed offset from the
aligned stack pointer.

--
Måns Rullgård
m...@mansr.com

Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Hi,

On Fri, Sep 14, 2012 at 6:19 PM, Måns Rullgård m...@mansr.com wrote:
> Ronald S. Bultje rsbul...@gmail.com writes:
>> From: Ronald S. Bultje rsbul...@gmail.com
>>
>> Use this in VP8 DSP functions so they can be used if there is no
>> aligned stack (e.g. MSVC 32bit). Note: it is currently slightly ugly,
>> we need a register to store the unaligned stack; there may be better
>> solutions for this. Please comment.
>
> You could store the original stack pointer at a fixed offset from the
> aligned stack pointer.

I actually do that; the problem is that it means I can't directly use
the original stack arguments (on x86-32: all of them), and I need to
align stack before loading arguments off the stack (so I can share the
instructions to reserve stack space with win64 xmm backup), so we're
stuck in a catch-22 then. I can indeed use the last argument until I
have loaded the args off the stack (final one clobbering itself by
loading onto itself), but then I can't use stack arguments in the
middle of the function (sws uses that in a few places).

Ronald