Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-08 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68 ---
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 185
  5 files changed, 191 insertions(+), 101 deletions(-)

How is this different from the patch you sent yesterday?

-- 
Måns Rullgård
m...@mansr.com
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-08 Thread Ronald S. Bultje
Hi,

On Sat, Dec 8, 2012 at 8:41 AM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68 ---
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 185
  5 files changed, 191 insertions(+), 101 deletions(-)

 How is this different from the patch you sent yesterday?

It adds a missing %endrep for win64.

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-08 Thread Ronald S. Bultje
Hi,

On Sat, Dec 8, 2012 at 4:12 PM, Ronald S. Bultje rsbul...@gmail.com wrote:
 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68 ---
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 185
  5 files changed, 191 insertions(+), 101 deletions(-)

One more fix, for invalid stack freeing when a YMM function on win64
uses 6 registers but no stack. Any more reviews, or can this be
applied?

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-08 Thread Luca Barbato

On 12/9/12 1:14 AM, Ronald S. Bultje wrote:

Hi,

On Sat, Dec 8, 2012 at 4:12 PM, Ronald S. Bultje rsbul...@gmail.com wrote:

From: Ronald S. Bultje rsbul...@gmail.com

Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
---
  libavcodec/x86/h264_deblock.asm |  27 ++
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68 ---
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 185
  5 files changed, 191 insertions(+), 101 deletions(-)


One more fix, for invalid stack freeing when a YMM function on win64
uses 6 registers but no stack. Any more reviews, or can this be
applied?


I was waiting for the local x86 experts to chip in. I take it you tested
it on win64 and Mac? I can try it on Linux if nobody has already.


lu



Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-08 Thread Luca Barbato
On 12/09/2012 02:05 AM, Luca Barbato wrote:
 I was waiting for the local x86 experts to chip in. I take it you tested
 it on win64 and Mac? I can try it on Linux if nobody has already.

It seems fine on amd64 as well.

I guess it can be pushed tomorrow.

lu



Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-08 Thread Diego Biurrun
On Sat, Dec 08, 2012 at 08:42:36AM -0800, Ronald S. Bultje wrote:
 On Sat, Dec 8, 2012 at 8:41 AM, Måns Rullgård m...@mansr.com wrote:
  Ronald S. Bultje rsbul...@gmail.com writes:
  Use this in VP8/H264-8bit loopfilter functions so they can be used if
  there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
  ---
   libavcodec/x86/h264_deblock.asm |  27 ++
   libavcodec/x86/h264dsp_init.c   |   4 +-
   libavcodec/x86/vp8dsp.asm       |  68 ---
   libavcodec/x86/vp8dsp_init.c    |   8 --
   libavutil/x86/x86inc.asm        | 185
   5 files changed, 191 insertions(+), 101 deletions(-)
 
  How is this different from the patch you sent yesterday?
 
 It adds a missing %endrep for win64.

... extra good karma for using --annotate with git-send-email ...

Diego


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-07 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 +%if mmsize = 16  HAVE_ALIGNED_STACK

How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-07 Thread Ronald S. Bultje
Hi,

On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 +%if mmsize = 16  HAVE_ALIGNED_STACK

 How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

Well, for now, we still have a ton of functions that don't use the
cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
nothing else.

But anyway, more generally, it's 4-5 instructions per function. For
typical functions with an inner loop, that's negligible, but for a
select small set of functions, it may be significant.

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-07 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 Hi,

 On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 +%if mmsize = 16  HAVE_ALIGNED_STACK

 How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

 Well, for now, we still have a ton of functions that don't use the
 cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
 nothing else.

 But anyway, more generally, it's 4-5 instructions per function. For
 typical functions with an inner loop, that's negligible, but for a
 select small set of functions, it may be significant.

The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.

Besides, does anyone still use 32-bit where performance is that
critical?

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-07 Thread Ronald S. Bultje
Hi,

On Fri, Dec 7, 2012 at 2:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:
 On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 +%if mmsize = 16  HAVE_ALIGNED_STACK

 How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

 Well, for now, we still have a ton of functions that don't use the
 cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
 nothing else.

 But anyway, more generally, it's 4-5 instructions per function. For
 typical functions with an inner loop, that's negligible, but for a
 select small set of functions, it may be significant.

 The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
 ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.

lavr also.

 Besides, does anyone still use 32-bit where performance is that
 critical?

This is also used for YMM (e.g. AVX float) stack alignment (to 32 bytes),
so it will affect 64-bit as well. My personal point of view is that
the code to take advantage of an actual feature of the compiler/system
(alignment) is there. I don't see why we'd remove it.
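
To illustrate the 32-byte point: a stack pointer that is 16-byte aligned
is not necessarily 32-byte aligned, so a function that spills YMM
registers may still need to realign. The following is only a minimal
sketch in C, not the x86inc code; realign_padding is a hypothetical name
used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of x86inc): number of bytes that must be
 * subtracted from a downward-growing stack pointer `sp` so that it becomes
 * a multiple of `align`. `align` must be a power of two. */
static uintptr_t realign_padding(uintptr_t sp, uintptr_t align)
{
    return sp & (align - 1);
}
```

With these definitions, realign_padding(0x7ff0, 16) is 0 while
realign_padding(0x7ff0, 32) is 16: the same pointer satisfies the SSE
requirement but not the YMM one.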

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-07 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 Hi,

 On Fri, Dec 7, 2012 at 2:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:
 On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 +%if mmsize = 16  HAVE_ALIGNED_STACK

 How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

 Well, for now, we still have a ton of functions that don't use the
 cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
 nothing else.

 But anyway, more generally, it's 4-5 instructions per function. For
 typical functions with an inner loop, that's negligible, but for a
 select small set of functions, it may be significant.

 The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
 ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.

 lavr also.

There are no references to HAVE_ALIGNED_STACK there.

 Besides, does anyone still use 32-bit where performance is that
 critical?

 This is used for YMM (e.g. avx float) stack alignment (to 32-byte)
 also, so it will affect 64-bit also. My personal point of view is that
 the code to take advantage of an actual feature of the compiler/system
 (alignment) is there. I don't see why we'd remove it.

Tracking how different compilers align the stack is a pain.

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-12-07 Thread Ronald S. Bultje
Hi,

On Fri, Dec 7, 2012 at 2:08 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 Hi,

 On Fri, Dec 7, 2012 at 2:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:
 On Fri, Dec 7, 2012 at 1:01 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 +%if mmsize = 16  HAVE_ALIGNED_STACK

 How much overhead would it be to drop HAVE_ALIGNED_STACK entirely?

 Well, for now, we still have a ton of functions that don't use the
 cglobal-method of allocating stack. I only ported h264/vp8 loopfilter,
 nothing else.

 But anyway, more generally, it's 4-5 instructions per function. For
 typical functions with an inner loop, that's negligible, but for a
 select small set of functions, it may be significant.

 The remaining functions are ff_h264_idct8_add(4)_10_{sse2,avx},
 ff_hadamard8_diff(16)_{sse2,ssse3}, and something in swscale.

 lavr also.

 There are no references to HAVE_ALIGNED_STACK there.

It crashes on x86-32 icc10.x and msvc.

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-10-24 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++-
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 167 +---
  5 files changed, 181 insertions(+), 93 deletions(-)

What happened to this?  Is there something wrong with the patch?

-- 
Måns Rullgård
m...@mansr.com
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-10-24 Thread Ronald S. Bultje
Hi,

On Wed, Oct 24, 2012 at 10:07 AM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++-
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 167 +---
  5 files changed, 181 insertions(+), 93 deletions(-)

 What happened to this?  Is there something wrong with the patch?

I am addressing reviews from the x264 people and am somewhat slow at
testing new revisions because of other work...

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-10-05 Thread Diego Biurrun
On Fri, Oct 05, 2012 at 01:38:44PM -0700, Ronald S. Bultje wrote:
 From: Ronald S. Bultje rsbul...@gmail.com
 
 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++-
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68 -
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 160 +---
  5 files changed, 175 insertions(+), 92 deletions(-)

This has tabs in many places - please fix your editor.

Diego


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-10-05 Thread Ronald S. Bultje
Hi,

On Fri, Oct 5, 2012 at 1:46 PM, Diego Biurrun di...@biurrun.de wrote:
 On Fri, Oct 05, 2012 at 01:38:44PM -0700, Ronald S. Bultje wrote:
 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8/H264-8bit loopfilter functions so they can be used if
 there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
 ---
  libavcodec/x86/h264_deblock.asm |  27 ++-
  libavcodec/x86/h264dsp_init.c   |   4 +-
  libavcodec/x86/vp8dsp.asm       |  68 -
  libavcodec/x86/vp8dsp_init.c    |   8 --
  libavutil/x86/x86inc.asm        | 160 +---
  5 files changed, 175 insertions(+), 92 deletions(-)

 This has tabs in many places - please fix your editor.

Fixed.

Ronald


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-09-15 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 Hi,

 On Fri, Sep 14, 2012 at 6:19 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8 DSP functions so they can be used if there is no
 aligned stack (e.g. MSVC 32bit).

 Note: it is currently slightly ugly, we need a register to store the
 unaligned stack; there may be better solutions for this. Please comment.

 You could store the original stack pointer at a fixed offset from the
 aligned stack pointer.

 I actually do that; the problem is that it means I can't directly use
 the original stack arguments (on x86-32: all of them), and I need to
 align stack before loading arguments off the stack (so I can share the
 instructions to reserve stack space with win64 xmm backup), so we're
 stuck in a catch-22 then.

 I can indeed use the last argument until I have loaded the args off
 the stack (final one clobbering itself by loading onto itself), but
 then I can't use stack arguments in the middle of the function (sws
 uses that in a few places).

Copy the stack arguments to the proper offset on the aligned stack.
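
A rough sketch of what that copy could look like, written in C over plain
byte buffers rather than in x86inc assembly. The names copy_stack_args,
load_arg and ARG_SLOT, and the choice of a 4-byte slot (x86-32), are
assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ARG_SLOT = 4 }; /* assumed: x86-32 stack arguments occupy 4 bytes each */

/* Hypothetical helper: copy `nargs` stack arguments from the caller's frame
 * (starting at `orig_args`) into the realigned frame, at a fixed byte offset
 * `arg_off` from the aligned stack pointer. */
static void copy_stack_args(unsigned char *aligned_sp, size_t arg_off,
                            const unsigned char *orig_args, size_t nargs)
{
    memcpy(aligned_sp + arg_off, orig_args, nargs * ARG_SLOT);
}

/* Load argument `i` using only the aligned stack pointer, so the original
 * (unaligned) pointer is no longer needed in the middle of the function. */
static uint32_t load_arg(const unsigned char *aligned_sp, size_t arg_off,
                         size_t i)
{
    uint32_t v;
    memcpy(&v, aligned_sp + arg_off + i * ARG_SLOT, sizeof v);
    return v;
}
```

The cost is one load and one store per copied argument in the prologue; in
return every argument stays addressable at a constant offset from the
aligned pointer for the rest of the function.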

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-09-14 Thread Måns Rullgård
Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8 DSP functions so they can be used if there is no
 aligned stack (e.g. MSVC 32bit).

 Note: it is currently slightly ugly, we need a register to store the
 unaligned stack; there may be better solutions for this. Please comment.

You could store the original stack pointer at a fixed offset from the
aligned stack pointer.
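
As a minimal sketch of this suggestion, again in C rather than x86inc
assembly: reserve the scratch area plus one pointer-sized slot, round the
result down to the required alignment, and store the original pointer in
the slot just above the scratch area. enter_aligned and leave_aligned are
hypothetical names, and the layout is an assumption for illustration, not
what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical prologue step: given the current (possibly unaligned) stack
 * pointer `sp` of a downward-growing stack, reserve `size` bytes of scratch
 * space aligned to `align` (a power of two) and save `sp` at the fixed
 * offset `size` from the returned aligned pointer. */
static uintptr_t enter_aligned(uintptr_t sp, size_t size, uintptr_t align)
{
    uintptr_t aligned = (sp - size - sizeof(uintptr_t)) & ~(align - 1);
    memcpy((void *)(aligned + size), &sp, sizeof sp);
    return aligned;
}

/* Hypothetical epilogue step: recover the original stack pointer from its
 * fixed slot, without dedicating a register to it. */
static uintptr_t leave_aligned(uintptr_t aligned, size_t size)
{
    uintptr_t sp;
    memcpy(&sp, (void *)(aligned + size), sizeof sp);
    return sp;
}
```

Because the slot sits at a constant offset from the aligned pointer, the
epilogue needs a single load to restore the stack; no register has to hold
the unaligned value across the function body.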

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH] x86inc: support stack mem allocation and re-alignment in PROLOGUE.

2012-09-14 Thread Ronald S. Bultje
Hi,

On Fri, Sep 14, 2012 at 6:19 PM, Måns Rullgård m...@mansr.com wrote:
 Ronald S. Bultje rsbul...@gmail.com writes:

 From: Ronald S. Bultje rsbul...@gmail.com

 Use this in VP8 DSP functions so they can be used if there is no
 aligned stack (e.g. MSVC 32bit).

 Note: it is currently slightly ugly, we need a register to store the
 unaligned stack; there may be better solutions for this. Please comment.

 You could store the original stack pointer at a fixed offset from the
 aligned stack pointer.

I actually do that; the problem is that it means I can't directly use
the original stack arguments (on x86-32: all of them), and I need to
align stack before loading arguments off the stack (so I can share the
instructions to reserve stack space with win64 xmm backup), so we're
stuck in a catch-22 then.

I can indeed use the last argument until I have loaded the args off
the stack (final one clobbering itself by loading onto itself), but
then I can't use stack arguments in the middle of the function (sws
uses that in a few places).

Ronald