Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-15 Thread Martin Storsjö

On Wed, 15 Jun 2016, Janne Grunau wrote:


On 2016-06-13 21:04:15 +0300, Martin Storsjö wrote:

On Mon, 13 Jun 2016, Diego Biurrun wrote:


This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
---

Added missing changes for h264idct.

libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
libavcodec/arm/h264dsp_init_arm.c | 37 +++
libavcodec/h264_mb.c  |  6 ++--
libavcodec/h264_mb_template.c |  2 +-
libavcodec/h264addpx_template.c   |  4 +--
libavcodec/h264dsp.h  | 49 ---
libavcodec/h264dsp_template.c | 36 +++
libavcodec/h264idct.h | 19 ++--
libavcodec/h264idct_template.c| 30 ---
libavcodec/ppc/h264dsp.c  | 31 +--
libavcodec/x86/h264_deblock.asm   | 20 ++---
libavcodec/x86/h264_deblock_10bit.asm | 14 -
libavcodec/x86/h264_idct.asm  | 48 +++---
libavcodec/x86/h264_idct_10bit.asm| 18 ++--
libavcodec/x86/h264_weight.asm|  4 +--
libavcodec/x86/h264_weight_10bit.asm  |  4 +--
libavcodec/x86/h264dsp_init.c | 22 +++---
tests/checkasm/h264dsp.c  |  4 +--
18 files changed, 199 insertions(+), 186 deletions(-)


This doesn't do the actual cleanup of the asm that I had expected.

For x86_64, you should remove the corresponding movsxd/movsxdifnidn, for
aarch64, you need to change use of wN (e.g. w3) into xN (e.g. x4) for those
parameters. If not, we're basically clipping the variable and just reading
the lower 32 bit, which is just about as wrong as what we did before
(although in practice it will probably be fine).

Do we have a policy about how to do this, is asm cleanup later ok? What does
e.g. James and Janne think about that for x86 and aarch64 respectively?


I'd prefer corrosponding chnages to C and asm code in the same commit


Also, this lacks the commit message that we discussed, pointing out
that
h264idct was lacking the necessary sign extension before.


I think we should fix the x86 asm first and then change the stride type


That's my opinion as well, first fixing the few omissions in the currently 
almost-consistent code, then later changing it to an (hopefully) easier 
version.


// Martin
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-14 Thread Janne Grunau
On 2016-06-13 21:04:15 +0300, Martin Storsjö wrote:
> On Mon, 13 Jun 2016, Diego Biurrun wrote:
> 
> >This avoids SIMD-optimized functions having to sign-extend their
> >line size argument manually to be able to do pointer arithmetic.
> >---
> >
> >Added missing changes for h264idct.
> >
> >libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
> >libavcodec/arm/h264dsp_init_arm.c | 37 +++
> >libavcodec/h264_mb.c  |  6 ++--
> >libavcodec/h264_mb_template.c |  2 +-
> >libavcodec/h264addpx_template.c   |  4 +--
> >libavcodec/h264dsp.h  | 49 
> >---
> >libavcodec/h264dsp_template.c | 36 +++
> >libavcodec/h264idct.h | 19 ++--
> >libavcodec/h264idct_template.c| 30 ---
> >libavcodec/ppc/h264dsp.c  | 31 +--
> >libavcodec/x86/h264_deblock.asm   | 20 ++---
> >libavcodec/x86/h264_deblock_10bit.asm | 14 -
> >libavcodec/x86/h264_idct.asm  | 48 +++---
> >libavcodec/x86/h264_idct_10bit.asm| 18 ++--
> >libavcodec/x86/h264_weight.asm|  4 +--
> >libavcodec/x86/h264_weight_10bit.asm  |  4 +--
> >libavcodec/x86/h264dsp_init.c | 22 +++---
> >tests/checkasm/h264dsp.c  |  4 +--
> >18 files changed, 199 insertions(+), 186 deletions(-)
> 
> This doesn't do the actual cleanup of the asm that I had expected.
> 
> For x86_64, you should remove the corresponding movsxd/movsxdifnidn, for
> aarch64, you need to change use of wN (e.g. w3) into xN (e.g. x4) for those
> parameters. If not, we're basically clipping the variable and just reading
> the lower 32 bit, which is just about as wrong as what we did before
> (although in practice it will probably be fine).
> 
> Do we have a policy about how to do this, is asm cleanup later ok? What does
> e.g. James and Janne think about that for x86 and aarch64 respectively?

I'd prefer corrosponding chnages to C and asm code in the same commit

> Also, this lacks the commit message that we discussed, pointing out 
> that
> h264idct was lacking the necessary sign extension before.

I think we should fix the x86 asm first and then change the stride type

Janne
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-14 Thread Diego Biurrun
On Mon, Jun 13, 2016 at 09:04:15PM +0300, Martin Storsjö wrote:
> On Mon, 13 Jun 2016, Diego Biurrun wrote:
> > This avoids SIMD-optimized functions having to sign-extend their
> > line size argument manually to be able to do pointer arithmetic.
> > ---
> >
> > Added missing changes for h264idct.
> >
> > libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
> > libavcodec/arm/h264dsp_init_arm.c | 37 +++
> > libavcodec/h264_mb.c  |  6 ++--
> > libavcodec/h264_mb_template.c |  2 +-
> > libavcodec/h264addpx_template.c   |  4 +--
> > libavcodec/h264dsp.h  | 49 
> > ---
> > libavcodec/h264dsp_template.c | 36 +++
> > libavcodec/h264idct.h | 19 ++--
> > libavcodec/h264idct_template.c| 30 ---
> > libavcodec/ppc/h264dsp.c  | 31 +--
> > libavcodec/x86/h264_deblock.asm   | 20 ++---
> > libavcodec/x86/h264_deblock_10bit.asm | 14 -
> > libavcodec/x86/h264_idct.asm  | 48 
> > +++---
> > libavcodec/x86/h264_idct_10bit.asm| 18 ++--
> > libavcodec/x86/h264_weight.asm|  4 +--
> > libavcodec/x86/h264_weight_10bit.asm  |  4 +--
> > libavcodec/x86/h264dsp_init.c | 22 +++---
> > tests/checkasm/h264dsp.c  |  4 +--
> > 18 files changed, 199 insertions(+), 186 deletions(-)
> 
> This doesn't do the actual cleanup of the asm that I had expected.

Yes, I know.  The more pressing problem, however, is that this patch
fails checkasm on power7.  Until that is resolved, this patch is on hold
anyway.

Diego
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel

Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-13 Thread Martin Storsjö

On Mon, 13 Jun 2016, Diego Biurrun wrote:


This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
---

Added missing changes for h264idct.

libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
libavcodec/arm/h264dsp_init_arm.c | 37 +++
libavcodec/h264_mb.c  |  6 ++--
libavcodec/h264_mb_template.c |  2 +-
libavcodec/h264addpx_template.c   |  4 +--
libavcodec/h264dsp.h  | 49 ---
libavcodec/h264dsp_template.c | 36 +++
libavcodec/h264idct.h | 19 ++--
libavcodec/h264idct_template.c| 30 ---
libavcodec/ppc/h264dsp.c  | 31 +--
libavcodec/x86/h264_deblock.asm   | 20 ++---
libavcodec/x86/h264_deblock_10bit.asm | 14 -
libavcodec/x86/h264_idct.asm  | 48 +++---
libavcodec/x86/h264_idct_10bit.asm| 18 ++--
libavcodec/x86/h264_weight.asm|  4 +--
libavcodec/x86/h264_weight_10bit.asm  |  4 +--
libavcodec/x86/h264dsp_init.c | 22 +++---
tests/checkasm/h264dsp.c  |  4 +--
18 files changed, 199 insertions(+), 186 deletions(-)


This doesn't do the actual cleanup of the asm that I had expected.

For x86_64, you should remove the corresponding movsxd/movsxdifnidn, for 
aarch64, you need to change use of wN (e.g. w3) into xN (e.g. x4) for 
those parameters. If not, we're basically clipping the variable and just 
reading the lower 32 bit, which is just about as wrong as what we did 
before (although in practice it will probably be fine).


Do we have a policy about how to do this, is asm cleanup later ok? What 
does e.g. James and Janne think about that for x86 and aarch64 
respectively?


Also, this lacks the commit message that we discussed, pointing out that 
h264idct was lacking the necessary sign extension before.


// Martin
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-13 Thread Vittorio Giovara
On Mon, Jun 13, 2016 at 1:29 PM, Diego Biurrun  wrote:
> This avoids SIMD-optimized functions having to sign-extend their
> line size argument manually to be able to do pointer arithmetic.
> ---

ok if oracle is happy
-- 
Vittorio
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


[libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-13 Thread Diego Biurrun
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
---

Added missing changes for h264idct.

 libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
 libavcodec/arm/h264dsp_init_arm.c | 37 +++
 libavcodec/h264_mb.c  |  6 ++--
 libavcodec/h264_mb_template.c |  2 +-
 libavcodec/h264addpx_template.c   |  4 +--
 libavcodec/h264dsp.h  | 49 ---
 libavcodec/h264dsp_template.c | 36 +++
 libavcodec/h264idct.h | 19 ++--
 libavcodec/h264idct_template.c| 30 ---
 libavcodec/ppc/h264dsp.c  | 31 +--
 libavcodec/x86/h264_deblock.asm   | 20 ++---
 libavcodec/x86/h264_deblock_10bit.asm | 14 -
 libavcodec/x86/h264_idct.asm  | 48 +++---
 libavcodec/x86/h264_idct_10bit.asm| 18 ++--
 libavcodec/x86/h264_weight.asm|  4 +--
 libavcodec/x86/h264_weight_10bit.asm  |  4 +--
 libavcodec/x86/h264dsp_init.c | 22 +++---
 tests/checkasm/h264dsp.c  |  4 +--
 18 files changed, 199 insertions(+), 186 deletions(-)

diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c 
b/libavcodec/aarch64/h264dsp_init_aarch64.c
index b106f11..25acf00 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -18,6 +18,7 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include 
 #include 
 
 #include "libavutil/attributes.h"
@@ -25,48 +26,48 @@
 #include "libavutil/aarch64/cpu.h"
 #include "libavcodec/h264dsp.h"
 
-void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
  int beta, int8_t *tc0);
-void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
  int beta, int8_t *tc0);
-void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int 
alpha,
int beta, int8_t *tc0);
-void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int 
alpha,
int beta, int8_t *tc0);
 
-void ff_weight_h264_pixels_16_neon(uint8_t *dst, int stride, int height,
+void ff_weight_h264_pixels_16_neon(uint8_t *dst, ptrdiff_t stride, int height,
int log2_den, int weight, int offset);
-void ff_weight_h264_pixels_8_neon(uint8_t *dst, int stride, int height,
+void ff_weight_h264_pixels_8_neon(uint8_t *dst, ptrdiff_t stride, int height,
   int log2_den, int weight, int offset);
-void ff_weight_h264_pixels_4_neon(uint8_t *dst, int stride, int height,
+void ff_weight_h264_pixels_4_neon(uint8_t *dst, ptrdiff_t stride, int height,
   int log2_den, int weight, int offset);
 
-void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, int stride,
+void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride,
  int height, int log2_den, int weightd,
  int weights, int offset);
-void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, int stride,
+void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride,
 int height, int log2_den, int weightd,
 int weights, int offset);
-void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, int stride,
+void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride,
 int height, int log2_den, int weightd,
 int weights, int offset);
 
-void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride);
-void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride);
+void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride);
 void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset,
- int16_t *block, int stride,
+ int16_t *block, ptrdiff_t stride,
  const uint8_t nnzc[6*8]);
 void ff_h264_idct_add16intra_neon(uint8_t *dst, const int *block_offset,
-  int16_t *block, int 

Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-13 Thread Luca Barbato
On 13/06/16 18:23, Diego Biurrun wrote:
> This avoids SIMD-optimized functions having to sign-extend their
> line size argument manually to be able to do pointer arithmetic.
> ---
>  libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
>  libavcodec/arm/h264dsp_init_arm.c | 37 +++
>  libavcodec/h264_mb.c  |  6 ++--
>  libavcodec/h264_mb_template.c |  2 +-
>  libavcodec/h264dsp.h  | 49 
> ---
>  libavcodec/h264dsp_template.c | 36 +++
>  libavcodec/ppc/h264dsp.c  | 31 +--
>  libavcodec/x86/h264_deblock.asm   | 20 ++---
>  libavcodec/x86/h264_deblock_10bit.asm | 14 -
>  libavcodec/x86/h264_idct.asm  | 48 +++---
>  libavcodec/x86/h264_idct_10bit.asm| 18 ++--
>  libavcodec/x86/h264_weight.asm|  4 +--
>  libavcodec/x86/h264_weight_10bit.asm  |  4 +--
>  libavcodec/x86/h264dsp_init.c | 22 +++---
>  tests/checkasm/h264dsp.c  |  4 +--
>  15 files changed, 168 insertions(+), 164 deletions(-)

Sure.

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


[libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t

2016-06-13 Thread Diego Biurrun
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
---
 libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++
 libavcodec/arm/h264dsp_init_arm.c | 37 +++
 libavcodec/h264_mb.c  |  6 ++--
 libavcodec/h264_mb_template.c |  2 +-
 libavcodec/h264dsp.h  | 49 ---
 libavcodec/h264dsp_template.c | 36 +++
 libavcodec/ppc/h264dsp.c  | 31 +--
 libavcodec/x86/h264_deblock.asm   | 20 ++---
 libavcodec/x86/h264_deblock_10bit.asm | 14 -
 libavcodec/x86/h264_idct.asm  | 48 +++---
 libavcodec/x86/h264_idct_10bit.asm| 18 ++--
 libavcodec/x86/h264_weight.asm|  4 +--
 libavcodec/x86/h264_weight_10bit.asm  |  4 +--
 libavcodec/x86/h264dsp_init.c | 22 +++---
 tests/checkasm/h264dsp.c  |  4 +--
 15 files changed, 168 insertions(+), 164 deletions(-)

diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c 
b/libavcodec/aarch64/h264dsp_init_aarch64.c
index b106f11..25acf00 100644
--- a/libavcodec/aarch64/h264dsp_init_aarch64.c
+++ b/libavcodec/aarch64/h264dsp_init_aarch64.c
@@ -18,6 +18,7 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include 
 #include 
 
 #include "libavutil/attributes.h"
@@ -25,48 +26,48 @@
 #include "libavutil/aarch64/cpu.h"
 #include "libavcodec/h264dsp.h"
 
-void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
  int beta, int8_t *tc0);
-void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha,
  int beta, int8_t *tc0);
-void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int 
alpha,
int beta, int8_t *tc0);
-void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha,
+void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int 
alpha,
int beta, int8_t *tc0);
 
-void ff_weight_h264_pixels_16_neon(uint8_t *dst, int stride, int height,
+void ff_weight_h264_pixels_16_neon(uint8_t *dst, ptrdiff_t stride, int height,
int log2_den, int weight, int offset);
-void ff_weight_h264_pixels_8_neon(uint8_t *dst, int stride, int height,
+void ff_weight_h264_pixels_8_neon(uint8_t *dst, ptrdiff_t stride, int height,
   int log2_den, int weight, int offset);
-void ff_weight_h264_pixels_4_neon(uint8_t *dst, int stride, int height,
+void ff_weight_h264_pixels_4_neon(uint8_t *dst, ptrdiff_t stride, int height,
   int log2_den, int weight, int offset);
 
-void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, int stride,
+void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride,
  int height, int log2_den, int weightd,
  int weights, int offset);
-void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, int stride,
+void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride,
 int height, int log2_den, int weightd,
 int weights, int offset);
-void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, int stride,
+void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t 
stride,
 int height, int log2_den, int weightd,
 int weights, int offset);
 
-void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride);
-void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride);
+void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride);
+void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride);
 void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset,
- int16_t *block, int stride,
+ int16_t *block, ptrdiff_t stride,
  const uint8_t nnzc[6*8]);
 void ff_h264_idct_add16intra_neon(uint8_t *dst, const int *block_offset,
-  int16_t *block, int stride,
+  int16_t *block, ptrdiff_t stride,
   const uint8_t nnzc[6*8]);
 void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset,
-