Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
On Wed, 15 Jun 2016, Janne Grunau wrote: On 2016-06-13 21:04:15 +0300, Martin Storsjö wrote: On Mon, 13 Jun 2016, Diego Biurrun wrote: This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. --- Added missing changes for h264idct. libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ libavcodec/arm/h264dsp_init_arm.c | 37 +++ libavcodec/h264_mb.c | 6 ++-- libavcodec/h264_mb_template.c | 2 +- libavcodec/h264addpx_template.c | 4 +-- libavcodec/h264dsp.h | 49 --- libavcodec/h264dsp_template.c | 36 +++ libavcodec/h264idct.h | 19 ++-- libavcodec/h264idct_template.c| 30 --- libavcodec/ppc/h264dsp.c | 31 +-- libavcodec/x86/h264_deblock.asm | 20 ++--- libavcodec/x86/h264_deblock_10bit.asm | 14 - libavcodec/x86/h264_idct.asm | 48 +++--- libavcodec/x86/h264_idct_10bit.asm| 18 ++-- libavcodec/x86/h264_weight.asm| 4 +-- libavcodec/x86/h264_weight_10bit.asm | 4 +-- libavcodec/x86/h264dsp_init.c | 22 +++--- tests/checkasm/h264dsp.c | 4 +-- 18 files changed, 199 insertions(+), 186 deletions(-) This doesn't do the actual cleanup of the asm that I had expected. For x86_64, you should remove the corresponding movsxd/movsxdifnidn, for aarch64, you need to change use of wN (e.g. w3) into xN (e.g. x4) for those parameters. If not, we're basically clipping the variable and just reading the lower 32 bit, which is just about as wrong as what we did before (although in practice it will probably be fine). Do we have a policy about how to do this, is asm cleanup later ok? What does e.g. James and Janne think about that for x86 and aarch64 respectively? I'd prefer corrosponding chnages to C and asm code in the same commit Also, this lacks the commit message that we discussed, pointing out that h264idct was lacking the necessary sign extension before. I think we should fix the x86 asm first and then change the stride type That's my opinion as well, first fixing the few omissions in the currently almost-consistent code, then later changing it to an (hopefully) easier version. // Martin ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
On 2016-06-13 21:04:15 +0300, Martin Storsjö wrote: > On Mon, 13 Jun 2016, Diego Biurrun wrote: > > >This avoids SIMD-optimized functions having to sign-extend their > >line size argument manually to be able to do pointer arithmetic. > >--- > > > >Added missing changes for h264idct. > > > >libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ > >libavcodec/arm/h264dsp_init_arm.c | 37 +++ > >libavcodec/h264_mb.c | 6 ++-- > >libavcodec/h264_mb_template.c | 2 +- > >libavcodec/h264addpx_template.c | 4 +-- > >libavcodec/h264dsp.h | 49 > >--- > >libavcodec/h264dsp_template.c | 36 +++ > >libavcodec/h264idct.h | 19 ++-- > >libavcodec/h264idct_template.c| 30 --- > >libavcodec/ppc/h264dsp.c | 31 +-- > >libavcodec/x86/h264_deblock.asm | 20 ++--- > >libavcodec/x86/h264_deblock_10bit.asm | 14 - > >libavcodec/x86/h264_idct.asm | 48 +++--- > >libavcodec/x86/h264_idct_10bit.asm| 18 ++-- > >libavcodec/x86/h264_weight.asm| 4 +-- > >libavcodec/x86/h264_weight_10bit.asm | 4 +-- > >libavcodec/x86/h264dsp_init.c | 22 +++--- > >tests/checkasm/h264dsp.c | 4 +-- > >18 files changed, 199 insertions(+), 186 deletions(-) > > This doesn't do the actual cleanup of the asm that I had expected. > > For x86_64, you should remove the corresponding movsxd/movsxdifnidn, for > aarch64, you need to change use of wN (e.g. w3) into xN (e.g. x4) for those > parameters. If not, we're basically clipping the variable and just reading > the lower 32 bit, which is just about as wrong as what we did before > (although in practice it will probably be fine). > > Do we have a policy about how to do this, is asm cleanup later ok? What does > e.g. James and Janne think about that for x86 and aarch64 respectively? I'd prefer corrosponding chnages to C and asm code in the same commit > Also, this lacks the commit message that we discussed, pointing out > that > h264idct was lacking the necessary sign extension before. I think we should fix the x86 asm first and then change the stride type Janne ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
On Mon, Jun 13, 2016 at 09:04:15PM +0300, Martin Storsjö wrote: > On Mon, 13 Jun 2016, Diego Biurrun wrote: > > This avoids SIMD-optimized functions having to sign-extend their > > line size argument manually to be able to do pointer arithmetic. > > --- > > > > Added missing changes for h264idct. > > > > libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ > > libavcodec/arm/h264dsp_init_arm.c | 37 +++ > > libavcodec/h264_mb.c | 6 ++-- > > libavcodec/h264_mb_template.c | 2 +- > > libavcodec/h264addpx_template.c | 4 +-- > > libavcodec/h264dsp.h | 49 > > --- > > libavcodec/h264dsp_template.c | 36 +++ > > libavcodec/h264idct.h | 19 ++-- > > libavcodec/h264idct_template.c| 30 --- > > libavcodec/ppc/h264dsp.c | 31 +-- > > libavcodec/x86/h264_deblock.asm | 20 ++--- > > libavcodec/x86/h264_deblock_10bit.asm | 14 - > > libavcodec/x86/h264_idct.asm | 48 > > +++--- > > libavcodec/x86/h264_idct_10bit.asm| 18 ++-- > > libavcodec/x86/h264_weight.asm| 4 +-- > > libavcodec/x86/h264_weight_10bit.asm | 4 +-- > > libavcodec/x86/h264dsp_init.c | 22 +++--- > > tests/checkasm/h264dsp.c | 4 +-- > > 18 files changed, 199 insertions(+), 186 deletions(-) > > This doesn't do the actual cleanup of the asm that I had expected. Yes, I know. The more pressing problem, however, is that this patch fails checkasm on power7. Until that is resolved, this patch is on hold anyway. Diego ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
On Mon, 13 Jun 2016, Diego Biurrun wrote: This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. --- Added missing changes for h264idct. libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ libavcodec/arm/h264dsp_init_arm.c | 37 +++ libavcodec/h264_mb.c | 6 ++-- libavcodec/h264_mb_template.c | 2 +- libavcodec/h264addpx_template.c | 4 +-- libavcodec/h264dsp.h | 49 --- libavcodec/h264dsp_template.c | 36 +++ libavcodec/h264idct.h | 19 ++-- libavcodec/h264idct_template.c| 30 --- libavcodec/ppc/h264dsp.c | 31 +-- libavcodec/x86/h264_deblock.asm | 20 ++--- libavcodec/x86/h264_deblock_10bit.asm | 14 - libavcodec/x86/h264_idct.asm | 48 +++--- libavcodec/x86/h264_idct_10bit.asm| 18 ++-- libavcodec/x86/h264_weight.asm| 4 +-- libavcodec/x86/h264_weight_10bit.asm | 4 +-- libavcodec/x86/h264dsp_init.c | 22 +++--- tests/checkasm/h264dsp.c | 4 +-- 18 files changed, 199 insertions(+), 186 deletions(-) This doesn't do the actual cleanup of the asm that I had expected. For x86_64, you should remove the corresponding movsxd/movsxdifnidn, for aarch64, you need to change use of wN (e.g. w3) into xN (e.g. x4) for those parameters. If not, we're basically clipping the variable and just reading the lower 32 bit, which is just about as wrong as what we did before (although in practice it will probably be fine). Do we have a policy about how to do this, is asm cleanup later ok? What does e.g. James and Janne think about that for x86 and aarch64 respectively? Also, this lacks the commit message that we discussed, pointing out that h264idct was lacking the necessary sign extension before. // Martin ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
On Mon, Jun 13, 2016 at 1:29 PM, Diego Biurrunwrote: > This avoids SIMD-optimized functions having to sign-extend their > line size argument manually to be able to do pointer arithmetic. > --- ok if oracle is happy -- Vittorio ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. --- Added missing changes for h264idct. libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ libavcodec/arm/h264dsp_init_arm.c | 37 +++ libavcodec/h264_mb.c | 6 ++-- libavcodec/h264_mb_template.c | 2 +- libavcodec/h264addpx_template.c | 4 +-- libavcodec/h264dsp.h | 49 --- libavcodec/h264dsp_template.c | 36 +++ libavcodec/h264idct.h | 19 ++-- libavcodec/h264idct_template.c| 30 --- libavcodec/ppc/h264dsp.c | 31 +-- libavcodec/x86/h264_deblock.asm | 20 ++--- libavcodec/x86/h264_deblock_10bit.asm | 14 - libavcodec/x86/h264_idct.asm | 48 +++--- libavcodec/x86/h264_idct_10bit.asm| 18 ++-- libavcodec/x86/h264_weight.asm| 4 +-- libavcodec/x86/h264_weight_10bit.asm | 4 +-- libavcodec/x86/h264dsp_init.c | 22 +++--- tests/checkasm/h264dsp.c | 4 +-- 18 files changed, 199 insertions(+), 186 deletions(-) diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c index b106f11..25acf00 100644 --- a/libavcodec/aarch64/h264dsp_init_aarch64.c +++ b/libavcodec/aarch64/h264dsp_init_aarch64.c @@ -18,6 +18,7 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include #include #include "libavutil/attributes.h" @@ -25,48 +26,48 @@ #include "libavutil/aarch64/cpu.h" #include "libavcodec/h264dsp.h" -void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_weight_h264_pixels_16_neon(uint8_t *dst, int stride, int height, +void ff_weight_h264_pixels_16_neon(uint8_t *dst, ptrdiff_t stride, int height, int log2_den, int weight, int offset); -void ff_weight_h264_pixels_8_neon(uint8_t *dst, int stride, int height, +void ff_weight_h264_pixels_8_neon(uint8_t *dst, ptrdiff_t stride, int height, int log2_den, int weight, int offset); -void ff_weight_h264_pixels_4_neon(uint8_t *dst, int stride, int height, +void ff_weight_h264_pixels_4_neon(uint8_t *dst, ptrdiff_t stride, int height, int log2_den, int weight, int offset); -void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, int stride, +void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int height, int log2_den, int weightd, int weights, int offset); -void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, int stride, +void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int height, int log2_den, int weightd, int weights, int offset); -void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, int stride, +void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int height, int log2_den, int weightd, int weights, int offset); -void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride); -void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride); +void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride); +void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride); void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset, - int16_t *block, int stride, + int16_t *block, ptrdiff_t stride, const uint8_t nnzc[6*8]); void ff_h264_idct_add16intra_neon(uint8_t *dst, const int *block_offset, - int16_t *block, int
Re: [libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
On 13/06/16 18:23, Diego Biurrun wrote: > This avoids SIMD-optimized functions having to sign-extend their > line size argument manually to be able to do pointer arithmetic. > --- > libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ > libavcodec/arm/h264dsp_init_arm.c | 37 +++ > libavcodec/h264_mb.c | 6 ++-- > libavcodec/h264_mb_template.c | 2 +- > libavcodec/h264dsp.h | 49 > --- > libavcodec/h264dsp_template.c | 36 +++ > libavcodec/ppc/h264dsp.c | 31 +-- > libavcodec/x86/h264_deblock.asm | 20 ++--- > libavcodec/x86/h264_deblock_10bit.asm | 14 - > libavcodec/x86/h264_idct.asm | 48 +++--- > libavcodec/x86/h264_idct_10bit.asm| 18 ++-- > libavcodec/x86/h264_weight.asm| 4 +-- > libavcodec/x86/h264_weight_10bit.asm | 4 +-- > libavcodec/x86/h264dsp_init.c | 22 +++--- > tests/checkasm/h264dsp.c | 4 +-- > 15 files changed, 168 insertions(+), 164 deletions(-) Sure. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH] h264dsp: Change type of stride parameters to ptrdiff_t
This avoids SIMD-optimized functions having to sign-extend their line size argument manually to be able to do pointer arithmetic. --- libavcodec/aarch64/h264dsp_init_aarch64.c | 37 +++ libavcodec/arm/h264dsp_init_arm.c | 37 +++ libavcodec/h264_mb.c | 6 ++-- libavcodec/h264_mb_template.c | 2 +- libavcodec/h264dsp.h | 49 --- libavcodec/h264dsp_template.c | 36 +++ libavcodec/ppc/h264dsp.c | 31 +-- libavcodec/x86/h264_deblock.asm | 20 ++--- libavcodec/x86/h264_deblock_10bit.asm | 14 - libavcodec/x86/h264_idct.asm | 48 +++--- libavcodec/x86/h264_idct_10bit.asm| 18 ++-- libavcodec/x86/h264_weight.asm| 4 +-- libavcodec/x86/h264_weight_10bit.asm | 4 +-- libavcodec/x86/h264dsp_init.c | 22 +++--- tests/checkasm/h264dsp.c | 4 +-- 15 files changed, 168 insertions(+), 164 deletions(-) diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c index b106f11..25acf00 100644 --- a/libavcodec/aarch64/h264dsp_init_aarch64.c +++ b/libavcodec/aarch64/h264dsp_init_aarch64.c @@ -18,6 +18,7 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include #include #include "libavutil/attributes.h" @@ -25,48 +26,48 @@ #include "libavutil/aarch64/cpu.h" #include "libavcodec/h264dsp.h" -void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_v_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_h_loop_filter_luma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_v_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, int stride, int alpha, +void ff_h264_h_loop_filter_chroma_neon(uint8_t *pix, ptrdiff_t stride, int alpha, int beta, int8_t *tc0); -void ff_weight_h264_pixels_16_neon(uint8_t *dst, int stride, int height, +void ff_weight_h264_pixels_16_neon(uint8_t *dst, ptrdiff_t stride, int height, int log2_den, int weight, int offset); -void ff_weight_h264_pixels_8_neon(uint8_t *dst, int stride, int height, +void ff_weight_h264_pixels_8_neon(uint8_t *dst, ptrdiff_t stride, int height, int log2_den, int weight, int offset); -void ff_weight_h264_pixels_4_neon(uint8_t *dst, int stride, int height, +void ff_weight_h264_pixels_4_neon(uint8_t *dst, ptrdiff_t stride, int height, int log2_den, int weight, int offset); -void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, int stride, +void ff_biweight_h264_pixels_16_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int height, int log2_den, int weightd, int weights, int offset); -void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, int stride, +void ff_biweight_h264_pixels_8_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int height, int log2_den, int weightd, int weights, int offset); -void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, int stride, +void ff_biweight_h264_pixels_4_neon(uint8_t *dst, uint8_t *src, ptrdiff_t stride, int height, int log2_den, int weightd, int weights, int offset); -void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, int stride); -void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, int stride); +void ff_h264_idct_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride); +void ff_h264_idct_dc_add_neon(uint8_t *dst, int16_t *block, ptrdiff_t stride); void ff_h264_idct_add16_neon(uint8_t *dst, const int *block_offset, - int16_t *block, int stride, + int16_t *block, ptrdiff_t stride, const uint8_t nnzc[6*8]); void ff_h264_idct_add16intra_neon(uint8_t *dst, const int *block_offset, - int16_t *block, int stride, + int16_t *block, ptrdiff_t stride, const uint8_t nnzc[6*8]); void ff_h264_idct_add8_neon(uint8_t **dest, const int *block_offset, -