Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-13 Thread James Almer

On 2/12/2024 11:54 PM, Nuo Mi wrote:

On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:


Besides simplifying address computations (it saves 432B of .text
in hevcdsp.o alone here) it also fixes undefined behaviour that
occurs if mx or my are 0 (happens when the filters are unused)
because they lead to an array index of -1 in the old code.
This happens in the checkasm-hevc_pel FATE-test.

Signed-off-by: Andreas Rheinhardt 

Thank you, Andreas,


Hi all,
If there are no objections, I'll merge it tomorrow.


Just pushed it. Thanks Andreas for fixing this.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-12 Thread Nuo Mi
On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:

> Besides simplifying address computations (it saves 432B of .text
> in hevcdsp.o alone here) it also fixes undefined behaviour that
> occurs if mx or my are 0 (happens when the filters are unused)
> because they lead to an array index of -1 in the old code.
> This happens in the checkasm-hevc_pel FATE-test.
>
> Signed-off-by: Andreas Rheinhardt 
>
> Thank you, Andreas,

Hi all,
If there are no objections, I'll merge it tomorrow.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-12 Thread Nuo Mi
On Mon, Feb 12, 2024 at 5:48 AM Christophe Gisquet <
christophe.gisq...@gmail.com> wrote:

> Le dim. 11 févr. 2024 à 12:37, Nuo Mi  a écrit :
> > > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = {
> > > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = {
> > >
> > Do you know why this is [4][16]? [4][8] should suffice.
>
> Probably so that all coefficient banks are aligned. Another use for it
> is you can directly use the address in some instruction instead of
> using/wasting a reg for holding the data.
>

Hi Christophe,
Thank you for the explanation.
However, epel (chroma) did not adhere to this. X86 doesn't require this
either since VVC didn't do it.
It's a bit odd that only some architectures and luma need this
I'll revisit this when I determine which architecture needs it and leave a
comment in the code accordingly.

>
> --
> Christophe Gisquet
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-11 Thread Christophe Gisquet
Le dim. 11 févr. 2024 à 12:37, Nuo Mi  a écrit :
> > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = {
> > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = {
> >
> Do you know why this is [4][16]? [4][8] should suffice.

Probably so that all coefficient banks are aligned. Another use for it
is you can directly use the address in some instruction instead of
using/wasting a reg for holding the data.

-- 
Christophe Gisquet
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-11 Thread Nuo Mi
On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:

> Besides simplifying address computations (it saves 432B of .text
> in hevcdsp.o alone here) it also fixes undefined behaviour that
>
We can save more if we change
void (*put_hevc_epel[10][2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t
srcstride, int height, intptr_t mx, intptr_t my, int width);
to
void (*put_hevc_epel[10][2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t
srcstride, int height, const int8_t *hf, const int8_t *vf, int width);
But it may need a lot of work.

occurs if mx or my are 0 (happens when the filters are unused)
> because they lead to an array index of -1 in the old code.
> This happens in the checkasm-hevc_pel FATE-test.
>
> Signed-off-by: Andreas Rheinhardt 
> ---
> The loongarch and mips parts of this are untested. Luckily we have a
> loongarch patchwork runner...
>
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-11 Thread Nuo Mi
On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:

> Besides simplifying address computations (it saves 432B of .text
> in hevcdsp.o alone here) it also fixes undefined behaviour that
> occurs if mx or my are 0 (happens when the filters are unused)
> because they lead to an array index of -1 in the old code.
> This happens in the checkasm-hevc_pel FATE-test.
>
> Signed-off-by: Andreas Rheinhardt 
> ---
> The loongarch and mips parts of this are untested. Luckily we have a
> loongarch patchwork runner...
>
>  libavcodec/hevcdsp.c|   6 +-
>  libavcodec/hevcdsp.h|   5 +-
>  libavcodec/hevcdsp_template.c   |  38 ++--
>  libavcodec/loongarch/hevc_mc.S  | 224 +---
>  libavcodec/loongarch/hevc_mc_bi_lsx.c   |   6 +-
>  libavcodec/loongarch/hevc_mc_uni_lsx.c  |   6 +-
>  libavcodec/loongarch/hevc_mc_uniw_lsx.c |   4 +-
>  libavcodec/loongarch/hevcdsp_lsx.c  |   6 +-
>  libavcodec/mips/hevc_mc_bi_msa.c|   6 +-
>  libavcodec/mips/hevc_mc_biw_msa.c   |   6 +-
>  libavcodec/mips/hevc_mc_uni_msa.c   |   6 +-
>  libavcodec/mips/hevc_mc_uniw_msa.c  |   6 +-
>  libavcodec/mips/hevcdsp_mmi.c   |  20 +--
>  libavcodec/mips/hevcdsp_msa.c   |   6 +-
>  libavcodec/x86/hevcdsp_init.c   |   4 +-
>  15 files changed, 112 insertions(+), 237 deletions(-)
>
> diff --git a/libavcodec/hevcdsp.c b/libavcodec/hevcdsp.c
> index 2ca551df1d..630fdc012e 100644
> --- a/libavcodec/hevcdsp.c
> +++ b/libavcodec/hevcdsp.c
> @@ -91,7 +91,8 @@ static const int8_t transform[32][32] = {
>90, -90,  88, -85,  82, -78,  73, -67,  61, -54,  46, -38,  31,
> -22,  13,  -4 },
>  };
>
> -DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[7][4] = {
> +DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[8][4] = {
> +{  0 },
>  { -2, 58, 10, -2},
>  { -4, 54, 16, -2},
>  { -6, 46, 28, -4},
> @@ -101,7 +102,8 @@ DECLARE_ALIGNED(16, const int8_t,
> ff_hevc_epel_filters)[7][4] = {
>  { -2, 10, 58, -2},
>  };
>
> -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = {
> +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = {
>
Do you know why this is [4][16]? [4][8] should suffice.
If some architecture requires 16, we might need to update
VVC_INTER_LUMA_TAPS to 16 in the future.
Thank you

> +{ 0 },
>  { -1,  4,-10, 58, 17, -5,  1,  0, -1,  4,-10, 58, 17, -5,  1,  0},
>  { -1,  4,-11, 40, 40,-11,  4, -1, -1,  4,-11, 40, 40,-11,  4, -1},
>  {  0,  1, -5, 17, 58,-10,  4, -1,  0,  1, -5, 17, 58,-10,  4, -1}
> diff --git a/libavcodec/hevcdsp.h b/libavcodec/hevcdsp.h
> index 1b9c5bb6bc..a5933dcac4 100644
>
> --
> 2.34.1
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing

2024-02-11 Thread Andreas Rheinhardt
Besides simplifying address computations (it saves 432B of .text
in hevcdsp.o alone here) it also fixes undefined behaviour that
occurs if mx or my are 0 (happens when the filters are unused)
because they lead to an array index of -1 in the old code.
This happens in the checkasm-hevc_pel FATE-test.

Signed-off-by: Andreas Rheinhardt 
---
The loongarch and mips parts of this are untested. Luckily we have a
loongarch patchwork runner...

 libavcodec/hevcdsp.c|   6 +-
 libavcodec/hevcdsp.h|   5 +-
 libavcodec/hevcdsp_template.c   |  38 ++--
 libavcodec/loongarch/hevc_mc.S  | 224 +---
 libavcodec/loongarch/hevc_mc_bi_lsx.c   |   6 +-
 libavcodec/loongarch/hevc_mc_uni_lsx.c  |   6 +-
 libavcodec/loongarch/hevc_mc_uniw_lsx.c |   4 +-
 libavcodec/loongarch/hevcdsp_lsx.c  |   6 +-
 libavcodec/mips/hevc_mc_bi_msa.c|   6 +-
 libavcodec/mips/hevc_mc_biw_msa.c   |   6 +-
 libavcodec/mips/hevc_mc_uni_msa.c   |   6 +-
 libavcodec/mips/hevc_mc_uniw_msa.c  |   6 +-
 libavcodec/mips/hevcdsp_mmi.c   |  20 +--
 libavcodec/mips/hevcdsp_msa.c   |   6 +-
 libavcodec/x86/hevcdsp_init.c   |   4 +-
 15 files changed, 112 insertions(+), 237 deletions(-)

diff --git a/libavcodec/hevcdsp.c b/libavcodec/hevcdsp.c
index 2ca551df1d..630fdc012e 100644
--- a/libavcodec/hevcdsp.c
+++ b/libavcodec/hevcdsp.c
@@ -91,7 +91,8 @@ static const int8_t transform[32][32] = {
   90, -90,  88, -85,  82, -78,  73, -67,  61, -54,  46, -38,  31, -22,  
13,  -4 },
 };
 
-DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[7][4] = {
+DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[8][4] = {
+{  0 },
 { -2, 58, 10, -2},
 { -4, 54, 16, -2},
 { -6, 46, 28, -4},
@@ -101,7 +102,8 @@ DECLARE_ALIGNED(16, const int8_t, 
ff_hevc_epel_filters)[7][4] = {
 { -2, 10, 58, -2},
 };
 
-DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = {
+DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = {
+{ 0 },
 { -1,  4,-10, 58, 17, -5,  1,  0, -1,  4,-10, 58, 17, -5,  1,  0},
 { -1,  4,-11, 40, 40,-11,  4, -1, -1,  4,-11, 40, 40,-11,  4, -1},
 {  0,  1, -5, 17, 58,-10,  4, -1,  0,  1, -5, 17, 58,-10,  4, -1}
diff --git a/libavcodec/hevcdsp.h b/libavcodec/hevcdsp.h
index 1b9c5bb6bc..a5933dcac4 100644
--- a/libavcodec/hevcdsp.h
+++ b/libavcodec/hevcdsp.h
@@ -126,8 +126,9 @@ typedef struct HEVCDSPContext {
 
 void ff_hevc_dsp_init(HEVCDSPContext *hpc, int bit_depth);
 
-extern const int8_t ff_hevc_epel_filters[7][4];
-extern const int8_t ff_hevc_qpel_filters[3][16];
+/** ff_hevc_.pel_filters[0] are dummies to simplify array addressing */
+extern const int8_t ff_hevc_epel_filters[8][4];
+extern const int8_t ff_hevc_qpel_filters[4][16];
 
 void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth);
 void ff_hevc_dsp_init_arm(HEVCDSPContext *c, const int bit_depth);
diff --git a/libavcodec/hevcdsp_template.c b/libavcodec/hevcdsp_template.c
index 9b48bdf08e..121c44c401 100644
--- a/libavcodec/hevcdsp_template.c
+++ b/libavcodec/hevcdsp_template.c
@@ -301,9 +301,9 @@ IDCT_DC(32)
 //
 

 #define ff_hevc_pel_filters ff_hevc_qpel_filters
-#define DECL_HV_FILTER(f)  \
-const uint8_t *hf = ff_hevc_ ## f ## _filters[mx - 1]; \
-const uint8_t *vf = ff_hevc_ ## f ## _filters[my - 1];
+#define DECL_HV_FILTER(f)  \
+const uint8_t *hf = ff_hevc_ ## f ## _filters[mx]; \
+const uint8_t *vf = ff_hevc_ ## f ## _filters[my];
 
 #define FW_PUT(p, f, t)
   \
 static void FUNC(put_hevc_## f)(int16_t *dst, const uint8_t *src, ptrdiff_t 
srcstride, int height,\
@@ -421,7 +421,7 @@ static void FUNC(put_hevc_qpel_bi_h)(uint8_t *_dst, 
ptrdiff_t _dststride, const
 pixel *dst  = (pixel *)_dst;
 ptrdiff_t dststride = _dststride / sizeof(pixel);
 
-const int8_t *filter= ff_hevc_qpel_filters[mx - 1];
+const int8_t *filter= ff_hevc_qpel_filters[mx];
 
 int shift = 14  + 1 - BIT_DEPTH;
 #if BIT_DEPTH < 14
@@ -449,7 +449,7 @@ static void FUNC(put_hevc_qpel_bi_v)(uint8_t *_dst, 
ptrdiff_t _dststride,
 pixel *dst  = (pixel *)_dst;
 ptrdiff_t dststride = _dststride / sizeof(pixel);
 
-const int8_t *filter= ff_hevc_qpel_filters[my - 1];
+const int8_t *filter= ff_hevc_qpel_filters[my];
 
 int shift = 14 + 1 - BIT_DEPTH;
 #if BIT_DEPTH < 14
@@ -487,7 +487,7 @@ static void FUNC(put_hevc_qpel_bi_hv)(uint8_t *_dst, 
ptrdiff_t _dststride,
 #endif
 
 src   -= QPEL_EXTRA_BEFORE * srcstride;
-filter = ff_hevc_qpel_filters[mx - 1];
+filter = ff_hevc_qpel_filters[mx];
 for (y = 0; y < height + QPEL_EXTRA; y++) {
 for (x = 0; x < width; x++)
 tmp[x] = QPEL_FILTER(src, 1) >>