Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
On 2/12/2024 11:54 PM, Nuo Mi wrote: On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt < andreas.rheinha...@outlook.com> wrote: Besides simplifying address computations (it saves 432B of .text in hevcdsp.o alone here) it also fixes undefined behaviour that occurs if mx or my are 0 (happens when the filters are unused) because they lead to an array index of -1 in the old code. This happens in the checkasm-hevc_pel FATE-test. Signed-off-by: Andreas Rheinhardt Thank you, Andreas, Hi all, If there are no objections, I'll merge it tomorrow. Just pushed it. Thanks Andreas for fixing this. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt < andreas.rheinha...@outlook.com> wrote: > Besides simplifying address computations (it saves 432B of .text > in hevcdsp.o alone here) it also fixes undefined behaviour that > occurs if mx or my are 0 (happens when the filters are unused) > because they lead to an array index of -1 in the old code. > This happens in the checkasm-hevc_pel FATE-test. > > Signed-off-by: Andreas Rheinhardt > > Thank you, Andreas, Hi all, If there are no objections, I'll merge it tomorrow. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
On Mon, Feb 12, 2024 at 5:48 AM Christophe Gisquet < christophe.gisq...@gmail.com> wrote: > Le dim. 11 févr. 2024 à 12:37, Nuo Mi a écrit : > > > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = { > > > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = { > > > > > Do you know why this is [4][16]? [4][8] should suffice. > > Probably so that all coefficient banks are aligned. Another use for it > is you can directly use the address in some instruction instead of > using/wasting a reg for holding the data. > Hi Christophe, Thank you for the explanation. However, epel (chroma) did not adhere to this. X86 doesn't require this either since VVC didn't do it. It's a bit odd that only some architectures and luma need this I'll revisit this when I determine which architecture needs it and leave a comment in the code accordingly. > > -- > Christophe Gisquet > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
Le dim. 11 févr. 2024 à 12:37, Nuo Mi a écrit : > > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = { > > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = { > > > Do you know why this is [4][16]? [4][8] should suffice. Probably so that all coefficient banks are aligned. Another use for it is you can directly use the address in some instruction instead of using/wasting a reg for holding the data. -- Christophe Gisquet ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt < andreas.rheinha...@outlook.com> wrote: > Besides simplifying address computations (it saves 432B of .text > in hevcdsp.o alone here) it also fixes undefined behaviour that > We can save more if we change void (*put_hevc_epel[10][2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width); to void (*put_hevc_epel[10][2][2])(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height, const int8_t *hf, const int8_t *vf, int width); But it may need a lot of work. occurs if mx or my are 0 (happens when the filters are unused) > because they lead to an array index of -1 in the old code. > This happens in the checkasm-hevc_pel FATE-test. > > Signed-off-by: Andreas Rheinhardt > --- > The loongarch and mips parts of this are untested. Luckily we have a > loongarch patchwork runner... > > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
On Sun, Feb 11, 2024 at 4:21 PM Andreas Rheinhardt < andreas.rheinha...@outlook.com> wrote: > Besides simplifying address computations (it saves 432B of .text > in hevcdsp.o alone here) it also fixes undefined behaviour that > occurs if mx or my are 0 (happens when the filters are unused) > because they lead to an array index of -1 in the old code. > This happens in the checkasm-hevc_pel FATE-test. > > Signed-off-by: Andreas Rheinhardt > --- > The loongarch and mips parts of this are untested. Luckily we have a > loongarch patchwork runner... > > libavcodec/hevcdsp.c| 6 +- > libavcodec/hevcdsp.h| 5 +- > libavcodec/hevcdsp_template.c | 38 ++-- > libavcodec/loongarch/hevc_mc.S | 224 +--- > libavcodec/loongarch/hevc_mc_bi_lsx.c | 6 +- > libavcodec/loongarch/hevc_mc_uni_lsx.c | 6 +- > libavcodec/loongarch/hevc_mc_uniw_lsx.c | 4 +- > libavcodec/loongarch/hevcdsp_lsx.c | 6 +- > libavcodec/mips/hevc_mc_bi_msa.c| 6 +- > libavcodec/mips/hevc_mc_biw_msa.c | 6 +- > libavcodec/mips/hevc_mc_uni_msa.c | 6 +- > libavcodec/mips/hevc_mc_uniw_msa.c | 6 +- > libavcodec/mips/hevcdsp_mmi.c | 20 +-- > libavcodec/mips/hevcdsp_msa.c | 6 +- > libavcodec/x86/hevcdsp_init.c | 4 +- > 15 files changed, 112 insertions(+), 237 deletions(-) > > diff --git a/libavcodec/hevcdsp.c b/libavcodec/hevcdsp.c > index 2ca551df1d..630fdc012e 100644 > --- a/libavcodec/hevcdsp.c > +++ b/libavcodec/hevcdsp.c > @@ -91,7 +91,8 @@ static const int8_t transform[32][32] = { >90, -90, 88, -85, 82, -78, 73, -67, 61, -54, 46, -38, 31, > -22, 13, -4 }, > }; > > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[7][4] = { > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[8][4] = { > +{ 0 }, > { -2, 58, 10, -2}, > { -4, 54, 16, -2}, > { -6, 46, 28, -4}, > @@ -101,7 +102,8 @@ DECLARE_ALIGNED(16, const int8_t, > ff_hevc_epel_filters)[7][4] = { > { -2, 10, 58, -2}, > }; > > -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = { > +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = { > Do you know why this is [4][16]? [4][8] should suffice. If some architecture requires 16, we might need to update VVC_INTER_LUMA_TAPS to 16 in the future. Thank you > +{ 0 }, > { -1, 4,-10, 58, 17, -5, 1, 0, -1, 4,-10, 58, 17, -5, 1, 0}, > { -1, 4,-11, 40, 40,-11, 4, -1, -1, 4,-11, 40, 40,-11, 4, -1}, > { 0, 1, -5, 17, 58,-10, 4, -1, 0, 1, -5, 17, 58,-10, 4, -1} > diff --git a/libavcodec/hevcdsp.h b/libavcodec/hevcdsp.h > index 1b9c5bb6bc..a5933dcac4 100644 > > -- > 2.34.1 > > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH] avcodec/hevcdsp: Offset ff_hevc_.pel_filters to simplify addressing
Besides simplifying address computations (it saves 432B of .text in hevcdsp.o alone here) it also fixes undefined behaviour that occurs if mx or my are 0 (happens when the filters are unused) because they lead to an array index of -1 in the old code. This happens in the checkasm-hevc_pel FATE-test. Signed-off-by: Andreas Rheinhardt --- The loongarch and mips parts of this are untested. Luckily we have a loongarch patchwork runner... libavcodec/hevcdsp.c| 6 +- libavcodec/hevcdsp.h| 5 +- libavcodec/hevcdsp_template.c | 38 ++-- libavcodec/loongarch/hevc_mc.S | 224 +--- libavcodec/loongarch/hevc_mc_bi_lsx.c | 6 +- libavcodec/loongarch/hevc_mc_uni_lsx.c | 6 +- libavcodec/loongarch/hevc_mc_uniw_lsx.c | 4 +- libavcodec/loongarch/hevcdsp_lsx.c | 6 +- libavcodec/mips/hevc_mc_bi_msa.c| 6 +- libavcodec/mips/hevc_mc_biw_msa.c | 6 +- libavcodec/mips/hevc_mc_uni_msa.c | 6 +- libavcodec/mips/hevc_mc_uniw_msa.c | 6 +- libavcodec/mips/hevcdsp_mmi.c | 20 +-- libavcodec/mips/hevcdsp_msa.c | 6 +- libavcodec/x86/hevcdsp_init.c | 4 +- 15 files changed, 112 insertions(+), 237 deletions(-) diff --git a/libavcodec/hevcdsp.c b/libavcodec/hevcdsp.c index 2ca551df1d..630fdc012e 100644 --- a/libavcodec/hevcdsp.c +++ b/libavcodec/hevcdsp.c @@ -91,7 +91,8 @@ static const int8_t transform[32][32] = { 90, -90, 88, -85, 82, -78, 73, -67, 61, -54, 46, -38, 31, -22, 13, -4 }, }; -DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[7][4] = { +DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[8][4] = { +{ 0 }, { -2, 58, 10, -2}, { -4, 54, 16, -2}, { -6, 46, 28, -4}, @@ -101,7 +102,8 @@ DECLARE_ALIGNED(16, const int8_t, ff_hevc_epel_filters)[7][4] = { { -2, 10, 58, -2}, }; -DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[3][16] = { +DECLARE_ALIGNED(16, const int8_t, ff_hevc_qpel_filters)[4][16] = { +{ 0 }, { -1, 4,-10, 58, 17, -5, 1, 0, -1, 4,-10, 58, 17, -5, 1, 0}, { -1, 4,-11, 40, 40,-11, 4, -1, -1, 4,-11, 40, 40,-11, 4, -1}, { 0, 1, -5, 17, 58,-10, 4, -1, 0, 1, -5, 17, 58,-10, 4, -1} diff --git a/libavcodec/hevcdsp.h b/libavcodec/hevcdsp.h index 1b9c5bb6bc..a5933dcac4 100644 --- a/libavcodec/hevcdsp.h +++ b/libavcodec/hevcdsp.h @@ -126,8 +126,9 @@ typedef struct HEVCDSPContext { void ff_hevc_dsp_init(HEVCDSPContext *hpc, int bit_depth); -extern const int8_t ff_hevc_epel_filters[7][4]; -extern const int8_t ff_hevc_qpel_filters[3][16]; +/** ff_hevc_.pel_filters[0] are dummies to simplify array addressing */ +extern const int8_t ff_hevc_epel_filters[8][4]; +extern const int8_t ff_hevc_qpel_filters[4][16]; void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth); void ff_hevc_dsp_init_arm(HEVCDSPContext *c, const int bit_depth); diff --git a/libavcodec/hevcdsp_template.c b/libavcodec/hevcdsp_template.c index 9b48bdf08e..121c44c401 100644 --- a/libavcodec/hevcdsp_template.c +++ b/libavcodec/hevcdsp_template.c @@ -301,9 +301,9 @@ IDCT_DC(32) // #define ff_hevc_pel_filters ff_hevc_qpel_filters -#define DECL_HV_FILTER(f) \ -const uint8_t *hf = ff_hevc_ ## f ## _filters[mx - 1]; \ -const uint8_t *vf = ff_hevc_ ## f ## _filters[my - 1]; +#define DECL_HV_FILTER(f) \ +const uint8_t *hf = ff_hevc_ ## f ## _filters[mx]; \ +const uint8_t *vf = ff_hevc_ ## f ## _filters[my]; #define FW_PUT(p, f, t) \ static void FUNC(put_hevc_## f)(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height,\ @@ -421,7 +421,7 @@ static void FUNC(put_hevc_qpel_bi_h)(uint8_t *_dst, ptrdiff_t _dststride, const pixel *dst = (pixel *)_dst; ptrdiff_t dststride = _dststride / sizeof(pixel); -const int8_t *filter= ff_hevc_qpel_filters[mx - 1]; +const int8_t *filter= ff_hevc_qpel_filters[mx]; int shift = 14 + 1 - BIT_DEPTH; #if BIT_DEPTH < 14 @@ -449,7 +449,7 @@ static void FUNC(put_hevc_qpel_bi_v)(uint8_t *_dst, ptrdiff_t _dststride, pixel *dst = (pixel *)_dst; ptrdiff_t dststride = _dststride / sizeof(pixel); -const int8_t *filter= ff_hevc_qpel_filters[my - 1]; +const int8_t *filter= ff_hevc_qpel_filters[my]; int shift = 14 + 1 - BIT_DEPTH; #if BIT_DEPTH < 14 @@ -487,7 +487,7 @@ static void FUNC(put_hevc_qpel_bi_hv)(uint8_t *_dst, ptrdiff_t _dststride, #endif src -= QPEL_EXTRA_BEFORE * srcstride; -filter = ff_hevc_qpel_filters[mx - 1]; +filter = ff_hevc_qpel_filters[mx]; for (y = 0; y < height + QPEL_EXTRA; y++) { for (x = 0; x < width; x++) tmp[x] = QPEL_FILTER(src, 1) >>