Re: [FFmpeg-devel] [PATCH v1 2/2] vaapi: add vaapi_avs2 support

2024-01-19 Thread Zhao Zhili

> -----Original Message-----
> From: ffmpeg-devel On Behalf Of jianfeng.zheng
> Sent: January 19, 2024 23:53
> To: ffmpeg-devel@ffmpeg.org
> Cc: jianfeng.zheng 
> Subject: [FFmpeg-devel] [PATCH v1 2/2] vaapi: add vaapi_avs2 support
> 
> see https://github.com/intel/libva/pull/738
> 
> [Moore Threads](https://www.mthreads.com) (short for Mthreads) is a
> Chinese GPU manufacturer. All our products, like MTTS70/MTTS80/.. ,
> support AVS2 8bit/10bit HW decoding at max 8k resolution.
> 
> Signed-off-by: jianfeng.zheng 
> ---
>  configure|   7 +
>  libavcodec/Makefile  |   2 +
>  libavcodec/allcodecs.c   |   1 +
>  libavcodec/avs2.c| 345 ++-
>  libavcodec/avs2.h| 460 +++-
>  libavcodec/avs2_parser.c |   5 +-
>  libavcodec/avs2dec.c | 569 +
>  libavcodec/avs2dec.h |  48 +++
>  libavcodec/avs2dec_headers.c | 787 +++
>  libavcodec/codec_desc.c  |   5 +-
>  libavcodec/defs.h|   4 +
>  libavcodec/hwaccels.h|   1 +
>  libavcodec/libdavs2.c|   2 +-
>  libavcodec/profiles.c|   6 +
>  libavcodec/profiles.h|   1 +
>  libavcodec/vaapi_avs2.c  | 227 ++
>  libavcodec/vaapi_decode.c|   5 +
>  libavformat/matroska.c   |   1 +
>  libavformat/mpeg.h   |   1 +
>  19 files changed, 2450 insertions(+), 27 deletions(-)
>  create mode 100644 libavcodec/avs2dec.c
>  create mode 100644 libavcodec/avs2dec.h
>  create mode 100644 libavcodec/avs2dec_headers.c
>  create mode 100644 libavcodec/vaapi_avs2.c
> 

Please split the patch properly. It's hard to review in a single chunk, and it
can't be tested without the hardware.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v1 1/2] vaapi: add vaapi_cavs support

2024-01-19 Thread Zhao Zhili

> -----Original Message-----
> From: ffmpeg-devel On Behalf Of jianfeng.zheng
> Sent: January 19, 2024 23:50
> To: ffmpeg-devel@ffmpeg.org
> Cc: jianfeng.zheng 
> Subject: [FFmpeg-devel] [PATCH v1 1/2] vaapi: add vaapi_cavs support
> 
> see https://github.com/intel/libva/pull/738
> 
> [Moore Threads](https://www.mthreads.com) (short for Mthreads) is a
> Chinese GPU manufacturer. All our products, like MTTS70/MTTS80/.. ,
> support AVS/AVS+ HW decoding at max 2k resolution.

Please use a more objective and neutral description.

> 
> Signed-off-by: jianfeng.zheng 
> ---
>  configure |  14 ++
>  libavcodec/Makefile   |   1 +
>  libavcodec/cavs.c |  12 +
>  libavcodec/cavs.h |  36 ++-
>  libavcodec/cavs_parser.c  |  16 ++
>  libavcodec/cavsdec.c  | 473 +-
>  libavcodec/defs.h |   3 +
>  libavcodec/hwaccels.h |   1 +
>  libavcodec/profiles.c |   6 +
>  libavcodec/profiles.h |   1 +
>  libavcodec/vaapi_cavs.c   | 164 +
>  libavcodec/vaapi_decode.c |   4 +
>  12 files changed, 669 insertions(+), 62 deletions(-)
>  create mode 100644 libavcodec/vaapi_cavs.c
> 
> diff --git a/configure b/configure
> index c8ae0a061d..89759eda5d 100755
> --- a/configure
> +++ b/configure
> @@ -2463,6 +2463,7 @@ HAVE_LIST="
>  xmllint
>  zlib_gzip
>  openvino2
> +va_profile_avs
>  "
> 
>  # options emitted with CONFIG_ prefix but not available on the command line
> @@ -3202,6 +3203,7 @@ wmv3_dxva2_hwaccel_select="vc1_dxva2_hwaccel"
>  wmv3_nvdec_hwaccel_select="vc1_nvdec_hwaccel"
>  wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel"
>  wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel"
> +cavs_vaapi_hwaccel_deps="vaapi va_profile_avs VAPictureParameterBufferAVS"
> 
>  # hardware-accelerated codecs
>  mediafoundation_deps="mftransform_h MFCreateAlignedMemoryBuffer"
> @@ -7175,6 +7177,18 @@ if enabled vaapi; then
>  check_type "va/va.h va/va_enc_vp8.h"  "VAEncPictureParameterBufferVP8"
>  check_type "va/va.h va/va_enc_vp9.h"  "VAEncPictureParameterBufferVP9"
>  check_type "va/va.h va/va_enc_av1.h"  "VAEncPictureParameterBufferAV1"
> +
> +#
> +# Using 'VA_CHECK_VERSION' in source codes make things easy. But we have to wait
> +# until newly added VAProfile being distributed by VAAPI released version.
> +#
> +# Before or after that, we can use auto-detection to keep version compatibility.
> +# It always works.
> +#
> +disable va_profile_avs &&
> +test_code cc va/va.h "VAProfile p1 = VAProfileAVSJizhun, p2 = VAProfileAVSGuangdian;" &&
> +enable va_profile_avs
> +enabled va_profile_avs && check_type "va/va.h va/va_dec_avs.h" "VAPictureParameterBufferAVS"
>  fi
> 
>  if enabled_all opencl libdrm ; then
> diff --git a/libavcodec/Makefile b/libavcodec/Makefile
> index bb42095165..7d92375fed 100644
> --- a/libavcodec/Makefile
> +++ b/libavcodec/Makefile
> @@ -1055,6 +1055,7 @@ OBJS-$(CONFIG_VP9_VAAPI_HWACCEL)  += vaapi_vp9.o
>  OBJS-$(CONFIG_VP9_VDPAU_HWACCEL)  += vdpau_vp9.o
>  OBJS-$(CONFIG_VP9_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_vp9.o
>  OBJS-$(CONFIG_VP8_QSV_HWACCEL)+= qsvdec.o
> +OBJS-$(CONFIG_CAVS_VAAPI_HWACCEL) += vaapi_cavs.o
> 
>  # Objects duplicated from other libraries for shared builds
>  SHLIBOBJS  += log2_tab.o reverse.o
> diff --git a/libavcodec/cavs.c b/libavcodec/cavs.c
> index fdd577f7fb..ed7b278336 100644
> --- a/libavcodec/cavs.c
> +++ b/libavcodec/cavs.c

Please split the patch.

> @@ -810,6 +810,14 @@ av_cold int ff_cavs_init(AVCodecContext *avctx)
>  if (!h->cur.f || !h->DPB[0].f || !h->DPB[1].f)
>  return AVERROR(ENOMEM);
> 
> +h->out[0].f = av_frame_alloc();
> +h->out[1].f = av_frame_alloc();
> +h->out[2].f = av_frame_alloc();
> +if (!h->out[0].f || !h->out[1].f || !h->out[2].f) {
> +ff_cavs_end(avctx);
> +return AVERROR(ENOMEM);
> +}
> +
>  h->luma_scan[0] = 0;
>  h->luma_scan[1] = 8;
>  h->intra_pred_l[INTRA_L_VERT]   = intra_pred_vert;
> @@ -840,6 +848,10 @@ av_cold int ff_cavs_end(AVCodecContext *avctx)
>  av_frame_free(&h->DPB[0].f);
>  av_frame_free(&h->DPB[1].f);
> 
> +av_frame_free(&h->out[0].f);
> +av_frame_free(&h->out[1].f);
> +av_frame_free(&h->out[2].f);
> +
>  av_freep(&h->top_qp);
>  av_freep(&h->top_mv[0]);
>  av_freep(&h->top_mv[1]);
> diff --git a/libavcodec/cavs.h b/libavcodec/cavs.h
> index 244c322b35..ef03c1a974 100644
> --- a/libavcodec/cavs.h
> +++ b/libavcodec/cavs.h
> @@ -39,8 +39,10 @@
>  #define EXT_START_CODE  0x01b5
>  #define USER_START_CODE 0x01b2
>  #define CAVS_START_CODE 0x01b0
> +#define VIDEO_SEQ_END_CODE  0x01b1
>  #define PIC_I_START_CODE0x01b3
>  #define PIC_PB_START_CODE   0x01b6
> +#define VIDEO_EDIT_CODE 0x01b7
> 

[FFmpeg-devel] [PATCH 2/2] avcodec/speexdec: fix setting frame_size from extradata

2024-01-19 Thread James Almer
Finishes fixing vp5/potter512-400-partial.avi

The fate-matroska-ms-mode test ref is updated to reflect that the Speex decoder
can now read the stream.

Signed-off-by: James Almer 
---
 libavcodec/speexdec.c   | 4 +---
 tests/ref/fate/matroska-ms-mode | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/libavcodec/speexdec.c b/libavcodec/speexdec.c
index c73b2a7ec2..51c5834769 100644
--- a/libavcodec/speexdec.c
+++ b/libavcodec/speexdec.c
@@ -1420,9 +1420,7 @@ static int parse_speex_extradata(AVCodecContext *avctx,
 if (s->nb_channels <= 0 || s->nb_channels > 2)
 return AVERROR_INVALIDDATA;
 s->bitrate = bytestream_get_le32(&buf);
-s->frame_size = bytestream_get_le32(&buf);
-if (s->frame_size < NB_FRAME_SIZE << s->mode)
-return AVERROR_INVALIDDATA;
+s->frame_size = (1 + (s->mode > 0)) * bytestream_get_le32(&buf);
 s->vbr = bytestream_get_le32(&buf);
 s->frames_per_packet = bytestream_get_le32(&buf);
 if (s->frames_per_packet <= 0 ||
diff --git a/tests/ref/fate/matroska-ms-mode b/tests/ref/fate/matroska-ms-mode
index 5c91209910..0e31c990dc 100644
--- a/tests/ref/fate/matroska-ms-mode
+++ b/tests/ref/fate/matroska-ms-mode
@@ -1,4 +1,4 @@
-a2897e3951b0054d0fa31fe51860444f *tests/data/fate/matroska-ms-mode.matroska
+e7f44cd6a5c0f45fea11874afb8c1c0d *tests/data/fate/matroska-ms-mode.matroska
 413103 tests/data/fate/matroska-ms-mode.matroska
 #extradata 0:   40, 0x54290c93
 #extradata 1:  114, 0xb6c80771
-- 
2.43.0
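
For illustration, the new frame_size expression in the hunk above can be read in isolation as follows. This is only a sketch of the arithmetic: the helper name speex_frame_size is hypothetical, and header_frame_size stands in for the value returned by bytestream_get_le32(&buf).

```c
#include <stdint.h>

/*
 * Sketch of the expression from the patch: the frame size read from the
 * header is used as-is for mode 0 and doubled for any mode above 0.
 * The function name and parameter names are illustrative only.
 */
static uint32_t speex_frame_size(int mode, uint32_t header_frame_size)
{
    /* (mode > 0) evaluates to 0 or 1, so the factor is 1 or 2. */
    return (1 + (mode > 0)) * header_frame_size;
}
```

With an illustrative header value of 160, this gives 160 for mode 0 and 320 for modes 1 and 2.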



[FFmpeg-devel] [PATCH 1/2] avcodec/speexdec: relax the extradata check for the speex string

2024-01-19 Thread James Almer
There could be bogus bytes at the start, as is the case of
vp5/potter512-400-partial.avi from the FATE suite, which could be a case of bad
remuxing from an OGG source.

Partially fixes decoding of vp5/potter512-400-partial.avi

Signed-off-by: James Almer 
---
 libavcodec/speexdec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavcodec/speexdec.c b/libavcodec/speexdec.c
index 08c7e77e7d..c73b2a7ec2 100644
--- a/libavcodec/speexdec.c
+++ b/libavcodec/speexdec.c
@@ -52,6 +52,7 @@
  */
 
 #include "libavutil/avassert.h"
+#include "libavutil/avstring.h"
 #include "libavutil/float_dsp.h"
 #include "avcodec.h"
 #include "bytestream.h"
@@ -1397,9 +1398,9 @@ static int parse_speex_extradata(AVCodecContext *avctx,
 const uint8_t *extradata, int extradata_size)
 {
 SpeexContext *s = avctx->priv_data;
-const uint8_t *buf = extradata;
+const uint8_t *buf = av_strnstr(extradata, "Speex   ", extradata_size);
 
-if (memcmp(buf, "Speex   ", 8))
+if (!buf)
 return AVERROR_INVALIDDATA;
 
 buf += 28;
-- 
2.43.0
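
For illustration, the idea of tolerating bogus leading bytes can be sketched without libavutil. The patch itself calls av_strnstr(); the sketch below swaps in a plain byte-wise memcmp() scan, and the names find_speex_magic, SPEEX_MAGIC and SPEEX_MAGIC_LEN are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPEEX_MAGIC     "Speex   "   /* the 8-byte string checked in the patch */
#define SPEEX_MAGIC_LEN 8

/*
 * Return a pointer to the first occurrence of the magic string inside
 * data[0..size), or NULL if it does not occur. Unlike a memcmp() at
 * offset 0, this accepts junk bytes in front of the header.
 */
static const uint8_t *find_speex_magic(const uint8_t *data, size_t size)
{
    if (!data || size < SPEEX_MAGIC_LEN)
        return NULL;

    /* Stop early enough that the 8-byte compare never reads past the end. */
    for (size_t i = 0; i + SPEEX_MAGIC_LEN <= size; i++) {
        if (!memcmp(data + i, SPEEX_MAGIC, SPEEX_MAGIC_LEN))
            return data + i;
    }
    return NULL;
}
```

A caller would then continue parsing relative to the returned pointer, as the patched function does with buf += 28.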



Re: [FFmpeg-devel] [PATCH] avutil/eval: Use even better PRNG

2024-01-19 Thread Michael Niedermayer
On Fri, Jan 19, 2024 at 09:53:46AM +0100, Michael Koch wrote:
> There is still a small problem with the random generator, but this has
> nothing to do with the recent changes.
> If the random() expression is used in the geq filter, then multiple pixels
> get the same sequence of random numbers.
> As can be shown with this command, where the frame has only two pixels:
> 
> ffmpeg -loglevel repeat -f lavfi -i nullsrc=size=1x2,format=gray -vf
> "geq=lum='print(random(0));print(random(0));print(random(0))'" -frames 1 -y
> out.png
> 
> I think it's because the filter is executed in multiple threads.
> -filter_threads 1 fixes the problem, but it slows down the whole filter
> thread.

You can avoid this by using
ifnot(X,st(0,Y))

which would reseed the random number generator differently on the first pixel of
each line.
Not sure this is the best solution; better ideas are welcome.
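
A toy model of the effect, for illustration only. It assumes, as suggested earlier in the thread, that each worker starts from the same generator state; the generator below is a stand-in LCG and not the PRNG used by libavutil/eval, and all names are hypothetical.

```c
#include <stdint.h>

/* Stand-in generator (a 64-bit LCG); NOT the PRNG used by libavutil/eval. */
static uint32_t toy_random(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(*state >> 32);
}

/*
 * First random value produced for row y.
 * reseed == 0 models the reported problem: every row starts from state 0.
 * reseed != 0 models ifnot(X,st(0,Y)): at X == 0 the state is set to Y.
 */
static uint32_t first_value_in_row(int y, int reseed)
{
    uint64_t state = 0;

    if (reseed)
        state = (uint64_t)y;
    return toy_random(&state);
}
```

Without reseeding, rows 0 and 1 return the same first value; with reseeding they start from different states and diverge.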

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Some people wanted to paint the bikeshed green, some blue and some pink.
People argued and fought, when they finally agreed, only rust was left.




Re: [FFmpeg-devel] [PATCH] x86/tx_float: enable SIMD for sizes over 131072

2024-01-19 Thread Michael Niedermayer
On Thu, Jan 18, 2024 at 05:33:41PM +0100, Lynne wrote:
> The tables for the new sizes were added last year due
> to being required for SDR.
> However, the assembly was never updated to use them.
> 
> Patch attached.
> 

>  tx_float.asm |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> a8a354082090729d8b772e84f82266a210371d10  
> 0001-x86-tx_float-enable-SIMD-for-sizes-over-131072.patch
> From ccfd9366025105a7dba0471965856b12d73bbd17 Mon Sep 17 00:00:00 2001
> From: Lynne 
> Date: Thu, 18 Jan 2024 17:30:29 +0100
> Subject: [PATCH] x86/tx_float: enable SIMD for sizes over 131072
> 
> The tables for the new sizes were added last year due
> to being required for SDR.
> However, the assembly was never updated to use them.
> ---
>  libavutil/x86/tx_float.asm | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

LGTM

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin




Re: [FFmpeg-devel] [PATCH 2/2] avformat: add a Tile Grid stream group type

2024-01-19 Thread James Almer

On 1/19/2024 6:22 PM, Michael Niedermayer wrote:
> On Wed, Jan 17, 2024 at 05:41:33PM -0300, James Almer wrote:
>> This will be used to support tiled image formats like HEIF.
>>
>> Signed-off-by: James Almer
>> ---
>>   libavformat/avformat.c |  5 +
>>   libavformat/avformat.h |  3 +++
>>   libavformat/dump.c | 36 
>>   libavformat/options.c  |  9 +
>>   4 files changed, 53 insertions(+)
>
> I am sure I've forgotten something, but this fails to build

My bad, I sent "avformat/dump: only print streams within a group in
verbose levels" as a patch independent of this set, but it goes before
these two patches.

> libavformat/dump.c: In function ‘dump_stream_group’:
> libavformat/dump.c:722:9: error: too many arguments to function ‘dump_metadata’
>   dump_metadata(NULL, stg->metadata, "", AV_LOG_INFO);
>   ^
> libavformat/dump.c:166:13: note: declared here
>   static void dump_metadata(void *ctx, const AVDictionary *m, const char *indent)
>   ^
> libavformat/dump.c:746:13: error: too many arguments to function ‘dump_stream_format’
>   dump_stream_format(ic, st->index, i, index, is_output, AV_LOG_VERBOSE);
>   ^~
> libavformat/dump.c:522:13: note: declared here
>   static void dump_stream_format(const AVFormatContext *ic, int i,
>   ^~
> ffbuild/common.mak:81: recipe for target 'libavformat/dump.o' failed
> make: *** [libavformat/dump.o] Error 1
>
> [...]




Re: [FFmpeg-devel] [PATCH 2/2] avformat: add a Tile Grid stream group type

2024-01-19 Thread Michael Niedermayer
On Wed, Jan 17, 2024 at 05:41:33PM -0300, James Almer wrote:
> This will be used to support tiled image formats like HEIF.
> 
> Signed-off-by: James Almer 
> ---
>  libavformat/avformat.c |  5 +
>  libavformat/avformat.h |  3 +++
>  libavformat/dump.c | 36 
>  libavformat/options.c  |  9 +
>  4 files changed, 53 insertions(+)

I am sure I've forgotten something, but this fails to build

libavformat/dump.c: In function ‘dump_stream_group’:
libavformat/dump.c:722:9: error: too many arguments to function ‘dump_metadata’
 dump_metadata(NULL, stg->metadata, "", AV_LOG_INFO);
 ^
libavformat/dump.c:166:13: note: declared here
 static void dump_metadata(void *ctx, const AVDictionary *m, const char *indent)
 ^
libavformat/dump.c:746:13: error: too many arguments to function ‘dump_stream_format’
 dump_stream_format(ic, st->index, i, index, is_output, AV_LOG_VERBOSE);
 ^~
libavformat/dump.c:522:13: note: declared here
 static void dump_stream_format(const AVFormatContext *ic, int i,
 ^~
ffbuild/common.mak:81: recipe for target 'libavformat/dump.o' failed
make: *** [libavformat/dump.o] Error 1

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Avoid a single point of failure, be that a person or equipment.




Re: [FFmpeg-devel] [PATCH v2] lavc/dxvenc: add DXV encoder with support for DXT1 texture format

2024-01-19 Thread Connor Worley
I've tested the latest patch with both the lavc decoder and Resolume's 
proprietary software, and the encoded outputs are working for me.


Re: [FFmpeg-devel] RISC-V vector DSP functions: Motivation for commit 446b009

2024-01-19 Thread Rémi Denis-Courmont
Hi,

On Friday, 19 January 2024, 17.30.00 EET, Michael Platzer via ffmpeg-devel
wrote:
> Commit 446b0090cbb66ee614dcf6ca79c78dc8eb7f0e37 by Remi Denis-Courmont has
> replaced RISC-V vector loads and stores with negative stride with vrgather
> (generalized permutation within vector registers) instructions in order to
> reverse the elements in a vector register. The commit message explains that
> this change was done, but it does not explain why.

It was faster on what was the best approximation of real hardware available at
the time, i.e. a Sipeed Lichee Pi4A board. There are no benchmarks in the commit
because I don't like to publish benchmarks collected from prototypes.
Nevertheless I think the commit message hints enough that anybody could easily
guess that it was a performance optimisation, if I'm being honest.

This is not exactly surprising: typical hardware can only access so many 
memory addresses simultaneously (i.e. one or maybe two), so indexed loads and 
strided loads are bound to be much slower than unit-strided loads.

Maybe you have access to special hardware that is able to optimise the special 
case of strides equal to minus one to reduce the number of memory accesses. 
But I didn't back then, and as a matter of fact, I still don't. Hardware 
donations are welcome.

> I fail to see what could possibly have motivated this change.

> The RISC-V vector loads and stores support negative stride values for use
> cases such as this one.

[Citation required]

> Using vrgather instead replaces the more specific operation with a more
> generic one,

That is a very subjective and unsubstantiated assertion. This feels a bit 
hypocritical while you are attacking me for not providing justification.

As far as I can tell, neither instruction is specific to reversing vector
element order. An actual real-life specific instruction exists on Arm in the
form of vector-reverse. I don't know any ISA with load-reverse or
store-reverse.

> which is likely to be less performant on most HW architectures.

Would you care to define "most architectures"? I only know one commercially 
available hardware architecture as of today, Kendryte K230 SoC with T-Head 
C908 CPU, so I can't make much sense of your sentence here.

> In addition, it requires to setup an index vector,

That is irrelevant since in this loop, the vector bank is not a bottleneck.
The loop can run with maximum LMUL either way. And besides, the loop turned
out to be faster with a smaller multiplier.

> thus raising dynamic instruction count.

It adds only one instruction (reverse subtraction) in the main loop, and even 
that could be optimised away if relevant.
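
For illustration, the two ways of reversing a vector discussed here can be modelled in scalar C. This is only a sketch of the data movement, not RISC-V vector code: the function names and MAX_VL are hypothetical, and the index computation (vl - 1) - i is presumably what the reverse subtraction mentioned above produces.

```c
#include <stddef.h>

#define MAX_VL 64   /* hypothetical upper bound on the vector length */

/* Model of a strided load with stride -1: one memory address per element. */
static void reverse_strided(float *dst, const float *src, size_t vl)
{
    if (vl == 0)
        return;

    const float *last = src + vl - 1;        /* base address: last element */
    for (size_t i = 0; i < vl; i++)
        dst[i] = last[-(ptrdiff_t)i];        /* walk backwards through memory */
}

/* Model of a unit-stride load followed by a gather within the register. */
static void reverse_gather(float *dst, const float *src, size_t vl)
{
    float  reg[MAX_VL];
    size_t idx[MAX_VL];

    if (vl > MAX_VL)
        return;                              /* outside the modelled range */

    for (size_t i = 0; i < vl; i++)          /* contiguous (unit-stride) load */
        reg[i] = src[i];
    for (size_t i = 0; i < vl; i++)          /* index vector: (vl - 1) - i */
        idx[i] = (vl - 1) - i;
    for (size_t i = 0; i < vl; i++)          /* permutation inside the register */
        dst[i] = reg[idx[i]];
}
```

Both functions produce the same output; the difference under discussion is whether the reversal happens in the memory access pattern or inside the register file.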

-- 
レミ・デニ-クールモン
http://www.remlab.net/





[FFmpeg-devel] [PATCH v2] lavc/dxvenc: add DXV encoder with support for DXT1 texture format

2024-01-19 Thread Connor Worley
Signed-off-by: Connor Worley 
---
 Changelog |   1 +
 configure |   1 +
 doc/general_contents.texi |   3 +-
 libavcodec/Makefile   |   1 +
 libavcodec/allcodecs.c|   1 +
 libavcodec/dxvenc.c   | 361 ++
 libavcodec/version.h  |   2 +-
 7 files changed, 368 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/dxvenc.c

diff --git a/Changelog b/Changelog
index 5b2899d05b..224d84664a 100644
--- a/Changelog
+++ b/Changelog
@@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
 releases are sorted from youngest to oldest.
 
 version :
+- DXV DXT1 encoder
 - LEAD MCMP decoder
 - EVC decoding using external library libxevd
 - EVC encoding using external library libxeve
diff --git a/configure b/configure
index c8ae0a061d..21663000f8 100755
--- a/configure
+++ b/configure
@@ -2851,6 +2851,7 @@ dvvideo_decoder_select="dvprofile idctdsp"
 dvvideo_encoder_select="dvprofile fdctdsp me_cmp pixblockdsp"
 dxa_decoder_deps="zlib"
 dxv_decoder_select="lzf texturedsp"
+dxv_encoder_select="texturedspenc"
 eac3_decoder_select="ac3_decoder"
 eac3_encoder_select="ac3_encoder"
 eamad_decoder_select="aandcttables blockdsp bswapdsp"
diff --git a/doc/general_contents.texi b/doc/general_contents.texi
index 8b48fed060..f269cbd1a9 100644
--- a/doc/general_contents.texi
+++ b/doc/general_contents.texi
@@ -670,7 +670,8 @@ library:
 @item Redirector@tab   @tab X
 @item RedSpark  @tab   @tab X
 @item Renderware TeXture Dictionary @tab   @tab X
-@item Resolume DXV  @tab   @tab X
+@item Resolume DXV  @tab X @tab X
+@tab Encoding is only supported for the DXT1 (Normal Quality, No Alpha) texture format.
 @item RF64  @tab   @tab X
 @item RL2   @tab   @tab X
 @tab Audio and video format used in some games by Entertainment Software Partners.
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index bb42095165..96361ac794 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -341,6 +341,7 @@ OBJS-$(CONFIG_DVVIDEO_ENCODER) += dvenc.o dv.o dvdata.o
 OBJS-$(CONFIG_DXA_DECODER) += dxa.o
 OBJS-$(CONFIG_DXTORY_DECODER)  += dxtory.o
 OBJS-$(CONFIG_DXV_DECODER) += dxv.o
+OBJS-$(CONFIG_DXV_ENCODER) += dxvenc.o
 OBJS-$(CONFIG_EAC3_DECODER)+= eac3_data.o
 OBJS-$(CONFIG_EAC3_ENCODER)+= eac3enc.o eac3_data.o
 OBJS-$(CONFIG_EACMV_DECODER)   += eacmv.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index 93ce8e3224..ef8c3a6d7d 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -106,6 +106,7 @@ extern const FFCodec ff_dvvideo_encoder;
 extern const FFCodec ff_dvvideo_decoder;
 extern const FFCodec ff_dxa_decoder;
 extern const FFCodec ff_dxtory_decoder;
+extern const FFCodec ff_dxv_encoder;
 extern const FFCodec ff_dxv_decoder;
 extern const FFCodec ff_eacmv_decoder;
 extern const FFCodec ff_eamad_decoder;
diff --git a/libavcodec/dxvenc.c b/libavcodec/dxvenc.c
new file mode 100644
index 00..3a5b310c9b
--- /dev/null
+++ b/libavcodec/dxvenc.c
@@ -0,0 +1,361 @@
+/*
+ * Resolume DXV encoder
+ * Copyright (C) 2024 Connor Worley 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include 
+
+#include "libavutil/crc.h"
+#include "libavutil/imgutils.h"
+#include "libavutil/opt.h"
+
+#include "bytestream.h"
+#include "codec_internal.h"
+#include "encode.h"
+#include "texturedsp.h"
+
+#define DXV_HEADER_LENGTH 12
+
+/*
+ * DXV uses LZ-like back-references to avoid copying words that have already
+ * appeared in the decompressed stream. Using a simple hash table (HT)
+ * significantly speeds up the lookback process while encoding.
+ */
+#define LOOKBACK_HT_ELEMS 0x4
+#define LOOKBACK_WORDS0x20202
+
+enum DXVTextureFormat {
+DXV_FMT_DXT1 = MKBETAG('D', 'X', 'T', '1'),
+};
+
+typedef struct HTEntry {
+uint32_t key;
+uint32_t pos;
+} HTEntry;
+
+static void ht_init(HTEntry *ht)
+{
+for (size_t i = 0; i < LOOKBACK_HT_ELEMS; i++) {
+ht[i].pos = -1;
+}
+}
+
+static uint32_t ht_lookup_and_upsert(HTEntry *ht, AVCRC *hash_ctx,
+ 

Re: [FFmpeg-devel] [PATCH] Add DXV encoder with support for DXT1 texture format

2024-01-19 Thread Vittorio Giovara
On Fri, Jan 19, 2024 at 5:56 PM Connor Worley wrote:

> Thanks for the feedback! For the next revision, is it preferred to reply
> to this thread or create a new one?
>

here is fine


> On 1/19/24 08:23, Vittorio Giovara wrote:
> >> diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
> >> index 93ce8e3224..ef8c3a6d7d 100644
> >> --- a/libavcodec/allcodecs.c
> >> +++ b/libavcodec/allcodecs.c
> >> @@ -106,6 +106,7 @@ extern const FFCodec ff_dvvideo_encoder;
> >>  extern const FFCodec ff_dvvideo_decoder;
> >>  extern const FFCodec ff_dxa_decoder;
> >>  extern const FFCodec ff_dxtory_decoder;
> >> +extern const FFCodec ff_dxv_encoder;
> >>  extern const FFCodec ff_dxv_decoder;
> >>
> > nit: keep list in order
>
>
> Not sure what you mean, the present order seems to be encoder followed by
> decoder for codecs that have both.
>

disregard, I assumed it was in alphabetical order


> > What does the HT stand for?
>
>
> Hash table --  this change implements a simple linear probing approach.
>

got it, would be nice to have a small comment on why it's needed, as
documentation


> >> +#define LOOKBACK_WORDS0x20202
> >> +
> >> +enum DXVTextureFormat {
> >> +DXV_FMT_DXT1 = MKBETAG('D', 'X', 'T', '1'),
> >> +};
> >>
> > Why would you go for an enum here? Just for future expansion and the switch
> > case below?
>
>
> Exactly, that's the plan.
>

very cool
-- 
Vittorio


Re: [FFmpeg-devel] [PATCH] Add DXV encoder with support for DXT1 texture format

2024-01-19 Thread Connor Worley
Thanks for the feedback! For the next revision, is it preferred to reply to 
this thread or create a new one?

On 1/19/24 08:23, Vittorio Giovara wrote:
>> diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
>> index 93ce8e3224..ef8c3a6d7d 100644
>> --- a/libavcodec/allcodecs.c
>> +++ b/libavcodec/allcodecs.c
>> @@ -106,6 +106,7 @@ extern const FFCodec ff_dvvideo_encoder;
>>  extern const FFCodec ff_dvvideo_decoder;
>>  extern const FFCodec ff_dxa_decoder;
>>  extern const FFCodec ff_dxtory_decoder;
>> +extern const FFCodec ff_dxv_encoder;
>>  extern const FFCodec ff_dxv_decoder;
>>
> nit: keep list in order


Not sure what you mean, the present order seems to be encoder followed by 
decoder for codecs that have both.

>>  extern const FFCodec ff_eacmv_decoder;
>>  extern const FFCodec ff_eamad_decoder;
>> diff --git a/libavcodec/dxvenc.c b/libavcodec/dxvenc.c
>> new file mode 100644
>> index 00..33080fa1c9
>> --- /dev/null
>> +++ b/libavcodec/dxvenc.c
>> @@ -0,0 +1,358 @@
>> +/*
>> + * Resolume DXV encoder
>> + * Copyright (C) 2015 Vittorio Giovara 
>> + * Copyright (C) 2015 Tom Butterworth 
>> + * Copyright (C) 2018 Paul B Mahol
>> + * Copyright (C) 2024 Connor Worley 
>>
> Idk about tom or paul, but I haven't done anything for this encoder :)
> I think you can prune the list of copyright quite a bit here


Got it. I copied some code verbatim from dxv.c and hapenc.c and erred on the 
side of overcrediting.


>> +#define LOOKBACK_HT_ELEMS 0x4
>>
> What does the HT stand for?


Hash table --  this change implements a simple linear probing approach.
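
For readers unfamiliar with the term, a minimal sketch of a linear-probing lookup-and-upsert is shown below. It is not the code from the patch: the table size, the multiplicative hash (the patch passes an AVCRC context instead) and all Toy* / toy_* names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HT_ELEMS 64u          /* hypothetical table size */
#define TOY_EMPTY    UINT32_MAX   /* marks a free slot / "not found" */

typedef struct ToyEntry {
    uint32_t key;
    uint32_t pos;
} ToyEntry;

static void toy_ht_init(ToyEntry *ht)
{
    for (size_t i = 0; i < TOY_HT_ELEMS; i++)
        ht[i].pos = TOY_EMPTY;
}

/* Stand-in multiplicative hash; chosen only to keep the sketch short. */
static uint32_t toy_hash(uint32_t key)
{
    return (key * 2654435761u) % TOY_HT_ELEMS;
}

/*
 * Return the position previously stored for key (TOY_EMPTY if the key was
 * not present) and record pos as its latest position. On a collision the
 * next slot is tried, wrapping around: that is the linear probing.
 * If the table is full and the key is absent, nothing is stored.
 */
static uint32_t toy_ht_lookup_and_upsert(ToyEntry *ht, uint32_t key, uint32_t pos)
{
    uint32_t slot = toy_hash(key);

    for (size_t probes = 0; probes < TOY_HT_ELEMS; probes++) {
        ToyEntry *e = &ht[slot];

        if (e->pos == TOY_EMPTY) {        /* free slot: key not seen before */
            e->key = key;
            e->pos = pos;
            return TOY_EMPTY;
        }
        if (e->key == key) {              /* hit: hand back the old position */
            uint32_t prev = e->pos;
            e->pos = pos;
            return prev;
        }
        slot = (slot + 1) % TOY_HT_ELEMS; /* occupied by another key: probe on */
    }
    return TOY_EMPTY;
}
```

An encoder loop would call the upsert once per input word and treat any result other than TOY_EMPTY as a candidate back-reference.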

>> +#define LOOKBACK_WORDS0x20202
>> +
>> +enum DXVTextureFormat {
>> +DXV_FMT_DXT1 = MKBETAG('D', 'X', 'T', '1'),
>> +};
>>
> Why would you go for an enum here? Just for future expansion and the switch
> case below?


Exactly, that's the plan.
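
As a sketch of how that could look, the example below defines a local tag macro in the spirit of libavutil's MKBETAG and dispatches on the enum in a switch. The TOY_* names and the toy_block_size helper are hypothetical; the 8 bytes per 4x4 block figure is the standard DXT1 block size.

```c
#include <stdint.h>

/* Local big-endian four-character tag, in the spirit of MKBETAG. */
#define TOY_MKBETAG(a, b, c, d) \
    ((uint32_t)(d) | ((uint32_t)(c) << 8) | ((uint32_t)(b) << 16) | ((uint32_t)(a) << 24))

enum ToyTextureFormat {
    TOY_FMT_DXT1 = TOY_MKBETAG('D', 'X', 'T', '1'),
    /* further formats would be added here */
};

/* Bytes per compressed 4x4 block, or -1 for formats not handled yet. */
static int toy_block_size(uint32_t tex_fmt)
{
    switch (tex_fmt) {
    case TOY_FMT_DXT1:
        return 8;
    default:
        return -1;
    }
}
```

Supporting another texture format would then mean adding one enumerator and one case label.
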


Re: [FFmpeg-devel] [PATCH] Add DXV encoder with support for DXT1 texture format

2024-01-19 Thread Vittorio Giovara
Hi,
thanks for the patch; a few minor nits below.

On Wed, Jan 17, 2024 at 9:57 PM Connor Worley wrote:

> diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
> index 93ce8e3224..ef8c3a6d7d 100644
> --- a/libavcodec/allcodecs.c
> +++ b/libavcodec/allcodecs.c
> @@ -106,6 +106,7 @@ extern const FFCodec ff_dvvideo_encoder;
>  extern const FFCodec ff_dvvideo_decoder;
>  extern const FFCodec ff_dxa_decoder;
>  extern const FFCodec ff_dxtory_decoder;
> +extern const FFCodec ff_dxv_encoder;
>  extern const FFCodec ff_dxv_decoder;
>

nit: keep list in order


>  extern const FFCodec ff_eacmv_decoder;
>  extern const FFCodec ff_eamad_decoder;
> diff --git a/libavcodec/dxvenc.c b/libavcodec/dxvenc.c
> new file mode 100644
> index 00..33080fa1c9
> --- /dev/null
> +++ b/libavcodec/dxvenc.c
> @@ -0,0 +1,358 @@
> +/*
> + * Resolume DXV encoder
> + * Copyright (C) 2015 Vittorio Giovara 
> + * Copyright (C) 2015 Tom Butterworth 
> + * Copyright (C) 2018 Paul B Mahol
> + * Copyright (C) 2024 Connor Worley 
>

Idk about tom or paul, but I haven't done anything for this encoder :)
I think you can prune the list of copyright quite a bit here


> +#define LOOKBACK_HT_ELEMS 0x4
>

What does the HT stand for?


> +#define LOOKBACK_WORDS0x20202
> +
> +enum DXVTextureFormat {
> +DXV_FMT_DXT1 = MKBETAG('D', 'X', 'T', '1'),
> +};
>

Why would you go for an enum here? Just for future expansion and the switch
case below?

lgtm overall
-- 
Vittorio


[FFmpeg-devel] [PATCH v1 2/2] vaapi: add vaapi_avs2 support

2024-01-19 Thread jianfeng.zheng
see https://github.com/intel/libva/pull/738

[Moore Threads](https://www.mthreads.com) (short for Mthreads) is a
Chinese GPU manufacturer. All our products, like MTTS70/MTTS80/.. ,
support AVS2 8bit/10bit HW decoding at max 8k resolution.

Signed-off-by: jianfeng.zheng 
---
 configure|   7 +
 libavcodec/Makefile  |   2 +
 libavcodec/allcodecs.c   |   1 +
 libavcodec/avs2.c| 345 ++-
 libavcodec/avs2.h| 460 +++-
 libavcodec/avs2_parser.c |   5 +-
 libavcodec/avs2dec.c | 569 +
 libavcodec/avs2dec.h |  48 +++
 libavcodec/avs2dec_headers.c | 787 +++
 libavcodec/codec_desc.c  |   5 +-
 libavcodec/defs.h|   4 +
 libavcodec/hwaccels.h|   1 +
 libavcodec/libdavs2.c|   2 +-
 libavcodec/profiles.c|   6 +
 libavcodec/profiles.h|   1 +
 libavcodec/vaapi_avs2.c  | 227 ++
 libavcodec/vaapi_decode.c|   5 +
 libavformat/matroska.c   |   1 +
 libavformat/mpeg.h   |   1 +
 19 files changed, 2450 insertions(+), 27 deletions(-)
 create mode 100644 libavcodec/avs2dec.c
 create mode 100644 libavcodec/avs2dec.h
 create mode 100644 libavcodec/avs2dec_headers.c
 create mode 100644 libavcodec/vaapi_avs2.c

diff --git a/configure b/configure
index 89759eda5d..bde3217241 100755
--- a/configure
+++ b/configure
@@ -2464,6 +2464,7 @@ HAVE_LIST="
 zlib_gzip
 openvino2
 va_profile_avs
+va_profile_avs2
 "
 
 # options emitted with CONFIG_ prefix but not available on the command line
@@ -3204,6 +3205,7 @@ wmv3_nvdec_hwaccel_select="vc1_nvdec_hwaccel"
 wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel"
 wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel"
 cavs_vaapi_hwaccel_deps="vaapi va_profile_avs VAPictureParameterBufferAVS"
+avs2_vaapi_hwaccel_deps="vaapi va_profile_avs2 VAPictureParameterBufferAVS2"
 
 # hardware-accelerated codecs
 mediafoundation_deps="mftransform_h MFCreateAlignedMemoryBuffer"
@@ -7189,6 +7191,11 @@ if enabled vaapi; then
 test_code cc va/va.h "VAProfile p1 = VAProfileAVSJizhun, p2 = VAProfileAVSGuangdian;" &&
 enable va_profile_avs
 enabled va_profile_avs && check_type "va/va.h va/va_dec_avs.h" "VAPictureParameterBufferAVS"
+
+disable va_profile_avs2 &&
+test_code cc va/va.h "VAProfile p1 = VAProfileAVS2Main, p2 = VAProfileAVS2Main10;" &&
+enable va_profile_avs2
+enabled va_profile_avs2 && check_type "va/va.h va/va_dec_avs2.h" "VAPictureParameterBufferAVS2"
 fi
 
 if enabled_all opencl libdrm ; then
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 7d92375fed..ac3925ed57 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -285,6 +285,7 @@ OBJS-$(CONFIG_BRENDER_PIX_DECODER) += brenderpix.o
 OBJS-$(CONFIG_C93_DECODER) += c93.o
 OBJS-$(CONFIG_CAVS_DECODER)+= cavs.o cavsdec.o cavsdsp.o \
   cavsdata.o
+OBJS-$(CONFIG_AVS2_DECODER)+= avs2.o avs2dec.o avs2dec_headers.o
 OBJS-$(CONFIG_CBD2_DECODER)+= dpcm.o
 OBJS-$(CONFIG_CCAPTION_DECODER)+= ccaption_dec.o ass.o
 OBJS-$(CONFIG_CDGRAPHICS_DECODER)  += cdgraphics.o
@@ -1056,6 +1057,7 @@ OBJS-$(CONFIG_VP9_VDPAU_HWACCEL)  += vdpau_vp9.o
 OBJS-$(CONFIG_VP9_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_vp9.o
 OBJS-$(CONFIG_VP8_QSV_HWACCEL)+= qsvdec.o
 OBJS-$(CONFIG_CAVS_VAAPI_HWACCEL) += vaapi_cavs.o
+OBJS-$(CONFIG_AVS2_VAAPI_HWACCEL) += vaapi_avs2.o
 
 # Objects duplicated from other libraries for shared builds
 SHLIBOBJS  += log2_tab.o reverse.o
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index 93ce8e3224..5900e71804 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -76,6 +76,7 @@ extern const FFCodec ff_bmv_video_decoder;
 extern const FFCodec ff_brender_pix_decoder;
 extern const FFCodec ff_c93_decoder;
 extern const FFCodec ff_cavs_decoder;
+extern const FFCodec ff_avs2_decoder;
 extern const FFCodec ff_cdgraphics_decoder;
 extern const FFCodec ff_cdtoons_decoder;
 extern const FFCodec ff_cdxl_decoder;
diff --git a/libavcodec/avs2.c b/libavcodec/avs2.c
index ead8687d0a..c235708fad 100644
--- a/libavcodec/avs2.c
+++ b/libavcodec/avs2.c
@@ -1,7 +1,9 @@
 /*
+ * Chinese AVS2-Video (GY/T 299.1-2016 or IEEE 1857.4-2018) decoder.
  * AVS2 related definitions
  *
  * Copyright (C) 2022 Zhao Zhili, 
+ * Copyright (c) 2022 JianfengZheng 
  *
  * This file is part of FFmpeg.
  *
@@ -20,23 +22,332 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+/**
+ * @file
+ * Chinese AVS2-Video (GY/T 299.1-2016 or IEEE 1857.4-2018) definitions
+ * @author JianfengZheng 
+ */
+
+#include "libavcodec/internal.h"
+#include "avcodec.h"
+#include "get_bits.h"
+#include "bytestream.h"
 #include "avs2.h"
+#include "startcode.h"
+
+static AVS2LevelLimit const *ff_avs2

[FFmpeg-devel] [PATCH v1 1/2] vaapi: add vaapi_cavs support

2024-01-19 Thread jianfeng.zheng
see https://github.com/intel/libva/pull/738

[Moore Threads](https://www.mthreads.com) (Mthreads for short) is a
Chinese GPU manufacturer. All our products, such as MTTS70/MTTS80/..,
support AVS/AVS+ HW decoding at up to 2K resolution.

Signed-off-by: jianfeng.zheng 
---
 configure |  14 ++
 libavcodec/Makefile   |   1 +
 libavcodec/cavs.c |  12 +
 libavcodec/cavs.h |  36 ++-
 libavcodec/cavs_parser.c  |  16 ++
 libavcodec/cavsdec.c  | 473 +-
 libavcodec/defs.h |   3 +
 libavcodec/hwaccels.h |   1 +
 libavcodec/profiles.c |   6 +
 libavcodec/profiles.h |   1 +
 libavcodec/vaapi_cavs.c   | 164 +
 libavcodec/vaapi_decode.c |   4 +
 12 files changed, 669 insertions(+), 62 deletions(-)
 create mode 100644 libavcodec/vaapi_cavs.c

diff --git a/configure b/configure
index c8ae0a061d..89759eda5d 100755
--- a/configure
+++ b/configure
@@ -2463,6 +2463,7 @@ HAVE_LIST="
 xmllint
 zlib_gzip
 openvino2
+va_profile_avs
 "
 
 # options emitted with CONFIG_ prefix but not available on the command line
@@ -3202,6 +3203,7 @@ wmv3_dxva2_hwaccel_select="vc1_dxva2_hwaccel"
 wmv3_nvdec_hwaccel_select="vc1_nvdec_hwaccel"
 wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel"
 wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel"
+cavs_vaapi_hwaccel_deps="vaapi va_profile_avs VAPictureParameterBufferAVS"
 
 # hardware-accelerated codecs
 mediafoundation_deps="mftransform_h MFCreateAlignedMemoryBuffer"
@@ -7175,6 +7177,18 @@ if enabled vaapi; then
 check_type "va/va.h va/va_enc_vp8.h"  "VAEncPictureParameterBufferVP8"
 check_type "va/va.h va/va_enc_vp9.h"  "VAEncPictureParameterBufferVP9"
 check_type "va/va.h va/va_enc_av1.h"  "VAEncPictureParameterBufferAV1"
+
+#
+# Using 'VA_CHECK_VERSION' in the source code makes things easy, but we have to wait
+# until the newly added VAProfile is distributed in a released VAAPI version.
+#
+# Before and after that, auto-detection keeps version compatibility.
+# It always works.
+#
+disable va_profile_avs &&
+test_code cc va/va.h "VAProfile p1 = VAProfileAVSJizhun, p2 = VAProfileAVSGuangdian;" &&
+enable va_profile_avs
+enabled va_profile_avs && check_type "va/va.h va/va_dec_avs.h" "VAPictureParameterBufferAVS"
 fi
 
 if enabled_all opencl libdrm ; then
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index bb42095165..7d92375fed 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1055,6 +1055,7 @@ OBJS-$(CONFIG_VP9_VAAPI_HWACCEL)  += vaapi_vp9.o
 OBJS-$(CONFIG_VP9_VDPAU_HWACCEL)  += vdpau_vp9.o
 OBJS-$(CONFIG_VP9_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_vp9.o
 OBJS-$(CONFIG_VP8_QSV_HWACCEL)+= qsvdec.o
+OBJS-$(CONFIG_CAVS_VAAPI_HWACCEL) += vaapi_cavs.o
 
 # Objects duplicated from other libraries for shared builds
 SHLIBOBJS  += log2_tab.o reverse.o
diff --git a/libavcodec/cavs.c b/libavcodec/cavs.c
index fdd577f7fb..ed7b278336 100644
--- a/libavcodec/cavs.c
+++ b/libavcodec/cavs.c
@@ -810,6 +810,14 @@ av_cold int ff_cavs_init(AVCodecContext *avctx)
 if (!h->cur.f || !h->DPB[0].f || !h->DPB[1].f)
 return AVERROR(ENOMEM);
 
+h->out[0].f = av_frame_alloc();
+h->out[1].f = av_frame_alloc();
+h->out[2].f = av_frame_alloc();
+if (!h->out[0].f || !h->out[1].f || !h->out[2].f) {
+ff_cavs_end(avctx);
+return AVERROR(ENOMEM);
+}
+
 h->luma_scan[0] = 0;
 h->luma_scan[1] = 8;
 h->intra_pred_l[INTRA_L_VERT]   = intra_pred_vert;
@@ -840,6 +848,10 @@ av_cold int ff_cavs_end(AVCodecContext *avctx)
 av_frame_free(&h->DPB[0].f);
 av_frame_free(&h->DPB[1].f);
 
+av_frame_free(&h->out[0].f);
+av_frame_free(&h->out[1].f);
+av_frame_free(&h->out[2].f);
+
 av_freep(&h->top_qp);
 av_freep(&h->top_mv[0]);
 av_freep(&h->top_mv[1]);
diff --git a/libavcodec/cavs.h b/libavcodec/cavs.h
index 244c322b35..ef03c1a974 100644
--- a/libavcodec/cavs.h
+++ b/libavcodec/cavs.h
@@ -39,8 +39,10 @@
 #define EXT_START_CODE  0x01b5
 #define USER_START_CODE 0x01b2
 #define CAVS_START_CODE 0x01b0
+#define VIDEO_SEQ_END_CODE  0x01b1
 #define PIC_I_START_CODE    0x01b3
 #define PIC_PB_START_CODE   0x01b6
+#define VIDEO_EDIT_CODE 0x01b7
 
 #define A_AVAIL  1
 #define B_AVAIL  2
@@ -164,10 +166,15 @@ struct dec_2dvlc {
 typedef struct AVSFrame {
 AVFrame *f;
 int poc;
+int outputed;
+
+AVBufferRef   *hwaccel_priv_buf;
+void  *hwaccel_picture_private;
 } AVSFrame;
 
 typedef struct AVSContext {
 AVCodecContext *avctx;
+int got_pix_fmt;
 BlockDSPContext bdsp;
 H264ChromaContext h264chroma;
 VideoDSPContext vdsp;
@@ -175,6 +182,7 @@ typedef struct AVSContext {
 GetBitContext gb;
 AVSF

[FFmpeg-devel] RISC-V vector DSP functions: Motivation for commit 446b009

2024-01-19 Thread Michael Platzer via ffmpeg-devel
Hi,

Commit 446b0090cbb66ee614dcf6ca79c78dc8eb7f0e37 by Remi Denis-Courmont has 
replaced RISC-V vector loads and stores with negative stride with vrgather 
(generalized permutation within vector registers) instructions in order to 
reverse the elements in a vector register. The commit message explains that 
this change was done, but it does not explain why.

I fail to see what could have motivated this change. RISC-V vector loads and
stores support negative stride values for use cases such as this one. Using
vrgather instead replaces the more specific operation with a more generic one,
which is likely to perform worse on most HW architectures. In addition, it
requires setting up an index vector, which raises the dynamic instruction
count.
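
To make the comparison concrete, here is a scalar C model of the two ways to reverse a vector. It is only a sketch: the helper names and the fixed `VL` are hypothetical, it models the semantics of a strided load and a register gather rather than using real RVV instructions or intrinsics, and it says nothing about how either variant performs on actual hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VL 8 /* hypothetical number of 16-bit elements per vector register */

/* Models a strided vector load: element i is read from base + i * stride,
 * where stride is a byte offset and may be negative. */
static void strided_load(int16_t *vd, const int16_t *base,
                         ptrdiff_t stride, size_t vl)
{
    const unsigned char *p = (const unsigned char *)base;

    for (size_t i = 0; i < vl; i++)
        memcpy(&vd[i], p + (ptrdiff_t)i * stride, sizeof(vd[i]));
}

/* Models a register gather: vd[i] = vs[idx[i]], or 0 if idx[i] is out of range. */
static void gather(int16_t *vd, const int16_t *vs,
                   const uint16_t *idx, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        vd[i] = idx[i] < vl ? vs[idx[i]] : 0;
}

/* Approach 1: a single load that starts at the last element and walks back. */
static void reverse_strided(int16_t *out, const int16_t *src, size_t vl)
{
    strided_load(out, src + vl - 1, -(ptrdiff_t)sizeof(*src), vl);
}

/* Approach 2: build the index vector vl-1-i, do a unit-stride load,
 * then permute within the register. */
static void reverse_gather(int16_t *out, const int16_t *src, size_t vl)
{
    int16_t  reg[VL];
    uint16_t idx[VL];

    assert(vl <= VL);
    for (size_t i = 0; i < vl; i++)
        idx[i] = (uint16_t)(vl - 1 - i);      /* extra setup step */
    memcpy(reg, src, vl * sizeof(*src));      /* unit-stride load */
    gather(out, reg, idx, vl);                /* in-register permutation */
}
```

Both functions produce the same reversed output; the second needs the additional index setup that the paragraph above refers to.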

Could someone familiar with this change (perhaps Remi himself) please explain 
the motivation for this change?

Thanks,
Michael


Re: [FFmpeg-devel] [PATCH] web/template_head2: fix broken anchor on 'Contribute' link

2024-01-19 Thread Marth64
LGTM on inspection. Thank you. From W3 validator:
Document checking completed. No errors or warnings to show.

On Fri, Jan 19, 2024 at 06:58 Anton Khirnov  wrote:

> Quoting Marth64 (2024-01-08 22:40:40)
> > Signed-off-by: Marth64 
> > ---
> >  src/template_head2 | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/template_head2 b/src/template_head2
> > index 7ba634b..102fac0 100644
> > --- a/src/template_head2
> > +++ b/src/template_head2
> > @@ -34,7 +34,7 @@
> >Developers
> >  
> >Source Code
> > -   href="developer.html#Contributing">Contribute
> > +   href="developer.html#Introduction">Contribute
> >http://fate.ffmpeg.org";>FATE
> >http://coverage.ffmpeg.org";>Code
> Coverage
> >  
> > --
> > 2.34.1
>
> Pushed all your pending web patches, hope I didn't miss any.
>
> Thanks,
> --
> Anton Khirnov
>


[FFmpeg-devel] Re: [PATCH 7/8] avcodec/x86/vvc: add avg and avg_w AVX2 optimizations

2024-01-19 Thread Wu Jianhua
>From: ffmpeg-devel  on behalf of Michael Niedermayer 
>
>Sent: 2024-01-18 13:48
>To: FFmpeg development discussions and patches
>Subject: Re: [FFmpeg-devel] [PATCH 7/8] avcodec/x86/vvc: add avg and avg_w AVX2 optimizations
>
>On Thu, Jan 18, 2024 at 10:24:03PM +0800, toq...@outlook.com wrote:
>> From: Wu Jianhua 
>>
>> The avg/avg_w is based on dav1d.
>> See 
>> https://code.videolan.org/videolan/dav1d/-/blob/master/src/x86/mc_avx2.asm
>>
>>
>> Signed-off-by: Wu Jianhua 
>> ---
>>  libavcodec/x86/vvc/Makefile  |   3 +-
>>  libavcodec/x86/vvc/vvc_mc.asm| 301 +++
>>  libavcodec/x86/vvc/vvcdsp_init.c |  54 ++
>>  3 files changed, 357 insertions(+), 1 deletion(-)
>>  create mode 100644 libavcodec/x86/vvc/vvc_mc.asm
> 
> error: cannot convert from y to UTF-8
> fatal: could not parse patch
> 
> [...]

I used the wrong encoding. It has been fixed in v2.

Thanks for verifying this.


[FFmpeg-devel] [PATCH v2 8/8] tests/checkasm/vvc_mc: add check_avg

2024-01-19 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 tests/checkasm/vvc_mc.c | 64 +
 1 file changed, 64 insertions(+)

diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
index 711280deec..8adb00573f 100644
--- a/tests/checkasm/vvc_mc.c
+++ b/tests/checkasm/vvc_mc.c
@@ -35,6 +35,7 @@
 static const uint32_t pixel_mask[] = { 0x, 0x03ff03ff, 0x0fff0fff, 
0x3fff3fff, 0x };
 static const int sizes[] = { 2, 4, 8, 16, 32, 64, 128 };
 
+#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
 #define PIXEL_STRIDE (MAX_CTU_SIZE * 2)
 #define EXTRA_BEFORE 3
 #define EXTRA_AFTER  4
@@ -261,10 +262,73 @@ static void check_put_vvc_chroma_uni(void)
 report("put_uni_chroma");
 }
 
+#define AVG_SRC_BUF_SIZE (MAX_CTU_SIZE * MAX_CTU_SIZE)
+#define AVG_DST_BUF_SIZE (MAX_PB_SIZE * MAX_PB_SIZE * 2)
+
+static void check_avg(void)
+{
+LOCAL_ALIGNED_32(int16_t, src00, [AVG_SRC_BUF_SIZE]);
+LOCAL_ALIGNED_32(int16_t, src01, [AVG_SRC_BUF_SIZE]);
+LOCAL_ALIGNED_32(int16_t, src10, [AVG_SRC_BUF_SIZE]);
+LOCAL_ALIGNED_32(int16_t, src11, [AVG_SRC_BUF_SIZE]);
+LOCAL_ALIGNED_32(uint8_t, dst0, [AVG_DST_BUF_SIZE]);
+LOCAL_ALIGNED_32(uint8_t, dst1, [AVG_DST_BUF_SIZE]);
+VVCDSPContext c;
+
+for (int bit_depth = 8; bit_depth <= 12; bit_depth += 2) {
+randomize_avg_src((uint8_t*)src00, (uint8_t*)src10, AVG_SRC_BUF_SIZE * sizeof(int16_t));
+randomize_avg_src((uint8_t*)src01, (uint8_t*)src11, AVG_SRC_BUF_SIZE * sizeof(int16_t));
+ff_vvc_dsp_init(&c, bit_depth);
+for (int h = 2; h <= MAX_CTU_SIZE; h *= 2) {
+for (int w = 2; w <= MAX_CTU_SIZE; w *= 2) {
+{
+   declare_func_emms(AV_CPU_FLAG_MMX | AV_CPU_FLAG_MMXEXT, void, uint8_t *dst, ptrdiff_t dst_stride,
+const int16_t *src0, const int16_t *src1, int width, int height);
+if (check_func(c.inter.avg, "avg_%d_%dx%d", bit_depth, w, h)) {
+memset(dst0, 0, AVG_DST_BUF_SIZE);
+memset(dst1, 0, AVG_DST_BUF_SIZE);
+call_ref(dst0, MAX_CTU_SIZE * SIZEOF_PIXEL, src00, src01, w, h);
+call_new(dst1, MAX_CTU_SIZE * SIZEOF_PIXEL, src10, src11, w, h);
+if (memcmp(dst0, dst1, DST_BUF_SIZE))
+fail();
+if (w == h)
+bench_new(dst0, MAX_CTU_SIZE * SIZEOF_PIXEL, src00, src01, w, h);
+}
+}
+{
+declare_func_emms(AV_CPU_FLAG_MMX | AV_CPU_FLAG_MMXEXT, void, uint8_t *dst, ptrdiff_t dst_stride,
+const int16_t *src0, const int16_t *src1, int width, int height,
+int denom, int w0, int w1, int o0, int o1);
+{
+const int denom = rnd() % 8;
+const int w0= rnd() % 256 - 128;
+const int w1= rnd() % 256 - 128;
+const int o0= rnd() % 256 - 128;
+const int o1= rnd() % 256 - 128;
+if (check_func(c.inter.w_avg, "w_avg_%d_%dx%d", bit_depth, w, h)) {
+memset(dst0, 0, AVG_DST_BUF_SIZE);
+memset(dst1, 0, AVG_DST_BUF_SIZE);
+
+call_ref(dst0, MAX_CTU_SIZE * SIZEOF_PIXEL, src00, src01, w, h, denom, w0, w1, o0, o1);
+call_new(dst1, MAX_CTU_SIZE * SIZEOF_PIXEL, src10, src11, w, h, denom, w0, w1, o0, o1);
+if (memcmp(dst0, dst1, DST_BUF_SIZE))
+fail();
+if (w == h)
+bench_new(dst0, MAX_CTU_SIZE * SIZEOF_PIXEL, src00, src01, w, h, denom, w0, w1, o0, o1);
+}
+}
+}
+}
+}
+}
+report("avg");
+}
+
 void checkasm_check_vvc_mc(void)
 {
 check_put_vvc_luma();
 check_put_vvc_luma_uni();
 check_put_vvc_chroma();
 check_put_vvc_chroma_uni();
+check_avg();
 }
-- 
2.34.1



[FFmpeg-devel] [PATCH v2 7/8] avcodec/x86/vvc: add avg and avg_w AVX2 optimizations

2024-01-19 Thread toqsxw
From: Wu Jianhua 

The avg/avg_w is based on dav1d.
See https://code.videolan.org/videolan/dav1d/-/blob/master/src/x86/mc_avx2.asm

vvc_avg_8_2x2_c: 71.6
vvc_avg_8_2x2_avx2: 26.8
vvc_avg_8_2x4_c: 140.8
vvc_avg_8_2x4_avx2: 34.6
vvc_avg_8_2x8_c: 410.3
vvc_avg_8_2x8_avx2: 41.3
vvc_avg_8_2x16_c: 769.3
vvc_avg_8_2x16_avx2: 60.3
vvc_avg_8_2x32_c: 1669.6
vvc_avg_8_2x32_avx2: 105.1
vvc_avg_8_2x64_c: 1978.3
vvc_avg_8_2x64_avx2: 425.8
vvc_avg_8_2x128_c: 6536.8
vvc_avg_8_2x128_avx2: 1315.1
vvc_avg_8_4x2_c: 155.6
vvc_avg_8_4x2_avx2: 26.1
vvc_avg_8_4x4_c: 250.3
vvc_avg_8_4x4_avx2: 31.3
vvc_avg_8_4x8_c: 831.8
vvc_avg_8_4x8_avx2: 41.3
vvc_avg_8_4x16_c: 1461.1
vvc_avg_8_4x16_avx2: 57.1
vvc_avg_8_4x32_c: 2821.6
vvc_avg_8_4x32_avx2: 105.1
vvc_avg_8_4x64_c: 3615.8
vvc_avg_8_4x64_avx2: 412.6
vvc_avg_8_4x128_c: 11962.6
vvc_avg_8_4x128_avx2: 1274.3
vvc_avg_8_8x2_c: 215.8
vvc_avg_8_8x2_avx2: 29.1
vvc_avg_8_8x4_c: 430.6
vvc_avg_8_8x4_avx2: 37.6
vvc_avg_8_8x8_c: 1463.3
vvc_avg_8_8x8_avx2: 51.8
vvc_avg_8_8x16_c: 2630.1
vvc_avg_8_8x16_avx2: 97.6
vvc_avg_8_8x32_c: 5813.8
vvc_avg_8_8x32_avx2: 196.6
vvc_avg_8_8x64_c: 6687.3
vvc_avg_8_8x64_avx2: 487.8
vvc_avg_8_8x128_c: 13178.6
vvc_avg_8_8x128_avx2: 1290.6
vvc_avg_8_16x2_c: 443.8
vvc_avg_8_16x2_avx2: 28.3
vvc_avg_8_16x4_c: 1253.3
vvc_avg_8_16x4_avx2: 32.1
vvc_avg_8_16x8_c: 2236.3
vvc_avg_8_16x8_avx2: 44.3
vvc_avg_8_16x16_c: 5127.8
vvc_avg_8_16x16_avx2: 63.3
vvc_avg_8_16x32_c: 6573.3
vvc_avg_8_16x32_avx2: 223.6
vvc_avg_8_16x64_c: 30311.8
vvc_avg_8_16x64_avx2: 437.8
vvc_avg_8_16x128_c: 25693.3
vvc_avg_8_16x128_avx2: 1266.8
vvc_avg_8_32x2_c: 954.6
vvc_avg_8_32x2_avx2: 32.1
vvc_avg_8_32x4_c: 2359.6
vvc_avg_8_32x4_avx2: 39.6
vvc_avg_8_32x8_c: 5703.6
vvc_avg_8_32x8_avx2: 57.1
vvc_avg_8_32x16_c: 9967.6
vvc_avg_8_32x16_avx2: 107.1
vvc_avg_8_32x32_c: 21327.6
vvc_avg_8_32x32_avx2: 272.6
vvc_avg_8_32x64_c: 39240.8
vvc_avg_8_32x64_avx2: 529.6
vvc_avg_8_32x128_c: 52580.8
vvc_avg_8_32x128_avx2: 1338.8
vvc_avg_8_64x2_c: 1647.3
vvc_avg_8_64x2_avx2: 38.8
vvc_avg_8_64x4_c: 5130.1
vvc_avg_8_64x4_avx2: 58.8
vvc_avg_8_64x8_c: 6529.3
vvc_avg_8_64x8_avx2: 88.3
vvc_avg_8_64x16_c: 19913.6
vvc_avg_8_64x16_avx2: 162.3
vvc_avg_8_64x32_c: 39360.8
vvc_avg_8_64x32_avx2: 295.8
vvc_avg_8_64x64_c: 49658.3
vvc_avg_8_64x64_avx2: 784.1
vvc_avg_8_64x128_c: 108513.1
vvc_avg_8_64x128_avx2: 1977.1
vvc_avg_8_128x2_c: 3226.1
vvc_avg_8_128x2_avx2: 61.1
vvc_avg_8_128x4_c: 10280.3
vvc_avg_8_128x4_avx2: 94.6
vvc_avg_8_128x8_c: 18079.3
vvc_avg_8_128x8_avx2: 155.3
vvc_avg_8_128x16_c: 45121.8
vvc_avg_8_128x16_avx2: 285.3
vvc_avg_8_128x32_c: 48651.8
vvc_avg_8_128x32_avx2: 581.6
vvc_avg_8_128x64_c: 165078.6
vvc_avg_8_128x64_avx2: 1942.8
vvc_avg_8_128x128_c: 339103.1
vvc_avg_8_128x128_avx2: 4332.6
vvc_avg_10_2x2_c: 144.3
vvc_avg_10_2x2_avx2: 26.8
vvc_avg_10_2x4_c: 142.6
vvc_avg_10_2x4_avx2: 45.3
vvc_avg_10_2x8_c: 478.1
vvc_avg_10_2x8_avx2: 38.1
vvc_avg_10_2x16_c: 518.3
vvc_avg_10_2x16_avx2: 58.1
vvc_avg_10_2x32_c: 2059.8
vvc_avg_10_2x32_avx2: 93.1
vvc_avg_10_2x64_c: 2383.8
vvc_avg_10_2x64_avx2: 714.8
vvc_avg_10_2x128_c: 4498.3
vvc_avg_10_2x128_avx2: 1466.3
vvc_avg_10_4x2_c: 228.6
vvc_avg_10_4x2_avx2: 26.8
vvc_avg_10_4x4_c: 378.3
vvc_avg_10_4x4_avx2: 30.6
vvc_avg_10_4x8_c: 866.8
vvc_avg_10_4x8_avx2: 44.6
vvc_avg_10_4x16_c: 1018.1
vvc_avg_10_4x16_avx2: 58.1
vvc_avg_10_4x32_c: 3590.8
vvc_avg_10_4x32_avx2: 128.8
vvc_avg_10_4x64_c: 4200.8
vvc_avg_10_4x64_avx2: 663.6
vvc_avg_10_4x128_c: 8450.8
vvc_avg_10_4x128_avx2: 1531.8
vvc_avg_10_8x2_c: 369.3
vvc_avg_10_8x2_avx2: 28.3
vvc_avg_10_8x4_c: 513.8
vvc_avg_10_8x4_avx2: 32.1
vvc_avg_10_8x8_c: 1720.3
vvc_avg_10_8x8_avx2: 49.1
vvc_avg_10_8x16_c: 1894.8
vvc_avg_10_8x16_avx2: 71.6
vvc_avg_10_8x32_c: 3931.3
vvc_avg_10_8x32_avx2: 148.1
vvc_avg_10_8x64_c: 7964.3
vvc_avg_10_8x64_avx2: 613.1
vvc_avg_10_8x128_c: 15540.1
vvc_avg_10_8x128_avx2: 1585.1
vvc_avg_10_16x2_c: 877.3
vvc_avg_10_16x2_avx2: 27.6
vvc_avg_10_16x4_c: 955.8
vvc_avg_10_16x4_avx2: 29.8
vvc_avg_10_16x8_c: 3419.6
vvc_avg_10_16x8_avx2: 62.6
vvc_avg_10_16x16_c: 3826.8
vvc_avg_10_16x16_avx2: 54.3
vvc_avg_10_16x32_c: 7655.3
vvc_avg_10_16x32_avx2: 86.3
vvc_avg_10_16x64_c: 30011.1
vvc_avg_10_16x64_avx2: 692.6
vvc_avg_10_16x128_c: 47894.8
vvc_avg_10_16x128_avx2: 1580.3
vvc_avg_10_32x2_c: 944.3
vvc_avg_10_32x2_avx2: 29.8
vvc_avg_10_32x4_c: 2022.6
vvc_avg_10_32x4_avx2: 35.1
vvc_avg_10_32x8_c: 6148.8
vvc_avg_10_32x8_avx2: 51.3
vvc_avg_10_32x16_c: 12601.6
vvc_avg_10_32x16_avx2: 70.8
vvc_avg_10_32x32_c: 15958.6
vvc_avg_10_32x32_avx2: 124.3
vvc_avg_10_32x64_c: 31784.6
vvc_avg_10_32x64_avx2: 757.3
vvc_avg_10_32x128_c: 63892.8
vvc_avg_10_32x128_avx2: 1711.3
vvc_avg_10_64x2_c: 1890.8
vvc_avg_10_64x2_avx2: 34.3
vvc_avg_10_64x4_c: 6267.3
vvc_avg_10_64x4_avx2: 42.6
vvc_avg_10_64x8_c: 12778.1
vvc_avg_10_64x8_avx2: 67.8
vvc_avg_10_64x16_c: 22304.3
vvc_avg_10_64x16_avx2: 116.8
vvc_avg_10_64x32_c: 30777.1
vvc_avg_10_64x32_avx2: 201.1
vvc_avg_10_64x64_c: 60169.1
vvc_avg_10_64x64_avx2: 1454.3
vvc_avg_10_64x128_c: 124392.8
vvc_avg_10_64x128_avx2: 3648.6
vvc_avg_10_128x2

[FFmpeg-devel] [PATCH v2 6/8] tests/checkasm: add checkasm_check_vvc_mc

2024-01-19 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 tests/checkasm/Makefile   |   1 +
 tests/checkasm/checkasm.c |   3 +
 tests/checkasm/checkasm.h |   1 +
 tests/checkasm/vvc_mc.c   | 270 ++
 4 files changed, 275 insertions(+)
 create mode 100644 tests/checkasm/vvc_mc.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 3b5b54352b..3562acb2b2 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -40,6 +40,7 @@ AVCODECOBJS-$(CONFIG_V210_DECODER)  += v210dec.o
 AVCODECOBJS-$(CONFIG_V210_ENCODER)  += v210enc.o
 AVCODECOBJS-$(CONFIG_VORBIS_DECODER)+= vorbisdsp.o
 AVCODECOBJS-$(CONFIG_VP9_DECODER)   += vp9dsp.o
+AVCODECOBJS-$(CONFIG_VVC_DECODER)   += vvc_mc.o
 
 CHECKASMOBJS-$(CONFIG_AVCODEC)  += $(AVCODECOBJS-yes)
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 87f24c77ca..36a97957e5 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -194,6 +194,9 @@ static const struct {
 #if CONFIG_VORBIS_DECODER
 { "vorbisdsp", checkasm_check_vorbisdsp },
 #endif
+#if CONFIG_VVC_DECODER
+{ "vvc_mc", checkasm_check_vvc_mc },
+#endif
 #endif
 #if CONFIG_AVFILTER
 #if CONFIG_AFIR_FILTER
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 4db8c495ea..53cb3ccfbf 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -131,6 +131,7 @@ void checkasm_check_vp8dsp(void);
 void checkasm_check_vp9dsp(void);
 void checkasm_check_videodsp(void);
 void checkasm_check_vorbisdsp(void);
+void checkasm_check_vvc_mc(void);
 
 struct CheckasmPerf;
 
diff --git a/tests/checkasm/vvc_mc.c b/tests/checkasm/vvc_mc.c
new file mode 100644
index 00..711280deec
--- /dev/null
+++ b/tests/checkasm/vvc_mc.c
@@ -0,0 +1,270 @@
+/*
+ * Copyright (c) 2023-2024 Nuo Mi
+ * Copyright (c) 2023-2024 Wu Jianhua
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include 
+
+#include "checkasm.h"
+#include "libavcodec/avcodec.h"
+#include "libavcodec/vvc/vvc_ctu.h"
+#include "libavcodec/vvc/vvc_data.h"
+
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/internal.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem_internal.h"
+
+static const uint32_t pixel_mask[] = { 0x, 0x03ff03ff, 0x0fff0fff, 
0x3fff3fff, 0x };
+static const int sizes[] = { 2, 4, 8, 16, 32, 64, 128 };
+
+#define PIXEL_STRIDE (MAX_CTU_SIZE * 2)
+#define EXTRA_BEFORE 3
+#define EXTRA_AFTER  4
+#define SRC_EXTRA    ((EXTRA_BEFORE + EXTRA_AFTER) * 2)
+#define SRC_BUF_SIZE ((PIXEL_STRIDE + SRC_EXTRA) * (PIXEL_STRIDE + SRC_EXTRA))
+#define DST_BUF_SIZE (MAX_CTU_SIZE * MAX_CTU_SIZE * 2)
+#define SRC_OFFSET   ((PIXEL_STRIDE + EXTRA_BEFORE * 2) * EXTRA_BEFORE)
+
+#define randomize_buffers(buf0, buf1, size, mask)   \
+do {\
+int k;  \
+for (k = 0; k < size; k += 4) { \
+uint32_t r = rnd() & mask;  \
+AV_WN32A(buf0 + k, r);  \
+AV_WN32A(buf1 + k, r);  \
+}   \
+} while (0)
+
+#define randomize_pixels(buf0, buf1, size)  \
+do {\
+uint32_t mask = pixel_mask[(bit_depth - 8) >> 1];   \
+randomize_buffers(buf0, buf1, size, mask);  \
+} while (0)
+
+#define randomize_avg_src(buf0, buf1, size) \
+do {\
+uint32_t mask = 0x3fff3fff; \
+randomize_buffers(buf0, buf1, size, mask);  \
+} while (0)
+
+static void check_put_vvc_luma(void)
+{
+LOCAL_ALIGNED_32(int16_t, dst0, [DST_BUF_SIZE / 2]);
+LOCAL_ALIGNED_32(int16_t, dst1, [DST_BUF_SIZE / 2]);
+LOCAL_ALIGNED_32(uint8_t, src0, [SRC_BUF_SIZE]);
+LOCAL_ALIGNED_32(uint8_t, src1, [SRC_BUF_SIZE]);
+VVCDSPContext c;
+
+declare_func_emms(AV_CPU_FLAG_MMX | AV_CPU_FLAG_MMXEXT, void, int16_t 
*dst, const uint8_t *src, const ptrdiff_t src_stride,
+  

[FFmpeg-devel] [PATCH v2 5/8] avcodec/vvcdec: reuse h26x/2656_inter.asm to enable x86 optimizations

2024-01-19 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 libavcodec/Makefile  |   1 +
 libavcodec/vvc/vvcdsp.c  |   4 +
 libavcodec/vvc/vvcdsp.h  |   2 +
 libavcodec/x86/vvc/Makefile  |   6 +
 libavcodec/x86/vvc/vvcdsp_init.c | 202 +++
 5 files changed, 215 insertions(+)
 create mode 100644 libavcodec/x86/vvc/Makefile
 create mode 100644 libavcodec/x86/vvc/vvcdsp_init.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index bb42095165..ce33631b60 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -65,6 +65,7 @@ OBJS = ac3_parser.o                                                     \
 
 # subsystems
 include $(SRC_PATH)/libavcodec/vvc/Makefile
+include $(SRC_PATH)/libavcodec/x86/vvc/Makefile
 OBJS-$(CONFIG_AANDCTTABLES)+= aandcttab.o
 OBJS-$(CONFIG_AC3DSP)  += ac3dsp.o ac3.o ac3tab.o
 OBJS-$(CONFIG_ADTS_HEADER) += adts_header.o mpeg4audio_sample_rates.o
diff --git a/libavcodec/vvc/vvcdsp.c b/libavcodec/vvc/vvcdsp.c
index c82ea7be30..c542be5258 100644
--- a/libavcodec/vvc/vvcdsp.c
+++ b/libavcodec/vvc/vvcdsp.c
@@ -138,4 +138,8 @@ void ff_vvc_dsp_init(VVCDSPContext *vvcdsp, int bit_depth)
 VVC_DSP(8);
 break;
 }
+
+#if ARCH_X86
+ff_vvc_dsp_init_x86(vvcdsp, bit_depth);
+#endif
 }
diff --git a/libavcodec/vvc/vvcdsp.h b/libavcodec/vvc/vvcdsp.h
index b5a63c5833..6f59e73654 100644
--- a/libavcodec/vvc/vvcdsp.h
+++ b/libavcodec/vvc/vvcdsp.h
@@ -167,4 +167,6 @@ typedef struct VVCDSPContext {
 
 void ff_vvc_dsp_init(VVCDSPContext *hpc, int bit_depth);
 
+void ff_vvc_dsp_init_x86(VVCDSPContext *hpc, const int bit_depth);
+
 #endif /* AVCODEC_VVC_VVCDSP_H */
diff --git a/libavcodec/x86/vvc/Makefile b/libavcodec/x86/vvc/Makefile
new file mode 100644
index 00..b4acc22501
--- /dev/null
+++ b/libavcodec/x86/vvc/Makefile
@@ -0,0 +1,6 @@
+clean::
+   $(RM) $(CLEANSUFFIXES:%=libavcodec/x86/vvc/%)
+
+OBJS-$(CONFIG_VVC_DECODER) += x86/vvc/vvcdsp_init.o
+X86ASM-OBJS-$(CONFIG_VVC_DECODER)  += x86/h26x/h2656dsp.o   \
+                                      x86/h26x/h2656_inter.o
diff --git a/libavcodec/x86/vvc/vvcdsp_init.c b/libavcodec/x86/vvc/vvcdsp_init.c
new file mode 100644
index 00..c197cdb4cc
--- /dev/null
+++ b/libavcodec/x86/vvc/vvcdsp_init.c
@@ -0,0 +1,202 @@
+/*
+ * VVC DSP init for x86
+ *
+ * Copyright (C) 2022-2024 Nuo Mi
+ * Copyright (c) 2023-2024 Wu Jianhua
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+#include "libavutil/cpu.h"
+#include "libavutil/x86/asm.h"
+#include "libavutil/x86/cpu.h"
+#include "libavcodec/vvc/vvcdec.h"
+#include "libavcodec/vvc/vvc_ctu.h"
+#include "libavcodec/vvc/vvcdsp.h"
+#include "libavcodec/x86/h26x/h2656dsp.h"
+
+#define FW_PUT(name, depth, opt) \
+static void ff_vvc_put_ ## name ## _ ## depth ## _##opt(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, \
+ int height, const int8_t *hf, const int8_t *vf, int width)\
+{  \
+ff_h2656_put_## name ## _ ## depth ## _##opt(dst, 2 * MAX_PB_SIZE, src, srcstride, height, hf, vf, width); \
+}
+
+#define FW_PUT_TAP(fname, bitd, opt ) \
+FW_PUT(fname##4,   bitd, opt );   \
+FW_PUT(fname##8,   bitd, opt );   \
+FW_PUT(fname##16,  bitd, opt );   \
+FW_PUT(fname##32,  bitd, opt );   \
+FW_PUT(fname##64,  bitd, opt );   \
+FW_PUT(fname##128, bitd, opt );   \
+
+#define FW_PUT_4TAP(fname, bitd, opt) \
+FW_PUT(fname ## 2, bitd, opt) \
+FW_PUT_TAP(fname,  bitd, opt)
+
+#define FW_PUT_4TAP_SSE4(bitd)   \
+FW_PUT_4TAP(pixels,  bitd, sse4) \
+FW_PUT_4TAP(4tap_h,  bitd, sse4) \
+FW_PUT_4TAP(4tap_v,  bitd, sse4) \
+FW_PUT_4TAP(4tap_hv, bitd, sse4)
+
+#define FW_PUT_8TAP_SSE4(bitd)  \
+FW_PUT_TAP(8tap_h,  bitd, sse4) \
+FW_PUT_TAP(8tap_v,  bitd, sse4) \
+FW_PUT_TAP(8tap_hv, bitd, sse4)
+
+#define FW_PUT_SSE4(bitd)  \
+FW_PUT_4TAP_SSE4(bitd) \
+FW_PUT_8TAP_SSE4(bitd)
+
+FW_PUT_SSE4( 8);
+FW_PUT_SSE4(10);
+FW_PUT_SSE4(12);
+
+#define FW_PUT_T

[FFmpeg-devel] [PATCH v2 1/8] avcodec/vvc/vvc_inter_template: move put/put_luma/put_chroma template to h2656_inter_template.c

2024-01-19 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 libavcodec/h26x/h2656_inter_template.c | 577 +
 libavcodec/vvc/vvc_inter_template.c| 559 +---
 2 files changed, 578 insertions(+), 558 deletions(-)
 create mode 100644 libavcodec/h26x/h2656_inter_template.c

diff --git a/libavcodec/h26x/h2656_inter_template.c b/libavcodec/h26x/h2656_inter_template.c
new file mode 100644
index 00..864f6c7e7d
--- /dev/null
+++ b/libavcodec/h26x/h2656_inter_template.c
@@ -0,0 +1,577 @@
+/*
+ * inter prediction template for HEVC/VVC
+ *
+ * Copyright (C) 2022 Nuo Mi
+ * Copyright (C) 2024 Wu Jianhua
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#define CHROMA_EXTRA_BEFORE 1
+#define CHROMA_EXTRA        3
+#define LUMA_EXTRA_BEFORE   3
+#define LUMA_EXTRA  7
+
+static void FUNC(put_pixels)(int16_t *dst,
+const uint8_t *_src, const ptrdiff_t _src_stride,
+const int height, const int8_t *hf, const int8_t *vf, const int width)
+{
+const pixel *src= (const pixel *)_src;
+const ptrdiff_t src_stride  = _src_stride / sizeof(pixel);
+
+for (int y = 0; y < height; y++) {
+for (int x = 0; x < width; x++)
+dst[x] = src[x] << (14 - BIT_DEPTH);
+src += src_stride;
+dst += MAX_PB_SIZE;
+}
+}
+
+static void FUNC(put_uni_pixels)(uint8_t *_dst, const ptrdiff_t _dst_stride,
+const uint8_t *_src, const ptrdiff_t _src_stride, const int height,
+ const int8_t *hf, const int8_t *vf, const int width)
+{
+const pixel *src= (const pixel *)_src;
+pixel *dst  = (pixel *)_dst;
+const ptrdiff_t src_stride  = _src_stride / sizeof(pixel);
+const ptrdiff_t dst_stride  = _dst_stride / sizeof(pixel);
+
+for (int y = 0; y < height; y++) {
+memcpy(dst, src, width * sizeof(pixel));
+src += src_stride;
+dst += dst_stride;
+}
+}
+
+static void FUNC(put_uni_w_pixels)(uint8_t *_dst, const ptrdiff_t _dst_stride,
+const uint8_t *_src, const ptrdiff_t _src_stride, const int height,
+const int denom, const int wx, const int _ox,  const int8_t *hf, const int8_t *vf,
+const int width)
+{
+const pixel *src= (const pixel *)_src;
+pixel *dst  = (pixel *)_dst;
+const ptrdiff_t src_stride  = _src_stride / sizeof(pixel);
+const ptrdiff_t dst_stride  = _dst_stride / sizeof(pixel);
+const int shift = denom + 14 - BIT_DEPTH;
+#if BIT_DEPTH < 14
+const int offset= 1 << (shift - 1);
+#else
+const int offset= 0;
+#endif
+const int ox= _ox * (1 << (BIT_DEPTH - 8));
+
+for (int y = 0; y < height; y++) {
+for (int x = 0; x < width; x++) {
+const int v = (src[x] << (14 - BIT_DEPTH));
+dst[x] = av_clip_pixel(((v * wx + offset) >> shift) + ox);
+}
+src += src_stride;
+dst += dst_stride;
+}
+}
+
+#define LUMA_FILTER(src, stride)       \
+(filter[0] * src[x - 3 * stride] +     \
+ filter[1] * src[x - 2 * stride] +     \
+ filter[2] * src[x - stride] +         \
+ filter[3] * src[x ] +                 \
+ filter[4] * src[x + stride] +         \
+ filter[5] * src[x + 2 * stride] +     \
+ filter[6] * src[x + 3 * stride] +     \
+ filter[7] * src[x + 4 * stride])
+
+static void FUNC(put_luma_h)(int16_t *dst, const uint8_t *_src, const ptrdiff_t _src_stride,
+const int height, const int8_t *hf, const int8_t *vf, const int width)
+{
+const pixel *src   = (const pixel*)_src;
+const ptrdiff_t src_stride = _src_stride / sizeof(pixel);
+const int8_t *filter   = hf;
+
+for (int y = 0; y < height; y++) {
+for (int x = 0; x < width; x++)
+dst[x] = LUMA_FILTER(src, 1) >> (BIT_DEPTH - 8);
+src += src_stride;
+dst += MAX_PB_SIZE;
+}
+}
+
+static void FUNC(put_luma_v)(int

[FFmpeg-devel] [PATCH v2 4/8] avcodec/x86/h26x/h2656_inter: add dststride to put

2024-01-19 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 libavcodec/x86/h26x/h2656_inter.asm | 32 ++---
 libavcodec/x86/h26x/h2656dsp.c  |  4 ++--
 libavcodec/x86/h26x/h2656dsp.h  |  2 +-
 libavcodec/x86/hevcdsp_init.c   |  2 +-
 4 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/libavcodec/x86/h26x/h2656_inter.asm b/libavcodec/x86/h26x/h2656_inter.asm
index 4316c8ae3d..68f88832a6 100644
--- a/libavcodec/x86/h26x/h2656_inter.asm
+++ b/libavcodec/x86/h26x/h2656_inter.asm
@@ -22,8 +22,6 @@
 ; */
 %include "libavutil/x86/x86util.asm"
 
-%define MAX_PB_SIZE 64
-
 SECTION_RODATA 32
 cextern pw_255
 cextern pw_512
@@ -332,7 +330,7 @@ SECTION .text
 %endmacro
 
 %macro LOOP_END 3
-add  %1q, 2*MAX_PB_SIZE  ; dst += dststride
+add  %1q, dststrideq ; dst += dststride
 add  %2q, %3q; src += srcstride
 dec  heightd ; cmp height
 jnz   .loop  ; height loop
@@ -529,7 +527,7 @@ SECTION .text
 
 
 ; **
-; void %1_put_pixels(int16_t *dst, const uint8_t *_src, ptrdiff_t srcstride,
+; void %1_put_pixels(int16_t *dst, ptrdiff_t dststride, const uint8_t *_src, ptrdiff_t srcstride,
 ; int height, const int8_t *hf, const int8_t *vf, int width)
 ; **
 
@@ -539,7 +537,7 @@ SECTION .text
 %endmacro
 
 %macro MC_PIXELS 3
-cglobal %1_put_pixels%2_%3, 4, 4, 3, dst, src, srcstride, height
+cglobal %1_put_pixels%2_%3, 5, 5, 3, dst, dststride, src, srcstride, height
 pxor  m2, m2
 .loop:
 SIMPLE_LOAD   %2, %3, srcq, m0
@@ -569,10 +567,10 @@ cglobal %1_put_uni_pixels%2_%3, 5, 5, 2, dst, dststride, src, srcstride, height
 %endif
 
 ; **
-; void %1_put_4tap_hX(int16_t *dst,
+; void %1_put_4tap_hX(int16_t *dst, ptrdiff_t dststride,
 ;  const uint8_t *_src, ptrdiff_t _srcstride, int height, int8_t *hf, int8_t *vf, int width);
 ; **
-cglobal %1_put_4tap_h%2_%3, 5, 5, XMM_REGS, dst, src, srcstride, height, hf
+cglobal %1_put_4tap_h%2_%3, 6, 6, XMM_REGS, dst, dststride, src, srcstride, height, hf
 %assign %%stride ((%3 + 7)/8)
 MC_4TAP_FILTER   %3, hf, m4, m5
 .loop:
@@ -602,10 +600,10 @@ cglobal %1_put_uni_4tap_h%2_%3, 6, 7, XMM_REGS, dst, dststride, src, srcstride,
 RET
 
 ; **
-; void %1_put_4tap_v(int16_t *dst,
+; void %1_put_4tap_v(int16_t *dst, ptrdiff_t dststride,
 ;  const uint8_t *_src, ptrdiff_t _srcstride, int height, int8_t *hf, int8_t *vf, int width)
 ; **
-cglobal %1_put_4tap_v%2_%3, 6, 6, XMM_REGS, dst, src, srcstride, height, r3src, vf
+cglobal %1_put_4tap_v%2_%3, 7, 7, XMM_REGS, dst, dststride, src, srcstride, height, r3src, vf
 sub srcq, srcstrideq
 MC_4TAP_FILTER%3, vf, m4, m5
 lea   r3srcq, [srcstrideq*3]
@@ -639,10 +637,10 @@ cglobal %1_put_uni_4tap_v%2_%3, 7, 7, XMM_REGS, dst, dststride, src, srcstride,
 
 %macro PUT_4TAP_HV 3
 ; **
-; void put_4tap_hv(int16_t *dst,
+; void put_4tap_hv(int16_t *dst, ptrdiff_t dststride,
 ;  const uint8_t *_src, ptrdiff_t _srcstride, int height, int8_t *hf, int8_t *vf, int width)
 ; **
-cglobal %1_put_4tap_hv%2_%3, 6, 7, 16 , dst, src, srcstride, height, hf, vf, r3src
+cglobal %1_put_4tap_hv%2_%3, 7, 8, 16 , dst, dststride, src, srcstride, height, hf, vf, r3src
 %assign %%stride ((%3 + 7)/8)
 sub srcq, srcstrideq
 MC_4TAP_HV_FILTER%3
@@ -774,12 +772,12 @@ cglobal %1_put_uni_4tap_hv%2_%3, 7, 8, 16 , dst, dststride, src, srcstride, heig
 %endmacro
 
 ; **
-; void put_8tap_hX_X_X(int16_t *dst, const uint8_t *_src, ptrdiff_t srcstride,
+; void put_8tap_hX_X_X(int16_t *dst, ptrdiff_t dststride, const uint8_t *_src, ptrdiff_t srcstride,
 ;   int height, const int8_t *hf, const int8_t *vf, int width)
 ; **
 
 %macro PUT_8TAP 3
-cglobal %1_put_8tap_h%2_%3, 5, 5, 16, dst, src, srcstride, height, hf
+cglobal %1_put_8tap_h%2_%3, 6, 6, 16, dst, dststride, src, srcstride, height, hf
 MC_8TAP_FILTER  %3, hf
 .loop:
 MC_8TAP_H_LOAD  %3, srcq, %2, 10
@@ -814,10 +812,10 @@ cglobal %1_put_uni_8tap_h%2_%3, 6, 7, 16 , dst, dststride, src, srcstride, heigh
 
 
 ; **
-; void put_8tap_vX_X_X(int16_t *dst, const uint8_t *_src, ptrdiff_t srcstride,
+; void put_8tap_vX_X_X(int16_t *dst, ptrdiff_t dststride, const uint8_t *_src, ptrdiff_t srcstride,
 ;  int height, const int8_t *hf, const int8_t *vf, int width)
 ; **
-cglobal %1_put_8tap_v%2_%3, 6, 8, 16, dst, src, srcstride, height, r3src, vf
+cglobal %1_put_8tap_v%2_%3, 7, 8, 16, dst, dststride, src, srcstride, height, 

[FFmpeg-devel] [PATCH v2 3/8] avcodec/x86/hevc_mc: move put/put_uni to h26x/h2656_inter.asm

2024-01-19 Thread toqsxw
From: Wu Jianhua 

This enables the asm optimization to be reused by VVC.

Signed-off-by: Wu Jianhua 
---
 libavcodec/x86/Makefile |1 +
 libavcodec/x86/h26x/h2656_inter.asm | 1135 +++
 libavcodec/x86/h26x/h2656dsp.c  |   98 +++
 libavcodec/x86/h26x/h2656dsp.h  |  103 +++
 libavcodec/x86/hevc_mc.asm  |  462 +--
 libavcodec/x86/hevcdsp_init.c   |  108 ++-
 6 files changed, 1461 insertions(+), 446 deletions(-)
 create mode 100644 libavcodec/x86/h26x/h2656_inter.asm
 create mode 100644 libavcodec/x86/h26x/h2656dsp.c
 create mode 100644 libavcodec/x86/h26x/h2656dsp.h

diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index d5fb30645a..8098cd840c 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -167,6 +167,7 @@ X86ASM-OBJS-$(CONFIG_HEVC_DECODER) += x86/hevc_add_res.o\
   x86/hevc_deblock.o\
   x86/hevc_idct.o   \
   x86/hevc_mc.o \
+  x86/h26x/h2656_inter.o\
   x86/hevc_sao.o\
   x86/hevc_sao_10bit.o
 X86ASM-OBJS-$(CONFIG_JPEG2000_DECODER) += x86/jpeg2000dsp.o
diff --git a/libavcodec/x86/h26x/h2656_inter.asm b/libavcodec/x86/h26x/h2656_inter.asm
new file mode 100644
index 00..4316c8ae3d
--- /dev/null
+++ b/libavcodec/x86/h26x/h2656_inter.asm
@@ -0,0 +1,1135 @@
+; /*
+; * Provide SSE luma and chroma mc functions for HEVC/VVC decoding
+; * Copyright (c) 2013 Pierre-Edouard LEPERE
+; * Copyright (c) 2023-2024 Nuo Mi
+; * Copyright (c) 2023-2024 Wu Jianhua
+; *
+; * This file is part of FFmpeg.
+; *
+; * FFmpeg is free software; you can redistribute it and/or
+; * modify it under the terms of the GNU Lesser General Public
+; * License as published by the Free Software Foundation; either
+; * version 2.1 of the License, or (at your option) any later version.
+; *
+; * FFmpeg is distributed in the hope that it will be useful,
+; * but WITHOUT ANY WARRANTY; without even the implied warranty of
+; * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+; * Lesser General Public License for more details.
+; *
+; * You should have received a copy of the GNU Lesser General Public
+; * License along with FFmpeg; if not, write to the Free Software
+; * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+; */
+%include "libavutil/x86/x86util.asm"
+
+%define MAX_PB_SIZE 64
+
+SECTION_RODATA 32
+cextern pw_255
+cextern pw_512
+cextern pw_2048
+cextern pw_1023
+cextern pw_1024
+cextern pw_4096
+cextern pw_8192
+%define scale_8 pw_512
+%define scale_10 pw_2048
+%define scale_12 pw_8192
+%define max_pixels_8 pw_255
+%define max_pixels_10 pw_1023
+max_pixels_12:  times 16 dw ((1 << 12)-1)
+cextern pb_0
+
+SECTION .text
+%macro SIMPLE_LOAD 4;width, bitd, tab, r1
+%if %1 == 2 || (%2 == 8 && %1 <= 4)
+movd  %4, [%3]   ; load data from source
+%elif %1 == 4 || (%2 == 8 && %1 <= 8)
+movq  %4, [%3]   ; load data from source
+%elif notcpuflag(avx)
+movu  %4, [%3]   ; load data from source
+%elif %1 <= 8 || (%2 == 8 && %1 <= 16)
+movdqu   %4, [%3]
+%else
+movu  %4, [%3]
+%endif
+%endmacro
+
+%macro MC_4TAP_FILTER 4 ; bitdepth, filter, a, b,
+vpbroadcastw   %3, [%2q + 0 * 2]  ; coeff 0, 1
+vpbroadcastw   %4, [%2q + 1 * 2]  ; coeff 2, 3
+%if %1 != 8
+pmovsxbw   %3, xmm%3
+pmovsxbw   %4, xmm%4
+%endif
+%endmacro
+
+%macro MC_4TAP_HV_FILTER 1
+vpbroadcastw  m12, [vfq + 0 * 2]  ; vf 0, 1
+vpbroadcastw  m13, [vfq + 1 * 2]  ; vf 2, 3
+vpbroadcastw  m14, [hfq + 0 * 2]  ; hf 0, 1
+vpbroadcastw  m15, [hfq + 1 * 2]  ; hf 2, 3
+
+pmovsxbw  m12, xm12
+pmovsxbw  m13, xm13
+%if %1 != 8
+pmovsxbw  m14, xm14
+pmovsxbw  m15, xm15
+%endif
+lea   r3srcq, [srcstrideq*3]
+%endmacro
+
+%macro MC_8TAP_SAVE_FILTER 5;offset, mm registers
+mova [rsp + %1 + 0*mmsize], %2
+mova [rsp + %1 + 1*mmsize], %3
+mova [rsp + %1 + 2*mmsize], %4
+mova [rsp + %1 + 3*mmsize], %5
+%endmacro
+
+%macro MC_8TAP_FILTER 2-3 ;bitdepth, filter, offset
+vpbroadcastw  m12, [%2q + 0 * 2]  ; coeff 0, 1
+vpbroadcastw  m13, [%2q + 1 * 2]  ; coeff 2, 3
+vpbroadcastw  m14, [%2q + 2 * 2]  ; coeff 4, 5
+vpbroadcastw  m15, [%2q + 3 * 2]  ; coeff 6, 7
+%if %0 == 3
+MC_8TAP_SAVE_FILTER%3, m12, m13, m14, m15
+%endif
+
+%if %1 != 8
+pmovsxbw  m12, xm12
+pmovsxbw 

[FFmpeg-devel] [PATCH v2 2/8] avcodec/hevcdsp_template: reuse put/put_luma/put_chroma from h2656_inter_template

2024-01-19 Thread toqsxw
From: Wu Jianhua 

Signed-off-by: Wu Jianhua 
---
 libavcodec/hevcdsp_template.c | 594 +++---
 1 file changed, 46 insertions(+), 548 deletions(-)

diff --git a/libavcodec/hevcdsp_template.c b/libavcodec/hevcdsp_template.c
index 0de14e9dcf..9b48bdf08e 100644
--- a/libavcodec/hevcdsp_template.c
+++ b/libavcodec/hevcdsp_template.c
@@ -26,6 +26,7 @@
 #include "bit_depth_template.c"
 #include "hevcdsp.h"
 #include "h26x/h2656_sao_template.c"
+#include "h26x/h2656_inter_template.c"
 
 static void FUNC(put_pcm)(uint8_t *_dst, ptrdiff_t stride, int width, int height,
   GetBitContext *gb, int pcm_bit_depth)
@@ -299,37 +300,51 @@ IDCT_DC(32)
 

 //
 

-static void FUNC(put_hevc_pel_pixels)(int16_t *dst,
-  const uint8_t *_src, ptrdiff_t _srcstride,
-  int height, intptr_t mx, intptr_t my, int width)
-{
-int x, y;
-const pixel *src= (const pixel *)_src;
-ptrdiff_t srcstride = _srcstride / sizeof(pixel);
-
-for (y = 0; y < height; y++) {
-for (x = 0; x < width; x++)
-dst[x] = src[x] << (14 - BIT_DEPTH);
-src += srcstride;
-dst += MAX_PB_SIZE;
-}
-}
-
-static void FUNC(put_hevc_pel_uni_pixels)(uint8_t *_dst, ptrdiff_t _dststride, const uint8_t *_src, ptrdiff_t _srcstride,
-  int height, intptr_t mx, intptr_t my, int width)
-{
-int y;
-const pixel *src= (const pixel *)_src;
-ptrdiff_t srcstride = _srcstride / sizeof(pixel);
-pixel *dst  = (pixel *)_dst;
-ptrdiff_t dststride = _dststride / sizeof(pixel);
-
-for (y = 0; y < height; y++) {
-memcpy(dst, src, width * sizeof(pixel));
-src += srcstride;
-dst += dststride;
-}
-}
+#define ff_hevc_pel_filters ff_hevc_qpel_filters
+#define DECL_HV_FILTER(f)  \
+const uint8_t *hf = ff_hevc_ ## f ## _filters[mx - 1]; \
+const uint8_t *vf = ff_hevc_ ## f ## _filters[my - 1];
+
+#define FW_PUT(p, f, t)                                                                             \
+static void FUNC(put_hevc_## f)(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride, int height,  \
+                                intptr_t mx, intptr_t my, int width)                                \
+{                                                                                                   \
+    DECL_HV_FILTER(p)                                                                               \
+    FUNC(put_ ## t)(dst, src, srcstride, height, hf, vf, width);                                    \
+}
+
+#define FW_PUT_UNI(p, f, t)                                                                         \
+static void FUNC(put_hevc_ ## f)(uint8_t *dst, ptrdiff_t dststride, const uint8_t *src,             \
+                                 ptrdiff_t srcstride, int height, intptr_t mx, intptr_t my, int width) \
+{                                                                                                   \
+    DECL_HV_FILTER(p)                                                                               \
+    FUNC(put_ ## t)(dst, dststride, src, srcstride, height, hf, vf, width);                         \
+}
+
+#define FW_PUT_UNI_W(p, f, t)                                                                       \
+static void FUNC(put_hevc_ ## f)(uint8_t *dst, ptrdiff_t dststride, const uint8_t *src,             \
+                                 ptrdiff_t srcstride,int height, int denom, int wx, int ox,         \
+                                 intptr_t mx, intptr_t my, int width)                               \
+{                                                                                                   \
+    DECL_HV_FILTER(p)                                                                               \
+    FUNC(put_ ## t)(dst, dststride, src, srcstride, height, denom, wx, ox, hf, vf, width);          \
+}
+
+#define FW_PUT_FUNCS(f, t, dir)   \
+FW_PUT(f, f ## _ ## dir, t ## _ ## dir) \
+FW_PUT_UNI(f, f ## _uni_ ## dir, uni_ ## t ## _ ## dir)\
+FW_PUT_UNI_W(f, f ## _uni_w_ ## dir, uni_## t ## _w_ ## dir)
+
+FW_PUT(pel, pel_pixels, pixels)
+FW_PUT_UNI(pel, pel_uni_pixels, uni_pixels)
+FW_PUT_UNI_W(pel, pel_uni_w_pixels, uni_w_pixels)
+
+FW_PUT_FUNCS(qpel, luma,   h )
+FW_PUT_FUNCS(qpel, luma,   v )
+FW_PUT_FUNCS(qpel, luma,   hv)
+FW_PUT_FUNCS(epel, chroma, h )
+FW_PUT_FUNCS(epel, chroma, v )
+FW_PUT_FUNCS(epel, chroma, hv)
 
 static void FUNC(put_hevc_pel_bi_pixels)(uint8_t *_

Re: [FFmpeg-devel] [PATCH] Revert "all: Don't set AVClass.item_name to its default value"

2024-01-19 Thread Anton Khirnov
Quoting James Almer (2024-01-19 14:01:47)
> On 1/19/2024 9:36 AM, Anton Khirnov wrote:
> > Some callers assume that item_name is always set, so this may be
> > considered an API break.
> > 
> > This reverts commit 0c6203c97a99f69dbaa6e4011d48c331ef5e.
> 
> Ok.
> 
> We could fwiw announce that starting with an arbitrary major soname 
> item_name can be NULL, and revert this revert.

Might be better to just replace it with a new field, since otherwise we
cannot signal this to the users.

-- 
Anton Khirnov
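
To make the concern about callers assuming item_name is always set concrete, here is a minimal sketch of a caller that tolerates a NULL callback. DemoClass, DemoContext and the demo_* functions are hypothetical stand-ins written for illustration; they are not the libavutil definitions, and the fallback shown is only one plausible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for AVClass (not the real definition). */
typedef struct DemoClass {
    const char *class_name;
    const char *(*item_name)(void *ctx); /* may be NULL in this sketch */
} DemoClass;

/* Mirrors the convention that a context starts with a pointer to its class. */
typedef struct DemoContext {
    const DemoClass *cls;
} DemoContext;

/* Default naming policy assumed here: fall back to the class name. */
static const char *demo_default_item_name(void *ctx)
{
    return (*(const DemoClass **)ctx)->class_name;
}

/* Defensive caller: works whether or not item_name was filled in. */
static const char *demo_get_name(void *ctx)
{
    const DemoClass *cls = *(const DemoClass **)ctx;
    return cls->item_name ? cls->item_name(ctx) : demo_default_item_name(ctx);
}

/* Self-check: both class variants yield the same usable name. */
static void demo_check(void)
{
    DemoClass with_name    = { "demo", demo_default_item_name };
    DemoClass without_name = { "demo", NULL };
    DemoContext a = { &with_name };
    DemoContext b = { &without_name };

    assert(strcmp(demo_get_name(&a), "demo") == 0);
    assert(strcmp(demo_get_name(&b), "demo") == 0);
}
```

A caller that instead invokes cls->item_name(ctx) unconditionally would dereference a NULL function pointer in the second case, which is the kind of breakage the revert avoids.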


Re: [FFmpeg-devel] [PATCH] Revert "all: Don't set AVClass.item_name to its default value"

2024-01-19 Thread James Almer

On 1/19/2024 9:36 AM, Anton Khirnov wrote:

> Some callers assume that item_name is always set, so this may be
> considered an API break.
>
> This reverts commit 0c6203c97a99f69dbaa6e4011d48c331ef5e.


Ok.

We could fwiw announce that starting with an arbitrary major soname 
item_name can be NULL, and revert this revert.



Re: [FFmpeg-devel] [PATCH] web/template_head2: fix broken anchor on 'Contribute' link

2024-01-19 Thread Anton Khirnov
Quoting Marth64 (2024-01-08 22:40:40)
> Signed-off-by: Marth64 
> ---
>  src/template_head2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/template_head2 b/src/template_head2
> index 7ba634b..102fac0 100644
> --- a/src/template_head2
> +++ b/src/template_head2
> @@ -34,7 +34,7 @@
>Developers
>  
>Source Code
> -  Contribute
> +  Contribute
>http://fate.ffmpeg.org";>FATE
>http://coverage.ffmpeg.org";>Code Coverage
>  
> -- 
> 2.34.1

Pushed all your pending web patches, hope I didn't miss any.

Thanks,
-- 
Anton Khirnov


Re: [FFmpeg-devel] [PATCH] avutil/eval: Use even better PRNG

2024-01-19 Thread Michael Koch
There is still a small problem with the random generator, though it has
nothing to do with the recent changes.
If the random() expression is used in the geq filter, multiple pixels
get the same sequence of random numbers.

This can be shown with the following command, where the frame has only two pixels:

ffmpeg -loglevel repeat -f lavfi -i nullsrc=size=1x2,format=gray -vf "geq=lum='print(random(0));print(random(0));print(random(0))'" -frames 1 -y out.png


I think it's because the filter is executed in multiple threads.
-filter_threads 1 fixes the problem, but it slows down the whole
filter.


Michael
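
The suspected mechanism can be sketched as follows, under the assumption (not verified against the geq source) that each worker thread evaluates the expression with its own private PRNG state and that every such state starts from the same value. The LCG and all demo_* names are illustrative only and are not the generator or API used by libavutil.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DRAWS 3

/* Illustrative 32-bit LCG; NOT the generator used by libavutil's eval code. */
static uint32_t demo_lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Models one slice job (one worker thread) that owns a private PRNG state
 * initialized to the given seed and records its first few draws. */
static void demo_run_slice(uint32_t seed, uint32_t out[DEMO_DRAWS])
{
    uint32_t state = seed;
    for (int i = 0; i < DEMO_DRAWS; i++)
        out[i] = demo_lcg_next(&state);
}

/* Hypothetical remedy: derive a distinct starting state per slice index. */
static uint32_t demo_slice_seed(uint32_t base, unsigned slice)
{
    return base ^ (0x9E3779B9u * (slice + 1));
}
```

With identical seeds, two calls to demo_run_slice() produce identical sequences, which matches the symptom of two pixels printing the same numbers. Seeding each slice through something like demo_slice_seed() makes the sequences diverge without forcing single-threaded filtering.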

