Re: [FFmpeg-devel] [PATCH v2] aacenc_pred: prevent UB in ff_aac_adjust_common_pred()

2024-10-06 Thread Martin Storsjö

On Sat, 5 Oct 2024, Sean McGovern wrote:


Hi

On Sat, Oct 5, 2024, 19:15 Lynne via ffmpeg-devel 
wrote:


On 05/10/2024 20:58, Sean McGovern wrote:

---
  libavcodec/aacenc_pred.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libavcodec/aacenc_pred.c b/libavcodec/aacenc_pred.c
index a486c44d42..a6dfaa25fb 100644
--- a/libavcodec/aacenc_pred.c
+++ b/libavcodec/aacenc_pred.c
@@ -153,9 +153,7 @@ void ff_aac_adjust_common_pred(AACEncContext *s, ChannelElement *cpe)
  int start, w, w2, g, i, count = 0;
  SingleChannelElement *sce0 = &cpe->ch[0];
  SingleChannelElement *sce1 = &cpe->ch[1];
-const int pmax0 = FFMIN(sce0->ics.max_sfb, ff_aac_pred_sfb_max[s->samplerate_index]);
-const int pmax1 = FFMIN(sce1->ics.max_sfb, ff_aac_pred_sfb_max[s->samplerate_index]);
-const int pmax  = FFMIN(pmax0, pmax1);
+const int pmax = FFMIN(sce1->ics.max_sfb, ff_aac_pred_sfb_max[s->samplerate_index]);

  if (!cpe->common_window ||
  sce0->ics.window_sequence[0] == EIGHT_SHORT_SEQUENCE ||
@@ -164,7 +162,7 @@ void ff_aac_adjust_common_pred(AACEncContext *s, ChannelElement *cpe)

  for (w = 0; w < sce0->ics.num_windows; w += sce0->ics.group_len[w]) {
  start = 0;
-for (g = 0; g < sce0->ics.num_swb; g++) {
+for (g = 0; g < pmax; g++) {
  int sfb = w*16+g;
  int sum = sce0->ics.prediction_used[sfb] + sce1->ics.prediction_used[sfb];
  float ener0 = 0.0f, ener1 = 0.0f, ener01 = 0.0f;


I'm not sure I see the UB here?



It corrects the issue noted by both the x86_64 and PPC64 UBsan FATE nodes.


That issue will be impossible to find for people looking at this code 
once such runs are no longer visible on FATE.


Always summarize the issue and how you go about fixing it, in the commit 
message.


// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH] avcodec/mediacodecenc: Extract configOBUs from AV1CodecConfigurationRecord

2024-10-04 Thread Martin Storsjö

On Sat, 5 Oct 2024, Zhao Zhili wrote:


From: Zhao Zhili 

MediaCodec can generate AV1CodecConfigurationRecord, which shouldn't
be put into packet->data. Skip four bytes and extract configOBUs
if it exists.
---
I did some testing on a Pixel 8 Pro. AV1 hardware encoding works, but with
a lot of bugs:

1. It's broken for widths that aren't aligned to 16. For width 1080 and pixel
format YUV420P, MediaCodec uses 1080 as stride. For pixel format NV12,
MediaCodec uses 1088 as stride. There is no API to get the stride info;
AMEDIAFORMAT_KEY_STRIDE doesn't work. Setting the stride on MediaCodec has
no effect in my tests, at least on that device. We know the buffer
size provided by MediaCodec, but we still cannot get the stride from
buf_size / height:

 1) For YUV420P, buf_size = 1080 * height
 2) For NV12, buf_size = 1080 + 1088 * (height - 1). Yes, buf_size doesn't
count the last line's padding :(
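
To make the arithmetic in item 2) concrete, here is a small C sketch. The width 1080, the stride 1088 and the buffer size formula come from the description above; the helper names and the height of 1920 used with them are only assumptions for illustration, and none of this is MediaCodec API.

```c
#include <stddef.h>

/* Hypothetical helper, not a MediaCodec API: buffer size as described above,
 * where every row except the last occupies a full stride and the last row
 * only occupies `width` bytes. */
static size_t described_buf_size(size_t width, size_t stride, size_t height)
{
    return width + stride * (height - 1);
}

/* Naive stride guess from the buffer size. The integer division rounds
 * down, because the padding of the last row is missing from buf_size. */
static size_t guess_stride(size_t buf_size, size_t height)
{
    return buf_size / height;
}
```

With an assumed height of 1920, described_buf_size(1080, 1088, 1920) is 2088952, and guess_stride() on that value gives 1087 rather than the real stride of 1088.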


Isn't this pretty much the case for the encoders for other codecs as well 
- there aren't really any compat guarantees for how they behave for widths 
that aren't a multiple of 16? At least back when Android added 
CTS tests to guarantee some sort of cross-device consistent behaviour, 
they only tested/mandated the behaviour for a couple of resolutions, which 
all were even multiples of 16.


I guess the difference here is whether it's possible to do cropping in the 
same way as via the h264_metadata/hevc_metadata BSFs?


// Martin



Re: [FFmpeg-devel] [PATCH] avcodec/mfenc: add support for AV1 MF encoders

2024-10-04 Thread Martin Storsjö

On Fri, 4 Oct 2024, Dash Santosh wrote:


From 77c708805c52302861650cf770f6c32a33590e90 Mon Sep 17 00:00:00 2001
From: Min Chen 
Date: Fri, 4 Oct 2024 23:04:04 +0530
Subject: [PATCH] avcodec/mfenc: add support for AV1 MF encoders
X-Unsent: 1
To: ffmpeg-devel@ffmpeg.org

Signed-off-by: Dash Santosh 
---
configure  | 1 +
libavcodec/allcodecs.c | 1 +
libavcodec/mf_utils.c  | 2 ++
libavcodec/mfenc.c | 1 +
4 files changed, 5 insertions(+)

diff --git a/configure b/configure
index 0247ea08d6..63bc53cc27 100755
--- a/configure
+++ b/configure
@@ -3347,6 +3347,7 @@ av1_cuvid_decoder_deps="cuvid CUVIDAV1PICPARAMS"
av1_mediacodec_decoder_deps="mediacodec"
av1_mediacodec_encoder_deps="mediacodec"
av1_mediacodec_encoder_select="extract_extradata_bsf"
+av1_mf_encoder_deps="mediafoundation"
av1_nvenc_encoder_deps="nvenc NV_ENC_PIC_PARAMS_AV1"
av1_nvenc_encoder_select="atsc_a53"
av1_qsv_decoder_select="qsvdec"
diff --git a/libavcodec/allcodecs.c b/libavcodec/allcodecs.c
index aa0fc47647..f5317616b7 100644
--- a/libavcodec/allcodecs.c
+++ b/libavcodec/allcodecs.c
@@ -838,6 +838,7 @@ extern const FFCodec ff_av1_nvenc_encoder;
extern const FFCodec ff_av1_qsv_decoder;
extern const FFCodec ff_av1_qsv_encoder;
extern const FFCodec ff_av1_amf_encoder;
+extern const FFCodec ff_av1_mf_encoder;
extern const FFCodec ff_av1_vaapi_encoder;
extern const FFCodec ff_libopenh264_encoder;
extern const FFCodec ff_libopenh264_decoder;
diff --git a/libavcodec/mf_utils.c b/libavcodec/mf_utils.c
index 48e3a63efc..f740a6090b 100644
--- a/libavcodec/mf_utils.c
+++ b/libavcodec/mf_utils.c
@@ -240,6 +240,7 @@ static struct GUID_Entry guid_names[] = {
GUID_ENTRY(MFMediaType_Video),
GUID_ENTRY(MFAudioFormat_PCM),
GUID_ENTRY(MFAudioFormat_Float),
+GUID_ENTRY(MFVideoFormat_AV1),
GUID_ENTRY(MFVideoFormat_H264),
GUID_ENTRY(MFVideoFormat_H264_ES),
GUID_ENTRY(ff_MFVideoFormat_HEVC),
@@ -507,6 +508,7 @@ void ff_media_type_dump(void *log, IMFMediaType *type)
const CLSID *ff_codec_to_mf_subtype(enum AVCodecID codec)
{
switch (codec) {
+case AV_CODEC_ID_AV1:   return &MFVideoFormat_AV1;
case AV_CODEC_ID_H264:  return &MFVideoFormat_H264;
case AV_CODEC_ID_HEVC:  return &ff_MFVideoFormat_HEVC;


Doing it this way would break compilation with any earlier SDK that 
doesn't contain a declaration of MFVideoFormat_AV1. See how we've provided 
a local definition of MFVideoFormat_HEVC in the form of 
ff_MFVideoFormat_HEVC, to work around this issue.
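
A local definition along those lines could look roughly like the sketch below. It is self-contained and does not use the Windows headers: the ExampleGUID type, the EXAMPLE_FOURCC macro and the ff_MFVideoFormat_AV1 name are stand-ins made up for illustration. The GUID value assumes that the AV1 subtype follows the usual FOURCC-based Media Foundation pattern ('AV01' in the first field, followed by the common base GUID), which should be checked against the SDK headers.

```c
#include <stdint.h>

/* Stand-in for the Windows GUID struct, so that the sketch builds anywhere. */
typedef struct ExampleGUID {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} ExampleGUID;

/* Little-endian FOURCC packed into a 32-bit value. */
#define EXAMPLE_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Assumed value: FOURCC 'AV01' combined with the common base GUID
 * xxxxxxxx-0000-0010-8000-00AA00389B71. */
static const ExampleGUID ff_MFVideoFormat_AV1 = {
    EXAMPLE_FOURCC('A', 'V', '0', '1'), 0x0000, 0x0010,
    { 0x80, 0x00, 0x00, 0xAA, 0x00, 0x38, 0x9B, 0x71 }
};
```

The point of the local, ff_ prefixed name is that the code no longer depends on whether the installed SDK declares the symbol.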


// Martin



Re: [FFmpeg-devel] [PATCH] configure: Enable -Wno-implicit-const-int-float-conversion if available

2024-10-04 Thread Martin Storsjö

On Wed, 2 Oct 2024, Martin Storsjö wrote:


This silences a lot of compile warnings (around 160 instances at least), when
compiling with Clang.

These warnings look like this:

   libavformat/http.c:176:133: warning: implicit conversion from 'long long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-const-int-float-conversion]
     176 | { "end_offset", "try to limit the request to bytes preceding this offset", OFFSET(end_off), AV_OPT_TYPE_INT64, { .i64 = 0 }, 0, INT64_MAX, D },
---
configure | 1 +
1 file changed, 1 insertion(+)

diff --git a/configure b/configure
index dc1b9b2bea..2b0ba07771 100755
--- a/configure
+++ b/configure
@@ -7459,6 +7459,7 @@ check_disable_warning -Wno-pointer-sign
check_disable_warning -Wno-unused-const-variable
check_disable_warning -Wno-bool-operation
check_disable_warning -Wno-char-subscripts
+check_disable_warning -Wno-implicit-const-int-float-conversion

check_disable_warning_headers(){
warning_flag=-W${1#-Wno-}
--
2.39.5 (Apple Git-154)


I'll push this one soon as well, if there are no good ideas (with a 
volunteered implementation) to get around the actual issue that the 
compiler warns about.
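
For reference, the rounding that the warning describes can be reproduced in isolation with a few lines of C. This is only an illustration of the conversion itself, not of the AVOption code; the function name is made up.

```c
#include <stdint.h>

/* INT64_MAX (2^63 - 1) has no exact double representation; with the default
 * round-to-nearest mode it converts to 2^63, which is the value change that
 * Clang reports. */
static double int64_max_as_double(void)
{
    return (double)INT64_MAX;
}
```

The explicit cast here is what keeps the compiler quiet; the table entries rely on the implicit conversion instead, hence the warnings.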


// Martin


Re: [FFmpeg-devel] [PATCH] libavcodec: x86: Remove an explicit include of config.asm

2024-10-04 Thread Martin Storsjö

On Wed, 2 Oct 2024, Martin Storsjö wrote:


This file is never included explicitly anywhere else, it's only
included implicitly by passing -Pconfig.asm on the command line.
---
libavcodec/x86/celt_pvq_search.asm | 1 -
1 file changed, 1 deletion(-)

diff --git a/libavcodec/x86/celt_pvq_search.asm b/libavcodec/x86/celt_pvq_search.asm
index e9bff02650..3c6974d370 100644
--- a/libavcodec/x86/celt_pvq_search.asm
+++ b/libavcodec/x86/celt_pvq_search.asm
@@ -20,7 +20,6 @@
;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
;**

-%include "config.asm"
%include "libavutil/x86/x86util.asm"

%ifdef __NASM_VER__
--
2.39.5 (Apple Git-154)


Will push soon, as this is kinda trivial.

// Martin


Re: [FFmpeg-devel] [PATCH] av1dec: Don't crash if decoding of some frames have failed

2024-10-04 Thread Martin Storsjö

On Wed, 2 Oct 2024, Martin Storsjö wrote:


If decoding with hwaccel, but decoding fails, these pointers
are null at this point.
---
libavcodec/av1dec.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
index 6a9de07d16..bc4ef63e68 100644
--- a/libavcodec/av1dec.c
+++ b/libavcodec/av1dec.c
@@ -281,6 +281,8 @@ static void skip_mode_params(AV1DecContext *s)
forward_idx  = -1;
backward_idx = -1;
for (i = 0; i < AV1_REFS_PER_FRAME; i++) {
+if (!s->ref[header->ref_frame_idx[i]].raw_frame_header)
+return;
ref_hint = s->ref[header->ref_frame_idx[i]].raw_frame_header->order_hint;
dist = get_relative_dist(seq, ref_hint, header->order_hint);
if (dist < 0) {
--
2.39.5 (Apple Git-154)


OK'd by James on irc, will push later.
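
The pattern of the fix, reduced to a self-contained C sketch: check each reference before dereferencing it, and bail out of the whole computation if one is missing. The types, the EXAMPLE_REFS count and the function below are hypothetical stand-ins, not the av1dec structures.

```c
#include <stddef.h>

#define EXAMPLE_REFS 7 /* illustrative count, standing in for AV1_REFS_PER_FRAME */

typedef struct ExampleHeader { int order_hint; } ExampleHeader;
typedef struct ExampleRef { const ExampleHeader *raw_frame_header; } ExampleRef;

/* Returns 0 and stores the sum of the hints if all references are usable,
 * or -1 as soon as a reference without a header is found. */
static int sum_order_hints(const ExampleRef refs[EXAMPLE_REFS], int *sum)
{
    int total = 0;
    for (int i = 0; i < EXAMPLE_REFS; i++) {
        if (!refs[i].raw_frame_header)
            return -1; /* bail out instead of dereferencing NULL */
        total += refs[i].raw_frame_header->order_hint;
    }
    *sum = total;
    return 0;
}
```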

// Martin


Re: [FFmpeg-devel] [PATCH 4/4] lavc/vulkan: add SPIR-V compilation support

2024-10-04 Thread Martin Storsjö

On Fri, 4 Oct 2024, Lynne via ffmpeg-devel wrote:


On 04/10/2024 12:01, epira...@gmail.com wrote:

On 4 Oct 2024, at 11:31, Lynne via ffmpeg-devel wrote:


This is the same as with libavfilter.

We will need SPIR-V compilation for at least three different things,
like the VC-2 encoder and decoder, AV1 film grain synthesis for
hardware with no support for it, and possibly other codecs.
---
  libavcodec/Makefile |  4 
  libavcodec/vulkan_glslang.c | 19 +++
  libavcodec/vulkan_shaderc.c | 19 +++
  3 files changed, 42 insertions(+)
  create mode 100644 libavcodec/vulkan_glslang.c
  create mode 100644 libavcodec/vulkan_shaderc.c

diff --git a/libavcodec/vulkan_shaderc.c b/libavcodec/vulkan_shaderc.c
new file mode 100644
index 00..9f60bf4dfd
--- /dev/null
+++ b/libavcodec/vulkan_shaderc.c
@@ -0,0 +1,19 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/vulkan_shaderc.c"


Wouldn’t this cause duplicate symbol issues with for example 
ff_vk_shaderc_init being in both libavfilter and libavcodec?


No. For completely identical objects like we have here, the linker 
deduplicates while linking.


Not quite...

A linker doesn't deduplicate things wrt static libraries; it's the other 
way around. With a static library, the linker only pulls in specifically 
the object files that are needed to fulfill some missing symbol. As long 
as both objects provide the exact same set of symbols, there's no 
collision and no issue with duplicates.


If you have two objects that provide a differing set of symbols, you may 
end up pulling in both object files, and you have such a conflict.


If a static library is linked with flags like --whole-archive, forcing the 
linker to pull in all object files (as if they were specified directly on 
the linker command line), you'd also hit the same conflict. (Not that we 
usually do this, though.)


This is how vulkan.c is also handled across libavutil, libavfilter and 
libavcodec. We also handle something else in the same way, but I don't 
remember what.


I'm not sure about these cases, but we have a slightly similar thing with 
e.g. log2_tab.c, which contains an ff_ prefixed symbol; for static-only 
builds, we only provide the object file in libavutil, as it will be 
accessible to other libraries from there. For shared library builds, we 
include the object in the other shared libraries, so that the symbols 
(which aren't visible/exported across shared libraries) won't need to be 
accessed across shared libraries.


// Martin


[FFmpeg-devel] [PATCH] checkasm: lls: Use relative tolerances rather than absolute ones

2024-10-04 Thread Martin Storsjö
Depending on the magnitude of the output values, the potential
errors can be larger.

This fixes errors in the lls tests on x86_32 for some seeds,
observed with GCC 11 (on Ubuntu 22.04, with the distro compiler,
with -m32).
---
 tests/checkasm/lls.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tests/checkasm/lls.c b/tests/checkasm/lls.c
index 1e0b56974c..4251032e02 100644
--- a/tests/checkasm/lls.c
+++ b/tests/checkasm/lls.c
@@ -46,28 +46,32 @@ static void test_update(LLSModel *lls, const double *var)
 call_new(lls, var);
 
 for (size_t i = 0; i < lls->indep_count; i++)
-for (size_t j = i; j < lls->indep_count; j++)
+for (size_t j = i; j < lls->indep_count; j++) {
+double eps = FFMAX(2 * DBL_EPSILON * fabs(refcovar[i][j]),
+   8 * DBL_EPSILON);
 if (!double_near_abs_eps(refcovar[i][j], lls->covariance[i][j],
- 8 * DBL_EPSILON)) {
+ eps)) {
 fprintf(stderr, "%zu, %zu: %- .12f - %- .12f = % .12g\n", i, j,
 refcovar[i][j], lls->covariance[i][j],
 refcovar[i][j] - lls->covariance[i][j]);
 fail();
 }
+}
 
 bench_new(lls, var);
 }
 
-#define EPS 0.2
 static void test_evaluate(LLSModel *lls, const double *param, int order)
 {
-double refprod, newprod;
+double refprod, newprod, eps;
 declare_func_float(double, LLSModel *, const double *, int);
 
 refprod = call_ref(lls, param, order);
 newprod = call_new(lls, param, order);
 
-if (!double_near_abs_eps(refprod, newprod, EPS)) {
+eps = FFMAX(2 * DBL_EPSILON * fabs(refprod), 0.2);
+
+if (!double_near_abs_eps(refprod, newprod, eps)) {
 fprintf(stderr, "%- .12f - %- .12f = % .12g\n",
 refprod, newprod, refprod - newprod);
 fail();
-- 
2.39.5 (Apple Git-154)



[FFmpeg-devel] [PATCH] arm: Consistently use proper interworking function returns

2024-10-04 Thread Martin Storsjö
Use "bx lr", or "pop {pc}", which do proper mode switching
between thumb and arm modes. A plain "mov pc, lr" does not switch
from thumb mode to arm mode (while in arm mode, it does switch
mode for a thumb caller).

This is normally not an issue, as CONFIG_THUMB only is enabled if
the C compiler defaults to thumb; but stick to patterns that can
do mode switching if needed, for consistency.
---
 libswresample/arm/resample.S  | 8 
 libswscale/arm/hscale.S   | 3 +--
 libswscale/arm/output.S   | 3 +--
 libswscale/arm/yuv2rgb_neon.S | 3 +--
 4 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/libswresample/arm/resample.S b/libswresample/arm/resample.S
index 3ce7623246..791f4cc016 100644
--- a/libswresample/arm/resample.S
+++ b/libswresample/arm/resample.S
@@ -30,7 +30,7 @@ function ff_resample_common_apply_filter_x4_float_neon, export=1
 vpadd.f32   d0, d0, d1 @ pair adding of the 4x32-bit accumulated values
 vpadd.f32   d0, d0, d0 @ pair adding of the 4x32-bit accumulator values
 vst1.32 {d0[0]}, [r0]  @ write accumulator
-mov pc, lr
+bx  lr
 endfunc
 
 function ff_resample_common_apply_filter_x8_float_neon, export=1
@@ -46,7 +46,7 @@ function ff_resample_common_apply_filter_x8_float_neon, export=1
 vpadd.f32   d0, d0, d1 @ pair adding of the 4x32-bit accumulated values
 vpadd.f32   d0, d0, d0 @ pair adding of the 4x32-bit accumulator values
 vst1.32 {d0[0]}, [r0]  @ write accumulator
-mov pc, lr
+bx  lr
 endfunc
 
 function ff_resample_common_apply_filter_x4_s16_neon, export=1
@@ -59,7 +59,7 @@ function ff_resample_common_apply_filter_x4_s16_neon, export=1
 vpadd.s32   d0, d0, d1 @ pair adding of the 4x32-bit accumulated values
 vpadd.s32   d0, d0, d0 @ pair adding of the 4x32-bit accumulator values
 vst1.32 {d0[0]}, [r0]  @ write accumulator
-mov pc, lr
+bx  lr
 endfunc
 
 function ff_resample_common_apply_filter_x8_s16_neon, export=1
@@ -73,5 +73,5 @@ function ff_resample_common_apply_filter_x8_s16_neon, export=1
 vpadd.s32   d0, d0, d1 @ pair adding of the 4x32-bit accumulated values
 vpadd.s32   d0, d0, d0 @ pair adding of the 4x32-bit accumulator values
 vst1.32 {d0[0]}, [r0]  @ write accumulator
-mov pc, lr
+bx  lr
 endfunc
diff --git a/libswscale/arm/hscale.S b/libswscale/arm/hscale.S
index dd4d453957..5c3551a0f1 100644
--- a/libswscale/arm/hscale.S
+++ b/libswscale/arm/hscale.S
@@ -65,6 +65,5 @@ function ff_hscale_8_to_15_neon, export=1
 subs r2, #2 @ dstW -= 2
 bgt 1b @ loop until end of line
 vpop {q4-q7}
-pop {r4-r12, lr}
-mov pc, lr
+pop {r4-r12, pc}
 endfunc
diff --git a/libswscale/arm/output.S b/libswscale/arm/output.S
index 70846dee1f..5f10585f81 100644
--- a/libswscale/arm/output.S
+++ b/libswscale/arm/output.S
@@ -73,6 +73,5 @@ function ff_yuv2planeX_8_neon, export=1
 subs r4, r4, #8 @ dstW -= 8
 bgt 2b @ loop until width is consumed
 vpop {q4-q7}
-pop {r4-r12, lr}
-mov pc, lr
+pop {r4-r12, pc}
 endfunc
diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 474465427d..6777d625f9 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -262,8 +262,7 @@ function ff_\ifmt\()_to_\ofmt\()_neon, export=1
 increment_and_test_\ifmt
 bgt 1b
 vpop {q4-q7}
-pop {r4-r12, lr}
-mov pc, lr
+pop {r4-r12, pc}
 endfunc
 .endm
 
-- 
2.39.5 (Apple Git-154)



[FFmpeg-devel] [PATCH] configure: Enable -Wno-implicit-const-int-float-conversion if available

2024-10-02 Thread Martin Storsjö
This silences a lot of compile warnings (around 160 instances at least), when
compiling with Clang.

These warnings look like this:

libavformat/http.c:176:133: warning: implicit conversion from 'long long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-const-int-float-conversion]
  176 | { "end_offset", "try to limit the request to bytes preceding this offset", OFFSET(end_off), AV_OPT_TYPE_INT64, { .i64 = 0 }, 0, INT64_MAX, D },
---
 configure | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure b/configure
index dc1b9b2bea..2b0ba07771 100755
--- a/configure
+++ b/configure
@@ -7459,6 +7459,7 @@ check_disable_warning -Wno-pointer-sign
 check_disable_warning -Wno-unused-const-variable
 check_disable_warning -Wno-bool-operation
 check_disable_warning -Wno-char-subscripts
+check_disable_warning -Wno-implicit-const-int-float-conversion
 
 check_disable_warning_headers(){
 warning_flag=-W${1#-Wno-}
-- 
2.39.5 (Apple Git-154)



[FFmpeg-devel] [PATCH] libavcodec: x86: Remove an explicit include of config.asm

2024-10-02 Thread Martin Storsjö
This file is never included explicitly anywhere else, it's only
included implicitly by passing -Pconfig.asm on the command line.
---
 libavcodec/x86/celt_pvq_search.asm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libavcodec/x86/celt_pvq_search.asm b/libavcodec/x86/celt_pvq_search.asm
index e9bff02650..3c6974d370 100644
--- a/libavcodec/x86/celt_pvq_search.asm
+++ b/libavcodec/x86/celt_pvq_search.asm
@@ -20,7 +20,6 @@
 ;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 ;**
 
-%include "config.asm"
 %include "libavutil/x86/x86util.asm"
 
 %ifdef __NASM_VER__
-- 
2.39.5 (Apple Git-154)



[FFmpeg-devel] [PATCH] av1dec: Don't crash if decoding of some frames have failed

2024-10-02 Thread Martin Storsjö
If decoding with hwaccel, but decoding fails, these pointers
are null at this point.
---
 libavcodec/av1dec.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
index 6a9de07d16..bc4ef63e68 100644
--- a/libavcodec/av1dec.c
+++ b/libavcodec/av1dec.c
@@ -281,6 +281,8 @@ static void skip_mode_params(AV1DecContext *s)
 forward_idx  = -1;
 backward_idx = -1;
 for (i = 0; i < AV1_REFS_PER_FRAME; i++) {
+if (!s->ref[header->ref_frame_idx[i]].raw_frame_header)
+return;
 ref_hint = s->ref[header->ref_frame_idx[i]].raw_frame_header->order_hint;
 dist = get_relative_dist(seq, ref_hint, header->order_hint);
 if (dist < 0) {
-- 
2.39.5 (Apple Git-154)



Re: [FFmpeg-devel] [PATCH] libavutil: Makefile: Fix alphabetical order for the film_grain_params files

2024-10-02 Thread Martin Storsjö

On Mon, 30 Sep 2024, Martin Storsjö wrote:


---
libavutil/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 6e6fa8d800..d3c95b12a0 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -34,6 +34,7 @@ HEADERS = adler32.h \
  executor.h\
  fifo.h\
  file.h\
+  film_grain_params.h   \
  frame.h   \
  hash.h\
  hdr_dynamic_metadata.h\
@@ -93,7 +94,6 @@ HEADERS = adler32.h \
  xtea.h\
  tea.h \
  tx.h  \
-  film_grain_params.h   \
  video_hint.h

ARCH_HEADERS = bswap.h  \
@@ -135,6 +135,7 @@ OBJS = adler32.o \
   file.o   \
   file_open.o  \
   float_dsp.o  \
+   film_grain_params.o  \
   fixed_dsp.o  \
   frame.o  \
   hash.o   \
@@ -189,7 +190,6 @@ OBJS = adler32.o \
   version.o\
   video_enc_params.o   \
   video_hint.o \
-   film_grain_params.o  \


OBJS-$(CONFIG_CUDA) += hwcontext_cuda.o
--
2.39.5 (Apple Git-154)


Pushed, together with a similar patch for libavcodec, for moving entries 
to the right section.


// Martin


Re: [FFmpeg-devel] [PATCH v3] avcodec/videotoolbox: add AV1 hardware acceleration

2024-10-02 Thread Martin Storsjö

On Tue, 1 Oct 2024, Martin Storsjö wrote:


On Fri, 27 Sep 2024, Cameron Gutman wrote:


On Thu, Sep 26, 2024 at 4:25 PM Martin Storsjö  wrote:


From: Jan Ekström 

Use AV1DecContext's current_obu to access the original OBUs, and
feed them to videotoolbox, rather than the bare slice data passed
via decode_slice.

This requires a small addition to AV1DecContext, for keeping track
of the current range of OBUs that belong to the current frame.

Co-authored-by: Ruslan Chernenko 
Co-authored-by: Martin Storsjö 
---
v3: Adjust where nb_unit/start_unit are set, add code comments explaining
the roles of nb_unit/start_unit.
---


I've got 3 positive reports from folks testing this version, so LGTM.


Thanks! If there's no further opposition to this, and James doesn't mind the 
changes to the generic code in av1dec.c/h, I'll go ahead and push this later 
today or tomorrow.


Pushed this now, thanks!

// Martin


Re: [FFmpeg-devel] [PATCH v3] avcodec/videotoolbox: add AV1 hardware acceleration

2024-10-01 Thread Martin Storsjö

On Fri, 27 Sep 2024, Cameron Gutman wrote:


On Thu, Sep 26, 2024 at 4:25 PM Martin Storsjö  wrote:


From: Jan Ekström 

Use AV1DecContext's current_obu to access the original OBUs, and
feed them to videotoolbox, rather than the bare slice data passed
via decode_slice.

This requires a small addition to AV1DecContext, for keeping track
of the current range of OBUs that belong to the current frame.

Co-authored-by: Ruslan Chernenko 
Co-authored-by: Martin Storsjö 
---
v3: Adjust where nb_unit/start_unit are set, add code comments explaining
the roles of nb_unit/start_unit.
---


I've got 3 positive reports from folks testing this version, so LGTM.


Thanks! If there's no further opposition to this, and James doesn't mind 
the changes to the generic code in av1dec.c/h, I'll go ahead and push this 
later today or tomorrow.


// Martin


[FFmpeg-devel] [PATCH] libavutil: Makefile: Fix alphabetical order for the film_grain_params files

2024-09-30 Thread Martin Storsjö
---
 libavutil/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/Makefile b/libavutil/Makefile
index 6e6fa8d800..d3c95b12a0 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -34,6 +34,7 @@ HEADERS = adler32.h \
   executor.h\
   fifo.h\
   file.h\
+  film_grain_params.h   \
   frame.h   \
   hash.h\
   hdr_dynamic_metadata.h\
@@ -93,7 +94,6 @@ HEADERS = adler32.h \
   xtea.h\
   tea.h \
   tx.h  \
-  film_grain_params.h   \
   video_hint.h
 
 ARCH_HEADERS = bswap.h  \
@@ -135,6 +135,7 @@ OBJS = adler32.o \
file.o   \
file_open.o  \
float_dsp.o  \
+   film_grain_params.o  \
fixed_dsp.o  \
frame.o  \
hash.o   \
@@ -189,7 +190,6 @@ OBJS = adler32.o \
version.o\
video_enc_params.o   \
video_hint.o \
-   film_grain_params.o  \
 
 
 OBJS-$(CONFIG_CUDA) += hwcontext_cuda.o
-- 
2.39.5 (Apple Git-154)



Re: [FFmpeg-devel] [PATCH v4 1/3] aarch64/vvc: Add w_avg

2024-09-29 Thread Martin Storsjö

On Sun, 29 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

w_avg_8_2x2_c:   0.0 ( 0.00x)
w_avg_8_2x2_neon:0.0 ( 0.00x)
w_avg_8_4x4_c:   0.2 ( 1.00x)
w_avg_8_4x4_neon:0.0 ( 0.00x)
w_avg_8_8x8_c:   1.2 ( 1.00x)
w_avg_8_8x8_neon:0.2 ( 5.00x)
w_avg_8_16x16_c: 4.2 ( 1.00x)
w_avg_8_16x16_neon:  0.8 ( 5.67x)
w_avg_8_32x32_c:16.2 ( 1.00x)
w_avg_8_32x32_neon:  2.5 ( 6.50x)
w_avg_8_64x64_c:64.5 ( 1.00x)
w_avg_8_64x64_neon:  9.0 ( 7.17x)
w_avg_8_128x128_c: 269.5 ( 1.00x)
w_avg_8_128x128_neon:   35.5 ( 7.59x)
w_avg_10_2x2_c:  0.2 ( 1.00x)
w_avg_10_2x2_neon:   0.2 ( 1.00x)
w_avg_10_4x4_c:  0.2 ( 1.00x)
w_avg_10_4x4_neon:   0.2 ( 1.00x)
w_avg_10_8x8_c:  1.0 ( 1.00x)
w_avg_10_8x8_neon:   0.2 ( 4.00x)
w_avg_10_16x16_c:4.2 ( 1.00x)
w_avg_10_16x16_neon: 0.8 ( 5.67x)
w_avg_10_32x32_c:   16.2 ( 1.00x)
w_avg_10_32x32_neon: 2.5 ( 6.50x)
w_avg_10_64x64_c:   66.2 ( 1.00x)
w_avg_10_64x64_neon:10.0 ( 6.62x)
w_avg_10_128x128_c:277.8 ( 1.00x)
w_avg_10_128x128_neon:  39.8 ( 6.99x)
w_avg_12_2x2_c:  0.0 ( 0.00x)
w_avg_12_2x2_neon:   0.2 ( 0.00x)
w_avg_12_4x4_c:  0.2 ( 1.00x)
w_avg_12_4x4_neon:   0.0 ( 0.00x)
w_avg_12_8x8_c:  1.2 ( 1.00x)
w_avg_12_8x8_neon:   0.5 ( 2.50x)
w_avg_12_16x16_c:4.8 ( 1.00x)
w_avg_12_16x16_neon: 0.8 ( 6.33x)
w_avg_12_32x32_c:   17.0 ( 1.00x)
w_avg_12_32x32_neon: 2.8 ( 6.18x)
w_avg_12_64x64_c:   64.0 ( 1.00x)
w_avg_12_64x64_neon:10.0 ( 6.40x)
w_avg_12_128x128_c:269.2 ( 1.00x)
w_avg_12_128x128_neon:  42.0 ( 6.41x)

Signed-off-by: Zhao Zhili 
---
libavcodec/aarch64/vvc/dsp_init.c | 36 +++
libavcodec/aarch64/vvc/inter.S| 99 +--
2 files changed, 118 insertions(+), 17 deletions(-)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c b/libavcodec/aarch64/vvc/dsp_init.c
index ad767d17e2..ebe58a2ba5 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -52,6 +52,39 @@ void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *src0, const int16_t *src1, int width,
int height);

+void ff_vvc_w_avg_8_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
+ const int16_t *src0, const int16_t *src1,
+ int width, int height,
+ uintptr_t w0_w1, uintptr_t offset_shift);
+void ff_vvc_w_avg_10_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
+ const int16_t *src0, const int16_t *src1,
+ int width, int height,
+ uintptr_t w0_w1, uintptr_t offset_shift);
+void ff_vvc_w_avg_12_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
+  const int16_t *src0, const int16_t *src1,
+  int width, int height,
+  uintptr_t w0_w1, uintptr_t offset_shift);
+/* When passing arguments to functions, Apple platforms diverge from the ARM64
+ * standard ABI for functions that require passing arguments on the stack. To
+ * simplify portability in the assembly function interface, use a different
+ * function signature that doesn't require passing arguments on the stack.
+ */
+#define W_AVG_FUN(bit_depth) \
+static void vvc_w_avg_ ## bit_depth(uint8_t *dst, ptrdiff_t dst_stride, \
+const int16_t *src0, const int16_t *src1, int width, int height, \
+int denom, int w0, int w1, int o0, int o1) \
+{ \
+int shift = denom + FFMAX(3, 15 - bit_depth); \
+int offset = ((o0 + o1) * (1 << (bit

Re: [FFmpeg-devel] [PATCH v2 12/16] swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats

2024-09-29 Thread Martin Storsjö

On Sun, 29 Sep 2024, Ramiro Polla wrote:


On Sat, Sep 28, 2024 at 11:41 PM Michael Niedermayer
 wrote:

On Fri, Sep 27, 2024 at 02:52:37PM +0200, Ramiro Polla wrote:
> There is an issue with the constants used in YUV to YUV range conversion,
> where the upper bound is not respected when converting to mpeg range.
>
> With this commit, the constants are calculated at runtime, depending on
> the bit depth. This approach also allows us to more easily understand how
> the constants are derived.
>
> For bit depths <= 14, the number of fixed point bits has been set to 14
> for all conversions, to simplify the code.
> For bit depths > 14, the number of fixed points bits has been raised and
> set to 18, to allow for the conversion to be accurate enough for the mpeg
> range to be respected.
>
> The convert functions now take the conversion constants (coeff and offset)
> as function arguments.
> For bit depths <= 14, offset is 32-bit.
> For bit depths > 14, offset is 64-bit.
>
> x86_64:
> chrRangeFromJpeg8_1920_c:    5804.5   5845.2 ( 0.99x)
> chrRangeFromJpeg16_1920_c:   5792.8   5809.1 ( 1.00x)
> chrRangeToJpeg8_1920_c:      9388.6   9462.2 ( 0.99x)
> chrRangeToJpeg16_1920_c:     5796.5   9261.5 ( 0.63x)
> lumRangeFromJpeg8_1920_c:    4147.9   4191.4 ( 0.99x)
> lumRangeFromJpeg16_1920_c:   4529.0   4143.4 ( 1.09x)
> lumRangeToJpeg8_1920_c:      5694.1   5720.5 ( 1.00x)
> lumRangeToJpeg16_1920_c:     5334.2   5139.5 ( 1.04x)
>
> aarch64 A55:
> chrRangeFromJpeg8_1920_c:   28833.8  28834.8 ( 1.00x)
> chrRangeFromJpeg16_1920_c:  28842.8  28840.6 ( 1.00x)
> chrRangeToJpeg8_1920_c:     23070.6  23072.5 ( 1.00x)
> chrRangeToJpeg16_1920_c:    17313.8  23075.1 ( 0.75x)
> lumRangeFromJpeg8_1920_c:   15388.1  15386.7 ( 1.00x)
> lumRangeFromJpeg16_1920_c:  15388.0  15383.8 ( 1.00x)
> lumRangeToJpeg8_1920_c:     19226.2  19223.6 ( 1.00x)
> lumRangeToJpeg16_1920_c:    19225.5  19225.5 ( 1.00x)
>
> aarch64 A76:
> chrRangeFromJpeg8_1920_c:    6317.8   6318.5 ( 1.00x)
> chrRangeFromJpeg16_1920_c:   6322.9   6323.5 ( 1.00x)
> chrRangeToJpeg8_1920_c:      9287.1   9170.0 ( 1.01x)
> chrRangeToJpeg16_1920_c:     6104.9   9195.6 ( 0.66x)
> lumRangeFromJpeg8_1920_c:    4359.1   4425.5 ( 0.98x)
> lumRangeFromJpeg16_1920_c:   4358.8   4436.8 ( 0.98x)
> lumRangeToJpeg8_1920_c:      5957.2   6017.2 ( 0.99x)
> lumRangeToJpeg16_1920_c:     6072.5   6017.2 ( 1.01x)
>
> NOTE: all simd optimizations for range_convert have been disabled.
>   they will be re-enabled when they are fixed for each architecture.
>
> NOTE2: the same issue still exists in rgb2yuv conversions, which is not
>addressed in this commit.
> ---
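
The commit message above describes deriving the conversion constants from
the bit depth at runtime. A minimal sketch of how that could look for the
luma full-range to mpeg-range direction follows. The names RangeCoeffs,
luma_to_mpeg_coeffs and luma_to_mpeg are hypothetical and the arithmetic is
an assumption for illustration, not the code from the patch; only the choice
of 14 or 18 fixed point bits is taken from the description.

```c
#include <stdint.h>

/* Hypothetical container for the fixed-point constants of one conversion. */
typedef struct RangeCoeffs {
    int     shift;   /* number of fixed-point fraction bits */
    int64_t coeff;   /* multiplier, scaled by 1 << shift */
    int64_t offset;  /* additive term, scaled by 1 << shift, rounding included */
} RangeCoeffs;

/* Derive constants for luma full (jpeg) range -> limited (mpeg) range. */
static RangeCoeffs luma_to_mpeg_coeffs(int bit_depth)
{
    RangeCoeffs c;
    int64_t src_max = (INT64_C(1) << bit_depth) - 1;      /* e.g. 255 for 8 bit */
    int64_t dst_min = INT64_C(16)  << (bit_depth - 8);    /* e.g. 16  for 8 bit */
    int64_t dst_max = INT64_C(235) << (bit_depth - 8);    /* e.g. 235 for 8 bit */

    /* 14 fraction bits up to 14-bit samples, 18 above, as in the description. */
    c.shift  = bit_depth <= 14 ? 14 : 18;
    /* Integer division rounds down, so src_max can never map above dst_max. */
    c.coeff  = ((dst_max - dst_min) << c.shift) / src_max;
    /* The lower bound plus one half, so the final shift rounds to nearest. */
    c.offset = (dst_min << c.shift) + (INT64_C(1) << (c.shift - 1));
    return c;
}

/* Convert one full-range luma sample to mpeg range. */
static int luma_to_mpeg(int sample, int bit_depth)
{
    RangeCoeffs c = luma_to_mpeg_coeffs(bit_depth);
    return (int)((sample * c.coeff + c.offset) >> c.shift);
}
```

In this sketch the maximum input maps exactly onto the upper bound of the
mpeg range for 8, 10, 12 and 16 bit samples, which is the property the
commit message says the old constants did not respect.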

seems to break fate:

make -j32 fate-filter-owdenoise-sample
TEST    filter-owdenoise-sample
stddev:12247.77 PSNR: 14.57 MAXDIFF:65280 bytes:   576000/   576000
MAXDIFF: |65280 - 1| >= 3539
Test filter-owdenoise-sample failed. Look at tests/data/fate/filter-owdenoise-sample.err for details.
make: *** [tests/Makefile:311: fate-filter-owdenoise-sample] Error 1


Ok, I got it now. The reference is in the fate suite samples. That's
why it doesn't show up on git diff after running fate with GEN=1.

It seems to me there is an unnecessary "-color_range mpeg" in that
test. But fixing this requires changing both the reference in the fate
suite and the test in tests/fate/filter-video.mak. How can we fix this
without breaking fate for people that haven't run make fate-rsync?
Perhaps disable the test, update the reference, wait a couple of days,
and enable the test again without -color_range? Or just keeping a
resulting checksum in git, so that it can more easily be updated when
needed?


People may want to run fate on old checkouts or old releases too, so we 
can't really change the reference file. If we need to, we'd have to add 
a new reference file with a new name, and update the tests to compare 
against that.


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v3 1/3] aarch64/vvc: Add w_avg

2024-09-27 Thread Martin Storsjö

On Thu, 26 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

w_avg_8_2x2_c:             0.0 ( 0.00x)
w_avg_8_2x2_neon:          0.0 ( 0.00x)
w_avg_8_4x4_c:             0.2 ( 1.00x)
w_avg_8_4x4_neon:          0.0 ( 0.00x)
w_avg_8_8x8_c:             1.2 ( 1.00x)
w_avg_8_8x8_neon:          0.2 ( 5.00x)
w_avg_8_16x16_c:           4.2 ( 1.00x)
w_avg_8_16x16_neon:        0.8 ( 5.67x)
w_avg_8_32x32_c:          16.2 ( 1.00x)
w_avg_8_32x32_neon:        2.5 ( 6.50x)
w_avg_8_64x64_c:          64.5 ( 1.00x)
w_avg_8_64x64_neon:        9.0 ( 7.17x)
w_avg_8_128x128_c:       269.5 ( 1.00x)
w_avg_8_128x128_neon:     35.5 ( 7.59x)
w_avg_10_2x2_c:            0.2 ( 1.00x)
w_avg_10_2x2_neon:         0.2 ( 1.00x)
w_avg_10_4x4_c:            0.2 ( 1.00x)
w_avg_10_4x4_neon:         0.2 ( 1.00x)
w_avg_10_8x8_c:            1.0 ( 1.00x)
w_avg_10_8x8_neon:         0.2 ( 4.00x)
w_avg_10_16x16_c:          4.2 ( 1.00x)
w_avg_10_16x16_neon:       0.8 ( 5.67x)
w_avg_10_32x32_c:         16.2 ( 1.00x)
w_avg_10_32x32_neon:       2.5 ( 6.50x)
w_avg_10_64x64_c:         66.2 ( 1.00x)
w_avg_10_64x64_neon:      10.0 ( 6.62x)
w_avg_10_128x128_c:      277.8 ( 1.00x)
w_avg_10_128x128_neon:    39.8 ( 6.99x)
w_avg_12_2x2_c:            0.0 ( 0.00x)
w_avg_12_2x2_neon:         0.2 ( 0.00x)
w_avg_12_4x4_c:            0.2 ( 1.00x)
w_avg_12_4x4_neon:         0.0 ( 0.00x)
w_avg_12_8x8_c:            1.2 ( 1.00x)
w_avg_12_8x8_neon:         0.5 ( 2.50x)
w_avg_12_16x16_c:          4.8 ( 1.00x)
w_avg_12_16x16_neon:       0.8 ( 6.33x)
w_avg_12_32x32_c:         17.0 ( 1.00x)
w_avg_12_32x32_neon:       2.8 ( 6.18x)
w_avg_12_64x64_c:         64.0 ( 1.00x)
w_avg_12_64x64_neon:      10.0 ( 6.40x)
w_avg_12_128x128_c:      269.2 ( 1.00x)
w_avg_12_128x128_neon:    42.0 ( 6.41x)
---
libavcodec/aarch64/vvc/dsp_init.c | 34 +++
libavcodec/aarch64/vvc/inter.S| 99 +--
2 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c 
b/libavcodec/aarch64/vvc/dsp_init.c
index ad767d17e2..41d0e02d62 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -52,6 +52,37 @@ void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *src0, const int16_t *src1, int width,
int height);

+void ff_vvc_w_avg_8_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
+ const int16_t *src0, const int16_t *src1,
+ int width, int height,
+ uintptr_t w0_w1, uintptr_t offset_shift);
+void ff_vvc_w_avg_10_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
+ const int16_t *src0, const int16_t *src1,
+ int width, int height,
+ uintptr_t w0_w1, uintptr_t offset_shift);
+void ff_vvc_w_avg_12_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
+  const int16_t *src0, const int16_t *src1,
+  int width, int height,
+  uintptr_t w0_w1, uintptr_t offset_shift);
+/* When passing arguments to functions, Apple platforms diverge from the ARM64
+ * standard ABI, that we can't implement the function directly in asm.


I would prefer to reword the comment a little bit - as it _is_ possible to 
implement directly in assembly, it just is messy wrt portability.


What about:

When passing arguments to functions, Apple platforms diverge from the 
ARM64 standard ABI, for functions that require passing arguments on the 
stack. To simplify portability in the assembly function interface, use a 
different function signature that doesn't require passing arguments on the 
stack.
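
To illustrate the idea behind the w0_w1 and offset_shift parameters: the C
entry point takes eleven arguments, which is more than the eight integer
registers available for argument passing on AArch64, while the assembly
prototype folds four of the scalars into two register-sized values. Below is
a minimal sketch of such packing. The helper names and the exact bit layout
(first value in the upper 32 bits) are assumptions for illustration; the
layout used by the actual patch is not visible in this thread. The
per-sample formula is a simplified stand-in without clipping.

```c
#include <stdint.h>

/* Assumed layout: first value in the upper 32 bits, second in the lower. */
static uint64_t pack_pair(int32_t hi, int32_t lo)
{
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}

static int32_t unpack_hi(uint64_t packed)
{
    return (int32_t)(uint32_t)(packed >> 32);
}

static int32_t unpack_lo(uint64_t packed)
{
    return (int32_t)(uint32_t)packed;
}

/* Hypothetical stand-in for the assembly routine, reduced to one sample:
 * every argument fits in a register, so nothing is passed on the stack. */
static int w_avg_sample(int a, int b, uint64_t w0_w1, uint64_t offset_shift)
{
    int w0     = unpack_hi(w0_w1);
    int w1     = unpack_lo(w0_w1);
    int offset = unpack_hi(offset_shift);
    int shift  = unpack_lo(offset_shift);

    /* Weighted sum plus rounding offset, scaled back down (no clipping). */
    return (a * w0 + b * w1 + offset) >> shift;
}
```

With this kind of signature the caller and the assembly agree on register
contents only, so the differing stack layout rules never come into play.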




+ */
+#define W_AVG_FUN(bit_depth) \
+static void vvc_w_a

[FFmpeg-devel] [PATCH v3] avcodec/videotoolbox: add AV1 hardware acceleration

2024-09-26 Thread Martin Storsjö
From: Jan Ekström 

Use AV1DecContext's current_obu to access the original OBUs, and
feed them to videotoolbox, rather than the bare slice data passed
via decode_slice.

This requires a small addition to AV1DecContext, for keeping track
of the current range of OBUs that belong to the current frame.

Co-authored-by: Ruslan Chernenko 
Co-authored-by: Martin Storsjö 
---
v3: Adjust where nb_unit/start_unit are set, add code comments explaining
the roles of nb_unit/start_unit.
---
 configure |   4 ++
 libavcodec/Makefile   |   1 +
 libavcodec/av1dec.c   |  22 ++-
 libavcodec/av1dec.h   |   3 +-
 libavcodec/hwaccels.h |   1 +
 libavcodec/videotoolbox.c |  34 +++
 libavcodec/videotoolbox_av1.c | 104 ++
 libavcodec/vt_internal.h  |   4 ++
 8 files changed, 169 insertions(+), 4 deletions(-)
 create mode 100644 libavcodec/videotoolbox_av1.c

diff --git a/configure b/configure
index 643ffddd19..9d0c1423f1 100755
--- a/configure
+++ b/configure
@@ -2467,6 +2467,7 @@ TYPES_LIST="
 kCMVideoCodecType_HEVC
 kCMVideoCodecType_HEVCWithAlpha
 kCMVideoCodecType_VP9
+kCMVideoCodecType_AV1
 kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
 kCVPixelFormatType_422YpCbCr8BiPlanarVideoRange
 kCVPixelFormatType_422YpCbCr10BiPlanarVideoRange
@@ -3174,6 +3175,8 @@ av1_vaapi_hwaccel_deps="vaapi 
VADecPictureParameterBufferAV1_bit_depth_idx"
 av1_vaapi_hwaccel_select="av1_decoder"
 av1_vdpau_hwaccel_deps="vdpau VdpPictureInfoAV1"
 av1_vdpau_hwaccel_select="av1_decoder"
+av1_videotoolbox_hwaccel_deps="videotoolbox"
+av1_videotoolbox_hwaccel_select="av1_decoder"
 av1_vulkan_hwaccel_deps="vulkan"
 av1_vulkan_hwaccel_select="av1_decoder"
 h263_vaapi_hwaccel_deps="vaapi"
@@ -6707,6 +6710,7 @@ enabled videotoolbox && {
 check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_HEVC 
"-framework CoreMedia"
 check_func_headers CoreMedia/CMFormatDescription.h 
kCMVideoCodecType_HEVCWithAlpha "-framework CoreMedia"
 check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_VP9 
"-framework CoreMedia"
+check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_AV1 
"-framework CoreMedia"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange "-framework CoreVideo"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_422YpCbCr8BiPlanarVideoRange "-framework CoreVideo"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_422YpCbCr10BiPlanarVideoRange "-framework CoreVideo"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index a4fcce3b42..21188b2479 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1008,6 +1008,7 @@ OBJS-$(CONFIG_AV1_D3D12VA_HWACCEL)+= dxva2_av1.o 
d3d12va_av1.o
 OBJS-$(CONFIG_AV1_NVDEC_HWACCEL)  += nvdec_av1.o
 OBJS-$(CONFIG_AV1_VAAPI_HWACCEL)  += vaapi_av1.o
 OBJS-$(CONFIG_AV1_VDPAU_HWACCEL)  += vdpau_av1.o
+OBJS-$(CONFIG_AV1_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_av1.o
 OBJS-$(CONFIG_AV1_VULKAN_HWACCEL) += vulkan_decode.o vulkan_av1.o
 OBJS-$(CONFIG_H263_VAAPI_HWACCEL) += vaapi_mpeg4.o
 OBJS-$(CONFIG_H263_VIDEOTOOLBOX_HWACCEL)  += videotoolbox.o
diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
index 80e52d1bea..bc4ef63e68 100644
--- a/libavcodec/av1dec.c
+++ b/libavcodec/av1dec.c
@@ -543,6 +543,7 @@ static int get_pixel_format(AVCodecContext *avctx)
  CONFIG_AV1_NVDEC_HWACCEL + \
  CONFIG_AV1_VAAPI_HWACCEL + \
  CONFIG_AV1_VDPAU_HWACCEL + \
+ CONFIG_AV1_VIDEOTOOLBOX_HWACCEL + \
  CONFIG_AV1_VULKAN_HWACCEL)
 enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmtp = pix_fmts;
 
@@ -570,6 +571,9 @@ static int get_pixel_format(AVCodecContext *avctx)
 #if CONFIG_AV1_VDPAU_HWACCEL
 *fmtp++ = AV_PIX_FMT_VDPAU;
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+*fmtp++ = AV_PIX_FMT_VIDEOTOOLBOX;
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 *fmtp++ = AV_PIX_FMT_VULKAN;
 #endif
@@ -594,6 +598,9 @@ static int get_pixel_format(AVCodecContext *avctx)
 #if CONFIG_AV1_VDPAU_HWACCEL
 *fmtp++ = AV_PIX_FMT_VDPAU;
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+*fmtp++ = AV_PIX_FMT_VIDEOTOOLBOX;
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 *fmtp++ = AV_PIX_FMT_VULKAN;
 #endif
@@ -1441,6 +1448,10 @@ static int av1_receive_frame_internal(AVCodecContext 
*avctx, AVFrame *frame)
 
 if (raw_tile_group && (s->tile_num == raw_tile_group->tg_end + 1)) {
 int show_frame = s->raw_frame_header->show_frame;
+// Set nb_unit to point at the next OBU, to indicate which
+  

Re: [FFmpeg-devel] [PATCH] configure: Silence Xcode warnings about duplicate libraries

2024-09-26 Thread Martin Storsjö

On Wed, 25 Sep 2024, Martin Storsjö wrote:


Since Xcode 15, macOS developer tools use a new linker. The new
linker by default warns for duplicate -l options. As this is a
known and expected thing, not to be considered an issue, ask for
the warning to be silenced.

This silences linker warnings like this:

   ld: warning: ignoring duplicate libraries: '-lc++', '-lcrypto', '-lm', 
'-logg', '-lpthread', '-lssl', '-lvorbis', '-lvpx', '-lz'

The linker can also warn about duplicate -rpath options, and there's
currently no option to silence those warnings.
---
configure | 1 +
1 file changed, 1 insertion(+)

diff --git a/configure b/configure
index d77a55b653..a450b3c8d8 100755
--- a/configure
+++ b/configure
@@ -6480,6 +6480,7 @@ check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
_mm_setzero_si128()"

check_ldflags -Wl,--as-needed
check_ldflags -Wl,-z,noexecstack
+check_ldflags -Wl,-no_warn_duplicate_libraries



OK'd by Marvin on irc, will push in a day or two.

// Martin


Re: [FFmpeg-devel] [PATCH v2 1/3] aarch64/vvc: Add w_avg

2024-09-26 Thread Martin Storsjö

On Thu, 26 Sep 2024, Zhao Zhili wrote:


  --- a/libavcodec/aarch64/vvc/dsp_init.c
  +++ b/libavcodec/aarch64/vvc/dsp_init.c
  @@ -52,6 +52,37 @@ void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t dst_stride,
     const int16_t *src0, const int16_t *src1, int width,
     int height);

  +void ff_vvc_w_avg_8_neon(uint8_t *_dst, const ptrdiff_t _dst_stride,
  + const int16_t *src0, const int16_t *src1,
  + const int width, const int height,
  + uintptr_t w0_w1, uintptr_t offset_shift);


Including "const" on scalar parameters is entirely redundant, and we
don't prescribe use of that elsewhere in ffmpeg, and just makes the
whole declaration more noisy.


I see that these “const” qualifiers make clang-tidy unhappy. They are here
to stay consistent with the prototypes in vvc/dsp.h.


Hmm, I don't quite understand this comment - so you say that clang-tidy, 
in addition to me, also complains about them? But they are added manually 
to keep the prototypes exactly in sync? Or does clang-tidy complain about 
differences if we differ on the constness here?



There are three options:

1. Keep “const” as current state
2. Drop “const” only for these new functions
3. Remove “const” from vvc/dsp.h and all implementations

I can’t decide which way to go.


I would go for 3, at least long term.

If you need to keep the const within the function prototypes here for now 
to please some tool (I think most compilers wouldn't complain about 
differences in const on scalar parameters, although I think old MSVC did 
that), that's ok, but I would remove it from the unnecessary places (the 
local variables in the function, the parameter/register mappings in 
assembly).


Then we can try to do 3 as a later step.
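
As a small illustration of why const on scalar parameters is redundant in a
prototype: in C, top-level qualifiers on parameters are not part of the
function type, so a declaration with them and a definition without them are
compatible. The function sum_rows below is a hypothetical example, not
something from vvc/dsp.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Prototype written with const on the scalar parameters ... */
int sum_rows(const int16_t *src, const ptrdiff_t stride,
             const int width, const int height);

/* ... and a definition without them. Both declare the same function type;
 * only the const on the pointed-to data (const int16_t *) matters to callers. */
int sum_rows(const int16_t *src, ptrdiff_t stride, int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sum += src[x];
        src += stride;   /* allowed here because src itself is not const */
    }
    return sum;
}

/* Tiny driver: sums a 2x2 block stored with a stride of 2 elements. */
static int sum_rows_demo(void)
{
    static const int16_t px[4] = { 1, 2, 3, 4 };
    return sum_rows(px, 2, 2, 2);
}
```

The qualifier only restricts what the function body may do with its own
copy of the argument, which is why it adds noise to a declaration without
telling the caller anything.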

// Martin


Re: [FFmpeg-devel] [PATCH v2 3/3] aarch64/vvc: Add dmvr

2024-09-26 Thread Martin Storsjö

On Mon, 23 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

dmvr_8_12x20_c:          2.2 ( 1.00x)
dmvr_8_12x20_neon:       0.5 ( 4.50x)
dmvr_8_20x12_c:          2.0 ( 1.00x)
dmvr_8_20x12_neon:       0.2 ( 8.00x)
dmvr_8_20x20_c:          3.2 ( 1.00x)
dmvr_8_20x20_neon:       0.5 ( 6.50x)
dmvr_12_12x20_c:         2.2 ( 1.00x)
dmvr_12_12x20_neon:      0.5 ( 4.50x)
dmvr_12_20x12_c:         2.2 ( 1.00x)
dmvr_12_20x12_neon:      0.5 ( 4.50x)
dmvr_12_20x20_c:         3.2 ( 1.00x)
dmvr_12_20x20_neon:      0.8 ( 4.33x)
---
libavcodec/aarch64/vvc/dsp_init.c |  4 ++
libavcodec/aarch64/vvc/inter.S| 94 ++-
2 files changed, 97 insertions(+), 1 deletion(-)


This looks ok to me.

// Martin



Re: [FFmpeg-devel] [PATCH v2 2/3] aarch64/vvc: Add dmvr_hv

2024-09-26 Thread Martin Storsjö

On Mon, 23 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

dmvr_hv_8_12x20_c:         8.0 ( 1.00x)
dmvr_hv_8_12x20_neon:      1.2 ( 6.62x)
dmvr_hv_8_20x12_c:         8.0 ( 1.00x)
dmvr_hv_8_20x12_neon:      0.9 ( 8.37x)
dmvr_hv_8_20x20_c:        12.9 ( 1.00x)
dmvr_hv_8_20x20_neon:      1.7 ( 7.62x)
dmvr_hv_10_12x20_c:        7.0 ( 1.00x)
dmvr_hv_10_12x20_neon:     1.7 ( 4.09x)
dmvr_hv_10_20x12_c:        7.0 ( 1.00x)
dmvr_hv_10_20x12_neon:     1.7 ( 4.09x)
dmvr_hv_10_20x20_c:       11.2 ( 1.00x)
dmvr_hv_10_20x20_neon:     2.7 ( 4.15x)
dmvr_hv_12_12x20_c:        6.5 ( 1.00x)
dmvr_hv_12_12x20_neon:     1.7 ( 3.79x)
dmvr_hv_12_20x12_c:        6.5 ( 1.00x)
dmvr_hv_12_20x12_neon:     1.7 ( 3.79x)
dmvr_hv_12_20x20_c:       10.2 ( 1.00x)
dmvr_hv_12_20x20_neon:     2.2 ( 4.64x)
---
libavcodec/aarch64/vvc/dsp_init.c |  12 ++
libavcodec/aarch64/vvc/inter.S| 307 ++
2 files changed, 319 insertions(+)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c 
b/libavcodec/aarch64/vvc/dsp_init.c
index b39ebb83fc..995e26d163 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -83,6 +83,15 @@ W_AVG_FUN(8)
W_AVG_FUN(10)
W_AVG_FUN(12)

+#define DMVR_FUN(fn, bd) \
+void ff_vvc_dmvr_ ## fn ## bd ## _neon(int16_t *dst, \
+const uint8_t *_src, const ptrdiff_t _src_stride, const int height, \
+const intptr_t mx, const intptr_t my, const int width);


Unnecessary const on scalar parameters


+
+DMVR_FUN(hv_, 8)
+DMVR_FUN(hv_, 10)
+DMVR_FUN(hv_, 12)
+
void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const int bd)
{
int cpu_flags = av_get_cpu_flags();
@@ -155,6 +164,7 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, const 
int bd)

c->inter.avg = ff_vvc_avg_8_neon;
c->inter.w_avg = vvc_w_avg_8;
+c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_8_neon;

for (int i = 0; i < FF_ARRAY_ELEMS(c->sao.band_filter); i++)
c->sao.band_filter[i] = ff_h26x_sao_band_filter_8x8_8_neon;
@@ -196,12 +206,14 @@ void ff_vvc_dsp_init_aarch64(VVCDSPContext *const c, 
const int bd)
} else if (bd == 10) {
c->inter.avg = ff_vvc_avg_10_neon;
c->inter.w_avg = vvc_w_avg_10;
+c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_10_neon;

c->alf.filter[LUMA] = alf_filter_luma_10_neon;
c->alf.filter[CHROMA] = alf_filter_chroma_10_neon;
} else if (bd == 12) {
c->inter.avg = ff_vvc_avg_12_neon;
c->inter.w_avg = vvc_w_avg_12;
+c->inter.dmvr[1][1] = ff_vvc_dmvr_hv_12_neon;

c->alf.filter[LUMA] = alf_filter_luma_12_neon;
c->alf.filter[CHROMA] = alf_filter_chroma_12_neon;
diff --git a/libavcodec/aarch64/vvc/inter.S b/libavcodec/aarch64/vvc/inter.S
index c4c6ab1a72..a0bb356f07 100644
--- a/libavcodec/aarch64/vvc/inter.S
+++ b/libavcodec/aarch64/vvc/inter.S
@@ -226,3 +226,310 @@ vvc_avg avg, 12
vvc_avg w_avg, 8
vvc_avg w_avg, 10
vvc_avg w_avg, 12
+
+/* x0: int16_t *dst
+ * x1: const uint8_t *_src
+ * x2: const ptrdiff_t _src_stride
+ * w3: const int height
+ * x4: const intptr_t mx
+ * x5: const intptr_t my
+ * w6: const int width


Unnecessary const


+ */
+function ff_vvc_dmvr_hv_8_neon, export=1
+        dst             .req x0
+        src             .req x1
+        src_stride      .req x2
+        height          .req w3
+        mx              .req x4
+        my              .req x5
+        width           .req w6
+        tmp0            .req x7
+        tmp1            .req x8
+
+        sub             sp, sp, #(VVC_MAX_PB_SIZE * 4)
+
+        movrel          x9, X(ff_vvc_inter_luma_dmvr_filters)
+        add             x12, x9, mx, lsl #1
+        ldrb            w10, [x12]
+        ldrb            w11, [x12, #1]
+        mov             tmp0, sp
+        add             tmp1, tmp0, #(VVC_MAX_PB_SIZE * 2)
+        // We know the value are positive
+        dup             v0.8h, w10                  // filter_x[0]
+        dup             v1.8h, w11                  // filter_x[1]


If we don't need these values in GPRs, we could also just do ld1r, 
although that requires incrementing the pointer (which probably can be 
done with a post-increment, [x12], #1) between the loads. Then again, I 
see you load 8 bits but you want them in 16 bit elements, so that would 
require a separate uxtl. So then I guess this use of GPRs for loading is 
reasonable.
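
To make the comparison concrete, here is a scalar C model of the two ways of
getting an 8 bit filter coefficient into every 16 bit lane of a vector. The
types vec8h and vec8b and the function names are hypothetical stand-ins for
NEON registers and instruction sequences; this is a sketch of the data flow
only, not NEON code.

```c
#include <stdint.h>

typedef struct { uint16_t lane[8]; } vec8h;  /* models a .8h register */
typedef struct { uint8_t  lane[8]; } vec8b;  /* models a .8b register */

/* Route used in the patch: byte load into a GPR, then broadcast. */
static vec8h load_via_gpr(const uint8_t *p)
{
    uint32_t w = *p;                 /* ldrb w10, [x12]  (zero-extends) */
    vec8h v;
    for (int i = 0; i < 8; i++)      /* dup  v0.8h, w10 */
        v.lane[i] = (uint16_t)w;
    return v;
}

/* Alternative: broadcast load of bytes, then a separate widening step. */
static vec8h load_via_ld1r(const uint8_t *p)
{
    vec8b b;
    vec8h v;
    for (int i = 0; i < 8; i++)      /* ld1r {v0.8b}, [x12] */
        b.lane[i] = *p;
    for (int i = 0; i < 8; i++)      /* uxtl v0.8h, v0.8b */
        v.lane[i] = b.lane[i];
    return v;
}

/* Returns 1 if both routes produce the same vector for the given byte. */
static int routes_agree(uint8_t value)
{
    vec8h a = load_via_gpr(&value);
    vec8h b = load_via_ld1r(&value);
    for (int i = 0; i < 8; i++)
        if (a.lane[i] != b.lane[i] || a.lane[i] != value)
            return 0;
    return 1;
}
```

Both routes need two steps per coefficient in this model, which matches the
conclusion above that loading through GPRs is a reasonable choice here.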


All in all, the patch seems fine, except for the unnecessary consts.

// Martin


Re: [FFmpeg-devel] [PATCH v2 1/3] aarch64/vvc: Add w_avg

2024-09-26 Thread Martin Storsjö

On Mon, 23 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

w_avg_8_2x2_c:             0.0 ( 0.00x)
w_avg_8_2x2_neon:          0.0 ( 0.00x)
w_avg_8_4x4_c:             0.2 ( 1.00x)
w_avg_8_4x4_neon:          0.0 ( 0.00x)
w_avg_8_8x8_c:             1.2 ( 1.00x)
w_avg_8_8x8_neon:          0.2 ( 5.00x)
w_avg_8_16x16_c:           4.2 ( 1.00x)
w_avg_8_16x16_neon:        0.8 ( 5.67x)
w_avg_8_32x32_c:          16.2 ( 1.00x)
w_avg_8_32x32_neon:        2.5 ( 6.50x)
w_avg_8_64x64_c:          64.5 ( 1.00x)
w_avg_8_64x64_neon:        9.0 ( 7.17x)
w_avg_8_128x128_c:       269.5 ( 1.00x)
w_avg_8_128x128_neon:     35.5 ( 7.59x)
w_avg_10_2x2_c:            0.2 ( 1.00x)
w_avg_10_2x2_neon:         0.2 ( 1.00x)
w_avg_10_4x4_c:            0.2 ( 1.00x)
w_avg_10_4x4_neon:         0.2 ( 1.00x)
w_avg_10_8x8_c:            1.0 ( 1.00x)
w_avg_10_8x8_neon:         0.2 ( 4.00x)
w_avg_10_16x16_c:          4.2 ( 1.00x)
w_avg_10_16x16_neon:       0.8 ( 5.67x)
w_avg_10_32x32_c:         16.2 ( 1.00x)
w_avg_10_32x32_neon:       2.5 ( 6.50x)
w_avg_10_64x64_c:         66.2 ( 1.00x)
w_avg_10_64x64_neon:      10.0 ( 6.62x)
w_avg_10_128x128_c:      277.8 ( 1.00x)
w_avg_10_128x128_neon:    39.8 ( 6.99x)
w_avg_12_2x2_c:            0.0 ( 0.00x)
w_avg_12_2x2_neon:         0.2 ( 0.00x)
w_avg_12_4x4_c:            0.2 ( 1.00x)
w_avg_12_4x4_neon:         0.0 ( 0.00x)
w_avg_12_8x8_c:            1.2 ( 1.00x)
w_avg_12_8x8_neon:         0.5 ( 2.50x)
w_avg_12_16x16_c:          4.8 ( 1.00x)
w_avg_12_16x16_neon:       0.8 ( 6.33x)
w_avg_12_32x32_c:         17.0 ( 1.00x)
w_avg_12_32x32_neon:       2.8 ( 6.18x)
w_avg_12_64x64_c:         64.0 ( 1.00x)
w_avg_12_64x64_neon:      10.0 ( 6.40x)
w_avg_12_128x128_c:      269.2 ( 1.00x)
w_avg_12_128x128_neon:    42.0 ( 6.41x)
---
libavcodec/aarch64/vvc/dsp_init.c | 34 +++
libavcodec/aarch64/vvc/inter.S| 99 +--
2 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/libavcodec/aarch64/vvc/dsp_init.c 
b/libavcodec/aarch64/vvc/dsp_init.c
index ad767d17e2..b39ebb83fc 100644
--- a/libavcodec/aarch64/vvc/dsp_init.c
+++ b/libavcodec/aarch64/vvc/dsp_init.c
@@ -52,6 +52,37 @@ void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t dst_stride,
const int16_t *src0, const int16_t *src1, int width,
int height);

+void ff_vvc_w_avg_8_neon(uint8_t *_dst, const ptrdiff_t _dst_stride,
+ const int16_t *src0, const int16_t *src1,
+ const int width, const int height,
+ uintptr_t w0_w1, uintptr_t offset_shift);


Including "const" on scalar parameters is entirely redundant, and we don't 
prescribe use of that elsewhere in ffmpeg, and just makes the whole 
declaration more noisy.



+void ff_vvc_w_avg_10_neon(uint8_t *_dst, const ptrdiff_t _dst_stride,
+ const int16_t *src0, const int16_t *src1,
+ const int width, const int height,
+ uintptr_t w0_w1, uintptr_t offset_shift);
+void ff_vvc_w_avg_12_neon(uint8_t *_dst, const ptrdiff_t _dst_stride,
+  const int16_t *src0, const int16_t *src1,
+  const int width, const int height,
+  uintptr_t w0_w1, uintptr_t offset_shift);
+/* When passing arguments to functions, Apple platforms diverge from the ARM64
+ * standard ABI, that we can't implement the function directly in asm.
+ */


It's fully possible to implement that in assembly, but it usually requires 
ugly ifdefs.


That said, I'm ok with this kind of wrapper, as it avoids the problem 
kinda neatly, but ifdefs in the assembly can also be needed at times.



+#define W_AVG_FUN(bit_depth) \
+static void vvc_w_

Re: [FFmpeg-devel] [PATCH 2/5] configure: Add detection of assembler support for SVE/SVE2

2024-09-26 Thread Martin Storsjö

On Tue, 17 Sep 2024, Martin Storsjö wrote:


It turns out that recent versions of MS armasm64 do support some
SVE instructions, but not all of them. Test for one of the
instructions that it currently doesn't support.

---

Just as a disclaimer, I'm not currently actively planning on writing
SVE/SVE2 optimizations. However, related projects such as x264 and
dav1d do have a few functions using these extensions, so we might just
as well add the framework support for these features in ffmpeg,
as functions needing this support will come sooner or later
anyway.

In the related projects, there's no real use of longer vectors
(as there's very little such HW available anyway), but SVE gives
widening loads (used in a couple places in x264) and 16 bit dot
products (used in dav1d), which can be useful with 128 bit vectors.
---
configure   | 14 +-
ffbuild/arch.mak|  2 ++
libavutil/aarch64/asm.S | 18 ++
3 files changed, 33 insertions(+), 1 deletion(-)


Planning on pushing this set later today.

// Martin


Re: [FFmpeg-devel] [FFmpeg-cvslog] swscale/aarch64: Fix rgb24toyv12 only works with aligned width

2024-09-26 Thread Martin Storsjö

On Wed, 25 Sep 2024, Martin Storsjö wrote:


On Wed, 25 Sep 2024, Zhao Zhili wrote:


On Sep 25, 2024, at 16:01, Martin Storsjö  wrote:

On Tue, 24 Sep 2024, Zhao Zhili wrote:

ffmpeg | branch: master | Zhao Zhili  | Wed Sep 18 
21:11:44 2024 +0800| [e18b46d95fadcbaaf450bda9f1871849f2b0c586] | 
committer: Zhao Zhili


swscale/aarch64: Fix rgb24toyv12 only works with aligned width

Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Co-authored-by: johzzy 
Signed-off-by: Zhao Zhili 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=e18b46d95fadcbaaf450bda9f1871849f2b0c586

---

libswscale/aarch64/rgb2rgb.c | 23 ++-
tests/checkasm/sw_rgb.c  |  2 +-
2 files changed, 23 insertions(+), 2 deletions(-)
--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -129,7 +129,7 @@ static int cmp_off_by_n(const uint8_t *ref, const 
uint8_t *test, size_t n, int a


static void check_rgb24toyv12(struct SwsContext *ctx)
{
-static const int input_sizes[] = {16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE};
+static const int input_sizes[] = {2, 16, 128, 540, MAX_LINE_SIZE, -MAX_LINE_SIZE};


   LOCAL_ALIGNED_32(uint8_t, src, [BUFSIZE * 3]);
   LOCAL_ALIGNED_32(uint8_t, buf_y_0, [BUFSIZE]);


These new test cases fail on x86_32; we have got a version of rgb24toyv12 
which is specific to "#if ARCH_X86_32 && HAVE_7REGS".


Can you have a look?


Sorry for the break. I’m on a short vacation without access to x86_32 test 
environment. And I’m not familiar with x86 asm. I’m afraid removing the new 
test is what I can do for now, if that’s an option.


Thanks - yeah I think that's the practically best thing to do at the moment. 
I guess this assembly has existed in this form for a very long time already, 
so while it probably is incorrect for these cases, it doesn't seem to be an 
urgent thing. (But I guess whatever case that was noted on aarch64 also would 
be noted on x86_32?) So silencing the test for now probably is simplest, 
until the assembly can be fixed.


I pushed a commit to remove these testcases from checkasm, for now. 
Thanks!


// Martin


[FFmpeg-devel] [PATCH v2] avcodec/videotoolbox: add AV1 hardware acceleration

2024-09-25 Thread Martin Storsjö
From: Jan Ekström 

Use AV1DecContext's current_obu to access the original OBUs, and
feed them to videotoolbox, rather than the bare slice data passed
via decode_slice.

This requires a small addition to AV1DecContext, for keeping track
of the current range of OBUs that belong to the current frame.

Co-authored-by: Ruslan Chernenko 
Co-authored-by: Martin Storsjö 
---
v2: Use current_obu for accessing the original OBUs, rather than
trying to reassemble things from the parts passed via the
start_frame/decode_slice callbacks.

If a packet consists of one frame, we can pass the whole frame
as is, but if it consists of multiple frames, we need to pass
a limited range of OBUs corresponding to each frame at a time.

By passing all OBUs, we end up passing all OBUs to videotoolbox,
including metadata ones that we otherwise handle ourselves.
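
A rough sketch of the bookkeeping described above, where a packet with
several frames is split into one range of OBUs per frame. The names
start_unit and nb_unit follow the patch notes, with nb_unit pointing at the
OBU after the range; everything else (UnitRange, split_into_frames, and the
rule that a frame ends at an OBU_FRAME or OBU_TILE_GROUP) is a simplifying
assumption for illustration and not the logic in av1dec.c.

```c
/* OBU type values as defined by the AV1 bitstream specification. */
enum {
    OBU_SEQUENCE_HEADER    = 1,
    OBU_TEMPORAL_DELIMITER = 2,
    OBU_FRAME_HEADER       = 3,
    OBU_TILE_GROUP         = 4,
    OBU_METADATA           = 5,
    OBU_FRAME              = 6,
};

/* Hypothetical half-open range [start_unit, nb_unit) of OBUs for one frame. */
typedef struct UnitRange {
    int start_unit;
    int nb_unit;
} UnitRange;

/* Split a packet's OBU list into per-frame ranges.
 * Assumption: a frame is complete after an OBU_FRAME or OBU_TILE_GROUP.
 * Returns the number of ranges written, or -1 if out[] is too small. */
static int split_into_frames(const int *types, int nb_types,
                             UnitRange *out, int max_out)
{
    int start = 0, count = 0;

    for (int i = 0; i < nb_types; i++) {
        if (types[i] == OBU_FRAME || types[i] == OBU_TILE_GROUP) {
            if (count == max_out)
                return -1;
            out[count].start_unit = start;
            out[count].nb_unit    = i + 1;  /* points at the next OBU */
            count++;
            start = i + 1;                  /* next frame starts after this one */
        }
    }
    return count;
}

/* Demo packet: two frames, the second preceded by a metadata OBU. */
static int demo_split(UnitRange *out, int max_out)
{
    static const int types[] = {
        OBU_TEMPORAL_DELIMITER, OBU_SEQUENCE_HEADER, OBU_FRAME,
        OBU_METADATA, OBU_FRAME,
    };
    return split_into_frames(types, 5, out, max_out);
}

/* Accessors for the demo result, for simple checks. */
static int demo_start(int idx) { UnitRange r[4]; demo_split(r, 4); return r[idx].start_unit; }
static int demo_end(int idx)   { UnitRange r[4]; demo_split(r, 4); return r[idx].nb_unit; }
```

Note how the metadata OBU ends up inside the second range, which mirrors the
remark above that all OBUs in the range get passed along, including ones the
decoder otherwise handles itself.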
---
 configure |   4 ++
 libavcodec/Makefile   |   1 +
 libavcodec/av1dec.c   |  18 +-
 libavcodec/av1dec.h   |   1 +
 libavcodec/hwaccels.h |   1 +
 libavcodec/videotoolbox.c |  34 +++
 libavcodec/videotoolbox_av1.c | 104 ++
 libavcodec/vt_internal.h  |   4 ++
 8 files changed, 164 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/videotoolbox_av1.c

diff --git a/configure b/configure
index d77a55b653..ee4a66a68a 100755
--- a/configure
+++ b/configure
@@ -2461,6 +2461,7 @@ TYPES_LIST="
 kCMVideoCodecType_HEVC
 kCMVideoCodecType_HEVCWithAlpha
 kCMVideoCodecType_VP9
+kCMVideoCodecType_AV1
 kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
 kCVPixelFormatType_422YpCbCr8BiPlanarVideoRange
 kCVPixelFormatType_422YpCbCr10BiPlanarVideoRange
@@ -3166,6 +3167,8 @@ av1_vaapi_hwaccel_deps="vaapi 
VADecPictureParameterBufferAV1_bit_depth_idx"
 av1_vaapi_hwaccel_select="av1_decoder"
 av1_vdpau_hwaccel_deps="vdpau VdpPictureInfoAV1"
 av1_vdpau_hwaccel_select="av1_decoder"
+av1_videotoolbox_hwaccel_deps="videotoolbox"
+av1_videotoolbox_hwaccel_select="av1_decoder"
 av1_vulkan_hwaccel_deps="vulkan"
 av1_vulkan_hwaccel_select="av1_decoder"
 h263_vaapi_hwaccel_deps="vaapi"
@@ -6697,6 +6700,7 @@ enabled videotoolbox && {
 check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_HEVC 
"-framework CoreMedia"
 check_func_headers CoreMedia/CMFormatDescription.h 
kCMVideoCodecType_HEVCWithAlpha "-framework CoreMedia"
 check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_VP9 
"-framework CoreMedia"
+check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_AV1 
"-framework CoreMedia"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange "-framework CoreVideo"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_422YpCbCr8BiPlanarVideoRange "-framework CoreVideo"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_422YpCbCr10BiPlanarVideoRange "-framework CoreVideo"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index a4fcce3b42..21188b2479 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1008,6 +1008,7 @@ OBJS-$(CONFIG_AV1_D3D12VA_HWACCEL)+= dxva2_av1.o 
d3d12va_av1.o
 OBJS-$(CONFIG_AV1_NVDEC_HWACCEL)  += nvdec_av1.o
 OBJS-$(CONFIG_AV1_VAAPI_HWACCEL)  += vaapi_av1.o
 OBJS-$(CONFIG_AV1_VDPAU_HWACCEL)  += vdpau_av1.o
+OBJS-$(CONFIG_AV1_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_av1.o
 OBJS-$(CONFIG_AV1_VULKAN_HWACCEL) += vulkan_decode.o vulkan_av1.o
 OBJS-$(CONFIG_H263_VAAPI_HWACCEL) += vaapi_mpeg4.o
 OBJS-$(CONFIG_H263_VIDEOTOOLBOX_HWACCEL)  += videotoolbox.o
diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
index 80e52d1bea..80485fb9c9 100644
--- a/libavcodec/av1dec.c
+++ b/libavcodec/av1dec.c
@@ -543,6 +543,7 @@ static int get_pixel_format(AVCodecContext *avctx)
  CONFIG_AV1_NVDEC_HWACCEL + \
  CONFIG_AV1_VAAPI_HWACCEL + \
  CONFIG_AV1_VDPAU_HWACCEL + \
+ CONFIG_AV1_VIDEOTOOLBOX_HWACCEL + \
  CONFIG_AV1_VULKAN_HWACCEL)
 enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmtp = pix_fmts;
 
@@ -570,6 +571,9 @@ static int get_pixel_format(AVCodecContext *avctx)
 #if CONFIG_AV1_VDPAU_HWACCEL
 *fmtp++ = AV_PIX_FMT_VDPAU;
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+*fmtp++ = AV_PIX_FMT_VIDEOTOOLBOX;
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 *fmtp++ = AV_PIX_FMT_VULKAN;
 #endif
@@ -594,6 +598,9 @@ static int get_pixel_format(AVCodecContext *avctx)
 #if CONFIG_AV1_VDPAU_HWACCEL
 *fmtp++ = AV_PIX_FMT_VDPAU;
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+*fmtp++ = AV_PIX_FMT_VIDEOTOOLBOX;
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 *f

Re: [FFmpeg-devel] [FFmpeg-cvslog] swscale/aarch64: Fix rgb24toyv12 only works with aligned width

2024-09-25 Thread Martin Storsjö

On Wed, 25 Sep 2024, Zhao Zhili wrote:


On Sep 25, 2024, at 16:01, Martin Storsjö  wrote:

On Tue, 24 Sep 2024, Zhao Zhili wrote:


ffmpeg | branch: master | Zhao Zhili  | Wed Sep 18 
21:11:44 2024 +0800| [e18b46d95fadcbaaf450bda9f1871849f2b0c586] | committer: Zhao 
Zhili

swscale/aarch64: Fix rgb24toyv12 only works with aligned width

Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Co-authored-by: johzzy 
Signed-off-by: Zhao Zhili 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=e18b46d95fadcbaaf450bda9f1871849f2b0c586

---

libswscale/aarch64/rgb2rgb.c | 23 ++-
tests/checkasm/sw_rgb.c  |  2 +-
2 files changed, 23 insertions(+), 2 deletions(-)
--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -129,7 +129,7 @@ static int cmp_off_by_n(const uint8_t *ref, const uint8_t 
*test, size_t n, int a

static void check_rgb24toyv12(struct SwsContext *ctx)
{
-static const int input_sizes[] = {16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE};
+static const int input_sizes[] = {2, 16, 128, 540, MAX_LINE_SIZE, -MAX_LINE_SIZE};

   LOCAL_ALIGNED_32(uint8_t, src, [BUFSIZE * 3]);
   LOCAL_ALIGNED_32(uint8_t, buf_y_0, [BUFSIZE]);


These new test cases fail on x86_32; we have got a version of rgb24toyv12 which is specific to 
"#if ARCH_X86_32 && HAVE_7REGS".

Can you have a look?


Sorry for the break. I’m on a short vacation without access to x86_32 
test environment. And I’m not familiar with x86 asm. I’m afraid removing 
the new test is what I can do for now, if that’s an option.


Thanks - yeah, I think that's the most practical thing to do at the
moment. I guess this assembly has existed in this form for a very long
time already, so while it is probably incorrect for these cases, it
doesn't seem to be urgent. (But I guess whatever case was noticed on
aarch64 would also be noticed on x86_32?) So silencing the test for now
is probably simplest, until the assembly can be fixed.


Or we could ifdef out these uneven cases for ARCH_X86_32, but that's also 
kinda ugly...
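
For illustration only, the ifdef variant mentioned above could look roughly like the sketch below. The fallback definition of ARCH_X86_32 and the value of MAX_LINE_SIZE are placeholders assumed here so the fragment compiles on its own; in the tree they would come from config.h and sw_rgb.c.

```c
/* Assumed placeholders so that the fragment is self-contained. */
#ifndef ARCH_X86_32
#define ARCH_X86_32 0
#endif
#define MAX_LINE_SIZE 1920

static const int input_sizes[] = {
#if !ARCH_X86_32
    2, 540,               /* widths that are not multiples of 16 */
#endif
    16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE
};
```

This keeps the unaligned widths tested everywhere except on x86_32, at the cost of an arch-specific conditional in the test itself.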


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH] configure: Silence Xcode warnings about duplicate libraries

2024-09-25 Thread Martin Storsjö
Since Xcode 15, macOS developer tools use a new linker. The new
linker by default warns for duplicate -l options. As this is a
known and expected thing, not to be considered an issue, ask for
the warning to be silenced.

This silences linker warnings like this:

ld: warning: ignoring duplicate libraries: '-lc++', '-lcrypto', '-lm', 
'-logg', '-lpthread', '-lssl', '-lvorbis', '-lvpx', '-lz'

The linker can also warn about duplicate -rpath options, and there's
currently no option to silence those warnings.
---
 configure | 1 +
 1 file changed, 1 insertion(+)

diff --git a/configure b/configure
index d77a55b653..a450b3c8d8 100755
--- a/configure
+++ b/configure
@@ -6480,6 +6480,7 @@ check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
_mm_setzero_si128()"
 
 check_ldflags -Wl,--as-needed
 check_ldflags -Wl,-z,noexecstack
+check_ldflags -Wl,-no_warn_duplicate_libraries
 
 if ! disabled network; then
 check_func getaddrinfo $network_extralibs
-- 
2.39.5 (Apple Git-154)



Re: [FFmpeg-devel] [PATCH] avcodec/videotoolbox: add AV1 hardware acceleration

2024-09-25 Thread Martin Storsjö

On Wed, 25 Sep 2024, Martin Storsjö wrote:


On Tue, 24 Sep 2024, Cameron Gutman wrote:


On Tue, Sep 24, 2024 at 7:16 AM Martin Storsjö  wrote:


I don't hit any issues with any AV1 samples that I have, I guess I don't
have any samples with tile groups?

Can you or someone else grab and share a small sample of a stream that
fails to decode with this hwaccel, so we have a chance to debug it?



Sure, here's a raw AV1 bitstream sample that should exhibit the issue:
https://drive.google.com/file/d/1rp_O6pedhBYhDWFRuBCGTG1tfpNvtgrR/view


Thanks! I can indeed reproduce the issue with this sample.


FWIW, this whole bit feels like a bit of a mess; videotoolbox is 
inherently not an API at the same level as the other hwaccels that support 
AV1 - videotoolbox is just a full-packet decoder; if I hack it to bypass 
the whole start_frame/decode_slice infrastructure and just pass the whole 
input AVPacket into videotoolbox, it all just works, and we would have had 
this working ages ago already.


It's just that the hwaccel hooks get data fed via these 
decode_params/start_frame/decode_slice callbacks, and we'd need to 
essentially reassemble the complete input packet from that.
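
As a rough sketch of what "reassembling" means here: each callback would append its chunk to one growing buffer, and the result is handed to the decoder at the end of the frame. The type and function names below are hypothetical and are not the actual videotoolbox helpers; any framing bytes that might be needed between chunks are deliberately left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, for illustration only. */
typedef struct SketchBuffer {
    uint8_t *data;
    size_t   size;
} SketchBuffer;

/* Append one chunk; returns 0 on success, -1 on allocation failure. */
static int sketch_buffer_append(SketchBuffer *b, const uint8_t *buf, size_t size)
{
    uint8_t *tmp;

    if (!size)
        return 0;
    tmp = realloc(b->data, b->size + size);
    if (!tmp)
        return -1;
    memcpy(tmp + b->size, buf, size);
    b->data  = tmp;
    b->size += size;
    return 0;
}
```

The caller owns `data` and would free it (and reset `size`) once the frame has been submitted.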


// Martin


Re: [FFmpeg-devel] [FFmpeg-cvslog] swscale/aarch64: Fix rgb24toyv12 only works with aligned width

2024-09-25 Thread Martin Storsjö

On Tue, 24 Sep 2024, Zhao Zhili wrote:


ffmpeg | branch: master | Zhao Zhili  | Wed Sep 18 
21:11:44 2024 +0800| [e18b46d95fadcbaaf450bda9f1871849f2b0c586] | committer: Zhao 
Zhili

swscale/aarch64: Fix rgb24toyv12 only works with aligned width

Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Co-authored-by: johzzy 
Signed-off-by: Zhao Zhili 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=e18b46d95fadcbaaf450bda9f1871849f2b0c586

---

libswscale/aarch64/rgb2rgb.c | 23 ++-
tests/checkasm/sw_rgb.c  |  2 +-
2 files changed, 23 insertions(+), 2 deletions(-)

--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -129,7 +129,7 @@ static int cmp_off_by_n(const uint8_t *ref, const uint8_t 
*test, size_t n, int a

static void check_rgb24toyv12(struct SwsContext *ctx)
{
-static const int input_sizes[] = {16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE};
+static const int input_sizes[] = {2, 16, 128, 540, MAX_LINE_SIZE, -MAX_LINE_SIZE};

LOCAL_ALIGNED_32(uint8_t, src, [BUFSIZE * 3]);
LOCAL_ALIGNED_32(uint8_t, buf_y_0, [BUFSIZE]);


These new test cases fail on x86_32; we have got a version of 
rgb24toyv12 which is specific to "#if ARCH_X86_32 && HAVE_7REGS".


Can you have a look?

// Martin



Re: [FFmpeg-devel] [PATCH] avcodec/videotoolbox: add AV1 hardware acceleration

2024-09-25 Thread Martin Storsjö

On Tue, 24 Sep 2024, Cameron Gutman wrote:


On Tue, Sep 24, 2024 at 7:16 AM Martin Storsjö  wrote:


I don't hit any issues with any AV1 samples that I have, I guess I don't
have any samples with tile groups?

Can you or someone else grab and share a small sample of a stream that
fails to decode with this hwaccel, so we have a chance to debug it?



Sure, here's a raw AV1 bitstream sample that should exhibit the issue:
https://drive.google.com/file/d/1rp_O6pedhBYhDWFRuBCGTG1tfpNvtgrR/view


Thanks! I can indeed reproduce the issue with this sample.

// Martin


Re: [FFmpeg-devel] [PATCH] tests/fate/hevc: use bitexact scaling flags for fate-hevc-mv-switch

2024-09-24 Thread Martin Storsjö

On Tue, 24 Sep 2024, Anton Khirnov wrote:


Makes the results consistent across platforms.
---
tests/fate/hevc.mak   |   2 +-
tests/ref/fate/hevc-mv-switch | 296 +-
2 files changed, 149 insertions(+), 149 deletions(-)

diff --git a/tests/fate/hevc.mak b/tests/fate/hevc.mak
index 6d8865ea66..7f7ec43902 100644
--- a/tests/fate/hevc.mak
+++ b/tests/fate/hevc.mak
@@ -283,7 +283,7 @@ $(TARGET_SAMPLES)/hevc-conformance/LS_A_Orange_2.bit|$\
$(TARGET_SAMPLES)/hevc/mv_nuh_layer_id.bit|$\
$(TARGET_SAMPLES)/hevc-conformance/NoOutPrior_B_Qualcomm_1.bit|$\
$(TARGET_SAMPLES)/hevc-conformance/MVHEVCS_A.bit
-fate-hevc-mv-switch: CMD = framecrc -i "concat:$(INPUT)" -fps_mode passthrough 
-map 0:vidx:0 -map 0:vidx:1
+fate-hevc-mv-switch: CMD = framecrc -i "concat:$(INPUT)" -fps_mode passthrough 
-map 0:vidx:0 -map 0:vidx:1 -sws_flags +accurate_rnd+bitexact
FATE_HEVC-$(call FRAMECRC, HEVC, HEVC, CONCAT_PROTOCOL) += fate-hevc-mv-switch


LGTM, this seems to fix things on aarch64.

// Martin



Re: [FFmpeg-devel] [PATCH] avcodec/videotoolbox: add AV1 hardware acceleration

2024-09-24 Thread Martin Storsjö

On Mon, 23 Sep 2024, Cameron Gutman wrote:


On Mon, Sep 23, 2024 at 12:43 PM Zhao Zhili  wrote:




> On Sep 24, 2024, at 01:24, Cameron Gutman  wrote:
>
> On Mon, Sep 23, 2024 at 6:07 AM Zhao Zhili  wrote:
>>
>>
>>
>>> On Sep 21, 2024, at 05:39, Martin Storsjö  wrote:
>>>
>>> From: Jan Ekström 
>>>
>>> Co-authored-by: Ruslan Chernenko 
>>> Co-authored-by: Martin Storsjö 
>>> ---
>>> This is a touched up version of Jan and Ruslan's patches for
>>> AV1 hwaccel via videotoolbox; I tried to polish the code a little
>>> by not overwriting avctx->extradata in
>>> ff_videotoolbox_av1c_extradata_create, and by factorizing out a
>>> new function ff_videotoolbox_buffer_append.
>>
>> LGTM, although I don’t have a device with AV1 support.
>
> I've asked for some testing from users with M3 MacBooks and it
> appears to have problems with certain resolutions (notably 4K).
>
> https://github.com/moonlight-stream/moonlight-qt/issues/1125
>
> It's possible this is a Moonlight bug, but that seems unlikely
> because VideoToolbox HEVC decoding works fine at 4K and
> VideoToolbox AV1 works at 1080p and other resolutions.

I can’t tell what’s going wrong from that bug report. Please test
with ffmpeg and/or ffplay cmdline and share the results.



I'm debugging this blind since I don't have hardware either, but I think
we're mishandling Tile Group OBUs in this patch.

Comparing working vs non-working logs, it looks like the encoder is using
2x1 tiling when encoding 4K and 1x1 for smaller unaffected resolutions.

Working:
[av1 @ 0x14f7b14c0] Frame 0:  size 1280x720  upscaled 1280  render
1280x720  subsample 2x2  bitdepth 10  tiles 1x1.
[av1 @ 0x14f7b14c0] Total OBUs on this packet: 4.
[av1 @ 0x14f7b14c0] OBU idx:0, type:2, content available:1.
[av1 @ 0x14f7b14c0] OBU idx:1, type:1, content available:1.
[av1 @ 0x14f7b14c0] OBU idx:2, type:6, content available:1.
[av1 @ 0x14f7b14c0] Format videotoolbox_vld chosen by get_format().
[av1 @ 0x14f7b14c0] Format videotoolbox_vld requires hwaccel
av1_videotoolbox initialisation.
[av1 @ 0x14f7b14c0] AV1 decode get format: videotoolbox_vld.

Broken:
[av1 @ 0x15128b530] Frame 0:  size 3840x2160  upscaled 3840  render
3840x2160  subsample 2x2  bitdepth 10  tiles 2x1.
[av1 @ 0x15128b530] Total OBUs on this packet: 4.
[av1 @ 0x15128b530] OBU idx:0, type:2, content available:1.
[av1 @ 0x15128b530] OBU idx:1, type:1, content available:1.
[av1 @ 0x15128b530] OBU idx:2, type:3, content available:1.
[av1 @ 0x15128b530] Format videotoolbox_vld chosen by get_format().
[av1 @ 0x15128b530] Format videotoolbox_vld requires hwaccel
av1_videotoolbox initialisation.
[av1 @ 0x15128b530] AV1 decode get format: videotoolbox_vld.
[av1 @ 0x15128b530] OBU idx:3, type:4, content available:1.
[av1 @ 0x15128b530] vt decoder cb: output image buffer is null: -17694
[av1 @ 0x15128b530] HW accel end frame fail.

In the broken case, instead of a Frame OBU, we get a Frame Header OBU and
a Tile Group OBU. To handle Tile Group OBUs, av1dec.c calls decode_slice()
function, but videotoolbox_av1_decode_slice() in this patch simply returns
without appending the OBU data to bitstream buffer.

It looks like other AV1 hwaccels ignore the data buffer provided in the
start_frame() callback and instead append to their bitstream buffers in
decode_slice() instead. Maybe that's what we should do here too?
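
To make the difference between the two logs concrete, here is a minimal sketch that classifies a packet by its OBU types. The numeric values follow the AV1 bitstream specification (and match the "type:" fields in the logs); the enum and function names are made up for this example and are not FFmpeg identifiers.

```c
/* OBU type values as defined by the AV1 specification; names are illustrative. */
enum {
    SKETCH_OBU_SEQUENCE_HEADER    = 1,
    SKETCH_OBU_TEMPORAL_DELIMITER = 2,
    SKETCH_OBU_FRAME_HEADER       = 3,
    SKETCH_OBU_TILE_GROUP         = 4,
    SKETCH_OBU_FRAME              = 6,
};

/* Returns 1 if the packet carries tile data in separate Tile Group OBUs
 * (the failing case above), 0 if there are none, e.g. when everything
 * is contained in a single Frame OBU. */
static int has_separate_tile_groups(const int *obu_types, int nb_obus)
{
    for (int i = 0; i < nb_obus; i++)
        if (obu_types[i] == SKETCH_OBU_TILE_GROUP)
            return 1;
    return 0;
}
```

With the logged sequences, {2, 1, 6} has no separate tile groups, while {2, 1, 3, 4} does.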


That sounds plausible.

I tried a modification like this on top:

diff --git a/libavcodec/videotoolbox_av1.c b/libavcodec/videotoolbox_av1.c
index 9ccf65bb25..5fb1d06ddb 100644
--- a/libavcodec/videotoolbox_av1.c
+++ b/libavcodec/videotoolbox_av1.c
@@ -74,15 +74,15 @@ static int videotoolbox_av1_start_frame(AVCodecContext 
*avctx,
 const uint8_t *buffer,
 uint32_t size)
 {
-VTContext *vtctx = avctx->internal->hwaccel_priv_data;
-return ff_videotoolbox_buffer_append(vtctx, buffer, size);
+return 0;
 }

 static int videotoolbox_av1_decode_slice(AVCodecContext *avctx,
  const uint8_t *buffer,
  uint32_t size)
 {
-return 0;
+VTContext *vtctx = avctx->internal->hwaccel_priv_data;
+return ff_videotoolbox_buffer_append(vtctx, buffer, size);
 }

 static int videotoolbox_av1_end_frame(AVCodecContext *avctx)


Unfortunately this isn't quite enough to make it work - we probably need 
to add some extra framing around the data we get in decode_slice. (The 
data we get in decode_slice is around 20 bytes shorter than what we get in 
start_frame.)


I don't hit any issues with any AV1 samples that I have, I guess I don't 
have any samples with tile groups?


Can you or someone else grab and share a small sample of a stre

[FFmpeg-devel] [PATCH v2] compat: Fix the fallback definition of stdc_trailing_zeros

2024-09-24 Thread Martin Storsjö
While shifting "value" to left, we would iterate through all bits
of an unsigned long long, while we only expect to count through
"size * CHAR_BIT" bits; instead shift bits to the right and just
count the trailing zeros.

This fixes fate with MSVC.
---
Fixed the UB by shifting to the right instead of to the left.
---
 compat/stdbit/stdbit.h | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/compat/stdbit/stdbit.h b/compat/stdbit/stdbit.h
index b434fc2357..53419cf9f9 100644
--- a/compat/stdbit/stdbit.h
+++ b/compat/stdbit/stdbit.h
@@ -178,11 +178,14 @@ static inline unsigned int 
stdc_trailing_zeros_uc(unsigned char value)
 static inline unsigned int __stdc_trailing_zeros(unsigned long long value,
  unsigned int size)
 {
-unsigned int zeros = size * CHAR_BIT;
+unsigned int zeros = 0;
 
-while (value != 0) {
-value <<= 1;
-zeros--;
+if (!value)
+return size * CHAR_BIT;
+
+while ((value & 1) == 0) {
+value >>= 1;
+zeros++;
 }
 
 return zeros;
-- 
2.39.5 (Apple Git-154)
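
For anyone who wants to try the right-shift variant outside the tree, a standalone sketch follows. The logic is the same as in the patch; only the function name is changed (to a hypothetical one) to avoid clashing with a real stdbit.h.

```c
#include <limits.h>

/* Standalone copy of the fallback logic, renamed for illustration. */
static unsigned int trailing_zeros_sketch(unsigned long long value,
                                          unsigned int size)
{
    unsigned int zeros = 0;

    /* No bit is set: by convention the result is the full width of the type. */
    if (!value)
        return size * CHAR_BIT;

    /* Shift right until the lowest set bit reaches position 0. */
    while ((value & 1) == 0) {
        value >>= 1;
        zeros++;
    }

    return zeros;
}
```

Since the loop only runs for a nonzero value, it always terminates, and the shift count never comes near the width of unsigned long long.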



[FFmpeg-devel] [PATCH] compat: Fix the fallback definition of stdc_trailing_zeros

2024-09-24 Thread Martin Storsjö
While shifting "value" to left, we would iterate through all bits
of an unsigned long long, while we only expect to count through
"size * CHAR_BIT" bits.

This fixes fate with MSVC.
---
 compat/stdbit/stdbit.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/compat/stdbit/stdbit.h b/compat/stdbit/stdbit.h
index b434fc2357..3197a24938 100644
--- a/compat/stdbit/stdbit.h
+++ b/compat/stdbit/stdbit.h
@@ -179,9 +179,10 @@ static inline unsigned int __stdc_trailing_zeros(unsigned 
long long value,
  unsigned int size)
 {
 unsigned int zeros = size * CHAR_BIT;
+unsigned long long mask = (1ULL << (size * CHAR_BIT)) - 1;
 
 while (value != 0) {
-value <<= 1;
+value = (value << 1) & mask;
 zeros--;
 }
 
-- 
2.39.5 (Apple Git-154)
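
One caveat with the mask in this version: when size equals sizeof(unsigned long long), the expression 1ULL << (size * CHAR_BIT) shifts by the full width of the type, which C leaves undefined. This is presumably the UB that the v2 note refers to. A minimal sketch of a width-safe mask, using a hypothetical helper name:

```c
#include <limits.h>

/* Mask covering the low "size" bytes, without ever shifting by the
 * full width of unsigned long long. */
static unsigned long long low_bits_mask(unsigned int size)
{
    unsigned int bits = size * CHAR_BIT;

    if (bits >= sizeof(unsigned long long) * CHAR_BIT)
        return ~0ULL;
    return (1ULL << bits) - 1;
}
```

v2 sidesteps the question entirely by shifting right, so no mask is needed there.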



Re: [FFmpeg-devel] [PATCH v3] libavutil/ppc: Make use of getauxval() and elf_aux_info() on ppc

2024-09-20 Thread Martin Storsjö

On Fri, 20 Sep 2024, Brad Smith wrote:


ping.

On 2024-09-15 1:40 a.m., Brad Smith wrote:

libavutil/ppc: Make use of getauxval() and elf_aux_info() on ppc

Modern Linux has getauxval() and FreeBSD/OpenBSD ppc have 
elf_aux_info().


Signed-off-by: Brad Smith 
---
v2: adjust to build with older glibc.
v3: freebsd/ppc requires machine/cpu.h header for feature flags.

  libavutil/ppc/cpu.c | 25 +
  1 file changed, 25 insertions(+)

diff --git a/libavutil/ppc/cpu.c b/libavutil/ppc/cpu.c
index 2b13cda662..9381272175 100644
--- a/libavutil/ppc/cpu.c
+++ b/libavutil/ppc/cpu.c
@@ -20,6 +20,11 @@
#ifdef __APPLE__
  #include 
+#elif HAVE_GETAUXVAL || HAVE_ELF_AUX_INFO
+#ifdef __FreeBSD__
+#include <machine/cpu.h>
+#endif
+#include <sys/auxv.h>
  #elif defined(__linux__)
  #include 
  #include 
@@ -56,6 +61,26 @@ int ff_get_cpu_flags_ppc(void)
  if (result == VECTORTYPE_ALTIVEC)
  return AV_CPU_FLAG_ALTIVEC;
  return 0;
+#elif HAVE_GETAUXVAL || HAVE_ELF_AUX_INFO
+int flags = 0;
+
+unsigned long hwcap = ff_getauxval(AT_HWCAP);
+#ifdef PPC_FEATURE2_ARCH_2_07
+unsigned long hwcap2 = ff_getauxval(AT_HWCAP2);
+#endif
+
+if (hwcap & PPC_FEATURE_HAS_ALTIVEC)
+   flags |= AV_CPU_FLAG_ALTIVEC;
+#ifdef PPC_FEATURE_HAS_VSX
+if (hwcap & PPC_FEATURE_HAS_VSX)
+   flags |= AV_CPU_FLAG_VSX;
+#endif
+#ifdef PPC_FEATURE2_ARCH_2_07
+if (hwcap2 & PPC_FEATURE2_ARCH_2_07)
+   flags |= AV_CPU_FLAG_POWER8;
+#endif
+
+return flags;
  #elif defined(__APPLE__) || defined(__NetBSD__) || 
defined(__OpenBSD__)

  #if defined(__NetBSD__) || defined(__OpenBSD__)
  int sels[2] = {CTL_MACHDEP, CPU_ALTIVEC};


I don't know much specifically about ppc, but the patch seems reasonable 
to me.


// Martin



[FFmpeg-devel] [PATCH] avcodec/videotoolbox: add AV1 hardware acceleration

2024-09-20 Thread Martin Storsjö
From: Jan Ekström 

Co-authored-by: Ruslan Chernenko 
Co-authored-by: Martin Storsjö 
---
This is a touched up version of Jan and Ruslan's patches for
AV1 hwaccel via videotoolbox; I tried to polish the code a little
by not overwriting avctx->extradata in
ff_videotoolbox_av1c_extradata_create, and by factorizing out a
new function ff_videotoolbox_buffer_append.
---
 configure |   4 ++
 libavcodec/Makefile   |   1 +
 libavcodec/av1dec.c   |  10 +++
 libavcodec/hwaccels.h |   1 +
 libavcodec/videotoolbox.c |  34 +++
 libavcodec/videotoolbox_av1.c | 112 ++
 libavcodec/vt_internal.h  |   4 ++
 7 files changed, 166 insertions(+)
 create mode 100644 libavcodec/videotoolbox_av1.c

diff --git a/configure b/configure
index 8fbf3772a8..bc244d7ca2 100755
--- a/configure
+++ b/configure
@@ -2464,6 +2464,7 @@ TYPES_LIST="
 kCMVideoCodecType_HEVC
 kCMVideoCodecType_HEVCWithAlpha
 kCMVideoCodecType_VP9
+kCMVideoCodecType_AV1
 kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange
 kCVPixelFormatType_422YpCbCr8BiPlanarVideoRange
 kCVPixelFormatType_422YpCbCr10BiPlanarVideoRange
@@ -3171,6 +3172,8 @@ av1_vaapi_hwaccel_deps="vaapi 
VADecPictureParameterBufferAV1_bit_depth_idx"
 av1_vaapi_hwaccel_select="av1_decoder"
 av1_vdpau_hwaccel_deps="vdpau VdpPictureInfoAV1"
 av1_vdpau_hwaccel_select="av1_decoder"
+av1_videotoolbox_hwaccel_deps="videotoolbox"
+av1_videotoolbox_hwaccel_select="av1_decoder"
 av1_vulkan_hwaccel_deps="vulkan"
 av1_vulkan_hwaccel_select="av1_decoder"
 h263_vaapi_hwaccel_deps="vaapi"
@@ -6690,6 +6693,7 @@ enabled videotoolbox && {
 check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_HEVC 
"-framework CoreMedia"
 check_func_headers CoreMedia/CMFormatDescription.h 
kCMVideoCodecType_HEVCWithAlpha "-framework CoreMedia"
 check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_VP9 
"-framework CoreMedia"
+check_func_headers CoreMedia/CMFormatDescription.h kCMVideoCodecType_AV1 
"-framework CoreMedia"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_420YpCbCr10BiPlanarVideoRange "-framework CoreVideo"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_422YpCbCr8BiPlanarVideoRange "-framework CoreVideo"
 check_func_headers CoreVideo/CVPixelBuffer.h 
kCVPixelFormatType_422YpCbCr10BiPlanarVideoRange "-framework CoreVideo"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 936fc3415a..fa6d30a8b3 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1007,6 +1007,7 @@ OBJS-$(CONFIG_AV1_D3D12VA_HWACCEL)+= dxva2_av1.o 
d3d12va_av1.o
 OBJS-$(CONFIG_AV1_NVDEC_HWACCEL)  += nvdec_av1.o
 OBJS-$(CONFIG_AV1_VAAPI_HWACCEL)  += vaapi_av1.o
 OBJS-$(CONFIG_AV1_VDPAU_HWACCEL)  += vdpau_av1.o
+OBJS-$(CONFIG_AV1_VIDEOTOOLBOX_HWACCEL)   += videotoolbox_av1.o
 OBJS-$(CONFIG_AV1_VULKAN_HWACCEL) += vulkan_decode.o vulkan_av1.o
 OBJS-$(CONFIG_H263_VAAPI_HWACCEL) += vaapi_mpeg4.o
 OBJS-$(CONFIG_H263_VIDEOTOOLBOX_HWACCEL)  += videotoolbox.o
diff --git a/libavcodec/av1dec.c b/libavcodec/av1dec.c
index 1d5b9ef4f4..0fad09af74 100644
--- a/libavcodec/av1dec.c
+++ b/libavcodec/av1dec.c
@@ -541,6 +541,7 @@ static int get_pixel_format(AVCodecContext *avctx)
  CONFIG_AV1_NVDEC_HWACCEL + \
  CONFIG_AV1_VAAPI_HWACCEL + \
  CONFIG_AV1_VDPAU_HWACCEL + \
+ CONFIG_AV1_VIDEOTOOLBOX_HWACCEL + \
  CONFIG_AV1_VULKAN_HWACCEL)
 enum AVPixelFormat pix_fmts[HWACCEL_MAX + 2], *fmtp = pix_fmts;
 
@@ -568,6 +569,9 @@ static int get_pixel_format(AVCodecContext *avctx)
 #if CONFIG_AV1_VDPAU_HWACCEL
 *fmtp++ = AV_PIX_FMT_VDPAU;
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+*fmtp++ = AV_PIX_FMT_VIDEOTOOLBOX;
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 *fmtp++ = AV_PIX_FMT_VULKAN;
 #endif
@@ -592,6 +596,9 @@ static int get_pixel_format(AVCodecContext *avctx)
 #if CONFIG_AV1_VDPAU_HWACCEL
 *fmtp++ = AV_PIX_FMT_VDPAU;
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+*fmtp++ = AV_PIX_FMT_VIDEOTOOLBOX;
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 *fmtp++ = AV_PIX_FMT_VULKAN;
 #endif
@@ -1594,6 +1601,9 @@ const FFCodec ff_av1_decoder = {
 #if CONFIG_AV1_VDPAU_HWACCEL
 HWACCEL_VDPAU(av1),
 #endif
+#if CONFIG_AV1_VIDEOTOOLBOX_HWACCEL
+HWACCEL_VIDEOTOOLBOX(av1),
+#endif
 #if CONFIG_AV1_VULKAN_HWACCEL
 HWACCEL_VULKAN(av1),
 #endif
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index 5171e4c7d7..2b9bdc8fc9 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -26,6 +26,7 @@ extern const struct FFHWAccel ff_av1_dxva2_hwaccel;
 extern 

Re: [FFmpeg-devel] [PATCH] swscale/aarch64: Fix rgb24toyv12 only works with aligned width

2024-09-18 Thread Martin Storsjö

On Wed, 18 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Signed-off-by: Zhao Zhili 
Co-authored-by: johzzy 
---
libswscale/aarch64/rgb2rgb.c | 23 ++-
tests/checkasm/sw_rgb.c  |  2 +-
2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
index d978a6f173..20a25033cb 100644
--- a/libswscale/aarch64/rgb2rgb.c
+++ b/libswscale/aarch64/rgb2rgb.c
@@ -27,9 +27,30 @@
#include "libswscale/swscale.h"
#include "libswscale/swscale_internal.h"

+// Only handle width aligned to 16
void ff_rgb24toyv12_neon(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
 uint8_t *vdst, int width, int height, int lumStride,
 int chromStride, int srcStride, int32_t *rgb2yuv);
+
+static void rgb24toyv12(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+uint8_t *vdst, int width, int height, int lumStride,
+int chromStride, int srcStride, int32_t *rgb2yuv)
+{
+int width_align = width & (~15);
+
+if (width_align > 0)
+ff_rgb24toyv12_neon(src, ydst, udst, vdst, width_align, height,
+lumStride, chromStride, srcStride, rgb2yuv);
+if (width_align < width) {
+src += width_align * 3;
+ydst += width_align;
+udst += width_align / 2;
+vdst += width_align / 2;
+ff_rgb24toyv12_c(src, ydst, udst, vdst, width - width_align, height,
+lumStride, chromStride, srcStride, rgb2yuv);
+}
+}
+
void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
  uint8_t *dest, int width, int height,
  int src1Stride, int src2Stride, int dstStride);
@@ -42,7 +63,7 @@ av_cold void rgb2rgb_init_aarch64(void)
int cpu_flags = av_get_cpu_flags();

if (have_neon(cpu_flags)) {
-ff_rgb24toyv12  = ff_rgb24toyv12_neon;
+ff_rgb24toyv12  = rgb24toyv12;
interleaveBytes = ff_interleave_bytes_neon;
deinterleaveBytes = ff_deinterleave_bytes_neon;
}
diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c
index af9434073a..a57c471e3b 100644
--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -129,7 +129,7 @@ static int cmp_off_by_n(const uint8_t *ref, const uint8_t 
*test, size_t n, int a

static void check_rgb24toyv12(struct SwsContext *ctx)
{
-static const int input_sizes[] = {16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE};
+static const int input_sizes[] = {4, 16, 128, 512, MAX_LINE_SIZE, -MAX_LINE_SIZE};



I think it would be good to test a case which isn't mod16, but bigger than 
16 as well.


Other than that, I think this change looks reasonable.

// Martin
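
The wrapper pattern in the patch (fast routine for the part of the width that is a multiple of 16, C fallback for the rest) can be sketched on a single row with a trivial operation. The invert_* functions below are hypothetical stand-ins and not swscale code; the real wrapper additionally advances src by width_align * 3 bytes and the chroma pointers by width_align / 2.

```c
#include <stdint.h>

/* Stand-in for a SIMD routine that only handles widths that are
 * multiples of 16. */
static void invert_fast16(const uint8_t *src, uint8_t *dst, int width)
{
    for (int x = 0; x < width; x += 16)
        for (int i = 0; i < 16; i++)
            dst[x + i] = 255 - src[x + i];
}

/* Scalar reference that handles any width. */
static void invert_c(const uint8_t *src, uint8_t *dst, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = 255 - src[x];
}

/* Wrapper: aligned part via the fast path, remainder via the C code. */
static void invert_wrapper(const uint8_t *src, uint8_t *dst, int width)
{
    int width_align = width & ~15;

    if (width_align > 0)
        invert_fast16(src, dst, width_align);
    if (width_align < width)
        invert_c(src + width_align, dst + width_align, width - width_align);
}
```

A width such as 540 exercises both branches (528 aligned pixels plus a tail of 12), which is the kind of case the review comment asks for.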



[FFmpeg-devel] [PATCH 5/5] checkasm: Print the SVE vector length at startup

2024-09-17 Thread Martin Storsjö
---
 tests/checkasm/checkasm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index c932e028a5..c9d2b5faf1 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -94,6 +94,10 @@
 #define isatty(fd) 1
 #endif
 
+#if ARCH_AARCH64
+#include "libavutil/aarch64/cpu.h"
+#endif
+
 #if ARCH_ARM && HAVE_ARMV5TE_EXTERNAL
 #include "libavutil/arm/cpu.h"
 
@@ -917,6 +921,7 @@ int main(int argc, char *argv[])
 {
 unsigned int seed = av_get_random_seed();
 int i, ret = 0;
+char arch_info_buf[50] = "";
 
 #ifdef _WIN32
 #if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
@@ -981,7 +986,12 @@ int main(int argc, char *argv[])
 }
 }
 
-fprintf(stderr, "checkasm: using random seed %u\n", seed);
+#if ARCH_AARCH64 && HAVE_SVE
+if (have_sve(av_get_cpu_flags()))
+snprintf(arch_info_buf, sizeof(arch_info_buf),
+ "SVE %d bits, ", 8 * ff_aarch64_sve_length());
+#endif
+fprintf(stderr, "checkasm: %susing random seed %u\n", arch_info_buf, seed);
 av_lfg_init(&checkasm_lfg, seed);
 
 if (state.bench_pattern)
-- 
2.34.1
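
A portable sketch of the banner formatting, for reference: the vector length is passed in as a plain parameter here (in bytes, as counted by the cntb instruction), so it runs on any machine. The function name and the "0 means no SVE" convention are assumptions of this example only.

```c
#include <stdio.h>

/* Builds the startup line; sve_bytes <= 0 means "no SVE available". */
static void format_banner(char *out, size_t out_size,
                          int sve_bytes, unsigned int seed)
{
    char arch_info_buf[50] = "";

    if (sve_bytes > 0)
        snprintf(arch_info_buf, sizeof(arch_info_buf),
                 "SVE %d bits, ", 8 * sve_bytes);
    snprintf(out, out_size, "checkasm: %susing random seed %u\n",
             arch_info_buf, seed);
}
```

Because arch_info_buf starts out as an empty string, the output on machines without SVE is identical to the previous message.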



[FFmpeg-devel] [PATCH 4/5] aarch64: Print the SVE vector length in libavutil/tests/cpu.c

2024-09-17 Thread Martin Storsjö
This makes this aspect more visible in test logs.
---
 libavutil/aarch64/Makefile  |  2 ++
 libavutil/aarch64/cpu.h |  4 
 libavutil/aarch64/cpu_sve.S | 29 +
 libavutil/tests/cpu.c   |  8 
 4 files changed, 43 insertions(+)
 create mode 100644 libavutil/aarch64/cpu_sve.S

diff --git a/libavutil/aarch64/Makefile b/libavutil/aarch64/Makefile
index eba0151337..992e95e4df 100644
--- a/libavutil/aarch64/Makefile
+++ b/libavutil/aarch64/Makefile
@@ -4,3 +4,5 @@ OBJS += aarch64/cpu.o   
  \
 
 NEON-OBJS += aarch64/float_dsp_neon.o \
  aarch64/tx_float_neon.o  \
+
+SVE-OBJS += aarch64/cpu_sve.o \
diff --git a/libavutil/aarch64/cpu.h b/libavutil/aarch64/cpu.h
index df7becca30..a41b729659 100644
--- a/libavutil/aarch64/cpu.h
+++ b/libavutil/aarch64/cpu.h
@@ -30,4 +30,8 @@
 #define have_sve(flags) CPUEXT(flags, SVE)
 #define have_sve2(flags)CPUEXT(flags, SVE2)
 
+#if HAVE_SVE
+int ff_aarch64_sve_length(void);
+#endif
+
 #endif /* AVUTIL_AARCH64_CPU_H */
diff --git a/libavutil/aarch64/cpu_sve.S b/libavutil/aarch64/cpu_sve.S
new file mode 100644
index 00..d216ed2c49
--- /dev/null
+++ b/libavutil/aarch64/cpu_sve.S
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2023 Martin Storsjo
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "asm.S"
+
+ENABLE_SVE
+
+function ff_aarch64_sve_length, export=1
+        cntb            x0
+        ret
+endfunc
diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
index 679b538f0f..abe2b057d7 100644
--- a/libavutil/tests/cpu.c
+++ b/libavutil/tests/cpu.c
@@ -23,6 +23,10 @@
 #include "libavutil/cpu.h"
 #include "libavutil/avstring.h"
 
+#if ARCH_AARCH64
+#include "libavutil/aarch64/cpu.h"
+#endif
+
 #if HAVE_UNISTD_H
 #include <unistd.h>
 #endif
@@ -161,6 +165,10 @@ int main(int argc, char **argv)
 print_cpu_flags(cpu_flags_raw, "raw");
 print_cpu_flags(cpu_flags_eff, "effective");
 printf("threads = %s (cpu_count = %d)\n", threads, cpu_count);
+#if ARCH_AARCH64
+if (cpu_flags_raw & AV_CPU_FLAG_SVE)
+printf("sve_vector_length = %d\n", 8 * ff_aarch64_sve_length());
+#endif
 
 return 0;
 }
-- 
2.34.1



[FFmpeg-devel] [PATCH 3/5] aarch64: Add CPU feature flags for SVE and SVE2

2024-09-17 Thread Martin Storsjö
Add code for detecting the feature on Linux and Windows.
---
 libavutil/aarch64/cpu.c   | 20 
 libavutil/aarch64/cpu.h   |  2 ++
 libavutil/cpu.c   |  2 ++
 libavutil/cpu.h   |  2 ++
 libavutil/tests/cpu.c |  2 ++
 tests/checkasm/checkasm.c |  2 ++
 6 files changed, 30 insertions(+)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index fe24b1da4d..e82c0f19ab 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -25,6 +25,8 @@
 #include 
 
 #define HWCAP_AARCH64_ASIMDDP (1 << 20)
+#define HWCAP_AARCH64_SVE (1 << 22)
+#define HWCAP2_AARCH64_SVE2   (1 << 1)
 #define HWCAP2_AARCH64_I8MM   (1 << 13)
 
 static int detect_flags(void)
@@ -36,6 +38,10 @@ static int detect_flags(void)
 
 if (hwcap & HWCAP_AARCH64_ASIMDDP)
 flags |= AV_CPU_FLAG_DOTPROD;
+if (hwcap & HWCAP_AARCH64_SVE)
+flags |= AV_CPU_FLAG_SVE;
+if (hwcap2 & HWCAP2_AARCH64_SVE2)
+flags |= AV_CPU_FLAG_SVE2;
 if (hwcap2 & HWCAP2_AARCH64_I8MM)
 flags |= AV_CPU_FLAG_I8MM;
 
@@ -119,6 +125,14 @@ static int detect_flags(void)
  * regular I8MM is available. */
 if (IsProcessorFeaturePresent(PF_ARM_SVE_I8MM_INSTRUCTIONS_AVAILABLE))
 flags |= AV_CPU_FLAG_I8MM;
+#endif
+#ifdef PF_ARM_SVE_INSTRUCTIONS_AVAILABLE
+if (IsProcessorFeaturePresent(PF_ARM_SVE_INSTRUCTIONS_AVAILABLE))
+flags |= AV_CPU_FLAG_SVE;
+#endif
+#ifdef PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE
+if (IsProcessorFeaturePresent(PF_ARM_SVE2_INSTRUCTIONS_AVAILABLE))
+flags |= AV_CPU_FLAG_SVE2;
 #endif
 return flags;
 }
@@ -142,6 +156,12 @@ int ff_get_cpu_flags_aarch64(void)
 #ifdef __ARM_FEATURE_MATMUL_INT8
 flags |= AV_CPU_FLAG_I8MM;
 #endif
+#ifdef __ARM_FEATURE_SVE
+flags |= AV_CPU_FLAG_SVE;
+#endif
+#ifdef __ARM_FEATURE_SVE2
+flags |= AV_CPU_FLAG_SVE2;
+#endif
 
 flags |= detect_flags();
 
diff --git a/libavutil/aarch64/cpu.h b/libavutil/aarch64/cpu.h
index 64d703be37..df7becca30 100644
--- a/libavutil/aarch64/cpu.h
+++ b/libavutil/aarch64/cpu.h
@@ -27,5 +27,7 @@
 #define have_vfp(flags)  CPUEXT(flags, VFP)
 #define have_dotprod(flags) CPUEXT(flags, DOTPROD)
 #define have_i8mm(flags)CPUEXT(flags, I8MM)
+#define have_sve(flags) CPUEXT(flags, SVE)
+#define have_sve2(flags)CPUEXT(flags, SVE2)
 
 #endif /* AVUTIL_AARCH64_CPU_H */
diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index df00bd541f..e16ebc0d38 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -180,6 +180,8 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
 { "vfp",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_VFP 
 },.unit = "flags" },
 { "dotprod",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_DOTPROD 
 },.unit = "flags" },
 { "i8mm", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_I8MM
 },.unit = "flags" },
+{ "sve",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_SVE 
 },.unit = "flags" },
+{ "sve2", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_SVE2
 },.unit = "flags" },
 #elif ARCH_MIPS
 { "mmi",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MMI 
 },.unit = "flags" },
 { "msa",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_MSA 
 },.unit = "flags" },
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index ba6c234e04..6b6e50f07a 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -72,6 +72,8 @@
 #define AV_CPU_FLAG_VFP_VM   (1 << 7) ///< VFPv2 vector mode, deprecated 
in ARMv7-A and unavailable in various CPUs implementations
 #define AV_CPU_FLAG_DOTPROD  (1 << 8)
 #define AV_CPU_FLAG_I8MM (1 << 9)
+#define AV_CPU_FLAG_SVE  (1 <<10)
+#define AV_CPU_FLAG_SVE2 (1 <<11)
 #define AV_CPU_FLAG_SETEND   (1 <<16)
 
 #define AV_CPU_FLAG_MMI  (1 << 0)
diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
index 0a459c1d9e..679b538f0f 100644
--- a/libavutil/tests/cpu.c
+++ b/libavutil/tests/cpu.c
@@ -40,6 +40,8 @@ static const struct {
 { AV_CPU_FLAG_VFP,   "vfp"},
 { AV_CPU_FLAG_DOTPROD,   "dotprod"},
 { AV_CPU_FLAG_I8MM,  "i8mm"   },
+{ AV_CPU_FLAG_SVE,   "sve"},
+{ AV_CPU_FLAG_SVE2,  "sve2"   },
 #elif ARCH_ARM
 { AV_CPU_FLAG_ARMV5TE,   "armv5te"},
 { AV_CPU_FLAG_ARMV6, "armv6"  },
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 73a998ae3a..c932e028a5 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -305,6 +305,8 @@ static const struct {
 { "NEON", "neon", AV_CPU_FLAG_NEON },
 { "DOTPROD",  "dotprod",  AV_CPU_FLAG_DOTPROD },
 { "I8MM", "i8mm", AV_CPU_FLAG_I8MM },
+{ "SVE",  "sve",  AV_CPU_FLAG_SVE },
+{ "SVE2", "sve2", AV_CPU_FLAG_SVE2 },
 #elif ARCH_ARM
 { "ARMV5TE",  "armv5te",  AV_CPU_FLAG_ARMV5TE },
 { "ARMV6","armv6",AV_CPU_FLAG_ARMV6 },
-- 
2.34.
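
As an illustration of how the new flag bits could be consumed, here is a minimal sketch. The bit values are copied from the libavutil/cpu.h hunk above; the `demo_*` names are hypothetical and not part of FFmpeg. The real `have_sve()`/`have_sve2()` macros go through `CPUEXT()` and, as far as I know, also take the build-time configuration into account, which this sketch ignores.

```c
/* Bit values as added by the libavutil/cpu.h hunk in this patch. */
#define AV_CPU_FLAG_SVE  (1 << 10)
#define AV_CPU_FLAG_SVE2 (1 << 11)

/* Hypothetical identifiers, for illustration only. */
enum DemoImpl { DEMO_IMPL_NEON = 0, DEMO_IMPL_SVE = 1, DEMO_IMPL_SVE2 = 2 };

/* Simplified stand-in for have_sve()/have_sve2(): runtime flag test only. */
static int demo_have(int flags, int bit)
{
    return (flags & bit) != 0;
}

/* Pick the most capable variant that the runtime flags allow. */
static enum DemoImpl demo_pick_impl(int flags)
{
    if (demo_have(flags, AV_CPU_FLAG_SVE2))
        return DEMO_IMPL_SVE2;
    if (demo_have(flags, AV_CPU_FLAG_SVE))
        return DEMO_IMPL_SVE;
    return DEMO_IMPL_NEON;
}
```

Checking SVE2 before SVE mirrors the usual init-function pattern of letting the most specific extension win.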

[FFmpeg-devel] [PATCH 2/5] configure: Add detection of assembler support for SVE/SVE2

2024-09-17 Thread Martin Storsjö
It turns out that recent versions of MS armasm64 do support some
SVE instructions, but not all of them. Test for one of the
instructions that it currently doesn't support.

---

Just as a disclaimer, I'm not currently actively planning on writing
SVE/SVE2 optimizations. However, related projects such as x264 and
dav1d do have a few functions using these extensions, so we might just
as well add the framework support for these features in ffmpeg, as
functions needing this support will come sooner or later anyway.

In the related projects, there's no real use of longer vectors
(as there's very little such HW available anyway), but SVE gives
widening loads (used in a couple of places in x264) and 16 bit dot
products (used in dav1d), which can be useful with 128 bit vectors.
---
 configure   | 14 +-
 ffbuild/arch.mak|  2 ++
 libavutil/aarch64/asm.S | 18 ++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index da36419f2d..d05c4a5a51 100755
--- a/configure
+++ b/configure
@@ -466,6 +466,8 @@ Optimization options (experts only):
   --disable-neon   disable NEON optimizations
   --disable-dotproddisable DOTPROD optimizations
   --disable-i8mm   disable I8MM optimizations
+  --disable-svedisable SVE optimizations
+  --disable-sve2   disable SVE2 optimizations
   --disable-inline-asm disable use of inline assembly
   --disable-x86asm disable use of standalone x86 assembly
   --disable-mipsdspdisable MIPS DSP ASE R1 optimizations
@@ -2163,6 +2165,8 @@ ARCH_EXT_LIST_ARM="
 vfp
 vfpv3
 setend
+sve
+sve2
 "
 
 ARCH_EXT_LIST_MIPS="
@@ -2435,6 +2439,8 @@ TOOLCHAIN_FEATURES="
 as_arch_directive
 as_archext_dotprod_directive
 as_archext_i8mm_directive
+as_archext_sve_directive
+as_archext_sve2_directive
 as_dn_directive
 as_fpu_directive
 as_func
@@ -2755,6 +2761,8 @@ vfpv3_deps="vfp"
 setend_deps="arm"
 dotprod_deps="aarch64 neon"
 i8mm_deps="aarch64 neon"
+sve_deps="aarch64 neon"
+sve2_deps="aarch64 neon sve"
 
 map 'eval ${v}_inline_deps=inline_asm' $ARCH_EXT_LIST_ARM
 
@@ -6223,9 +6231,11 @@ if enabled aarch64; then
 # internal assembler in clang 3.3 does not support this instruction
 enabled neon && check_insn neon 'ext   v0.8B, v0.8B, v1.8B, #1'
 
-archext_list="dotprod i8mm"
+archext_list="dotprod i8mm sve sve2"
 enabled dotprod && check_archext_insn dotprod 'udot v0.4s, v0.16b, v0.16b'
 enabled i8mm&& check_archext_insn i8mm'usdot v0.4s, v0.16b, v0.16b'
+enabled sve && check_archext_insn sve 'whilelt p0.s, x0, x1'
+enabled sve2&& check_archext_insn sve2'sqrdmulh z0.s, z0.s, z0.s'
 
 # Disable the main feature (e.g. HAVE_NEON) if neither inline nor external
 # assembly support the feature out of the box. Skip this for the features
@@ -7913,6 +7923,8 @@ if enabled aarch64; then
 echo "NEON enabled  ${neon-no}"
 echo "DOTPROD enabled   ${dotprod-no}"
 echo "I8MM enabled  ${i8mm-no}"
+echo "SVE enabled   ${sve-no}"
+echo "SVE2 enabled  ${sve2-no}"
 fi
 if enabled arm; then
 echo "ARMv5TE enabled   ${armv5te-no}"
diff --git a/ffbuild/arch.mak b/ffbuild/arch.mak
index 3fc40e5e5d..af71aacfd2 100644
--- a/ffbuild/arch.mak
+++ b/ffbuild/arch.mak
@@ -3,6 +3,8 @@ OBJS-$(HAVE_ARMV6)   += $(ARMV6-OBJS)   $(ARMV6-OBJS-yes)
 OBJS-$(HAVE_ARMV8)   += $(ARMV8-OBJS)   $(ARMV8-OBJS-yes)
 OBJS-$(HAVE_VFP) += $(VFP-OBJS) $(VFP-OBJS-yes)
 OBJS-$(HAVE_NEON)+= $(NEON-OBJS)$(NEON-OBJS-yes)
+OBJS-$(HAVE_SVE) += $(SVE-OBJS) $(SVE-OBJS-yes)
+OBJS-$(HAVE_SVE2)+= $(SVE2-OBJS)$(SVE2-OBJS-yes)
 
 OBJS-$(HAVE_MIPSFPU)   += $(MIPSFPU-OBJS)$(MIPSFPU-OBJS-yes)
 OBJS-$(HAVE_MIPSDSP)   += $(MIPSDSP-OBJS)$(MIPSDSP-OBJS-yes)
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index 1840f9fb01..50ce7d4dfd 100644
--- a/libavutil/aarch64/asm.S
+++ b/libavutil/aarch64/asm.S
@@ -56,8 +56,26 @@
 #define DISABLE_I8MM
 #endif
 
+#if HAVE_AS_ARCHEXT_SVE_DIRECTIVE
+#define ENABLE_SVE  .arch_extension sve
+#define DISABLE_SVE .arch_extension nosve
+#else
+#define ENABLE_SVE
+#define DISABLE_SVE
+#endif
+
+#if HAVE_AS_ARCHEXT_SVE2_DIRECTIVE
+#define ENABLE_SVE2  .arch_extension sve2
+#define DISABLE_SVE2 .arch_extension nosve2
+#else
+#define ENABLE_SVE2
+#define DISABLE_SVE2
+#endif
+
 DISABLE_DOTPROD
 DISABLE_I8MM
+DISABLE_SVE
+DISABLE_SVE2
 
 
 /* Support macros for
-- 
2.34.1



[FFmpeg-devel] [PATCH 1/5] aarch64: Detect I8MM on Windows via SVE-I8MM

2024-09-17 Thread Martin Storsjö
There's no direct processor feature constant for I8MM alone, but
there is a flag for SVE-I8MM (added in WinSDK 10.0.26100 and
recent versions of mingw-w64). If SVE-I8MM is available, we can
assume that I8MM is available.

While HW supporting these features isn't yet commonly running
Windows, this at least allows detecting and running the I8MM codepaths
in Windows builds in Wine (possibly running in QEMU).
---
 libavutil/aarch64/cpu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index 7631d13de0..fe24b1da4d 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -112,6 +112,13 @@ static int detect_flags(void)
 #ifdef PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE
 if (IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE))
 flags |= AV_CPU_FLAG_DOTPROD;
+#endif
+#ifdef PF_ARM_SVE_I8MM_INSTRUCTIONS_AVAILABLE
+/* There's no PF_* flag that indicates whether plain I8MM is available
+ * or not. But if SVE_I8MM is available, that also implies that
+ * regular I8MM is available. */
+if (IsProcessorFeaturePresent(PF_ARM_SVE_I8MM_INSTRUCTIONS_AVAILABLE))
+flags |= AV_CPU_FLAG_I8MM;
 #endif
 return flags;
 }
-- 
2.34.1



Re: [FFmpeg-devel] [PATCH v3 2/2] configure: correctly set sanitizer toolchain compilers

2024-09-13 Thread Martin Storsjö

On Thu, 12 Sep 2024, Marvin Scholz wrote:


Previously only the C compiler was set, which would lead to
confusing situations where even though clang-asan was selected,
it would still use g++ for C++ code, failing because configure
does not support mixing compilers in this way (which is a separate
issue not addressed by this commit).
---
configure | 70 ---
1 file changed, 41 insertions(+), 29 deletions(-)


Both these patches seem ok to me.

// Martin



Re: [FFmpeg-devel] [PATCH v2 00/14] aarch64/vvc: Add SIMD

2024-09-12 Thread Martin Storsjö

On Thu, 12 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

Patches 1~9 have been updated according to Martin's review.

Patches 10~14 are new.

I have created a PR on github:
https://github.com/quink-black/FFmpeg/pull/2


Thanks for testing it through that set of tests!

No further comments from me on this set, it seems reasonable.

// Martin



Re: [FFmpeg-devel] Procedure to enable the Windows on ARM64 FFMPEG Libraries

2024-09-12 Thread Martin Storsjö

On Thu, 12 Sep 2024, Niranjan Kshatriya (QUIC) wrote:

and compiled locally (on ARM using WSL) for Arm with the toolchain
"GitHub - Windows-on-ARM-Experiments/mingw-woarm64-build: Workflows and
build scripts for Windows on Arm64 GNU cross-compiler for
`aarch64-w64-mingw32` target", with the configuration below:


./configure --arch=arm64 --target-os=mingw32 
--cross-prefix=aarch64-w64-mingw32- --prefix=/ffbuild/prefix 
--pkg-config-flags=--static --pkg-config=pkg-config --enable-gpl 
--enable-version3 --disable-debug --disable-w32threads --enable-pthreads 
--disable-libpulse --disable-libxcb


Just as a general heads-up/warning - that toolchain is heavily in progress 
and not very mature yet (plus, it has known ABI discrepancies compared 
to established aarch64 mingw environments). If you're evaluating the 
toolchain or planning on working on it, that's of course fine.


If you want an actually mature mingw toolchain for aarch64, grab one from 
https://github.com/mstorsjo/llvm-mingw/releases.


(This is of course unrelated to what it takes to enable building 
windows/arm64 binaries in that third party repo.)


// Martin



Re: [FFmpeg-devel] [PATCH] tests/checkasm/sw_rgb: don't write random data past the end of the buffer

2024-09-12 Thread Martin Storsjö

On Thu, 12 Sep 2024, Ramiro Polla wrote:


On Thu, Sep 12, 2024 at 8:44 AM James Almer  wrote:


Should fix fate-checkasm-sw_rgb under gcc-ubsan.

Signed-off-by: James Almer 
---
 tests/checkasm/sw_rgb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/sw_rgb.c b/tests/checkasm/sw_rgb.c
index af9434073a..cdd43df8ba 100644
--- a/tests/checkasm/sw_rgb.c
+++ b/tests/checkasm/sw_rgb.c
@@ -287,7 +287,7 @@ static void check_deinterleave_bytes(void)
int width, int height, int srcStride,
int dst1Stride, int dst2Stride);

-randomize_buffers(src, 2*MAX_STRIDE*MAX_HEIGHT+2);
+randomize_buffers(src, 2*MAX_STRIDE*MAX_HEIGHT);


Thank you for spotting it.

The issue is that randomize_buffers() writes 4 bytes at a time. I
think the proper fix is to change randomize_buffers() to not write
past the end of the buffer. It would be even better to move
randomize_buffers() to checkasm.h or checkasm.c so it doesn't have to
be copied around so many times.


Maybe, but part of the point of having randomize_buffers() and similar be 
local to each test is that the exact procedure for writing, and the 
right kind of random data, differ for each test/function category.
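
For reference, a buffer fill that writes 4 bytes at a time but never touches bytes past the requested size could look roughly like the sketch below. This is not the checkasm macro; `demo_rnd()` is a hypothetical stand-in for checkasm's random source and `demo_randomize_buffer()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for checkasm's rnd(): a plain xorshift32 generator. */
static uint32_t demo_state = 0x12345678u;

static uint32_t demo_rnd(void)
{
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    return demo_state;
}

/* Fill exactly `size` bytes: whole 32-bit words first, then a partial tail. */
static void demo_randomize_buffer(uint8_t *buf, size_t size)
{
    size_t i = 0;

    assert(buf != NULL || size == 0);

    /* Only write a full word while all 4 bytes are inside the buffer. */
    for (; i + 4 <= size; i += 4) {
        uint32_t r = demo_rnd();
        memcpy(buf + i, &r, 4);
    }
    /* Copy just the remaining 1..3 bytes instead of a full word. */
    if (i < size) {
        uint32_t r = demo_rnd();
        memcpy(buf + i, &r, size - i);
    }
}
```

With a size such as 2*MAX_STRIDE*MAX_HEIGHT+2, the tail branch handles the last two bytes, so the original `+2` could stay without overrunning the buffer. Whether that is preferable to a per-test macro is exactly the trade-off discussed above.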


// Martin


Re: [FFmpeg-devel] [PATCH v2] configure: fix symbol prefix detection

2024-09-11 Thread Martin Storsjö

On Wed, 11 Sep 2024, Marvin Scholz wrote:


The symbol prefix check would incorrectly detect a bogus prefix in
circumstances where sanitizers instrument the build, like when
configuring with the clang-asan toolchain where it would detect the
prefix as __odr_asan_gen_, which is obviously wrong.

To fix this, adjust the prefix detection to only detect a one-character
prefix, which is the only case that matters anywhere right now.
---
configure | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index d3bd46f382a..7e84272b74b 100755
--- a/configure
+++ b/configure
@@ -6131,13 +6131,15 @@ enable_weak_pic() {
enabled pic && enable_weak_pic

test_cc <

Since we're checking for ff_extern$ in the substr match, would it be 
safest to include the $ in the initial ff_extern match as well? So if 
there's a _ff_extern$foo symbol listed first, that won't be matched?


// Martin



Re: [FFmpeg-devel] [PATCH 3/3] aarch64/vvc: Add put_qpel_hv

2024-09-11 Thread Martin Storsjö

On Wed, 11 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

With Apple M1 (no i8mm):

put_luma_hv_8_4x4_c: 2.2 ( 1.00x)
put_luma_hv_8_4x4_neon:  0.8 ( 3.00x)
put_luma_hv_8_8x8_c: 7.0 ( 1.00x)
put_luma_hv_8_8x8_neon:  0.8 ( 9.33x)
put_luma_hv_8_16x16_c:  22.8 ( 1.00x)
put_luma_hv_8_16x16_neon:2.5 ( 9.10x)
put_luma_hv_8_32x32_c:  84.8 ( 1.00x)
put_luma_hv_8_32x32_neon:9.5 ( 8.92x)
put_luma_hv_8_64x64_c: 333.0 ( 1.00x)
put_luma_hv_8_64x64_neon:   35.5 ( 9.38x)
put_luma_hv_8_128x128_c:  1294.5 ( 1.00x)
put_luma_hv_8_128x128_neon:137.8 ( 9.40x)

With Pixel 8 Pro:

put_luma_hv_8_4x4_c: 5.0 ( 1.00x)
put_luma_hv_8_4x4_neon:  0.8 ( 6.67x)
put_luma_hv_8_4x4_i8mm:  0.2 (20.00x)
put_luma_hv_8_8x8_c:13.2 ( 1.00x)
put_luma_hv_8_8x8_neon:  1.2 (10.60x)
put_luma_hv_8_8x8_i8mm:  1.2 (10.60x)
put_luma_hv_8_16x16_c:  44.2 ( 1.00x)
put_luma_hv_8_16x16_neon:4.5 ( 9.83x)
put_luma_hv_8_16x16_i8mm:4.2 (10.41x)
put_luma_hv_8_32x32_c: 160.8 ( 1.00x)
put_luma_hv_8_32x32_neon:   17.5 ( 9.19x)
put_luma_hv_8_32x32_i8mm:   16.0 (10.05x)
put_luma_hv_8_64x64_c: 611.2 ( 1.00x)
put_luma_hv_8_64x64_neon:   68.0 ( 8.99x)
put_luma_hv_8_64x64_i8mm:   62.2 ( 9.82x)
put_luma_hv_8_128x128_c:  2384.8 ( 1.00x)
put_luma_hv_8_128x128_neon:268.8 ( 8.87x)
put_luma_hv_8_128x128_i8mm:245.8 ( 9.70x)
---
libavcodec/aarch64/h26x/dsp.h   |   8 ++
libavcodec/aarch64/h26x/qpel_neon.S | 140 
libavcodec/aarch64/vvc/dsp_init.c   |  14 +++
3 files changed, 162 insertions(+)


Ok

// Martin



Re: [FFmpeg-devel] [PATCH 2/3] aarch64/vvc: Add put_qpel_vx

2024-09-11 Thread Martin Storsjö

On Wed, 11 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

put_luma_v_8_4x4_c:  1.0 ( 1.00x)
put_luma_v_8_4x4_neon:   0.0 ( 0.00x)
put_luma_v_8_8x8_c:  3.5 ( 1.00x)
put_luma_v_8_8x8_neon:   0.5 ( 7.00x)
put_luma_v_8_16x16_c:   13.8 ( 1.00x)
put_luma_v_8_16x16_neon: 1.2 (11.00x)
put_luma_v_8_32x32_c:   54.2 ( 1.00x)
put_luma_v_8_32x32_neon: 5.0 (10.85x)
put_luma_v_8_64x64_c:  217.5 ( 1.00x)
put_luma_v_8_64x64_neon:18.8 (11.60x)
put_luma_v_8_128x128_c:886.2 ( 1.00x)
put_luma_v_8_128x128_neon:  74.0 (11.98x)
---
libavcodec/aarch64/h26x/dsp.h   |   8 +++
libavcodec/aarch64/h26x/qpel_neon.S | 100 
libavcodec/aarch64/vvc/dsp_init.c   |   7 ++
3 files changed, 115 insertions(+)


This doesn't look harmful, and looks like the rest of these functions, so 
I guess it's acceptable. Let it be known that I very much dislike the 
structure of these functions, but you're adding more in the same style as 
the old ones, so I guess that's ok.


// Martin



Re: [FFmpeg-devel] [PATCH 1/3] aarch64/h26x: Remove duplicate b.eq instruction

2024-09-11 Thread Martin Storsjö

On Wed, 11 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

b.eq is added by calc_all after each calc.
---
libavcodec/aarch64/h26x/qpel_neon.S | 1 -
1 file changed, 1 deletion(-)

diff --git a/libavcodec/aarch64/h26x/qpel_neon.S 
b/libavcodec/aarch64/h26x/qpel_neon.S
index 8a372a76be..7868811b3b 100644
--- a/libavcodec/aarch64/h26x/qpel_neon.S
+++ b/libavcodec/aarch64/h26x/qpel_neon.S
@@ -754,7 +754,6 @@ function ff_hevc_put_hevc_qpel_v4_8_neon, export=1
calc_qpelb  v24, \src0, \src1, \src2, \src3, \src4, \src5, \src6, 
\src7
st1 {v24.4h}, [x0], x9
subsw3, w3, #1
-b.eq2f
.endm
1:  calc_all
.purgem calc
--
2.42.0


Ok

// Martin



Re: [FFmpeg-devel] [PATCH 6/6] avcodec/hevc: ff_hevc_(qpel/epel)_filters are signed type

2024-09-11 Thread Martin Storsjö

On Sun, 8 Sep 2024, Zhao Zhili wrote:


From: Zhao Zhili 

---
libavcodec/hevc/dsp_template.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)


The rest of these patches seem fine (or are ones where I don't have much 
of an opinion).


// Martin



Re: [FFmpeg-devel] [PATCH 4/6] aarch64/vvc: Add put_pel/put_pel_uni/put_pel_uni_w

2024-09-11 Thread Martin Storsjö

On Sun, 8 Sep 2024, Zhao Zhili wrote:


diff --git a/libavcodec/aarch64/h26x/dsp.h b/libavcodec/aarch64/h26x/dsp.h
index f72746ce03..076d01b477 100644
--- a/libavcodec/aarch64/h26x/dsp.h
+++ b/libavcodec/aarch64/h26x/dsp.h
@@ -248,4 +248,26 @@ NEON8_FNPROTO_PARTIAL_4(qpel, (int16_t *dst, const uint8_t 
*_src, ptrdiff_t _src
NEON8_FNPROTO_PARTIAL_4(qpel_uni, (uint8_t *_dst, ptrdiff_t _dststride, const 
uint8_t *_src,
ptrdiff_t _srcstride, int height, const int8_t *hf, const int8_t *vf, 
int width),)

+#undef NEON8_FNPROTO_PARTIAL_6
+#define NEON8_FNPROTO_PARTIAL_6(fn, args, ext) \
+void ff_vvc_put_##fn##4_8_neon##ext args; \
+void ff_vvc_put_##fn##8_8_neon##ext args; \
+void ff_vvc_put_##fn##16_8_neon##ext args; \
+void ff_vvc_put_##fn##32_8_neon##ext args; \
+void ff_vvc_put_##fn##64_8_neon##ext args; \
+void ff_vvc_put_##fn##128_8_neon##ext args
+
+NEON8_FNPROTO_PARTIAL_6(pel_pixels, (int16_t *dst,
+const uint8_t *src, ptrdiff_t srcstride, int height,
+const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(pel_uni_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
+const uint8_t *_src, ptrdiff_t _srcstride, int height,
+const int8_t *hf, const int8_t *vf, int width),);
+
+NEON8_FNPROTO_PARTIAL_6(pel_uni_w_pixels, (uint8_t *_dst, ptrdiff_t _dststride,
+const uint8_t *_src, ptrdiff_t _srcstride,
+int height, int denom, int wx, int ox,
+const int8_t *hf, const int8_t *vf, int width),);
+
#endif
diff --git a/libavcodec/aarch64/h26x/epel_neon.S 
b/libavcodec/aarch64/h26x/epel_neon.S
index 378b0f7fb2..729395f2f0 100644
--- a/libavcodec/aarch64/h26x/epel_neon.S
+++ b/libavcodec/aarch64/h26x/epel_neon.S
@@ -19,7 +19,8 @@
 */

#include "libavutil/aarch64/asm.S"
-#define MAX_PB_SIZE 64
+#define HEVC_MAX_PB_SIZE 64
+#define VVC_MAX_PB_SIZE 128

const epel_filters, align=4
.byte  0,  0,  0,  0
@@ -131,8 +132,13 @@ endconst
b.ne1b
.endm

+function ff_vvc_put_pel_pixels4_8_neon, export=1
+mov x7, #(VVC_MAX_PB_SIZE * 2)
+b   1f
+endfunc
+
function ff_hevc_put_hevc_pel_pixels4_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2)
+mov x7, #(HEVC_MAX_PB_SIZE * 2)
1:  ld1 {v0.s}[0], [x1], x2
ushll   v4.8h, v0.8b, #6
subsw3, w3, #1
@@ -142,7 +148,7 @@ function ff_hevc_put_hevc_pel_pixels4_8_neon, export=1
endfunc

function ff_hevc_put_hevc_pel_pixels6_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2 - 8)
+mov x7, #(HEVC_MAX_PB_SIZE * 2 - 8)
1:  ld1 {v0.8b}, [x1], x2
ushll   v4.8h, v0.8b, #6
st1 {v4.d}[0], [x0], #8
@@ -152,8 +158,13 @@ function ff_hevc_put_hevc_pel_pixels6_8_neon, export=1
ret
endfunc

+function ff_vvc_put_pel_pixels8_8_neon, export=1
+mov x7, #(VVC_MAX_PB_SIZE * 2)
+b   1f
+endfunc
+
function ff_hevc_put_hevc_pel_pixels8_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2)
+mov x7, #(HEVC_MAX_PB_SIZE * 2)
1:  ld1 {v0.8b}, [x1], x2
ushll   v4.8h, v0.8b, #6
subsw3, w3, #1
@@ -163,7 +174,7 @@ function ff_hevc_put_hevc_pel_pixels8_8_neon, export=1
endfunc

function ff_hevc_put_hevc_pel_pixels12_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2 - 16)
+mov x7, #(HEVC_MAX_PB_SIZE * 2 - 16)
1:  ld1 {v0.8b, v1.8b}, [x1], x2
ushll   v4.8h, v0.8b, #6
st1 {v4.8h}, [x0], #16
@@ -174,8 +185,13 @@ function ff_hevc_put_hevc_pel_pixels12_8_neon, export=1
ret
endfunc

+function ff_vvc_put_pel_pixels16_8_neon, export=1
+mov x7, #(VVC_MAX_PB_SIZE * 2)
+b   1f
+endfunc
+
function ff_hevc_put_hevc_pel_pixels16_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2)
+mov x7, #(HEVC_MAX_PB_SIZE * 2)
1:  ld1 {v0.8b, v1.8b}, [x1], x2
ushll   v4.8h, v0.8b, #6
ushll   v5.8h, v1.8b, #6
@@ -186,7 +202,7 @@ function ff_hevc_put_hevc_pel_pixels16_8_neon, export=1
endfunc

function ff_hevc_put_hevc_pel_pixels24_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2)
+mov x7, #(HEVC_MAX_PB_SIZE * 2)
1:  ld1 {v0.8b-v2.8b}, [x1], x2
ushll   v4.8h, v0.8b, #6
ushll   v5.8h, v1.8b, #6
@@ -197,8 +213,13 @@ function ff_hevc_put_hevc_pel_pixels24_8_neon, export=1
ret
endfunc

+function ff_vvc_put_pel_pixels32_8_neon, export=1
+mov x7, #(VVC_MAX_PB_SIZE * 2)
+b   1f
+endfunc
+
function ff_hevc_put_hevc_pel_pixels32_8_neon, export=1
-mov x7, #(MAX_PB_SIZE * 2)
+mov x7, #(HEVC_MAX_PB_SIZE * 2)
1:  ld1   
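
As a reading aid for the hunks above, here is a scalar C sketch of what the 8-bit pel_pixels functions appear to compute: each source byte is shifted left by 6 (the `ushll ... #6`), and the value loaded into x7 is presumably the destination row stride in bytes, i.e. MAX_PB_SIZE 16-bit elements. `demo_put_pel_pixels()` and its parameter names are hypothetical, not FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEVC_MAX_PB_SIZE 64
#define VVC_MAX_PB_SIZE 128

/* Hypothetical scalar model; dst_stride is counted in int16_t elements. */
static void demo_put_pel_pixels(int16_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *src, ptrdiff_t srcstride,
                                int height, int width)
{
    assert(width <= dst_stride);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (int16_t)(src[x] << 6); /* matches "ushll ..., #6" */
        src += srcstride;
        dst += dst_stride;                   /* MAX_PB_SIZE * 2 bytes per row */
    }
}
```

Under this reading, the only difference between the HEVC and VVC entry points is the stride constant (64 versus 128 elements), which is why the VVC functions can set x7 and branch into the shared loop.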

Re: [FFmpeg-devel] aarch64: Implement support for elf_aux_info(3) on FreeBSD and OpenBSD

2024-09-09 Thread Martin Storsjö

On Mon, 9 Sep 2024, Brad Smith wrote:


aarch64: Implement support for elf_aux_info(3) on FreeBSD and OpenBSD

FreeBSD 12.0+, OpenBSD -current and what will be OpenBSD 7.6 support
elf_aux_info(3).

Signed-off-by: Brad Smith 
---
configure   | 2 ++
libavutil/aarch64/cpu.c | 2 +-
libavutil/cpu.c | 9 -
3 files changed, 11 insertions(+), 2 deletions(-)


LGTM, thanks. (I guess the same change as in aarch64/cpu.c also could be 
done in other architectures' cpu.c?)


// Martin



Re: [FFmpeg-devel] [PATCH] avutil/cpu_internal: Provide ff_getauxval() wrapper for getauxval()

2024-09-09 Thread Martin Storsjö

On Sat, 24 Aug 2024, Brad Smith wrote:


avutil/cpu_internal: Provide ff_getauxval() wrapper for getauxval()

Initially used for getauxval() but will be used to add support for
other APIs.

Signed-off-by: Brad Smith 
---
libavutil/aarch64/cpu.c   |  4 ++--
libavutil/arm/cpu.c   |  2 +-
libavutil/cpu.c   | 14 ++
libavutil/cpu_internal.h  |  2 ++
libavutil/loongarch/cpu.c |  2 +-
libavutil/mips/cpu.c  |  2 +-
libavutil/riscv/cpu.c |  2 +-
7 files changed, 22 insertions(+), 6 deletions(-)


LGTM, this looks reasonable to me.

// Martin



Re: [FFmpeg-devel] [PATCH] avcodec/avcodec: remove usage of __typeof__()

2024-09-08 Thread Martin Storsjö

On Sun, 8 Sep 2024, James Almer wrote:


It's non-standard C.

Signed-off-by: James Almer 
---
libavcodec/avcodec.c | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libavcodec/avcodec.c b/libavcodec/avcodec.c
index cb89236549..78153d12f1 100644
--- a/libavcodec/avcodec.c
+++ b/libavcodec/avcodec.c
@@ -708,9 +708,9 @@ int attribute_align_arg 
avcodec_receive_frame(AVCodecContext *avctx, AVFrame *fr
return ff_encode_receive_frame(avctx, frame);
}

-#define WRAP_CONFIG(allowed_type, field, terminator)\
+#define WRAP_CONFIG(allowed_type, field, field_type, terminator)\
do {\
-static const __typeof__(*(field)) end = terminator; \
+static const field_type end = terminator;   \
if (codec->type != (allowed_type))  \
return AVERROR(EINVAL); \
*out_configs = (field); \
@@ -753,15 +753,15 @@ int ff_default_get_supported_config(const AVCodecContext 
*avctx,
switch (config) {
FF_DISABLE_DEPRECATION_WARNINGS
case AV_CODEC_CONFIG_PIX_FORMAT:
-WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->pix_fmts, AV_PIX_FMT_NONE);
+WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->pix_fmts, enum AVPixelFormat, 
AV_PIX_FMT_NONE);
case AV_CODEC_CONFIG_FRAME_RATE:
-WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->supported_framerates, 
(AVRational){0});
+WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->supported_framerates, 
AVRational, (AVRational){0});
case AV_CODEC_CONFIG_SAMPLE_RATE:
-WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->supported_samplerates, 0);
+WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->supported_samplerates, int, 0);
case AV_CODEC_CONFIG_SAMPLE_FORMAT:
-WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->sample_fmts, 
AV_SAMPLE_FMT_NONE);
+WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->sample_fmts, enum 
AVSampleFormat, AV_SAMPLE_FMT_NONE);
case AV_CODEC_CONFIG_CHANNEL_LAYOUT:
-WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->ch_layouts, 
(AVChannelLayout){0});
+WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->ch_layouts, AVChannelLayout, 
(AVChannelLayout){0});
FF_ENABLE_DEPRECATION_WARNINGS

case AV_CODEC_CONFIG_COLOR_RANGE:
--
2.46.0


Actually, this isn't quite enough to fix compilation with all compilers:

src/libavcodec/avcodec.c: In function 'ff_default_get_supported_config':
src/libavcodec/avcodec.c:758:9: error: initializer element is not constant
 WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->supported_framerates, 
AVRational, (AVRational){0});

 ^
src/libavcodec/avcodec.c:764:9: error: initializer element is not constant
 WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->ch_layouts, 
AVChannelLayout, (AVChannelLayout){0});

 ^

Since we're no longer using typeof, we can drop the casts here and just 
use plain {0}:


diff --git a/libavcodec/avcodec.c b/libavcodec/avcodec.c
index 78153d12f1..8d1a280323 100644
--- a/libavcodec/avcodec.c
+++ b/libavcodec/avcodec.c
@@ -755,13 +755,13 @@ FF_DISABLE_DEPRECATION_WARNINGS
 case AV_CODEC_CONFIG_PIX_FORMAT:
 WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->pix_fmts, enum AVPixelFormat, 
AV_PIX_FMT_NONE);
 case AV_CODEC_CONFIG_FRAME_RATE:
-WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->supported_framerates, 
AVRational, (AVRational){0});
+WRAP_CONFIG(AVMEDIA_TYPE_VIDEO, codec->supported_framerates, 
AVRational, {0});
 case AV_CODEC_CONFIG_SAMPLE_RATE:
 WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->supported_samplerates, int, 0);
 case AV_CODEC_CONFIG_SAMPLE_FORMAT:
 WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->sample_fmts, enum 
AVSampleFormat, AV_SAMPLE_FMT_NONE);
 case AV_CODEC_CONFIG_CHANNEL_LAYOUT:
-WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->ch_layouts, AVChannelLayout, 
(AVChannelLayout){0});
+WRAP_CONFIG(AVMEDIA_TYPE_AUDIO, codec->ch_layouts, AVChannelLayout, 
{0});
 FF_ENABLE_DEPRECATION_WARNINGS
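
A reduced illustration of the point, with hypothetical names (`DemoRational`, `DEMO_WRAP_TERMINATOR`) standing in for the real types and macro: an object with static storage duration needs a constant initializer, and some compilers do not treat a compound literal such as `(DemoRational){0}` as one, whereas a plain brace initializer is always fine.

```c
/* Hypothetical stand-in for AVRational. */
typedef struct DemoRational {
    int num, den;
} DemoRational;

/* Reduced form of the macro: the type is passed explicitly instead of
 * being derived with __typeof__. */
#define DEMO_WRAP_TERMINATOR(field_type, terminator, out) \
    do {                                                  \
        static const field_type end = terminator;         \
        (out) = &end;                                     \
    } while (0)

static const DemoRational *demo_framerate_terminator(void)
{
    const DemoRational *p;

    /* OK everywhere: expands to "static const DemoRational end = {0};".
     * Passing (DemoRational){0} instead can fail on some compilers with
     * "initializer element is not constant". */
    DEMO_WRAP_TERMINATOR(DemoRational, {0}, p);
    return p;
}
```

Note that this only works as a macro argument because `{0}` contains no comma; an initializer like `{0, 1}` would be split into two arguments.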

// Martin



Re: [FFmpeg-devel] [PATCH] avcodec/avcodec: remove usage of __typeof__()

2024-09-08 Thread Martin Storsjö

On Sun, 8 Sep 2024, James Almer wrote:


It's non-standard C.

Signed-off-by: James Almer 
---
libavcodec/avcodec.c | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)


LGTM

// Martin



Re: [FFmpeg-devel] [PATCH] swscale/aarch64/rgb2rgb: add deinterleaveBytes neon implementation

2024-08-30 Thread Martin Storsjö

On Fri, 30 Aug 2024, Ramiro Polla wrote:


 A55   A76
deinterleave_bytes_c: 70342.0   34497.5
deinterleave_bytes_neon:  21594.5 ( 3.26x)   5535.2 ( 6.23x)
deinterleave_bytes_aligned_c: 71340.8   34651.2
deinterleave_bytes_aligned_neon:   8616.8 ( 8.28x)   3996.2 ( 8.67x)
---
libswscale/aarch64/rgb2rgb.c  |  4 ++
libswscale/aarch64/rgb2rgb_neon.S | 59 +++
tests/checkasm/sw_rgb.c   | 77 +++
3 files changed, 140 insertions(+)


LGTM

// Martin



Re: [FFmpeg-devel] [PATCH 2/2] aarch64/vvc: Bind h26x/sao filter implementation to vvc

2024-08-29 Thread Martin Storsjö

On Wed, 28 Aug 2024, Zhao Zhili wrote:


From: Zhao Zhili 

---
libavcodec/aarch64/h26x/dsp.h |  6 +++-
libavcodec/aarch64/h26x/sao_neon.S| 44 +--
libavcodec/aarch64/hevcdsp_init_aarch64.c |  2 +-
libavcodec/aarch64/vvc/Makefile   |  5 +--
libavcodec/aarch64/vvc/dsp_init.c |  6 
5 files changed, 48 insertions(+), 15 deletions(-)


These two patches look reasonable to me.

// Martin



Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64/rgb2rgb: add neon implementation for rgb24toyv12

2024-08-29 Thread Martin Storsjö

On Thu, 29 Aug 2024, Ramiro Polla wrote:


On Wed, Aug 28, 2024 at 11:23 PM Martin Storsjö  wrote:

On Wed, 28 Aug 2024, Ramiro Polla wrote:



+2:
+// load first line
+ld3 {v16.8b, v17.8b, v18.8b}, [x0], #24
+ld3 {v19.8b, v20.8b, v21.8b}, [x0], #24


Hmm, can't we do just one single ld3 with .16b registers, instead of two
separate ones?

If you want to keep the same register layout as now, load into v19-v21,
then do "uxtl v16.8h, v19.8b; uxtl2 v19.8h, v19.16b".
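
To make the suggestion concrete, here is a scalar C model (not NEON code) of one 16-lane `ld3` followed by `uxtl`/`uxtl2`. The `demo_*` helpers are hypothetical; they only mirror the lane semantics of the instructions, assuming the B, G, R byte order indicated by the comments in the patch.

```c
#include <stdint.h>

/* Model of "ld3 {vB.16b, vG.16b, vR.16b}, [src]": de-interleave 48 bytes
 * into three 16-lane vectors. */
static void demo_ld3_16b(const uint8_t *src,
                         uint8_t b[16], uint8_t g[16], uint8_t r[16])
{
    for (int i = 0; i < 16; i++) {
        b[i] = src[3 * i + 0];
        g[i] = src[3 * i + 1];
        r[i] = src[3 * i + 2];
    }
}

/* Model of "uxtl vd.8h, vn.8b": zero-extend lanes 0..7 to 16 bits. */
static void demo_uxtl(uint16_t dst[8], const uint8_t src[16])
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[i];
}

/* Model of "uxtl2 vd.8h, vn.16b": zero-extend lanes 8..15 to 16 bits. */
static void demo_uxtl2(uint16_t dst[8], const uint8_t src[16])
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[i + 8];
}
```

The low half then holds what the first 8-lane `ld3` used to produce and the high half what the second one produced. In the real code the order matters: `uxtl` into v16 has to come before `uxtl2` overwrites v19 in place.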


Thanks, that made it faster.


+uxtlv16.8h, v16.8b  // v16 = B11
+uxtlv17.8h, v17.8b  // v17 = G11
+uxtlv18.8h, v18.8b  // v18 = R11
+uxtlv19.8h, v19.8b  // v19 = B12
+uxtlv20.8h, v20.8b  // v20 = G12
+uxtlv21.8h, v21.8b  // v21 = R12
+
+// calculate Y values for first line
+rgbconv16   v24, v16, v17, v18, BY, GY, RY // v24 = Y11
+rgbconv16   v25, v19, v20, v21, BY, GY, RY // v25 = Y12
+
+// pairwise add and save rgb values to calculate average
+addpv5.8h, v16.8h, v19.8h
+addpv6.8h, v17.8h, v20.8h
+addpv7.8h, v18.8h, v21.8h
+
+// load second line
+ld3 {v16.8b, v17.8b, v18.8b}, [x10], #24
+ld3 {v19.8b, v20.8b, v21.8b}, [x10], #24


It's a shame we can't start this load earlier. But as essentially
everything depends on the input as it is, in v16-v21, we'd pretty much
need to use different registers here in order to do that.

If you wanted to, you could try loading earlier, into different registers
(I think v26-v31 are free at this point?), while then doing the uxtl into
the same registers as before, which shouldn't require any further changes.


Thanks, that also led to a small improvement.

New patch attached.


The new version LGTM, thanks!

// Martin


Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64/rgb2rgb: add neon implementation for rgb24toyv12

2024-08-28 Thread Martin Storsjö

On Wed, 28 Aug 2024, Ramiro Polla wrote:


                          A55                 A76
rgb24toyv12_16_200_c:     36658.8             17319.2
rgb24toyv12_16_200_neon:  12765.8 ( 2.87x)     6036.0 ( 2.87x)
rgb24toyv12_128_60_c:     83329.5             39901.2
rgb24toyv12_128_60_neon:  28059.8 ( 2.97x)    14288.2 ( 2.79x)
rgb24toyv12_512_16_c:     87874.5             42339.0
rgb24toyv12_512_16_neon:  29673.5 ( 2.96x)    15219.0 ( 2.78x)
rgb24toyv12_1920_4_c:     82323.5             39672.8
rgb24toyv12_1920_4_neon:  27627.5 ( 2.98x)    14267.5 ( 2.78x)
---
libswscale/aarch64/rgb2rgb.c  |   4 +
libswscale/aarch64/rgb2rgb_neon.S | 158 ++
2 files changed, 162 insertions(+)

diff --git a/libswscale/aarch64/rgb2rgb.c b/libswscale/aarch64/rgb2rgb.c
index a9bf6ff9e0..c557cf871c 100644
--- a/libswscale/aarch64/rgb2rgb.c
+++ b/libswscale/aarch64/rgb2rgb.c
@@ -27,6 +27,9 @@
#include "libswscale/swscale.h"
#include "libswscale/swscale_internal.h"

+void ff_rgb24toyv12_neon(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+ uint8_t *vdst, int width, int height, int lumStride,
+ int chromStride, int srcStride, int32_t *rgb2yuv);
void ff_interleave_bytes_neon(const uint8_t *src1, const uint8_t *src2,
  uint8_t *dest, int width, int height,
  int src1Stride, int src2Stride, int dstStride);
@@ -36,6 +39,7 @@ av_cold void rgb2rgb_init_aarch64(void)
int cpu_flags = av_get_cpu_flags();

if (have_neon(cpu_flags)) {
+ff_rgb24toyv12  = ff_rgb24toyv12_neon;
interleaveBytes = ff_interleave_bytes_neon;
}
}
diff --git a/libswscale/aarch64/rgb2rgb_neon.S b/libswscale/aarch64/rgb2rgb_neon.S
index d81110ec57..23059320b2 100644
--- a/libswscale/aarch64/rgb2rgb_neon.S
+++ b/libswscale/aarch64/rgb2rgb_neon.S
@@ -1,5 +1,6 @@
/*
 * Copyright (c) 2020 Martin Storsjo
+ * Copyright (c) 2024 Ramiro Polla
 *
 * This file is part of FFmpeg.
 *
@@ -20,6 +21,163 @@

#include "libavutil/aarch64/asm.S"

+#define RGB2YUV_COEFFS 16*4+16*32
+#define BY v0.h[0]
+#define GY v0.h[1]
+#define RY v0.h[2]
+#define BU v1.h[0]
+#define GU v1.h[1]
+#define RU v1.h[2]
+#define BV v2.h[0]
+#define GV v2.h[1]
+#define RV v2.h[2]
+#define Y_OFFSET  v22
+#define UV_OFFSET v23
+
+// convert rgb to 16-bit y, u, or v
+// uses v3 and v4
+.macro rgbconv16 dst, b, g, r, bc, gc, rc
+smull   v3.4s, \b\().4h, \bc
+smlal   v3.4s, \g\().4h, \gc
+smlal   v3.4s, \r\().4h, \rc
+smull2  v4.4s, \b\().8h, \bc
+smlal2  v4.4s, \g\().8h, \gc
+smlal2  v4.4s, \r\().8h, \rc    // v3:v4 = b * bc + g * gc + r * rc (32-bit)
+shrn    \dst\().4h, v3.4s, #7
+shrn2   \dst\().8h, v4.4s, #7   // dst = b * bc + g * gc + r * rc (16-bit)
+.endm
+
+// void ff_rgb24toyv12_neon(const uint8_t *src, uint8_t *ydst, uint8_t *udst,
+//  uint8_t *vdst, int width, int height, int lumStride,
+//  int chromStride, int srcStride, int32_t *rgb2yuv);
+function ff_rgb24toyv12_neon, export=1
+// x0  const uint8_t *src
+// x1  uint8_t *ydst
+// x2  uint8_t *udst
+// x3  uint8_t *vdst
+// w4  int width
+// w5  int height
+// w6  int lumStride
+// w7  int chromStride
+ldrsw   x14, [sp]
+ldr x15, [sp, #8]
+// x14 int srcStride
+// x15 int32_t *rgb2yuv
+
+// extend width and stride parameters
+uxtw    x4, w4
+sxtw    x6, w6
+sxtw    x7, w7


Just for the record: Yes, we could avoid these sxtw/uxtw instructions by
folding them into the uses of w4/w6/w7 below, like "add ..., w6, sxtw".
However, ALU instructions with a register-extending operand perform worse
than operations on the full register - that's why we prefer the explicit
instructions here.



+
+// src1 = x0
+// src2 = x10
+add x10, x0,  x14           // x10 = src + srcStride
+lsl x14, x14, #1            // srcStride *= 2
+add x11, x4,  x4, lsl #1    // x11 = 3 * width
+sub x14, x14, x11           // srcPadding = (2 * srcStride) - (3 * width)
+
+// ydst1 = x1
+// ydst2 = x11
+add x11, x1,  x6            // x11 = ydst + lumStride
+lsl x6,  x6,  #1            // lumStride *= 2
+sub x6,  x6,  x4            // lumPadding = (2 * lumStride) - width
+
+sub x7,  x7,  x4, lsr #1    // chromPadding = chromStride - (width / 2)
+
+// load rgb2yuv coefficients into v0, v1, and v2
+add x15, x15, #RGB2YUV_COEFFS
+ld1 {v0.8h-v2.8h}, [x15]    // load 24 values
+
+// load offset constants
+movi    Y_OFFSET.8h,  #0x10, lsl #8
+movi    UV_OFFSET.8h, #0x80, lsl 

Re: [FFmpeg-devel] [PATCH 1/4] checkasm/sw_rgb: add rgb24toyv12 tests

2024-08-28 Thread Martin Storsjö

On Wed, 28 Aug 2024, Ramiro Polla wrote:


NOTE: currently the tests for rgb24toyv12 fail for x86 since the c and
 mmxext implementations differ (the mmxext version averages four
 rgb pixels before performing the chroma calculations).
---
tests/checkasm/sw_rgb.c | 89 +
1 file changed, 89 insertions(+)


Would it be better for bisectability, if you'd swap the order of patches 
1 and 2?


// Martin



Re: [FFmpeg-devel] [PATCH] avutil/aarch64: add AV_COPY128 and AV_ZERO128 macros

2024-08-22 Thread Martin Storsjö

On Thu, 22 Aug 2024, Ramiro Polla wrote:


---
libavutil/aarch64/intreadwrite.h | 42 
libavutil/intreadwrite.h |  4 ++-
2 files changed, 45 insertions(+), 1 deletion(-)
create mode 100644 libavutil/aarch64/intreadwrite.h

diff --git a/libavutil/aarch64/intreadwrite.h b/libavutil/aarch64/intreadwrite.h
new file mode 100644
index 00..4ce2d64987
--- /dev/null
+++ b/libavutil/aarch64/intreadwrite.h
@@ -0,0 +1,42 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVUTIL_AARCH64_INTREADWRITE_H
+#define AVUTIL_AARCH64_INTREADWRITE_H
+
+#if HAVE_INTRINSICS_NEON
+
+#include 
+
+#define AV_COPY128 AV_COPY128
+static av_always_inline void AV_COPY128(void *d, const void *s)
+{
+uint8x16_t tmp = vld1q_u8((const uint8_t *)s);
+vst1q_u8((uint8_t *)d, tmp);
+}
+
+#define AV_ZERO128 AV_ZERO128
+static av_always_inline void AV_ZERO128(void *d)
+{
+uint8x16_t zero = vdupq_n_u8(0);
+vst1q_u8((uint8_t *)d, zero);
+}
+
+#endif /* HAVE_INTRINSICS_NEON */
+
+#endif /* AVUTIL_AARCH64_INTREADWRITE_H */
diff --git a/libavutil/intreadwrite.h b/libavutil/intreadwrite.h
index 120bdbc8f0..ffd15a1502 100644
--- a/libavutil/intreadwrite.h
+++ b/libavutil/intreadwrite.h
@@ -64,7 +64,9 @@ typedef union {

#include "config.h"

-#if ARCH_MIPS
+#if ARCH_AARCH64
+#   include "aarch64/intreadwrite.h"
+#elif ARCH_MIPS
#   include "mips/intreadwrite.h"
#elif ARCH_PPC
#   include "ppc/intreadwrite.h"
--
2.39.2


LGTM, this seems like a valid use case for intrinsics.
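
As a point of comparison, the behaviour these two macros are expected to have can be written in portable C. This is only a reference sketch under the assumption that both operate on exactly 16 bytes with no alignment requirement; copy128_ref and zero128_ref are made-up names, not FFmpeg API, and this is not the generic fallback from libavutil/intreadwrite.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference for AV_COPY128: copy 16 bytes from s to d. */
static inline void copy128_ref(void *d, const void *s)
{
    assert(d && s);
    memcpy(d, s, 16);
}

/* Hypothetical reference for AV_ZERO128: clear 16 bytes at d. */
static inline void zero128_ref(void *d)
{
    assert(d);
    memset(d, 0, 16);
}
```

A checkasm-style comparison could run these next to the NEON versions on the same buffers and compare the 16 output bytes.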

// Martin



Re: [FFmpeg-devel] [PATCH v2 3/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1

2024-08-21 Thread Martin Storsjö

On Wed, 21 Aug 2024, Ramiro Polla wrote:


BTW, this instruction is kinda exotic and the docs aren't super clear, so
it'd be good to test manually that it really does what we want, for
negative numbers and numbers close to the ends of the value range; I
didn't do that manually yet.


I prefer just sticking to sxtw + lsl then. When we move to ptrdiff_t
the sxtw will be gone anyway.


This sounds like a very reasonable choice indeed, especially if it's 
somewhat plausible that we'll get rid of it at some point in the future.



+movi    v0.16b, #0
+mov w3, #16
+
+1:
+ld1 {v1.16b}, [x0], x1
+ld1 {v2.16b}, [x2], x1
+subs    w3, w3, #2
+uadalp  v0.8h, v1.16b
+uadalp  v0.8h, v2.16b
+b.ne    1b
+
+uaddlv  s0, v0.8h
+fmov    w0, s0
+
+ret
+endfunc
+
+function ff_pix_norm1_neon, export=1
+// x0  const uint8_t *pix
+// x1  int line_size
+
+sxtw    x1, w1
+movi    v4.16b, #0
+movi    v5.16b, #0
+mov w2, #16
+
+1:
+ld1 {v1.16b}, [x0], x1
+subs    w2, w2, #1
+umull   v2.8h, v1.8b,  v1.8b
+umull2  v3.8h, v1.16b, v1.16b
+uadalp  v4.4s, v2.8h
+uadalp  v5.4s, v3.8h


From my earlier testing on A53, it seemed (surprisingly) to be equally
fast to accumulate into the same register for both instructions - but I
only tested that on A53. So we could change that here, getting rid of the
add at the end (and one movi). Or if it does help on some other core,
perhaps we should do the same for the function above too?


Indeed, it is equally fast to accumulate into the same register on the
A55 and A76 as well.
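
Whichever register allocation is used, the result must stay the same. A scalar sketch of pix_norm1 (sum of squares over a 16x16 block) with one accumulator and with two, merged at the end, makes that explicit. The function names are made up for illustration and ptrdiff_t is used for the stride for simplicity; this models the arithmetic only, not the timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: sum of squares of a 16x16 block. */
static int pix_norm1_ref(const uint8_t *pix, ptrdiff_t line_size)
{
    uint32_t sum = 0;

    assert(pix);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += (uint32_t)pix[x] * pix[x];
        pix += line_size;
    }
    return (int)sum;   /* at most 256 * 255 * 255, fits in an int */
}

/* Two accumulators (low and high 8 bytes of each row), added at the end,
 * like the v4/v5 variant of the loop. */
static int pix_norm1_split(const uint8_t *pix, ptrdiff_t line_size)
{
    uint32_t lo = 0, hi = 0;

    assert(pix);
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 8; x++)
            lo += (uint32_t)pix[x] * pix[x];
        for (int x = 8; x < 16; x++)
            hi += (uint32_t)pix[x] * pix[x];
        pix += line_size;
    }
    return (int)(lo + hi);
}
```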

New patches attached (patch 3/7 has functional changes, but patch 4/7
only changes the commit message to reflect the new test run).


LGTM very much now, thanks! And thanks for your patience through all the 
iterations on such trivial patches as these.


// Martin



Re: [FFmpeg-devel] [PATCH v2 3/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1

2024-08-21 Thread Martin Storsjö

On Wed, 21 Aug 2024, Ramiro Polla wrote:


                 A55               A76
pix_norm1_c:     484.3             235.2
pix_norm1_neon:  193.8 ( 2.50x)     44.7 ( 5.26x)
pix_sum_c:       302.8             243.7
pix_sum_neon:     81.6 ( 3.71x)     26.0 ( 9.37x)
---
libavcodec/aarch64/Makefile   |  2 +
libavcodec/aarch64/mpegvideoencdsp_init.c | 39 +
libavcodec/aarch64/mpegvideoencdsp_neon.S | 69 +++
libavcodec/mpegvideoencdsp.c  |  4 +-
libavcodec/mpegvideoencdsp.h  |  2 +
5 files changed, 115 insertions(+), 1 deletion(-)
create mode 100644 libavcodec/aarch64/mpegvideoencdsp_init.c
create mode 100644 libavcodec/aarch64/mpegvideoencdsp_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index a3256bb1cc..de0653ebbc 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -10,6 +10,7 @@ OBJS-$(CONFIG_HPELDSP)  += aarch64/hpeldsp_init_aarch64.o
OBJS-$(CONFIG_IDCTDSP)  += aarch64/idctdsp_init_aarch64.o
OBJS-$(CONFIG_ME_CMP)   += aarch64/me_cmp_init_aarch64.o
OBJS-$(CONFIG_MPEGAUDIODSP) += aarch64/mpegaudiodsp_init.o
+OBJS-$(CONFIG_MPEGVIDEOENC) += aarch64/mpegvideoencdsp_init.o
OBJS-$(CONFIG_NEON_CLOBBER_TEST)+= aarch64/neontest.o
OBJS-$(CONFIG_PIXBLOCKDSP)  += aarch64/pixblockdsp_init_aarch64.o
OBJS-$(CONFIG_VIDEODSP) += aarch64/videodsp_init.o
@@ -51,6 +52,7 @@ NEON-OBJS-$(CONFIG_IDCTDSP) += aarch64/idctdsp_neon.o  \
   aarch64/simple_idct_neon.o
NEON-OBJS-$(CONFIG_ME_CMP)  += aarch64/me_cmp_neon.o
NEON-OBJS-$(CONFIG_MPEGAUDIODSP)+= aarch64/mpegaudiodsp_neon.o
+NEON-OBJS-$(CONFIG_MPEGVIDEOENC)+= aarch64/mpegvideoencdsp_neon.o
NEON-OBJS-$(CONFIG_PIXBLOCKDSP) += aarch64/pixblockdsp_neon.o
NEON-OBJS-$(CONFIG_VC1DSP)  += aarch64/vc1dsp_neon.o
NEON-OBJS-$(CONFIG_VP8DSP)  += aarch64/vp8dsp_neon.o
diff --git a/libavcodec/aarch64/mpegvideoencdsp_init.c b/libavcodec/aarch64/mpegvideoencdsp_init.c
new file mode 100644
index 00..7eb632ed1b
--- /dev/null
+++ b/libavcodec/aarch64/mpegvideoencdsp_init.c
@@ -0,0 +1,39 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include 
+#include 
+
+#include "libavutil/attributes.h"
+#include "libavutil/aarch64/cpu.h"
+#include "libavcodec/mpegvideoencdsp.h"
+#include "config.h"
+
+int ff_pix_sum16_neon(const uint8_t *pix, int line_size);
+int ff_pix_norm1_neon(const uint8_t *pix, int line_size);
+
+av_cold void ff_mpegvideoencdsp_init_aarch64(MpegvideoEncDSPContext *c,
+ AVCodecContext *avctx)
+{
+int cpu_flags = av_get_cpu_flags();
+
+if (have_neon(cpu_flags)) {
+c->pix_sum   = ff_pix_sum16_neon;
+c->pix_norm1 = ff_pix_norm1_neon;
+}
+}
diff --git a/libavcodec/aarch64/mpegvideoencdsp_neon.S b/libavcodec/aarch64/mpegvideoencdsp_neon.S
new file mode 100644
index 00..6e7a9319ba
--- /dev/null
+++ b/libavcodec/aarch64/mpegvideoencdsp_neon.S
@@ -0,0 +1,69 @@
+/*
+ * Copyright (c) 2024 Ramiro Polla
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+function ff_pix_sum16_neon, export=1
+// x0  const uint8_t *pix
+// x1  int line_size
+
+add x2, x0, w1, sxtw
+sbfiz   x1, x1, #1, #32


BTW, this instruction is kinda exotic and the docs aren't super cle

Re: [FFmpeg-devel] [PATCH] libswscale: arm: Don't assume aligned output in yuv2rgb functions

2024-08-19 Thread Martin Storsjö

On Mon, 19 Aug 2024, Martin Storsjö wrote:


This fixes failures in recently added checkasm tests.

While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.
---
This fixes FATE failures like in
http://fate.ffmpeg.org/report.cgi?time=20240819041749&slot=armv7-linux-gcc-9.
---
libswscale/arm/yuv2rgb_neon.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 10950e70b4..474465427d 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -75,8 +75,8 @@
vzip.8  d7, d11    @ d7 = G1G2G3G4G5G6G7G8 d11 = G9G10G11G12G13G14G15G16
vzip.8  d8, d12    @ d8 = B1B2B3B4B5B6B7B8 d12 = B9B10B11B12B13B14B15B16
vzip.8  d9, d13    @ d9 = A1A2A3A4A5A6A7A8 d13 = A9A10A11A12A13A14A15A16
-vst4.8  {q3, q4}, [\dst,:128]!
-vst4.8  {q5, q6}, [\dst,:128]!
+vst4.8  {q3, q4}, [\dst]!
+vst4.8  {q5, q6}, [\dst]!
.endm

.macro process_1l_internal dst src ofmt
--
2.34.1


OK'd by Jan and Rémi on irc, will push soon.

// Martin


Re: [FFmpeg-devel] Does rtspenc actually support AVFMT_GLOBALHEADER?

2024-08-19 Thread Martin Storsjö

On Mon, 19 Aug 2024, John Cox wrote:


Does rtspenc actually support AVFMT_GLOBALHEADER? It is specified in the
FFOutputFormat flags but I can't see anywhere in the code where
extradata is referenced like it is in other output formats which support
that flag.

I ask because I have an encoder that supports the flag and when set
removes SPS/PPS from the stream and puts them in extradata instead which
I believe is the correct behavior - if it isn't then that is my problem
and I'd appreciate clarification of what is meant to occur. The
transmitted RTSP stream then doesn't contain SPS/PPS.


That's correct, the SPS/PPS gets transmitted in the SDP description, not 
in-band.


// Martin



Re: [FFmpeg-devel] [PATCH 2/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1

2024-08-19 Thread Martin Storsjö

On Mon, 19 Aug 2024, Ramiro Polla wrote:


If the stride is a negative number, the first sxtw does the right thing,
but the "lsl w1, w1, #1" will zero out the upper half of the register.


I'll start adding negative stride tests to checkasm to spot these bugs.


That's probably useful. The other alternative is to transition these cases 
to use ptrdiff_t for the stride, which should be register sized, so most 
of the sign extension issues around strides go away. (We've transitioned 
lots of preexisting DSP interfaces already, so doing that here would just 
be the next logical step. But at times, this may require marginal 
touch-ups to existing assembly, or at least allows getting rid of such 
sign extensions later.)
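
To illustrate why a register-sized stride type is convenient, here is a small C sketch of a block sum that takes a ptrdiff_t stride and therefore handles negative strides without any explicit sign extension. The function sum_block and its parameters are hypothetical and only meant as an illustration; they are not an existing DSP interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum a width x height block of bytes. stride may be negative, in which
 * case pix must point at the row that is highest in memory. */
static int sum_block(const uint8_t *pix, ptrdiff_t stride,
                     int width, int height)
{
    int sum = 0;

    assert(pix && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        /* y is converted to ptrdiff_t, so the product keeps its sign */
        const uint8_t *row = pix + y * stride;
        for (int x = 0; x < width; x++)
            sum += row[x];
    }
    return sum;
}
```

Walking a 4x4 buffer from its first row with stride 4, or from its last row with stride -4, visits the same bytes and gives the same sum.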



With this, I'm down from your 120 cycles on the A53 originally, to 78.7
cycles now. Your fully unrolled version seemed to run in 72 cycles on the
A53, so that's obviously even faster, but I think this kind of tradeoff
might be the sweet spot. What does such a version give you in terms of
real world speed?


This version is around 0.5% slower overall on the A76. Very roughly
these are the total times taken by pix_sum and pix_norm1 with the
different implementations on A76:
c: ~5%
fully unrolled: ~3%
unroll 2: 2.5%
tight loop: 2%


Ok. Given the tradeoff between various different cores (including ones not 
tested here), do you think this version would be a reasonable compromise 
(giving almost ideal results on in-order cores, and not too much slowdown 
on out-of-order cores in this benchmark)?


// Martin



Re: [FFmpeg-devel] [PATCH 2/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1

2024-08-19 Thread Martin Storsjö

On Sun, 18 Aug 2024, Ramiro Polla wrote:


I had tested the real world case on the A76, but not on the A53. I
spent a couple of hours with perf trying to find the source of the
discrepancy but I couldn't find anything conclusive. I need to learn
more about how to test cache misses.


Nah, I guess that's a bit overkill...


I just tested again with the following command:
$ taskset -c 2 ./ffmpeg_g -benchmark -f lavfi -i
"testsrc2=size=1920x1080" -vcodec mpeg4 -q 31 -vframes 100 -f rawvideo
-y /dev/null

The entire test was about 1% faster unrolled on A53, but about 1%
slower unrolled on A76 (I had the Raspberry Pi 5 in mind for these
optimizations, so I preferred choosing the version that was faster on
the A76).



I wonder if there is any way we could check at runtime.


There are indeed often cases where functions could be tuned differently
for older/newer or in-order/out-of-order cores. In most cases, trying to
specialize things is wasteful and overkill though; I'd just suggest going
with a compromise.


(Sometimes, different kinds of tunings can be applied if you use e.g. the 
flag dotprod to differentiate between older and newer cores. But it's 
seldom worth the extra effort to do that.)



Right, so looking at your unrolled case, you've done a full unroll. That's 
probably also a bit overkill.


The in-order cores really hate tight loops where almost everything has a 
sequential dependency on the previous instruction - so the general rule of 
thumb is that you'll want to unroll by a factor of 2, unless the algorithm 
itself has enough complexity that there's two separate dependency chains 
interlinked.
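
The same idea can be shown in plain C: a byte sum unrolled by two, where each half of the iteration feeds its own accumulator so that the two additions do not depend on each other. This is a generic sketch with a made-up function name; whether a compiler or core actually benefits depends on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n bytes using two independent dependency chains. */
static uint32_t sum_bytes_unrolled2(const uint8_t *p, size_t n)
{
    uint32_t acc0 = 0, acc1 = 0;
    size_t i = 0;

    assert(p || n == 0);
    for (; i + 1 < n; i += 2) {
        acc0 += p[i];       /* chain 0 */
        acc1 += p[i + 1];   /* chain 1, independent of chain 0 */
    }
    if (i < n)              /* odd element left over */
        acc0 += p[i];
    return acc0 + acc1;     /* merge the chains once, at the end */
}
```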


Also, from your unrolled version, there's a slight bug in it:


+add x2, x0, w1, sxtw
+lsl w1, w1, #1


If the stride is a negative number, the first sxtw does the right thing, 
but the "lsl w1, w1, #1" will zero out the upper half of the register.


So for that, you'd still need to keep the "sxtw x1, w1" instruction, and 
do the lsl on x1 instead. It is actually possible to merge it into one 
instruction though, with "sbfiz x1, x1, #1, #32", if I read the docs 
right. But that's a much more uncommon instruction...
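
The difference between the three sequences can be modelled in C. The sketch below assumes the usual AArch64 rule that writing a W register clears bits 63:32 of the corresponding X register; the function names are made up and simply mirror the instructions they model.

```c
#include <assert.h>
#include <stdint.h>

/* "lsl w1, w1, #1" followed by a use of x1: the 32-bit result is
 * zero-extended, so a negative stride becomes a large positive offset. */
static uint64_t lsl_w_then_read_x(int32_t stride)
{
    uint32_t w = (uint32_t)stride << 1;   /* 32-bit shift, wraps mod 2^32 */
    return (uint64_t)w;
}

/* "sxtw x1, w1" followed by "lsl x1, x1, #1". */
static int64_t sxtw_then_lsl_x(int32_t stride)
{
    return (int64_t)stride * 2;
}

/* "sbfiz x1, x1, #1, #32": sign-extend the low 32 bits of x and place
 * them at bit 1. The upper 32 input bits are ignored. */
static int64_t sbfiz_lsb1_width32(uint64_t x)
{
    int64_t v = (int64_t)(x & 0xFFFFFFFFu);

    if (v & INT64_C(0x80000000))
        v -= INT64_C(0x100000000);
    return v * 2;
}

/* The two correct sequences must agree for every stride. */
static void check_stride(int32_t stride)
{
    assert(sxtw_then_lsl_x(stride) == sbfiz_lsb1_width32((uint32_t)stride));
}
```

For a stride of -32, the first model yields 0xFFFFFFC0 while the other two yield -64.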


As for optimal performance here - I tried something like this:

movi    v0.16b, #0
add x2, x0, w1, sxtw
sbfiz   x1, x1, #1, #32
mov w3, #16

1:
ld1 {v1.16b}, [x0], x1
ld1 {v2.16b}, [x2], x1
subs    w3, w3, #2
uadalp  v0.8h, v1.16b
uadalp  v0.8h, v2.16b
b.ne    1b

uaddlv  s0, v0.8h
fmov    w0, s0

ret

With this, I'm down from your 120 cycles on the A53 originally, to 78.7 
cycles now. Your fully unrolled version seemed to run in 72 cycles on the 
A53, so that's obviously even faster, but I think this kind of tradeoff 
might be the sweet spot. What does such a version give you in terms of 
real world speed?


In this version, the two back-to-back uadalp instructions accumulating
into the same register may look problematic. I did try using two separate
accumulator registers, accumulating into v0 and v1 separately and only
summing them at the end. That didn't make any difference, so the A53 may
have a special case where two such sequential accumulations into the same
register don't incur the full extra latency. (The A53 does have such a
case for "mla" at least.)
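
For reference, the lane arithmetic of this loop can be modelled in scalar C. The sketch below mirrors uadalp (pairwise add of bytes, accumulated into eight 16-bit lanes) and uaddlv (sum across lanes); the helper names are invented for illustration, and the model says nothing about latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uadalp v0.8h, vN.16b: lane i accumulates bytes 2i and 2i+1. */
static void uadalp_8h_16b(uint16_t acc[8], const uint8_t *src)
{
    for (int i = 0; i < 8; i++)
        acc[i] = (uint16_t)(acc[i] + src[2 * i] + src[2 * i + 1]);
}

/* uaddlv s0, v0.8h: add all eight 16-bit lanes into a 32-bit result. */
static uint32_t uaddlv_s_8h(const uint16_t v[8])
{
    uint32_t sum = 0;

    for (int i = 0; i < 8; i++)
        sum += v[i];
    return sum;
}

/* Model of the 16x16 pix_sum loop. Each lane sees 32 bytes in total,
 * so it holds at most 32 * 255 = 8160 and cannot overflow 16 bits. */
static int pix_sum16_model(const uint8_t *pix, ptrdiff_t line_size)
{
    uint16_t acc[8] = { 0 };

    assert(pix);
    for (int y = 0; y < 16; y++)
        uadalp_8h_16b(acc, pix + y * line_size);
    return (int)uaddlv_s_8h(acc);
}
```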


// Martin



Re: [FFmpeg-devel] [PATCH 1/7] checkasm/mpegvideoencdsp: add pix_sum and pix_norm1

2024-08-19 Thread Martin Storsjö

On Sun, 18 Aug 2024, Ramiro Polla wrote:


---
tests/checkasm/Makefile  |  1 +
tests/checkasm/checkasm.c|  3 ++
tests/checkasm/checkasm.h|  1 +
tests/checkasm/mpegvideoencdsp.c | 77 
4 files changed, 82 insertions(+)
create mode 100644 tests/checkasm/mpegvideoencdsp.c


When adding a new checkasm test like this, make sure to add it to 
FATE_CHECKASM in tests/fate/checkasm.mak too, otherwise it won't get run 
by fate.


// Martin



[FFmpeg-devel] [PATCH] libswscale: arm: Don't assume aligned output in yuv2rgb functions

2024-08-19 Thread Martin Storsjö
This fixes failures in recently added checkasm tests.

While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.
---
This fixes FATE failures like in
http://fate.ffmpeg.org/report.cgi?time=20240819041749&slot=armv7-linux-gcc-9.
---
 libswscale/arm/yuv2rgb_neon.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libswscale/arm/yuv2rgb_neon.S b/libswscale/arm/yuv2rgb_neon.S
index 10950e70b4..474465427d 100644
--- a/libswscale/arm/yuv2rgb_neon.S
+++ b/libswscale/arm/yuv2rgb_neon.S
@@ -75,8 +75,8 @@
 vzip.8  d7, d11    @ d7 = G1G2G3G4G5G6G7G8 d11 = G9G10G11G12G13G14G15G16
 vzip.8  d8, d12    @ d8 = B1B2B3B4B5B6B7B8 d12 = B9B10B11B12B13B14B15B16
 vzip.8  d9, d13    @ d9 = A1A2A3A4A5A6A7A8 d13 = A9A10A11A12A13A14A15A16
-vst4.8  {q3, q4}, [\dst,:128]!
-vst4.8  {q5, q6}, [\dst,:128]!
+vst4.8  {q3, q4}, [\dst]!
+vst4.8  {q5, q6}, [\dst]!
 .endm
 
 .macro process_1l_internal dst src ofmt
-- 
2.34.1



Re: [FFmpeg-devel] [PATCH 3/7] avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1

2024-08-18 Thread Martin Storsjö

On Sun, 18 Aug 2024, Ramiro Polla wrote:


                     A76
pix_norm1_c:         231.5
pix_norm1_neon:       44.2 ( 5.24x)
pix_norm1_dotprod:    20.7 (11.18x)
---
libavcodec/aarch64/mpegvideoencdsp_init.c | 10 
libavcodec/aarch64/mpegvideoencdsp_neon.S | 28 +++
2 files changed, 38 insertions(+)

diff --git a/libavcodec/aarch64/mpegvideoencdsp_init.c b/libavcodec/aarch64/mpegvideoencdsp_init.c
index 7eb632ed1b..d0ce07e178 100644
--- a/libavcodec/aarch64/mpegvideoencdsp_init.c
+++ b/libavcodec/aarch64/mpegvideoencdsp_init.c
@@ -27,6 +27,10 @@
int ff_pix_sum16_neon(const uint8_t *pix, int line_size);
int ff_pix_norm1_neon(const uint8_t *pix, int line_size);

+#if HAVE_DOTPROD
+int ff_pix_norm1_neon_dotprod(const uint8_t *pix, int line_size);
+#endif
+
av_cold void ff_mpegvideoencdsp_init_aarch64(MpegvideoEncDSPContext *c,
 AVCodecContext *avctx)
{
@@ -36,4 +40,10 @@ av_cold void ff_mpegvideoencdsp_init_aarch64(MpegvideoEncDSPContext *c,
c->pix_sum   = ff_pix_sum16_neon;
c->pix_norm1 = ff_pix_norm1_neon;
}
+
+#if HAVE_DOTPROD
+if (have_dotprod(cpu_flags)) {
+c->pix_norm1 = ff_pix_norm1_neon_dotprod;
+}
+#endif
}
diff --git a/libavcodec/aarch64/mpegvideoencdsp_neon.S b/libavcodec/aarch64/mpegvideoencdsp_neon.S
index 89e50e29b3..eccbdd850f 100644
--- a/libavcodec/aarch64/mpegvideoencdsp_neon.S
+++ b/libavcodec/aarch64/mpegvideoencdsp_neon.S
@@ -65,3 +65,31 @@ function ff_pix_norm1_neon, export=1

ret
endfunc
+
+#if HAVE_DOTPROD
+ENABLE_DOTPROD
+
+function ff_pix_norm1_neon_dotprod, export=1
+// x0  const uint8_t *pix
+// x1  int line_size
+
+sxtw    x1, w1
+movi    v0.16b, #0
+mov w2, #16
+
+1:
+ld1 { v1.16b }, [x0], x1
+ld1 { v2.16b }, [x0], x1


Nit, spaces inside of {}


+udot    v0.4s, v1.16b, v1.16b
+subs    w2, w2, #2
+udot    v0.4s, v2.16b, v2.16b
+b.ne    1b
+
+uaddlv  d0, v0.4s
+fmov    w0, s0
+
+ret
+endfunc


This implementation LGTM otherwise
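
As a cross-check of the arithmetic, udot with the same register as both multiplicands can be modelled in scalar C: each 32-bit lane accumulates the dot product of four bytes with themselves, i.e. four squares per lane and row. The helper names are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* udot v0.4s, a.16b, b.16b: lane i accumulates the 4-element dot
 * product of bytes 4i..4i+3 of a and b. */
static void udot_4s_16b(uint32_t acc[4], const uint8_t *a, const uint8_t *b)
{
    for (int lane = 0; lane < 4; lane++)
        for (int k = 0; k < 4; k++)
            acc[lane] += (uint32_t)a[4 * lane + k] * b[4 * lane + k];
}

/* Model of the 16x16 pix_norm1 loop built on udot. Each lane holds at
 * most 64 * 255 * 255, well within 32 bits. */
static int pix_norm1_udot_model(const uint8_t *pix, ptrdiff_t line_size)
{
    uint32_t acc[4] = { 0 };
    uint64_t total = 0;

    assert(pix);
    for (int y = 0; y < 16; y++) {
        const uint8_t *row = pix + y * line_size;
        udot_4s_16b(acc, row, row);
    }
    for (int lane = 0; lane < 4; lane++)   /* uaddlv d0, v0.4s */
        total += acc[lane];
    return (int)(uint32_t)total;           /* fmov w0, s0: low 32 bits */
}
```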

// Martin



Re: [FFmpeg-devel] [PATCH 2/7] avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1

2024-08-18 Thread Martin Storsjö

On Sun, 18 Aug 2024, Ramiro Polla wrote:


                 A53               A76
pix_norm1_c:     519.2             231.5
pix_norm1_neon:  195.0 ( 2.66x)     44.2 ( 5.24x)
pix_sum_c:       344.5             242.2
pix_sum_neon:    119.0 ( 2.89x)     41.7 ( 5.81x)
---


Hmm, those speedups on the A53 look quite small. I guess that's because 
this isn't unrolled at all, as you mention. Especially for A53, I would 
expect unrolling to have a very large effect here. But it sounds weird if 
you say perf indicates that it is slower in real world use. Yes, unrolling 
does make the code use more space and makes the I-cache less efficient, 
but in this case it would only be a difference of like 2 instructions?




diff --git a/libavcodec/aarch64/mpegvideoencdsp_neon.S b/libavcodec/aarch64/mpegvideoencdsp_neon.S
new file mode 100644
index 00..89e50e29b3
--- /dev/null
+++ b/libavcodec/aarch64/mpegvideoencdsp_neon.S
@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2024 Ramiro Polla
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+function ff_pix_sum16_neon, export=1
+// x0  const uint8_t *pix
+// x1  int line_size
+
+sxtw    x1, w1
+movi    v0.16b, #0
+mov w2, #16
+
+1:
+ld1 { v1.16b }, [x0], x1


Nit; we usually don't have these {} written with spaces inside of the 
braces, same below.



+subs    w2, w2, #1
+uadalp  v0.8h, v1.16b
+b.ne    1b
+
+uaddlp  v0.4s, v0.8h
+uaddlv  d0, v0.4s


Couldn't this be aggregated with just one instruction, "uaddlv s0, v0.8h"? 
There's no need to widen it to 64 bit as we're truncating the returned 
value to 32 bit anyway.
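
A scalar sketch shows that the two reductions agree in the low 32 bits. The function names are made up; the first models "uaddlp v0.4s, v0.8h" followed by "uaddlv d0, v0.4s", the second models the single "uaddlv s0, v0.8h".

```c
#include <assert.h>
#include <stdint.h>

/* Two-step reduction: pairwise widen to 4 x 32 bit, then sum to 64 bit. */
static uint64_t reduce_two_step(const uint16_t v[8])
{
    uint32_t pairs[4];
    uint64_t sum = 0;

    assert(v);
    for (int i = 0; i < 4; i++)            /* uaddlp v0.4s, v0.8h */
        pairs[i] = (uint32_t)v[2 * i] + v[2 * i + 1];
    for (int i = 0; i < 4; i++)            /* uaddlv d0, v0.4s */
        sum += pairs[i];
    return sum;
}

/* One-step reduction: sum 8 x 16 bit directly into 32 bits. The total
 * is at most 8 * 65535, so 32 bits are always enough. */
static uint32_t reduce_one_step(const uint16_t v[8])
{
    uint32_t sum = 0;

    assert(v);
    for (int i = 0; i < 8; i++)            /* uaddlv s0, v0.8h */
        sum += v[i];
    return sum;
}
```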


// Martin



Re: [FFmpeg-devel] [TC invoked] [PATCH 2/4] lavc/mpegvideo: use H263DSP dequant function

2024-08-17 Thread Martin Storsjö

On Sat, 6 Jul 2024, Rémi Denis-Courmont wrote:


On Saturday, 6 July 2024, 19:20:33 EEST, Andreas Rheinhardt wrote:

Rémi Denis-Courmont:
> On Saturday, 6 July 2024, 18:23:00 EEST, Andreas Rheinhardt wrote:
>>
>> This adds an indirection. I have asked you to actually benchmark this
>> code (and not only the DSP function you add), but you never did.
>
> I already pointed out previously that this is the way this project does
> DSP code. Certainly it would be nice to hard-code the path when there is
> only one possible. This is often the case on Armv8 notably, and of course
> on platforms without optimisations.
>
> But that's a general problem way beyond the scope of this patchset. We
> always add indirect function calls in this sort of situation, and I don't
> see why I would have duty to benchmark it, so I am going to ignore this.

You have a duty to benchmark it because you add it where it wasn't before.


I don't recall other people benchmarking the indirect branch they've added 
previously for other DSP code. Recent examples include VVC and FLAC. 
Rightfully so, because there is not really an alternative anyway. Even GNU 
IFUNCs and Glibc alternative libraries internally use an indirect branch 
(hidden in PLT/GOT), and FFmpeg can't self-patch at load-time like the Linux 
kernel does, nor can it generate dynamic PLT entries with direct branches.


Also, if an indirect call is unacceptable, then how come the calling code
is itself an indirect call, and one made for abstraction rather than
performance?


Your request is completely arbitrary here. Yes, there is already an indirect 
call close up, and so? I'm not trying to clean MpegEncContext here, only 
trying to add one function to checkasm, RVV and (with James' work) post-MMX 
x86.
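
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of DSP-style dispatch through a function pointer chosen at init time. Everything in it (DemoDSPContext, demo_dsp_init, DEMO_CPU_FLAG_FAST and the two implementations) is hypothetical and only mirrors the general shape of such init functions; it is not FFmpeg code and makes no claim about the cost of the indirect call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CPU_FLAG_FAST 1   /* hypothetical flag, not an FFmpeg constant */

typedef struct DemoDSPContext {
    int (*pix_sum)(const uint8_t *pix, ptrdiff_t stride);
} DemoDSPContext;

/* Plain C reference: sum of a 16x16 block. */
static int demo_pix_sum_c(const uint8_t *pix, ptrdiff_t stride)
{
    int sum = 0;

    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sum += pix[y * stride + x];
    return sum;
}

/* Stand-in for an optimised version (two accumulators per row). */
static int demo_pix_sum_fast(const uint8_t *pix, ptrdiff_t stride)
{
    int s0 = 0, s1 = 0;

    for (int y = 0; y < 16; y++) {
        const uint8_t *row = pix + y * stride;
        for (int x = 0; x < 16; x += 2) {
            s0 += row[x];
            s1 += row[x + 1];
        }
    }
    return s0 + s1;
}

/* Pick the implementation once; callers then go through c->pix_sum. */
static void demo_dsp_init(DemoDSPContext *c, int cpu_flags)
{
    assert(c);
    c->pix_sum = demo_pix_sum_c;
    if (cpu_flags & DEMO_CPU_FLAG_FAST)
        c->pix_sum = demo_pix_sum_fast;
}
```

Every call site pays for one indirect branch, which is the trade-off argued about in this thread.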


Hi,

As discussions on these patchsets didn't seem to progress but seemed to 
get stuck, Rémi requested (admittedly, many many weeks ago) the TC to 
resolve the disputes here.


Therefore - the TC has now started reading up on the earlier arguments 
made on the mailing list, and will now be discussing what the suggested 
way forward will be.


Regards,
The FFmpeg Technical Committee


Re: [FFmpeg-devel] [PATCH] avcodec/aarch64/me_cmp: add dotprod implementations of sse16 and vsse_intra16

2024-08-16 Thread Martin Storsjö

On Thu, 15 Aug 2024, Ramiro Polla wrote:


checkasm --bench for Raspberry Pi 5 Model B Rev 1.0:
sse_0_c: 241.5
sse_0_neon: 37.2
sse_0_dotprod: 22.2
vsse_4_c: 148.7
vsse_4_neon: 31.0
vsse_4_dotprod: 15.7
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  14 +++
libavcodec/aarch64/me_cmp_neon.S | 114 +++
2 files changed, 128 insertions(+)


LGTM, thanks!

// Martin



Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter

2024-08-16 Thread Martin Storsjö

On Thu, 15 Aug 2024, Ramiro Polla wrote:


Thank you for the review. New patch attached.


Thanks - this looks very straightforward and nice now! Just one minor nit 
below:



+add x4, x4, x5, sxtw    // src1 += srcPadding
+add x9, x9, x5, sxtw    // src2 += srcPadding
+add x0, x0, x1, sxtw    // dst1 += dstPadding1
+add x2, x2, x3, sxtw    // dst2 += dstPadding2


Since you're doing sxtw, I would have expected to have the last register 
referenced as wN, not xN. I'd guess that some picky versions of assemblers 
could error out due to this, so it could be good to change that just to be 
safe.


Other than that, this looks extremely straightforward and nice.

// Martin



Re: [FFmpeg-devel] [PATCH] libavcodec/arm/mlpdsp_armv5te: fix label format to work with binutils 2.43

2024-08-16 Thread Martin Storsjö

On Thu, 15 Aug 2024, Sebastian Ramacher wrote:


On 2024-08-13 23:24:06 +0300, Martin Storsjö wrote:

On Fri, 9 Aug 2024, Ross Burton wrote:

> binutils 2.43 has stricter validation for labels[1] and results in errors
> when building ffmpeg for armv5:
> 
> src/libavcodec/arm/mlpdsp_armv5te.S:232: Error: junk at end of line, first unrecognized character is `0'
> 
> Remove the leading zero in the "01" label to resolve this error.
> 
> [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=226749d5a6ff0d5c607d6428d6c81e1e7e7a994b
> 
> Signed-off-by: Ross Burton 

> ---
> libavcodec/arm/mlpdsp_armv5te.S | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)

LGTM, thanks, I pushed this patch!


This patch fixes #11074. Please backport to 7.0.


Done, I backported this to 7.0 and a handful of earlier, seemingly active 
older branches.


// Martin


Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter

2024-08-14 Thread Martin Storsjö

On Fri, 9 Aug 2024, Ramiro Polla wrote:


checkasm --bench for Raspberry Pi 5 Model B Rev 1.0:
nv24_yuv420p_128_c: 423.0
nv24_yuv420p_128_neon: 115.7
nv24_yuv420p_1920_c: 5939.5
nv24_yuv420p_1920_neon: 1339.7
nv42_yuv420p_128_c: 423.2
nv42_yuv420p_128_neon: 115.7
nv42_yuv420p_1920_c: 5907.5
nv42_yuv420p_1920_neon: 1342.5
---
libswscale/aarch64/Makefile|  1 +
libswscale/aarch64/swscale_unscaled.c  | 30 +
libswscale/aarch64/swscale_unscaled_neon.S | 75 ++
3 files changed, 106 insertions(+)
create mode 100644 libswscale/aarch64/swscale_unscaled_neon.S



diff --git a/libswscale/aarch64/swscale_unscaled_neon.S 
b/libswscale/aarch64/swscale_unscaled_neon.S
new file mode 100644
index 00..a206fda41f
--- /dev/null
+++ b/libswscale/aarch64/swscale_unscaled_neon.S
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2024 Ramiro Polla
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+function ff_nv24_to_yuv420p_chroma_neon, export=1
+// x0  uint8_t *dst1
+// x1  int dstStride1
+// x2  uint8_t *dst2
+// x3  int dstStride2
+// x4  const uint8_t *src
+// x5  int srcStride
+// w6  int w
+// w7  int h
+
+uxtw    x1, w1
+uxtw    x3, w3
+uxtw    x5, w5


You can often avoid the explicit uxtw instructions, if you can fold an 
uxtw attribute into the cases where the register is used. (If it's used 
often, it may be slightly more performant to do it upfront like this 
though, but often it can be omitted entirely.) And whenever you do an 
operation with a wN register as destination, the upper half of the 
register gets explicitly cleared, so these also may be avoided that way.



+
+add x9, x4, x5  // x9 = src + srcStride
+lsl w5, w5, #1  // srcStride *= 2
+
+1:
+mov w10, w6 // w10 = w
+mov x11, x4 // x11 = src1 (line 1)
+mov x12, x9 // x12 = src2 (line 2)
+mov x13, x0 // x13 = dst1 (dstU)
+mov x14, x2 // x14 = dst2 (dstV)
+
+2:
+ld2 { v0.16b, v1.16b }, [x11], #32 // v0 = U1, v1 = V1
+ld2 { v2.16b, v3.16b }, [x12], #32 // v2 = U2, v3 = V2
+
+uaddlp  v0.8h, v0.16b   // pairwise add U1 into v0
+uaddlp  v1.8h, v1.16b   // pairwise add V1 into v1
+uadalp  v0.8h, v2.16b   // pairwise add U2, accumulate into v0
+uadalp  v1.8h, v3.16b   // pairwise add V2, accumulate into v1
+
+shrn    v0.8b, v0.8h, #2    // divide by 4
+shrn    v1.8b, v1.8h, #2    // divide by 4
+
+st1 { v0.8b }, [x13], #8    // store U into dst1
+st1 { v1.8b }, [x14], #8    // store V into dst2
+
+subs    w10, w10, #8
+b.gt    2b
+
+// next row
+add x4, x4, x5  // src1 += srcStride * 2
+add x9, x9, x5  // src2 += srcStride * 2
+add x0, x0, x1  // dst1 += dstStride1
+add x2, x2, x3  // dst2 += dstStride2


It's often possible to avoid the extra step of moving the pointers back 
into the x11/x12/x13/x14 registers, if you subtract the width from the 
stride at the start of the function. Then you don't need two separate 
registers for each pointer, and it shortens the dependency chain when moving 
on to the next line.


If the width can be any uneven value, but we in practice write in 
increments of 8 pixels, you may need to align the width up to 8 before 
using it to decrement the stride that way though.
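
As a rough illustration of the stride adjustment suggested above, here is a 
scalar C sketch. It is not the NEON code and not part of the patch. It 
assumes that w and h count output chroma samples and rows, and that all 
buffers are padded so a row can be processed in steps of 8 output samples; 
the function and parameter names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the chroma loop: every output U/V sample is
 * the truncated average of a 2x2 block of interleaved UV input samples. */
static void nv24_to_yuv420p_chroma_sketch(uint8_t *dst_u, ptrdiff_t dst_u_stride,
                                          uint8_t *dst_v, ptrdiff_t dst_v_stride,
                                          const uint8_t *src, ptrdiff_t src_stride,
                                          int w, int h)
{
    /* The inner loop works in units of 8 output samples, so the pointers
     * really advance by the width rounded up to a multiple of 8. */
    const int aligned_w = (w + 7) & ~7;

    /* Turn each stride into a "padding": what is left of one row step after
     * the inner loop has already advanced the pointer. Each output sample
     * consumes 4 source bytes per line, and each output row consumes two
     * source lines. */
    const ptrdiff_t src_padding   = 2 * src_stride - 4 * (ptrdiff_t)aligned_w;
    const ptrdiff_t dst_u_padding = dst_u_stride - aligned_w;
    const ptrdiff_t dst_v_padding = dst_v_stride - aligned_w;

    const uint8_t *src1 = src;              /* first source line of the pair  */
    const uint8_t *src2 = src + src_stride; /* second source line of the pair */

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < aligned_w; x++) {
            int u = src1[0] + src1[2] + src2[0] + src2[2];
            int v = src1[1] + src1[3] + src2[1] + src2[3];
            *dst_u++ = (uint8_t)(u >> 2);
            *dst_v++ = (uint8_t)(v >> 2);
            src1 += 4;
            src2 += 4;
        }
        if (y + 1 == h)
            break; /* do not step the pointers past the end of the buffers */
        /* No per-row copies of the pointers are needed: just add the padding. */
        src1  += src_padding;
        src2  += src_padding;
        dst_u += dst_u_padding;
        dst_v += dst_v_padding;
    }
}
```

The point of the sketch is only the pointer handling: the working pointers 
are never reset from a second set of row pointers, which is what allows the 
assembly to drop the extra mov instructions at the start of each row.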


// Martin



Re: [FFmpeg-devel] [PATCH v2 5/5] swscale/aarch64/yuv2rgb: add neon yuv42{0, 2}p -> gbrp unscaled colorspace converters

2024-08-14 Thread Martin Storsjö

On Tue, 6 Aug 2024, Ramiro Polla wrote:


checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0:
yuv420p_gbrp_128_c: 1243.0
yuv420p_gbrp_128_neon: 453.5
yuv420p_gbrp_1920_c: 18165.5
yuv420p_gbrp_1920_neon: 6700.0
yuv422p_gbrp_128_c: 1463.5
yuv422p_gbrp_128_neon: 471.5
yuv422p_gbrp_1920_c: 21343.7
yuv422p_gbrp_1920_neon: 6743.5
---
libswscale/aarch64/swscale_unscaled.c | 58 +
libswscale/aarch64/yuv2rgb_neon.S | 73 ++-
2 files changed, 118 insertions(+), 13 deletions(-)


This looks reasonable to me, thanks!

// Martin



Re: [FFmpeg-devel] [PATCH] libavcodec/arm/mlpdsp_armv5te: fix label format to work with binutils 2.43

2024-08-13 Thread Martin Storsjö

On Fri, 9 Aug 2024, Ross Burton wrote:


binutils 2.43 has stricter validation for labels[1] and results in errors
when building ffmpeg for armv5:

src/libavcodec/arm/mlpdsp_armv5te.S:232: Error: junk at end of line, first 
unrecognized character is `0'

Remove the leading zero in the "01" label to resolve this error.

[1] 
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=226749d5a6ff0d5c607d6428d6c81e1e7e7a994b

Signed-off-by: Ross Burton 
---
libavcodec/arm/mlpdsp_armv5te.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)


LGTM, thanks, I pushed this patch!

// Martin



Re: [FFmpeg-devel] [FFmpeg-cvslog] avfilter/video: don't zero allocated buffers if memory poisoning is used

2024-08-13 Thread Martin Storsjö

On Tue, 13 Aug 2024, James Almer wrote:


ffmpeg | branch: master | James Almer  | Sat Aug 10 21:31:16 
2024 -0300| [41307ff3e9384c51d646bff7e3dcf0d554098a8f] | committer: James Almer

avfilter/video: don't zero allocated buffers if memory poisoning is used

Same as in avcodec/get_buffer.c
Should help in debugging use of uninitialized memory.

Signed-off-by: James Almer 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=41307ff3e9384c51d646bff7e3dcf0d554098a8f

---

libavfilter/video.c | 12 
1 file changed, 8 insertions(+), 4 deletions(-)


This change broke a bunch of fate tests - in particular fate-vsynth3-rpza 
and most fate-filter-pixfmts-*.


The issue doesn't show up in normal builds of ffmpeg, only when building with 
--enable-memory-poisoning. And in such a build, tools like valgrind 
don't detect the issue right away, as the memory poisoning causes the 
buffers to be deterministically initialized to a nonzero value (but the 
fate test produces the wrong output).


// Martin



Re: [FFmpeg-devel] [GASPP PATCH] Omit the "-c" argument from the preprocessing command

2024-07-29 Thread Martin Storsjö

On Thu, 25 Jul 2024, Martin Storsjö wrote:


A command like "cc -c -E" is tautological; the -c is ignored, when
we explicitly specify that we want to preprocess only.

Since
https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353cf5600e9c
and https://github.com/llvm/llvm-project/pull/98607, Clang now
warns about the unused "-c" argument in this case.

We already did omit the "-c" argument when preprocessing
(with cl.exe) for armasm, but do this for other cases as well.
---
gas-preprocessor.pl | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 19b0131..aa3abc0 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -155,6 +155,8 @@ while ($index < $#preprocess_c_cmd) {
$index++;
}

+@preprocess_c_cmd = grep ! /^-c$/, @preprocess_c_cmd;
+
my $tempfile;
if ($as_type ne "armasm") {
@gcc_cmd = map { /\.[csS]$/ ? qw(-x assembler -) : $_ } @gcc_cmd;
@@ -163,7 +165,6 @@ if ($as_type ne "armasm") {
# Clang warns about unused -D parameters when invoked with "-x assembler".
@gcc_cmd = grep ! /^-D/, @gcc_cmd;
} else {
-@preprocess_c_cmd = grep ! /^-c$/, @preprocess_c_cmd;
@preprocess_c_cmd = grep ! /^-m/, @preprocess_c_cmd;

@preprocess_c_cmd = grep ! /^-G/, @preprocess_c_cmd;
--
2.34.1


Will push this soon.

// Martin


Re: [FFmpeg-devel] aarch64: Implement support for elf_aux_info(3) on FreeBSD and OpenBSD

2024-07-28 Thread Martin Storsjö

On Sun, 28 Jul 2024, Rémi Denis-Courmont wrote:


On 28 July 2024 07:37:51 GMT+03:00, Brad Smith wrote:

On 2024-07-26 7:56 a.m., Rémi Denis-Courmont wrote:


On 26 July 2024 13:58:34 GMT+03:00, Brad Smith wrote:

aarch64: Implement support for elf_aux_info(3) on FreeBSD and OpenBSD

FreeBSD 12.0+, OpenBSD -current and what will be OpenBSD 7.6 support
elf_aux_info(3).

Signed-off-by: Brad Smith 
---
configure   |  2 ++
libavutil/aarch64/cpu.c | 23 ++-
2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index f6f5c29fea..e80b549582 100755
--- a/configure
+++ b/configure
@@ -2366,6 +2366,7 @@ SYSTEM_FUNCS="
 clock_gettime
 closesocket
 CommandLineToArgvW
+elf_aux_info
 fcntl
 getaddrinfo
 getauxval
@@ -6565,6 +6566,7 @@ check_func_headers mach/mach_time.h mach_absolute_time
check_func_headers stdlib.h getenv
check_func_headers sys/stat.h lstat
check_func_headers sys/auxv.h getauxval
+check_func_headers sys/auxv.h elf_aux_info
check_func_headers sys/sysctl.h sysctlbyname

check_func_headers windows.h GetModuleHandle
diff --git a/libavutil/aarch64/cpu.c b/libavutil/aarch64/cpu.c
index cfa9306663..05272b4db4 100644
--- a/libavutil/aarch64/cpu.c
+++ b/libavutil/aarch64/cpu.c
@@ -42,6 +42,27 @@ static int detect_flags(void)
 return flags;
}

+#elif (defined(__FreeBSD__) || defined(__OpenBSD__)) && HAVE_ELF_AUX_INFO
+#include 
+#include 
+
+static int detect_flags(void)
+{
+int flags = 0;
+
+unsigned long hwcap = 0;
+elf_aux_info(AT_HWCAP, &hwcap, sizeof hwcap);
+unsigned long hwcap2 = 0;
+elf_aux_info(AT_HWCAP2, &hwcap2, sizeof hwcap2);
+
+if (hwcap & HWCAP_ASIMDDP)
+flags |= AV_CPU_FLAG_DOTPROD;
+if (hwcap2 & HWCAP2_I8MM)
+flags |= AV_CPU_FLAG_I8MM;
+
+return flags;
+}
+

Can't getauxval() be implemented with elf_aux_info(), or vice versa? It seems 
that otherwise the code should be identical to that from Linux.


QEMU has qemu_getauxval() for example as a wrapper. I will be using this 
elsewhere for arm, ppc and riscv.

I could split this up, but I am not sure where to place such a function.


I don't personally have a strong opinion on the details, I just don't fancy 
unnecessary duplication.


I don't have a very strong opinion on it either, but if it's reasonable to 
make this a static inline function in a header, it could e.g. be in the 
compat directory, or just as a private header in libavutil.
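
For illustration, such a static inline helper could look roughly like the 
sketch below. The name aux_value_sketch is hypothetical, and the HAVE_* 
macros are assumed to come from configure via config.h as in the patch; 
they are defaulted to 0 here only so that the sketch compiles on its own.

```c
/* Normally provided by configure through config.h; defaulted here so the
 * sketch is self-contained. */
#ifndef HAVE_ELF_AUX_INFO
#define HAVE_ELF_AUX_INFO 0
#endif
#ifndef HAVE_GETAUXVAL
#define HAVE_GETAUXVAL 0
#endif

#if HAVE_ELF_AUX_INFO || HAVE_GETAUXVAL
#include <sys/auxv.h>
#endif

/* Hypothetical helper, not an existing FFmpeg function: returns the auxiliary
 * vector entry for `type`, or 0 if it is unavailable. */
static inline unsigned long aux_value_sketch(unsigned long type)
{
#if HAVE_ELF_AUX_INFO
    /* FreeBSD/OpenBSD: elf_aux_info() fills a caller-provided buffer and
     * returns 0 on success. */
    unsigned long value = 0;
    if (elf_aux_info((int)type, &value, (int)sizeof(value)) != 0)
        return 0;
    return value;
#elif HAVE_GETAUXVAL
    /* Linux: getauxval() returns the value directly, or 0 if not found. */
    return getauxval(type);
#else
    (void)type;
    return 0;
#endif
}
```

With something like this in a shared header, the detect_flags() body could 
stay identical across Linux and the BSDs, e.g. 
"unsigned long hwcap = aux_value_sketch(AT_HWCAP);".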


// Martin


[FFmpeg-devel] [GASPP PATCH] Omit the "-c" argument from the preprocessing command

2024-07-25 Thread Martin Storsjö
A command like "cc -c -E" is tautological; the -c is ignored, when
we explicitly specify that we want to preprocess only.

Since
https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353cf5600e9c
and https://github.com/llvm/llvm-project/pull/98607, Clang now
warns about the unused "-c" argument in this case.

We already did omit the "-c" argument when preprocessing
(with cl.exe) for armasm, but do this for other cases as well.
---
 gas-preprocessor.pl | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 19b0131..aa3abc0 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -155,6 +155,8 @@ while ($index < $#preprocess_c_cmd) {
 $index++;
 }
 
+@preprocess_c_cmd = grep ! /^-c$/, @preprocess_c_cmd;
+
 my $tempfile;
 if ($as_type ne "armasm") {
 @gcc_cmd = map { /\.[csS]$/ ? qw(-x assembler -) : $_ } @gcc_cmd;
@@ -163,7 +165,6 @@ if ($as_type ne "armasm") {
 # Clang warns about unused -D parameters when invoked with "-x assembler".
 @gcc_cmd = grep ! /^-D/, @gcc_cmd;
 } else {
-@preprocess_c_cmd = grep ! /^-c$/, @preprocess_c_cmd;
 @preprocess_c_cmd = grep ! /^-m/, @preprocess_c_cmd;
 
 @preprocess_c_cmd = grep ! /^-G/, @preprocess_c_cmd;
-- 
2.34.1



Re: [FFmpeg-devel] [PATCH] checkasm: Increase the tolerance for ac3_sum_square_butterfly_float

2024-07-24 Thread Martin Storsjö

On Tue, 23 Jul 2024, Michael Niedermayer wrote:


On Wed, Jul 24, 2024 at 12:01:32AM +0300, Martin Storsjö via ffmpeg-devel wrote:

Increase the tolerance from 10 ulp to 11 ulp. This fixes occasional
errors for some inputs; the errors could be reproduced on
aarch64/neon builds, with "checkasm --test=ac3dsp 3446175925".
---
 tests/checkasm/ac3dsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


probably ok


Thanks, pushed it now.

// Martin
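
For readers unfamiliar with ulp tolerances, the following is a minimal 
sketch of what "within N ulp" means for two floats. It is not checkasm's 
float_near_ulp_array() implementation; the helper names are made up, 
IEEE 754 binary32 floats are assumed, and NaN inputs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an integer line on which adjacent
 * representable floats differ by exactly 1. */
static int64_t float_ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof(i));
    /* Negative floats are stored as sign + magnitude; mirror them so that
     * the ordering is monotonic and -0.0 maps to the same point as +0.0. */
    return i < 0 ? (int64_t)INT32_MIN - i : (int64_t)i;
}

/* Returns nonzero if a and b are at most max_ulp representable steps apart. */
static int near_ulp_sketch(float a, float b, unsigned max_ulp)
{
    int64_t d = float_ordered_bits(a) - float_ordered_bits(b);
    if (d < 0)
        d = -d;
    return d <= (int64_t)max_ulp;
}
```

Under this definition, raising the limit from 10 to 11 accepts results that 
are one more representable float step away from the reference output.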


[FFmpeg-devel] [PATCH] checkasm: Increase the tolerance for ac3_sum_square_butterfly_float

2024-07-23 Thread Martin Storsjö via ffmpeg-devel
Increase the tolerance from 10 ulp to 11 ulp. This fixes occasional
errors for some inputs; the errors could be reproduced on
aarch64/neon builds, with "checkasm --test=ac3dsp 3446175925".
---
 tests/checkasm/ac3dsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/ac3dsp.c b/tests/checkasm/ac3dsp.c
index 442e965f3b..8c682d03cd 100644
--- a/tests/checkasm/ac3dsp.c
+++ b/tests/checkasm/ac3dsp.c
@@ -181,7 +181,7 @@ static void 
check_ac3_sum_square_butterfly_float(AC3DSPContext *c) {
 call_ref(v1, lt, rt, ELEMS);
 call_new(v2, lt, rt, ELEMS);
 
-if (!float_near_ulp_array(v1, v2, 10, 4))
+if (!float_near_ulp_array(v1, v2, 11, 4))
 fail();
 
 bench_new(v2, lt, rt, ELEMS);
-- 
2.39.3 (Apple Git-146)



Re: [FFmpeg-devel] Add Mediacodec audio decoders support

2024-07-23 Thread Martin Storsjö

On Wed, 10 Jul 2024, Zhao Zhili wrote:




On Jun 12, 2024, at 21:42, Matthieu Bouron  wrote:

Hello,

This patchset adds Mediacodec audio decoders support. Currently, only AAC, AMR,
MP3, FLAC, VORBIS and OPUS are supported.

This is mainly useful to avoid shipping Android builds of FFmpeg that are
subject to licensing/patents (due to AAC and AMR).


I’m not keen on putting OS audio decoder/encoder wrappers into FFmpeg. They 
don’t bring new features, and they don’t improve performance. I know these 
types of wrappers exist in the current project, but I’m not sure it’s a 
good idea to add more.


I don't see a problem with it. It doesn't add much extra code, we already 
have MediaCodec interfacing in place, it allows users to set up whichever 
configuration they want. We have this for other OS codec interfaces as 
well, I don't see a problem with adding this one as well - no need for the 
further arguments about security and sandboxes.


// Martin


[FFmpeg-devel] [PATCH 2/2] aarch64: vvc: Consistently use # for immediate constants

2024-07-23 Thread Martin Storsjö
---
 libavcodec/aarch64/vvc/alf.S | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/aarch64/vvc/alf.S b/libavcodec/aarch64/vvc/alf.S
index 828031cb90..eec193302a 100644
--- a/libavcodec/aarch64/vvc/alf.S
+++ b/libavcodec/aarch64/vvc/alf.S
@@ -95,7 +95,7 @@
 .else
 ldr q5, [x5]// curr
 .endif
-movi    v20.4s, 64
+movi    v20.4s, #64
 cbz is_near_vb, 1f
 shl v20.4s, v20.4s, #3
 1:
@@ -220,7 +220,7 @@
 .else
 ldr d5, [x5]// curr
 .endif
-movi    v20.4s, 64
+movi    v20.4s, #64
 cbz is_near_vb, 1f
 shl v20.4s, v20.4s, #3
 1:
@@ -267,12 +267,12 @@ function ff_alf_filter_luma_kernel_8_neon, export=1
 endfunc
 
 function ff_alf_filter_luma_kernel_12_neon, export=1
-mov w5, 4095
+mov w5, #4095
 b   1f
 endfunc
 
 function ff_alf_filter_luma_kernel_10_neon, export=1
-mov w5, 1023
+mov w5, #1023
 1:
 alf_filter_luma_kernel  2
 endfunc
@@ -282,12 +282,12 @@ function ff_alf_filter_chroma_kernel_8_neon, export=1
 endfunc
 
 function ff_alf_filter_chroma_kernel_12_neon, export=1
-mov w5, 4095
+mov w5, #4095
 b   1f
 endfunc
 
 function ff_alf_filter_chroma_kernel_10_neon, export=1
-mov w5, 1023
+mov w5, #1023
 1:
 alf_filter_chroma_kernel  2
 endfunc
-- 
2.34.1



[FFmpeg-devel] [PATCH 1/2] aarch64: vvc: Fix compilation of alf.S with MSVC 2022 17.7 and older

2024-07-23 Thread Martin Storsjö
Use the "ldur" instruction explicitly, instead of having the
assembler implicitly convert "ldr" instructions to "ldur".

This fixes build errors like these:

libavcodec\aarch64\vvc\alf.o.asm(1023) : error A2518: operand 2: Memory offset 
must be aligned
ldr q22, [x3, #24]
libavcodec\aarch64\vvc\alf.o.asm(1024) : error A2518: operand 2: Memory offset 
must be aligned
ldr q24, [x2, #24]
libavcodec\aarch64\vvc\alf.o.asm(1393) : error A2518: operand 2: Memory offset 
must be aligned
ldr q22, [x3, #24]
libavcodec\aarch64\vvc\alf.o.asm(1394) : error A2518: operand 2: Memory offset 
must be aligned
ldr q24, [x2, #24]
---
 libavcodec/aarch64/vvc/alf.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/aarch64/vvc/alf.S b/libavcodec/aarch64/vvc/alf.S
index beb36ac66b..828031cb90 100644
--- a/libavcodec/aarch64/vvc/alf.S
+++ b/libavcodec/aarch64/vvc/alf.S
@@ -81,8 +81,8 @@
 .endif
 ldr q0, [clip]  // clip
 ldr q1, [filter]// filter
-ldr q22, [clip, #24]    // clip
-ldr q24, [filter, #24]  // filter
+ldur    q22, [clip, #24]    // clip
+ldur    q24, [filter, #24]  // filter
 
 ldr x5, [pp]// x5: p0
 ldr x6, [pp, #(5*8)]// x6: p5
-- 
2.34.1



Re: [FFmpeg-devel] [FFmpeg-cvslog] avcodec/vvc: Add aarch64 neon optimization for ALF

2024-07-23 Thread Martin Storsjö

On Mon, 22 Jul 2024, Nuo Mi wrote:


On Mon, Jul 22, 2024 at 9:15 PM Martin Storsjö  wrote:
  On Mon, 22 Jul 2024, Zhao Zhili wrote:

  > ffmpeg | branch: master | Zhao Zhili  | Tue Jul 16 00:19:15 2024 +0800|
  > [2d4ef304c9e13f5e8abe37c20ddd0f17102c6393] | committer: Nuo Mi
  >
  > avcodec/vvc: Add aarch64 neon optimization for ALF
  >
  > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=2d4ef304c9e13f5e8abe37c20ddd0f17102c6393
  > ---
  >
  > libavcodec/aarch64/vvc/Makefile       |   5 +
  > libavcodec/aarch64/vvc/alf.S          | 293 ++
  > libavcodec/aarch64/vvc/alf_template.c | 157 ++
  > libavcodec/aarch64/vvc/dsp_init.c     |  57 +++
  > libavcodec/vvc/dsp.c                  |   4 +-
  > libavcodec/vvc/dsp.h                  |   1 +
  > 6 files changed, 516 insertions(+), 1 deletion(-)

  I didn't review this patch yet.

  I've been on vacation, and I was hoping to get to reviewing this (and
  other things) soon when I catch up.

  The patch hasn't even been on the mailing list for one single week! For
  areas where earlier reviews have required multiple iterations to get
  patches right, I would hope that we could wait for an actual review.

Hi Martin,
Sorry for this. I will wait longer next time.
Do you prefer to revert the patch temporarily?


No, it doesn't seem to be necessary - the patch seems mostly fine in 
practice; I only have a couple of minor suggestions. I'll send a patch to 
fix those bits.


// Martin


Re: [FFmpeg-devel] [FFmpeg-cvslog] avcodec/vvc: Add aarch64 neon optimization for ALF

2024-07-22 Thread Martin Storsjö

On Mon, 22 Jul 2024, Zhao Zhili wrote:


ffmpeg | branch: master | Zhao Zhili  | Tue Jul 16 
00:19:15 2024 +0800| [2d4ef304c9e13f5e8abe37c20ddd0f17102c6393] | committer: Nuo Mi

avcodec/vvc: Add aarch64 neon optimization for ALF


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=2d4ef304c9e13f5e8abe37c20ddd0f17102c6393

---

libavcodec/aarch64/vvc/Makefile   |   5 +
libavcodec/aarch64/vvc/alf.S  | 293 ++
libavcodec/aarch64/vvc/alf_template.c | 157 ++
libavcodec/aarch64/vvc/dsp_init.c |  57 +++
libavcodec/vvc/dsp.c  |   4 +-
libavcodec/vvc/dsp.h  |   1 +
6 files changed, 516 insertions(+), 1 deletion(-)


I didn't review this patch yet.

I've been on vacation, and I was hoping to get to reviewing this (and 
other things) soon when I catch up.


The patch hasn't even been on the mailing list for one single week! For 
areas where earlier reviews have required multiple iterations to get 
patches right, I would hope that we could wait for an actual review.


// Martin



Re: [FFmpeg-devel] Add Mediacodec audio decoders support

2024-07-19 Thread Martin Storsjö

On Wed, 17 Jul 2024, Anton Khirnov wrote:


Quoting Cosmin Stejerean via ffmpeg-devel (2024-07-16 22:14:19)



> On Jul 16, 2024, at 8:24 PM, Rémi Denis-Courmont  wrote:
> 
> On Tuesday, 16 July 2024, 18:48:06 EEST, Cosmin Stejerean via ffmpeg-devel wrote:
>> To add another data point, the platform decoders might also be more secure
>> due to sandboxing. I believe as of Android Q the software decoders provided
>> by MediaCodec have been moved to run within a constrained sandbox.
> 
> Platform decoders are in all likelihood strictly less secure than software 
> decoders. Software decoders will run in a user-space sandboxed within their 
> respective application. Platform decoders will run in a more privileged system 
> service, with direct access to a kernel driver in EL1, through that to the 
> firmware running on the video DSP.
> 
> More performant and energy-efficient. But also way way less secure.
> 
> The only viewpoint whence this is more secure, is the content publisher's: 
> this model enables DRM with hardware pass-through (but that does not even 
> apply if you use FFmpeg as the front end).
> 


Platform provided *software* decoders should be more secure than bundled 
software decoders due to the sandboxing of software decoders in recent versions 
of Android.


If that is such an important feature to someone then it is not
inconceivable to implement some sort of sandboxing inside avcodec.

I'm not a big fan of the argument "we should provide passthrough to
proprietary decoders because they are more secure".


My 2 cents on this matter: I don't care much about the arguments about 
more secure or less secure here - I don't see that as affecting the 
decision either way.


We generally don't add wrappings of third party proprietary 
encoders/decoders.


But for codecs shipped as part of the OS, I don't see an issue with us 
allowing accessing those codecs. It's not like we're favouring any 
specific third party, we're just facilitating access to whatever is 
already there.


And especially in this case, we already have the general code for 
accessing the MediaCodec API for codecs on Android - I don't have a 
problem with extending this to a few more codecs, as that's not much more 
than just providing mappings to the codec identifiers.


Similarly, we already allow decoding and encoding of a long range of 
codecs via AudioToolbox on Apple platforms - I don't see MediaCodec on 
Android as any different than that.


// Martin


Re: [FFmpeg-devel] [PATCH] x86/intreadwrite: add missing casts to pointer arguments

2024-07-11 Thread Martin Storsjö

On Thu, 11 Jul 2024, James Almer wrote:


Should make strict compilers happy.
Also, make AV_COPY128 use integer operations while at it.

Signed-off-by: James Almer 
---
libavutil/x86/intreadwrite.h | 15 ---
1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
index d916410e14..65cc6b39a1 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -23,32 +23,25 @@

#include 
#include "config.h"
-#if HAVE_INTRINSICS_SSE && defined(__SSE__)
-#include 
-#endif


If we no longer use HAVE_INTRINSICS_SSE, should we remove the 
corresponding check in configure too?


Thanks, this patch seems to avoid the issue discussed in the other thread. 
(I'm not familiar enough with these intrinsics to be able to comment 
meaningfully on the patch itself though.)


// Martin



Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

2024-07-11 Thread Martin Storsjö

On Thu, 11 Jul 2024, James Almer wrote:


On 7/11/2024 10:54 AM, Martin Storsjö wrote:

On Thu, 11 Jul 2024, Martin Storsjö wrote:


On Thu, 11 Jul 2024, James Almer wrote:


On 7/11/2024 10:08 AM, Martin Storsjö wrote:

On Wed, 10 Jul 2024, James Almer wrote:

ffmpeg | branch: master | James Almer  | Wed Jul 
10 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] 
| committer: James Almer


x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.

Signed-off-by: James Almer 






http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690

---

configure    |  5 -
libavutil/x86/intreadwrite.h | 20 +++-
2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index f84fefeaab..7151ed1de3 100755
--- a/configure
+++ b/configure
@@ -2314,6 +2314,7 @@ HEADERS_LIST="

INTRINSICS_LIST="
    intrinsics_neon
+    intrinsics_sse
    intrinsics_sse2
"

@@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
armv8_deps="aarch64"
neon_deps_any="aarch64 arm"
intrinsics_neon_deps="neon"
-intrinsics_sse2_deps="sse2"
+intrinsics_sse_deps="sse"
+intrinsics_sse2_deps="sse2 intrinsics_sse"
vfp_deps="arm"
vfpv3_deps="vfp"
setend_deps="arm"
@@ -6446,6 +6448,7 @@ elif enabled loongarch; then
fi

check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
+check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"


check_ldflags -Wl,--as-needed
diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h

index 9bbef00dba..6546eb016c 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -22,29 +22,25 @@
#define AVUTIL_X86_INTREADWRITE_H

#include 
+#if HAVE_INTRINSICS_SSE
+#include 
+#endif


This change seems to have broken builds for x86 with Clang 16 or 
newer. (Clang 15 and lower seems to be fine.)


See e.g. 
http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile 
for an example of the error. The issue is that a clang internal intrinsics 
header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name 
"A" (which toolchain provided headers shouldn't use). This clashes with 
libavcodec/huffyuv.h, which has a "#define A 3".


This is obviously a Clang intrinsics header bug, but we can't fix 
the existing Clang 16-18 releases that are out there, so I guess 
what we can do is change our "define A" to something more elaborate. 
(IIRC there are some similar issues with names with ncurses and/or 
android headers too.)

// Martin


We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
probably other places, so changing huffyuv.h may not be enough.


That's quite possible, but those cases may be including intreadwrite.h 
before that, so it's possible it might not trigger there.


I'll see how many places need to be changed here.


huffyuvenc.c seems to be the only file that runs into the issue; it 
includes put_bits.h (which brings in libavutil/intreadwrite.h) after 
huffyuv.h.


Can we move the huffyuv.h include right after put_bits.h then? I 
personally prefer that over changing the RGBA defines to work around a 
compiler bug.


That would probably work too.

However your other recent patch, which removes the include of immintrin.h, 
also avoids the issue altogether.


(FWIW, the upstream bug is fixed now in 
https://github.com/llvm/llvm-project/commit/6f04f46927cf54d19cc2a1470f47d5db4b3b96bb.)


// Martin


Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

2024-07-11 Thread Martin Storsjö

On Thu, 11 Jul 2024, Martin Storsjö wrote:


On Thu, 11 Jul 2024, James Almer wrote:


On 7/11/2024 10:08 AM, Martin Storsjö wrote:

On Wed, 10 Jul 2024, James Almer wrote:

ffmpeg | branch: master | James Almer  | Wed Jul 10 13:00:20 
2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | committer: James Almer

x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.

Signed-off-by: James Almer 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690

---

configure    |  5 -
libavutil/x86/intreadwrite.h | 20 +++-
2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index f84fefeaab..7151ed1de3 100755
--- a/configure
+++ b/configure
@@ -2314,6 +2314,7 @@ HEADERS_LIST="

INTRINSICS_LIST="
    intrinsics_neon
+    intrinsics_sse
    intrinsics_sse2
"

@@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
armv8_deps="aarch64"
neon_deps_any="aarch64 arm"
intrinsics_neon_deps="neon"
-intrinsics_sse2_deps="sse2"
+intrinsics_sse_deps="sse"
+intrinsics_sse2_deps="sse2 intrinsics_sse"
vfp_deps="arm"
vfpv3_deps="vfp"
setend_deps="arm"
@@ -6446,6 +6448,7 @@ elif enabled loongarch; then
fi

check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
+check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"

check_ldflags -Wl,--as-needed
diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
index 9bbef00dba..6546eb016c 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -22,29 +22,25 @@
#define AVUTIL_X86_INTREADWRITE_H

#include 
+#if HAVE_INTRINSICS_SSE
+#include 
+#endif


This change seems to have broken builds for x86 with Clang 16 or newer.
(Clang 15 and lower seem to be fine.)


See e.g.
http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile
for an example of the error. The issue is that a clang internal intrinsics
header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name
"A" (which toolchain provided headers shouldn't use). This clashes with
libavcodec/huffyuv.h, which has a "#define A 3".


This is obviously a Clang intrinsics header bug, but we can't fix the
existing Clang 16-18 releases that are out there, so I guess what we can
do is change our "define A" to something more elaborate. (IIRC there are
some similar name clashes with ncurses and/or Android headers too.)


// Martin


We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
probably other places, so changing huffyuv.h may not be enough.


That's quite possible, but those cases may be including intreadwrite.h
before that, so it's possible it might not trigger there.


I'll see how many places need to be changed here.


huffyuvenc.c seems to be the only file that runs into the issue; it 
includes put_bits.h (which brings in libavutil/intreadwrite.h) after 
huffyuv.h.


// Martin


Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

2024-07-11 Thread Martin Storsjö

On Thu, 11 Jul 2024, James Almer wrote:


On 7/11/2024 10:08 AM, Martin Storsjö wrote:

On Wed, 10 Jul 2024, James Almer wrote:

ffmpeg | branch: master | James Almer  | Wed Jul 10 13:00:20 
2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | committer: James Almer

x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.

Signed-off-by: James Almer 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690

---

configure    |  5 -
libavutil/x86/intreadwrite.h | 20 +++-
2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index f84fefeaab..7151ed1de3 100755
--- a/configure
+++ b/configure
@@ -2314,6 +2314,7 @@ HEADERS_LIST="

INTRINSICS_LIST="
    intrinsics_neon
+    intrinsics_sse
    intrinsics_sse2
"

@@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
armv8_deps="aarch64"
neon_deps_any="aarch64 arm"
intrinsics_neon_deps="neon"
-intrinsics_sse2_deps="sse2"
+intrinsics_sse_deps="sse"
+intrinsics_sse2_deps="sse2 intrinsics_sse"
vfp_deps="arm"
vfpv3_deps="vfp"
setend_deps="arm"
@@ -6446,6 +6448,7 @@ elif enabled loongarch; then
fi

check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
+check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"

check_ldflags -Wl,--as-needed
diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
index 9bbef00dba..6546eb016c 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -22,29 +22,25 @@
#define AVUTIL_X86_INTREADWRITE_H

#include 
+#if HAVE_INTRINSICS_SSE
+#include 
+#endif


This change seems to have broken builds for x86 with Clang 16 or newer.
(Clang 15 and lower seem to be fine.)


See e.g.
http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile
for an example of the error. The issue is that a clang internal
intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter
with the name "A" (which toolchain provided headers shouldn't use). This
clashes with libavcodec/huffyuv.h, which has a "#define A 3".


This is obviously a Clang intrinsics header bug, but we can't fix the
existing Clang 16-18 releases that are out there, so I guess what we can
do is change our "define A" to something more elaborate. (IIRC there are
some similar name clashes with ncurses and/or Android headers too.)


// Martin


We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
probably other places, so changing huffyuv.h may not be enough.


That's quite possible, but those cases may be including intreadwrite.h
before that, so it's possible it might not trigger there.


I'll see how many places need to be changed here.

I sent a fix to Clang in https://github.com/llvm/llvm-project/pull/98478.

// Martin


Re: [FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

2024-07-11 Thread Martin Storsjö

On Wed, 10 Jul 2024, James Almer wrote:


ffmpeg | branch: master | James Almer  | Wed Jul 10 13:00:20 
2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | committer: James Almer

x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

This has the benefit of removing any SSE -> AVX penalty that may happen when
the compiler emits VEX encoded instructions.
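
As a rough illustration of what an intrinsics-based 128-bit copy can look like (a sketch, not the code from this commit): copy128_sketch is a made-up name, the 16-byte alignment requirement is an assumption of the sketch, and the memcpy fallback is only there so that it builds on targets without SSE.

```c
#include <string.h>
#if defined(__SSE__)
#include <immintrin.h>
#endif

/* Copy 16 bytes from s to d. Both pointers are assumed to be 16-byte
 * aligned (assumption of this sketch). Hypothetical helper name. */
static inline void copy128_sketch(void *d, const void *s)
{
#if defined(__SSE__)
    /* The compiler picks the instruction encoding, so a build with AVX
     * enabled is free to emit the VEX-encoded form of the move. */
    __m128 tmp = _mm_load_ps((const float *)s);
    _mm_store_ps((float *)d, tmp);
#else
    memcpy(d, s, 16);
#endif
}
```

With inline asm the encoding is fixed by the asm string, whereas here it follows the compiler's target flags, which is the benefit the commit message refers to.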

Signed-off-by: James Almer 


http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690

---

configure|  5 -
libavutil/x86/intreadwrite.h | 20 +++-
2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/configure b/configure
index f84fefeaab..7151ed1de3 100755
--- a/configure
+++ b/configure
@@ -2314,6 +2314,7 @@ HEADERS_LIST="

INTRINSICS_LIST="
intrinsics_neon
+intrinsics_sse
intrinsics_sse2
"

@@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
armv8_deps="aarch64"
neon_deps_any="aarch64 arm"
intrinsics_neon_deps="neon"
-intrinsics_sse2_deps="sse2"
+intrinsics_sse_deps="sse"
+intrinsics_sse2_deps="sse2 intrinsics_sse"
vfp_deps="arm"
vfpv3_deps="vfp"
setend_deps="arm"
@@ -6446,6 +6448,7 @@ elif enabled loongarch; then
fi

check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
+check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"

check_ldflags -Wl,--as-needed
diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
index 9bbef00dba..6546eb016c 100644
--- a/libavutil/x86/intreadwrite.h
+++ b/libavutil/x86/intreadwrite.h
@@ -22,29 +22,25 @@
#define AVUTIL_X86_INTREADWRITE_H

#include 
+#if HAVE_INTRINSICS_SSE
+#include 
+#endif


This change seems to have broken builds for x86 with Clang 16 or newer.
(Clang 15 and lower seem to be fine.)


See e.g.
http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile
for an example of the error. The issue is that a clang internal intrinsics
header contains "_mm_comige_sh(__m128h A,", i.e. a parameter with the name
"A" (which toolchain provided headers shouldn't use). This clashes with
libavcodec/huffyuv.h, which has a "#define A 3".


This is obviously a Clang intrinsics header bug, but we can't fix the
existing Clang 16-18 releases that are out there, so I guess what we can
do is change our "define A" to something more elaborate. (IIRC there are
some similar name clashes with ncurses and/or Android headers too.)
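
One possible shape for "something more elaborate" is sketched below. The names HYUV_CH_A, alpha_of and toolchain_add_one are hypothetical and not taken from any patch; the point is only that a prefixed enumerator cannot rewrite a parameter called "A" in a header that is included later.

```c
/* Prefixed enumerator in place of "#define A 3" (name is hypothetical). */
enum {
    HYUV_CH_A = 3
};

/* Stand-in for a toolchain declaration with a parameter named "A".
 * It compiles even though it comes after the project's definition,
 * because no macro called A exists any more. */
static inline int toolchain_add_one(int A)
{
    return A + 1;
}

/* Example use of the prefixed index on a 4-component pixel. */
static inline unsigned alpha_of(const unsigned char px[4])
{
    return px[HYUV_CH_A];
}
```

The cost is touching every user of the short names, which is the trade-off against simply fixing the include order.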


// Martin



Re: [FFmpeg-devel] [PATCH] tests/fate/filter-audio: convert atempo test to oneoff

2024-07-05 Thread Martin Storsjö

On Fri, 5 Jul 2024, James Almer wrote:


On 7/5/2024 2:38 AM, Anton Khirnov wrote:

Quoting James Almer (2024-07-04 22:45:28)

On 7/4/2024 4:04 PM, Anton Khirnov wrote:

Filter output is not bitexact.
---
Reference file at https://up.khirnov.net/7r.pcm, please put it in
filter-reference/atempo.pcm


How did you create it? x86_32 uses x87 floats which are a lot more
precise than sse ones, for example, so it's best to create a ref file
using such a build.


Does it matter when the result is s16 anyway?


Eh, who knows. Just in case, I generated it on x86_32 with -cpuflags 0, and
uploaded it. Confirm it's fine on your end too, otherwise I'll replace it
with your file.


The sample you uploaded seems to work fine for me, on aarch64 with clang, 
where the test was failing before.


FWIW, re x86_32 and x87 - some compilers default to SSE2 math even for
x86_32 targets these days, so depending on how you build, you may still
get behaviour similar to x86_64.
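
A quick way to see which of the two a given build uses is the standard FLT_EVAL_METHOD macro from <float.h>. The helper below is just a sketch (float_eval_description is a made-up name); the "typical for" remarks in the strings describe the usual mapping on x86 and are not guaranteed for every compiler.

```c
#include <float.h>

/* Describe how this build evaluates floating-point expressions.
 * Hypothetical helper; FLT_EVAL_METHOD itself is standard C99. */
static const char *float_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "each type evaluated in its own precision (typical for SSE2 math)";
    case 1:
        return "float and double evaluated as double";
    case 2:
        return "everything evaluated as long double (typical for x87 math)";
    default:
        return "indeterminate or implementation-defined";
    }
}
```

With GCC the choice is normally controlled by -mfpmath=387 or -mfpmath=sse, so an x86_32 build that reports 0 here is not exercising x87 precision when generating a reference file.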


// Martin


