date:20220904

Re: [FFmpeg-devel] [RFC] d3dva security hw+threads

2022-09-04 Thread Anton Khirnov

Quoting Soft Works (2022-09-04 09:43:36)
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel On Behalf Of Anton Khirnov
> > Sent: Sunday, September 4, 2022 8:58 AM
> > To: FFmpeg development discussions and patches
> > Subject: Re: [FFmpeg-devel] [RFC] d3dva security hw+threads
> > 
> > Quoting Timo Rothenpieler (2022-09-02 01:46:59)
> > > On 02.09.2022 01:32, Michael Niedermayer wrote:
> > > > Hi all
> > > >
> > > > There's a use-after-free issue in H.264 decoding on d3d11va with
> > > > multiple threads.
> > > > I don't have the hardware/platform, nor do I know the hw decoding
> > > > code, so I made no attempt to fix this beyond asking others to ...
> > >
> > > hwaccel with multiple threads being broken is not exactly a
> > > surprise.
> > > So we could just disable that, and always have it be one single
> > > thread?
> > 
> > We are already disabling it in a way - the frame threading code
> > ensures that threads run one at a time when hwaccel is being used.
> 
> 
> Is there a described way to repro? I would try whether it still 
> happens after removing the lock code in hwcontext_d3d11va.c.
> Those locks are not really needed and might prevent release 
> of dx11 resources in proper order. It's a guess only but 
> easy to try.

The problem is not in d3d11 locking code, but in the generic code that
does not have clear enough ownership rules. Steve already tested that my
patch from Friday fixes this.
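
A minimal sketch of the ownership rule the patch enforces, assuming simplified stand-in types: `Worker`, `Shared`, `finish_setup()`, `finish_decode()`, `submit()` and `owners()` are hypothetical names for illustration, not FFmpeg API. The non-refcounted pointer is published to a shared stash, wiped from the worker that is done with it, and then swapped into the next worker, so that between two decodes exactly one place holds it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-thread and shared frame-threading
 * contexts; the real structures carry three separate hwaccel fields. */
typedef struct Worker {
    void *hwaccel_state;   /* non-refcounted state, held while decoding */
} Worker;

typedef struct Shared {
    void *stash;           /* holds the state between two workers */
} Shared;

/* The worker finished its setup phase: publish the state for the next one. */
static void finish_setup(Shared *sh, const Worker *w)
{
    sh->stash = w->hwaccel_state;
}

/* The worker finished decoding: drop its now-stale pointer. */
static void finish_decode(Worker *w)
{
    w->hwaccel_state = NULL;
}

/* The next worker starts: swap the state out of the stash. */
static void submit(Shared *sh, Worker *w)
{
    void *tmp;

    assert(!w->hwaccel_state);   /* it must not already hold state */
    tmp              = w->hwaccel_state;
    w->hwaccel_state = sh->stash;
    sh->stash        = tmp;
}

/* Count how many places currently hold a non-NULL pointer. */
static int owners(const Shared *sh, const Worker *a, const Worker *b)
{
    return (sh->stash != NULL) + (a->hwaccel_state != NULL) +
           (b->hwaccel_state != NULL);
}
```

In this model the pointer is briefly visible in two places between `finish_setup()` and `finish_decode()`; the sketch leaves out the serialization that the real code relies on during that window.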

-- 
Anton Khirnov
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] lavc/pthread_frame: avoid leaving stale hwaccel state in worker threads

2022-09-04 Thread Anton Khirnov

Quoting Steve Lhomme (2022-09-05 07:42:17)
> Hi Anton,
> 
> On 2022-09-02 22:59, Anton Khirnov wrote:
> > This state is not refcounted, so make sure it always has a well-defined
> > owner.
> > ---
> > Steve, could you please test this?
> 
> I can confirm it doesn't leak the context and plays correctly. It also 
> doesn't crash ;)

Awesome, thank you very much for testing.

Will push tomorrow to master and 5.1, if nobody has further comments.

-- 
Anton Khirnov

Re: [FFmpeg-devel] [PATCH] lavc/pthread_frame: avoid leaving stale hwaccel state in worker threads

2022-09-04 Thread Steve Lhomme


Hi Anton,

On 2022-09-02 22:59, Anton Khirnov wrote:

This state is not refcounted, so make sure it always has a well-defined
owner.
---
Steve, could you please test this?


I can confirm it doesn't leak the context and plays correctly. It also 
doesn't crash ;)



---
  libavcodec/pthread_frame.c | 37 -
  1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index 08a6f98898..9b44e2e698 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -148,6 +148,10 @@ typedef struct FrameThreadContext {
  * Set for the first N packets, where N is the number of threads.
  * While it is set, ff_thread_en/decode_frame won't return any results.
  */
+
+const AVHWAccel *stash_hwaccel;
+void *stash_hwaccel_context;
+void *stash_hwaccel_priv;
  } FrameThreadContext;
  
  #if FF_API_THREAD_SAFE_CALLBACKS

@@ -228,9 +232,17 @@ FF_ENABLE_DEPRECATION_WARNINGS
  ff_thread_finish_setup(avctx);
  
  if (p->hwaccel_serializing) {
+/* wipe hwaccel state to avoid stale pointers lying around;
+ * the state was transferred to FrameThreadContext in
+ * ff_thread_finish_setup(), so nothing is leaked */
+avctx->hwaccel = NULL;
+avctx->hwaccel_context = NULL;
+avctx->internal->hwaccel_priv_data = NULL;
+
  p->hwaccel_serializing = 0;
  pthread_mutex_unlock(&p->parent->hwaccel_mutex);
  }
+av_assert0(!avctx->hwaccel);
  
  if (p->async_serializing) {
  p->async_serializing = 0;
@@ -294,9 +306,6 @@ static int update_context_from_thread(AVCodecContext *dst, AVCodecContext *src,
  dst->color_range = src->color_range;
  dst->chroma_sample_location = src->chroma_sample_location;
  
-dst->hwaccel = src->hwaccel;
-dst->hwaccel_context = src->hwaccel_context;
-
  dst->sample_rate= src->sample_rate;
  dst->sample_fmt = src->sample_fmt;
  #if FF_API_OLD_CHANNEL_LAYOUT
@@ -309,8 +318,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
  if (err < 0)
  return err;
  
-dst->internal->hwaccel_priv_data = src->internal->hwaccel_priv_data;
-
  if (!!dst->hw_frames_ctx != !!src->hw_frames_ctx ||
  (dst->hw_frames_ctx && dst->hw_frames_ctx->data != src->hw_frames_ctx->data)) {
  av_buffer_unref(&dst->hw_frames_ctx);
@@ -450,6 +457,12 @@ static int submit_packet(PerThreadContext *p, AVCodecContext *user_avctx,
  pthread_mutex_unlock(&p->mutex);
  return err;
  }
+
+/* transfer hwaccel state stashed from previous thread, if any */
+av_assert0(!p->avctx->hwaccel);
+FFSWAP(const AVHWAccel*, p->avctx->hwaccel, fctx->stash_hwaccel);
+FFSWAP(void*, p->avctx->hwaccel_context, fctx->stash_hwaccel_context);
+FFSWAP(void*, p->avctx->internal->hwaccel_priv_data, fctx->stash_hwaccel_priv);
  }
  
  av_packet_unref(p->avpkt);
@@ -655,6 +668,13 @@ void ff_thread_finish_setup(AVCodecContext *avctx) {
  async_lock(p->parent);
  }
  
+/* save hwaccel state for passing to the next thread;
+ * this is done here so that this worker thread can wipe its own hwaccel
+ * state after decoding, without requiring synchronization */
+p->parent->stash_hwaccel = avctx->hwaccel;
+p->parent->stash_hwaccel_context = avctx->hwaccel_context;
+p->parent->stash_hwaccel_priv = avctx->internal->hwaccel_priv_data;
+
  pthread_mutex_lock(&p->progress_mutex);
  if(atomic_load(&p->state) == STATE_SETUP_FINISHED){
  av_log(avctx, AV_LOG_WARNING, "Multiple ff_thread_finish_setup() calls\n");
@@ -761,6 +781,13 @@ void ff_frame_thread_free(AVCodecContext *avctx, int thread_count)
  av_freep(&fctx->threads);
  ff_pthread_free(fctx, thread_ctx_offsets);
  
+/* if we have stashed hwaccel state, move it to the user-facing context,
+ * so it will be freed in avcodec_close() */
+av_assert0(!avctx->hwaccel);
+FFSWAP(const AVHWAccel*, avctx->hwaccel, fctx->stash_hwaccel);
+FFSWAP(void*, avctx->hwaccel_context, fctx->stash_hwaccel_context);
+FFSWAP(void*, avctx->internal->hwaccel_priv_data, fctx->stash_hwaccel_priv);
+
  av_freep(&avctx->internal->thread_ctx);
  }
  
--
2.35.1



[FFmpeg-devel] [PATCH] avfilter/vf_scale: overwrite the width and height expressions with the original values

2022-09-04 Thread James Almer

Instead of the potentially adjusted ones. Otherwise, if config_props() is
called again and if using force_original_aspect_ratio, the already adjusted
values could be altered again.

Example command line
scale=size=1920x1000:force_original_aspect_ratio=decrease:force_divisible_by=2

user value 1920x1000 -> 1920x798 on init_dict() -> 1918x798 on frame
change when eval_mode == EVAL_MODE_INIT, which after e645a1ddb9 could be at the
very first frame.
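
The double adjustment can be reproduced with a simplified model of the `decrease` plus `force_divisible_by` step. This is a sketch under stated assumptions: the input size of 1920x799 is hypothetical (the message does not give one), the sample aspect ratio is ignored, and `rescale()` and `adjust_decrease()` are illustrative helpers rather than the libavfilter functions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Size { int w, h; } Size;

/* a * b / c rounded to nearest; a simplified stand-in for av_rescale(). */
static int rescale(int a, int b, int c)
{
    return (int)(((int64_t)a * b + c / 2) / c);
}

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Shrink the requested size so it fits the input aspect ratio, then round
 * both dimensions down to a multiple of div. */
static Size adjust_decrease(Size in, Size req, int div)
{
    int tmp_w = rescale(req.h, in.w, in.h);
    int tmp_h = rescale(req.w, in.h, in.w);
    Size out  = { min_int(tmp_w, req.w), min_int(tmp_h, req.h) };

    if (div > 1) {
        out.w = out.w / div * div;
        out.h = out.h / div * div;
    }
    return out;
}
```

With the assumed 1920x799 input, feeding the user value 1920x1000 through `adjust_decrease()` gives 1920x798; feeding that result through it a second time gives 1918x798, because rounding the height down has changed the aspect ratio the second pass tries to preserve. Re-evaluating from the original user values avoids the drift.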

Signed-off-by: James Almer 
---
 libavfilter/vf_scale.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/libavfilter/vf_scale.c b/libavfilter/vf_scale.c
index 996f7aaa5b..2b12cf283c 100644
--- a/libavfilter/vf_scale.c
+++ b/libavfilter/vf_scale.c
@@ -491,19 +491,19 @@ static int config_props(AVFilterLink *outlink)
 if ((ret = scale_eval_dimensions(ctx)) < 0)
 goto fail;
 
-ff_scale_adjust_dimensions(inlink, &scale->w, &scale->h,
+outlink->w = scale->w;
+outlink->h = scale->h;
+
+ff_scale_adjust_dimensions(inlink, &outlink->w, &outlink->h,
scale->force_original_aspect_ratio,
scale->force_divisible_by);
 
-if (scale->w > INT_MAX ||
-scale->h > INT_MAX ||
-(scale->h * inlink->w) > INT_MAX ||
-(scale->w * inlink->h) > INT_MAX)
+if (outlink->w > INT_MAX ||
+outlink->h > INT_MAX ||
+(outlink->h * inlink->w) > INT_MAX ||
+(outlink->w * inlink->h) > INT_MAX)
 av_log(ctx, AV_LOG_ERROR, "Rescaled value for width or height is too big.\n");
 
-outlink->w = scale->w;
-outlink->h = scale->h;
-
 /* TODO: make algorithm configurable */
 
 scale->input_is_pal = desc->flags & AV_PIX_FMT_FLAG_PAL;
@@ -718,9 +718,9 @@ static int scale_frame(AVFilterLink *link, AVFrame *in, AVFrame **frame_out)
 goto scale;
 
 if (scale->eval_mode == EVAL_MODE_INIT) {
-snprintf(buf, sizeof(buf)-1, "%d", outlink->w);
+snprintf(buf, sizeof(buf) - 1, "%d", scale->w);
 av_opt_set(scale, "w", buf, 0);
-snprintf(buf, sizeof(buf)-1, "%d", outlink->h);
+snprintf(buf, sizeof(buf) - 1, "%d", scale->h);
 av_opt_set(scale, "h", buf, 0);
 
 ret = scale_parse_expr(ctx, NULL, &scale->w_pexpr, "width", scale->w_expr);
-- 
2.37.2


Re: [FFmpeg-devel] [PATCH] lavc/qsvenc: use VBR if maxrate is not specified on Windows

2022-09-04 Thread Xiang, Haihao

On Thu, 2022-09-01 at 10:12 +0800, Xiang, Haihao wrote:
> From: Haihao Xiang 
> 
> Currently AVBR is disabled and VBR is the default method if maxrate is
> not specified on Linux, but AVBR is the default one if maxrate is not
> specified on Windows. In order to make the user experience more consistent
> across Linux and Windows, use VBR by default on Windows if maxrate is not
> specified. Users need to set both avbr_accuracy and avbr_convergence to
> non-zero values explicitly and leave maxrate unspecified if AVBR is expected.
> 
> In addition, AVBR works for H264 and HEVC only in the SDK.
> 
> $ ffmpeg.exe -v verbose -f lavfi -i yuvtestsrc -vf "format=nv12" -c:v
> vp9_qsv -f null -
> ---
>  doc/encoders.texi| 45 ++--
>  libavcodec/qsvenc.c  |  5 -
>  libavcodec/qsvenc.h  |  6 --
>  libavcodec/qsvenc_h264.c |  1 +
>  libavcodec/qsvenc_hevc.c |  1 +
>  5 files changed, 39 insertions(+), 19 deletions(-)
> 
> diff --git a/doc/encoders.texi b/doc/encoders.texi
> index d36464d629..d2046e437d 100644
> --- a/doc/encoders.texi
> +++ b/doc/encoders.texi
> @@ -3244,9 +3244,9 @@ the average bitrate.
>  than the average bitrate.
>  
>  @item
> -@var{AVBR} - average VBR mode, when @option{maxrate} is not specified. This
> mode
> -is further configured by the @option{avbr_accuracy} and
> -@option{avbr_convergence} options.
> +@var{AVBR} - average VBR mode, when @option{maxrate} is not specified, both
> +@option{avbr_accuracy} and @option{avbr_convergence} are set to non-zero.
> This
> +mode is available for H264 and HEVC on Windows.
>  @end itemize
>  @end itemize
>  
> @@ -3300,19 +3300,6 @@ Specifies how many asynchronous operations an
> application performs
>  before the application explicitly synchronizes the result. If zero,
>  the value is not specified.
>  
> -@item @var{avbr_accuracy}
> -Accuracy of the AVBR ratecontrol (unit of tenth of percent).
> -
> -@item @var{avbr_convergence}
> -Convergence of the AVBR ratecontrol (unit of 100 frames)
> -
> -The parameters @var{avbr_accuracy} and @var{avbr_convergence} are for the
> -average variable bitrate control (AVBR) algorithm.
> -The algorithm focuses on overall encoding quality while meeting the specified
> -bitrate, @var{target_bitrate}, within the accuracy range @var{avbr_accuracy},
> -after a @var{avbr_Convergence} period. This method does not follow HRD and
> the
> -instant bitrate is not capped or padded.
> -
>  @item @var{preset}
>  This option itemizes a range of choices from veryfast (best speed) to
> veryslow
>  (best quality).
> @@ -3518,6 +3505,19 @@ Provides a hint to encoder about the scenario for the
> encoding session.
>  @item remotegaming
>  @end table
>  
> +@item @var{avbr_accuracy}
> +Accuracy of the AVBR ratecontrol (unit of tenth of percent).
> +
> +@item @var{avbr_convergence}
> +Convergence of the AVBR ratecontrol (unit of 100 frames)
> +
> +The parameters @var{avbr_accuracy} and @var{avbr_convergence} are for the
> +average variable bitrate control (AVBR) algorithm.
> +The algorithm focuses on overall encoding quality while meeting the specified
> +bitrate, @var{target_bitrate}, within the accuracy range @var{avbr_accuracy},
> +after a @var{avbr_Convergence} period. This method does not follow HRD and
> the
> +instant bitrate is not capped or padded.
> +
>  @end table
>  
>  @subsection HEVC Options
> @@ -3681,6 +3681,19 @@ Provides a hint to encoder about the scenario for the
> encoding session.
>  @item remotegaming
>  @end table
>  
> +@item @var{avbr_accuracy}
> +Accuracy of the AVBR ratecontrol (unit of tenth of percent).
> +
> +@item @var{avbr_convergence}
> +Convergence of the AVBR ratecontrol (unit of 100 frames)
> +
> +The parameters @var{avbr_accuracy} and @var{avbr_convergence} are for the
> +average variable bitrate control (AVBR) algorithm.
> +The algorithm focuses on overall encoding quality while meeting the specified
> +bitrate, @var{target_bitrate}, within the accuracy range @var{avbr_accuracy},
> +after a @var{avbr_Convergence} period. This method does not follow HRD and
> the
> +instant bitrate is not capped or padded.
> +
>  @end table
>  
>  @subsection MPEG2 Options
> diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
> index 7ac5390f10..31ff3b76ed 100644
> --- a/libavcodec/qsvenc.c
> +++ b/libavcodec/qsvenc.c
> @@ -479,7 +479,10 @@ static int select_rc_mode(AVCodecContext *avctx,
> QSVEncContext *q)
>  rc_desc = "constant bitrate (CBR)";
>  }
>  #if QSV_HAVE_AVBR
> -else if (!avctx->rc_max_rate) {
> +else if (!avctx->rc_max_rate &&
> + (avctx->codec_id == AV_CODEC_ID_H264 || avctx->codec_id ==
> AV_CODEC_ID_HEVC) &&
> + q->avbr_accuracy &&
> + q->avbr_convergence) {
>  rc_mode = MFX_RATECONTROL_AVBR;
>  rc_desc = "average variable bitrate (AVBR)";
>  }
> diff --git a/libavcodec/qsvenc.h b/libavcodec/qsvenc.h
> index a983651dda..ff859f2a7e 100644
> --- a/libavcodec/qsvenc.h
> +++ b/libavcodec/qsvenc.h
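
The AVBR condition described in the quoted commit message can be sketched as a small standalone function. This is only a model of that one decision, assuming hypothetical `Codec` and `RcMode` enums and a hypothetical `pick_rc_mode()` helper; the other branches of the real `select_rc_mode()`, such as the CBR one visible in the diff, are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enums for illustration; not the FFmpeg or SDK identifiers. */
typedef enum Codec  { CODEC_H264, CODEC_HEVC, CODEC_VP9 } Codec;
typedef enum RcMode { RC_VBR, RC_AVBR } RcMode;

/* AVBR is chosen only when maxrate is unset, the codec is H.264 or HEVC,
 * and both AVBR tuning options are non-zero; otherwise fall back to VBR. */
static RcMode pick_rc_mode(Codec codec, int64_t max_rate,
                           int avbr_accuracy, int avbr_convergence)
{
    int avbr_codec = codec == CODEC_H264 || codec == CODEC_HEVC;

    if (!max_rate && avbr_codec && avbr_accuracy && avbr_convergence)
        return RC_AVBR;
    return RC_VBR;
}
```

Under this model a VP9 encode without maxrate, as in the quoted command line, ends up in VBR regardless of the AVBR options.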

Re: [FFmpeg-devel] [PATCH 1/2] lavu/hwcontext_qsv: add support for AV_PIX_FMT_VUYX

2022-09-04 Thread Xiang, Haihao

On Mon, 2022-08-29 at 14:01 +, Xiang, Haihao wrote:
> On Mon, 2022-08-29 at 08:17 -0300, James Almer wrote:
> > On 8/29/2022 4:27 AM, Xiang, Haihao wrote:
> > > From: Haihao Xiang 
> > > 
> > > AV_PIX_FMT_VUYX is used in FFmpeg for 8bit 4:4:4 content on Intel HW,
> > > and MFX_FOURCC_AYUV is used in the SDK
> > 
> > Sounds like you want the VUYA pixfmt instead.
> 
> AV_PIX_FMT_VUYX is used for 4:4:4 content in the VAAPI path, and
> AV_PIX_FMT_VUYX is expected when creating a VAAPI surface. QSV is based on
> VAAPI under Linux, so AV_PIX_FMT_VUYX is expected in the QSV path too when
> creating a VAAPI surface.
> 
> > 
> > > ---
> > >   libavutil/hwcontext_qsv.c | 8 
> > >   1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
> > > index 510f422562..a3350eae0f 100644
> > > --- a/libavutil/hwcontext_qsv.c
> > > +++ b/libavutil/hwcontext_qsv.c
> > > @@ -119,6 +119,8 @@ static const struct {
> > >  MFX_FOURCC_YUY2 },
> > >   { AV_PIX_FMT_Y210,
> > >  MFX_FOURCC_Y210 },
> > > +{ AV_PIX_FMT_VUYX,
> > > +   MFX_FOURCC_AYUV },
> > >   #endif
> > >   };
> > >   
> > > @@ -1502,6 +1504,12 @@ static int map_frame_to_surface(const AVFrame
> > > *frame,
> > > mfxFrameSurface1 *surface)
> > >   surface->Data.U16 = (mfxU16 *)frame->data[0] + 1;
> > >   surface->Data.V16 = (mfxU16 *)frame->data[0] + 3;
> > >   break;
> > > +case AV_PIX_FMT_VUYX:
> > > +surface->Data.V = frame->data[0];
> > > +surface->Data.U = frame->data[0] + 1;
> > > +surface->Data.Y = frame->data[0] + 2;
> > > +surface->Data.A = frame->data[0] + 3;
> > 
> > This will go wrong with VUYX. You need to use AV_PIX_FMT_VUYA.
> 
> frame->data[0] + 3 is valid even if the alpha channel is ignored in VUYX. Intel HW
> doesn't actually use the data in the alpha channel, but the SDK uses the Microsoft
> pixel format AYUV, which is the alpha version, so a valid address is set for the
> alpha channel when mapping a VUYX AVFrame to an AYUV mfx surface.
> 

Are there any more comments on this patch?

Thanks
Haihao
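
The pointer arithmetic in the quoted hunk can be illustrated with a small standalone sketch. It assumes a packed layout of four bytes per pixel in V, U, Y, X order, as the hunk implies; `PackedPlanes`, `map_vuyx()` and `luma_at()` are hypothetical names, not SDK or FFmpeg API.

```c
#include <assert.h>
#include <stdint.h>

/* Per-component base pointers into one packed buffer. */
typedef struct PackedPlanes {
    uint8_t *v, *u, *y, *a;
} PackedPlanes;

/* Mirror the offsets used in the patch: V at +0, U at +1, Y at +2 and the
 * unused fourth byte at +3, which still gets a valid address. */
static PackedPlanes map_vuyx(uint8_t *data0)
{
    PackedPlanes p = { data0, data0 + 1, data0 + 2, data0 + 3 };
    return p;
}

/* Every component advances by 4 bytes per pixel in the packed layout. */
static uint8_t luma_at(const PackedPlanes *p, int x)
{
    return p->y[4 * x];
}
```

The fourth pointer is never dereferenced for meaningful data in this model; it only has to be a valid address inside the buffer.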


[FFmpeg-devel] [PATCH] avcodec: add a bsf to reorder DTS into PTS

2022-09-04 Thread James Almer

Starting with an h264 implementation. Can be extended to support other codecs.

A few caveats:
- OpenGOP streams are currently not supported. The first packet must be an IDR
  frame.
- In some streams, a few frames at the end may not get a reordered PTS when
  they reference frames past EOS. The code added to derive timestamps from
  previous frames needs to be extended.

Addresses ticket #502.
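
The basic idea of deriving PTS from DTS via picture order count can be sketched as follows. This is a simplified illustration and not the algorithm in the patch, which uses a tree and a packet FIFO: it assumes a single GOP, unique POC values, DTS that increase in decode order, and it ignores the reorder delay. `derive_pts()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* For each packet (given in decode order), count how many packets in the
 * GOP have a smaller POC; that rank is the packet's position in display
 * order, so it receives the DTS found at that position. */
static void derive_pts(const int *poc, const int64_t *dts, int64_t *pts, int n)
{
    for (int i = 0; i < n; i++) {
        int rank = 0;

        for (int j = 0; j < n; j++)
            if (poc[j] < poc[i])
                rank++;
        pts[i] = dts[rank];
    }
}
```

For a decode order of I, P, B, B with POC values 0, 6, 2, 4 and DTS values 0, 1, 2, 3, the P frame is displayed last and gets the largest timestamp. A real filter also has to shift the result so that no PTS precedes its DTS.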

Signed-off-by: James Almer 
---
Changes since v1:
- Properly handle video delay frames when building the tree instead of looking
  for negative dts.
- Increase the amount of packets to buffer by four.
- Keep track of poc reset in the tree by also identifying the GOP each POC came
  from.
  
TODO:
- More thorough garbage collection in the tree.
- Support streams that don't start with an IDR.
- Handle frames that reference POC past EOS in all cases.

 configure  |   1 +
 libavcodec/Makefile|   1 +
 libavcodec/bitstream_filters.c |   1 +
 libavcodec/dts2pts_bsf.c   | 534 +
 4 files changed, 537 insertions(+)
 create mode 100644 libavcodec/dts2pts_bsf.c

diff --git a/configure b/configure
index 932ea5b553..91ee5eb303 100755
--- a/configure
+++ b/configure
@@ -3275,6 +3275,7 @@ aac_adtstoasc_bsf_select="adts_header mpeg4audio"
 av1_frame_merge_bsf_select="cbs_av1"
 av1_frame_split_bsf_select="cbs_av1"
 av1_metadata_bsf_select="cbs_av1"
+dts2pts_bsf_select="cbs_h264 h264parse"
 eac3_core_bsf_select="ac3_parser"
 filter_units_bsf_select="cbs"
 h264_metadata_bsf_deps="const_nan"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index cb80f73d99..858e110b79 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -1176,6 +1176,7 @@ OBJS-$(CONFIG_AV1_FRAME_SPLIT_BSF)+= 
av1_frame_split_bsf.o
 OBJS-$(CONFIG_CHOMP_BSF)  += chomp_bsf.o
 OBJS-$(CONFIG_DUMP_EXTRADATA_BSF) += dump_extradata_bsf.o
 OBJS-$(CONFIG_DCA_CORE_BSF)   += dca_core_bsf.o
+OBJS-$(CONFIG_DTS2PTS_BSF)+= dts2pts_bsf.o
 OBJS-$(CONFIG_DV_ERROR_MARKER_BSF)+= dv_error_marker_bsf.o
 OBJS-$(CONFIG_EAC3_CORE_BSF)  += eac3_core_bsf.o
 OBJS-$(CONFIG_EXTRACT_EXTRADATA_BSF)  += extract_extradata_bsf.o\
diff --git a/libavcodec/bitstream_filters.c b/libavcodec/bitstream_filters.c
index 23ae93..a3bebefe5f 100644
--- a/libavcodec/bitstream_filters.c
+++ b/libavcodec/bitstream_filters.c
@@ -31,6 +31,7 @@ extern const FFBitStreamFilter ff_av1_metadata_bsf;
 extern const FFBitStreamFilter ff_chomp_bsf;
 extern const FFBitStreamFilter ff_dump_extradata_bsf;
 extern const FFBitStreamFilter ff_dca_core_bsf;
+extern const FFBitStreamFilter ff_dts2pts_bsf;
 extern const FFBitStreamFilter ff_dv_error_marker_bsf;
 extern const FFBitStreamFilter ff_eac3_core_bsf;
 extern const FFBitStreamFilter ff_extract_extradata_bsf;
diff --git a/libavcodec/dts2pts_bsf.c b/libavcodec/dts2pts_bsf.c
new file mode 100644
index 00..aaa1d1c370
--- /dev/null
+++ b/libavcodec/dts2pts_bsf.c
@@ -0,0 +1,534 @@
+/*
+ * Copyright (c) 2022 James Almer
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Derive PTS by reordering DTS from supported streams
+ */
+
+#include "libavutil/avassert.h"
+#include "libavutil/fifo.h"
+#include "libavutil/opt.h"
+#include "libavutil/tree.h"
+
+#include "bsf.h"
+#include "bsf_internal.h"
+#include "cbs.h"
+#include "cbs_h264.h"
+#include "h264_parse.h"
+#include "h264_ps.h"
+
+typedef struct DTS2PTSNode {
+int64_t dts;
+int64_t duration;
+int poc;
+int gop;
+} DTS2PTSNode;
+
+typedef struct DTS2PTSFrame {
+AVPacket *pkt;
+int poc;
+int poc_diff;
+int gop;
+} DTS2PTSFrame;
+
+typedef struct DTS2PTSH264Context {
+H264POCContext poc;
+SPS sps;
+int poc_diff;
+int last_poc;
+int highest_poc;
+int picture_structure;
+} DTS2PTSH264Context;
+
+typedef struct DTS2PTSContext {
+struct AVTreeNode *root;
+AVFifo *fifo;
+
+// Codec specific function pointers and constants
+int (*init)(AVBSFContext *ctx);
+int (*filter)(AVBSFContext *ctx);
+void (*flush)(AVBSFContext *ctx);
+size_t fifo_size;
+
+CodedBitstreamContext *cbc;
+

[FFmpeg-devel] [PATCH] tools/.gitignore: Add missing tools

2022-09-04 Thread Andreas Rheinhardt

Signed-off-by: Andreas Rheinhardt 
---
Will apply tomorrow unless there are objections.

 tools/.gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/.gitignore b/tools/.gitignore
index c0958f40cb..7c45896923 100644
--- a/tools/.gitignore
+++ b/tools/.gitignore
@@ -3,6 +3,7 @@
 /bisect.need
 /crypto_bench
 /cws2fws
+/enum_options
 /fourcc2pixfmt
 /ffescape
 /ffeval
@@ -12,8 +13,10 @@
 /pktdumper
 /probetest
 /qt-faststart
+/scale_slice_test
 /sidxindex
 /trasher
 /seek_print
 /uncoded_frame
+/venc_data_dump
 /zmqsend
-- 
2.34.1


Re: [FFmpeg-devel] [PATCH] avutil/tests/.gitignore: Add channel_layout testtool

2022-09-04 Thread James Almer


On 9/4/2022 8:36 PM, Andreas Rheinhardt wrote:

Signed-off-by: Andreas Rheinhardt 
---
Will apply this tomorrow unless there are objections.


LGTM. Backport it to the 5.1 branch too, please.



  libavutil/tests/.gitignore | 1 +
  1 file changed, 1 insertion(+)

diff --git a/libavutil/tests/.gitignore b/libavutil/tests/.gitignore
index 919010e4fc..87895912f5 100644
--- a/libavutil/tests/.gitignore
+++ b/libavutil/tests/.gitignore
@@ -9,6 +9,7 @@
  /bprint
  /camellia
  /cast5
+/channel_layout
  /color_utils
  /cpu
  /cpu_init


[FFmpeg-devel] [PATCH] avutil/tests/.gitignore: Add channel_layout testtool

2022-09-04 Thread Andreas Rheinhardt

Signed-off-by: Andreas Rheinhardt 
---
Will apply this tomorrow unless there are objections.

 libavutil/tests/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavutil/tests/.gitignore b/libavutil/tests/.gitignore
index 919010e4fc..87895912f5 100644
--- a/libavutil/tests/.gitignore
+++ b/libavutil/tests/.gitignore
@@ -9,6 +9,7 @@
 /bprint
 /camellia
 /cast5
+/channel_layout
 /color_utils
 /cpu
 /cpu_init
-- 
2.34.1


[FFmpeg-devel] [PATCH] avfilter/avfilter: Don't use AVFrame.channel_layout

2022-09-04 Thread Andreas Rheinhardt

Signed-off-by: Andreas Rheinhardt 
---
 libavfilter/avfilter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/avfilter.c b/libavfilter/avfilter.c
index 965f5d0f63..6740339808 100644
--- a/libavfilter/avfilter.c
+++ b/libavfilter/avfilter.c
@@ -62,7 +62,7 @@ static void tlog_ref(void *ctx, AVFrame *ref, int end)
 }
 if (ref->nb_samples) {
 ff_tlog(ctx, " cl:%"PRId64"d n:%d r:%d",
-ref->channel_layout,
+ref->ch_layout.order == AV_CHANNEL_ORDER_NATIVE ? ref->ch_layout.u.mask : 0,
 ref->nb_samples,
 ref->sample_rate);
 }
-- 
2.34.1


[FFmpeg-devel] [PATCH] avcodec/ffv1: Only allocate ThreadFrames for the decoder

2022-09-04 Thread Andreas Rheinhardt

The FFV1 decoder only uses the last frame's data to conceal
errors. The encoder does not have this problem and therefore
only uses the current frame and none of the ThreadFrames.
So only allocate them for the decoder.

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/ffv1.c| 13 -
 libavcodec/ffv1dec.c | 23 ++-
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/libavcodec/ffv1.c b/libavcodec/ffv1.c
index c8781cdaaa..b6204740ed 100644
--- a/libavcodec/ffv1.c
+++ b/libavcodec/ffv1.c
@@ -43,11 +43,6 @@ av_cold int ff_ffv1_common_init(AVCodecContext *avctx)
 s->avctx = avctx;
 s->flags = avctx->flags;
 
-s->picture.f = av_frame_alloc();
-s->last_picture.f = av_frame_alloc();
-if (!s->picture.f || !s->last_picture.f)
-return AVERROR(ENOMEM);
-
 s->width  = avctx->width;
 s->height = avctx->height;
 
@@ -198,14 +193,6 @@ av_cold int ff_ffv1_close(AVCodecContext *avctx)
 FFV1Context *s = avctx->priv_data;
 int i, j;
 
-if (s->picture.f)
-ff_thread_release_ext_buffer(avctx, &s->picture);
-av_frame_free(&s->picture.f);
-
-if (s->last_picture.f)
-ff_thread_release_ext_buffer(avctx, &s->last_picture);
-av_frame_free(&s->last_picture.f);
-
 for (j = 0; j < s->max_slice_count; j++) {
 FFV1Context *fs = s->slice_context[j];
 for (i = 0; i < s->plane_count; i++) {
diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
index 794c58cc40..d4bc60a7da 100644
--- a/libavcodec/ffv1dec.c
+++ b/libavcodec/ffv1dec.c
@@ -823,6 +823,11 @@ static av_cold int decode_init(AVCodecContext *avctx)
 if ((ret = ff_ffv1_common_init(avctx)) < 0)
 return ret;
 
+f->picture.f  = av_frame_alloc();
+f->last_picture.f = av_frame_alloc();
+if (!f->picture.f || !f->last_picture.f)
+return AVERROR(ENOMEM);
+
 if (avctx->extradata_size > 0 && (ret = read_extra_header(f)) < 0)
 return ret;
 
@@ -1068,6 +1073,22 @@ static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
 }
 #endif
 
+static av_cold int ffv1_decode_close(AVCodecContext *avctx)
+{
+FFV1Context *const s = avctx->priv_data;
+
+if (s->picture.f) {
+ff_thread_release_ext_buffer(avctx, &s->picture);
+av_frame_free(&s->picture.f);
+}
+
+if (s->last_picture.f) {
+ff_thread_release_ext_buffer(avctx, &s->last_picture);
+av_frame_free(&s->last_picture.f);
+}
+return ff_ffv1_close(avctx);
+}
+
 const FFCodec ff_ffv1_decoder = {
 .p.name = "ffv1",
 CODEC_LONG_NAME("FFmpeg video codec #1"),
@@ -1075,7 +1096,7 @@ const FFCodec ff_ffv1_decoder = {
 .p.id   = AV_CODEC_ID_FFV1,
 .priv_data_size = sizeof(FFV1Context),
 .init   = decode_init,
-.close  = ff_ffv1_close,
+.close  = ffv1_decode_close,
 FF_CODEC_DECODE_CB(decode_frame),
 UPDATE_THREAD_CONTEXT(update_thread_context),
 .p.capabilities = AV_CODEC_CAP_DR1 /*| AV_CODEC_CAP_DRAW_HORIZ_BAND*/ |
-- 
2.34.1


Re: [FFmpeg-devel] [PATCH] avcodec/libtheoraenc: Do not use invalid error code

2022-09-04 Thread Andreas Rheinhardt

Andreas Rheinhardt:
> Signed-off-by: Andreas Rheinhardt 
> ---
>  libavcodec/libtheoraenc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/libtheoraenc.c b/libavcodec/libtheoraenc.c
> index 22835553d6..92bf3a133c 100644
> --- a/libavcodec/libtheoraenc.c
> +++ b/libavcodec/libtheoraenc.c
> @@ -119,7 +119,7 @@ static int get_stats(AVCodecContext *avctx, int eos)
>  return 0;
>  #else
>  av_log(avctx, AV_LOG_ERROR, "libtheora too old to support 2pass\n");
> -return AVERROR(ENOSUP);
> +return AVERROR(ENOTSUP);
>  #endif
>  }
>  
> @@ -158,7 +158,7 @@ static int submit_stats(AVCodecContext *avctx)
>  return 0;
>  #else
>  av_log(avctx, AV_LOG_ERROR, "libtheora too old to support 2pass\n");
> -return AVERROR(ENOSUP);
> +return AVERROR(ENOTSUP);
>  #endif
>  }
>  

Will apply this patch tomorrow unless there are objections.

- Andreas

Re: [FFmpeg-devel] [PATCH 1/8] fftools/ffprobe: Report initial and trailing padding

2022-09-04 Thread Andreas Rheinhardt

Andreas Rheinhardt:
> Signed-off-by: Andreas Rheinhardt 
> ---
> trailing_padding seems to be unused and could actually be deprecated.
> 

Will apply this patchset tomorrow unless there are objections. Note
that I would really like a second opinion on whether trailing_padding
should be printed at all.

>  doc/ffprobe.xsd | 2 ++
>  fftools/ffprobe.c   | 3 +++
>  tests/ref/fate/concat-demuxer-extended-lavf-mxf | 2 +-
>  tests/ref/fate/concat-demuxer-extended-lavf-mxf_d10 | 2 +-
>  tests/ref/fate/concat-demuxer-simple1-lavf-mxf  | 2 +-
>  tests/ref/fate/concat-demuxer-simple1-lavf-mxf_d10  | 2 +-
>  tests/ref/fate/concat-demuxer-simple2-lavf-ts   | 2 +-
>  tests/ref/fate/ffprobe_compact  | 2 +-
>  tests/ref/fate/ffprobe_csv  | 2 +-
>  tests/ref/fate/ffprobe_default  | 2 ++
>  tests/ref/fate/ffprobe_flat | 2 ++
>  tests/ref/fate/ffprobe_ini  | 2 ++
>  tests/ref/fate/ffprobe_json | 2 ++
>  tests/ref/fate/ffprobe_xml  | 2 +-
>  tests/ref/fate/flv-demux| 2 +-
>  tests/ref/fate/gapless-mp3-side-data| 2 +-
>  tests/ref/fate/mxf-probe-applehdr10 | 4 
>  tests/ref/fate/mxf-probe-d10| 2 ++
>  tests/ref/fate/mxf-probe-dv25   | 4 
>  tests/ref/fate/oggopus-demux| 2 +-
>  tests/ref/fate/ts-demux | 4 ++--
>  tests/ref/fate/ts-opus-demux| 2 +-
>  22 files changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/doc/ffprobe.xsd b/doc/ffprobe.xsd
> index 6e678a9970..6052a5eff4 100644
> --- a/doc/ffprobe.xsd
> +++ b/doc/ffprobe.xsd
> @@ -246,6 +246,8 @@
>
>
>
> +  
> +  
>  
>
> use="required"/>
> diff --git a/fftools/ffprobe.c b/fftools/ffprobe.c
> index 3344a06409..9eb20fa4cd 100644
> --- a/fftools/ffprobe.c
> +++ b/fftools/ffprobe.c
> @@ -3044,6 +3044,9 @@ static int show_stream(WriterContext *w, AVFormatContext *fmt_ctx, int stream_id
>  }
>  
>  print_int("bits_per_sample", av_get_bits_per_sample(par->codec_id));
> +
> +print_int("initial_padding",  par->initial_padding);
> +print_int("trailing_padding", par->trailing_padding);
>  break;
>  
>  case AVMEDIA_TYPE_SUBTITLE:
> diff --git a/tests/ref/fate/concat-demuxer-extended-lavf-mxf 
> b/tests/ref/fate/concat-demuxer-extended-lavf-mxf
> index 543c7d6a8c..973ce5d4a4 100644
> --- a/tests/ref/fate/concat-demuxer-extended-lavf-mxf
> +++ b/tests/ref/fate/concat-demuxer-extended-lavf-mxf
> @@ -1 +1 @@
> -d367d7f6df7292cbf454c6d07fca9b04 
> *tests/data/fate/concat-demuxer-extended-lavf-mxf.ffprobe
> +3fa8632676f0e40c42be38b842794afc 
> *tests/data/fate/concat-demuxer-extended-lavf-mxf.ffprobe
> diff --git a/tests/ref/fate/concat-demuxer-extended-lavf-mxf_d10 
> b/tests/ref/fate/concat-demuxer-extended-lavf-mxf_d10
> index 57b22848b9..905ae46343 100644
> --- a/tests/ref/fate/concat-demuxer-extended-lavf-mxf_d10
> +++ b/tests/ref/fate/concat-demuxer-extended-lavf-mxf_d10
> @@ -1 +1 @@
> -1fac6962d4c5f1070d0d2db5ab7d86aa 
> *tests/data/fate/concat-demuxer-extended-lavf-mxf_d10.ffprobe
> +f88c5d6b16ec3ffd5d35b64a031489be 
> *tests/data/fate/concat-demuxer-extended-lavf-mxf_d10.ffprobe
> diff --git a/tests/ref/fate/concat-demuxer-simple1-lavf-mxf 
> b/tests/ref/fate/concat-demuxer-simple1-lavf-mxf
> index dcc98e9bdb..c227fa534c 100644
> --- a/tests/ref/fate/concat-demuxer-simple1-lavf-mxf
> +++ b/tests/ref/fate/concat-demuxer-simple1-lavf-mxf
> @@ -100,4 +100,4 @@ 
> video|0|33|1.32|33|1.32|1|0.04|12362|195072|__|1|Strings Metadata
>  audio|1|65280|1.36|65280|1.36|1920|0.04|3840|207872|K_|1|Strings 
> Metadata
>  video|0|37|1.48|34|1.36|1|0.04|24786|212480|K_|1|Strings Metadata
>  
> 0|mpeg2video|4|video|[0][0][0][0]|0x|352|288|0|0|0|0|1|1:1|11:9|yuv420p|8|tv|unknown|unknown|unknown|left|progressive|1|N/A|25/1|25/1|1/25|N/A|N/A|N/A|N/A|N/A|N/A|N/A|N/A|N/A|51|22|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0x060A2B340101010501010D001301|CPB
>  properties|0|0|0|49152|-1
> -1|pcm_s16le|unknown|audio|[0][0][0][0]|0x|s16|48000|1|unknown|16|N/A|0/0|0/0|1/48000|0|0.00|N/A|N/A|768000|N/A|N/A|N/A|N/A|50|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0x060A2B340101010501010D001301
> +1|pcm_s16le|unknown|audio|[0][0][0][0]|0x|s16|48000|1|unknown|16|0|0|N/A|0/0|0/0|1/48000|0|0.00|N/A|N/A|768000|N/A|N/A|N/A|N/A|50|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0x060A2B340101010501010D001301
> diff --git a/tests/ref/fate/concat-demuxer-simple1-lavf-mxf_d10 
> b/tests/ref/fate/concat-demuxer-simple1-lavf-mxf_d10
> index 8937724ed1..f26e3c2e1b 100644

Re: [FFmpeg-devel] [PATCH v3 2/3] lavf/dashdec: Multithreaded DASH initialization

2022-09-04 Thread Andreas Rheinhardt

Lukas Fellechner:
> Andreas Rheinhardt andreas.rheinhardt at outlook.com
> Wed Aug 31 05:54:12 EEST 2022
>>
>>> +#if HAVE_THREADS
>>> +
>>> +struct work_pool_data
>>> +{
>>> +AVFormatContext *ctx;
>>> +struct representation *pls;
>>> +struct representation *common_pls;
>>> +pthread_mutex_t *common_mutex;
>>> +pthread_cond_t *common_condition;
>>> +int is_common;
>>> +int is_started;
>>> +int result;
>>> +};
>>> +
>>> +struct thread_data
>>
>> This is against our naming conventions: CamelCase for struct tags and
>> typedefs, lowercase names with underscore for variable names.
> 
> In the code files I looked at, CamelCase is only used for typedef structs.
> All structs without typedef are lower case with underscores, so I aligned
> with that, originally.
> 
> I will make this a typedef struct and use CamelCase for next patch.
> 
>>> +static int init_streams_multithreaded(AVFormatContext *s, int nstreams, 
>>> int threads)
>>> +{
>>> +DASHContext *c = s->priv_data;
>>> +int ret = 0;
>>> +int stream_index = 0;
>>> +int i;
>>
>> We allow "for (int i = 0;"
> 
> Oh, I did not know that, and I did not see it being used here anywhere.
> Will use in next patch, it's my preferred style, actually.
> 
>>> +
>>> +// alloc data
>>> +struct work_pool_data *init_data = (struct 
>>> work_pool_data*)av_mallocz(sizeof(struct work_pool_data) * nstreams);
>>> +if (!init_data)
>>> +return AVERROR(ENOMEM);
>>> +
>>> +struct thread_data *thread_data = (struct 
>>> thread_data*)av_mallocz(sizeof(struct thread_data) * threads);
>>> +if (!thread_data)
>>> +return AVERROR(ENOMEM);
>>
>> 1. init_data leaks here on error.
>> 2. In fact, it seems to me that both init_data and thread_data are
>> nowhere freed.
> 
> True, I must have lost the av_free call at some point.
> 
>>> +// init work pool data
>>> +struct work_pool_data* current_data = init_data;
>>> +
>>> +for (i = 0; i < c->n_videos; i++) {
>>> +create_work_pool_data(s, stream_index, c->videos[i],
>>> +c->is_init_section_common_video ? c->videos[0] : NULL,
>>> +current_data, _video_mutex, _video_cond);
>>> +
>>> +stream_index++;
>>> +current_data++;
>>> +}
>>> +
>>> +for (i = 0; i < c->n_audios; i++) {
>>> +create_work_pool_data(s, stream_index, c->audios[i],
>>> +c->is_init_section_common_audio ? c->audios[0] : NULL,
>>> +current_data, _audio_mutex, _audio_cond);
>>> +
>>> +stream_index++;
>>> +current_data++;
>>> +}
>>> +
>>> +for (i = 0; i < c->n_subtitles; i++) {
>>> +create_work_pool_data(s, stream_index, c->subtitles[i],
>>> +c->is_init_section_common_subtitle ? c->subtitles[0] : NULL,
>>> +current_data, _subtitle_mutex, _subtitle_cond);
>>> +
>>> +stream_index++;
>>> +current_data++;
>>> +}
>>
>> This is very repetitive.
> 
> Will improve for next patch.
> 
>> 1. We actually have an API to process multiple tasks by different
>> threads: Look at libavutil/slicethread.h. Why can't you reuse that?
>> 2. In case initialization of one of the conditions/mutexes fails, you
>> are nevertheless destroying them; you are even destroying completely
>> uninitialized mutexes. This is undefined behaviour. Checking the result
>> of it does not fix this.
>>
>> - Andreas
> 
> 1. The slicethread implementation is pretty hard to understand.
> I was not sure if it can be used for normal parallelization, because
> it looked more like some kind of thread pool for continuous data
> processing. But after taking a second look, I think I can use it here.
> I will try and see if it works well.
> 
> 2. I was not aware that this is undefined behavior. Will switch to
> PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER macros,
> which return a safely initialized mutex/cond.
> 

"The behavior is undefined if the value specified by the mutex argument
to pthread_mutex_destroy() does not refer to an initialized mutex."
(From
https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_destroy.html)

Furthermore: "In cases where default mutex attributes are appropriate,
the macro PTHREAD_MUTEX_INITIALIZER can be used to initialize mutexes.
The effect shall be equivalent to dynamic initialization by a call to
pthread_mutex_init() with parameter attr specified as NULL, except that
no error checks are performed." The last sentence sounds as if one would
then have to check mutex locking.

Moreover, older pthread standards did not allow using
PTHREAD_MUTEX_INITIALIZER with non-static mutexes, so I don't know
whether we can use that. Also, our pthreads wrapper on top of
OS/2 threads does not provide PTHREAD_COND_INITIALIZER (which is used
nowhere in the codebase).
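
One way to stay clear of the undefined behaviour quoted above is to
remember which objects were actually initialized and to destroy only
those. A minimal sketch follows; SyncPair and both function names are
hypothetical and not part of the patch or of FFmpeg.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper type (not FFmpeg API): a mutex/cond pair that
 * remembers which members were successfully initialized. */
typedef struct SyncPair {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int mutex_inited;
    int cond_inited;
} SyncPair;

/* Returns 0 on success or the pthread error code on failure. */
static int sync_pair_init(SyncPair *p)
{
    int ret;

    assert(p != NULL);
    p->mutex_inited = 0;
    p->cond_inited  = 0;

    ret = pthread_mutex_init(&p->mutex, NULL);
    if (ret)
        return ret;
    p->mutex_inited = 1;

    ret = pthread_cond_init(&p->cond, NULL);
    if (ret)
        return ret;
    p->cond_inited = 1;

    return 0;
}

/* Destroys only what was initialized, so it is safe to call after a
 * failed sync_pair_init() and safe to call twice. */
static void sync_pair_uninit(SyncPair *p)
{
    assert(p != NULL);
    if (p->cond_inited) {
        pthread_cond_destroy(&p->cond);
        p->cond_inited = 0;
    }
    if (p->mutex_inited) {
        pthread_mutex_destroy(&p->mutex);
        p->mutex_inited = 0;
    }
}
```

A caller would run sync_pair_init() once per pair and call
sync_pair_uninit() on every exit path, including the one taken when
initialization itself failed.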

> I also noticed one cross-thread issue that I will solve in the next patch.

[FFmpeg-devel] [PATCH 9/9] avcodec/wmaprodec: Use symbol table more efficiently

2022-09-04 Thread Andreas Rheinhardt

By using a symbol table one can already bake in applying
a LUT on the return value of get_vlc2(). So change the
symbol table for the vec2 and vec4 tables to avoid
using the symbol_to_vec2/4 LUTs.
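
As an illustration of the packing described in the vec4_syms comment
in the diff below (four numbers in the 0..15 range stored in the four
nibbles of symbol - 1), a rough sketch of the unpacking could look as
follows. The function name is hypothetical, and the nibble order
(most significant first) is an assumption; the decoder may assign the
nibbles differently.

```c
#include <assert.h>
#include <stdint.h>

/* Unpack a non-zero vec4 symbol into four values in the 0..15 range.
 * Assumption: the most significant nibble comes first. */
static void unpack_vec4_symbol(uint16_t symbol, uint8_t out[4])
{
    /* Symbol 0 is the escape: the four values are coded explicitly. */
    assert(symbol != 0);

    uint16_t packed = symbol - 1;

    out[0] =  packed >> 12;
    out[1] = (packed >>  8) & 0xF;
    out[2] = (packed >>  4) & 0xF;
    out[3] =  packed        & 0xF;
}
```

With this layout no separate lookup table is needed after get_vlc2():
the returned symbol already carries the four values.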

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/wmaprodata.h | 147 +---
 libavcodec/wmaprodec.c  |  22 +++---
 2 files changed, 72 insertions(+), 97 deletions(-)

diff --git a/libavcodec/wmaprodata.h b/libavcodec/wmaprodata.h
index 3a30be40b5..057ce1d02d 100644
--- a/libavcodec/wmaprodata.h
+++ b/libavcodec/wmaprodata.h
@@ -325,67 +325,73 @@ static const float coef1_level[HUFF_COEF1_SIZE] = {
  */
 #define HUFF_VEC4_SIZE    127
 #define HUFF_VEC4_MAXBITS  14
-static const uint8_t vec4_table[HUFF_VEC4_SIZE][2] = {
-{ 126,  1 }, {  76,  6 }, {  28,  8 }, {  93, 10 }, {  69, 10 },
-{ 117, 10 }, {  98, 10 }, {  31,  8 }, {  84,  8 }, {  38, 10 },
-{   9, 10 }, { 108,  9 }, {  96,  8 }, {  73,  8 }, {  67,  9 },
-{ 121, 12 }, {  90, 12 }, { 110, 11 }, {  35, 12 }, {  54, 12 },
-{  17, 11 }, {  86,  9 }, {  44,  9 }, {  82,  8 }, {  41,  8 },
-{  36,  9 }, { 103,  9 }, {  78,  8 }, {  66,  8 }, {  11,  9 },
-{  97,  9 }, {   4, 12 }, {  19, 12 }, {  70, 12 }, {  55, 14 },
-{  20, 14 }, {   5, 13 }, {  51, 11 }, {  94, 11 }, { 106,  9 },
-{ 101,  8 }, {  83,  9 }, {  42,  9 }, {  45, 11 }, {  46, 11 },
-{ 112, 10 }, {  99,  9 }, {   8,  8 }, {  56,  6 }, {   1,  6 },
-{  75,  6 }, {  27,  6 }, {  72,  6 }, {  62,  6 }, { 113, 11 },
-{ 124, 11 }, { 114, 10 }, {  15, 11 }, { 116, 11 }, {  24, 10 },
-{  59, 10 }, {  39, 11 }, {  10, 11 }, { 118,  9 }, { 105,  7 },
-{  71,  6 }, {  77,  7 }, {  85,  7 }, {  21,  6 }, {   7,  6 },
-{   6,  6 }, {   0,  5 }, {  79,  7 }, { 100, 11 }, {  48, 11 },
-{  87, 10 }, { 107,  9 }, {  92,  8 }, {  57,  6 }, { 119,  9 },
-{  29,  9 }, {  16, 10 }, {  49, 10 }, {  64,  9 }, {  95,  8 },
-{  58,  8 }, {  26,  6 }, {  61,  6 }, {  22,  6 }, {  23,  8 },
-{  81,  8 }, {  13,  9 }, {  53, 12 }, {  52, 12 }, { 123, 11 },
-{  33, 10 }, {  12,  8 }, {  40,  8 }, {  30,  8 }, {  47, 10 },
-{ 111, 10 }, {   3, 10 }, {  68, 10 }, {  74,  9 }, { 115,  9 },
-{  91,  8 }, { 120, 10 }, {  25, 11 }, { 122, 11 }, {  89,  9 },
-{   2,  8 }, {  37,  8 }, {  65,  8 }, {  43,  9 }, {  34,  9 },
-{  14, 10 }, {  60, 11 }, {  18, 12 }, { 125, 12 }, {  50,  9 },
-{  80,  9 }, {  88,  9 }, { 109,  8 }, {  32,  8 }, { 102,  7 },
-{ 104,  7 }, {  63,  7 },
+static const uint8_t vec4_lens[HUFF_VEC4_SIZE] = {
+ 1,  6,  8, 10, 10, 10, 10,  8,  8, 10, 10,  9,  8,  8,  9, 12, 12, 11,
+12, 12, 11,  9,  9,  8,  8,  9,  9,  8,  8,  9,  9, 12, 12, 12, 14, 14,
+13, 11, 11,  9,  8,  9,  9, 11, 11, 10,  9,  8,  6,  6,  6,  6,  6,  6,
+11, 11, 10, 11, 11, 10, 10, 11, 11,  9,  7,  6,  7,  7,  6,  6,  6,  5,
+ 7, 11, 11, 10,  9,  8,  6,  9,  9, 10, 10,  9,  8,  8,  6,  6,  6,  8,
+ 8,  9, 12, 12, 11, 10,  8,  8,  8, 10, 10, 10, 10,  9,  9,  8, 10, 11,
+11,  9,  8,  8,  8,  9,  9, 10, 11, 12, 12,  9,  9,  9,  8,  8,  7,  7,
+ 7,
+};
+
+/* The entry in the following table with symbol zero indicates
+ * that four further entries are coded explicitly; all other
+ * entries encode four numbers in the 0..15 range via
+ * the four nibbles of (symbol - 1). */
+static const uint16_t vec4_syms[HUFF_VEC4_SIZE] = {
+0,  4370,   275,  8195,  4146, 12545,  8225,   290,  4625,   515,
+   20,  8706,  8210,  4355,  4131, 16385,  5121,  8961,   321,  1041,
+   51,  4641,   546,  4610,   530,   513,  8451,  4385,  4130,33,
+ 8211, 5,66,  4161,  1281,81, 6,   801,  8196,  8481,
+ 8449,  4611,   531,   561,   769, 12290,  8226,19,  4097, 2,
+ 4369,   274,  4354,  4114, 12291, 16641, 12305,49, 12321,   260,
+ 4100,   516,21, 12546,  8466,  4353,  4371,  4626,   257,18,
+   17, 1,  4386,  8241,   771,  4865,  8705,  8194,  4098, 12561,
+  276,50,   785,  4116,  8209,  4099,   273,  4113,   258,   259,
+ 4609,35,  1026,  1025, 16401,   305,34,   529,   289,   770,
+12289, 4,  4145,  4356, 12306,  8193, 12801,   261, 16386,  4881,
+3,   514,  4129,   545,   306,36,  4101,65, 20481,   786,
+ 4401,  4866,  8721,   291,  8450,  8465,  4115,
 };
 
 
 #define HUFF_VEC2_SIZE    137
 #define HUFF_VEC2_MAXBITS  12
+/* The entry in the following table with symbol zero indicates
+ * that two further entries are coded explicitly; all other
+ * entries encode two numbers in the 0..15 range via
+ * (symbol - 1) & 0xF and (symbol - 1) >> 4. */
 static const uint8_t vec2_table[HUFF_VEC2_SIZE][2] = {
-{  18,  5 }, { 119, 10 }, { 132, 11 }, {  44, 11 }, {  68, 10 },
-{ 121, 11 }, {  11, 11 }, {  75,  8 }, {  72,  7 }, {  36,  7 },
-{ 104,  9 }, { 122, 10 }, {  27, 10 }, {  88,  9 }, {  66,  9 },
-{  33,  5 }, {  48,  6 }, {  91,  9 }, {   7,  9 }, {  85,

[FFmpeg-devel] [PATCH 8/9] avcodec/wmaprodec: Move applying offset to VLC creation

2022-09-04 Thread Andreas Rheinhardt

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/wmaprodec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/wmaprodec.c b/libavcodec/wmaprodec.c
index 1909ce2dad..698841dcaf 100644
--- a/libavcodec/wmaprodec.c
+++ b/libavcodec/wmaprodec.c
@@ -320,7 +320,7 @@ static av_cold void decode_init_static(void)
 {
 INIT_VLC_STATIC_FROM_LENGTHS(&sf_vlc, SCALEVLCBITS, HUFF_SCALE_SIZE,
  &scale_table[0][1], 2,
- &scale_table[0][0], 2, 1, 0, 0, 616);
+ &scale_table[0][0], 2, 1, -60, 0, 616);
 INIT_VLC_STATIC_FROM_LENGTHS(&sf_rl_vlc, VLCBITS, HUFF_SCALE_RL_SIZE,
  &scale_rl_table[0][1], 2,
  &scale_rl_table[0][0], 2, 1, 0, 0, 1406);
@@ -1056,7 +1056,7 @@ static int decode_scale_factors(WMAProDecodeCtx* s)
 s->channel[c].scale_factor_step = get_bits(&s->gb, 2) + 1;
 val = 45 / s->channel[c].scale_factor_step;
 for (sf = s->channel[c].scale_factors; sf < sf_end; sf++) {
-val += get_vlc2(&s->gb, sf_vlc.table, SCALEVLCBITS, SCALEMAXDEPTH) - 60;
+val += get_vlc2(&s->gb, sf_vlc.table, SCALEVLCBITS, SCALEMAXDEPTH);
 *sf = val;
 }
 } else {
-- 
2.34.1


[FFmpeg-devel] [PATCH 7/9] avcodec/wmaprodec: Use ff_init_vlc_from_lengths() instead of init_vlc

2022-09-04 Thread Andreas Rheinhardt

This allows replacing tables of big codes (uint16_t and uint32_t)
with tables of smaller symbols (mostly uint8_t).

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/wmaprodata.h | 551 
 libavcodec/wmaprodec.c  |  42 +--
 2 files changed, 236 insertions(+), 357 deletions(-)

diff --git a/libavcodec/wmaprodata.h b/libavcodec/wmaprodata.h
index 53824799d5..3a30be40b5 100644
--- a/libavcodec/wmaprodata.h
+++ b/libavcodec/wmaprodata.h
@@ -48,42 +48,32 @@ static const uint16_t critical_freq[] = {
  */
 #define HUFF_SCALE_SIZE    121
 #define HUFF_SCALE_MAXBITS  19
-static const uint16_t scale_huffcodes[HUFF_SCALE_SIZE] = {
-0xE639, 0xE6C2, 0xE6C1, 0xE6C0, 0xE63F, 0xE63E, 0xE63D, 0xE63C,
-0xE63B, 0xE63A, 0xE638, 0xE637, 0xE636, 0xE635, 0xE634, 0xE632,
-0xE633, 0xE620, 0x737B, 0xE610, 0xE611, 0xE612, 0xE613, 0xE614,
-0xE615, 0xE616, 0xE617, 0xE618, 0xE619, 0xE61A, 0xE61B, 0xE61C,
-0xE61D, 0xE61E, 0xE61F, 0xE6C3, 0xE621, 0xE622, 0xE623, 0xE624,
-0xE625, 0xE626, 0xE627, 0xE628, 0xE629, 0xE62A, 0xE62B, 0xE62C,
-0xE62D, 0xE62E, 0xE62F, 0xE630, 0xE631, 0x1CDF, 0x0E60, 0x0399,
-0x00E7, 0x001D, 0x0000, 0x0001, 0x0001, 0x0001, 0x0002, 0x0006,
-0x0002, 0x0007, 0x0006, 0x000F, 0x0038, 0x0072, 0x039A, 0xE6C4,
-0xE6C5, 0xE6C6, 0xE6C7, 0xE6C8, 0xE6C9, 0xE6CA, 0xE6CB, 0xE6CC,
-0xE6CD, 0xE6CE, 0xE6CF, 0xE6D0, 0xE6D1, 0xE6D2, 0xE6D3, 0xE6D4,
-0xE6D5, 0xE6D6, 0xE6D7, 0xE6D8, 0xE6D9, 0xE6DA, 0xE6DB, 0xE6DC,
-0xE6DD, 0xE6DE, 0xE6DF, 0xE6E0, 0xE6E1, 0xE6E2, 0xE6E3, 0xE6E4,
-0xE6E5, 0xE6E6, 0xE6E7, 0xE6E8, 0xE6E9, 0xE6EA, 0xE6EB, 0xE6EC,
-0xE6ED, 0xE6EE, 0xE6EF, 0xE6F0, 0xE6F1, 0xE6F2, 0xE6F3, 0xE6F4,
-0xE6F5,
-};
-
-static const uint8_t scale_huffbits[HUFF_SCALE_SIZE] = {
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 18, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 16, 15, 13,
-11,  8,  5,  2,  1,  3,  5,  6,
- 6,  7,  7,  7,  9, 10, 13, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19, 19, 19, 19, 19, 19, 19, 19,
-19,
+static const uint8_t scale_table[HUFF_SCALE_SIZE][2] = {
+{  58,  5 }, {  64,  6 }, {  66,  7 }, {  65,  7 }, {  62,  5 },
+{  63,  6 }, {  68,  9 }, {  69, 10 }, {  54, 15 }, {  19, 19 },
+{  20, 19 }, {  21, 19 }, {  22, 19 }, {  23, 19 }, {  24, 19 },
+{  25, 19 }, {  26, 19 }, {  27, 19 }, {  28, 19 }, {  29, 19 },
+{  30, 19 }, {  31, 19 }, {  32, 19 }, {  33, 19 }, {  34, 19 },
+{  17, 19 }, {  36, 19 }, {  37, 19 }, {  38, 19 }, {  39, 19 },
+{  40, 19 }, {  41, 19 }, {  42, 19 }, {  43, 19 }, {  44, 19 },
+{  45, 19 }, {  46, 19 }, {  47, 19 }, {  48, 19 }, {  49, 19 },
+{  50, 19 }, {  51, 19 }, {  52, 19 }, {  15, 19 }, {  16, 19 },
+{  14, 19 }, {  13, 19 }, {  12, 19 }, {  11, 19 }, {  10, 19 },
+{   0, 19 }, {   9, 19 }, {   8, 19 }, {   7, 19 }, {   6, 19 },
+{   5, 19 }, {   4, 19 }, {  55, 13 }, {  70, 13 }, {   3, 19 },
+{   2, 19 }, {   1, 19 }, {  35, 19 }, {  71, 19 }, {  72, 19 },
+{  73, 19 }, {  74, 19 }, {  75, 19 }, {  76, 19 }, {  77, 19 },
+{  78, 19 }, {  79, 19 }, {  80, 19 }, {  81, 19 }, {  82, 19 },
+{  83, 19 }, {  84, 19 }, {  85, 19 }, {  86, 19 }, {  87, 19 },
+{  88, 19 }, {  89, 19 }, {  90, 19 }, {  91, 19 }, {  92, 19 },
+{  93, 19 }, {  94, 19 }, {  95, 19 }, {  96, 19 }, {  97, 19 },
+{  98, 19 }, {  99, 19 }, { 100, 19 }, { 101, 19 }, { 102, 19 },
+{ 103, 19 }, { 104, 19 }, { 105, 19 }, { 106, 19 }, { 107, 19 },
+{ 108, 19 }, { 109, 19 }, { 110, 19 }, { 111, 19 }, { 112, 19 },
+{ 113, 19 }, { 114, 19 }, { 115, 19 }, { 116, 19 }, { 117, 19 },
+{ 118, 19 }, { 119, 19 }, { 120, 19 }, {  18, 18 }, {  53, 16 },
+{  56, 11 }, {  57,  8 }, {  67,  7 }, {  61,  3 }, {  59,  2 },
+{  60,  1 },
 };
 /** @} */
 
@@ -94,46 +84,31 @@ static const uint8_t scale_huffbits[HUFF_SCALE_SIZE] = {
  */
 #define HUFF_SCALE_RL_SIZE    120
 #define HUFF_SCALE_RL_MAXBITS  21
-static const uint32_t scale_rl_huffcodes[HUFF_SCALE_RL_SIZE] = {
-0x00010C, 0x01, 0x10FE2A, 0x03, 0x03, 0x01, 0x13,
-0x20, 0x29, 0x14, 0x16, 0x45, 0x49, 0x2F,
-0x42, 0x8E, 0x8F, 0x000129, 0x09, 0x0D, 0x0004AC,
-0x2C, 0x000561, 0x0002E6, 0x00087C, 0x0002E2, 0x00095C, 0x18,
-0x01, 0x16, 0x44, 0x2A, 0x07, 0x000159, 0x000143,
-0x000128, 0x00015A, 0x00012D, 0x2B, 0xA0, 0x000142, 0x00012A,
-0x0002EF, 0x0004AF, 0x00087D, 0x004AE9, 0x0043F9, 0x67, 0x000199,
-0x002B05, 0x001583, 0x0021FE, 0x10FE2C, 0x04, 0x2E, 0x00010D,
-0x0A, 0x000244, 0x17, 0x000245, 0x11, 0x00010E,

[FFmpeg-devel] [PATCH 6/9] avcodec/wmavoice: Avoid code table

2022-09-04 Thread Andreas Rheinhardt

These codes are already ordered from left-to-right in the tree,
so one can just use ff_init_vlc_static_from_lengths().

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/wmavoice.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/libavcodec/wmavoice.c b/libavcodec/wmavoice.c
index 8fa9db63ee..4438089e51 100644
--- a/libavcodec/wmavoice.c
+++ b/libavcodec/wmavoice.c
@@ -320,18 +320,10 @@ static av_cold void wmavoice_init_static_data(void)
 10, 10, 10, 12, 12, 12,
 14, 14, 14, 14
 };
-static const uint16_t codes[] = {
-  0x0000, 0x0001, 0x0002,        //              00/01/10
-  0x000c, 0x000d, 0x000e,        //           11+00/01/10
-  0x003c, 0x003d, 0x003e,        //         1111+00/01/10
-  0x00fc, 0x00fd, 0x00fe,        //       111111+00/01/10
-  0x03fc, 0x03fd, 0x03fe,        //     11111111+00/01/10
-  0x0ffc, 0x0ffd, 0x0ffe,        //   1111111111+00/01/10
-  0x3ffc, 0x3ffd, 0x3ffe, 0x3fff // 111111111111+xx
-};
 
-INIT_VLC_STATIC(&frame_type_vlc, VLC_NBITS, sizeof(bits),
-bits, 1, 1, codes, 2, 2, 132);
+INIT_VLC_STATIC_FROM_LENGTHS(&frame_type_vlc, VLC_NBITS,
+ FF_ARRAY_ELEMS(bits), bits,
+ 1, NULL, 0, 0, 0, 0, 132);
 }
 
 static av_cold void wmavoice_flush(AVCodecContext *ctx)
-- 
2.34.1


[FFmpeg-devel] [PATCH 5/9] avcodec/dvdec: Avoid stack buffers

2022-09-04 Thread Andreas Rheinhardt

Instead reuse the destination RL VLC as scratch space.
This is possible because the (implicit) codes here are already
ordered from left to right in the tree and because the code lengths
are increasing, which implies that the mapping from VLC entries to the
corresponding entries used to initialize the VLC is monotonically
increasing. This means that one can reuse the right end of the
destination RL VLC to store the tables used to initialize the VLC.
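
The in-place idea can be illustrated with a much simpler stand-in than
the actual VLC tables. The sketch below is not the patch's code: Slot
and expand_in_place are hypothetical, and run-length expansion is used
only because it has the same property, namely a monotonically
increasing mapping from output index to source index. The source
entries are parked at the right end of the buffer, and the assertion
plays the role of av_assert1(i <= code + offset) in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type: a value and how often it is repeated. */
typedef struct Slot {
    int value;
    int repeat;
} Slot;

/* Expand n_src (value, repeat) pairs in place into buf[0..cap).
 * On entry the pairs occupy the right end, buf[cap - n_src .. cap).
 * Requires every repeat >= 1 and the repeats to sum to at most cap.
 * Returns the number of slots written. */
static size_t expand_in_place(Slot *buf, size_t cap, size_t n_src)
{
    const size_t offset = cap - n_src;
    size_t out = 0;

    for (size_t k = 0; k < n_src; k++) {
        /* Copy the source entry before any write can reach it. */
        Slot src = buf[offset + k];

        for (int r = 0; r < src.repeat; r++) {
            /* The write cursor never passes the read cursor. */
            assert(out <= offset + k);
            buf[out].value  = src.value;
            buf[out].repeat = 1;
            out++;
        }
    }
    return out;
}
```

Because every source entry produces at least one output, the entries
not yet read always lie strictly to the right of the write cursor, so
no separate stack buffer is needed.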

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/dvdata.h |  2 ++
 libavcodec/dvdec.c  | 26 +-
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/libavcodec/dvdata.h b/libavcodec/dvdata.h
index ae104096ad..31191a8475 100644
--- a/libavcodec/dvdata.h
+++ b/libavcodec/dvdata.h
@@ -27,6 +27,8 @@ extern const uint8_t ff_dv_quant_shifts[22][4];
 extern const uint8_t ff_dv_quant_offset[4];
 
 #define NB_DV_VLC 409
+/* The number of entries with value zero in ff_dv_vlc_level. */
+#define NB_DV_ZERO_LEVEL_ENTRIES 72
 
 extern const uint8_t ff_dv_vlc_len[NB_DV_VLC];
 extern const uint8_t ff_dv_vlc_run[NB_DV_VLC];
diff --git a/libavcodec/dvdec.c b/libavcodec/dvdec.c
index 3857ed1266..32085a3ba6 100644
--- a/libavcodec/dvdec.c
+++ b/libavcodec/dvdec.c
@@ -137,31 +137,30 @@ static av_cold void dv_init_static(void)
 {
 VLCElem vlc_buf[FF_ARRAY_ELEMS(dv_rl_vlc)] = { 0 };
 VLC dv_vlc = { .table = vlc_buf, .table_allocated = FF_ARRAY_ELEMS(vlc_buf) };
-uint8_t    new_dv_vlc_len[NB_DV_VLC * 2];
-uint8_t    new_dv_vlc_run[NB_DV_VLC * 2];
-int16_t  new_dv_vlc_level[NB_DV_VLC * 2];
+const unsigned offset = FF_ARRAY_ELEMS(dv_rl_vlc) - (2 * NB_DV_VLC - NB_DV_ZERO_LEVEL_ENTRIES);
+RL_VLC_ELEM *tmp = dv_rl_vlc + offset;
 int i, j;
 
 /* it's faster to include sign bit in a generic VLC parsing scheme */
 for (i = 0, j = 0; i < NB_DV_VLC; i++, j++) {
-new_dv_vlc_len[j]   = ff_dv_vlc_len[i];
-new_dv_vlc_run[j]   = ff_dv_vlc_run[i];
-new_dv_vlc_level[j] = ff_dv_vlc_level[i];
+tmp[j].len   = ff_dv_vlc_len[i];
+tmp[j].run   = ff_dv_vlc_run[i];
+tmp[j].level = ff_dv_vlc_level[i];
 
 if (ff_dv_vlc_level[i]) {
-new_dv_vlc_len[j]++;
+tmp[j].len++;
 
 j++;
-new_dv_vlc_len[j]   =  ff_dv_vlc_len[i] + 1;
-new_dv_vlc_run[j]   =  ff_dv_vlc_run[i];
-new_dv_vlc_level[j] = -ff_dv_vlc_level[i];
+tmp[j].len   =  ff_dv_vlc_len[i] + 1;
+tmp[j].run   =  ff_dv_vlc_run[i];
+tmp[j].level = -ff_dv_vlc_level[i];
 }
 }
 
 /* NOTE: as a trick, we use the fact the no codes are unused
  * to accelerate the parsing of partial codes */
 ff_init_vlc_from_lengths(&dv_vlc, TEX_VLC_BITS, j,
- new_dv_vlc_len, 1,
+ &tmp[0].len, sizeof(tmp[0]),
  NULL, 0, 0, 0, INIT_VLC_USE_NEW_STATIC, NULL);
 av_assert1(dv_vlc.table_size == 1664);
 
@@ -174,8 +173,9 @@ static av_cold void dv_init_static(void)
 run   = 0;
 level = code;
 } else {
-run   = new_dv_vlc_run[code] + 1;
-level = new_dv_vlc_level[code];
+av_assert1(i <= code + offset);
+run   = tmp[code].run + 1;
+level = tmp[code].level;
 }
 dv_rl_vlc[i].len   = len;
 dv_rl_vlc[i].level = level;
-- 
2.34.1


[FFmpeg-devel] [PATCH 4/9] avcodec/dvdec: Mark dv_init_static() as av_cold

2022-09-04 Thread Andreas Rheinhardt

Forgotten in 6d484671ecb612c32cbda0fab65f961743aff5f8.

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/dvdec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/dvdec.c b/libavcodec/dvdec.c
index 8f68d2715d..3857ed1266 100644
--- a/libavcodec/dvdec.c
+++ b/libavcodec/dvdec.c
@@ -133,7 +133,7 @@ static const uint16_t dv_iweight_720_c[64] = {
 /* XXX: also include quantization */
 static RL_VLC_ELEM dv_rl_vlc[1664];
 
-static void dv_init_static(void)
+static av_cold void dv_init_static(void)
 {
 VLCElem vlc_buf[FF_ARRAY_ELEMS(dv_rl_vlc)] = { 0 };
 VLC dv_vlc = { .table = vlc_buf, .table_allocated = FF_ARRAY_ELEMS(vlc_buf) };
-- 
2.34.1


[FFmpeg-devel] [PATCH 3/9] avcodec/dv_tablegen, dvdata: Remove ff_dv_vlc_bits

2022-09-04 Thread Andreas Rheinhardt

The codes can be easily calculated, so the table is unnecessary.
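
For reference, the calculation amounts to walking the lengths in
left-to-right tree order while keeping a left-aligned 32-bit counter,
as the cur_code/code lines in the dv_tablegen.h hunk below do. A
standalone sketch of the same idea (the function name is hypothetical,
not an FFmpeg API):

```c
#include <assert.h>
#include <stdint.h>

/* Fill codes[i] for code lengths listed in left-to-right tree order.
 * Each length must be in the 1..32 range. */
static void codes_from_lengths(const uint8_t *lens, int n, uint32_t *codes)
{
    uint32_t code = 0; /* next code, left-aligned in 32 bits */

    for (int i = 0; i < n; i++) {
        assert(lens[i] >= 1 && lens[i] <= 32);
        codes[i] = code >> (32 - lens[i]);
        code    += 1U << (32 - lens[i]);
    }
}
```

Feeding it the first lengths of ff_dv_vlc_len (2, 3, 4, 4, 4, 4, 5)
gives 0x0, 0x2, 0x6, 0x7, 0x8, 0x9, 0x14, which matches the start of
the table being removed.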

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/dv_tablegen.h |  5 +++-
 libavcodec/dvdata.c  | 54 
 libavcodec/dvdata.h  |  1 -
 3 files changed, 4 insertions(+), 56 deletions(-)

diff --git a/libavcodec/dv_tablegen.h b/libavcodec/dv_tablegen.h
index 0dcfffc140..7f0ab53fa7 100644
--- a/libavcodec/dv_tablegen.h
+++ b/libavcodec/dv_tablegen.h
@@ -50,8 +50,11 @@ static struct dv_vlc_pair dv_vlc_map[DV_VLC_MAP_RUN_SIZE][DV_VLC_MAP_LEV_SIZE];
 
 static av_cold void dv_vlc_map_tableinit(void)
 {
+uint32_t code = 0;
 int i, j;
 for (int i = 0; i < NB_DV_VLC; i++) {
+uint32_t cur_code = code >> (32 - ff_dv_vlc_len[i]);
+code += 1U << (32 - ff_dv_vlc_len[i]);
 if (ff_dv_vlc_run[i] >= DV_VLC_MAP_RUN_SIZE)
 continue;
 #if CONFIG_SMALL
@@ -63,7 +66,7 @@ static av_cold void dv_vlc_map_tableinit(void)
 continue;
 
 dv_vlc_map[ff_dv_vlc_run[i]][ff_dv_vlc_level[i]].vlc  =
-ff_dv_vlc_bits[i] << (!!ff_dv_vlc_level[i]);
+cur_code << (!!ff_dv_vlc_level[i]);
 dv_vlc_map[ff_dv_vlc_run[i]][ff_dv_vlc_level[i]].size =
 ff_dv_vlc_len[i]   + (!!ff_dv_vlc_level[i]);
 }
diff --git a/libavcodec/dvdata.c b/libavcodec/dvdata.c
index 1e48db591d..0cd10aed10 100644
--- a/libavcodec/dvdata.c
+++ b/libavcodec/dvdata.c
@@ -74,60 +74,6 @@ const uint8_t ff_dv_quant_offset[4] = { 6, 3, 0, 1 };
  * between (run, level) and vlc is not 1-1. So you have to watch out for that
  * when building misc. tables. E.g. (1, 0) can be either 0x7cf or 0x1f82.
  */
-const uint16_t ff_dv_vlc_bits[NB_DV_VLC] = {
-0x0000, 0x0002, 0x0006, 0x0007, 0x0008, 0x0009, 0x0014, 0x0015, 0x0016,
-0x0017, 0x0030, 0x0031, 0x0032, 0x0033, 0x0068, 0x0069, 0x006a,
-0x006b, 0x006c, 0x006d, 0x006e, 0x006f, 0x00e0, 0x00e1, 0x00e2,
-0x00e3, 0x00e4, 0x00e5, 0x00e6, 0x00e7, 0x00e8, 0x00e9, 0x00ea,
-0x00eb, 0x00ec, 0x00ed, 0x00ee, 0x00ef, 0x01e0, 0x01e1, 0x01e2,
-0x01e3, 0x01e4, 0x01e5, 0x01e6, 0x01e7, 0x01e8, 0x01e9, 0x01ea,
-0x01eb, 0x01ec, 0x01ed, 0x01ee, 0x01ef, 0x03e0, 0x03e1, 0x03e2,
-0x03e3, 0x03e4, 0x03e5, 0x03e6, 0x07ce, 0x07cf, 0x07d0, 0x07d1,
-0x07d2, 0x07d3, 0x07d4, 0x07d5, 0x0fac, 0x0fad, 0x0fae, 0x0faf,
-0x0fb0, 0x0fb1, 0x0fb2, 0x0fb3, 0x0fb4, 0x0fb5, 0x0fb6, 0x0fb7,
-0x0fb8, 0x0fb9, 0x0fba, 0x0fbb, 0x0fbc, 0x0fbd, 0x0fbe, 0x0fbf,
-0x1f80, 0x1f81, 0x1f82, 0x1f83, 0x1f84, 0x1f85, 0x1f86, 0x1f87,
-0x1f88, 0x1f89, 0x1f8a, 0x1f8b, 0x1f8c, 0x1f8d, 0x1f8e, 0x1f8f,
-0x1f90, 0x1f91, 0x1f92, 0x1f93, 0x1f94, 0x1f95, 0x1f96, 0x1f97,
-0x1f98, 0x1f99, 0x1f9a, 0x1f9b, 0x1f9c, 0x1f9d, 0x1f9e, 0x1f9f,
-0x1fa0, 0x1fa1, 0x1fa2, 0x1fa3, 0x1fa4, 0x1fa5, 0x1fa6, 0x1fa7,
-0x1fa8, 0x1fa9, 0x1faa, 0x1fab, 0x1fac, 0x1fad, 0x1fae, 0x1faf,
-0x1fb0, 0x1fb1, 0x1fb2, 0x1fb3, 0x1fb4, 0x1fb5, 0x1fb6, 0x1fb7,
-0x1fb8, 0x1fb9, 0x1fba, 0x1fbb, 0x1fbc, 0x1fbd, 0x1fbe, 0x1fbf,
-0x7f00, 0x7f01, 0x7f02, 0x7f03, 0x7f04, 0x7f05, 0x7f06, 0x7f07,
-0x7f08, 0x7f09, 0x7f0a, 0x7f0b, 0x7f0c, 0x7f0d, 0x7f0e, 0x7f0f,
-0x7f10, 0x7f11, 0x7f12, 0x7f13, 0x7f14, 0x7f15, 0x7f16, 0x7f17,
-0x7f18, 0x7f19, 0x7f1a, 0x7f1b, 0x7f1c, 0x7f1d, 0x7f1e, 0x7f1f,
-0x7f20, 0x7f21, 0x7f22, 0x7f23, 0x7f24, 0x7f25, 0x7f26, 0x7f27,
-0x7f28, 0x7f29, 0x7f2a, 0x7f2b, 0x7f2c, 0x7f2d, 0x7f2e, 0x7f2f,
-0x7f30, 0x7f31, 0x7f32, 0x7f33, 0x7f34, 0x7f35, 0x7f36, 0x7f37,
-0x7f38, 0x7f39, 0x7f3a, 0x7f3b, 0x7f3c, 0x7f3d, 0x7f3e, 0x7f3f,
-0x7f40, 0x7f41, 0x7f42, 0x7f43, 0x7f44, 0x7f45, 0x7f46, 0x7f47,
-0x7f48, 0x7f49, 0x7f4a, 0x7f4b, 0x7f4c, 0x7f4d, 0x7f4e, 0x7f4f,
-0x7f50, 0x7f51, 0x7f52, 0x7f53, 0x7f54, 0x7f55, 0x7f56, 0x7f57,
-0x7f58, 0x7f59, 0x7f5a, 0x7f5b, 0x7f5c, 0x7f5d, 0x7f5e, 0x7f5f,
-0x7f60, 0x7f61, 0x7f62, 0x7f63, 0x7f64, 0x7f65, 0x7f66, 0x7f67,
-0x7f68, 0x7f69, 0x7f6a, 0x7f6b, 0x7f6c, 0x7f6d, 0x7f6e, 0x7f6f,
-0x7f70, 0x7f71, 0x7f72, 0x7f73, 0x7f74, 0x7f75, 0x7f76, 0x7f77,
-0x7f78, 0x7f79, 0x7f7a, 0x7f7b, 0x7f7c, 0x7f7d, 0x7f7e, 0x7f7f,
-0x7f80, 0x7f81, 0x7f82, 0x7f83, 0x7f84, 0x7f85, 0x7f86, 0x7f87,
-0x7f88, 0x7f89, 0x7f8a, 0x7f8b, 0x7f8c, 0x7f8d, 0x7f8e, 0x7f8f,
-0x7f90, 0x7f91, 0x7f92, 0x7f93, 0x7f94, 0x7f95, 0x7f96, 0x7f97,
-0x7f98, 0x7f99, 0x7f9a, 0x7f9b, 0x7f9c, 0x7f9d, 0x7f9e, 0x7f9f,
-0x7fa0, 0x7fa1, 0x7fa2, 0x7fa3, 0x7fa4, 0x7fa5, 0x7fa6, 0x7fa7,
-0x7fa8, 0x7fa9, 0x7faa, 0x7fab, 0x7fac, 0x7fad, 0x7fae, 0x7faf,
-0x7fb0, 0x7fb1, 0x7fb2, 0x7fb3, 0x7fb4, 0x7fb5, 0x7fb6, 0x7fb7,
-0x7fb8, 0x7fb9, 0x7fba, 0x7fbb, 0x7fbc, 0x7fbd, 0x7fbe, 0x7fbf,
-0x7fc0, 0x7fc1, 0x7fc2, 0x7fc3, 0x7fc4, 0x7fc5, 0x7fc6, 0x7fc7,
-0x7fc8, 0x7fc9, 0x7fca, 0x7fcb, 0x7fcc, 0x7fcd, 0x7fce, 0x7fcf,
-0x7fd0, 0x7fd1, 0x7fd2, 0x7fd3, 0x7fd4, 0x7fd5, 0x7fd6, 0x7fd7,
-0x7fd8, 0x7fd9, 0x7fda, 0x7fdb, 0x7fdc, 0x7fdd, 0x7fde, 0x7fdf,
-0x7fe0, 0x7fe1, 0x7fe2,

[FFmpeg-devel] [PATCH 2/9] avcodec/dvdec: Use ff_init_vlc_from_lengths()

2022-09-04 Thread Andreas Rheinhardt

This is possible because the codes are already ordered
from left to right in the tree. It avoids having to create
the codes ourselves and will enable the codes table
to be removed altogether once the encoder stops using it.

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/dvdec.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/libavcodec/dvdec.c b/libavcodec/dvdec.c
index 3af3e82eab..8f68d2715d 100644
--- a/libavcodec/dvdec.c
+++ b/libavcodec/dvdec.c
@@ -137,7 +137,6 @@ static void dv_init_static(void)
 {
 VLCElem vlc_buf[FF_ARRAY_ELEMS(dv_rl_vlc)] = { 0 };
 VLC dv_vlc = { .table = vlc_buf, .table_allocated = FF_ARRAY_ELEMS(vlc_buf) };
-uint16_t  new_dv_vlc_bits[NB_DV_VLC * 2];
 uint8_t    new_dv_vlc_len[NB_DV_VLC * 2];
 uint8_t    new_dv_vlc_run[NB_DV_VLC * 2];
 int16_t  new_dv_vlc_level[NB_DV_VLC * 2];
@@ -145,17 +144,14 @@ static void dv_init_static(void)
 
 /* it's faster to include sign bit in a generic VLC parsing scheme */
 for (i = 0, j = 0; i < NB_DV_VLC; i++, j++) {
-new_dv_vlc_bits[j]  = ff_dv_vlc_bits[i];
 new_dv_vlc_len[j]   = ff_dv_vlc_len[i];
 new_dv_vlc_run[j]   = ff_dv_vlc_run[i];
 new_dv_vlc_level[j] = ff_dv_vlc_level[i];
 
 if (ff_dv_vlc_level[i]) {
-new_dv_vlc_bits[j] <<= 1;
 new_dv_vlc_len[j]++;
 
 j++;
-new_dv_vlc_bits[j]  = (ff_dv_vlc_bits[i] << 1) | 1;
 new_dv_vlc_len[j]   =  ff_dv_vlc_len[i] + 1;
 new_dv_vlc_run[j]   =  ff_dv_vlc_run[i];
 new_dv_vlc_level[j] = -ff_dv_vlc_level[i];
@@ -164,8 +160,9 @@ static void dv_init_static(void)
 
 /* NOTE: as a trick, we use the fact the no codes are unused
  * to accelerate the parsing of partial codes */
-init_vlc(&dv_vlc, TEX_VLC_BITS, j, new_dv_vlc_len,
- 1, 1, new_dv_vlc_bits, 2, 2, INIT_VLC_USE_NEW_STATIC);
+ff_init_vlc_from_lengths(&dv_vlc, TEX_VLC_BITS, j,
+ new_dv_vlc_len, 1,
+ NULL, 0, 0, 0, INIT_VLC_USE_NEW_STATIC, NULL);
 av_assert1(dv_vlc.table_size == 1664);
 
 for (int i = 0; i < dv_vlc.table_size; i++) {
-- 
2.34.1


[FFmpeg-devel] [PATCH 1/9] avcodec/dvdata: Order code table by codes

2022-09-04 Thread Andreas Rheinhardt

Right now, it is nearly ordered by "left codes in the tree first";
the only exception is the escape value which has been put at the
end. This commit moves it to the place it should have according
to the above order. This is in preparation for further commits.

Signed-off-by: Andreas Rheinhardt 
---
 libavcodec/dv_tablegen.h |  2 +-
 libavcodec/dvdata.c  | 12 
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/libavcodec/dv_tablegen.h b/libavcodec/dv_tablegen.h
index 941b5572be..0dcfffc140 100644
--- a/libavcodec/dv_tablegen.h
+++ b/libavcodec/dv_tablegen.h
@@ -51,7 +51,7 @@ static struct dv_vlc_pair dv_vlc_map[DV_VLC_MAP_RUN_SIZE][DV_VLC_MAP_LEV_SIZE];
 static av_cold void dv_vlc_map_tableinit(void)
 {
 int i, j;
-for (i = 0; i < NB_DV_VLC - 1; i++) {
+for (int i = 0; i < NB_DV_VLC; i++) {
 if (ff_dv_vlc_run[i] >= DV_VLC_MAP_RUN_SIZE)
 continue;
 #if CONFIG_SMALL
diff --git a/libavcodec/dvdata.c b/libavcodec/dvdata.c
index 231569a328..1e48db591d 100644
--- a/libavcodec/dvdata.c
+++ b/libavcodec/dvdata.c
@@ -75,7 +75,7 @@ const uint8_t ff_dv_quant_offset[4] = { 6, 3, 0, 1 };
  * when building misc. tables. E.g. (1, 0) can be either 0x7cf or 0x1f82.
  */
 const uint16_t ff_dv_vlc_bits[NB_DV_VLC] = {
-0x0000, 0x0002, 0x0007, 0x0008, 0x0009, 0x0014, 0x0015, 0x0016,
+0x0000, 0x0002, 0x0006, 0x0007, 0x0008, 0x0009, 0x0014, 0x0015, 0x0016,
 0x0017, 0x0030, 0x0031, 0x0032, 0x0033, 0x0068, 0x0069, 0x006a,
 0x006b, 0x006c, 0x006d, 0x006e, 0x006f, 0x00e0, 0x00e1, 0x00e2,
 0x00e3, 0x00e4, 0x00e5, 0x00e6, 0x00e7, 0x00e8, 0x00e9, 0x00ea,
@@ -126,11 +126,10 @@ const uint16_t ff_dv_vlc_bits[NB_DV_VLC] = {
 0x7fe8, 0x7fe9, 0x7fea, 0x7feb, 0x7fec, 0x7fed, 0x7fee, 0x7fef,
 0x7ff0, 0x7ff1, 0x7ff2, 0x7ff3, 0x7ff4, 0x7ff5, 0x7ff6, 0x7ff7,
 0x7ff8, 0x7ff9, 0x7ffa, 0x7ffb, 0x7ffc, 0x7ffd, 0x7ffe, 0x7fff,
-0x0006,
 };
 
 const uint8_t ff_dv_vlc_len[NB_DV_VLC] = {
- 2,  3,  4,  4,  4,  5,  5,  5,
+ 2,  3,  4,  4,  4,  4,  5,  5,  5,
  5,  6,  6,  6,  6,  7,  7,  7,
  7,  7,  7,  7,  7,  8,  8,  8,
  8,  8,  8,  8,  8,  8,  8,  8,
@@ -181,11 +180,10 @@ const uint8_t ff_dv_vlc_len[NB_DV_VLC] = {
 15, 15, 15, 15, 15, 15, 15, 15,
 15, 15, 15, 15, 15, 15, 15, 15,
 15, 15, 15, 15, 15, 15, 15, 15,
- 4,
 };
 
 const uint8_t ff_dv_vlc_run[NB_DV_VLC] = {
- 0,  0,  1,  0,  0,  2,  1,  0,
+ 0,  0, 127, 1,  0,  0,  2,  1,  0,
  0,  3,  4,  0,  0,  5,  6,  2,
  1,  1,  0,  0,  0,  7,  8,  9,
 10,  3,  4,  2,  1,  1,  1,  0,
@@ -236,11 +234,10 @@ const uint8_t ff_dv_vlc_run[NB_DV_VLC] = {
  0,  0,  0,  0,  0,  0,  0,  0,
  0,  0,  0,  0,  0,  0,  0,  0,
  0,  0,  0,  0,  0,  0,  0,  0,
-   127,
 };
 
 const uint8_t ff_dv_vlc_level[NB_DV_VLC] = {
- 1,   2,   1,   3,   4,   1,   2,   5,
+ 1,   2,   0,   1,   3,   4,   1,   2,   5,
  6,   1,   1,   7,   8,   1,   1,   2,
  3,   4,   9,  10,  11,   1,   1,   1,
  1,   2,   2,   3,   5,   6,   7,  12,
@@ -291,5 +288,4 @@ const uint8_t ff_dv_vlc_level[NB_DV_VLC] = {
232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 247,
248, 249, 250, 251, 252, 253, 254, 255,
- 0,
 };
-- 
2.34.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 2/3] avcodec/fmvc: buffer size is stride based not 4*width

2022-09-04 Thread Michael Niedermayer

On Fri, Sep 02, 2022 at 06:48:57PM +0200, Paul B Mahol wrote:
> On Fri, Sep 2, 2022 at 6:32 PM Michael Niedermayer 
> wrote:
> 
> > On Mon, Jun 13, 2022 at 09:13:19PM +0200, Michael Niedermayer wrote:
> > > On Mon, Jun 13, 2022 at 12:10:44PM +0200, Paul B Mahol wrote:
> > > > On Mon, Jun 13, 2022 at 11:48 AM Anton Khirnov 
> > wrote:
> > > >
> > > > > Quoting Paul B Mahol (2022-06-13 11:34:44)
> > > > > > On Mon, Jun 13, 2022 at 11:10 AM Anton Khirnov 
> > > > > wrote:
> > > > > >
> > > > > > > Quoting Paul B Mahol (2022-06-13 10:04:04)
> > > > > > > > On Sat, Jun 11, 2022 at 4:55 PM Michael Niedermayer <
> > > > > > > mich...@niedermayer.cc>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > On Sat, Jun 11, 2022 at 10:47:57AM +0200, Paul B Mahol wrote:
> > > > > > > > > > Have you actually tested this "change" ?
> > > > > > > > >
> > > > > > > > > On every file i found
> > > > > > > > > 6-methyl-5-hepten-2-one-CC-db_small.avi
> > > > > > > > > fmvcVirtualDub_small.avi
> > > > > > > > > skrzyzowanie4.avi
> > > > > > > > > fmvc-poc.avi
> > > > > > > > >
> > > > > > > > > are there any other files i should test it on ?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, the ones where stride != width.
> > > > > > >
> > > > > > > Give examples of such files then. And add more tests.
> > > > > > >
> > > > > > > You really should try to be more helpful if you care about this
> > code
> > > > > > > working.
> > > > > >
> > > > > >
> > > > > > Code works perfectly from start. There are always attempts to
> > break it.
> > > > > > Your attempts to belittle my work are futile.
> > > > >
> > > > > Perfect code should live in an external repository that is locked
> > > > > against modification.
> > > > >
> > > > > The ffmpeg repository is only for imperfect code that evolves with
> > time,
> > > > > and so requires changes.
> > > > >
> > > > >
> > > > I dunno what Michael attempts to fix. Decoder works fine with valid
> > files.
> > > > I doubt that encoder would encode random bytes or padding into valid
> > file
> > > > bitstream.
> > >
> > > the stride*4 / width*4 change was because of 2 things.
> > > first with AV_PIX_FMT_BGR24 the data stored is not width*4
> > >
> > > stride is in units of 4 bytes for some reason, so stride*4
> > > fixes this
> > > The 2nd issue is that the code addresses it by "s->stride * 4"
> > > so the buffer allocation should be stride*4 if we believe the
> > > other code is correct
> > >
> > > src = s->buffer;
> > > ...
> > > for (y = 0; y < avctx->height; y++) {
> > > ...
> > > src += s->stride * 4;
> > >
> > > width*4 works because it's bigger than stride*4 for BGR24, which is what
> > > all samples I have use.
> > >
> > > also
> > > ssrc = s->buffer;
> > > ...
> > > for (y = 0; y < avctx->height; y++) {
> > > ...
> > > ssrc += s->stride * 4;
> > > and
> > > dst = (uint32_t *)s->buffer;
> > >
> > > for (block = 0, y = 0; y < s->yb; y++) {
> > > int block_h = s->blocks[block].h;
> > > uint32_t *rect = dst;
> > >
> > > for (x = 0; x < s->xb; x++) {
> > > int block_w = s->blocks[block].w;
> > > uint32_t *row = dst;
> > >
> > > block_h = s->blocks[block].h;
> > > if (s->blocks[block].xor) {
> > > for (k = 0; k < block_h; k++) {
> > > uint32_t *column = dst;
> > > for (l = 0; l < block_w; l++)
> > > *dst++ ^= *src++;
> > > dst = &column[s->stride];
> > > }
> > > }
> > > dst = &row[block_w];
> > > ++block;
> > > }
> > > dst = &rect[block_h * s->stride];
> > > }
> > >
> > > Again, if you have fmvc files with more odd widths or other pixel formats
> > > these would be very welcome. I can just say the code as is in git is wrong
> > > and the buffer size as is in git is wrong. I noticed this when I added
> > > a check to see if the buffer is only partly filled and realized it is
> > > always partly filled even when the whole image is actually touched
> >
> > If there are no objections, i.e. no one sees a bug in this, then I'd like
> > to apply this
> >
> 
> Since when are partially filled buffers a bad thing?

- it wastes memory
- it breaks the subsequent patch
- width and stride are related this way:
  s->stride = (avctx->width * avctx->bits_per_coded_sample + 31) / 32;
  Is width always greater than or equal to stride? If not, we might be
  accessing outside the array, because the access uses stride while the
  allocation uses width.

Or the one-line answer:
Since when is it a good thing to mismatch allocation and access?
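
To make the mismatch concrete, here is a rough standalone sketch of the arithmetic. The stride formula is the one quoted above; the helper names are made up for illustration and are not from fmvc.c.

```c
#include <stddef.h>

/* Stride in units of 4 bytes, using the formula quoted above. */
static int fmvc_stride(int width, int bits_per_coded_sample)
{
    return (width * bits_per_coded_sample + 31) / 32;
}

/* Bytes the access pattern covers: rows advance by stride * 4 bytes. */
static size_t bytes_by_stride(int width, int bpp, int height)
{
    return (size_t)fmvc_stride(width, bpp) * 4 * height;
}

/* Bytes a width * 4 per-row allocation provides. */
static size_t bytes_by_width(int width, int height)
{
    return (size_t)width * 4 * height;
}
```

For width 10 at 24 bits per sample, the stride is 8 units, so a row covers 32 bytes while width*4 reserves 40. At 32 bits per sample the two agree. Only a (hypothetical) depth above 32 bits would make the stride exceed the width, which is the case where a width-based allocation would be too small.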

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Nations do behave wisely once they have exhausted all other alternatives. 
-- Abba Eban



Re: [FFmpeg-devel] [PATCH v3 2/3] lavf/dashdec: Multithreaded DASH initialization

2022-09-04 Thread Lukas Fellechner

Andreas Rheinhardt andreas.rheinhardt at outlook.com
Wed Aug 31 05:54:12 EEST 2022
>
> > +#if HAVE_THREADS
> > +
> > +struct work_pool_data
> > +{
> > +AVFormatContext *ctx;
> > +struct representation *pls;
> > +struct representation *common_pls;
> > +pthread_mutex_t *common_mutex;
> > +pthread_cond_t *common_condition;
> > +int is_common;
> > +int is_started;
> > +int result;
> > +};
> > +
> > +struct thread_data
>
> This is against our naming conventions: CamelCase for struct tags and
> typedefs, lowercase names with underscore for variable names.

In the code files I looked at, CamelCase is only used for typedef structs.
All structs without typedef are lower case with underscores, so I aligned
with that, originally.

I will make this a typedef struct and use CamelCase for next patch.

> > +static int init_streams_multithreaded(AVFormatContext *s, int nstreams, 
> > int threads)
> > +{
> > +DASHContext *c = s->priv_data;
> > +int ret = 0;
> > +int stream_index = 0;
> > +int i;
>
> We allow "for (int i = 0;"

Oh, I did not know that, and I did not see it being used here anywhere.
Will use in next patch, it's my preferred style, actually.

> > +
> > +// alloc data
> > +struct work_pool_data *init_data = (struct 
> > work_pool_data*)av_mallocz(sizeof(struct work_pool_data) * nstreams);
> > +if (!init_data)
> > +return AVERROR(ENOMEM);
> > +
> > +struct thread_data *thread_data = (struct 
> > thread_data*)av_mallocz(sizeof(struct thread_data) * threads);
> > +if (!thread_data)
> > +return AVERROR(ENOMEM);
>
> 1. init_data leaks here on error.
> 2. In fact, it seems to me that both init_data and thread_data are
> nowhere freed.

True, I must have lost the av_free call at some point.
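
Roughly what I have in mind for the allocation part, as a standalone sketch only: the struct names are hypothetical stand-ins for work_pool_data / thread_data, and plain calloc()/free() are used so it compiles outside FFmpeg (in the patch this would be av_calloc()/av_freep() with AVERROR(ENOMEM)).

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the patch's work_pool_data / thread_data. */
typedef struct WorkItem   { int result; } WorkItem;
typedef struct WorkerData { int id;     } WorkerData;

/*
 * Allocate both arrays or neither.
 * Returns 0 on success, -1 if either allocation fails.
 */
static int alloc_init_state(WorkItem **items, WorkerData **workers,
                            size_t nstreams, size_t nthreads)
{
    *items   = calloc(nstreams, sizeof(**items));
    *workers = calloc(nthreads, sizeof(**workers));
    if (!*items || !*workers) {
        free(*items);      /* free(NULL) is a no-op */
        free(*workers);
        *items   = NULL;
        *workers = NULL;
        return -1;
    }
    return 0;
}

/* Release both arrays and reset the pointers so a second call is harmless. */
static void free_init_state(WorkItem **items, WorkerData **workers)
{
    free(*items);
    *items = NULL;
    free(*workers);
    *workers = NULL;
}
```

Every exit path of the init function would then go through free_init_state(), so neither array can leak.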

> > +// init work pool data
> > +struct work_pool_data* current_data = init_data;
> > +
> > +for (i = 0; i < c->n_videos; i++) {
> > +create_work_pool_data(s, stream_index, c->videos[i],
> > +c->is_init_section_common_video ? c->videos[0] : NULL,
> > +current_data, _video_mutex, _video_cond);
> > +
> > +stream_index++;
> > +current_data++;
> > +}
> > +
> > +for (i = 0; i < c->n_audios; i++) {
> > +create_work_pool_data(s, stream_index, c->audios[i],
> > +c->is_init_section_common_audio ? c->audios[0] : NULL,
> > +current_data, _audio_mutex, _audio_cond);
> > +
> > +stream_index++;
> > +current_data++;
> > +}
> > +
> > +for (i = 0; i < c->n_subtitles; i++) {
> > +create_work_pool_data(s, stream_index, c->subtitles[i],
> > +c->is_init_section_common_subtitle ? c->subtitles[0] : NULL,
> > +current_data, _subtitle_mutex, _subtitle_cond);
> > +
> > +stream_index++;
> > +current_data++;
> > +}
>
> This is very repetitive.

Will improve for next patch.

> 1. We actually have an API to process multiple tasks by different
> threads: Look at libavutil/slicethread.h. Why can't you reuse that?
> 2. In case initialization of one of the conditions/mutexes fails, you
> are nevertheless destroying them; you are even destroying completely
> uninitialized mutexes. This is undefined behaviour. Checking the result
> of it does not fix this.
>
> - Andreas

1. The slicethread implementation is pretty hard to understand.
I was not sure if it can be used for normal parallelization, because
it looked more like some kind of thread pool for continuous data
processing. But after taking a second look, I think I can use it here.
I will try and see if it works well.

2. I was not aware that this is undefined behavior. Will switch to
PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER macros,
which return a safely initialized mutex/cond.
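
If the static initializers turn out not to fit (as far as I know they are meant for statically allocated objects), an alternative would be to track what was actually initialized and destroy only that. A minimal sketch, assuming a hypothetical SyncPair helper that is not part of the patch:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper struct: a mutex/cond pair plus "was initialized" flags. */
typedef struct SyncPair {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int mutex_inited;
    int cond_inited;
} SyncPair;

/* Returns 0 on success, or the pthread error code of the failing call. */
static int sync_pair_init(SyncPair *p)
{
    int ret;

    p->mutex_inited = 0;
    p->cond_inited  = 0;

    ret = pthread_mutex_init(&p->mutex, NULL);
    if (ret)
        return ret;
    p->mutex_inited = 1;

    ret = pthread_cond_init(&p->cond, NULL);
    if (ret)
        return ret;
    p->cond_inited = 1;

    return 0;
}

/* Destroys only what sync_pair_init() managed to initialize. */
static void sync_pair_destroy(SyncPair *p)
{
    if (p->cond_inited) {
        pthread_cond_destroy(&p->cond);
        p->cond_inited = 0;
    }
    if (p->mutex_inited) {
        pthread_mutex_destroy(&p->mutex);
        p->mutex_inited = 0;
    }
}
```

With this, calling sync_pair_destroy() after a failed or partial init never touches an uninitialized object.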

I also noticed one cross-thread issue that I will solve in the next patch.

Re: [FFmpeg-devel] [PATCH 5/5] lavc/aarch64: Provide neon implementation of nsse16

2022-09-04 Thread Martin Storsjö


On Mon, 22 Aug 2022, Hubert Mazur wrote:


Add vectorized implementation of nsse16 function.

Performance comparison tests are shown below.
- nsse_0_c: 707.0
- nsse_0_neon: 120.0

Benchmarks and tests run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur 
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  15 +++
libavcodec/aarch64/me_cmp_neon.S | 126 +++
2 files changed, 141 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c 
b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 8c295d5457..146ef04345 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -49,6 +49,10 @@ int vsse16_neon(MpegEncContext *c, const uint8_t *s1, const 
uint8_t *s2,
ptrdiff_t stride, int h);
int vsse_intra16_neon(MpegEncContext *c, const uint8_t *s, const uint8_t *dummy,
  ptrdiff_t stride, int h);
+int nsse16_neon(int multiplier, const uint8_t *s, const uint8_t *s2,
+ptrdiff_t stride, int h);
+int nsse16_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t 
*s2,
+ptrdiff_t stride, int h);

av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
{
@@ -72,5 +76,16 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, 
AVCodecContext *avctx)

c->vsse[0] = vsse16_neon;
c->vsse[4] = vsse_intra16_neon;
+
+c->nsse[0] = nsse16_neon_wrapper;
}
}
+
+int nsse16_neon_wrapper(MpegEncContext *c, const uint8_t *s1, const uint8_t 
*s2,
+ptrdiff_t stride, int h)
+{
+if (c)
+return nsse16_neon(c->avctx->nsse_weight, s1, s2, stride, h);
+else
+return nsse16_neon(8, s1, s2, stride, h);
+}
\ No newline at end of file


The indentation is off for this file, and it's missing the final newline.


diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index 46d4dade5d..9fe96e111c 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -889,3 +889,129 @@ function vsse_intra16_neon, export=1

ret
endfunc
+
+function nsse16_neon, export=1
+// x0   multiplier
+// x1   uint8_t *pix1
+// x2   uint8_t *pix2
+// x3   ptrdiff_t stride
+// w4   int h
+
+str x0, [sp, #-0x40]!
+stp x1, x2, [sp, #0x10]
+stp x3, x4, [sp, #0x20]
+str lr, [sp, #0x30]
+bl  sse16_neon
+ldr lr, [sp, #0x30]
+mov w9, w0  // here we 
store score1
+ldr x5, [sp]
+ldp x1, x2, [sp, #0x10]
+ldp x3, x4, [sp, #0x20]
+add sp, sp, #0x40
+
+moviv16.8h, #0
+moviv17.8h, #0
+moviv18.8h, #0
+moviv19.8h, #0
+
+mov x10, x1 // x1
+mov x14, x2 // x2


I don't see why you need to make a copy of x1/x2 here, as you don't use 
x1/x2 after this at all.



+add x11, x1, x3 // x1 + stride
+add x15, x2, x3 // x2 + stride
+add x12, x1, #1 // x1 + 1
+add x16, x2, #1 // x2 + 1


FWIW, instead of making two loads, for [x1] and [x1+1], as we don't need 
the final value at [x1+16], I would normally just do one load of [x1] and 
then make a shifted version with the 'ext' instruction; ext is generally 
cheaper than doing redundant loads. On the other hand, by doing two loads, 
you don't have a serial dependency on the first load.




+// iterate by one
+2:
+ld1 {v0.16b}, [x10], x3
+ld1 {v1.16b}, [x11], x3
+ld1 {v2.16b}, [x12], x3
+usubl   v31.8h, v0.8b, v1.8b
+ld1 {v3.16b}, [x13], x3
+usubl2  v30.8h, v0.16b, v1.16b
+usubl   v29.8h, v2.8b, v3.8b
+usubl2  v28.8h, v2.16b, v3.16b
+sabav16.8h, v31.8h, v29.8h
+ld1 {v4.16b}, [x14], x3
+ld1 {v5.16b}, [x15], x3
+sabav17.8h, v30.8h, v28.8h
+ld1 {v6.16b}, [x16], x3
+usubl   v27.8h, v4.8b, v5.8b
+ld1 {v7.16b}, [x17], x3


So, looking at the main implementation structure here, by looking at the 
non-unrolled version: You're doing 8 loads per iteration here - and I 
would say you can do this with 2 loads per iteration.


By reusing the loaded data from the previous iteration instead of 
duplicated loading, you can get this down from 8 to 4 loads. And

Re: [FFmpeg-devel] [PATCH 4/5] lavc/aarch64: Add neon implementation for vsse_intra16

2022-09-04 Thread Martin Storsjö


On Mon, 22 Aug 2022, Hubert Mazur wrote:


Provide optimized implementation for vsse_intra16 for arm64.

Performance tests are shown below.
- vsse_4_c: 153.7
- vsse_4_neon: 34.2

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur 
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  3 +
libavcodec/aarch64/me_cmp_neon.S | 75 
2 files changed, 78 insertions(+)


The same comment as for the others.

// Martin


Re: [FFmpeg-devel] [PATCH 3/5] lavc/aarch64: Add neon implementation for vsad_intra16

2022-09-04 Thread Martin Storsjö


On Mon, 22 Aug 2022, Hubert Mazur wrote:


Provide optimized implementation for vsad_intra16 function for arm64.

Performance comparison tests are shown below.
- vsad_4_c: 177.2
- vsad_4_neon: 24.5

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur 
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  3 ++
libavcodec/aarch64/me_cmp_neon.S | 58 
2 files changed, 61 insertions(+)


Same thing as for the others; keep the data for the previous row in 
registers instead of loading it twice.
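
To illustrate the idea in scalar form (this is only a C model of the access pattern, not the NEON code, and the function names are made up): the naive loop reads every row twice, once as the current row and once as the next one, while the second version carries the previous row over.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Naive form: each row is read twice, as s[x] and later as s[x + stride]. */
static int vsad_intra16_naive(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}

/* Row-reuse form: each row is read from memory once and kept in prev[]. */
static int vsad_intra16_reuse(const uint8_t *s, ptrdiff_t stride, int h)
{
    uint8_t prev[16], cur[16];
    int score = 0;

    if (h < 1)
        return 0;
    memcpy(prev, s, 16);               /* load the first row once */
    for (int y = 1; y < h; y++) {
        s += stride;
        memcpy(cur, s, 16);            /* the only load in this iteration */
        for (int x = 0; x < 16; x++)
            score += abs(prev[x] - cur[x]);
        memcpy(prev, cur, 16);         /* in asm this is just a register move */
    }
    return score;
}
```

In the assembly version, prev and cur would simply be two vector registers, so the copy at the end of the loop costs a register move (or nothing, if the loop is unrolled by two with the roles swapped).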


// Martin


Re: [FFmpeg-devel] [PATCH 2/5] lavc/aarch64: Add neon implementation of vsse16

2022-09-04 Thread Martin Storsjö


On Mon, 22 Aug 2022, Hubert Mazur wrote:


Provide optimized implementation of vsse16 for arm64.

Performance comparison tests are shown below.
- vsse_0_c: 254.4
- vsse_0_neon: 64.7

Benchmarks and tests are run with checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur 
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  4 +
libavcodec/aarch64/me_cmp_neon.S | 97 
2 files changed, 101 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c 
b/libavcodec/aarch64/me_cmp_init_aarch64.c
index ddc5d05611..7b81e48d16 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -43,6 +43,8 @@ int sse4_neon(MpegEncContext *v, const uint8_t *pix1, const 
uint8_t *pix2,

int vsad16_neon(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
ptrdiff_t stride, int h);
+int vsse16_neon(MpegEncContext *c, const uint8_t *s1, const uint8_t *s2,
+ptrdiff_t stride, int h);

av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, AVCodecContext *avctx)
{
@@ -62,5 +64,7 @@ av_cold void ff_me_cmp_init_aarch64(MECmpContext *c, 
AVCodecContext *avctx)
c->sse[2] = sse4_neon;

c->vsad[0] = vsad16_neon;
+
+c->vsse[0] = vsse16_neon;
}
}
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index d4c0099854..279bae7cb5 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -659,3 +659,100 @@ function vsad16_neon, export=1

ret
endfunc
+
+function vsse16_neon, export=1
+// x0   unused
+// x1   uint8_t *pix1
+// x2   uint8_t *pix2
+// x3   ptrdiff_t stride
+// w4   int h
+
+moviv30.4s, #0
+moviv29.4s, #0
+
+add x5, x1, x3  // pix1 + stride
+add x6, x2, x3  // pix2 + stride
+sub w4, w4, #1  // we need to make h-1 
iterations
+cmp w4, #3  // check if we can 
make 4 iterations at once
+b.le2f
+
+// make 4 iterations at once


The comments seem to talk about 4 iterations at once while the code 
actually only does 3.



+1:
+// x = abs(pix1[0] - pix2[0] - pix1[0 + stride] + pix2[0 + stride]) =


The comment seems a bit un-updated here, since there's no abs() involved 
here



+// res = (x) * (x)
+ld1 {v0.16b}, [x1], x3  // Load pix1[0], first 
iteration
+ld1 {v1.16b}, [x2], x3  // Load pix2[0], first 
iteration
+ld1 {v2.16b}, [x5], x3  // Load pix1[0 + 
stride], first iteration
+usubl   v28.8h, v0.8b, v1.8b// Signed difference 
of pix1[0] - pix2[0], first iteration
+ld1 {v3.16b}, [x6], x3  // Load pix2[0 + 
stride], first iteration
+usubl2  v27.8h, v0.16b, v1.16b  // Signed difference 
of pix1[0] - pix2[0], first iteration
+usubl   v26.8h, v3.8b, v2.8b// Signed difference 
of pix1[0 + stride] - pix2[0 + stride], first iteration
+usubl2  v25.8h, v3.16b, v2.16b  // Signed difference 
of pix1[0 + stride] - pix2[0 + stride], first iteration


Same thing about reusing data from the previous row, as for the previous 
patch.


// Martin


Re: [FFmpeg-devel] [PATCH 2/3] avisynth: use AviSynth+'s frame properties to set various fields

2022-09-04 Thread Stephen Hutchinson


On 8/25/22 3:46 AM, Steinar Apalnes wrote:

On Thu, 25 Aug 2022 at 02:11, Stephen Hutchinson wrote:


On 8/24/22 1:04 PM, Steinar Apalnes wrote:

On Tue, 8 Feb 2022 at 12:03, Stephen Hutchinson wrote:


* Field Order
* Chroma Location
* Color Transfer Characteristics
* Color Range
* Color Primaries
* Matrix Coefficients

The existing TFF/BFF detection is retained as a fallback for
older versions of AviSynth that can't access frame properties.
The other properties have no legacy equivalent to detect them.

Signed-off-by: Stephen Hutchinson 


...
   Hi Stephen,

Would it be possible to add support for "_SARNum" and "_SARDen" so that
ffmpeg could also recognize the sample aspect ratio in avs scripts?



I'm a bit hesitant to do so, namely because the _SARNum/Den properties
are much more likely to need to have been changed due to operations
in-script, and unless the user is studious about updating those
properties after even just a basic resizing operation, then _SARNum/Den
will still be set to the original values populated by the source filter,
and will be wrong, leading to encodes ending up wrong and potentially
bug reports to Trac which aren't actually the fault of the demuxer.

This is partially coming from the fact that even the color-based
properties that were already added have experienced some level of
backlash because of the requirement for users to ensure the properties
are correctly updated if they've done any changes to those factors
(as best as I'm aware, the filters in the AviSynth+ core still only pass
through the existing properties, but they don't update them if they
pertain to that property's functionality; I believe some external
filters do update them, however).  I would be fairly confident in
betting that users resizing video is far more common than them doing
color correction ops that would require updating the frameprops FFmpeg
can currently read.

One mitigation for that, IMO, would be to flag it as an experimental
feature, so that FFmpeg won't read _SARNum/Den unless the -strict
option has been used.
___



If you think cherry-picking which props to support under normal operation
is the way to go, then a -strict option to support *all* "scary" avs props
would be very welcome indeed.
Because as it is now, we have nothing at all with regard to the SAR.

Thanks for your efforts :-)

-steinar


SAR frameprops implemented in commit c49beead, and a more fine-grained,
flags-based way to toggle any of the frameprops on and off added in
commit adead1cc.  Pushed a few minutes ago.
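
For anyone curious how the toggling works, a rough standalone sketch: the flag values are the ones from the avisynth_flags patch, while frameprop_enabled() is a hypothetical helper written only for this illustration (the demuxer itself tests avs->flags inline).

```c
#include <stdint.h>

/* Flag values as defined in the avisynth_flags patch. */
typedef enum AviSynthFlags {
    AVISYNTH_FRAMEPROP_FIELD_ORDER     = (1 << 0),
    AVISYNTH_FRAMEPROP_RANGE           = (1 << 1),
    AVISYNTH_FRAMEPROP_PRIMARIES       = (1 << 2),
    AVISYNTH_FRAMEPROP_TRANSFER        = (1 << 3),
    AVISYNTH_FRAMEPROP_MATRIX          = (1 << 4),
    AVISYNTH_FRAMEPROP_CHROMA_LOCATION = (1 << 5),
    AVISYNTH_FRAMEPROP_SAR             = (1 << 6),
} AviSynthFlags;

/* Hypothetical helper: nonzero if reading `prop` is enabled in `flags`. */
static int frameprop_enabled(uint32_t flags, AviSynthFlags prop)
{
    return (flags & (uint32_t)prop) != 0;
}
```

A flags value of AVISYNTH_FRAMEPROP_FIELD_ORDER | AVISYNTH_FRAMEPROP_RANGE, for example, would read field order and color range from the frame properties and leave the SAR alone.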

Re: [FFmpeg-devel] [PATCH] avcodec/flac: critical fix

2022-09-04 Thread Leo Izen


On 9/4/22 14:54, Paul B Mahol wrote:

Another critical fix for decoder.




I tested this, I can confirm it fixes #9270.

I'm not familiar with the copyright status on the given sample, would it 
make sense to add a fate test to this to prevent this kind of regression 
in the future?


- Leo Izen (thebombzen)

Re: [FFmpeg-devel] [PATCH 3/3 v2] avformat/avisynth: reindent

2022-09-04 Thread Stephen Hutchinson


On 8/30/22 8:23 PM, Stephen Hutchinson wrote:

Signed-off-by: Stephen Hutchinson 
---
  libavformat/avisynth.c | 348 -
  1 file changed, 174 insertions(+), 174 deletions(-)

diff --git a/libavformat/avisynth.c b/libavformat/avisynth.c
index 7bb2977383..b426ac343e 100644
--- a/libavformat/avisynth.c
+++ b/libavformat/avisynth.c
@@ -533,235 +533,235 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  
  /* Field order */

  if(avs->flags & AVISYNTH_FRAMEPROP_FIELD_ORDER) {
-if(avs_library.avs_prop_get_type(avs->env, avsmap, "_FieldBased") == 
AVS_PROPTYPE_UNSET) {
-st->codecpar->field_order = AV_FIELD_UNKNOWN;
-} else {
-switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_FieldBased", 0, &error)) {
-case 0:
-st->codecpar->field_order = AV_FIELD_PROGRESSIVE;
-break;
-case 1:
-st->codecpar->field_order = AV_FIELD_BB;
-break;
-case 2:
-st->codecpar->field_order = AV_FIELD_TT;
-break;
-default:
+if(avs_library.avs_prop_get_type(avs->env, avsmap, "_FieldBased") 
== AVS_PROPTYPE_UNSET) {
  st->codecpar->field_order = AV_FIELD_UNKNOWN;
+} else {
+switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_FieldBased", 0, &error)) {
+case 0:
+st->codecpar->field_order = AV_FIELD_PROGRESSIVE;
+break;
+case 1:
+st->codecpar->field_order = AV_FIELD_BB;
+break;
+case 2:
+st->codecpar->field_order = AV_FIELD_TT;
+break;
+default:
+st->codecpar->field_order = AV_FIELD_UNKNOWN;
+}
  }
  }
-}
  
  /* Color Range */

  if(avs->flags & AVISYNTH_FRAMEPROP_RANGE) {
-if(avs_library.avs_prop_get_type(avs->env, avsmap, "_ColorRange") == 
AVS_PROPTYPE_UNSET) {
-st->codecpar->color_range = AVCOL_RANGE_UNSPECIFIED;
-} else {
-switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_ColorRange", 0, &error)) {
-case 0:
-st->codecpar->color_range = AVCOL_RANGE_JPEG;
-break;
-case 1:
-st->codecpar->color_range = AVCOL_RANGE_MPEG;
-break;
-default:
+if(avs_library.avs_prop_get_type(avs->env, avsmap, "_ColorRange") 
== AVS_PROPTYPE_UNSET) {
  st->codecpar->color_range = AVCOL_RANGE_UNSPECIFIED;
+} else {
+switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_ColorRange", 0, &error)) {
+case 0:
+st->codecpar->color_range = AVCOL_RANGE_JPEG;
+break;
+case 1:
+st->codecpar->color_range = AVCOL_RANGE_MPEG;
+break;
+default:
+st->codecpar->color_range = AVCOL_RANGE_UNSPECIFIED;
+}
  }
  }
-}
  
  /* Color Primaries */

  if(avs->flags & AVISYNTH_FRAMEPROP_PRIMARIES) {
-switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_Primaries", 0, &error)) {
-case 1:
-st->codecpar->color_primaries = AVCOL_PRI_BT709;
-break;
-case 2:
-st->codecpar->color_primaries = AVCOL_PRI_UNSPECIFIED;
-break;
-case 4:
-st->codecpar->color_primaries = AVCOL_PRI_BT470M;
-break;
-case 5:
-st->codecpar->color_primaries = AVCOL_PRI_BT470BG;
-break;
-case 6:
-st->codecpar->color_primaries = AVCOL_PRI_SMPTE170M;
-break;
-case 7:
-st->codecpar->color_primaries = AVCOL_PRI_SMPTE240M;
-break;
-case 8:
-st->codecpar->color_primaries = AVCOL_PRI_FILM;
-break;
-case 9:
-st->codecpar->color_primaries = AVCOL_PRI_BT2020;
-break;
-case 10:
-st->codecpar->color_primaries = AVCOL_PRI_SMPTE428;
-break;
-case 11:
-st->codecpar->color_primaries = AVCOL_PRI_SMPTE431;
-break;
-case 12:
-st->codecpar->color_primaries = AVCOL_PRI_SMPTE432;
-break;
-case 22:
-st->codecpar->color_primaries = AVCOL_PRI_EBU3213;
-break;
-default:
-st->codecpar->color_primaries = AVCOL_PRI_UNSPECIFIED;
-}
+switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_Primaries", 0, &error)) {
+case 1:
+st->codecpar->color_primaries = AVCOL_PRI_BT709;
+break;
+case 2:
+

Re: [FFmpeg-devel] [PATCH 2/3 v2] avformat/avisynth: implement avisynth_flags option

2022-09-04 Thread Stephen Hutchinson


On 8/30/22 8:23 PM, Stephen Hutchinson wrote:

Signed-off-by: Stephen Hutchinson 
---
  libavformat/avisynth.c | 52 ++
  1 file changed, 52 insertions(+)

diff --git a/libavformat/avisynth.c b/libavformat/avisynth.c
index d978e6ec40..7bb2977383 100644
--- a/libavformat/avisynth.c
+++ b/libavformat/avisynth.c
@@ -21,6 +21,7 @@
  
  #include "libavutil/attributes.h"

  #include "libavutil/internal.h"
+#include "libavutil/opt.h"
  
  #include "libavcodec/internal.h"
  
@@ -85,7 +86,18 @@ typedef struct AviSynthLibrary {

  #undef AVSC_DECLARE_FUNC
  } AviSynthLibrary;
  
+typedef enum AviSynthFlags {

+AVISYNTH_FRAMEPROP_FIELD_ORDER = (1 << 0),
+AVISYNTH_FRAMEPROP_RANGE = (1 << 1),
+AVISYNTH_FRAMEPROP_PRIMARIES = (1 << 2),
+AVISYNTH_FRAMEPROP_TRANSFER = (1 << 3),
+AVISYNTH_FRAMEPROP_MATRIX = (1 << 4),
+AVISYNTH_FRAMEPROP_CHROMA_LOCATION = (1 << 5),
+AVISYNTH_FRAMEPROP_SAR = (1 << 6),
+} AviSynthFlags;
+
  typedef struct AviSynthContext {
+const AVClass *class;
  AVS_ScriptEnvironment *env;
  AVS_Clip *clip;
  const AVS_VideoInfo *vi;
@@ -100,6 +112,8 @@ typedef struct AviSynthContext {
  
  int error;
  
+uint32_t flags;

+
  /* Linked list pointers. */
  struct AviSynthContext *next;
  } AviSynthContext;
@@ -518,6 +532,7 @@ static int avisynth_create_stream_video(AVFormatContext *s, 
AVStream *st)
  avsmap = avs_library.avs_get_frame_props_ro(avs->env, frame);
  
  /* Field order */

+if(avs->flags & AVISYNTH_FRAMEPROP_FIELD_ORDER) {
  if(avs_library.avs_prop_get_type(avs->env, avsmap, "_FieldBased") == 
AVS_PROPTYPE_UNSET) {
  st->codecpar->field_order = AV_FIELD_UNKNOWN;
  } else {
@@ -535,8 +550,10 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  st->codecpar->field_order = AV_FIELD_UNKNOWN;
  }
  }
+}
  
  /* Color Range */

+if(avs->flags & AVISYNTH_FRAMEPROP_RANGE) {
  if(avs_library.avs_prop_get_type(avs->env, avsmap, "_ColorRange") == 
AVS_PROPTYPE_UNSET) {
  st->codecpar->color_range = AVCOL_RANGE_UNSPECIFIED;
  } else {
@@ -551,8 +568,10 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  st->codecpar->color_range = AVCOL_RANGE_UNSPECIFIED;
  }
  }
+}
  
  /* Color Primaries */

+if(avs->flags & AVISYNTH_FRAMEPROP_PRIMARIES) {
  switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_Primaries", 0, &error)) {
  case 1:
  st->codecpar->color_primaries = AVCOL_PRI_BT709;
@@ -593,8 +612,10 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  default:
  st->codecpar->color_primaries = AVCOL_PRI_UNSPECIFIED;
  }
+}
  
  /* Color Transfer Characteristics */

+if(avs->flags & AVISYNTH_FRAMEPROP_TRANSFER) {
  switch (avs_library.avs_prop_get_int(avs->env, avsmap, "_Transfer", 0, &error)) {
  case 1:
  st->codecpar->color_trc = AVCOL_TRC_BT709;
@@ -650,8 +671,10 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  default:
  st->codecpar->color_trc = AVCOL_TRC_UNSPECIFIED;
  }
+}
  
  /* Matrix coefficients */

+if(avs->flags & AVISYNTH_FRAMEPROP_MATRIX) {
  if(avs_library.avs_prop_get_type(avs->env, avsmap, "_Matrix") == 
AVS_PROPTYPE_UNSET) {
  st->codecpar->color_space = AVCOL_SPC_UNSPECIFIED;
  } else {
@@ -702,8 +725,10 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  st->codecpar->color_space = AVCOL_SPC_UNSPECIFIED;
  }
  }
+}
  
  /* Chroma Location */

+if(avs->flags & AVISYNTH_FRAMEPROP_CHROMA_LOCATION) {
  if(avs_library.avs_prop_get_type(avs->env, avsmap, "_ChromaLocation") 
== AVS_PROPTYPE_UNSET) {
  st->codecpar->chroma_location = AVCHROMA_LOC_UNSPECIFIED;
  } else {
@@ -730,11 +755,14 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  st->codecpar->chroma_location = AVCHROMA_LOC_UNSPECIFIED;
  }
  }
+}
  
  /* Sample aspect ratio */

+if(avs->flags & AVISYNTH_FRAMEPROP_SAR) {
  sar_num = avs_library.avs_prop_get_int(avs->env, avsmap, "_SARNum", 0, &error);
  sar_den = avs_library.avs_prop_get_int(avs->env, avsmap, "_SARDen", 0, &error);
  st->sample_aspect_ratio = (AVRational){ sar_num, sar_den };
+}
  
  avs_library.avs_release_video_frame(frame);

  } else {
@@ -1140,6 +1168,29 @@ static int avisynth_read_seek(AVFormatContext *s, int 
stream_index,
  return 0;
  }
  
+#define AVISYNTH_FRAMEPROP_DEFAULT AVISYNTH_FRAMEPROP_FIELD_ORDER | AVISYNTH_FRAMEPROP_RANGE | \

+

Re: [FFmpeg-devel] [PATCH 1/3 v2] avformat/avisynth: read _SARNum/_SARDen from frame properties

2022-09-04 Thread Stephen Hutchinson


On 8/30/22 8:23 PM, Stephen Hutchinson wrote:

Initialized to 1:1, but if the script sets these properties, it
will be set to those instead (0:0 disables it, apparently).

Signed-off-by: Stephen Hutchinson 
---
  libavformat/avisynth.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/libavformat/avisynth.c b/libavformat/avisynth.c
index 3d9fa2be50..d978e6ec40 100644
--- a/libavformat/avisynth.c
+++ b/libavformat/avisynth.c
@@ -251,6 +251,8 @@ static int avisynth_create_stream_video(AVFormatContext *s, 
AVStream *st)
  AVS_VideoFrame *frame;
  int error;
  int planar = 0; // 0: packed, 1: YUV, 2: Y8, 3: Planar RGB, 4: YUVA, 5: 
Planar RGBA
+int sar_num = 1;
+int sar_den = 1;
  
  st->codecpar->codec_type = AVMEDIA_TYPE_VIDEO;

  st->codecpar->codec_id   = AV_CODEC_ID_RAWVIDEO;
@@ -728,6 +730,12 @@ static int avisynth_create_stream_video(AVFormatContext 
*s, AVStream *st)
  st->codecpar->chroma_location = AVCHROMA_LOC_UNSPECIFIED;
  }
  }
+
+/* Sample aspect ratio */
+sar_num = avs_library.avs_prop_get_int(avs->env, avsmap, "_SARNum", 0, &error);
+sar_den = avs_library.avs_prop_get_int(avs->env, avsmap, "_SARDen", 0, &error);
+st->sample_aspect_ratio = (AVRational){ sar_num, sar_den };
+
  avs_library.avs_release_video_frame(frame);
  } else {
  st->codecpar->field_order = AV_FIELD_UNKNOWN;


Pushed.

Re: [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP

2022-09-04 Thread Rémi Denis-Courmont

On Sunday, 4 September 2022, 20:48:26 EEST, Lynne wrote:
> > The pointer arithmetic could be slightly optimised with SH2ADD and
> > SH3ADD instructions from the Zba extension. This would require more
> > conditional code, or requiring support for Zba for probably negligible
> > performance gains though.
> 
> Did you test on real hardware or a VM?

I don't think we will see real and conforming RV64GV hardware this year, 
unless you count FPGAs. I hope we can get affordable stuff in 2023H1.

This is running on a simulator. But since the code already (AFAICT) uses the
largest possible grouping, there won't be much room for further
optimisation, other than perhaps adding prefetching hints. In that sense, RVV
is really nice in how it makes unrolling almost unnecessary/effortless.

The code is also already laid out to leverage multiple issue if available.
RISC-V does not have post-index addressing modes, so interleaving a fair
amount of pointer arithmetic is unavoidable.
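
To illustrate the loop shape discussed here, the vector-scalar multiply can be modelled in plain C. This is only a scalar sketch: `CHUNK` is a made-up stand-in for whatever element count VSETVLI grants on a given pass, and `vector_fmul_scalar_model` is an illustrative name, not a function from the series. The comments map each step to the corresponding instruction in the RVV loop.

```c
#include <assert.h>

/* Hypothetical stand-in for the element count VSETVLI returns per pass. */
#define CHUNK 8

/* Scalar model of the strip-mined loop: dst[i] = src[i] * mul. */
static void vector_fmul_scalar_model(float *dst, const float *src,
                                     float mul, int len)
{
    assert(len >= 0);

    while (len > 0) {
        int vl = len < CHUNK ? len : CHUNK; /* vsetvli t0, a2, ...       */

        for (int i = 0; i < vl; i++)        /* vle32.v, vfmul.vf, vse32.v */
            dst[i] = src[i] * mul;

        src += vl;                          /* add a1, a1, t1            */
        dst += vl;                          /* add a0, a0, t1            */
        len -= vl;                          /* sub a2, a2, t0            */
    }
}
```

The three pointer and counter updates at the bottom are the arithmetic that has to be interleaved by hand, since there is no post-indexed load or store to fold them into.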

> If the former, what does checkasm --bench report?

...

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/




[FFmpeg-devel] [PATCH] avcodec/flac: critical fix

2022-09-04 Thread Paul B Mahol

Another critical fix for the decoder.
From e61c05f721eee756739ba7cd864486ea9704b3c9 Mon Sep 17 00:00:00 2001
From: Paul B Mahol 
Date: Sun, 4 Sep 2022 20:50:16 +0200
Subject: [PATCH] avcodec/flac: smallest frame is 10 bytes

Fixes #9270

Signed-off-by: Paul B Mahol 
---
 libavcodec/flac.h| 2 +-
 libavcodec/flacdec.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/flac.h b/libavcodec/flac.h
index f118dbbff3..00e631ed20 100644
--- a/libavcodec/flac.h
+++ b/libavcodec/flac.h
@@ -33,7 +33,7 @@
 #define FLAC_MAX_CHANNELS   8
 #define FLAC_MIN_BLOCKSIZE 16
 #define FLAC_MAX_BLOCKSIZE  65535
-#define FLAC_MIN_FRAME_SIZE    11
+#define FLAC_MIN_FRAME_SIZE    10
 
 enum {
 FLAC_CHMODE_INDEPENDENT = 0,
diff --git a/libavcodec/flacdec.c b/libavcodec/flacdec.c
index 075d76bc8a..5b8547a98f 100644
--- a/libavcodec/flacdec.c
+++ b/libavcodec/flacdec.c
@@ -577,7 +577,7 @@ static int flac_decode_frame(AVCodecContext *avctx, AVFrame *frame,
 
 /* check that there is at least the smallest decodable amount of data.
this amount corresponds to the smallest valid FLAC frame possible.
-   FF F8 69 02 00 00 9A 00 00 34 46 */
+   FF F8 69 02 00 00 9A 00 00 34 */
 if (buf_size < FLAC_MIN_FRAME_SIZE)
 return buf_size;
 
-- 
2.37.2


Re: [FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP

2022-09-04 Thread Lynne

Sep 4, 2022, 15:54 by r...@remlab.net:

> The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:
>
>  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 
> +0200)
>
> are waiting thorough bashing at your express convenience up to:
>
>  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)
>
> Changes since v1:
>
> - Removed stray define.
> - Fixed mismatch between byte and element size in mul-scalar.
> - Added fmul, fmac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
> - Added float butterfly and dot product.
>
> All operations are unrolled to the maximum group size (8), with the
> exception of overlap/add. The latter seems to require a minimum of 6
> vectors (maybe 5 by extremely careful ordering), so the group size is
> only 4.
>
> The pointer arithmetic could be slightly optimised with SH2ADD and
> SH3ADD instructions from the Zba extension. This would require more
> conditional code, or requiring support for Zba for probably negligible
> performance gains though.
>

Did you test on real hardware or a VM?
If the former, what does checkasm --bench report?


[FFmpeg-devel] [PATCH 03/10] riscv: float vector-scalar multiplication with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
---
 libavutil/float_dsp.c|  2 ++
 libavutil/float_dsp.h|  1 +
 libavutil/riscv/Makefile |  4 ++-
 libavutil/riscv/float_dsp_init.c | 41 +
 libavutil/riscv/float_dsp_rvv.S  | 62 
 5 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S

diff --git a/libavutil/float_dsp.c b/libavutil/float_dsp.c
index 8676c8b0f8..742dd679d2 100644
--- a/libavutil/float_dsp.c
+++ b/libavutil/float_dsp.c
@@ -156,6 +156,8 @@ av_cold AVFloatDSPContext *avpriv_float_dsp_alloc(int bit_exact)
 ff_float_dsp_init_arm(fdsp);
 #elif ARCH_PPC
 ff_float_dsp_init_ppc(fdsp, bit_exact);
+#elif ARCH_RISCV
+ff_float_dsp_init_riscv(fdsp);
 #elif ARCH_X86
 ff_float_dsp_init_x86(fdsp);
 #elif ARCH_MIPS
diff --git a/libavutil/float_dsp.h b/libavutil/float_dsp.h
index 9c664592bd..7cad9fc622 100644
--- a/libavutil/float_dsp.h
+++ b/libavutil/float_dsp.h
@@ -205,6 +205,7 @@ float avpriv_scalarproduct_float_c(const float *v1, const float *v2, int len);
 void ff_float_dsp_init_aarch64(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_arm(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_ppc(AVFloatDSPContext *fdsp, int strict);
+void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_x86(AVFloatDSPContext *fdsp);
 void ff_float_dsp_init_mips(AVFloatDSPContext *fdsp);
 
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
index 1f818043dc..6bf8243e8d 100644
--- a/libavutil/riscv/Makefile
+++ b/libavutil/riscv/Makefile
@@ -1 +1,3 @@
-OBJS += riscv/cpu.o
+OBJS += riscv/cpu.o \
+riscv/float_dsp_init.o \
+riscv/float_dsp_rvv.o
diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
new file mode 100644
index 00..279412c036
--- /dev/null
+++ b/libavutil/riscv/float_dsp_init.c
@@ -0,0 +1,41 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include 
+
+#include "libavutil/attributes.h"
+#include "libavutil/cpu.h"
+#include "libavutil/float_dsp.h"
+
+void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
+int len);
+
+void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
+int len);
+
+av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
+{
+int flags = av_get_cpu_flags();
+
+if (flags & AV_CPU_FLAG_ZVE32F) {
+fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+
+if (flags & AV_CPU_FLAG_ZVE64D)
+fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+}
+}
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
new file mode 100644
index 00..98d06c6d07
--- /dev/null
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -0,0 +1,62 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+#include "asm.S"
+
+.option  arch, +v
+
+// (a0) = (a1) * fa0 [0..a2-1]
+func ff_vector_fmul_scalar_rvv
+#if defined (__riscv_float_abi_soft)
+fmv.w.x  fa0, a2
+mv   a2, a3
+#endif
+
+1:  vsetvli  t0, a2, e32, m8, ta, ma
+slli t1, t0, 2
+vle32.v  v16, (a1)
+add  a1, a1, t1
+vfmul.vf v16, v16, fa0
+sub  a2, a2, t0
+vse32.v  v16, (a0)

[FFmpeg-devel] [PATCH 02/10] riscv: initial common header for assembler macros

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/asm.S | 33 +
 1 file changed, 33 insertions(+)
 create mode 100644 libavutil/riscv/asm.S

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
new file mode 100644
index 00..31001b8bdb
--- /dev/null
+++ b/libavutil/riscv/asm.S
@@ -0,0 +1,33 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "config.h"
+
+.macro func sym
+.text
+.align 2
+
+.global \sym
+.type   \sym, %function
+\sym:
+
+.macro endfunc
+.size   \sym, . - \sym
+.purgem endfunc
+.endm
+.endm
-- 
2.37.2


[FFmpeg-devel] [PATCH 01/10] riscv: add CPU flags for the RISC-V Vector extension

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

RVV defines a total of 12 different extensions: V, Zvl32b, Zvl64b,
Zvl128b, Zvl256b, Zvl512b, Zvl1024b, Zve32x, Zve32f, Zve64x, Zve64f and
Zve64d.

At this stage, we don't expose the vector length extensions Zvl*, as
the vector length is most commonly determined at run-time depending on
the element size and the effective multiplier. In any case, no run-time
mechanism is defined to determine the actual maximum vector length
other than invoking VSETVL.
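
For reference, the maximum element count that VSETVL can grant follows from the hardware register width, the selected element size and the group multiplier. A minimal sketch of that arithmetic, assuming integer multipliers only; `rvv_vlmax` and its parameter names are illustrative and not part of this patch:

```c
#include <assert.h>

/*
 * VLMAX = VLEN / SEW * LMUL, for integer LMUL.
 *   vlen_bits: width of one vector register in bits (hardware property)
 *   sew_bits:  selected element width in bits (e32 -> 32, e64 -> 64)
 *   lmul:      register group multiplier (m1 -> 1, m4 -> 4, m8 -> 8)
 */
static unsigned rvv_vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
    return vlen_bits / sew_bits * lmul;
}
```

With 128-bit registers, for example, an `e32, m8` configuration processes at most 32 floats per pass and an `e64, m8` configuration at most 16 doubles, which is why the code never needs to know the Zvl* level ahead of time.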

Zve64f is equivalent to Zve32f plus Zve64x, so it is exposed as a
convenience flag, but not tracked internally. Likewise V is the
equivalent of Zve64d plus Zvl128b.

Technically, Zve32f and Zve64x are both implied by Zve64d and both
imply Zve32x, leaving only 5 possibilities (including no vector
support), but we keep 4 separate bits for easy run-time checks as on
other instruction set architectures.
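
The implication rules above can be written down as a small C sketch. The bit values mirror the ones this patch adds to libavutil/cpu.h, but the `FLAG_*` names and both helpers are hypothetical, chosen only to illustrate how a parsed or detected flag set closes over its implied extensions:

```c
#include <assert.h>

/* Same bit layout as the AV_CPU_FLAG_ZVE* values added by this patch. */
#define FLAG_ZVE32X (1 << 0) /* 8-, 16-, 32-bit integers */
#define FLAG_ZVE32F (1 << 1) /* single precision */
#define FLAG_ZVE64X (1 << 2) /* 64-bit integers */
#define FLAG_ZVE64D (1 << 3) /* double precision */
#define FLAG_ALL    (FLAG_ZVE32X | FLAG_ZVE32F | FLAG_ZVE64X | FLAG_ZVE64D)

/* Hypothetical helper: add every extension implied by the ones set. */
static int rvv_close_flags(int flags)
{
    assert((flags & ~FLAG_ALL) == 0); /* only the four RVV bits are modelled */

    if (flags & FLAG_ZVE64D)          /* Zve64d implies Zve32f and Zve64x */
        flags |= FLAG_ZVE32F | FLAG_ZVE64X;
    if (flags & (FLAG_ZVE32F | FLAG_ZVE64X)) /* both imply Zve32x */
        flags |= FLAG_ZVE32X;
    return flags;
}

/* Zve64f is not a bit of its own: it is Zve32f plus Zve64x. */
static int rvv_has_zve64f(int flags)
{
    return (flags & (FLAG_ZVE32F | FLAG_ZVE64X)) ==
           (FLAG_ZVE32F | FLAG_ZVE64X);
}
```

Keeping one bit per tracked extension means a caller can test `flags & FLAG_ZVE32F` directly, without having to reason about which stronger extension happened to be detected.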
---
 libavutil/cpu.c  | 14 ++
 libavutil/cpu.h  |  6 +
 libavutil/cpu_internal.h |  1 +
 libavutil/riscv/Makefile |  1 +
 libavutil/riscv/cpu.c| 57 
 5 files changed, 79 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/cpu.c

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 0035e927a5..83bf513cf2 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -62,6 +62,8 @@ static int get_cpu_flags(void)
 return ff_get_cpu_flags_arm();
 #elif ARCH_PPC
 return ff_get_cpu_flags_ppc();
+#elif ARCH_RISCV
+return ff_get_cpu_flags_riscv();
 #elif ARCH_X86
 return ff_get_cpu_flags_x86();
 #elif ARCH_LOONGARCH
@@ -178,6 +180,18 @@ int av_parse_cpu_caps(unsigned *flags, const char *s)
 #elif ARCH_LOONGARCH
 { "lsx",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_LSX  },.unit = "flags" },
 { "lasx", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_LASX },.unit = "flags" },
+#elif ARCH_RISCV
+#define AV_CPU_FLAG_ZVE32F_M (AV_CPU_FLAG_ZVE32F | AV_CPU_FLAG_ZVE32X)
+#define AV_CPU_FLAG_ZVE64X_M (AV_CPU_FLAG_ZVE64X | AV_CPU_FLAG_ZVE32X)
+#define AV_CPU_FLAG_ZVE64D_M (AV_CPU_FLAG_ZVE64D | AV_CPU_FLAG_ZVE64F_M)
+#define AV_CPU_FLAG_ZVE64F_M (AV_CPU_FLAG_ZVE32F_M | AV_CPU_FLAG_ZVE64X_M)
+#define AV_CPU_FLAG_VECTORS  AV_CPU_FLAG_ZVE64D_M
+{ "vectors",  NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_VECTORS  },.unit = "flags" },
+{ "zve32x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE32X   },.unit = "flags" },
+{ "zve32f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE32F_M },.unit = "flags" },
+{ "zve64x",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64X_M },.unit = "flags" },
+{ "zve64f",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64F_M },.unit = "flags" },
+{ "zve64d",   NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_ZVE64D_M },.unit = "flags" },
 #endif
 { NULL },
 };
diff --git a/libavutil/cpu.h b/libavutil/cpu.h
index 9711e574c5..44836e50d6 100644
--- a/libavutil/cpu.h
+++ b/libavutil/cpu.h
@@ -78,6 +78,12 @@
 #define AV_CPU_FLAG_LSX  (1 << 0)
 #define AV_CPU_FLAG_LASX (1 << 1)
 
+// RISC-V Vector extension
+#define AV_CPU_FLAG_ZVE32X   (1 << 0) /* 8-, 16-, 32-bit integers */
+#define AV_CPU_FLAG_ZVE32F   (1 << 1) /* single precision scalars */
+#define AV_CPU_FLAG_ZVE64X   (1 << 2) /* 64-bit integers */
+#define AV_CPU_FLAG_ZVE64D   (1 << 3) /* double precision scalars */
+
 /**
  * Return the flags which specify extensions supported by the CPU.
  * The returned value is affected by av_force_cpu_flags() if that was used
diff --git a/libavutil/cpu_internal.h b/libavutil/cpu_internal.h
index 650d47fc96..634f28bac4 100644
--- a/libavutil/cpu_internal.h
+++ b/libavutil/cpu_internal.h
@@ -48,6 +48,7 @@ int ff_get_cpu_flags_mips(void);
 int ff_get_cpu_flags_aarch64(void);
 int ff_get_cpu_flags_arm(void);
 int ff_get_cpu_flags_ppc(void);
+int ff_get_cpu_flags_riscv(void);
 int ff_get_cpu_flags_x86(void);
 int ff_get_cpu_flags_loongarch(void);
 
diff --git a/libavutil/riscv/Makefile b/libavutil/riscv/Makefile
new file mode 100644
index 00..1f818043dc
--- /dev/null
+++ b/libavutil/riscv/Makefile
@@ -0,0 +1 @@
+OBJS += riscv/cpu.o
diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c
new file mode 100644
index 00..9e4cce5e8b
--- /dev/null
+++ b/libavutil/riscv/cpu.c
@@ -0,0 +1,57 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public

[FFmpeg-devel] [PATCH 10/10] riscv: float vector dot product with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  2 ++
 libavutil/riscv/float_dsp_rvv.S  | 23 +++
 2 files changed, 25 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 887706d899..7c2fc10e99 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -35,6 +35,7 @@ void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
 void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
  const float *src1, int len);
 void ff_butterflies_float_rvv(float *v1, float *v2, int len);
+float ff_scalarproduct_float_rvv(const float *v1, const float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
  int len);
@@ -55,6 +56,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
 fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
 fdsp->butterflies_float = ff_butterflies_float_rvv;
+fdsp->scalarproduct_float = ff_scalarproduct_float_rvv;
 
 if (flags & AV_CPU_FLAG_ZVE64D) {
 fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 7e7d48374f..7616abb9f6 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -173,6 +173,29 @@ func ff_butterflies_float_rvv
 ret
 endfunc
 
+// a0 = (a0).(a1) [0..a2-1]
+func ff_scalarproduct_float_rvv
+vsetvli  zero, zero, e32, m8, ta, ma
+vmv.s.x  v8, zero
+
+1:  vsetvli  t0, a2, e32, m8, ta, ma
+slli t1, t0, 2
+vle32.v  v16, (a0)
+add  a0, a0, t1
+vle32.v  v24, (a1)
+add  a1, a1, t1
+vfmul.vv v16, v16, v24
+sub  a2, a2, t0
+vfredusum.vs v8, v16, v8
+bnez a2, 1b
+
+vfmv.f.s fa0, v8
+#if defined (__riscv_float_abi_soft)
+fmv.x.w  a0, fa0
+#endif
+ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv
 1:  vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2


[FFmpeg-devel] [PATCH 09/10] riscv: float vector windowed overlap/add with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 35 
 2 files changed, 38 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 1183460181..887706d899 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -28,6 +28,8 @@ void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
 int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
 int len);
+void ff_vector_fmul_window_rvv(float *dst, const float *src0,
+const float *src1, const float *win, int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
  const float *src2, int len);
 void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
@@ -49,6 +51,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 fdsp->vector_fmul = ff_vector_fmul_rvv;
 fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
 fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+fdsp->vector_fmul_window = ff_vector_fmul_window_rvv;
 fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
 fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
 fdsp->butterflies_float = ff_butterflies_float_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index e3738ef7c5..7e7d48374f 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -79,6 +79,41 @@ func ff_vector_fmul_scalar_rvv
 ret
 endfunc
 
+func ff_vector_fmul_window_rvv
+// a0: dst, a1: src0, a2: src1, a3: window, a4: length
+addi       t0, a4, -1
+add        t1, t0, a4
+slli       t0, t0, 2
+slli       t1, t1, 2
+add        a2, a2, t0
+add        t0, a0, t1
+add        t3, a3, t1
+li         t1, -4 // byte stride
+
+1:  vsetvli    t2, a4, e32, m4, ta, ma
+slli       t4, t2, 2
+vle32.v    v16, (a1)
+add        a1, a1, t4
+vlse32.v   v20, (a2), t1
+sub        a2, a2, t4
+vle32.v    v24, (a3)
+add        a3, a3, t4
+vlse32.v   v28, (t3), t1
+sub        t3, t3, t4
+vfmul.vv   v0, v16, v28
+sub        a4, a4, t2
+vfmul.vv   v8, v16, v24
+vfnmsac.vv v0, v20, v24
+vfmacc.vv  v8, v20, v28
+vse32.v    v0, (a0)
+add        a0, a0, t4
+vsse32.v   v8, (t0), t1
+sub        t0, t0, t4
+bnez       a4, 1b
+
+ret
+endfunc
+
 // (a0) = (a1) * (a2) + (a3) [0..a4-1]
 func ff_vector_fmul_add_rvv
 1:  vsetvli   t0, a4, e32, m8, ta, ma
-- 
2.37.2


[FFmpeg-devel] [PATCH 08/10] riscv: float reversed vector multiplication with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 22 ++
 2 files changed, 25 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 2165394585..1183460181 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -30,6 +30,8 @@ void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
 int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
  const float *src2, int len);
+void ff_vector_fmul_reverse_rvv(float *dst, const float *src0,
+ const float *src1, int len);
 void ff_butterflies_float_rvv(float *v1, float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
@@ -48,6 +50,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
 fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
+fdsp->vector_fmul_reverse = ff_vector_fmul_reverse_rvv;
 fdsp->butterflies_float = ff_butterflies_float_rvv;
 
 if (flags & AV_CPU_FLAG_ZVE64D) {
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 61beb868b0..e3738ef7c5 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -98,6 +98,28 @@ func ff_vector_fmul_add_rvv
 ret
 endfunc
 
+// (a0) = (a1) * reverse(a2) [0..a3-1]
+func ff_vector_fmul_reverse_rvv
+add  t3, a3, -1
+li   t2, -4 // byte stride
+slli t3, t3, 2
+add  a2, a2, t3
+
+1:  vsetvli  t0, a3, e32, m8, ta, ma
+slli t1, t0, 2
+vle32.v  v16, (a1)
+add  a1, a1, t1
+vlse32.v v24, (a2), t2
+sub  a2, a2, t1
+vfmul.vv v16, v16, v24
+sub  a3, a3, t0
+vse32.v  v16, (a0)
+add  a0, a0, t1
+bnez a3, 1b
+
+ret
+endfunc
+
 // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
 func ff_butterflies_float_rvv
 1:  vsetvli  t0, a2, e32, m8, ta, ma
-- 
2.37.2


[FFmpeg-devel] [PATCH 07/10] riscv: float vector sum-and-difference with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  2 ++
 libavutil/riscv/float_dsp_rvv.S  | 18 ++
 2 files changed, 20 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 8539fe9ac5..2165394585 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -30,6 +30,7 @@ void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
 int len);
 void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
  const float *src2, int len);
+void ff_butterflies_float_rvv(float *v1, float *v2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
  int len);
@@ -47,6 +48,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
 fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
+fdsp->butterflies_float = ff_butterflies_float_rvv;
 
 if (flags & AV_CPU_FLAG_ZVE64D) {
 fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 27190c21ff..61beb868b0 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -98,6 +98,24 @@ func ff_vector_fmul_add_rvv
 ret
 endfunc
 
+// (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1]
+func ff_butterflies_float_rvv
+1:  vsetvli  t0, a2, e32, m8, ta, ma
+slli t1, t0, 2
+vle32.v  v16, (a0)
+vle32.v  v24, (a1)
+vfadd.vv v0, v16, v24
+vfsub.vv v8, v16, v24
+sub  a2, a2, t0
+vse32.v  v0, (a0)
+add  a0, a0, t1
+vse32.v  v8, (a1)
+add  a1, a1, t1
+bnez a2, 1b
+
+ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv
 1:  vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2


[FFmpeg-devel] [PATCH 06/10] riscv: float vector multiplication-addition with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  3 +++
 libavutil/riscv/float_dsp_rvv.S  | 19 +++
 2 files changed, 22 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index a1bb112ec7..8539fe9ac5 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -28,6 +28,8 @@ void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
 int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
 int len);
+void ff_vector_fmul_add_rvv(float *dst, const float *src0, const float *src1,
+ const float *src2, int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
  int len);
@@ -44,6 +46,7 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 fdsp->vector_fmul = ff_vector_fmul_rvv;
 fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
 fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
+fdsp->vector_fmul_add = ff_vector_fmul_add_rvv;
 
 if (flags & AV_CPU_FLAG_ZVE64D) {
 fdsp->vector_dmul = ff_vector_dmul_rvv;
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 8adfa6085c..27190c21ff 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -79,6 +79,25 @@ func ff_vector_fmul_scalar_rvv
 ret
 endfunc
 
+// (a0) = (a1) * (a2) + (a3) [0..a4-1]
+func ff_vector_fmul_add_rvv
+1:  vsetvli   t0, a4, e32, m8, ta, ma
+slli  t1, t0, 2
+vle32.v   v8, (a1)
+add   a1, a1, t1
+vle32.v   v16, (a2)
+add   a2, a2, t1
+vle32.v   v24, (a3)
+add   a3, a3, t1
+vfmadd.vv v8, v16, v24
+sub   a4, a4, t0
+vse32.v   v8, (a0)
+add   a0, a0, t1
+bnez  a4, 1b
+
+ret
+endfunc
+
 // (a0) = (a1) * (a2) [0..a3-1]
 func ff_vector_dmul_rvv
 1:  vsetvli  t0, a3, e64, m8, ta, ma
-- 
2.37.2


[FFmpeg-devel] [PATCH 05/10] riscv: float vector multiply-accumulate with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  6 +
 libavutil/riscv/float_dsp_rvv.S  | 42 
 2 files changed, 48 insertions(+)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 4135284c76..a1bb112ec7 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -24,11 +24,15 @@
 
 void ff_vector_fmul_rvv(float *dst, const float *src0, const float *src1,
  int len);
+void ff_vector_fmac_scalar_rvv(float *dst, const float *src, float mul,
+int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
 int len);
 
 void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
  int len);
+void ff_vector_dmac_scalar_rvv(double *dst, const double *src, double mul,
+int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
 int len);
 
@@ -38,10 +42,12 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 
 if (flags & AV_CPU_FLAG_ZVE32F) {
 fdsp->vector_fmul = ff_vector_fmul_rvv;
+fdsp->vector_fmac_scalar = ff_vector_fmac_scalar_rvv;
 fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 
 if (flags & AV_CPU_FLAG_ZVE64D) {
 fdsp->vector_dmul = ff_vector_dmul_rvv;
+fdsp->vector_dmac_scalar = ff_vector_dmac_scalar_rvv;
 fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
 }
 }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 15c875f9d2..8adfa6085c 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -38,6 +38,27 @@ func ff_vector_fmul_rvv
 ret
 endfunc
 
+// (a0) += (a1) * fa0 [0..a2-1]
+func ff_vector_fmac_scalar_rvv
+#if defined (__riscv_float_abi_soft)
+fmv.w.x   fa0, a2
+mv        a2, a3
+#endif
+
+1:  vsetvli   t0, a2, e32, m8, ta, ma
+slli  t1, t0, 2
+vle32.v   v24, (a1)
+add   a1, a1, t1
+vle32.v   v16, (a0)
+vfmacc.vf v16, fa0, v24
+sub   a2, a2, t0
+vse32.v   v16, (a0)
+add   a0, a0, t1
+bnez  a2, 1b
+
+ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_fmul_scalar_rvv
 #if defined (__riscv_float_abi_soft)
@@ -75,6 +96,27 @@ func ff_vector_dmul_rvv
 ret
 endfunc
 
+// (a0) += (a1) * fa0 [0..a2-1]
+func ff_vector_dmac_scalar_rvv
+#if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
+fmv.d.x   fa0, a2
+mv        a2, a3
+#endif
+
+1:  vsetvli   t0, a2, e64, m8, ta, ma
+slli  t1, t0, 3
+vle64.v   v24, (a1)
+add   a1, a1, t1
+vle64.v   v16, (a0)
+vfmacc.vf v16, fa0, v24
+sub   a2, a2, t0
+vse64.v   v16, (a0)
+add   a0, a0, t1
+bnez  a2, 1b
+
+ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_dmul_scalar_rvv
 #if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
-- 
2.37.2


[FFmpeg-devel] [PATCH 04/10] riscv: float vector-vector multiplication with RVV

2022-09-04 Thread remi

From: Rémi Denis-Courmont 

---
 libavutil/riscv/float_dsp_init.c |  9 -
 libavutil/riscv/float_dsp_rvv.S  | 34 
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/libavutil/riscv/float_dsp_init.c b/libavutil/riscv/float_dsp_init.c
index 279412c036..4135284c76 100644
--- a/libavutil/riscv/float_dsp_init.c
+++ b/libavutil/riscv/float_dsp_init.c
@@ -22,9 +22,13 @@
 #include "libavutil/cpu.h"
 #include "libavutil/float_dsp.h"
 
+void ff_vector_fmul_rvv(float *dst, const float *src0, const float *src1,
+ int len);
 void ff_vector_fmul_scalar_rvv(float *dst, const float *src, float mul,
 int len);
 
+void ff_vector_dmul_rvv(double *dst, const double *src0, const double *src1,
+ int len);
 void ff_vector_dmul_scalar_rvv(double *dst, const double *src, double mul,
 int len);
 
@@ -33,9 +37,12 @@ av_cold void ff_float_dsp_init_riscv(AVFloatDSPContext *fdsp)
 int flags = av_get_cpu_flags();
 
 if (flags & AV_CPU_FLAG_ZVE32F) {
+fdsp->vector_fmul = ff_vector_fmul_rvv;
 fdsp->vector_fmul_scalar = ff_vector_fmul_scalar_rvv;
 
-if (flags & AV_CPU_FLAG_ZVE64D)
+if (flags & AV_CPU_FLAG_ZVE64D) {
+fdsp->vector_dmul = ff_vector_dmul_rvv;
 fdsp->vector_dmul_scalar = ff_vector_dmul_scalar_rvv;
+}
 }
 }
diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S
index 98d06c6d07..15c875f9d2 100644
--- a/libavutil/riscv/float_dsp_rvv.S
+++ b/libavutil/riscv/float_dsp_rvv.S
@@ -21,6 +21,23 @@
 
 .option  arch, +v
 
+// (a0) = (a1) * (a2) [0..a3-1]
+func ff_vector_fmul_rvv
+1:  vsetvli  t0, a3, e32, m8, ta, ma
+slli t1, t0, 2
+vle32.v  v16, (a1)
+add  a1, a1, t1
+vle32.v  v24, (a2)
+add  a2, a2, t1
+vfmul.vv v16, v16, v24
+sub  a3, a3, t0
+vse32.v  v16, (a0)
+add  a0, a0, t1
+bnez a3, 1b
+
+ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_fmul_scalar_rvv
 #if defined (__riscv_float_abi_soft)
@@ -41,6 +58,23 @@ func ff_vector_fmul_scalar_rvv
 ret
 endfunc
 
+// (a0) = (a1) * (a2) [0..a3-1]
+func ff_vector_dmul_rvv
+1:      vsetvli  t0, a3, e64, m8, ta, ma
+        slli     t1, t0, 3
+        vle64.v  v16, (a1)
+        add      a1, a1, t1
+        vle64.v  v24, (a2)
+        add      a2, a2, t1
+        vfmul.vv v16, v16, v24
+        sub      a3, a3, t0
+        vse64.v  v16, (a0)
+        add      a0, a0, t1
+        bnez     a3, 1b
+
+        ret
+endfunc
+
 // (a0) = (a1) * fa0 [0..a2-1]
 func ff_vector_dmul_scalar_rvv
 #if defined (__riscv_float_abi_soft) || defined (__riscv_float_abi_single)
-- 
2.37.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCHv2 0/10] RISC-V V floating point DSP

2022-09-04 Thread Rémi Denis-Courmont

The following changes since commit b6e8fc1c201d58672639134a737137e1ba7b55fe:

  avcodec/speexdec: improve support for speex in non-ogg (2022-09-04 11:31:57 +0200)

are waiting thorough bashing at your express convenience up to:

  riscv: float vector dot product with RVV (2022-09-04 16:45:38 +0300)

Changes since v1:

- Removed stray define.
- Fixed mismatch between byte and element size in mul-scalar.
- Added fmul, fmac, dmul, dmac, fmul-add, fmul-reverse, fmul-window.
- Added float butterfly and dot product.

All operations are unrolled to the maximum group size (8), with the
exception of overlap/add. The latter seems to require a minimum of 6
vectors (maybe 5 with extremely careful ordering), so the group size is
only 4.

The pointer arithmetic could be slightly optimised with the SH2ADD and
SH3ADD instructions from the Zba extension. This would require either
more conditional code or mandatory Zba support, for probably negligible
performance gains though.
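
For readers less familiar with RVV, the loop shape used throughout the
series can be modelled in plain C roughly as follows. This is only a
sketch: model_vsetvl() and model_vector_fmul() are hypothetical names,
the vlmax parameter stands in for whatever the hardware grants, and a
real vsetvli is allowed to grant fewer elements than this simple
minimum when the remaining count lies between VLMAX and 2*VLMAX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: grant min(remaining, vlmax) elements. */
static size_t model_vsetvl(size_t remaining, size_t vlmax)
{
    return remaining < vlmax ? remaining : vlmax;
}

/* Scalar model of the strip-mined loop: dst[i] = src0[i] * src1[i]. */
static void model_vector_fmul(float *dst, const float *src0,
                              const float *src1, size_t len, size_t vlmax)
{
    assert(vlmax > 0);

    while (len > 0) {
        size_t vl = model_vsetvl(len, vlmax);  /* vsetvli t0, a3, ... */

        for (size_t i = 0; i < vl; i++)        /* vle32.v, vfmul.vv, vse32.v */
            dst[i] = src0[i] * src1[i];

        /*
         * The assembler does this with slli (vl * 4 bytes) followed by
         * three adds; SH2ADD would fold each shift-and-add pair into a
         * single instruction.
         */
        src0 += vl;
        src1 += vl;
        dst  += vl;
        len  -= vl;
    }
}
```

Calling model_vector_fmul(c, a, b, 11, 4) would process the 11 elements
in passes of 4, 4 and 3, without any separate scalar tail loop.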


Rémi Denis-Courmont (10):
  riscv: add CPU flags for the RISC-V Vector extension
  riscv: initial common header for assembler macros
  riscv: float vector-scalar multiplication with RVV
  riscv: float vector-vector multiplication with RVV
  riscv: float vector multiply-accumulate with RVV
  riscv: float vector multiplication-addition with RVV
  riscv: float vector sum-and-difference with RVV
  riscv: float reversed vector multiplication with RVV
  riscv: float vector windowed overlap/add with RVV
  riscv: float vector dot product with RVV

 libavutil/cpu.c  |  14 +++
 libavutil/cpu.h  |   6 +
 libavutil/cpu_internal.h |   1 +
 libavutil/float_dsp.c|   2 +
 libavutil/float_dsp.h|   1 +
 libavutil/riscv/Makefile |   3 +
 libavutil/riscv/asm.S|  33 +
 libavutil/riscv/cpu.c|  57 +
 libavutil/riscv/float_dsp_init.c |  67 ++
 libavutil/riscv/float_dsp_rvv.S  | 255 +++
 10 files changed, 439 insertions(+)
 create mode 100644 libavutil/riscv/Makefile
 create mode 100644 libavutil/riscv/asm.S
 create mode 100644 libavutil/riscv/cpu.c
 create mode 100644 libavutil/riscv/float_dsp_init.c
 create mode 100644 libavutil/riscv/float_dsp_rvv.S

-- 
レミ・デニ-クールモン
http://www.remlab.net/




Re: [FFmpeg-devel] [PATCH 1/3] riscv: add CPU flags for the RISC-V Vector extension

2022-09-04 Thread Rémi Denis-Courmont

On Sunday, 4 September 2022, 9:39:36 EEST, Lynne wrote:
> In particular, doing the tail, which consists of 2 equal length transforms.
> On AVX we interleave the coefficients from 2x4pt transforms during
> lookups since we can do them simultaneously and save on
> shuffles. Doing them individually wouldn't be as efficient.

I'm not going to boldly state that one size fits all, because I am pretty sure 
that it would come back to bite me in soft and sensitive tissue. But unlike 
SIMD extensions, RISC-V V and ARM SVE favour the use of offsets and masks to 
deal with misaligned edges, so I'm not sure how useful the insights from AVX 
are.

> > And besides, how do you want to get the value if not with assembler? This
> > is currently not found in ELF HWCAP and probably never will be.

> Sucks, knowing how wide the units are is as important as
> knowing how much L1 cache you have for me.

I understand that for some multidimensional calculations, you need to make 
special cases. The obvious case would be if the vector is too short to fit a 
column or row of elements whilst performing a transposition.

But even then, and even if we end up later on with, say, an arch_prctl() call 
to find the vector size, I don't think exposing it in CPU flags would be a good 
idea. VSETVL & VSETIVLI also account for the element size and the vector group 
multiplier, so it seems better to use either of them than to reimplement the 
same logic in C based on the raw vector bit length.
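
To illustrate, here is a rough sketch of the arithmetic that C code would
have to duplicate if it only knew the raw vector bit length. The helper
name model_vlmax() is hypothetical (not an FFmpeg API), and it only
handles integer group multipliers; fractional LMUL is left out.

```c
#include <assert.h>

/*
 * Hypothetical helper (not an FFmpeg API): maximum number of elements
 * per vector group, VLMAX = VLEN / SEW * LMUL, for integer LMUL only.
 */
static unsigned model_vlmax(unsigned vlen_bits, unsigned sew_bits,
                            unsigned lmul)
{
    assert(sew_bits == 8 || sew_bits == 16 || sew_bits == 32 ||
           sew_bits == 64);
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(vlen_bits >= sew_bits);

    return vlen_bits / sew_bits * lmul;
}
```

With 128-bit vectors, for instance, this gives 32 elements for e32/m8 and
16 elements for e64/m8, which is what the vector configuration
instructions already report in a single instruction.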

-- 
レミ・デニ-クールモン
http://www.remlab.net/




Re: [FFmpeg-devel] [RFC] d3dva security hw+threads

2022-09-04 Thread Soft Works




> -----Original Message-----
> From: ffmpeg-devel  On Behalf Of
> Anton Khirnov
> Sent: Sunday, September 4, 2022 8:58 AM
> To: FFmpeg development discussions and patches  de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [RFC] d3dva security hw+threads
> 
> Quoting Timo Rothenpieler (2022-09-02 01:46:59)
> > On 02.09.2022 01:32, Michael Niedermayer wrote:
> > > Hi all
> > >
> > > > There's a use-after-free issue in H.264 decoding on d3d11va with
> > > > multiple threads.
> > > > I don't have the hardware/platform, nor do I know the hw decoding
> > > > code, so I made no attempt to fix this beyond asking others to ...
> > >
> > > hwaccel with multiple threads being broken is not exactly a
> > > surprise.
> > > So we could just disable that, and always have it be one single
> > > thread?
> > 
> > We are already disabling it in a way - the frame threading code
> > ensures that threads run one at a time when hwaccel is being used.


Is there a described way to repro? I would try whether it still 
happens after removing the lock code in hwcontext_d3d11va.c.
Those locks are not really needed and might prevent release 
of dx11 resources in proper order. It's a guess only but 
easy to try.

softworkz

Re: [FFmpeg-devel] [RFC] d3dva security hw+threads

2022-09-04 Thread Anton Khirnov

Quoting Timo Rothenpieler (2022-09-02 01:46:59)
> On 02.09.2022 01:32, Michael Niedermayer wrote:
> > Hi all
> > 
> > There's a use-after-free issue in H.264 decoding on d3d11va with multiple
> > threads.
> > I don't have the hardware/platform, nor do I know the hw decoding code, so
> > I made no attempt to fix this beyond asking others to ...
> 
> hwaccel with multiple threads being broken is not exactly a surprise.
> So we could just disable that, and always have it be one single thread?

We are already disabling it in a way - the frame threading code ensures
that threads run one at a time when hwaccel is being used.

-- 
Anton Khirnov

Re: [FFmpeg-devel] [PATCH 1/3] riscv: add CPU flags for the RISC-V Vector extension

2022-09-04 Thread Lynne

Sep 4, 2022, 07:41 by r...@remlab.net:

> On Sunday, 4 September 2022, 0:38:32 EEST, Lynne wrote:
>
>> I need to know the length in C, not assembly.
>>
>
> There may be some corner cases where that makes sense, but typically it
> doesn't. Even if you're dealing in fixed-size macro blocks, you should
> leverage the larger vectors to unroll and process multiple macro blocks
> in parallel.
>

Some aspects of a split-radix FFT work better if you know how
much you could fit into a register upfront. In particular, doing
the tail, which consists of 2 equal length transforms. On AVX
we interleave the coefficients from 2x4pt transforms during
lookups since we can do them simultaneously and save on
shuffles. Doing them individually wouldn't be as efficient.
Since interleaving is done during the permute step, we have
to know from C how much to interleave.
Of course if you switched away from a split-radix algorithm (X+X/2+X/2),
you could have a very simple 100-line FFT if you had arbitrarily
long vectors (or the pretense of such), but if you didn't have
the hardware to back that up, the penalty for using a suboptimal
algorithm wouldn't be worth it.
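
As a rough sketch of why the width has to be known from C (this is a
hypothetical helper for illustration, not the actual libavutil/tx
permute code): a group-wise interleave of two equal-length coefficient
arrays needs the per-register lane count as a parameter, so the C side
cannot lay out the data without it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from libavutil/tx: interleave two
 * equal-length arrays in groups of `lanes` elements, so that each run
 * of 2 * lanes outputs holds `lanes` values from a followed by `lanes`
 * values from b. `n` must be a multiple of `lanes`.
 */
static void interleave_groups(float *out, const float *a, const float *b,
                              size_t n, size_t lanes)
{
    assert(lanes > 0 && n % lanes == 0);

    for (size_t base = 0; base < n; base += lanes) {
        for (size_t i = 0; i < lanes; i++) {
            out[2 * base + i]         = a[base + i];
            out[2 * base + lanes + i] = b[base + i];
        }
    }
}
```

The output layout changes with `lanes`, which is the value that would
have to come from somewhere outside the assembly.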


> And besides, how do you want to get the value if not with assembler? This is 
> currently not found in ELF HWCAP and probably never will be.
>

Sucks, knowing how wide the units are is as important as
knowing how much L1 cache you have for me.


> I disagree. There are currently no means to negotiate a vector length with
> the OS, so that seems highly premature. And even if there was such a
> mechanism, it's simply much faster to call VSETVL in an inline assembler
> macro where needed than to compute the whole set of CPU flags.
>

Guess that's what I'll have to do. In due time anyway, who knows how many
years it'll be until a cheap enough device appears with vector support
that doesn't merely do what SVE2 devices did by reusing old NEON
unit designs.


Re: [FFmpeg-devel] [PATCH]lavfi/rotate: Fix undefined behaviour

2022-09-04 Thread Michael Koch


/Also, shouldn't the same change be done also to interpolate_bilinear8? /

I was unable to reproduce with 8-bit input.


When I tested it, the issue was reproducible only with 14-bit and 16-bit input. 
12-bit did work.

