date:20210112

Re: [FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-12 Thread Lynne

Jan 12, 2021, 22:13 by felix.leclair...@hotmail.com:

> That's great! Any way for me to pull that branch or otherwise contribute?
>
The branch is here for now: https://github.com/haasn/FFmpeg
The only blocker to having it merged is that I need to rewrite the Vulkan
synchronization mechanism we currently use, which I should hopefully get around to soon.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v3 07/11] avcodec: add cbs for h266/vvc

2021-01-12 Thread James Almer


On 1/11/2021 1:33 PM, Nuo Mi wrote:

---
  configure |2 +
  libavcodec/Makefile   |1 +
  libavcodec/cbs.c  |6 +
  libavcodec/cbs_h2645.c|  374 
  libavcodec/cbs_h266.h |  758 +++
  libavcodec/cbs_h266_syntax_template.c | 2774 +
  libavcodec/cbs_internal.h |3 +-
  7 files changed, 3917 insertions(+), 1 deletion(-)
  create mode 100644 libavcodec/cbs_h266.h
  create mode 100644 libavcodec/cbs_h266_syntax_template.c



[...]


+static int cbs_h266_replace_ph(CodedBitstreamContext *ctx,
+   CodedBitstreamUnit *unit)
+{
+CodedBitstreamH266Context *priv = ctx->priv_data;
+int err;
+err = ff_cbs_make_unit_refcounted(ctx, unit);
+if (err < 0)
+return err;
+av_buffer_unref(&priv->ph_ref);
+av_assert0(unit->content_ref);
+priv->ph_ref = av_buffer_ref(unit->content_ref);
+if (!priv->ph_ref)
+return AVERROR(ENOMEM);
+priv->ph = (H266RawPH *)priv->ph_ref->data;


Make this function take a pointer to H266RawPH as argument, and set it here.
See below.


+return 0;
+}
+
+static int cbs_h266_read_nal_unit(CodedBitstreamContext *ctx,
+  CodedBitstreamUnit *unit)
+{
+GetBitContext gbc;
+int err;
+
+err = init_get_bits8(&gbc, unit->data, unit->data_size);
+if (err < 0)
+return err;
+
+err = ff_cbs_alloc_unit_content2(ctx, unit);
+if (err < 0)
+return err;
+
+switch (unit->type) {
+case VVC_SPS_NUT:
+{
+H266RawSPS *sps = unit->content;
+
+err = cbs_h266_read_sps(ctx, &gbc, sps);
+if (err < 0)
+return err;
+
+err = cbs_h266_replace_sps(ctx, unit);
+if (err < 0)
+return err;
+}
+break;
+
+case VVC_PPS_NUT:
+{
+H266RawPPS *pps = unit->content;
+
+err = cbs_h266_read_pps(ctx, &gbc, pps);
+if (err < 0)
+return err;
+
+err = cbs_h266_replace_pps(ctx, unit);
+if (err < 0)
+return err;
+}
+break;
+
+case VVC_PH_NUT:
+{
+H266RawPH *ph = unit->content;
+err = cbs_h266_read_ph(ctx, &gbc, ph);
+if (err < 0)
+return err;
+
+err = cbs_h266_replace_ph(ctx, unit);


Pass ph as mentioned above.


+if (err < 0)
+return err;
+}
+break;
+
+case VVC_TRAIL_NUT:
+case VVC_STSA_NUT:
+case VVC_RADL_NUT:
+case VVC_RASL_NUT:
+case VVC_IDR_W_RADL:
+case VVC_IDR_N_LP:
+case VVC_CRA_NUT:
+case VVC_GDR_NUT:
+{
+H266RawSlice *slice = unit->content;
+int pos, len;
+
+err = cbs_h266_read_slice_header(ctx, &gbc, &slice->header);
+if (err < 0)
+return err;


Add a call to cbs_h266_replace_ph() here when slice->header.sh_picture_header_in_slice_header_flag is true, and pass a pointer to slice->header.sh_picture_header to it.


Do the same for the writing functions.

[...]


+static int FUNC(slice_header)(CodedBitstreamContext *ctx, RWContext *rw,
+  H266RawSliceHeader *current)
+{
+CodedBitstreamH266Context *h266 = ctx->priv_data;
+const H266RawSPS *sps;
+const H266RawPPS *pps;
+const H266RawPH  *ph;
+const H266RefPicLists *ref_pic_lists;
+int  err, i;
+uint8_t  nal_unit_type, qp_bd_offset;
+uint16_t curr_subpic_idx;
+uint16_t num_slices_in_subpic;
+
+HEADER("Slice Header");
+
+CHECK(FUNC(nal_unit_header)(ctx, rw, &current->nal_unit_header, -1));
+
+flag(sh_picture_header_in_slice_header_flag);
+if (current->sh_picture_header_in_slice_header_flag){
+CHECK(FUNC(picture_header)(ctx, rw, &current->sh_picture_header));
+if (!h266->ph_ref) {
+h266->ph_ref = av_buffer_allocz(sizeof(H266RawPH));
+if (!h266->ph_ref)
+return AVERROR(ENOMEM);
+}
+h266->ph = (H266RawPH*)h266->ph_ref->data;
+memcpy(h266->ph, &current->sh_picture_header, sizeof(H266RawPH));


With the above, you can remove all this and simply set ph to &current->sh_picture_header when current->sh_picture_header_in_slice_header_flag is true, or to h266->ph otherwise.


This saves an unnecessary buffer alloc per slice header that includes a picture header. The buffer reference most assuredly already exists, and you can reuse it.



+}
+sps = h266->active_sps;
+pps = h266->active_pps;
+ph  = h266->ph;
+
+if (!ph) {
+av_log(ctx->log_ctx, AV_LOG_ERROR, "Picture header not available.\n");
+return AVERROR_INVALIDDATA;
+}
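To make the suggested refactor concrete, here is a minimal standalone model of it. It is not code from the thread: `Buf` is a toy refcounted blob standing in for AVBufferRef, and `PH`, `SliceHeader`, `H266Ctx`, `replace_ph()` and `select_ph()` are hypothetical simplifications of H266RawPH, H266RawSliceHeader, CodedBitstreamH266Context and the helpers under review. The point it illustrates is that the caller passes in the PH it just parsed (a standalone PH unit or the copy embedded in a slice header), and the context only takes a reference on the owning buffer instead of allocating and memcpy-ing a fresh one per slice.

```c
#include <stdlib.h>

/* Hypothetical stand-in for AVBufferRef: a refcounted blob. */
typedef struct Buf { int refcount; } Buf;

static Buf *buf_ref(Buf *b) { b->refcount++; return b; }

static void buf_unref(Buf **b)
{
    if (*b && --(*b)->refcount == 0)
        free(*b);
    *b = NULL;
}

/* Hypothetical, heavily reduced stand-ins for H266RawPH / H266RawSliceHeader. */
typedef struct PH { int ph_pic_order_cnt_lsb; } PH;
typedef struct SliceHeader {
    int sh_picture_header_in_slice_header_flag;
    PH  sh_picture_header;
} SliceHeader;

/* Hypothetical stand-in for the PH fields of the CBS private context. */
typedef struct H266Ctx {
    Buf *ph_ref; /* keeps the buffer that owns *ph alive */
    PH  *ph;     /* points into that buffer, never a private copy */
} H266Ctx;

/* The PH pointer is an argument, so it may point at a standalone PH unit
 * or into a slice header; only a reference is taken, no alloc/memcpy. */
static int replace_ph(H266Ctx *ctx, Buf *content_ref, PH *ph)
{
    buf_unref(&ctx->ph_ref);
    ctx->ph_ref = buf_ref(content_ref);
    ctx->ph     = ph;
    return 0;
}

/* The picture header the slice-header template would then use. */
static const PH *select_ph(const H266Ctx *ctx, const SliceHeader *sh)
{
    return sh->sh_picture_header_in_slice_header_flag ? &sh->sh_picture_header
                                                       : ctx->ph;
}
```

After `replace_ph()` the unit's buffer carries one extra reference, which is dropped the next time a picture header replaces it.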


Re: [FFmpeg-devel] [PATCH v2 1/3] kmsgrab: Use invalid modifier if modifiers weren't used.

2021-01-12 Thread Bas Nieuwenhuizen

A friendly ping on reviewing this series. Thanks!

On Sat, Nov 14, 2020 at 12:15 AM Bas Nieuwenhuizen
 wrote:
>
> The kernel defaults to initializing the field to 0 when modifiers
> are not used and this happens to be linear. If we end up actually
> passing the modifier to a driver, tiling issues happen.
>
> So if the kernel doesn't return a modifier set it explicitly to
> INVALID. That way later processing knows there is no explicit
> modifier.
>
> v2:
>   Fix support for modifier overrides in case the getfb2 call does
>   not return a modifier.
> ---
>  libavdevice/kmsgrab.c | 27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/libavdevice/kmsgrab.c b/libavdevice/kmsgrab.c
> index a0aa9dc22f..b740a32171 100644
> --- a/libavdevice/kmsgrab.c
> +++ b/libavdevice/kmsgrab.c
> @@ -160,6 +160,7 @@ static int kmsgrab_get_fb2(AVFormatContext *avctx,
>  KMSGrabContext *ctx = avctx->priv_data;
>  drmModeFB2 *fb;
>  int err, i, nb_objects;
> +uint64_t modifier = ctx->drm_format_modifier;
>
>  fb = drmModeGetFB2(ctx->hwctx->fd, plane->fb_id);
>  if (!fb) {
> @@ -195,6 +196,9 @@ static int kmsgrab_get_fb2(AVFormatContext *avctx,
>  goto fail;
>  }
>
> +if (fb->flags & DRM_MODE_FB_MODIFIERS)
> +modifier = fb->modifier;
> +
>  *desc = (AVDRMFrameDescriptor) {
>  .nb_layers = 1,
>  .layers[0] = {
> @@ -243,7 +247,7 @@ static int kmsgrab_get_fb2(AVFormatContext *avctx,
>  desc->objects[obj] = (AVDRMObjectDescriptor) {
>  .fd  = fd,
>  .size= size,
> -.format_modifier = fb->modifier,
> +.format_modifier = modifier,
>  };
>  desc->layers[0].planes[i] = (AVDRMPlaneDescriptor) {
>  .object_index = obj,
> @@ -557,15 +561,18 @@ static av_cold int kmsgrab_read_header(AVFormatContext *avctx)
>  err = AVERROR(EINVAL);
>  goto fail;
>  }
> -if (ctx->drm_format_modifier != DRM_FORMAT_MOD_INVALID &&
> -ctx->drm_format_modifier != fb2->modifier) {
> -av_log(avctx, AV_LOG_ERROR, "Framebuffer format modifier "
> -   "%"PRIx64" does not match expected modifier.\n",
> -   fb2->modifier);
> -err = AVERROR(EINVAL);
> -goto fail;
> -} else {
> -ctx->drm_format_modifier = fb2->modifier;
> +
> +if (fb2->flags & DRM_MODE_FB_MODIFIERS) {
> +if (ctx->drm_format_modifier != DRM_FORMAT_MOD_INVALID &&
> +ctx->drm_format_modifier != fb2->modifier) {
> +av_log(avctx, AV_LOG_ERROR, "Framebuffer format modifier "
> +   "%"PRIx64" does not match expected modifier.\n",
> +   fb2->modifier);
> +err = AVERROR(EINVAL);
> +goto fail;
> +} else {
> +ctx->drm_format_modifier = fb2->modifier;
> +}
>  }
>  av_log(avctx, AV_LOG_VERBOSE, "Format is %s, from "
> "DRM format %"PRIx32" modifier %"PRIx64".\n",
> --
> 2.29.2
>
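The modifier handling in kmsgrab_read_header() after this patch can be reduced to a small pure function, sketched below. `resolve_modifier()` is a hypothetical name, and the two constants are local copies assumed to mirror DRM_MODE_FB_MODIFIERS and DRM_FORMAT_MOD_INVALID so the sketch builds without libdrm headers. The key behaviour: when the kernel does not set the modifiers flag, the zero (LINEAR) field is ignored and the user override, or INVALID, is kept.

```c
#include <stdint.h>

/* Assumed to mirror DRM_MODE_FB_MODIFIERS (drm_mode.h) and
 * DRM_FORMAT_MOD_INVALID (drm_fourcc.h); defined locally for the sketch. */
#define SKETCH_FB_MODIFIERS (1u << 1)
#define SKETCH_MOD_INVALID  0x00ffffffffffffffULL

/* *expected holds the user override, or SKETCH_MOD_INVALID if none.
 * Returns 0 (possibly updating *expected) or -1 on a mismatch. */
static int resolve_modifier(uint32_t fb_flags, uint64_t fb_modifier,
                            uint64_t *expected)
{
    if (!(fb_flags & SKETCH_FB_MODIFIERS))
        return 0; /* no modifier from the kernel: do not trust the
                     zero-initialized (LINEAR) field */
    if (*expected != SKETCH_MOD_INVALID && *expected != fb_modifier)
        return -1; /* override disagrees with the framebuffer */
    *expected = fb_modifier;
    return 0;
}
```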

Re: [FFmpeg-devel] [PATCH v6] avformat/udp: return the error code instead of generic EIO

2021-01-12 Thread lance . lmwang

On Tue, Jan 12, 2021 at 10:35:52PM +0100, Marton Balint wrote:
> 
> 
> On Tue, 12 Jan 2021, Nicolas George wrote:
> 
> > lance.lmw...@gmail.com (12021-01-12):
> > > @@ -888,23 +901,24 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> > >  }
> > > 
> > >  if ((!is_output && s->circular_buffer_size) || (is_output && s->bitrate && s->circular_buffer_size)) {
> > > -int ret;
> > > -
> > >  /* start the task going */
> > >  s->fifo = av_fifo_alloc(s->circular_buffer_size);
> > >  ret = pthread_mutex_init(&s->mutex, NULL);
> > >  if (ret != 0) {
> > >  av_log(h, AV_LOG_ERROR, "pthread_mutex_init failed : %s\n", strerror(ret));
> > > +ret =  AVERROR(ret);
> 
> extra space before AVERROR(ret), similarly below.

OK, will fix them.

> 
> > >  goto fail;
> > >  }
> > >  ret = pthread_cond_init(&s->cond, NULL);
> > >  if (ret != 0) {
> > >  av_log(h, AV_LOG_ERROR, "pthread_cond_init failed : %s\n", strerror(ret));
> > > +ret =  AVERROR(ret);
> > >  goto cond_fail;
> > >  }
> > >  ret = pthread_create(&s->circular_buffer_thread, NULL, is_output?circular_buffer_task_tx:circular_buffer_task_rx, h);
> > >  if (ret != 0) {
> > >  av_log(h, AV_LOG_ERROR, "pthread_create failed : %s\n", strerror(ret));
> > > +ret =  AVERROR(ret);
> > >  goto thread_fail;
> > >  }
> > >  s->thread_started = 1;
> > > @@ -923,7 +937,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> > >  closesocket(udp_fd);
> > >  av_fifo_freep(&s->fifo);
> > >  ff_ip_reset_filters(&s->filters);
> > > -return AVERROR(EIO);
> > > +return ret;
> > >  }
> > > 
> > >  static int udplite_open(URLContext *h, const char *uri, int flags)
> > 
> > Thanks for your efforts.
> 
> Yeah, hopefully this will be the last iteration :)
> 
> Thanks,
> Marton

-- 
Thanks,
Limin Wang
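The AVERROR(ret) lines above work because the pthread_*() functions return the error number directly instead of setting errno, so the return value itself is both the strerror() argument and the code to negate. A minimal sketch of that pattern, assuming a local `SKETCH_AVERROR()` macro in place of FFmpeg's AVERROR() (which negates a positive errno on POSIX systems); `SyncState`, `sync_init()` and `sync_uninit()` are hypothetical names:

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for AVERROR(): negate a positive errno value. */
#define SKETCH_AVERROR(e) (-(e))

typedef struct SyncState {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
} SyncState;

/* pthread_*() return the error number (they do not set errno), so the
 * return value is passed to strerror() and then negated. */
static int sync_init(SyncState *s)
{
    int ret = pthread_mutex_init(&s->mutex, NULL);
    if (ret != 0) {
        fprintf(stderr, "pthread_mutex_init failed : %s\n", strerror(ret));
        return SKETCH_AVERROR(ret);
    }
    ret = pthread_cond_init(&s->cond, NULL);
    if (ret != 0) {
        fprintf(stderr, "pthread_cond_init failed : %s\n", strerror(ret));
        pthread_mutex_destroy(&s->mutex); /* unwind, like a cond_fail label */
        return SKETCH_AVERROR(ret);
    }
    return 0;
}

static void sync_uninit(SyncState *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->mutex);
}
```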

[FFmpeg-devel] [PATCH v7] avformat/udp: return the error code instead of generic EIO

2021-01-12 Thread lance . lmwang

From: Limin Wang 

Signed-off-by: Limin Wang 
---
 libavformat/udp.c | 58 ++-
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/libavformat/udp.c b/libavformat/udp.c
index 13c346a..c4e403b 100644
--- a/libavformat/udp.c
+++ b/libavformat/udp.c
@@ -165,7 +165,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
 if (addr->sa_family == AF_INET) {
 if (setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_TTL, &mcastTTL, sizeof(mcastTTL)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_MULTICAST_TTL)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -173,7 +173,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
 if (addr->sa_family == AF_INET6) {
 if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS, &mcastTTL, sizeof(mcastTTL)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_MULTICAST_HOPS)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -193,7 +193,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
 mreq.imr_interface.s_addr = INADDR_ANY;
 if (setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, (const void *)&mreq, sizeof(mreq)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_ADD_MEMBERSHIP)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -206,7 +206,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
 mreq6.ipv6mr_interface = 0;
 if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_ADD_MEMBERSHIP, &mreq6, sizeof(mreq6)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_ADD_MEMBERSHIP)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -633,6 +633,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 char buf[256];
 struct sockaddr_storage my_addr;
 socklen_t len;
+int ret;
 
 h->is_streamed = 1;
 
@@ -641,12 +642,12 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 s->buffer_size = is_output ? UDP_TX_BUF_SIZE : UDP_RX_BUF_SIZE;
 
 if (s->sources) {
-if (ff_ip_parse_sources(h, s->sources, &s->filters) < 0)
+if ((ret = ff_ip_parse_sources(h, s->sources, &s->filters)) < 0)
 goto fail;
 }
 
 if (s->block) {
-if (ff_ip_parse_blocks(h, s->block, &s->filters) < 0)
+if ((ret = ff_ip_parse_blocks(h, s->block, &s->filters)) < 0)
 goto fail;
 }
 
@@ -712,11 +713,11 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 av_strlcpy(localaddr, buf, sizeof(localaddr));
 }
 if (av_find_info_tag(buf, sizeof(buf), "sources", p)) {
-if (ff_ip_parse_sources(h, buf, &s->filters) < 0)
+if ((ret = ff_ip_parse_sources(h, buf, &s->filters)) < 0)
 goto fail;
 }
 if (av_find_info_tag(buf, sizeof(buf), "block", p)) {
-if (ff_ip_parse_blocks(h, buf, &s->filters) < 0)
+if ((ret = ff_ip_parse_blocks(h, buf, &s->filters)) < 0)
 goto fail;
 }
 if (!is_output && av_find_info_tag(buf, sizeof(buf), "timeout", p))
@@ -742,7 +743,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 if (!(flags & AVIO_FLAG_READ))
 goto fail;
 } else {
-if (ff_udp_set_remote_url(h, uri) < 0)
+if ((ret = ff_udp_set_remote_url(h, uri)) < 0)
 goto fail;
 }
 
@@ -763,15 +764,22 @@ static int udp_open(URLContext *h, const char *uri, int flags)
  */
 if (s->reuse_socket > 0 || (s->is_multicast && s->reuse_socket < 0)) {
 s->reuse_socket = 1;
-if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0)
+if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0) {
+ret = ff_neterrno();
 goto fail;
+}
 }
 
 if (s->is_broadcast) {
 #ifdef SO_BROADCAST
-if (setsockopt (udp_fd, SOL_SOCKET, SO_BROADCAST, &(s->is_broadcast), sizeof(s->is_broadcast)) != 0)
+if (setsockopt (udp_fd, SOL_SOCKET, SO_BROADCAST, &(s->is_broadcast), sizeof(s->is_broadcast)) != 0) {
+ret = ff_neterrno();
+goto fail;
+}
+#else
+ret = AVERROR(ENOSYS);
+goto fail;
 #endif
-   goto fail;
 }
 
 /* Set the checksum coverage for UDP-Lite (RFC 3828) for sending and receiving.
@@ -788,8 +796,10 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 
 if (dscp >= 0) {
 dscp <<= 2;
-if (setsockopt (udp_fd, IPPROTO_IP, IP_TOS, &dscp, sizeof(dscp)) != 0)
+if (setsockopt (udp_fd, IPPROTO_IP, IP_TOS, &dscp, sizeof(dscp)) != 0) {
+ret = ff_neterrno();
 goto

Re: [FFmpeg-devel] [PATCH v4] avcodec/libvpxenc: add a way to set VP9E_SET_SVC_REF_FRAME_CONFIG

2021-01-12 Thread James Zern

On Fri, Jan 8, 2021 at 3:33 PM Wonkap Jang
 wrote:
>
> In order to fine-control referencing schemes in VP9 encoding, there
> is a need to use VP9E_SET_SVC_REF_FRAME_CONFIG method. This commit
> provides a way to use the API through frame metadata.
> ---
>  doc/encoders.texi  | 32 
>  libavcodec/libvpxenc.c | 84 ++
>  libavcodec/version.h   |  2 +-
>  3 files changed, 117 insertions(+), 1 deletion(-)
>

lgtm. I'll push this soon if there aren't any other comments.

> [...]
> +}
> +

I removed this extra whitespace locally.

> +#endif
> +

Re: [FFmpeg-devel] [PATCH v2 1/5] ac3enc_fixed: convert to 32-bit sample format

2021-01-12 Thread Lynne

Jan 12, 2021, 22:24 by andreas.rheinha...@gmail.com:

> Lynne:
>
>> The AC3 encoder used to be a separate library called "Aften", which
>> got merged into libavcodec (literally, SVN commits and all).
>> The merge preserved as many features from the library as possible.
>>
>> The code had two versions - a fixed point version and a floating
>> point version. FFmpeg had floating point DSP code used by other
>> codecs, the AC3 decoder including, so the floating-point DSP was
>> simply replaced with FFmpeg's own functions.
>> However, FFmpeg had no fixed-point audio code at that point. So
>> the encoder brought along its own fixed-point DSP functions,
>> including a fixed-point MDCT.
>>
>> The fixed-point MDCT itself is trivially just a float MDCT with a
>> different type and each multiply being a fixed-point multiply.
>> So over time, it got refactored, and the FFT used for all other codecs
>> was templated.
>>
>> Due to design decisions at the time, the fixed-point version of the
>> encoder operates at 16-bits of precision. Although convenient, this,
>> even at the time, was inadequate and inefficient. The encoder is noisy,
>> does not produce output comparable to the float encoder, and even
>> rings at higher frequencies due to the badly approximated window function.
>>
>> Enter MIPS (owned by Imagination Technologies at the time). They wanted
>> quick fixed-point decoding on their FPUless cores. So they contributed
>> patches to template the AC3 decoder so it had both a fixed-point
>> and a floating-point version. They also did the same for the AAC decoder.
>> They, however, used 32-bit samples, not 16-bit. And we did not have
>> 32-bit fixed-point DSP functions, including an MDCT. But instead of
>> templating our MDCT to output 3 versions (float, 32-bit fixed and 16-bit fixed),
>> they simply copy-pasted their own MDCT into ours, and completely
>> ifdeffed our own MDCT code out if a 32-bit fixed point MDCT was selected.
>>
>> This is also the status quo nowadays - 2 separate MDCTs, one which
>> produces floating point and 16-bit fixed point versions, and one
>> sort-of integrated which produces 32-bit MDCT.
>>
>> MIPS weren't all that interested in encoding, so they left the encoder
>> as-is, and they didn't care much about the ifdeffery, mess or quality - it's
>> not their problem.
>>
>> So the MDCT/FFT code has always been a thorn in the eye of anyone looking to clean up
>> the code.
>>
>> Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
>> So for the floating point version, the encoder simply runs the float MDCT,
>> and converts the resulting coefficients to 25-bit fixed-point, as AC3 is inherently
>> a fixed-point codec. For the fixed-point version, the input is 16-bit samples,
>> so to maximize precision the frame samples are analyzed and the highest set
>> bit is detected via ac3_max_msb_abs_int16(), and the coefficients are then
>> scaled up via ac3_lshift_int16(), so the input for the FFT is always at least 14 bits,
>> computed in normalize_samples(). After FFT, the coefficients are scaled up to 25 bits.
>>
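The 16-bit normalization step described above (find the highest set bit across the frame, then shift left so the FFT input has at least 14 bits) can be sketched as follows. This is an illustration of the described idea, not FFmpeg's normalize_samples(): `max_msb_abs16()`, `ilog2()` and `normalize16()` are hypothetical helpers, and the returned shift is simply the number of bits applied.

```c
#include <stdint.h>
#include <stdlib.h>

/* OR-ing the absolute values preserves the highest set bit of the
 * largest magnitude (the idea behind ac3_max_msb_abs_int16()). */
static int max_msb_abs16(const int16_t *src, int len)
{
    int v = 0;
    for (int i = 0; i < len; i++)
        v |= abs(src[i]);
    return v;
}

/* floor(log2(v)), returning 0 for v == 0. */
static int ilog2(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Scale the frame up so its largest magnitude reaches bit 14.
 * Returns the shift applied (0 if the samples already use >= 15 bits). */
static int normalize16(int16_t *samples, int len)
{
    int shift = 14 - ilog2((unsigned)max_msb_abs16(samples, len));
    if (shift <= 0)
        return 0;
    for (int i = 0; i < len; i++) /* multiply: left-shifting negatives is UB */
        samples[i] = (int16_t)(samples[i] * (1 << shift));
    return shift;
}
```

For example, {1, -3, 100} has its highest set bit at position 6, so it is shifted by 8 to {256, -768, 25600}.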
>> This patch simply changes the encoder to accept 32-bit samples, reusing
>> the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
>> a large part of some very messy code of ours, as well as prepare for the future lavu/tx
>> conversion. The coefficients are simply scaled down to 25 bits during windowing,
>> skipping 2 separate scalings, as the hacks to extend precision are simply no longer
>> necessary. There's no point in running the MDCT always at 32 bits when you're
>> going to drop 6 bits off anyway; the headroom is plenty, and the MDCT rounds
>> properly.
>>
>> This also makes the encoder even slightly more accurate than the float version,
>> as there's no coefficient conversion step necessary.
>>
>> SIZE SAVINGS:
>> ARM32:
>> HARDCODED TABLES:
>> BASE   - 10709590
>> DROP  DSP  - 10702872 - diff:   -6.56KiB
>> DROP  MDCT - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
>> DROP  FFT  - 10336652 - diff: -323.52KiB - all:   -364.20KiB
>> SOFTCODED TABLES:
>> BASE   -  9685096
>> DROP  DSP  -  9678378 - diff:   -6.56KiB
>> DROP  MDCT -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
>> DROP  FFT  -  9573918 - diff:  -67.92KiB - all:   -108.57KiB
>>
>> ARM64:
>> HARDCODED TABLES:
>> BASE   - 14641112
>> DROP  DSP  - 14633806 - diff:   -7.13KiB
>> DROP  MDCT - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
>> DROP  FFT  - 14286826 - diff: -310.53KiB - all:   -345.98KiB
>> SOFTCODED TABLES:
>> BASE   - 13636238
>> DROP  DSP  - 13628932 - diff:   -7.13KiB
>> DROP  MDCT - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
>> DROP  FFT  - 13542080 - diff:  -56.43KiB - all:    -91.95KiB
>>
>> x86:
>> HARDCODED TABLES:
>> BASE   - 12367336
>> DROP  DSP  - 12354698 - diff:  -12.34KiB
>> DROP  MDCT - 12331024 - diff:  -23.12KiB - both:

Re: [FFmpeg-devel] [PATCH v6] avformat/udp: return the error code instead of generic EIO

2021-01-12 Thread lance . lmwang

On Tue, Jan 12, 2021 at 03:36:00PM +0100, Nicolas George wrote:
> lance.lmw...@gmail.com (12021-01-12):
> > From: Limin Wang 
> > 
> > Signed-off-by: Limin Wang 
> > ---
> >  libavformat/udp.c | 58 ++-
> >  1 file changed, 36 insertions(+), 22 deletions(-)
> > 
> > diff --git a/libavformat/udp.c b/libavformat/udp.c
> > index 13c346a..42e4563 100644
> > --- a/libavformat/udp.c
> > +++ b/libavformat/udp.c
> > @@ -165,7 +165,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
> >  if (addr->sa_family == AF_INET) {
> >  if (setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_TTL, &mcastTTL, sizeof(mcastTTL)) < 0) {
> >  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_MULTICAST_TTL)");
> > -return -1;
> > +return ff_neterrno();
> >  }
> >  }
> >  #endif
> > @@ -173,7 +173,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
> >  if (addr->sa_family == AF_INET6) {
> >  if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS, &mcastTTL, sizeof(mcastTTL)) < 0) {
> >  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_MULTICAST_HOPS)");
> > -return -1;
> > +return ff_neterrno();
> >  }
> >  }
> >  #endif
> > @@ -193,7 +193,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
> >  mreq.imr_interface.s_addr = INADDR_ANY;
> >  if (setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, (const void *)&mreq, sizeof(mreq)) < 0) {
> >  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_ADD_MEMBERSHIP)");
> > -return -1;
> > +return ff_neterrno();
> >  }
> >  }
> >  #endif
> > @@ -206,7 +206,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
> >  mreq6.ipv6mr_interface = 0;
> >  if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_ADD_MEMBERSHIP, &mreq6, sizeof(mreq6)) < 0) {
> >  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_ADD_MEMBERSHIP)");
> > -return -1;
> > +return ff_neterrno();
> >  }
> >  }
> >  #endif
> > @@ -633,6 +633,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> >  char buf[256];
> >  struct sockaddr_storage my_addr;
> >  socklen_t len;
> 
> > +int ret = AVERROR(EIO);
> 
> ret should be left uninited, so that the compiler can warn you if there
> is a code path where you return it without setting it.

OK, will change it.
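Nicolas's point about leaving ret uninitialized can be seen in a reduced version of the udp_open() error path. In the sketch below every `goto fail` assigns ret first; if a future edit adds a path that forgets to, GCC's -Wmaybe-uninitialized (or Clang's -Wsometimes-uninitialized) can flag it, whereas `int ret = AVERROR(EIO);` would silently hide the bug. `set_socket_flags()` is a hypothetical helper, and `sketch_neterrno()` is a local stand-in assumed to behave like ff_neterrno() on POSIX, i.e. the negated errno.

```c
#include <errno.h>
#include <sys/socket.h>

/* Local stand-in for ff_neterrno(): negated errno on POSIX systems. */
static int sketch_neterrno(void) { return -errno; }

/* ret is deliberately not initialized: each failure path must set it. */
static int set_socket_flags(int fd, int reuse, int broadcast)
{
    int ret;

    if (reuse &&
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) != 0) {
        ret = sketch_neterrno();
        goto fail;
    }
    if (broadcast &&
        setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &broadcast, sizeof(broadcast)) != 0) {
        ret = sketch_neterrno();
        goto fail;
    }
    return 0;

fail:
    /* real code would close the socket and free state here */
    return ret;
}
```

Passing an invalid descriptor such as -1 makes setsockopt() fail with EBADF, which is propagated as -EBADF instead of a generic EIO.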

> 
> >  
> >  h->is_streamed = 1;
> >  
> > @@ -641,12 +642,12 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> >  s->buffer_size = is_output ? UDP_TX_BUF_SIZE : UDP_RX_BUF_SIZE;
> >  
> >  if (s->sources) {
> > -if (ff_ip_parse_sources(h, s->sources, &s->filters) < 0)
> > +if ((ret = ff_ip_parse_sources(h, s->sources, &s->filters)) < 0)
> >  goto fail;
> >  }
> >  
> >  if (s->block) {
> > -if (ff_ip_parse_blocks(h, s->block, &s->filters) < 0)
> > +if ((ret = ff_ip_parse_blocks(h, s->block, &s->filters)) < 0)
> >  goto fail;
> >  }
> >  
> > @@ -712,11 +713,11 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> >  av_strlcpy(localaddr, buf, sizeof(localaddr));
> >  }
> >  if (av_find_info_tag(buf, sizeof(buf), "sources", p)) {
> > -if (ff_ip_parse_sources(h, buf, &s->filters) < 0)
> > +if ((ret = ff_ip_parse_sources(h, buf, &s->filters)) < 0)
> >  goto fail;
> >  }
> >  if (av_find_info_tag(buf, sizeof(buf), "block", p)) {
> > -if (ff_ip_parse_blocks(h, buf, &s->filters) < 0)
> > +if ((ret = ff_ip_parse_blocks(h, buf, &s->filters)) < 0)
> >  goto fail;
> >  }
> >  if (!is_output && av_find_info_tag(buf, sizeof(buf), "timeout", p))
> > @@ -742,7 +743,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> >  if (!(flags & AVIO_FLAG_READ))
> >  goto fail;
> >  } else {
> > -if (ff_udp_set_remote_url(h, uri) < 0)
> > +if ((ret = ff_udp_set_remote_url(h, uri)) < 0)
> >  goto fail;
> >  }
> >  
> > @@ -763,15 +764,22 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> >   */
> >  if (s->reuse_socket > 0 || (s->is_multicast && s->reuse_socket < 0)) {
> >  s->reuse_socket = 1;
> > -if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0)
> > +if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0) {
> > +ret = ff_neterrno();
> >  goto fail;
> > +}
> >  }
> >  
> >  if (s->is_broadcast) {
> >  #ifdef SO_BROADCAST
> > -if

[FFmpeg-devel] [PATCH 3/4] avformat/flvdec: Check for EOF in amf_skip_tag()

2021-01-12 Thread Michael Niedermayer

Fixes: Timeout
Fixes: 29070/clusterfuzz-testcase-minimized-ffmpeg_dem_KUX_fuzzer-5650106766458880

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavformat/flvdec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libavformat/flvdec.c b/libavformat/flvdec.c
index ad6e7a3ca5..9e348f66b3 100644
--- a/libavformat/flvdec.c
+++ b/libavformat/flvdec.c
@@ -841,6 +841,9 @@ static int amf_skip_tag(AVIOContext *pb, AMFDataType type)
 {
 int nb = -1, ret, parse_name = 1;
 
+if (avio_feof(pb))
+return AVERROR_EOF;
+
 switch (type) {
 case AMF_DATA_TYPE_NUMBER:
 avio_skip(pb, 8);
-- 
2.17.1
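Why a single EOF check fixes a timeout is easiest to see with a toy reader whose reads past the end return 0 and set a sticky EOF flag, loosely modelled on how avio reads behave. In the sketch below, a strict array with a huge element count followed by truncated data would otherwise keep "reading" type 0 (NUMBER) for up to 2^32 iterations. Everything here is hypothetical: `Reader`, the `rd_*()` helpers, `skip_tag()`, the error codes, and the assumption that AMF0 uses markers 0, 1 and 10 for number, boolean and strict array.

```c
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_EOF = -1, SKETCH_INVALID = -2 };          /* hypothetical codes */
enum { T_NUMBER = 0, T_BOOL = 1, T_STRICT_ARRAY = 10 }; /* assumed AMF0 markers */

/* Toy reader: reads past the end return 0 and set a sticky EOF flag. */
typedef struct Reader {
    const uint8_t *p, *end;
    int eof;
} Reader;

static unsigned rd_u8(Reader *r)
{
    if (r->p >= r->end) {
        r->eof = 1;
        return 0;
    }
    return *r->p++;
}

static uint32_t rd_be32(Reader *r)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | rd_u8(r);
    return v;
}

static void rd_skip(Reader *r, size_t n)
{
    if ((size_t)(r->end - r->p) < n) {
        r->p   = r->end;
        r->eof = 1;
    } else {
        r->p += n;
    }
}

static int skip_tag(Reader *r, int type)
{
    if (r->eof)              /* the guard the patch adds */
        return SKETCH_EOF;

    switch (type) {
    case T_NUMBER: rd_skip(r, 8); break;
    case T_BOOL:   rd_skip(r, 1); break;
    case T_STRICT_ARRAY: {
        uint32_t n = rd_be32(r);
        for (uint32_t i = 0; i < n; i++) {
            int ret = skip_tag(r, (int)rd_u8(r));
            if (ret < 0)
                return ret;
        }
        break;
    }
    default:
        return SKETCH_INVALID;
    }
    return 0;
}
```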


[FFmpeg-devel] [PATCH 1/4] avformat/asfdec_o: Check size vs. offset in detect_unknown_subobject()

2021-01-12 Thread Michael Niedermayer

Fixes: signed integer overflow: 2314885530818453566 + 7503032301549264928 cannot be represented in type 'long'
Fixes: 26639/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-6024222100684800

Alternatively this could be ignored but then the end condition of the loop
would be hard to reach as avio_tell() is int64_t

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavformat/asfdec_o.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libavformat/asfdec_o.c b/libavformat/asfdec_o.c
index 655ba9f9ef..6ce5a3fb43 100644
--- a/libavformat/asfdec_o.c
+++ b/libavformat/asfdec_o.c
@@ -1661,6 +1661,9 @@ static int detect_unknown_subobject(AVFormatContext *s, int64_t offset, int64_t
 ff_asf_guid guid;
 int ret;
 
+if (offset > INT64_MAX - size)
+return AVERROR_INVALIDDATA;
+
 while (avio_tell(pb) <= offset + size) {
 if (avio_tell(pb) == asf->offset)
 break;
-- 
2.17.1
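The guard `offset > INT64_MAX - size` detects the overflow without performing it. A standalone sketch, assuming size is non-negative (a negative size would make the subtraction itself overflow and would need its own check); `end_would_overflow()` is a hypothetical name, and the first test uses the values from the fuzzer report above.

```c
#include <stdint.h>

/* For size >= 0, offset + size overflows int64_t exactly when
 * offset > INT64_MAX - size; the subtraction cannot overflow then. */
static int end_would_overflow(int64_t offset, int64_t size)
{
    return offset > INT64_MAX - size;
}
```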


[FFmpeg-devel] [PATCH 2/4] avformat/asfdec_o: Check lang_idx

2021-01-12 Thread Michael Niedermayer

Fixes: index 26981 out of bounds for type 'ASFStreamData [128]'
Fixes: 27334/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-6197611002068992

Alternatively the array could be increased in size or the cases not fitting be ignored

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavformat/asfdec_o.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavformat/asfdec_o.c b/libavformat/asfdec_o.c
index 6ce5a3fb43..0a65850059 100644
--- a/libavformat/asfdec_o.c
+++ b/libavformat/asfdec_o.c
@@ -851,6 +851,8 @@ static int asf_read_ext_stream_properties(AVFormatContext *s, const GUIDParseTab
 st_num = avio_rl16(pb);
 st_num&= ASF_STREAM_NUM;
 lang_idx   = avio_rl16(pb); // Stream Language ID Index
+if (lang_idx >= ASF_MAX_STREAMS)
+return AVERROR_INVALIDDATA;
 for (i = 0; i < asf->nb_streams; i++) {
 if (st_num == asf->asf_st[i]->stream_index) {
 st   = s->streams[asf->asf_st[i]->index];
-- 
2.17.1


[FFmpeg-devel] [PATCH 4/4] avformat/electronicarts: More chunk_size checks

2021-01-12 Thread Michael Niedermayer

Fixes: Timeout
Fixes: 26909/clusterfuzz-testcase-minimized-ffmpeg_dem_EA_fuzzer-6489496553783296

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavformat/electronicarts.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libavformat/electronicarts.c b/libavformat/electronicarts.c
index 4c292f29a2..a98a8d604e 100644
--- a/libavformat/electronicarts.c
+++ b/libavformat/electronicarts.c
@@ -607,10 +607,14 @@ static int ea_read_packet(AVFormatContext *s, AVPacket *pkt)
 break;
 } else if (ea->audio_codec == AV_CODEC_ID_PCM_S16LE_PLANAR ||
ea->audio_codec == AV_CODEC_ID_MP3) {
+if (chunk_size < 12)
+return AVERROR_INVALIDDATA;
 num_samples = avio_rl32(pb);
 avio_skip(pb, 8);
 chunk_size -= 12;
 } else if (ea->audio_codec == AV_CODEC_ID_ADPCM_PSX) {
+if (chunk_size < 8)
+return AVERROR_INVALIDDATA;
 avio_skip(pb, 8);
 chunk_size -= 8;
 }
@@ -693,6 +697,8 @@ static int ea_read_packet(AVFormatContext *s, AVPacket *pkt)
 case fVGT_TAG:
 case MADm_TAG:
 case MADe_TAG:
+if (chunk_size > INT_MAX - 8)
+return AVERROR_INVALIDDATA;
 avio_seek(pb, -8, SEEK_CUR);// include chunk preamble
 chunk_size += 8;
 goto get_video_packet;
-- 
2.17.1
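The checks in this patch follow two bounded-arithmetic patterns: never subtract more header bytes than chunk_size claims to hold (so it cannot go negative), and never add the 8-byte preamble if that would exceed INT_MAX. A minimal sketch of both; `consume_header()` and `include_preamble()` are hypothetical helpers, and -1 stands in for AVERROR_INVALIDDATA.

```c
#include <limits.h>

/* Refuse to consume more header bytes than the chunk contains. */
static int consume_header(int *chunk_size, int n)
{
    if (*chunk_size < n)
        return -1; /* stands in for AVERROR_INVALIDDATA */
    *chunk_size -= n;
    return 0;
}

/* Grow the size to include the 8-byte preamble without int overflow. */
static int include_preamble(int *chunk_size)
{
    if (*chunk_size > INT_MAX - 8)
        return -1;
    *chunk_size += 8;
    return 0;
}
```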


Re: [FFmpeg-devel] [PATCH v6] avformat/udp: return the error code instead of generic EIO

2021-01-12 Thread Marton Balint




On Tue, 12 Jan 2021, Nicolas George wrote:


lance.lmw...@gmail.com (12021-01-12):

@@ -888,23 +901,24 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 }

 if ((!is_output && s->circular_buffer_size) || (is_output && s->bitrate && s->circular_buffer_size)) {
-int ret;
-
 /* start the task going */
 s->fifo = av_fifo_alloc(s->circular_buffer_size);
 ret = pthread_mutex_init(&s->mutex, NULL);
 if (ret != 0) {
 av_log(h, AV_LOG_ERROR, "pthread_mutex_init failed : %s\n", strerror(ret));
+ret =  AVERROR(ret);


extra space before AVERROR(ret), similarly below.


 goto fail;
 }
 ret = pthread_cond_init(&s->cond, NULL);
 if (ret != 0) {
 av_log(h, AV_LOG_ERROR, "pthread_cond_init failed : %s\n", strerror(ret));
+ret =  AVERROR(ret);
 goto cond_fail;
 }
 ret = pthread_create(&s->circular_buffer_thread, NULL, is_output?circular_buffer_task_tx:circular_buffer_task_rx, h);
 if (ret != 0) {
 av_log(h, AV_LOG_ERROR, "pthread_create failed : %s\n", strerror(ret));
+ret =  AVERROR(ret);
 goto thread_fail;
 }
 s->thread_started = 1;
@@ -923,7 +937,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 closesocket(udp_fd);
 av_fifo_freep(&s->fifo);
 ff_ip_reset_filters(&s->filters);
-return AVERROR(EIO);
+return ret;
 }

 static int udplite_open(URLContext *h, const char *uri, int flags)


Thanks for your efforts.


Yeah, hopefully this will be the last iteration :)

Thanks,
Marton

Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-12 Thread Soft Works



> -Original Message-
> From: ffmpeg-devel  On Behalf Of
> Lynne
> Sent: Tuesday, January 12, 2021 9:47 PM
> To: FFmpeg development discussions and patches  de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.
> 
> Jan 12, 2021, 19:28 by reimar.doeffin...@gmx.de:
> 
> >>
> >> On 10 Jan 2021, at 19:55, Lynne  wrote:
> >>
> >> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
> >>
> >>> From: Reimar Döffinger 
> >>>
> >>> real    0m15.040s
> >>> user    0m18.874s (80.7% of original)
> >>> sys     0m0.168s
> >>>
> >>
> >> I think I have to disagree.
> >> The performance gains are marginal,
> >>
> >
> > It’s almost 20%. At least for this combination of codec and stream a
> > large amount of time is spent in non-DSP functions, so even
> > hand-written assembler won’t give you huge gains.
> >
> It's non-guaranteed 20% on a single system. It could change, and it could very
> well mess up like gcc does with autovectorization, which we still explicitly
> disable because FATE fails (-fno-tree-vectorize; I was the one who sent an
> RFC to try to undo it somewhat recently, and even though it was an RFC, the
> reaction from devs was quite cold).

I wonder whether there's a way to enable autovectorization only for 
specific loops? But that would probably be compiler-specific.

> >> it's definitely something the compiler should be able to decide on its
> >> own,
> >>
> >
> > So you object to unlikely() macros as well?
> > It’s really just giving the compiler a hint it should try, though I
> > admit the configure part makes it look otherwise.
> >
> I'm more against the macro and changes to the code itself. If you can make it
> work without adding a macro to individual loops or the likes of
> av_cold/av_hot or any other changes to the code, I'll be more welcoming.
> I really _hate_ compiler hints. Take a look at the upipe source code to see
> what a Cthulhian monstrosity made of hint flags looks like. Every single branch
> had a cold/hot macro and it was the project's coding style. It's completely
> irredeemable.

OpenMP is a standard at least, which is supported by many compilers and
#pragma omp simd is not really a "monstrosity".
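For reference, this is roughly what such an annotation looks like on a simple loop. avg_row() is a hypothetical example, not one of the loops touched by the patch. The pragma only asks the compiler to vectorize the loop that follows; a build without OpenMP SIMD support enabled simply ignores it (possibly with an unknown-pragma warning) and emits the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: average two rows of 8-bit pixels with rounding.
 * The pragma is a request, not a guarantee; the loop stays correct C
 * whether or not the compiler vectorizes it. */
static void avg_row(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}
```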


> >> Most of the loops this is added to are trivially SIMDable.

Could you provide some examples? What constructs would you suggest
that can be applied trivially? And would they be compiled as SIMD even
though -fno-tree-vectorize is set?

Thanks,
softworkz



Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-12 Thread Reimar Döffinger

> On 12 Jan 2021, at 21:46, Lynne  wrote:
> 
> Jan 12, 2021, 19:28 by reimar.doeffin...@gmx.de:
> 
>> It’s almost 20%. At least for this combination of
>> codec and stream a large amount of time is spent in
>> non-DSP functions, so even hand-written assembler
>> won’t give you huge gains.
>> 
> It's non-guaranteed 20% on a single system. It could change, and it could very
> well mess up like gcc does with autovectorization, which we still explicitly
> disable because FATE fails (-fno-tree-vectorize; I was the one who sent an RFC
> to try to undo it somewhat recently, and even though it was an RFC, the
> reaction from devs was quite cold).

Oh, thanks for the reminder. I thought that was gone because it seems
it’s not used for clang, and MPlayer does not seem to set that.
I need to compare it; however, the problem with auto-vectorization
is exactly that the compiler will try to apply it to everything,
which has at least 2 issues:
1) it hugely increases the risk of bugs when it's every
single loop instead of only the loops that we already wrote assembler
for somewhere.
2) it will quite often make things worse by vectorizing loops
that rarely iterate more than a few times (and it needs to
generate a whole lot of extra code to handle loop counts that are
not a multiple of the vector size), because all too often the compiler
can only take a wild guess whether “width” is usually 1 or 1920,
while we DO know.
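To illustrate point 2: a vectorized loop is usually split into a main body that processes whole vectors and a scalar tail for the leftover elements. The sketch below writes that shape out by hand in plain C. VEC = 8 (eight int16_t in a 128-bit register) is an assumed width, and add_bias() is a hypothetical function. When n is only 3, the main body never runs and all the extra code is pure overhead.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC 8 /* assumed vector width in elements: 8 x int16_t = 128 bits */

/* Hypothetical example: add a constant to n samples, structured the way a
 * vectorizer typically lowers the loop. */
static void add_bias(int16_t *x, size_t n, int16_t bias)
{
    size_t i = 0;
    size_t main_end = n - n % VEC; /* last index covered by whole vectors */

    for (; i < main_end; i += VEC)        /* "vector" body */
        for (size_t j = 0; j < VEC; j++)
            x[i + j] = (int16_t)(x[i + j] + bias);

    for (; i < n; i++)                    /* scalar remainder */
        x[i] = (int16_t)(x[i] + bias);
}
```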

>>> it's definitely something the compiler should
>>> be able to decide on its own,
>>> 
>> 
>> So you object to unlikely() macros as well?
>> It’s really just giving the compiler a hint it should try, though I admit
>> the configure part makes it look otherwise.
>> 
> I'm more against the macro and changes to the code itself. If you can make it
> work without adding a macro to individual loops or the likes of
> av_cold/av_hot or any other changes to the code, I'll be more welcoming.

I expect that will just run into the same issue as the tree-vectorize...

> I really _hate_ compiler hints. Take a look at the upipe source code to see
> what a Cthulhian monstrosity made of hint flags looks like. Every single
> branch had a cold/hot macro and it was the project's coding style. It's
> completely irredeemable.

I guess my suggested solution would be to require proof of
clearly measurable performance benefit.
But I see the point that if it gets “randomly” added to loops
that might turn out quite a mess.

>>> Most of the loops this is added to are trivially SIMDable.
>>> 
>> 
>> How many hours of effort do you consider “trivial”?
>> Especially if it’s someone not experienced?
>> It might be fairly trivial with intrinsics, however
>> many of your counter-arguments also apply
>> to intrinsics (and to a degree inline assembly).
>> That’s btw not just a rhetorical question because
>> I’m pretty sure I am not going to all the trouble
>> to port more of the arm 32-bit assembler functions
>> since it’s a huge PITA, and I was wondering if there
>> was a point to even have a try with intrinsics...
>> 
> Intrinsics and inline assembly are a whole different thing than magic
> macros that tell and force the compiler what a well written compiler
> should already very well know about.

There are no well written compilers, in a way ;)
I would also argue that most of what intrinsics do,
such a compiler should figure out on its own, too.
And the first time I tried intrinsics they slowed the
loop down by a factor of 2 because the compiler stored and
loaded the value to the stack between every intrinsic,
so they are not without problems either.
But I was actually thinking that it might be somewhat
interesting to have a kind of “generic SIMD intrinsics”.
Though I think I read that such a thing has already been
tried, so it might just be wasted time.
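GCC and Clang already ship one form of "generic SIMD": vector extensions. A type declared with `__attribute__((vector_size(N)))` supports the ordinary arithmetic operators element-wise, and the compiler lowers them to whatever SIMD unit the target has, or to scalar code if it has none. Here is a minimal sketch. add8() is a hypothetical function, and this is a compiler extension rather than standard C, so MSVC cannot build it.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: eight int16_t lanes in 16 bytes. */
typedef int16_t v8i16 __attribute__((vector_size(16)));

/* Hypothetical example: add two blocks of 8 samples lane by lane.
 * memcpy avoids assuming the buffers are 16-byte aligned. */
static void add8(int16_t *dst, const int16_t *a, const int16_t *b)
{
    v8i16 va, vb, vd;
    memcpy(&va, a, sizeof(va));
    memcpy(&vb, b, sizeof(vb));
    vd = va + vb; /* one vector add on NEON/SSE2 targets */
    memcpy(dst, &vd, sizeof(vd));
}
```

This does not solve the saturating-arithmetic gap mentioned in the patch description, since plain `+` on these types does not saturate.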

> I already said all that can be said here: this will halt efforts on actually
> optimizing the code in exchange for naive trust in compilers.

I’m not sure it will discourage it more than having to write
the optimizations over and over: for Armv7 NEON, for Armv8 Linux,
for Armv8 Windows, then SVE/SVE2, and who knows, maybe Armv9
will also need a rewrite.
SSE2, AVX2, AVX-512 for x86; so much stuff never gets ported
to the new versions.
I’d also claim anyone naively trusting in compilers is unlikely
to write SIMD optimizations either way :)

> New platforms will be stuck at scalar performance anyway until
> the compilers for the arch are smart enough to deal with vectorization.

That seems to happen a long time before someone gets around to
optimising FFmpeg, though.
This is particularly true when the new platform is an OS rather than
a CPU architecture.
For macOS, for example, we are lucky enough that the assembler etc. are
largely compatible with Linux.
But for Windows-on-Arm there is no GNU assembler, and the Microsoft
assembler needs a completely different syntax, so even the assembly
we DO have just doesn’t work.

Anyway, thanks for the discussion.
I still think the situation with SIMD optimizations

Re: [FFmpeg-devel] [PATCH v2 1/5] ac3enc_fixed: convert to 32-bit sample format

2021-01-12 Thread Andreas Rheinhardt

Lynne:
> The AC3 encoder used to be a separate library called "Aften", which
> got merged into libavcodec (literally, SVN commits and all).
> The merge preserved as many features from the library as possible.
> 
> The code had two versions - a fixed point version and a floating
> point version. FFmpeg had floating point DSP code used by other
> codecs, including the AC3 decoder, so the floating-point DSP was
> simply replaced with FFmpeg's own functions.
> However, FFmpeg had no fixed-point audio code at that point. So
> the encoder brought along its own fixed-point DSP functions,
> including a fixed-point MDCT.
> 
> The fixed-point MDCT itself is trivially just a float MDCT with a
> different type and each multiply being a fixed-point multiply.
> So over time, it got refactored, and the FFT used for all other codecs
> was templated.
> 
> Due to design decisions at the time, the fixed-point version of the
> encoder operates at 16 bits of precision. Although convenient, this,
> even at the time, was inadequate and inefficient. The encoder is noisy,
> does not produce output comparable to the float encoder, and even
> rings at higher frequencies due to the badly approximated window function.
> 
> Enter MIPS (owned by Imagination Technologies at the time). They wanted
> quick fixed-point decoding on their FPU-less cores. So they contributed
> patches to template the AC3 decoder so it had both a fixed-point
> and a floating-point version. They also did the same for the AAC decoder.
> They, however, used 32-bit samples, not 16-bit ones. And we did not have
> 32-bit fixed-point DSP functions, including an MDCT. But instead of
> templating our MDCT to output 3 versions (float, 32-bit fixed and 16-bit
> fixed), they simply copy-pasted their own MDCT into ours, and completely
> ifdeffed our own MDCT code out if a 32-bit fixed point MDCT was selected.
> 
> This is also the status quo nowadays - 2 separate MDCTs, one which
> produces floating point and 16-bit fixed point versions, and one
> sort-of-integrated one which produces the 32-bit fixed point version.
> 
> MIPS weren't all that interested in encoding, so they left the encoder
> as-is, and they didn't care much about the ifdeffery, mess or quality - it's
> not their problem.
> 
> So the MDCT/FFT code has always been a thorn in the eye of anyone looking
> to clean up the code.
> 
> Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
> So for the floating point version, the encoder simply runs the float MDCT,
> and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
> inherently a fixed-point codec. For the fixed-point version, the input is
> 16-bit samples, so to maximize precision the frame samples are analyzed and
> the highest set bit is detected via ac3_max_msb_abs_int16(), and the samples
> are then scaled up via ac3_lshift_int16(), so the input for the FFT is always
> at least 14 bits, computed in normalize_samples(). After the FFT, the
> coefficients are scaled up to 25 bits.
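The normalization step described above can be sketched as follows. The helper names are hypothetical and this is not the actual ac3enc_fixed.c code. OR-ing the absolute values produces a word whose highest set bit equals that of the largest sample. The block is then shifted left so that bit lands at bit 14, the top magnitude bit of an int16_t.

```c
#include <stdint.h>
#include <stdlib.h>

/* OR of absolute values: its highest set bit is the largest sample's. */
static int max_msb_abs(const int16_t *src, int len)
{
    int v = 0;
    for (int i = 0; i < len; i++)
        v |= abs(src[i]);
    return v;
}

/* Hypothetical: shift the block up so its largest sample uses bit 14.
 * Returns the shift applied (0 for silent or already-loud blocks), which
 * the caller must compensate for when scaling the coefficients later. */
static int normalize_block(int16_t *buf, int len)
{
    int v = max_msb_abs(buf, len);
    int msb = -1;
    while (v) {            /* msb = floor(log2(v)), -1 if all zero */
        msb++;
        v >>= 1;
    }
    int shift = msb >= 0 ? 14 - msb : 0;
    if (shift <= 0)
        return 0;
    for (int i = 0; i < len; i++)
        buf[i] = (int16_t)(buf[i] * (1 << shift));
    return shift;
}
```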
> 
> This patch simply changes the encoder to accept 32-bit samples, reusing
> the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
> a large part of our very messy code, as well as prepare for the future
> lavu/tx conversion. The coefficients are simply scaled down to 25 bits during
> windowing, skipping 2 separate scalings, as the hacks to extend precision are
> simply no longer necessary. There's no point in always running the MDCT at
> 32 bits when you're going to drop 6 bits off anyway; the headroom is plenty,
> and the MDCT rounds properly.
> 
> This also makes the encoder even slightly more accurate than the float
> version, as there's no coefficient conversion step necessary.
> 
> SIZE SAVINGS:
> ARM32:
> HARDCODED TABLES:
> BASE   - 10709590
> DROP  DSP  - 10702872 - diff:   -6.56KiB
> DROP  MDCT - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
> DROP  FFT  - 10336652 - diff: -323.52KiB - all:   -364.20KiB
> SOFTCODED TABLES:
> BASE   -  9685096
> DROP  DSP  -  9678378 - diff:   -6.56KiB
> DROP  MDCT -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
> DROP  FFT  -  9573918 - diff:  -67.92KiB - all:   -108.57KiB
> 
> ARM64:
> HARDCODED TABLES:
> BASE   - 14641112
> DROP  DSP  - 14633806 - diff:   -7.13KiB
> DROP  MDCT - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
> DROP  FFT  - 14286826 - diff: -310.53KiB - all:   -345.98KiB
> SOFTCODED TABLES:
> BASE   - 13636238
> DROP  DSP  - 13628932 - diff:   -7.13KiB
> DROP  MDCT - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
> DROP  FFT  - 13542080 - diff:  -56.43KiB - all:    -91.95KiB
> 
> x86:
> HARDCODED TABLES:
> BASE   - 12367336
> DROP  DSP  - 12354698 - diff:  -12.34KiB
> DROP  MDCT - 12331024 - diff:  -23.12KiB - both:   -35.46KiB
> DROP  FFT  - 12029788 - diff: -294.18KiB - all:   -329.64KiB
> SOFTCODED TABLES:
> BASE   - 11358094
> DROP  DSP  -

Re: [FFmpeg-devel] Development of a CUDA accelerated variant of the libav vf_tonemap

2021-01-12 Thread Felix LeClair

That's great! Any way for me to pull that branch or otherwise 
contribute?


Have been using FFmpeg for a few years now, so hoping to be able to
give back.


On Tue, Jan 12, 2021 at 5:55 am, Lynne  wrote:
Jan 11, 2021, 23:27 by felix.leclair...@hotmail.com:


 Hi guys and gals, first post on this mailing list, apologies for any formatting/stylistic snafus


 TL;DR: we currently have tone mapping filters (typically used to map
content from a 10-bit HDR source to an 8-bit SDR output) that run on
the CPU via zscale (backed by the zimg library), or hardware
implementations using VAAPI or OpenCL. Having a version implemented in
CUDA would round out the main HWaccel types.


 Context:
  I'm a computer engineering student up in Canada with an interest
in high efficiency distributed processing. As a personal project I'm
trying to build a cluster of Nvidia Jetson Nanos to be able to
handle a few dozen streams (mix of SD, HD, FHD, UHD, 4K HDR) at once
while drawing south of 100W at peak. These little devices can do
anywhere from 1 to 9 streams of content at a time in hardware,
depending on resolution/framerate, in any mix of HEVC or H.264, so 3
of them should get me most of the way to where I want to go (this
would be a 30W package capable of ~12 2160p30 10-bit -> 1080p30 8-bit
streams).


 The issue is that 4 little arm64 cores are just not going to be
able to tonemap using zscale in real time, even with the encoder and
decoders sharing memory with the CPU (so no PCIe memory copy penalty).
On the other hand, the built-in GPU and the relative simplicity of
most tone mapping algorithms (say, Hable) should make quick work of
this. Unfortunately (or fortunately for me to learn with?) there
isn't a CUDA version of the filter.
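The per-pixel work really is small. Below is a C sketch of the Hable ("Uncharted 2") filmic curve with its commonly published constants. I believe FFmpeg's vf_tonemap uses the same formula for tonemap=hable, normalized so the signal peak maps to 1.0, but treat that as an assumption and check the filter source. A CUDA kernel would evaluate the same expression with one thread per pixel.

```c
/* Hable's filmic curve (constants as widely published). */
static float hable(float x)
{
    const float a = 0.15f, b = 0.50f, c = 0.10f;
    const float d = 0.20f, e = 0.02f, f = 0.30f;
    return (x * (x * a + b * c) + d * e) / (x * (x * a + b) + d * f) - e / f;
}

/* Map a linear-light value `sig` (1.0 = SDR reference white) into [0, 1],
 * scaled so that the signal peak `peak` maps to exactly 1.0. */
static float tonemap_hable(float sig, float peak)
{
    return hable(sig) / hable(peak);
}
```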


 Question/guidance:
 I've read through the doc on how to write filters, as well as
looked at the other CUDA filters currently in the source, and have a
general idea of where I'm going, but haven't been able to fully nail
down how to access frames from hwupload_cuda passed to
vf_tonemap_cuda.c, which in turn passes each frame to
vf_tonemap_cuda.cu for processing. I have a repo with everything
I've been pulling together for my project, but the piece of interest
is under */cuda_filter/ in the source tree.


 Would anyone mind helping me out with how to architect this?



The tonemap filter is just a (very old by now) copy of libplacebo's
tonemapping.

No one has bothered to keep it in sync.
I'm working on a libplacebo wrapper currently, so once that's merged
there will be up-to-date hardware tonemapping.



Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-12 Thread Lynne

Jan 12, 2021, 19:28 by reimar.doeffin...@gmx.de:

>>
>> On 10 Jan 2021, at 19:55, Lynne  wrote:
>>
>> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
>>
>>> From: Reimar Döffinger 
>>>
>>> real    0m15.040s
>>> user    0m18.874s (80.7% of original)
>>> sys     0m0.168s
>>>
>>
>> I think I have to disagree.
>> The performance gains are marginal,
>>
>
> It’s almost 20%. At least for this combination of
> codec and stream a large amount of time is spent in
> non-DSP functions, so even hand-written assembler
> won’t give you huge gains.
>
It's non-guaranteed 20% on a single system. It could change, and it could very
well mess up like gcc does with autovectorization, which we still explicitly
disable because FATE fails (-fno-tree-vectorize; I was the one who sent an RFC
to try to undo it somewhat recently, and even though it was an RFC, the
reaction from devs was quite cold).



>> it's definitely something the compiler should
>> be able to decide on its own,
>>
>
> So you object to unlikely() macros as well?
> It’s really just giving the compiler a hint it should try, though I admit
> the configure part makes it look otherwise.
>
I'm more against the macro and changes to the code itself. If you can make it
work without adding a macro to individual loops or the likes of av_cold/av_hot
or any other changes to the code, I'll be more welcoming.
I really _hate_ compiler hints. Take a look at the upipe source code to see what
a Cthulhian monstrosity made of hint flags looks like. Every single branch had
a cold/hot macro and it was the project's coding style. It's completely
irredeemable.



>> Most of the loops this is added to are trivially SIMDable.
>>
>
> How many hours of effort do you consider “trivial”?
> Especially if it’s someone not experienced?
> It might be fairly trivial with intrinsics, however
> many of your counter-arguments also apply
> to intrinsics (and to a degree inline assembly).
> That’s btw not just a rhetorical question because
> I’m pretty sure I am not going to all the trouble
> to port more of the arm 32-bit assembler functions
> since it’s a huge PITA, and I was wondering if there
> was a point to even have a try with intrinsics...
>
Intrinsics and inline assembly are a whole different thing than magic
macros that tell and force the compiler what a well written compiler
should already very well know about.



>> Just because no one has
>> had the motivation to do SIMD for a pretty unpopular codec doesn't mean we
>> should compromise.
>>
>
> If you think of AArch64 specifically, I can
> kind of agree.
> However I wouldn’t say the word “compromise”
> is appropriate when there’s a good chance nothing
> better will ever come to exist.
> But the real point is not AArch64, that is just
> a very convenient test platform.
> The point is to raise the minimum bar.
> A new architecture, RISC-V for example or something
> else should not be stuck at scalar performance
> until someone actually gets around to implementing
> assembler optimizations.
> And just to be clear: I don’t actually care about
> HEVC, it just seemed a nice target to do some
> experiments.
>
I already said all that can be said here: this will halt efforts on actually
optimizing the code in exchange for naive trust in compilers.
New platforms will be stuck at scalar performance anyway until
the compilers for the arch are smart enough to deal with vectorization.

Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-12 Thread Reimar Döffinger



> On 12 Jan 2021, at 19:52, Soft Works  wrote:
> 
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel  On Behalf Of
>> reimar.doeffin...@gmx.de
>> Sent: Sunday, January 10, 2021 5:44 PM
>> To: ffmpeg-devel@ffmpeg.org
>> Cc: Reimar Döffinger 
>> Subject: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.
>> 
>> From: Reimar Döffinger 
>> 
>> This requests loops to be vectorized using SIMD instructions.
>> The performance increase is far from hand-optimized assembly but still
>> significant over the plain C version.
>> Typical values are a 2-4x speedup where a hand-written version would
>> achieve 4x-10x.
>> So it is far from a replacement; however, some architectures will get
>> hand-written assembler quite late or not at all, and this is a good
>> improvement for a trivial amount of work.
>> The cause, besides the compiler being a compiler, is usually that it does not
>> manage to use saturating instructions and thus has to use 32-bit operations
>> where actually saturating 16-bit operations would be sufficient.
>> Other causes are for example the av_clip functions that are not ideal for
>> vectorization (and even as scalar code not optimal for any modern CPU that
>> has either CSEL or MAX/MIN instructions).
>> And of course this only works for relatively simple loops, the IDCT functions
>> for example seemed not possible to optimize that way.
> 
> ...
> 
>> +if enabled openmp_simd; then
>> +ompopt="-fopenmp"
>> +if ! test_cflags $ompopt ; then
>> +test_cflags -Xpreprocessor -fopenmp && ompopt="-Xpreprocessor -fopenmp"
> 
> Isn't it sufficient to specify -fopenmp-simd instead of -fopenmp for this patch?

I think so, I just didn’t know/even expect that option to exist!
Thanks a lot for the tip!

> As OMP SIMD is the only openmp feature that is used, there's no need to link
> to the openmp lib. 


It doesn’t do that anyway, because -fopenmp is not in the linker flags,
but I admit that was a bit of a hacky solution.
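Here is a small example of the kind of loop this targets. It also touches on the av_clip point in the patch description: the clamp is written as two comparisons, which compilers readily map to vector MIN/MAX (or scalar CSEL) instructions. clip_to_u8() is hypothetical and not from the patch. With -fopenmp-simd (supported by GCC and Clang) the pragma is honoured without linking an OpenMP runtime; without that flag, the pragma is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: clamp 32-bit intermediates to 8-bit pixels.
 * Two branch-free comparisons instead of an if/else chain keep the loop
 * body simple enough for the vectorizer. */
static void clip_to_u8(uint8_t *dst, const int32_t *src, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++) {
        int32_t v = src[i];
        v = v < 0 ? 0 : v;      /* MAX(v, 0) */
        v = v > 255 ? 255 : v;  /* MIN(v, 255) */
        dst[i] = (uint8_t)v;
    }
}
```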

Thanks,
Reimar

Re: [FFmpeg-devel] [PATCH v2] Replace arrays of pointers to strings by arrays of strings

2021-01-12 Thread Reimar Döffinger

> On 12 Jan 2021, at 02:41, Andreas Rheinhardt wrote:

> Of course I am all ears for how to make it clear that someone who
> modifies the strings also needs to check the array dimensions.

I think I kind of agree with the other comments: this would/should
rather have to be something that can be checked in an automated way.
In principle I also do not much like this solution; it does not
work very well when the strings are of very different sizes.
I’ve not found a solution that is really worth the effort needed,
but I think it would be better to have an approach that is more
along the lines of what PIC does: encode only offsets.
In assembler you can just encode the strings and then have an array
with the offsets to them, but in C that is undefined behaviour…
Where performance doesn’t matter, what you definitely can do
is to just dump one string after the other in an array and then
use a function to scan for the desired index.
The ideal solution, from my point of view, would be to have
such an array of all strings together and then an array with
the offsets to each.
With some kind of special preprocessor or code generator that
would be easy, but that makes the code messy.
Doing it in pure C also ends up quite a mess; the below is about as
far as I got, no idea if someone knows some magic to make it actually
bearable…

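#include <stdio.h>

#define STRINGS \
X(0, 1, "test1") \
X(1, 2, "test2") \
X(2, 3, "testteststeste2")

/* Concatenate all strings, each followed by an explicit NUL. */
#define X(n, m, x) x "\0"
static const char stringdata[] = STRINGS ;
#undef X

/* Offset of string m = offset of string n + size of string n (incl. NUL).
 * enum constants are integer constant expressions; static const int is
 * not, so chaining it through static initializers is not valid C. */
#define X(n, m, x) enum { strpos##m = strpos##n + sizeof(x) };
enum { strpos0 = 0 };
STRINGS
#undef X

int main(void)
{
    printf("0: %s\n", stringdata + strpos0);
    printf("1: %s\n", stringdata + strpos1);
    printf("2: %s\n", stringdata + strpos2);
    return 0;
}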
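For the simpler variant mentioned above, where performance doesn't matter, a scan over one packed array could look like this. packed_string() is a hypothetical helper. Its cost grows with the total length of the strings before the requested one, so it only suits code paths where that does not matter.

```c
#include <stddef.h>
#include <string.h>

/* All strings back to back; the literal's implicit NUL ends the last one. */
static const char packed[] =
    "test1\0"
    "test2\0"
    "testteststeste2";

/* Hypothetical helper: return the idx-th NUL-terminated string in `data`
 * (`size` bytes including the final NUL), or NULL if idx is out of range.
 * Walks the array from the start each time. */
static const char *packed_string(const char *data, size_t size, unsigned idx)
{
    const char *p = data, *end = data + size;
    while (idx-- > 0) {
        p += strlen(p) + 1;
        if (p >= end)
            return NULL;
    }
    return p;
}
```

Compared with a fixed-width `char names[][N]` table, this wastes no padding when the string lengths differ a lot. The price is that lookups are no longer constant time.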


Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-12 Thread Soft Works



> -----Original Message-----
> From: ffmpeg-devel  On Behalf Of
> reimar.doeffin...@gmx.de
> Sent: Sunday, January 10, 2021 5:44 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Reimar Döffinger 
> Subject: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.
> 
> From: Reimar Döffinger 
> 
> This requests loops to be vectorized using SIMD instructions.
> The performance increase is far from hand-optimized assembly but still
> significant over the plain C version.
> Typical values are a 2-4x speedup where a hand-written version would
> achieve 4x-10x.
> So it is far from a replacement; however, some architectures will get
> hand-written assembler quite late or not at all, and this is a good
> improvement for a trivial amount of work.
> The cause, besides the compiler being a compiler, is usually that it does not
> manage to use saturating instructions and thus has to use 32-bit operations
> where actually saturating 16-bit operations would be sufficient.
> Other causes are for example the av_clip functions that are not ideal for
> vectorization (and even as scalar code not optimal for any modern CPU that
> has either CSEL or MAX/MIN instructions).
> And of course this only works for relatively simple loops, the IDCT functions
> for example seemed not possible to optimize that way.

...

> +if enabled openmp_simd; then
> +ompopt="-fopenmp"
> +if ! test_cflags $ompopt ; then
> +test_cflags -Xpreprocessor -fopenmp && ompopt="-Xpreprocessor -fopenmp"

Isn't it sufficient to specify -fopenmp-simd instead of -fopenmp for this patch?

As OMP SIMD is the only openmp feature that is used, there's no need to link
to the openmp lib. 

softworkz



Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-12 Thread Reimar Döffinger

> 
> On 10 Jan 2021, at 19:55, Lynne  wrote:
> 
> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
> 
>> From: Reimar Döffinger 
>> 
>> real    0m15.040s
>> user    0m18.874s (80.7% of original)
>> sys     0m0.168s
>> 
> 
> I think I have to disagree.
> The performance gains are marginal,

It’s almost 20%. At least for this combination of
codec and stream a large amount of time is spent in
non-DSP functions, so even hand-written assembler
won’t give you huge gains.

> it's definitely something the compiler should
> be able to decide on its own,

So you object to unlikely() macros as well?
It’s really just giving the compiler a hint it should try, though I admit
the configure part makes it look otherwise.

> Most of the loops this is added to are trivially SIMDable.

How many hours of effort do you consider “trivial”?
Especially if it’s someone not experienced?
It might be fairly trivial with intrinsics, however
many of your counter-arguments also apply
to intrinsics (and to a degree inline assembly).
That’s btw not just a rhetorical question because
I’m pretty sure I am not going to all the trouble
to port more of the arm 32-bit assembler functions
since it’s a huge PITA, and I was wondering if there
was a point to even have a try with intrinsics...

> Just because no one has
> had the motivation to do SIMD for a pretty unpopular codec doesn't mean we
> should compromise.

If you think of AArch64 specifically, I can
kind of agree.
However I wouldn’t say the word “compromise”
is appropriate when there’s a good chance nothing
better will ever come to exist.
But the real point is not AArch64, that is just
a very convenient test platform.
The point is to raise the minimum bar.
A new architecture, RISC-V for example or something
else should not be stuck at scalar performance
until someone actually gets around to implementing
assembler optimizations.
And just to be clear: I don’t actually care about
HEVC, it just seemed a nice target to do some
experiments.

Best regards,
Reimar

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Reimar Döffinger

> On 12 Jan 2021, at 13:24, Josh Dekker  wrote:
> 
> Hi,
> 
> On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote:
>> From: Reimar Döffinger 
>> Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
>> available on aarch64.
>> For a UHD HDR (10 bit) sample video these were consuming the most time
>> and this optimization reduced overall decode time from 19.4s to 16.4s,
>> approximately 15% speedup.
>> Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
>> running on Apple M1.
>> ---
>> libavcodec/aarch64/Makefile   |   2 +
>> libavcodec/aarch64/hevcdsp_idct_neon.S| 426 ++
>> libavcodec/aarch64/hevcdsp_init_aarch64.c |  45 +++
>> libavcodec/hevcdsp.c  |   2 +
>> libavcodec/hevcdsp.h  |   1 +
>> 5 files changed, 476 insertions(+)
>> create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
>> create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
>> [...]
> 
> AS      libavcodec/aarch64/hevcdsp_idct_neon.o
> libavcodec/aarch64/hevcdsp_idct_neon.S: Assembler messages:
> libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'

Yes, I noticed that a few days ago; I have sent the fixed version now.
I had only tested with the Apple assembler, assuming it would behave the same.
Really stupid behaviour by the GNU one, as if the element type mattered for a
mov instruction; it needlessly complicates macros.

> Thanks for porting this. I was in the process of writing HEVC
> assembly (see my set on the ML) and would be interested in rebasing this
> on top of that set.

Sorry, I had not seen that, as I’ve only recently started reading the list
(well, only my threads, to be honest).
Hope I’ve not duplicated/complicated any of your
work, I was mostly just interested in learning
something new, otherwise I would have checked first
for related work.

Thanks for the interest,
Reimar

[FFmpeg-devel] [PATCH v2 5/5] fft: remove 16-bit FFT and MDCT code

2021-01-12 Thread Lynne

No longer used by anything. 
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
simply too much work for code meant to be all removed anyway.

Merged the 2 patches and fixed all tests.

From 47da78c332dc00a69eb0fae75cba6d7f71f26070 Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Sat, 9 Jan 2021 16:23:20 +0100
Subject: [PATCH v2 5/5] fft: remove 16-bit FFT and MDCT code

No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
simply too much work for code meant to be all removed anyway.
---
 libavcodec/Makefile |  11 +-
 libavcodec/arm/Makefile |   9 +-
 libavcodec/arm/fft_fixed_init_arm.c |  50 --
 libavcodec/arm/fft_fixed_neon.S | 261 
 libavcodec/arm/mdct_fixed_neon.S| 193 
 libavcodec/fft-internal.h   |  29 +---
 libavcodec/fft.h|   9 -
 libavcodec/fft_fixed.c  |  21 ---
 libavcodec/fft_template.c   |   4 -
 libavcodec/mdct_fixed.c |  65 ---
 libavcodec/tests/.gitignore |   1 -
 libavcodec/tests/fft-fixed.c|  21 ---
 tests/fate/fft.mak  |  30 +---
 13 files changed, 14 insertions(+), 690 deletions(-)
 delete mode 100644 libavcodec/arm/fft_fixed_init_arm.c
 delete mode 100644 libavcodec/arm/fft_fixed_neon.S
 delete mode 100644 libavcodec/arm/mdct_fixed_neon.S
 delete mode 100644 libavcodec/fft_fixed.c
 delete mode 100644 libavcodec/mdct_fixed.c
 delete mode 100644 libavcodec/tests/fft-fixed.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 0546e6f6c5..446e6e6b3b 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -83,10 +83,9 @@ OBJS-$(CONFIG_EXIF)+= exif.o tiff_common.o
 OBJS-$(CONFIG_FAANDCT) += faandct.o
 OBJS-$(CONFIG_FAANIDCT)+= faanidct.o
 OBJS-$(CONFIG_FDCTDSP) += fdctdsp.o jfdctfst.o jfdctint.o
-FFT-OBJS-$(CONFIG_HARDCODED_TABLES)+= cos_tables.o cos_fixed_tables.o
-OBJS-$(CONFIG_FFT) += avfft.o fft_fixed.o fft_float.o \
-  fft_fixed_32.o fft_init_table.o \
-  $(FFT-OBJS-yes)
+FFT-OBJS-$(CONFIG_HARDCODED_TABLES)+= cos_tables.o
+OBJS-$(CONFIG_FFT) += avfft.o fft_float.o fft_fixed_32.o \
+  fft_init_table.o $(FFT-OBJS-yes)
 OBJS-$(CONFIG_FLACDSP) += flacdsp.o
 OBJS-$(CONFIG_FMTCONVERT)  += fmtconvert.o
 OBJS-$(CONFIG_GOLOMB)  += golomb.o
@@ -115,7 +114,7 @@ OBJS-$(CONFIG_LLVIDENCDSP) += lossless_videoencdsp.o
 OBJS-$(CONFIG_LPC) += lpc.o
 OBJS-$(CONFIG_LSP) += lsp.o
 OBJS-$(CONFIG_LZF) += lzf.o
-OBJS-$(CONFIG_MDCT)+= mdct_fixed.o mdct_float.o mdct_fixed_32.o
+OBJS-$(CONFIG_MDCT)+= mdct_float.o mdct_fixed_32.o
 OBJS-$(CONFIG_ME_CMP)  += me_cmp.o
 OBJS-$(CONFIG_MEDIACODEC)  += mediacodecdec_common.o mediacodec_surface.o mediacodec_wrapper.o mediacodec_sw_buffer.o
 OBJS-$(CONFIG_MPEG_ER) += mpeg_er.o
@@ -1217,7 +1216,7 @@ TESTPROGS = avpacket\
 
 TESTPROGS-$(CONFIG_CABAC) += cabac
 TESTPROGS-$(CONFIG_DCT)   += avfft
-TESTPROGS-$(CONFIG_FFT)   += fft fft-fixed fft-fixed32
+TESTPROGS-$(CONFIG_FFT)   += fft fft-fixed32
 TESTPROGS-$(CONFIG_GOLOMB)+= golomb
 TESTPROGS-$(CONFIG_IDCTDSP)   += dct
 TESTPROGS-$(CONFIG_IIRFILTER) += iirfilter
diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
index c6be814153..c4ab93aeeb 100644
--- a/libavcodec/arm/Makefile
+++ b/libavcodec/arm/Makefile
@@ -5,8 +5,7 @@ OBJS-$(CONFIG_AC3DSP)  += arm/ac3dsp_init_arm.o \
   arm/ac3dsp_arm.o
 OBJS-$(CONFIG_AUDIODSP)+= arm/audiodsp_init_arm.o
 OBJS-$(CONFIG_BLOCKDSP)+= arm/blockdsp_init_arm.o
-OBJS-$(CONFIG_FFT) += arm/fft_init_arm.o\
-  arm/fft_fixed_init_arm.o
+OBJS-$(CONFIG_FFT) += arm/fft_init_arm.o
 OBJS-$(CONFIG_FLACDSP) += arm/flacdsp_init_arm.o\
   arm/flacdsp_arm.o
 OBJS-$(CONFIG_FMTCONVERT)  += arm/fmtconvert_init_arm.o
@@ -108,8 +107,7 @@ NEON-OBJS-$(CONFIG_AUDIODSP)   += arm/audiodsp_init_neon.o  \
   arm/int_neon.o
 NEON-OBJS-$(CONFIG_BLOCKDSP)   += arm/blockdsp_init_neon.o  \
   arm/blockdsp_neon.o
-NEON-OBJS-$(CONFIG_FFT)+= arm/fft_neon.o\
-

[FFmpeg-devel] [PATCH v2 4/5] ac3enc_fixed: drop unnecessary fixed-point DSP code

2021-01-12 Thread Lynne

Patch attached.

>From adfbae619a9ac2584e45096e3b98a3c7ad36ceb4 Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Sat, 9 Jan 2021 03:19:18 +0100
Subject: [PATCH v2 4/5] ac3enc_fixed: drop unnecessary fixed-point DSP code

---
 libavcodec/ac3dsp.c  |  60 ---
 libavcodec/ac3dsp.h  |  47 --
 libavcodec/ac3tab.c  |  38 -
 libavcodec/ac3tab.h  |   1 -
 libavcodec/arm/ac3dsp_init_arm.c |   9 --
 libavcodec/x86/ac3dsp.asm| 258 ---
 libavcodec/x86/ac3dsp_init.c |  52 +--
 7 files changed, 1 insertion(+), 464 deletions(-)

diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 382f87c05f..85c721dd3b 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -46,49 +46,6 @@ static void ac3_exponent_min_c(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
 }
 }
 
-static int ac3_max_msb_abs_int16_c(const int16_t *src, int len)
-{
-int i, v = 0;
-for (i = 0; i < len; i++)
-v |= abs(src[i]);
-return v;
-}
-
-static void ac3_lshift_int16_c(int16_t *src, unsigned int len,
-   unsigned int shift)
-{
-uint32_t *src32 = (uint32_t *)src;
-const uint32_t mask = ~(((1 << shift) - 1) << 16);
-int i;
-len >>= 1;
-for (i = 0; i < len; i += 8) {
-src32[i  ] = (src32[i  ] << shift) & mask;
-src32[i+1] = (src32[i+1] << shift) & mask;
-src32[i+2] = (src32[i+2] << shift) & mask;
-src32[i+3] = (src32[i+3] << shift) & mask;
-src32[i+4] = (src32[i+4] << shift) & mask;
-src32[i+5] = (src32[i+5] << shift) & mask;
-src32[i+6] = (src32[i+6] << shift) & mask;
-src32[i+7] = (src32[i+7] << shift) & mask;
-}
-}
-
-static void ac3_rshift_int32_c(int32_t *src, unsigned int len,
-   unsigned int shift)
-{
-do {
-*src++ >>= shift;
-*src++ >>= shift;
-*src++ >>= shift;
-*src++ >>= shift;
-*src++ >>= shift;
-*src++ >>= shift;
-*src++ >>= shift;
-*src++ >>= shift;
-len -= 8;
-} while (len > 0);
-}
-
 static void float_to_fixed24_c(int32_t *dst, const float *src, unsigned int len)
 {
 const float scale = 1 << 24;
@@ -376,19 +333,6 @@ void ff_ac3dsp_downmix_fixed(AC3DSPContext *c, int32_t **samples, int16_t **matr
 ac3_downmix_c_fixed(samples, matrix, out_ch, in_ch, len);
 }
 
-static void apply_window_int16_c(int16_t *output, const int16_t *input,
- const int16_t *window, unsigned int len)
-{
-int i;
-int len2 = len >> 1;
-
-for (i = 0; i < len2; i++) {
-int16_t w   = window[i];
-output[i]   = (MUL16(input[i],   w) + (1 << 14)) >> 15;
-output[len-i-1] = (MUL16(input[len-i-1], w) + (1 << 14)) >> 15;
-}
-}
-
 void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
int out_ch, int in_ch, int len)
 {
@@ -424,9 +368,6 @@ void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
 av_cold void ff_ac3dsp_init(AC3DSPContext *c, int bit_exact)
 {
 c->ac3_exponent_min = ac3_exponent_min_c;
-c->ac3_max_msb_abs_int16 = ac3_max_msb_abs_int16_c;
-c->ac3_lshift_int16 = ac3_lshift_int16_c;
-c->ac3_rshift_int32 = ac3_rshift_int32_c;
 c->float_to_fixed24 = float_to_fixed24_c;
 c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_c;
 c->update_bap_counts = ac3_update_bap_counts_c;
@@ -438,7 +379,6 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c, int bit_exact)
 c->out_channels  = 0;
 c->downmix   = NULL;
 c->downmix_fixed = NULL;
-c->apply_window_int16 = apply_window_int16_c;
 
 if (ARCH_ARM)
 ff_ac3dsp_init_arm(c, bit_exact);
diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index 161de4cb86..a23b11526e 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -42,39 +42,6 @@ typedef struct AC3DSPContext {
  */
 void (*ac3_exponent_min)(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 
-/**
- * Calculate the maximum MSB of the absolute value of each element in an
- * array of int16_t.
- * @param src input array
- *            constraints: align 16. values must be in range [-32767,32767]
- * @param len number of values in the array
- *            constraints: multiple of 16 greater than 0
- * @return    a value with the same MSB as max(abs(src[]))
- */
-int (*ac3_max_msb_abs_int16)(const int16_t *src, int len);
-
-/**
- * Left-shift each value in an array of int16_t by a specified amount.
- * @param src    input array
- *               constraints: align 16
- * @param len    number of values in the array
- *               constraints: multiple of 32 greater than 0
- * @param shift  left shift amount
- *               constraints: range [0,15]
- */
-void (*ac3_lshift_int16)(int16_t *src, unsigned int

[FFmpeg-devel] [PATCH v2 3/5] ac3enc: halve the MDCT window size by using vector_fmul_reverse

2021-01-12 Thread Lynne

This brings the encoder in line with our other encoders and saves
a bit of memory.

Patch attached.

>From aa4464763919145eb3c58475b6f1fd1418d684c9 Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Sat, 9 Jan 2021 17:27:16 +0100
Subject: [PATCH v2 3/5] ac3enc: halve the MDCT window size by using
 vector_fmul_reverse

This brings the encoder in line with our other encoders and saves
a bit of memory.
---
 libavcodec/ac3enc_fixed.c|  9 -
 libavcodec/ac3enc_float.c| 15 ---
 libavcodec/ac3enc_template.c |  5 -
 3 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/libavcodec/ac3enc_fixed.c b/libavcodec/ac3enc_fixed.c
index 3b302d40df..7a8a77fb93 100644
--- a/libavcodec/ac3enc_fixed.c
+++ b/libavcodec/ac3enc_fixed.c
@@ -101,14 +101,13 @@ static av_cold int ac3_fixed_mdct_init(AC3EncodeContext *s)
 {
 float fwin[AC3_BLOCK_SIZE];
 
-int32_t *iwin = av_malloc_array(AC3_WINDOW_SIZE, sizeof(*iwin));
+int32_t *iwin = av_malloc_array(AC3_BLOCK_SIZE, sizeof(*iwin));
 if (!iwin)
 return AVERROR(ENOMEM);
 
-ff_kbd_window_init(fwin, 5.0, AC3_WINDOW_SIZE/2);
-
-for (int i = 0; i < AC3_WINDOW_SIZE/2; i++)
-iwin[AC3_WINDOW_SIZE-1-i] = lrintf(fwin[i] * (1 << 22));
+ff_kbd_window_init(fwin, 5.0, AC3_BLOCK_SIZE);
+for (int i = 0; i < AC3_BLOCK_SIZE; i++)
+iwin[i] = lrintf(fwin[i] * (1 << 22));
 
 s->mdct_window = iwin;
 
diff --git a/libavcodec/ac3enc_float.c b/libavcodec/ac3enc_float.c
index b17b3a2365..74f3ab8d86 100644
--- a/libavcodec/ac3enc_float.c
+++ b/libavcodec/ac3enc_float.c
@@ -108,23 +108,16 @@ static av_cold void ac3_float_mdct_end(AC3EncodeContext *s)
  */
 static av_cold int ac3_float_mdct_init(AC3EncodeContext *s)
 {
-float *window;
-int i, n, n2;
-
-n  = 1 << 9;
-n2 = n >> 1;
-
-window = av_malloc_array(n, sizeof(*window));
+float *window = av_malloc_array(AC3_BLOCK_SIZE, sizeof(*window));
 if (!window) {
 av_log(s->avctx, AV_LOG_ERROR, "Cannot allocate memory.\n");
 return AVERROR(ENOMEM);
 }
-ff_kbd_window_init(window, 5.0, n2);
-for (i = 0; i < n2; i++)
-window[n-1-i] = window[i];
+
+ff_kbd_window_init(window, 5.0, AC3_BLOCK_SIZE);
 s->mdct_window = window;
 
-return ff_mdct_init(&s->mdct, 9, 0, -2.0 / n);
+return ff_mdct_init(&s->mdct, 9, 0, -2.0 / AC3_WINDOW_SIZE);
 }
 
 
diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index 4f1e181e0b..5ecef3b178 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -92,7 +92,10 @@ static void apply_mdct(AC3EncodeContext *s)
 const SampleType *input_samples = &s->planar_samples[ch][blk * AC3_BLOCK_SIZE];
 
 s->fdsp->vector_fmul(s->windowed_samples, input_samples,
- s->mdct_window, AC3_WINDOW_SIZE);
+ s->mdct_window, AC3_BLOCK_SIZE);
+s->fdsp->vector_fmul_reverse(s->windowed_samples + AC3_BLOCK_SIZE,
+ &input_samples[AC3_BLOCK_SIZE],
+ s->mdct_window, AC3_BLOCK_SIZE);
 
 s->mdct.mdct_calc(&s->mdct, block->mdct_coef[ch+1],
   s->windowed_samples);
-- 
2.30.0.rc2
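A minimal sketch of why the halved window is sufficient, assuming only that the window is symmetric (the old code built the second half by mirroring the first, `window[n-1-i] = window[i]`). The helpers `fmul()` and `fmul_reverse()` are plain-C stand-ins written to match the semantics of `vector_fmul()` and `vector_fmul_reverse()` (the latter computes `dst[i] = src0[i] * src1[len - 1 - i]`); the tiny block size and the window values are arbitrary placeholders, not AC3 data.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK_SIZE  4                /* placeholder; AC3_BLOCK_SIZE is 256 */
#define WINDOW_SIZE (2 * BLOCK_SIZE) /* placeholder; AC3_WINDOW_SIZE is 512 */

/* dst[i] = src0[i] * src1[i] (plain-C model of vector_fmul) */
static void fmul(float *dst, const float *src0, const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* dst[i] = src0[i] * src1[len - 1 - i] (plain-C model of vector_fmul_reverse) */
static void fmul_reverse(float *dst, const float *src0, const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Old scheme: mirror the half window into a full-length window, then apply it. */
static void window_full(float *out, const float *in, const float *half)
{
    float full[WINDOW_SIZE];
    for (int i = 0; i < BLOCK_SIZE; i++) {
        full[i]                   = half[i];
        full[WINDOW_SIZE - 1 - i] = half[i];
    }
    fmul(out, in, full, WINDOW_SIZE);
}

/* New scheme: keep only the half window, apply it forward, then reversed. */
static void window_half(float *out, const float *in, const float *half)
{
    fmul(out, in, half, BLOCK_SIZE);
    fmul_reverse(out + BLOCK_SIZE, in + BLOCK_SIZE, half, BLOCK_SIZE);
}

/* True if both schemes produce identical output for the given input. */
static bool windows_match(const float *in, const float *half)
{
    float a[WINDOW_SIZE], b[WINDOW_SIZE];

    assert(in && half);
    window_full(a, in, half);
    window_half(b, in, half);
    for (int i = 0; i < WINDOW_SIZE; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
```

Each output sample is the same single multiplication in both schemes, so the results match exactly; only the storage for the window is halved.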


[FFmpeg-devel] [PATCH v2 2/5] ac3enc: do not clip coefficients after transforms

2021-01-12 Thread Lynne

In either encoder, it's impossible for the coefficients to go past 25 bits
right after the MDCT. Our MDCT is numerically stable.
For the floating-point encoder, if the coefficients contain a NaN, lrintf() will
raise a floating-point exception during the conversion.

Patch attached.

>From e384c30349177bc792d070cb9c9a24fb6ac66950 Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Sat, 9 Jan 2021 09:05:18 +0100
Subject: [PATCH v2 2/5] ac3enc: do not clip coefficients after transforms

In either encoder, it's impossible for the coefficients to go past 25 bits
right after the MDCT. Our MDCT is numerically stable.
For the floating-point encoder, if the coefficients contain a NaN, lrintf() will
raise a floating-point exception during the conversion.
---
 libavcodec/ac3enc_template.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index de6eba71d8..4f1e181e0b 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -383,9 +383,6 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,
 
 apply_mdct(s);
 
-clip_coefficients(&s->adsp, s->blocks[0].mdct_coef[1],
-  AC3_MAX_COEFS * s->num_blocks * s->channels);
-
 s->cpl_on = s->cpl_enabled;
 ff_ac3_compute_coupling_strategy(s);
 
-- 
2.30.0.rc2


[FFmpeg-devel] [PATCH v2 1/5] ac3enc_fixed: convert to 32-bit sample format

2021-01-12 Thread Lynne

The AC3 encoder used to be a separate library called "Aften", which
got merged into libavcodec (literally, SVN commits and all).
The merge preserved as many features from the library as possible.

The code had two versions: a fixed-point version and a floating-point
version. FFmpeg already had floating-point DSP code used by other
codecs, including the AC3 decoder, so the encoder's floating-point DSP
was simply replaced with FFmpeg's own functions.
However, FFmpeg had no fixed-point audio code at that point, so
the encoder brought along its own fixed-point DSP functions,
including a fixed-point MDCT.

The fixed-point MDCT itself is trivially just a float MDCT with a
different type and each multiply being a fixed-point multiply.
So over time, it got refactored, and the FFT used for all other codecs
was templated.

Due to design decisions at the time, the fixed-point version of the
encoder operates at 16 bits of precision. Although convenient, this was
inadequate and inefficient even at the time. The encoder is noisy,
does not produce output comparable to the float encoder, and even
rings at higher frequencies due to the badly approximated window function.

Enter MIPS (owned by Imagination Technologies at the time). They wanted
quick fixed-point decoding on their FPU-less cores, so they contributed
patches to template the AC3 decoder so that it had both a fixed-point
and a floating-point version. They also did the same for the AAC decoder.
They, however, used 32-bit samples, not 16-bit ones, and we did not have
32-bit fixed-point DSP functions, including an MDCT. But instead of
templating our MDCT to output 3 versions (float, 32-bit fixed and 16-bit fixed),
they simply copy-pasted their own MDCT into ours, and completely
ifdeffed our own MDCT code out if a 32-bit fixed-point MDCT was selected.

This is also the status quo nowadays: 2 separate MDCTs, one which
produces the floating-point and 16-bit fixed-point versions, and a
sort-of-integrated one which produces the 32-bit fixed-point version.

MIPS weren't all that interested in encoding, so they left the encoder
as-is, and they didn't care much about the ifdeffery, mess or quality - it's
not their problem.

So the MDCT/FFT code has always been a thorn in the eye of anyone looking
to clean up the code.

Backstory over. Internally, AC3 operates on 25-bit fixed-point coefficients.
So for the floating-point version, the encoder simply runs the float MDCT
and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
inherently a fixed-point codec. For the fixed-point version, the input is
16-bit samples, so to maximize precision the frame samples are analyzed,
the highest set bit is detected via ac3_max_msb_abs_int16(), and the samples
are then scaled up via ac3_lshift_int16() so that the input to the FFT is
always at least 14 bits; this is computed in normalize_samples(). After the
FFT, the coefficients are scaled up to 25 bits.
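The old 16-bit normalization step described above can be modelled as follows. This is a simplified sketch, not the removed `normalize_samples()` itself: `max_msb_abs_int16()` follows the removed C reference `ac3_max_msb_abs_int16_c()` (OR-ing absolute values yields a value with the same MSB as the maximum), while `log2_int()` and `normalize_int16()` are hypothetical helpers standing in for `av_log2()` and the `ac3_lshift_int16()` step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* OR of absolute values: has the same MSB as max(abs(src[])). */
static int max_msb_abs_int16(const int16_t *src, int len)
{
    int v = 0;
    for (int i = 0; i < len; i++)
        v |= abs(src[i]);
    return v;
}

/* Index of the highest set bit, like av_log2() (returns 0 for 0). */
static int log2_int(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Hypothetical helper: left-shift the samples so that the peak magnitude
 * uses at least 14 bits, and return the shift that was applied.
 * An all-zero frame gets the maximum shift, which leaves it unchanged. */
static int normalize_int16(int16_t *samples, int len)
{
    assert(len > 0);
    int shift = 14 - log2_int((unsigned)max_msb_abs_int16(samples, len));
    if (shift <= 0)
        return 0;
    for (int i = 0; i < len; i++) /* multiply instead of << to stay defined for negatives */
        samples[i] = (int16_t)(samples[i] * (1 << shift));
    return shift;
}
```

The caller has to undo the returned shift after the transform, which is the extra bookkeeping the 32-bit path makes unnecessary.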

This patch simply changes the encoder to accept 32-bit samples, reusing
the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
a large part of some very messy code of ours, as well as to prepare for the
future lavu/tx conversion. The coefficients are simply scaled down to 25 bits
during windowing, skipping 2 separate scalings, as the hacks to extend
precision are simply no longer necessary. There's no point in always running
the MDCT at 32 bits when you're going to drop 6 bits off anyway; the headroom
is plenty, and the MDCT rounds properly.

This also makes the encoder slightly more accurate than the float version,
as no coefficient conversion step is necessary.

SIZE SAVINGS:
ARM32:
HARDCODED TABLES:
BASE   - 10709590
DROP  DSP  - 10702872 - diff:   -6.56KiB
DROP  MDCT - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
DROP  FFT  - 10336652 - diff: -323.52KiB - all:   -364.20KiB
SOFTCODED TABLES:
BASE   -  9685096
DROP  DSP  -  9678378 - diff:   -6.56KiB
DROP  MDCT -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
DROP  FFT  -  9573918 - diff:  -67.92KiB - all:   -108.57KiB

ARM64:
HARDCODED TABLES:
BASE   - 14641112
DROP  DSP  - 14633806 - diff:   -7.13KiB
DROP  MDCT - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
DROP  FFT  - 14286826 - diff: -310.53KiB - all:   -345.98KiB
SOFTCODED TABLES:
BASE   - 13636238
DROP  DSP  - 13628932 - diff:   -7.13KiB
DROP  MDCT - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
DROP  FFT  - 13542080 - diff:  -56.43KiB - all:    -91.95KiB

x86:
HARDCODED TABLES:
BASE   - 12367336
DROP  DSP  - 12354698 - diff:  -12.34KiB
DROP  MDCT - 12331024 - diff:  -23.12KiB - both:   -35.46KiB
DROP  FFT  - 12029788 - diff: -294.18KiB - all:   -329.64KiB
SOFTCODED TABLES:
BASE   - 11358094
DROP  DSP  - 11345456 - diff:  -12.34KiB
DROP  MDCT - 11321742 - diff:  -23.16KiB - both:   -35.50KiB
DROP  FFT  - 11276946 - diff:  -43.75KiB - all:    -79.25KiB

PERFORMANCE (10min random s32le):
ARM32 - before -  39.9x - 0m15.046s

[FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Reimar . Doeffinger

From: Reimar Döffinger 

Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10-bit) sample video these functions were consuming the most
time, and this optimization reduced overall decode time from 19.4s to 16.4s,
a reduction of approximately 15%.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
running on Apple M1.
---
 libavcodec/aarch64/Makefile   |   2 +
 libavcodec/aarch64/hevcdsp_idct_neon.S| 423 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  45 +++
 libavcodec/hevcdsp.c  |   2 +
 libavcodec/hevcdsp.h  |   1 +
 5 files changed, 473 insertions(+)
 create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
 create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index f6434e40da..2ea1d74a38 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -61,3 +61,5 @@ NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o   \
  aarch64/vp9lpf_neon.o   \
  aarch64/vp9mc_16bpp_neon.o  \
  aarch64/vp9mc_neon.o
+NEON-OBJS-$(CONFIG_HEVC_DECODER)+= aarch64/hevcdsp_idct_neon.o \
+   aarch64/hevcdsp_init_aarch64.o
diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
new file mode 100644
index 00..6b42f6ca3a
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -0,0 +1,423 @@
+/*
+ * ARM NEON optimised IDCT functions for HEVC decoding
+ * Copyright (c) 2014 Seppo Tomperi 
+ * Copyright (c) 2017 Alexandra Hájková
+ *
+ * Ported from arm/hevcdsp_idct_neon.S by
+ * Copyright (c) 2020 Reimar Döffinger
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+const trans, align=4
+.short 64, 83, 64, 36
+.short 89, 75, 50, 18
+.short 90, 87, 80, 70
+.short 57, 43, 25, 9
+.short 90, 90, 88, 85
+.short 82, 78, 73, 67
+.short 61, 54, 46, 38
+.short 31, 22, 13, 4
+endconst
+
+.macro sum_sub out, in, c, op, p
+  .ifc \op, +
+smlal\p \out, \in, \c
+  .else
+smlsl\p \out, \in, \c
+  .endif
+.endm
+
+.macro fixsqrshrn d, dt, n, m
+  .ifc \dt, .8H
+sqrshrn2\d\dt, \n\().4S, \m
+  .else
+sqrshrn \n\().4H, \n\().4S, \m
+mov \d\().D[0], \n\().D[0]
+  .endif
+.endm
+
+// uses and clobbers v28-v31 as temp registers
+.macro tr_4x4_8 in0, in1, in2, in3, out0, out1, out2, out3, p1, p2
+ sshll\p1   v28.4S, \in0, #6
+ movv29.16B, v28.16B
+ smull\p1   v30.4S, \in1, v0.H[1]
+ smull\p1   v31.4S, \in1, v0.H[3]
+ smlal\p2   v28.4S, \in2, v0.H[0] //e0
+ smlsl\p2   v29.4S, \in2, v0.H[0] //e1
+ smlal\p2   v30.4S, \in3, v0.H[3] //o0
+ smlsl\p2   v31.4S, \in3, v0.H[1] //o1
+
+ add\out0, v28.4S, v30.4S
+ add\out1, v29.4S, v31.4S
+ sub\out2, v29.4S, v31.4S
+ sub\out3, v28.4S, v30.4S
+.endm
+
+.macro transpose8_4x4 r0, r1, r2, r3
+trn1v2.8H, \r0\().8H, \r1\().8H
+trn2v3.8H, \r0\().8H, \r1\().8H
+trn1v4.8H, \r2\().8H, \r3\().8H
+trn2v5.8H, \r2\().8H, \r3\().8H
+trn1\r0\().4S, v2.4S, v4.4S
+trn2\r2\().4S, v2.4S, v4.4S
+trn1\r1\().4S, v3.4S, v5.4S
+trn2\r3\().4S, v3.4S, v5.4S
+.endm
+
+.macro transpose_8x8 r0, r1, r2, r3, r4, r5, r6, r7
+transpose8_4x4  \r0, \r1, \r2, \r3
+transpose8_4x4  \r4, \r5, \r6, \r7
+.endm
+
+.macro tr_8x4 shift, in0,in0t, in1,in1t, in2,in2t, in3,in3t, in4,in4t, in5,in5t, in6,in6t, in7,in7t, p1, p2
+tr_4x4_8 \in0\in0t, \in2\in2t, \in4\in4t, \in6\in6t, v24.4S, v25.4S, v26.4S, v27.4S, \p1, \p2
+
+smull\p1v30.4S, \in1\in1t, v0.H[6]
+smull\p1
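For readers less familiar with NEON, the `tr_4x4_8` macro in the port above computes the even/odd butterfly of the 4-point HEVC inverse transform, using the first four entries of the `trans` table (64, 83, 64, 36). A scalar C model of one column follows; `tr4_column()` and `round_shift_sat16()` are illustrative names, the second modelling the saturating rounding narrow that `sqrshrn` performs in `fixsqrshrn`. It assumes an arithmetic right shift for negative values, as GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* One column of the 4-point inverse transform, as in tr_4x4_8 (before any shift). */
static void tr4_column(const int16_t in[4], int32_t out[4])
{
    int32_t e0 = 64 * in[0] + 64 * in[2]; /* sshll #6, then smlal with 64 */
    int32_t e1 = 64 * in[0] - 64 * in[2]; /* smlsl with 64 */
    int32_t o0 = 83 * in[1] + 36 * in[3]; /* smull 83, smlal 36 */
    int32_t o1 = 36 * in[1] - 83 * in[3]; /* smull 36, smlsl 83 */

    out[0] = e0 + o0;
    out[1] = e1 + o1;
    out[2] = e1 - o1;
    out[3] = e0 - o0;
}

/* Rounding right shift with saturation to int16_t, modelling sqrshrn. */
static int16_t round_shift_sat16(int32_t v, int shift)
{
    assert(shift > 0 && shift < 32);
    int64_t r = ((int64_t)v + ((int64_t)1 << (shift - 1))) >> shift;
    if (r > INT16_MAX)
        return INT16_MAX;
    if (r < INT16_MIN)
        return INT16_MIN;
    return (int16_t)r;
}
```

Feeding a unit impulse into `in[1]` returns the second basis function (83, 36, -36, -83), which is a quick way to check a port of the macro against the scalar model.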

Re: [FFmpeg-devel] [PATCH] avformat/webvttdec: Fix WebVTT decoder truncating files at first STYLE block

2021-01-12 Thread Dave Evans

Hijacking this thread with a related, similar patch from a few months ago:
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=2538


On Tue, Jan 12, 2021 at 4:56 PM Roderich Schupp 
wrote:

> Bug-ID: 9064
>
> The webvtt decoder truncates the file at the first STYLE block.
> Since these blocks typically occur at the top of the webvtt file, this results
> in an empty file (except for the WEBVTT header line).
>
> The reason is that a STYLE block neither parses as a valid cue block nor
> is skipped like the WEBVTT (i.e. header) or NOTE blocks, hence
> decoding stops.
>
> The solution is to add STYLE to the list of skipped blocks. And while we're at it,
> add REGION, too.
> ---
>  libavformat/webvttdec.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
> index 8d2fdfe..5a982dd 100644
> --- a/libavformat/webvttdec.c
> +++ b/libavformat/webvttdec.c
> @@ -89,10 +89,12 @@ static int webvtt_read_header(AVFormatContext *s)
>  p = identifier = cue.str;
>  pos = avio_tell(s->pb);
>
> -/* ignore header chunk */
> +/* ignore header, NOTE, STYLE and REGION chunks */
>  if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
>  !strncmp(p, "WEBVTT", 6) ||
> -!strncmp(p, "NOTE", 4))
> +!strncmp(p, "NOTE", 4) ||
> +!strncmp(p, "STYLE", 5) ||
> +!strncmp(p, "REGION", 6))
>  continue;
>
>  /* optional cue identifier (can be a number like in SRT or some kind of
> --
> 2.30.0
>

[FFmpeg-devel] [PATCH] avformat/webvttdec: Fix WebVTT decoder truncating files at first STYLE block

2021-01-12 Thread Roderich Schupp

Bug-ID: 9064

The webvtt decoder truncates the file at the first STYLE block.
Since these blocks typically occur at the top of the webvtt file, this results
in an empty file (except for the WEBVTT header line).

The reason is that a STYLE block neither parses as a valid cue block nor
is skipped like the WEBVTT (i.e. header) or NOTE blocks, hence
decoding stops.

The solution is to add STYLE to the list of skipped blocks. And while we're at it,
add REGION, too.
---
 libavformat/webvttdec.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libavformat/webvttdec.c b/libavformat/webvttdec.c
index 8d2fdfe..5a982dd 100644
--- a/libavformat/webvttdec.c
+++ b/libavformat/webvttdec.c
@@ -89,10 +89,12 @@ static int webvtt_read_header(AVFormatContext *s)
 p = identifier = cue.str;
 pos = avio_tell(s->pb);
 
-/* ignore header chunk */
+/* ignore header, NOTE, STYLE and REGION chunks */
 if (!strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
 !strncmp(p, "WEBVTT", 6) ||
-!strncmp(p, "NOTE", 4))
+!strncmp(p, "NOTE", 4) ||
+!strncmp(p, "STYLE", 5) ||
+!strncmp(p, "REGION", 6))
 continue;
 
 /* optional cue identifier (can be a number like in SRT or some kind of
-- 
2.30.0
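The patched condition can be read as a small predicate. Below, `is_skipped_block()` is a standalone restatement of the comparisons in the diff. Like the patch, a plain prefix match also treats blocks starting with e.g. `NOTES` or `STYLES` as skippable, whereas the WebVTT specification expects these keywords to be followed by whitespace or a line terminator; `is_skipped_block_strict()` is a hypothetical stricter variant, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same prefix tests as the patched webvtt_read_header() condition. */
static bool is_skipped_block(const char *p)
{
    assert(p);
    return !strncmp(p, "\xEF\xBB\xBFWEBVTT", 9) ||
           !strncmp(p, "WEBVTT", 6) ||
           !strncmp(p, "NOTE", 4)   ||
           !strncmp(p, "STYLE", 5)  ||
           !strncmp(p, "REGION", 6);
}

/* Keyword must be followed by a space, tab, line break or end of string. */
static bool keyword_at(const char *p, const char *kw)
{
    size_t n = strlen(kw);
    if (strncmp(p, kw, n))
        return false;
    return p[n] == '\0' || p[n] == ' ' || p[n] == '\t' ||
           p[n] == '\n' || p[n] == '\r';
}

/* Hypothetical stricter variant of the check. */
static bool is_skipped_block_strict(const char *p)
{
    assert(p);
    if (!strncmp(p, "\xEF\xBB\xBF", 3))
        p += 3; /* skip a UTF-8 byte order mark */
    return keyword_at(p, "WEBVTT") || keyword_at(p, "NOTE") ||
           keyword_at(p, "STYLE")  || keyword_at(p, "REGION");
}
```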


Re: [FFmpeg-devel] [PATCH v6] avformat/udp: return the error code instead of generic EIO

2021-01-12 Thread Nicolas George

lance.lmw...@gmail.com (12021-01-12):
> From: Limin Wang 
> 
> Signed-off-by: Limin Wang 
> ---
>  libavformat/udp.c | 58 ++-
>  1 file changed, 36 insertions(+), 22 deletions(-)
> 
> diff --git a/libavformat/udp.c b/libavformat/udp.c
> index 13c346a..42e4563 100644
> --- a/libavformat/udp.c
> +++ b/libavformat/udp.c
> @@ -165,7 +165,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
>  if (addr->sa_family == AF_INET) {
>  if (setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_TTL, &mcastTTL, sizeof(mcastTTL)) < 0) {
>  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_MULTICAST_TTL)");
> -return -1;
> +return ff_neterrno();
>  }
>  }
>  #endif
> @@ -173,7 +173,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
>  if (addr->sa_family == AF_INET6) {
>  if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS, &mcastTTL, sizeof(mcastTTL)) < 0) {
>  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_MULTICAST_HOPS)");
> -return -1;
> +return ff_neterrno();
>  }
>  }
>  #endif
> @@ -193,7 +193,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
>  mreq.imr_interface.s_addr = INADDR_ANY;
>  if (setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, (const void *)&mreq, sizeof(mreq)) < 0) {
>  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_ADD_MEMBERSHIP)");
> -return -1;
> +return ff_neterrno();
>  }
>  }
>  #endif
> @@ -206,7 +206,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
>  mreq6.ipv6mr_interface = 0;
>  if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_ADD_MEMBERSHIP, &mreq6, sizeof(mreq6)) < 0) {
>  ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_ADD_MEMBERSHIP)");
> -return -1;
> +return ff_neterrno();
>  }
>  }
>  #endif
> @@ -633,6 +633,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
>  char buf[256];
>  struct sockaddr_storage my_addr;
>  socklen_t len;

> +int ret = AVERROR(EIO);

ret should be left uninited, so that the compiler can warn you if there
is a code path where you return it without setting it.
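The point about leaving `ret` uninitialized can be shown with a reduced, standalone version of the pattern in the patch. Assumptions: `MY_AVERROR()` and `net_errno()` are hypothetical stand-ins for FFmpeg's `AVERROR()` and `ff_neterrno()` on a POSIX system (where `AVERROR(e)` is `-(e)`), and `apply_socket_options()` is an illustrative name. The `#else` branch uses `ENOSYS`, as suggested later in this review.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical stand-ins: on POSIX, FFmpeg's AVERROR(e) is -(e) and
 * ff_neterrno() is AVERROR(errno). */
#define MY_AVERROR(e) (-(e))

static int net_errno(void)
{
    return MY_AVERROR(errno);
}

/* Illustrative reduction of the option-setting part of udp_open(). */
static int apply_socket_options(int fd, int reuse, int broadcast)
{
    int ret; /* deliberately uninitialized, so the compiler can flag a path that forgets it */

    if (reuse) {
        int one = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) != 0) {
            ret = net_errno();
            goto fail;
        }
    }

    if (broadcast) {
#ifdef SO_BROADCAST
        int one = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &one, sizeof(one)) != 0) {
            ret = net_errno();
            goto fail;
        }
#else
        ret = MY_AVERROR(ENOSYS); /* option not available on this platform */
        goto fail;
#endif
    }

    return 0;

fail:
    /* cleanup (closing the socket etc.) would go here */
    return ret;
}
```

With `ret` left uninitialized, GCC's `-Wmaybe-uninitialized` or Clang's `-Wsometimes-uninitialized` can catch a future `goto fail` that does not set it; a default of `AVERROR(EIO)` would silently hide such a path.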

>  
>  h->is_streamed = 1;
>  
> @@ -641,12 +642,12 @@ static int udp_open(URLContext *h, const char *uri, int flags)
>  s->buffer_size = is_output ? UDP_TX_BUF_SIZE : UDP_RX_BUF_SIZE;
>  
>  if (s->sources) {
> -if (ff_ip_parse_sources(h, s->sources, &s->filters) < 0)
> +if ((ret = ff_ip_parse_sources(h, s->sources, &s->filters)) < 0)
>  goto fail;
>  }
>  
>  if (s->block) {
> -if (ff_ip_parse_blocks(h, s->block, &s->filters) < 0)
> +if ((ret = ff_ip_parse_blocks(h, s->block, &s->filters)) < 0)
> +if ((ret = ff_ip_parse_blocks(h, s->block, >filters)) < 0)
>  goto fail;
>  }
>  
> @@ -712,11 +713,11 @@ static int udp_open(URLContext *h, const char *uri, int flags)
>  av_strlcpy(localaddr, buf, sizeof(localaddr));
>  }
>  if (av_find_info_tag(buf, sizeof(buf), "sources", p)) {
> -if (ff_ip_parse_sources(h, buf, &s->filters) < 0)
> +if ((ret = ff_ip_parse_sources(h, buf, &s->filters)) < 0)
>  goto fail;
>  }
>  if (av_find_info_tag(buf, sizeof(buf), "block", p)) {
> -if (ff_ip_parse_blocks(h, buf, &s->filters) < 0)
> +if ((ret = ff_ip_parse_blocks(h, buf, &s->filters)) < 0)
>  goto fail;
>  }
>  if (!is_output && av_find_info_tag(buf, sizeof(buf), "timeout", p))
> @@ -742,7 +743,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
>  if (!(flags & AVIO_FLAG_READ))
>  goto fail;
>  } else {
> -if (ff_udp_set_remote_url(h, uri) < 0)
> +if ((ret = ff_udp_set_remote_url(h, uri)) < 0)
>  goto fail;
>  }
>  
> @@ -763,15 +764,22 @@ static int udp_open(URLContext *h, const char *uri, int flags)
>   */
>  if (s->reuse_socket > 0 || (s->is_multicast && s->reuse_socket < 0)) {
>  s->reuse_socket = 1;
> -if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0)
> +if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0) {
> +ret = ff_neterrno();
>  goto fail;
> +}
>  }
>  
>  if (s->is_broadcast) {
>  #ifdef SO_BROADCAST
> -if (setsockopt (udp_fd, SOL_SOCKET, SO_BROADCAST, &(s->is_broadcast), sizeof(s->is_broadcast)) != 0)
> +if (setsockopt (udp_fd, SOL_SOCKET, SO_BROADCAST, &(s->is_broadcast), sizeof(s->is_broadcast)) != 0) {
> +ret = ff_neterrno();
> +goto fail;
> +}
> +#else

> +ret = AVERROR(EINVAL);

ENOSYS

> +goto

[FFmpeg-devel] [PATCH v6] avformat/udp: return the error code instead of generic EIO

2021-01-12 Thread lance . lmwang

From: Limin Wang 

Signed-off-by: Limin Wang 
---
 libavformat/udp.c | 58 ++-
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/libavformat/udp.c b/libavformat/udp.c
index 13c346a..42e4563 100644
--- a/libavformat/udp.c
+++ b/libavformat/udp.c
@@ -165,7 +165,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
 if (addr->sa_family == AF_INET) {
 if (setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_TTL, &mcastTTL, sizeof(mcastTTL)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_MULTICAST_TTL)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -173,7 +173,7 @@ static int udp_set_multicast_ttl(int sockfd, int mcastTTL,
 if (addr->sa_family == AF_INET6) {
 if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS, &mcastTTL, sizeof(mcastTTL)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_MULTICAST_HOPS)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -193,7 +193,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
 mreq.imr_interface.s_addr = INADDR_ANY;
 if (setsockopt(sockfd, IPPROTO_IP, IP_ADD_MEMBERSHIP, (const void *)&mreq, sizeof(mreq)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IP_ADD_MEMBERSHIP)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -206,7 +206,7 @@ static int udp_join_multicast_group(int sockfd, struct sockaddr *addr,struct soc
 mreq6.ipv6mr_interface = 0;
 if (setsockopt(sockfd, IPPROTO_IPV6, IPV6_ADD_MEMBERSHIP, &mreq6, sizeof(mreq6)) < 0) {
 ff_log_net_error(NULL, AV_LOG_ERROR, "setsockopt(IPV6_ADD_MEMBERSHIP)");
-return -1;
+return ff_neterrno();
 }
 }
 #endif
@@ -633,6 +633,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 char buf[256];
 struct sockaddr_storage my_addr;
 socklen_t len;
+int ret = AVERROR(EIO);
 
 h->is_streamed = 1;
 
@@ -641,12 +642,12 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 s->buffer_size = is_output ? UDP_TX_BUF_SIZE : UDP_RX_BUF_SIZE;
 
 if (s->sources) {
-if (ff_ip_parse_sources(h, s->sources, &s->filters) < 0)
+if ((ret = ff_ip_parse_sources(h, s->sources, &s->filters)) < 0)
 goto fail;
 }
 
 if (s->block) {
-if (ff_ip_parse_blocks(h, s->block, &s->filters) < 0)
+if ((ret = ff_ip_parse_blocks(h, s->block, &s->filters)) < 0)
 goto fail;
 }
 
@@ -712,11 +713,11 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 av_strlcpy(localaddr, buf, sizeof(localaddr));
 }
 if (av_find_info_tag(buf, sizeof(buf), "sources", p)) {
-if (ff_ip_parse_sources(h, buf, &s->filters) < 0)
+if ((ret = ff_ip_parse_sources(h, buf, &s->filters)) < 0)
 goto fail;
 }
 if (av_find_info_tag(buf, sizeof(buf), "block", p)) {
-if (ff_ip_parse_blocks(h, buf, &s->filters) < 0)
+if ((ret = ff_ip_parse_blocks(h, buf, &s->filters)) < 0)
 goto fail;
 }
 if (!is_output && av_find_info_tag(buf, sizeof(buf), "timeout", p))
@@ -742,7 +743,7 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 if (!(flags & AVIO_FLAG_READ))
 goto fail;
 } else {
-if (ff_udp_set_remote_url(h, uri) < 0)
+if ((ret = ff_udp_set_remote_url(h, uri)) < 0)
 goto fail;
 }
 
@@ -763,15 +764,22 @@ static int udp_open(URLContext *h, const char *uri, int flags)
  */
 if (s->reuse_socket > 0 || (s->is_multicast && s->reuse_socket < 0)) {
 s->reuse_socket = 1;
-if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0)
+if (setsockopt (udp_fd, SOL_SOCKET, SO_REUSEADDR, &(s->reuse_socket), sizeof(s->reuse_socket)) != 0) {
+ret = ff_neterrno();
 goto fail;
+}
 }
 
 if (s->is_broadcast) {
 #ifdef SO_BROADCAST
-if (setsockopt (udp_fd, SOL_SOCKET, SO_BROADCAST, &(s->is_broadcast), sizeof(s->is_broadcast)) != 0)
+if (setsockopt (udp_fd, SOL_SOCKET, SO_BROADCAST, &(s->is_broadcast), sizeof(s->is_broadcast)) != 0) {
+ret = ff_neterrno();
+goto fail;
+}
+#else
+ret = AVERROR(EINVAL);
+goto fail;
 #endif
-   goto fail;
 }
 
 /* Set the checksum coverage for UDP-Lite (RFC 3828) for sending and receiving.
@@ -788,8 +796,10 @@ static int udp_open(URLContext *h, const char *uri, int flags)
 
 if (dscp >= 0) {
 dscp <<= 2;
-if (setsockopt (udp_fd, IPPROTO_IP, IP_TOS, &dscp, sizeof(dscp)) != 0)
+if (setsockopt (udp_fd, IPPROTO_IP, IP_TOS, &dscp, sizeof(dscp)) != 0) {
+ret = ff_neterrno();
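
The error-propagation pattern this udp_open() patch adopts (start `ret` at a generic error, overwrite it with the result of each call that can fail, and return it from the shared `fail:` path) can be sketched in isolation. This is a minimal illustration under assumptions, not FFmpeg code: `parse_sources_sketch()`, `set_option_sketch()` and `open_sketch()` are hypothetical stand-ins for ff_ip_parse_sources(), setsockopt() and udp_open(), and plain negative errno values stand in for AVERROR() codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ff_ip_parse_sources(): rejects an empty list. */
static int parse_sources_sketch(const char *list)
{
    assert(list != NULL);
    return list[0] != '\0' ? 0 : -EINVAL;
}

/* Hypothetical stand-in for setsockopt(): returns -1 and sets errno on failure. */
static int set_option_sketch(int fail_errno)
{
    if (fail_errno) {
        errno = fail_errno;
        return -1;
    }
    return 0;
}

/* Same shape as udp_open() after the patch: each failing call stores its own
 * error in ret before jumping to the single cleanup label. */
static int open_sketch(const char *sources, int option_errno, int readable)
{
    int ret = -EIO;  /* generic fallback, playing the role of AVERROR(EIO) */

    if (sources && (ret = parse_sources_sketch(sources)) < 0)
        goto fail;

    if (set_option_sketch(option_errno) != 0) {
        ret = -errno;  /* keep the real cause, as ff_neterrno() does on POSIX */
        goto fail;
    }

    /* A successful call above leaves ret == 0, so a failure path without a
     * call result of its own must set ret explicitly before jumping. */
    if (!readable) {
        ret = -EIO;
        goto fail;
    }
    return 0;

fail:
    /* shared cleanup (closing the socket, resetting filters) would go here */
    return ret;
}
```

The SO_BROADCAST hunk is related: before the patch, the `goto fail;` after `#endif` was the body of the setsockopt() check when SO_BROADCAST is defined and an unconditional failure otherwise; the patch spells out both branches, returning AVERROR(EINVAL) when broadcast is unsupported.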

[FFmpeg-devel] [PATCH 2/3] avformat/dashdec: fix code style in is_common_init_section_exist

2021-01-12 Thread liuqi05

Split the condition across several lines because it was too long.

Signed-off-by: liuqi05 
---
 libavformat/dashdec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavformat/dashdec.c b/libavformat/dashdec.c
index 5f9b9ba882..85b5f147e8 100644
--- a/libavformat/dashdec.c
+++ b/libavformat/dashdec.c
@@ -1995,7 +1995,9 @@ static int is_common_init_section_exist(struct representation **pls, int n_pls)
 if (!pls[i]->init_section)
 continue;
 
-if (av_strcasecmp(pls[i]->init_section->url, url) || pls[i]->init_section->url_offset != url_offset || pls[i]->init_section->size != size) {
+if (av_strcasecmp(pls[i]->init_section->url, url) ||
+  pls[i]->init_section->url_offset != url_offset ||
+  pls[i]->init_section->size != size) {
 return 0;
 }
 }
-- 
2.25.0



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH 3/3] avformat/dashdec: rename variable for readability

2021-01-12 Thread liuqi05

Store the result of is_common_init_section_exist(c->subtitles, c->n_subtitles)
in a new is_init_section_common_subtitle field instead of
is_init_section_common_audio, because this check applies to subtitles, not audio.

Signed-off-by: liuqi05 
---
 libavformat/dashdec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavformat/dashdec.c b/libavformat/dashdec.c
index 85b5f147e8..e0a89ebc07 100644
--- a/libavformat/dashdec.c
+++ b/libavformat/dashdec.c
@@ -155,6 +155,7 @@ typedef struct DASHContext {
 /* Flags for init section*/
 int is_init_section_common_video;
 int is_init_section_common_audio;
+int is_init_section_common_subtitle;
 
 } DASHContext;
 
@@ -2084,11 +2085,11 @@ static int dash_read_header(AVFormatContext *s)
 }
 
 if (c->n_subtitles)
-c->is_init_section_common_audio = is_common_init_section_exist(c->subtitles, c->n_subtitles);
+c->is_init_section_common_subtitle = is_common_init_section_exist(c->subtitles, c->n_subtitles);
 
 for (i = 0; i < c->n_subtitles; i++) {
 rep = c->subtitles[i];
-if (i > 0 && c->is_init_section_common_audio) {
+if (i > 0 && c->is_init_section_common_subtitle) {
 ret = copy_init_section(rep, c->subtitles[0]);
 if (ret < 0)
 goto fail;
-- 
2.25.0



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH 1/3] avformat/dashdec: check init_section before using it.

2021-01-12 Thread liuqi05

When a SegmentTemplate has no Initialization attribute, the representation
gets no init_section for the init segment file. However,
is_common_init_section_exist() uses init_section to compare url, url_offset
and size, so check init_section before using it.

fix ticket: 9062

Signed-off-by: liuqi05 
---
 libavformat/dashdec.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libavformat/dashdec.c b/libavformat/dashdec.c
index 693fc7372b..5f9b9ba882 100644
--- a/libavformat/dashdec.c
+++ b/libavformat/dashdec.c
@@ -1992,7 +1992,10 @@ static int is_common_init_section_exist(struct representation **pls, int n_pls)
 url_offset = first_init_section->url_offset;
 size = pls[0]->init_section->size;
 for (i=0;i<n_pls;i++) {
-if (av_strcasecmp(pls[i]->init_section->url,url) || pls[i]->init_section->url_offset != url_offset || pls[i]->init_section->size != size) {
+if (!pls[i]->init_section)
+continue;
+
+if (av_strcasecmp(pls[i]->init_section->url, url) || pls[i]->init_section->url_offset != url_offset || pls[i]->init_section->size != size) {
 return 0;
 }
 }
-- 
2.25.0



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

2021-01-12 Thread Josh Dekker


Hi,

On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote:

From: Reimar Döffinger 

Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth
available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time
and this optimization reduced overall decode time from 19.4s to 16.4s,
approximately 15% speedup.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts",
running on Apple M1.
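
For reference, the timings quoted above correspond to roughly a 15.5% reduction in decode time, or equivalently about 18% more throughput:

```latex
\[
\frac{19.4\,\mathrm{s} - 16.4\,\mathrm{s}}{19.4\,\mathrm{s}} = \frac{3.0}{19.4} \approx 0.155
\qquad\text{and}\qquad
\frac{19.4\,\mathrm{s}}{16.4\,\mathrm{s}} \approx 1.18
\]
```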
---
  libavcodec/aarch64/Makefile   |   2 +
  libavcodec/aarch64/hevcdsp_idct_neon.S| 426 ++
  libavcodec/aarch64/hevcdsp_init_aarch64.c |  45 +++
  libavcodec/hevcdsp.c  |   2 +
  libavcodec/hevcdsp.h  |   1 +
  5 files changed, 476 insertions(+)
  create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
  create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c

[...]


AS  libavcodec/aarch64/hevcdsp_idct_neon.o
libavcodec/aarch64/hevcdsp_idct_neon.S: Assembler messages:
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.16b, v28.16b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.16b, v28.16b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.16b, v28.16b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info: other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:   mov v29.16b, v28.16b

This doesn't build on GNU assembler (GNU Binutils for Ubuntu) 2.34 
(aarch64). Thanks for porting this, I was in the process of writing HEVC
assembly (see my set on the ML) and would be interested to rebase this 
on top of that set.


--
Josh
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 6/6] fft_fixed: remove 16-bit FFT code

2021-01-12 Thread Andreas Rheinhardt

Lynne:
> Jan 12, 2021, 08:50 by andreas.rheinha...@gmail.com:
> 
>> Lynne:
>>
>>> Jan 9, 2021, 20:22 by d...@lynne.ee:
>>>
>>>> No longer used by anything.
>>>> Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
>>>> simply too much work for code meant to be all removed anyway.
>>>>
>>>> Patch attached. Read patch 1/6 to see the size savings.

>>> Forgot to remove the tests, making FATE fail.
>>> Fixed, patch attached.
>>>
>> According to patchwork, even the very first of your patches doesn't pass
>> FATE. And given that your second version (of the first patch) didn't
>> change anything wrt FATE, it won't be different with v2.
>>
> 
> Why are you posting this as a reply to this patch then?
> I only tested fate-ac3 and fate-fft. This wasn't covered by either.
> Fixed locally. It was a 2-char fix in fate-unknown_layout-ac3.
> 
Because IMO the topic of this mail is "FATE failures", which is why I answered here.

- Andreas

PS: Are you sure you do not need to change fate-lavf-rm, too?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 6/6] fft_fixed: remove 16-bit FFT code

2021-01-12 Thread Lynne

Jan 12, 2021, 08:50 by andreas.rheinha...@gmail.com:

> Lynne:
>
>> Jan 9, 2021, 20:22 by d...@lynne.ee:
>>
>>> No longer used by anything. 
>>> Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
>>> simply too much work for code meant to be all removed anyway.
>>>
>>> Patch attached. Read patch 1/6 to see the size savings.
>>>
>> Forgot to remove the tests, making FATE fail.
>> Fixed, patch attached.
>>
> According to patchwork, even the very first of your patches doesn't pass
> FATE. And given that your second version (of the first patch) didn't
> change anything wrt FATE, it won't be different with v2.
>

Why are you posting this as a reply to this patch then?
I only tested fate-ac3 and fate-fft. This wasn't covered by either.
Fixed locally. It was a 2-char fix in fate-unknown_layout-ac3.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 1/6] ac3enc_fixed: convert to 32-bit sample format

2021-01-12 Thread Lynne

Jan 12, 2021, 08:59 by andreas.rheinha...@gmail.com:

> Lynne:
>
>> The AC3 encoder used to be a separate library called "Aften", which
>> got merged into libavcodec (literally, SVN commits and all).
>> The merge preserved as many features from the library as possible.
>>
>> The code had two versions - a fixed point version and a floating
>> point version. FFmpeg had floating point DSP code used by other
>> codecs, the AC3 decoder including, so the floating-point DSP was
>> simply replaced with FFmpeg's own functions.
>> However, FFmpeg had no fixed-point audio code at that point. So
>> the encoder brought along its own fixed-point DSP functions,
>> including a fixed-point MDCT.
>>
>> The fixed-point MDCT itself is trivially just a float MDCT with a
>> different type and each multiply being a fixed-point multiply.
>> So over time, it got refactored, and the FFT used for all other codecs
>> was templated.
>>
>> Due to design decisions at the time, the fixed-point version of the
>> encoder operates at 16-bits of precision. Although convenient, this,
>> even at the time, was inadequate and inefficient. The encoder is noisy,
>> does not produce output comparable to the float encoder, and even
>> rings at higher frequencies due to the badly approximated window function.
>>
>> Enter MIPS (owned by Imagination Technologies at the time). They wanted
>> quick fixed-point decoding on their FPUless cores. So they contributed
>> patches to template the AC3 decoder so it had both a fixed-point
>> and a floating-point version. They also did the same for the AAC decoder.
>> They, however, used 32-bit samples, not 16-bit ones. And we did not have
>> 32-bit fixed-point DSP functions, including an MDCT. But instead of
>> templating our MDCT to output 3 versions (float, 32-bit fixed and 16-bit
>> fixed), they simply copy-pasted their own MDCT into ours, and completely
>> ifdeffed our own MDCT code out if a 32-bit fixed point MDCT was selected.
>>
>> This is also the status quo nowadays - 2 separate MDCTs, one which
>> produces floating point and 16-bit fixed point versions, and a
>> sort-of integrated one which produces the 32-bit fixed-point version.
>>
>> MIPS weren't all that interested in encoding, so they left the encoder
>> as-is, and they didn't care much about the ifdeffery, mess or quality - it's
>> not their problem.
>>
>> So the MDCT/FFT code has always been a thorn in the eye of anyone
>> looking to clean up code.
>>
>> Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
>> So for the floating point version, the encoder simply runs the float MDCT,
>> and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
>> inherently a fixed-point codec. For the fixed-point version, the input is
>> 16-bit samples, so to maximize precision the frame samples are analyzed and
>> the highest set bit is detected via ac3_max_msb_abs_int16(), and the samples
>> are then scaled up via ac3_lshift_int16() so the input for the FFT is always
>> at least 14 bits; this is computed in normalize_samples(). After the FFT, the
>> coefficients are scaled up to 25 bits.
>>
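
The 16-bit normalization described above (detect the highest set bit over the frame, then shift left so the FFT input uses the available precision) can be sketched as follows. This is an illustrative reimplementation under assumptions, not FFmpeg's code: `max_msb_abs_int16_sketch()`, `highest_bit()` and `normalize_sketch()` are hypothetical names, and placing the highest set bit at bit 14 is an assumed reading of "at least 14 bits".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* OR of all magnitudes: its highest set bit equals that of the largest
 * magnitude, which is all the shift computation needs. */
static int max_msb_abs_int16_sketch(const int16_t *src, int len)
{
    int v = 0;
    for (int i = 0; i < len; i++)
        v |= abs(src[i]);
    return v;
}

/* Index of the highest set bit, i.e. floor(log2(v)), for v > 0. */
static int highest_bit(int v)
{
    int n = 0;
    assert(v > 0);
    while (v >>= 1)
        n++;
    return n;
}

/* Shifts the block left so its highest set bit sits at bit 14 and returns the
 * shift used (0 for silence or already full-scale input); a caller would have
 * to compensate for this shift after the transform. */
static int normalize_sketch(int16_t *buf, int len)
{
    int v = max_msb_abs_int16_sketch(buf, len);
    int shift = v ? 14 - highest_bit(v) : 0;

    if (shift <= 0)
        return 0;
    for (int i = 0; i < len; i++)
        buf[i] = (int16_t)(buf[i] * (1 << shift));  /* multiply: no UB on negatives */
    return shift;
}
```

With the patch, this step disappears: 32-bit input already carries enough precision, which is why normalize_samples() is no longer needed.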
>> This patch simply changes the encoder to accept 32-bit samples, reusing
>> the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
>> a large part of some very messy code of ours, as well as prepare for the
>> future lavu/tx conversion. The coefficients are simply scaled down to 25 bits
>> during windowing, skipping 2 separate scalings, as the hacks to extend
>> precision are simply no longer necessary. There's no point in always running
>> the MDCT at 32 bits when you're going to drop 6 bits off anyway; the headroom
>> is plenty, and the MDCT rounds properly.
>>
>> This also makes the encoder slightly more accurate than the float
>> version, as no coefficient conversion step is necessary.
>>
>> SIZE SAVINGS:
>> ARM32:
>> HARDCODED TABLES:
>> BASE   - 10709590
>> DROP  DSP  - 10702872 - diff:   -6.56KiB
>> DROP  MDCT - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
>> DROP  FFT  - 10336652 - diff: -323.52KiB - all:   -364.20KiB
>> SOFTCODED TABLES:
>> BASE   -  9685096
>> DROP  DSP  -  9678378 - diff:   -6.56KiB
>> DROP  MDCT -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
>> DROP  FFT  -  9573918 - diff:  -67.92KiB - all:   -108.57KiB
>>
>> ARM64:
>> HARDCODED TABLES:
>> BASE   - 14641112
>> DROP  DSP  - 14633806 - diff:   -7.13KiB
>> DROP  MDCT - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
>> DROP  FFT  - 14286826 - diff: -310.53KiB - all:   -345.98KiB
>> SOFTCODED TABLES:
>> BASE   - 13636238
>> DROP  DSP  - 13628932 - diff:   -7.13KiB
>> DROP  MDCT - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
>> DROP  FFT  - 13542080 - diff:  -56.43KiB - all:    -91.95KiB
>>
>> x86:
>> HARDCODED TABLES:
>> BASE   - 12367336
>> DROP  DSP  - 12354698 - diff:  -12.34KiB
>> DROP  MDCT - 12331024 - diff:  -23.12KiB - both:

Re: [FFmpeg-devel] [PATCH 1/6] ac3enc_fixed: convert to 32-bit sample format

2021-01-12 Thread Lynne

Jan 12, 2021, 08:48 by andreas.rheinha...@gmail.com:

> Lynne:
>
>> Jan 9, 2021, 22:01 by andreas.rheinha...@gmail.com:
>>
>>> Lynne:
>>>
>>>> @@ -165,7 +164,11 @@ typedef struct AC3EncodeContext {
>>>>  AVCodecContext *avctx;  ///< parent AVCodecContext
>>>>  PutBitContext pb;   ///< bitstream writer context
>>>>  AudioDSPContext adsp;
>>>> +#if AC3ENC_FLOAT
>>>>  AVFloatDSPContext *fdsp;
>>>> +#else
>>>> +AVFixedDSPContext *fdsp;
>>>> +#endif
>>>>  MECmpContext mecc;
>>>>  AC3DSPContext ac3dsp;   ///< AC-3 optimized functions
>>>>  FFTContext mdct;///< FFT context for MDCT calculation

>>> [...]
>>>
>>>> @@ -118,9 +89,10 @@ static CoefType calc_cpl_coord(CoefSumType energy_ch, CoefSumType energy_cpl)
>>>>  static av_cold void ac3_fixed_mdct_end(AC3EncodeContext *s)
>>>>  {
>>>>  ff_mdct_end(&s->mdct);
>>>> +av_freep(&s->fdsp);
>>>> +av_freep(&s->mdct_window);
>>>>  }

>>>
>>> ff_ac3_encode_close already unconditionally frees fdsp, so freeing it
>>> above is either unnecessary or ac3_float_mdct_end should also free its
>>> fdsp (and ff_ac3_encode_close shouldn't). Freeing mdct_window can also
>>> be moved to ff_ac3_encode_close (which already frees several buffers
>>> whose pointed-to-type depends upon the encoding mode).
>>> Notice that ac3enc.c uses the fixed-point mode, but the layout of
>>> AC3EncodeContext does not depend upon this (apart from pointed-to-types,
>>> of course). Actually, ff_mdct_end does the same for both fixed- and
>>> floating-point mode, so one could even incorporate
>>> ac3_fixed/float_mdct_end into ff_ac3_encode_close.
>>>
>> Done. Left ac3_fixed/float_mdct_end as-is for now.
>> New patch attached.
>> >
>> @@ -129,8 +99,31 @@ static av_cold void ac3_fixed_mdct_end(AC3EncodeContext *s)
>>  */
>>  static av_cold int ac3_fixed_mdct_init(AC3EncodeContext *s)
>>  {
>> +int32_t *iwin;
>> +float fwin[AC3_BLOCK_SIZE];
>> +
>>  int ret = ff_mdct_init(&s->mdct, 9, 0, -1.0);
>> -s->mdct_window = ff_ac3_window;
>>
>
> You forgot to remove this table.
>
That's done as a part of patch 4/6.



>> +if (ret < 0)
>> +return ret;
>> +
>> +iwin = av_malloc_array(AC3_WINDOW_SIZE, sizeof(*iwin));
>> +if (!iwin)
>> +return AVERROR(ENOMEM);
>> +
>> +ff_kbd_window_init(fwin, 5.0, AC3_WINDOW_SIZE/2);
>> +
>> +for (int i = 0; i < AC3_WINDOW_SIZE/2; i++)
>> +iwin[i] = lrintf(fwin[i] * (1 << 22));
>> +
>>
>
> Does this lead to a different result than using ff_kbd_window_init_fixed
> directly?
>

Yes, slightly. We need a different scaling, so if we work with what that
function gives us the rounding is different. Nothing major, and definitely
still better than before, but it's there.
As for what we gain by switching to it... nothing except a single line for
the temporary float array. We do almost the same thing that function does
anyway.
So I think I'll leave this as-is.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 1/6] ac3enc_fixed: convert to 32-bit sample format

2021-01-12 Thread Andreas Rheinhardt

Lynne:
> The AC3 encoder used to be a separate library called "Aften", which
> got merged into libavcodec (literally, SVN commits and all).
> The merge preserved as many features from the library as possible.
> 
> The code had two versions - a fixed point version and a floating
> point version. FFmpeg had floating point DSP code used by other
> codecs, the AC3 decoder including, so the floating-point DSP was
> simply replaced with FFmpeg's own functions.
> However, FFmpeg had no fixed-point audio code at that point. So
> the encoder brought along its own fixed-point DSP functions,
> including a fixed-point MDCT.
> 
> The fixed-point MDCT itself is trivially just a float MDCT with a
> different type and each multiply being a fixed-point multiply.
> So over time, it got refactored, and the FFT used for all other codecs
> was templated.
> 
> Due to design decisions at the time, the fixed-point version of the
> encoder operates at 16-bits of precision. Although convenient, this,
> even at the time, was inadequate and inefficient. The encoder is noisy,
> does not produce output comparable to the float encoder, and even
> rings at higher frequencies due to the badly approximated window function.
> 
> Enter MIPS (owned by Imagination Technologies at the time). They wanted
> quick fixed-point decoding on their FPUless cores. So they contributed
> patches to template the AC3 decoder so it had both a fixed-point
> and a floating-point version. They also did the same for the AAC decoder.
> They, however, used 32-bit samples, not 16-bit ones. And we did not have
> 32-bit fixed-point DSP functions, including an MDCT. But instead of
> templating our MDCT to output 3 versions (float, 32-bit fixed and 16-bit
> fixed), they simply copy-pasted their own MDCT into ours, and completely
> ifdeffed our own MDCT code out if a 32-bit fixed point MDCT was selected.
> 
> This is also the status quo nowadays - 2 separate MDCTs, one which
> produces floating point and 16-bit fixed point versions, and a
> sort-of integrated one which produces the 32-bit fixed-point version.
> 
> MIPS weren't all that interested in encoding, so they left the encoder
> as-is, and they didn't care much about the ifdeffery, mess or quality - it's
> not their problem.
> 
> So the MDCT/FFT code has always been a thorn in the eye of anyone
> looking to clean up code.
> 
> Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
> So for the floating point version, the encoder simply runs the float MDCT,
> and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
> inherently a fixed-point codec. For the fixed-point version, the input is
> 16-bit samples, so to maximize precision the frame samples are analyzed and
> the highest set bit is detected via ac3_max_msb_abs_int16(), and the samples
> are then scaled up via ac3_lshift_int16() so the input for the FFT is always
> at least 14 bits; this is computed in normalize_samples(). After the FFT, the
> coefficients are scaled up to 25 bits.
> 
> This patch simply changes the encoder to accept 32-bit samples, reusing
> the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
> a large part of some very messy code of ours, as well as prepare for the
> future lavu/tx conversion. The coefficients are simply scaled down to 25 bits
> during windowing, skipping 2 separate scalings, as the hacks to extend
> precision are simply no longer necessary. There's no point in always running
> the MDCT at 32 bits when you're going to drop 6 bits off anyway; the headroom
> is plenty, and the MDCT rounds properly.
> 
> This also makes the encoder slightly more accurate than the float
> version, as no coefficient conversion step is necessary.
> 
> SIZE SAVINGS:
> ARM32:
> HARDCODED TABLES:
> BASE   - 10709590
> DROP  DSP  - 10702872 - diff:   -6.56KiB
> DROP  MDCT - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
> DROP  FFT  - 10336652 - diff: -323.52KiB - all:   -364.20KiB
> SOFTCODED TABLES:
> BASE   -  9685096
> DROP  DSP  -  9678378 - diff:   -6.56KiB
> DROP  MDCT -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
> DROP  FFT  -  9573918 - diff:  -67.92KiB - all:   -108.57KiB
> 
> ARM64:
> HARDCODED TABLES:
> BASE   - 14641112
> DROP  DSP  - 14633806 - diff:   -7.13KiB
> DROP  MDCT - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
> DROP  FFT  - 14286826 - diff: -310.53KiB - all:   -345.98KiB
> SOFTCODED TABLES:
> BASE   - 13636238
> DROP  DSP  - 13628932 - diff:   -7.13KiB
> DROP  MDCT - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
> DROP  FFT  - 13542080 - diff:  -56.43KiB - all:    -91.95KiB
> 
> x86:
> HARDCODED TABLES:
> BASE   - 12367336
> DROP  DSP  - 12354698 - diff:  -12.34KiB
> DROP  MDCT - 12331024 - diff:  -23.12KiB - both:   -35.46KiB
> DROP  FFT  - 12029788 - diff: -294.18KiB - all:   -329.64KiB
> SOFTCODED TABLES:
> BASE   - 11358094
> DROP  DSP  -

42 matches

Mail list logo