[FFmpeg-devel] [PATCH 1/2] lavu/tx: rewrite internal code as a tree-based codelet constructor

2022-01-21 Thread Lynne
This commit rewrites the internal transform code into a constructor
that stitches transforms (codelets).
This allows transforms to reuse arbitrary parts of other
transforms, and allows transforms to be stacked onto one
another (such as a full iMDCT using a half-iMDCT, which in turn
uses an FFT). It also permits each step to be individually
replaced by assembly or a custom implementation (such as an ASIC).

Patch attached.

From a5e4bb3e1bde245264c6a4cde5c8db3162ddfa5f Mon Sep 17 00:00:00 2001
From: Lynne 
Date: Thu, 20 Jan 2022 07:14:46 +0100
Subject: [PATCH 1/2] lavu/tx: rewrite internal code as a tree-based codelet
 constructor

This commit rewrites the internal transform code into a constructor
that stitches transforms (codelets).
This allows transforms to reuse arbitrary parts of other
transforms, and allows transforms to be stacked onto one
another (such as a full iMDCT using a half-iMDCT, which in turn
uses an FFT). It also permits each step to be individually
replaced by assembly or a custom implementation (such as an ASIC).
---
 libavutil/Makefile            |    4 +-
 libavutil/tx.c                |  483 +---
 libavutil/tx.h                |    3 +
 libavutil/tx_priv.h           |  180 +++--
 libavutil/tx_template.c       | 1356 -
 libavutil/x86/tx_float.asm    |  111 +--
 libavutil/x86/tx_float_init.c |  170 +++--
 7 files changed, 1526 insertions(+), 781 deletions(-)

diff --git a/libavutil/Makefile b/libavutil/Makefile
index d17876df1a..22a7b15f61 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -170,8 +170,8 @@ OBJS = adler32.o\
tea.o\
tx.o \
tx_float.o   \
-   tx_double.o  \
-   tx_int32.o   \
+#   tx_double.o  \
+#   tx_int32.o   \
video_enc_params.o   \
film_grain_params.o  \
 
diff --git a/libavutil/tx.c b/libavutil/tx.c
index fa81ada2f1..28fe6c55b9 100644
--- a/libavutil/tx.c
+++ b/libavutil/tx.c
@@ -17,8 +17,9 @@
  */
 
 #include "tx_priv.h"
+#include "qsort.h"
 
-int ff_tx_type_is_mdct(enum AVTXType type)
+static av_always_inline int type_is_mdct(enum AVTXType type)
 {
     switch (type) {
     case AV_TX_FLOAT_MDCT:
@@ -42,22 +43,26 @@ static av_always_inline int mulinv(int n, int m)
 }
 
 /* Guaranteed to work for any n, m where gcd(n, m) == 1 */
-int ff_tx_gen_compound_mapping(AVTXContext *s)
+int ff_tx_gen_compound_mapping(AVTXContext *s, int n, int m)
 {
     int *in_map, *out_map;
-    const int n = s->n;
-    const int m = s->m;
-    const int inv   = s->inv;
-    const int len   = n*m;
-    const int m_inv = mulinv(m, n);
-    const int n_inv = mulinv(n, m);
-    const int mdct  = ff_tx_type_is_mdct(s->type);
-
-    if (!(s->pfatab = av_malloc(2*len*sizeof(*s->pfatab))))
+    const int inv = s->inv;
+    const int len = n*m;    /* Will not be equal to s->len for MDCTs */
+    const int mdct = type_is_mdct(s->type);
+    int m_inv, n_inv;
+
+    /* Make sure the numbers are coprime */
+    if (av_gcd(n, m) != 1)
+        return AVERROR(EINVAL);
+
+    m_inv = mulinv(m, n);
+    n_inv = mulinv(n, m);
+
+    if (!(s->map = av_malloc(2*len*sizeof(*s->map))))
         return AVERROR(ENOMEM);
 
-    in_map  = s->pfatab;
-    out_map = s->pfatab + n*m;
+    in_map  = s->map;
+    out_map = s->map + len;
 
     /* Ruritanian map for input, CRT map for output, can be swapped */
     for (int j = 0; j < m; j++) {
@@ -92,48 +97,50 @@ int ff_tx_gen_compound_mapping(AVTXContext *s)
     return 0;
 }
 
-static inline int split_radix_permutation(int i, int m, int inverse)
+static inline int split_radix_permutation(int i, int len, int inv)
 {
-    m >>= 1;
-    if (m <= 1)
+    len >>= 1;
+    if (len <= 1)
         return i & 1;
-    if (!(i & m))
-        return split_radix_permutation(i, m, inverse) * 2;
-    m >>= 1;
-    return split_radix_permutation(i, m, inverse) * 4 + 1 - 2*(!(i & m) ^ inverse);
+    if (!(i & len))
+        return split_radix_permutation(i, len, inv) * 2;
+    len >>= 1;
+    return split_radix_permutation(i, len, inv) * 4 + 1 - 2*(!(i & len) ^ inv);
 }
 
 int ff_tx_gen_ptwo_revtab(AVTXContext *s, int invert_lookup)
 {
-    const int m = s->m, inv = s->inv;
+    int len = s->len;
 
-    if (!(s->revtab = av_malloc(s->m*sizeof(*s->revtab))))
-        return AVERROR(ENOMEM);
-    if (!(s->revtab_c = av_malloc(m*sizeof(*s->revtab_c))))
+    if (!(s->map = av_malloc(len*sizeof(*s->map))))
         return AVERROR(ENOMEM);
 
-    /*

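The renamed split_radix_permutation() can also be exercised on its own. The sketch below copies the post-patch function verbatim and wraps it in a hypothetical build_ptwo_map() that turns it into a lookup table for a power-of-two length. How ff_tx_gen_ptwo_revtab() actually uses the value is cut off in the quoted patch, so negating the result and masking it with len - 1, and the meaning given here to invert_lookup, are assumptions on our part. The accompanying check only verifies that the table is a permutation of [0, len).

```c
#include <assert.h>
#include <stdlib.h>

/* Copied verbatim from the post-patch tx.c hunk. */
static inline int split_radix_permutation(int i, int len, int inv)
{
    len >>= 1;
    if (len <= 1)
        return i & 1;
    if (!(i & len))
        return split_radix_permutation(i, len, inv) * 2;
    len >>= 1;
    return split_radix_permutation(i, len, inv) * 4 + 1 - 2*(!(i & len) ^ inv);
}

/* Hypothetical helper: fill map[0..len) for a power-of-two len.
 * The negate-and-mask step and the invert_lookup semantics are assumptions. */
static int build_ptwo_map(int *map, int len, int inv, int invert_lookup)
{
    assert(map);
    if (len < 1 || (len & (len - 1)))
        return -1;  /* not a power of two */
    for (int i = 0; i < len; i++)
        map[i] = -1;
    for (int i = 0; i < len; i++) {
        int k = -split_radix_permutation(i, len, inv) & (len - 1);
        if (invert_lookup)
            map[i] = k;  /* forward table: i -> k */
        else
            map[k] = i;  /* inverse table: k -> i */
    }
    return 0;
}

/* Returns 1 if map[0..len) holds every value in [0, len) exactly once. */
static int is_permutation(const int *map, int len)
{
    char *seen = calloc((size_t)len, 1);
    int ok = seen != NULL;
    for (int i = 0; ok && i < len; i++) {
        if (map[i] < 0 || map[i] >= len || seen[map[i]])
            ok = 0;
        else
            seen[map[i]] = 1;
    }
    free(seen);
    return ok;
}
```

For len = 4 and inv = 0 the forward table is {0, 2, 1, 3}. The permutation property follows from the recursion: indices with the top bit clear land on even positions (twice a half-length permutation), and the rest land on positions congruent to 1 or 3 mod 4.
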
Re: [FFmpeg-devel] [PATCH 1/2] lavu/tx: rewrite internal code as a tree-based codelet constructor

2022-01-21 Thread Lynne
21 Jan 2022, 09:33 by d...@lynne.ee:

> Patch attached.
>

I forgot that I had disabled the double and int32 transforms to speed up
testing; they are re-enabled locally and on my GitHub tx_tree branch.
I also removed some inactive debug code.
https://github.com/cyanreg/FFmpeg/tree/tx_tree
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/2] lavu/tx: rewrite internal code as a tree-based codelet constructor

2022-01-25 Thread Lynne
21 Jan 2022, 09:51 by d...@lynne.ee:

> I forgot that I had disabled the double and int32 transforms to speed up
> testing; they are re-enabled locally and on my GitHub tx_tree branch.
> I also removed some inactive debug code.
> https://github.com/cyanreg/FFmpeg/tree/tx_tree
>

I fixed bugs and improved the code more, and I think it's ready
for merging now.
The RDFT is no longer bound by any particular convention, and its
scale may be set by the user, which eliminates the after-transform
multiplies that are used pretty much everywhere in our code.

It would be nice if someone (looks at Paul) gave it a test or converted a filter.
I've only tested it on my synthetic benchmarks:
https://github.com/cyanreg/lavu_fft_test

I plan to push the patchset tomorrow unless there are comments.
I'm mostly done with the aarch64 SIMD; a patch is coming soon, hopefully.


Re: [FFmpeg-devel] [PATCH 1/2] lavu/tx: rewrite internal code as a tree-based codelet constructor

2022-01-25 Thread Paul B Mahol
On Tue, Jan 25, 2022 at 11:46 AM Lynne  wrote:

> I fixed bugs and improved the code more, and I think it's ready
> for merging now.
> The RDFT is no longer bound by any particular convention, and its
> scale may be set by the user, which eliminates the after-transform
> multiplies that are used pretty much everywhere in our code.
>
> It would be nice if someone (looks at Paul) gave it a test or converted a filter.
> I've only tested it on my synthetic benchmarks:
> https://github.com/cyanreg/lavu_fft_test


Will try it once it's applied. Thanks.

>
>
> I plan to push the patchset tomorrow unless there are comments.
> I'm mostly done with the aarch64 SIMD; a patch is coming soon, hopefully.


Re: [FFmpeg-devel] [PATCH 1/2] lavu/tx: rewrite internal code as a tree-based codelet constructor

2022-01-25 Thread Lynne
25 Jan 2022, 18:17 by one...@gmail.com:

> On Tue, Jan 25, 2022 at 11:46 AM Lynne  wrote:
>
> Will try it once it's applied. Thanks.
>

Applied.
It's around 20% faster than lavc's rdft for power-of-two lengths.
Non-power-of-two lengths are partially SIMD'd, so they're usable too.
I'll SIMD the small O(n) rdft loop once I'm done with the NEON and
PFA SIMD. If you find bugs, ping me on IRC.