Compared with the C implementation of KissFFT (it's the only one I tested on ARM M4).

Yes, there is no SIMD on x86; that was not the main target. This was mainly made for ARM M4 (BLE devices, Nordic Semi / Zephyr) and ARM Neon (Android). In any case, this does not change much: on powerful CPUs, the FFT/MDCT cost is marginal compared to reading/writing the arithmetic-coded bitstream.

We could perhaps hook up the FFmpeg implementation, but it would probably be missing two things:
- Some transform sizes are not a multiple of 15, only 5 * 2^n. I guess FFmpeg only has a base-15 implementation.
- It uses asymmetric windowing to reduce algorithmic delay, so some window coefficients are zero. Not a big deal, but without a specific implementation it needs a larger coefficients table and a bunch of multiplications by 0.

So I think it will need some work.
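The last point can be illustrated with a minimal sketch, assuming the zero coefficients of the asymmetric window sit in contiguous runs at the start and end of the 2N-sample block. `apply_sparse_window` and its `lead`/`trail` parameters are hypothetical names, not FFmpeg or KissFFT API: only the non-zero taps are stored, and the zeroed samples are written directly instead of being multiplied by 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: applies an asymmetric window of length 2*n whose
 * first `lead` and last `trail` coefficients are zero. Only the non-zero
 * part of the window is stored in `w_nz` (2*n - lead - trail entries), so
 * the table is smaller and no multiplications by 0 are performed. */
static void apply_sparse_window(const float *x, const float *w_nz,
                                size_t n, size_t lead, size_t trail,
                                float *y)
{
    size_t len = 2 * n;                 /* windowed block length (2N) */
    assert(lead + trail <= len);
    size_t end = len - trail;           /* one past the last non-zero tap */

    for (size_t i = 0; i < lead; i++)   /* leading zero region */
        y[i] = 0.0f;
    for (size_t i = lead; i < end; i++) /* non-zero taps only */
        y[i] = x[i] * w_nz[i - lead];
    for (size_t i = end; i < len; i++)  /* trailing zero region */
        y[i] = 0.0f;
}
```

A dedicated routine like this is what "a specific implementation" buys: a generic MDCT would instead need the full 2N-entry table, including the zeros.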
On Tue, Mar 26, 2024 at 10:45 AM Paul B Mahol <one...@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 6:07 PM Antoine Soulier <asoul...@google.com>
> wrote:
>
>> What do you mean by sub-optimal?
>> It's stacked by prime factors, and unrolled for FFT3 and FFT5.
>> The butterfly implementations of FFT3 and FFT5 give me slightly slower
>> computation. FFT5 is done first, so it takes advantage of sin()/cos()
>> values of 0 or 1.
>> There are also no reordering steps (this stage is completely removed),
>> but it cannot run in-place.
>> Benchmarks I made show that it runs slightly faster.
>
> Compared with what?
> Where is at least x86 SIMD for that MDCT?
>
>> On Tue, Mar 26, 2024 at 9:59 AM Paul B Mahol <one...@gmail.com> wrote:
>>
>>> Isn't this using sub-optimal MDCT implementation?

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".