PR #23371 opened by Kacper Michajłow (kasper93)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23371
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23371.patch

The AVX2 15xM PFA FFT calls its second-dimension subtransform with dirty YMM.
That subtransform may be a legacy-SSE codelet (fft4 is SSE2 only), causing
AVX<->SSE transition penalties. Clear them after the first dimension, before
the calls.

Detected with `sde64 -ast` FATE job.

Fixes: ace42cf581f8c06872bfb58cf575d9e8bd398c0a
---
For the report, see
https://fate.ffmpeg.org/report.cgi?time=20260605155641&slot=amd64-clang-sde-asm-ast

From 205c1307675cf96157b75abf34ea2b6d3b7b5844 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <[email protected]>
Date: Fri, 5 Jun 2026 20:11:46 +0200
Subject: [PATCH] avutil/x86/tx_float: add missing vzeroupper to 15xM PFA FFT

The AVX2 15xM PFA FFT calls its second-dimension subtransform with dirty YMM.
That subtransform may be a legacy-SSE codelet (fft4 is SSE2 only), causing
AVX<->SSE transition penalties. Clear them after the first dimension, before
the calls.

Detected with `sde64 -ast` FATE job.

Fixes: ace42cf581f8c06872bfb58cf575d9e8bd398c0a
---
 libavutil/x86/tx_float.asm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm
index 87be21c2d6..7dedf54312 100644
--- a/libavutil/x86/tx_float.asm
+++ b/libavutil/x86/tx_float.asm
@@ -1874,6 +1874,8 @@ cglobal fft_pfa_15xM_float, 4, 14, 16, 320, ctx, out, in, stride, len, lut, buf,
     mov lutq, [ctxq + AVTXContext.map]           ; load subtransform's map
     movsxd lenq, dword [ctxq + AVTXContext.len]  ; load subtransform's length
 
+    vzeroupper
+
 .dim2:
     call tgt5q                                   ; call the FFT
     lea inq, [inq + lenq*8]
-- 
2.52.0
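
For readers less familiar with the hazard named in the commit message, here is a minimal C sketch of the same pattern: a 256-bit AVX stage followed by a call into legacy-SSE code, with the upper YMM halves cleared in between. It is not taken from tx_float.asm. The names `avx_scale8`, `sse_add4` and `scale_then_fold` are hypothetical, and the code assumes GCC or Clang (for the `target` attributes and `__builtin_cpu_supports`) and a build without `-mavx`, so that the SSE helper really gets non-VEX encodings. Compilers normally insert `vzeroupper` on their own in cases like this; in hand-written assembly such as the patched function, the instruction has to be placed explicitly before the call.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

#if HAVE_X86
/* Hypothetical stand-in for a legacy-SSE codelet (like the SSE2-only fft4):
 * when the file is built without -mavx, this uses non-VEX SSE encodings. */
__attribute__((target("sse2"), noinline))
static void sse_add4(float *dst, const float *a, const float *b)
{
    _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Hypothetical stand-in for the AVX first dimension: it works on full
 * 256-bit registers, which leaves the upper YMM halves dirty. */
__attribute__((target("avx"), noinline))
static void avx_scale8(float *dst, const float *src, float k)
{
    __m256 v = _mm256_mul_ps(_mm256_loadu_ps(src), _mm256_set1_ps(k));
    _mm256_storeu_ps(dst, v);
    /* Clear the upper halves before any legacy-SSE code runs;
     * this intrinsic emits vzeroupper. */
    _mm256_zeroupper();
}
#endif

/* Scales 8 floats by k, then adds the two 4-float halves into out.
 * Returns 1 if the AVX + SSE path ran, 0 if the scalar fallback ran. */
int scale_then_fold(float out[4], const float in[8], float k)
{
    float tmp[8];
    int i;

    assert(out != NULL && in != NULL);

#if HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        avx_scale8(tmp, in, k);      /* AVX stage, ends with vzeroupper */
        sse_add4(out, tmp, tmp + 4); /* legacy-SSE stage sees clean state */
        return 1;
    }
#endif

    /* Portable fallback for CPUs or targets without AVX. */
    for (i = 0; i < 8; i++)
        tmp[i] = in[i] * k;
    for (i = 0; i < 4; i++)
        out[i] = tmp[i] + tmp[i + 4];
    return 0;
}
```

Calling `scale_then_fold(out, in, 2.0f)` with `in = {1, 2, 3, 4, 5, 6, 7, 8}` gives the same `out` on both paths. The clearing step changes only the transition cost, never the result, which is why a tool like `sde64 -ast` is needed to spot a missing one.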
