PR #23371 opened by Kacper Michajłow (kasper93)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23371
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23371.patch

The AVX2 15xM PFA FFT calls its second-dimension subtransform with the
upper halves of the YMM registers dirty. That subtransform may be a
legacy-SSE codelet (fft4 is SSE2 only), which causes AVX<->SSE
transition penalties. Clear the upper halves with vzeroupper after the
first dimension, before the calls.

Detected with `sde64 -ast` FATE job.

Fixes: ace42cf581f8c06872bfb58cf575d9e8bd398c0a

--- 

For the report, see:
https://fate.ffmpeg.org/report.cgi?time=20260605155641&slot=amd64-clang-sde-asm-ast
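
For readers less familiar with the issue, here is a minimal C sketch of
the same pattern: 256-bit AVX work, then `vzeroupper`, then a call into
code that may use legacy SSE encodings. All function names are
hypothetical and nothing here is taken from libavutil. C compilers
normally insert `vzeroupper` on their own, so the explicit
`_mm256_zeroupper()` only mirrors what hand-written assembly has to do
itself. The sketch assumes GCC or Clang; other targets take the scalar
path.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX_PATH 1
#else
#define SKETCH_HAVE_AVX_PATH 0
#endif

/* Hypothetical stand-in for a leaf codelet built without AVX, the role
 * the SSE2-only fft4 plays in the commit message. */
static float sum4_legacy(const float *v)
{
    return v[0] + v[1] + v[2] + v[3];
}

#if SKETCH_HAVE_AVX_PATH
/* First stage: 256-bit work that leaves the upper YMM halves dirty. */
__attribute__((target("avx")))
static void scale8_avx(float *v, float k)
{
    __m256 x = _mm256_loadu_ps(v);
    x = _mm256_mul_ps(x, _mm256_set1_ps(k));
    _mm256_storeu_ps(v, x);

    /* Clear the upper halves before any non-VEX code runs, so the
     * legacy-SSE stage below sees a clean state. */
    _mm256_zeroupper();
}
#endif

/* Scales eight floats in place, then sums them with the legacy leaf. */
float scale_then_sum(float *v, float k)
{
    assert(v != NULL);

#if SKETCH_HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        scale8_avx(v, k);
    } else
#endif
    {
        /* Scalar fallback for CPUs or targets without AVX. */
        for (int i = 0; i < 8; i++)
            v[i] *= k;
    }

    /* Second stage: may be encoded with legacy SSE instructions. */
    return sum4_legacy(v) + sum4_legacy(v + 4);
}
```

The patch applies the same ordering in assembly: a single `vzeroupper`
placed before the `.dim2` loop, so it runs once rather than before every
call.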


From 205c1307675cf96157b75abf34ea2b6d3b7b5844 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kacper=20Michaj=C5=82ow?= <[email protected]>
Date: Fri, 5 Jun 2026 20:11:46 +0200
Subject: [PATCH] avutil/x86/tx_float: add missing vzeroupper to 15xM PFA FFT

The AVX2 15xM PFA FFT calls its second-dimension subtransform with the
upper halves of the YMM registers dirty. That subtransform may be a
legacy-SSE codelet (fft4 is SSE2 only), which causes AVX<->SSE
transition penalties. Clear the upper halves with vzeroupper after the
first dimension, before the calls.

Detected with `sde64 -ast` FATE job.

Fixes: ace42cf581f8c06872bfb58cf575d9e8bd398c0a
---
 libavutil/x86/tx_float.asm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm
index 87be21c2d6..7dedf54312 100644
--- a/libavutil/x86/tx_float.asm
+++ b/libavutil/x86/tx_float.asm
@@ -1874,6 +1874,8 @@ cglobal fft_pfa_15xM_float, 4, 14, 16, 320, ctx, out, in, stride, len, lut, buf,
     mov lutq, [ctxq + AVTXContext.map]              ; load subtransform's map
     movsxd lenq, dword [ctxq + AVTXContext.len]     ; load subtransform's length
 
+    vzeroupper
+
 .dim2:
     call tgt5q                                      ; call the FFT
     lea inq,  [inq  + lenq*8]
-- 
2.52.0
