On Sun, 17 May 2026, DROOdotFOO via ffmpeg-devel wrote:
> Add NEON unscaled converters from {yuv420p, yuv422p, yuva420p, nv12, nv21}
> to {rgb565le, bgr565le, rgb555le, bgr555le}.
>
> The 16bpp packing uses v8/v9 as the output accumulator. AAPCS64 requires
> the callee to preserve the low 64 bits of v8-v15 (d8-d15), so declare_func
> now saves and restores d8/d9 with an stp d8, d9 / ldp d8, d9 pair around
> the 16bpp paths only (gated by .ifc on the output format). The pattern
> matches libswscale/aarch64/hscale.S.
>
> yuva420p -> 16bpp drops alpha and routes through the yuv420p wrappers,
> mirroring how yuva420p -> rgb24/bgr24 already work in tree.
>
> Verified with checkasm --test=sw_yuv2rgb (110/110) and the full checkasm
> regression (7657/7657) on Apple M1. Cycle counts and the speedup table
> are in the cover letter.
Hi,

Thanks for your patch! Can you sign up at https://code.ffmpeg.org and
send the patch as a PR at https://code.ffmpeg.org/ffmpeg/ffmpeg?

From a brief look through, the assembly seems fine.

Can you split out the changes to the checkasm test into a preceding
commit, with a separate commit message explaining what that change does
and why?

The benchmark numbers in the cover letter look good, but ideally I want
them preserved in the commit history as well, so I'd recommend adding
them to the message of the actual commit, not just to the cover letter
(or the PR description on Forgejo).

// Martin
_______________________________________________
ffmpeg-devel mailing list