
Martin Storsjö via ffmpeg-devel Tue, 19 May 2026 01:59:28 -0700

On Sun, 17 May 2026, DROOdotFOO via ffmpeg-devel wrote:

Add NEON unscaled converters for {yuv420p, yuv422p, yuva420p, nv12, nv21}
to {rgb565le, bgr565le, rgb555le, bgr555le}.
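For readers less familiar with the four 16bpp targets, here is a minimal scalar C sketch of their bit layouts. It assumes plain truncation of 8-bit R, G, B with no dithering. The in-tree C converters may dither, so this is not bit-exact with swscale. The helper names (pack_rgb565, store_le16, etc.) are hypothetical and are not swscale functions. The layouts follow FFmpeg's pixel format descriptions: rgb565 is (msb) 5R 6G 5B (lsb), rgb555 has an unused top bit, the bgr variants swap R and B, and the "le" suffix means the 16-bit word is stored low byte first.

```c
#include <stdint.h>

/* Hypothetical helpers: each 8-bit channel is truncated (no dithering). */

/* rgb565: (msb) 5R 6G 5B (lsb) */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* bgr565: (msb) 5B 6G 5R (lsb), i.e. rgb565 with R and B swapped */
static uint16_t pack_bgr565(uint8_t r, uint8_t g, uint8_t b)
{
    return pack_rgb565(b, g, r);
}

/* rgb555: (msb) 1X 5R 5G 5B (lsb); the unused top bit is written as 0 */
static uint16_t pack_rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* bgr555: (msb) 1X 5B 5G 5R (lsb) */
static uint16_t pack_bgr555(uint8_t r, uint8_t g, uint8_t b)
{
    return pack_rgb555(b, g, r);
}

/* "le": the 16-bit word is stored low byte first in memory. */
static void store_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
}
```

For example, pure red (255, 0, 0) packs to 0xF800 in rgb565 and to 0x001F in bgr565. As rgb565le, it is written to memory as the bytes 00 F8.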

The 16bpp packing uses v8/v9 as the output accumulator. Since AAPCS64
requires the low 64 bits of v8-v15 (d8-d15) to be callee-saved,
declare_func now wraps the 16bpp paths, and only those (gated by .ifc on
the output format), in a stp d8, d9 / ldp d8, d9 pair. This follows the
same pattern as libswscale/aarch64/hscale.S.
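Saving only d8/d9 is enough, even though the kernel uses v8/v9 as full 128-bit registers. The reason is that AAPCS64 only requires the low 64 bits of v8-v15 to be preserved across a call. The C model below is illustrative only. The vreg type, the kernel_16bpp function and the 32-entry register array are hypothetical stand-ins for the real NEON state. Comments mark where a stp/ldp pair would sit, and the addressing mode shown is just an example.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit NEON register v<n>:
 * lo is the d<n> view (bits 0-63), hi is bits 64-127. */
typedef struct {
    uint64_t lo, hi;
} vreg;

/* Hypothetical stand-in for a 16bpp kernel that uses v8/v9 as its output
 * accumulator. AAPCS64 only obliges the callee to preserve the low 64 bits
 * of v8-v15, so saving and restoring d8/d9 (the lo halves) is sufficient
 * even though all 128 bits get overwritten. */
static void kernel_16bpp(vreg v[32])
{
    /* prologue, e.g. stp d8, d9, [sp, #-16]! */
    uint64_t saved_d8 = v[8].lo;
    uint64_t saved_d9 = v[9].lo;

    /* body: v8/v9 used as full-width accumulators (clobbers all 128 bits) */
    v[8].lo = v[8].hi = UINT64_C(0x1111111111111111);
    v[9].lo = v[9].hi = UINT64_C(0x2222222222222222);

    /* epilogue, e.g. ldp d8, d9, [sp], #16 */
    v[8].lo = saved_d8;
    v[9].lo = saved_d9;
}
```

After the call, the caller sees its d8/d9 values intact. The upper halves of v8/v9 are left clobbered, which the calling convention permits.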

yuva420p -> 16bpp drops alpha and routes through the yuv420p wrappers,
mirroring how yuva420p -> rgb24/bgr24 already work in tree.
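Here is a sketch of the routing described above. The enum and function names are hypothetical, and swscale's actual setup code is organized differently. yuva420p shares the Y/U/V plane layout of yuv420p and only adds a fourth alpha plane. When the output has no alpha channel, it can therefore fall through to the yuv420p kernel, which simply never reads plane 3.

```c
/* Hypothetical source/kernel identifiers; not swscale's real names. */
typedef enum {
    SRC_YUV420P,
    SRC_YUV422P,
    SRC_YUVA420P,
    SRC_NV12,
    SRC_NV21,
    SRC_OTHER
} src_fmt;

typedef enum {
    KERNEL_NONE,      /* no NEON fast path: leave it to the generic code */
    KERNEL_YUV420P,
    KERNEL_YUV422P,
    KERNEL_NV12,
    KERNEL_NV21
} kernel_id;

/* Pick a kernel for a 16bpp (rgb565/bgr565/rgb555/bgr555) destination. */
static kernel_id pick_16bpp_kernel(src_fmt src)
{
    switch (src) {
    case SRC_YUVA420P:
        /* Y/U/V planes are laid out exactly as in yuv420p and the 16bpp
         * output has no alpha, so the alpha plane is just ignored. */
    case SRC_YUV420P:
        return KERNEL_YUV420P;
    case SRC_YUV422P:
        return KERNEL_YUV422P;
    case SRC_NV12:
        return KERNEL_NV12;
    case SRC_NV21:
        return KERNEL_NV21;
    default:
        return KERNEL_NONE;
    }
}
```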

Verified with checkasm --test=sw_yuv2rgb (110/110) and the full
checkasm regression (7657/7657) on Apple M1. Cycle counts and the
speedup table are in the cover letter.
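The core of a checkasm test is to run the C reference and the optimized function on the same random input and compare the outputs. The following self-contained C sketch illustrates that idea; it is not checkasm itself. Both row functions are hypothetical scalar stand-ins. The "optimized" one processes 8 pixels per step with a scalar tail to mimic the shape of a NEON loop. Widths 1 to 67 are swept so that tails that are not a multiple of 8 get exercised.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: pack one row of 8-bit R, G, B into rgb555. */
static void ref_row_rgb555(uint16_t *dst, const uint8_t *r, const uint8_t *g,
                           const uint8_t *b, size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = (uint16_t)(((r[x] >> 3) << 10) | ((g[x] >> 3) << 5) | (b[x] >> 3));
}

/* Hypothetical "optimized" stand-in: 8 pixels per step (like one iteration
 * over 8 x 16-bit lanes), using masks instead of shift pairs, then a tail. */
static void opt_row_rgb555(uint16_t *dst, const uint8_t *r, const uint8_t *g,
                           const uint8_t *b, size_t width)
{
    size_t x = 0;
    for (; x + 8 <= width; x += 8)
        for (size_t i = x; i < x + 8; i++)
            dst[i] = (uint16_t)(((r[i] & 0xF8) << 7) | ((g[i] & 0xF8) << 2) | (b[i] >> 3));
    for (; x < width; x++)
        dst[x] = (uint16_t)(((r[x] & 0xF8) << 7) | ((g[x] & 0xF8) << 2) | (b[x] >> 3));
}

/* checkasm-style comparison: identical pseudo-random input to both variants,
 * outputs compared for every width from 1 to MAXW. Returns 1 if all match. */
static int rows_match(uint32_t seed)
{
    enum { MAXW = 67 };
    uint8_t r[MAXW], g[MAXW], b[MAXW];
    uint16_t ref[MAXW], opt[MAXW];

    for (size_t i = 0; i < MAXW; i++) {
        seed = seed * 1664525u + 1013904223u;   /* simple LCG */
        r[i] = (uint8_t)(seed >> 24);
        g[i] = (uint8_t)(seed >> 16);
        b[i] = (uint8_t)(seed >> 8);
    }
    for (size_t w = 1; w <= MAXW; w++) {
        ref_row_rgb555(ref, r, g, b, w);
        opt_row_rgb555(opt, r, g, b, w);
        for (size_t i = 0; i < w; i++)
            if (ref[i] != opt[i])
                return 0;
    }
    return 1;
}
```

If the 8-wide loop or its tail disagreed with the reference at any width, rows_match would return 0. This is the same kind of failure a "110/110" pass count rules out for the real kernels.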

Hi,

Thanks for your patch! Can you sign up at https://code.ffmpeg.org and send the patch as a PR at https://code.ffmpeg.org/ffmpeg/ffmpeg?

From a brief browse through, I think the assembly seems fine.

Can you split out the changes to the checkasm test into a preceding commit, with a separate commit message explaining the what and why for that change specifically?

The benchmark numbers in the cover letter look good, but ideally I want them preserved in the commit history as well, so I'd recommend adding them to the message of the actual commit, not just in the cover letter (or PR description on Forgejo).


// Martin


[FFmpeg-devel] Re: [PATCH v1 1/1] swscale/aarch64: add NEON yuv->rgb16 fast paths
