Adds NEON unscaled paths for {yuv420p, yuv422p, yuva420p, nv12, nv21}
-> {rgb565le, bgr565le, rgb555le, bgr555le}, extending the 24/32bpp
NEON conversions from 7fab0becab.
Speedup over the C reference at width=1920 on Apple M1 (checkasm --bench):
| input    | rgb565le | bgr565le | rgb555le | bgr555le |
|----------|----------|----------|----------|----------|
| yuv420p  | 3.69x    | 3.68x    | 3.28x    | 3.31x    |
| yuv422p  | 4.70x    | 4.70x    | 4.32x    | 4.35x    |
| yuva420p | 3.67x    | 3.66x    | 3.32x    | 3.27x    |
| nv12     | bench    | bench    | bench    | bench    |
| nv21     | bench    | bench    | bench    | bench    |
NEON cycles are ~48 for planar and ~50.5 for semi-planar inputs across
all four outputs. yuv422p shows the largest speedup because its C
reference is the slowest. The 555 ratios trail the 565 ratios because
the C reference is faster for 555 (one fewer mask bit); the NEON cycle
counts are the same.
The 16bpp packing uses v8/v9 as accumulators, which clobbers d8/d9.
AAPCS-64 requires d8-d15 to be callee-saved, so declare_func now wraps
an stp/ldp of d8, d9 around the 16bpp paths only, gated by .ifc on the
output format. Other paths are untouched. This follows the same pattern
as libswscale/aarch64/hscale.S.
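For readers unfamiliar with the 16bpp layouts, here is a minimal scalar
sketch of what the packing step produces per pixel. The function names
are hypothetical and are not the pack_rgb16 macro from this patch; the
sketch assumes plain truncation of each 8-bit channel and ignores any
rounding or dithering the real paths may apply.

```c
#include <stdint.h>

/* RGB565: R in bits 15..11, G in bits 10..5, B in bits 4..0.
 * Hypothetical helper, truncating each 8-bit channel. */
static inline uint16_t pack_rgb565_sketch(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* RGB555: bit 15 unused, R in bits 14..10, G in bits 9..5, B in bits 4..0.
 * Green keeps 5 bits here instead of 6. */
static inline uint16_t pack_rgb555_sketch(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}
```

The bgr565/bgr555 variants follow by swapping the r and b arguments,
e.g. pack_rgb565_sketch(b, g, r).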
LE-only. Apple Silicon is always LE; a BE follow-up would need only one
rev16 before the store.
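As a scalar illustration of why a single rev16 would suffice: rev16
swaps the two bytes inside every 16-bit element, which is exactly the
difference between the LE and BE forms of a packed 16bpp pixel. The
helpers below are hypothetical and only sketch that byte swap in C; they
are not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of one 16-bit pixel; rev16 does this to every
 * halfword of a vector register at once. */
static inline uint16_t swap16_sketch(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Hypothetical helper: convert a row of packed 16bpp pixels between
 * the LE and BE layouts in place. */
static void swap_row16_sketch(uint16_t *row, size_t n)
{
    for (size_t i = 0; i < n; i++)
        row[i] = swap16_sketch(row[i]);
}
```

Applying the swap twice returns the original value, so the same
operation converts in either direction.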
nv12/nv21 are bench-only in checkasm because ff_get_unscaled_swscale
wires the C yuv2rgb fast path only for {YUV420P, YUV422P, YUVA420P}.
The NEON wrappers run (clobber detection + cycle counts) but have no
C reference to compare against. They share pack_rgb16 and the
compute_rgb macro with the verified planar paths, and FATE exercises
them end-to-end.
Tests:
- checkasm --test=sw_yuv2rgb: 110/110 (was 44/44; +66 from the new
16bpp outputs across yuv420p/yuv422p/yuva420p plus the new nv12
and nv21 suites)
- full checkasm: 7657/7657 (baseline 7589)
- make fate: clean
DROOdotFOO (1):
swscale/aarch64: add NEON yuv->rgb16 fast paths
libswscale/aarch64/swscale_unscaled.c | 47 ++++++++
libswscale/aarch64/yuv2rgb_neon.S | 147 ++++++++++++++++++++++++++
tests/checkasm/sw_yuv2rgb.c | 13 ++-
3 files changed, 205 insertions(+), 2 deletions(-)
--
2.50.1 (Apple Git-155)
_______________________________________________
ffmpeg-devel mailing list