Hi,

Please find attached an attempt to optimise the 8-bit input path of
v210enc by reducing the number of shuffles.
The cost is that the middle element has to be extracted, shifted as a
DWORD and then reinserted.
I have added a few comments, but any other ideas are welcome.
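
For anyone not familiar with the layout, here is a rough scalar sketch of
what the 8-bit pack has to produce. It is not the code from the patch: the
helper names (to10, pack3, v210_pack6_8bit) are made up for illustration,
and it leaves out the range clipping that, as far as I recall, the real
encoder applies to each sample. Each 32-bit word carries three 10-bit
components, and the middle one sits at bits 10-19, which is presumably the
element the SIMD version has to treat separately.

```c
#include <stdint.h>

/* Widen an 8-bit sample to the 10-bit range used by v210. */
static uint32_t to10(uint8_t v)
{
    return (uint32_t)v << 2;
}

/* Pack three 10-bit components into one 32-bit word, lowest bits first.
 * The top two bits of the word stay zero. */
static uint32_t pack3(uint32_t a, uint32_t b, uint32_t c)
{
    return a | (b << 10) | (c << 20);
}

/* Pack 6 pixels of 4:2:2 video (6 Y, 3 Cb, 3 Cr samples) into four
 * 32-bit words. The words are host-order values here; v210 stores them
 * little-endian in the bitstream. */
static void v210_pack6_8bit(const uint8_t *y, const uint8_t *u,
                            const uint8_t *v, uint32_t dst[4])
{
    dst[0] = pack3(to10(u[0]), to10(y[0]), to10(v[0]));
    dst[1] = pack3(to10(y[1]), to10(u[1]), to10(y[2]));
    dst[2] = pack3(to10(v[1]), to10(y[3]), to10(u[2]));
    dst[3] = pack3(to10(y[4]), to10(v[2]), to10(y[5]));
}
```

In terms of the original 8-bit samples, the three components of a word end
up shifted left by 2, 12 and 22 bits respectively.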

Crude benchmarks on Intel(R) Xeon(R) D-2123IT:

Before:

v210_planar_pack_8_ssse3: 316.5
v210_planar_pack_8_avx: 319.0
v210_planar_pack_8_avx2: 223.0

After:

v210_planar_pack_8_ssse3: 321.0
v210_planar_pack_8_avx: 326.0
v210_planar_pack_8_avx2: 217.0
v210_planar_pack_8_avx512: 211.0

Regards,
Kieran Kunhya

Attachment: 0001-RFC-v210enc-optimisations-and-initial-AVX-512.patch
Description: Binary data

