On Mon, 25 Mar 2024, Martin Storsjö wrote:

Since some time, we have pretty complete AArch64 NEON coverage
for the hevc decoder.

However, some of these functions require the I8MM instruction set
extension, and many of them (but not all) lack a plain NEON
version.

This patchset fills in a regular NEON version of all functions
where we have an I8MM function.

For context; the I8MM instruction set extension is a mandatory
part of armv8.6-a. E.g. Apple M2, AWS Graviton 3 have it,
but Apple M1 and Ampere Altra don't.

This patchset takes decoding of a 1080p HEVC clip from 402
fps to 649 fps on an Apple M1.

Patch #2 also fixes a subtle bug in the existing implementation;
two functions relied on the contents on the stack, below the
stack pointer, being untouched within a function. If a signal
gets delivered, those parts of the stack could be clobbered.

I know this is a bit short notice for a patchset of this size - but, would people be OK with merging this patchset before the impending 7.0 branch (which is made within the next 24h)?

The patches pass all my tricky build configurations, they give a very non-negligible speedup on many common CPUs, and patch #2 fixes a real bug in the existing impleemntations. (A bug fix patch can of course be backported after the branch too, but performance optimizations aren't generally relevant for backporting.)

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to