On Mon, 25 Mar 2024, at 22:56, J. Dekker wrote:
>> On Mon, 25 Mar 2024, Martin Storsjö wrote:
>>
>>> For some time now, we have had fairly complete AArch64 NEON coverage
>>> for the hevc decoder.
>>>
>>> However, some of these functions require the I8MM instruction set
>>> extension, and many of them (but not all) lack a plain NEON
>>> version.
>>>
>>> This patchset adds a plain NEON version of every function
>>> that has an I8MM implementation.
>>>
>>> For context, the I8MM instruction set extension is a mandatory
>>> part of armv8.6-a. E.g. Apple M2 and AWS Graviton 3 have it,
>>> but Apple M1 and Ampere Altra don't.
>>>
>>> This patchset takes decoding of a 1080p HEVC clip from 402
>>> fps to 649 fps on an Apple M1.
>>>
>>> Patch #2 also fixes a subtle bug in the existing implementation:
>>> two functions relied on the contents of the stack below the
>>> stack pointer remaining untouched within a function. If a signal
>>> gets delivered, those parts of the stack could be clobbered.
>>
>> I know this is a bit short notice for a patchset of this size, but would
>> people be OK with merging this patchset before the impending 7.0 branch
>> (which will be made within the next 24h)?
>>
>> The patches pass all my tricky build configurations, they give a very
>> non-negligible speedup on many common CPUs, and patch #2 fixes a real bug in
>> the existing implementations. (A bug fix patch can of course be backported
>> after the branch too, but performance optimizations generally aren't
>> candidates for backporting.)
>>
>> // Martin
>
> Yes, please. I will push it tomorrow morning if you haven't already.
+1

--
Jean-Baptiste Kempf - President
+33 672 704 734
https://jbkempf.com/
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel