> On Mon, 25 Mar 2024, Martin Storsjö wrote:
> 
>> Since some time, we have pretty complete AArch64 NEON coverage
>> for the hevc decoder.
>> 
>> However, some of these functions require the I8MM instruction set
>> extension, and many of them (but not all) lack a plain NEON
>> version.
>> 
>> This patchset fills in a regular NEON version of all functions
>> where we have an I8MM function.
>> 
>> For context; the I8MM instruction set extension is a mandatory
>> part of armv8.6-a. E.g. Apple M2, AWS Graviton 3 have it,
>> but Apple M1 and Ampere Altra don't.
>> 
>> This patchset takes decoding of a 1080p HEVC clip from 402
>> fps to 649 fps on an Apple M1.
>> 
>> Patch #2 also fixes a subtle bug in the existing implementation;
>> two functions relied on the contents on the stack, below the
>> stack pointer, being untouched within a function. If a signal
>> gets delivered, those parts of the stack could be clobbered.
> 
> I know this is a bit short notice for a patchset of this size - but, would 
> people be OK with merging this patchset before the impending 7.0 branch 
> (which is made within the next 24h)?
> 
> The patches pass all my tricky build configurations, they give a very 
> non-negligible speedup on many common CPUs, and patch #2 fixes a real bug in 
> the existing impleemntations. (A bug fix patch can of course be backported 
> after the branch too, but performance optimizations aren't generally relevant 
> for backporting.)
> 
> // Martin

Yes, please. I will tomorrow morning if you didn’t already push.
-- 
jd
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to