Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

Reimar Döffinger Fri, 15 Jan 2021 17:13:27 -0800


> On 15 Jan 2021, at 23:55, Martin Storsjö <[email protected]> wrote:
> 
> On Tue, 12 Jan 2021, [email protected] wrote:
> 
>> create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
>> create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
> 
> This patch fails checkasm


Fixed, one mis-translated coefficient index...

>> 
>> +.macro fixsqrshrn d, dt, n, m
>> +  .ifc \dt, .8H
>> +        sqrshrn2        \d\dt, \n\().4S, \m
>> +  .else
>> +        sqrshrn         \n\().4H, \n\().4S, \m
>> +        mov             \d\().D[0], \n\().D[0]
>> +  .endif
>> +.endm
> 
> Is this to set the lower half of the dest register, without wiping the upper 
> part? It looks a bit clumsy (and stall-prone on in-order cores), but I guess 
> it's not easy to do things differently without rewriting the higher level 
> structure?

Did not have a good idea, but I was also aiming for keeping the structure of 
the 32-bit and 64-bit code similar.
In particular since I did not know about checkasm and expected some further 
painful debug.

> 
>> +.macro transpose_8x8 r0, r1, r2, r3, r4, r5, r6, r7
>> +        transpose8_4x4  \r0, \r1, \r2, \r3
>> +        transpose8_4x4  \r4, \r5, \r6, \r7
>> +.endm
>> +
> 
> There's a bunch of existing transposes in libavcodec/aarch64/neon.S - can 
> they be used? Not that they're rocket science, though... And I see that the 
> existing arm version also has got its own transpose macros.
> 
> If it's inconvenient to use shared macros, this is fine.

They are different and seem to not be documented, so it would take some
time to figure out how to replace them.
There’s also a bit of a question if I’d want to give up alignment
with the 32-bit code just yet.


> 
>> +        // Transpose each 4x4 block, and swap how d4-d7 and d8-d11 are used.
>> +        // Layout before:
>> +        // d0  d1
>> +        // d2  d3
>> +        // d4  d5
>> +        // d6  d7
>> +        // d8  d9
>> +        // d10 d11
>> +        // d12 d13
>> +        // d14 d15
> 
> These layouts don't look like they're up to date for the aarch64 version?

Removed in new version as it seems not that useful.

>> 
>> +        vst1.s32        {\in7}, [r3, :128]
>> +.endm
> 
> This is left behind untranslated (and thus unused)

Removed. I am not sure if that means it’s also unused in the 32-bit version.

>> +
>> +        movrel          x1, trans + 16
> 
> The movrel macro has got a separate third optional argument for the offset, 
> so write this "movrel x1, trans, 16". (Older versions of llvm are picky with 
> the symbol offset syntax and break with this, as the macro right now adds its 
> own implicit +(0) here. If you pass the offset in the macro parameter, all 
> the offsets get passed within the parentheses.

Changed.

>> +.macro idct_16x16 bitdepth
>> +function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1
>> +//r0 - coeffs
>> +        mov             x15, lr
>> +
> 
> Binutils doesn't recognize "lr" as alias for x30

It didn’t have an issue in the Debian unstable VM?
That seems like the kind of workaround where it would be
better to leave a comment with more info, if you know
what exactly is affected.

_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.

Reply via email to