> On 15 Jan 2021, at 23:55, Martin Storsjö <mar...@martin.st> wrote: > > On Tue, 12 Jan 2021, reimar.doeffin...@gmx.de wrote: > >> create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S >> create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c > > This patch fails checkasm
Fixed, one mis-translated coefficient index... >> >> +.macro fixsqrshrn d, dt, n, m >> + .ifc \dt, .8H >> + sqrshrn2 \d\dt, \n\().4S, \m >> + .else >> + sqrshrn \n\().4H, \n\().4S, \m >> + mov \d\().D[0], \n\().D[0] >> + .endif >> +.endm > > Is this to set the lower half of the dest register, without wiping the upper > part? It looks a bit clumsy (and stall-prone on in-order cores), but I guess > it's not easy to do things differently without rewriting the higher level > structure? Did not have a good idea, but I was also aiming for keeping the structure of the 32-bit and 64-bit code similar. In particular since I did not know about checkasm and expected some further painful debug. > >> +.macro transpose_8x8 r0, r1, r2, r3, r4, r5, r6, r7 >> + transpose8_4x4 \r0, \r1, \r2, \r3 >> + transpose8_4x4 \r4, \r5, \r6, \r7 >> +.endm >> + > > There's a bunch of existing transposes in libavcodec/aarch64/neon.S - can > they be used? Not that they're rocket science, though... And I see that the > existing arm version also has got its own transpose macros. > > If it's inconvenient to use shared macros, this is fine. They are different and seem to not be documented, so it would take some time to figure out how to replace them. There’s also a bit of a question if I’d want to give up alignment with the 32-bit code just yet. > >> + // Transpose each 4x4 block, and swap how d4-d7 and d8-d11 are used. >> + // Layout before: >> + // d0 d1 >> + // d2 d3 >> + // d4 d5 >> + // d6 d7 >> + // d8 d9 >> + // d10 d11 >> + // d12 d13 >> + // d14 d15 > > These layouts don't look like they're up to date for the aarch64 version? Removed in new version as it seems not that useful. >> >> + vst1.s32 {\in7}, [r3, :128] >> +.endm > > This is left behind untranslated (and thus unused) Removed. I am not sure if that means it’s also unused in the 32-bit version. >> + >> + movrel x1, trans + 16 > > The movrel macro has got a separate third optional argument for the offset, > so write this "movrel x1, trans, 16". (Older versions of llvm are picky with > the symbol offset syntax and break with this, as the macro right now adds its > own implicit +(0) here. If you pass the offset in the macro parameter, all > the offsets get passed within the parentheses. Changed. >> +.macro idct_16x16 bitdepth >> +function ff_hevc_idct_16x16_\bitdepth\()_neon, export=1 >> +//r0 - coeffs >> + mov x15, lr >> + > > Binutils doesn't recognize "lr" as alias for x30 It didn’t have an issue in the Debian unstable VM? That seems like the kind of workaround where it would be better to leave a comment with more info, if you know what exactly is affected. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".