On Fri, 20 Aug 2021, Mikhail Nitenko wrote:

> Benchmarks:                                             A53     A72
> h264_idct4_add_10bpp_c:                                187.7   115.2
> h264_idct4_add_10bpp_neon:                              72.5    45.0
> h264_idct4_add_dc_10bpp_c:                              96.0    61.2
> h264_idct4_add_dc_10bpp_neon:                           36.0    19.5
> h264_idct8_add4_10bpp_c:                              2115.5  1424.2
> h264_idct8_add4_10bpp_neon:                            734.0   459.5
> h264_idct8_add_10bpp_c:                               1017.5   709.0
> h264_idct8_add_10bpp_neon:                             345.5   216.5
> h264_idct8_add_dc_10bpp_c:                             316.0   235.5
> h264_idct8_add_dc_10bpp_neon:                           69.7    44.0
> h264_idct_add16_10bpp_c:                              2540.2  1498.5
> h264_idct_add16_10bpp_neon:                           1080.5   616.0
> h264_idct_add16intra_10bpp_c:                          784.7   439.5
> h264_idct_add16intra_10bpp_neon:                       641.0   462.2
>
> Signed-off-by: Mikhail Nitenko <mnite...@gmail.com>
> ---
>
> there is a function that is not covered by tests, but I tested it with
> sample videos, not sure what to do with it

It would be really good to add a checkasm test for it. Assembly without checkasm tests can hide lots of bugs that only get uncovered by later compiler updates (although this function seems fairly straightforward). I'm not saying that is the case here, but without a checkasm test we don't know.
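
To illustrate the principle only: the sketch below is standalone C, not the checkasm API, and every name in it is hypothetical. It assumes a 10-bit DC-add routine with 16-bit pixels and 32-bit coefficients, and a plain C function stands in for the assembly. The idea is to run a reference and an optimized version on the same random input and require bit-exact output, including the padding outside the 4x4 block and the cleared coefficient.

```c
#include <stdint.h>
#include <string.h>

#define PIXEL_MAX 1023 /* assumption: 10-bit samples */

static uint16_t clip_pixel(int v)
{
    return v < 0 ? 0 : v > PIXEL_MAX ? PIXEL_MAX : (uint16_t)v;
}

/* Reference: add the rounded DC value to a 4x4 block and clear the
 * coefficient. The stride is in pixels. The shift of a negative value
 * assumes an arithmetic right shift, as on common compilers. */
static void idct_dc_add_ref(uint16_t *dst, int32_t *block, int stride)
{
    int dc = (block[0] + 32) >> 6;
    block[0] = 0;
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 4; x++)
            dst[x] = clip_pixel(dst[x] + dc);
}

/* Hypothetical stand-in for the optimized (e.g. NEON) version. */
static void idct_dc_add_opt(uint16_t *dst, int32_t *block, int stride)
{
    int dc = (block[0] + 32) >> 6;
    block[0] = 0;
    for (int y = 0; y < 4; y++) {
        uint16_t *row = dst + y * stride;
        row[0] = clip_pixel(row[0] + dc);
        row[1] = clip_pixel(row[1] + dc);
        row[2] = clip_pixel(row[2] + dc);
        row[3] = clip_pixel(row[3] + dc);
    }
}

/* Small deterministic generator so that failures are reproducible. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;
}

/* Returns how many random inputs gave differing results. */
int count_mismatches(uint32_t seed, int iterations)
{
    enum { STRIDE = 8, ROWS = 4 }; /* stride wider than the block */
    int mismatches = 0;

    for (int i = 0; i < iterations; i++) {
        uint16_t dst_ref[STRIDE * ROWS], dst_opt[STRIDE * ROWS];
        int32_t blk_ref[16] = { 0 }, blk_opt[16] = { 0 };

        for (int k = 0; k < STRIDE * ROWS; k++)
            dst_ref[k] = (uint16_t)(lcg_next(&seed) & PIXEL_MAX);
        memcpy(dst_opt, dst_ref, sizeof(dst_ref));
        blk_ref[0] = blk_opt[0] = (int32_t)(lcg_next(&seed) % 131072) - 65536;

        idct_dc_add_ref(dst_ref, blk_ref, STRIDE);
        idct_dc_add_opt(dst_opt, blk_opt, STRIDE);

        /* Comparing whole buffers also catches writes outside the 4x4
         * block and a coefficient that was not cleared. */
        if (memcmp(dst_ref, dst_opt, sizeof(dst_ref)) ||
            memcmp(blk_ref, blk_opt, sizeof(blk_ref)))
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(1, 1000) should return 0 as long as the two versions agree. A real checkasm test additionally checks things like clobbered callee-saved registers, which a plain C harness like this can't do.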

Overall the patch seems fine; the code is a fairly 1:1 copy of the existing code but with wider SIMD elements, and I presume that the range of values doesn't allow keeping anything in a narrower form.

> +function ff_h264_idct_add8_neon_10, export=1 // NO TESTS but test video looks fine (did not look fine before the fixes so it is definitely working somehow)

I'm not quite sure what you mean here. Did it look wrong before implementing this function (in which case we'd have a bug in the C code)? Or did it look wrong with a broken version of this assembly function, and look right once you got it right?

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".