On Fri, 20 Aug 2021, Mikhail Nitenko wrote:
Benchmarks: A53 A72
h264_idct4_add_10bpp_c: 187.7 115.2
h264_idct4_add_10bpp_neon: 72.5 45.0
h264_idct4_add_dc_10bpp_c: 96.0 61.2
h264_idct4_add_dc_10bpp_neon: 36.0 19.5
h264_idct8_add4_10bpp_c: 2115.5 1424.2
h264_idct8_add4_10bpp_neon: 734.0 459.5
h264_idct8_add_10bpp_c: 1017.5 709.0
h264_idct8_add_10bpp_neon: 345.5 216.5
h264_idct8_add_dc_10bpp_c: 316.0 235.5
h264_idct8_add_dc_10bpp_neon: 69.7 44.0
h264_idct_add16_10bpp_c: 2540.2 1498.5
h264_idct_add16_10bpp_neon: 1080.5 616.0
h264_idct_add16intra_10bpp_c: 784.7 439.5
h264_idct_add16intra_10bpp_neon: 641.0 462.2
Signed-off-by: Mikhail Nitenko <mnite...@gmail.com>
---
there is a function that is not covered by tests, but I tested it with
sample videos, not sure what to do with it
It would be really good to add a checkasm test for it, because assembly
without checkasm tests can have lots of hidden bugs (although it seems
fairly straightforward here) that only get uncovered by later compiler
updates. Not saying that it is the case here, but without a checkasm test
we don't know.
Overall the patch seems fine, the code is fairly 1:1 copy of the existing
but with wider SIMD elements, and I presume that the range of values don't
allow keeping anything in more narrow form.
+function ff_h264_idct_add8_neon_10, export=1 // NO TESTS but test video
looks fine (did not look fine before the fixes so it is definitely
working somehow)
I'm not quite sure what you mean here - did it look wrong before
implementing this function - then we'd have a bug in the C code? Or did it
look wrong with a broken version of this assembly function, and look right
after getting it right?
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".