vc1: Introduce fast path for unescaping bitstream buffer

Ben Avison Mon, 21 Mar 2022 08:51:21 -0700

On 18/03/2022 19:10, Andreas Rheinhardt wrote:

Ben Avison:

+static int vc1_unescape_buffer_neon(const uint8_t *src, int size, uint8_t *dst)
+{
+    /* Dealing with starting and stopping, and removing escape bytes, are
+     * comparatively less time-sensitive, so are more clearly expressed using
+     * a C wrapper around the assembly inner loop. Note that we assume a
+     * little-endian machine that supports unaligned loads. */


You should nevertheless use AV_RL32 for your unaligned LE loads


Thanks - I wasn't aware of that. I'll add it in.
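
For anyone following along, here is a minimal sketch of what an endian-safe 32-bit load buys in this context. AV_RL32 is FFmpeg's macro; load_le32() and starts_with_escape() below are hypothetical stand-ins written for illustration, not code from the patch. The second helper assumes the escape pattern being searched for is 00 00 03 followed by a byte in the range 0..3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AV_RL32(): assemble a 32-bit
 * little-endian value byte by byte, so the result depends neither on host
 * endianness nor on the alignment of p. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 if the four bytes at p are 00 00 03 xx with xx <= 3.
 * Clearing the low two bits of the top byte folds xx = 0..3 into 0, so a
 * single comparison covers all four cases. */
static int starts_with_escape(const uint8_t *p)
{
    assert(p != NULL);
    return (load_le32(p) & ~UINT32_C(0x03000000)) == UINT32_C(0x00030000);
}
```

Building the value from individual bytes leaves it to the compiler to emit a single unaligned load on targets where that is safe.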

1. You should add some benchmarks to the commit message.

Do you mean for each commit, or this one in particular? Are there any particular standard files you'd expect to see benchmarked, or will the ones I used in the cover-letter do? (Those were just snippets from problematic BluRay rips, but that does mean I don't have the rights to redistribute them.) I believe there should be conformance bitstreams for VC-1 somewhere, but I wasn't able to locate them.

During development, I wrote a simple benchmarker for this particular patch, which measures the throughput of processing random data (which doesn't contain the escape sequence at any point). I've just pushed it here if anyone's interested:
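
As a rough illustration of that kind of input (this is an assumed approach, not the code in the linked repository), escape-free random data can be produced by patching any 0x03 that lands after two zero bytes. The function name and the seed parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill buf with pseudo-random bytes that never contain 00 00 03, so an
 * unescaping routine run over it always stays on its copy-only path. */
static void fill_random_no_escape(uint8_t *buf, size_t size, unsigned seed)
{
    assert(buf != NULL || size == 0);
    srand(seed);
    for (size_t i = 0; i < size; i++) {
        buf[i] = (uint8_t)(rand() & 0xFF);
        /* A 0x03 after two zero bytes would start an escape sequence;
         * overwrite it with a value that cannot. */
        if (i >= 2 && buf[i] == 3 && buf[i - 1] == 0 && buf[i - 2] == 0)
            buf[i] = 0xFF;
    }
}
```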


https://github.com/bavison/test-unescape

The compile-time define VERSION there takes a few different values:

1: the original C implementation of vc1_unescape_buffer()
2: an early prototype version I wrote that uses unaligned 32-bit loads, again in pure C
3: the NEON assembly versions

The sort of speeds this measures are:
            AArch32    AArch64
version 1   210 MB/s   292 MB/s
version 2   461 MB/s   435 MB/s
version 3  1294 MB/s  1554 MB/s
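
For readers who don't have the code to hand, a byte-at-a-time unescape of the kind being measured as the baseline looks roughly like the sketch below. It is written from my understanding of the rule (drop a 0x03 that follows two zero bytes and precedes a byte in the range 0..3) and is not a copy of the libavcodec function; unescape_bytewise() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative byte-wise unescape: copies src to dst, dropping each 0x03
 * that follows two zero bytes and precedes a byte in the range 0..3.
 * dst must have room for size bytes. Returns the number of bytes written. */
static size_t unescape_bytewise(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < size; i++) {
        if (src[i] == 3 && i >= 2 && src[i - 1] == 0 && src[i - 2] == 0 &&
            i + 1 < size && src[i + 1] < 4) {
            /* Skip the escape byte and emit the byte it protects. */
            dst[out++] = src[++i];
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

Every input byte goes through several comparisons here, which is why a wider check that rules out whole words at once pays off on data that rarely contains the escape sequence.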

2. The unescaping process for VC1 is basically the same as for H.264 and
HEVC* and for those we already have better optimized code in
libavcodec/h2645_parse.c. Can you check the performance of this code
here against (re)using the code from h2645_parse.c?

I've hacked that around a bit to match the calling conditions of vc1_unescape_buffer(), though not adapted it for the slightly different rules you noted for VC-1 as opposed to H.264/265. Hopefully it should still give some indication of the approximate performance that could be expected, but I didn't take time to fully understand everything it was doing, so do please say if I've messed something up.


This can be selected by #defining VERSION 4:

            AArch32    AArch64
version 4   737 MB/s  1286 MB/s

This suggests it's much better than the original C, but my NEON versions still have the edge, especially on AArch32. The NEON code is very much a brute-force check, but it's effectively able to do the testing in parallel with the memcpy - each byte only gets loaded once.


Ben

Re: [FFmpeg-devel] [PATCH 6/6] avcodec/vc1: Introduce fast path for unescaping bitstream buffer
