Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)

Janne Grunau Wed, 08 Feb 2017 23:57:29 -0800

On 2017-02-06 00:16:41 +0200, Martin Storsjö wrote:
> 
> Ok, so after running a slightly shorter clip (which seems to have about as
> large percentage of runtime doing IDCT as the previous one) with a bit more
> iterations, I've got the following results (the 'user' part from 'time
> avconv -threads 1 -i foo -f null -'):
> 
> 32 orig   32 alt1   32 alt2   64 orig   64 alt1   64 alt2
> 40.436s   40.148s   40.008s   37.428s   37.356s   37.192s
> 40.596s   40.140s   40.216s   37.572s   37.524s   37.384s
> 40.512s   40.228s   40.188s   37.740s   37.588s   37.368s
> 40.584s   40.136s   40.216s   37.880s   37.492s   37.348s
> 40.572s   40.292s   40.232s   37.756s   37.556s   37.676s
> 40.764s   40.312s   40.232s   37.876s   37.640s   37.468s
> 40.688s   40.284s   40.368s   37.972s   37.608s   37.460s
> 
> So while alt2 is faster in most runs, the margin is not quite as big as in
> the previous benchmark. (The benchmarks were done on a practically unloaded
> system so it shouldn't vary too much from run to run, but in practice, the
> first few runs seem to be slightly faster than the later ones.)
> 
> I.e. around 400 ms gain out of 40 s for alt1, and then another -50 to
> +150 ms speedup on top of that for alt2.
> 
> What do you think?


At least it looks like the difference between alt1 and alt2 is quite
similar on 32- and 64-bit, so we should use the same variant on both
archs. I favor alternative 2.
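
To make the comparison a bit less eyeball-based, here is a minimal sketch
that averages the seven runs quoted above and prints the mean gain of each
variant in milliseconds. The timings are copied from the table; the names
`times`, `means` and `gain_ms` are made up for illustration and are not part
of the patch or of any benchmark tooling.

```python
from statistics import mean

# 'user' times in seconds, copied from the quoted table (7 runs per column).
times = {
    "32 orig": [40.436, 40.596, 40.512, 40.584, 40.572, 40.764, 40.688],
    "32 alt1": [40.148, 40.140, 40.228, 40.136, 40.292, 40.312, 40.284],
    "32 alt2": [40.008, 40.216, 40.188, 40.216, 40.232, 40.232, 40.368],
    "64 orig": [37.428, 37.572, 37.740, 37.880, 37.756, 37.876, 37.972],
    "64 alt1": [37.356, 37.524, 37.588, 37.492, 37.556, 37.640, 37.608],
    "64 alt2": [37.192, 37.384, 37.368, 37.348, 37.676, 37.468, 37.460],
}

# Mean runtime per column, in seconds.
means = {name: mean(vals) for name, vals in times.items()}


def gain_ms(slow, fast):
    """Mean time saved, in milliseconds, when going from `slow` to `fast`."""
    return (means[slow] - means[fast]) * 1000.0


for bits in ("32", "64"):
    alt1_gain = gain_ms(bits + " orig", bits + " alt1")
    alt2_gain = gain_ms(bits + " alt1", bits + " alt2")
    print(f"{bits}-bit: alt1 vs orig {alt1_gain:.0f} ms, "
          f"alt2 vs alt1 {alt2_gain:.0f} ms")
```

With only seven runs per column and the drift between early and late runs
mentioned above, the means are only a rough guide; the alt2 vs alt1
difference in particular is close to the run-to-run noise.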

Janne

Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)