On 2017-02-06 00:16:41 +0200, Martin Storsjö wrote:
>
> Ok, so after running a slightly shorter clip (which seems to have about as
> large percentage of runtime doing IDCT as the previous one) with a bit more
> iterations, I've got the following results (the 'user' part from 'time
> avconv -threads 1 -i foo -f null -'):
>
> 32 orig  32 alt1  32 alt2  64 orig  64 alt1  64 alt2
> 40.436s  40.148s  40.008s  37.428s  37.356s  37.192s
> 40.596s  40.140s  40.216s  37.572s  37.524s  37.384s
> 40.512s  40.228s  40.188s  37.740s  37.588s  37.368s
> 40.584s  40.136s  40.216s  37.880s  37.492s  37.348s
> 40.572s  40.292s  40.232s  37.756s  37.556s  37.676s
> 40.764s  40.312s  40.232s  37.876s  37.640s  37.468s
> 40.688s  40.284s  40.368s  37.972s  37.608s  37.460s
>
> So while alt2 is faster in most runs, the margin is not quite as big as in
> the previous benchmark. (The benchmarks were done on a practically unloaded
> system so it shouldn't vary too much from run to run, but in practice, the
> first few runs seem to be slightly faster than the later ones.)
>
> I.e. around 400 ms gain out of 40 s for alt1, and then another -50 - +150 ms
> speedup on top of that for alt2.
>
> What do you think?
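
As a rough cross-check of the quoted numbers, here is a small Python sketch
that averages each column and prints the per-variant gain. It is not part of
the original benchmark; the dictionary keys and variable names are only
illustrative, and the values are copied from the quoted table.

```python
from statistics import mean

# 'user' times in seconds, copied from the quoted table (one list per column).
times = {
    "32 orig": [40.436, 40.596, 40.512, 40.584, 40.572, 40.764, 40.688],
    "32 alt1": [40.148, 40.140, 40.228, 40.136, 40.292, 40.312, 40.284],
    "32 alt2": [40.008, 40.216, 40.188, 40.216, 40.232, 40.232, 40.368],
    "64 orig": [37.428, 37.572, 37.740, 37.880, 37.756, 37.876, 37.972],
    "64 alt1": [37.356, 37.524, 37.588, 37.492, 37.556, 37.640, 37.608],
    "64 alt2": [37.192, 37.384, 37.368, 37.348, 37.676, 37.468, 37.460],
}

# Mean runtime per column.
means = {name: mean(vals) for name, vals in times.items()}

for bits in ("32", "64"):
    orig = means[f"{bits} orig"]
    alt1 = means[f"{bits} alt1"]
    alt2 = means[f"{bits} alt2"]
    print(f"{bits}-bit: orig {orig:.3f}s, alt1 {alt1:.3f}s, alt2 {alt2:.3f}s")
    # Positive numbers mean the second variant is faster than the first.
    print(f"  alt1 vs orig: {(orig - alt1) * 1000:.0f} ms")
    print(f"  alt2 vs alt1: {(alt1 - alt2) * 1000:.0f} ms")
```

With only seven runs per column, and the later runs drifting slower, these
means are indicative at best; comparing row by row, as done in the quoted
message, is just as reasonable.
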
At least it looks like the difference between alt1 and alt2 is quite similar
on 32- and 64-bit. So we should use the same variant on both archs. I favor
alternate 2.

Janne