On 2016-11-30 13:57, Ronald S. Bultje wrote:
> On Wed, Nov 30, 2016 at 7:10 AM, James Darnley <jdarn...@obe.tv> wrote:
>>> Nehalem:
>>>  - sse2:
>>>    - complex: 4.13x faster (1514 vs. 367 cycles)
>>>    - simple:  4.38x faster (1836 vs. 419 cycles)
>>>
>>> Haswell:
>>>  - sse2:
>>>    - complex: 3.61x faster ( 936 vs. 260 cycles)
>>>    - simple:  3.97x faster (1126 vs. 284 cycles)
>>>  - avx (versus sse2):
>>>    - complex: 1.07x faster (260 vs. 244 cycles)
>>>    - simple:  1.03x faster (284 vs. 274 cycles)
>>
>> I included the sse2 results for the Haswell to show that the avx is
>> (slightly) better.
>
> Ah! Now it makes sense. I had no idea why your SSE2 results changed from
> 367 (SSE2 vs. C) to 260 cycles (AVX vs. SSE2).

Great. If there are no further comments I will push later tonight.
First I need to correct the micro-architecture names. Then I will
rebase onto the latest master and push.
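
For anyone re-reading the numbers above, here is a minimal sketch of how the "Nx faster" figures relate to the cycle counts, assuming each figure is simply baseline cycles divided by optimized cycles. The `RESULTS` table and the `speedup` helper are illustrative names, not part of the patch or of any FFmpeg tool, and the CPU labels are copied as quoted. Ratios recomputed from the rounded cycle counts can differ in the last digit from the quoted figures (936 / 260 gives 3.60, not 3.61), presumably because the quoted figures came from unrounded measurements.

```python
# Cycle counts as quoted in the thread: (baseline, optimized).
# The sse2 rows compare against C; the avx rows compare against sse2.
RESULTS = {
    ("Nehalem", "sse2 vs. C", "complex"): (1514, 367),
    ("Nehalem", "sse2 vs. C", "simple"): (1836, 419),
    ("Haswell", "sse2 vs. C", "complex"): (936, 260),
    ("Haswell", "sse2 vs. C", "simple"): (1126, 284),
    ("Haswell", "avx vs. sse2", "complex"): (260, 244),
    ("Haswell", "avx vs. sse2", "simple"): (284, 274),
}


def speedup(baseline_cycles, optimized_cycles):
    """Return how many times faster the optimized version is."""
    return baseline_cycles / optimized_cycles


for (cpu, comparison, variant), (base, opt) in RESULTS.items():
    print(f"{cpu:8} {comparison:13} {variant:8} {speedup(base, opt):.2f}x")
```

The Haswell sse2 cycle counts (260 and 284) appear twice: once as the optimized side against C, and once as the baseline for avx, which is why the sse2 figure looks different between the two CPUs.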