Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-26 Thread Mike Stoner via ffmpeg-devel
Hello,
I’ve addressed all the feedback on this so far. Is it ready to be pushed 
upstream?

Here are my results from ‘checkasm’ (lower is better):

v210_unpack_c: 1636
v210_unpack_ssse3: 611
v210_unpack_avx: 601
v210_unpack_avx2: 423

I ran it 5 times and averaged the middle 3 results for each CPU target 
(ignoring the highest and lowest time).
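
The averaging described above is a trimmed mean: sort the five timings, drop the fastest and the slowest, and average the rest. A minimal C sketch, assuming the per-run timings are collected into an array; the helper names are hypothetical and not part of checkasm:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *pa, const void *pb)
{
    double a = *(const double *)pa, b = *(const double *)pb;
    return (a > b) - (a < b);
}

/* Mean of the runs after dropping the single fastest and slowest one.
 * Sorts a local copy so the caller's array is left untouched. */
static double trimmed_mean(const double *runs, size_t n)
{
    double sorted[64];
    double sum = 0.0;

    assert(n >= 3 && n <= 64);
    for (size_t i = 0; i < n; i++)
        sorted[i] = runs[i];
    qsort(sorted, n, sizeof(sorted[0]), cmp_double);
    for (size_t i = 1; i + 1 < n; i++)
        sum += sorted[i];
    return sum / (double)(n - 2);
}
```

For example, made-up runs of 430, 420, 500, 425 and 400 give (420 + 425 + 430) / 3 = 425.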

https://patchwork.ffmpeg.org/patch/12325/


Thanks… -Mike
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-16 Thread Mike Stoner
Hello,
I resent my AVX2 patch for v210 unpacking.  My first attempt didn't get picked 
up by the Patchwork list for some reason.

I installed Linux on a Broadwell laptop to use James Darnley's checkasm 
patch for v210 decoding.  The results are below.
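
For readers unfamiliar with the format, a scalar sketch of what the v210 unpack routine computes: each little-endian 32-bit word carries three 10-bit samples, and every 4 words (16 bytes) hold 6 pixels of 4:2:2 video. It follows the sample order of FFmpeg's C fallback for v210 decoding, but the function names are hypothetical, it assumes the words are already in host byte order, and it only handles widths that are a multiple of 6:

```c
#include <assert.h>
#include <stdint.h>

/* Split one v210 word into its three 10-bit samples (bits 0-9, 10-19,
 * 20-29) and append them to the planes *a, *b, *c in that order. */
static void read_pixels(uint32_t w, uint16_t **a, uint16_t **b, uint16_t **c)
{
    *(*a)++ = (uint16_t)(w & 0x3FF);
    *(*b)++ = (uint16_t)((w >> 10) & 0x3FF);
    *(*c)++ = (uint16_t)((w >> 20) & 0x3FF);
}

/* Scalar v210 -> planar Y/U/V (16 bits per sample) for one line. */
static void v210_unpack_sketch(const uint32_t *src, uint16_t *y,
                               uint16_t *u, uint16_t *v, int width)
{
    assert(width % 6 == 0);
    for (int i = 0; i < width; i += 6) {
        read_pixels(*src++, &u, &y, &v); /* Cb0 Y0  Cr0 */
        read_pixels(*src++, &y, &u, &y); /* Y1  Cb1 Y2  */
        read_pixels(*src++, &v, &y, &u); /* Cr1 Y3  Cb2 */
        read_pixels(*src++, &y, &v, &y); /* Y4  Cr2 Y5  */
    }
}
```

The SIMD versions compute the same result, many pixels per instruction.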

AVX2 gets a nice boost from replacing SHUFPS instructions with VPBLENDD, which 
can issue on more execution ports (on Broadwell, SHUFPS runs only on port 5, 
while VPBLENDD can use ports 0, 1 and 5).  BLENDPS (VBLENDPS in its AVX form) 
could also be substituted and is available from SSE4.1 onward; however, I found 
that only the AVX2 code received any measurable gain from that change.
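
A scalar sketch of why the substitution can work: when a SHUFPS immediate only passes elements through in place (low half from the first source, high half from the second, each at its original index), the result is a plain per-element select, which is exactly what VPBLENDD does. This assumes the shuffles in the patch have that shape; the model functions are hypothetical and ignore SHUFPS's float typing:

```c
#include <stdint.h>

/* Scalar model of VSHUFPS ymm: within each 128-bit lane, dst[0..1] are
 * picked from a and dst[2..3] from b using 2-bit selectors from imm.
 * Results go to temporaries first so dst may alias a or b. */
static void model_shufps256(uint32_t dst[8], const uint32_t a[8],
                            const uint32_t b[8], uint8_t imm)
{
    for (int lane = 0; lane < 8; lane += 4) {
        uint32_t r0 = a[lane + ((imm >> 0) & 3)];
        uint32_t r1 = a[lane + ((imm >> 2) & 3)];
        uint32_t r2 = b[lane + ((imm >> 4) & 3)];
        uint32_t r3 = b[lane + ((imm >> 6) & 3)];
        dst[lane + 0] = r0;
        dst[lane + 1] = r1;
        dst[lane + 2] = r2;
        dst[lane + 3] = r3;
    }
}

/* Scalar model of VPBLENDD ymm: bit i of imm selects b[i] (1) or a[i] (0). */
static void model_pblendd256(uint32_t dst[8], const uint32_t a[8],
                             const uint32_t b[8], uint8_t imm)
{
    for (int i = 0; i < 8; i++)
        dst[i] = ((imm >> i) & 1) ? b[i] : a[i];
}
```

Here SHUFPS with immediate 0xE4 (selectors 0, 1, 2, 3) and VPBLENDD with immediate 0xCC produce identical registers, but only the shuffle is restricted to port 5.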

Any further comments are greatly appreciated.  

Thanks,
Mike


Tested on Broadwell CPU, Ubuntu 18.10 x86_64

~/FFmpeg$ tests/checkasm/checkasm --bench --test=v210dec
benchmarking with native FFmpeg timers
nop: 94.1
checkasm: using random seed 3963743306
SSSE3:
 - v210dec.v210_unpack [OK]
AVX:
 - v210dec.v210_unpack [OK]
AVX2:
 - v210dec.v210_unpack [OK]
checkasm: all 3 tests passed
v210_unpack_c: 1625.2
v210_unpack_ssse3: 604.2
v210_unpack_avx: 592.2
v210_unpack_avx2: 422.2
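
To put the checkasm numbers above in relative terms, a trivial helper (hypothetical name) that turns two lower-is-better timings into a speedup factor:

```c
#include <assert.h>

/* Speedup of `candidate` over `baseline`, where both are timings for the
 * same work and lower is better (as in checkasm --bench output). */
static double speedup(double baseline, double candidate)
{
    assert(baseline > 0.0 && candidate > 0.0);
    return baseline / candidate;
}
```

With these figures, speedup(1625.2, 422.2) is about 3.85 (AVX2 vs C) and speedup(604.2, 422.2) is about 1.43 (AVX2 vs SSSE3).
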


Re: [FFmpeg-devel] [PATCH] Revised ff_v210_planar_unpack AVX2

2019-03-12 Thread Mike Stoner
I am submitting another patch.  Please disregard this one.

-Mike


Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-06 Thread Mike Stoner
Thanks for the feedback.  You are right, I can use VPERMQ to free up a 
register.  I can also remove the PAND mask by using PSLLD/PSRLD instead.  That 
eliminates the need for an x86-64-only block.
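
A scalar illustration of the PSLLD/PSRLD trick on a single v210 word (hypothetical helper names): shifting the wanted 10-bit field up to the top of the 32-bit element and then logically shifting it back down by 22 discards the neighboring fields, so no 0x3FF mask constant has to be kept in a register. PSLLD/PSRLD apply the same two shifts to every 32-bit element at once:

```c
#include <assert.h>
#include <stdint.h>

/* Extract 10-bit field `field` (0, 1 or 2) with a shift and an AND mask,
 * as the PAND-based code does. */
static uint32_t field_with_mask(uint32_t w, unsigned field)
{
    assert(field < 3);
    return (w >> (10 * field)) & 0x3FF;
}

/* Same extraction with two shifts and no mask: move the field to bits
 * 22-31, truncate to 32 bits, then shift it back down to bits 0-9. */
static uint32_t field_with_shifts(uint32_t w, unsigned field)
{
    assert(field < 3);
    return (uint32_t)(w << (22 - 10 * field)) >> 22;
}
```
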
I tried the naive 'unrolled' version with no permute, and it was much slower, 
about the same as the AVX/SSSE3 code.  VPERMQ/VPERMD is a single shuffle uop on 
port 5, so it turns out to be useful.
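
A scalar model of VPERMQ, to show why it helps here: AVX2 byte shuffles such as VPSHUFB only move data within each 128-bit lane, while VPERMQ can route any 64-bit element to any position across the full 256-bit register. The function name is hypothetical, and the immediates in the usage are examples, not the ones in the patch:

```c
#include <stdint.h>

/* Scalar model of VPERMQ ymm, ymm/m256, imm8: destination qword i is
 * src[(imm >> 2*i) & 3]. A temporary allows dst == src. */
static void model_vpermq(uint64_t dst[4], const uint64_t src[4], uint8_t imm)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}
```

For instance, immediate 0xD8 swaps the two middle qwords (moving data between lanes), and 0x1B reverses all four.
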
I will submit a new patch with those improvements and the VBROADCASTI128 macro.  
I modeled my code on 'v210enc.asm', which could also be updated to use 
VBROADCASTI128.
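
A scalar model of the vbroadcasti128 instruction behind that macro (the model's name is hypothetical): one 16-byte constant in memory fills both 128-bit lanes, which suits in-lane shuffle masks that must be identical in each lane:

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of VBROADCASTI128 ymm, m128: the same 16 bytes are copied
 * into the low and the high 128-bit lane. src must not overlap dst. */
static void model_vbroadcasti128(uint8_t dst[32], const uint8_t src[16])
{
    memcpy(dst, src, 16);
    memcpy(dst + 16, src, 16);
}
```
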
Note: I'm running on Windows, and it looks like 'checkasm' performance 
benchmarking is only enabled on Linux.  For my tests I put a 100x loop around 
the 'unpack_frame' call and ran:

    ffmpeg.exe -s:v 1920x1080 -vcodec v210 -stream_loop 200 -i OddaView_1920x1080.v210 -f null -y NUL

If there is a better way, let me know...
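
For timing outside checkasm, a minimal sketch of the kind of repeat-loop described above, using only standard C clock(). The harness and the placeholder workload are hypothetical; clock() measures processor time at coarse resolution, so this is only useful for comparing large repeat counts, not for cycle-accurate numbers like checkasm's:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Placeholder workload standing in for one unpack call: adds 0..999
 * into the accumulator that arg points to. */
static void busy_work(void *arg)
{
    unsigned long *acc = arg;
    for (unsigned long i = 0; i < 1000; i++)
        *acc += i;
}

/* Call fn(arg) `reps` times and return the average processor time per
 * call in seconds. */
static double time_per_call(void (*fn)(void *), void *arg, int reps)
{
    assert(fn != NULL && reps > 0);
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```
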
Thanks,
Mike