Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
2012/12/21 Justin Ruggles:
> If you are unable to test AVX, I can go through your functions after
> they're committed and test to see if using AVX helps. If you have some
> set of samples which utilize specific functions that would be helpful.

It would be simpler for everyone if I could, and it's a pity I can't,
but I think I'll have to pass for now. Unfortunately, the same issue
may arise for other patches in my backlog.

For samples, either the fate suite (al-sbr* and sbr* files - no make
target), or the 5.1 first audio track from
http://samples.mplayerhq.hu/A-codecs/AAC/zx.eva.renewal.01.divx511.mkv
which I initially used.

--
Christophe
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
On 12/21/2012 12:39 PM, Christophe Gisquet wrote:
> 2012/12/20 Justin Ruggles:
>> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
>> would allow some 3-arg XMM AVX to be used, like so:
> [...]
> Tested that, no change in generated code, and indeed no change
> speedwise/fate result.
>
> Updated patch attached.

LGTM.

If you are unable to test AVX, I can go through your functions after
they're committed and test to see if using AVX helps. If you have some
set of samples which utilize specific functions that would be helpful.

Thanks,
Justin
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
2012/12/20 Justin Ruggles:
> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
> would allow some 3-arg XMM AVX to be used, like so:
[...]

Tested that, no change in generated code, and indeed no change
speedwise/fate result.

Updated patch attached.

--
Christophe

0002-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Hi,

On 12/01/2012 06:17 AM, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z

INIT_XMM sse

> +    lea       r2q, [zq + (64-4)*4]
> +.loop:
> +    mova      m0, [r2q]
> +    mova      m1, [zq ]
> +    xorps     m0, [ps_neg]
> +    shufps    m0, m0, 0x1B
> +    mova      m2, m0
> +    unpcklps  m0, m1
> +    unpckhps  m2, m1
> +    mova      [Wq + 0], m0
> +    mova      [Wq + 16], m2

putting [ps_neg] in a register and switching m0 and m2 in the unpacking
would allow some 3-arg XMM AVX to be used, like so:

    mova      m3, [ps_neg]
.loop:
    mova      m1, [zq]
    xorps     m0, m3, [r2q]
    shufps    m0, m0, m0, q0123
    unpcklps  m2, m0, m1
    unpckhps  m0, m0, m1
    mova      [Wq + 0], m2
    mova      [Wq + 16], m0

> +    add       Wq, 32
> +    sub       r2q, 16
> +    add       zq, 16
> +    cmp       zq, r2q
> +    jl        .loop
> +    REP_RET

Thanks,
Justin
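For readers comparing the two versions: the q0123 constant in the suggestion and the 0x1B immediate in the patch should be the same shuffle. Below is a small C sketch of how a qABCD name maps to an immediate, assuming the usual x86inc convention of four 2-bit lane selectors with the leftmost digit in the top bits. The helper names shuffle_imm and shuffle4 are made up for illustration and are not part of libav.

```c
#include <assert.h>

/* Pack four 2-bit lane selectors the way a qABCD name is read here:
 * digit a lands in bits 7..6 (destination lane 3), d in bits 1..0 (lane 0).
 * Hypothetical helper, for illustration only. */
static unsigned shuffle_imm(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 4 && b < 4 && c < 4 && d < 4);
    return (a << 6) | (b << 4) | (c << 2) | d;
}

/* Model of shufps when both source operands are the same register:
 * destination lane n takes the source lane named by bits 2n+1..2n of imm.
 * (The real instruction reads lanes 0-1 from the first source and lanes
 * 2-3 from the second; with identical sources that distinction vanishes.) */
static void shuffle4(float dst[4], const float src[4], unsigned imm)
{
    for (int lane = 0; lane < 4; lane++)
        dst[lane] = src[(imm >> (2 * lane)) & 3];
}
```

Under that reading, shuffle_imm(0, 1, 2, 3) gives 0x1B, and applying it with shuffle4 reverses the four lanes, which is what the loop needs before interleaving.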
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
2012/12/7 Christophe Gisquet:
> 2012/12/1 Christophe Gisquet:
>> 2012/11/30 Christophe Gisquet:
>>>> 4 space tabs.
>>
>> Done.
>
> Given the results for the parts where a change was investigated, is
> there a need for another review?

Ping2

--
Christophe
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
2012/12/1 Christophe Gisquet:
> 2012/11/30 Christophe Gisquet:
>>> 4 space tabs.
>
> Done.

Given the results for the parts where a change was investigated, is
there a need for another review?

--
Christophe
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
2012/11/30 Christophe Gisquet:
>> 4 space tabs.

Done.

--
Christophe

0003-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
2012/12/1 Ronald S. Bultje:
> Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

No change. On that CPU and OS, I observed some strange behaviour:
- complex addressing sometimes has a noticeable cost
- aligning jump targets rarely helps
- trying to hide the latency of operations often doesn't result in
  measurable changes (but I guess it does on other CPUs)

All in all, I could maybe have recovered those cycles by doing a
reverse scan, but at this point the validation didn't seem worth the
very marginal and hypothetical gain.

--
Christophe
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet wrote:
> Hello,
>
> 2012/11/30 Loren Merritt:
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
>     mov       r3q, 32*4
>     lea       r2q, [zq + (64-4)*4]
>     add       zq, r3q
>     lea       Wq, [Wq + 2*r3q]
>     neg       r3q
> .loop:
>     mova      m0, [r2q]
>     mova      m1, [zq + r3q]
>     xorps     m0, [ps_neg]
>     shufps    m0, m0, 0x1B
>     mova      m2, m0
>     unpcklps  m0, m1
>     unpckhps  m2, m1
>     mova      [Wq + 2*r3q + 0], m0
>     mova      [Wq + 2*r3q + 16], m2
>     sub       r2q, 16
>     add       r3q, 16
>     jl        .loop
>     REP_RET
>
> It's 2 cycles slower on Penryn/Win64 (154 vs 152).

Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

Ronald
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Christophe Gisquet writes:

>> 4 space tabs.
>
> OK, I was a bit puzzled and looking for trailing whitespaces/... You
> mean style change then.
> A bit cumbersome to redo all patches because of that.

:%s/^ //

--
Måns Rullgård
m...@mansr.com
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Hello,

2012/11/30 Loren Merritt:
> If you increment an index into W and z rather than the pointers
> themselves, then you can eliminate an add and a cmp.

I had already tested that, and redid it:
cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
    mov       r3q, 32*4
    lea       r2q, [zq + (64-4)*4]
    add       zq, r3q
    lea       Wq, [Wq + 2*r3q]
    neg       r3q
.loop:
    mova      m0, [r2q]
    mova      m1, [zq + r3q]
    xorps     m0, [ps_neg]
    shufps    m0, m0, 0x1B
    mova      m2, m0
    unpcklps  m0, m1
    unpckhps  m2, m1
    mova      [Wq + 2*r3q + 0], m0
    mova      [Wq + 2*r3q + 16], m2
    sub       r2q, 16
    add       r3q, 16
    jl        .loop
    REP_RET

It's 2 cycles slower on Penryn/Win64 (154 vs 152).

> 4 space tabs.

OK, I was a bit puzzled and looking for trailing whitespaces/... You
mean style change then.
A bit cumbersome to redo all patches because of that.

--
Christophe
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> +        lea       r2q, [zq + (64-4)*4]
> +.loop:
> +        mova      m0, [r2q]
> +        mova      m1, [zq ]
> +        xorps     m0, [ps_neg]
> +        shufps    m0, m0, 0x1B
> +        mova      m2, m0
> +        unpcklps  m0, m1
> +        unpckhps  m2, m1
> +        mova      [Wq + 0], m0
> +        mova      [Wq + 16], m2
> +        add       Wq, 32
> +        sub       r2q, 16
> +        add       zq, 16
> +        cmp       zq, r2q
> +        jl        .loop
> +        REP_RET

If you increment an index into W and z rather than the pointers
themselves, then you can eliminate an add and a cmp.

4 space tabs.

--Loren Merritt
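The indexing idea above can be sketched in C. This is only an illustration of the loop shape, not the committed code: the function name qmf_post_shuffle_indexed is hypothetical, and the sketch works one element at a time rather than four. Both base pointers are moved to the end of their ranges once, and a single negative index counts up to zero, so the loop test falls out of the increment itself (the role played by "add r3q, 16 / jl .loop" in the assembly variant). The backward read from the top of z still needs its own pointer.

```c
#include <stddef.h>

/* Computes W[k] = { -z[63 - k], z[k] } for k = 0..31 using one shared
 * negative index for the forward accesses. Hypothetical name. */
static void qmf_post_shuffle_indexed(float W[32][2], const float *z)
{
    const float *z_end = z + 32;    /* one past the last forward input  */
    float (*W_end)[2]  = W + 32;    /* one past the last output pair    */
    const float *z_rev = z + 63;    /* backward input, stepped separately */

    /* i runs from -32 up to -1; the loop ends when it reaches zero. */
    for (ptrdiff_t i = -32; i < 0; i++) {
        W_end[i][0] = -*z_rev--;
        W_end[i][1] = z_end[i];
    }
}
```

Whether this form is actually faster depends on the CPU, since the shared index turns simple addresses into base-plus-index ones.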
[libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
On Penryn, from 255 to 174c.
Unrolling yields no gain.
---
 libavcodec/x86/sbrdsp.asm    |   21 +++++++++++++++++++++
 libavcodec/x86/sbrdsp_init.c |    2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 11a6faf..2b90100 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -24,6 +24,8 @@ SECTION_RODATA
 ; mask equivalent for multiply by -1.0 1.0
 ps_mask         times 2 dd 1<<31, 0
+ps_mask2        times 2 dd 0, 1<<31
+ps_neg          times 4 dd 1<<31

 SECTION_TEXT

@@ -203,3 +205,22 @@ cglobal sbr_sum64x5, 1,2,4,z
     cmp       zq, r1q
     jne       .loop
     REP_RET
+
+cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
+        lea       r2q, [zq + (64-4)*4]
+.loop:
+        mova      m0, [r2q]
+        mova      m1, [zq ]
+        xorps     m0, [ps_neg]
+        shufps    m0, m0, 0x1B
+        mova      m2, m0
+        unpcklps  m0, m1
+        unpckhps  m2, m1
+        mova      [Wq + 0], m0
+        mova      [Wq + 16], m2
+        add       Wq, 32
+        sub       r2q, 16
+        add       zq, 16
+        cmp       zq, r2q
+        jl        .loop
+        REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 108a681..3f6dd97 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -31,6 +31,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
                        const float alpha0[2], const float alpha1[2],
                        float bw, int start, int end);
 void ff_sbr_sum64x5_sse(float *z);
+void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);

 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -41,5 +42,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
         s->hf_g_filt = ff_sbr_hf_g_filt_sse;
         s->hf_gen = ff_sbr_hf_gen_sse;
         s->sum64x5 = ff_sbr_sum64x5_sse;
+        s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
     }
 }
--
1.7.7.msysgit.0
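As a reading aid, here is a scalar C sketch of what the SSE loop in this patch appears to compute, worked out from the instruction sequence rather than copied from the tree. The names flip_sign and qmf_post_shuffle_ref are hypothetical. The xorps against ps_neg (1<<31 in every lane) flips the IEEE-754 sign bit, the shufps reverses the four lanes taken from the top of z, and the two unpack instructions interleave them with the lanes taken from the bottom of z.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Negate a float by toggling bit 31, as the xorps with ps_neg does.
 * Hypothetical helper; assumes 32-bit IEEE-754 floats. */
static float flip_sign(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT32_C(1) << 31;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Scalar equivalent of the loop (hypothetical name):
 * W[k] = { -z[63 - k], z[k] } for k = 0..31. */
static void qmf_post_shuffle_ref(float W[32][2], const float *z)
{
    for (int k = 0; k < 32; k++) {
        W[k][0] = flip_sign(z[63 - k]);
        W[k][1] = z[k];
    }
}
```

A reference like this is handy for checking an assembly variant against known inputs before running the full fate suite.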