Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Christophe Gisquet
2012/12/21 Justin Ruggles:
> If you are unable to test AVX, I can go through your functions after
> they're committed and test to see if using AVX helps. If you have some
> set of samples which utilize specific functions, that would be helpful.

It would be simpler for everyone if I could, and it's a pity I can't,
but I think I'll have to pass for now. Unfortunately, the same issue
may arise for other patches in my backlog.

For samples, either the fate suite (al-sbr* and sbr* files - no make
target), or the 5.1 first audio track from
http://samples.mplayerhq.hu/A-codecs/AAC/zx.eva.renewal.01.divx511.mkv
which I initially used.

-- 
Christophe
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Justin Ruggles
On 12/21/2012 12:39 PM, Christophe Gisquet wrote:
> 2012/12/20 Justin Ruggles:
>> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
>> would allow some 3-arg XMM AVX to be used, like so:
> [...]
> Tested that, no change in generated code, and indeed no change
> speedwise/fate result.
> 
> Updated patch attached.

LGTM.

If you are unable to test AVX, I can go through your functions after
they're committed and test to see if using AVX helps. If you have some
set of samples which utilize specific functions, that would be helpful.

Thanks,
Justin


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-21 Thread Christophe Gisquet
2012/12/20 Justin Ruggles:
> putting [ps_neg] in a register and switching m0 and m2 in the unpacking
> would allow some 3-arg XMM AVX to be used, like so:
[...]
Tested that, no change in generated code, and indeed no change
speedwise/fate result.

Updated patch attached.
-- 
Christophe


0002-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-20 Thread Justin Ruggles
Hi,

On 12/01/2012 06:17 AM, Christophe Gisquet wrote:
> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z

INIT_XMM sse

> +lea   r2q, [zq + (64-4)*4]
> +.loop:
> +mova   m0, [r2q]
> +mova   m1, [zq ]
> +xorps  m0, [ps_neg]
> +shufps m0, m0, 0x1B
> +mova   m2, m0
> +unpcklps   m0, m1
> +unpckhps   m2, m1
> +mova  [Wq +  0], m0
> +mova  [Wq + 16], m2

putting [ps_neg] in a register and switching m0 and m2 in the unpacking
would allow some 3-arg XMM AVX to be used, like so:

mova   m3, [ps_neg]
.loop:
mova   m1, [zq]
xorps  m0, m3, [r2q]
shufps m0, m0, m0, q0123
unpcklps   m2, m0, m1
unpckhps   m0, m0, m1
mova  [Wq +  0], m2
mova  [Wq + 16], m0

> +add    Wq, 32
> +sub   r2q, 16
> +add    zq, 16
> +cmp    zq, r2q
> +jl  .loop
> +REP_RET
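
For anyone following the register shuffling, here is a minimal scalar sketch in C of what the loop appears to compute, read directly off the instructions quoted in this message. The function names `qmf_post_shuffle_ref` and `qmf_post_shuffle_lanes` are made up for illustration and are not part of libav; the second function models each vector instruction lane by lane, assuming 4-float XMM registers and that XORing with `ps_neg` is equivalent to negating each float.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reading of the loop: W[k] = { -z[63 - k], z[k] } for k = 0..31. */
void qmf_post_shuffle_ref(float W[32][2], const float *z)
{
    assert(W != NULL && z != NULL);
    for (int k = 0; k < 32; k++) {
        W[k][0] = -z[63 - k];
        W[k][1] =  z[k];
    }
}

/* Lane-by-lane model of the vector loop, four floats per register. */
void qmf_post_shuffle_lanes(float W[32][2], const float *z)
{
    assert(W != NULL && z != NULL);

    /* zq starts at z and r2q at z + 60; they meet after 8 iterations. */
    for (int i = 0; i < 8; i++) {
        const float *hi = z + 60 - 4 * i;   /* r2q: walks down from the top */
        const float *lo = z + 4 * i;        /* zq:  walks up from the bottom */
        float (*out)[2] = W + 4 * i;        /* Wq:  32 bytes per iteration */
        float neg[4], rev[4];

        for (int j = 0; j < 4; j++)
            neg[j] = -hi[j];                /* xorps with ps_neg: flip sign bits */
        for (int j = 0; j < 4; j++)
            rev[j] = neg[3 - j];            /* shufps 0x1B (q0123): reverse lanes */

        /* unpcklps: interleave the two low lanes of rev and lo */
        out[0][0] = rev[0]; out[0][1] = lo[0];
        out[1][0] = rev[1]; out[1][1] = lo[1];
        /* unpckhps: interleave the two high lanes of rev and lo */
        out[2][0] = rev[2]; out[2][1] = lo[2];
        out[3][0] = rev[3]; out[3][1] = lo[3];
    }
}
```

Both functions should fill `W` identically for any 64-float input. The 3-operand form only changes which register receives each result, so the model applies to either version of the loop.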

Thanks,
Justin


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-20 Thread Christophe Gisquet
2012/12/7 Christophe Gisquet:
> 2012/12/1 Christophe Gisquet:
>> 2012/11/30 Christophe Gisquet:
>>>> 4 space tabs.
>>
>> Done.
>
> Given the results for the parts where a change was investigated, is
> there a need for another review?

Ping2

-- 
Christophe


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-06 Thread Christophe Gisquet
2012/12/1 Christophe Gisquet:
> 2012/11/30 Christophe Gisquet:
>>> 4 space tabs.
>
> Done.

Given the results for the parts where a change was investigated, is
there a need for another review?

-- 
Christophe


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-01 Thread Christophe Gisquet
2012/11/30 Christophe Gisquet:
>> 4 space tabs.

Done.

-- 
Christophe


0003-SBR-DSP-x86-implement-SSE-qmf_post_shuffle.patch
Description: Binary data


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-12-01 Thread Christophe Gisquet
2012/12/1 Ronald S. Bultje:
> Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

No change.

On that CPU and OS, I have observed some odd behaviour:
- complex addressing is sometimes noticeably costly
- aligning jump targets rarely helps
- trying to hide the latency of operations often doesn't result in
measurable changes (but I guess it does on other CPUs)

All in all, I might have recovered those cycles by doing a
reverse scan, but at this point the validation didn't seem worth the
very marginal and hypothetical gain.

-- 
Christophe


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Ronald S. Bultje
Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet wrote:
> Hello,
>
> 2012/11/30 Loren Merritt:
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
>   mov   r3q, 32*4
>   lea   r2q, [zq + (64-4)*4]
>   add    zq, r3q
>   lea    Wq, [Wq + 2*r3q]
>   neg   r3q
> .loop:
>   mova   m0, [r2q]
>   mova   m1, [zq  + r3q]
>   xorps  m0, [ps_neg]
>   shufps m0, m0, 0x1B
>   mova   m2, m0
>   unpcklps   m0, m1
>   unpckhps   m2, m1
>   mova  [Wq + 2*r3q +  0], m0
>   mova  [Wq + 2*r3q + 16], m2
>   sub   r2q, 16
>   add   r3q, 16
>   jl  .loop
>   REP_RET
>
> It's 2 cycles slower on Penryn/Win64 (154 vs 152).

Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

Ronald


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Måns Rullgård
Christophe Gisquet writes:

>> 4 space tabs.
>
> OK, I was a bit puzzled and was looking for trailing whitespace/... You
> mean a style change, then.
> It's a bit cumbersome to redo all the patches because of that.

:%s/^  //

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
Hello,

2012/11/30 Loren Merritt:
> If you increment an index into W and z rather than the pointers
> themselves, then you can eliminate an add and a cmp.

I had already tested that, and redid it:
cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
  mov   r3q, 32*4
  lea   r2q, [zq + (64-4)*4]
  add    zq, r3q
  lea    Wq, [Wq + 2*r3q]
  neg   r3q
.loop:
  mova   m0, [r2q]
  mova   m1, [zq  + r3q]
  xorps  m0, [ps_neg]
  shufps m0, m0, 0x1B
  mova   m2, m0
  unpcklps   m0, m1
  unpckhps   m2, m1
  mova  [Wq + 2*r3q +  0], m0
  mova  [Wq + 2*r3q + 16], m2
  sub   r2q, 16
  add   r3q, 16
  jl  .loop
  REP_RET

It's 2 cycles slower on Penryn/Win64 (154 vs 152).

> 4 space tabs.

OK, I was a bit puzzled and was looking for trailing whitespace/... You
mean a style change, then.
It's a bit cumbersome to redo all the patches because of that.

-- 
Christophe


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> +  lea   r2q, [zq + (64-4)*4]
> +.loop:
> +  mova   m0, [r2q]
> +  mova   m1, [zq ]
> +  xorps  m0, [ps_neg]
> +  shufps m0, m0, 0x1B
> +  mova   m2, m0
> +  unpcklps   m0, m1
> +  unpckhps   m2, m1
> +  mova  [Wq +  0], m0
> +  mova  [Wq + 16], m2
> +  add    Wq, 32
> +  sub   r2q, 16
> +  add    zq, 16
> +  cmp    zq, r2q
> +  jl  .loop
> +  REP_RET

If you increment an index into W and z rather than the pointers
themselves, then you can eliminate an add and a cmp.
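
To illustrate the idea in C rather than assembly, here is a rough sketch: bias the pointers to the end of the range and run a single negative index up to zero, so one increment serves both arrays and the loop test is just the sign of the index. The function name `qmf_post_shuffle_indexed` is hypothetical, and whether a compiler actually emits fewer instructions for this form is not guaranteed; the sketch only shows the loop structure, using the scalar meaning of the routine as read from the quoted assembly.

```c
#include <assert.h>
#include <stddef.h>

/* One negative index i drives both W and the low half of z.
 * z_end and W_end point one past the last element touched, so
 * W_end[i] and z_end[i] with i in [-32, -1] address elements 0..31. */
void qmf_post_shuffle_indexed(float W[32][2], const float *z)
{
    assert(W != NULL && z != NULL);

    const float *z_end = z + 32;   /* biased base for the low half of z */
    float (*W_end)[2]  = W + 32;   /* biased base for the output */
    const float *z_top = z + 63;   /* still walks down separately */

    for (ptrdiff_t i = -32; i < 0; i++) {
        W_end[i][0] = -*z_top--;   /* negated, reversed upper half */
        W_end[i][1] = z_end[i];    /* lower half in order */
    }
}
```

The loop ends when the index reaches zero, which in assembly lets the flags set by the final `add` feed the branch directly instead of needing a separate `cmp`.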

4 space tabs.

--Loren Merritt


[libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
On Penryn, from 255 to 174 cycles. Unrolling yields no gain.
---
 libavcodec/x86/sbrdsp.asm    |   21 +++++++++++++++++++++
 libavcodec/x86/sbrdsp_init.c |    2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 11a6faf..2b90100 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -24,6 +24,8 @@
 SECTION_RODATA
 ; mask equivalent for multiply by -1.0 1.0
 ps_mask times 2 dd 1<<31, 0
+ps_mask2times 2 dd 0, 1<<31
+ps_neg  times 4 dd 1<<31
 
 SECTION_TEXT
 
@@ -203,3 +205,22 @@ cglobal sbr_sum64x5, 1,2,4,z
   cmp zq, r1q
   jne  .loop
   REP_RET
+
+cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
+  lea   r2q, [zq + (64-4)*4]
+.loop:
+  mova   m0, [r2q]
+  mova   m1, [zq ]
+  xorps  m0, [ps_neg]
+  shufps m0, m0, 0x1B
+  mova   m2, m0
+  unpcklps   m0, m1
+  unpckhps   m2, m1
+  mova  [Wq +  0], m0
+  mova  [Wq + 16], m2
+  add    Wq, 32
+  sub   r2q, 16
+  add    zq, 16
+  cmp    zq, r2q
+  jl  .loop
+  REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 108a681..3f6dd97 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -31,6 +31,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
                        const float alpha0[2], const float alpha1[2],
                        float bw, int start, int end);
 void ff_sbr_sum64x5_sse(float *z);
+void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -41,5 +42,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->hf_g_filt  = ff_sbr_hf_g_filt_sse;
 s->hf_gen = ff_sbr_hf_gen_sse;
 s->sum64x5= ff_sbr_sum64x5_sse;
+s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
 }
 }
-- 
1.7.7.msysgit.0
