Re: [libav-devel] Invitation to connect on LinkedIn

2012-11-30 Thread Attila Kinali
On Sat, 1 Dec 2012 06:56:08 +0000 (UTC)
fei wang  wrote:


> I'd like to add you to my professional network on LinkedIn.

I blocked LinkedIn at our mail server; this shouldn't happen again.

I also unsubscribed this guy for trying to add a mailing list to LinkedIn.

Attila Kinali


-- 
There is no secret ingredient
 -- Po, Kung Fu Panda
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] Request for a raspberry pi hardware accelerated scaling function

2012-11-30 Thread Attila Kinali
Dear raspberry pi user

On Fri, 30 Nov 2012 20:33:28 +0100
Arjen Vellekoop  wrote:

> I am a bit confused, just subscribed and expected to be redirected to a
> forum. Since this is not the case I'll try this way. Hope not to spam
> the whole list. Please point me in the right direction if I am on the wrong
> list here.

Welcome to the world of open source, where web forums are frowned upon,
because they are damn inefficient to use if you are dealing with
hundreds of messages every day.

 
> A request for compiling an avconv version that is hardware accelerated
> probably already exists (if not, please consider doing so); however, for my
> application I only need scaling. My DVB-T dongle receives MPEG2 in SD
> (704x576) and I only need to transcode it to the same codec but at a
> quarter resolution (CIF=352x288).

Well, you should note a few things.
First, this mailing list is about the development of libav, not about its usage.
This means that if you have questions about how to modify the code of libav,
or have made some modifications that you would like to share with others,
then you are in the right place. If it's just about usage (and compiling
without modification of the _code_ is usage), then you should choose
a different mailing list.

The second thing is, we do not provide binaries of the library.
This is the job of distributions. All you get from us is the source code.
(Ok, a few of those who package libav for distributions are also
libav developers, but that's beside the point)

The third thing is that the Raspberry Pi is a horribly closed and
undocumented piece of hardware. About the only thing that is documented
is its CPU core, and that documentation comes from ARM and not from
Broadcom. With this little documentation it is nearly impossible to
support even the most basic functionality properly. Even Nvidia provides
more help for getting their hardware working with open source software.
The insane amount of bit banging Raspberry Pi users have to do, even for
the most simple stuff (like I2C)[1], is a telltale sign of this. I really feel
pity for all of you who have been tricked into buying this piece of
*censored* by the Raspberry Pi Foundation and the hype they created.

Or to summarize: it's very unlikely that anyone has been able to modify
libav or any other encoding library to use the hardware acceleration of
the Raspberry Pi. I recommend you get yourself either a BeagleBoard
or a PandaBoard; both are fully documented and most of their hardware
drivers are already in the mainline kernel. The PandaBoard with its
dual-core 1.2GHz CPU should even have enough power to encode your video
in real time without using any hardware acceleration, which would
also be available.


Attila Kinali

[1]http://www.google.com/search?q=raspberry+pi+bit+banging 
-- 
There is no secret ingredient
 -- Po, Kung Fu Panda


[libav-devel] Invitation to connect on LinkedIn

2012-11-30 Thread fei wang
LinkedIn




libav,

I'd like to add you to my professional network on LinkedIn.

- fei

fei wang
software engineer at Marvell corp.
Shanghai City, China



  


Re: [libav-devel] [PATCH 10/10] SBR DSP x86: implement SSE hf_apply_noise

2012-11-30 Thread Jason Garrett-Glaser
On Fri, Nov 30, 2012 at 6:58 AM, Christophe Gisquet
 wrote:
> 497 to 253 cycles under Win64.
> Replacing the multiplication by s_m[m] by an andps and an xorps with
> appropriate vectors is slower. Unrolling is a 15 cycles win.
> ---
>  libavcodec/sbrdsp.c  |1 -
>  libavcodec/x86/sbrdsp.asm|   93 
> ++
>  libavcodec/x86/sbrdsp_init.c |   16 +++
>  3 files changed, 109 insertions(+), 1 deletions(-)
>
> diff --git a/libavcodec/sbrdsp.c b/libavcodec/sbrdsp.c
> index 781ec83..d0a0b93 100644
> --- a/libavcodec/sbrdsp.c
> +++ b/libavcodec/sbrdsp.c
> @@ -175,7 +175,6 @@ static av_always_inline void sbr_hf_apply_noise(float 
> (*Y)[2],
>  int m_max)
>  {
>  int m;
> -
>  for (m = 0; m < m_max; m++) {
>  float y0 = Y[m][0];
>  float y1 = Y[m][1];
> diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
> index cfbd6e8..608dee6 100644
> --- a/libavcodec/x86/sbrdsp.asm
> +++ b/libavcodec/x86/sbrdsp.asm
> @@ -26,6 +26,12 @@ SECTION_RODATA
>  ps_mask times 2 dd 1<<31, 0
>  ps_mask2    times 2 dd 0, 1<<31
>  ps_neg  times 4 dd 1<<31
> +ps_noise0   times 2 dd  1.0,  0.0,
> +ps_noise2   times 2 dd -1.0,  0.0
> +ps_noise13  dd  0.0,  1.0, 0.0, -1.0
> +dd  0.0, -1.0, 0.0,  1.0
> +dd  0.0,  1.0, 0.0, -1.0
> +cextern sbr_noise_table
>
>  SECTION_TEXT
>
> @@ -318,3 +324,90 @@ cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c
>sub    cq, 2*mmsize
>jge .loop
>REP_RET
> +
> +; r0q=Y   r1q=s_m   r2q=q_filt   r3q=noise  r4q=max_m
> +cglobal hf_apply_noise_main
> +  dec   r3q
> +  shl   r4q, 2
> +  lea   r0q, [r0q + 2*r4q]
> +  add   r1q, r4q
> +  add   r2q, r4q
> +  shl   r3q, 3
> +  xorps  m5, m5
> +  neg   r4q
> +.loop:
> +  add   r3q, 16
> +  and   r3q, 0x1ff<<3
> +  movh   m1, [r2q + r4q]
> +  movu   m3, [r3q + sbr_noise_table]
> +  movh   m2, [r2q + r4q + 8]
> +  add   r3q, 16
> +  and   r3q, 0x1ff<<3
> +  movu   m4, [r3q + sbr_noise_table]
> +  unpcklps   m1, m1
> +  unpcklps   m2, m2
> +  mulps  m1, m3 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +  mulps  m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +  movh   m3, [r1q + r4q]
> +  movh   m4, [r1q + r4q + 8]
> +  unpcklps   m3, m3
> +  unpcklps   m4, m4
> +  mova   m6, m3
> +  mova   m7, m4
> +  mulps  m3, m0 ; s_m[m] * phi_sign
> +  mulps  m4, m0 ; s_m[m] * phi_sign
> +  cmpps  m6, m5, 0 ; m1 == 0
> +  cmpps  m7, m5, 0 ; m1 == 0
> +  andps  m1, m6
> +  andps  m2, m7
> +  movu   m6, [r0q + 2*r4q]
> +  movu   m7, [r0q + 2*r4q + 16]
> +  addps  m6, m1
> +  addps  m7, m2
> +  addps  m6, m3
> +  addps  m7, m4

Maybe add m1/m2 to m3/m4 before to m6/m7, to better hide the memory load?

Jason
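
The reordering suggested above can be sketched in scalar C. This is only an illustration under assumptions: the function and parameter names are hypothetical, and the select on `s == 0` stands in for the cmpps/andps mask in the quoted assembly. It models one lane, not the patch itself.

```c
/* Original order: the value loaded from Y feeds the first add, so
 * nothing can be summed until the load has completed. */
static float apply_noise_load_first(float y, float s, float phi,
                                    float q, float n)
{
    float noise_term = (s == 0.0f) ? q * n : 0.0f; /* models cmpps + andps */
    float sine_term  = s * phi;
    return (y + noise_term) + sine_term;
}

/* Suggested order: add the two register-only terms first, so the
 * loaded value is needed only by the final add. */
static float apply_noise_load_last(float y, float s, float phi,
                                   float q, float n)
{
    float noise_term = (s == 0.0f) ? q * n : 0.0f;
    float sine_term  = s * phi;
    return y + (noise_term + sine_term);
}
```

In this sketch at most one of the two terms is non-zero for any element, so both orders should produce the same sum apart from signed-zero corner cases. Whether the reordering is faster has to be measured on the target CPU.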


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Ronald S. Bultje
Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet
 wrote:
> Hello,
>
> 2012/11/30 Loren Merritt :
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
>   mov   r3q, 32*4
>   lea   r2q, [zq + (64-4)*4]
>   add    zq, r3q
>   lea    Wq, [Wq + 2*r3q]
>   neg   r3q
> .loop:
>   mova   m0, [r2q]
>   mova   m1, [zq  + r3q]
>   xorps  m0, [ps_neg]
>   shufps m0, m0, 0x1B
>   mova   m2, m0
>   unpcklps   m0, m1
>   unpckhps   m2, m1
>   mova  [Wq + 2*r3q +  0], m0
>   mova  [Wq + 2*r3q + 16], m2
>   sub   r2q, 16
>   add   r3q, 16
>   jl  .loop
>   REP_RET
>
> It's 2 cycles slower on Penryn/Win64 (154 vs 152).

Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

Ronald


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Måns Rullgård
Christophe Gisquet  writes:

>> 4 space tabs.
>
> OK, I was a bit puzzled and looking for trailing whitespaces/... You
> mean style change then.
> A bit cumbersome to redo all patches because of that.

:%s/^  //

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH 01/10] SBR DSP x86: implement SSE sbr_hf_gen

2012-11-30 Thread Christophe Gisquet
Hello,

2012/11/30 Loren Merritt :
> Recommend using base-4 for shuffle constants.

I wrote that code like 6 months ago, before I really wrapped my head
around/noticed that.
Do you want me to change it now, or is that a remark for later contributions?

-- 
Christophe


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
Hello,

2012/11/30 Loren Merritt :
> If you increment an index into W and z rather than the pointers
> themselves, then you can eliminate an add and a cmp.

I had already tested that, and redid it:
cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
  mov   r3q, 32*4
  lea   r2q, [zq + (64-4)*4]
  add    zq, r3q
  lea    Wq, [Wq + 2*r3q]
  neg   r3q
.loop:
  mova   m0, [r2q]
  mova   m1, [zq  + r3q]
  xorps  m0, [ps_neg]
  shufps m0, m0, 0x1B
  mova   m2, m0
  unpcklps   m0, m1
  unpckhps   m2, m1
  mova  [Wq + 2*r3q +  0], m0
  mova  [Wq + 2*r3q + 16], m2
  sub   r2q, 16
  add   r3q, 16
  jl  .loop
  REP_RET

It's 2 cycles slower on Penryn/Win64 (154 vs 152).

> 4 space tabs.

OK, I was a bit puzzled and looking for trailing whitespaces/... You
mean style change then.
A bit cumbersome to redo all patches because of that.

-- 
Christophe


Re: [libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.

2012-11-30 Thread Luca Barbato
On 11/30/12 6:28 PM, Christophe Gisquet wrote:
> 2012/11/30 Luca Barbato :
>> The idea is nice, is Ypos always 0 or 1?
> 
> Yes, and actually, Ypos is here because we already dealt with a
> similar situation (see commit
> cc412b71047ebf77c7e810c90b044f018a1c0c2d).
> 
> So I am just reapplying the same solution.
> 

Fine for me then. thanks for checking =)

lu


Re: [libav-devel] [PATCH 10/10] SBR DSP x86: implement SSE hf_apply_noise

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> 497 to 253 cycles under Win64.

cpu is more relevant than os.

> +; r0q=Y   r1q=s_m   r2q=q_filt   r3q=noise  r4q=max_m
> +cglobal hf_apply_noise_main

You can invoke DEFINE_ARGS even if not generating a prologue.

> +  dec   r3q
> +  shl   r4q, 2
> +  lea   r0q, [r0q + 2*r4q]
> +  add   r1q, r4q
> +  add   r2q, r4q
> +  shl   r3q, 3
> +  xorps  m5, m5
> +  neg   r4q
> +.loop:
> +  add   r3q, 16
> +  and   r3q, 0x1ff<<3
> +  movh   m1, [r2q + r4q]
> +  movu   m3, [r3q + sbr_noise_table]
> +  movh   m2, [r2q + r4q + 8]
> +  add   r3q, 16
> +  and   r3q, 0x1ff<<3
> +  movu   m4, [r3q + sbr_noise_table]
> +  unpcklps   m1, m1
> +  unpcklps   m2, m2
> +  mulps  m1, m3 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +  mulps  m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +  movh   m3, [r1q + r4q]
> +  movh   m4, [r1q + r4q + 8]

Can these be a single aligned load?

> +  unpcklps   m3, m3
> +  unpcklps   m4, m4
> +  mova   m6, m3
> +  mova   m7, m4
> +  mulps  m3, m0 ; s_m[m] * phi_sign
> +  mulps  m4, m0 ; s_m[m] * phi_sign
> +  cmpps  m6, m5, 0 ; m1 == 0
> +  cmpps  m7, m5, 0 ; m1 == 0

You mean m7 == 0?

> +  andps  m1, m6
> +  andps  m2, m7
> +  movu   m6, [r0q + 2*r4q]
> +  movu   m7, [r0q + 2*r4q + 16]
> +  addps  m6, m1
> +  addps  m7, m2
> +  addps  m6, m3
> +  addps  m7, m4
> +  movu  [r0q + 2*r4q], m6
> +  movu  [r0q + 2*r4q + 16], m7
> +  add   r4q, 16
> +  jl  .loop
> +  ret
> +
> +; sbr_hf_apply_noise_0(float (*Y)[2], const float *s_m,
> +;  const float *q_filt, int noise,
> +;  int kx, int m_max)
> +cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
> +  mova   m0, [ps_noise0]
> +  mov   r4d, m_maxm
> +  call  hf_apply_noise_main
> +  RET

TAIL_CALL hf_apply_noise_main, 1

--Loren Merritt


Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> +  lea   r2q, [zq + (64-4)*4]
> +.loop:
> +  mova   m0, [r2q]
> +  mova   m1, [zq ]
> +  xorps  m0, [ps_neg]
> +  shufps m0, m0, 0x1B
> +  mova   m2, m0
> +  unpcklps   m0, m1
> +  unpckhps   m2, m1
> +  mova  [Wq +  0], m0
> +  mova  [Wq + 16], m2
> +  add    Wq, 32
> +  sub   r2q, 16
> +  add    zq, 16
> +  cmp    zq, r2q
> +  jl  .loop
> +  REP_RET

If you increment an index into W and z rather than the pointers
themselves, then you can eliminate an add and a cmp.

4 space tabs.

--Loren Merritt
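
The indexing suggestion above can be sketched in scalar C. This is a minimal illustration, assuming the routine computes W[k][0] = -z[63-k] and W[k][1] = z[k] for k = 0..31 (read off the quoted assembly); the function names and the flattened W layout are hypothetical. A C compiler chooses its own loop form, so the sketch shows the structure rather than the generated code.

```c
#include <stddef.h>

/* Pointer-bump form: three pointers advance and an explicit
 * compare of two pointers ends the loop. */
static void post_shuffle_ptr(float *W, const float *z)
{
    const float *hi = z + 63;       /* walks down from z[63] */
    while (z < hi) {                /* explicit compare each iteration */
        W[0] = -*hi;
        W[1] =  *z;
        W  += 2;
        hi -= 1;
        z  += 1;
    }
}

/* Index form: W and z are biased to the end of their ranges once,
 * then a single negative index counts up towards zero. In assembly,
 * the add that advances the index also sets the flags the branch
 * tests, so the separate cmp and one pointer add disappear. */
static void post_shuffle_idx(float *W, const float *z)
{
    const float *hi = z + 63;
    ptrdiff_t i;

    z += 32;                        /* z[i] covers z[0..31] for i = -32..-1 */
    W += 64;                        /* W[2*i] covers W[0..63] likewise */
    for (i = -32; i < 0; i++) {
        W[2 * i]     = -*hi--;
        W[2 * i + 1] = z[i];
    }
}
```

Both functions write the same 64 floats; only the loop bookkeeping differs.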


Re: [libav-devel] [PATCH 02/10] SBR DSP x86: implement SSE sum64x5

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> 698 to 174 cycles on Penryn. Unrolling is a 6 cycles gain.
>
> ---
>  libavcodec/x86/sbrdsp.asm|   22 ++
>  libavcodec/x86/sbrdsp_init.c |2 ++
>  2 files changed, 24 insertions(+), 0 deletions(-)

LGTM.

--Loren Merritt


Re: [libav-devel] [PATCH 01/10] SBR DSP x86: implement SSE sbr_hf_gen

2012-11-30 Thread Loren Merritt
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> +mova    m0, [X_lowq + start]
> +movlhps m1, m1 ; (a2 a3 a2 a3)
> +movlhps m2, m2 ; (a0 a1 a0 a1)
> +shufps  m3, m3, 00010001b  ; (a3 a2 a3 a2)
> +shufps  m4, m4, 00010001b  ; (a1 a0 a1 a0)
> +xorps   m3, m7 ; (-a3 a2 -a3 a2)
> +xorps   m4, m7 ; (-a1 a0 -a1 a0)
> +.loop2:
> +mova    m5, m0
> +mova    m6, m0
> +shufps  m0, m0, 1010b ; {Xl[-2][0],",Xl[-1][0],"}
> +shufps  m5, m5, 0101b ; {Xl[-2][1],",Xl[-1][1],"}
> +mulps   m0, m2
> +mulps   m5, m4
> +mova    m7, m6
> +addps   m5, m0
> +mova    m0, [X_lowq + start + 2*2*4]
> +shufps  m6, m0, 1010b ; {Xl[-1][0],",Xl[0][0],"}
> +shufps  m7, m0, 0101b ; {Xl[-1][1],",Xl[1][1],"}

Recommend using base-4 for shuffle constants.

--Loren Merritt
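
Why base 4 reads more naturally for shuffle constants can be shown with a small scalar model. The shufps immediate holds four 2-bit lane selectors, so each base-4 digit names one source lane. The helper names below are hypothetical, and the model covers only the four-float SSE form.

```c
/* Build a shufps immediate from four base-4 digits, written most
 * significant first; each digit (0..3) selects one source lane. */
static unsigned shuf_imm(unsigned d3, unsigned d2, unsigned d1, unsigned d0)
{
    return (d3 << 6) | (d2 << 4) | (d1 << 2) | d0;
}

/* Scalar model of "shufps dst, src, imm": the two low result lanes
 * are picked from dst, the two high result lanes from src. */
static void shufps_model(float dst[4], const float src[4], unsigned imm)
{
    float r[4];
    int i;

    r[0] = dst[imm & 3];
    r[1] = dst[(imm >> 2) & 3];
    r[2] = src[(imm >> 4) & 3];
    r[3] = src[(imm >> 6) & 3];
    for (i = 0; i < 4; i++)     /* write back only after all reads */
        dst[i] = r[i];
}
```

Under this model the binary constant 00010001b from the quoted patch is the digits 0,1,0,1, and the 0x1B used in the qmf_post_shuffle patch is 0,1,2,3, which makes the lane reversal visible at a glance.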


[libav-devel] Request for a raspberry pi hardware accelerated scaling function

2012-11-30 Thread Arjen Vellekoop

Sorry, sent it previously from an unknown email address

Dear readers.

I am a bit confused, just subscribed and expected to be redirected to a
forum. Since this is not the case I'll try this way. Hope not to spam
the whole list. Please point me in the right direction if I am on the wrong
list here.

Here's my situation.

I own a Raspberry Pi. I connected an IT9135 DVB-T USB dongle and installed
Raspbian (a Debian Linux variant made for this hardware) and tvheadend. I am
able to stream and watch TV over my LAN. However, I would like to
stream TV over the internet, but I lack the required bandwidth to do so.
This issue is often mentioned in the Raspberry Pi forum.

I can do some post-processing once I have recorded a program in tvheadend, but a 1
hour show takes 3 hours to transcode on the Raspberry Pi with avconv.
The CPU is limited.
However, there is a GPU with hardware acceleration in the Raspberry Pi,
and there is an example program under /opt/vc/.../hello_video that
works. It uses the OpenMAX API.

A request for compiling an avconv version that is hardware accelerated
probably already exists (if not, please consider doing so); however, for my
application I only need scaling. My DVB-T dongle receives MPEG2 in SD
(704x576) and I only need to transcode it to the same codec but at a
quarter resolution (CIF=352x288).

So for now I do not need the whole library to be recompiled for hardware
acceleration, just the scaling bit.

Could this be possible?

Thanks









[libav-devel] [PATCH 1/1] golomb: use unsigned arithmetics in svq3_get_ue_golomb()

2012-11-30 Thread Janne Grunau
This prevents undefined behaviour of signed left shift if the coded
value is larger than 2^31. Large values are most likely invalid and
caused by errors or by feeding random data.

Validated every use of svq3_get_ue_golomb() and changed the places where
the return value was compared with negative numbers. dirac.c was clean;
fixed rv30 and svq3.
---
 libavcodec/golomb.h |  5 +++--
 libavcodec/rv30.c   |  6 +++---
 libavcodec/svq3.c   | 17 -
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
index 6f95a67..564ba4e 100644
--- a/libavcodec/golomb.h
+++ b/libavcodec/golomb.h
@@ -107,7 +107,8 @@ static inline int get_ue_golomb_31(GetBitContext *gb){
 return ff_ue_golomb_vlc_code[buf];
 }
 
-static inline int svq3_get_ue_golomb(GetBitContext *gb){
+static inline unsigned svq3_get_ue_golomb(GetBitContext *gb)
+{
 uint32_t buf;
 
 OPEN_READER(re, gb);
@@ -121,7 +122,7 @@ static inline int svq3_get_ue_golomb(GetBitContext *gb){
 
 return ff_interleaved_ue_golomb_vlc_code[buf];
 }else{
-int ret = 1;
+unsigned ret = 1;
 
 do {
 buf >>= 32 - 8;
diff --git a/libavcodec/rv30.c b/libavcodec/rv30.c
index 8016ad3..e4f3251 100644
--- a/libavcodec/rv30.c
+++ b/libavcodec/rv30.c
@@ -73,7 +73,7 @@ static int rv30_decode_intra_types(RV34DecContext *r, 
GetBitContext *gb, int8_t
 
 for(i = 0; i < 4; i++, dst += r->intra_types_stride - 4){
 for(j = 0; j < 4; j+= 2){
-int code = svq3_get_ue_golomb(gb) << 1;
+unsigned code = svq3_get_ue_golomb(gb) << 1;
 if(code >= 81*2){
 av_log(r->s.avctx, AV_LOG_ERROR, "Incorrect intra prediction 
code\n");
 return -1;
@@ -101,9 +101,9 @@ static int rv30_decode_mb_info(RV34DecContext *r)
 static const int rv30_b_types[6] = { RV34_MB_SKIP, RV34_MB_B_DIRECT, 
RV34_MB_B_FORWARD, RV34_MB_B_BACKWARD, RV34_MB_TYPE_INTRA, 
RV34_MB_TYPE_INTRA16x16 };
 MpegEncContext *s = &r->s;
 GetBitContext *gb = &s->gb;
-int code = svq3_get_ue_golomb(gb);
+unsigned code = svq3_get_ue_golomb(gb);
 
-if (code < 0 || code > 11) {
+if (code > 11) {
 av_log(s->avctx, AV_LOG_ERROR, "Incorrect MB type code\n");
 return -1;
 }
diff --git a/libavcodec/svq3.c b/libavcodec/svq3.c
index ac8d9c1..4f0c2c0 100644
--- a/libavcodec/svq3.c
+++ b/libavcodec/svq3.c
@@ -216,17 +216,15 @@ static inline int svq3_decode_block(GetBitContext *gb, 
DCTELEM *block,
 static const uint8_t *const scan_patterns[4] =
 { luma_dc_zigzag_scan, zigzag_scan, svq3_scan, chroma_dc_scan };
 
-int run, level, sign, vlc, limit;
+int run, level, limit;
+unsigned vlc;
 const int intra   = 3 * type >> 2;
 const uint8_t *const scan = scan_patterns[type];
 
 for (limit = (16 >> intra); index < 16; index = limit, limit += 8) {
 for (; (vlc = svq3_get_ue_golomb(gb)) != 0; index++) {
-if (vlc == INVALID_VLC)
-return -1;
-
-sign = (vlc & 0x1) - 1;
-vlc  = vlc + 1 >> 1;
+int sign = (vlc & 1) ? 0 : -1;
+vlc  = vlc + 1 >> 1;
 
 if (type == 3) {
 if (vlc < 3) {
@@ -786,7 +784,7 @@ static int svq3_decode_slice_header(AVCodecContext *avctx)
 skip_bits_long(&s->gb, 0);
 }
 
-if ((i = svq3_get_ue_golomb(&s->gb)) == INVALID_VLC || i >= 3) {
+if ((i = svq3_get_ue_golomb(&s->gb)) >= 3) {
 av_log(h->s.avctx, AV_LOG_ERROR, "illegal slice type %d \n", i);
 return -1;
 }
@@ -1010,7 +1008,7 @@ static int svq3_decode_frame(AVCodecContext *avctx, void 
*data,
 H264Context *h = &svq3->h;
 MpegEncContext *s  = &h->s;
 int buf_size   = avpkt->size;
-int m, mb_type;
+int m;
 
 /* special case for last picture */
 if (buf_size == 0) {
@@ -1093,6 +1091,7 @@ static int svq3_decode_frame(AVCodecContext *avctx, void 
*data,
 
 for (s->mb_y = 0; s->mb_y < s->mb_height; s->mb_y++) {
 for (s->mb_x = 0; s->mb_x < s->mb_width; s->mb_x++) {
+unsigned mb_type;
 h->mb_xy = s->mb_x + s->mb_y * s->mb_stride;
 
 if ((get_bits_count(&s->gb) + 7) >= s->gb.size_in_bits &&
@@ -1113,7 +1112,7 @@ static int svq3_decode_frame(AVCodecContext *avctx, void 
*data,
 mb_type += 8;
 else if (s->pict_type == AV_PICTURE_TYPE_B && mb_type >= 4)
 mb_type += 4;
-if ((unsigned)mb_type > 33 || svq3_decode_mb(svq3, mb_type)) {
+if (mb_type > 33 || svq3_decode_mb(svq3, mb_type)) {
 av_log(h->s.avctx, AV_LOG_ERROR,
"error while decoding MB %d %d\n", s->mb_x, s->mb_y);
 return -1;
-- 
1.7.12.4
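
The point of the patch above can be illustrated with a bit-at-a-time decoder. This is a simplified sketch, not libavcodec's table-driven code: the BitReader type and both function names are hypothetical, and it assumes the interleaved exp-Golomb layout in which each 0 bit is followed by one data bit and a 1 bit ends the code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit reader; not libavcodec's GetBitContext. */
typedef struct BitReader {
    const uint8_t *data;
    size_t         size_in_bits;
    size_t         pos;
} BitReader;

static unsigned read_bit(BitReader *br)
{
    unsigned bit;

    if (br->pos >= br->size_in_bits)
        return 1;               /* out of data: behave like a terminator */
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* The accumulator is unsigned, so an overlong (most likely invalid)
 * code simply shifts bits out of the top. With a signed int the same
 * shift would be undefined behaviour once the value overflows. */
static unsigned get_interleaved_ue(BitReader *br)
{
    unsigned ret = 1;

    while (!read_bit(br))
        ret = (ret << 1) | read_bit(br);
    return ret - 1;
}
```

Because the result is unsigned, a caller needs only an upper-bound test such as `code > 11`, which is the simplification the rv30 hunk makes.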



Re: [libav-devel] [PATCH] configure: sunos: clean up shared library options

2012-11-30 Thread Sean McGovern
On Wednesday, November 28, 2012, Sean McGovern  wrote:
> On Wed, Nov 28, 2012 at 8:08 PM, Måns Rullgård  wrote:
>> Sean McGovern  writes:
>>
>>> Several of the options were incorrect for suncc.
>>> ---
>>>  configure | 8 +---
>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/configure b/configure
>>> index ca11a85..f094a32 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -2196,7 +2196,8 @@ suncc_flags(){
>>>  -fomit-frame-pointer) echo -xregs=frameptr;;
>>>  -fPIC)echo -KPIC -xcode=pic32 ;;
>>>  -W*,*)echo $flag  ;;
>>> --f*-*|-W*);;
>>> +-f*-*|-W*|-mimpure-text)  ;;
>>> +-shared)  echo -G ;;
>>>  *)echo $flag  ;;
>>>  esac
>>>  done
>>> @@ -2748,8 +2749,9 @@ case $target_os in
>>>  ;;
>>>  sunos)
>>>  AVSERVERLDFLAGS=""
>>> -SHFLAGS='-shared -Wl,-h,$$(@F)'
>>> -enabled x86 && SHFLAGS="-mimpure-text $SHFLAGS"
>>> +SHFLAGS='-Wl,-h,$$(@F)'
>>> +append SHFLAGS $($ldflags_filter -shared)
>>> +enabled x86 && append SHFLAGS $($ldflags_filter -mimpure-text)
>>>  network_extralibs="-lsocket -lnsl"
>>>  add_cppflags -D__EXTENSIONS__ -D_XOPEN_SOURCE=600
>>>  # When using suncc to build, the Solaris linker will mark
>>> --
>>
>> I have an even better idea.  Drop the second hunk above and apply this
>> instead:
>>
>> diff --git a/configure b/configure
>> index 38f52b1..0c580e1c 100755
>> --- a/configure
>> +++ b/configure
>> @@ -3781,7 +3781,7 @@ LD_PATH=$LD_PATH
>>  DLLTOOL=$dlltool
>>  LDFLAGS=$LDFLAGS
>>  LDFLAGS-avserver=$AVSERVERLDFLAGS
>> -SHFLAGS=$SHFLAGS
>> +SHFLAGS=$($ldflags_filter $SHFLAGS)
>>  YASMFLAGS=$YASMFLAGS
>>  BUILDSUF=$build_suffix
>>  FULLNAME=$FULLNAME
>>
>>
>> --
>> Måns Rullgård
>> m...@mansr.com
>
> This didn't go so well... SHFLAGS now contains a very unwanted linebreak.

My bash-fu is not great, but this seems to be due to string tokenization inside
the here-document. I can echo it with the same statement right above the
here-doc and no linebreak is present.

-- Sean McG.


Re: [libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.

2012-11-30 Thread Christophe Gisquet
2012/11/30 Luca Barbato :
> The idea is nice, is Ypos always 0 or 1?

Yes, and actually, Ypos is here because we already dealt with a
similar situation (see commit
cc412b71047ebf77c7e810c90b044f018a1c0c2d).

So I am just reapplying the same solution.

-- 
Christophe
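
The buffer-index idea discussed above can be sketched as follows. Only the name Ypos comes from the thread; the History struct, the tiny FRAME_LEN and the helper functions are hypothetical and do not reflect the real aacsbr.c layout.

```c
#include <string.h>

#define FRAME_LEN 4             /* illustrative size only */

typedef struct History {
    float Y[2][FRAME_LEN];      /* two buffers: current and previous frame */
    int   Ypos;                 /* index of the current buffer, always 0 or 1 */
} History;

/* Copying variant: Y[1] is always "current" and Y[0] "previous", so
 * every frame ends with a full copy. */
static void end_frame_copy(History *h)
{
    memcpy(h->Y[0], h->Y[1], sizeof(h->Y[0]));
}

/* Swapping variant: the data stays in place and only the index flips. */
static void end_frame_swap(History *h)
{
    h->Ypos ^= 1;
}

static float *current(History *h)  { return h->Y[h->Ypos]; }
static float *previous(History *h) { return h->Y[h->Ypos ^ 1]; }
```

The swap only works if Ypos never takes a value other than 0 or 1, which is what the question in this thread is checking.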


Re: [libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.

2012-11-30 Thread Luca Barbato
On 11/30/12 3:58 PM, Christophe Gisquet wrote:
> Swapping buffer indices allows saving one memcpy that accounts for 1% of the
> runtime, according to oprofile.
> ---
>  libavcodec/aacsbr.c |   22 +++---
>  1 files changed, 11 insertions(+), 11 deletions(-)

The idea is nice, is Ypos always 0 or 1?

lu




Re: [libav-devel] [PATCH 1/2] lavr: clarify documentation for avresample_get/set_matrix()

2012-11-30 Thread Anton Khirnov

On Fri, 30 Nov 2012 11:18:08 -0500, Justin Ruggles  
wrote:
> On 11/30/2012 05:47 AM, Anton Khirnov wrote:
> > 
> > On Thu, 29 Nov 2012 15:08:04 -0500, Justin Ruggles 
> >  wrote:
> >> ---
> >>  libavresample/avresample.h |6 +-
> >>  1 files changed, 5 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/libavresample/avresample.h b/libavresample/avresample.h
> >> index affeeeb..a73d686 100644
> >> --- a/libavresample/avresample.h
> >> +++ b/libavresample/avresample.h
> >> @@ -216,6 +216,9 @@ int avresample_build_matrix(uint64_t in_layout, 
> >> uint64_t out_layout,
> >>  /**
> >>   * Get the current channel mixing matrix.
> >>   *
> >> + * If no custom matrix has been previously set or the 
> >> AVAudioResampleContext is
> >> + * not open, an error is returned.
> > 
> > Ok
> > 
> >> + *
> >>   * @param avr audio resample context
> >>   * @param matrix  mixing coefficients; matrix[i + stride * o] is the 
> >> weight of
> >>   *input channel i in output channel o.
> >> @@ -231,7 +234,8 @@ int avresample_get_matrix(AVAudioResampleContext *avr, 
> >> double *matrix,
> >>   * Allows for setting a custom mixing matrix, overriding the default 
> >> matrix
> >>   * generated internally during avresample_open(). This function can be 
> >> called
> >>   * anytime on an allocated context, either before or after calling
> >> - * avresample_open(). avresample_convert() always uses the current matrix.
> >> + * avresample_open(), as long as the channel layouts have been set.
> >> + * avresample_convert() always uses the current matrix.
> > 
> > 
> > Why bother mentioning this explicitly? If the channel layouts are not set,
> > avresample_open() will fail and avresample_convert() cannot be called at 
> > all.
> 
> Because avresample_get/set_matrix() can be called before
> avresample_open(). So if the user wants to do that they just have to
> make sure they set the layouts first.
> 

Ah nvm, seems I just parsed that sentence wrong. Patch LGTM

-- 
Anton Khirnov


Re: [libav-devel] [PATCH 1/2] lavr: clarify documentation for avresample_get/set_matrix()

2012-11-30 Thread Justin Ruggles
On 11/30/2012 05:47 AM, Anton Khirnov wrote:
> 
> On Thu, 29 Nov 2012 15:08:04 -0500, Justin Ruggles  
> wrote:
>> ---
>>  libavresample/avresample.h |6 +-
>>  1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/libavresample/avresample.h b/libavresample/avresample.h
>> index affeeeb..a73d686 100644
>> --- a/libavresample/avresample.h
>> +++ b/libavresample/avresample.h
>> @@ -216,6 +216,9 @@ int avresample_build_matrix(uint64_t in_layout, uint64_t 
>> out_layout,
>>  /**
>>   * Get the current channel mixing matrix.
>>   *
>> + * If no custom matrix has been previously set or the 
>> AVAudioResampleContext is
>> + * not open, an error is returned.
> 
> Ok
> 
>> + *
>>   * @param avr audio resample context
>>   * @param matrix  mixing coefficients; matrix[i + stride * o] is the weight 
>> of
>>   *input channel i in output channel o.
>> @@ -231,7 +234,8 @@ int avresample_get_matrix(AVAudioResampleContext *avr, 
>> double *matrix,
>>   * Allows for setting a custom mixing matrix, overriding the default matrix
>>   * generated internally during avresample_open(). This function can be 
>> called
>>   * anytime on an allocated context, either before or after calling
>> - * avresample_open(). avresample_convert() always uses the current matrix.
>> + * avresample_open(), as long as the channel layouts have been set.
>> + * avresample_convert() always uses the current matrix.
> 
> 
> Why bother mentioning this explicitly? If the channel layouts are not set,
> avresample_open() will fail and avresample_convert() cannot be called at all.

Because avresample_get/set_matrix() can be called before
avresample_open(). So if the user wants to do that they just have to
make sure they set the layouts first.

-Justin


Re: [libav-devel] [PATCH 00/10] Various patches for AAC SBR DSP

2012-11-30 Thread Christophe Gisquet
2012/11/30 Måns Rullgård :
>> I couldn't find a vector exercising qmf_deint_neg, and I guess neither did
>> the one who wrote it,
>
> I can assure you I did, but I don't remember which one(s).

Ok, it's just that it looked so much like where I ended that I thought
the same had happened to you.

But as far as I know, no sample in the fate suite tests for that code
block/dsp function. I also extracted/used some samples from the
mplayerhq archives, but to no avail.

-- 
Christophe


Re: [libav-devel] [PATCH 1/1] avprobe: report per stream bit rate if set by the decoder

2012-11-30 Thread Luca Barbato
On 11/30/12 3:45 PM, Janne Grunau wrote:
> ---
>  avprobe.c | 4 
>  1 file changed, 4 insertions(+)
> 

Ok.



Re: [libav-devel] [PATCH 2/3] avutil: Use a configure check to enable windows console functions

2012-11-30 Thread Måns Rullgård
Martin Storsjö  writes:

> Not all versions or API subsets of windows have these functions.
>
> Signed-off-by: Martin Storsjö 
> ---
>  configure   |2 ++
>  libavutil/log.c |4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)

LGTM

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH 1/3] avutil: Include io.h with a separate condition from windows console functions

2012-11-30 Thread Måns Rullgård
Martin Storsjö  writes:

> Not all versions of windows have the console color functions,
> while io.h might be needed for isatty (which can be found in
> unistd.h or io.h).
>
> Signed-off-by: Martin Storsjö 
> ---
>  libavutil/log.c |4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

LGTM

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH 3/3] configure: Use headers in the check for _beginthreadex for w32threads

2012-11-30 Thread Måns Rullgård
Martin Storsjö  writes:

> When targeting the metro API subset, this function still exists in
> the link libraries, but is excluded from the headers. This makes
> sure w32threads is automatically disabled when targeting this API
> subset (since not all the necessary functions for it are available).
>
> Signed-off-by: Martin Storsjö 
> ---
>  configure |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index 38f49e0..02dd1c6 100755
> --- a/configure
> +++ b/configure
> @@ -3332,7 +3332,7 @@ disabled  zlib || check_lib   zlib.h  zlibVersion 
> -lz   || disable  zlib
>  disabled bzlib || check_lib2 bzlib.h BZ2_bzlibVersion -lbz2 || disable bzlib
>
>  if ! disabled w32threads && ! enabled pthreads; then
> -check_func _beginthreadex && enable w32threads
> +check_func_headers "windows.h process.h" _beginthreadex && enable 
> w32threads
>  fi
>
>  # check for some common methods of building with pthread support
> -- 

OK

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH 00/10] Various patches for AAC SBR DSP

2012-11-30 Thread Måns Rullgård
Christophe Gisquet  writes:

> I couldn't find a vector exercising qmf_deint_neg, and I guess neither did
> the one who wrote it,

I can assure you I did, but I don't remember which one(s).

> as not all the code in the same code block of aacsbr was moved to DSP
> functions.

That's an invalid conclusion.  The remaining code was probably just not
worth the effort to move.

-- 
Måns Rullgård
m...@mansr.com


[libav-devel] [PATCH 3/3] configure: Use headers in the check for _beginthreadex for w32threads

2012-11-30 Thread Martin Storsjö
When targeting the metro API subset, this function still exists in
the link libraries, but is excluded from the headers. This makes
sure w32threads is automatically disabled when targeting this API
subset (since not all the necessary functions for it are available).

Signed-off-by: Martin Storsjö 
---
 configure |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 38f49e0..02dd1c6 100755
--- a/configure
+++ b/configure
@@ -3332,7 +3332,7 @@ disabled  zlib || check_lib   zlib.h  zlibVersion -lz   || disable  zlib
 disabled bzlib || check_lib2 bzlib.h BZ2_bzlibVersion -lbz2 || disable bzlib
 
 if ! disabled w32threads && ! enabled pthreads; then
-check_func _beginthreadex && enable w32threads
+check_func_headers "windows.h process.h" _beginthreadex && enable w32threads
 fi
 
 # check for some common methods of building with pthread support
-- 
1.7.9.4



[libav-devel] [PATCH 2/3] avutil: Use a configure check to enable windows console functions

2012-11-30 Thread Martin Storsjö
Not all versions or API subsets of windows have these functions.

Signed-off-by: Martin Storsjö 
---
 configure   |2 ++
 libavutil/log.c |4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 5e1be33..38f49e0 100755
--- a/configure
+++ b/configure
@@ -1242,6 +1242,7 @@ HAVE_LIST="
 sched_getaffinity
 sdl
 sdl_video_size
+SetConsoleTextAttribute
 setmode
 setrlimit
 Sleep
@@ -3305,6 +3306,7 @@ check_func_headers windows.h GetProcessAffinityMask
 check_func_headers windows.h GetProcessTimes
 check_func_headers windows.h GetSystemTimeAsFileTime
 check_func_headers windows.h MapViewOfFile
+check_func_headers windows.h SetConsoleTextAttribute
 check_func_headers windows.h Sleep
 check_func_headers windows.h VirtualAlloc
 
diff --git a/libavutil/log.c b/libavutil/log.c
index d335944..45c649a 100644
--- a/libavutil/log.c
+++ b/libavutil/log.c
@@ -41,7 +41,7 @@
 static int av_log_level = AV_LOG_INFO;
 static int flags;
 
-#if defined(_WIN32) && !defined(__MINGW32CE__)
+#if HAVE_SETCONSOLETEXTATTRIBUTE
 #include <windows.h>
 static const uint8_t color[] = { 12, 12, 12, 14, 7, 10, 11 };
 static int16_t background, attr_orig;
@@ -59,7 +59,7 @@ static int use_color = -1;
 static void colored_fputs(int level, const char *str)
 {
 if (use_color < 0) {
-#if defined(_WIN32) && !defined(__MINGW32CE__)
+#if HAVE_SETCONSOLETEXTATTRIBUTE
 CONSOLE_SCREEN_BUFFER_INFO con_info;
 con = GetStdHandle(STD_ERROR_HANDLE);
 use_color = (con != INVALID_HANDLE_VALUE) && !getenv("NO_COLOR") &&
-- 
1.7.9.4



[libav-devel] [PATCH 1/3] avutil: Include io.h with a separate condition from windows console functions

2012-11-30 Thread Martin Storsjö
Not all versions of windows have the console color functions,
while io.h might be needed for isatty (which can be found in
unistd.h or io.h).

Signed-off-by: Martin Storsjö 
---
 libavutil/log.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavutil/log.c b/libavutil/log.c
index d2cf88f..d335944 100644
--- a/libavutil/log.c
+++ b/libavutil/log.c
@@ -29,6 +29,9 @@
 #if HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#if HAVE_IO_H
+#include <io.h>
+#endif
 #include 
 #include "avstring.h"
 #include "avutil.h"
@@ -40,7 +43,6 @@ static int flags;
 
 #if defined(_WIN32) && !defined(__MINGW32CE__)
 #include <windows.h>
-#include <io.h>
 static const uint8_t color[] = { 12, 12, 12, 14, 7, 10, 11 };
 static int16_t background, attr_orig;
 static HANDLE con;
-- 
1.7.9.4



[libav-devel] [PATCH 10/10] SBR DSP x86: implement SSE hf_apply_noise

2012-11-30 Thread Christophe Gisquet
497 to 253 cycles under Win64.
Replacing the multiplication by s_m[m] with an andps and an xorps using
appropriate vectors is slower. Unrolling is a 15-cycle win.
---
 libavcodec/sbrdsp.c  |1 -
 libavcodec/x86/sbrdsp.asm|   93 ++
 libavcodec/x86/sbrdsp_init.c |   16 +++
 3 files changed, 109 insertions(+), 1 deletions(-)

diff --git a/libavcodec/sbrdsp.c b/libavcodec/sbrdsp.c
index 781ec83..d0a0b93 100644
--- a/libavcodec/sbrdsp.c
+++ b/libavcodec/sbrdsp.c
@@ -175,7 +175,6 @@ static av_always_inline void sbr_hf_apply_noise(float (*Y)[2],
 int m_max)
 {
 int m;
-
 for (m = 0; m < m_max; m++) {
 float y0 = Y[m][0];
 float y1 = Y[m][1];
diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index cfbd6e8..608dee6 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -26,6 +26,12 @@ SECTION_RODATA
 ps_mask times 2 dd 1<<31, 0
 ps_mask2times 2 dd 0, 1<<31
 ps_neg  times 4 dd 1<<31
+ps_noise0   times 2 dd  1.0,  0.0,
+ps_noise2   times 2 dd -1.0,  0.0
+ps_noise13  dd  0.0,  1.0, 0.0, -1.0
+dd  0.0, -1.0, 0.0,  1.0
+dd  0.0,  1.0, 0.0, -1.0
+cextern sbr_noise_table
 
 SECTION_TEXT
 
@@ -318,3 +324,90 @@ cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c
   subcq, 2*mmsize
   jge .loop
   REP_RET
+
+; r0q=Y   r1q=s_m   r2q=q_filt   r3q=noise  r4q=max_m
+cglobal hf_apply_noise_main
+  dec   r3q
+  shl   r4q, 2
+  lea   r0q, [r0q + 2*r4q]
+  add   r1q, r4q
+  add   r2q, r4q
+  shl   r3q, 3
+  xorps  m5, m5
+  neg   r4q
+.loop:
+  add   r3q, 16
+  and   r3q, 0x1ff<<3
+  movh   m1, [r2q + r4q]
+  movu   m3, [r3q + sbr_noise_table]
+  movh   m2, [r2q + r4q + 8]
+  add   r3q, 16
+  and   r3q, 0x1ff<<3
+  movu   m4, [r3q + sbr_noise_table]
+  unpcklps   m1, m1
+  unpcklps   m2, m2
+  mulps  m1, m3 ; m1 = q_filt[m] * ff_sbr_noise_table[noise]
+  mulps  m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
+  movh   m3, [r1q + r4q]
+  movh   m4, [r1q + r4q + 8]
+  unpcklps   m3, m3
+  unpcklps   m4, m4
+  mova   m6, m3
+  mova   m7, m4
+  mulps  m3, m0 ; s_m[m] * phi_sign
+  mulps  m4, m0 ; s_m[m] * phi_sign
+  cmpps  m6, m5, 0 ; m6 = (s_m[m] == 0)
+  cmpps  m7, m5, 0 ; m7 = (s_m[m] == 0)
+  andps  m1, m6
+  andps  m2, m7
+  movu   m6, [r0q + 2*r4q]
+  movu   m7, [r0q + 2*r4q + 16]
+  addps  m6, m1
+  addps  m7, m2
+  addps  m6, m3
+  addps  m7, m4
+  movu[r0q + 2*r4q], m6
+  movu[r0q + 2*r4q + 16], m7
+  add   r4q, 16
+  jl  .loop
+  ret
+
+; sbr_hf_apply_noise_0(float (*Y)[2], const float *s_m,
+;  const float *q_filt, int noise,
+;  int kx, int m_max)
+cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
+  mova   m0, [ps_noise0]
+  mov   r4d, m_maxm
+  call  hf_apply_noise_main
+  RET
+
+; sbr_hf_apply_noise_1(float (*Y)[2], const float *s_m,
+;  const float *q_filt, int noise,
+;  int kx, int m_max)
+cglobal sbr_hf_apply_noise_1, 5,5,8, Y,s_m,q_filt,noise,kx,m_max
+  and   kxq, 1
+  shl   kxq, 4
+  mova   m0, [kxq + ps_noise13]
+  mov   r4d, m_maxm
+  call  hf_apply_noise_main
+  RET
+
+; sbr_hf_apply_noise_2(float (*Y)[2], const float *s_m,
+;  const float *q_filt, int noise,
+;  int kx, int m_max)
+cglobal sbr_hf_apply_noise_2, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
+  mova   m0, [ps_noise2]
+  mov   r4d, m_maxm
+  call  hf_apply_noise_main
+  RET
+
+; sbr_hf_apply_noise_3(float (*Y)[2], const float *s_m,
+;  const float *q_filt, int noise,
+;  int kx, int m_max)
+cglobal sbr_hf_apply_noise_3, 5,5,8, Y,s_m,q_filt,noise,kx,m_max
+  and   kxq, 1
+  shl   kxq, 4
+  mova   m0, [kxq + ps_noise13 + 16]
+  mov   r4d, m_maxm
+  call  hf_apply_noise_main
+  RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 5e3e131..9759314 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -36,6 +36,18 @@ void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
 void ff_sbr_qmf_pre_shuffle_sse(float *z);
 void ff_sbr_qmf_deint_neg_sse(float *v, const float *src);
 void ff_sbr_qmf_deint_bfly_sse(float *v, const float *src0, const float *src1);
+void ff_sbr_hf_apply_noise_0_sse(float (*Y)[2], const float *s_m,
+ const float *q_filt, int noise,
+ int kx, int m_max);
+void ff_sbr_hf_apply_noise_1_sse(float (*Y)[2], const float *s_m,
+ const float *q_filt, int noise,
+ int kx, int m_max);
+void ff_sbr_hf_apply_noise_2_sse(float (*Y)[2], const float *s_m,
+

[libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.

2012-11-30 Thread Christophe Gisquet
Swapping buffer indices allows saving one memcpy that accounts for 1% of the
runtime, according to oprofile.
---
 libavcodec/aacsbr.c |   22 +++---
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c
index df5d927..40b08f9 100644
--- a/libavcodec/aacsbr.c
+++ b/libavcodec/aacsbr.c
@@ -1153,10 +1153,9 @@ static void sbr_dequant(SpectralBandReplication *sbr, int id_aac)
  */
 static void sbr_qmf_analysis(DSPContext *dsp, FFTContext *mdct,
  SBRDSPContext *sbrdsp, const float *in, float *x,
- float z[320], float W[2][32][32][2])
+ float z[320], float W[2][32][32][2], int buf_idx)
 {
 int i;
-memcpy(W[0], W[1], sizeof(W[0]));
 memcpy(x, x+1024, (320-32)*sizeof(x[0]));
 memcpy(x+288, in, 1024*sizeof(x[0]));
 for (i = 0; i < 32; i++) { // numTimeSlots*RATE = 16*2 as 960 sample frames
@@ -1165,7 +1164,7 @@ static void sbr_qmf_analysis(DSPContext *dsp, FFTContext *mdct,
 sbrdsp->sum64x5(z);
 sbrdsp->qmf_pre_shuffle(z);
 mdct->imdct_half(mdct, z, z+64);
-sbrdsp->qmf_post_shuffle(W[1][i], z);
+sbrdsp->qmf_post_shuffle(W[buf_idx][i], z);
 x += 32;
 }
 }
@@ -1301,7 +1300,8 @@ static void sbr_chirp(SpectralBandReplication *sbr, SBRData *ch_data)
 
 /// Generate the subband filtered lowband
 static int sbr_lf_gen(AACContext *ac, SpectralBandReplication *sbr,
-  float X_low[32][40][2], const float W[2][32][32][2])
+  float X_low[32][40][2], const float W[2][32][32][2],
+  int buf_idx)
 {
 int i, k;
 const int t_HFGen = 8;
@@ -1309,14 +1309,15 @@ static int sbr_lf_gen(AACContext *ac, SpectralBandReplication *sbr,
 memset(X_low, 0, 32*sizeof(*X_low));
 for (k = 0; k < sbr->kx[1]; k++) {
 for (i = t_HFGen; i < i_f + t_HFGen; i++) {
-X_low[k][i][0] = W[1][i - t_HFGen][k][0];
-X_low[k][i][1] = W[1][i - t_HFGen][k][1];
+X_low[k][i][0] = W[buf_idx][i - t_HFGen][k][0];
+X_low[k][i][1] = W[buf_idx][i - t_HFGen][k][1];
 }
 }
+buf_idx = 1-buf_idx;
 for (k = 0; k < sbr->kx[0]; k++) {
 for (i = 0; i < t_HFGen; i++) {
-X_low[k][i][0] = W[0][i + i_f - t_HFGen][k][0];
-X_low[k][i][1] = W[0][i + i_f - t_HFGen][k][1];
+X_low[k][i][0] = W[buf_idx][i + i_f - t_HFGen][k][0];
+X_low[k][i][1] = W[buf_idx][i + i_f - t_HFGen][k][1];
 }
 }
 return 0;
@@ -1344,7 +1345,6 @@ static int sbr_hf_gen(AACContext *ac, SpectralBandReplication *sbr,
"ERROR : no subband found for frequency %d\n", k);
 return -1;
 }
-
 sbr->dsp.hf_gen(X_high[k] + ENVELOPE_ADJUSTMENT_OFFSET,
 X_low[p]  + ENVELOPE_ADJUSTMENT_OFFSET,
 alpha0[p], alpha1[p], bw_array[g],
@@ -1665,8 +1665,8 @@ void ff_sbr_apply(AACContext *ac, SpectralBandReplication *sbr, int id_aac,
 /* decode channel */
 sbr_qmf_analysis(&ac->dsp, &sbr->mdct_ana, &sbr->dsp, ch ? R : L, sbr->data[ch].analysis_filterbank_samples,
  (float*)sbr->qmf_filter_scratch,
- sbr->data[ch].W);
-sbr_lf_gen(ac, sbr, sbr->X_low, sbr->data[ch].W);
+ sbr->data[ch].W, sbr->data[ch].Ypos);
+sbr_lf_gen(ac, sbr, sbr->X_low, sbr->data[ch].W, sbr->data[ch].Ypos);
 sbr->data[ch].Ypos ^= 1;
 if (sbr->start) {
 sbr_hf_inverse_filter(&sbr->dsp, sbr->alpha0, sbr->alpha1, sbr->X_low, sbr->k[0]);
-- 
1.7.7.msysgit.0
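
The index swap in this patch is a form of double buffering: rather than
copying the newest slot over the oldest one on every frame, the roles of the
two slots are exchanged by flipping an index. A minimal sketch of that idea
outside the AAC decoder follows. The History struct, SLOTS, and the history_*
helpers are hypothetical names made up for illustration; they are not part of
libav.

```c
#include <assert.h>
#include <string.h>

#define SLOTS 4

/* Hypothetical two-slot history, standing in for W[2][32][32][2]. */
typedef struct {
    float w[2][SLOTS];
    int   cur;            /* index of the slot holding the newest frame */
} History;

/* Store a new frame. The older slot is reused as the new "current" slot,
 * so the previous frame stays where it is and never has to be copied. */
static void history_push(History *h, const float *frame)
{
    assert(h != NULL && frame != NULL);
    h->cur ^= 1;                                   /* swap slot roles */
    memcpy(h->w[h->cur], frame, sizeof(h->w[0]));  /* write new data only */
}

static const float *history_current(const History *h)
{
    return h->w[h->cur];
}

static const float *history_previous(const History *h)
{
    return h->w[h->cur ^ 1];
}
```

The remaining memcpy only stands in for writing the new frame; the copy that
would move the previous frame into slot 0 is what the index flip avoids.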



[libav-devel] [PATCH 08/10] x264asm: fix cmp* number of arguments

2012-11-30 Thread Christophe Gisquet
cmp{p,s}{s,d} instructions do take an imm8 operand.
---
 libavutil/x86/x86inc.asm |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 52ee46a..3744e46 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -951,10 +951,10 @@ AVX_INSTR blendpd, 1, 0, 0
 AVX_INSTR blendps, 1, 0, 0
 AVX_INSTR blendvpd, 1, 0, 0
 AVX_INSTR blendvps, 1, 0, 0
-AVX_INSTR cmppd, 1, 0, 0
-AVX_INSTR cmpps, 1, 0, 0
-AVX_INSTR cmpsd, 1, 0, 0
-AVX_INSTR cmpss, 1, 0, 0
+AVX_INSTR cmppd, 1, 1, 0
+AVX_INSTR cmpps, 1, 1, 0
+AVX_INSTR cmpsd, 1, 1, 0
+AVX_INSTR cmpss, 1, 1, 0
 AVX_INSTR cvtdq2ps, 1, 0, 0
 AVX_INSTR cvtps2dq, 1, 0, 0
 AVX_INSTR divpd, 1, 0, 0
-- 
1.7.7.msysgit.0



[libav-devel] [PATCH 07/10] SBR DSP x86: implement SSE qmf_deint_bfly

2012-11-30 Thread Christophe Gisquet
From 713 to 209 cycles on Penryn.
Having a loop counter is a 7-cycle gain.
Unrolling is another 7-cycle gain.
Working in reverse scan order gains another 6 cycles.
---
 libavcodec/x86/sbrdsp.asm|   31 +++
 libavcodec/x86/sbrdsp_init.c |2 ++
 2 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 49dd78c..cfbd6e8 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -287,3 +287,34 @@ cglobal sbr_qmf_deint_neg, 2,3,4,v,src,vrev
   cmpvq, vrevq
   jl  .loop
   REP_RET
+
+; sbr_qmf_deint_bfly(float *v, const float *src0, const float *src1)
+cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c
+  movcq, 64*4-2*mmsize
+  lea vrevq, [vq + 64*4]
+.loop:
+  mova   m0, [src0q+cq]
+  mova   m1, [src1q]
+  mova   m4, [src0q+cq+mmsize]
+  mova   m5, [src1q+mmsize]
+  mova   m2, m0
+  mova   m3, m1
+  shufps m2, m2, 11011b
+  shufps m3, m3, 11011b
+  mova   m6, m4
+  mova   m7, m5
+  shufps m6, m6, 11011b
+  shufps m7, m7, 11011b
+  addps  m5, m2
+  subps  m0, m7
+  addps  m1, m6
+  subps  m4, m3
+  mova  [vrevq], m1
+  mova  [vrevq+mmsize], m5
+  mova  [vq+cq], m0
+  mova  [vq+cq+mmsize], m4
+  add src1q, 2*mmsize
+  add vrevq, 2*mmsize
+  subcq, 2*mmsize
+  jge .loop
+  REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 1ac64aa..5e3e131 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -35,6 +35,7 @@ void ff_sbr_neg_odd_64_sse(float *z);
 void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
 void ff_sbr_qmf_pre_shuffle_sse(float *z);
 void ff_sbr_qmf_deint_neg_sse(float *v, const float *src);
+void ff_sbr_qmf_deint_bfly_sse(float *v, const float *src0, const float *src1);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -49,5 +50,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
 s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse;
 s->qmf_deint_neg = ff_sbr_qmf_deint_neg_sse;
+s->qmf_deint_bfly = ff_sbr_qmf_deint_bfly_sse;
 }
 }
-- 
1.7.7.msysgit.0
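
For readers who do not want to trace the shuffles, here is a scalar reading of
what the asm in this patch computes, derived from its loads and stores. It is
a sketch for illustration: qmf_deint_bfly_ref is a hypothetical name, and it
is not claimed to be the C reference in sbrdsp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar butterfly over 64 inputs per source, producing 128 outputs.
 * Differences go to v[0..63]; sums go to v[64..127], written from the
 * top down. src1 is read in reverse order in both cases. */
static void qmf_deint_bfly_ref(float *v, const float *src0, const float *src1)
{
    assert(v != NULL && src0 != NULL && src1 != NULL);
    for (int i = 0; i < 64; i++) {
        v[i]       = src0[i] - src1[63 - i];
        v[127 - i] = src0[i] + src1[63 - i];
    }
}
```

The reversal of src1 is what the shufps with the 11011b immediate performs
four floats at a time in the SSE version.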



[libav-devel] [PATCH 02/10] SBR DSP x86: implement SSE sum64x5

2012-11-30 Thread Christophe Gisquet
698 to 174 cycles on Penryn. Unrolling is a 6-cycle gain.
---
 libavcodec/x86/sbrdsp.asm|   22 ++
 libavcodec/x86/sbrdsp_init.c |2 ++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 039bf8c..11a6faf 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -181,3 +181,25 @@ cglobal sbr_hf_gen, 4,4,8, X_high, X_low, alpha0, alpha1, BW, S, E
 add start, 16
 jnz .loop2
 RET
+
+cglobal sbr_sum64x5, 1,2,4,z
+  lear1q, [zq+ 256]
+.loop:
+  movam0, [zq+   0]
+  movam2, [zq+  16]
+  movam1, [zq+ 256]
+  movam3, [zq+ 272]
+  addps   m0, [zq+ 512]
+  addps   m2, [zq+ 528]
+  addps   m1, [zq+ 768]
+  addps   m3, [zq+ 784]
+  addps   m0, [zq+1024]
+  addps   m2, [zq+1040]
+  addps   m0, m1
+  addps   m2, m3
+  mova  [zq], m0
+  mova  [zq+16], m2
+  add zq, 32
+  cmp zq, r1q
+  jne  .loop
+  REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 51c4bd4..108a681 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -30,6 +30,7 @@ void ff_sbr_hf_g_filt_sse(float (*Y)[2], const float 
(*X_high)[40][2],
 void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
const float alpha0[2], const float alpha1[2],
float bw, int start, int end);
+void ff_sbr_sum64x5_sse(float *z);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -39,5 +40,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->sum_square = ff_sbr_sum_square_sse;
 s->hf_g_filt  = ff_sbr_hf_g_filt_sse;
 s->hf_gen = ff_sbr_hf_gen_sse;
+s->sum64x5= ff_sbr_sum64x5_sse;
 }
 }
-- 
1.7.7.msysgit.0
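
As a scalar reading of the asm in this patch: the byte offsets 0, 256, 512,
768 and 1024 correspond to float offsets 0, 64, 128, 192 and 256, so each of
the first 64 outputs is the sum of five inputs spaced 64 floats apart. The
sketch below keeps the same grouping of additions as the asm (two partial sums
that are then combined), which may round differently from a plain
left-to-right sum. sum64x5_ref is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* z must hold at least 320 floats; the first 64 are overwritten in place. */
static void sum64x5_ref(float *z)
{
    assert(z != NULL);
    for (int k = 0; k < 64; k++) {
        float a = z[k]      + z[k + 128] + z[k + 256]; /* m0/m2 chain */
        float b = z[k + 64] + z[k + 192];              /* m1/m3 chain */
        z[k] = a + b;
    }
}
```

Updating in place is safe here because every write lands in z[0..63] and
every read for a later k touches either its own z[k] or elements at index 64
and above.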



[libav-devel] [PATCH 06/10] SBR DSP x86: implement SSE neg_odd_64

2012-11-30 Thread Christophe Gisquet
From 210 cycles to 87 on Penryn.
Unrolling and not storing the mask both save some cycles.
---
 libavcodec/x86/sbrdsp.asm|   21 +
 libavcodec/x86/sbrdsp_init.c |2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index aff6879..49dd78c 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -206,6 +206,26 @@ cglobal sbr_sum64x5, 1,2,4,z
   jne  .loop
   REP_RET
 
+cglobal sbr_neg_odd_64, 1,2,4,z
+  lea   r1q, [zq+256]
+.loop:
+  mova   m0, [zq+ 0]
+  mova   m1, [zq+16]
+  mova   m2, [zq+32]
+  mova   m3, [zq+48]
+  xorps  m0, [ps_mask2]
+  xorps  m1, [ps_mask2]
+  xorps  m2, [ps_mask2]
+  xorps  m3, [ps_mask2]
+  mova  [zq+ 0], m0
+  mova  [zq+16], m1
+  mova  [zq+32], m2
+  mova  [zq+48], m3
+  addzq, 64
+  cmpzq, r1q
+  jne .loop
+  REP_RET
+
 cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
   lea   r2q, [zq + (64-4)*4]
 .loop:
@@ -266,3 +286,4 @@ cglobal sbr_qmf_deint_neg, 2,3,4,v,src,vrev
   sub  srcq, 32
   cmpvq, vrevq
   jl  .loop
+  REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index e70b970..1ac64aa 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -31,6 +31,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
const float alpha0[2], const float alpha1[2],
float bw, int start, int end);
 void ff_sbr_sum64x5_sse(float *z);
+void ff_sbr_neg_odd_64_sse(float *z);
 void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
 void ff_sbr_qmf_pre_shuffle_sse(float *z);
 void ff_sbr_qmf_deint_neg_sse(float *v, const float *src);
@@ -44,6 +45,7 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->hf_g_filt  = ff_sbr_hf_g_filt_sse;
 s->hf_gen = ff_sbr_hf_gen_sse;
 s->sum64x5= ff_sbr_sum64x5_sse;
+s->neg_odd_64 = ff_sbr_neg_odd_64_sse;
 s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
 s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse;
 s->qmf_deint_neg = ff_sbr_qmf_deint_neg_sse;
-- 
1.7.7.msysgit.0
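
The xorps with ps_mask2 (0, 1<<31 repeated) negates every odd-indexed float by
flipping its sign bit instead of multiplying by -1.0. A scalar sketch of the
same trick follows, assuming 32-bit IEEE-754 floats; neg_odd is a hypothetical
helper written for illustration, not the libav function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flip the sign of z[1], z[3], ... by XORing the IEEE-754 sign bit.
 * memcpy is used for the float <-> integer reinterpretation so the code
 * does not depend on pointer-cast aliasing. */
static void neg_odd(float *z, int n)
{
    assert(z != NULL && n % 2 == 0);
    for (int i = 1; i < n; i += 2) {
        uint32_t bits;
        memcpy(&bits, &z[i], sizeof(bits));
        bits ^= UINT32_C(1) << 31;
        memcpy(&z[i], &bits, sizeof(bits));
    }
}
```

Calling neg_odd(z, 64) would cover the same 64 floats that the asm loop walks
through in steps of 64 bytes.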



[libav-devel] [PATCH 05/10] SBR DSP x86: implement SSE qmf_deint_neg

2012-11-30 Thread Christophe Gisquet
No test vector exercises it.
---
 libavcodec/x86/sbrdsp.asm|   19 +++
 libavcodec/x86/sbrdsp_init.c |2 ++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index b9f0709..aff6879 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -247,3 +247,22 @@ cglobal sbr_qmf_pre_shuffle, 1,4,4,z
   jl  .loop
   movh  [r3q-256], m3
   REP_RET
+
+cglobal sbr_qmf_deint_neg, 2,3,4,v,src,vrev
+  lea vrevq, [vq + (64-4)*4]
+  add  srcq, (64-8)*4
+  mova   m3, [ps_neg]
+.loop:
+  mova   m0, [srcq +  0]
+  mova   m1, [srcq + 16]
+  mova   m2, m1
+  shufps m0, m1, 11011101b
+  shufps m2, m1, 10001000b
+  xorps  m0, m3
+  mova [vq], m2
+  mova  [vrevq], m0
+  addvq, 16
+  sub vrevq, 16
+  sub  srcq, 32
+  cmpvq, vrevq
+  jl  .loop
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 5babe62..e70b970 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -33,6 +33,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
 void ff_sbr_sum64x5_sse(float *z);
 void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
 void ff_sbr_qmf_pre_shuffle_sse(float *z);
+void ff_sbr_qmf_deint_neg_sse(float *v, const float *src);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -45,5 +46,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->sum64x5= ff_sbr_sum64x5_sse;
 s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
 s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse;
+s->qmf_deint_neg = ff_sbr_qmf_deint_neg_sse;
 }
 }
-- 
1.7.7.msysgit.0



[libav-devel] [PATCH 04/10] SBR DSP x86: implement SSE qmf_pre_shuffle

2012-11-30 Thread Christophe Gisquet
From 253 to 185 cycles.
---
 libavcodec/x86/sbrdsp.asm|   23 +++
 libavcodec/x86/sbrdsp_init.c |2 ++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 2b90100..b9f0709 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -224,3 +224,26 @@ cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
   cmpzq, r2q
   jl  .loop
   REP_RET
+
+cglobal sbr_qmf_pre_shuffle, 1,4,4,z
+  movh   m3, [zq]
+  lea   r3q, [zq + 64*4]
+  lea   r2q, [zq + (64-3)*4]
+  addzq, 4
+.loop:
+  movu   m0, [r2q]
+  movu   m1, [zq ]
+  xorps  m0, [ps_neg]
+  shufps m0, m0, 0x1B
+  mova   m2, m0
+  unpcklps   m0, m1
+  unpckhps   m2, m1
+  mova  [r3q +  0], m0
+  mova  [r3q + 16], m2
+  add   r3q, 32
+  sub   r2q, 16
+  addzq, 16
+  cmpzq, r2q
+  jl  .loop
+  movh  [r3q-256], m3
+  REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 3f6dd97..5babe62 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -32,6 +32,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
float bw, int start, int end);
 void ff_sbr_sum64x5_sse(float *z);
 void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
+void ff_sbr_qmf_pre_shuffle_sse(float *z);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -43,5 +44,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->hf_gen = ff_sbr_hf_gen_sse;
 s->sum64x5= ff_sbr_sum64x5_sse;
 s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
+s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse;
 }
 }
-- 
1.7.7.msysgit.0



[libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle

2012-11-30 Thread Christophe Gisquet
On Penryn, from 255 to 174 cycles. Unrolling yields no gain.
---
 libavcodec/x86/sbrdsp.asm|   21 +
 libavcodec/x86/sbrdsp_init.c |2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 11a6faf..2b90100 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -24,6 +24,8 @@
 SECTION_RODATA
 ; mask equivalent for multiply by -1.0 1.0
 ps_mask times 2 dd 1<<31, 0
+ps_mask2times 2 dd 0, 1<<31
+ps_neg  times 4 dd 1<<31
 
 SECTION_TEXT
 
@@ -203,3 +205,22 @@ cglobal sbr_sum64x5, 1,2,4,z
   cmp zq, r1q
   jne  .loop
   REP_RET
+
+cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
+  lea   r2q, [zq + (64-4)*4]
+.loop:
+  mova   m0, [r2q]
+  mova   m1, [zq ]
+  xorps  m0, [ps_neg]
+  shufps m0, m0, 0x1B
+  mova   m2, m0
+  unpcklps   m0, m1
+  unpckhps   m2, m1
+  mova  [Wq +  0], m0
+  mova  [Wq + 16], m2
+  addWq, 32
+  sub   r2q, 16
+  addzq, 16
+  cmpzq, r2q
+  jl  .loop
+  REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 108a681..3f6dd97 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -31,6 +31,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
const float alpha0[2], const float alpha1[2],
float bw, int start, int end);
 void ff_sbr_sum64x5_sse(float *z);
+void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -41,5 +42,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 s->hf_g_filt  = ff_sbr_hf_g_filt_sse;
 s->hf_gen = ff_sbr_hf_gen_sse;
 s->sum64x5= ff_sbr_sum64x5_sse;
+s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse;
 }
 }
-- 
1.7.7.msysgit.0



[libav-devel] [PATCH 01/10] SBR DSP x86: implement SSE sbr_hf_gen

2012-11-30 Thread Christophe Gisquet
Start and end indices are multiples of 2, which guarantees aligned access.
This also allows generating 4 floats per loop iteration while keeping the
alignment throughout.

Timing:
- 32 bits: 326c -> 172c
- 64 bits: 323c -> 156c
---
 libavcodec/x86/sbrdsp.asm|   73 -
 libavcodec/x86/sbrdsp_init.c |4 ++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index c351de4..039bf8c 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -21,8 +21,11 @@
 
 %include "libavutil/x86/x86util.asm"
 
-;SECTION_RODATA
-SECTION .text
+SECTION_RODATA
+; mask equivalent for multiply by -1.0 1.0
+ps_mask times 2 dd 1<<31, 0
+
+SECTION_TEXT
 
 INIT_XMM sse
 cglobal sbr_sum_square, 2, 3, 6
@@ -112,3 +115,69 @@ cglobal sbr_hf_g_filt, 5, 6, 5
 jnz .loop1
 .end:
 RET
+
+; static void sbr_hf_gen_c(float (*X_high)[2], const float (*X_low)[2],
+;  const float alpha0[2], const float alpha1[2],
+;  float bw, int start, int end)
+;
+cglobal sbr_hf_gen, 4,4,8, X_high, X_low, alpha0, alpha1, BW, S, E
+; load alpha factors
+%define bw m0
+%if ARCH_X86_64 == 0 || WIN64
+movss  bw, BWm
+%endif
+movh   m2, [alpha1q]
+movh   m1, [alpha0q]
+shufps bw, bw, 0
+mulps  m2, bw ; (a1[0] a1[1])*bw
+mulps  m1, bw ; (a0[0] a0[1])*bw= (a2 a3)
+mulps  m2, bw ; (a1[0] a1[1])*bw*bw = (a0 a1)
+mova   m3, m1
+mova   m4, m2
+mova   m7, [ps_mask]
+
+; Set pointers
+%if ARCH_X86_64 == 0 || WIN64
+; start and end 6th and 7th args on stack
+movr2d, Sm
+movr3d, Em
+%define  start r2q
+%define  end   r3q
+%else
+; BW does not actually occupy a register, so shift by 1
+%define  start BWq
+%define  end   Sq
+%endif
+sub  start, end  ; neg num of loops
+leaX_highq, [X_highq + end*2*4]
+lea X_lowq, [X_lowq  + end*2*4 - 2*2*4]
+shl  start, 3  ; offset from num loops
+
+movam0, [X_lowq + start]
+movlhps m1, m1 ; (a2 a3 a2 a3)
+movlhps m2, m2 ; (a0 a1 a0 a1)
+shufps  m3, m3, 00010001b  ; (a3 a2 a3 a2)
+shufps  m4, m4, 00010001b  ; (a1 a0 a1 a0)
+xorps   m3, m7 ; (-a3 a2 -a3 a2)
+xorps   m4, m7 ; (-a1 a0 -a1 a0)
+.loop2:
+movam5, m0
+movam6, m0
+shufps  m0, m0, 1010b ; {Xl[-2][0],",Xl[-1][0],"}
+shufps  m5, m5, 0101b ; {Xl[-2][1],",Xl[-1][1],"}
+mulps   m0, m2
+mulps   m5, m4
+movam7, m6
+addps   m5, m0
+movam0, [X_lowq + start + 2*2*4]
+shufps  m6, m0, 1010b ; {Xl[-1][0],",Xl[0][0],"}
+shufps  m7, m0, 0101b ; {Xl[-1][1],",Xl[1][1],"}
+mulps   m6, m1
+mulps   m7, m3
+addps   m5, m6
+addps   m7, m0
+addps   m5, m7
+mova  [X_highq + start], m5
+add start, 16
+jnz .loop2
+RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index d272896..51c4bd4 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -27,6 +27,9 @@
 float ff_sbr_sum_square_sse(float (*x)[2], int n);
 void ff_sbr_hf_g_filt_sse(float (*Y)[2], const float (*X_high)[40][2],
   const float *g_filt, int m_max, intptr_t ixh);
+void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
+   const float alpha0[2], const float alpha1[2],
+   float bw, int start, int end);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -35,5 +38,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
 if (EXTERNAL_SSE(mm_flags)) {
 s->sum_square = ff_sbr_sum_square_sse;
 s->hf_g_filt  = ff_sbr_hf_g_filt_sse;
+s->hf_gen = ff_sbr_hf_gen_sse;
 }
 }
-- 
1.7.7.msysgit.0



[libav-devel] [PATCH 00/10] Various patches for AAC SBR DSP

2012-11-30 Thread Christophe Gisquet
These are mostly x86 SSE asm. The first patch is a continuation of the thread
"[PATCHES] SBR DSP and sbr_hf_gen".

Except for hf_apply_noise, they are tested using fate-aac on win32/win64
and linux x86-64. I didn't check under linux x86-32.

I couldn't find a vector exercising qmf_deint_neg, and I guess neither did
the one who wrote it, as not all the code in the same code block of aacsbr
was moved to DSP functions.

Christophe Gisquet (10):
  SBR DSP x86: implement SSE sbr_hf_gen
  SBR DSP x86: implement SSE sum64x5
  SBR DSP x86: implement SSE qmf_post_shuffle
  SBR DSP x86: implement SSE qmf_pre_shuffle
  SBR DSP x86: implement SSE qmf_deint_neg
  SBR DSP x86: implement SSE neg_odd_64
  SBR DSP x86: implement SSE qmf_deint_bfly
  x264asm: fix cmp* number of arguments
  AAC SBR: avoid a memcpy.
  SBR DSP x86: implement SSE hf_apply_noise

 libavcodec/aacsbr.c  |   22 ++--
 libavcodec/sbrdsp.c  |1 -
 libavcodec/x86/sbrdsp.asm|  303 +-
 libavcodec/x86/sbrdsp_init.c |   32 +
 libavutil/x86/x86inc.asm |8 +-
 5 files changed, 348 insertions(+), 18 deletions(-)

-- 
1.7.7.msysgit.0



[libav-devel] [PATCH 1/1] avprobe: report per stream bit rate if set by the decoder

2012-11-30 Thread Janne Grunau
---
 avprobe.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/avprobe.c b/avprobe.c
index 3a3ae0f..4da9621 100644
--- a/avprobe.c
+++ b/avprobe.c
@@ -654,6 +654,10 @@ static void show_stream(AVFormatContext *fmt_ctx, int stream_idx)
 probe_str("avg_frame_rate",
   rational_string(val_str, sizeof(val_str), "/",
   &stream->avg_frame_rate));
+if (dec_ctx->bit_rate)
+probe_str("bit_rate",
+  value_string(val_str, sizeof(val_str),
+   dec_ctx->bit_rate, unit_bit_per_second_str));
 probe_str("time_base",
   rational_string(val_str, sizeof(val_str), "/",
   &stream->time_base));
-- 
1.7.12.4



Re: [libav-devel] [PATCHES] scalarproduct_and_madd_int16 and wma lossless

2012-11-30 Thread Christophe Gisquet
2012/11/27 Justin Ruggles :
> Either make the function name lowercase or make it a macro. Also, if you
> leave it as an inline function, put the opening brace on a separate line.

Moved to a macro. This was mimicking the apedec codec.

If the patches are declared ok, someone should check the neon code.

-- 
Christophe


0003-dsputil-allow-scalarproduct_and_madd_int16-to-handle.patch
Description: Binary data


0004-wma-lossless-reuse-scalarproduct_and_madd_int16.patch
Description: Binary data


Re: [libav-devel] [PATCH 1/1] svq3: check and reject negative slice types

2012-11-30 Thread Måns Rullgård
Janne Grunau  writes:

> On 2012-11-29 17:21:45 +0000, Måns Rullgård wrote:
>> Janne Grunau  writes:
>> 
>> > On 2012-11-29 13:46:12 +0000, Måns Rullgård wrote:
>> >> Janne Grunau  writes:
>> >> 
>> >> > On 2012-11-29 00:08:52 +0000, Måns Rullgård wrote:
>> >> >> Janne Grunau  writes:
>> >> >> 
>> >> >> > On 2012-11-28 23:52:27 +0100, Luca Barbato wrote:
>> >> >> >> 
>> >> >> >> Is INVALID_VLC value negative?
>> >> >> >
>> >> >> > no, 0x80000000. But arithmetic conversion saves us.
>> >> >> 
>> >> >> Ouch, that's _really_ bad.  The svq3_get_ue_golomb() return type is
>> >> >> (inexplicably) int, so returning that value entails a conversion with
>> >> >> implementation-defined behaviour.  Most compilers leave the bits intact
>> >> >> in such conversions, but I'd rather not depend on it.  Also, such code
>> >> >> is nothing short of obfuscated even if it does work reliably.
>> >> >
>> >> > 6.3.1.3 reads to me as if signed int to unsigned int conversion is well
>> >> > defined:
>> >> 
>> >> Yes, but we're dealing with unsigned to signed here.  The integer
>> >> constant 0x80000000 has type unsigned int (if int is 32-bit).  Returning
>> >> this from svq3_get_ue_golomb() as a signed int invokes an
>> >> implementation-defined conversion.
>> >
>> > svq3_get_ue_golomb() doesn't return INVALID_VLC explicitly.
>> 
>> Right, I confused it with svq3_get_se_golomb().  svq3_get_ue_golomb() is
>> actually worse.
>> 
>> > The only way it can return something negative is due to signed arithmetic
>> > on the int ret in the else branch.
>> 
>> The function has two return statements.  The first returns a value from
>> ff_interleaved_ue_golomb_vlc_code[] (array of uint8_t), so this one is
>> safe.  The other returns "ret - 1" where ret is a signed int.  For this
>> to produce a negative value other than -1, ret must itself be negative.
>> This can only happen through the left shifts in the loop overflowing
>> into the sign bit.  Such an overflow has *undefined* behaviour.
>
> returning -1 relies on undefined behaviour too. ret is initialized to 1
> and the only operations are left shifts and bitwise ORs.

Quite so.

>> input can cause this to happen (and your patch suggests this is the
>> case), the code is broken and must be fixed here.  Checking it after the
>> fact is not good enough.
>
> I see no reason why the computation in svq3_get_ue_golomb() can't be
> changed to unsigned and at least make fate agrees.
>
> The only decision to be made is whether to detect truncation/invalid
> codes or keep reading until it is properly terminated.

Detecting errors as soon as possible might be more robust, but simply
reading until it naturally terminates is probably faster.
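
A minimal sketch of the two options, assuming a simplified bit-at-a-time reader for an interleaved code made of (stop flag, data bit) pairs. BitReader, br_get_bit() and get_interleaved_ue() are hypothetical names used only for illustration; this is not libav's table-driven svq3_get_ue_golomb(). The accumulator is unsigned, so the shifts never overflow into a sign bit, and the `strict` flag selects between failing as soon as the value can no longer fit and reading on until the code terminates:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader; not libav's GetBitContext. */
typedef struct BitReader {
    const uint8_t *buf;
    size_t size_bits;   /* total number of readable bits */
    size_t pos;         /* index of the next bit to read */
} BitReader;

static int br_bits_left(const BitReader *br)
{
    return br->pos < br->size_bits;
}

static unsigned br_get_bit(BitReader *br)
{
    unsigned bit;

    assert(br->pos < br->size_bits);
    bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos++;
    return bit;
}

/*
 * Decode one code made of (stop flag, data bit) pairs: a stop flag of 1
 * ends the code, a stop flag of 0 is followed by one data bit.
 * Returns 0 and stores the value in *out on success, -1 on truncated
 * input or on a code too long for 32 bits.
 * strict != 0: fail as soon as another data bit cannot fit.
 * strict == 0: keep reading until the stop flag, then report the error.
 */
static int get_interleaved_ue(BitReader *br, int strict, uint32_t *out)
{
    uint32_t ret = 1;
    int overflow = 0;

    for (;;) {
        if (!br_bits_left(br))
            return -1;              /* truncated before a stop flag */
        if (br_get_bit(br))
            break;                  /* stop flag set: code is complete */
        if (!br_bits_left(br))
            return -1;              /* truncated before the data bit */
        if (ret >> 31) {            /* the next shift would drop the top bit */
            if (strict)
                return -1;
            overflow = 1;
        }
        /* unsigned arithmetic: an overlong code wraps, it is never undefined */
        ret = (ret << 1) | br_get_bit(br);
    }
    if (overflow)
        return -1;
    *out = ret - 1;
    return 0;
}
```

With strict set, an overlong code is rejected after 63 bits; without it the reader consumes the whole code first, so the bitstream position ends up after the terminator in both the valid and the invalid case.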

-- 
Måns Rullgård
m...@mansr.com


Re: [libav-devel] [PATCH 1/1] svq3: check and reject negative slice types

2012-11-30 Thread Janne Grunau
On 2012-11-29 17:21:45 +0000, Måns Rullgård wrote:
> Janne Grunau writes:
> 
> > On 2012-11-29 13:46:12 +0000, Måns Rullgård wrote:
> >> Janne Grunau writes:
> >> 
> >> > On 2012-11-29 00:08:52 +0000, Måns Rullgård wrote:
> >> >> Janne Grunau writes:
> >> >> 
> >> >> > On 2012-11-28 23:52:27 +0100, Luca Barbato wrote:
> >> >> >> 
> >> >> >> Is INVALID_VLC value negative?
> >> >> >
> >> >> > no, 0x80000000. But arithmetic conversion saves us.
> >> >> 
> >> >> Ouch, that's _really_ bad.  The svq3_get_ue_golomb() return type is
> >> >> (inexplicably) int, so returning that value entails a conversion with
> >> >> implementation-defined behaviour.  Most compilers leave the bits intact
> >> >> in such conversions, but I'd rather not depend on it.  Also, such code
> >> >> is nothing short of obfuscated even if it does work reliably.
> >> >
> >> > 6.3.1.3 reads to me as if signed int to unsigned int conversion is well
> >> > defined:
> >> 
> >> Yes, but we're dealing with unsigned to signed here.  The integer
> >> constant 0x80000000 has type unsigned int (if int is 32-bit).  Returning
> >> this from svq3_get_ue_golomb() as a signed int invokes an
> >> implementation-defined conversion.
> >
> > svq3_get_ue_golomb() doesn't return INVALID_VLC explicitly.
> 
> Right, I confused it with svq3_get_se_golomb().  svq3_get_ue_golomb() is
> actually worse.
> 
> > The only way it can return something negative is due to signed arithmetic
> > on the int ret in the else branch.
> 
> The function has two return statements.  The first returns a value from
> ff_interleaved_ue_golomb_vlc_code[] (array of uint8_t), so this one is
> safe.  The other returns "ret - 1" where ret is a signed int.  For this
> to produce a negative value other than -1, ret must itself be negative.
> This can only happen through the left shifts in the loop overflowing
> into the sign bit.  Such an overflow has *undefined* behaviour.

returning -1 relies on undefined behaviour too. ret is initialized to 1
and the only operations are left shifts and bitwise ORs.

> input can cause this to happen (and your patch suggests this is the
> case), the code is broken and must be fixed here.  Checking it after the
> fact is not good enough.

I see no reason why the computation in svq3_get_ue_golomb() can't be
changed to unsigned and at least make fate agrees.

The only decision to be made is whether to detect truncation/invalid
codes or keep reading until it is properly terminated.
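
The conversion points from the quoted discussion can be checked with a small C11 sketch. It assumes the constant under discussion is the 32-bit pattern with only the top bit set (0x80000000) and a platform where int is 32 bits wide; EXAMPLE_INVALID_VLC and the helper names are hypothetical and not taken from the libav sources:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed value of the constant under discussion (illustration only). */
#define EXAMPLE_INVALID_VLC 0x80000000

/*
 * 1 if the constant has type unsigned int. With a 32-bit int the value
 * does not fit in int, so a hexadecimal constant becomes unsigned int.
 */
static int constant_is_unsigned_int(void)
{
    return _Generic((EXAMPLE_INVALID_VLC), unsigned int: 1, default: 0);
}

/* Left shift on an unsigned type: bits shifted out are discarded, no UB. */
static uint32_t shl_u32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return v << n;
}

/*
 * Range-checked unsigned to signed conversion. Values above INT_MAX are
 * rejected instead of relying on the implementation-defined conversion.
 */
static int u32_to_int_checked(uint32_t v, int *out)
{
    if (v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}
```

Doing the arithmetic in uint32_t and converting once, with a range check, keeps both the undefined shift and the implementation-defined conversion out of the decoder.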

Janne


Re: [libav-devel] [PATCH 2/2] lavr: temporarily store custom matrix in AVAudioResampleContext

2012-11-30 Thread Anton Khirnov

On Thu, 29 Nov 2012 23:36:05 -0500, Justin Ruggles wrote:
> This allows AudioMix to be treated the same way as other conversion contexts
> and removes the requirement to allocate it at the same time as the
> AVAudioResampleContext.
> 
> The current matrix get/set functions are split between the public interface
> and AudioMix private functions.
> ---
> Updated patch also moves the AudioMix definition to audio_mix.c since
> none of the fields need to be accessed outside of that file.
> 
>  libavresample/audio_mix.c        |  186
>  libavresample/audio_mix.h        |   47
>  libavresample/audio_mix_matrix.c |  112
>  libavresample/internal.h         |    7
>  libavresample/options.c          |    7
>  libavresample/utils.c            |   79
>  6 files changed, 257 insertions(+), 181 deletions(-)
> 
> diff --git a/libavresample/audio_mix.c b/libavresample/audio_mix.c
> index dd2f33d..ad68b7a 100644
> --- a/libavresample/audio_mix.c
> +++ b/libavresample/audio_mix.c
> @@ -28,6 +28,29 @@
>  #include "audio_data.h"
>  #include "audio_mix.h"
>  
> +struct AudioMix {
> +    AVAudioResampleContext *avr;
> +    enum AVSampleFormat fmt;
> +    enum AVMixCoeffType coeff_type;
> +    uint64_t in_layout;
> +    uint64_t out_layout;
> +    int in_channels;
> +    int out_channels;
> +
> +    int ptr_align;
> +    int samples_align;
> +    int has_optimized_func;
> +    const char *func_descr;
> +    const char *func_descr_generic;
> +    mix_func *mix;
> +    mix_func *mix_generic;
> +
> +    int16_t *matrix_q8[AVRESAMPLE_MAX_CHANNELS];
> +    int32_t *matrix_q15[AVRESAMPLE_MAX_CHANNELS];
> +    float   *matrix_flt[AVRESAMPLE_MAX_CHANNELS];
> +    void   **matrix;
> +};
> +
>  static const char *coeff_type_names[] = { "q8", "q15", "flt" };
>  
>  void ff_audio_mix_set_func(AudioMix *am, enum AVSampleFormat fmt,
> @@ -302,27 +325,37 @@ static int mix_function_init(AudioMix *am)
>      return 0;
>  }
>  
> -int ff_audio_mix_init(AVAudioResampleContext *avr)
> +AudioMix *ff_audio_mix_alloc(AVAudioResampleContext *avr)
>  {
> +    AudioMix *am;
>      int ret;
>  
> +    am = av_mallocz(sizeof(*am));
> +    if (!am)
> +        return NULL;
> +    am->avr = avr;
> +
>      if (avr->internal_sample_fmt != AV_SAMPLE_FMT_S16P &&
>          avr->internal_sample_fmt != AV_SAMPLE_FMT_FLTP) {
>          av_log(avr, AV_LOG_ERROR, "Unsupported internal format for "
>                 "mixing: %s\n",
>                 av_get_sample_fmt_name(avr->internal_sample_fmt));
> -        return AVERROR(EINVAL);
> +        goto error;
>      }
>  
> +    am->fmt          = avr->internal_sample_fmt;
> +    am->coeff_type   = avr->mix_coeff_type;
> +    am->in_layout    = avr->in_channel_layout;
> +    am->out_layout   = avr->out_channel_layout;
> +    am->in_channels  = avr->in_channels;
> +    am->out_channels = avr->out_channels;
> +
>      /* build matrix if the user did not already set one */
> -    if (avr->am->matrix) {
> -        if (avr->am->coeff_type != avr->mix_coeff_type      ||
> -            avr->am->in_layout  != avr->in_channel_layout   ||
> -            avr->am->out_layout != avr->out_channel_layout) {
> -            av_log(avr, AV_LOG_ERROR,
> -                   "Custom matrix does not match current parameters\n");
> -            return AVERROR(EINVAL);
> -        }
> +    if (avr->mix_matrix) {
> +        ret = ff_audio_mix_set_matrix(am, avr->mix_matrix, avr->in_channels);
> +        if (ret < 0)
> +            goto error;
> +        av_freep(&avr->mix_matrix);
>      } else {
>          int i, j;
>          char in_layout_name[128];
> @@ -330,7 +363,7 @@ int ff_audio_mix_init(AVAudioResampleContext *avr)
>          double *matrix_dbl = av_mallocz(avr->out_channels * avr->in_channels *
>                                          sizeof(*matrix_dbl));
>          if (!matrix_dbl)
> -            return AVERROR(ENOMEM);
> +            goto error;
>  
>          ret = avresample_build_matrix(avr->in_channel_layout,
>                                        avr->out_channel_layout,
> @@ -343,7 +376,7 @@ int ff_audio_mix_init(AVAudioResampleContext *avr)
>                                        avr->matrix_encoding);
>          if (ret < 0) {
>              av_free(matrix_dbl);
> -            return ret;
> +            goto error;
>          }
>  
>          av_get_channel_layout_string(in_layout_name, sizeof(in_layout_name),
> @@ -360,32 +393,33 @@ int ff_audio_mix_init(AVAudioResampleContext *avr)
>              av_log(avr, AV_LOG_DEBUG, "\n");
>          }
>  
> -        ret = avresample_set_matrix(avr, matrix_dbl, avr->in_channels);
> +        ret = ff_audio_mix_set_matrix(am, matrix_dbl, avr->in_channels);
>          if (ret < 0) {
>              av_free(matrix_dbl);
> -            return ret;
> +            goto error;
>          }
>          av_free(matrix_dbl);
>      }
>  
> -avr->am->fmt  = avr->internal_samp

Re: [libav-devel] [PATCH 1/2] lavr: clarify documentation for avresample_get/set_matrix()

2012-11-30 Thread Anton Khirnov

On Thu, 29 Nov 2012 15:08:04 -0500, Justin Ruggles wrote:
> ---
>  libavresample/avresample.h |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/libavresample/avresample.h b/libavresample/avresample.h
> index affeeeb..a73d686 100644
> --- a/libavresample/avresample.h
> +++ b/libavresample/avresample.h
> @@ -216,6 +216,9 @@ int avresample_build_matrix(uint64_t in_layout, uint64_t out_layout,
>  /**
>   * Get the current channel mixing matrix.
>   *
> + * If no custom matrix has been previously set or the AVAudioResampleContext is
> + * not open, an error is returned.

Ok

> + *
>   * @param avr audio resample context
>   * @param matrix  mixing coefficients; matrix[i + stride * o] is the weight of
>   *                input channel i in output channel o.
> @@ -231,7 +234,8 @@ int avresample_get_matrix(AVAudioResampleContext *avr, double *matrix,
>   * Allows for setting a custom mixing matrix, overriding the default matrix
>   * generated internally during avresample_open(). This function can be called
>   * anytime on an allocated context, either before or after calling
> - * avresample_open(). avresample_convert() always uses the current matrix.
> + * avresample_open(), as long as the channel layouts have been set.
> + * avresample_convert() always uses the current matrix.


Why bother mentioning this explicitly? If the channel layouts are not set,
avresample_open() will fail and avresample_convert() cannot be called at all.
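
For readers unfamiliar with the matrix layout documented in the quoted comment, here is a minimal sketch of how matrix[i + stride * o] is indexed. mix_frame() is a hypothetical helper written for illustration; it is not part of libavresample, and it makes no claim about how avresample_convert() applies the matrix internally:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mix one frame of samples with a row-per-output matrix:
 *     out[o] = sum over i of matrix[i + stride * o] * in[i]
 * stride is the distance between consecutive output rows and may be
 * larger than in_channels if the rows are padded.
 */
static void mix_frame(const double *matrix, int stride,
                      const double *in, int in_channels,
                      double *out, int out_channels)
{
    int i, o;

    assert(stride >= in_channels);
    for (o = 0; o < out_channels; o++) {
        double sum = 0.0;
        for (i = 0; i < in_channels; i++)
            sum += matrix[i + stride * o] * in[i];
        out[o] = sum;
    }
}
```

A stereo to mono downmix that averages both inputs would use the single row { 0.5, 0.5 } with stride 2; a padded identity matrix for two channels could use stride 3 with the third entry of each row ignored.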

-- 
Anton Khirnov