Re: [libav-devel] Invitation to connect on LinkedIn
On Sat, 1 Dec 2012 06:56:08 + (UTC) fei wang wrote:

> I'd like to add you to my professional network on LinkedIn.

I blocked LinkedIn at our mail server, so this shouldn't happen again. I also unsubscribed this guy for trying to add a mailing list to LinkedIn.

Attila Kinali

--
There is no secret ingredient -- Po, Kung Fu Panda
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] Request for a raspberry pi hardware accelerated scaling function
Dear raspberry pi user,

On Fri, 30 Nov 2012 20:33:28 +0100 Arjen Vellekoop wrote:

> I am a bit confused, just subscribed and expected to be redirected to a
> forum. Since this is not the case I'll try this way. Hope not to spam
> the whole list. pls point me in the right direction if I am on the wrong
> list here.

Welcome to the world of open source, where web forums are frowned upon because they are damn inefficient to use if you are dealing with hundreds of messages every day.

> A request for compiling an avconv version that is hardware accelerated
> probably already exists (if not pls consider to do so), however for my
> application I only need scaling. My DVB-T dongle receives MPEG2 in SD
> (704x576) And I only need to transcode it to the same codec but at a
> quarter resolution (CIF=352x288)

Well... You should note a few things.

First, this mailing list is about development of libav, not about its usage. This means that if you have questions about how to modify the code of libav, or have done some modifications that you would like to share with others, then you are at the right place. If it's just about usage (and compiling without modification of the _code_ is usage), then you should choose a different mailing list.

The second thing is, we do not provide binaries of the library. This is the job of distributions. All you get from us is the source code. (Ok, a few of those who package libav for distributions are also libav developers, but that's beside the point.)

The third thing is that the raspberry pi is a horribly closed and undocumented piece of hardware. About the only thing that is documented is its CPU core, and that documentation comes from ARM and not from Broadcom. With this little documentation it is nearly impossible to support even the most basic functionality properly. Even Nvidia provides more help for getting their hardware working with open source software. The insane amount of bit banging raspberry pi users have to do, even for the most simple stuff (like I2C)[1], is a telltale sign of this. I really feel pity for all of you who have been tricked into buying this piece of *censored* by the raspberry pi foundation and the hype they created.

Or to summarize: it's very unlikely that anyone has been able to modify libav or any other encoding library to use the hardware acceleration of the raspberry pi.

I recommend you get yourself either a BeagleBoard or a PandaBoard, which are fully documented and have most of their hardware drivers already in the mainline kernel. The PandaBoard with its dual core 1.2GHz should even have enough CPU power to encode your video in real time without using any hardware acceleration, which would also be available.

Attila Kinali

[1] http://www.google.com/search?q=raspberry+pi+bit+banging

--
There is no secret ingredient -- Po, Kung Fu Panda
[libav-devel] Invitation to connect on LinkedIn
LinkedIn libav, I'd like to add you to my professional network on LinkedIn. - fei fei wang software engineer at Marvell corp. Shanghai City, China Confirm that you know fei wang: https://www.linkedin.com/e/-yb2raf-ha6e0xg6-14/isd/9843425302/RxRGuAd1/?hs=false&tok=1OxcRjoCG0rlw1 -- You are receiving Invitation to Connect emails. Click to unsubscribe: http://www.linkedin.com/e/-yb2raf-ha6e0xg6-14/qxBs_HW79xfhyWpS4lsvpOW79xfh3Ctvi6/goo/libav-devel%40libav%2Eorg/20061/I3284912995_1/?hs=false&tok=0RHHn8jpG0rlw1 (c) 2012 LinkedIn Corporation. 2029 Stierlin Ct, Mountain View, CA 94043, USA. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 10/10] SBR DSP x86: implement SSE hf_apply_noise
On Fri, Nov 30, 2012 at 6:58 AM, Christophe Gisquet wrote: > 497 to 253 cycles under Win64. > Replacing the multiplication by s_m[m] by an andps and an xorps with > appropriate vectors is slower. Unrolling is a 15 cycles win. > --- > libavcodec/sbrdsp.c |1 - > libavcodec/x86/sbrdsp.asm| 93 > ++ > libavcodec/x86/sbrdsp_init.c | 16 +++ > 3 files changed, 109 insertions(+), 1 deletions(-) > > diff --git a/libavcodec/sbrdsp.c b/libavcodec/sbrdsp.c > index 781ec83..d0a0b93 100644 > --- a/libavcodec/sbrdsp.c > +++ b/libavcodec/sbrdsp.c > @@ -175,7 +175,6 @@ static av_always_inline void sbr_hf_apply_noise(float > (*Y)[2], > int m_max) > { > int m; > - > for (m = 0; m < m_max; m++) { > float y0 = Y[m][0]; > float y1 = Y[m][1]; > diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm > index cfbd6e8..608dee6 100644 > --- a/libavcodec/x86/sbrdsp.asm > +++ b/libavcodec/x86/sbrdsp.asm > @@ -26,6 +26,12 @@ SECTION_RODATA > ps_mask times 2 dd 1<<31, 0 > ps_mask2times 2 dd 0, 1<<31 > ps_neg times 4 dd 1<<31 > +ps_noise0 times 2 dd 1.0, 0.0, > +ps_noise2 times 2 dd -1.0, 0.0 > +ps_noise13 dd 0.0, 1.0, 0.0, -1.0 > +dd 0.0, -1.0, 0.0, 1.0 > +dd 0.0, 1.0, 0.0, -1.0 > +cextern sbr_noise_table > > SECTION_TEXT > > @@ -318,3 +324,90 @@ cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c >subcq, 2*mmsize >jge .loop >REP_RET > + > +; r0q=Y r1q=s_m r2q=q_filt r3q=noise r4q=max_m > +cglobal hf_apply_noise_main > + dec r3q > + shl r4q, 2 > + lea r0q, [r0q + 2*r4q] > + add r1q, r4q > + add r2q, r4q > + shl r3q, 3 > + xorps m5, m5 > + neg r4q > +.loop: > + add r3q, 16 > + and r3q, 0x1ff<<3 > + movh m1, [r2q + r4q] > + movu m3, [r3q + sbr_noise_table] > + movh m2, [r2q + r4q + 8] > + add r3q, 16 > + and r3q, 0x1ff<<3 > + movu m4, [r3q + sbr_noise_table] > + unpcklps m1, m1 > + unpcklps m2, m2 > + mulps m1, m3 ; m2 = q_filt[m] * ff_sbr_noise_table[noise] > + mulps m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise] > + movh m3, [r1q + r4q] > + movh m4, [r1q + r4q + 8] > + unpcklps 
m3, m3 > + unpcklps m4, m4 > + mova m6, m3 > + mova m7, m4 > + mulps m3, m0 ; s_m[m] * phi_sign > + mulps m4, m0 ; s_m[m] * phi_sign > + cmpps m6, m5, 0 ; m1 == 0 > + cmpps m7, m5, 0 ; m1 == 0 > + andps m1, m6 > + andps m2, m7 > + movu m6, [r0q + 2*r4q] > + movu m7, [r0q + 2*r4q + 16] > + addps m6, m1 > + addps m7, m2 > + addps m6, m3 > + addps m7, m4 Maybe add m1/m2 to m3/m4 before to m6/m7, to better hide the memory load? Jason ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
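The suggestion above (sum the noise and tone terms first, then add the result to the loaded Y values) can be modelled in scalar C. This is only a sketch under assumptions: `apply_noise_model` and its parameter list are hypothetical names chosen for illustration, `noise_table` stands in for the decoder's 512-entry table, and the select-by-multiply is a scalar stand-in for the cmpps/andps mask in the quoted asm. Since the mask zeroes the noise term wherever s_m[m] is nonzero, one of the two terms is always zero, so as far as I can tell the reordering should not change rounding.

```c
/* Hypothetical scalar model of the quoted loop: all names here are
 * illustrative, not the libav function or its signature. */
static void apply_noise_model(float (*Y)[2], const float *s_m,
                              const float *q_filt,
                              const float (*noise_table)[2], int noise,
                              float phi_sign0, float phi_sign1, int m_max)
{
    for (int m = 0; m < m_max; m++) {
        noise = (noise + 1) & 0x1ff;

        /* Branch-free select: the noise term survives only where
         * s_m[m] == 0, like the cmpps/andps pair in the asm. */
        float keep = s_m[m] == 0.0f ? 1.0f : 0.0f;
        float n0   = keep * q_filt[m] * noise_table[noise][0];
        float n1   = keep * q_filt[m] * noise_table[noise][1];

        /* Tone term; it is zero exactly where the noise term is kept. */
        float t0 = s_m[m] * phi_sign0;
        float t1 = s_m[m] * phi_sign1;

        /* Combine the two register-only terms first, then touch Y,
         * so the load of Y has more time to complete. */
        Y[m][0] += n0 + t0;
        Y[m][1] += n1 + t1;

        phi_sign1 = -phi_sign1;
    }
}
```

In the SSE version this would correspond to doing the two addps of m1/m2 into m3/m4 before the addps that involve m6/m7; whether it actually wins cycles would need measuring.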
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet wrote:
> Hello,
>
> 2012/11/30 Loren Merritt:
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
>     mov       r3q, 32*4
>     lea       r2q, [zq + (64-4)*4]
>     add        zq, r3q
>     lea        Wq, [Wq + 2*r3q]
>     neg       r3q
> .loop:
>     mova       m0, [r2q]
>     mova       m1, [zq + r3q]
>     xorps      m0, [ps_neg]
>     shufps     m0, m0, 0x1B
>     mova       m2, m0
>     unpcklps   m0, m1
>     unpckhps   m2, m1
>     mova  [Wq + 2*r3q +  0], m0
>     mova  [Wq + 2*r3q + 16], m2
>     sub       r2q, 16
>     add       r3q, 16
>     jl .loop
>     REP_RET
>
> It's 2 cycles slower on Penryn/Win64 (154 vs 152).

Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

Ronald
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Christophe Gisquet writes:

>> 4 space tabs.
>
> OK, I was a bit puzzled and looking for trailing whitespaces/... You
> mean style change then.
> A bit cumbersome to redo all patches because of that.

:%s/^ //

--
Måns Rullgård
m...@mansr.com
Re: [libav-devel] [PATCH 01/10] SBR DSP x86: implement SSE sbr_hf_gen
Hello,

2012/11/30 Loren Merritt:
> Recommend using base-4 for shuffle constants.

I wrote that code like 6 months ago, before I really wrapped my head around (or noticed) that. Do you want me to change it now, or is that a remark for later contributions?

--
Christophe
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
Hello,

2012/11/30 Loren Merritt:
> If you increment an index into W and z rather than the pointers
> themselves, then you can eliminate an add and a cmp.

I had already tested that, and redid it:

cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
    mov       r3q, 32*4
    lea       r2q, [zq + (64-4)*4]
    add        zq, r3q
    lea        Wq, [Wq + 2*r3q]
    neg       r3q
.loop:
    mova       m0, [r2q]
    mova       m1, [zq + r3q]
    xorps      m0, [ps_neg]
    shufps     m0, m0, 0x1B
    mova       m2, m0
    unpcklps   m0, m1
    unpckhps   m2, m1
    mova  [Wq + 2*r3q +  0], m0
    mova  [Wq + 2*r3q + 16], m2
    sub       r2q, 16
    add       r3q, 16
    jl .loop
    REP_RET

It's 2 cycles slower on Penryn/Win64 (154 vs 152).

> 4 space tabs.

OK, I was a bit puzzled and was looking for trailing whitespace and the like. You mean a style change then. A bit cumbersome to redo all patches because of that.

--
Christophe
Re: [libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.
On 11/30/12 6:28 PM, Christophe Gisquet wrote:
> 2012/11/30 Luca Barbato:
>> The idea is nice, is Ypos always 0 or 1?
>
> Yes, and actually, Ypos is here because we already dealt with a
> similar situation (see commit
> cc412b71047ebf77c7e810c90b044f018a1c0c2d).
>
> So I am just reapplying the same solution.
>

Fine for me then. Thanks for checking =)

lu
Re: [libav-devel] [PATCH 10/10] SBR DSP x86: implement SSE hf_apply_noise
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> 497 to 253 cycles under Win64.

cpu is more relevant than os.

> +; r0q=Y r1q=s_m r2q=q_filt r3q=noise r4q=max_m
> +cglobal hf_apply_noise_main

You can invoke DEFINE_ARGS even if not generating a prologue.

> +    dec      r3q
> +    shl      r4q, 2
> +    lea      r0q, [r0q + 2*r4q]
> +    add      r1q, r4q
> +    add      r2q, r4q
> +    shl      r3q, 3
> +    xorps    m5, m5
> +    neg      r4q
> +.loop:
> +    add      r3q, 16
> +    and      r3q, 0x1ff<<3
> +    movh     m1, [r2q + r4q]
> +    movu     m3, [r3q + sbr_noise_table]
> +    movh     m2, [r2q + r4q + 8]
> +    add      r3q, 16
> +    and      r3q, 0x1ff<<3
> +    movu     m4, [r3q + sbr_noise_table]
> +    unpcklps m1, m1
> +    unpcklps m2, m2
> +    mulps    m1, m3 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +    mulps    m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise]
> +    movh     m3, [r1q + r4q]
> +    movh     m4, [r1q + r4q + 8]

Can these be a single aligned load?

> +    unpcklps m3, m3
> +    unpcklps m4, m4
> +    mova     m6, m3
> +    mova     m7, m4
> +    mulps    m3, m0 ; s_m[m] * phi_sign
> +    mulps    m4, m0 ; s_m[m] * phi_sign
> +    cmpps    m6, m5, 0 ; m1 == 0
> +    cmpps    m7, m5, 0 ; m1 == 0

You mean m7 == 0?

> +    andps    m1, m6
> +    andps    m2, m7
> +    movu     m6, [r0q + 2*r4q]
> +    movu     m7, [r0q + 2*r4q + 16]
> +    addps    m6, m1
> +    addps    m7, m2
> +    addps    m6, m3
> +    addps    m7, m4
> +    movu     [r0q + 2*r4q], m6
> +    movu     [r0q + 2*r4q + 16], m7
> +    add      r4q, 16
> +    jl       .loop
> +    ret
> +
> +; sbr_hf_apply_noise_0(float (*Y)[2], const float *s_m,
> +;                      const float *q_filt, int noise,
> +;                      int kx, int m_max)
> +cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max
> +    mova     m0, [ps_noise0]
> +    mov      r4d, m_maxm
> +    call     hf_apply_noise_main
> +    RET

TAIL_CALL hf_apply_noise_main, 1

--Loren Merritt
Re: [libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z
> +    lea       r2q, [zq + (64-4)*4]
> +.loop:
> +    mova       m0, [r2q]
> +    mova       m1, [zq ]
> +    xorps      m0, [ps_neg]
> +    shufps     m0, m0, 0x1B
> +    mova       m2, m0
> +    unpcklps   m0, m1
> +    unpckhps   m2, m1
> +    mova  [Wq +  0], m0
> +    mova  [Wq + 16], m2
> +    add        Wq, 32
> +    sub       r2q, 16
> +    add        zq, 16
> +    cmp        zq, r2q
> +    jl .loop
> +    REP_RET

If you increment an index into W and z rather than the pointers themselves, then you can eliminate an add and a cmp.

4 space tabs.

--Loren Merritt
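The index trick suggested above can be sketched in C. This is an illustration under assumptions, not the libav code: both function names are hypothetical, and the scalar operation (W[k] = (-z[63-k], z[k]) for k in 0..31) is inferred from what the quoted asm appears to compute. The second function biases the base pointers to the end of each array and runs one negative index up to zero, so the loop's own increment is also the termination test.

```c
/* Straightforward form: positions advance separately and the loop
 * needs an explicit compare against the bound. */
static void post_shuffle_ref(float W[32][2], const float *z)
{
    for (int k = 0; k < 32; k++) {
        W[k][0] = -z[63 - k];
        W[k][1] =  z[k];
    }
}

/* Index form: bases point one past the region of interest and a single
 * negative index counts up to zero. In asm the add that advances the
 * index sets the flags, so no separate cmp is required. */
static void post_shuffle_idx(float W[32][2], const float *z)
{
    float (*w_end)[2]  = W + 32;
    const float *z_end = z + 32;

    for (int i = -32; i < 0; i++) {
        w_end[i][0] = -z_end[-1 - i]; /* z[63 - k] with k = i + 32 */
        w_end[i][1] =  z_end[i];      /* z[k] */
    }
}
```

Note that the asm cannot negate an index inside an addressing mode, which is presumably why the reworked version in this thread still keeps a separate descending pointer (r2q) for the reversed half.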
Re: [libav-devel] [PATCH 02/10] SBR DSP x86: implement SSE sum64x5
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> 698 to 174 cycles on Penryn. Unrolling is a 6 cycles gain.
>
> ---
>  libavcodec/x86/sbrdsp.asm    | 22 ++++++++++++++++++++++
>  libavcodec/x86/sbrdsp_init.c |  2 ++
>  2 files changed, 24 insertions(+), 0 deletions(-)

LGTM.

--Loren Merritt
Re: [libav-devel] [PATCH 01/10] SBR DSP x86: implement SSE sbr_hf_gen
On Fri, 30 Nov 2012, Christophe Gisquet wrote:

> +    mova    m0, [X_lowq + start]
> +    movlhps m1, m1 ; (a2 a3 a2 a3)
> +    movlhps m2, m2 ; (a0 a1 a0 a1)
> +    shufps  m3, m3, 00010001b ; (a3 a2 a3 a2)
> +    shufps  m4, m4, 00010001b ; (a1 a0 a1 a0)
> +    xorps   m3, m7 ; (-a3 a2 -a3 a2)
> +    xorps   m4, m7 ; (-a1 a0 -a1 a0)
> +.loop2:
> +    mova    m5, m0
> +    mova    m6, m0
> +    shufps  m0, m0, 1010b ; {Xl[-2][0],",Xl[-1][0],"}
> +    shufps  m5, m5, 0101b ; {Xl[-2][1],",Xl[-1][1],"}
> +    mulps   m0, m2
> +    mulps   m5, m4
> +    mova    m7, m6
> +    addps   m5, m0
> +    mova    m0, [X_lowq + start + 2*2*4]
> +    shufps  m6, m0, 1010b ; {Xl[-1][0],",Xl[0][0],"}
> +    shufps  m7, m0, 0101b ; {Xl[-1][1],",Xl[1][1],"}

Recommend using base-4 for shuffle constants.

--Loren Merritt
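To see why base-4 reads better for shuffle constants: a shufps immediate is four 2-bit lane selectors, so each base-4 digit names one source lane. The C sketch below is only an illustration; `SHUF4`, the `SHUF_Q*` names and `shuffle4` are hypothetical helpers, not anything from libav or x86inc.asm (which, I believe, offers q0123-style constants for the same purpose). It rewrites the four immediates used in the quoted patch and models the two-operand shufps lane selection.

```c
/* Hypothetical helper: build a shufps immediate from four 2-bit lane
 * selectors, most significant digit first (base-4 literal order). */
#define SHUF4(d3, d2, d1, d0) \
    (((d3) << 6) | ((d2) << 4) | ((d1) << 2) | (d0))

enum {
    SHUF_Q0123 = SHUF4(0, 1, 2, 3), /* 0x1B: full reverse       */
    SHUF_Q0101 = SHUF4(0, 1, 0, 1), /* 00010001b: swap in pairs */
    SHUF_Q0022 = SHUF4(0, 0, 2, 2), /* 1010b                    */
    SHUF_Q0011 = SHUF4(0, 0, 1, 1)  /* 0101b                    */
};

/* Scalar model of "shufps a, b, imm": result lanes 0 and 1 are picked
 * from a, lanes 2 and 3 from b, each by one 2-bit field of imm. */
static void shuffle4(float dst[4], const float a[4], const float b[4],
                     unsigned imm)
{
    float r[4];

    r[0] = a[imm & 3];
    r[1] = a[(imm >> 2) & 3];
    r[2] = b[(imm >> 4) & 3];
    r[3] = b[(imm >> 6) & 3];

    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```

Reading the digits right to left gives the source lane for result lanes 0 to 3, which is much harder to see in 0x1B or 00010001b.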
[libav-devel] Request for a raspberry pi hardware accelerated scaling function
Sorry, sent it previously from an unknown email address.

Dear readers,

I am a bit confused, just subscribed and expected to be redirected to a forum. Since this is not the case I'll try this way. Hope not to spam the whole list. Pls point me in the right direction if I am on the wrong list here.

Here's my situation: I own a Raspberry Pi. I connected an IT9135 DVB-T USB dongle and installed Raspbian (a Debian Linux built specially for this hardware) and tvheadend. I am able to stream and watch TV over my LAN. However, I would like to stream TV over the internet, but I lack the required bandwidth to do so. This issue is often referred to in the Raspberry Pi forum.

I can do some post-processing once I have recorded a program in tvheadend, but a 1 hour show takes 3 hours to transcode on the Raspberry Pi with avconv. The CPU is limited. However, there is a GPU with hardware acceleration in the Raspberry Pi, and there is an example program under /opt/vc/.../hello_video that works. It uses the OpenMAX API.

A request for compiling an avconv version that is hardware accelerated probably already exists (if not, pls consider doing so); however, for my application I only need scaling. My DVB-T dongle receives MPEG2 in SD (704x576), and I only need to transcode it to the same codec at a quarter of the resolution (CIF = 352x288).

So for now I do not need the whole library to be recompiled for hardware acceleration, just the scaling bit. Could this be possible?

Thanks
[libav-devel] [PATCH 1/1] golomb: use unsigned arithmetics in svq3_get_ue_golomb()
This prevents undefined behaviour of signed left shift if the coded value is larger than 2^31. Large values are most likely invalid and caused by errors or by feeding random data.

Validated every use of svq3_get_ue_golomb() and changed the places where the return value was compared with negative numbers. dirac.c was clean; rv30 and svq3 are fixed.
---
 libavcodec/golomb.h |  5 +++--
 libavcodec/rv30.c   |  6 +++---
 libavcodec/svq3.c   | 17 ++++++++---------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/libavcodec/golomb.h b/libavcodec/golomb.h
index 6f95a67..564ba4e 100644
--- a/libavcodec/golomb.h
+++ b/libavcodec/golomb.h
@@ -107,7 +107,8 @@ static inline int get_ue_golomb_31(GetBitContext *gb){
     return ff_ue_golomb_vlc_code[buf];
 }

-static inline int svq3_get_ue_golomb(GetBitContext *gb){
+static inline unsigned svq3_get_ue_golomb(GetBitContext *gb)
+{
     uint32_t buf;

     OPEN_READER(re, gb);
@@ -121,7 +122,7 @@ static inline int svq3_get_ue_golomb(GetBitContext *gb){

         return ff_interleaved_ue_golomb_vlc_code[buf];
     }else{
-        int ret = 1;
+        unsigned ret = 1;

         do {
             buf >>= 32 - 8;
diff --git a/libavcodec/rv30.c b/libavcodec/rv30.c
index 8016ad3..e4f3251 100644
--- a/libavcodec/rv30.c
+++ b/libavcodec/rv30.c
@@ -73,7 +73,7 @@ static int rv30_decode_intra_types(RV34DecContext *r, GetBitContext *gb, int8_t

     for(i = 0; i < 4; i++, dst += r->intra_types_stride - 4){
         for(j = 0; j < 4; j+= 2){
-            int code = svq3_get_ue_golomb(gb) << 1;
+            unsigned code = svq3_get_ue_golomb(gb) << 1;
             if(code >= 81*2){
                 av_log(r->s.avctx, AV_LOG_ERROR, "Incorrect intra prediction code\n");
                 return -1;
@@ -101,9 +101,9 @@ static int rv30_decode_mb_info(RV34DecContext *r)
     static const int rv30_b_types[6] = { RV34_MB_SKIP, RV34_MB_B_DIRECT, RV34_MB_B_FORWARD, RV34_MB_B_BACKWARD, RV34_MB_TYPE_INTRA, RV34_MB_TYPE_INTRA16x16 };
     MpegEncContext *s = &r->s;
     GetBitContext *gb = &s->gb;
-    int code = svq3_get_ue_golomb(gb);
+    unsigned code = svq3_get_ue_golomb(gb);

-    if (code < 0 || code > 11) {
+    if (code > 11) {
         av_log(s->avctx, AV_LOG_ERROR, "Incorrect MB type code\n");
         return -1;
     }
diff --git a/libavcodec/svq3.c b/libavcodec/svq3.c
index ac8d9c1..4f0c2c0 100644
--- a/libavcodec/svq3.c
+++ b/libavcodec/svq3.c
@@ -216,17 +216,15 @@ static inline int svq3_decode_block(GetBitContext *gb, DCTELEM *block,
     static const uint8_t *const scan_patterns[4] =
     { luma_dc_zigzag_scan, zigzag_scan, svq3_scan, chroma_dc_scan };

-    int run, level, sign, vlc, limit;
+    int run, level, limit;
+    unsigned vlc;
     const int intra = 3 * type >> 2;
     const uint8_t *const scan = scan_patterns[type];

     for (limit = (16 >> intra); index < 16; index = limit, limit += 8) {
         for (; (vlc = svq3_get_ue_golomb(gb)) != 0; index++) {
-            if (vlc == INVALID_VLC)
-                return -1;
-
-            sign = (vlc & 0x1) - 1;
-            vlc = vlc + 1 >> 1;
+            int sign = (vlc & 1) ? 0 : -1;
+            vlc = vlc + 1 >> 1;

             if (type == 3) {
                 if (vlc < 3) {
@@ -786,7 +784,7 @@ static int svq3_decode_slice_header(AVCodecContext *avctx)
         skip_bits_long(&s->gb, 0);
     }

-    if ((i = svq3_get_ue_golomb(&s->gb)) == INVALID_VLC || i >= 3) {
+    if ((i = svq3_get_ue_golomb(&s->gb)) >= 3) {
         av_log(h->s.avctx, AV_LOG_ERROR, "illegal slice type %d \n", i);
         return -1;
     }
@@ -1010,7 +1008,7 @@ static int svq3_decode_frame(AVCodecContext *avctx, void *data,
     H264Context *h = &svq3->h;
     MpegEncContext *s = &h->s;
     int buf_size = avpkt->size;
-    int m, mb_type;
+    int m;

     /* special case for last picture */
     if (buf_size == 0) {
@@ -1093,6 +1091,7 @@ static int svq3_decode_frame(AVCodecContext *avctx, void *data,

     for (s->mb_y = 0; s->mb_y < s->mb_height; s->mb_y++) {
         for (s->mb_x = 0; s->mb_x < s->mb_width; s->mb_x++) {
+            unsigned mb_type;
             h->mb_xy = s->mb_x + s->mb_y * s->mb_stride;

             if ((get_bits_count(&s->gb) + 7) >= s->gb.size_in_bits &&
@@ -1113,7 +1112,7 @@ static int svq3_decode_frame(AVCodecContext *avctx, void *data,
                 mb_type += 8;
             else if (s->pict_type == AV_PICTURE_TYPE_B && mb_type >= 4)
                 mb_type += 4;
-            if ((unsigned)mb_type > 33 || svq3_decode_mb(svq3, mb_type)) {
+            if (mb_type > 33 || svq3_decode_mb(svq3, mb_type)) {
                 av_log(h->s.avctx, AV_LOG_ERROR,
                        "error while decoding MB %d %d\n", s->mb_x, s->mb_y);
                 return -1;
--
1.7.12.4
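The reasoning behind the patch can be shown with a much simplified decoder. This is a sketch under assumptions, not the table-driven code in golomb.h: `BitReader`, `read_bit` and `read_interleaved_ue` are hypothetical names, and the bit layout assumed is the interleaved exp-Golomb form (a 0 "follow" bit announces one more data bit, a 1 bit terminates). Accumulating into an unsigned value means that an overlong code merely shifts high bits out, which is well defined in C, whereas shifting a 1 into or past the sign bit of an int is undefined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader, for illustration only. */
typedef struct BitReader {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
} BitReader;

static unsigned read_bit(BitReader *br)
{
    unsigned bit;

    if (br->pos >= br->size_bits)
        return 1; /* exhausted input acts as a terminator */
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Interleaved exp-Golomb: each 0 follow bit is followed by one data
 * bit; a 1 follow bit ends the code. The accumulator is unsigned, so
 * garbage input cannot trigger signed-overflow undefined behaviour. */
static unsigned read_interleaved_ue(BitReader *br)
{
    unsigned ret = 1;

    while (!read_bit(br))
        ret = (ret << 1) | read_bit(br);

    return ret - 1;
}
```

With an unsigned result, a caller's range check also collapses to a single comparison, as in the rv30 hunk where `code < 0 || code > 11` becomes `code > 11`.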
Re: [libav-devel] [PATCH] configure: sunos: clean up shared library options
On Wednesday, November 28, 2012, Sean McGovern wrote: > On Wed, Nov 28, 2012 at 8:08 PM, Måns Rullgård wrote: >> Sean McGovern writes: >> >>> Several of the options were incorrect for suncc. >>> --- >>> configure | 8 +--- >>> 1 file changed, 5 insertions(+), 3 deletions(-) >>> >>> diff --git a/configure b/configure >>> index ca11a85..f094a32 100755 >>> --- a/configure >>> +++ b/configure >>> @@ -2196,7 +2196,8 @@ suncc_flags(){ >>> -fomit-frame-pointer) echo -xregs=frameptr;; >>> -fPIC)echo -KPIC -xcode=pic32 ;; >>> -W*,*)echo $flag ;; >>> --f*-*|-W*);; >>> +-f*-*|-W*|-mimpure-text) ;; >>> +-shared) echo -G ;; >>> *)echo $flag ;; >>> esac >>> done >>> @@ -2748,8 +2749,9 @@ case $target_os in >>> ;; >>> sunos) >>> AVSERVERLDFLAGS="" >>> -SHFLAGS='-shared -Wl,-h,$$(@F)' >>> -enabled x86 && SHFLAGS="-mimpure-text $SHFLAGS" >>> +SHFLAGS='-Wl,-h,$$(@F)' >>> +append SHFLAGS $($ldflags_filter -shared) >>> +enabled x86 && append SHFLAGS $($ldflags_filter -mimpure-text) >>> network_extralibs="-lsocket -lnsl" >>> add_cppflags -D__EXTENSIONS__ -D_XOPEN_SOURCE=600 >>> # When using suncc to build, the Solaris linker will mark >>> -- >> >> I have an even better idea. Drop the second hunk above and apply this >> instead: >> >> diff --git a/configure b/configure >> index 38f52b1..0c580e1c 100755 >> --- a/configure >> +++ b/configure >> @@ -3781,7 +3781,7 @@ LD_PATH=$LD_PATH >> DLLTOOL=$dlltool >> LDFLAGS=$LDFLAGS >> LDFLAGS-avserver=$AVSERVERLDFLAGS >> -SHFLAGS=$SHFLAGS >> +SHFLAGS=$($ldflags_filter $SHFLAGS) >> YASMFLAGS=$YASMFLAGS >> BUILDSUF=$build_suffix >> FULLNAME=$FULLNAME >> >> >> -- >> Måns Rullgård >> m...@mansr.com > > This didn't go so well... SHFLAGS now contains a very unwanted linebreak. My bash-fu is not great, this seems to be due to string tokenization inside the here-document. I can echo it with the same statement right above the here-doc and no linebreak is present. -- Sean McG. 
___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.
2012/11/30 Luca Barbato:
> The idea is nice, is Ypos always 0 or 1?

Yes, and actually, Ypos is here because we already dealt with a similar situation (see commit cc412b71047ebf77c7e810c90b044f018a1c0c2d).

So I am just reapplying the same solution.

--
Christophe
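The general idea of trading a memcpy for an index flip can be sketched as follows. This is a generic two-slot history buffer with hypothetical names (`History`, `push_copy`, `push_swap`), not the aacsbr.c data structures; it only shows why an index that is always 0 or 1 is enough to tell the current buffer from the previous one.

```c
#include <string.h>

#define FRAME_LEN 4

typedef struct History {
    float buf[2][FRAME_LEN]; /* two slots: previous and current frame */
    int   pos;               /* index of the current slot, 0 or 1     */
} History;

/* Copying variant: slot 1 is always "current", so the old frame has
 * to be moved into slot 0 before it is overwritten. */
static void push_copy(History *h, const float *frame)
{
    memcpy(h->buf[0], h->buf[1], sizeof(h->buf[0]));
    memcpy(h->buf[1], frame, sizeof(h->buf[1]));
    h->pos = 1;
}

/* Swapping variant: flip the index so the old "current" slot becomes
 * "previous" in place, then write the new frame into the other slot. */
static void push_swap(History *h, const float *frame)
{
    h->pos ^= 1;
    memcpy(h->buf[h->pos], frame, sizeof(h->buf[0]));
}

static const float *current(const History *h)  { return h->buf[h->pos];     }
static const float *previous(const History *h) { return h->buf[h->pos ^ 1]; }
```

Both variants expose the same current/previous pair; the swapping one just never moves the old frame. In a decoder that writes its output directly into the current slot, the remaining memcpy disappears as well.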
Re: [libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.
On 11/30/12 3:58 PM, Christophe Gisquet wrote:
> Swapping buffer indices allows saving one memcpy that accounts for 1% of the
> runtime, according to oprofile.
> ---
>  libavcodec/aacsbr.c | 22 +++++++++++-----------
>  1 files changed, 11 insertions(+), 11 deletions(-)

The idea is nice, is Ypos always 0 or 1?

lu
Re: [libav-devel] [PATCH 1/2] lavr: clarify documentation for avresample_get/set_matrix()
On Fri, 30 Nov 2012 11:18:08 -0500, Justin Ruggles wrote: > On 11/30/2012 05:47 AM, Anton Khirnov wrote: > > > > On Thu, 29 Nov 2012 15:08:04 -0500, Justin Ruggles > > wrote: > >> --- > >> libavresample/avresample.h |6 +- > >> 1 files changed, 5 insertions(+), 1 deletions(-) > >> > >> diff --git a/libavresample/avresample.h b/libavresample/avresample.h > >> index affeeeb..a73d686 100644 > >> --- a/libavresample/avresample.h > >> +++ b/libavresample/avresample.h > >> @@ -216,6 +216,9 @@ int avresample_build_matrix(uint64_t in_layout, > >> uint64_t out_layout, > >> /** > >> * Get the current channel mixing matrix. > >> * > >> + * If no custom matrix has been previously set or the > >> AVAudioResampleContext is > >> + * not open, an error is returned. > > > > Ok > > > >> + * > >> * @param avr audio resample context > >> * @param matrix mixing coefficients; matrix[i + stride * o] is the > >> weight of > >> *input channel i in output channel o. > >> @@ -231,7 +234,8 @@ int avresample_get_matrix(AVAudioResampleContext *avr, > >> double *matrix, > >> * Allows for setting a custom mixing matrix, overriding the default > >> matrix > >> * generated internally during avresample_open(). This function can be > >> called > >> * anytime on an allocated context, either before or after calling > >> - * avresample_open(). avresample_convert() always uses the current matrix. > >> + * avresample_open(), as long as the channel layouts have been set. > >> + * avresample_convert() always uses the current matrix. > > > > > > Why bother mentioning this explicitly? If the channel layouts are not set, > > avresample_open() will fail and avresample_convert() cannot be called at > > all. > > Because avresample_get/set_matrix() can be called before > avresample_open(). So if the user wants to do that they just have to > make sure they set the layouts first. > Ah nvm, seems I just parsed that sentence wrong. 
Patch LGTM -- Anton Khirnov ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 1/2] lavr: clarify documentation for avresample_get/set_matrix()
On 11/30/2012 05:47 AM, Anton Khirnov wrote: > > On Thu, 29 Nov 2012 15:08:04 -0500, Justin Ruggles > wrote: >> --- >> libavresample/avresample.h |6 +- >> 1 files changed, 5 insertions(+), 1 deletions(-) >> >> diff --git a/libavresample/avresample.h b/libavresample/avresample.h >> index affeeeb..a73d686 100644 >> --- a/libavresample/avresample.h >> +++ b/libavresample/avresample.h >> @@ -216,6 +216,9 @@ int avresample_build_matrix(uint64_t in_layout, uint64_t >> out_layout, >> /** >> * Get the current channel mixing matrix. >> * >> + * If no custom matrix has been previously set or the >> AVAudioResampleContext is >> + * not open, an error is returned. > > Ok > >> + * >> * @param avr audio resample context >> * @param matrix mixing coefficients; matrix[i + stride * o] is the weight >> of >> *input channel i in output channel o. >> @@ -231,7 +234,8 @@ int avresample_get_matrix(AVAudioResampleContext *avr, >> double *matrix, >> * Allows for setting a custom mixing matrix, overriding the default matrix >> * generated internally during avresample_open(). This function can be >> called >> * anytime on an allocated context, either before or after calling >> - * avresample_open(). avresample_convert() always uses the current matrix. >> + * avresample_open(), as long as the channel layouts have been set. >> + * avresample_convert() always uses the current matrix. > > > Why bother mentioning this explicitly? If the channel layouts are not set, > avresample_open() will fail and avresample_convert() cannot be called at all. Because avresample_get/set_matrix() can be called before avresample_open(). So if the user wants to do that they just have to make sure they set the layouts first. -Justin ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
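The indexing convention documented in the quoted header (matrix[i + stride * o] is the weight of input channel i in output channel o) can be illustrated without touching libavresample at all. The sketch below is an assumption-laden illustration: `mix_sample` is a hypothetical helper that applies such a matrix to one sample frame, and it says nothing about how avresample_convert() works internally.

```c
/* Apply a mixing matrix laid out as matrix[i + stride * o], i.e. one
 * row per output channel, to a single frame of in_ch input samples.
 * Hypothetical helper, for illustration of the layout only. */
static void mix_sample(double *out, const double *in,
                       const double *matrix, int stride,
                       int in_ch, int out_ch)
{
    for (int o = 0; o < out_ch; o++) {
        double acc = 0.0;

        for (int i = 0; i < in_ch; i++)
            acc += matrix[i + stride * o] * in[i];
        out[o] = acc;
    }
}
```

For a stereo to mono downmix the array would hold one row, {0.5, 0.5}, with a stride of 2; a matrix that swaps left and right would be {0, 1, 1, 0} with the same stride. An array laid out this way is what one would expect to hand to avresample_set_matrix() once the channel layouts have been set.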
Re: [libav-devel] [PATCH 00/10] Various patches for AAC SBR DSP
2012/11/30 Måns Rullgård:
>> I couldn't find a vector exercising qmf_deint_neg, and I guess neither did
>> the one who wrote it,
>
> I can assure you I did, but I don't remember which one(s).

Ok, it's just that it looked so much like where I ended up that I thought the same had happened to you. But as far as I know, no sample in the fate suite tests that code block/dsp function. I also extracted and tried some samples from the mplayerhq archives, but to no avail.

--
Christophe
Re: [libav-devel] [PATCH 1/1] avprobe: report per stream bit rate if set by the decoder
On 11/30/12 3:45 PM, Janne Grunau wrote: > --- > avprobe.c | 4 > 1 file changed, 4 insertions(+) > Ok. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 2/3] avutil: Use a configure check to enable windows console functions
Martin Storsjö writes:

> Not all versions or API subsets of windows have these functions.
>
> Signed-off-by: Martin Storsjö
> ---
>  configure       | 2 ++
>  libavutil/log.c | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)

LGTM

--
Måns Rullgård
m...@mansr.com
Re: [libav-devel] [PATCH 1/3] avutil: Include io.h with a separate condition from windows console functions
Martin Storsjö writes:

> Not all versions of windows have the console color functions,
> while io.h might be needed for isatty (which can be found in
> unistd.h or io.h).
>
> Signed-off-by: Martin Storsjö
> ---
>  libavutil/log.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

LGTM

--
Måns Rullgård
m...@mansr.com
Re: [libav-devel] [PATCH 3/3] configure: Use headers in the check for _beginthreadex for w32threads
Martin Storsjö writes: > When targeting the metro API subset, this function still exists in > the link libraries, but is excluded from the headers. This makes > sure w32threads is automatically disabled when targeting this API > subset (since not all the necessary functions for it are available). > > Signed-off-by: Martin Storsjö > --- > configure |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/configure b/configure > index 38f49e0..02dd1c6 100755 > --- a/configure > +++ b/configure > @@ -3332,7 +3332,7 @@ disabled zlib || check_lib zlib.h zlibVersion > -lz || disable zlib > disabled bzlib || check_lib2 bzlib.h BZ2_bzlibVersion -lbz2 || disable bzlib > > if ! disabled w32threads && ! enabled pthreads; then > -check_func _beginthreadex && enable w32threads > +check_func_headers "windows.h process.h" _beginthreadex && enable > w32threads > fi > > # check for some common methods of building with pthread support > -- OK -- Måns Rullgård m...@mansr.com ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 00/10] Various patches for AAC SBR DSP
Christophe Gisquet writes:

> I couldn't find a vector exercising qmf_deint_neg, and I guess neither did
> the one who wrote it,

I can assure you I did, but I don't remember which one(s).

> as not all the code in the same code block of aacsbr was moved to DSP
> functions.

That's an invalid conclusion. The remaining code was probably just not worth the effort to move.

--
Måns Rullgård
m...@mansr.com
[libav-devel] [PATCH 3/3] configure: Use headers in the check for _beginthreadex for w32threads
When targeting the metro API subset, this function still exists in the link libraries, but is excluded from the headers. This makes sure w32threads is automatically disabled when targeting this API subset (since not all the necessary functions for it are available). Signed-off-by: Martin Storsjö --- configure |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/configure b/configure index 38f49e0..02dd1c6 100755 --- a/configure +++ b/configure @@ -3332,7 +3332,7 @@ disabled zlib || check_lib zlib.h zlibVersion -lz || disable zlib disabled bzlib || check_lib2 bzlib.h BZ2_bzlibVersion -lbz2 || disable bzlib if ! disabled w32threads && ! enabled pthreads; then -check_func _beginthreadex && enable w32threads +check_func_headers "windows.h process.h" _beginthreadex && enable w32threads fi # check for some common methods of building with pthread support -- 1.7.9.4 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 2/3] avutil: Use a configure check to enable windows console functions
Not all versions or API subsets of windows have these functions. Signed-off-by: Martin Storsjö --- configure |2 ++ libavutil/log.c |4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 5e1be33..38f49e0 100755 --- a/configure +++ b/configure @@ -1242,6 +1242,7 @@ HAVE_LIST=" sched_getaffinity sdl sdl_video_size +SetConsoleTextAttribute setmode setrlimit Sleep @@ -3305,6 +3306,7 @@ check_func_headers windows.h GetProcessAffinityMask check_func_headers windows.h GetProcessTimes check_func_headers windows.h GetSystemTimeAsFileTime check_func_headers windows.h MapViewOfFile +check_func_headers windows.h SetConsoleTextAttribute check_func_headers windows.h Sleep check_func_headers windows.h VirtualAlloc diff --git a/libavutil/log.c b/libavutil/log.c index d335944..45c649a 100644 --- a/libavutil/log.c +++ b/libavutil/log.c @@ -41,7 +41,7 @@ static int av_log_level = AV_LOG_INFO; static int flags; -#if defined(_WIN32) && !defined(__MINGW32CE__) +#if HAVE_SETCONSOLETEXTATTRIBUTE #include static const uint8_t color[] = { 12, 12, 12, 14, 7, 10, 11 }; static int16_t background, attr_orig; @@ -59,7 +59,7 @@ static int use_color = -1; static void colored_fputs(int level, const char *str) { if (use_color < 0) { -#if defined(_WIN32) && !defined(__MINGW32CE__) +#if HAVE_SETCONSOLETEXTATTRIBUTE CONSOLE_SCREEN_BUFFER_INFO con_info; con = GetStdHandle(STD_ERROR_HANDLE); use_color = (con != INVALID_HANDLE_VALUE) && !getenv("NO_COLOR") && -- 1.7.9.4 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
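The pattern in this patch (guard on a configure-detected HAVE_ macro rather than on a platform macro such as _WIN32) can be sketched in isolation. The code below is an illustration under assumptions: `put_colored` is a hypothetical helper, and the macro is defaulted to 0 here only so the sketch compiles standalone, whereas in a real build configure would write the value into config.h. The three Win32 calls used are the documented console functions.

```c
#include <stdio.h>

/* In a real build this comes from config.h; default it so the sketch
 * compiles on its own. */
#ifndef HAVE_SETCONSOLETEXTATTRIBUTE
#define HAVE_SETCONSOLETEXTATTRIBUTE 0
#endif

#if HAVE_SETCONSOLETEXTATTRIBUTE
#include <windows.h>
#endif

/* Hypothetical helper: write str to stderr, coloured when the console
 * API was detected at configure time. Returns 1 if colouring was
 * attempted, 0 if the plain fallback was used. */
static int put_colored(const char *str, unsigned attr)
{
#if HAVE_SETCONSOLETEXTATTRIBUTE
    HANDLE con = GetStdHandle(STD_ERROR_HANDLE);

    if (con != INVALID_HANDLE_VALUE) {
        CONSOLE_SCREEN_BUFFER_INFO info;

        GetConsoleScreenBufferInfo(con, &info);
        SetConsoleTextAttribute(con, (WORD)attr);
        fputs(str, stderr);
        SetConsoleTextAttribute(con, info.wAttributes);
        return 1;
    }
#else
    (void)attr; /* unused without the console API */
#endif
    fputs(str, stderr);
    return 0;
}
```

The benefit is that an API subset which defines _WIN32 but lacks the console functions simply takes the fallback path, with no extra platform special-casing in the source.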
[libav-devel] [PATCH 1/3] avutil: Include io.h with a separate condition from windows console functions
Not all versions of windows have the console color functions, while io.h might be needed for isatty (which can be found in unistd.h or io.h).

Signed-off-by: Martin Storsjö
---
 libavutil/log.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavutil/log.c b/libavutil/log.c
index d2cf88f..d335944 100644
--- a/libavutil/log.c
+++ b/libavutil/log.c
@@ -29,6 +29,9 @@
 #if HAVE_UNISTD_H
 #include <unistd.h>
 #endif
+#if HAVE_IO_H
+#include <io.h>
+#endif
 #include <stdlib.h>
 #include "avstring.h"
 #include "avutil.h"
@@ -40,7 +43,6 @@ static int flags;
 
 #if defined(_WIN32) && !defined(__MINGW32CE__)
 #include <windows.h>
-#include <io.h>
 static const uint8_t color[] = { 12, 12, 12, 14, 7, 10, 11 };
 static int16_t background, attr_orig;
 static HANDLE con;
-- 
1.7.9.4
[libav-devel] [PATCH 10/10] SBR DSP x86: implement SSE hf_apply_noise
497 to 253 cycles under Win64. Replacing the multiplication by s_m[m] by an andps and an xorps with appropriate vectors is slower. Unrolling is a 15 cycles win. --- libavcodec/sbrdsp.c |1 - libavcodec/x86/sbrdsp.asm| 93 ++ libavcodec/x86/sbrdsp_init.c | 16 +++ 3 files changed, 109 insertions(+), 1 deletions(-) diff --git a/libavcodec/sbrdsp.c b/libavcodec/sbrdsp.c index 781ec83..d0a0b93 100644 --- a/libavcodec/sbrdsp.c +++ b/libavcodec/sbrdsp.c @@ -175,7 +175,6 @@ static av_always_inline void sbr_hf_apply_noise(float (*Y)[2], int m_max) { int m; - for (m = 0; m < m_max; m++) { float y0 = Y[m][0]; float y1 = Y[m][1]; diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index cfbd6e8..608dee6 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -26,6 +26,12 @@ SECTION_RODATA ps_mask times 2 dd 1<<31, 0 ps_mask2times 2 dd 0, 1<<31 ps_neg times 4 dd 1<<31 +ps_noise0 times 2 dd 1.0, 0.0, +ps_noise2 times 2 dd -1.0, 0.0 +ps_noise13 dd 0.0, 1.0, 0.0, -1.0 +dd 0.0, -1.0, 0.0, 1.0 +dd 0.0, 1.0, 0.0, -1.0 +cextern sbr_noise_table SECTION_TEXT @@ -318,3 +324,90 @@ cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c subcq, 2*mmsize jge .loop REP_RET + +; r0q=Y r1q=s_m r2q=q_filt r3q=noise r4q=max_m +cglobal hf_apply_noise_main + dec r3q + shl r4q, 2 + lea r0q, [r0q + 2*r4q] + add r1q, r4q + add r2q, r4q + shl r3q, 3 + xorps m5, m5 + neg r4q +.loop: + add r3q, 16 + and r3q, 0x1ff<<3 + movh m1, [r2q + r4q] + movu m3, [r3q + sbr_noise_table] + movh m2, [r2q + r4q + 8] + add r3q, 16 + and r3q, 0x1ff<<3 + movu m4, [r3q + sbr_noise_table] + unpcklps m1, m1 + unpcklps m2, m2 + mulps m1, m3 ; m2 = q_filt[m] * ff_sbr_noise_table[noise] + mulps m2, m4 ; m2 = q_filt[m] * ff_sbr_noise_table[noise] + movh m3, [r1q + r4q] + movh m4, [r1q + r4q + 8] + unpcklps m3, m3 + unpcklps m4, m4 + mova m6, m3 + mova m7, m4 + mulps m3, m0 ; s_m[m] * phi_sign + mulps m4, m0 ; s_m[m] * phi_sign + cmpps m6, m5, 0 ; m1 == 0 + cmpps m7, m5, 0 ; m1 == 0 + andps m1, m6 + 
andps m2, m7 + movu m6, [r0q + 2*r4q] + movu m7, [r0q + 2*r4q + 16] + addps m6, m1 + addps m7, m2 + addps m6, m3 + addps m7, m4 + movu[r0q + 2*r4q], m6 + movu[r0q + 2*r4q + 16], m7 + add r4q, 16 + jl .loop + ret + +; sbr_hf_apply_noise_0(float (*Y)[2], const float *s_m, +; const float *q_filt, int noise, +; int kx, int m_max) +cglobal sbr_hf_apply_noise_0, 4,5,8, Y,s_m,q_filt,noise,kx,m_max + mova m0, [ps_noise0] + mov r4d, m_maxm + call hf_apply_noise_main + RET + +; sbr_hf_apply_noise_1(float (*Y)[2], const float *s_m, +; const float *q_filt, int noise, +; int kx, int m_max) +cglobal sbr_hf_apply_noise_1, 5,5,8, Y,s_m,q_filt,noise,kx,m_max + and kxq, 1 + shl kxq, 4 + mova m0, [kxq + ps_noise13] + mov r4d, m_maxm + call hf_apply_noise_main + RET + +; sbr_hf_apply_noise_2(float (*Y)[2], const float *s_m, +; const float *q_filt, int noise, +; int kx, int m_max) +cglobal sbr_hf_apply_noise_2, 4,5,8, Y,s_m,q_filt,noise,kx,m_max + mova m0, [ps_noise2] + mov r4d, m_maxm + call hf_apply_noise_main + RET + +; sbr_hf_apply_noise_3(float (*Y)[2], const float *s_m, +; const float *q_filt, int noise, +; int kx, int m_max) +cglobal sbr_hf_apply_noise_3, 5,5,8, Y,s_m,q_filt,noise,kx,m_max + and kxq, 1 + shl kxq, 4 + mova m0, [kxq + ps_noise13 + 16] + mov r4d, m_maxm + call hf_apply_noise_main + RET diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index 5e3e131..9759314 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -36,6 +36,18 @@ void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z); void ff_sbr_qmf_pre_shuffle_sse(float *z); void ff_sbr_qmf_deint_neg_sse(float *v, const float *src); void ff_sbr_qmf_deint_bfly_sse(float *v, const float *src0, const float *src1); +void ff_sbr_hf_apply_noise_0_sse(float (*Y)[2], const float *s_m, + const float *q_filt, int noise, + int kx, int m_max); +void ff_sbr_hf_apply_noise_1_sse(float (*Y)[2], const float *s_m, + const float *q_filt, int noise, + int kx, int m_max); +void 
ff_sbr_hf_apply_noise_2_sse(float (*Y)[2], const float *s_m, +
[libav-devel] [PATCH 09/10] AAC SBR: avoid a memcpy.
Swapping buffer indices allows saving one memcpy that accounts for 1% of the runtime, according to oprofile. --- libavcodec/aacsbr.c | 22 +++--- 1 files changed, 11 insertions(+), 11 deletions(-) diff --git a/libavcodec/aacsbr.c b/libavcodec/aacsbr.c index df5d927..40b08f9 100644 --- a/libavcodec/aacsbr.c +++ b/libavcodec/aacsbr.c @@ -1153,10 +1153,9 @@ static void sbr_dequant(SpectralBandReplication *sbr, int id_aac) */ static void sbr_qmf_analysis(DSPContext *dsp, FFTContext *mdct, SBRDSPContext *sbrdsp, const float *in, float *x, - float z[320], float W[2][32][32][2]) + float z[320], float W[2][32][32][2], int buf_idx) { int i; -memcpy(W[0], W[1], sizeof(W[0])); memcpy(x, x+1024, (320-32)*sizeof(x[0])); memcpy(x+288, in, 1024*sizeof(x[0])); for (i = 0; i < 32; i++) { // numTimeSlots*RATE = 16*2 as 960 sample frames @@ -1165,7 +1164,7 @@ static void sbr_qmf_analysis(DSPContext *dsp, FFTContext *mdct, sbrdsp->sum64x5(z); sbrdsp->qmf_pre_shuffle(z); mdct->imdct_half(mdct, z, z+64); -sbrdsp->qmf_post_shuffle(W[1][i], z); +sbrdsp->qmf_post_shuffle(W[buf_idx][i], z); x += 32; } } @@ -1301,7 +1300,8 @@ static void sbr_chirp(SpectralBandReplication *sbr, SBRData *ch_data) /// Generate the subband filtered lowband static int sbr_lf_gen(AACContext *ac, SpectralBandReplication *sbr, - float X_low[32][40][2], const float W[2][32][32][2]) + float X_low[32][40][2], const float W[2][32][32][2], + int buf_idx) { int i, k; const int t_HFGen = 8; @@ -1309,14 +1309,15 @@ static int sbr_lf_gen(AACContext *ac, SpectralBandReplication *sbr, memset(X_low, 0, 32*sizeof(*X_low)); for (k = 0; k < sbr->kx[1]; k++) { for (i = t_HFGen; i < i_f + t_HFGen; i++) { -X_low[k][i][0] = W[1][i - t_HFGen][k][0]; -X_low[k][i][1] = W[1][i - t_HFGen][k][1]; +X_low[k][i][0] = W[buf_idx][i - t_HFGen][k][0]; +X_low[k][i][1] = W[buf_idx][i - t_HFGen][k][1]; } } +buf_idx = 1-buf_idx; for (k = 0; k < sbr->kx[0]; k++) { for (i = 0; i < t_HFGen; i++) { -X_low[k][i][0] = W[0][i + i_f - t_HFGen][k][0]; 
-X_low[k][i][1] = W[0][i + i_f - t_HFGen][k][1]; +X_low[k][i][0] = W[buf_idx][i + i_f - t_HFGen][k][0]; +X_low[k][i][1] = W[buf_idx][i + i_f - t_HFGen][k][1]; } } return 0; @@ -1344,7 +1345,6 @@ static int sbr_hf_gen(AACContext *ac, SpectralBandReplication *sbr, "ERROR : no subband found for frequency %d\n", k); return -1; } - sbr->dsp.hf_gen(X_high[k] + ENVELOPE_ADJUSTMENT_OFFSET, X_low[p] + ENVELOPE_ADJUSTMENT_OFFSET, alpha0[p], alpha1[p], bw_array[g], @@ -1665,8 +1665,8 @@ void ff_sbr_apply(AACContext *ac, SpectralBandReplication *sbr, int id_aac, /* decode channel */ sbr_qmf_analysis(&ac->dsp, &sbr->mdct_ana, &sbr->dsp, ch ? R : L, sbr->data[ch].analysis_filterbank_samples, (float*)sbr->qmf_filter_scratch, - sbr->data[ch].W); -sbr_lf_gen(ac, sbr, sbr->X_low, sbr->data[ch].W); + sbr->data[ch].W, sbr->data[ch].Ypos); +sbr_lf_gen(ac, sbr, sbr->X_low, sbr->data[ch].W, sbr->data[ch].Ypos); sbr->data[ch].Ypos ^= 1; if (sbr->start) { sbr_hf_inverse_filter(&sbr->dsp, sbr->alpha0, sbr->alpha1, sbr->X_low, sbr->k[0]); -- 1.7.7.msysgit.0 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
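The index-swapping idea can be shown in isolation. The sketch below is not the aacsbr.c code: History, FRAME_LEN and the helper names are made up for illustration, and the frame is shrunk to four floats. It only demonstrates how toggling an index lets the two halves of a buffer trade the roles of "current" and "previous" without copying one half onto the other.

```c
#include <string.h>

#define FRAME_LEN 4   /* tiny stand-in for the real per-frame data */

/* Hypothetical two-frame history, analogous in spirit to W[2][...]. */
typedef struct {
    float W[2][FRAME_LEN]; /* two halves: current and previous frame */
    int   idx;             /* half that receives the current frame   */
} History;

/* Write the new frame into the current half; the other half still
 * holds the previous frame, so no memcpy between halves is needed. */
static void store_frame(History *h, const float *frame)
{
    memcpy(h->W[h->idx], frame, sizeof(h->W[0]));
}

static const float *current_frame(const History *h)
{
    return h->W[h->idx];
}

static const float *previous_frame(const History *h)
{
    return h->W[h->idx ^ 1];
}

/* Flip the roles of the two halves once the frame has been consumed. */
static void advance(History *h)
{
    h->idx ^= 1;
}
```

In the patch the same role is played by passing sbr->data[ch].Ypos as buf_idx and toggling it after each frame.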
[libav-devel] [PATCH 08/10] x264asm: fix cmp* number of arguments
cmp{p,s}{s,d} instructions do take an imm8 operand.
---
 libavutil/x86/x86inc.asm | 8
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 52ee46a..3744e46 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -951,10 +951,10 @@ AVX_INSTR blendpd, 1, 0, 0
 AVX_INSTR blendps, 1, 0, 0
 AVX_INSTR blendvpd, 1, 0, 0
 AVX_INSTR blendvps, 1, 0, 0
-AVX_INSTR cmppd, 1, 0, 0
-AVX_INSTR cmpps, 1, 0, 0
-AVX_INSTR cmpsd, 1, 0, 0
-AVX_INSTR cmpss, 1, 0, 0
+AVX_INSTR cmppd, 1, 1, 0
+AVX_INSTR cmpps, 1, 1, 0
+AVX_INSTR cmpsd, 1, 1, 0
+AVX_INSTR cmpss, 1, 1, 0
 AVX_INSTR cvtdq2ps, 1, 0, 0
 AVX_INSTR cvtps2dq, 1, 0, 0
 AVX_INSTR divpd, 1, 0, 0
-- 
1.7.7.msysgit.0
[libav-devel] [PATCH 07/10] SBR DSP x86: implement SSE qmf_deint_bfly
>From 713 to 209 cycles on Penrynn. Having a loop counter is a 7 cycle gain. Unrolling is another 7 cycle gain. Working in reverse scan is another 6 cycles. --- libavcodec/x86/sbrdsp.asm| 31 +++ libavcodec/x86/sbrdsp_init.c |2 ++ 2 files changed, 33 insertions(+), 0 deletions(-) diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index 49dd78c..cfbd6e8 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -287,3 +287,34 @@ cglobal sbr_qmf_deint_neg, 2,3,4,v,src,vrev cmpvq, vrevq jl .loop REP_RET + +; sbr_qmf_deint_bfly(float *v, const float *src0, const float *src1) +cglobal sbr_qmf_deint_bfly, 3,5,8, v,src0,src1,vrev,c + movcq, 64*4-2*mmsize + lea vrevq, [vq + 64*4] +.loop: + mova m0, [src0q+cq] + mova m1, [src1q] + mova m4, [src0q+cq+mmsize] + mova m5, [src1q+mmsize] + mova m2, m0 + mova m3, m1 + shufps m2, m2, 11011b + shufps m3, m3, 11011b + mova m6, m4 + mova m7, m5 + shufps m6, m6, 11011b + shufps m7, m7, 11011b + addps m5, m2 + subps m0, m7 + addps m1, m6 + subps m4, m3 + mova [vrevq], m1 + mova [vrevq+mmsize], m5 + mova [vq+cq], m0 + mova [vq+cq+mmsize], m4 + add src1q, 2*mmsize + add vrevq, 2*mmsize + subcq, 2*mmsize + jge .loop + REP_RET diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index 1ac64aa..5e3e131 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -35,6 +35,7 @@ void ff_sbr_neg_odd_64_sse(float *z); void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z); void ff_sbr_qmf_pre_shuffle_sse(float *z); void ff_sbr_qmf_deint_neg_sse(float *v, const float *src); +void ff_sbr_qmf_deint_bfly_sse(float *v, const float *src0, const float *src1); void ff_sbrdsp_init_x86(SBRDSPContext *s) { @@ -49,5 +50,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s) s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse; s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse; s->qmf_deint_neg = ff_sbr_qmf_deint_neg_sse; +s->qmf_deint_bfly = ff_sbr_qmf_deint_bfly_sse; } } -- 1.7.7.msysgit.0 
[libav-devel] [PATCH 02/10] SBR DSP x86: implement SSE sum64x5
From 698 to 174 cycles on Penryn. Unrolling is a 6-cycle gain.
---
 libavcodec/x86/sbrdsp.asm    | 22 ++
 libavcodec/x86/sbrdsp_init.c |  2 ++
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm
index 039bf8c..11a6faf 100644
--- a/libavcodec/x86/sbrdsp.asm
+++ b/libavcodec/x86/sbrdsp.asm
@@ -181,3 +181,25 @@ cglobal sbr_hf_gen, 4,4,8, X_high, X_low, alpha0, alpha1, BW, S, E
     add start, 16
     jnz .loop2
     RET
+
+cglobal sbr_sum64x5, 1,2,4,z
+    lea    r1q, [zq+ 256]
+.loop:
+    mova    m0, [zq+   0]
+    mova    m2, [zq+  16]
+    mova    m1, [zq+ 256]
+    mova    m3, [zq+ 272]
+    addps   m0, [zq+ 512]
+    addps   m2, [zq+ 528]
+    addps   m1, [zq+ 768]
+    addps   m3, [zq+ 784]
+    addps   m0, [zq+1024]
+    addps   m2, [zq+1040]
+    addps   m0, m1
+    addps   m2, m3
+    mova  [zq], m0
+    mova  [zq+16], m2
+    add     zq, 32
+    cmp     zq, r1q
+    jne  .loop
+    REP_RET
diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c
index 51c4bd4..108a681 100644
--- a/libavcodec/x86/sbrdsp_init.c
+++ b/libavcodec/x86/sbrdsp_init.c
@@ -30,6 +30,7 @@ void ff_sbr_hf_g_filt_sse(float (*Y)[2], const float (*X_high)[40][2],
 void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2],
                        const float alpha0[2], const float alpha1[2],
                        float bw, int start, int end);
+void ff_sbr_sum64x5_sse(float *z);
 
 void ff_sbrdsp_init_x86(SBRDSPContext *s)
 {
@@ -39,5 +40,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s)
         s->sum_square = ff_sbr_sum_square_sse;
         s->hf_g_filt = ff_sbr_hf_g_filt_sse;
         s->hf_gen = ff_sbr_hf_gen_sse;
+        s->sum64x5 = ff_sbr_sum64x5_sse;
     }
 }
-- 
1.7.7.msysgit.0
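For readers who do not want to decode the byte offsets, the loop appears to compute the following, written here as a scalar sketch. The name sum64x5_ref is hypothetical and the body is inferred from the offsets in the asm above (0, 256, 512, 768 and 1024 bytes, i.e. steps of 64 floats); the C reference in sbrdsp.c may differ in detail.

```c
/* Scalar sketch: fold five consecutive blocks of 64 floats into the
 * first block.  z must hold at least 320 floats. */
static void sum64x5_ref(float *z)
{
    for (int k = 0; k < 64; k++)
        z[k] = z[k] + z[k + 64] + z[k + 128] + z[k + 192] + z[k + 256];
}
```

Note that the asm adds the terms in a different order (two partial sums that are combined at the end), so results may differ from this sketch in the last bits of the mantissa.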
[libav-devel] [PATCH 06/10] SBR DSP x86: implement SSE neg_odd_64
>From 210 cycles to 87 on penrynn. Unrolling and not storing mask both save some cycles. --- libavcodec/x86/sbrdsp.asm| 21 + libavcodec/x86/sbrdsp_init.c |2 ++ 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index aff6879..49dd78c 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -206,6 +206,26 @@ cglobal sbr_sum64x5, 1,2,4,z jne .loop REP_RET +cglobal sbr_neg_odd_64, 1,2,4,z + lea r1q, [zq+256] +.loop: + mova m0, [zq+ 0] + mova m1, [zq+16] + mova m2, [zq+32] + mova m3, [zq+48] + xorps m0, [ps_mask2] + xorps m1, [ps_mask2] + xorps m2, [ps_mask2] + xorps m3, [ps_mask2] + mova [zq+ 0], m0 + mova [zq+16], m1 + mova [zq+32], m2 + mova [zq+48], m3 + addzq, 64 + cmpzq, r1q + jne .loop + REP_RET + cglobal sbr_qmf_post_shuffle, 2,3,3,W,z lea r2q, [zq + (64-4)*4] .loop: @@ -266,3 +286,4 @@ cglobal sbr_qmf_deint_neg, 2,3,4,v,src,vrev sub srcq, 32 cmpvq, vrevq jl .loop + REP_RET diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index e70b970..1ac64aa 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -31,6 +31,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2], const float alpha0[2], const float alpha1[2], float bw, int start, int end); void ff_sbr_sum64x5_sse(float *z); +void ff_sbr_neg_odd_64_sse(float *z); void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z); void ff_sbr_qmf_pre_shuffle_sse(float *z); void ff_sbr_qmf_deint_neg_sse(float *v, const float *src); @@ -44,6 +45,7 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s) s->hf_g_filt = ff_sbr_hf_g_filt_sse; s->hf_gen = ff_sbr_hf_gen_sse; s->sum64x5= ff_sbr_sum64x5_sse; +s->neg_odd_64 = ff_sbr_neg_odd_64_sse; s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse; s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse; s->qmf_deint_neg = ff_sbr_qmf_deint_neg_sse; -- 1.7.7.msysgit.0 ___ libav-devel mailing list libav-devel@libav.org 
https://lists.libav.org/mailman/listinfo/libav-devel
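As a reading aid for the neg_odd_64 patch above: ps_mask2 holds (0, 1<<31) twice, so each xorps flips the sign bit of the odd-indexed floats in a vector. A scalar sketch of the same effect, under the assumption that this is all the function does (neg_odd_64_ref is a hypothetical name, not the function in sbrdsp.c):

```c
/* Scalar sketch: negate z[1], z[3], ..., z[63] and leave the
 * even-indexed elements untouched. */
static void neg_odd_64_ref(float *z)
{
    for (int i = 1; i < 64; i += 2)
        z[i] = -z[i];
}
```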
[libav-devel] [PATCH 05/10] SBR DSP x86: implement SSE qmf_deint_neg
No vector tests it. --- libavcodec/x86/sbrdsp.asm| 19 +++ libavcodec/x86/sbrdsp_init.c |2 ++ 2 files changed, 21 insertions(+), 0 deletions(-) diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index b9f0709..aff6879 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -247,3 +247,22 @@ cglobal sbr_qmf_pre_shuffle, 1,4,4,z jl .loop movh [r3q-256], m3 REP_RET + +cglobal sbr_qmf_deint_neg, 2,3,4,v,src,vrev + lea vrevq, [vq + (64-4)*4] + add srcq, (64-8)*4 + mova m3, [ps_neg] +.loop: + mova m0, [srcq + 0] + mova m1, [srcq + 16] + mova m2, m1 + shufps m0, m1, 11011101b + shufps m2, m1, 10001000b + xorps m0, m3 + mova [vq], m2 + mova [vrevq], m0 + addvq, 16 + sub vrevq, 16 + sub srcq, 32 + cmpvq, vrevq + jl .loop diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index 5babe62..e70b970 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -33,6 +33,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2], void ff_sbr_sum64x5_sse(float *z); void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z); void ff_sbr_qmf_pre_shuffle_sse(float *z); +void ff_sbr_qmf_deint_neg_sse(float *v, const float *src); void ff_sbrdsp_init_x86(SBRDSPContext *s) { @@ -45,5 +46,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s) s->sum64x5= ff_sbr_sum64x5_sse; s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse; s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse; +s->qmf_deint_neg = ff_sbr_qmf_deint_neg_sse; } } -- 1.7.7.msysgit.0 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 04/10] SBR DSP x86: implement SSE qmf_pre_shuffle
>From 253 to 185c. --- libavcodec/x86/sbrdsp.asm| 23 +++ libavcodec/x86/sbrdsp_init.c |2 ++ 2 files changed, 25 insertions(+), 0 deletions(-) diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index 2b90100..b9f0709 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -224,3 +224,26 @@ cglobal sbr_qmf_post_shuffle, 2,3,3,W,z cmpzq, r2q jl .loop REP_RET + +cglobal sbr_qmf_pre_shuffle, 1,4,4,z + movh m3, [zq] + lea r3q, [zq + 64*4] + lea r2q, [zq + (64-3)*4] + addzq, 4 +.loop: + movu m0, [r2q] + movu m1, [zq ] + xorps m0, [ps_neg] + shufps m0, m0, 0x1B + mova m2, m0 + unpcklps m0, m1 + unpckhps m2, m1 + mova [r3q + 0], m0 + mova [r3q + 16], m2 + add r3q, 32 + sub r2q, 16 + addzq, 16 + cmpzq, r2q + jl .loop + movh [r3q-256], m3 + REP_RET diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index 3f6dd97..5babe62 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -32,6 +32,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2], float bw, int start, int end); void ff_sbr_sum64x5_sse(float *z); void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z); +void ff_sbr_qmf_pre_shuffle_sse(float *z); void ff_sbrdsp_init_x86(SBRDSPContext *s) { @@ -43,5 +44,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s) s->hf_gen = ff_sbr_hf_gen_sse; s->sum64x5= ff_sbr_sum64x5_sse; s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse; +s->qmf_pre_shuffle = ff_sbr_qmf_pre_shuffle_sse; } } -- 1.7.7.msysgit.0 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 03/10] SBR DSP x86: implement SSE qmf_post_shuffle
On penrynn, from 255 to 174c. Unrolling yields no gain. --- libavcodec/x86/sbrdsp.asm| 21 + libavcodec/x86/sbrdsp_init.c |2 ++ 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index 11a6faf..2b90100 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -24,6 +24,8 @@ SECTION_RODATA ; mask equivalent for multiply by -1.0 1.0 ps_mask times 2 dd 1<<31, 0 +ps_mask2times 2 dd 0, 1<<31 +ps_neg times 4 dd 1<<31 SECTION_TEXT @@ -203,3 +205,22 @@ cglobal sbr_sum64x5, 1,2,4,z cmp zq, r1q jne .loop REP_RET + +cglobal sbr_qmf_post_shuffle, 2,3,3,W,z + lea r2q, [zq + (64-4)*4] +.loop: + mova m0, [r2q] + mova m1, [zq ] + xorps m0, [ps_neg] + shufps m0, m0, 0x1B + mova m2, m0 + unpcklps m0, m1 + unpckhps m2, m1 + mova [Wq + 0], m0 + mova [Wq + 16], m2 + addWq, 32 + sub r2q, 16 + addzq, 16 + cmpzq, r2q + jl .loop + REP_RET diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index 108a681..3f6dd97 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -31,6 +31,7 @@ void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2], const float alpha0[2], const float alpha1[2], float bw, int start, int end); void ff_sbr_sum64x5_sse(float *z); +void ff_sbr_qmf_post_shuffle_sse(float W[32][2], const float *z); void ff_sbrdsp_init_x86(SBRDSPContext *s) { @@ -41,5 +42,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s) s->hf_g_filt = ff_sbr_hf_g_filt_sse; s->hf_gen = ff_sbr_hf_gen_sse; s->sum64x5= ff_sbr_sum64x5_sse; +s->qmf_post_shuffle = ff_sbr_qmf_post_shuffle_sse; } } -- 1.7.7.msysgit.0 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 01/10] SBR DSP x86: implement SSE sbr_hf_gen
Start and end index are multiple of 2, therefore guaranteeing aligned access. Also, this allows to generate 4 floats per loop, keeping the alignment all along. Timing: - 32 bits: 326c -> 172 - 64 bits: 323c -> 156c --- libavcodec/x86/sbrdsp.asm| 73 - libavcodec/x86/sbrdsp_init.c |4 ++ 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/libavcodec/x86/sbrdsp.asm b/libavcodec/x86/sbrdsp.asm index c351de4..039bf8c 100644 --- a/libavcodec/x86/sbrdsp.asm +++ b/libavcodec/x86/sbrdsp.asm @@ -21,8 +21,11 @@ %include "libavutil/x86/x86util.asm" -;SECTION_RODATA -SECTION .text +SECTION_RODATA +; mask equivalent for multiply by -1.0 1.0 +ps_mask times 2 dd 1<<31, 0 + +SECTION_TEXT INIT_XMM sse cglobal sbr_sum_square, 2, 3, 6 @@ -112,3 +115,69 @@ cglobal sbr_hf_g_filt, 5, 6, 5 jnz .loop1 .end: RET + +; static void sbr_hf_gen_c(float (*X_high)[2], const float (*X_low)[2], +; const float alpha0[2], const float alpha1[2], +; float bw, int start, int end) +; +cglobal sbr_hf_gen, 4,4,8, X_high, X_low, alpha0, alpha1, BW, S, E +; load alpha factors +%define bw m0 +%if ARCH_X86_64 == 0 || WIN64 +movss bw, BWm +%endif +movh m2, [alpha1q] +movh m1, [alpha0q] +shufps bw, bw, 0 +mulps m2, bw ; (a1[0] a1[1])*bw +mulps m1, bw ; (a0[0] a0[1])*bw= (a2 a3) +mulps m2, bw ; (a1[0] a1[1])*bw*bw = (a0 a1) +mova m3, m1 +mova m4, m2 +mova m7, [ps_mask] + +; Set pointers +%if ARCH_X86_64 == 0 || WIN64 +; start and end 6th and 7th args on stack +movr2d, Sm +movr3d, Em +%define start r2q +%define end r3q +%else +; BW does not actually occupy a register, so shift by 1 +%define start BWq +%define end Sq +%endif +sub start, end ; neg num of loops +leaX_highq, [X_highq + end*2*4] +lea X_lowq, [X_lowq + end*2*4 - 2*2*4] +shl start, 3 ; offset from num loops + +movam0, [X_lowq + start] +movlhps m1, m1 ; (a2 a3 a2 a3) +movlhps m2, m2 ; (a0 a1 a0 a1) +shufps m3, m3, 00010001b ; (a3 a2 a3 a2) +shufps m4, m4, 00010001b ; (a1 a0 a1 a0) +xorps m3, m7 ; (-a3 a2 -a3 a2) +xorps m4, m7 ; (-a1 a0 -a1 a0) 
+.loop2: +movam5, m0 +movam6, m0 +shufps m0, m0, 1010b ; {Xl[-2][0],",Xl[-1][0],"} +shufps m5, m5, 0101b ; {Xl[-2][1],",Xl[-1][1],"} +mulps m0, m2 +mulps m5, m4 +movam7, m6 +addps m5, m0 +movam0, [X_lowq + start + 2*2*4] +shufps m6, m0, 1010b ; {Xl[-1][0],",Xl[0][0],"} +shufps m7, m0, 0101b ; {Xl[-1][1],",Xl[1][1],"} +mulps m6, m1 +mulps m7, m3 +addps m5, m6 +addps m7, m0 +addps m5, m7 +mova [X_highq + start], m5 +add start, 16 +jnz .loop2 +RET diff --git a/libavcodec/x86/sbrdsp_init.c b/libavcodec/x86/sbrdsp_init.c index d272896..51c4bd4 100644 --- a/libavcodec/x86/sbrdsp_init.c +++ b/libavcodec/x86/sbrdsp_init.c @@ -27,6 +27,9 @@ float ff_sbr_sum_square_sse(float (*x)[2], int n); void ff_sbr_hf_g_filt_sse(float (*Y)[2], const float (*X_high)[40][2], const float *g_filt, int m_max, intptr_t ixh); +void ff_sbr_hf_gen_sse(float (*X_high)[2], const float (*X_low)[2], + const float alpha0[2], const float alpha1[2], + float bw, int start, int end); void ff_sbrdsp_init_x86(SBRDSPContext *s) { @@ -35,5 +38,6 @@ void ff_sbrdsp_init_x86(SBRDSPContext *s) if (EXTERNAL_SSE(mm_flags)) { s->sum_square = ff_sbr_sum_square_sse; s->hf_g_filt = ff_sbr_hf_g_filt_sse; +s->hf_gen = ff_sbr_hf_gen_sse; } } -- 1.7.7.msysgit.0 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
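The register comments in the asm above ((a0 a1) = alpha1*bw*bw, (a2 a3) = alpha0*bw, and the sign-flipped copies) suggest a second-order complex prediction. The following scalar sketch is inferred from those comments only; hf_gen_ref is a hypothetical name and the actual sbr_hf_gen_c may be written differently.

```c
/* Scalar sketch: X_high[i] = X_low[i]
 *                          + (alpha0 * bw)      * X_low[i - 1]
 *                          + (alpha1 * bw * bw) * X_low[i - 2]
 * with all products taken as complex multiplications
 * ([0] = real part, [1] = imaginary part). */
static void hf_gen_ref(float (*X_high)[2], const float (*X_low)[2],
                       const float alpha0[2], const float alpha1[2],
                       float bw, int start, int end)
{
    const float a0 = alpha1[0] * bw * bw, a1 = alpha1[1] * bw * bw;
    const float a2 = alpha0[0] * bw,      a3 = alpha0[1] * bw;

    for (int i = start; i < end; i++) {
        X_high[i][0] = X_low[i - 2][0] * a0 - X_low[i - 2][1] * a1 +
                       X_low[i - 1][0] * a2 - X_low[i - 1][1] * a3 +
                       X_low[i][0];
        X_high[i][1] = X_low[i - 2][1] * a0 + X_low[i - 2][0] * a1 +
                       X_low[i - 1][1] * a2 + X_low[i - 1][0] * a3 +
                       X_low[i][1];
    }
}
```

Each iteration writes one complex sample, i.e. two floats; the SSE version produces four floats per loop, which is why the commit message relies on start and end being multiples of 2.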
[libav-devel] [PATCH 00/10] Various patches for AAC SBR DSP
These are mostly x86 SSE asm. The first patch is a continuation of the thread "[PATCHES] SBR DSP and sbr_hf_gen". Except for hf_apply_noise, they were tested using fate-aac on win32/win64 and linux x86-64. I didn't check under linux x86-32. I couldn't find a vector exercising qmf_deint_neg, and I guess neither did whoever wrote it, as not all the code in the same code block of aacsbr was moved to DSP functions.

Christophe Gisquet (10):
  SBR DSP x86: implement SSE sbr_hf_gen
  SBR DSP x86: implement SSE sum64x5
  SBR DSP x86: implement SSE qmf_post_shuffle
  SBR DSP x86: implement SSE qmf_pre_shuffle
  SBR DSP x86: implement SSE qmf_deint_neg
  SBR DSP x86: implement SSE neg_odd_64
  SBR DSP x86: implement SSE qmf_deint_bfly
  x264asm: fix cmp* number of arguments
  AAC SBR: avoid a memcpy.
  SBR DSP x86: implement SSE hf_apply_noise

 libavcodec/aacsbr.c          |  22 ++--
 libavcodec/sbrdsp.c          |   1 -
 libavcodec/x86/sbrdsp.asm    | 303 +-
 libavcodec/x86/sbrdsp_init.c |  32 +
 libavutil/x86/x86inc.asm     |   8 +-
 5 files changed, 348 insertions(+), 18 deletions(-)
-- 
1.7.7.msysgit.0
[libav-devel] [PATCH 1/1] avprobe: report per stream bit rate if set by the decoder
---
 avprobe.c | 4
 1 file changed, 4 insertions(+)

diff --git a/avprobe.c b/avprobe.c
index 3a3ae0f..4da9621 100644
--- a/avprobe.c
+++ b/avprobe.c
@@ -654,6 +654,10 @@ static void show_stream(AVFormatContext *fmt_ctx, int stream_idx)
     probe_str("avg_frame_rate",
               rational_string(val_str, sizeof(val_str), "/",
               &stream->avg_frame_rate));
+    if (dec_ctx->bit_rate)
+        probe_str("bit_rate",
+                  value_string(val_str, sizeof(val_str),
+                               dec_ctx->bit_rate, unit_bit_per_second_str));
     probe_str("time_base",
               rational_string(val_str, sizeof(val_str), "/",
               &stream->time_base));
-- 
1.7.12.4
Re: [libav-devel] [PATCHES] scalarproduct_and_madd_int16 and wma lossless
2012/11/27 Justin Ruggles :
> Either make the function name lowercase or make it a macro. Also, if you
> leave it as an inline function, put the opening brace on a separate line.

Moved to a macro. This was mimicking the apedec codec. If the patches are declared ok, someone should check the neon code.

-- 
Christophe

0003-dsputil-allow-scalarproduct_and_madd_int16-to-handle.patch
Description: Binary data
0004-wma-lossless-reuse-scalarproduct_and_madd_int16.patch
Description: Binary data
Re: [libav-devel] [PATCH 1/1] svq3: check and reject negative slice types
Janne Grunau writes:

> On 2012-11-29 17:21:45 +0000, Måns Rullgård wrote:
>> Janne Grunau writes:
>>
>> > On 2012-11-29 13:46:12 +0000, Måns Rullgård wrote:
>> >> Janne Grunau writes:
>> >>
>> >> > On 2012-11-29 00:08:52 +0000, Måns Rullgård wrote:
>> >> >> Janne Grunau writes:
>> >> >>
>> >> >> > On 2012-11-28 23:52:27 +0100, Luca Barbato wrote:
>> >> >> >>
>> >> >> >> Is INVALID_VLC value negative?
>> >> >> >
>> >> >> > no, 0x80000000. But arithmetic conversion saves us.
>> >> >>
>> >> >> Ouch, that's _really_ bad. The svq3_get_ue_golomb() return type is
>> >> >> (inexplicably) int, so returning that value entails a conversion with
>> >> >> implementation-defined behaviour. Most compilers leave the bits intact
>> >> >> in such conversions, but I'd rather not depend on it. Also, such code
>> >> >> is nothing short of obfuscated even if it does work reliably.
>> >> >
>> >> > 6.3.1.3 reads to me as if signed int to unsigned int conversion is well
>> >> > defined:
>> >>
>> >> Yes, but we're dealing with unsigned to signed here. The integer
>> >> constant 0x80000000 has type unsigned int (if int is 32-bit). Returning
>> >> this from svq3_get_ue_golomb() as a signed int invokes an
>> >> implementation-defined conversion.
>> >
>> > svq3_get_ue_golomb() doesn't return INVALID_VLC explicitly.
>>
>> Right, I confused it with svq3_get_se_golomb(). svq3_get_ue_golomb() is
>> actually worse.
>>
>> > The only way it can return something negative is due to signed arithmetic
>> > on the int ret in the else branch.
>>
>> The function has two return statements. The first returns a value from
>> ff_interleaved_ue_golomb_vlc_code[] (array of uint8_t), so this one is
>> safe. The other returns "ret - 1" where ret is a signed int. For this
>> to produce a negative value other than -1, ret must itself be negative.
>> This can only happen through the left shifts in the loop overflowing
>> into the sign bit. Such an overflow has *undefined* behaviour.
>
> returning -1 relies on undefined behaviour too. ret is initialized to 1
> and the only operations are left shifts and bitwise ORs.

Quite so.

>> input can cause this to happen (and your patch suggests this is the
>> case), the code is broken and must be fixed here. Checking it after the
>> fact is not good enough.
>
> I see no reason why the computation in svq3_get_ue_golomb() can't be
> changed to unsigned and at least make fate agrees.
>
> The only decision to be made is whether to detect truncation/invalid
> codes or keep reading until it is properly terminated.

Detecting errors as soon as possible might be more robust, but simply
reading until it naturally terminates is probably faster.

-- 
Måns Rullgård
m...@mansr.com
Re: [libav-devel] [PATCH 1/1] svq3: check and reject negative slice types
On 2012-11-29 17:21:45 +0000, Måns Rullgård wrote:
> Janne Grunau writes:
>
> > On 2012-11-29 13:46:12 +0000, Måns Rullgård wrote:
> >> Janne Grunau writes:
> >>
> >> > On 2012-11-29 00:08:52 +0000, Måns Rullgård wrote:
> >> >> Janne Grunau writes:
> >> >>
> >> >> > On 2012-11-28 23:52:27 +0100, Luca Barbato wrote:
> >> >> >>
> >> >> >> Is INVALID_VLC value negative?
> >> >> >
> >> >> > no, 0x80000000. But arithmetic conversion saves us.
> >> >>
> >> >> Ouch, that's _really_ bad. The svq3_get_ue_golomb() return type is
> >> >> (inexplicably) int, so returning that value entails a conversion with
> >> >> implementation-defined behaviour. Most compilers leave the bits intact
> >> >> in such conversions, but I'd rather not depend on it. Also, such code
> >> >> is nothing short of obfuscated even if it does work reliably.
> >> >
> >> > 6.3.1.3 reads to me as if signed int to unsigned int conversion is well
> >> > defined:
> >>
> >> Yes, but we're dealing with unsigned to signed here. The integer
> >> constant 0x80000000 has type unsigned int (if int is 32-bit). Returning
> >> this from svq3_get_ue_golomb() as a signed int invokes an
> >> implementation-defined conversion.
> >
> > svq3_get_ue_golomb() doesn't return INVALID_VLC explicitly.
>
> Right, I confused it with svq3_get_se_golomb(). svq3_get_ue_golomb() is
> actually worse.
>
> > The only way it can return something negative is due to signed arithmetic
> > on the int ret in the else branch.
>
> The function has two return statements. The first returns a value from
> ff_interleaved_ue_golomb_vlc_code[] (array of uint8_t), so this one is
> safe. The other returns "ret - 1" where ret is a signed int. For this
> to produce a negative value other than -1, ret must itself be negative.
> This can only happen through the left shifts in the loop overflowing
> into the sign bit. Such an overflow has *undefined* behaviour.

returning -1 relies on undefined behaviour too. ret is initialized to 1 and the only operations are left shifts and bitwise ORs.

> input can cause this to happen (and your patch suggests this is the
> case), the code is broken and must be fixed here. Checking it after the
> fact is not good enough.

I see no reason why the computation in svq3_get_ue_golomb() can't be changed to unsigned and at least make fate agrees.

The only decision to be made is whether to detect truncation/invalid codes or keep reading until it is properly terminated.

Janne
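To make the "change the computation to unsigned" suggestion concrete, here is a self-contained sketch. It is not the libav implementation: BitReader, read_bit() and get_interleaved_ue() are hypothetical names, the reader works bit by bit instead of using the table-driven fast path, and the bit layout (a stop flag followed by an info bit, repeated) is an assumption about the interleaved code. It takes the "detect truncation/invalid codes" option from the discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader (not GetBitContext). */
typedef struct {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
} BitReader;

/* Returns the next bit, or -1 once the buffer is exhausted. */
static int read_bit(BitReader *br)
{
    int bit;

    if (br->pos >= br->size_bits)
        return -1;
    bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Assumed layout: repeat { stop flag, info bit } until the stop flag
 * is 1.  The accumulator is unsigned, so shifts never reach a sign
 * bit; a code that is truncated or too long for 32 bits is rejected.
 * Returns 0 and stores the value in *out on success, -1 on error. */
static int get_interleaved_ue(BitReader *br, uint32_t *out)
{
    uint32_t ret = 1;

    for (;;) {
        int stop = read_bit(br);
        int info;

        if (stop < 0)
            return -1;               /* truncated input */
        if (stop)
            break;
        info = read_bit(br);
        if (info < 0)
            return -1;               /* truncated input */
        if (ret & 0x80000000u)
            return -1;               /* next shift would lose a bit */
        ret = (ret << 1) | (uint32_t)info;
    }
    *out = ret - 1;
    return 0;
}
```

Reporting errors through the return value and the decoded number through an unsigned out parameter avoids both the implementation-defined unsigned-to-signed conversion and the undefined signed overflow discussed in the thread.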
Re: [libav-devel] [PATCH 2/2] lavr: temporarily store custom matrix in AVAudioResampleContext
On Thu, 29 Nov 2012 23:36:05 -0500, Justin Ruggles wrote:
> This allows AudioMix to be treated the same way as other conversion contexts
> and removes the requirement to allocate it at the same time as the
> AVAudioResampleContext.
>
> The current matrix get/set functions are split between the public interface
> and AudioMix private functions.
> ---
> Updated patch also moves the AudioMix definition to audio_mix.c since
> none of the fields need to be accessed outside of that file.
>
>  libavresample/audio_mix.c        |  186 --
>  libavresample/audio_mix.h        |   47 --
>  libavresample/audio_mix_matrix.c |  112 ---
>  libavresample/internal.h         |    7 ++
>  libavresample/options.c          |    7 --
>  libavresample/utils.c            |   79 +++--
>  6 files changed, 257 insertions(+), 181 deletions(-)
>
> diff --git a/libavresample/audio_mix.c b/libavresample/audio_mix.c
> index dd2f33d..ad68b7a 100644
> --- a/libavresample/audio_mix.c
> +++ b/libavresample/audio_mix.c
> @@ -28,6 +28,29 @@
>  #include "audio_data.h"
>  #include "audio_mix.h"
>
> +struct AudioMix {
> +    AVAudioResampleContext *avr;
> +    enum AVSampleFormat fmt;
> +    enum AVMixCoeffType coeff_type;
> +    uint64_t in_layout;
> +    uint64_t out_layout;
> +    int in_channels;
> +    int out_channels;
> +
> +    int ptr_align;
> +    int samples_align;
> +    int has_optimized_func;
> +    const char *func_descr;
> +    const char *func_descr_generic;
> +    mix_func *mix;
> +    mix_func *mix_generic;
> +
> +    int16_t *matrix_q8[AVRESAMPLE_MAX_CHANNELS];
> +    int32_t *matrix_q15[AVRESAMPLE_MAX_CHANNELS];
> +    float *matrix_flt[AVRESAMPLE_MAX_CHANNELS];
> +    void **matrix;
> +};
> +
>  static const char *coeff_type_names[] = { "q8", "q15", "flt" };
>
>  void ff_audio_mix_set_func(AudioMix *am, enum AVSampleFormat fmt,
> @@ -302,27 +325,37 @@ static int mix_function_init(AudioMix *am)
>      return 0;
>  }
>
> -int ff_audio_mix_init(AVAudioResampleContext *avr)
> +AudioMix *ff_audio_mix_alloc(AVAudioResampleContext *avr)
>  {
> +    AudioMix *am;
>      int ret;
>
> +    am = av_mallocz(sizeof(*am));
> +    if (!am)
> +        return NULL;
> +    am->avr = avr;
> +
>      if (avr->internal_sample_fmt != AV_SAMPLE_FMT_S16P &&
>          avr->internal_sample_fmt != AV_SAMPLE_FMT_FLTP) {
>          av_log(avr, AV_LOG_ERROR, "Unsupported internal format for "
>                 "mixing: %s\n",
>                 av_get_sample_fmt_name(avr->internal_sample_fmt));
> -        return AVERROR(EINVAL);
> +        goto error;
>      }
>
> +    am->fmt = avr->internal_sample_fmt;
> +    am->coeff_type = avr->mix_coeff_type;
> +    am->in_layout = avr->in_channel_layout;
> +    am->out_layout = avr->out_channel_layout;
> +    am->in_channels = avr->in_channels;
> +    am->out_channels = avr->out_channels;
> +
>      /* build matrix if the user did not already set one */
> -    if (avr->am->matrix) {
> -        if (avr->am->coeff_type != avr->mix_coeff_type ||
> -            avr->am->in_layout != avr->in_channel_layout ||
> -            avr->am->out_layout != avr->out_channel_layout) {
> -            av_log(avr, AV_LOG_ERROR,
> -                   "Custom matrix does not match current parameters\n");
> -            return AVERROR(EINVAL);
> -        }
> +    if (avr->mix_matrix) {
> +        ret = ff_audio_mix_set_matrix(am, avr->mix_matrix, avr->in_channels);
> +        if (ret < 0)
> +            goto error;
> +        av_freep(&avr->mix_matrix);
>      } else {
>          int i, j;
>          char in_layout_name[128];
> @@ -330,7 +363,7 @@ int ff_audio_mix_init(AVAudioResampleContext *avr)
>          double *matrix_dbl = av_mallocz(avr->out_channels * avr->in_channels *
>                                          sizeof(*matrix_dbl));
>          if (!matrix_dbl)
> -            return AVERROR(ENOMEM);
> +            goto error;
>
>          ret = avresample_build_matrix(avr->in_channel_layout,
>                                        avr->out_channel_layout,
> @@ -343,7 +376,7 @@ int ff_audio_mix_init(AVAudioResampleContext *avr)
>                                        avr->matrix_encoding);
>          if (ret < 0) {
>              av_free(matrix_dbl);
> -            return ret;
> +            goto error;
>          }
>
>          av_get_channel_layout_string(in_layout_name, sizeof(in_layout_name),
> @@ -360,32 +393,33 @@ int ff_audio_mix_init(AVAudioResampleContext *avr)
>              av_log(avr, AV_LOG_DEBUG, "\n");
>          }
>
> -        ret = avresample_set_matrix(avr, matrix_dbl, avr->in_channels);
> +        ret = ff_audio_mix_set_matrix(am, matrix_dbl, avr->in_channels);
>          if (ret < 0) {
>              av_free(matrix_dbl);
> -            return ret;
> +            goto error;
>          }
>          av_free(matrix_dbl);
>      }
>
> -    avr->am->fmt = avr->internal_samp
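For readers following the quoted hunks without the rest of the file: the patch keeps a custom matrix on the resample context until the mixer is allocated, and routes every failure in the allocator through one cleanup label. The following is only a minimal sketch of that pattern, not the libavresample code. `Resampler`, `Mixer`, `pending_matrix`, `resampler_set_matrix()`, `mixer_alloc()` and `mixer_free()` are hypothetical names, and the sketch hands the stored pointer over to the mixer, whereas the quoted patch passes it to ff_audio_mix_set_matrix() and then frees it. What the real `error:` label does is not visible in the quoted text; freeing the partial context and returning NULL is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for AudioMix. */
typedef struct Mixer {
    int in_channels;
    int out_channels;
    double *matrix;         /* out_channels rows of in_channels weights */
} Mixer;

/* Hypothetical stand-in for AVAudioResampleContext. */
typedef struct Resampler {
    int in_channels;
    int out_channels;
    double *pending_matrix; /* custom matrix set before open, else NULL */
} Resampler;

/* Keep a packed private copy of the caller's matrix until open time. */
int resampler_set_matrix(Resampler *r, const double *matrix, int stride)
{
    double *copy;
    int o;

    assert(r && matrix && r->in_channels > 0 && r->out_channels > 0);
    assert(stride >= r->in_channels);
    copy = malloc(sizeof(*copy) * r->out_channels * r->in_channels);
    if (!copy)
        return -1;
    for (o = 0; o < r->out_channels; o++)
        memcpy(copy + o * r->in_channels, matrix + o * stride,
               sizeof(*copy) * r->in_channels);
    free(r->pending_matrix);   /* replace any earlier custom matrix */
    r->pending_matrix = copy;
    return 0;
}

/* Allocate the mixer; all failures after the first allocation share
 * one exit path (assumed behaviour of the real error label). */
Mixer *mixer_alloc(Resampler *r)
{
    Mixer *m = calloc(1, sizeof(*m));
    int i;

    if (!m)
        return NULL;
    if (r->in_channels <= 0 || r->out_channels <= 0)
        goto error;
    m->in_channels  = r->in_channels;
    m->out_channels = r->out_channels;

    if (r->pending_matrix) {
        /* the mixer takes ownership; the context no longer holds it */
        m->matrix = r->pending_matrix;
        r->pending_matrix = NULL;
    } else {
        /* placeholder default: pass matching channels straight through */
        m->matrix = calloc((size_t)m->out_channels * m->in_channels,
                           sizeof(*m->matrix));
        if (!m->matrix)
            goto error;
        for (i = 0; i < m->in_channels && i < m->out_channels; i++)
            m->matrix[i + m->in_channels * i] = 1.0;
    }
    return m;

error:
    free(m->matrix);           /* free(NULL) is a no-op */
    free(m);
    return NULL;
}

void mixer_free(Mixer *m)
{
    if (m) {
        free(m->matrix);
        free(m);
    }
}
```

A caller would set the matrix on the context at any time before opening, then call `mixer_alloc()` once the channel counts are known. The single label keeps the cleanup in one place as more failure points are added.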
Re: [libav-devel] [PATCH 1/2] lavr: clarify documentation for avresample_get/set_matrix()
On Thu, 29 Nov 2012 15:08:04 -0500, Justin Ruggles wrote:
> ---
>  libavresample/avresample.h |    6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/libavresample/avresample.h b/libavresample/avresample.h
> index affeeeb..a73d686 100644
> --- a/libavresample/avresample.h
> +++ b/libavresample/avresample.h
> @@ -216,6 +216,9 @@ int avresample_build_matrix(uint64_t in_layout, uint64_t out_layout,
>  /**
>   * Get the current channel mixing matrix.
>   *
> + * If no custom matrix has been previously set or the AVAudioResampleContext is
> + * not open, an error is returned.

Ok

> + *
>   * @param avr     audio resample context
>   * @param matrix  mixing coefficients; matrix[i + stride * o] is the weight of
>   *                input channel i in output channel o.
> @@ -231,7 +234,8 @@ int avresample_get_matrix(AVAudioResampleContext *avr, double *matrix,
>   * Allows for setting a custom mixing matrix, overriding the default matrix
>   * generated internally during avresample_open(). This function can be called
>   * anytime on an allocated context, either before or after calling
> - * avresample_open(). avresample_convert() always uses the current matrix.
> + * avresample_open(), as long as the channel layouts have been set.
> + * avresample_convert() always uses the current matrix.

Why bother mentioning this explicitly? If the channel layouts are not set,
avresample_open() will fail and avresample_convert() cannot be called at all.

--
Anton Khirnov
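As an illustration of the indexing convention in the quoted doxygen (matrix[i + stride * o] is the weight of input channel i in output channel o), here is a small standalone sketch that applies such a matrix to a single frame of samples. It does not call libavresample; `mix_frame()` and its parameters are hypothetical, and the real mixing code works on planar buffers in q8, q15 or float rather than on doubles.

```c
#include <assert.h>

/* Mix one frame: out[o] = sum over i of matrix[i + stride * o] * in[i].
 * stride is the distance between rows and may exceed in_channels. */
void mix_frame(const double *matrix, int stride,
               const double *in, int in_channels,
               double *out, int out_channels)
{
    int i, o;

    assert(stride >= in_channels);
    for (o = 0; o < out_channels; o++) {
        double acc = 0.0;
        for (i = 0; i < in_channels; i++)
            acc += matrix[i + stride * o] * in[i];
        out[o] = acc;
    }
}
```

For a stereo to mono downmix the matrix has one row, for example { 0.5, 0.5 } with a stride of 2. Elements past in_channels in each row are padding and are never read.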