Re: [libav-devel] [PATCH] swscale: implement scale19to15 in sse2.

2011-07-23 Thread Loren Merritt
On Sat, 23 Jul 2011, Jason Garrett-Glaser wrote:
> On Sat, Jul 23, 2011 at 8:11 PM, Ronald S. Bultje  wrote:
>> On Sat, Jul 23, 2011 at 7:32 PM, Jason Garrett-Glaser  wrote:
>>> On Sat, Jul 23, 2011 at 6:52 PM, Ronald S. Bultje  
>>> wrote:
>>>
 +cglobal scale19To15Fw_sse2, 3, 4, 0
 + � �xor � � � � � r3, r3
 +.loop:
 + � �movdqu � � � �m0, [r1+r3*4+ 0]
 + � �movdqu � � � �m1, [r1+r3*4+16]
 + � �psrad � � � � m0, 4
 + � �psrad � � � � m1, 4
 + � �packssdw � � �m0, m1
 + � �movdqu [r0+r3*2], m0
 + � �add � � � � � r3, 4
 + � �cmp � � � � �r3d, r2d
 + � �jl .loop
 + � �REP_RET
>>>
>>> This is going to be load-bottlenecked. �Why can't we guarantee the
>>> inputs or outputs are aligned?
>>
>> Good question, I've asked that before and got no answer. It seems to
>> have to do with special-case conditions where ( blah blah ) a crop can
>> be done inline and that must destroy the general system.
>>
>> I suppose I could make two versions, one movdqa and one movdqu,
>> depending on alignment of the src/dst, and test that per-iteration.
>> Would that work?
>
> Wouldn't that be 4 versions for all 4 permutations of the possibilities?

The load can always be aligned, if you don't mind fixing up the ends
outside the loop.

--Loren Merritt___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH 2/2] swscale: implement SSE2 version for 9/10-bit horizontal scaling.

2011-07-23 Thread Loren Merritt
On Sat, 23 Jul 2011, Ronald S. Bultje wrote:

> +%macro SCALE_16BITS_FUNC 1
> +INIT_XMM
> +cglobal hscale%1_sse2, 7, 7, 5
> +%ifdef ARCH_X86_64
> +movsxdr2, r2d
> +%endif
> +movdqam2, [max_19bit]
> +cmp  r6d, 4
> +je  .scale4
> +cmp  r6d, 8
> +je  .scale8
> +jmp .scaleX

cmp r6d, 8
je .scale8
jg .scaleX

> +.scale4:
> +lea   r1, [r1+r2*4]
> +lea   r4, [r4+r2*8]
> +lea   r5, [r5+r2*2]
> +neg   r2
> +.loop4:
> +movsx r0, word [r5+r2*2+0]  ; filterPos[0]
> +movsx r6, word [r5+r2*2+4]  ; filterPos[2]
> +movq  m0, [r3+r0*2] ; src[filterPos[0] + {0,1,2,3}]
> +movhpsm0, [r3+r6*2] ; src[filterPos[2] + {0,1,2,3}]
> +movsx r0, word [r5+r2*2+2]  ; filterPos[1]
> +movsx r6, word [r5+r2*2+6]  ; filterPos[3]
> +movq  m3, [r3+r0*2] ; src[filterPos[1] + {0,1,2,3}]
> +movhpsm3, [r3+r6*2] ; src[filterPos[3] + {0,1,2,3}]
> +movq  m1, [r4+r2*8+ 0]  ; filter[{0,1,..,6,7}]
> +movhpsm1, [r4+r2*8+16]  ; filter[{0,1,..,6,7}]
> +movq  m4, [r4+r2*8+ 8]  ; filter[{8,9,..,14,15}]
> +movhpsm4, [r4+r2*8+24]  ; filter[{8,9,..,14,15}]

Can you make these consecutive?

> +pmaddwd   m0, m1
> +pmaddwd   m3, m4
> +pshufdm1, m0, 00110001b
> +pshufdm4, m3, 1000b
> +paddd m0, m1
> +paddd m3, m4
> +pblendw   m0, m3, 11001100b ; filter[{ 0, 1, 2, 
> 3}]*src[filterPos[0]+{0,1,2,3}],
> +; filter[{ 4, 5, 6, 
> 7}]*src[filterPos[1]+{0,1,2,3}],
> +; filter[{ 8, 
> 9,10,11}]*src[filterPos[2]+{0,1,2,3}],
> +; 
> filter[{12,13,14,15}]*src[filterPos[3]+{0,1,2,3}]

pblendw is sse4

> +psrad m0, %1 - 5
> +pminsdm0, m2

You only clamp one end of the range?

> +movdqa [r1+r2*4], m0
> +add   r2, 4
> +jl .loop4
> +REP_RET
> +
> +.scale8:
> +shl   r2, 1 ; this allows *16 (i.e. now *8) in 
> lea instructions
> +lea   r1, [r1+r2*2]
> +lea   r4, [r4+r2*8]
> +lea   r5, [r5+r2*1]
> +neg   r2
> +.loop8:
> +movsx r0, word [r5+r2*1+0]  ; filterPos[0]
> +movsx r6, word [r5+r2*1+2]  ; filterPos[1]
> +movdqum0, [r3+r0*2] ; src[filterPos[0] + 
> {0,1,2,3,4,5,6,7}]
> +movdqum3, [r3+r6*2] ; src[filterPos[1] + 
> {0,1,2,3,4,5,6,7}]
> +movdqam1, [r4+r2*8   ]  ; filter[{0,1,..,6,7}]
> +movdqam4, [r4+r2*8+16]  ; filter[{8,9,..,14,15}]
> +pmaddwd   m0, m1
> +pmaddwd   m3, m4

memory arg

> +pshufdm1, m0, 1110b
> +pshufdm4, m3, 1110b
> +paddd m0, m1
> +paddd m3, m4
> +punpckldq m0, m3
> +pshufdm1, m0, 1110b
> +paddd m0, m1; filter[{0,1,..., 6, 
> 7}]*src[filterPos[0]+{0,1,...,6,7}],
> +; 
> filter[{8,9,...,14,15}]*src[filterPos[1]+{0,1,...,6,7}],
> +psrad m0, %1 - 5
> +pminsdm0, m2
> +movq   [r1+r2*2], m0
> +add   r2, 4 ; it really only does 2px, see "shl 
> r2,1" above
> +jl .loop8
> +REP_RET
> +
> +.scaleX:
> +%ifdef ARCH_X86_64
> +movsxdr6, r6d
> +%endif
> +lea  r11, [r3+r6*2] ; &src[filterSize]

r11 on x86_32?

> +mov  r6m, r11

Keep it in a register.

> +lea   r5, [r5+r2*2]
> +lea   r1, [r1+r2*4]
> +neg   r2
> +.loopX:
> +movsx r0, word [r5+r2*2+0]  ; filterPos[0]
> +movsxr10, word [r5+r2*2+2]  ; filterPos[1]
> +pxor  m3, m3
> +mov  r11, r3
> +.innerloopX:
> +movq  m0, [r11+r0 *2]   ; src[filterPos[0] + {0,1,2,3}]
> +movhpsm0, [r11+r10*2]   ; src[filterPos[1] + {0,1,2,3}]
> +movq  m1, [r4 ] ; filter[{0,1,2,3}]
> +movhpsm1, [r4+r6*2] ; filter[filtersize+{0,1,2,3}]
> +pmaddwd   m0, m1
> +paddd m3, m0
> +add   r4, 8
> +add  r11, 8
> +cmp  r11, r6m   ; while (src += 4) < &src[filterSize]
> +jl .innerloopX
> +lea   r4, [r4+r6*2]
> +pshufdm0, m3, 00110001b
> +paddd m0, m3
> +pshufdm0, m0, 1000b

Would be lower latency if you did the 2 shuffles in parallel.

> +psrad m0, %1 - 5
> +pminsdm0, m2
> +movq   [r1+r2*4], m0
> +add   r2, 2
> +jl .loopX
> +RET
> +%endmacro

--Loren Merritt
___
libav-devel mailing list
libav-devel@libav.org
https://list

Re: [libav-devel] [PATCH] swscale: implement scale19to15 in sse2.

2011-07-23 Thread Jason Garrett-Glaser
On Sat, Jul 23, 2011 at 8:11 PM, Ronald S. Bultje  wrote:
> Hi,
>
> On Sat, Jul 23, 2011 at 7:32 PM, Jason Garrett-Glaser  wrote:
>> On Sat, Jul 23, 2011 at 6:52 PM, Ronald S. Bultje  wrote:
>>> From 4.89k cycles to 1.23k cycles on a 1980-pixel-wide movie, i.e.
>>> ~4x speedup.
>>> ---
>>>  libswscale/x86/scale.asm     |   20 
>>>  libswscale/x86/swscale_mmx.c |    2 ++
>>>  2 files changed, 22 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/libswscale/x86/scale.asm b/libswscale/x86/scale.asm
>>> index e200801..c3a9a75 100644
>>> --- a/libswscale/x86/scale.asm
>>> +++ b/libswscale/x86/scale.asm
>>> @@ -153,3 +153,23 @@ cglobal hscale%1_sse2, 7, 7, 5
>>>
>>>  SCALE_16BITS_FUNC  9
>>>  SCALE_16BITS_FUNC 10
>>> +
>>> +;-
>>> +; cut off last 4 bits of a line of pixels, and convert from 32 to 16 bits
>>> +;
>>> +; void scale19To15Fw_sse2(int16_t *dst, int32_t *src, int len);
>>> +;-
>>> +
>>> +cglobal scale19To15Fw_sse2, 3, 4, 0
>>> +    xor           r3, r3
>>> +.loop:
>>> +    movdqu        m0, [r1+r3*4+ 0]
>>> +    movdqu        m1, [r1+r3*4+16]
>>> +    psrad         m0, 4
>>> +    psrad         m1, 4
>>> +    packssdw      m0, m1
>>> +    movdqu [r0+r3*2], m0
>>> +    add           r3, 4
>>> +    cmp          r3d, r2d
>>> +    jl .loop
>>> +    REP_RET
>>
>> This is going to be load-bottlenecked.  Why can't we guarantee the
>> inputs or outputs are aligned?
>
> Good question, I've asked that before and got no answer. It seems to
> have to do with special-case conditions where ( blah blah ) a crop can
> be done inline and that must destroy the general system.
>
> I suppose I could make two versions, one movdqa and one movdqu,
> depending on alignment of the src/dst, and test that per-iteration.
> Would that work?

Wouldn't that be 4 versions for all 4 permutations of the possibilities?

Jason
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] swscale: implement scale19to15 in sse2.

2011-07-23 Thread Ronald S. Bultje
Hi,

On Sat, Jul 23, 2011 at 7:32 PM, Jason Garrett-Glaser  wrote:
> On Sat, Jul 23, 2011 at 6:52 PM, Ronald S. Bultje  wrote:
>> From 4.89k cycles to 1.23k cycles on a 1980-pixel-wide movie, i.e.
>> ~4x speedup.
>> ---
>>  libswscale/x86/scale.asm     |   20 
>>  libswscale/x86/swscale_mmx.c |    2 ++
>>  2 files changed, 22 insertions(+), 0 deletions(-)
>>
>> diff --git a/libswscale/x86/scale.asm b/libswscale/x86/scale.asm
>> index e200801..c3a9a75 100644
>> --- a/libswscale/x86/scale.asm
>> +++ b/libswscale/x86/scale.asm
>> @@ -153,3 +153,23 @@ cglobal hscale%1_sse2, 7, 7, 5
>>
>>  SCALE_16BITS_FUNC  9
>>  SCALE_16BITS_FUNC 10
>> +
>> +;-
>> +; cut off last 4 bits of a line of pixels, and convert from 32 to 16 bits
>> +;
>> +; void scale19To15Fw_sse2(int16_t *dst, int32_t *src, int len);
>> +;-
>> +
>> +cglobal scale19To15Fw_sse2, 3, 4, 0
>> +    xor           r3, r3
>> +.loop:
>> +    movdqu        m0, [r1+r3*4+ 0]
>> +    movdqu        m1, [r1+r3*4+16]
>> +    psrad         m0, 4
>> +    psrad         m1, 4
>> +    packssdw      m0, m1
>> +    movdqu [r0+r3*2], m0
>> +    add           r3, 4
>> +    cmp          r3d, r2d
>> +    jl .loop
>> +    REP_RET
>
> This is going to be load-bottlenecked.  Why can't we guarantee the
> inputs or outputs are aligned?

Good question, I've asked that before and got no answer. It seems to
have to do with special-case conditions where ( blah blah ) a crop can
be done inline and that must destroy the general system.

I suppose I could make two versions, one movdqa and one movdqu,
depending on alignment of the src/dst, and test that per-iteration.
Would that work?

Ronald
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] swscale: implement scale19to15 in sse2.

2011-07-23 Thread Jason Garrett-Glaser
On Sat, Jul 23, 2011 at 6:52 PM, Ronald S. Bultje  wrote:
> From 4.89k cycles to 1.23k cycles on a 1980-pixel-wide movie, i.e.
> ~4x speedup.
> ---
>  libswscale/x86/scale.asm     |   20 
>  libswscale/x86/swscale_mmx.c |    2 ++
>  2 files changed, 22 insertions(+), 0 deletions(-)
>
> diff --git a/libswscale/x86/scale.asm b/libswscale/x86/scale.asm
> index e200801..c3a9a75 100644
> --- a/libswscale/x86/scale.asm
> +++ b/libswscale/x86/scale.asm
> @@ -153,3 +153,23 @@ cglobal hscale%1_sse2, 7, 7, 5
>
>  SCALE_16BITS_FUNC  9
>  SCALE_16BITS_FUNC 10
> +
> +;-
> +; cut off last 4 bits of a line of pixels, and convert from 32 to 16 bits
> +;
> +; void scale19To15Fw_sse2(int16_t *dst, int32_t *src, int len);
> +;-
> +
> +cglobal scale19To15Fw_sse2, 3, 4, 0
> +    xor           r3, r3
> +.loop:
> +    movdqu        m0, [r1+r3*4+ 0]
> +    movdqu        m1, [r1+r3*4+16]
> +    psrad         m0, 4
> +    psrad         m1, 4
> +    packssdw      m0, m1
> +    movdqu [r0+r3*2], m0
> +    add           r3, 4
> +    cmp          r3d, r2d
> +    jl .loop
> +    REP_RET

This is going to be load-bottlenecked.  Why can't we guarantee the
inputs or outputs are aligned?

Jason
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


[libav-devel] [PATCH] swscale: implement scale19to15 in sse2.

2011-07-23 Thread Ronald S. Bultje
>From 4.89k cycles to 1.23k cycles on a 1980-pixel-wide movie, i.e.
~4x speedup.
---
 libswscale/x86/scale.asm |   20 
 libswscale/x86/swscale_mmx.c |2 ++
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/libswscale/x86/scale.asm b/libswscale/x86/scale.asm
index e200801..c3a9a75 100644
--- a/libswscale/x86/scale.asm
+++ b/libswscale/x86/scale.asm
@@ -153,3 +153,23 @@ cglobal hscale%1_sse2, 7, 7, 5
 
 SCALE_16BITS_FUNC  9
 SCALE_16BITS_FUNC 10
+
+;-
+; cut off last 4 bits of a line of pixels, and convert from 32 to 16 bits
+;
+; void scale19To15Fw_sse2(int16_t *dst, int32_t *src, int len);
+;-
+
+cglobal scale19To15Fw_sse2, 3, 4, 0
+xor   r3, r3
+.loop:
+movdqum0, [r1+r3*4+ 0]
+movdqum1, [r1+r3*4+16]
+psrad m0, 4
+psrad m1, 4
+packssdw  m0, m1
+movdqu [r0+r3*2], m0
+add   r3, 4
+cmp  r3d, r2d
+jl .loop
+REP_RET
diff --git a/libswscale/x86/swscale_mmx.c b/libswscale/x86/swscale_mmx.c
index d9995cd..f3abeab 100644
--- a/libswscale/x86/swscale_mmx.c
+++ b/libswscale/x86/swscale_mmx.c
@@ -184,6 +184,7 @@ extern void ff_hscale9_sse2 (SwsContext *c, int16_t *_dst,
  int dstW, const uint8_t *_src,
  const int16_t *filter,
  const int16_t *filterPos, int filterSize);
+extern void ff_scale19To15Fw_sse2(int16_t *dst, const int32_t *src, int len);
 
 void ff_sws_init_swScale_mmx(SwsContext *c)
 {
@@ -206,6 +207,7 @@ void ff_sws_init_swScale_mmx(SwsContext *c)
 c->hScale = ff_hscale10_sse2;
 }
 }
+c->scale19To15Fw = ff_scale19To15Fw_sse2;
 }
 #endif
 }
-- 
1.7.2.1

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH 1/2] Move x86inc.asm to libavutil/.

2011-07-23 Thread Daniel Kang
On Sat, Jul 23, 2011 at 9:36 PM, Ronald S. Bultje wrote:

> This allows using it in libswscale/ also.
> ---
>  libavcodec/x86/ac3dsp.asm   |2 +-
>  libavcodec/x86/dct32_sse.asm|2 +-
>  libavcodec/x86/deinterlace.asm  |2 +-
>  libavcodec/x86/dsputil_yasm.asm |2 +-
>  libavcodec/x86/dsputilenc_yasm.asm  |2 +-
>  libavcodec/x86/fft_mmx.asm  |2 +-
>  libavcodec/x86/fmtconvert.asm   |2 +-
>  libavcodec/x86/h264_chromamc.asm|2 +-
>  libavcodec/x86/h264_chromamc_10bit.asm  |2 +-
>  libavcodec/x86/h264_deblock.asm |2 +-
>  libavcodec/x86/h264_deblock_10bit.asm   |2 +-
>  libavcodec/x86/h264_idct.asm|2 +-
>  libavcodec/x86/h264_idct_10bit.asm  |2 +-
>  libavcodec/x86/h264_intrapred.asm   |2 +-
>  libavcodec/x86/h264_intrapred_10bit.asm |2 +-
>  libavcodec/x86/h264_qpel_10bit.asm  |2 +-
>  libavcodec/x86/h264_weight.asm  |2 +-
>  libavcodec/x86/h264_weight_10bit.asm|2 +-
>  libavcodec/x86/vc1dsp_yasm.asm  |2 +-
>  libavcodec/x86/vp3dsp.asm   |2 +-
>  libavcodec/x86/vp56dsp.asm  |2 +-
>  libavcodec/x86/vp8dsp.asm   |2 +-
>  libavcodec/x86/x86inc.asm   |  905
> ---
>  libavutil/x86/x86inc.asm|  905
> +++
>  24 files changed, 927 insertions(+), 927 deletions(-)
>  delete mode 100644 libavcodec/x86/x86inc.asm
>  create mode 100644 libavutil/x86/x86inc.asm
>

Ok if you've tested to make sure nothing breaks.
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


[libav-devel] [PATCH 2/2] swscale: implement SSE2 version for 9/10-bit horizontal scaling.

2011-07-23 Thread Ronald S. Bultje
Speed: ~340k cycles -> ~59k cycles for a 1920-pixel wide movie, i.e.
around 5.8x faster.
---
 libswscale/Makefile  |1 +
 libswscale/x86/scale.asm |  155 ++
 libswscale/x86/swscale_mmx.c |   22 ++
 3 files changed, 178 insertions(+), 0 deletions(-)
 create mode 100644 libswscale/x86/scale.asm

diff --git a/libswscale/Makefile b/libswscale/Makefile
index 57e867a..88eb6e2 100644
--- a/libswscale/Makefile
+++ b/libswscale/Makefile
@@ -16,6 +16,7 @@ OBJS-$(HAVE_ALTIVEC)   +=  ppc/swscale_altivec.o\
 OBJS-$(HAVE_MMX)   +=  x86/rgb2rgb.o\
x86/swscale_mmx.o\
x86/yuv2rgb_mmx.o
+OBJS-$(HAVE_YASM)  +=  x86/scale.o
 OBJS-$(HAVE_VIS)   +=  sparc/yuv2rgb_vis.o
 
 TESTPROGS = colorspace swscale
diff --git a/libswscale/x86/scale.asm b/libswscale/x86/scale.asm
new file mode 100644
index 000..e200801
--- /dev/null
+++ b/libswscale/x86/scale.asm
@@ -0,0 +1,155 @@
+;**
+;* x86-optimized horizontal line scaling functions
+;* Copyright (c) 2011 Ronald S. Bultje 
+;*
+;* This file is part of Libav.
+;*
+;* Libav is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* Libav is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with Libav; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;**
+
+%include "libavutil/x86/x86inc.asm"
+
+SECTION_RODATA
+
+max_19bit: times 4 dd (1 << 19) - 1
+
+SECTION .text
+
+;-
+; 16-bit horizontal line scaling
+;
+; static void hscale_c(SwsContext *c, int32_t *_dst,
+; int dstW, const uint16_t *_src,
+; const int16_t *filter,
+; const int16_t *filterPos, int filterSize)
+;-
+
+%macro SCALE_16BITS_FUNC 1
+INIT_XMM
+cglobal hscale%1_sse2, 7, 7, 5
+%ifdef ARCH_X86_64
+movsxdr2, r2d
+%endif
+movdqam2, [max_19bit]
+cmp  r6d, 4
+je  .scale4
+cmp  r6d, 8
+je  .scale8
+jmp .scaleX
+
+.scale4:
+lea   r1, [r1+r2*4]
+lea   r4, [r4+r2*8]
+lea   r5, [r5+r2*2]
+neg   r2
+.loop4:
+movsx r0, word [r5+r2*2+0]  ; filterPos[0]
+movsx r6, word [r5+r2*2+4]  ; filterPos[2]
+movq  m0, [r3+r0*2] ; src[filterPos[0] + {0,1,2,3}]
+movhpsm0, [r3+r6*2] ; src[filterPos[2] + {0,1,2,3}]
+movsx r0, word [r5+r2*2+2]  ; filterPos[1]
+movsx r6, word [r5+r2*2+6]  ; filterPos[3]
+movq  m3, [r3+r0*2] ; src[filterPos[1] + {0,1,2,3}]
+movhpsm3, [r3+r6*2] ; src[filterPos[3] + {0,1,2,3}]
+movq  m1, [r4+r2*8+ 0]  ; filter[{0,1,..,6,7}]
+movhpsm1, [r4+r2*8+16]  ; filter[{0,1,..,6,7}]
+movq  m4, [r4+r2*8+ 8]  ; filter[{8,9,..,14,15}]
+movhpsm4, [r4+r2*8+24]  ; filter[{8,9,..,14,15}]
+pmaddwd   m0, m1
+pmaddwd   m3, m4
+pshufdm1, m0, 00110001b
+pshufdm4, m3, 1000b
+paddd m0, m1
+paddd m3, m4
+pblendw   m0, m3, 11001100b ; filter[{ 0, 1, 2, 
3}]*src[filterPos[0]+{0,1,2,3}],
+; filter[{ 4, 5, 6, 
7}]*src[filterPos[1]+{0,1,2,3}],
+; filter[{ 8, 
9,10,11}]*src[filterPos[2]+{0,1,2,3}],
+; 
filter[{12,13,14,15}]*src[filterPos[3]+{0,1,2,3}]
+psrad m0, %1 - 5
+pminsdm0, m2
+movdqa [r1+r2*4], m0
+add   r2, 4
+jl .loop4
+REP_RET
+
+.scale8:
+shl   r2, 1 ; this allows *16 (i.e. now *8) in lea 
instructions
+lea   r1, [r1+r2*2]
+lea   r4, [r4+r2*8]
+lea   r5, [r5+r2*1]
+neg   r2
+.loop8:
+movsx r0, word [r5+r2*1+0]  ; filterPos[0]
+movsx r6, word [r5+r2*1+2]  ; filterPos[1]
+movdqum0, [r3+r0*2] ; src[filterPos[0] + {0,1,2,3,4,5,6,7}]
+movdqum3, [r3+r6*2] ; src[filterPos[1] + {0,1,2,3,4,5,6,7

Re: [libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Kostya
On Sat, Jul 23, 2011 at 07:11:09PM +0100, Måns Rullgård wrote:
> Kostya  writes:
> 
> >> >> > diff --git a/libavcodec/binkdsp.c b/libavcodec/binkdsp.c
> >> >> > new file mode 100644
> >> >> > index 000..456d71d
> >> >> > --- /dev/null
> >> >> > +++ b/libavcodec/binkdsp.c
> >> >> 
> >> >> [...]
> >> >> 
> >> >> > +static void bink_clear_block_c(int32_t *block)
> >> >> > +{
> >> >> > +memset(block, 0, sizeof(*block) * 64);
> >> >> > +}
> >> >> > +
> >> >> > +static void bink_put_block_c(const int32_t *block, uint8_t *restrict 
> >> >> > pixels)
> >> >> > +{
> >> >> > +int i;
> >> >> > +
> >> >> > +for (i = 0; i < 8; i++) {
> >> >> > +pixels[0] = block[0];
> >> >> > +pixels[1] = block[1];
> >> >> > +pixels[2] = block[2];
> >> >> > +pixels[3] = block[3];
> >> >> > +pixels[4] = block[4];
> >> >> > +pixels[5] = block[5];
> >> >> > +pixels[6] = block[6];
> >> >> > +pixels[7] = block[7];
> >> >> > +
> >> >> > +pixels += 8;
> >> >> > +block  += 8;
> >> >> > +}
> >> >> > +}
> >> >> 
> >> >> These almost exist in dsputil...
> >> >
> >> > But not for such drastic situation.
> >> 
> >> The 32-bit clear_block() actually exists exactly.  It would be a shame
> >> not to use it.
> >
> > It's a pity you have to select high bits_per_sample to use it and then it
> > screws other functions like put/get_pixels*
> 
> Enabling 32-bit "DCTELEM" with 8-bit pixels is a 2-line patch to remove
> an #if.  I disabled that variant since nothing needed it at the time.
> Keeping the 16-bit version for add/put_pixels is harder.  Ideas for a
> clean solution welcome.  One possibility is to use two DSPContexts, or
> initialise a scratch context and pull the needed pointers out of it.

But all of this are hacks :(
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Måns Rullgård
Kostya  writes:

>> >> > diff --git a/libavcodec/binkdsp.c b/libavcodec/binkdsp.c
>> >> > new file mode 100644
>> >> > index 000..456d71d
>> >> > --- /dev/null
>> >> > +++ b/libavcodec/binkdsp.c
>> >> 
>> >> [...]
>> >> 
>> >> > +static void bink_clear_block_c(int32_t *block)
>> >> > +{
>> >> > +memset(block, 0, sizeof(*block) * 64);
>> >> > +}
>> >> > +
>> >> > +static void bink_put_block_c(const int32_t *block, uint8_t *restrict 
>> >> > pixels)
>> >> > +{
>> >> > +int i;
>> >> > +
>> >> > +for (i = 0; i < 8; i++) {
>> >> > +pixels[0] = block[0];
>> >> > +pixels[1] = block[1];
>> >> > +pixels[2] = block[2];
>> >> > +pixels[3] = block[3];
>> >> > +pixels[4] = block[4];
>> >> > +pixels[5] = block[5];
>> >> > +pixels[6] = block[6];
>> >> > +pixels[7] = block[7];
>> >> > +
>> >> > +pixels += 8;
>> >> > +block  += 8;
>> >> > +}
>> >> > +}
>> >> 
>> >> These almost exist in dsputil...
>> >
>> > But not for such drastic situation.
>> 
>> The 32-bit clear_block() actually exists exactly.  It would be a shame
>> not to use it.
>
> It's a pity you have to select high bits_per_sample to use it and then it
> screws other functions like put/get_pixels*

Enabling 32-bit "DCTELEM" with 8-bit pixels is a 2-line patch to remove
an #if.  I disabled that variant since nothing needed it at the time.
Keeping the 16-bit version for add/put_pixels is harder.  Ideas for a
clean solution welcome.  One possibility is to use two DSPContexts, or
initialise a scratch context and pull the needed pointers out of it.

-- 
Måns Rullgård
m...@mansr.com
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Kostya
On Sat, Jul 23, 2011 at 06:32:09PM +0100, Måns Rullgård wrote:
> Kostya  writes:
> 
> > On Sat, Jul 23, 2011 at 03:42:42PM +0100, Måns Rullgård wrote:
> >> Kostya  writes:
> >> 
> >> > As you may know, Bink IDCT is a very special one, it's defined to be
> >> > integer-only (hence lossless and producing the same result on different
> >> > platforms). Unfortunately, it needs more than 16-bit input on blocks with
> >> > sharp edges (and still producing 8-bit output). Since bloating 
> >> > DSPContext with
> >> > another set of functions is bad, here's an attempt to introduce a 
> >> > different
> >> > context for Bink-specific functions. I don't like my patch much (it still
> >> > relies on some of DSPContext stuff), please suggest the better way to 
> >> > fix it.
> > [...]
> >> > @@ -669,10 +671,10 @@ static int read_dct_coeffs(GetBitContext *gb, 
> >> > DCTELEM block[64], const uint8_t *
> >> >  
> >> >  quant = quant_matrices[quant_idx];
> >> >  
> >> > -block[0] = (block[0] * quant[0]) >> 11;
> >> > +block[0] = ((int32_t)(block[0] * quant[0])) >> 11;
> >> >  for (i = 0; i < coef_count; i++) {
> >> >  int idx = coef_idx[i];
> >> > -block[scan[idx]] = (block[scan[idx]] * quant[idx]) >> 11;
> >> > +block[scan[idx]] = ((int32_t)(block[scan[idx]] * quant[idx])) 
> >> > >> 11;
> >> >  }
> >> 
> >> What are those casts meant to do?  They have no effect.
> >
> > They do: quant is uint32_t, so at least GCC chooses to multiply result into
> > uint32_t, previously it was masked by the fact it stored only low 16 bits of
> > result anyway.
> 
> The low 32 bits of a 32x32 multiplication do not depend on the sign of
> the inputs.  The signedness does affect the shift, of course.  Could the
> quant tables be made signed instead?  That feels more natural since you
> do want a signed result.

Indeed, changed.
 
> >> > diff --git a/libavcodec/binkdsp.c b/libavcodec/binkdsp.c
> >> > new file mode 100644
> >> > index 000..456d71d
> >> > --- /dev/null
> >> > +++ b/libavcodec/binkdsp.c
> >> 
> >> [...]
> >> 
> >> > +static void bink_clear_block_c(int32_t *block)
> >> > +{
> >> > +memset(block, 0, sizeof(*block) * 64);
> >> > +}
> >> > +
> >> > +static void bink_put_block_c(const int32_t *block, uint8_t *restrict 
> >> > pixels)
> >> > +{
> >> > +int i;
> >> > +
> >> > +for (i = 0; i < 8; i++) {
> >> > +pixels[0] = block[0];
> >> > +pixels[1] = block[1];
> >> > +pixels[2] = block[2];
> >> > +pixels[3] = block[3];
> >> > +pixels[4] = block[4];
> >> > +pixels[5] = block[5];
> >> > +pixels[6] = block[6];
> >> > +pixels[7] = block[7];
> >> > +
> >> > +pixels += 8;
> >> > +block  += 8;
> >> > +}
> >> > +}
> >> 
> >> These almost exist in dsputil...
> >
> > But not for such drastic situation.
> 
> The 32-bit clear_block() actually exists exactly.  It would be a shame
> not to use it.

It's a pity you have to select high bits_per_sample to use it and then it
screws other functions like put/get_pixels*
 
> >> > diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
> >> > index 09e58f4..be846d3 100644
> >> > --- a/libavcodec/dsputil.c
> >> > +++ b/libavcodec/dsputil.c
> >> > @@ -2895,9 +2895,6 @@ av_cold void dsputil_init(DSPContext* c, 
> >> > AVCodecContext *avctx)
> >> >  c->idct_put= ff_ea_idct_put_c;
> >> >  c->idct_permutation_type= FF_NO_IDCT_PERM;
> >> >  }else if(CONFIG_BINK_DECODER && avctx->idct_algo==FF_IDCT_BINK) 
> >> > {
> >> > -c->idct = ff_bink_idct_c;
> >> > -c->idct_add = ff_bink_idct_add_c;
> >> > -c->idct_put = ff_bink_idct_put_c;
> >> >  c->idct_permutation_type = FF_NO_IDCT_PERM;
> >> >  }else{ //accurate/default
> >> 
> >> That part can be removed entirely if you change the ff_init_scantable()
> >> call in bink.c:decode_init() to not use dsp.idct_permuation but pass
> >> FF_NO_IDCT_PERM directly (since that is the only one used).
> >
> > It's actually the whole table you need to pass there, but changed anyway.
> 
> You could also drop that scantable permutation entirely since there is
> only one bink idct implementation, and it doesn't need it.  In the
> unlikely event someone writes a simd version, we can put it back then.

Done.
>From 013886acce4b00396fa69671ca099e88dcb523ce Mon Sep 17 00:00:00 2001
From: Kostya Shishkov 
Date: Sat, 23 Jul 2011 15:46:35 +0200
Subject: [PATCH] Make Bink DCT take 32-bit input

Since IDC transforming 32-bit input to 8-bit output is unusual and unpractical
for most codecs, move Bink IDCT into separate context. Get rid of an additional
permutation table while at it since SIMD support for DCT is unlikely to be
implemented in foreseeable future.
Quantisation tables also have to change type into signed one for proper
quantisation of DCT coefficients.
---
 libavcodec/Makefile   |2 +-
 libavcodec/bink.c |   59 ++-
 libavcodec/bin

Re: [libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Måns Rullgård
Kostya  writes:

> On Sat, Jul 23, 2011 at 03:42:42PM +0100, Måns Rullgård wrote:
>> Kostya  writes:
>> 
>> > As you may know, Bink IDCT is a very special one, it's defined to be
>> > integer-only (hence lossless and producing the same result on different
>> > platforms). Unfortunately, it needs more than 16-bit input on blocks with
>> > sharp edges (and still producing 8-bit output). Since bloating DSPContext 
>> > with
>> > another set of functions is bad, here's an attempt to introduce a different
>> > context for Bink-specific functions. I don't like my patch much (it still
>> > relies on some of DSPContext stuff), please suggest the better way to fix 
>> > it.
> [...]
>> > @@ -669,10 +671,10 @@ static int read_dct_coeffs(GetBitContext *gb, 
>> > DCTELEM block[64], const uint8_t *
>> >  
>> >  quant = quant_matrices[quant_idx];
>> >  
>> > -block[0] = (block[0] * quant[0]) >> 11;
>> > +block[0] = ((int32_t)(block[0] * quant[0])) >> 11;
>> >  for (i = 0; i < coef_count; i++) {
>> >  int idx = coef_idx[i];
>> > -block[scan[idx]] = (block[scan[idx]] * quant[idx]) >> 11;
>> > +block[scan[idx]] = ((int32_t)(block[scan[idx]] * quant[idx])) >> 
>> > 11;
>> >  }
>> 
>> What are those casts meant to do?  They have no effect.
>
> They do: quant is uint32_t, so at least GCC chooses to multiply result into
> uint32_t, previously it was masked by the fact it stored only low 16 bits of
> result anyway.

The low 32 bits of a 32x32 multiplication do not depend on the sign of
the inputs.  The signedness does affect the shift, of course.  Could the
quant tables be made signed instead?  That feels more natural since you
do want a signed result.

>> > diff --git a/libavcodec/binkdsp.c b/libavcodec/binkdsp.c
>> > new file mode 100644
>> > index 000..456d71d
>> > --- /dev/null
>> > +++ b/libavcodec/binkdsp.c
>> 
>> [...]
>> 
>> > +static void bink_clear_block_c(int32_t *block)
>> > +{
>> > +memset(block, 0, sizeof(*block) * 64);
>> > +}
>> > +
>> > +static void bink_put_block_c(const int32_t *block, uint8_t *restrict 
>> > pixels)
>> > +{
>> > +int i;
>> > +
>> > +for (i = 0; i < 8; i++) {
>> > +pixels[0] = block[0];
>> > +pixels[1] = block[1];
>> > +pixels[2] = block[2];
>> > +pixels[3] = block[3];
>> > +pixels[4] = block[4];
>> > +pixels[5] = block[5];
>> > +pixels[6] = block[6];
>> > +pixels[7] = block[7];
>> > +
>> > +pixels += 8;
>> > +block  += 8;
>> > +}
>> > +}
>> 
>> These almost exist in dsputil...
>
> But not for such drastic situation.

The 32-bit clear_block() actually exists exactly.  It would be a shame
not to use it.

>> > diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
>> > index 09e58f4..be846d3 100644
>> > --- a/libavcodec/dsputil.c
>> > +++ b/libavcodec/dsputil.c
>> > @@ -2895,9 +2895,6 @@ av_cold void dsputil_init(DSPContext* c, 
>> > AVCodecContext *avctx)
>> >  c->idct_put= ff_ea_idct_put_c;
>> >  c->idct_permutation_type= FF_NO_IDCT_PERM;
>> >  }else if(CONFIG_BINK_DECODER && avctx->idct_algo==FF_IDCT_BINK) {
>> > -c->idct = ff_bink_idct_c;
>> > -c->idct_add = ff_bink_idct_add_c;
>> > -c->idct_put = ff_bink_idct_put_c;
>> >  c->idct_permutation_type = FF_NO_IDCT_PERM;
>> >  }else{ //accurate/default
>> 
>> That part can be removed entirely if you change the ff_init_scantable()
>> call in bink.c:decode_init() to not use dsp.idct_permuation but pass
>> FF_NO_IDCT_PERM directly (since that is the only one used).
>
> It's actually the whole table you need to pass there, but changed anyway.

You could also drop that scantable permutation entirely since there is
only one bink idct implementation, and it doesn't need it.  In the
unlikely event someone writes a simd version, we can put it back then.

-- 
Måns Rullgård
m...@mansr.com
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Kostya
On Sat, Jul 23, 2011 at 03:42:42PM +0100, Måns Rullgård wrote:
> Kostya  writes:
> 
> > As you may know, Bink IDCT is a very special one, it's defined to be
> > integer-only (hence lossless and producing the same result on different
> > platforms). Unfortunately, it needs more than 16-bit input on blocks with
> > sharp edges (and still producing 8-bit output). Since bloating DSPContext 
> > with
> > another set of functions is bad, here's an attempt to introduce a different
> > context for Bink-specific functions. I don't like my patch much (it still
> > relies on some of DSPContext stuff), please suggest the better way to fix 
> > it.
[...]
> > @@ -669,10 +671,10 @@ static int read_dct_coeffs(GetBitContext *gb, DCTELEM 
> > block[64], const uint8_t *
> >  
> >  quant = quant_matrices[quant_idx];
> >  
> > -block[0] = (block[0] * quant[0]) >> 11;
> > +block[0] = ((int32_t)(block[0] * quant[0])) >> 11;
> >  for (i = 0; i < coef_count; i++) {
> >  int idx = coef_idx[i];
> > -block[scan[idx]] = (block[scan[idx]] * quant[idx]) >> 11;
> > +block[scan[idx]] = ((int32_t)(block[scan[idx]] * quant[idx])) >> 
> > 11;
> >  }
> 
> What are those casts meant to do?  They have no effect.

They do: quant is uint32_t, so at least GCC chooses to multiply result into
uint32_t, previously it was masked by the fact it stored only low 16 bits of
result anyway.
 
> > diff --git a/libavcodec/binkdsp.c b/libavcodec/binkdsp.c
> > new file mode 100644
> > index 000..456d71d
> > --- /dev/null
> > +++ b/libavcodec/binkdsp.c
> 
> [...]
> 
> > +static void bink_clear_block_c(int32_t *block)
> > +{
> > +memset(block, 0, sizeof(*block) * 64);
> > +}
> > +
> > +static void bink_put_block_c(const int32_t *block, uint8_t *restrict 
> > pixels)
> > +{
> > +int i;
> > +
> > +for (i = 0; i < 8; i++) {
> > +pixels[0] = block[0];
> > +pixels[1] = block[1];
> > +pixels[2] = block[2];
> > +pixels[3] = block[3];
> > +pixels[4] = block[4];
> > +pixels[5] = block[5];
> > +pixels[6] = block[6];
> > +pixels[7] = block[7];
> > +
> > +pixels += 8;
> > +block  += 8;
> > +}
> > +}
> 
> These almost exist in dsputil...

But not for such drastic situation.

> > diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
> > index 09e58f4..be846d3 100644
> > --- a/libavcodec/dsputil.c
> > +++ b/libavcodec/dsputil.c
> > @@ -2895,9 +2895,6 @@ av_cold void dsputil_init(DSPContext* c, 
> > AVCodecContext *avctx)
> >  c->idct_put= ff_ea_idct_put_c;
> >  c->idct_permutation_type= FF_NO_IDCT_PERM;
> >  }else if(CONFIG_BINK_DECODER && avctx->idct_algo==FF_IDCT_BINK) {
> > -c->idct = ff_bink_idct_c;
> > -c->idct_add = ff_bink_idct_add_c;
> > -c->idct_put = ff_bink_idct_put_c;
> >  c->idct_permutation_type = FF_NO_IDCT_PERM;
> >  }else{ //accurate/default
> 
> That part can be removed entirely if you change the ff_init_scantable()
> call in bink.c:decode_init() to not use dsp.idct_permuation but pass
> FF_NO_IDCT_PERM directly (since that is the only one used).

It's actually the whole table you need to pass there, but changed anyway.
>From 5b103ed5f6e60aae2b743b708a8c57dbcd1a8d45 Mon Sep 17 00:00:00 2001
From: Kostya Shishkov 
Date: Sat, 23 Jul 2011 15:46:35 +0200
Subject: [PATCH] Make Bink DCT take 32-bit input

---
 libavcodec/Makefile   |2 +-
 libavcodec/bink.c |   55 ++
 libavcodec/binkdsp.c  |  151 +
 libavcodec/binkdsp.h  |   43 ++
 libavcodec/binkidct.c |  112 
 libavcodec/dsputil.c  |5 --
 6 files changed, 225 insertions(+), 143 deletions(-)
 create mode 100644 libavcodec/binkdsp.c
 create mode 100644 libavcodec/binkdsp.h
 delete mode 100644 libavcodec/binkidct.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 99ecbbf..36e07a9 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -89,7 +89,7 @@ OBJS-$(CONFIG_AURA2_DECODER)   += aura.o
 OBJS-$(CONFIG_AVS_DECODER) += avs.o
 OBJS-$(CONFIG_BETHSOFTVID_DECODER) += bethsoftvideo.o
 OBJS-$(CONFIG_BFI_DECODER) += bfi.o
-OBJS-$(CONFIG_BINK_DECODER)+= bink.o binkidct.o
+OBJS-$(CONFIG_BINK_DECODER)+= bink.o binkdsp.o
 OBJS-$(CONFIG_BINKAUDIO_DCT_DECODER)   += binkaudio.o wma.o
 OBJS-$(CONFIG_BINKAUDIO_RDFT_DECODER)  += binkaudio.o wma.o
 OBJS-$(CONFIG_BMP_DECODER) += bmp.o msrledec.o
diff --git a/libavcodec/bink.c b/libavcodec/bink.c
index e085aa5..73fbe37 100644
--- a/libavcodec/bink.c
+++ b/libavcodec/bink.c
@@ -24,6 +24,7 @@
 #include "avcodec.h"
 #include "dsputil.h"
 #include "binkdata.h"
+#include "binkdsp.h"
 #include "mathops.h"
 
 #define ALT_BITSTREAM_READER_LE
@@ -109,6 +110,7 @@ typedef struct Bundle {
 typedef struct BinkContext {
 AVCodecContext *avctx;
 DS

Re: [libav-devel] [PATCH] jpegdec: actually search for and parse RSTn

2011-07-23 Thread Kostya
On Sat, Jul 23, 2011 at 05:14:29PM +0200, Reinhard Tartler wrote:
> On Sat, Jul 23, 2011 at 14:44:36 (CEST), Kostya wrote:
> 
> > On Fri, Jul 22, 2011 at 07:53:44PM +0200, Reinhard Tartler wrote:
> >> From: Michael Niedermayer 
> >> 
> >> Fixes http://ffmpeg.org/trac/ffmpeg/ticket/267
> >
> > Please replace it with a description of a problem (from that ticket, for
> > example). Our project is not high-quality and thus still demands sensible
> > commit messages.
> 
> Okay, what about this message then:
> 
> jpegdec: actually search for and parse RSTn
> 
> Fixes decoding of MJPEG files produced by some UVC Logitec web cameras,
> such as "Notebook Pro" and "HD C910".
> 
> References:
> http://trac.videolan.org/vlc/ticket/4215
> http://ffmpeg.org/trac/ffmpeg/ticket/267
> 
> Signed-off-by: Michael Niedermayer 
> (cherry picked from commit 7b8ed831eb8432d202dad16dedc1758b018bb1fa)

Yes, looks much better.
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] jpegdec: actually search for and parse RSTn

2011-07-23 Thread Reinhard Tartler
On Sat, Jul 23, 2011 at 14:44:36 (CEST), Kostya wrote:

> On Fri, Jul 22, 2011 at 07:53:44PM +0200, Reinhard Tartler wrote:
>> From: Michael Niedermayer 
>> 
>> Fixes http://ffmpeg.org/trac/ffmpeg/ticket/267
>
> Please replace it with a description of a problem (from that ticket, for
> example). Our project is not high-quality and thus still demands sensible
> commit messages.

Okay, what about this message then:

jpegdec: actually search for and parse RSTn

Fixes decoding of MJPEG files produced by some UVC Logitec web cameras,
such as "Notebook Pro" and "HD C910".

References:
http://trac.videolan.org/vlc/ticket/4215
http://ffmpeg.org/trac/ffmpeg/ticket/267

Signed-off-by: Michael Niedermayer 
(cherry picked from commit 7b8ed831eb8432d202dad16dedc1758b018bb1fa)


>> ---
>>  libavcodec/mjpegdec.c |7 +--
>>  1 files changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
>> index b28fdf8..a1588df 100644
>> --- a/libavcodec/mjpegdec.c
>> +++ b/libavcodec/mjpegdec.c
>> @@ -879,9 +879,12 @@ static int mjpeg_decode_scan(MJpegDecodeContext *s, int 
>> nb_components, int Ah, i
>>  }
>>  }
>>  
>> -if (s->restart_interval && !--s->restart_count) {
>> +if (s->restart_interval && show_bits(&s->gb, 8) == 0xFF){/* 
>> skip RSTn */
> a space wouldn't hurt here ^

Will add it.

>> +--s->restart_count;
>>  align_get_bits(&s->gb);
>> -skip_bits(&s->gb, 16); /* skip RSTn */
>> +while(show_bits(&s->gb, 8) == 0xFF)
>> +skip_bits(&s->gb, 8);
>> +skip_bits(&s->gb, 8);
>>  for (i=0; i>  s->last_dc[i] = 1024;
>>  }
>> --
>
> After looking at the spec the patch seems passable, Appendix E of the JPEG
> spec says "Note that optional X’FF’ fill bytes which may precede any marker
> shall be discarded before determining which marker is present."
> ___
> libav-devel mailing list
> libav-devel@libav.org
> https://lists.libav.org/mailman/listinfo/libav-devel

-- 
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Måns Rullgård
Kostya  writes:

> As you may know, Bink IDCT is a very special one, it's defined to be
> integer-only (hence lossless and producing the same result on different
> platforms). Unfortunately, it needs more than 16-bit input on blocks with
> sharp edges (and still producing 8-bit output). Since bloating DSPContext with
> another set of functions is bad, here's an attempt to introduce a different
> context for Bink-specific functions. I don't like my patch much (it still
> relies on some of DSPContext stuff), please suggest the better way to fix it.
>
> diff --git a/libavcodec/bink.c b/libavcodec/bink.c
> index e085aa5..25ee63a 100644
> --- a/libavcodec/bink.c
> +++ b/libavcodec/bink.c
> @@ -24,6 +24,7 @@
>  #include "avcodec.h"
>  #include "dsputil.h"
>  #include "binkdata.h"
> +#include "binkdsp.h"
>  #include "mathops.h"
>  
>  #define ALT_BITSTREAM_READER_LE
> @@ -109,6 +110,7 @@ typedef struct Bundle {
>  typedef struct BinkContext {
>  AVCodecContext *avctx;
>  DSPContext dsp;
> +BinkDSPContext bdsp;
>  AVFramepic, last;
>  intversion;  ///< internal Bink file version
>  inthas_alpha;
> @@ -580,7 +582,7 @@ static inline int binkb_get_value(BinkContext *c, int 
> bundle_num)
>   * @param quant_matrices quantization matrices
>   * @return 0 for success, negative value in other cases
>   */
> -static int read_dct_coeffs(GetBitContext *gb, DCTELEM block[64], const 
> uint8_t *scan,
> +static int read_dct_coeffs(GetBitContext *gb, int32_t block[64], const 
> uint8_t *scan,
> const uint32_t quant_matrices[16][64], int q)
>  {
>  int coef_list[128];
> @@ -669,10 +671,10 @@ static int read_dct_coeffs(GetBitContext *gb, DCTELEM 
> block[64], const uint8_t *
>  
>  quant = quant_matrices[quant_idx];
>  
> -block[0] = (block[0] * quant[0]) >> 11;
> +block[0] = ((int32_t)(block[0] * quant[0])) >> 11;
>  for (i = 0; i < coef_count; i++) {
>  int idx = coef_idx[i];
> -block[scan[idx]] = (block[scan[idx]] * quant[idx]) >> 11;
> +block[scan[idx]] = ((int32_t)(block[scan[idx]] * quant[idx])) >> 11;
>  }

What are those casts meant to do?  They have no effect.

> diff --git a/libavcodec/binkdsp.c b/libavcodec/binkdsp.c
> new file mode 100644
> index 000..456d71d
> --- /dev/null
> +++ b/libavcodec/binkdsp.c

[...]

> +static void bink_clear_block_c(int32_t *block)
> +{
> +memset(block, 0, sizeof(*block) * 64);
> +}
> +
> +static void bink_put_block_c(const int32_t *block, uint8_t *restrict pixels)
> +{
> +int i;
> +
> +for (i = 0; i < 8; i++) {
> +pixels[0] = block[0];
> +pixels[1] = block[1];
> +pixels[2] = block[2];
> +pixels[3] = block[3];
> +pixels[4] = block[4];
> +pixels[5] = block[5];
> +pixels[6] = block[6];
> +pixels[7] = block[7];
> +
> +pixels += 8;
> +block  += 8;
> +}
> +}

These almost exist in dsputil...

> diff --git a/libavcodec/dsputil.c b/libavcodec/dsputil.c
> index 09e58f4..be846d3 100644
> --- a/libavcodec/dsputil.c
> +++ b/libavcodec/dsputil.c
> @@ -2895,9 +2895,6 @@ av_cold void dsputil_init(DSPContext* c, AVCodecContext 
> *avctx)
>  c->idct_put= ff_ea_idct_put_c;
>  c->idct_permutation_type= FF_NO_IDCT_PERM;
>  }else if(CONFIG_BINK_DECODER && avctx->idct_algo==FF_IDCT_BINK) {
> -c->idct = ff_bink_idct_c;
> -c->idct_add = ff_bink_idct_add_c;
> -c->idct_put = ff_bink_idct_put_c;
>  c->idct_permutation_type = FF_NO_IDCT_PERM;
>  }else{ //accurate/default

That part can be removed entirely if you change the ff_init_scantable()
call in bink.c:decode_init() to not use dsp.idct_permuation but pass
FF_NO_IDCT_PERM directly (since that is the only one used).

-- 
Måns Rullgård
m...@mansr.com
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


[libav-devel] [RFC] Make Bink IDCT take 32-bit coefficients as input

2011-07-23 Thread Kostya
As you may know, Bink IDCT is a very special one, it's defined to be
integer-only (hence lossless and producing the same result on different
platforms). Unfortunately, it needs more than 16-bit input on blocks with
sharp edges (and still producing 8-bit output). Since bloating DSPContext with
another set of functions is bad, here's an attempt to introduce a different
context for Bink-specific functions. I don't like my patch much (it still
relies on some of DSPContext stuff), please suggest the better way to fix it.
>From f341e08d28637aef990f1d5a86e395f955660304 Mon Sep 17 00:00:00 2001
From: Kostya Shishkov 
Date: Sat, 23 Jul 2011 15:46:35 +0200
Subject: [PATCH] Make Bink DCT take 32-bit input

---
 libavcodec/Makefile   |2 +-
 libavcodec/bink.c |   53 ++
 libavcodec/binkdsp.c  |  146 +
 libavcodec/binkdsp.h  |   42 ++
 libavcodec/binkidct.c |  112 -
 libavcodec/dsputil.c  |3 -
 6 files changed, 218 insertions(+), 140 deletions(-)
 create mode 100644 libavcodec/binkdsp.c
 create mode 100644 libavcodec/binkdsp.h
 delete mode 100644 libavcodec/binkidct.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 99ecbbf..36e07a9 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -89,7 +89,7 @@ OBJS-$(CONFIG_AURA2_DECODER)   += aura.o
 OBJS-$(CONFIG_AVS_DECODER) += avs.o
 OBJS-$(CONFIG_BETHSOFTVID_DECODER) += bethsoftvideo.o
 OBJS-$(CONFIG_BFI_DECODER) += bfi.o
-OBJS-$(CONFIG_BINK_DECODER)+= bink.o binkidct.o
+OBJS-$(CONFIG_BINK_DECODER)+= bink.o binkdsp.o
 OBJS-$(CONFIG_BINKAUDIO_DCT_DECODER)   += binkaudio.o wma.o
 OBJS-$(CONFIG_BINKAUDIO_RDFT_DECODER)  += binkaudio.o wma.o
 OBJS-$(CONFIG_BMP_DECODER) += bmp.o msrledec.o
diff --git a/libavcodec/bink.c b/libavcodec/bink.c
index e085aa5..25ee63a 100644
--- a/libavcodec/bink.c
+++ b/libavcodec/bink.c
@@ -24,6 +24,7 @@
 #include "avcodec.h"
 #include "dsputil.h"
 #include "binkdata.h"
+#include "binkdsp.h"
 #include "mathops.h"
 
 #define ALT_BITSTREAM_READER_LE
@@ -109,6 +110,7 @@ typedef struct Bundle {
 typedef struct BinkContext {
 AVCodecContext *avctx;
 DSPContext dsp;
+BinkDSPContext bdsp;
 AVFramepic, last;
 intversion;  ///< internal Bink file version
 inthas_alpha;
@@ -580,7 +582,7 @@ static inline int binkb_get_value(BinkContext *c, int bundle_num)
  * @param quant_matrices quantization matrices
  * @return 0 for success, negative value in other cases
  */
-static int read_dct_coeffs(GetBitContext *gb, DCTELEM block[64], const uint8_t *scan,
+static int read_dct_coeffs(GetBitContext *gb, int32_t block[64], const uint8_t *scan,
const uint32_t quant_matrices[16][64], int q)
 {
 int coef_list[128];
@@ -669,10 +671,10 @@ static int read_dct_coeffs(GetBitContext *gb, DCTELEM block[64], const uint8_t *
 
 quant = quant_matrices[quant_idx];
 
-block[0] = (block[0] * quant[0]) >> 11;
+block[0] = ((int32_t)(block[0] * quant[0])) >> 11;
 for (i = 0; i < coef_count; i++) {
 int idx = coef_idx[i];
-block[scan[idx]] = (block[scan[idx]] * quant[idx]) >> 11;
+block[scan[idx]] = ((int32_t)(block[scan[idx]] * quant[idx])) >> 11;
 }
 
 return 0;
@@ -791,6 +793,7 @@ static int binkb_decode_plane(BinkContext *c, GetBitContext *gb, int plane_idx,
 const uint8_t *scan;
 int xoff, yoff;
 LOCAL_ALIGNED_16(DCTELEM, block, [64]);
+LOCAL_ALIGNED_16(int32_t, dctblock, [64]);
 int coordmap[64];
 int ybias = is_key ? -15 : 0;
 int qp;
@@ -845,11 +848,11 @@ static int binkb_decode_plane(BinkContext *c, GetBitContext *gb, int plane_idx,
 dst[coordmap[*scan++]] = binkb_get_value(c, BINKB_SRC_COLORS);
 break;
 case 2:
-c->dsp.clear_block(block);
-block[0] = binkb_get_value(c, BINKB_SRC_INTRA_DC);
+c->bdsp.clear_block(dctblock);
+dctblock[0] = binkb_get_value(c, BINKB_SRC_INTRA_DC);
 qp = binkb_get_value(c, BINKB_SRC_INTRA_Q);
-read_dct_coeffs(gb, block, c->scantable.permutated, binkb_intra_quant, qp);
-c->dsp.idct_put(dst, stride, block);
+read_dct_coeffs(gb, dctblock, c->scantable.permutated, binkb_intra_quant, qp);
+c->bdsp.idct_put(dst, stride, dctblock);
 break;
 case 3:
 xoff = binkb_get_value(c, BINKB_SRC_X_OFF);
@@ -878,11 +881,11 @@ static int binkb_decode_plane(BinkContext *c, GetBitContext *gb, int plane_idx,
 } else {
 put_pixels8x8_overlapped(dst, ref, stride);
 }
-c->dsp.clear_block(block);
-block[0] = binkb_get_value(c, BINKB_SRC_INTER_DC);
+c->bdsp.clear_b

Re: [libav-devel] 10-bit H.264 Decoding

2011-07-23 Thread madshi
2011/7/23 Daniel Kang 
> I've been working on optimizing 10-bit H.264 decoding
> on x86/x64 for the past months. Before I started, 10-bit
> H.264 HD content was unplayable in real-time, but now,
> with a recent CPU, it is.
> Overall improvements are about 53%. With threading,
> about 61% (with respect to pre-asm with the same
> number of threads).

Thanks a lot for your work - and well done!!   :-)

Best regards, Mathias.
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] parseutils: add av_parse_number() function and children, and use them in the ff* tools

2011-07-23 Thread Stefano Sabatini
On date Wednesday 2011-06-15 00:30:32 +0200, Stefano Sabatini encoded:
> On date Wednesday 2011-06-15 00:04:21 +0200, Stefano Sabatini encoded:
> > On date Sunday 2011-06-05 00:40:57 +0200, Stefano Sabatini encoded:
> > > On date Saturday 2011-05-28 19:28:17 +0200, Stefano Sabatini encoded:
> > [...]
> > > > With number parsing it is more complicated, the failure may depend
> > > > on an invalid range, on a typo, or on the parsed number not being an
> > > > integer, discovering the reason of the failure requires duplicating
> > > > the logic in the calling code, if you don't report the failure
> > > > reason you have a poor feedback (and I don't want to duplicate the
> > > > error reporting every time I use that function, check ffmpeg for an
> > > > example).
> > > > 
> > > > > Developers can use av_dlog() messages with DEBUG defined. So there is
> > > > > something to say for it. Whatever you guys prefer is fine with me, I'd
> > > > > have a slight preference for the part where you have a simpler (so 4-
> > > > > instead of 6-argument API), but don't care much.
> > > > 
> > > > I believe the two parameters are a modest price for the added
> > > > flexibility/functionality.
> > > 
> > > Ping and updated patch (Note for the committer: update libavutil minor
> > > number and APIchanges number and date).
> > 
> > Ping and updated patch.
> 
> With a minor fix.

> From 414530679acbd9845c940dcfdf5dcbf32472a25a Mon Sep 17 00:00:00 2001
> From: Stefano Sabatini 
> Date: Sat, 28 May 2011 01:20:27 +0200
> Subject: [PATCH] parseutils: add av_parse_number() function and children
> 
> Also use them in the ff* tools in place of parse_number_or_die().

Updated with some factorization. Also av_parse_number now returns
AVERROR(ERANGE) in case of out-of-range value.
>From 50a8feae9ce59dbed308c86d4a71ff4f7b9e4f23 Mon Sep 17 00:00:00 2001
From: Stefano Sabatini 
Date: Sat, 28 May 2011 01:20:27 +0200
Subject: [PATCH] parseutils: add av_parse_number() function and children

Also use them in the ff* tools in place of parse_number_or_die().
---
 cmdutils.c |   45 +++---
 cmdutils.h |   15 ---
 doc/APIchanges |9 +++
 ffmpeg.c   |   30 +--
 ffplay.c   |   15 ---
 libavutil/parseutils.c |   62 
 libavutil/parseutils.h |   29 ++
 7 files changed, 132 insertions(+), 73 deletions(-)

diff --git a/cmdutils.c b/cmdutils.c
index 19c5d72..23baa82 100644
--- a/cmdutils.c
+++ b/cmdutils.c
@@ -87,25 +87,6 @@ void log_callback_help(void* ptr, int level, const char* fmt, va_list vl)
 vfprintf(stdout, fmt, vl);
 }
 
-double parse_number_or_die(const char *context, const char *numstr, int type, double min, double max)
-{
-char *tail;
-const char *error;
-double d = av_strtod(numstr, &tail);
-if (*tail)
-error= "Expected number for %s but found: %s\n";
-else if (d < min || d > max)
-error= "The value for %s was %s which is not within %f - %f\n";
-else if(type == OPT_INT64 && (int64_t)d != d)
-error= "Expected int64 for %s but found %s\n";
-else if (type == OPT_INT && (int)d != d)
-error= "Expected int for %s but found %s\n";
-else
-return d;
-fprintf(stderr, error, context, numstr, min, max);
-exit(1);
-}
-
 int64_t parse_time_or_die(const char *context, const char *timestr, int is_duration)
 {
 int64_t us;
@@ -214,7 +195,7 @@ void parse_options(int argc, char **argv, const OptionDef *options,
void (* parse_arg_function)(const char*))
 {
 const char *opt, *arg;
-int optindex, handleoptions=1;
+int ret = 0, optindex, handleoptions=1;
 const OptionDef *po;
 
 /* perform system-dependent conversions for arguments list */
@@ -262,16 +243,17 @@ unknown_opt:
 } else if (po->flags & OPT_BOOL) {
 *po->u.int_arg = bool_val;
 } else if (po->flags & OPT_INT) {
-*po->u.int_arg = parse_number_or_die(opt, arg, OPT_INT64, INT_MIN, INT_MAX);
+ret = av_parse_int(po->u.int_arg, arg, INT_MIN, INT_MAX, 0, NULL);
 } else if (po->flags & OPT_INT64) {
-*po->u.int64_arg = parse_number_or_die(opt, arg, OPT_INT64, INT64_MIN, INT64_MAX);
+ret = av_parse_int64(po->u.int64_arg, arg, INT64_MIN, INT64_MAX, 0, NULL);
 } else if (po->flags & OPT_FLOAT) {
-*po->u.float_arg = parse_number_or_die(opt, arg, OPT_FLOAT, -INFINITY, INFINITY);
-} else if (po->u.func_arg) {
-if (po->u.func_arg(opt, arg) < 0) {
-fprintf(stderr, "%s: failed to set value '%s' for option '%s'\n", argv[0], arg, opt);
-exit(1);
-}
+ret = av_parse_float(po->u.float_arg, arg, -INFINITY, INFINITY, 0, NULL);
+} else if (po->u.func_arg && po->u.fu

Re: [libav-devel] [PATCH] jpegdec: actually search for and parse RSTn

2011-07-23 Thread Kostya
On Fri, Jul 22, 2011 at 07:53:44PM +0200, Reinhard Tartler wrote:
> From: Michael Niedermayer 
> 
> Fixes http://ffmpeg.org/trac/ffmpeg/ticket/267

Please replace it with a description of a problem (from that ticket, for
example). Our project is not high-quality and thus still demands sensible
commit messages.

> Signed-off-by: Michael Niedermayer 
> 
> (cherry picked from commit 7b8ed831eb8432d202dad16dedc1758b018bb1fa)
> ---
>  libavcodec/mjpegdec.c |7 +--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/mjpegdec.c b/libavcodec/mjpegdec.c
> index b28fdf8..a1588df 100644
> --- a/libavcodec/mjpegdec.c
> +++ b/libavcodec/mjpegdec.c
> @@ -879,9 +879,12 @@ static int mjpeg_decode_scan(MJpegDecodeContext *s, int 
> nb_components, int Ah, i
>  }
>  }
>  
> -if (s->restart_interval && !--s->restart_count) {
> +if (s->restart_interval && show_bits(&s->gb, 8) == 0xFF){/* skip 
> RSTn */
a space wouldn't hurt here ^
> +--s->restart_count;
>  align_get_bits(&s->gb);
> -skip_bits(&s->gb, 16); /* skip RSTn */
> +while(show_bits(&s->gb, 8) == 0xFF)
> +skip_bits(&s->gb, 8);
> +skip_bits(&s->gb, 8);
>  for (i=0; i  s->last_dc[i] = 1024;
>  }
> --

After looking at the spec the patch seems passable, Appendix E of the JPEG
spec says "Note that optional X’FF’ fill bytes which may precede any marker
shall be discarded before determining which marker is present."
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


[libav-devel] 10-bit H.264 Decoding

2011-07-23 Thread Daniel Kang
I've been working on optimizing 10-bit H.264 decoding on x86/x64 for the
past months. Before I started, 10-bit H.264 HD content was unplayable in
real-time, but now, with a recent CPU, it is.

Overall improvements are about 53%. With threading, about 61% (with respect
to pre-asm with the same number of threads).

On a recent laptop computer, normal 1080p content can be decoded in real
time (with threading). With a high end computer, 1080p50 can be decoded in
real time (with just two threads).

The actual timing data and patch I used are attached.
CPU: Intel(R) Xeon(R) CPU E31270 @ 3.40GHz

Percent improvements are taken with respect to the before with the same number 
of threads.

Real was used in all cases for standarization.

~BEFORE~
time ./ffmpeg -vsync 0 -i ../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y 
/dev/null
real0m20.426s
user0m18.440s
sys 0m1.950s

real0m20.393s
user0m18.420s
sys 0m1.940s

real0m20.387s
user0m18.410s
sys 0m1.940s

20.402


time ./ffmpeg -thread_type 1 -threads 2 -vsync 0 -i 
../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y /dev/null
real0m12.813s
user0m19.600s
sys 0m2.290s

real0m12.387s
user0m19.030s
sys 0m2.130s

real0m12.386s
user0m18.880s
sys 0m2.230s

12.529


time ./ffmpeg -thread_type 1 -threads 4 -vsync 0 -i 
../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y /dev/null
real0m7.050s
user0m21.780s
sys 0m2.640s

real0m7.430s
user0m21.290s
sys 0m2.750s

real0m7.490s
user0m21.390s
sys 0m2.600s

7.323


time ./ffmpeg -thread_type 1 -threads 6 -vsync 0 -i 
../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y /dev/null
real0m6.058s
user0m25.980s
sys 0m2.840s

real0m5.922s
user0m26.450s
sys 0m2.640s

real0m6.081s
user0m26.550s
sys 0m2.780s

6.020


~AFTER~
time ./ffmpeg -vsync 0 -i ../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y 
/dev/null
real0m13.311s
user0m11.450s
sys 0m1.840s

real0m13.307s
user0m11.520s
sys 0m1.760s

real0m13.333s
user0m11.280s
sys 0m2.030s

13.317 -> 53.203% speedup


time ./ffmpeg -thread_type 1 -threads 2 -vsync 0 -i 
../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y /dev/null
real0m8.199s
user0m11.640s
sys 0m1.930s

real0m8.084s
user0m11.420s
sys 0m1.940s

real0m8.297s
user0m11.440s
sys 0m2.320s

8.193 -> 52.913% speedup


time ./ffmpeg -thread_type 1 -threads 4 -vsync 0 -i 
../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y /dev/null
real0m4.579s
user0m13.420s
sys 0m2.190s

real0m4.527s
user0m13.480s
sys 0m2.110s

real0m4.552s
user0m13.370s
sys 0m2.250s

4.553 -> 60.858% speedup


time ./ffmpeg -thread_type 1 -threads 6 -vsync 0 -i 
../park_joy_paritions_all.264 -v 2 -an -f rawvideo -y /dev/null
real0m4.420s
user0m15.120s
sys 0m2.370s

real0m4.449s
user0m14.640s
sys 0m2.500s

real0m4.754s
user0m14.500s
sys 0m2.660s

4.541 -> 32.577% speedup


patch.diff
Description: Binary data
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] cmdutils: add codec_opts parameter to setup_find_stream_info_opts()

2011-07-23 Thread Stefano Sabatini
On date Sunday 2011-07-17 16:19:28 +0200, Stefano Sabatini encoded:
> Avoid brittle and obfuscating reference to a global.
> ---
>  cmdutils.c |2 +-
>  cmdutils.h |2 +-
>  ffmpeg.c   |2 +-
>  ffplay.c   |2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)

Updated.
>From ce8b65229980cfb09230f5fde6a0d2427bf6427d Mon Sep 17 00:00:00 2001
From: Stefano Sabatini 
Date: Sun, 17 Jul 2011 01:32:27 +0200
Subject: [PATCH] cmdutils: add codec_opts parameter to setup_find_stream_info_opts()

Avoid brittle and obfuscating reference to a global.
---
 cmdutils.c |3 ++-
 cmdutils.h |4 +++-
 ffmpeg.c   |2 +-
 ffplay.c   |2 +-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/cmdutils.c b/cmdutils.c
index 9278371..6fbfa25 100644
--- a/cmdutils.c
+++ b/cmdutils.c
@@ -806,7 +806,8 @@ AVDictionary *filter_codec_opts(AVDictionary *opts, enum CodecID codec_id, int e
 return ret;
 }
 
-AVDictionary **setup_find_stream_info_opts(AVFormatContext *s)
+AVDictionary **setup_find_stream_info_opts(AVFormatContext *s,
+   const AVDictionary *codec_opts)
 {
 int i;
 AVDictionary **opts;
diff --git a/cmdutils.h b/cmdutils.h
index 90d02e6..993186f 100644
--- a/cmdutils.h
+++ b/cmdutils.h
@@ -170,10 +170,12 @@ AVDictionary *filter_codec_opts(AVDictionary *opts, enum CodecID codec_id, int e
  * be applied to the corresponding stream codec context.
  *
  * @param s pointer to a format context
+ * @param codec_opts a dictionary containing options
  * @return pointer to the created array of dictionaries, NULL if it
  * cannot be created
  */
-AVDictionary **setup_find_stream_info_opts(AVFormatContext *s);
+AVDictionary **setup_find_stream_info_opts(AVFormatContext *s,
+   const AVDictionary *codec_opts);
 
 /**
  * Print an error message to stderr, indicating filename and a human
diff --git a/ffmpeg.c b/ffmpeg.c
index 776cfab..c76aeac 100644
--- a/ffmpeg.c
+++ b/ffmpeg.c
@@ -3252,7 +3252,7 @@ static int opt_input_file(const char *opt, const char *filename)
 }
 
 /* Set AVCodecContext options for avformat_find_stream_info */
-opts = setup_find_stream_info_opts(ic);
+opts = setup_find_stream_info_opts(ic, codec_opts);
 orig_nb_streams = ic->nb_streams;
 
 /* If not enough info to get the stream parameters, we decode the
diff --git a/ffplay.c b/ffplay.c
index 77c9d4b..bef75fe 100644
--- a/ffplay.c
+++ b/ffplay.c
@@ -2343,7 +2343,7 @@ static int decode_thread(void *arg)
 if(genpts)
 ic->flags |= AVFMT_FLAG_GENPTS;
 
-opts = setup_find_stream_info_opts(ic);
+opts = setup_find_stream_info_opts(ic, codec_opts);
 orig_nb_streams = ic->nb_streams;
 
 err = avformat_find_stream_info(ic, opts);
-- 
1.7.2.5

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] cmdutils: clarify documentation for setup_find_stream_info_opts()

2011-07-23 Thread Stefano Sabatini
On date Monday 2011-07-18 12:47:25 +0200, Diego Biurrun encoded:
> On Sun, Jul 17, 2011 at 06:47:54PM +0200, Stefano Sabatini wrote:
> > 
> > --- a/cmdutils.h
> > +++ b/cmdutils.h
> > @@ -159,8 +159,16 @@ void parse_options(int argc, char **argv, const 
> > OptionDef *options,
> >  
> > +/**
> > + * Setup AVCodecContext options for avformat_find_stream_info().
> > + *
> > + * Create an array of dictionaries, one dictionary for each stream
> > + * contained in s.
> > + * Each dictionary will contain the options from codec_opts which can
> > + * be applied to the corresponding stream codec context.
> > + *
> > + * @return pointer to the created array of dictionaries, NULL if it
> > + * cannot be created
> >   */
> >  AVDictionary **setup_find_stream_info_opts(AVFormatContext *s);
> 
> This is missing parameter documentation, which will create warnings.

Updated.
>From bac4009d43ace460c7672002d5dab9b5df4af026 Mon Sep 17 00:00:00 2001
From: Stefano Sabatini 
Date: Sun, 17 Jul 2011 01:30:43 +0200
Subject: [PATCH] cmdutils: clarify documentation for setup_find_stream_info_opts()

---
 cmdutils.h |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/cmdutils.h b/cmdutils.h
index e09398e..90d02e6 100644
--- a/cmdutils.h
+++ b/cmdutils.h
@@ -161,8 +161,17 @@ void parse_options(int argc, char **argv, const OptionDef *options,
  */
 AVDictionary *filter_codec_opts(AVDictionary *opts, enum CodecID codec_id, int encoder);
 
-/*
- * Setup AVCodecContext options for avformat_find_stream_info.
+/**
+ * Setup AVCodecContext options for avformat_find_stream_info().
+ *
+ * Create an array of dictionaries, one dictionary for each stream
+ * contained in s.
+ * Each dictionary will contain the options from codec_opts which can
+ * be applied to the corresponding stream codec context.
+ *
+ * @param s pointer to a format context
+ * @return pointer to the created array of dictionaries, NULL if it
+ * cannot be created
  */
 AVDictionary **setup_find_stream_info_opts(AVFormatContext *s);
 
-- 
1.7.2.5

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel


Re: [libav-devel] [PATCH] cmdutils: clarify documentation for filter_codec_opts()

2011-07-23 Thread Stefano Sabatini
On date Monday 2011-07-18 12:42:42 +0200, Diego Biurrun encoded:
> On Sun, Jul 17, 2011 at 06:46:34PM +0200, Stefano Sabatini wrote:
> > 
> > --- a/cmdutils.h
> > +++ b/cmdutils.h
> > @@ -150,6 +150,12 @@ void parse_options(int argc, char **argv, const 
> > OptionDef *options,
> >  
> >  /**
> >   * Filter out options for given codec.
> 
> the/a given codec
> 
> > + *
> > + * Create a new options dictionary containing only the options from
> > + * opts which apply to the codec with ID codec_id.
> > + *
> > + * @param encoder if non-zero the codec is an encoder, otherwise is a 
> > decoder
> 
> I think flags like this are ugly, I'd rather create two separate functions
> - but maybe that's just me...

I tend to disagree, but unrelated.

> > + * @return a pointer to the created dictionary
> >   */
> >  AVDictionary *filter_codec_opts(AVDictionary *opts, enum CodecID codec_id, 
> > int encoder);
> 
> I'm generally not happy about leaving parameters undocumented - it
> generates Doxygen warnings.  We already drown in Doxygen warnings,
> exacerbating the problem is not good.

I see the point into always documenting params (this allows
e.g. intellisense systems to display some info when you hang over with
a pointer device), on the other hand they tend to be redundant. A
possibility would be to patch doxygen to disable this specific
warning.

Updated.
>From f86e77b586e9b930457881b99398be43a1e1fd1b Mon Sep 17 00:00:00 2001
From: Stefano Sabatini 
Date: Sun, 17 Jul 2011 01:20:50 +0200
Subject: [PATCH] cmdutils: clarify documentation for filter_codec_opts()

---
 cmdutils.h |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/cmdutils.h b/cmdutils.h
index a47f157..e09398e 100644
--- a/cmdutils.h
+++ b/cmdutils.h
@@ -149,7 +149,15 @@ void parse_options(int argc, char **argv, const OptionDef *options,
void (* parse_arg_function)(const char*));
 
 /**
- * Filter out options for given codec.
+ * Filter out options for the given codec.
+ *
+ * Create a new options dictionary containing only the options from
+ * opts which apply to the codec with ID codec_id.
+ *
+ * @param opts  a dictionary containing options
+ * @param codec_id  the ID which identifies the codec
+ * @param encoder   if non-zero the codec is an encoder, otherwise is a decoder
+ * @return a pointer to the created dictionary
  */
 AVDictionary *filter_codec_opts(AVDictionary *opts, enum CodecID codec_id, int encoder);
 
-- 
1.7.2.5

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel