Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value

2016-03-02 Thread Rostislav Pehlivanov
On 2 March 2016 at 04:04, Ganesh Ajjanagadde  wrote:

> On Tue, Mar 1, 2016 at 7:52 AM, Derek Buitenhuis
>  wrote:
> > On 3/1/2016 3:21 AM, Ganesh Ajjanagadde wrote:
> >
> > [...]
> >
> >> ---
> >>  libavcodec/aacenc_utils.h | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > Cool. Looks like an obvious/easy win, assuming it's identical.
>
> They are not precisely identical, and in fact the change results in
> slightly better accuracy wrt the mathematical expression, simply
> because sqrtf(q * sqrtf(q)) is not always a correctly rounded float. I
> vaguely recall negligible ~ 2/3 ulp differences. The table is
> correctly rounded; I tested that while speeding up the tablegen.
>
> Added a small line to this effect in the notes.
>
>
I tested it on a few test tracks and the output had exactly the same SHA1
as before. Either the differences only show up in very extreme cases
or we've cracked SHA1.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value

2016-03-01 Thread Ganesh Ajjanagadde
On Tue, Mar 1, 2016 at 9:14 AM, Rostislav Pehlivanov
 wrote:
> On 1 March 2016 at 03:21, Ganesh Ajjanagadde  wrote:
>>
>> It makes no sense whatsoever to do this at each function call; we
>> already have a table for this.
>>
>> Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
>> ffmpeg -i sin.flac -acodec aac -y sin.aac
>> find_min_book
>> old
>> 605 decicycles in find_min_book,   8388453 runs,    155 skips
>> 606 decicycles in find_min_book,  16776912 runs,    304 skips
>> 607 decicycles in find_min_book,  33553819 runs,    613 skips
>> 607 decicycles in find_min_book,  67107668 runs,   1196 skips
>> 607 decicycles in find_min_book, 134215360 runs,   2368 skips
>>
>> new
>> 359 decicycles in find_min_book,   8388552 runs,     56 skips
>> 360 decicycles in find_min_book,  16777112 runs,    104 skips
>> 361 decicycles in find_min_book,  33554218 runs,    214 skips
>> 361 decicycles in find_min_book,  67108381 runs,    483 skips
>> 361 decicycles in find_min_book, 134216725 runs,   1003 skips
>>
>> and more importantly a non-negligible speedup (~ 8%) to overall AAC
>> encoding:
>> old:
>> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac  6.82s user 0.03s system 104% cpu 6.565 total
>> new:
>> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac  6.24s user 0.03s system 104% cpu 5.993 total
>>
>> Signed-off-by: Ganesh Ajjanagadde 
>
>
> Nicely spotted, thanks.
>
> LGTM, feel free to apply whenever you can.
>

pushed, thanks both


Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value

2016-03-01 Thread Ganesh Ajjanagadde
On Tue, Mar 1, 2016 at 7:52 AM, Derek Buitenhuis
 wrote:
> On 3/1/2016 3:21 AM, Ganesh Ajjanagadde wrote:
>
> [...]
>
>> ---
>>  libavcodec/aacenc_utils.h | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> Cool. Looks like an obvious/easy win, assuming it's identical.

They are not precisely identical, and in fact the change results in
slightly better accuracy wrt the mathematical expression, simply
because sqrtf(q * sqrtf(q)) is not always a correctly rounded float. I
vaguely recall negligible ~ 2/3 ulp differences. The table is
correctly rounded; I tested that while speeding up the tablegen.

Added a small line to this effect in the notes.

>
> - Derek


Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value

2016-03-01 Thread Rostislav Pehlivanov
On 1 March 2016 at 03:21, Ganesh Ajjanagadde  wrote:

> It makes no sense whatsoever to do this at each function call; we
> already have a table for this.
>
> Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
> ffmpeg -i sin.flac -acodec aac -y sin.aac
> find_min_book
> old
> 605 decicycles in find_min_book,   8388453 runs,    155 skips
> 606 decicycles in find_min_book,  16776912 runs,    304 skips
> 607 decicycles in find_min_book,  33553819 runs,    613 skips
> 607 decicycles in find_min_book,  67107668 runs,   1196 skips
> 607 decicycles in find_min_book, 134215360 runs,   2368 skips
>
> new
> 359 decicycles in find_min_book,   8388552 runs,     56 skips
> 360 decicycles in find_min_book,  16777112 runs,    104 skips
> 361 decicycles in find_min_book,  33554218 runs,    214 skips
> 361 decicycles in find_min_book,  67108381 runs,    483 skips
> 361 decicycles in find_min_book, 134216725 runs,   1003 skips
>
> and more importantly a non-negligible speedup (~ 8%) to overall AAC
> encoding:
> old:
> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac  6.82s user 0.03s system 104% cpu 6.565 total
> new:
> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac  6.24s user 0.03s system 104% cpu 5.993 total
>
> Signed-off-by: Ganesh Ajjanagadde 
>

Nicely spotted, thanks.

LGTM, feel free to apply whenever you can.


Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value

2016-03-01 Thread Derek Buitenhuis
On 3/1/2016 3:21 AM, Ganesh Ajjanagadde wrote:

[...]

> ---
>  libavcodec/aacenc_utils.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Cool. Looks like an obvious/easy win, assuming it's identical.

- Derek


[FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value

2016-02-29 Thread Ganesh Ajjanagadde
It makes no sense whatsoever to do this at each function call; we
already have a table for this.

Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
ffmpeg -i sin.flac -acodec aac -y sin.aac
find_min_book
old
605 decicycles in find_min_book,   8388453 runs,    155 skips
606 decicycles in find_min_book,  16776912 runs,    304 skips
607 decicycles in find_min_book,  33553819 runs,    613 skips
607 decicycles in find_min_book,  67107668 runs,   1196 skips
607 decicycles in find_min_book, 134215360 runs,   2368 skips

new
359 decicycles in find_min_book,   8388552 runs,     56 skips
360 decicycles in find_min_book,  16777112 runs,    104 skips
361 decicycles in find_min_book,  33554218 runs,    214 skips
361 decicycles in find_min_book,  67108381 runs,    483 skips
361 decicycles in find_min_book, 134216725 runs,   1003 skips

and more importantly a non-negligible speedup (~ 8%) to overall AAC encoding:
old:
ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac  6.82s user 0.03s system 104% cpu 6.565 total
new:
ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac  6.24s user 0.03s system 104% cpu 5.993 total

Signed-off-by: Ganesh Ajjanagadde 
---
 libavcodec/aacenc_utils.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
index cb5bc8d..c2a2c2e 100644
--- a/libavcodec/aacenc_utils.h
+++ b/libavcodec/aacenc_utils.h
@@ -90,8 +90,7 @@ static inline float find_max_val(int group_len, int swb_size, const float *scale
 
 static inline int find_min_book(float maxval, int sf)
 {
-    float Q = ff_aac_pow2sf_tab[POW_SF2_ZERO - sf + SCALE_ONE_POS - SCALE_DIV_512];
-    float Q34 = sqrtf(Q * sqrtf(Q));
+    float Q34 = ff_aac_pow34sf_tab[POW_SF2_ZERO - sf + SCALE_ONE_POS - SCALE_DIV_512];
     int qmaxval, cb;
     qmaxval = maxval * Q34 + C_QUANT;
     if (qmaxval >= (FF_ARRAY_ELEMS(aac_maxval_cb)))
-- 
2.7.2
