Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value
On 2 March 2016 at 04:04, Ganesh Ajjanagadde wrote:

> On Tue, Mar 1, 2016 at 7:52 AM, Derek Buitenhuis wrote:
> > On 3/1/2016 3:21 AM, Ganesh Ajjanagadde wrote:
> >
> > [...]
> >
> >> ---
> >>  libavcodec/aacenc_utils.h | 3 +--
> >>  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > Cool. Looks like an obvious/easy win, assuming it's identical.
>
> They are not precisely identical, and in fact the change results in
> slightly better accuracy wrt the mathematical expression, simply
> because sqrtf(q * sqrtf(q)) is not always a correctly rounded float. I
> vaguely recall negligible ~ 2/3 ulp differences. The table is
> correctly rounded; I tested that while speeding up the tablegen.
>
> Added a small line to this effect in the notes.
>

I did test it on a few test tracks and the results had the exact same
SHA1 as before. Either the differences only have an effect in very
extreme cases or we've cracked SHA1.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value
On Tue, Mar 1, 2016 at 9:14 AM, Rostislav Pehlivanov wrote:
> On 1 March 2016 at 03:21, Ganesh Ajjanagadde wrote:
>>
>> It makes no sense whatsoever to do this at each function call; we
>> already have a table for this.
>>
>> Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
>> ffmpeg -i sin.flac -acodec aac -y sin.aac
>> find_min_book
>> old
>> 605 decicycles in find_min_book, 8388453 runs, 155 skips.9x
>> 606 decicycles in find_min_book, 16776912 runs, 304 skips.9x
>> 607 decicycles in find_min_book, 33553819 runs, 613 skips.2x
>> 607 decicycles in find_min_book, 67107668 runs, 1196 skips.3x
>> 607 decicycles in find_min_book, 134215360 runs, 2368 skips3x
>>
>> new
>> 359 decicycles in find_min_book, 8388552 runs, 56 skips.3x
>> 360 decicycles in find_min_book, 16777112 runs, 104 skips.1x
>> 361 decicycles in find_min_book, 33554218 runs, 214 skips.4x
>> 361 decicycles in find_min_book, 67108381 runs, 483 skips.5x
>> 361 decicycles in find_min_book, 134216725 runs, 1003 skips5x
>>
>> and more importantly a non-negligible speedup (~ 8%) to overall AAC
>> encoding:
>> old:
>> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac 6.82s user 0.03s
>> system 104% cpu 6.565 total
>> new:
>> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac 6.24s user 0.03s
>> system 104% cpu 5.993 total
>>
>> Signed-off-by: Ganesh Ajjanagadde
>
>
> Nicely spotted, thanks.
>
> LGTM, feel free to apply whenever you can.
>

pushed, thanks both
Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value
On Tue, Mar 1, 2016 at 7:52 AM, Derek Buitenhuis wrote:
> On 3/1/2016 3:21 AM, Ganesh Ajjanagadde wrote:
>
> [...]
>
>> ---
>>  libavcodec/aacenc_utils.h | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> Cool. Looks like an obvious/easy win, assuming it's identical.

They are not precisely identical, and in fact the change results in
slightly better accuracy wrt the mathematical expression, simply
because sqrtf(q * sqrtf(q)) is not always a correctly rounded float. I
vaguely recall negligible ~ 2/3 ulp differences. The table is
correctly rounded; I tested that while speeding up the tablegen.

Added a small line to this effect in the notes.

>
> - Derek
Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value
On 1 March 2016 at 03:21, Ganesh Ajjanagadde wrote:

> It makes no sense whatsoever to do this at each function call; we
> already have a table for this.
>
> Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
> ffmpeg -i sin.flac -acodec aac -y sin.aac
> find_min_book
> old
> 605 decicycles in find_min_book, 8388453 runs, 155 skips.9x
> 606 decicycles in find_min_book, 16776912 runs, 304 skips.9x
> 607 decicycles in find_min_book, 33553819 runs, 613 skips.2x
> 607 decicycles in find_min_book, 67107668 runs, 1196 skips.3x
> 607 decicycles in find_min_book, 134215360 runs, 2368 skips3x
>
> new
> 359 decicycles in find_min_book, 8388552 runs, 56 skips.3x
> 360 decicycles in find_min_book, 16777112 runs, 104 skips.1x
> 361 decicycles in find_min_book, 33554218 runs, 214 skips.4x
> 361 decicycles in find_min_book, 67108381 runs, 483 skips.5x
> 361 decicycles in find_min_book, 134216725 runs, 1003 skips5x
>
> and more importantly a non-negligible speedup (~ 8%) to overall AAC
> encoding:
> old:
> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac 6.82s user 0.03s
> system 104% cpu 6.565 total
> new:
> ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac 6.24s user 0.03s
> system 104% cpu 5.993 total
>
> Signed-off-by: Ganesh Ajjanagadde
>

Nicely spotted, thanks.

LGTM, feel free to apply whenever you can.
Re: [FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value
On 3/1/2016 3:21 AM, Ganesh Ajjanagadde wrote:

[...]

> ---
>  libavcodec/aacenc_utils.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Cool. Looks like an obvious/easy win, assuming it's identical.

- Derek
[FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace sqrtf(Q*sqrtf(Q)) by precomputed value
It makes no sense whatsoever to do this at each function call; we
already have a table for this.

Yields a 2x improvement in find_min_book (x86-64, Haswell+GCC):
ffmpeg -i sin.flac -acodec aac -y sin.aac
find_min_book
old
605 decicycles in find_min_book, 8388453 runs, 155 skips.9x
606 decicycles in find_min_book, 16776912 runs, 304 skips.9x
607 decicycles in find_min_book, 33553819 runs, 613 skips.2x
607 decicycles in find_min_book, 67107668 runs, 1196 skips.3x
607 decicycles in find_min_book, 134215360 runs, 2368 skips3x

new
359 decicycles in find_min_book, 8388552 runs, 56 skips.3x
360 decicycles in find_min_book, 16777112 runs, 104 skips.1x
361 decicycles in find_min_book, 33554218 runs, 214 skips.4x
361 decicycles in find_min_book, 67108381 runs, 483 skips.5x
361 decicycles in find_min_book, 134216725 runs, 1003 skips5x

and more importantly a non-negligible speedup (~ 8%) to overall AAC
encoding:
old:
ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_new.aac 6.82s user 0.03s system 104% cpu 6.565 total
new:
ffmpeg -i sin.flac -acodec aac -strict -2 -y sin_old.aac 6.24s user 0.03s system 104% cpu 5.993 total

Signed-off-by: Ganesh Ajjanagadde
---
 libavcodec/aacenc_utils.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
index cb5bc8d..c2a2c2e 100644
--- a/libavcodec/aacenc_utils.h
+++ b/libavcodec/aacenc_utils.h
@@ -90,8 +90,7 @@ static inline float find_max_val(int group_len, int swb_size, const float *scale
 
 static inline int find_min_book(float maxval, int sf)
 {
-    float Q = ff_aac_pow2sf_tab[POW_SF2_ZERO - sf + SCALE_ONE_POS - SCALE_DIV_512];
-    float Q34 = sqrtf(Q * sqrtf(Q));
+    float Q34 = ff_aac_pow34sf_tab[POW_SF2_ZERO - sf + SCALE_ONE_POS - SCALE_DIV_512];
     int qmaxval, cb;
     qmaxval = maxval * Q34 + C_QUANT;
     if (qmaxval >= (FF_ARRAY_ELEMS(aac_maxval_cb)))
-- 
2.7.2