LGTM, but could you leave the old code in there (just commented out) so it's a little easier to follow?

> //ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
> //ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);

The accuracy increase is always nice.

On Thu, 2015-11-26 at 16:31 -0500, Ganesh Ajjanagadde wrote:
> This speeds up aac_tablegen to a ludicrous degree (~97%), i.e. to the
> point where it can be argued that runtime initialization can always be
> done instead of hard-coded tables. The only cost is essentially a
> trivial increase in the stack size.
>
> Even if one does not care about this, the patch also improves accuracy
> as detailed below.
>
> Performance:
> Benchmark obtained by looping 10^4 times over ff_aac_tableinit.
>
> Sample benchmark (x86-64, Haswell, GNU/Linux):
> old:
> 1295292 decicycles in ff_aac_tableinit, 512 runs, 0 skips
> 1275981 decicycles in ff_aac_tableinit, 1024 runs, 0 skips
> 1272932 decicycles in ff_aac_tableinit, 2048 runs, 0 skips
> 1262164 decicycles in ff_aac_tableinit, 4096 runs, 0 skips
> 1256720 decicycles in ff_aac_tableinit, 8192 runs, 0 skips
>
> new:
> 25691 decicycles in ff_aac_tableinit, 505 runs, 7 skips
> 25130 decicycles in ff_aac_tableinit, 1016 runs, 8 skips
> 25973 decicycles in ff_aac_tableinit, 2036 runs, 12 skips
> 25911 decicycles in ff_aac_tableinit, 4078 runs, 18 skips
> 25816 decicycles in ff_aac_tableinit, 8154 runs, 38 skips
>
> Accuracy:
> The previous code was resulting in needless loss of accuracy due to
> the pow being called in succession. As an illustration of this:
> ff_aac_pow34sf_tab[3]
> old : 0.000000000007598092294225
> new : 0.000000000007598091426864
> real: 0.000000000007598091778545
>
> truncated to float
> old : 0.000000000007598092294225
> new : 0.000000000007598091426864
> real: 0.000000000007598091426864
>
> showing that the old value was not correctly rounded. This affects a
> large number of elements of the array.
>
> Patch tested with FATE.
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
> ---
>  libavcodec/aac_tablegen.h | 38 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/aac_tablegen.h b/libavcodec/aac_tablegen.h
> index 8b223f9..255723b 100644
> --- a/libavcodec/aac_tablegen.h
> +++ b/libavcodec/aac_tablegen.h
> @@ -35,9 +35,43 @@ float ff_aac_pow34sf_tab[428];
>  av_cold void ff_aac_tableinit(void)
>  {
>      int i;
> +
> +    /* 2^(i/16) for 0 <= i <= 15 */
> +    const double exp2_lut[] = {
> +        1.00000000000000000000,
> +        1.04427378242741384032,
> +        1.09050773266525765921,
> +        1.13878863475669165370,
> +        1.18920711500272106672,
> +        1.24185781207348404859,
> +        1.29683955465100966593,
> +        1.35425554693689272830,
> +        1.41421356237309504880,
> +        1.47682614593949931139,
> +        1.54221082540794082361,
> +        1.61049033194925430818,
> +        1.68179283050742908606,
> +        1.75625216037329948311,
> +        1.83400808640934246349,
> +        1.91520656139714729387,
> +    };
> +    double t1 = 8.8817841970012523233890533447265625e-16; // 2^(-50)
> +    double t2 = 3.63797880709171295166015625e-12; // 2^(-38)
> +    int t1_inc_cur, t2_inc_cur;
> +    int t1_inc_prev = 0;
> +    int t2_inc_prev = 8;
> +
>      for (i = 0; i < 428; i++) {
> -        ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
> -        ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
> +        t1_inc_cur = 4 * (i % 4);
> +        t2_inc_cur = (8 + 3*i) % 16;
> +        if (t1_inc_cur < t1_inc_prev)
> +            t1 *= 2;
> +        if (t2_inc_cur < t2_inc_prev)
> +            t2 *= 2;
> +        ff_aac_pow2sf_tab[i] = t1 * exp2_lut[t1_inc_cur];
> +        ff_aac_pow34sf_tab[i] = t2 * exp2_lut[t2_inc_cur];
> +        t1_inc_prev = t1_inc_cur;
> +        t2_inc_prev = t2_inc_cur;
>      }
>  }
>  #endif /* CONFIG_HARDCODED_TABLES */
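
For anyone who wants to poke at the scheme outside the tree, here is a rough standalone sketch of the same loop plus a libm-free consistency check. The sketch_* names are made up for illustration and are not part of FFmpeg. The check relies on two identities I derived from the formulas in the patch, not on anything the patch itself tests: pow2sf[i+4] should be exactly 2 * pow2sf[i], and pow34sf[i+16] should be exactly 8 * pow34sf[i]. It also assumes index 200 is the zero point, which is what the 2^(-50) start value implies.

```c
#include <assert.h>

#define SKETCH_TAB_SIZE 428   /* table length taken from the patch */

static float sketch_pow2sf[SKETCH_TAB_SIZE];
static float sketch_pow34sf[SKETCH_TAB_SIZE];

/* Fill both tables using a 16-entry lookup table and exact doublings. */
static void sketch_tableinit(void)
{
    /* 2^(i/16) for 0 <= i <= 15, constants copied from the patch */
    static const double exp2_lut[16] = {
        1.00000000000000000000, 1.04427378242741384032,
        1.09050773266525765921, 1.13878863475669165370,
        1.18920711500272106672, 1.24185781207348404859,
        1.29683955465100966593, 1.35425554693689272830,
        1.41421356237309504880, 1.47682614593949931139,
        1.54221082540794082361, 1.61049033194925430818,
        1.68179283050742908606, 1.75625216037329948311,
        1.83400808640934246349, 1.91520656139714729387,
    };
    double t1 = 8.8817841970012523233890533447265625e-16; /* 2^(-50) */
    double t2 = 3.63797880709171295166015625e-12;         /* 2^(-38) */
    int prev1 = 0, prev2 = 8;
    int i;

    for (i = 0; i < SKETCH_TAB_SIZE; i++) {
        int cur1 = 4 * (i % 4);       /* fractional exponent in 1/16 steps */
        int cur2 = (8 + 3 * i) % 16;  /* same for the 3/4 power table */

        assert(cur1 < 16 && cur2 < 16);
        /* The index wrapped around, so the integer exponent went up by 1. */
        if (cur1 < prev1)
            t1 *= 2;
        if (cur2 < prev2)
            t2 *= 2;
        sketch_pow2sf[i]  = (float)(t1 * exp2_lut[cur1]);
        sketch_pow34sf[i] = (float)(t2 * exp2_lut[cur2]);
        prev1 = cur1;
        prev2 = cur2;
    }
}

/* Count entries that violate the identities described above. */
static int sketch_count_inconsistent(void)
{
    int i, bad = 0;

    for (i = 0; i < SKETCH_TAB_SIZE; i++) {
        double a = sketch_pow2sf[i], b = sketch_pow34sf[i];
        double lhs = b * b * b * b;   /* (x^(3/4))^4 */
        double rhs = a * a * a;       /* x^3 */
        double diff = lhs > rhs ? lhs - rhs : rhs - lhs;

        if (diff > 1e-5 * rhs)        /* loose tolerance for float rounding */
            bad++;
        if (i + 4 < SKETCH_TAB_SIZE &&
            sketch_pow2sf[i + 4] != 2.0f * sketch_pow2sf[i])
            bad++;
        if (i + 16 < SKETCH_TAB_SIZE &&
            sketch_pow34sf[i + 16] != 8.0f * sketch_pow34sf[i])
            bad++;
    }
    return bad;
}
```

Call sketch_tableinit() once, then sketch_count_inconsistent(); a nonzero result would point at a mistake in the wraparound logic. This only checks internal consistency. It does not compare against a correctly rounded reference, so it says nothing about the last-bit accuracy claims in the commit message.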