On Sun, Jan 3, 2016 at 6:13 AM, Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> This gets rid of some branches to speed up table generation slightly
>> (impact higher on mulaw than alaw). Tables are identical to before,
>> tested with FATE.
>>
>> Sample benchmark (Haswell, GNU/Linux+gcc):
>> old:
>> 313494 decicycles in build_alaw_table, 4094 runs, 2 skips
>> 315959 decicycles in build_alaw_table, 8190 runs, 2 skips
>>
>> 323599 decicycles in build_ulaw_table, 4095 runs, 1 skips
>> 318849 decicycles in build_ulaw_table, 8188 runs, 4 skips
>>
>> new:
>> 261902 decicycles in build_alaw_table, 4096 runs, 0 skips
>> 266519 decicycles in build_alaw_table, 8192 runs, 0 skips
>>
>> 209657 decicycles in build_ulaw_table, 4096 runs, 0 skips
>> 232656 decicycles in build_ulaw_table, 8192 runs, 0 skips
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>> ---
>>  libavcodec/pcm_tablegen.h | 24 ++++++++++++------------
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> index 1387210..7269977 100644
>> --- a/libavcodec/pcm_tablegen.h
>> +++ b/libavcodec/pcm_tablegen.h
>> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t *linear_to_xlaw,
>>  {
>>      int i, j, v, v1, v2;
>>
>> -    j = 0;
>> -    for(i=0;i<128;i++) {
>> -        if (i != 127) {
>> -            v1 = xlaw2linear(i ^ mask);
>> -            v2 = xlaw2linear((i + 1) ^ mask);
>> -            v = (v1 + v2 + 4) >> 3;
>> -        } else {
>> -            v = 8192;
>> -        }
>> -        for(;j<v;j++) {
>> +    j = 1;
>> +    linear_to_xlaw[8192] = mask;
>> +    for(i=0;i<127;i++) {
>> +        v1 = xlaw2linear(i ^ mask);
>> +        v2 = xlaw2linear((i + 1) ^ mask);
>> +        v = (v1 + v2 + 4) >> 3;
>> +        for(;j<v;j+=1) {
>> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>>              linear_to_xlaw[8192 + j] = (i ^ mask);
>> -            if (j > 0)
>> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>>          }
>>      }
>> +    for(;j<8192;j++) {
>> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> +    }
>
> removing the if(j>0) and replacing it by the direct init before
> is fine.
> do the other changes have any significant speed effect ?
> i think they make the code harder to read and this is not really
> speed critical code

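For reference, the claim that both loop shapes fill the table identically can be checked outside the tree with a minimal sketch along the following lines. Everything here is an assumption made for illustration: the names ulaw2linear_sketch, boundary, build_old, build_new and tables_match are hypothetical, the decoder is a textbook G.711 mu-law decode (bias 0x84) standing in for xlaw2linear rather than the code in the tree, and 0xff is assumed as the mu-law mask. Index 0 is left alone because the quoted hunk does not write it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HALF       8192          /* table midpoint, as in the patch */
#define TABLE_SIZE (2 * HALF)

/* Assumption: textbook G.711 mu-law decode standing in for xlaw2linear. */
static int ulaw2linear_sketch(unsigned char u_val)
{
    int t;

    u_val = (unsigned char)~u_val;        /* stored codes are complemented */
    t = ((u_val & 0x0f) << 3) + 0x84;     /* quantization bits plus bias */
    t <<= (u_val & 0x70) >> 4;            /* shift up by the segment number */
    return (u_val & 0x80) ? (0x84 - t) : (t - 0x84);
}

/* Boundary between codes i and i + 1, scaled as in the patch. */
static int boundary(int i, int mask)
{
    int v1 = ulaw2linear_sketch((unsigned char)(i ^ mask));
    int v2 = ulaw2linear_sketch((unsigned char)((i + 1) ^ mask));
    return (v1 + v2 + 4) >> 3;
}

/* Loop shape before the patch: two branches inside the loops. */
static void build_old(uint8_t *tab, int mask)
{
    int i, j = 0, v;

    for (i = 0; i < 128; i++) {
        v = (i != 127) ? boundary(i, mask) : HALF;
        for (; j < v; j++) {
            tab[HALF + j] = (uint8_t)(i ^ mask);
            if (j > 0)
                tab[HALF - j] = (uint8_t)(i ^ (mask ^ 0x80));
        }
    }
}

/* Loop shape after the patch: midpoint set up front, tail filled separately. */
static void build_new(uint8_t *tab, int mask)
{
    int i, j = 1, v;

    tab[HALF] = (uint8_t)mask;
    for (i = 0; i < 127; i++) {
        v = boundary(i, mask);
        assert(v <= HALF);                /* writes must stay inside the table */
        for (; j < v; j++) {
            tab[HALF - j] = (uint8_t)(i ^ (mask ^ 0x80));
            tab[HALF + j] = (uint8_t)(i ^ mask);
        }
    }
    for (; j < HALF; j++) {
        tab[HALF - j] = (uint8_t)(127 ^ (mask ^ 0x80));
        tab[HALF + j] = (uint8_t)(127 ^ mask);
    }
}

/* Returns 1 if both loop shapes produce byte-identical tables. */
static int tables_match(int mask)
{
    static uint8_t a[TABLE_SIZE], b[TABLE_SIZE];

    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));
    build_old(a, mask);
    build_new(b, mask);
    return memcmp(a, b, TABLE_SIZE) == 0;
}
```

Calling tables_match(0xff) from a small main() is enough to compare the two variants for the mu-law case. The up-front store to the midpoint is only equivalent as long as the first boundary is positive, which holds for this decoder.
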
It is still "speed critical" enough for people to retain
CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get the cycle
count down far enough that the hardcoded tables can be removed here. If
patch 2 is fine as is, i.e. if the current code is fast enough, then I
will just commit with only the removal of if(j > 0).

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Avoid a single point of failure, be that a person or equipment.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel