http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52285
Bug #: 52285 Summary: [4.7 Regression] libgcrypt _gcry_burn_stack slowdown Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: ja...@gcc.gnu.org #define wipememory2(_ptr,_set,_len) do { \ volatile char *_vptr=(volatile char *)(_ptr); \ unsigned long _vlen=(_len); \ while(_vlen) { *_vptr=(_set); _vptr++; _vlen--; } \ } while(0) #define wipememory(_ptr,_len) wipememory2(_ptr,0,_len) __attribute__((noinline, noclone)) void _gcry_burn_stack (int bytes) { char buf[64]; wipememory (buf, sizeof buf); bytes -= sizeof buf; if (bytes > 0) _gcry_burn_stack (bytes); } is one of the hot parts of gcrypt camellia256 ECB benchmark which apparently slowed down in 4.7 compared to 4.6. The routine is called many times, usually with bytes equal to 372. The above first slowed down the whole benchmark from ~ 2040ms to ~ 2400ms (-O2 -march=corei7-avx -fPIC) in http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178104 and then (somewhat surprisingly) became tiny bit faster again with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181172 , which no longer tail recursion optimizes the function (in this case suprisingly a win, but generally IMHO a mistake).