[Bug ipa/110946] 3x perf regression with -Os on M1 Pro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #11 from Alexander Monakov ---
(In reply to Alexander Monakov from comment #8)
> inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
> {
>     memcpy(p, &x, sizeof(x));
> }
>
> We're deciding not to inline this, while inlining its get_unaligned
> counterpart? Seems bizarre.

I can reproduce this part, and on my side it's caused by _FORTIFY_SOURCE:
with fortification, put_unaligned indeed looks bigger during inlining:

mbedtls_put_unaligned_uint32 (void * p, uint32_t x)
{
  long unsigned int _3;

  [local count: 1073741824]:
  _3 = __builtin_object_size (p_2(D), 0);
  __builtin___memcpy_chk (p_2(D), &x, 4, _3);
  return;
}

mbedtls_get_unaligned_uint64 (const void * p)
{
  long unsigned int _3;

  [local count: 1073741824]:
  _3 = MEM [(char * {ref-all})p_2(D)];
  return _3;
}
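For anyone who wants to try reproducing the asymmetry outside mbedtls, a minimal sketch follows. The names put_unaligned_u64, get_unaligned_u64 and roundtrip_unaligned are hypothetical stand-ins, not the mbedtls functions, and they are marked 'static inline' here only so the file links on its own. The likely reason only the store side grows is that the destination in the get helper is a local variable of known size, so the fortified check can fold away, while the destination in the put helper is an opaque pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the put helper: the destination is an
 * opaque pointer, so a fortified memcpy cannot see the object size
 * until the helper has been inlined into a caller. */
static inline void put_unaligned_u64(void *p, uint64_t x)
{
    memcpy(p, &x, sizeof(x));
}

/* Hypothetical stand-in for the get helper: the destination is a
 * local of known size, so the copy can become a plain load. */
static inline uint64_t get_unaligned_u64(const void *p)
{
    uint64_t r;
    memcpy(&r, p, sizeof(r));
    return r;
}

/* Round-trips a value through a deliberately misaligned offset. */
uint64_t roundtrip_unaligned(uint64_t v)
{
    unsigned char buf[sizeof(uint64_t) + 1];
    put_unaligned_u64(buf + 1, v);
    return get_unaligned_u64(buf + 1);
}
```

Building this once with -Os -D_FORTIFY_SOURCE=2 and once with -Os -U_FORTIFY_SOURCE, adding -fdump-ipa-inline to each, should make it possible to compare how the inliner sees the two helpers. Some platforms enable fortification by default, so the explicit -U may be needed to get the unfortified baseline.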
[Bug ipa/110946] 3x perf regression with -Os on M1 Pro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #10 from Alexander Monakov ---
Ah, the non-static inlines are intentional: the corresponding extern
declarations appear in library/platform_util.c. Sorry, I missed that file the
first time around.
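For readers unfamiliar with the pattern, here is a minimal single-file sketch of C99 non-static inline paired with an extern declaration. It assumes C99 or later inline semantics (not -std=gnu89), and the names get_unaligned_u32 and read_at_offset_1 are hypothetical; the actual declarations in platform_util.c may look different.

```c
#include <stdint.h>
#include <string.h>

/* Would normally live in a header: an inline definition without
 * 'static'. On its own it does not provide an external definition. */
inline uint32_t get_unaligned_u32(const void *p)
{
    uint32_t r;
    memcpy(&r, p, sizeof(r));
    return r;
}

/* Would normally live in exactly one .c file: this declaration makes
 * that translation unit emit the external definition, which callers
 * fall back to whenever the compiler chooses not to inline. */
extern inline uint32_t get_unaligned_u32(const void *p);

/* Example caller reading from a misaligned address. */
uint32_t read_at_offset_1(const unsigned char *buf)
{
    return get_unaligned_u32(buf + 1);
}
```

Without the extern declaration somewhere in the program, a call that is not inlined (for example at -O0, or when the -Os heuristics reject it) would fail to link.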
[Bug ipa/110946] 3x perf regression with -Os on M1 Pro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #9 from Alexander Monakov ---
(In reply to Alexander Monakov from comment #2)
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)

Can you address this on the mbedtls side? Even if it doesn't help with the
observed slowdown, it will remain a problem for the future if left unfixed.
[Bug ipa/110946] 3x perf regression with -Os on M1 Pro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #8 from Alexander Monakov ---
Why? There's no bswap here; in particular, mbedtls_put_unaligned_uint64 is a
straightforward wrapper for memcpy:

inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
{
    memcpy(p, &x, sizeof(x));
}

We're deciding not to inline this, while inlining its get_unaligned
counterpart? Seems bizarre.
[Bug ipa/110946] 3x perf regression with -Os on M1 Pro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|other                       |ipa
             Target|                            |aarch64
           Keywords|                            |missed-optimization
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener ---
Note you shouldn't use -Os if you care about performance. GCC is quite
reasonable with code size increases at -O2 (as compared to other compilers).
Instead I suggest you use -flto with -O2 to decrease the size of the final
executable/library and give GCC better knowledge on unit growth.