[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #11 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #8)
> inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
> {
>     memcpy(p, &x, sizeof(x));
> }
> 
> 
> We're deciding not to inline this, while inlining its get_unaligned
> counterpart? Seems bizarre.

I can reproduce this part, and on my side it's caused by _FORTIFY_SOURCE: with
fortification, put_unaligned indeed looks bigger during inlining:

mbedtls_put_unaligned_uint32 (void * p, uint32_t x)
{
  long unsigned int _3;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_object_size (p_2(D), 0);
  __builtin___memcpy_chk (p_2(D), &x, 4, _3);
  return;

}

mbedtls_get_unaligned_uint64 (const void * p)
{
  long unsigned int _3;

  <bb 2> [local count: 1073741824]:
  _3 = MEM <long unsigned int> [(char * {ref-all})p_2(D)];
  return _3;

}
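
To make the size difference concrete, here is a minimal sketch of the two
shapes the inliner sees. It is a hand-written approximation, not the actual
glibc fortify wrapper: the names put_u64_plain, put_u64_checked, get_u64 and
roundtrip_ok are hypothetical, and only the GCC/Clang builtins
__builtin_object_size and __builtin___memcpy_chk are taken from the dump above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unfortified shape: a fixed-size memcpy, normally folded to a single store. */
static inline void put_u64_plain(void *p, uint64_t x)
{
    memcpy(p, &x, sizeof(x));
}

/* Approximation of the fortified shape (assumption: written by hand here).
   Before inlining, p is opaque, so the object size is unknown and the body
   is an object-size query plus a call, not a plain store. */
static inline void put_u64_checked(void *p, uint64_t x)
{
    __builtin___memcpy_chk(p, &x, sizeof(x), __builtin_object_size(p, 0));
}

/* The load side has no destination buffer to check, so it stays a plain load. */
static inline uint64_t get_u64(const void *p)
{
    uint64_t r;
    memcpy(&r, p, sizeof(r));
    return r;
}

/* Returns 1 when both store variants round-trip through unaligned offsets. */
static int roundtrip_ok(void)
{
    unsigned char buf[24] = {0};
    const uint64_t a = 0x0123456789abcdefULL;
    const uint64_t b = 0xfedcba9876543210ULL;

    put_u64_plain(buf + 1, a);     /* bytes 1..8   */
    put_u64_checked(buf + 11, b);  /* bytes 11..18 */

    assert(get_u64(buf + 1) == a);
    return get_u64(buf + 11) == b;
}
```

Both variants behave identically at run time; the point is only that the
checked one carries an extra builtin call in its body until it is inlined
into a caller where the destination size is known.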

--- Comment #10 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Ah, the non-static inlines are intentional, the corresponding extern
declarations appear in library/platform_util.c. Sorry, I missed that file the
first time around.
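
For readers unfamiliar with the pattern, a minimal sketch of C99 non-static
inline semantics follows, collapsed into one file. The name demo_get_u32 is
hypothetical and the layout is an assumption for illustration; it is not
copied from mbedtls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Would normally sit in a header: a C99 "inline definition". On its own it
   does not provide an out-of-line symbol for callers that are not inlined. */
inline uint32_t demo_get_u32(const void *p)
{
    uint32_t r;
    memcpy(&r, p, sizeof(r));
    return r;
}

/* Would normally sit in exactly one .c file: the extern declaration makes
   this translation unit emit the external (out-of-line) definition. */
extern inline uint32_t demo_get_u32(const void *p);

/* Returns 1 when the call works whether or not it was inlined. */
static int demo_inline_ok(void)
{
    uint32_t v = 0xdeadbeefu;
    assert(sizeof(v) == 4);
    return demo_get_u32(&v) == 0xdeadbeefu;
}
```

Without the extern declaration somewhere in the program, a non-inlined call
(for example at -O0) would fail to link, which is why the declarations in a
separate .c file matter.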

--- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #2)
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)

Can you address this on the mbedtls side? Even if it doesn't help with the
observed slowdown, it will remain a problem for the future if left unfixed.

--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Why? There's no bswap here; in particular, mbedtls_put_unaligned_uint64 is
a straightforward wrapper for memcpy:

inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
{
    memcpy(p, &x, sizeof(x));
}


We're deciding not to inline this, while inlining its get_unaligned
counterpart? Seems bizarre.

2023-08-08 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|other                       |ipa
             Target|                            |aarch64
           Keywords|                            |missed-optimization
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note you shouldn't use -Os if you care about performance.  GCC is quite
reasonable with code size increases at -O2 (as compared to other compilers). 
Instead I suggest you use -flto with -O2 to decrease the size of the final
executable/library and give GCC better knowledge on unit growth.