>> The new version does not seem better, as it adds a branch on the path
>> and it is not smaller.
>
> That looks like bb-reorder isn't doing its job?  Maybe it thinks that
> pop is too expensive to copy?

It relies on static branch probabilities, which are set completely wrong
in GCC, so it ends up optimizing the hot path in many functions for size
rather than speed and vice versa. A simple example I tried on AArch64:

void g(void);
int a;
int f(void)
{
  g();
  if (a == 0) // or != 0 or < 0 or a < 0x7ffffffe
    return -1;
  a = 1;
  return 1;
}

The funny thing is that a == 0 and a != 0 behave in exactly the same way,
but a < 0 and a >= 0 are different. However a < C and a > C are always
seen as unlikely no matter the immediate, except for a > 0 which inlines
the return...

I also noticed that GCC ignores the explicit __builtin_expect used in the
string/str(c)spn.c implementations in GLIBC (which you need to avoid
incorrect block ordering) and not only inlines returns in the cold path
but also fails to inline them in the hot path...

Wilco
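
For reference, a minimal sketch of how the example above could carry an
explicit hint that the early return is the cold path. This only shows the
annotation; whether GCC then lays out the blocks as hinted is exactly what
is in question above, so no particular code layout is implied. The
UNLIKELY macro, the g_calls counter and the body of g are assumptions
added so that the snippet is self-contained.

```c
/* Hypothetical wrapper around GCC's __builtin_expect: tells the compiler
   the condition is expected to be false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Stand-in for the external g() in the original example. It is marked
   noinline so that f still contains a real call, as it would with an
   external function. */
static int g_calls;
__attribute__((noinline)) void g(void)
{
  g_calls++;
}

int a;

int f(void)
{
  g();
  if (UNLIKELY(a == 0))   /* hinted cold path: early return */
    return -1;
  a = 1;                  /* hinted hot path */
  return 1;
}
```

Building this with something like "gcc -O2 -S" and changing the condition
or the hint should show whether the return is duplicated on the hinted
path.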