https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82135
Bug ID:     82135
Summary:    Missed constant propagation through possible unsigned
            wraparound, with std::align() variable pointer, constant
            everything else
Product:    gcc
Version:    8.0
Status:     UNCONFIRMED
Keywords:   missed-optimization
Severity:   normal
Priority:   P3
Component:  tree-optimization
Assignee:   unassigned at gcc dot gnu.org
Reporter:   peter at cordes dot ca
Target Milestone: ---

The code in this report is easiest to look at here, with asm output:
https://godbolt.org/g/DffP3J

When g++ inlines this (a copy of std::align from include/c++/memory), it
fails to optimize it down to just rounding the pointer up to the next
multiple of 64 when align=size=64 and space=1024, but ptr is variable.
(If __ptr is also constant, it's fine.)

#include <cstdint>
#include <stddef.h>

inline void*
libalign(size_t __align, size_t __size, void*& __ptr, size_t& __space) noexcept
{
    const auto __intptr  = reinterpret_cast<uintptr_t>(__ptr);
    const auto __aligned = (__intptr - 1u + __align) & -__align;
    // if (__aligned < __size)  __builtin_unreachable();
    const auto __diff    = __aligned - __intptr;
    // if (__diff > __size)  __builtin_unreachable();
    if ((__size + __diff) > __space)
        return (void*)123456;  //nullptr;  // non-zero constant is obvious in the asm
    else
    {
        __space -= __diff;
        return __ptr = reinterpret_cast<void*>(__aligned);
    }
}

void *libalign64(void *voidp) {
    std::size_t len = 1024;
    //if (voidp+len < voidp) __builtin_unreachable();  // doesn't help
    voidp = libalign(64, 64, voidp, len);
    return voidp;
}

g++ -O3 -std=c++14 -Wall -Wextra   (trunk 8.0.0 20170906)

    # x86-64.  Other targets do the same compare/cmov or branch.
    leaq    63(%rdi), %rax
    andq    $-64, %rax
    movq    %rax, %rdx
    subq    %rdi, %rdx
    addq    $65, %rdx
    cmpq    $1025, %rdx
    movl    $123456, %edx
    cmovnb  %rdx, %rax
    ret

libalign64 gives exactly the same result as just rounding up to the next
multiple of 64 (including wrapping around to zero for addresses very close
to the top of the address space). But gcc doesn't spot this; I think it gets
confused about what can happen with unsigned wraparound.
char *roundup2(char *p) {
    auto t = (uintptr_t)p;
    t = (t + 63) & -64;
    return (char*)t;
}

    leaq    63(%rdi), %rax
    andq    $-64, %rax
    ret

For easy testing, I made wrappers that call with a constant pointer, so I
could check that libalign64 really does wrap around at exactly the same
place as roundup2(). (It does: libalign64(-64) = -64, libalign64(-63) = 0.)
So it can safely be compiled to the same 2 instructions on targets where
unsigned integer wraparound works normally, without all that adding of
constants and comparing against constants.

static char* const test_constant = (char*)-63ULL;
char *test_roundup2() { return roundup2(test_constant); }
void *test_libalign() { return libalign64(test_constant); }

Uncommenting the line I added,

    if (__diff > __size) __builtin_unreachable();

lets it compile to just two instructions, but that condition isn't really
always true: __diff will be huge when __aligned wraps around.

clang, icc, and msvc also fail to make this optimization. IDK whether it's
particularly useful in real life for anything other than abusing std::align
as a simple round-up function.