https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82135

            Bug ID: 82135
           Summary: Missed constant propagation through possible unsigned
                    wraparound, with std::align() variable pointer,
                    constant everything else.
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

The code in this report is easiest to look at here:
https://godbolt.org/g/DffP3J, with asm output.

When g++ inlines this (copied version of std::align from include/c++/memory),
it fails to optimize to just rounding up to the next multiple of 64 when
align=size=64 and space=1024, but ptr is variable.

(If __ptr is also constant, it's fine.)

#include <cstdint>
#include <cstddef>    // std::size_t, used in libalign64 below
#include <stddef.h>
inline void*
libalign(size_t __align, size_t __size, void*& __ptr, size_t& __space) noexcept
{
  const auto __intptr = reinterpret_cast<uintptr_t>(__ptr);
  const auto __aligned = (__intptr - 1u + __align) & -__align;
//    if (__aligned < __size)   __builtin_unreachable();
  const auto __diff = __aligned - __intptr;
//    if (__diff > __size)  __builtin_unreachable();
  if ((__size + __diff) > __space)
    return (void*)123456; //nullptr;   // non-zero constant is obvious in the asm
  else
    {
      __space -= __diff;
      return __ptr = reinterpret_cast<void*>(__aligned);
    }
}

void *libalign64(void *voidp) {
    std::size_t len = 1024;
    //if (voidp+len < voidp) __builtin_unreachable();   // doesn't help
    voidp = 
      libalign(64, 64, voidp, len);
    return voidp;
}

g++ -O3 -std=c++14  -Wall -Wextra  (trunk 8.0.0 20170906)

        # x86-64.  Other targets do the same compare/cmov or branch
        leaq    63(%rdi), %rax
        andq    $-64, %rax
        movq    %rax, %rdx
        subq    %rdi, %rdx
        addq    $65, %rdx
        cmpq    $1025, %rdx
        movl    $123456, %edx
        cmovnb  %rdx, %rax
        ret


libalign64 gives exactly the same result as just rounding up to the next
multiple of 64 (including wrapping around to zero with addresses very close to
the top).  But gcc doesn't spot this; I think it gets confused about what can
happen with unsigned wraparound.

char *roundup2(char *p) {
    auto t = (uintptr_t)p;
    t = (t+63) & -64;
    return (char*)t;
}

        leaq    63(%rdi), %rax
        andq    $-64, %rax
        ret

For easy testing, I made wrappers that call with a constant pointer, so I can
test that it really does wrap around at exactly the same place as roundup2(). 
(It does: libalign64(-64) = -64, libalign64(-63) = 0.)  So it can safely be
compiled to 2 instructions on targets where unsigned integer wraparound works
normally, without all that adding constants and comparing against constants.

static char* const test_constant = (char*)-63ULL;

char *test_roundup2() {
    return roundup2(test_constant);
}
void *test_libalign() {
    return libalign64(test_constant);
}
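
The same comparison can also be done on plain integers, without casting
constants to pointers.  This is only a sketch: libalign64_model,
roundup2_model and models_agree are hypothetical names that are not in the
testcase above, and the align/size/space constants are hard-coded to the
64/64/1024 values used in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Integer-only model of libalign(64, 64, p, 1024) from this report.
// 123456 is the same "failure" sentinel the testcase returns.
constexpr std::uintptr_t libalign64_model(std::uintptr_t p) {
    const std::uintptr_t align = 64, size = 64, space = 1024;
    const std::uintptr_t aligned = (p - 1u + align) & -align;  // may wrap to 0
    const std::uintptr_t diff = aligned - p;                   // unsigned, modular
    return (size + diff > space) ? std::uintptr_t{123456} : aligned;
}

// Integer-only model of roundup2(): round up to a multiple of 64.
constexpr std::uintptr_t roundup2_model(std::uintptr_t p) {
    return (p + 63) & -std::uintptr_t{64};
}

// Hypothetical helper: compare both models for `count` consecutive
// addresses starting at `first`.  `first + i` is allowed to wrap past the
// top of the address space, which is the interesting case here.
bool models_agree(std::uintptr_t first, std::uintptr_t count) {
    for (std::uintptr_t i = 0; i < count; ++i) {
        const std::uintptr_t p = first + i;
        if (libalign64_model(p) != roundup2_model(p))
            return false;
    }
    return true;
}
```

Calling models_agree(static_cast<std::uintptr_t>(-2048), 4096) walks across
the wraparound point.  It only samples a range, so it is a sanity check rather
than a proof for every pointer value.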


Uncommenting this line I added:
   if (__diff > __size)  __builtin_unreachable();

lets it compile to just two instructions, but that condition isn't really
always true.  __diff will be huge when __aligned wraps around.

clang, icc, and msvc also fail to make this optimization.  IDK if it's
particularly useful in real life for anything other than abusing std::align as
a simple round-up function.
