https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97738

--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Even faster code:

  ctz = __builtin_ctz (value);
  lowest_bit = value & - value;
  left_bits = value + lowest_bit;
  changed_bits = value ^ left_bits;
  right_bits = changed_bits >> (ctz + 2);
  return left_bits | right_bits;

The first two statements get compiled directly (with -march=native)
to

        blsi    %edi, %edx
        tzcntl  %edi, %eax
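
For anyone who wants to try the snippet, here is a minimal self-contained
sketch of it as a function. The function name next_same_popcount, the
unsigned type, and the two assert() preconditions are assumptions added
for illustration; the comment above only gives the body. The sequence
computes the next larger integer with the same number of set bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrapper; only the body comes from the comment above. */
static unsigned next_same_popcount(unsigned value)
{
    /* Assumption: caller passes a nonzero value, since
       __builtin_ctz(0) is undefined. */
    assert(value != 0);

    unsigned ctz = (unsigned)__builtin_ctz(value);

    /* Assumption: the shift count below stays under the type width;
       shifting by >= the width is undefined behaviour in C. */
    assert(ctz + 2 < sizeof(unsigned) * CHAR_BIT);

    unsigned lowest_bit   = value & -value;          /* isolate lowest set bit */
    unsigned left_bits    = value + lowest_bit;      /* carry through the low run of ones */
    unsigned changed_bits = value ^ left_bits;       /* bits that flipped in the add */
    unsigned right_bits   = changed_bits >> (ctz + 2); /* remaining ones, moved to the bottom */

    return left_bits | right_bits;
}
```

For example, next_same_popcount(3) returns 5 and next_same_popcount(6)
returns 9. The asserts are only there to make the undefined cases
visible; a real implementation may want to handle them differently.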
