https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115115

--- Comment #9 from Jan Wassenberg <jan.wassenberg at gmail dot com> ---
On second thought, we are actually trying to convert out-of-bounds values to
the closest representable value. We rely on the documented behavior of the
instruction, as mentioned in comment #5, and then correct the result afterwards.

Per the comment in the code below, it seems GCC since v11 or even 10 has been
assuming this is UB, and optimizing out our fix.

I do believe this is compiler misbehavior, rooted in treating the operation as
if it were scalar code. But vector instructions are more modern and have
tighter specs; for example, integers are 2's complement and wraparound for
addition is well-defined in the actual instructions.

Given that at least GCC's constant folding has unexpected results, we will have
to find a workaround. I had previously worried that a floating-point min(input,
(2^63)-1) is not exact, because (2^63)-1 is not exactly representable in
floating point. But comparing the float >= 2^63 and, if so, returning
(2^63)-1 would work, right? The CPU truncates the float to int anyway.
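
To make the compare-then-return idea concrete, here is a minimal scalar sketch for the double to int64 case. It is only an illustration, not the Highway implementation: the name SaturatingDoubleToInt64 is hypothetical, and returning 0 for NaN is an arbitrary choice made for this sketch. The range checks happen before the cast, so the cast itself never sees an out-of-range value.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Highway): saturating double -> int64_t.
inline int64_t SaturatingDoubleToInt64(double x) {
  // 2^63 is a power of two, so it is exactly representable as a double,
  // unlike (2^63)-1.
  constexpr double kTwoPow63 = 9223372036854775808.0;
  if (x >= kTwoPow63) return std::numeric_limits<int64_t>::max();
  if (x < -kTwoPow63) return std::numeric_limits<int64_t>::min();
  if (x != x) return 0;  // NaN: arbitrary choice for this sketch.
  // Here -2^63 <= x < 2^63, so the truncating cast is well-defined.
  return static_cast<int64_t>(x);
}
```

Because the comparison is against 2^63 rather than (2^63)-1, no inexact constant is involved; every double below 2^63 truncates to a value that fits in int64_t.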

Our current fixup code:

// For ConvertTo float->int of same size, clamping before conversion would
// change the result because the max integer value is not exactly
// representable.
// Instead detect the overflow result after conversion and fix it.
// Generic for all vector lengths.
template <class DI>
HWY_INLINE VFromD<DI> FixConversionOverflow(DI di,
                                            VFromD<RebindToFloat<DI>> original,
                                            VFromD<DI> converted) {
  // Combinations of original and output sign:
  //   --: normal <0 or -huge_val to 80..00: OK
  //   -+: -0 to 0                         : OK
  //   +-: +huge_val to 80..00             : xor with FF..FF to get 7F..FF
  //   ++: normal >0                       : OK
  const VFromD<DI> sign_wrong = AndNot(BitCast(di, original), converted);
#if HWY_COMPILER_GCC_ACTUAL
  // Critical GCC 11 compiler bug (possibly also GCC 10): omits the Xor; also
  // Add() if using that instead. Work around with one more instruction.
  const RebindToUnsigned<DI> du;
  const VFromD<DI> mask = BroadcastSignBit(sign_wrong);
  const VFromD<DI> max = BitCast(di, ShiftRight<1>(BitCast(du, mask)));
  return IfVecThenElse(mask, max, converted);
#else
  return Xor(converted, BroadcastSignBit(sign_wrong));
#endif
}
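
For readers who want to follow the sign table above, here is a scalar model of a single 64-bit lane. It is a sketch under assumptions: the function names are hypothetical, the converted value is passed in rather than produced by an actual conversion, and AndNot(a, b) is modelled as ~a & b. It covers both the Xor path and the select-based GCC workaround so the two can be compared.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar helpers modelling one 64-bit lane; not Highway code.
inline uint64_t BitsOf(double d) {
  uint64_t u;
  std::memcpy(&u, &d, sizeof(u));  // models BitCast(di, original)
  return u;
}
inline int64_t AsSigned(uint64_t u) {
  int64_t i;
  std::memcpy(&i, &u, sizeof(i));
  return i;
}

// All-ones if original was non-negative but converted is negative ("+-").
inline uint64_t OverflowMask(double original, int64_t converted) {
  const uint64_t sign_wrong =
      ~BitsOf(original) & static_cast<uint64_t>(converted);  // AndNot
  return (sign_wrong >> 63) ? ~uint64_t{0} : uint64_t{0};  // BroadcastSignBit
}

// Models the #else branch: 80..00 xor FF..FF gives 7F..FF.
inline int64_t FixLaneXor(double original, int64_t converted) {
  const uint64_t mask = OverflowMask(original, converted);
  return AsSigned(static_cast<uint64_t>(converted) ^ mask);
}

// Models the GCC workaround: select 7F..FF where the mask is set.
inline int64_t FixLaneSelect(double original, int64_t converted) {
  const uint64_t mask = OverflowMask(original, converted);
  const uint64_t max = mask >> 1;  // logical shift: 7F..FF or 0
  return AsSigned((mask & max) | (~mask & static_cast<uint64_t>(converted)));
}
```

In this model only the "+-" row changes the result; the other three sign combinations leave the mask at zero, so both variants return converted unchanged.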
