https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113753
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I see 2 issues.

One is a wide-int.cc bug, where VRP calls operator_mult::overflow_free_p on
unsigned _BitInt(129) [0, 340282366920938463463374607431768211454]
and
unsigned _BitInt(129) [0, 4294967295]
and incorrectly says that it is overflow free. That is not the case:
((unsigned _BitInt(129)) 0xfffffffffffffffffffffffffffffffeuwb) * 0x00000000ffffffffuwb
is
0xfffffffefffffffffffffffffffffffe00000002uwb
and that surely doesn't fit into 129 bits; the above shifted right by 129 gives 0x7fffffff.

Now, mul_internal seems to compute the right value:
x/12wx r
0x7fffffffa800: 0x00000002 0xfffffffe 0xffffffff 0xffffffff
0x7fffffffa810: 0xfffffffe 0x00000000 0x00000000 0x00000000
0x7fffffffa820: 0x00000000 0x00000000 0x00000000 0x00000000
where half_blocks_needed is 6.

The problem is that the needs_overflow code just looks at the half limbs from half_blocks_needed to half_blocks_needed * 2, which is fine for precisions which are a multiple of 64 (HOST_BITS_PER_WIDE_INT), or for precisions <= 32 (HOST_BITS_PER_HALF_WIDE_INT), for which we use different code, the
  /* If we need to check for overflow, we can only do half wide
     multiplies quickly because we need to look at the top bits to
     check for the overflow.  */
stuff. For any other precision, such as 129 here, the bits between the precision and the end of the most significant limb (here in half limbs 4 and 5) are not looked at, so the non-zero 0xfffffffe in r[4] goes unnoticed.

And another issue (not relevant to x86_64 or aarch64, but probably to arm) is that __mulbitint3 doesn't actually try to extend the most significant limb if there is an overflow, so on the testcase with -O0 we end up with the most significant of the 3 limbs being 3 even when it is unsigned 129 precision. That needs to be 1 if the ABI doesn't say the upper bits beyond precision are unspecified.

Now, we could do that extension either only on the affected arches in the __mulbitint3 caller, or in libgcc unconditionally, or in libgcc only for affected targets.
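
The arithmetic above can be double-checked with a small model. This is only a sketch in Python (chosen for its arbitrary-precision integers), not the wide-int.cc or libgcc code: the helper names half_limbs, overflow_whole_limbs_only, overflow_exact and zero_extend_top_limb are hypothetical, and the first check only imitates the "look at half limbs half_blocks_needed .. half_blocks_needed * 2" behaviour as described in the comment, assuming an unsigned multiplication.

```python
# Illustrative model only; none of these names exist in GCC.
PREC = 129        # precision of unsigned _BitInt(129)
HALF_BITS = 32    # stands in for HOST_BITS_PER_HALF_WIDE_INT
LIMB_BITS = 64    # stands in for HOST_BITS_PER_WIDE_INT

a = (1 << 128) - 2   # 340282366920938463463374607431768211454
b = (1 << 32) - 1    # 4294967295
product = a * b


def half_limbs(value, count):
    """Split value into `count` 32-bit half limbs, least significant first."""
    mask = (1 << HALF_BITS) - 1
    return [(value >> (HALF_BITS * i)) & mask for i in range(count)]


blocks_needed = (PREC + LIMB_BITS - 1) // LIMB_BITS   # 3 limbs for 129 bits
half_blocks_needed = 2 * blocks_needed                # 6, as in the comment
r = half_limbs(product, 2 * half_blocks_needed)       # 12 words, like x/12wx r


def overflow_whole_limbs_only(r, half_blocks_needed):
    """Model of the described check: only the upper half of r is inspected."""
    return any(r[half_blocks_needed:2 * half_blocks_needed])


def overflow_exact(value, prec):
    """Reference check for unsigned values: any bit at or above prec is set."""
    return (value >> prec) != 0


def zero_extend_top_limb(limb, prec):
    """Clear the bits of the most significant limb that lie beyond prec.

    Assumes an unsigned type whose precision is not a multiple of LIMB_BITS.
    """
    valid_bits = prec % LIMB_BITS
    return limb & ((1 << valid_bits) - 1)


print(hex(product))                  # full 160-bit product
print(hex(product >> PREC))          # bits that do not fit into 129 bits
print([hex(x) for x in r])           # compare with the gdb dump
print(overflow_whole_limbs_only(r, half_blocks_needed))
print(overflow_exact(product, PREC))
print(zero_extend_top_limb(3, PREC))
```

In this model the whole-limb check reports no overflow while the exact check does, which is the disagreement described for precision 129. The last call shows the second issue: a most significant limb of 3 reduces to 1 once the bits beyond the 129-bit precision are cleared.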