https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113753

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I see 2 issues.
One is a wide-int.cc bug, where VRP calls operator_mult::overflow_free_p on
unsigned _BitInt(129) [0, 340282366920938463463374607431768211454]
and
unsigned _BitInt(129) [0, 4294967295]
and incorrectly says that it is overflow free. That is not the case:
((unsigned _BitInt(129)) 0xfffffffffffffffffffffffffffffffeuwb) *
0x00000000ffffffffuwb
is
0xfffffffefffffffffffffffffffffffe00000002uwb and that surely doesn't fit into
129 bits; the above shifted right by 129 gives 0x7fffffff.
Now, mul_internal seems to compute the right value:
x/12wx r
0x7fffffffa800: 0x00000002      0xfffffffe      0xffffffff      0xffffffff
0x7fffffffa810: 0xfffffffe      0x00000000      0x00000000      0x00000000
0x7fffffffa820: 0x00000000      0x00000000      0x00000000      0x00000000
where half_blocks_needed is 6.
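
To double check the arithmetic and the dump above, here is a minimal sketch
that redoes the multiplication on 32-bit half limbs. It is not the
mul_internal code; the function names (mul_half_limbs, example_product,
bits_from_129) are made up for illustration, and the operands are simply
padded to 6 half limbs each.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiply on 32-bit half limbs, least significant first.
   R must have room for AN + BN half limbs.  */
void
mul_half_limbs (uint32_t *r, const uint32_t *a, size_t an,
                const uint32_t *b, size_t bn)
{
  for (size_t i = 0; i < an + bn; i++)
    r[i] = 0;
  for (size_t j = 0; j < bn; j++)
    {
      uint64_t carry = 0;
      for (size_t i = 0; i < an; i++)
        {
          /* Cannot overflow: (2^32-1)^2 + 2*(2^32-1) == 2^64-1.  */
          uint64_t t = (uint64_t) a[i] * b[j] + r[i + j] + carry;
          r[i + j] = (uint32_t) t;
          carry = t >> 32;
        }
      r[j + an] = (uint32_t) carry;
    }
}

/* Fill R[0..11] with the product of the two range maxima quoted above.  */
void
example_product (uint32_t r[12])
{
  /* 0xfffffffffffffffffffffffffffffffe, padded to 6 half limbs.  */
  static const uint32_t a[6]
    = { 0xfffffffe, 0xffffffff, 0xffffffff, 0xffffffff, 0, 0 };
  /* 0x00000000ffffffff, padded to 6 half limbs.  */
  static const uint32_t b[6] = { 0xffffffff, 0, 0, 0, 0, 0 };
  mul_half_limbs (r, a, 6, b, 6);
}

/* Return the 32 bits of the product starting at bit 129.
   Bit 129 is bit 1 of half limb 4.  */
uint32_t
bits_from_129 (const uint32_t r[12])
{
  return (r[4] >> 1) | (r[5] << 31);
}
```

After example_product, r[0] through r[4] hold the same five non-zero words as
the dump, and bits_from_129 returns the 0x7fffffff mentioned above.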
The problem is that the needs_overflow code just looks at the half limbs from
half_blocks_needed to half_blocks_needed * 2, which is fine for precisions
which are a multiple of 64 (HOST_BITS_PER_WIDE_INT), or for precisions <= 32
(HOST_BITS_PER_HALF_WIDE_INT), for which we use different code (the
  /* If we need to check for overflow, we can only do half wide
     multiplies quickly because we need to look at the top bits to
     check for the overflow.  */
stuff).  For other precisions, such as 129 here, it misses overflow bits that
sit below half limb half_blocks_needed: in the dump above, bits 129 and up
land in half limb 4, which is 0xfffffffe.
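
The difference between the two checks can be sketched as follows. This is
only an illustration for the unsigned case, not the wide-int.cc code; both
helper names are hypothetical, and r is assumed to hold the full product as
32-bit half limbs, least significant first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: look only at half limbs
   [half_blocks_needed, 2 * half_blocks_needed), as described above.  */
bool
high_half_limbs_nonzero (const uint32_t *r, size_t half_blocks_needed)
{
  for (size_t i = half_blocks_needed; i < 2 * half_blocks_needed; i++)
    if (r[i] != 0)
      return true;
  return false;
}

/* Hypothetical helper: look at every bit at position >= PREC in the
   NHALF half limbs of R, including a partially used half limb.  */
bool
bits_above_prec_nonzero (const uint32_t *r, size_t nhalf, unsigned prec)
{
  size_t first = prec / 32;
  unsigned shift = prec % 32;   /* Always < 32, so the shift is defined.  */
  if (first < nhalf && (r[first] >> shift) != 0)
    return true;
  for (size_t i = first + 1; i < nhalf; i++)
    if (r[i] != 0)
      return true;
  return false;
}
```

With the 12 half limbs from the dump, high_half_limbs_nonzero (r, 6) is false
while bits_above_prec_nonzero (r, 12, 129) is true. For precision 128
(half_blocks_needed 4) both agree, which matches the remark that multiples of
64 are fine.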

And another issue (not relevant to x86_64 or aarch64, but probably to arm) is
that
__mulbitint3 doesn't actually try to extend the most significant limb
if there is an overflow, so on the testcase with -O0 we end up with the most
significant of the 3 limbs being 3 even though the type is unsigned with
precision 129.
That needs to be 1 if the ABI doesn't say the upper bits beyond precision are
unspecified.
Now, we could do that extension either only on the affected arches in
the __mulbitint3 caller, or in libgcc unconditionally, or in libgcc only for
affected targets.
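
For the unsigned case, the extension itself would be roughly the masking
below. This is a sketch only, not the libgcc code: zero_extend_top_limb is a
hypothetical name, 64-bit limbs are assumed, and a signed type would need
sign extension instead.

```c
#include <stdint.h>

/* Hypothetical helper: clear the bits of the most significant limb TOP
   that lie beyond precision PREC, assuming 64-bit limbs and an unsigned
   type.  */
uint64_t
zero_extend_top_limb (uint64_t top, unsigned prec)
{
  unsigned rem = prec % 64;
  if (rem == 0)
    return top;                 /* The limb is fully used.  */
  return top & ((UINT64_C (1) << rem) - 1);
}
```

For the case above, zero_extend_top_limb (3, 129) gives 1.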
