On Tue, 6 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> As the following testcases show, the overflow (needs_overflow) and high
> handling in wi::mul_internal seem to only work correctly either for small
> precisions (less than or equal to 32, which are handled by earlier,
> simpler code rather than the full Knuth multiplication algorithm) or for
> precisions that are a multiple of HOST_BITS_PER_WIDE_INT, so it happened
> to work fine for most pre-_BitInt era types (and for large bitfields we
> probably didn't have good coverage, or were lucky and nothing was asking
> whether there was an overflow; I think highpart multiplication is
> typically used only when we have an optab in the corresponding mode).
> E.g. on the gcc.dg/bitint-87.c testcase, overflow warnings were emitted
> only for the / 2wb * 3wb _BitInt(128) cases, but not for the other
> precisions.
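> 
> To make the mismatch concrete, here is a minimal standalone sketch (not
> part of the patch; it assumes HOST_BITS_PER_WIDE_INT == 64, i.e. 32-bit
> half-limbs) of where the first bit above prec lands versus the
> r[half_blocks_needed] boundary the code assumed:
> 
>   #include <stdio.h>
> 
>   int
>   main ()
>   {
>     const unsigned int precs[] = { 64, 128, 129, 159, 160, 161 };
>     for (unsigned int i = 0; i < sizeof precs / sizeof precs[0]; i++)
>       {
>         unsigned int prec = precs[i];
>         /* As in wi::mul_internal: full 64-bit blocks, two half-limbs
>            per block.  */
>         unsigned int blocks_needed = (prec + 63) / 64;
>         unsigned int half_blocks_needed = 2 * blocks_needed;
>         printf ("prec %3u: first bit above prec is bit %2u of r[%u], "
>                 "but r[half_blocks_needed] is r[%u]\n",
>                 prec, prec % 32, prec / 32, half_blocks_needed);
>       }
>     return 0;
>   }
> 
> Only for 64 and 128 does the first bit above prec coincide with the LSB
> of r[half_blocks_needed]; for 129, 159, 160 and 161 it sits one or two
> half-limbs lower.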
> 
> I found 3 issues when prec > HOST_BITS_PER_HALF_WIDE_INT and
> (prec % HOST_BITS_PER_WIDE_INT) != 0:
> 1) the checking for overflow simply inspected the values of the
>    r array members from half_blocks_needed to 2 * half_blocks_needed - 1:
>    for UNSIGNED overflow it checked whether any of them is non-zero, and
>    for SIGNED whether any of them differs from top, where top is computed
>    from the sign bit of the result (see below); similarly, the highpart
>    multiplication simply picks the half-limbs at the r + half_blocks_needed
>    offset; and furthermore, for SIGNED there is an adjustment if either
>    operand was negative which also just walks the r half-limbs from
>    half_blocks_needed onwards;
>    this works great for precisions like 64 or 128, but not for precisions
>    like 129, 159, 160 or 161: there it would need to walk the bits in the
>    half-limbs starting right above the most significant bit of the base
>    precision, and as the sketch above shows, that can start up to a whole
>    half-limb plus all but one bit of the next lower half-limb earlier
>    than r[half_blocks_needed]
> 2) as the comment says, the multiplication is originally done as an
>    unsigned multiplication, with an adjustment of the high bits that
>    subtracts the other operand once:
>       if (wi::neg_p (op1))
>         {
>           b = 0;
>           for (i = 0; i < half_blocks_needed; i++)
>             {
>               t = (unsigned HOST_WIDE_INT)r[i + half_blocks_needed]
>                 - (unsigned HOST_WIDE_INT)v[i] - b;
>               r[i + half_blocks_needed] = t & HALF_INT_MASK;
>               b = t >> (HOST_BITS_PER_WIDE_INT - 1);
>             }
>         }
>    and similarly for the other operand.  Now, this also only works nicely
>    if a negative operand has just a single sign bit set in the given
>    precision; but we were unpacking the operands with
>    wi_unpack (..., SIGNED);, so, say, for a negative operand in 129-bit
>    precision that means the least significant bit of
>    u[half_blocks_needed - 2] (or v instead of u, depending on which
>    argument it is) is the set sign bit, but then there are 31 further
>    copies of the sign bit in u[half_blocks_needed - 2] and another 32
>    copies in u[half_blocks_needed - 1] (illustrated in the sketch right
>    after this list); the above adjustment for signed operands doesn't do
>    the right job in such cases, it would need to subtract the other
>    operand many more times
> 3) the computation of top for SIGNED
>           top = r[(half_blocks_needed) - 1];
>           top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2));
>           top &= mask;
>    also uses the bit which is the most significant bit of the result
>    within prec only if prec is a multiple of HOST_BITS_PER_WIDE_INT;
>    otherwise we need to look at a different bit, and sometimes it is
>    even a bit in r[half_blocks_needed - 2]
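> 
> A sketch of what 2) means for, say, the value -1 in 129-bit precision
> (my illustration, not GCC code: 32-bit half-limbs, half_blocks_needed
> == 6, emulating what wi_unpack produces with SIGNED vs. UNSIGNED
> extension):
> 
>   #include <stdint.h>
>   #include <stdio.h>
> 
>   int
>   main ()
>   {
>     enum { prec = 129, half_blocks_needed = 6 };
>     uint32_t s[half_blocks_needed], u[half_blocks_needed];
>     for (unsigned int i = 0; i < half_blocks_needed; i++)
>       {
>         unsigned int lo = 32 * i;   /* first bit of this half-limb */
>         if (lo >= prec)
>           {
>             /* Entirely above the precision: sign-bit copies for
>                SIGNED, zeros for UNSIGNED.  */
>             s[i] = 0xffffffff;
>             u[i] = 0;
>           }
>         else if (lo + 32 > prec)
>           {
>             /* Straddles the precision: SIGNED smears the sign bit
>                through the upper bits of the half-limb.  */
>             s[i] = 0xffffffff;
>             u[i] = (1u << (prec - lo)) - 1;   /* prec - lo is 1..31 */
>           }
>         else
>           s[i] = u[i] = 0xffffffff;
>       }
>     for (unsigned int i = 0; i < half_blocks_needed; i++)
>       printf ("u[%u]: SIGNED %08x UNSIGNED %08x\n", i, s[i], u[i]);
>     return 0;
>   }
> 
> With SIGNED unpacking, u[4] and u[5] are full of sign-bit copies, so a
> single subtraction of the other operand can't correct the product; with
> UNSIGNED unpacking only the real value bits survive (u[4] == 1,
> u[5] == 0).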
> 
> For 1), while for UNSIGNED overflow it would be fairly easy to check
> the bits above prec in the r half-limbs for being non-zero, doing all the
> shifts also in the SIGNED adjustment code in 2 further locations, and
> finally in the high handling (unless we want to assert one doesn't do the
> highpart multiply for such precisions), would be quite ugly and hard to
> maintain, so I instead chose (implemented in the second hunk) to shift
> the beyond-precision bits up so that the expectations of the rest of the
> code are met, i.e. the LSB of r[half_blocks_needed] after the adjustment
> is the bit immediately above the precision, etc.  We don't need to care
> about the bits shifted out, because the multiplication yields at most
> 2 * prec bits.
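> 
> In code, the idea looks roughly like the following standalone rendition
> (my sketch of the second hunk, not the patch itself; the name
> shift_high_bits_up is mine, and uint32_t stands in for unsigned
> HOST_HALF_WIDE_INT):
> 
>   #include <stdint.h>
>   #include <string.h>
> 
>   /* Shift the bits of r[] above PREC upwards so that bit PREC becomes
>      the LSB of r[half_blocks_needed].  r[] holds 2 * half_blocks_needed
>      32-bit half-limbs, least significant first; bits shifted out the
>      top are dead, since the product has at most 2 * PREC bits.  */
>   static void
>   shift_high_bits_up (uint32_t *r, unsigned int half_blocks_needed,
>                       unsigned int prec)
>   {
>     unsigned int shift = prec % 64;
>     if (shift == 0)
>       return;                   /* Already aligned.  */
>     if (shift == 32)
>       /* Everything is off by exactly one half-limb.  */
>       memmove (&r[half_blocks_needed], &r[half_blocks_needed - 1],
>                sizeof (r[0]) * half_blocks_needed);
>     else
>       {
>         unsigned int skip = shift < 32;   /* extra half-limb to move up */
>         if (!skip)
>           shift -= 32;
>         for (unsigned int i = 2 * half_blocks_needed - 1;
>              i >= half_blocks_needed; i--)
>           r[i] = ((r[i - skip] << (32 - shift))
>                   | (r[i - skip - 1] >> shift));
>       }
>   }
> 
> E.g. for prec 129 this moves everything up by 63 bits, so bit 129 of the
> product lands at the LSB of r[6].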
> 
> For 2), the patch changes the wi_unpack argument from SIGNED to UNSIGNED,
> so that we get all zero bits above the precision.
> 
> And finally, for 3), it adjusts the shifts and, where needed, picks a
> lower r half-limb, so that it uses the actual MSB of the result within
> prec.
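> 
> In other words, top should be derived from bit prec - 1 of the result.
> A plain-index way to write what the fixed code computes (my sketch; the
> name top_mask is mine, 32-bit half-limbs again):
> 
>   #include <stdint.h>
> 
>   /* Return all-ones if bit PREC-1 (the sign bit of a PREC-bit result)
>      is set in the half-limb array R, else zero.  */
>   static uint64_t
>   top_mask (const uint32_t *r, unsigned int prec)
>   {
>     uint32_t limb = r[(prec - 1) / 32];   /* half-limb holding the MSB */
>     return ((limb >> ((prec - 1) % 32)) & 1) ? ~(uint64_t) 0 : 0;
>   }
> 
> The patch expresses the same thing with SIGN_MASK on a suitably shifted
> half-limb and then applies mask on top.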
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, and additionally
> tested with
> make check-gcc -k -j32 GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg-torture.exp=*bitint*"
> Ok for trunk?

OK.

Thanks,
Richard.

> 2024-02-06  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR tree-optimization/113753
>       * wide-int.cc (wi::mul_internal): Unpack op1val and op2val with
>       UNSIGNED rather than SIGNED.  If high or needs_overflow and prec is
>       not a multiple of HOST_BITS_PER_WIDE_INT, shift left bits above prec
>       so that they start with r[half_blocks_needed] lowest bit.  Fix up
>       computation of top mask for SIGNED.
> 
>       * gcc.dg/torture/bitint-56.c: New test.
>       * gcc.dg/bitint-87.c: New test.
> 
> --- gcc/wide-int.cc.jj        2024-02-06 08:43:15.069885709 +0100
> +++ gcc/wide-int.cc   2024-02-06 10:29:18.137373518 +0100
> @@ -1483,8 +1483,8 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>      }
>  
>    /* We do unsigned mul and then correct it.  */
> -  wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED);
> -  wi_unpack (v, op2val, op2len, half_blocks_needed, prec, SIGNED);
> +  wi_unpack (u, op1val, op1len, half_blocks_needed, prec, UNSIGNED);
> +  wi_unpack (v, op2val, op2len, half_blocks_needed, prec, UNSIGNED);
>  
>    /* The 2 is for a full mult.  */
>    memset (r, 0, half_blocks_needed * 2
> @@ -1503,6 +1503,28 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>        r[j + half_blocks_needed] = k;
>      }
>  
> +  unsigned int shift;
> +  if ((high || needs_overflow) && (shift = prec % HOST_BITS_PER_WIDE_INT) != 
> 0)
> +    {
> +      /* The high or needs_overflow code assumes that the high bits
> +      only appear from r[half_blocks_needed] up to
> +      r[half_blocks_needed * 2 - 1].  If prec is not a multiple
> +      of HOST_BITS_PER_WIDE_INT, shift the bits above prec up
> +      to make that code simple.  */
> +      if (shift == HOST_BITS_PER_HALF_WIDE_INT)
> +     memmove (&r[half_blocks_needed], &r[half_blocks_needed - 1],
> +              sizeof (r[0]) * half_blocks_needed);
> +      else
> +     {
> +       unsigned int skip = shift < HOST_BITS_PER_HALF_WIDE_INT;
> +       if (!skip)
> +         shift -= HOST_BITS_PER_HALF_WIDE_INT;
> +       for (i = 2 * half_blocks_needed - 1; i >= half_blocks_needed; i--)
> +         r[i] = ((r[i - skip] << (-shift % HOST_BITS_PER_HALF_WIDE_INT))
> +                 | (r[i - skip - 1] >> shift));
> +     }
> +    }
> +
>    /* We did unsigned math above.  For signed we must adjust the
>       product (assuming we need to see that).  */
>    if (sgn == SIGNED && (high || needs_overflow))
> @@ -1543,8 +1565,12 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>       top = 0;
>        else
>       {
> -       top = r[(half_blocks_needed) - 1];
> -       top = SIGN_MASK (top << (HOST_BITS_PER_WIDE_INT / 2));
> +       top = r[half_blocks_needed - 1
> +               - ((-prec % HOST_BITS_PER_WIDE_INT)
> +                  >= HOST_BITS_PER_HALF_WIDE_INT)];
> +       top = SIGN_MASK (((unsigned HOST_WIDE_INT) top)
> +                        << (HOST_BITS_PER_WIDE_INT / 2
> +                            + (-prec % HOST_BITS_PER_HALF_WIDE_INT)));
>         top &= mask;
>       }
>  
> --- gcc/testsuite/gcc.dg/torture/bitint-56.c.jj	2024-02-06 08:53:45.584120553 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-56.c	2024-02-06 08:53:45.584120553 +0100
> @@ -0,0 +1,25 @@
> +/* PR tree-optimization/113753 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 129
> +unsigned _BitInt(128)
> +foo (unsigned u, unsigned _BitInt(128) a, unsigned _BitInt(128) b)
> +{
> +  unsigned _BitInt(129) m = a % b;
> +  return u * m / u;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 129
> +  if (foo (0xfa637c33, 0x37af7fe8b0000000000000000wb,
> +	   0xfffffffff0000000000000000wb)
> +      != 0x16f7e93f6d726b38b38d0b753wb)
> +    __builtin_abort ();
> +#endif
> +}
> --- gcc/testsuite/gcc.dg/bitint-87.c.jj	2024-01-13 00:05:00.077372302 +0100
> +++ gcc/testsuite/gcc.dg/bitint-87.c	2024-02-06 10:41:33.403167442 +0100
> @@ -0,0 +1,24 @@
> +/* PR tree-optimization/113753 */
> +/* { dg-do compile { target bitint575 } } */
> +/* { dg-options "-std=gnu23" } */
> +
> +_BitInt(161) a = 1461501637330902918203684832716283019655932542975wb / 2wb * 2wb;
> +_BitInt(160) b = 730750818665451459101842416358141509827966271487wb / 2wb * 2wb;
> +_BitInt(159) c = 365375409332725729550921208179070754913983135743wb / 2wb * 2wb;
> +_BitInt(129) d = 340282366920938463463374607431768211455wb / 2wb * 2wb;
> +_BitInt(128) e = 170141183460469231731687303715884105727wb / 2wb * 2wb;
> +_BitInt(161) f = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 2wb;
> +_BitInt(160) g = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 2wb;
> +_BitInt(159) h = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 2wb;
> +_BitInt(129) i = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 2wb;
> +_BitInt(128) j = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 2wb;
> +_BitInt(161) k = 1461501637330902918203684832716283019655932542975wb / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */
> +_BitInt(160) l = 730750818665451459101842416358141509827966271487wb / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */
> +_BitInt(159) m = 365375409332725729550921208179070754913983135743wb / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */
> +_BitInt(129) n = 340282366920938463463374607431768211455wb / 2wb * 3wb;			/* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */
> +_BitInt(128) o = 170141183460469231731687303715884105727wb / 2wb * 3wb;			/* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */
> +_BitInt(161) p = (-1461501637330902918203684832716283019655932542975wb - 1wb) / 2wb * 3wb;	/* { dg-warning "integer overflow in expression of type '_BitInt\\\(161\\\)' results in" } */
> +_BitInt(160) q = (-730750818665451459101842416358141509827966271487wb - 1wb) / 2wb * 3wb;	/* { dg-warning "integer overflow in expression of type '_BitInt\\\(160\\\)' results in" } */
> +_BitInt(159) r = (-365375409332725729550921208179070754913983135743wb - 1wb) / 2wb * 3wb;	/* { dg-warning "integer overflow in expression of type '_BitInt\\\(159\\\)' results in" } */
> +_BitInt(129) s = (-340282366920938463463374607431768211455wb - 1wb) / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(129\\\)' results in" } */
> +_BitInt(128) t = (-170141183460469231731687303715884105727wb - 1wb) / 2wb * 3wb;		/* { dg-warning "integer overflow in expression of type '_BitInt\\\(128\\\)' results in" } */
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
