Re: [PATCH] wide-int: Fix wide_int division/remainder [PR112733]

Richard Biener Wed, 29 Nov 2023 04:20:26 -0800

On Wed, 29 Nov 2023, Jakub Jelinek wrote:

> Hi!
> 
> This patch fixes the other half of the PR, where we were crashing on the
> UNSIGNED widest_int modulo of 1 % -128 (where the -128 in UNSIGNED is
> 131072 bit 0xfffff...fffff80).
> The patch actually has 2 independent parts, if you want, they could be
> split out.
> The fix for the divmod buffer overflow is in the 2nd-5th and 7th-8th hunks of
> the patch.  abs (remainder) <= min (abs (dividend), abs (divisor)) and the
> wide-int.h callers estimate the remainder size to be the same as the
> quotient size, i.e. for UNSIGNED dividend with msb set quotient (same as
> remainder) precision based estimation, for SIGNED or dividend with msb
> clear estimate based on divident len (+ 1).
> For quotient (if any), wi_pack is called as
>   quotient_len = wi_pack (quotient, b_quotient, m, dividend_prec);
> where m is
>   m = dividend_blocks_needed;
>   while (m > 1 && b_dividend[m - 1] == 0)
>     m--;
> and
>   dividend_blocks_needed = 2 * BLOCKS_NEEDED (dividend_prec);
> where wi_pack stores to result (in_len + 1) / 2 limbs (or one more
> if (in_len & 1) == 0 and in_len / 2 < BLOCKS_NEEDED (dividend_prec)).
> In any case, that is never more than BLOCKS_NEEDED (dividend_prec) and
> matches caller's estimations (then it is canonicalized and perhaps shrunk
> that way).
> 
> The problematic case is the remainder, where wi_pack is called as
>   *remainder_len = wi_pack (remainder, b_remainder, n, dividend_prec);
> The last argument is again based on the dividend precision like for
> quotient, but n is instead:
>   n = divisor_blocks_needed;
>   while (n > 1 && b_divisor[n - 1] == 0)
>     n--;
> so based on the divisor rather than dividend.  Now, in the
> wi::multiple_of_p (1, -128, UNSIGNED) widest_int precision case,
> dividend_prec is very small, the dividend is 1 and while the official
> precision is 131072 bits, dividend_len is 1 and
>   if (sgn == SIGNED || dividend_val[dividend_len - 1] >= 0)
>     dividend_prec = MIN ((dividend_len + 1) * HOST_BITS_PER_WIDE_INT,
>                          dividend_prec);
> shrinks the estimate to 128 bits in this case; but divisor_prec
> is huge, because the divisor is negative UNSIGNED, so 131072 bits,
> 2048 limbs, 4096 half-limbs (so divisor_blocks_needed and n are 4096).
> Now, wi_pack relies on the penultimate argument to be smaller or equal
> to 2 * BLOCKS_NEEDED of the last argument and that is not the case here,
> 4096 is certainly larger than 2 * BLOCKS_NEEDED (128) aka 4 and so
> wi_pack overflows the array.
> Unfortunately looking around, this isn't the only overflow, already
> in divmod_internal_2 we have an overflow, though there not overflowing
> a buffer during writing, but during reading.  When divmod_internal_2
> is called with the last 2 arguments like m=1, n=4096 as in this case
> (but generally any where m < n), it has a special case for n == 1 (which
> won't be the problem, we never have m or n 0), then decides based on
> msb of divisor whether there should be some shift canonicalization, then
> performs a
>   for (j = m - n; j >= 0; j--)
> main loop and finally initializes b_remainder:
>   if (s)
>     for (i = 0; i < n; i++)
>       b_remainder[i] = (b_dividend[i] >> s)
>         | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s));
>   else
>     for (i = 0; i < n; i++)
>       b_remainder[i] = b_dividend[i];
> In the usual case of m >= n, the main loop will iterate at least once,
> b_dividend has m + 1 valid elements and the copying to b_remainder
> is correct.  But for the m < n unusual case, the main loop never iterates
> and we want to copy the b_dividend right into b_remainder (possibly with
> shifts).  Except when it copies n elements, it copies garbage which isn't
> there at all for the last n - m elements.
> Because the decision whether the main loop should iterate at all or not
> and the s computation should be based on the original n, the following
> patch sets n to MIN (n, m) only after the main loop, such that we copy
> to b_remainder at most what is initialized in b_dividend, and returns
> that adjusted value to the caller which then passes it to wi_pack and
> so satisfies again the requirement there, rather than trying to
> do the n = MIN (n, m) before calling divmod_internal_2 or after it.
> 
> I believe the bug is mostly something I've only introduced myself in the
> > 576 bit wide_int/widest_int support changes, before that there was
> no attempt to decrease the precisions and so I think dividend_prec
> and divisor_prec were pretty much always the same (or always?),
> and even the copying of garbage wasn't the case because the only time
> m or n was decreased from the same precision was if there were 0s in the
> upper half-limbs.
> 
> The 1st and 6th hunks in the patch is just that while looking at the
> code, I've noticed I computed wrong the sizes in the XALLOCAVEC case,
> somehow the 4 * in the static arrays led me believe I need to make the
> alloced buffers twice (in the multiplication) or 4 times (in the
> division/modulo cases) larger than needed.
> 
> Earlier version of the patch has been bootstrapped/regtested on x86_64-linux
> and i686-linux and due to forgotten adjustment of memset argument in 6th
> hunk regressed gcc.dg/bitint-3{8,9}.c tests, this version so far passed
> make check-gcc check-g++ -j32 -k GCC_TEST_RUN_EXPENSIVE=1 
> RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c 
> builtin-stdc-bit-*.c pr112566-2.c pr112511.c' dg-torture.exp=*bitint* 
> dfp.exp=*bitint*"
> 
> Ok for trunk if it passes full bootstrap/regtest again?
> And do you want to split those 2 hunks into a separate commit or not?


OK in one piece.

Thanks,
Richard.

> 2023-11-29  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR middle-end/112733
>       * wide-int.cc (wi::mul_internal): Don't allocate twice as much
>       space for u, v and r as needed.
>       (divmod_internal_2): Change return type from void to int, for n == 1
>       return 1, otherwise before writing b_dividend into b_remainder set
>       n to MIN (n, m) and at the end return it.
>       (wi::divmod_internal): Don't allocate 4 times as much space for
>       b_quotient, b_remainder, b_dividend and b_divisor.  Set n to
>       result of divmod_internal_2.
>       (wide_int_cc_tests): Add test for unsigned widest_int
>       wi::multiple_of_p of 1 and -128.
> 
> --- gcc/wide-int.cc.jj        2023-11-28 16:56:50.000000000 +0100
> +++ gcc/wide-int.cc   2023-11-29 10:53:14.329098216 +0100
> @@ -1477,10 +1477,10 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>    if (UNLIKELY (prec > WIDE_INT_MAX_INL_PRECISION))
>      {
>        unsigned HOST_HALF_WIDE_INT *buf
> -     = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * 4 * blocks_needed);
> +     = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * half_blocks_needed);
>        u = buf;
> -      v = u + 4 * blocks_needed;
> -      r = v + 4 * blocks_needed;
> +      v = u + half_blocks_needed;
> +      r = v + half_blocks_needed;
>      }
>  
>    /* We do unsigned mul and then correct it.  */
> @@ -1675,8 +1675,9 @@ wi::sub_large (HOST_WIDE_INT *val, const
>     Delight by Warren, which itself is a small modification of Knuth's
>     algorithm.  M is the number of significant elements of U however
>     there needs to be at least one extra element of B_DIVIDEND
> -   allocated, N is the number of elements of B_DIVISOR.  */
> -static void
> +   allocated, N is the number of elements of B_DIVISOR.
> +   Return new value for N.  */
> +static int
>  divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient,
>                  unsigned HOST_HALF_WIDE_INT *b_remainder,
>                  unsigned HOST_HALF_WIDE_INT *b_dividend,
> @@ -1707,7 +1708,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI
>                 * (unsigned HOST_WIDE_INT)b_divisor[0]));
>       }
>        b_remainder[0] = k;
> -      return;
> +      return 1;
>      }
>  
>    s = clz_hwi (b_divisor[n-1]) - HOST_BITS_PER_HALF_WIDE_INT; /* CHECK clz */
> @@ -1770,6 +1771,10 @@ divmod_internal_2 (unsigned HOST_HALF_WI
>         b_dividend[j+n] += k;
>       }
>      }
> +  /* If N > M, the main loop was skipped, quotient will be 0 and
> +     we can't copy more than M half-limbs into the remainder, as they
> +     aren't present in b_dividend (which has .  */
> +  n = MIN (n, m);
>    if (s)
>      for (i = 0; i < n; i++)
>        b_remainder[i] = (b_dividend[i] >> s)
> @@ -1777,6 +1782,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI
>    else
>      for (i = 0; i < n; i++)
>        b_remainder[i] = b_dividend[i];
> +  return n;
>  }
>  
>  
> @@ -1943,14 +1949,14 @@ wi::divmod_internal (HOST_WIDE_INT *quot
>      {
>        unsigned HOST_HALF_WIDE_INT *buf
>       = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT,
> -                   12 * dividend_blocks_needed
> -                   + 4 * divisor_blocks_needed + 1);
> +                   3 * dividend_blocks_needed + 1
> +                   + divisor_blocks_needed);
>        b_quotient = buf;
> -      b_remainder = b_quotient + 4 * dividend_blocks_needed;
> -      b_dividend = b_remainder + 4 * dividend_blocks_needed;
> -      b_divisor = b_dividend + 4 * dividend_blocks_needed + 1;
> +      b_remainder = b_quotient + dividend_blocks_needed;
> +      b_dividend = b_remainder + dividend_blocks_needed;
> +      b_divisor = b_dividend + dividend_blocks_needed + 1;
>        memset (b_quotient, 0,
> -           4 * dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT));
> +           dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT));
>      }
>    wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
>            dividend_blocks_needed, dividend_prec, UNSIGNED);
> @@ -1969,7 +1975,7 @@ wi::divmod_internal (HOST_WIDE_INT *quot
>    if (b_quotient == b_quotient_buf)
>      memset (b_quotient_buf, 0, sizeof (b_quotient_buf));
>  
> -  divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);
> +  n = divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, 
> n);
>  
>    unsigned int quotient_len = 0;
>    if (quotient)
> @@ -2673,6 +2679,9 @@ wide_int_cc_tests ()
>            wi::shifted_mask (0, 128, false, 128));
>    ASSERT_EQ (wi::mask (128, true, 128),
>            wi::shifted_mask (0, 128, true, 128));
> +  ASSERT_EQ (wi::multiple_of_p (from_int <widest_int> (1),
> +                             from_int <widest_int> (-128), UNSIGNED),
> +          false);
>  }
>  
>  } // namespace selftest
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] wide-int: Fix wide_int division/remainder [PR112733]

Reply via email to