On Wed, 29 Nov 2023, Jakub Jelinek wrote: > Hi! > > This patch fixes the other half of the PR, where we were crashing on the > UNSIGNED widest_int modulo of 1 % -128 (where the -128 in UNSIGNED is > 131072 bit 0xfffff...fffff80). > The patch actually has 2 independent parts, if you want, they could be > split out. > The fix for the divmod buffer overflow is in the 2nd-5th and 7th-8th hunks of > the patch. abs (remainder) <= min (abs (dividend), abs (divisor)) and the > wide-int.h callers estimate the remainder size to be the same as the > quotient size, i.e. for UNSIGNED dividend with msb set quotient (same as > remainder) precision based estimation, for SIGNED or dividend with msb > clear estimate based on divident len (+ 1). > For quotient (if any), wi_pack is called as > quotient_len = wi_pack (quotient, b_quotient, m, dividend_prec); > where m is > m = dividend_blocks_needed; > while (m > 1 && b_dividend[m - 1] == 0) > m--; > and > dividend_blocks_needed = 2 * BLOCKS_NEEDED (dividend_prec); > where wi_pack stores to result (in_len + 1) / 2 limbs (or one more > if (in_len & 1) == 0 and in_len / 2 < BLOCKS_NEEDED (dividend_prec)). > In any case, that is never more than BLOCKS_NEEDED (dividend_prec) and > matches caller's estimations (then it is canonicalized and perhaps shrunk > that way). > > The problematic case is the remainder, where wi_pack is called as > *remainder_len = wi_pack (remainder, b_remainder, n, dividend_prec); > The last argument is again based on the dividend precision like for > quotient, but n is instead: > n = divisor_blocks_needed; > while (n > 1 && b_divisor[n - 1] == 0) > n--; > so based on the divisor rather than dividend. Now, in the > wi::multiple_of_p (1, -128, UNSIGNED) widest_int precision case, > dividend_prec is very small, the dividend is 1 and while the official > precision is 131072 bits, dividend_len is 1 and > if (sgn == SIGNED || dividend_val[dividend_len - 1] >= 0) > dividend_prec = MIN ((dividend_len + 1) * HOST_BITS_PER_WIDE_INT, > dividend_prec); > shrinks the estimate to 128 bits in this case; but divisor_prec > is huge, because the divisor is negative UNSIGNED, so 131072 bits, > 2048 limbs, 4096 half-limbs (so divisor_blocks_needed and n are 4096). > Now, wi_pack relies on the penultimate argument to be smaller or equal > to 2 * BLOCKS_NEEDED of the last argument and that is not the case here, > 4096 is certainly larger than 2 * BLOCKS_NEEDED (128) aka 4 and so > wi_pack overflows the array. > Unfortunately looking around, this isn't the only overflow, already > in divmod_internal_2 we have an overflow, though there not overflowing > a buffer during writing, but during reading. When divmod_internal_2 > is called with the last 2 arguments like m=1, n=4096 as in this case > (but generally any where m < n), it has a special case for n == 1 (which > won't be the problem, we never have m or n 0), then decides based on > msb of divisor whether there should be some shift canonicalization, then > performs a > for (j = m - n; j >= 0; j--) > main loop and finally initializes b_remainder: > if (s) > for (i = 0; i < n; i++) > b_remainder[i] = (b_dividend[i] >> s) > | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s)); > else > for (i = 0; i < n; i++) > b_remainder[i] = b_dividend[i]; > In the usual case of m >= n, the main loop will iterate at least once, > b_dividend has m + 1 valid elements and the copying to b_remainder > is correct. But for the m < n unusual case, the main loop never iterates > and we want to copy the b_dividend right into b_remainder (possibly with > shifts). Except when it copies n elements, it copies garbage which isn't > there at all for the last n - m elements. > Because the decision whether the main loop should iterate at all or not > and the s computation should be based on the original n, the following > patch sets n to MIN (n, m) only after the main loop, such that we copy > to b_remainder at most what is initialized in b_dividend, and returns > that adjusted value to the caller which then passes it to wi_pack and > so satisfies again the requirement there, rather than trying to > do the n = MIN (n, m) before calling divmod_internal_2 or after it. > > I believe the bug is mostly something I've only introduced myself in the > > 576 bit wide_int/widest_int support changes, before that there was > no attempt to decrease the precisions and so I think dividend_prec > and divisor_prec were pretty much always the same (or always?), > and even the copying of garbage wasn't the case because the only time > m or n was decreased from the same precision was if there were 0s in the > upper half-limbs. > > The 1st and 6th hunks in the patch is just that while looking at the > code, I've noticed I computed wrong the sizes in the XALLOCAVEC case, > somehow the 4 * in the static arrays led me believe I need to make the > alloced buffers twice (in the multiplication) or 4 times (in the > division/modulo cases) larger than needed. > > Earlier version of the patch has been bootstrapped/regtested on x86_64-linux > and i686-linux and due to forgotten adjustment of memset argument in 6th > hunk regressed gcc.dg/bitint-3{8,9}.c tests, this version so far passed > make check-gcc check-g++ -j32 -k GCC_TEST_RUN_EXPENSIVE=1 > RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c > builtin-stdc-bit-*.c pr112566-2.c pr112511.c' dg-torture.exp=*bitint* > dfp.exp=*bitint*" > > Ok for trunk if it passes full bootstrap/regtest again? > And do you want to split those 2 hunks into a separate commit or not?
OK in one piece. Thanks, Richard. > 2023-11-29 Jakub Jelinek <ja...@redhat.com> > > PR middle-end/112733 > * wide-int.cc (wi::mul_internal): Don't allocate twice as much > space for u, v and r as needed. > (divmod_internal_2): Change return type from void to int, for n == 1 > return 1, otherwise before writing b_dividend into b_remainder set > n to MIN (n, m) and at the end return it. > (wi::divmod_internal): Don't allocate 4 times as much space for > b_quotient, b_remainder, b_dividend and b_divisor. Set n to > result of divmod_internal_2. > (wide_int_cc_tests): Add test for unsigned widest_int > wi::multiple_of_p of 1 and -128. > > --- gcc/wide-int.cc.jj 2023-11-28 16:56:50.000000000 +0100 > +++ gcc/wide-int.cc 2023-11-29 10:53:14.329098216 +0100 > @@ -1477,10 +1477,10 @@ wi::mul_internal (HOST_WIDE_INT *val, co > if (UNLIKELY (prec > WIDE_INT_MAX_INL_PRECISION)) > { > unsigned HOST_HALF_WIDE_INT *buf > - = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * 4 * blocks_needed); > + = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * half_blocks_needed); > u = buf; > - v = u + 4 * blocks_needed; > - r = v + 4 * blocks_needed; > + v = u + half_blocks_needed; > + r = v + half_blocks_needed; > } > > /* We do unsigned mul and then correct it. */ > @@ -1675,8 +1675,9 @@ wi::sub_large (HOST_WIDE_INT *val, const > Delight by Warren, which itself is a small modification of Knuth's > algorithm. M is the number of significant elements of U however > there needs to be at least one extra element of B_DIVIDEND > - allocated, N is the number of elements of B_DIVISOR. */ > -static void > + allocated, N is the number of elements of B_DIVISOR. > + Return new value for N. */ > +static int > divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient, > unsigned HOST_HALF_WIDE_INT *b_remainder, > unsigned HOST_HALF_WIDE_INT *b_dividend, > @@ -1707,7 +1708,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI > * (unsigned HOST_WIDE_INT)b_divisor[0])); > } > b_remainder[0] = k; > - return; > + return 1; > } > > s = clz_hwi (b_divisor[n-1]) - HOST_BITS_PER_HALF_WIDE_INT; /* CHECK clz */ > @@ -1770,6 +1771,10 @@ divmod_internal_2 (unsigned HOST_HALF_WI > b_dividend[j+n] += k; > } > } > + /* If N > M, the main loop was skipped, quotient will be 0 and > + we can't copy more than M half-limbs into the remainder, as they > + aren't present in b_dividend (which has . */ > + n = MIN (n, m); > if (s) > for (i = 0; i < n; i++) > b_remainder[i] = (b_dividend[i] >> s) > @@ -1777,6 +1782,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI > else > for (i = 0; i < n; i++) > b_remainder[i] = b_dividend[i]; > + return n; > } > > > @@ -1943,14 +1949,14 @@ wi::divmod_internal (HOST_WIDE_INT *quot > { > unsigned HOST_HALF_WIDE_INT *buf > = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, > - 12 * dividend_blocks_needed > - + 4 * divisor_blocks_needed + 1); > + 3 * dividend_blocks_needed + 1 > + + divisor_blocks_needed); > b_quotient = buf; > - b_remainder = b_quotient + 4 * dividend_blocks_needed; > - b_dividend = b_remainder + 4 * dividend_blocks_needed; > - b_divisor = b_dividend + 4 * dividend_blocks_needed + 1; > + b_remainder = b_quotient + dividend_blocks_needed; > + b_dividend = b_remainder + dividend_blocks_needed; > + b_divisor = b_dividend + dividend_blocks_needed + 1; > memset (b_quotient, 0, > - 4 * dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT)); > + dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT)); > } > wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (), > dividend_blocks_needed, dividend_prec, UNSIGNED); > @@ -1969,7 +1975,7 @@ wi::divmod_internal (HOST_WIDE_INT *quot > if (b_quotient == b_quotient_buf) > memset (b_quotient_buf, 0, sizeof (b_quotient_buf)); > > - divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n); > + n = divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, > n); > > unsigned int quotient_len = 0; > if (quotient) > @@ -2673,6 +2679,9 @@ wide_int_cc_tests () > wi::shifted_mask (0, 128, false, 128)); > ASSERT_EQ (wi::mask (128, true, 128), > wi::shifted_mask (0, 128, true, 128)); > + ASSERT_EQ (wi::multiple_of_p (from_int <widest_int> (1), > + from_int <widest_int> (-128), UNSIGNED), > + false); > } > > } // namespace selftest > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)