Kenneth Zadeck <zad...@naturalbridge.com> writes:
> On 12/14/2013 06:40 AM, Richard Sandiford wrote:
>> Hi Kenny,
>>
>> Sorry for the slow response.
>>
>> Kenneth Zadeck <zad...@naturalbridge.com> writes:
>>> Index: gcc/wide-int.cc
>>> ===================================================================
>>> --- gcc/wide-int.cc	(revision 205765)
>>> +++ gcc/wide-int.cc	(working copy)
>>> @@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>>>    unsigned int j;
>>>    unsigned int blocks_needed = BLOCKS_NEEDED (prec);
>>>    unsigned int half_blocks_needed = blocks_needed * 2;
>>> -  /* The sizes here are scaled to support a 2x largest mode by 2x
>>> -     largest mode yielding a 4x largest mode result.  This is what is
>>> -     needed by vpn.  */
>>>
>>> +  /* The sizes here are scaled to support a 2*largest mode + a little
>>> +     by 2*largest mode + a little yielding a 4*largest mode + a
>>> +     little.  This is what is needed by vpn.  */
>>>    unsigned HOST_HALF_WIDE_INT
>>> -    u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>>> +    u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
>>> +  unsigned int ulen;
>>>    unsigned HOST_HALF_WIDE_INT
>>> -    v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>>> -  /* The '2' in 'R' is because we are internally doing a full
>>> +    v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
>>> +  unsigned int vlen;
>>> +  unsigned int uvlen;
>>> +  unsigned int vallen;
>>> +
>>> +  /* The '4' in 'R' is because we are internally doing a wide
>>>       multiply.  */
>>>    unsigned HOST_HALF_WIDE_INT
>>> -    r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
>>> -  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
>>> +    r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
>>
>> While we're here I think we should convert these arrays into allocas,
>> so that we're not hard-coding the specialness of vpn in wide-int.cc.
>> I.e.
>> rather than having arrays of maximum size, use XALLOCAVEC to
>> allocate a specific number of stack HWIs, once we know how many
>> we actually need.  That includes the new arrays added in the patch.

> I had thought of doing this, but there is a part of me that has always
> thought of it as being more expensive.  In the scheme of things it is
> just noise here, though, so I will do it.
Thanks.

>>> +  /* True if the result needs to be negated.  */
>>> +  bool is_neg = false;
>>>
>>>    /* If the top level routine did not really pass in an overflow, then
>>>       just make sure that we never attempt to set it.  */
>>>    bool needs_overflow = (overflow != 0);
>>> +  const HOST_WIDE_INT zero = 0;
>>>    if (needs_overflow)
>>>      *overflow = false;
>>>
>>> @@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>>>        return 1;
>>>      }
>>>
>>> -  /* We do unsigned mul and then correct it.  */
>>> -  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
>>> -	     half_blocks_needed, prec, SIGNED);
>>> -  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
>>> -	     half_blocks_needed, prec, SIGNED);
>>> -
>>> -  /* The 2 is for a full mult.  */
>>> -  memset (r, 0, half_blocks_needed * 2
>>> -	  * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
>>> +  ulen = op1len * 2;
>>> +  vlen = op2len * 2;
>>> +
>>> +  if (sgn == SIGNED)
>>> +    {
>>> +      if (wi::neg_p (op1))
>>> +	{
>>> +	  HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
>>> +
>>
>> E.g. following on from the above:
>>
>>   /* Add an extra HWI so that we can represent the negative of
>>      the minimum.  */
>>   HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1);
>>
>>> +	  int nop1len
>>> +	    = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0);
>>
>> Not sure about the "prec + 1" here, since it could cause nop1len to be
>> greater than the maximum length for "prec".  And you go on to unpack
>> nop1 in "prec" rather than "prec + 1".

> the unpacking with prec is the wrong part.  That should have been
> prec + 1.

Hmm, I still think it's the opposite, because...

>> AIUI, once you've taken the absolutes, you've got two unsigned numbers
>> of precision "prec" and create a "prec * 2" result, so maybe:
>>   sub_large (..., prec, UNSIGNED, 0) instead?

> the signed and unsigned does not matter here.  the prec + 1 means we
> never overflow.
...my point is that after this we treat the number as unsigned, so it
isn't a question of whether the absolute value overflows the precision
as a signed number but whether it overflows as an unsigned number.
And it never will.  It might need 1 extra HWI than the original input
if op1len < blocks_needed, but it shouldn't need extra precision.

E.g. say for 8-bit HWIs we start out with a signed -0x80 * -0x80.
We convert that into an unsigned 0x80 * 0x80, which is still
precision 8 * precision 8 -> precision 16.  And we'd need to be able
to handle those precision-8 values correctly (with no extra zero HWI)
if we had an unsigned 0x80 * 0x80 from the outset.

>>> +	  is_neg = !is_neg;
>>> +	  ulen = nop1len * 2;
>>> +	  wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
>>> +		     ulen, prec, SIGNED);
>>
>> Back on the alloca theme, we'd have:
>>
>>   u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);
>>
>> before this and other unpacks.
>>
>> Note that there's supposed to be a space after ")" in casts.  Lots of
>> instances in the patch :-)
>>
>>> +  /* UNSIGNED mul.  */
>>> +  /* For large positive integers inside regular wide-ints, we must
>>> +     explictly expand out the 1s to full width for the mul to be
>>> +     correct.  Note that the top bit will never be set for a
>>> +     unsigned number in offset_int of widest-int.  */
>>
>> s/of/or/.  But I think the comment is confusing because the unsignedness
>> is a property of the multiplication rather than the inputs.  We should never
>> be doing an unsigned operation on widest_int to begin with.  And if
>> offset_ints are being treated as having a sign then the same is true there.
>>
>> If we ever do have an unsigned multiplication on those types for some reason
>> then -1 will (rightly) be treated as the maximum unsigned value.

> The real question here is "do you want to do an assert or do you want to
> explain what happens if the input comes in that way?"

Sorry, to be clear, I was only talking about the "Note..."
part of the comment rather than the full thing.  I think the first part
of the comment and the code itself is fine.

> The current world is actually structured so that we never ask about
> overflow for the two larger classes, because the reason that you used
> those classes was that you never wanted to have this discussion.  So
> if you never ask about overflow, then it really does not matter,
> because we are not going to return enough bits for you to care what
> happened on the inside.  Of course that could change and someone
> could say that they wanted overflow on widest-int.  Then the comment
> makes sense, with revisions, unless your review of the code that
> wants overflow on widest int suggests that they are just being
> stupid.

But widest_int is now supposed to be at least 1 bit wider than the
widest input type (unlike previously, where it was double the widest
input type).  So I can definitely see cases where we'd want to know
whether a widest_int * widest_int result overflows.

My point is that a widest_int * widest_int multiplication would
normally be a signed multiplication rather than an unsigned
multiplication, since the extra 1 bit of precision allows every
operation to be signed.  So it isn't a case of whether the top bit of
a widest_int will be set, but whether we ever reach here for
widest_int in the first place.  We should be going down the
sgn == SIGNED path rather than the sgn == UNSIGNED path.

widest_int can represent an all-1s value, usually interpreted as -1.
If we do go down this sgn == UNSIGNED path for widest_int then we will
instead treat the all-1s value as the maximum unsigned number, just
like for any other kind of wide int.

As far as this function goes there really is no difference between
wide_int, offset_int and widest_int.  Which is good, because
offset_int and widest_int should just be wide_ints that are optimised
for a specific and fixed precision.

Thanks,
Richard