On 12/14/2013 06:40 AM, Richard Sandiford wrote:
Hi Kenny,

Sorry for the slow response.

Kenneth Zadeck <zad...@naturalbridge.com> writes:
Index: gcc/wide-int.cc
===================================================================
--- gcc/wide-int.cc     (revision 205765)
+++ gcc/wide-int.cc     (working copy)
@@ -1275,23 +1275,31 @@ wi::mul_internal (HOST_WIDE_INT *val, co
    unsigned int j;
    unsigned int blocks_needed = BLOCKS_NEEDED (prec);
    unsigned int half_blocks_needed = blocks_needed * 2;
-  /* The sizes here are scaled to support a 2x largest mode by 2x
-     largest mode yielding a 4x largest mode result.  This is what is
-     needed by vpn.  */
+ /* The sizes here are scaled to support a 2*largest mode + a little
+     by 2*largest mode + a little yielding a 4*largest mode + a
+     little.  This is what is needed by vpn.  */
    unsigned HOST_HALF_WIDE_INT
-    u[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
+    u[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned int ulen;
    unsigned HOST_HALF_WIDE_INT
-    v[4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
-  /* The '2' in 'R' is because we are internally doing a full
+    v[2 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
+  unsigned int vlen;
+  unsigned int uvlen;
+  unsigned int vallen;
+
+  /* The '4' in 'R' is because we are internally doing a wide
       multiply.  */
    unsigned HOST_HALF_WIDE_INT
-    r[2 * 4 * MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_HALF_WIDE_INT];
-  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
+    r[4 * WIDE_INT_MAX_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
While we're here I think we should convert these arrays into allocas,
so that we're not hard-coding the specialness of vpn in wide-int.cc.
I.e. rather than having arrays of maximum size, use XALLOCAVEC to
allocate a specific number of stack HWIs, once we know how many
we actually need.  That includes the new arrays added in the patch.
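I.e. something along these lines at the top of the function (only a sketch;
the exact lengths fall out of the sign handling further down):

   unsigned HOST_HALF_WIDE_INT *u, *v, *r;
   unsigned int ulen, vlen, uvlen;
   ...
   /* Once the operand lengths are known:  */
   u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);
   v = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, vlen);
   ...
   r = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, uvlen);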
I had thought of doing this, but part of me has always assumed it would be more expensive.  In the scheme of things it is just noise here, though, so I will do it.

+  /* True if the result needs to be negated.  */
+  bool is_neg = false;
/* If the top level routine did not really pass in an overflow, then
       just make sure that we never attempt to set it.  */
    bool needs_overflow = (overflow != 0);
+  const HOST_WIDE_INT zero = 0;
    if (needs_overflow)
      *overflow = false;
@@ -1382,61 +1392,96 @@ wi::mul_internal (HOST_WIDE_INT *val, co
        return 1;
      }
- /* We do unsigned mul and then correct it. */
-  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
-            half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
-            half_blocks_needed, prec, SIGNED);
-
-  /* The 2 is for a full mult.  */
-  memset (r, 0, half_blocks_needed * 2
-         * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
+  ulen = op1len * 2;
+  vlen = op2len * 2;
+
+  if (sgn == SIGNED)
+    {
+      if (wi::neg_p (op1))
+       {
+         HOST_WIDE_INT nop1[2 * WIDE_INT_MAX_ELTS];
+       
E.g. following on from the above:

          /* Add an extra HWI so that we can represent the negative of
             the minimum.  */
          HOST_WIDE_INT *nop1 = XALLOCAVEC (HOST_WIDE_INT, op1len + 1);

+         int nop1len
+           = wi::sub_large (nop1, &zero, 1, op1val, op1len, prec + 1, SIGNED, 0);
Not sure about the "prec + 1" here, since it could cause nop1len to be
greater than the maximum length for "prec".  And you go on to unpack
nop1 in "prec" rather than "prec + 1".
The unpacking with prec is the wrong part; it should have been prec + 1.

AIUI, once you've taken the absolutes, you've got two unsigned numbers
of precision "prec" and create a "prec * 2" result, so maybe:
sub_large (..., prec, UNSIGNED, 0) instead?

The signed vs. unsigned distinction does not matter here; the prec + 1 means we never overflow.
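Concretely, assuming a 64-bit HWI, the case the extra bit guards against is
negating the minimum:

          /* prec == 64, op1val[0] == -0x8000000000000000: the negation,
             0x8000000000000000, needs 65 bits as a signed value, so the
             subtraction done with prec + 1 can never overflow.  */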
+         is_neg = !is_neg;
+         ulen = nop1len * 2;
+         wi_unpack (u, (const unsigned HOST_WIDE_INT*)nop1, nop1len,
+                    ulen, prec, SIGNED);
Back on the alloca theme, we'd have:

          u = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, ulen);

before this and other unpacks.

Note that there's supposed to be space after ")" in casts.  Lots of
instances in the patch :-)

+      /* UNSIGNED mul. */
+      /* For large positive integers inside regular wide-ints, we must
+        explictly expand out the 1s to full width for the mul to be
+        correct. Note that the top bit will never be set for a
+        unsigned number in offset_int of widest-int.  */
s/of/or/.  But I think the comment is confusing because the unsignedness
is a property of the multiplication rather than the inputs.  We should never
be doing an unsigned operation on widest_int to begin with.  And if
offset_ints are being treated as having a sign then the same is true there.

If we ever do have an unsigned multiplication on those types for some reason
then -1 will (rightly) be treated as the maximum unsigned value.
The real question here is: do you want an assert, or do you want to explain what happens if the input comes in that way?  The current world is structured so that we never ask about overflow for the two larger classes, because the whole reason you use those classes is that you never want to have this discussion.  And if you never ask about overflow, it really does not matter, since we are not going to return enough bits for you to care what happened on the inside.  Of course that could change, and someone could decide they want overflow on widest-int; then the comment makes sense, with revisions, unless your review of the code that wants overflow on widest-int suggests that they are just being stupid.


Maybe easiest just to drop the note?  Or if you don't like that then maybe:

    Note that this case is typically only used for variable-precision
    wide_ints.
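
For what it's worth, an illustrative case of what that note is describing
(made-up values, assuming a 64-bit HWI):

   /* prec == 128, op1len == 1, op1val[0] == -1: as an unsigned value this
      is 2^128 - 1, so the unpack has to produce half_blocks_needed (four)
      all-ones half-words rather than just op1len * 2 (two).  */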

+      if (wi::neg_p (op1))
+       ulen = half_blocks_needed;
+      if (wi::neg_p (op2))
+       vlen = half_blocks_needed;
+
+      wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1val, op1len,
+                ulen, prec, SIGNED);
+
+      wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2val, op2len,
+                vlen, prec, SIGNED);
+    }
- for (j = 0; j < half_blocks_needed; j++)
+  uvlen = ulen + vlen;
The alloca change here would be:

   r = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, uvlen);

+  memset (r, 0, uvlen * HOST_BITS_PER_HALF_WIDE_INT / CHAR_BIT);
+
+  /* Do the actually multiply.  */
+  for (j = 0; j < vlen; j++)
      {
        k = 0;
-      for (i = 0; i < half_blocks_needed; i++)
+      for (i = 0; i < ulen; i++)
        {
          t = ((unsigned HOST_WIDE_INT)u[i] * (unsigned HOST_WIDE_INT)v[j]
               + r[i + j] + k);
          r[i + j] = t & HALF_INT_MASK;
          k = t >> HOST_BITS_PER_HALF_WIDE_INT;
        }
-      r[j + half_blocks_needed] = k;
+      r[j + ulen] = k;
      }
- /* We did unsigned math above. For signed we must adjust the
-     product (assuming we need to see that).  */
-  if (sgn == SIGNED && (high || needs_overflow))
+  /* We did unsigned math above.  For signed we may need to adjust the
+     product if exactly 1 of the operands was negative.  */
+  if (is_neg)
      {
-      unsigned HOST_WIDE_INT b;
-      if (wi::neg_p (op1))
+      t = (unsigned HOST_WIDE_INT)(~r[0]) + 1;
r[0] ^ HALF_INT_MASK is more portable than ~r[0] here.  If HOST_WIDE_INT
is int then HOST_HALF_WIDE_INT is short, and the short will be promoted
to int before being inverted.  E.g. with ~, a half-int of 0x0000 would
become 0xffffffff rather than 0x0000ffff.

(Same for the ~ inside the loop of course.)
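A self-contained illustration of the promotion, assuming a (hypothetical)
host where HOST_WIDE_INT is int and so HOST_HALF_WIDE_INT is short:

   unsigned short h = 0;
   unsigned int a = ~h;          /* h promotes to int; result is 0xffffffff.  */
   unsigned int b = h ^ 0xffff;  /* stays a half-int value: 0x0000ffff.  */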

+      const HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
Long line.  But it looks like this is just HALF_INT_MASK.

@@ -1459,15 +1504,42 @@ wi::mul_internal (HOST_WIDE_INT *val, co
      {
        /* compute [prec] <- ([prec] * [prec]) >> [prec] */
        wi_pack ((unsigned HOST_WIDE_INT *) val,
-              &r[half_blocks_needed], half_blocks_needed);
-      return canonize (val, blocks_needed, prec);
+              r, uvlen);
This wi_pack now fits more naturally on a single line.

+      vallen = canonize (val, (uvlen + 1) >> 1, prec);
+
+      /* Shift is not always safe to write over one of the
+        operands, so we must copy.  */
+      HOST_WIDE_INT tval[2 * WIDE_INT_MAX_ELTS];
+      memcpy (tval, val, vallen * CHAR_BIT / HOST_BITS_PER_WIDE_INT);
vallen * sizeof (HOST_WIDE_INT) would be more typical.
But why not unpack into tval directly and avoid the copy?
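I.e. presumably something like this (untested, and using the alloca scheme
from above):

   /* Pack and canonize straight into the temporary, so the shift below
      can write into val without clobbering its own input.  */
   HOST_WIDE_INT *tval = XALLOCAVEC (HOST_WIDE_INT, (uvlen + 1) >> 1);
   wi_pack ((unsigned HOST_WIDE_INT *) tval, r, uvlen);
   vallen = canonize (tval, (uvlen + 1) >> 1, prec);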

+#if 0
+  static int cnt = 0;
+  printf ("cnt=%d op1=", cnt++);
+  for (i = 0; i < op1len; i++)
+    printf ("0x%lx ", op1[i]);
+
+  printf ("  op2=");
+  for (i = 0; i < op2len; i++)
+    printf ("0x%lx ", op2[i]);
+  printf ("  result=");
+  for (i = 0; i < vallen; i++)
+    printf ("0x%lx ", val[i]);
+  if (needs_overflow && *overflow)
+    printf ("  OVERFLOW");
+  printf ("\n");
+#endif
I had forgotten to remove this from the patch, sorry.
I think this should either be removed or put under:

#define DEBUG_MUL 0

   if (DEBUG_MUL)
     {
     }

protection, so that at least it gets compile testing while disabled.
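
I.e. something along these lines (a sketch only; I've written the dump in
terms of the op1val/op2val arrays rather than the op1/op2 wrappers used in
the disabled block):

#define DEBUG_MUL 0

  if (DEBUG_MUL)
    {
      static int cnt = 0;
      printf ("cnt=%d op1=", cnt++);
      for (i = 0; i < op1len; i++)
        printf ("0x%lx ", op1val[i]);
      printf ("  op2=");
      for (i = 0; i < op2len; i++)
        printf ("0x%lx ", op2val[i]);
      printf ("  result=");
      for (i = 0; i < vallen; i++)
        printf ("0x%lx ", val[i]);
      if (needs_overflow && *overflow)
        printf ("  OVERFLOW");
      printf ("\n");
    }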

Thanks,
Richard
