Re: [wide-int] Handle more add and sub cases inline

2013-11-29 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 I would like to see some comment to the effect that this is to allow
 inlining for the common case for widest_int and offset_int without
 inlining the uncommon case for regular wide_int.

OK, how about:

  /* If the precision is known at compile time to be greater than
 HOST_BITS_PER_WIDE_INT, we can optimize the single-HWI case
 knowing that (a) all bits in those HWIs are significant and
 (b) the result has room for at least two HWIs.  This provides
 a fast path for things like offset_int and widest_int.

 The STATIC_CONSTANT_P test prevents this path from being
 used for wide_ints.  wide_ints with precisions greater than
 HOST_BITS_PER_WIDE_INT are relatively rare and there's not much
 point handling them inline.  */
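
(Illustration only, not part of the patch or the comment: a self-contained
sketch of the length/overflow test that comment describes, with uint64_t
standing in for unsigned HOST_WIDE_INT and 63 for HOST_BITS_PER_WIDE_INT - 1.
The helper names are invented here.)

  #include <cstdint>

  /* Blocks needed for x + y when x and y are single sign-extended
     64-bit blocks: 2 exactly when the signed addition overflows,
     i.e. when the result's sign differs from the sign of both
     operands; 1 otherwise.  */
  static unsigned int
  add_len_sketch (uint64_t x, uint64_t y)
  {
    uint64_t r = x + y;
    return 1 + (unsigned int) (((r ^ x) & (r ^ y)) >> 63);
  }

  /* When a second block is needed, it carries the sign that the low
     block lost: 0 if the low block now looks negative, all-ones if it
     now looks non-negative.  */
  static uint64_t
  high_block_sketch (uint64_t low)
  {
    return (int64_t) low < 0 ? 0 : (uint64_t) -1;
  }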

Thanks,
Richard



[wide-int] Handle more add and sub cases inline

2013-11-28 Thread Richard Sandiford
Currently add and sub have no fast path for offset_int and widest_int;
they just call the out-of-line version.  This patch handles the
single-HWI cases inline.  At least on x86_64, this only adds one branch
per call; the fast path itself is straight-line code.

On the same fold-const.ii testcase, this reduces the number of
add_large calls from 877507 to 42459.  It reduces the number of
sub_large calls from 25707 to 148.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h  2013-11-28 13:34:19.596839877 +
+++ gcc/wide-int.h  2013-11-28 16:08:11.387731775 +
@@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
       val[0] = xi.ulow () + yi.ulow ();
       result.set_len (1);
     }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+           && xi.len + yi.len == 2)
+    {
+      unsigned HOST_WIDE_INT xl = xi.ulow ();
+      unsigned HOST_WIDE_INT yl = yi.ulow ();
+      unsigned HOST_WIDE_INT resultl = xl + yl;
+      val[0] = resultl;
+      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+      result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
+                           >> (HOST_BITS_PER_WIDE_INT - 1)));
+    }
   else
     result.set_len (add_large (val, xi.val, xi.len,
                                yi.val, yi.len, precision,
@@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
       val[0] = xi.ulow () - yi.ulow ();
       result.set_len (1);
     }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+           && xi.len + yi.len == 2)
+    {
+      unsigned HOST_WIDE_INT xl = xi.ulow ();
+      unsigned HOST_WIDE_INT yl = yi.ulow ();
+      unsigned HOST_WIDE_INT resultl = xl - yl;
+      val[0] = resultl;
+      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+      result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
+                           >> (HOST_BITS_PER_WIDE_INT - 1)));
+    }
   else
     result.set_len (sub_large (val, xi.val, xi.len,
                                yi.val, yi.len, precision,
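
(Purely illustrative, not part of the patch: a throwaway self-check of the two
set_len expressions above, using int64_t/uint64_t in place of the HWI types
and GCC's __int128 extension as a wide reference.  All names below are
invented for the sketch.)

  #include <cassert>
  #include <cstdint>

  /* Predicted number of 64-bit blocks for x + y and x - y, mirroring
     the expressions in the patch.  */
  static unsigned int
  add_len (uint64_t x, uint64_t y)
  {
    uint64_t r = x + y;
    return 1 + (unsigned int) (((r ^ x) & (r ^ y)) >> 63);
  }

  static unsigned int
  sub_len (uint64_t x, uint64_t y)
  {
    uint64_t r = x - y;
    return 1 + (unsigned int) (((r ^ x) & (x ^ y)) >> 63);
  }

  int
  main ()
  {
    static const int64_t samples[] = {
      0, 1, -1, 42, -42, INT64_MAX, INT64_MIN, INT64_MAX - 1, INT64_MIN + 1
    };
    const unsigned int n = sizeof (samples) / sizeof (samples[0]);
    for (unsigned int i = 0; i < n; i++)
      for (unsigned int j = 0; j < n; j++)
        {
          /* The true result fits in one block iff it is representable
             as a signed 64-bit value.  */
          __int128 sum = (__int128) samples[i] + samples[j];
          __int128 diff = (__int128) samples[i] - samples[j];
          unsigned int want_add = (sum >= INT64_MIN && sum <= INT64_MAX) ? 1 : 2;
          unsigned int want_sub = (diff >= INT64_MIN && diff <= INT64_MAX) ? 1 : 2;
          assert (add_len ((uint64_t) samples[i], (uint64_t) samples[j])
                  == want_add);
          assert (sub_len ((uint64_t) samples[i], (uint64_t) samples[j])
                  == want_sub);
        }
    return 0;
  }

The only difference between the add and sub tests is the second operand of the
AND: addition needs a second block when the result's sign differs from both
inputs, subtraction when the inputs' signs differ and the result's sign
differs from the minuend.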



Re: [wide-int] Handle more add and sub cases inline

2013-11-28 Thread Kenneth Zadeck
I would like to see some comment to the effect that this is to allow
inlining for the common case for widest_int and offset_int without
inlining the uncommon case for regular wide_int.