Kenneth Zadeck <zad...@naturalbridge.com> writes:
> I would like to see some comment to the effect that this is to allow
> inlining for the common case of widest_int and offset_int without
> inlining the uncommon case of regular wide_int.

OK, how about:

  /* If the precision is known at compile time to be greater than
     HOST_BITS_PER_WIDE_INT, we can optimize the single-HWI case
     knowing that (a) all bits in those HWIs are significant and
     (b) the result has room for at least two HWIs.  This provides
     a fast path for things like offset_int and widest_int.

     The STATIC_CONSTANT_P test prevents this path from being
     used for wide_ints.  wide_ints with precisions greater than
     HOST_BITS_PER_WIDE_INT are relatively rare and there's not much
     point handling them inline.  */
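
As an aside, here's a stand-alone sketch of what the add fast path
boils down to.  None of the names below (add_single_hwi, UHWI) are in
wide-int.h; they're made up for illustration, and a 64-bit HWI is
assumed:

  #include <cstdio>

  typedef long long HOST_WIDE_INT;
  typedef unsigned long long UHWI;
  static const int HOST_BITS_PER_WIDE_INT = 64;

  /* Hypothetical helper: add two sign-extended single-HWI operands
     whose logical precision is wider than one HWI, store the result
     in VAL and return the number of significant HWIs (1 or 2).  */
  static unsigned int
  add_single_hwi (UHWI *val, UHWI xl, UHWI yl)
  {
    UHWI resultl = xl + yl;
    val[0] = resultl;
    /* Only used when the result needs two HWIs: if the low HWI came
       out negative, the true value is positive and the high HWI is 0;
       otherwise the true value is negative and the high HWI is -1.  */
    val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
    /* Signed overflow iff the result's sign differs from the sign of
       both operands.  */
    return 1 + (((resultl ^ xl) & (resultl ^ yl))
                >> (HOST_BITS_PER_WIDE_INT - 1));
  }

  int
  main ()
  {
    UHWI val[2];
    /* 0x7fff...f + 1 overflows a single signed HWI, so len is 2 and
       the high HWI is 0.  */
    unsigned int len = add_single_hwi (val, 0x7fffffffffffffffULL, 1);
    printf ("len %u, high %llx, low %llx\n", len, val[1], val[0]);
    /* -1 + -1 = -2 still fits in one sign-extended HWI, so len is 1.  */
    len = add_single_hwi (val, -1ULL, -1ULL);
    printf ("len %u, low %llx\n", len, val[0]);
    return 0;
  }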

Thanks,
Richard

> On 11/28/2013 12:38 PM, Richard Sandiford wrote:
>> Currently add and sub have no fast path for offset_int and widest_int,
>> they just call the out-of-line version.  This patch handles the
>> single-HWI cases inline.  At least on x86_64, this only adds one branch
>> per call; the fast path itself is straight-line code.
>>
>> On the same fold-const.ii testcase, this reduces the number of
>> add_large calls from 877507 to 42459.  It reduces the number of
>> sub_large calls from 25707 to 148.
>>
>> Tested on x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> Index: gcc/wide-int.h
>> ===================================================================
>> --- gcc/wide-int.h   2013-11-28 13:34:19.596839877 +0000
>> +++ gcc/wide-int.h   2013-11-28 16:08:11.387731775 +0000
>> @@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
>>         val[0] = xi.ulow () + yi.ulow ();
>>         result.set_len (1);
>>       }
>> +  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
>> +       && xi.len + yi.len == 2)
>> +    {
>> +      unsigned HOST_WIDE_INT xl = xi.ulow ();
>> +      unsigned HOST_WIDE_INT yl = yi.ulow ();
>> +      unsigned HOST_WIDE_INT resultl = xl + yl;
>> +      val[0] = resultl;
>> +      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
>> +      result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
>> +                       >> (HOST_BITS_PER_WIDE_INT - 1)));
>> +    }
>>     else
>>       result.set_len (add_large (val, xi.val, xi.len,
>>                             yi.val, yi.len, precision,
>> @@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
>>         val[0] = xi.ulow () - yi.ulow ();
>>         result.set_len (1);
>>       }
>> +  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
>> +       && xi.len + yi.len == 2)
>> +    {
>> +      unsigned HOST_WIDE_INT xl = xi.ulow ();
>> +      unsigned HOST_WIDE_INT yl = yi.ulow ();
>> +      unsigned HOST_WIDE_INT resultl = xl - yl;
>> +      val[0] = resultl;
>> +      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
>> +      result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
>> +                       >> (HOST_BITS_PER_WIDE_INT - 1)));
>> +    }
>>     else
>>       result.set_len (sub_large (val, xi.val, xi.len,
>>                             yi.val, yi.len, precision,
>>
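
One more illustrative aside, not part of the patch: the sub fast path
tests (xl ^ yl) where the add path tests (resultl ^ yl), because signed
subtraction can only overflow when the operands have different signs
and the result's sign differs from the minuend's.  Using the same
made-up typedefs and 64-bit-HWI assumption as the sketch further up:

  #include <cstdio>

  typedef long long HOST_WIDE_INT;
  typedef unsigned long long UHWI;
  static const int HOST_BITS_PER_WIDE_INT = 64;

  /* Hypothetical helper mirroring the sub fast path: subtract two
     sign-extended single-HWI operands, store the result in VAL and
     return the number of significant HWIs (1 or 2).  */
  static unsigned int
  sub_single_hwi (UHWI *val, UHWI xl, UHWI yl)
  {
    UHWI resultl = xl - yl;
    val[0] = resultl;
    /* High HWI, only meaningful when the result needs two HWIs.  */
    val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
    /* Overflow iff the operand signs differ and the result's sign
       differs from xl's.  */
    return 1 + (((resultl ^ xl) & (xl ^ yl))
                >> (HOST_BITS_PER_WIDE_INT - 1));
  }

  int
  main ()
  {
    UHWI val[2];
    /* 0 - INT64_MIN = 2^63 doesn't fit in one signed HWI: len is 2
       and the high HWI is 0.  */
    unsigned int len = sub_single_hwi (val, 0, 0x8000000000000000ULL);
    printf ("len %u, high %llx, low %llx\n", len, val[1], val[0]);
    return 0;
  }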
