On Wed, Oct 31, 2012 at 2:54 PM, Kenneth Zadeck
<zad...@naturalbridge.com> wrote:
>
> On 10/31/2012 08:11 AM, Richard Biener wrote:
>>
>> On Wed, Oct 31, 2012 at 1:05 PM, Richard Sandiford
>> <rdsandif...@googlemail.com> wrote:
>>>
>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>
>>>> On Wed, Oct 31, 2012 at 11:43 AM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>>
>>>>> Richard Biener <richard.guent...@gmail.com> writes:
>>>>>>
>>>>>> On Thu, Oct 25, 2012 at 12:55 PM, Kenneth Zadeck
>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> On 10/25/2012 06:42 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 24, 2012 at 7:23 PM, Mike Stump <mikest...@comcast.net>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Oct 24, 2012, at 2:43 AM, Richard Biener
>>>>>>>>> <richard.guent...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 23, 2012 at 6:12 PM, Kenneth Zadeck
>>>>>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 10/23/2012 10:12 AM, Richard Biener wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +  HOST_WIDE_INT val[2 * MAX_BITSIZE_MODE_ANY_INT /
>>>>>>>>>>>> HOST_BITS_PER_WIDE_INT];
>>>>>>>>>>>>
>>>>>>>>>>>> are we sure this rounds properly?  Consider a port with max byte
>>>>>>>>>>>> mode
>>>>>>>>>>>> size 4 on a 64bit host.
>>>>>>>>>>>
>>>>>>>>>>> I do not believe that this can happen.  The core compiler
>>>>>>>>>>> includes all modes up to TI mode, so by default we already go
>>>>>>>>>>> up to 128 bits.
>>>>>>>>>>
>>>>>>>>>> And mode bitsizes are always power-of-two?  I suppose so.
>>>>>>>>>
>>>>>>>>> Actually, no, they are not.  Partial int modes can have bit sizes
>>>>>>>>> that
>>>>>>>>> are not power of two, and, if there isn't an int mode that is
>>>>>>>>> bigger, we'd
>>>>>>>>> want to round up the partial int bit size.  Something like ((2 *
>>>>>>>>> MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) /
>>>>>>>>> HOST_BITS_PER_WIDE_INT should do it.
>>>>>>>>>
>>>>>>>>>>>> I still would like to have the ability to provide
>>>>>>>>>>>> specializations of
>>>>>>>>>>>> wide_int
>>>>>>>>>>>> for "small" sizes, thus ideally wide_int would be a template
>>>>>>>>>>>> templated
>>>>>>>>>>>> on the number of HWIs in val.  Interface-wise wide_int<2> should
>>>>>>>>>>>> be
>>>>>>>>>>>> identical to double_int, thus we should be able to do
>>>>>>>>>>>>
>>>>>>>>>>>> typedef wide_int<2> double_int;
>>>>>>>>>>>
>>>>>>>>>>> If you want to go down this path after the patches get in, go for
>>>>>>>>>>> it.
>>>>>>>>>>> I
>>>>>>>>>>> see no use at all for this.
>>>>>>>>>>> This was not meant to be a plug-in replacement for double_int.
>>>>>>>>>>> The goal of
>>>>>>>>>>> this patch is to get the compiler to do the constant math the way
>>>>>>>>>>> that
>>>>>>>>>>> the
>>>>>>>>>>> target does it.   Any such instantiation is by definition placing
>>>>>>>>>>> some
>>>>>>>>>>> predefined limit that some target may not want.
>>>>>>>>>>
>>>>>>>>>> Well, what I don't really like is that we now have two
>>>>>>>>>> implementations
>>>>>>>>>> of functions that perform integer math on two-HWI sized integers.
>>>>>>>>>> What
>>>>>>>>>> I also don't like too much is that we have two different
>>>>>>>>>> interfaces to
>>>>>>>>>> operate
>>>>>>>>>> on them!  Can't you see how I come to not liking this?  Especially
>>>>>>>>>> the
>>>>>>>>>> latter …
>>>>>>>>>
>>>>>>>>> double_int is logically dead.  Refactoring wide-int and double-int
>>>>>>>>> is a
>>>>>>>>> waste of time, as the time is better spent removing double-int from
>>>>>>>>> the
>>>>>>>>> compiler.  All the necessary semantics and code of double-int _has_
>>>>>>>>> been
>>>>>>>>> refactored into wide-int already.  Changing wide-int in any way to
>>>>>>>>> vend
>>>>>>>>> anything to double-int is wrong, as once double-int is removed,
>>>>>>>>> then all the
>>>>>>>>> api changes to make double-int share from wide-int are wasted and
>>>>>>>>> must then
>>>>>>>>> be removed.  The path forward is the complete removal of
>>>>>>>>> double-int; it is
>>>>>>>>> wrong, has been wrong and always will be wrong, nothing can change
>>>>>>>>> that.
>>>>>>>>
>>>>>>>> double_int, compared to wide_int, is fast and lean.  I doubt we
>>>>>>>> will get rid of it - you will make compile-time math a _lot_
>>>>>>>> slower.  Just profile when you, for example, change
>>>>>>>> get_inner_reference to use wide_ints.
>>>>>>>>
>>>>>>>> To be able to remove double_int in favor of wide_int requires _at
>>>>>>>> least_
>>>>>>>> templating wide_int on 'len' and providing specializations for 1 and
>>>>>>>> 2.
>>>>>>>>
>>>>>>>> It might be a non-issue for math that operates on trees or RTXen due
>>>>>>>> to
>>>>>>>> the allocation overhead we pay, but in recent years we transitioned
>>>>>>>> important
>>>>>>>> paths away from using tree math to using double_ints _for speed
>>>>>>>> reasons_.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>
>>>>>>> i do not know why you believe this about the speed.  double_int
>>>>>>> always does synthetic math, since you do everything at 128-bit
>>>>>>> precision.
>>>>>>>
>>>>>>> the thing about wide int is that, since it does math to the
>>>>>>> precision's size, it almost never uses synthetic operations, since
>>>>>>> the sizes for almost every instance can be done using the native
>>>>>>> math on the machine.  almost every call has a check to see if the
>>>>>>> operation can be done natively.  I seriously doubt that you are
>>>>>>> going to do TI mode math much faster than i do, and if you do, who
>>>>>>> cares.
>>>>>>>
>>>>>>> the number of calls does not affect the performance in any negative
>>>>>>> way, and in fact it is more efficient, since common things that
>>>>>>> require more than one operation in double_int are typically done in
>>>>>>> a single operation.
>>>>>>
>>>>>> Simple double-int operations like
>>>>>>
>>>>>> inline double_int
>>>>>> double_int::and_not (double_int b) const
>>>>>> {
>>>>>>    double_int result;
>>>>>>    result.low = low & ~b.low;
>>>>>>    result.high = high & ~b.high;
>>>>>>    return result;
>>>>>> }
>>>>>>
>>>>>> are always going to be faster than conditionally executing only one
>>>>>> operation
>>>>>> (but inside an offline function).
>>>>>
>>>>> OK, this is really in reply to the 4.8 thing, but it felt more
>>>>> appropriate here.
>>>>>
>>>>> It's interesting that you gave this example, since before you were
>>>>> complaining about too many fused ops.  Clearly this one could be
>>>>> removed in favour of separate and() and not() operations, but why
>>>>> not provide a fused one if there are clients who'll make use of it?
>>>>
>>>> I was more concerned about fused operations that use precision
>>>> or bitsize as input.  That is for example
>>>>
>>>>>> +  bool only_sign_bit_p (unsigned int prec) const;
>>>>>> +  bool only_sign_bit_p () const;
>>>>
>>>> The first is construct a wide-int with precision prec (and sign- or
>>>> zero-extend it) and then call only_sign_bit_p on it.  Such function
>>>> should not be necessary and existing callers should be questioned
>>>> instead of introducing it.
>>>>
>>>> In fact wide-int seems to have so many "fused" operations that
>>>> we run out of sensible recognizable names for them.  Which results
>>>> in a lot of confusion on what the functions actually do (at least for
>>>> me).
>>>
>>> Well, I suppose I can't really say anything useful either way on
>>> that one, since I'm not writing the patch and I'm not reviewing it :-)
>>>
>>>>> I think Kenny's API is just taking that to its logical conclusion.
>>>>> There doesn't seem to be anything sacrosanct about the current choice
>>>>> of what's fused and what isn't.
>>>>
>>>> Maybe.  I'd rather have seen an initial small wide-int API and fused
>>>> operations introduced separately together with the places they are
>>>> used.  In the current way it's way too tedious to go over all of them
>>>> and match them with callers, lookup enough context and then
>>>> make up my mind on whether the caller should do something different or not.
>>>>
>>>> Thus, consider the big initial API a reason that all this review takes
>>>> so long ...
>>>>
>>>>> The speed problem we had using trees for internal arithmetic isn't
>>>>> IMO a good argument for keeping double_int in preference to wide_int.
>>>>> Allocating and constructing tree objects to hold temporary values,
>>>>> storing an integer representation in it, then calling tree arithmetic
>>>>> routines that pull out the integer representation again and create a
>>>>> tree to hold the result, is going to be much slower than using either
>>>>> double_int or wide_int.  I'd be very surprised if we notice any
>>>>> measurable difference between double_int and wide_int here.
>>>>>
>>>>> I still see no reason to keep double_int around.  The width of a host
>>>>> wide integer really shouldn't have any significance.
>>>>>
>>>>> Your main complaint seems to be that the wide_int API is different
>>>>> from the double_int one, but we can't literally use the same API, since
>>>>> double_int has an implicit precision and bitsize, and wide_int doesn't.
>>>>> Having a precision that is separate from the underlying representation
>>>>> is IMO the most important feature of wide_int, so:
>>>>>
>>>>>     typedef wide_int<2> double_int;
>>>>>
>>>>> is never going to be a drop-in, API-compatible replacement for
>>>>> double_int.
>>>>
>>>> My reasoning was that if you strip wide-int of precision and bitsize
>>>> you have a double_int<N> class.
>>>
>>> But you don't!  Because...
>>>
>>>> Thus wide-int should have a base
>>>> of that kind and just add precision / bitsize ontop of that.  It
>>>> wouldn't
>>>> be a step forward if we end up replacing double_int uses with
>>>> wide_int uses with precision of 2 * HOST_BITS_PER_WIDE_INT,
>>>> would it?
>>>
>>> ...the precision and bitsize isn't an optional extra, either conceptually
>>> or in implementation.  wide_int happens to use N HOST_WIDE_INTS under
>>> the hood, but the value of N is an internal implementation detail.
>>> No operations are done to N HWIs, they're done to the number of bits
>>> in the operands.  Whereas a double_int<N> class does everything to N
>>> HWIs.
>>
>> If that's the only effect then either bitsize or precision is redundant
>> (and
>> we also have len ...).  Note I did not mention len above, thus the base
>> class would retain 'len' and double-int would simply use 2 for it
>> (if you don't template it but make it variable).
>>
>> Richard.
>>
> NO, in your own words, there are two parts of the compiler that want the
> infinite model.   The rest wants to do the math the way the target does it.
> My version now accommodates both.    In tree vrp it scans the gimple and
> determines what the largest type is and that is the basis of all of the math
> in this pass.  If you just make double_int bigger, then you are paying for
> big math everywhere.

You have an artificial limit on what 'len' can be.  And you do not accommodate
users that do not want to pay the storage penalty for that arbitrary upper limit
choice.  That's all because 'len' may grow (mutate).  You could alternatively
not allow bitsize to grow / mutate and have allocation tied to bitsize instead
of len.

Ideally the wide-int interface would have two storage models:

class alloc_storage
{
  unsigned len; /* or bitsize */
  HOST_WIDE_INT *hwis;

  HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
};

class max_mode
{
  HOST_WIDE_INT hwis[largest integer mode size in hwi];

  HOST_WIDE_INT& operator[](unsigned i) { return hwis[i]; }
};

template <class storage>
class wide_int : storage
{
  ...
};

so you can even re-use the (const) in-place storage of INTEGER_CSTs
or RTL CONST_WIDEs.  And VRP would simply have its own storage
supporting 2 times the largest integer mode (or whatever choice it has).
And double-int would simply embed two.

Maybe this is the perfect example for introducing virtual functions as well
to ease inter-operability between the wide-int variants without making each
member a template on the 2nd wide-int operand (it's of course auto-deduced,
but well ...).

The above is just a brain-dump; details may need further thinking.
Take the operator[]: maybe the storage model should just be able
to return a pointer to the array of HWIs; this way the actual workers
can be out-of-line and non-templates.

Richard.

>
>>> Richard
>
>
