Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

Richard Biener Tue, 27 Nov 2012 02:04:10 -0800

On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck
<zad...@naturalbridge.com> wrote:
> Richard,
>
> I spent a good part of the afternoon talking to Mike about this.  He is on
> the c++ standards committee and is a much more seasoned c++ programmer than
> I am.
>
> He convinced me that with a large amount of engineering and c++
> "foolishness" that it was indeed possible to get your proposal to POSSIBLY
> work as well as what we did.
>
> But now the question is why would any want to do this?
>
> At the very least you are talking about instantiating two instances of
> wide-ints, one for the stack allocated uses and one for the places where we
> just move a pointer from the tree or the rtx. Then you are talking about
> creating connectors so that the stack allocated functions can take
> parameters of pointer version and visa versa.
>
> Then there is the issue that rather than just saying that something is a
> wide int, that the programmer is going to have to track it's origin.   In
> particular,  where in the code right now i say.
>
> wide_int foo = wide_int::from_rtx (r1);
> wide_int bar = wide_int::from_rtx (r2) + foo;
>
> now i would have to say
>
> wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
> wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;


No, you'd say

wide_int foo = wide_int::from_rtx (r1);

and the static, non-templated from_rtx method would automagically
return (always!) a "wide_int_ptr" kind.  The initialization then would
use the assignment operator that mediates between wide_int and
"wide_int_ptr", doing the copying.

The user should get a 'stack' kind by default when specifying wide_int,
like implemented with

struct wide_int_storage_stack;
struct wide_int_storage_ptr;

template <class storage = wide_int_storage_stack>
class wide_int : public storage
{
...
   static wide_int <wide_int_storage_ptr> from_rtx (rtx);
}

the whole point of the exercise is to make from_rtx and from_tree avoid
the copying (and excessive stack space allocation) for the rvalue case
like in

 wide_int res = wide_int::from_rtx (x) + 1;

if you save the result into a wide_int temporary first then you are lost
of course (modulo some magic GCC optimization being able to elide
the copy somehow).

And of course for code like VRP that keeps a lattice of wide_ints to
be able to reduce its footprint by using ptr storage and explicit allocations
(that's a secondary concern, of course).  And for VRP to specify that
it needs more than the otherwise needed MAX_INT_MODE_SIZE.
ptr storage would not have this arbitrary limitation, only embedded
storage (should) have.

> then when i want to call some function using a wide_int ref that function
> now must be either overloaded to take both or i have to choose one of the
> two instantiations (presumably based on which is going to be more common)
> and just have the compiler fix up everything (which it is likely to do).

Nope, they'd be

class wide_int ...
{
   template <class storage1, class storage2>
   wide_int operator+(wide_int <storage1> a, wide_int<storage2> b)
   {
      return wide_int::plus_worker (a.precision, a. ...., a.get_storage_ptr (),
                                                b.precision, ...,
b.get_storage_ptr ());
   }


> And so what is the payoff:
> 1) No one except the c++ elite is going to understand the code. The rest of
> the community will hate me and curse the ground that i walk on.

Maybe for the implementation - but look at hash-table and vec ... not for
usage certainly.

> 2) I will end up with a version of wide-int that can be used as a medium
> life container (where i define medium life as not allowed to survive a gc
> since they will contain pointers into rtxes and trees.)
> 3) An no clients that actually wanted to do this!!    I could use as an
> example one of your favorite passes, tree-vrp.   The current double-int
> could have been a medium lifetime container since it has a smaller
> footprint, but in fact tree-vrp converts those double-ints back into trees
> for medium storage.   Why, because it needs the other fields of a tree-cst
> to store the entire state.  Wide-ints also "suffer" this problem.  their
> only state are the data, and the three length fields.   They have no type
> and none of the other tree info so the most obvious client for a medium
> lifetime object is really not going to be a good match even if you "solve
> the storage problem".
>
> The fact is that wide-ints are an excellent short term storage class that
> can be very quickly converted into our two long term storage classes.  Your
> proposal is requires a lot of work, will not be easy to use and as far as i
> can see has no payoff on the horizon.   It could be that there could be
> future clients for a medium lifetime value, but asking for this with no
> clients in hand is really beyond the scope of a reasonable review.
>
> I remind you that the purpose of these patches is to solve problems that
> exist in the current compiler that we have papered over for years.   If
> someone needs wide-ints in some way that is not foreseen then they can
> change it.

The patches introduce a lot more temporary wide-ints (your words) and
at the same time makes construction of them from tree / rtx very expensive
both stack space and compile-time wise.  Look at how we for example
compute TREE_INT_CST + 1 - int_cst_binop internally uses double_ints
for the computation and then instantiates a new tree for holding the result.
Now we'd use wide_ints for this requring totally unnecessary copying.
Why not in the first place try to avoid that.  And try to avoid making
wide_ints 4 times as large as really necessary just for the sake of VRP!
(VRP should have a way to say "_I_ want larger wide_ints", without putting
this burden on all other users).

Richard.

> kenny
>
>
> On 11/26/2012 11:30 AM, Richard Biener wrote:
>>
>> On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
>> <zad...@naturalbridge.com> wrote:
>>>
>>> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>>>
>>>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck
>>>> <zad...@naturalbridge.com>
>>>> wrote:
>>>>>
>>>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>>>> <rdsandif...@googlemail.com> wrote:
>>>>>>>
>>>>>>> Kenneth Zadeck <zad...@naturalbridge.com> writes:
>>>>>>>>
>>>>>>>> I would like you to respond to at least point 1 of this email.   In
>>>>>>>> it
>>>>>>>> there is code from the rtl level that was written twice, once for
>>>>>>>> the
>>>>>>>> case when the size of the mode is less than the size of a HWI and
>>>>>>>> once
>>>>>>>> for the case where the size of the mode is less that 2 HWIs.
>>>>>>>>
>>>>>>>> my patch changes this to one instance of the code that works no
>>>>>>>> matter
>>>>>>>> how large the data passed to it is.
>>>>>>>>
>>>>>>>> you have made a specific requirement for wide int to be a template
>>>>>>>> that
>>>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWI.
>>>>>>>> I
>>>>>>>> would like to know how this particular fragment is to be rewritten
>>>>>>>> in
>>>>>>>> this model?   It seems that I would have to retain the structure
>>>>>>>> where
>>>>>>>> there is one version of the code for each size that the template is
>>>>>>>> instantiated.
>>>>>>>
>>>>>>> I think richi's argument was that wide_int should be split into two.
>>>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>>>> and the main wide_int class should be an extension on top of that
>>>>>>> that does things to a bit precision instead.  Presumably with some
>>>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>>>
>>>>>>>      typedef foo<2> double_int;
>>>>>>>
>>>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>>>> the number of significant HWIs rather than the size of the underlying
>>>>>>> array).  wide_int would also record the precision and apply it after
>>>>>>> the full HWI operation.
>>>>>>>
>>>>>>> So the wide_int class would still provide "as wide as we need"
>>>>>>> arithmetic,
>>>>>>> as in your rtl patch.  I don't think he was objecting to that.
>>>>>>
>>>>>> That summarizes one part of my complaints / suggestions correctly.  In
>>>>>> other
>>>>>> mails I suggested to not make it a template but a constant over object
>>>>>> lifetime
>>>>>> 'bitsize' (or maxlen) field.  Both suggestions likely require more
>>>>>> thought
>>>>>> than
>>>>>> I put into them.  The main reason is that with C++ you can abstract
>>>>>> from
>>>>>> where
>>>>>> wide-int information pieces are stored and thus use the arithmetic /
>>>>>> operation
>>>>>> workers without copying the (source) "wide-int" objects.  Thus you
>>>>>> should
>>>>>> be able to write adaptors for double-int storage, tree or RTX storage.
>>>>>
>>>>> We had considered something along these lines and rejected it.   I am
>>>>> not
>>>>> really opposed to doing something like this, but it is not an obvious
>>>>> winning idea and is likely not to be a good idea.   Here was our
>>>>> thought
>>>>> process:
>>>>>
>>>>> if you abstract away the storage inside a wide int, then you should be
>>>>> able
>>>>> to copy a pointer to the block of data from either the rtl level
>>>>> integer
>>>>> constant or the tree level one into the wide int.   It is certainly
>>>>> true
>>>>> that making a wide_int from one of these is an extremely common
>>>>> operation
>>>>> and doing this would avoid those copies.
>>>>>
>>>>> However, this causes two problems:
>>>>> 1)  Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>>>> make
>>>>> the object.   it created the base object and then it allocated the
>>>>> array.
>>>>> Richard S noticed that we could just allocate one CONST_WIDE_INT that
>>>>> had
>>>>> the array in it.   Doing it this way saves one ggc allocation and one
>>>>> indirection when accessing the data within the CONST_WIDE_INT.   Our
>>>>> plan
>>>>> is
>>>>> to use the same trick at the tree level.   So to avoid the copying, you
>>>>> seem
>>>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
>>>>
>>>> I did not propose having a pointer to the data in the RTX or tree int.
>>>> Just
>>>> the short-lived wide-ints (which are on the stack) would have a pointer
>>>> to
>>>> the data - which can then obviously point into the RTX and tree data.
>>>
>>> There is the issue then what if some wide-ints are not short lived. It
>>> makes
>>> me nervous to create internal pointers to gc ed memory.
>>
>> I thought they were all short-lived.
>>
>>>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>>>> they
>>>>> are created as part of an expression or you have to play some game to
>>>>> represent the two different storage plans inside of wide_int.
>>>>
>>>> Hm?  wide-ints are short-lived and thus never live across a garbage
>>>> collection
>>>> point.  We create non-GCed objects pointing to GCed objects all the time
>>>> and everywhere this way.
>>>
>>> Again, this makes me nervous but it could be done.  However, it does mean
>>> that now the wide ints that are not created from rtxes or trees will be
>>> more
>>> expensive because they are not going to get their storage "for free",
>>> they
>>> are going to alloca it.
>>
>> No, those would simply use the embedded storage model.
>>
>>> however, it still is not clear, given that 99% of the wide ints are going
>>> to
>>> fit in a single hwi, that this would be a noticeable win.
>>
>> Currently even if they fit into a HWI you will still allocate 4 times the
>> larges integer mode size.  You say that doesn't matter because they
>> are short-lived, but I say it does matter because not all of them are
>> short-lived enough.  If 99% fit in a HWI why allocate 4 times the
>> largest integer mode size in 99% of the cases?
>>
>>>>>     Clearly this
>>>>> is where you think that we should be going by suggesting that we
>>>>> abstract
>>>>> away the internal storage.   However, this comes at a price:   what is
>>>>> currently an array access in my patches would (i believe) become a
>>>>> function
>>>>> call.
>>>>
>>>> No, the workers (that perform the array accesses) will simply get
>>>> a pointer to the first data element.  Then whether it's embedded or
>>>> external is of no interest to them.
>>>
>>> so is your plan that the wide int constructors from rtx or tree would
>>> just
>>> copy the pointer to the array on top of the array that is otherwise
>>> allocated on the stack?    I can easily do this.   But as i said, the
>>> gain
>>> seems quite small.
>>>
>>> And of course, going the other way still does need the copy.
>>
>> The proposal was to template wide_int on a storage model, the embedded
>> one would work as-is (embedding 4 times largest integer mode), the
>> external one would have a pointer to data.  All functions that return a
>> wide_int produce a wide_int with the embedded model.  To avoid
>> the function call penalty you described the storage model provides
>> a way to get a pointer to the first element and the templated operations
>> simply dispatch to a worker that takes this pointer to the first element
>> (as the storage model is designed as a template its abstraction is going
>> to be optimized away by means of inlining).
>>
>> Richard.
>>
>>>>>    From a performance point of view, i believe that this is a non
>>>>> starter. If you can figure out how to design this so that it is not a
>>>>> function call, i would consider this a viable option.
>>>>>
>>>>> On the other side of this you are clearly correct that we are copying
>>>>> the
>>>>> data when we are making wide ints from INT_CSTs or CONST_WIDE_INTs.
>>>>> But
>>>>> this is why we represent data inside of the wide_ints, the INT_CSTs and
>>>>> the
>>>>> CONST_WIDE_INTs in a compressed form.   Even with very big types, which
>>>>> are
>>>>> generally rare, the constants them selves are very small.   So the copy
>>>>> operation is a loop that almost always copies one element, even with
>>>>> tree-vrp which doubles the sizes of every type.
>>>>>
>>>>> There is the third option which is that the storage inside the wide int
>>>>> is
>>>>> just ggced storage.  We rejected this because of the functional nature
>>>>> of
>>>>> wide-ints.    There are zillions created, they can be stack allocated,
>>>>> and
>>>>> they last for very short periods of time.
>>>>
>>>> Of course - GCing wide-ints is a non-starter.
>>>>
>>>>>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>>>>>> unnecessary
>>>>>>> complication without any clear use.  Especially since the number of
>>>>>>
>>>>>> Maybe the double_int typedef is without any clear use.  Properly
>>>>>> abstracting from the storage / information providers will save
>>>>>> compile-time, memory and code though.  I don't see that any thought
>>>>>> was spent on how to avoid excessive copying or dealing with
>>>>>> long(er)-lived objects and their storage needs.
>>>>>
>>>>> I actually disagree.    Wide ints can use a bloated amount of storage
>>>>> because they are designed to be very short lived and very low cost
>>>>> objects
>>>>> that are stack allocated.   For long term storage, there is INT_CST at
>>>>> the
>>>>> tree level and CONST_WIDE_INT at the rtl level.  Those use a very
>>>>> compact
>>>>> storage model.   The copying entailed is only a small part of the
>>>>> overall
>>>>> performance.
>>>>
>>>> Well, but both trees and RTXen are not viable for short-lived things
>>>> because
>>>> the are GCed!  double-ints were suitable for this kind of stuff because
>>>> the also have a moderate size.  With wide-ints size becomes a problem
>>>> (or GC, if you instead use trees or RTXen).
>>>>
>>>>> Everything that you are suggesting along these lines is adding to the
>>>>> weight
>>>>> of a wide-int object.
>>>>
>>>> On the contrary - it lessens their weight (with external already
>>>> existing storage)
>>>> or does not do anything to it (with the embedded storage).
>>>>
>>>>>    You have to understand there will be many more
>>>>> wide-ints created in a normal compilation than were ever created with
>>>>> double-int.    This is because the rtl level had no object like this at
>>>>> all
>>>>> and at the tree level, many of the places that should have used double
>>>>> int,
>>>>> short cut the code and only did the transformations if the types fit in
>>>>> a
>>>>> HWI.
>>>>
>>>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>>>> will become a very frequent operation and thus it is worth optimizing
>>>> it.
>>>>
>>>>> This is why we are extremely defensive about this issue.   We really
>>>>> did
>>>>> think a lot about it.
>>>>
>>>> I'm sure you did.
>>>>
>>>> Richard.
>>>
>>>
>

Re: GCC 4.8.0 Status Report (2012-10-29), Stage 1 to end soon

Reply via email to