On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>
>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
>>>
>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>
>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>> <rdsandif...@googlemail.com> wrote:
>>>>>
>>>>> Kenneth Zadeck <zad...@naturalbridge.com> writes:
>>>>>>
>>>>>> I would like you to respond to at least point 1 of this email. In it
>>>>>> there is code from the rtl level that was written twice, once for the
>>>>>> case when the size of the mode is less than the size of a HWI and once
>>>>>> for the case where the size of the mode is less than 2 HWIs.
>>>>>>
>>>>>> My patch changes this to one instance of the code that works no matter
>>>>>> how large the data passed to it is.
>>>>>>
>>>>>> You have made a specific requirement for wide_int to be a template that
>>>>>> can be instantiated in several sizes, one for 1 HWI, one for 2 HWIs. I
>>>>>> would like to know how this particular fragment is to be rewritten in
>>>>>> this model? It seems that I would have to retain the structure where
>>>>>> there is one version of the code for each size that the template is
>>>>>> instantiated.
>>>>>
>>>>> I think richi's argument was that wide_int should be split into two.
>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>> and the main wide_int class should be an extension on top of that
>>>>> that does things to a bit precision instead. Presumably with some
>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>
>>>>>   typedef foo<2> double_int;
>>>>>
>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>> the number of significant HWIs rather than the size of the underlying
>>>>> array). wide_int would also record the precision and apply it after
>>>>> the full HWI operation.
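[Editor's sketch] The two-layer split Richard Sandiford describes above might look roughly like this. All names except the quoted `typedef foo<2> double_int` are invented for illustration, and `HOST_WIDE_INT` is simplified to `long`; this is a sketch of the idea, not GCC's actual implementation:

```cpp
typedef long HOST_WIDE_INT;   // simplified stand-in for GCC's HWI

template <int N>
struct foo {                  // "bare-metal": just a length and HWIs,
  HOST_WIDE_INT val[N];       // with N a compile-time constant
};

typedef foo<2> double_int;    // the quoted typedef: length fixed at 2

struct wide_int {             // extension layer: here the length is the
  HOST_WIDE_INT val[4];       // number of *significant* HWIs, a runtime
  unsigned len;               // variable, not the array size
  unsigned precision;         // bit precision, recorded and applied after
                              // the full-HWI operation, e.g.:
  static HOST_WIDE_INT mask_to_precision(HOST_WIDE_INT x, unsigned prec) {
    unsigned bits = 8 * sizeof(HOST_WIDE_INT);
    return prec >= bits ? x : (x & (((HOST_WIDE_INT)1 << prec) - 1));
  }
};
```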
>>>>>
>>>>> So the wide_int class would still provide "as wide as we need" arithmetic,
>>>>> as in your rtl patch. I don't think he was objecting to that.
>>>>
>>>> That summarizes one part of my complaints / suggestions correctly. In other
>>>> mails I suggested to not make it a template but a constant-over-object-lifetime
>>>> 'bitsize' (or maxlen) field. Both suggestions likely require more thought
>>>> than I put into them. The main reason is that with C++ you can abstract away
>>>> where the wide-int information pieces are stored and thus use the arithmetic /
>>>> operation workers without copying the (source) "wide-int" objects. Thus you
>>>> should be able to write adaptors for double-int storage, tree or RTX storage.
>>>
>>> We had considered something along these lines and rejected it. I am not
>>> really opposed to doing something like this, but it is not an obvious
>>> winning idea and is likely not to be a good idea. Here was our thought
>>> process:
>>>
>>> If you abstract away the storage inside a wide int, then you should be able
>>> to copy a pointer to the block of data from either the rtl-level integer
>>> constant or the tree-level one into the wide int. It is certainly true
>>> that making a wide_int from one of these is an extremely common operation,
>>> and doing this would avoid those copies.
>>>
>>> However, this causes two problems:
>>> 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to make
>>> the object: it created the base object and then it allocated the array.
>>> Richard S. noticed that we could just allocate one CONST_WIDE_INT that had
>>> the array in it. Doing it this way saves one ggc allocation and one
>>> indirection when accessing the data within the CONST_WIDE_INT. Our plan is
>>> to use the same trick at the tree level. So to avoid the copying, you seem
>>> to have to have a more expensive rep for CONST_WIDE_INT and INT_CST.
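[Editor's sketch] The single-allocation trick in point 1 above (one object with the array stored in it, rather than a header plus a separately allocated array) is, in outline, the classic trailing-array pattern. The names below are made up, and `malloc` stands in for ggc allocation:

```cpp
#include <cstdlib>
#include <cstddef>

// One-allocation layout: the element array lives in the same block as the
// header, so reading the data needs no extra indirection.
struct const_wide_int {
  unsigned len;              // number of significant elements
  long elts[1];              // trailing array, over-allocated below
};

static const_wide_int *make_const_wide_int(const long *val, unsigned len) {
  // A single allocation covering the header plus (len - 1) extra
  // elements, instead of two allocations (header, then array).
  std::size_t size = sizeof(const_wide_int) + (len - 1) * sizeof(long);
  const_wide_int *p = static_cast<const_wide_int *>(std::malloc(size));
  p->len = len;
  for (unsigned i = 0; i < len; ++i)
    p->elts[i] = val[i];
  return p;
}
```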
>>
>> I did not propose having a pointer to the data in the RTX or tree int. Just
>> the short-lived wide-ints (which are on the stack) would have a pointer to
>> the data - which can then obviously point into the RTX and tree data.
>
> There is then the issue of what happens if some wide-ints are not short-lived.
> It makes me nervous to create internal pointers to GC'ed memory.
I thought they were all short-lived.

>>> 2) You are now stuck either ggc-ing the storage inside a wide_int when they
>>> are created as part of an expression, or you have to play some game to
>>> represent the two different storage plans inside of wide_int.
>>
>> Hm? wide-ints are short-lived and thus never live across a garbage-collection
>> point. We create non-GCed objects pointing to GCed objects all the time
>> and everywhere this way.
>
> Again, this makes me nervous, but it could be done. However, it does mean
> that now the wide-ints that are not created from rtxes or trees will be more
> expensive, because they are not going to get their storage "for free"; they
> are going to alloca it.

No, those would simply use the embedded storage model.

> However, it still is not clear, given that 99% of the wide-ints are going to
> fit in a single HWI, that this would be a noticeable win.

Currently, even if they fit into a HWI, you will still allocate 4 times the
largest integer mode size. You say that doesn't matter because they are
short-lived, but I say it does matter because not all of them are short-lived
enough. If 99% fit in a HWI, why allocate 4 times the largest integer mode
size in 99% of the cases?

>>
>>> Clearly this
>>> is where you think that we should be going by suggesting that we abstract
>>> away the internal storage. However, this comes at a price: what is
>>> currently an array access in my patches would (I believe) become a function
>>> call.
>>
>> No, the workers (that perform the array accesses) will simply get
>> a pointer to the first data element. Then whether it's embedded or
>> external is of no interest to them.
>
> So is your plan that the wide-int constructors from rtx or tree would just
> copy the pointer to the array on top of the array that is otherwise
> allocated on the stack? I can easily do this. But as I said, the gain
> seems quite small.
>
> And of course, going the other way still does need the copy.
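[Editor's sketch] The copy-in construction being discussed can be sketched as follows. All names are hypothetical; the point, made elsewhere in the thread, is that because constants are stored in compressed form, the copy loop almost always runs once:

```cpp
typedef long HOST_WIDE_INT;   // simplified stand-in for GCC's HWI

// Hypothetical stack-allocated wide-int with embedded storage.
struct wide_int_sketch {
  HOST_WIDE_INT val[4];       // 4x the largest integer mode, as discussed
  unsigned len;               // significant HWIs only

  // Construct from a compressed constant's array: since even constants of
  // very big types are usually small, this loop almost always copies a
  // single element.
  static wide_int_sketch from_array(const HOST_WIDE_INT *p, unsigned n) {
    wide_int_sketch w;
    w.len = n;
    for (unsigned i = 0; i < n; ++i)
      w.val[i] = p[i];
    return w;
  }
};
```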
The proposal was to template wide_int on a storage model; the embedded one
would work as-is (embedding 4 times the largest integer mode), the external
one would have a pointer to the data. All functions that return a wide_int
produce a wide_int with the embedded model. To avoid the function-call
penalty you described, the storage model provides a way to get a pointer to
the first element, and the templated operations simply dispatch to a worker
that takes this pointer to the first element (as the storage model is
designed as a template, its abstraction is going to be optimized away by
means of inlining).

Richard.

>>> From a performance point of view, I believe that this is a non-starter.
>>> If you can figure out how to design this so that it is not a function
>>> call, I would consider this a viable option.
>>>
>>> On the other side of this, you are clearly correct that we are copying the
>>> data when we are making wide-ints from INT_CSTs or CONST_WIDE_INTs. But
>>> this is why we represent the data inside of the wide_ints, the INT_CSTs and
>>> the CONST_WIDE_INTs in a compressed form. Even with very big types, which
>>> are generally rare, the constants themselves are very small. So the copy
>>> operation is a loop that almost always copies one element, even with
>>> tree-vrp, which doubles the sizes of every type.
>>>
>>> There is the third option, which is that the storage inside the wide int is
>>> just ggc-ed storage. We rejected this because of the functional nature of
>>> wide-ints. There are zillions created, they can be stack allocated, and
>>> they last for very short periods of time.
>>
>> Of course - GCing wide-ints is a non-starter.
>>
>>>>> As is probably obvious, I don't agree FWIW. It seems like an unnecessary
>>>>> complication without any clear use. Especially since the number of
>>>>
>>>> Maybe the double_int typedef is without any clear use.
>>>> Properly
>>>> abstracting from the storage / information providers will save
>>>> compile-time, memory and code though. I don't see that any thought
>>>> was spent on how to avoid excessive copying or dealing with
>>>> long(er)-lived objects and their storage needs.
>>>
>>> I actually disagree. Wide-ints can use a bloated amount of storage
>>> because they are designed to be very short-lived and very low-cost
>>> objects that are stack allocated. For long-term storage, there is
>>> INT_CST at the tree level and CONST_WIDE_INT at the rtl level. Those
>>> use a very compact storage model. The copying entailed is only a small
>>> part of the overall performance.
>>
>> Well, but both trees and RTXen are not viable for short-lived things
>> because they are GCed! double-ints were suitable for this kind of stuff
>> because they also have a moderate size. With wide-ints, size becomes a
>> problem (or GC, if you instead use trees or RTXen).
>>
>>> Everything that you are suggesting along these lines is adding to the
>>> weight of a wide-int object.
>>
>> On the contrary - it lessens their weight (with external already
>> existing storage) or does not do anything to it (with the embedded
>> storage).
>>
>>> You have to understand there will be many more
>>> wide-ints created in a normal compilation than were ever created with
>>> double-int. This is because the rtl level had no object like this at
>>> all, and at the tree level, many of the places that should have used
>>> double-int short-cut the code and only did the transformations if the
>>> types fit in a HWI.
>>
>> Your argument shows that the copy-in/out from tree/RTX to/from wide-int
>> will become a very frequent operation and thus it is worth optimizing it.
>>
>>> This is why we are extremely defensive about this issue. We really did
>>> think a lot about it.
>>
>> I'm sure you did.
>>
>> Richard.
>
>
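[Editor's sketch] Pulling the thread's proposal together: a storage-model template with an embedded model for results, an external model pointing into existing RTX/tree data, and workers that only ever see a pointer to the first element might be sketched as below. Every name here is invented for illustration; the actual wide-int design in GCC may differ:

```cpp
typedef long HOST_WIDE_INT;   // simplified stand-in for GCC's HWI

// Embedded model: the default, used for every operation result.
struct embedded_storage {
  HOST_WIDE_INT val[4];       // 4x the largest integer mode
  HOST_WIDE_INT *get_val() { return val; }
  const HOST_WIDE_INT *get_val() const { return val; }
};

// External model: just points into existing (e.g. RTX or tree) data,
// so construction involves no copy.
struct external_storage {
  const HOST_WIDE_INT *ptr;
  const HOST_WIDE_INT *get_val() const { return ptr; }
};

// Worker: operates on raw element pointers, so whether the storage is
// embedded or external is of no interest to it.
static void add_worker(HOST_WIDE_INT *r, const HOST_WIDE_INT *a,
                       const HOST_WIDE_INT *b, unsigned len) {
  unsigned long carry = 0;
  for (unsigned i = 0; i < len; ++i) {
    // unsigned arithmetic for well-defined wraparound
    unsigned long ua = (unsigned long)a[i];
    unsigned long s = ua + (unsigned long)b[i] + carry;
    carry = (s < ua || (carry && s == ua)) ? 1UL : 0UL;
    r[i] = (HOST_WIDE_INT)s;
  }
}

// Templated front end: dispatches to the worker. Because the storage
// model is a template parameter, get_val() inlines away and the array
// access stays an array access, not a function call.
template <typename S1, typename S2>
static embedded_storage wide_add(const S1 &a, const S2 &b, unsigned len) {
  embedded_storage r;
  add_worker(r.get_val(), a.get_val(), b.get_val(), len);
  return r;
}
```

Mixing the two models then works transparently, e.g. adding an embedded value to one viewed in place through `external_storage`.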