On Tue, Nov 27, 2012 at 1:06 AM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> Richard,
>
> I spent a good part of the afternoon talking to Mike about this.  He is on
> the c++ standards committee and is a much more seasoned c++ programmer than
> I am.
>
> He convinced me that with a large amount of engineering and c++
> "foolishness" it was indeed possible to get your proposal to POSSIBLY
> work as well as what we did.
>
> But now the question is: why would anyone want to do this?
>
> At the very least you are talking about instantiating two instances of
> wide-ints, one for the stack allocated uses and one for the places where we
> just move a pointer from the tree or the rtx.  Then you are talking about
> creating connectors so that the stack allocated functions can take
> parameters of the pointer version and vice versa.
>
> Then there is the issue that rather than just saying that something is a
> wide int, the programmer is going to have to track its origin.  In
> particular, where in the code right now I say
>
>   wide_int foo = wide_int::from_rtx (r1);
>   wide_int bar = wide_int::from_rtx (r2) + foo;
>
> now I would have to say
>
>   wide_int_ptr foo = wide_int_ptr::from_rtx (r1);
>   wide_int_stack bar = wide_int_ptr::from_rtx (r2) + foo;
No, you'd say

  wide_int foo = wide_int::from_rtx (r1);

and the static, non-templated from_rtx method would automagically return
(always!) a "wide_int_ptr" kind.  The initialization then would use the
assignment operator that mediates between wide_int and "wide_int_ptr",
doing the copying.  The user should get a 'stack' kind by default when
specifying wide_int, as implemented with

  struct wide_int_storage_stack;
  struct wide_int_storage_ptr;

  template <class storage = wide_int_storage_stack>
  class wide_int : public storage
  {
    ...
    static wide_int <wide_int_storage_ptr> from_rtx (rtx);
  };

The whole point of the exercise is to make from_rtx and from_tree avoid
the copying (and excessive stack space allocation) for the rvalue case,
as in

  wide_int res = wide_int::from_rtx (x) + 1;

If you save the result into a wide_int temporary first then you are lost
of course (modulo some magic GCC optimization being able to elide the
copy somehow).  And of course for code like VRP that keeps a lattice of
wide_ints, the point is to be able to reduce its footprint by using ptr
storage and explicit allocations (that's a secondary concern, of course).
And for VRP to be able to specify that it needs more than the otherwise
needed MAX_INT_MODE_SIZE: ptr storage would not have this arbitrary
limitation, only embedded storage (should) have it.

> then when i want to call some function using a wide_int ref that function
> now must be either overloaded to take both or i have to choose one of the
> two instantiations (presumably based on which is going to be more common)
> and just have the compiler fix up everything (which it is likely to do).

Nope, they'd be free function templates along the lines of

  template <class storage1, class storage2>
  wide_int<> operator+ (const wide_int<storage1> &a,
                        const wide_int<storage2> &b)
  {
    return wide_int<>::plus_worker (a.precision, ..., a.get_storage_ptr (),
                                    b.precision, ..., b.get_storage_ptr ());
  }

> And so what is the payoff?
> 1) No one except the c++ elite is going to understand the code.
> The rest of the community will hate me and curse the ground that I walk on.

Maybe for the implementation, but look at hash-table and vec ... not for
usage, certainly.

> 2) I will end up with a version of wide-int that can be used as a medium
> life container (where I define medium life as not allowed to survive a gc,
> since they will contain pointers into rtxes and trees).
> 3) And no clients that actually want to do this!!  I could use as an
> example one of your favorite passes, tree-vrp.  The current double-int
> could have been a medium lifetime container since it has a smaller
> footprint, but in fact tree-vrp converts those double-ints back into trees
> for medium storage.  Why?  Because it needs the other fields of a tree-cst
> to store the entire state.  Wide-ints also "suffer" this problem: their
> only state are the data and the three length fields.  They have no type
> and none of the other tree info, so the most obvious client for a medium
> lifetime object is really not going to be a good match even if you "solve
> the storage problem".
>
> The fact is that wide-ints are an excellent short term storage class that
> can be very quickly converted into our two long term storage classes.  Your
> proposal requires a lot of work, will not be easy to use, and as far as I
> can see has no payoff on the horizon.  It could be that there will be
> future clients for a medium lifetime value, but asking for this with no
> clients in hand is really beyond the scope of a reasonable review.
>
> I remind you that the purpose of these patches is to solve problems that
> exist in the current compiler that we have papered over for years.  If
> someone needs wide-ints in some way that is not foreseen then they can
> change it.

The patches introduce a lot more temporary wide-ints (your words) and at
the same time make construction of them from tree / rtx very expensive,
both in stack space and in compile time.
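[Editor's note: the storage-split being argued over can be sketched as a
small, self-contained C++ example.  All names here (hwi, stack_storage,
ptr_storage, wint, from_array) are hypothetical stand-ins for the real
wide_int machinery, not GCC code; from_array plays the role of
from_rtx/from_tree.]

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

typedef long long hwi;            // stand-in for HOST_WIDE_INT
const size_t MAX_LEN = 4;         // stand-in for 4x the largest int mode

struct stack_storage {            // embedded: owns its array of HWIs
  hwi val[MAX_LEN];
  size_t len;
  hwi *elements () { return val; }
  const hwi *elements () const { return val; }
};

struct ptr_storage {              // borrowed: points into a tree or rtx
  const hwi *val;
  size_t len;
  const hwi *elements () const { return val; }
};

template <class storage = stack_storage>
struct wint : storage
{
  wint () {}

  // Mediating constructor: copies the HWIs out of any other storage
  // kind, e.g. when a borrowed value must outlive its tree/rtx.
  template <class other>
  wint (const wint<other> &w)
  {
    this->len = w.len;
    std::copy (w.elements (), w.elements () + w.len, this->elements ());
  }

  // Plays the role of from_rtx/from_tree: it always returns the
  // pointer-storage kind, so no HWIs are copied for the rvalue case.
  static wint<ptr_storage> from_array (const hwi *p, size_t n)
  {
    wint<ptr_storage> w;
    w.val = p;
    w.len = n;
    return w;
  }
};
```

The key design point is that the copy happens only at the mediating
constructor, i.e. only when the user explicitly asks for an embedded
(long-lived) value; pure rvalue uses keep borrowing the tree/rtx data.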
Look at how we, for example, compute TREE_INT_CST + 1: int_cst_binop
internally uses double_ints for the computation and then instantiates a
new tree for holding the result.  Now we'd use wide_ints for this,
requiring totally unnecessary copying.  Why not try to avoid that in the
first place?  And try to avoid making wide_ints 4 times as large as
really necessary just for the sake of VRP!  (VRP should have a way to
say "_I_ want larger wide_ints", without putting this burden on all
other users.)

Richard.

> kenny
>
>
> On 11/26/2012 11:30 AM, Richard Biener wrote:
>>
>> On Mon, Nov 26, 2012 at 5:03 PM, Kenneth Zadeck
>> <zad...@naturalbridge.com> wrote:
>>>
>>> On 11/26/2012 10:03 AM, Richard Biener wrote:
>>>>
>>>> On Mon, Nov 5, 2012 at 2:59 PM, Kenneth Zadeck
>>>> <zad...@naturalbridge.com> wrote:
>>>>>
>>>>> On 11/04/2012 11:54 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Thu, Nov 1, 2012 at 2:10 PM, Richard Sandiford
>>>>>> <rdsandif...@googlemail.com> wrote:
>>>>>>>
>>>>>>> Kenneth Zadeck <zad...@naturalbridge.com> writes:
>>>>>>>>
>>>>>>>> I would like you to respond to at least point 1 of this email.  In
>>>>>>>> it there is code from the rtl level that was written twice, once
>>>>>>>> for the case when the size of the mode is less than the size of a
>>>>>>>> HWI and once for the case where the size of the mode is less than
>>>>>>>> 2 HWIs.
>>>>>>>>
>>>>>>>> My patch changes this to one instance of the code that works no
>>>>>>>> matter how large the data passed to it is.
>>>>>>>>
>>>>>>>> You have made a specific requirement for wide int to be a template
>>>>>>>> that can be instantiated in several sizes, one for 1 HWI, one for
>>>>>>>> 2 HWIs.  I would like to know how this particular fragment is to
>>>>>>>> be rewritten in this model?  It seems that I would have to retain
>>>>>>>> the structure where there is one version of the code for each size
>>>>>>>> that the template is instantiated.
>>>>>>>
>>>>>>> I think richi's argument was that wide_int should be split into two.
>>>>>>> There should be a "bare-metal" class that just has a length and HWIs,
>>>>>>> and the main wide_int class should be an extension on top of that
>>>>>>> that does things to a bit precision instead.  Presumably with some
>>>>>>> template magic so that the length (number of HWIs) is a constant for:
>>>>>>>
>>>>>>>   typedef foo<2> double_int;
>>>>>>>
>>>>>>> and a variable for wide_int (because in wide_int the length would be
>>>>>>> the number of significant HWIs rather than the size of the underlying
>>>>>>> array).  wide_int would also record the precision and apply it after
>>>>>>> the full HWI operation.
>>>>>>>
>>>>>>> So the wide_int class would still provide "as wide as we need"
>>>>>>> arithmetic, as in your rtl patch.  I don't think he was objecting
>>>>>>> to that.
>>>>>>
>>>>>> That summarizes one part of my complaints / suggestions correctly.
>>>>>> In other mails I suggested to not make it a template but a constant
>>>>>> over object lifetime 'bitsize' (or maxlen) field.  Both suggestions
>>>>>> likely require more thought than I put into them.  The main reason
>>>>>> is that with C++ you can abstract from where wide-int information
>>>>>> pieces are stored and thus use the arithmetic / operation workers
>>>>>> without copying the (source) "wide-int" objects.  Thus you should
>>>>>> be able to write adaptors for double-int storage, tree or RTX
>>>>>> storage.
>>>>>
>>>>> We had considered something along these lines and rejected it.  I am
>>>>> not really opposed to doing something like this, but it is not an
>>>>> obvious winning idea and is likely not to be a good idea.
>>>>> Here was our thought process:
>>>>>
>>>>> If you abstract away the storage inside a wide int, then you should
>>>>> be able to copy a pointer to the block of data from either the rtl
>>>>> level integer constant or the tree level one into the wide int.  It
>>>>> is certainly true that making a wide_int from one of these is an
>>>>> extremely common operation and doing this would avoid those copies.
>>>>>
>>>>> However, this causes two problems:
>>>>> 1) Mike's first cut at the CONST_WIDE_INT did two ggc allocations to
>>>>> make the object.  It created the base object and then it allocated
>>>>> the array.  Richard S noticed that we could just allocate one
>>>>> CONST_WIDE_INT that had the array in it.  Doing it this way saves one
>>>>> ggc allocation and one indirection when accessing the data within the
>>>>> CONST_WIDE_INT.  Our plan is to use the same trick at the tree level.
>>>>> So to avoid the copying, you seem to have to have a more expensive
>>>>> rep for CONST_WIDE_INT and INT_CST.
>>>>
>>>> I did not propose having a pointer to the data in the RTX or tree int.
>>>> Just the short-lived wide-ints (which are on the stack) would have a
>>>> pointer to the data, which can then obviously point into the RTX and
>>>> tree data.
>>>
>>> There is then the issue of what happens if some wide-ints are not
>>> short lived.  It makes me nervous to create internal pointers to GCed
>>> memory.
>>
>> I thought they were all short-lived.
>>
>>>>> 2) You are now stuck either ggcing the storage inside a wide_int when
>>>>> they are created as part of an expression, or you have to play some
>>>>> game to represent the two different storage plans inside of wide_int.
>>>>
>>>> Hm?  wide-ints are short-lived and thus never live across a garbage
>>>> collection point.  We create non-GCed objects pointing to GCed objects
>>>> all the time and everywhere this way.
>>>
>>> Again, this makes me nervous, but it could be done.  However, it does
>>> mean that the wide ints that are not created from rtxes or trees will
>>> be more expensive, because they are not going to get their storage
>>> "for free"; they are going to alloca it.
>>
>> No, those would simply use the embedded storage model.
>>
>>> However, it still is not clear, given that 99% of the wide ints are
>>> going to fit in a single HWI, that this would be a noticeable win.
>>
>> Currently even if they fit into a HWI you will still allocate 4 times
>> the largest integer mode size.  You say that doesn't matter because
>> they are short-lived, but I say it does matter because not all of them
>> are short-lived enough.  If 99% fit in a HWI, why allocate 4 times the
>> largest integer mode size in 99% of the cases?
>>
>>>>> Clearly this is where you think that we should be going by suggesting
>>>>> that we abstract away the internal storage.  However, this comes at
>>>>> a price: what is currently an array access in my patches would (I
>>>>> believe) become a function call.
>>>>
>>>> No, the workers (that perform the array accesses) will simply get
>>>> a pointer to the first data element.  Then whether it's embedded or
>>>> external is of no interest to them.
>>>
>>> So is your plan that the wide int constructors from rtx or tree would
>>> just copy the pointer to the array on top of the array that is
>>> otherwise allocated on the stack?  I can easily do this.  But as I
>>> said, the gain seems quite small.
>>>
>>> And of course, going the other way still does need the copy.
>>
>> The proposal was to template wide_int on a storage model; the embedded
>> one would work as-is (embedding 4 times the largest integer mode), the
>> external one would have a pointer to the data.  All functions that
>> return a wide_int produce a wide_int with the embedded model.
>> To avoid the function call penalty you described, the storage model
>> provides a way to get a pointer to the first element, and the templated
>> operations simply dispatch to a worker that takes this pointer to the
>> first element (as the storage model is designed as a template, its
>> abstraction is going to be optimized away by means of inlining).
>>
>> Richard.
>>
>>>>> From a performance point of view, I believe that this is a
>>>>> non-starter.  If you can figure out how to design this so that it is
>>>>> not a function call, I would consider this a viable option.
>>>>>
>>>>> On the other side of this, you are clearly correct that we are
>>>>> copying the data when we are making wide ints from INT_CSTs or
>>>>> CONST_WIDE_INTs.  But this is why we represent the data inside of
>>>>> the wide_ints, the INT_CSTs and the CONST_WIDE_INTs in a compressed
>>>>> form.  Even with very big types, which are generally rare, the
>>>>> constants themselves are very small.  So the copy operation is a
>>>>> loop that almost always copies one element, even with tree-vrp which
>>>>> doubles the sizes of every type.
>>>>>
>>>>> There is the third option, which is that the storage inside the wide
>>>>> int is just ggced storage.  We rejected this because of the
>>>>> functional nature of wide-ints.  There are zillions created, they
>>>>> can be stack allocated, and they last for very short periods of time.
>>>>
>>>> Of course, GCing wide-ints is a non-starter.
>>>>
>>>>>>> As is probably obvious, I don't agree FWIW.  It seems like an
>>>>>>> unnecessary complication without any clear use.  Especially since
>>>>>>> the number of
>>>>>>
>>>>>> Maybe the double_int typedef is without any clear use.  Properly
>>>>>> abstracting from the storage / information providers will save
>>>>>> compile-time, memory and code though.
>>>>>> I don't see that any thought was spent on how to avoid excessive
>>>>>> copying or dealing with long(er)-lived objects and their storage
>>>>>> needs.
>>>>>
>>>>> I actually disagree.  Wide ints can use a bloated amount of storage
>>>>> because they are designed to be very short lived and very low cost
>>>>> objects that are stack allocated.  For long term storage, there is
>>>>> INT_CST at the tree level and CONST_WIDE_INT at the rtl level.
>>>>> Those use a very compact storage model.  The copying entailed is
>>>>> only a small part of the overall performance.
>>>>
>>>> Well, but both trees and RTXen are not viable for short-lived things
>>>> because they are GCed!  double-ints were suitable for this kind of
>>>> stuff because they also have a moderate size.  With wide-ints, size
>>>> becomes a problem (or GC, if you instead use trees or RTXen).
>>>>
>>>>> Everything that you are suggesting along these lines is adding to
>>>>> the weight of a wide-int object.
>>>>
>>>> On the contrary: it lessens their weight (with external, already
>>>> existing storage) or does not do anything to it (with the embedded
>>>> storage).
>>>>
>>>>> You have to understand there will be many more wide-ints created in
>>>>> a normal compilation than were ever created with double-int.  This
>>>>> is because the rtl level had no object like this at all, and at the
>>>>> tree level many of the places that should have used double int
>>>>> short-cut the code and only did the transformations if the types fit
>>>>> in a HWI.
>>>>
>>>> Your argument shows that the copy-in/out from tree/RTX to/from
>>>> wide-int will become a very frequent operation and thus it is worth
>>>> optimizing it.
>>>>
>>>>> This is why we are extremely defensive about this issue.  We really
>>>>> did think a lot about it.
>>>>
>>>> I'm sure you did.
>>>>
>>>> Richard.
>>>
>>>
>
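[Editor's note: the worker-dispatch scheme discussed above, where the
templated operations hand the workers a length and a raw pointer to the
first data element so that embedded and borrowed storage share one code
path, can be sketched as below.  All names (hwi, stack_storage,
ptr_storage, wint, add_worker) are hypothetical stand-ins, and the
"addition" is a toy element-wise add with no carry propagation, not real
wide-int arithmetic.]

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

typedef long long hwi;            // stand-in for HOST_WIDE_INT
const size_t MAX_LEN = 4;

struct stack_storage {            // embedded storage
  hwi val[MAX_LEN];
  size_t len;
  const hwi *elements () const { return val; }
};

struct ptr_storage {              // borrowed storage (tree/rtx payload)
  const hwi *val;
  size_t len;
  const hwi *elements () const { return val; }
};

template <class storage = stack_storage>
struct wint : storage {};

// The worker sees only lengths and raw pointers to the first element;
// whether an operand was embedded or borrowed is already erased here,
// so there is no per-element function-call penalty.
// (Toy semantics: element-wise add, no carries.)
static wint<> add_worker (const hwi *a, size_t alen,
                          const hwi *b, size_t blen)
{
  wint<> r;
  r.len = std::max (alen, blen);
  for (size_t i = 0; i < r.len; ++i)
    r.val[i] = (i < alen ? a[i] : 0) + (i < blen ? b[i] : 0);
  return r;
}

// One operator+ covers all four storage combinations; the result is
// always the embedded kind, matching "all functions that return a
// wide_int produce a wide_int with the embedded model" above.
template <class s1, class s2>
wint<> operator+ (const wint<s1> &a, const wint<s2> &b)
{
  return add_worker (a.elements (), a.len, b.elements (), b.len);
}
```

Because the storage model is a template parameter rather than a virtual
interface, the `elements ()` accessor inlines away and the dispatch
compiles down to the same array accesses as a hand-written worker.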