Hi, > > Well, even for the partial inlining case I'd devise a scheme that > if a call to such a forwarder remains it gets expanded as a call to > the original (non-split) function (similar to emitting a call to > the asm-thunk instead of expanding the gimple representation of the > thunk). I realize this is another special case, but I like it more
For partial inlining I do have plan. I intend to simply mark edges of calls containing __builtin_va_arg_pack argument, arrange it to disappear when function gets inlinined and make inliner to do final pass after inlining small function inlining all those calls. This will work nicely for partial inlining - when all header functions gets inlined, the offline copy will remain. When header won't get inlined, the function will be merged again. When some of them gets inlined, the partially inlined function will get unnecesiraly dupicated, but just twice that is not bad for this weird side case we speak about. It won't help thunks where offline copy may neccesarily exist and we already have thunks to avoid that duplication of callee function body after all. I think we have options: 1) keep them "non functions" in the callgraph. Then we need a) I) make inliner aware of expanding the thunk whenever we decide to inline and call edge that "goes through thunk". OR II) make the direct calls to thunks not appearing in our IL and being directly folded into inline code of the thunk b) teach ipa-prop to compute jump functions correctly for thunked edges: for devirtualization happening at IPA we will neccesarily need represent those edges without having the adjustment visible at analysis time. 2) make thunk gimple representible. For this we need a) introduce gimple representation of thunk tail call, i.e. gimple_tailcall_va_arg_pack or something b) make gimple passes to understand it. c) I) invent expansion machinery OR II) teach IPA passes to special case bodies of thunk and do not change them so current ASM machinery stays around d) teach ipa-inline and ipa-prop to handle this new type of call and propagate through. Martin implemented 1-II variant since. It doesn't seem that much worse than together alternatives to me and is significandly easier to implement. Main drawback is the need for nice cgraph representation. Representing thunks completely on side as we do now has turned out to be confusing and not really pretty. So we want thunks appear as full fledged cgraph node with call edge to function they are thunking. This will neccesarily impose some extra work on IPA passes since they will not be able to easilly expect that everything in cgraph is a function. But we have precisely the same problem with alias and unless we want to go for aliases represented as gimple bodies symmetric to "empty thunks", we will need this special case anyway. As for alias/thunk, I plan to make them symbol table entries and make callgraph edges to have "caller" and "target". Target of call is the symbol called, not neccesarily function. Then I plan to have cgraph_edge_callee accesor that will look into real target since it is what majority of IPA code cares about. Also we will need to have cgraph_edge_callee_body_visibility instead of global cgraph_body_visibility we have now since it depends if you call through alias that can be overwritten. We can also have function that will return the thunk adjustments happening on the edge IMO. Since IPA cares about this on two places (computation of jump function and inlining), it give relatively low pain. With this API, 1-a-I or 1-a-II would not be too ugly. Still combination of both makes most sense for me: there is little to lose by inserting the thunk code as soon as we devirutalize during local optimization and if we won't do so we will just keep missing optimizations. We will need II to nicely handle case when devirutalization happens at WPA. 2-I variant is probably most generic and will solve the aforementioned partial inlining better in a sense that we will not need to duplicate function body in the case variadic function gets partially inlined only into some of its callees. At the same time it seems quite invasive and IMO have very little pratical benefit. I also don't know if having extra exotic gimple construct to worry about really makes compiler more maintainable. Honza > as it appears to be more flexible to me ;) (it's similar to > the redefined extern inline function case, one function body for > inlining and another one for call targets) > > > > But well, it's just my random 2 cents on the issue ;) > > :) > :) > > Richard.