Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

Jan Hubicka Tue, 19 Apr 2011 07:50:43 -0700

Hi,
> 
> Well, even for the partial inlining case I'd devise a scheme that
> if a call to such a forwarder remains it gets expanded as a call to
> the original (non-split) function (similar to emitting a call to
> the asm-thunk instead of expanding the gimple representation of the 
> thunk).  I realize this is another special case, but I like it more


For partial inlining I do have plan.  I intend to simply mark edges of calls
containing __builtin_va_arg_pack argument, arrange it to disappear when
function gets inlinined and make inliner to do final pass after inlining small
function inlining all those calls.

This will work nicely for partial inlining - when all header functions gets
inlined, the offline copy will remain.  When header won't get inlined, the
function will be merged again.

When some of them gets inlined, the partially inlined function will get
unnecesiraly dupicated, but just twice that is not bad for this weird side case
we speak about.

It won't help thunks where offline copy may neccesarily exist and we already
have thunks to avoid that duplication of callee function body after all.

I think we have options:

  1) keep them "non functions" in the callgraph.  Then we need
     a) I) make inliner aware of expanding the thunk whenever we decide to 
inline
           and call edge that "goes through thunk".
        OR
        II) make the direct calls to thunks not appearing in our IL and being
            directly folded into inline code of the thunk
     b) teach ipa-prop to compute jump functions correctly for thunked edges:
        for devirtualization happening at IPA we will neccesarily need represent
        those edges without having the adjustment visible at analysis time.
  2) make thunk gimple representible.  For this we need
     a) introduce gimple representation of thunk tail call, i.e. 
gimple_tailcall_va_arg_pack or something
     b) make gimple passes to understand it.
     c) I) invent expansion machinery
        OR
        II) teach IPA passes to special case bodies of thunk and do not change 
them so current
            ASM machinery stays around
     d) teach ipa-inline and ipa-prop to handle this new type of call and 
propagate through.

Martin implemented 1-II variant since. It doesn't seem that much worse than
together alternatives to me and is significandly easier to implement.

Main drawback is the need for nice cgraph representation.  Representing thunks
completely on side as we do now has turned out to be confusing and not really
pretty. So we want thunks appear as full fledged cgraph node with call edge to
function they are thunking.

This will neccesarily impose some extra work on IPA passes since they will not
be able to easilly expect that everything in cgraph is a function.  But we have
precisely the same problem with alias and unless we want to go for aliases
represented as gimple bodies symmetric to "empty thunks", we will need this
special case anyway.

As for alias/thunk, I plan to make them symbol table entries and make callgraph
edges to have "caller" and "target". Target of call is the symbol called, not
neccesarily function.  Then I plan to have cgraph_edge_callee accesor that will
look into real target since it is what majority of IPA code cares about.

Also we will need to have cgraph_edge_callee_body_visibility instead of global
cgraph_body_visibility we have now since it depends if you call through alias
that can be overwritten. We can also have function that will return the thunk
adjustments happening on the edge IMO. Since IPA cares about this on two places
(computation of jump function and inlining), it give relatively low pain.

With this API, 1-a-I or 1-a-II would not be too ugly. Still combination of both
makes most sense for me: there is little to lose by inserting the thunk code as
soon as we devirutalize during local optimization and if we won't do so we will
just keep missing optimizations. We will need II to nicely handle case when
devirutalization happens at WPA.

2-I variant is probably most generic and will solve the aforementioned partial
inlining better in a sense that we will not need to duplicate function body in
the case variadic function gets partially inlined only into some of its
callees. At the same time it seems quite invasive and IMO have very little 
pratical
benefit.  I also don't know if having extra exotic gimple construct to worry
about really makes compiler more maintainable.

Honza

> as it appears to be more flexible to me ;)  (it's similar to
> the redefined extern inline function case, one function body for
> inlining and another one for call targets)
> 
> > > But well, it's just my random 2 cents on the issue ;)
> > :)
> :)
> 
> Richard.

Re: [PATCH] Fix incorrect devirtualization (PR middle-end/48661)

Reply via email to