Hi Richard,

Thanks for your reply. I'm looking forward to hearing from Honza!

But I still have a small question about the IPA passes. If I want to
make such a conversion (sum += foo[i].add == add ? add (1,2) :
foo[i].add (1,2);), I definitely need to change the function's body.
But the article "Using summary information in IPA
passes"(https://gcc.gnu.org/onlinedocs/gccint/link-time-optimization/using-summary-
Information-in-ipa-passes.html) mentioned that if IPA-Pass needs to
change the function body, it needs to be done in the transform stage,
but the execution stage of IPA-inline has ended before the transform
stage. So we still can’t mark the newly created function
calls/cgraph_edge as inlineable. Is that correct?

Or did I understand something wrong?

Thanks
Hanke Zhang

Richard Biener <richard.guent...@gmail.com> 于2024年6月12日周三 18:49写道:
>
> On Wed, Jun 12, 2024 at 11:57 AM Hanke Zhang <hkzhang...@gmail.com> wrote:
> >
> > Richard Biener <richard.guent...@gmail.com> 于2024年5月24日周五 14:39写道:
> > >
> > > On Fri, May 24, 2024 at 5:53 AM Hanke Zhang via Gcc <gcc@gcc.gnu.org> 
> > > wrote:
> > > >
> > > > Hi,
> > > > I got a question about optimizing function pointers for direct
> > > > function calls in C.
> > > >
> > > > Consider the following scenario: one of the fields of a structure is a
> > > > function pointer, and all its assignments come from the same function.
> > > > Can all its uses be replaced by direct calls to this function? So the
> > > > later passes can do more optimizations.
> > > >
> > > > Here is the example:
> > > >
> > > > int add(int a, int b) { return a + b; }
> > > > int sub(int a, int b) { return a - b; }
> > > >
> > > > struct Foo {
> > > >     int (*add)(int, int);
> > > > };
> > > > int main()
> > > > {
> > > >     struct Foo foo[5] = malloc(sizeof(struct Foo) * 5);
> > > >
> > > >     for (int i = 0; i < 5; i++) {
> > > >         foo[i].add = add;
> > > >     }
> > > >
> > > >     int sum = 0;
> > > >     for (int i = 0; i < 5; i++) {
> > > >         sum += foo[i].add(1, 2);
> > > >     }
> > > >
> > > >     return 0;
> > > > }
> > > >
> > > > Can I replace the code above to the code below?
> > > >
> > > > int add(int a, int b) { return a + b; }
> > > > int sub(int a, int b) { return a - b; }
> > > >
> > > > struct Foo {
> > > >     int (*add)(int, int);
> > > > };
> > > > int main()
> > > > {
> > > >     struct Foo foo[5] = malloc(sizeof(struct Foo) * 5);
> > > >
> > > >     for (int i = 0; i < 5; i++) {
> > > >         foo[i].add = add;
> > > >     }
> > > >
> > > >     int sum = 0;
> > > >     for (int i = 0; i < 5; i++) {
> > > >         sum += add(1,2);
> > > >     }
> > > >
> > > >     return 0;
> > > > }
> > > >
> > > > My idea is to determine whether the assignment of the field is the
> > > > same function, and if so, perform the replacement.
> > >
> > > If it's as simple as above then sure, even CSE should do it.  If you
> > > can prove otherwise the memory location with the function pointer
> > > always has the same value you are obviously fine.  If you just
> > > do not see any other store via 'add's FIELD_DECL then no, that
> > > isn't good enough.  Every void * store you do not know where it
> > > goes might go to that slot.
> > >
> > > > Of course this is not a reasonable optimization, I just want to know
> > > > if there are security issues in doing so, and if I want to do it in
> > > > the IPA stage, is it possible?
> > >
> > > For the more general case you can do what we do for speculative
> > > devirtualization - replace the code with
> > >
> > >       sum += foo[i].add == add ? add (1,2) : foo[i].add(1,2);
> >
> > Hi Richard,
> >
> > I'm trying to do what you suggested. (sum += foo[i].add == add ? add
> > (1,2) : foo[i].add(1,2);)
> >
> > I created a new IPA-Pass before IPA-INLINE and I made the changes on
> > the GIMPLE in the "function_transform" stage. But my newly created
> > "foo[i].add(1,2)" seems to fail to be inlined. And it was optimized
> > out in the subsequent FRE. I would like to ask if there is any way to
> > mark my newly created cgraph_edge as "inline" in the
> > "function_transform" stage.
> >
> > Here is part of the code creating the call_stmt in true basic_block in
> > my IPA-PASS:
> >
> > // ...
> > // part of the code have been omitted
> > auto_vec<tree> params;
> > for (int i = 0; i < gimple_call_num_args (call_stmt); i++) {
> >   params.safe_push (gimple_call_arg (call_stmt, i));
> > }
> > gimple* tbb_call = gimple_build_call_vec (fndecl, params); /// = fn()
> > tree tbb_ssa;
> > if (ret_val_flag) {
> >   tbb_ssa = make_ssa_name (TREE_TYPE (gimple_call_lhs(call_stmt)), NULL);
> >   gimple_call_set_lhs (tbb_call, tbb_ssa); /// _2 = fn()
> > }
> > gsi = gsi_start_bb (tbb);
> > gsi_insert_before (&gsi, tbb_call, GSI_SAME_STMT);
> > cgraph_edge* tbb_callee = node->create_edge (cgraph_node::get_create
> > (fndecl), (gcall*)tbb_call, tbb_call->bb->count, true);
> > // what should I do to this cgraph_edge to mark it to be inlined
> > // ...
> >
> > Or should I not do these things in the function_transform stage?
>
> That's indeed too late.  You have to create speculative call edges,
> I would suggest to look how IPA devirtualization handles this case
> and possibly simply amend it.  I'm CCing Honza who should know
> more about this, I do not know the details.
>
> Richard.
>
> > Thanks
> > Hanke Zhang
> >
> >
> > >
> > > that way we can inline the direct call and hopefully the branch will be
> > > well predicted.
> > >
> > > In some SPEC there is IIRC the situation where such speculative
> > > devirtualization candidates can be found solely based on function
> > > signature.  With LTO/IPA you'd basically collect candidate targets
> > > for each indirect call (possibly address-taken function definitions
> > > with correct signature) and if there's just a single one you can
> > > choose that as speculative devirt target.
> > >
> > > Speculative devirt as we have now of course works with profile
> > > data to identify the most probable candidate.
> > >
> > > Richard.
> > >
> > > >
> > > > Thanks
> > > > Hanke Zhang

Reply via email to