Hi Richard, Thanks for your reply. I'm looking forward to hearing from Honza!
But I still have a small question about the IPA passes. If I want to make such a conversion (sum += foo[i].add == add ? add (1,2) : foo[i].add (1,2);), I definitely need to change the function's body. But the article "Using summary information in IPA passes"(https://gcc.gnu.org/onlinedocs/gccint/link-time-optimization/using-summary- Information-in-ipa-passes.html) mentioned that if IPA-Pass needs to change the function body, it needs to be done in the transform stage, but the execution stage of IPA-inline has ended before the transform stage. So we still can’t mark the newly created function calls/cgraph_edge as inlineable. Is that correct? Or did I understand something wrong? Thanks Hanke Zhang Richard Biener <richard.guent...@gmail.com> 于2024年6月12日周三 18:49写道: > > On Wed, Jun 12, 2024 at 11:57 AM Hanke Zhang <hkzhang...@gmail.com> wrote: > > > > Richard Biener <richard.guent...@gmail.com> 于2024年5月24日周五 14:39写道: > > > > > > On Fri, May 24, 2024 at 5:53 AM Hanke Zhang via Gcc <gcc@gcc.gnu.org> > > > wrote: > > > > > > > > Hi, > > > > I got a question about optimizing function pointers for direct > > > > function calls in C. > > > > > > > > Consider the following scenario: one of the fields of a structure is a > > > > function pointer, and all its assignments come from the same function. > > > > Can all its uses be replaced by direct calls to this function? So the > > > > later passes can do more optimizations. > > > > > > > > Here is the example: > > > > > > > > int add(int a, int b) { return a + b; } > > > > int sub(int a, int b) { return a - b; } > > > > > > > > struct Foo { > > > > int (*add)(int, int); > > > > }; > > > > int main() > > > > { > > > > struct Foo foo[5] = malloc(sizeof(struct Foo) * 5); > > > > > > > > for (int i = 0; i < 5; i++) { > > > > foo[i].add = add; > > > > } > > > > > > > > int sum = 0; > > > > for (int i = 0; i < 5; i++) { > > > > sum += foo[i].add(1, 2); > > > > } > > > > > > > > return 0; > > > > } > > > > > > > > Can I replace the code above to the code below? > > > > > > > > int add(int a, int b) { return a + b; } > > > > int sub(int a, int b) { return a - b; } > > > > > > > > struct Foo { > > > > int (*add)(int, int); > > > > }; > > > > int main() > > > > { > > > > struct Foo foo[5] = malloc(sizeof(struct Foo) * 5); > > > > > > > > for (int i = 0; i < 5; i++) { > > > > foo[i].add = add; > > > > } > > > > > > > > int sum = 0; > > > > for (int i = 0; i < 5; i++) { > > > > sum += add(1,2); > > > > } > > > > > > > > return 0; > > > > } > > > > > > > > My idea is to determine whether the assignment of the field is the > > > > same function, and if so, perform the replacement. > > > > > > If it's as simple as above then sure, even CSE should do it. If you > > > can prove otherwise the memory location with the function pointer > > > always has the same value you are obviously fine. If you just > > > do not see any other store via 'add's FIELD_DECL then no, that > > > isn't good enough. Every void * store you do not know where it > > > goes might go to that slot. > > > > > > > Of course this is not a reasonable optimization, I just want to know > > > > if there are security issues in doing so, and if I want to do it in > > > > the IPA stage, is it possible? > > > > > > For the more general case you can do what we do for speculative > > > devirtualization - replace the code with > > > > > > sum += foo[i].add == add ? add (1,2) : foo[i].add(1,2); > > > > Hi Richard, > > > > I'm trying to do what you suggested. (sum += foo[i].add == add ? add > > (1,2) : foo[i].add(1,2);) > > > > I created a new IPA-Pass before IPA-INLINE and I made the changes on > > the GIMPLE in the "function_transform" stage. But my newly created > > "foo[i].add(1,2)" seems to fail to be inlined. And it was optimized > > out in the subsequent FRE. I would like to ask if there is any way to > > mark my newly created cgraph_edge as "inline" in the > > "function_transform" stage. > > > > Here is part of the code creating the call_stmt in true basic_block in > > my IPA-PASS: > > > > // ... > > // part of the code have been omitted > > auto_vec<tree> params; > > for (int i = 0; i < gimple_call_num_args (call_stmt); i++) { > > params.safe_push (gimple_call_arg (call_stmt, i)); > > } > > gimple* tbb_call = gimple_build_call_vec (fndecl, params); /// = fn() > > tree tbb_ssa; > > if (ret_val_flag) { > > tbb_ssa = make_ssa_name (TREE_TYPE (gimple_call_lhs(call_stmt)), NULL); > > gimple_call_set_lhs (tbb_call, tbb_ssa); /// _2 = fn() > > } > > gsi = gsi_start_bb (tbb); > > gsi_insert_before (&gsi, tbb_call, GSI_SAME_STMT); > > cgraph_edge* tbb_callee = node->create_edge (cgraph_node::get_create > > (fndecl), (gcall*)tbb_call, tbb_call->bb->count, true); > > // what should I do to this cgraph_edge to mark it to be inlined > > // ... > > > > Or should I not do these things in the function_transform stage? > > That's indeed too late. You have to create speculative call edges, > I would suggest to look how IPA devirtualization handles this case > and possibly simply amend it. I'm CCing Honza who should know > more about this, I do not know the details. > > Richard. > > > Thanks > > Hanke Zhang > > > > > > > > > > that way we can inline the direct call and hopefully the branch will be > > > well predicted. > > > > > > In some SPEC there is IIRC the situation where such speculative > > > devirtualization candidates can be found solely based on function > > > signature. With LTO/IPA you'd basically collect candidate targets > > > for each indirect call (possibly address-taken function definitions > > > with correct signature) and if there's just a single one you can > > > choose that as speculative devirt target. > > > > > > Speculative devirt as we have now of course works with profile > > > data to identify the most probable candidate. > > > > > > Richard. > > > > > > > > > > > Thanks > > > > Hanke Zhang