https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107667
Bug ID: 107667
Summary: IPA: Speculatively reuse existing specializations
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: christophm30 at gmail dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---

This is a feature request, not a bug report.

GCC already does an excellent job of specializing functions in ipa-cp. Such specialized functions often execute much faster, because the additional information enables further optimizations (e.g. vectorization). Different call sites can get different specializations, and some call sites might not get specialized at all.

For the non-specialized call sites, one strategy can be applied: add guards that test whether an existing specialization can be reused and, if so, call it.

Such an optimization would have to be built into ipa-cp, where it would collect all specialized functions together with the constants propagated into them. At the end of the propagation stage, the call graph would be changed to add speculative edges to the specialized functions, with guards that test whether the actual arguments match the constants.

To demonstrate the effect, consider the following program fragment:

  func_1()
    myfunc(1)
  func_2()
    myfunc(2)
  func_i(i)
    myfunc(i)

In this case the transformation would do the following:

  func_1()
    myfunc.constprop.1() // myfunc() with arg0 == 1
  func_2()
    myfunc.constprop.2() // myfunc() with arg0 == 2
  func_i(i)
    if (i == 1)
      myfunc.constprop.1() // myfunc() with arg0 == 1
    else if (i == 2)
      myfunc.constprop.2() // myfunc() with arg0 == 2
    else
      myfunc(i)

Similar to `-fdevirtualize-speculatively`, such an optimization could be gated by a flag (e.g. `-fipa-guarded-specialization`).
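To make the pseudocode above concrete, here is a minimal hand-written C sketch of what the guarded call site would be equivalent to. It is only an illustration: the function body (summing the first `n` elements), the `src` parameter, and the names `myfunc_constprop_1` and `myfunc_constprop_2` are assumptions made for this example (a dot is not valid in a C identifier, so the clones that ipa-cp would name `myfunc.constprop.N` are spelled with underscores here).

```c
#include <stddef.h>

/* Generic version: n is only known at run time.
   The body is invented for illustration. */
int myfunc(const int *src, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += src[i];
    return sum;
}

/* Hand-written stand-ins for the clones ipa-cp would create when it
   propagates n == 1 and n == 2 into myfunc. With n constant, the loop
   disappears entirely. */
int myfunc_constprop_1(const int *src)
{
    return src[0];
}

int myfunc_constprop_2(const int *src)
{
    return src[0] + src[1];
}

/* Call site with an unknown argument. The guards compare the run-time
   value against the constants for which a clone already exists and
   fall back to the generic function otherwise. */
int func_i(const int *src, size_t i)
{
    if (i == 1)
        return myfunc_constprop_1(src);
    else if (i == 2)
        return myfunc_constprop_2(src);
    else
        return myfunc(src, i);
}
```

Calling `func_i(src, 2)` takes the second guard and runs the specialized body, while `func_i(src, 3)` fails both guards and reaches the generic `myfunc`. The result is the same either way; only the code path differs.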

One example where this optimization would trigger is x264 (also part of SPEC CPU2017), where the function pointer `get_ref` is assigned a single time during startup and then called multiple times, either with constant arguments (8 or 16) or with "unknown" arguments that actually match those constants at runtime. In combination with PR ipa/107666 (which converts the function pointers into guarded direct calls), this allows propagating the constants into `pixel_avg`, where vectorization then becomes possible (subject to the limitation documented in PR 106352).
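
The combination of the two guards can be sketched in C as follows. This is a simplified illustration under stated assumptions, not x264's actual code: the names `get_ref` and `pixel_avg` are borrowed from the description above, but the signatures, the bodies, the helper names `pixel_avg_w8`, `pixel_avg_w16` and `call_get_ref`, and the direct assignment of `pixel_avg` to the pointer are all invented for this example.

```c
#include <stdint.h>

/* Hypothetical averaging kernel with a run-time width. */
void pixel_avg(uint8_t *dst, const uint8_t *a, const uint8_t *b, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

/* Stand-ins for clones with the width fixed to 8 and 16. A constant
   trip count is what gives the vectorizer something to work with. */
void pixel_avg_w8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int x = 0; x < 8; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

void pixel_avg_w16(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int x = 0; x < 16; x++)
        dst[x] = (uint8_t)((a[x] + b[x] + 1) >> 1);
}

typedef void (*avg_fn)(uint8_t *, const uint8_t *, const uint8_t *, int);

/* Function pointer assigned once (here: statically) and never changed. */
avg_fn get_ref = pixel_avg;

/* Call site with an "unknown" width.
   Outer guard: speculative direct call, as proposed in PR ipa/107666.
   Inner guards: reuse of existing specializations, as proposed here. */
void call_get_ref(uint8_t *dst, const uint8_t *a, const uint8_t *b, int width)
{
    if (get_ref == pixel_avg) {
        if (width == 8)
            pixel_avg_w8(dst, a, b);
        else if (width == 16)
            pixel_avg_w16(dst, a, b);
        else
            pixel_avg(dst, a, b, width);
    } else {
        get_ref(dst, a, b, width); /* fallback: ordinary indirect call */
    }
}
```

The inner guards only become possible once the outer guard has turned the indirect call into a direct one, which is why the two requests complement each other.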