> In a follow-up patch, can you please document inlining rules involving
> -march and -mtune to "x86 Function Attributes" section? Currently, the
> inlining rules at the end of "target function attribute" section does
> not even mention -march and -mtune. Maybe a subsubsection "Inlining
> rules" should be added (like AArch64 has) to mention that only default
> arch and tune are inlined by default (but inline can be forced with
> always_inline for different mtune flags).

The document has below at the end of 'target (OPTIONS)' section

On the x86, the inliner does not inline a function that has
different target options than the caller, unless the callee
has a subset of the target options of the caller.  For example
a function declared with 'target("sse3")' can inline a
function with 'target("sse2")', since '-msse3' implies
'-msse2'.

Do we need to move this part to a new section and combine with -march and
-mtune rule description to the new subsubsection?

> Looking at the above, perhaps inlining of different arches can also be
> forced with always_inline? This would allow developers some control of
> inlining, and would not be surprising.

If so, I'd like to add the always_inline change on arch to current
patch and leave the
document change alone in the next patch.

Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> 于2023年7月4日周二 14:19写道:
>
> On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang <hongyu.w...@intel.com> wrote:
> >
> > Hi,
> >
> > For function with different target attributes, current logic rejects to
> > inline the callee when any arch or tune is mismatched. Relax the
> > condition to allow callee with default arch/tune to be inlined.
> >
> > Boostrapped/regtested on x86-64-linux-gnu{-m32,}.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_can_inline_p): If callee has
> >         default arch=x86-64 and tune=generic, do not block the
> >         inlining to its caller.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/inline_target_clones.c: New test.
>
> OK.
>
> In a follow-up patch, can you please document inlining rules involving
> -march and -mtune to "x86 Function Attributes" section? Currently, the
> inlining rules at the end of "target function attribute" section does
> not even mention -march and -mtune. Maybe a subsubsection "Inlining
> rules" should be added (like AArch64 has) to mention that only default
> arch and tune are inlined by default (but inline can be forced with
> always_inline for different mtune flags).
>
> Looking at the above, perhaps inlining of different arches can also be
> forced with always_inline? This would allow developers some control of
> inlining, and would not be surprising.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.cc                       | 22 +++++++++++------
> >  .../gcc.target/i386/inline_target_clones.c    | 24 +++++++++++++++++++
> >  2 files changed, 39 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/inline_target_clones.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 8989985700a..4741c9b5364 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -605,13 +605,6 @@ ix86_can_inline_p (tree caller, tree callee)
> >                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> >      ret = false;
> >
> > -  /* See if arch, tune, etc. are the same.  */
> > -  else if (caller_opts->arch != callee_opts->arch)
> > -    ret = false;
> > -
> > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > -    ret = false;
> > -
> >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> >            /* If the calle doesn't use FP expressions differences in
> >               ix86_fpmath can be ignored.  We are called from FEs
> > @@ -622,6 +615,21 @@ ix86_can_inline_p (tree caller, tree callee)
> >                || ipa_fn_summaries->get (callee_node)->fp_expressions))
> >      ret = false;
> >
> > +  /* At this point we cannot identify whether arch or tune setting
> > +     comes from target attribute or not. So the most conservative way
> > +     is to allow the callee that uses default arch and tune string to
> > +     be inlined.  */
> > +  else if (!strcmp (callee_opts->x_ix86_arch_string, "x86-64")
> > +          && !strcmp (callee_opts->x_ix86_tune_string, "generic"))
> > +    ret = true;
> > +
> > +  /* See if arch, tune, etc. are the same.  */
> > +  else if (caller_opts->arch != callee_opts->arch)
> > +    ret = false;
> > +
> > +  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > +    ret = false;
> > +
> >    else if (!always_inline
> >            && caller_opts->branch_cost != callee_opts->branch_cost)
> >      ret = false;
> > diff --git a/gcc/testsuite/gcc.target/i386/inline_target_clones.c 
> > b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > new file mode 100644
> > index 00000000000..53db1600ce5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/inline_target_clones.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-ifunc "" } */
> > +/* { dg-options "-O3 -march=x86-64" } */
> > +/* { dg-final { scan-assembler-not "call\[ \t\]+callee" } } */
> > +
> > +float callee (float a, float b, float c, float d,
> > +             float e, float f, float g, float h)
> > +{
> > +  return a * b + c * d + e * f + g + h + a * c + b * c
> > +    + a * d + b * e + a * f + c * h +
> > +    b * (a - 0.4f) * (c + h) * (b + e * d) - a / f * h;
> > +}
> > +
> > +__attribute__((target_clones("default","arch=icelake-server")))
> > +void caller (int n, float *a,
> > +            float c1, float c2, float c3,
> > +            float c4, float c5, float c6,
> > +            float c7)
> > +{
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      a[i] = callee (a[i], c1, c2, c3, c4, c5, c6, c7);
> > +    }
> > +}
> > --
> > 2.31.1
> >

Reply via email to