> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.

ix86_can_inline_p already has the following check:

if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
     != callee_opts->x_ix86_isa_flags)
    || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
        != callee_opts->x_ix86_isa_flags2))
  ret = false;

It makes sure the caller's ISA is a superset of the callee's, so the
inlined body should follow the caller's ISA specification.
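
To make the superset test concrete, here is a minimal standalone sketch
of the same bitmask idiom. The DEMO_ISA_* bits and the helper name
isa_is_superset are made up for illustration; they are not GCC's real
OPTION_MASK_ISA_* values or any function in i386.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISA bits for illustration only; these are not GCC's
   real OPTION_MASK_ISA_* values.  */
#define DEMO_ISA_SSE2    (UINT64_C (1) << 0)
#define DEMO_ISA_AVX2    (UINT64_C (1) << 1)
#define DEMO_ISA_AVX512F (UINT64_C (1) << 2)

/* Return true when every ISA bit set in CALLEE is also set in CALLER,
   i.e. the caller's ISA set is a superset of the callee's.  */
static bool
isa_is_superset (uint64_t caller, uint64_t callee)
{
  return (caller & callee) == callee;
}
```

With these demo bits, a caller with SSE2|AVX2|AVX512F passes the test
for a callee with SSE2|AVX2, while a caller with only SSE2|AVX2 fails
it for a callee that needs AVX512F.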

IMHO I cannot give a real example where inlining harms the caller's
performance. I added the PVW check because some callee may want to
limit its vector size while the caller has a larger preferred vector
size. At least with the current change we get more optimization
opportunities for different target_clones.
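
As a rough model of the PVW condition in the patch, the check can be
written as a small standalone predicate. PVW_NONE is the name used in
the patch; the other enumerator names and the helper pvw_blocks_inline
are assumptions for this sketch only.

```c
#include <stdbool.h>

/* Stand-in for the prefer-vector-width enum.  PVW_NONE appears in the
   patch; the other enumerator names are assumed for this sketch.  */
enum demo_pvw { PVW_NONE, PVW_AVX128, PVW_AVX256, PVW_AVX512 };

/* Return true when the preferred vector widths should block inlining:
   both sides specify a width and the two widths differ.  */
static bool
pvw_blocks_inline (enum demo_pvw caller, enum demo_pvw callee)
{
  return caller != PVW_NONE
	 && callee != PVW_NONE
	 && caller != callee;
}
```

So a callee limited to 128-bit vectors is not inlined into a caller
that prefers 512-bit vectors, but a callee with no preference can be.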

But I agree that the tuning setting may be a factor that affects
performance. One possible choice: if the tune for the callee is
unspecified or default, just inline it into the caller that has a
specified arch and tune.
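
A possible shape for that rule, only as a sketch and not part of the
posted patch: the enum values and the helper tune_allows_inline below
are hypothetical, and "unspecified or default" is modelled as a single
DEMO_TUNE_DEFAULT value. Treating a non-default callee tune that equals
the caller's as inlinable is also an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical tuning identifiers; GCC's real processor enum differs.  */
enum demo_tune { DEMO_TUNE_DEFAULT, DEMO_TUNE_SKYLAKE, DEMO_TUNE_ICELAKE };

/* Return true when tuning should not block inlining: the callee left
   its tune unspecified/default, or it matches the caller's tune.  */
static bool
tune_allows_inline (enum demo_tune caller, enum demo_tune callee)
{
  if (callee == DEMO_TUNE_DEFAULT)
    return true;
  return caller == callee;
}
```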

On Tue, Jun 27, 2023 at 17:16, Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang <hongyu.w...@intel.com> wrote:
> >
> > Hi,
> >
> > For functions with different target attributes, the current logic
> > refuses to inline the callee when any arch or tune is mismatched.
> > Relax the condition to honor just prefer_vector_width_type and other
> > flags that may cause safety issues, so the caller can get more
> > optimization opportunities.
>
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.
>
> Uros.
>
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> >         tune directly, just check prefer_vector_width_type and make sure
> >         not to inline if they mismatch.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/inline-target-attr.c: New test.
> > ---
> >  gcc/config/i386/i386.cc                       | 11 +++++----
> >  .../gcc.target/i386/inline-target-attr.c      | 24 +++++++++++++++++++
> >  2 files changed, 30 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 0761965344b..1d86384ac06 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> >                != (callee_opts->x_target_flags & ~always_inline_safe_mask))
> >      ret = false;
> >
> > -  /* See if arch, tune, etc. are the same.  */
> > -  else if (caller_opts->arch != callee_opts->arch)
> > -    ret = false;
> > -
> > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > +  /* Do not inline when the specified prefer-vector-width mismatches
> > +     between callee and caller.  */
> > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > +          && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > +          && callee_opts->x_prefer_vector_width_type
> > +             != caller_opts->x_prefer_vector_width_type)
> >      ret = false;
> >
> >    else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > new file mode 100644
> > index 00000000000..995502165f0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > +
> > +__attribute__((target("arch=skylake")))
> > +int callee (int n)
> > +{
> > +  int sum = 0;
> > +  for (int i = 0; i < n; i++)
> > +    {
> > +      if (i % 2 == 0)
> > +       sum +=i;
> > +      else
> > +       sum += (i - 1);
> > +    }
> > +  return sum + n;
> > +}
> > +
> > +__attribute__((target("arch=icelake-server")))
> > +int caller (int n)
> > +{
> > +  return callee (n) + n;
> > +}
> > +
> > --
> > 2.31.1
> >
