Re: [PATCH] rs6000: fmr gets used instead of faster xxlor [PR93571]

Segher Boessenkool Fri, 24 Feb 2023 07:13:23 -0800

Hi!

For future patches: please don't send patches as replies to existing
threads.  Just start a new thread for a new patch (series).  You can
mark it as [PATCH v2] in the subject, if you want.


On Fri, Feb 24, 2023 at 01:41:49PM +0530, Ajit Agarwal wrote:
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results shows that fmr is better in power9 and 
> power10 architectures whereas xxlor is better in power7 and
> power 8 architectures.

And fmr is the only option before p7.

>       rs6000: Use xxlor instead of fmr where possible
> 
>       This patch replaces fmr with xxlor instruction for power7
>       and power8 architectures whereas for power9 and power10
>       replaces xxlor with fmr instruction.

Saying "this patch" in a commit message reads strangely.  Just "Replace
fmr with" etc.?

The second part is just wrong, you cannot replace xxlor by fmr in
general.

>       Perf measurement results:
> 
>       Power9 fmr:  201,847,661 cycles.
>       Power9 xxlor: 201,877,78 cycles.
>       Power8 fmr: 201,057,795 cycles.
>         Power8 xxlor: 201,004,671 cycles.

What is this measuring?  100M insns back-to-back, each dependent on the
previous one?

What are the results on p7 and p10?

These numbers show there is no difference on p8 either.  Did you paste
the wrong numbers maybe?

>       * config/rs6000/rs6000.md (*movdf_hardfloat64): Use xxlor
>       for power7 and power8 and fmr for power9 and power10.

Please don't break lines early.  Changelogs lines can be 80 columns
wide, just like source code lines.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -354,7 +354,7 @@ (define_attr "cpu"
>    (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
>  
>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p7p8,p10"

p78v, and sort it after p8v please.

> +     (and (eq_attr "isa" "p7p8")
> +       (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
> +     (const_int 1)

Okay.

>  (define_insn "*mov<mode>_hardfloat64"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> -           "=m,           d,          d,          <f64_p9>,   wY,
> -             <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
> -             YZ,          r,          !r,         *c*l,       !r,
> -            *h,           r,          <f64_dm>,   wa")
> +           "=m,           d,          <f64_vsx>,  <f64_p9>,   wY,
> +            <f64_av>,     Z,          wa,         <f64_vsx>,  !r,
> +            YZ,           r,          !r,         *c*l,       !r,
> +            *h,           r,          <f64_dm>,   d,          wn,
> +            wa")
>       (match_operand:FMOVE64 1 "input_operand"

(You posted this mail as wrapping.  That means the patch cannot be
applied non-manually, and that replies to your mail will be mangled.
Just get a Real mail client, and configure it correctly :-) )

> -            "d,           m,          d,          wY,         <f64_p9>,
> -             Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> +            "d,           m,          <f64_vsx>,  wY,         <f64_p9>,
> +             Z,           <f64_av>,   wa,         <zero_fp>,  <zero_fp>,
>               r,           YZ,         r,          r,          *h,
> -             0,           <f64_dm>,   r,          eP"))]
> +             0,           <f64_dm>,   r,          d,          wn,
> +             eP"))]

No.  It is impossible to figure out what you changed here by just
reading it.

There is no requirement there should be exactly five alternatives per
line, and/or that there should be the same number everywhere.

If the indentation was incorrect, and you want to fix that, do that in a
separate *earlier* patch in the series, please.

>    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
>    "@
>     stfd%U0%X0 %1,%0
>     lfd%U1%X1 %0,%1
> -   fmr %0,%1
> +   xxlor %x0,%x1,%x1
>     lxsd %0,%1
>     stxsd %1,%0
>     lxsdx %x0,%y1
>     stxsdx %x1,%y0
> -   xxlor %x0,%x1,%x1
> +   fmr %0,%1
>     xxlxor %x0,%x0,%x0
>     li %0,0
>     std%U0%X0 %1,%0
> @@ -8467,23 +8474,28 @@ (define_insn "*mov<mode>_hardfloat64"
>     nop
>     mfvsrd %0,%x1
>     mtvsrd %x0,%1
> +   fmr %0,%1
> +   fmr %0,%1
>     #"
>    [(set_attr "type"
> -            "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
> +            "fpstore,     fpload,     veclogical, fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, integer,
>               store,       load,       *,          mtjmpr,     mfjmpr,
> -             *,           mfvsr,      mtvsr,      vecperm")
> +             *,           mfvsr,      mtvsr,      fpsimple,   fpsimple,
> +             vecperm")
>     (set_attr "size" "64")
>     (set_attr "isa"
> -            "*,           *,          *,          p9v,        p9v,
> -             p7v,         p7v,        *,          *,          *,
> -             *,           *,          *,          *,          *,
> -             *,           p8v,        p8v,        p10")
> +            "*,           *,          p7p8,        p9v,        p9v,
> +             p7v,         p7v,        *,           *,          *,
> +             *,           *,          *,           *,          *,
> +             *,           p8v,        p8v,         *,          *,
> +             p10")

So, you swapped the xxlor and fmr entries, and added two nextra fmr
entries at the end?!


Segher

Re: [PATCH] rs6000: fmr gets used instead of faster xxlor [PR93571]

Reply via email to