Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-08 Thread Peter Bergner via Gcc-patches
On 6/7/22 10:16 PM, Michael Meissner wrote:
> Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
> power9 we have to clear the fusion flag.  If store vector pair is a postive
> flag, then it isn't set in power10 flags, but it might be set in next cpu
> flags.  But if it is a negative flag, we have to explicitly clear it.

Ok, I completely forgot about this specific issue and its negative affect on
inlining, so I agree it's a bad idea.  Request withdrawn. :-) 

Peter




Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-08 Thread will schmidt via Gcc-patches
On Tue, 2022-06-07 at 23:16 -0400, Michael Meissner wrote:
> On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote:
> > On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> > > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> > > > I think I mentioned this offline, but I'd prefer a negative target flag,
> > > > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, 
> > > > meaning we'd
> > > > generate stxvp by default.
> > > 
> > > NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> > > positive.  All of them.
> > 
> > That's not what I was asking for.  I totally agree that 
> > -mno-store-vector-pair
> > should disable generating stxvp and that -mstore-vector-pair should enable
> > generating it.  What I asked for was that the internal flag we use to enable
> > and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR 
> > is
> > true when we use -mno-store-vector-pair and false when using 
> > -mstore-vector-pair.
> > That way we can add that flag to power10's rs6000-cpu.def entry and then 
> > we're
> > done.  What I don't want to have to do is that if/when power87 is released, 
> > we
> > still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to
> > get stxvp insns generated.  That adds a cost to every cpu after power10 
> > since
> > we'd have to remember to add that flag to every follow-on cpu.
> 
> FWIW, I really dislike having negative flags like that (just talking about the
> option mask internals, not the user option).

I can't tell there is agreement in either direction, i'll throw some
comments out and see if that helps make a decision. 

I agree with avoiding the negative flags.  Whenever I run across a code
snippet reading  "if (! TARGET_NOT_FOO) ... " it's time to double-check 
everything.  :-)  

If the proposal is to have "TARGET_NO_STORE_VECTOR_PAIR" set to "off",
I'd counter propose whatever variation possible to drop the "NO" from
the string. i.e. "TARGET_STORE_VECTOR_PAIR" set to however it makes
sense to indicate enabled, or not.

All that said, .. with a strong preference to have the internal flags
matching the option flags as closely as possible.


> 
> I don't view the cost to add one postive flag to the next CPU as bad, as it
> will be a one time cost.  Presumably it would be set also next++ CPU.  This is
> like power8 is all of the power7 flags + new flags.  Power9 is all of the
> power8 flags + new flags.  I.e. in general it is cumulative.  Yes, I'm aware
> there are times when there are breaks, but hopefully those are rare.

This sounds reasonable.   Some weight could be added for which way to
bias the flag based on a guess of what the 'power87' release will
allow, but ultimately that shouldn't really matter. 

And no, power87 isnt' real AFAIK,.. I'm just repeating the example
provided by Peter :-) 

Thanks
-Will

> 
> Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
> power9 we have to clear the fusion flag.  If store vector pair is a postive
> flag, then it isn't set in power10 flags, but it might be set in next cpu
> flags.  But if it is a negative flag, we have to explicitly clear it.
> 
> We can do it, but I just prefer to go with the positive flag approach.
> 



Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread Michael Meissner via Gcc-patches
On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote:
> On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> >> I think I mentioned this offline, but I'd prefer a negative target flag,
> >> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning 
> >> we'd
> >> generate stxvp by default.
> > 
> > NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> > positive.  All of them.
> 
> That's not what I was asking for.  I totally agree that -mno-store-vector-pair
> should disable generating stxvp and that -mstore-vector-pair should enable
> generating it.  What I asked for was that the internal flag we use to enable
> and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is
> true when we use -mno-store-vector-pair and false when using 
> -mstore-vector-pair.
> That way we can add that flag to power10's rs6000-cpu.def entry and then we're
> done.  What I don't want to have to do is that if/when power87 is released, we
> still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to
> get stxvp insns generated.  That adds a cost to every cpu after power10 since
> we'd have to remember to add that flag to every follow-on cpu.

FWIW, I really dislike having negative flags like that (just talking about the
option mask internals, not the user option).

I don't view the cost to add one postive flag to the next CPU as bad, as it
will be a one time cost.  Presumably it would be set also next++ CPU.  This is
like power8 is all of the power7 flags + new flags.  Power9 is all of the
power8 flags + new flags.  I.e. in general it is cumulative.  Yes, I'm aware
there are times when there are breaks, but hopefully those are rare.

Otherwise it is like the mess with -mpower8-fusion, where going from power8 to
power9 we have to clear the fusion flag.  If store vector pair is a postive
flag, then it isn't set in power10 flags, but it might be set in next cpu
flags.  But if it is a negative flag, we have to explicitly clear it.

We can do it, but I just prefer to go with the positive flag approach.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread Peter Bergner via Gcc-patches
On 6/7/22 4:24 PM, Segher Boessenkool wrote:
> On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
>> I think I mentioned this offline, but I'd prefer a negative target flag,
>> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
>> generate stxvp by default.
> 
> NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
> positive.  All of them.

That's not what I was asking for.  I totally agree that -mno-store-vector-pair
should disable generating stxvp and that -mstore-vector-pair should enable
generating it.  What I asked for was that the internal flag we use to enable
and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is
true when we use -mno-store-vector-pair and false when using 
-mstore-vector-pair.
That way we can add that flag to power10's rs6000-cpu.def entry and then we're
done.  What I don't want to have to do is that if/when power87 is released, we
still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to
get stxvp insns generated.  That adds a cost to every cpu after power10 since
we'd have to remember to add that flag to every follow-on cpu.

Peter



Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread Michael Meissner via Gcc-patches
On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> On 6/6/22 7:55 PM, Michael Meissner wrote:
> > gcc/
> [snip]
> > * config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
> [snip]
> > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> > index 4931d781c4e..79ceec6e6a5 100644
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -624,6 +624,10 @@ mieee128-constant
> >  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the LXVKQ instruction.
> >  
> > +; Generate (do not generate) code that uses the store vector pair 
> > instruction.
> > +mstore-vector-pair
> > +Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save
> > +
> >  -param=rs6000-density-pct-threshold=
> >  Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) 
> > Init(85) IntegerRange(0, 100) Param
> >  When costing for loop vectorization, we probably need to penalize the loop 
> > body
> 
> I think I mentioned this offline, but I'd prefer a negative target flag,
> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
> generate stxvp by default.  Then I'd like to see MASK_NO_STORE_VECTOR_PAIR
> added to power10's rs6000-cpu.def definition.  That way, stxvp isn't generated
> on Power10, but would be by default on any possible future cpus without
> having to add a flag to those cpus rs6000-cpu.def entries.

I don't much care when the option is spelled, but I'm happy to go with whatever
name people want.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread Segher Boessenkool
On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote:
> I think I mentioned this offline, but I'd prefer a negative target flag,
> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
> generate stxvp by default.

NAK.  All negatives should be -mno-xxx with -mxxx the corresponding
positive.  All of them.

>  Then I'd like to see MASK_NO_STORE_VECTOR_PAIR
> added to power10's rs6000-cpu.def definition.  That way, stxvp isn't generated
> on Power10, but would be by default on any possible future cpus without
> having to add a flag to those cpus rs6000-cpu.def entries.

The p10 cpu support should simply not enable the option by default.
There is no reason to play games.


Segher


Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread Peter Bergner via Gcc-patches
On 6/6/22 7:55 PM, Michael Meissner wrote:
> gcc/
[snip]
>   * config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
[snip]
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 4931d781c4e..79ceec6e6a5 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -624,6 +624,10 @@ mieee128-constant
>  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
>  Generate (do not generate) code that uses the LXVKQ instruction.
>  
> +; Generate (do not generate) code that uses the store vector pair 
> instruction.
> +mstore-vector-pair
> +Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save
> +
>  -param=rs6000-density-pct-threshold=
>  Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) 
> Init(85) IntegerRange(0, 100) Param
>  When costing for loop vectorization, we probably need to penalize the loop 
> body

I think I mentioned this offline, but I'd prefer a negative target flag,
something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd
generate stxvp by default.  Then I'd like to see MASK_NO_STORE_VECTOR_PAIR
added to power10's rs6000-cpu.def definition.  That way, stxvp isn't generated
on Power10, but would be by default on any possible future cpus without
having to add a flag to those cpus rs6000-cpu.def entries.

Peter



Re: [PATCH 1/3] Disable generating store vector pair.

2022-06-07 Thread will schmidt via Gcc-patches
On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote:
> [PATCH 1/3] Disable generating store vector pair.
> 
> Testing has revealed that the power10 has some slowdowns if the store
> vector pair instruction is generated in some cases.  This patch disables
> generating the store vector pair instructions (stxvp, pstxvp, and stxvpx)
> unless an undocumented switch is used.  It is anticipated that perhaps
> with future machines we can generate the store vector pair instruction.
> 
> This patch does a split after reload to convert a store vector pair
> instruction into a pair of store vector instructions.
> 
> We do continue to generate the load vector pair instructions (lxvp, plxvp,
> and lxvpx), since we have found that in code that heavily uses MMA, it is
> still a win to generate the load vector pair instructions.
> 
> There are two future patches planed:
> 
> 1)Disable block moves from generating load/store vector pair
>   instructions unless the the store vector pair instructions are
>   being generted.
> 
> 2)Make the built-in functions for generating store vector pair
>   always generate those instructions even if store vector pair
>   instructions are disabled.
> 
> I have built bootstrap compilers and run the regression tests on three
> different systems:
> 
> 1)Little endian power10 using the --with-cpu=power10 option.
> 
> 2)Little endian power9 using the --with-cpu=power9 option.
> 
> 3)Big endian power8 using the --with-cpu=power8 option.  On this 
> system,
>   both 64-bit and 32-bit code generation was tested.
> 
> There were no regressions in the runs except for the tests that are
> modified in patch #3 in these series of patches.  Can I check this patch
> into the trunk?  If there are no changes needed for the backports, can I
> check this code into the active branches after a burn-in period?
> 
> 2022-06-06   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/mma.md (movoo): Disable generating store vector
>   pair instructions unless these are enabled by the user.
>   (movxo): Likewise.
>   * config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store
>   vector pair instructions are disabled, do not allow vector pair
>   addresses to be indexed.
>   (rs6000_split_multireg_move): Do not split XOmode stores into two
>   store vector pair instructions unless store vector pair
>   instructions are enabled.
>   * config/rs6000/rs6000.md (isa attribute): Add stxvp attribute.
>   (enabled attribute): Disable alternative using store vector pair
>   instructions unless they are enabled.
>   * config/rs6000/rs6000.opt (-mstore-vector-pair): New option.
> 
> gcc/testsuite/
> 
>   * gcc.target/powerpc/p10-store-vector-pair-1.c: New test.
>   * gcc.target/powerpc/p10-store-vector-pair-2.c: New test.
> ---
>  gcc/config/rs6000/mma.md  | 41 ++
>  gcc/config/rs6000/rs6000.cc   |  9 +-
>  gcc/config/rs6000/rs6000.md   |  8 +-
>  gcc/config/rs6000/rs6000.opt  |  4 +
>  .../powerpc/p10-store-vector-pair-1.c | 82 +++
>  .../powerpc/p10-store-vector-pair-2.c | 81 ++
>  6 files changed, 206 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index a183b6a168a..9b5f243b88d 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -274,26 +274,35 @@ (define_expand "movoo"
>DONE;
>  })
> 
> +;; By default for power10, do not generate the stxvp/pstxvp/stxvpx
> +;; instructions.  Instead, split these instructions into two separate store
> +;; vector instructions.  We do always generate a lxvp/plxvp/lxvpx 
> instruction.
> +;; We leave in the support for generating stxvp/pstxvp/stxvpx in future
> +;; machines.

... and if (undocumented) STORE_VECTOR_PAIR option is indicated ?

Nothing else jumps out at me.  

Thanks
-Will