Re: [PATCH 1/3] Disable generating store vector pair.
On 6/7/22 10:16 PM, Michael Meissner wrote: > Otherwise it is like the mess with -mpower8-fusion, where going from power8 to > power9 we have to clear the fusion flag. If store vector pair is a postive > flag, then it isn't set in power10 flags, but it might be set in next cpu > flags. But if it is a negative flag, we have to explicitly clear it. Ok, I completely forgot about this specific issue and its negative affect on inlining, so I agree it's a bad idea. Request withdrawn. :-) Peter
Re: [PATCH 1/3] Disable generating store vector pair.
On Tue, 2022-06-07 at 23:16 -0400, Michael Meissner wrote: > On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote: > > On 6/7/22 4:24 PM, Segher Boessenkool wrote: > > > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote: > > > > I think I mentioned this offline, but I'd prefer a negative target flag, > > > > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, > > > > meaning we'd > > > > generate stxvp by default. > > > > > > NAK. All negatives should be -mno-xxx with -mxxx the corresponding > > > positive. All of them. > > > > That's not what I was asking for. I totally agree that > > -mno-store-vector-pair > > should disable generating stxvp and that -mstore-vector-pair should enable > > generating it. What I asked for was that the internal flag we use to enable > > and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR > > is > > true when we use -mno-store-vector-pair and false when using > > -mstore-vector-pair. > > That way we can add that flag to power10's rs6000-cpu.def entry and then > > we're > > done. What I don't want to have to do is that if/when power87 is released, > > we > > still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to > > get stxvp insns generated. That adds a cost to every cpu after power10 > > since > > we'd have to remember to add that flag to every follow-on cpu. > > FWIW, I really dislike having negative flags like that (just talking about the > option mask internals, not the user option). I can't tell there is agreement in either direction, i'll throw some comments out and see if that helps make a decision. I agree with avoiding the negative flags. Whenever I run across a code snippet reading "if (! TARGET_NOT_FOO) ... " it's time to double-check everything. :-) If the proposal is to have "TARGET_NO_STORE_VECTOR_PAIR" set to "off", I'd counter propose whatever variation possible to drop the "NO" from the string. i.e. "TARGET_STORE_VECTOR_PAIR" set to however it makes sense to indicate enabled, or not. All that said, .. with a strong preference to have the internal flags matching the option flags as closely as possible. > > I don't view the cost to add one postive flag to the next CPU as bad, as it > will be a one time cost. Presumably it would be set also next++ CPU. This is > like power8 is all of the power7 flags + new flags. Power9 is all of the > power8 flags + new flags. I.e. in general it is cumulative. Yes, I'm aware > there are times when there are breaks, but hopefully those are rare. This sounds reasonable. Some weight could be added for which way to bias the flag based on a guess of what the 'power87' release will allow, but ultimately that shouldn't really matter. And no, power87 isnt' real AFAIK,.. I'm just repeating the example provided by Peter :-) Thanks -Will > > Otherwise it is like the mess with -mpower8-fusion, where going from power8 to > power9 we have to clear the fusion flag. If store vector pair is a postive > flag, then it isn't set in power10 flags, but it might be set in next cpu > flags. But if it is a negative flag, we have to explicitly clear it. > > We can do it, but I just prefer to go with the positive flag approach. >
Re: [PATCH 1/3] Disable generating store vector pair.
On Tue, Jun 07, 2022 at 07:59:34PM -0500, Peter Bergner wrote: > On 6/7/22 4:24 PM, Segher Boessenkool wrote: > > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote: > >> I think I mentioned this offline, but I'd prefer a negative target flag, > >> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning > >> we'd > >> generate stxvp by default. > > > > NAK. All negatives should be -mno-xxx with -mxxx the corresponding > > positive. All of them. > > That's not what I was asking for. I totally agree that -mno-store-vector-pair > should disable generating stxvp and that -mstore-vector-pair should enable > generating it. What I asked for was that the internal flag we use to enable > and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is > true when we use -mno-store-vector-pair and false when using > -mstore-vector-pair. > That way we can add that flag to power10's rs6000-cpu.def entry and then we're > done. What I don't want to have to do is that if/when power87 is released, we > still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to > get stxvp insns generated. That adds a cost to every cpu after power10 since > we'd have to remember to add that flag to every follow-on cpu. FWIW, I really dislike having negative flags like that (just talking about the option mask internals, not the user option). I don't view the cost to add one postive flag to the next CPU as bad, as it will be a one time cost. Presumably it would be set also next++ CPU. This is like power8 is all of the power7 flags + new flags. Power9 is all of the power8 flags + new flags. I.e. in general it is cumulative. Yes, I'm aware there are times when there are breaks, but hopefully those are rare. Otherwise it is like the mess with -mpower8-fusion, where going from power8 to power9 we have to clear the fusion flag. If store vector pair is a postive flag, then it isn't set in power10 flags, but it might be set in next cpu flags. But if it is a negative flag, we have to explicitly clear it. We can do it, but I just prefer to go with the positive flag approach. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Re: [PATCH 1/3] Disable generating store vector pair.
On 6/7/22 4:24 PM, Segher Boessenkool wrote: > On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote: >> I think I mentioned this offline, but I'd prefer a negative target flag, >> something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd >> generate stxvp by default. > > NAK. All negatives should be -mno-xxx with -mxxx the corresponding > positive. All of them. That's not what I was asking for. I totally agree that -mno-store-vector-pair should disable generating stxvp and that -mstore-vector-pair should enable generating it. What I asked for was that the internal flag we use to enable and disable it should be a negative flag, where TARGET_NO_STORE_VECTOR_PAIR is true when we use -mno-store-vector-pair and false when using -mstore-vector-pair. That way we can add that flag to power10's rs6000-cpu.def entry and then we're done. What I don't want to have to do is that if/when power87 is released, we still have to add TARGET_STORE_VECTOR_PAIR its rs6000-cpu.def entry just to get stxvp insns generated. That adds a cost to every cpu after power10 since we'd have to remember to add that flag to every follow-on cpu. Peter
Re: [PATCH 1/3] Disable generating store vector pair.
On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote: > On 6/6/22 7:55 PM, Michael Meissner wrote: > > gcc/ > [snip] > > * config/rs6000/rs6000.opt (-mstore-vector-pair): New option. > [snip] > > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > > index 4931d781c4e..79ceec6e6a5 100644 > > --- a/gcc/config/rs6000/rs6000.opt > > +++ b/gcc/config/rs6000/rs6000.opt > > @@ -624,6 +624,10 @@ mieee128-constant > > Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save > > Generate (do not generate) code that uses the LXVKQ instruction. > > > > +; Generate (do not generate) code that uses the store vector pair > > instruction. > > +mstore-vector-pair > > +Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save > > + > > -param=rs6000-density-pct-threshold= > > Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) > > Init(85) IntegerRange(0, 100) Param > > When costing for loop vectorization, we probably need to penalize the loop > > body > > I think I mentioned this offline, but I'd prefer a negative target flag, > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd > generate stxvp by default. Then I'd like to see MASK_NO_STORE_VECTOR_PAIR > added to power10's rs6000-cpu.def definition. That way, stxvp isn't generated > on Power10, but would be by default on any possible future cpus without > having to add a flag to those cpus rs6000-cpu.def entries. I don't much care when the option is spelled, but I'm happy to go with whatever name people want. -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
Re: [PATCH 1/3] Disable generating store vector pair.
On Tue, Jun 07, 2022 at 04:17:04PM -0500, Peter Bergner wrote: > I think I mentioned this offline, but I'd prefer a negative target flag, > something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd > generate stxvp by default. NAK. All negatives should be -mno-xxx with -mxxx the corresponding positive. All of them. > Then I'd like to see MASK_NO_STORE_VECTOR_PAIR > added to power10's rs6000-cpu.def definition. That way, stxvp isn't generated > on Power10, but would be by default on any possible future cpus without > having to add a flag to those cpus rs6000-cpu.def entries. The p10 cpu support should simply not enable the option by default. There is no reason to play games. Segher
Re: [PATCH 1/3] Disable generating store vector pair.
On 6/6/22 7:55 PM, Michael Meissner wrote: > gcc/ [snip] > * config/rs6000/rs6000.opt (-mstore-vector-pair): New option. [snip] > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > index 4931d781c4e..79ceec6e6a5 100644 > --- a/gcc/config/rs6000/rs6000.opt > +++ b/gcc/config/rs6000/rs6000.opt > @@ -624,6 +624,10 @@ mieee128-constant > Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save > Generate (do not generate) code that uses the LXVKQ instruction. > > +; Generate (do not generate) code that uses the store vector pair > instruction. > +mstore-vector-pair > +Target Undocumented Var(TARGET_STORE_VECTOR_PAIR) Init(0) Save > + > -param=rs6000-density-pct-threshold= > Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) > Init(85) IntegerRange(0, 100) Param > When costing for loop vectorization, we probably need to penalize the loop > body I think I mentioned this offline, but I'd prefer a negative target flag, something like TARGET_NO_STORE_VECTOR_PAIR that defaults to off, meaning we'd generate stxvp by default. Then I'd like to see MASK_NO_STORE_VECTOR_PAIR added to power10's rs6000-cpu.def definition. That way, stxvp isn't generated on Power10, but would be by default on any possible future cpus without having to add a flag to those cpus rs6000-cpu.def entries. Peter
Re: [PATCH 1/3] Disable generating store vector pair.
On Mon, 2022-06-06 at 20:55 -0400, Michael Meissner wrote: > [PATCH 1/3] Disable generating store vector pair. > > Testing has revealed that the power10 has some slowdowns if the store > vector pair instruction is generated in some cases. This patch disables > generating the store vector pair instructions (stxvp, pstxvp, and stxvpx) > unless an undocumented switch is used. It is anticipated that perhaps > with future machines we can generate the store vector pair instruction. > > This patch does a split after reload to convert a store vector pair > instruction into a pair of store vector instructions. > > We do continue to generate the load vector pair instructions (lxvp, plxvp, > and lxvpx), since we have found that in code that heavily uses MMA, it is > still a win to generate the load vector pair instructions. > > There are two future patches planed: > > 1)Disable block moves from generating load/store vector pair > instructions unless the the store vector pair instructions are > being generted. > > 2)Make the built-in functions for generating store vector pair > always generate those instructions even if store vector pair > instructions are disabled. > > I have built bootstrap compilers and run the regression tests on three > different systems: > > 1)Little endian power10 using the --with-cpu=power10 option. > > 2)Little endian power9 using the --with-cpu=power9 option. > > 3)Big endian power8 using the --with-cpu=power8 option. On this > system, > both 64-bit and 32-bit code generation was tested. > > There were no regressions in the runs except for the tests that are > modified in patch #3 in these series of patches. Can I check this patch > into the trunk? If there are no changes needed for the backports, can I > check this code into the active branches after a burn-in period? > > 2022-06-06 Michael Meissner > > gcc/ > > * config/rs6000/mma.md (movoo): Disable generating store vector > pair instructions unless these are enabled by the user. > (movxo): Likewise. > * config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): If store > vector pair instructions are disabled, do not allow vector pair > addresses to be indexed. > (rs6000_split_multireg_move): Do not split XOmode stores into two > store vector pair instructions unless store vector pair > instructions are enabled. > * config/rs6000/rs6000.md (isa attribute): Add stxvp attribute. > (enabled attribute): Disable alternative using store vector pair > instructions unless they are enabled. > * config/rs6000/rs6000.opt (-mstore-vector-pair): New option. > > gcc/testsuite/ > > * gcc.target/powerpc/p10-store-vector-pair-1.c: New test. > * gcc.target/powerpc/p10-store-vector-pair-2.c: New test. > --- > gcc/config/rs6000/mma.md | 41 ++ > gcc/config/rs6000/rs6000.cc | 9 +- > gcc/config/rs6000/rs6000.md | 8 +- > gcc/config/rs6000/rs6000.opt | 4 + > .../powerpc/p10-store-vector-pair-1.c | 82 +++ > .../powerpc/p10-store-vector-pair-2.c | 81 ++ > 6 files changed, 206 insertions(+), 19 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-1.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-store-vector-pair-2.c > > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md > index a183b6a168a..9b5f243b88d 100644 > --- a/gcc/config/rs6000/mma.md > +++ b/gcc/config/rs6000/mma.md > @@ -274,26 +274,35 @@ (define_expand "movoo" >DONE; > }) > > +;; By default for power10, do not generate the stxvp/pstxvp/stxvpx > +;; instructions. Instead, split these instructions into two separate store > +;; vector instructions. We do always generate a lxvp/plxvp/lxvpx > instruction. > +;; We leave in the support for generating stxvp/pstxvp/stxvpx in future > +;; machines. ... and if (undocumented) STORE_VECTOR_PAIR option is indicated ? Nothing else jumps out at me. Thanks -Will