Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi, on 2023/9/25 09:57, HAO CHEN GUI wrote: > Hi Kewen, > > 在 2023/9/18 15:34, Kewen.Lin 写道: >> hanks for checking! So for P7, this patch looks neutral, but for P8 and >> later, it may cause some few differences in code gen. I'm curious that how >> many total object files and different object files were checked and found >> on P8? > P8 with -O2, following object files are different. > 507.cactuBSSN_r datestamp.o > 511.povray_r colutils.o > 521.wrf_r module_cu_kfeta.fppized.o > 526.blender_r particle_edit.o > 526.blender_r glutil.o > 526.blender_r displist.o > 526.blender_r CCGSubSurf.o > > P8 with -O3, following object files are different. > 502.gcc_r ifcvt.o > 502.gcc_r rtlanal.o > 548.exchange2_r exchange2.fppized.o > 507.cactuBSSN_r datestamp.o > 511.povray_r colutils.o > 521.wrf_r module_bc.fppized.o > 521.wrf_r module_cu_kfeta.fppized.o > 526.blender_r particle_edit.o > 526.blender_r displist.o > 526.blender_r CCGSubSurf.o > 526.blender_r sketch.o > OK, it's concluded that the percentage of the total number of affected object files is quite small ... > > > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html >> I also wonder if it's easy to reduce some of them further as small test >> cases. >> >> Since xxlor is better than fmr at least on Power10, could you also evaluate >> the affected bmks on P10 (even P8/P9) to ensure no performance degradation? > There is no performance recession on P10/P9/P8. The detail data is listed on > internal issue. ... and no runtime performance impact as evaluated, so this patch looks good to me and thanks for further testing. Please wait for a week or so to see if Segher and David have comments. Thanks! BR, Kewen > > Thanks > Gui Haochen
Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi Kewen, 在 2023/9/18 15:34, Kewen.Lin 写道: > hanks for checking! So for P7, this patch looks neutral, but for P8 and > later, it may cause some few differences in code gen. I'm curious that how > many total object files and different object files were checked and found > on P8? P8 with -O2, following object files are different. 507.cactuBSSN_r datestamp.o 511.povray_r colutils.o 521.wrf_r module_cu_kfeta.fppized.o 526.blender_r particle_edit.o 526.blender_r glutil.o 526.blender_r displist.o 526.blender_r CCGSubSurf.o P8 with -O3, following object files are different. 502.gcc_r ifcvt.o 502.gcc_r rtlanal.o 548.exchange2_r exchange2.fppized.o 507.cactuBSSN_r datestamp.o 511.povray_r colutils.o 521.wrf_r module_bc.fppized.o 521.wrf_r module_cu_kfeta.fppized.o 526.blender_r particle_edit.o 526.blender_r displist.o 526.blender_r CCGSubSurf.o 526.blender_r sketch.o > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html > I also wonder if it's easy to reduce some of them further as small test cases. > > Since xxlor is better than fmr at least on Power10, could you also evaluate > the affected bmks on P10 (even P8/P9) to ensure no performance degradation? There is no performance recession on P10/P9/P8. The detail data is listed on internal issue. Thanks Gui Haochen
Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi Haochen, on 2023/9/14 16:35, HAO CHEN GUI wrote: > Hi Kewen, > > 在 2023/9/12 17:33, Kewen.Lin 写道: >> Ok, at least regression testing doesn't expose any needs to do disparaging >> for this. Could you also test this patch with SPEC2017 for P7 and P8 >> separately at options like -O2 or -O3, to see if there is any assembly >> change, and if yes filtering out some typical to check it's expected or >> not? I think it can help us to better evaluate the impact. Thanks! > > Just compared the object files of SPEC2017 for P7 and P8. There is no > difference between P7s'. For P8, some different object files are found. > All differences are the same. Patched object files replace xxlor with fmr. > It's expected as the fmr is added to ahead of xxlor in "*movsi_internal1". Thanks for checking! So for P7, this patch looks neutral, but for P8 and later, it may cause some few differences in code gen. I'm curious that how many total object files and different object files were checked and found on P8? fmr or xxlor preference can be further considered along with existing: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html I also wonder if it's easy to reduce some of them further as small test cases. Since xxlor is better than fmr at least on Power10, could you also evaluate the affected bmks on P10 (even P8/P9) to ensure no performance degradation? Thanks! BR, Kewen
Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi Kewen, 在 2023/9/12 17:33, Kewen.Lin 写道: > Ok, at least regression testing doesn't expose any needs to do disparaging > for this. Could you also test this patch with SPEC2017 for P7 and P8 > separately at options like -O2 or -O3, to see if there is any assembly > change, and if yes filtering out some typical to check it's expected or > not? I think it can help us to better evaluate the impact. Thanks! Just compared the object files of SPEC2017 for P7 and P8. There is no difference between P7s'. For P8, some different object files are found. All differences are the same. Patched object files replace xxlor with fmr. It's expected as the fmr is added to ahead of xxlor in "*movsi_internal1". Thanks Gui Haochen
Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi Haochen, on 2023/9/4 13:33, HAO CHEN GUI wrote: > Hi, > This patch enables SImode in FP registers on P7. Instruction "fctiw" > stores its integer output in an FP register. So SImode in FP register > needs be enabled on P7 if we want support "fctiw" on P7. > > The test case is in the second patch which implements 32bit inline > lrint. > > Compared to the last version, the main change it to remove disparaging > on the alternatives of "fmr". Test shows it doesn't cause regression. Ok, at least regression testing doesn't expose any needs to do disparaging for this. Could you also test this patch with SPEC2017 for P7 and P8 separately at options like -O2 or -O3, to see if there is any assembly change, and if yes filtering out some typical to check it's expected or not? I think it can help us to better evaluate the impact. Thanks! BR, Kewen > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html > > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > > > ChangeLog > rs6000: enable SImode in FP register on P7 > > gcc/ > PR target/88558 > * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): > Enable SImode in FP registers on P7. > * config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode > move between FP registers. Set attribute isa of stfiwx to "*" > and attribute of stxsiwx to "p7". > > patch.diff > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 44b448d2ba6..99085c2cdd7 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, > machine_mode mode) > if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD) > return 1; > > - if (TARGET_P8_VECTOR && (mode == SImode)) > + if (TARGET_POPCNTD && mode == SImode) > return 1; > > if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode)) > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index cdab49fbb91..edf49bd74e3 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -7566,7 +7566,7 @@ (define_split > > (define_insn "*movsi_internal1" >[(set (match_operand:SI 0 "nonimmediate_operand" > - "=r, r, > + "=r, r, d, > r, d, v, > m, ?Z, ?Z, > r, r, r, r, > @@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1" > wa, r, > r, *h, *h") > (match_operand:SI 1 "input_operand" > - "r, U, > + "r, U, d, > m, ?Z, ?Z, > r, d, v, > I, L, eI, n, > @@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1" >"@ > mr %0,%1 > la %0,%a1 > + fmr %0,%1 > lwz%U1%X1 %0,%1 > lfiwzx %0,%y1 > lxsiwzx %x0,%y1 > @@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1" > mt%0 %1 > nop" >[(set_attr "type" > - "*, *, > + "*, *, fpsimple, > load, fpload, fpload, > store, fpstore,fpstore, > *, *, *, *, > @@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1" > mtvsr, mfvsr, > *, *, *") > (set_attr "length" > - "*, *, > + "*, *, *, > *, *, *, > *, *, *, > *, *, *, 8, > @@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1" > *, *, > *, *, *") > (set_attr "isa" > - "*, *, > -*, p8v,p8v, > -*, p8v,p8v, > + "*, *, *, > +*, p7, p8v, > +*, *, p8v, > *, *, p10,*, > p8v,p9v,p9v,p8v, > p9v,p8v,p9v, >
[PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
Hi, This patch enables SImode in FP registers on P7. Instruction "fctiw" stores its integer output in an FP register. So SImode in FP register needs be enabled on P7 if we want support "fctiw" on P7. The test case is in the second patch which implements 32bit inline lrint. Compared to the last version, the main change it to remove disparaging on the alternatives of "fmr". Test shows it doesn't cause regression. https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. ChangeLog rs6000: enable SImode in FP register on P7 gcc/ PR target/88558 * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Enable SImode in FP registers on P7. * config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode move between FP registers. Set attribute isa of stfiwx to "*" and attribute of stxsiwx to "p7". patch.diff diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 44b448d2ba6..99085c2cdd7 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode) if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD) return 1; - if (TARGET_P8_VECTOR && (mode == SImode)) + if (TARGET_POPCNTD && mode == SImode) return 1; if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode)) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index cdab49fbb91..edf49bd74e3 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -7566,7 +7566,7 @@ (define_split (define_insn "*movsi_internal1" [(set (match_operand:SI 0 "nonimmediate_operand" - "=r, r, + "=r, r, d, r, d, v, m, ?Z, ?Z, r, r, r, r, @@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1" wa, r, r, *h, *h") (match_operand:SI 1 "input_operand" - "r, U, + "r, U, d, m, ?Z, ?Z, r, d, v, I, L, eI, n, @@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1" "@ mr %0,%1 la %0,%a1 + fmr %0,%1 lwz%U1%X1 %0,%1 lfiwzx %0,%y1 lxsiwzx %x0,%y1 @@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1" mt%0 %1 nop" [(set_attr "type" - "*, *, + "*, *, fpsimple, load, fpload, fpload, store, fpstore,fpstore, *, *, *, *, @@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1" mtvsr, mfvsr, *, *, *") (set_attr "length" - "*, *, + "*, *, *, *, *, *, *, *, *, *, *, *, 8, @@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1" *, *, *, *, *") (set_attr "isa" - "*, *, - *, p8v,p8v, - *, p8v,p8v, + "*, *, *, + *, p7, p8v, + *, *, p8v, *, *, p10,*, p8v,p9v,p9v,p8v, p9v,p8v,p9v,