Hi Carl, on 2023/6/30 05:36, Carl Love wrote: > Kewen: > > On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote: >>> Yea, I was going with a runnable test and didn't include the >>> instruction counts. Added back in. Rather then doing by processor >>> version (P8, P9, P10) I was able to do it by BE/LE. The >>> instruction >>> counts were the same for LE accross processor versions but there >>> are a >>> few instruction counts that vary with BE and LE. >> >> But the original test case only checks for cpu-types (processor >> version) >> but not for endianness, it means for the bif usages, there should not >> be >> different for endianness. Why does this changes with your new test >> case? >> Could you have a further look and make it consistent with some >> adjustment >> if possible? As we know, checking insn counts sometimes are fragile, >> so >> I think we should try our best to make it as robust as possible in >> the >> first place. >> >> Besides, the original case also have some differences between p7/p8 >> and >> p9. >> > > There are differences on P8 LE versus BE. I did a diff between the P8 > and P9 tests: > > diff vsx-vector-6.p8.c vsx-vector-6.p9.c > 3,4c3,4 > < /* { dg-require-effective-target powerpc_p8vector_ok } */ > < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > --- >> /* { dg-require-effective-target powerpc_p9vector_ok } */ >> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */ > 12c12 > < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */ > --- >> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */ > 23d22 > < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */ > 37c36 > < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */ > --- >> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */ > > So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp, > xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different > between the two architectures. I then wrote a script to compile the > CPU specific test on Power 8, Power 9 and Power 10 architectures and > then grep for the above list of instructions. If I run the scrip on P8 > BE and LE I get> > > Power 8 BE Power 8 LE Power 9 LE Power 9 BE Power 10 LE* > (makalu-lp1) (genoa) (marlin) (nilram) (ltcd97-lp3) > instruction count count count count count > vperm 1 1 0 0 0 > vpermr 0 0 0 0 0 > xxpermr 0 0 1 0 1 > xvmsubadp 1 0 1 1 1 > xvmsubmdp 0 1 0 0 0 > xvsubdp 1 1 1 1 1 >
Thanks for looking into this and making this statistics. Is there a typo for column nilram? Otherwise, the below insn check /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */ would fail there. > > From the diff we see > > { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } } > > This test picks up the correct subtraction instruction for LE versus BE > so this "masks" the LE/BE difference. I changed the check in vsx- > vector-6-func-3op.c to match. This eliminates the LE and BE checks and > reduces the number of specific checks. OK, nice. > > In vsx-vector-6-func-3op.c The new test checks the counts for > xxpermdi, which the original test does not check. The check for > xxpermdi are not needed. They are not directly related to the builtin > tests. I removed them. OK. > > Looking at the LE/BE checks in the other test file vsx-vector-6-func- > 2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were not checked in > the original test. The functions where these instructions are used get > inlined. On LE, the binary instructions show up in the inlined code as > well as what appears to be the binary for the original, non-inlined > function. Best I can see, the binary for the original function is dead > code. I don't see any calls to it. Seems like it shouldn't be there > as it would make the binary smaller. On BE, I don't see the binary for > the original non-inlined function. > > I had played with putting -Wno-inline on the command line but that > didn't seem to make any difference. However, you suggestion of > __attribute__ ((noipa)) does prevent the inlining and we don't get the > second copy of the instructions showing up. The inlining eliminated the > LE/BE differences for xvmaxsp, xvminsp and xvmaxdp. -Winline is a option for warning: "Warn if a function that is declared as inline cannot be inlined.", I think what you wanted is -fno-inline, and it's good to know noipa helps here. > > The instruction count test for xxlor in vsx-vector-6-func-2lop.c > differs on LE and BE vsx-vector-6-func-2op.c. I believe the > instruction is used with loads to reorder the data. I don't see anyway > to get around the extra xxlor instructions and verify the vec_or > builtin test generates the instruction. > OK, I'm still curious how the loads cause the difference. > I was able to eliminate all of the LE/BE qualifiers in the instruction > counts with the exception of xxlor. By using the same checks that look > for multiple versions of xvmsumb*, as was done in the original test, we > can also eliminate LE/BE specific tests and account for different > instructions across CPU versions. We could go back to checking for > specific instructions being generated on Power 8, Power 9, Power 10 if > you prefer not using checks that cover multiple flavors of a given > instruction across different CPU types. > > FYI, I eliminated the function call to do the various tests. Instead, > I modified the macro to generate a function call to do the test and > check the results. Got it, I'll review the latest revision soon, thanks. BR, Kewen