Hi Carl,

on 2023/6/30 05:36, Carl Love wrote:
> Kewen:
> 
> On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
>>> Yea, I was going with a runnable test and didn't include the
>>> instruction counts.  Added back in.  Rather than doing it by processor
>>> version (P8, P9, P10) I was able to do it by BE/LE.  The instruction
>>> counts were the same for LE across processor versions but there are a
>>> few instruction counts that vary with BE and LE.
>>
>> But the original test case only checks for cpu-types (processor version)
>> but not for endianness, which means that for the bif usages there should
>> not be differences for endianness.  Why does this change with your new
>> test case?  Could you have a further look and make it consistent with
>> some adjustment if possible?  As we know, checking insn counts is
>> sometimes fragile, so I think we should try our best to make it as
>> robust as possible in the first place.
>>
>> Besides, the original case also has some differences between p7/p8 and
>> p9.
>>   
> 
> There are differences on P8 LE versus BE.  I did a diff between the P8
> and P9 tests:
> 
>  diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> 3,4c3,4
> < /* { dg-require-effective-target powerpc_p8vector_ok } */
> < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> ---
>> /* { dg-require-effective-target powerpc_p9vector_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> 12c12
> < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 23d22
> < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 37c36
> < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> ---
>> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> 
> So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are different
> between the two architectures.  I then wrote a script to compile the
> CPU specific test on Power 8, Power 9 and Power 10 architectures and
> then grep for the above list of instructions.  If I run the script on P8
> BE and LE I get:
> 
>             Power 8 BE    Power 8 LE   Power 9 LE   Power 9 BE    Power 10 LE*
>            (makalu-lp1)    (genoa)     (marlin)      (nilram)   (ltcd97-lp3)
> instruction   count         count        count         count        count
> vperm          1              1            0             0            0
> vpermr         0              0            0             0            0
> xxpermr        0              0            1             0            1
> xvmsubadp      1              0            1             1            1
> xvmsubmdp      0              1            0             0            0
> xvsubdp        1              1            1             1            1
> 

Thanks for looking into this and collecting these statistics.

Is there a typo in the nilram column?  Otherwise, the insn check below

/* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */

would fail there.
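
As a quick sanity check of that concern, here is a minimal C sketch that
adds up the rows of the table that the pattern above can match.  The
counts are copied from the table; the enum and function names are made up
for illustration.  Note that the pattern also matches plain xxperm, which
is not one of the listed rows, so a total of 0 here only covers the
instructions that were grepped for.

```c
/* Counts copied from the table above.  Column order:
   P8 BE, P8 LE, P9 LE, P9 BE, P10 LE.  Names are hypothetical.  */
enum host { P8_BE, P8_LE, P9_LE, P9_BE, P10_LE, NUM_HOSTS };

static const int vperm_count[NUM_HOSTS]   = { 1, 1, 0, 0, 0 };
static const int vpermr_count[NUM_HOSTS]  = { 0, 0, 0, 0, 0 };
static const int xxpermr_count[NUM_HOSTS] = { 0, 0, 1, 0, 1 };

/* Number of matches of {\m(?:v|xx)permr?\M} contributed by the rows
   listed in the table.  xxperm would also match, but it was not
   counted in the table, so it is not included here.  */
int
listed_perm_matches (enum host h)
{
  return vperm_count[h] + vpermr_count[h] + xxpermr_count[h];
}
```

With these numbers, listed_perm_matches (P9_BE) gives 0 while every other
column gives 1, which is why the nilram column looks suspicious.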

> 
> From the diff we see 
> 
>   { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }
> 
> This test picks up the correct subtraction instruction for LE versus BE
> so this "masks" the LE/BE difference.  I changed the check in
> vsx-vector-6-func-3op.c to match.  This eliminates the LE and BE checks
> and reduces the number of specific checks.

OK, nice.
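
A minimal sketch of what such a combined check could look like in a
compile-only test, assuming GCC contracts a * b - c into a fused
multiply-subtract at -O2 (its default -ffp-contract=fast behaviour in GNU
C mode).  The function name msub_v2df, the typedef and the generic vector
syntax are illustrative and not taken from the patch, which uses the
builtins:

```c
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

/* Hypothetical example, not part of the patch.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-wise a * b - c on two doubles.  On a VSX target this is
   expected to become either xvmsubadp or xvmsubmdp, depending on
   which input register the compiler chooses to overwrite.  */
v2df
msub_v2df (v2df a, v2df b, v2df c)
{
  return a * b - c;
}

/* One directive accepts both forms, so no LE/BE selector is needed.  */
/* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
```

The [am] alternation makes the count independent of which of the two
forms the register allocator ends up picking.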

> 
> In vsx-vector-6-func-3op.c, the new test checks the counts for
> xxpermdi, which the original test does not check.  The checks for
> xxpermdi are not needed.  They are not directly related to the builtin
> tests.  I removed them.

OK.

> 
> Looking at the LE/BE checks in the other test file
> vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were
> not checked in the original test.  The functions where these
> instructions are used get inlined.  On LE, the instructions show up in
> the inlined code as well as in what appears to be the binary for the
> original, non-inlined function.  As best I can see, the binary for the
> original function is dead code; I don't see any calls to it.  It seems
> like it shouldn't be there, since removing it would make the binary
> smaller.  On BE, I don't see the binary for the original non-inlined
> function.
> 
> I had played with putting -Wno-inline on the command line but that
> didn't seem to make any difference.  However, your suggestion of
> __attribute__ ((noipa)) does prevent the inlining and we don't get the
> second copy of the instructions showing up.  Preventing the inlining
> eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.

-Winline is a warning option: "Warn if a function that is declared
as inline cannot be inlined."  I think what you wanted is -fno-inline,
and it's good to know noipa helps here.
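
For anyone following along, a minimal sketch of the noipa usage, assuming
a GCC recent enough to support the attribute.  The names max_v2df and
first_max are hypothetical, and an element-wise loop stands in for the
vec_max builtin so that the sketch compiles on any target:

```c
/* Hypothetical example, not part of the patch.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* noipa keeps this function out of inlining, cloning and other
   interprocedural optimizations, so its body is emitted once and
   callers contain a real call instead of a second copy of it.  */
__attribute__ ((noipa)) v2df
max_v2df (v2df a, v2df b)
{
  v2df r = a;
  for (int i = 0; i < 2; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
  return r;
}

/* Caller: without noipa the compiler would be free to inline
   max_v2df here, duplicating its instructions.  */
double
first_max (double x, double y)
{
  v2df a = { x, 0.0 };
  v2df b = { y, 0.0 };
  v2df r = max_v2df (a, b);
  return r[0];
}
```

With the attribute in place, an instruction count check only sees the
single out-of-line copy of the function body.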

> 
> The instruction count test for xxlor in vsx-vector-6-func-2lop.c
> differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
> instruction is used with loads to reorder the data.  I don't see any way
> to get around the extra xxlor instructions and verify the vec_or
> builtin test generates the instruction.
> 

OK, I'm still curious how the loads cause the difference.

> I was able to eliminate all of the LE/BE qualifiers in the instruction
> counts with the exception of xxlor.  By using the same checks that look
> for multiple versions of xvmsub*, as was done in the original test, we
> can also eliminate LE/BE specific tests and account for different
> instructions across CPU versions.  We could go back to checking for
> specific instructions being generated on Power 8, Power 9, Power 10 if
> you prefer not using checks that cover multiple flavors of a given
> instruction across different CPU types.  
> 
> FYI, I eliminated the function call to do the various tests.  Instead,
> I modified the macro to generate a function call to do the test and
> check the results.

Got it, I'll review the latest revision soon, thanks.

BR,
Kewen
