Hi,

On 2021/1/27 03:00, David Edelsohn wrote:
> On Tue, Jan 26, 2021 at 2:46 AM Xionghu Luo <luo...@linux.ibm.com> wrote:
>>
>> From: "luo...@cn.ibm.com" <luo...@cn.ibm.com>
>>
>> UNSPEC_SI_FROM_SF is not supported when TARGET_DIRECT_MOVE_64BIT
>> is false for -m32, don't generate VIEW_CONVERT_EXPR(ARRAY_REF) for
>> variable vector insert.  Remove rs6000_expand_vector_set_var helper
>> function, adjust the p8 and p9 definitions position and make them
>> static.
>>
>> The previous commit r11-6858 missed check m32, This patch is tested pass
>> on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE with
>> RUNTESTFLAGS="--target_board =unix'{-m32,-m64}" for BE targets.
> 
> Hi, Xionghu
> 
> Thanks for addressing these failures and the cleanups.
> 
> This patch addresses most of the failures.
> 
> pr79251-run.c continues to fail.  The directives are not complete.
> I'm not certain if your intention is to run the testcase on all
> targets or only on Power7 and above.  The testcase relies on vector
> "long long", which only is available with -mvsx, but the testcase only
> enables -maltivec.  I believe that the testcase happens to pass on the
> Linux platforms you tested because GCC defaulted to Power7 or Power8
> ISA and the ABI specifies VSX.  The testcase probably needs to be
> restricted to only run on some level of VSX enabled processor (VSX?
> Power8? Power9?) and also needs some additional compiler options when
> compiling the testcase instead of relying upon the default
> configuration of the compiler.


P8BE: gcc/testsuite/gcc/gcc.sum(it didn't run before due to no 'dg-do run'):

Running target unix/-m32
Running /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp 
...
PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
PASS: gcc.target/powerpc/pr79251-run.c execution test
                === gcc Summary for unix/-m32 ===

# of expected passes            2
Running target unix/-m64
Running /home/luoxhu/workspace/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp 
...
PASS: gcc.target/powerpc/pr79251-run.c (test for excess errors)
PASS: gcc.target/powerpc/pr79251-run.c execution test
                === gcc Summary for unix/-m64 ===

# of expected passes            2


How did you get the failure of pr79251-run.c, please?  I tested it all
passes on P7BE{m32,m64}/P8BE{m32,m64}/P8LE/P9LE of Linux.  This case is
just verifying the *functionality* of "u = vec_insert (254, v, k)" and
compare whether u[k] is changed to 254, it must work on all platforms,
no matter with the optimization or not, otherwise there is a functional
error.  As to "long long", add target vsx_hw and powerpc like below?
(Also change the -maltive to -mvsx for pr79251.p8.c/pr79251.p9.c.)

--- a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
@@ -1,4 +1,6 @@
-/* { dg-options "-O2 -maltivec" } */
+/* { dg-do run { target powerpc*-*-* } } */
+/* { dg-require-effective-target vsx_hw { target powerpc*-*-* } } */
+/* { dg-options "-O2 -mvsx" } */


Any other options necessary to limit the testcases? :)

> 
> Also, part of the change seems to be
> 
>> -  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
>> -    rs6000_expand_vector_set_var_p9 (target, val, idx);
>> +         if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8)
>> +           {
>> +             rs6000_expand_vector_set_var_p9 (target, val, elt_rtx);
>> +             return;
>> +           }
> 
> Does the P9 case need TARGET_POWERPC64?  This optimization seemed to
> be functioning on P9 in 32 bit mode prior to this fix.  It would be a
> shame to unnecessarily disable this optimization in 32 bit mode.  Or
> maybe it generated a functioning sequence but didn't utilize the
> optimization.  Would you please check / clarify?


>> -      if (TARGET_P8_VECTOR)
>> +      if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT)
>>          {
>>            stmt = build_array_ref (loc, stmt, arg2);
>>            stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,


This change in rs6000-c.c causes it not generating VIEW_CONVERT_EXPR(ARRAY_REF)
gimple code again for P9-32bit, then the IFN VEC_SET won't be matched,
so rs6000.c:rs6000_expand_vector_set_var_p9 won't be called to produce
optimized "lvsl+xxperm+lvsr" for P9-32bit again.  It's a pity, but without
this, it ICEs on P8BE-32bit because of UNSPEC_SI_FROM_SF is not supported
for -m32, if we need to support P9-32bit, why not also support P8-32bit as
only float vec_insert ICE, is there any method could move SI from SF for 
P8-32bit?
(I verified the m32 optimized binary and non-optimized binary for int vec_insert
on P8-BE-32bit, the performance gain is also huge as 8x improvement with this 
patch.)


rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
{
...
  /*  mtvsrd[wz] f0,tmp_val.  */
  rtx tmp_val = gen_reg_rtx (SImode);
  if (inner_mode == E_SFmode)
    emit_insn (gen_movsi_from_sf (tmp_val, val));
  else
    tmp_val = force_reg (SImode, val);
...
}


Thanks
Xionghu


Reply via email to