Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-12 Thread Kewen.Lin via Gcc-patches
on 2022/12/9 06:04, Michael Meissner wrote:
> On Wed, Dec 07, 2022 at 03:55:41PM +0800, Kewen.Lin wrote:
>> Hi Mike,
>>
>> on 2022/12/7 14:44, Michael Meissner wrote:
>>> On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
>>>> Hi Mike,
>>>>
>>>> Thanks for fixing this!
>>>>
>>>> Could you help to elaborate why we need to disable it during libgcc
>>>> building?
>>>
>>> When you are building libgcc, you are building the __mulkc3, __divkc3
>>> functions.  The mapping in the compiler interferes with those functions,
>>> because at the moment, libgcc uses an alternate IEEE 128-bit type.
>>>
>>
>> But I'm still confused.  For __mulkc3 (__divkc3 is similar),
>>
>> 1) with -mabi=ieeelongdouble (TARGET_IEEEQUAD true,
>>    __LONG_DOUBLE_IEEE128__ defined), the types used are:
>>
>>    typedef float TFtype __attribute__ ((mode (TF)));
>>    typedef __complex float TCtype __attribute__ ((mode (TC)));
>>
>> 2) with -mabi=ibmlongdouble (TARGET_IEEEQUAD false,
>>    __LONG_DOUBLE_IEEE128__ not defined), the types used are:
>>
>>    typedef float TFtype __attribute__ ((mode (KF)));
>>    typedef __complex float TCtype __attribute__ ((mode (KC)));
>>
>> The proposed mapping in the current patch is:
>>
>> +
>> +  if (id == complex_multiply_builtin_code (KCmode))
>> +   newname = "__mulkc3";
>> +
>> +  else if (id == complex_multiply_builtin_code (ICmode))
>> +   newname = "__multc3";
>> +
>> +  else if (id == complex_multiply_builtin_code (TCmode))
>> +   newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";
>>
>> for 1), TCmode && TARGET_IEEEQUAD => "__mulkc3"
>> for 2), KCmode => "__mulkc3"
>>
>> Both cases should still map to the name "__mulkc3"; am I missing anything?
>>
>> BR,
>> Kewen
> 
> The reason is due to the different internal types, the value range propagation
> pass throws an error when we are trying to build libgcc.  This is due to the
> underlying problem of different IEEE 128-bit types within the compiler.
> 

But this is the reason why we need patch #2 and #3, not the reason why we need
the special handling for building_libgcc in patch #1, right?

With or without patch #1, the ICE in libgcc below still exists, so it should
have nothing to do with the special handling for building_libgcc in patch #1.
I think patch #2, which makes _Float128 and __float128 use the same internal
type, is what fixes that ICE.

I still don't see why we need the special handling for building_libgcc.
I also tested on top of patches #1 and #2, with and without that special
handling; both configurations bootstrapped and passed regression testing.

Could you double-check?

> The 128-bit IEEE support in libgcc was written before _Float128 was added to
> GCC.  One consequence is that you can't get to the complex variant of
> __float128.  So libgcc needs to use the attribute mode to get to that type.
> 
> But with the support for IEEE 128-bit long double changing things, it makes the
> libgcc code use the wrong code.
> 
> /home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c: In function ‘__mulkc3_sw’:
> /home/meissner/fsf-src/work102/libgcc/config/rs6000/_mulkc3.c:97:1: internal compiler error: in fold_stmt, at gimple-range-fold.cc:522
>    97 | }
>       | ^
> 0x122784f3 fold_using_range::fold_stmt(vrange&, gimple*, fur_source&, tree_node*)
> /home/meissner/fsf-src/work102/gcc/gimple-range-fold.cc:522
> 0x1226477f gimple_ranger::fold_range_internal(vrange&, gimple*, tree_node*)
> /home/meissner/fsf-src/work102/gcc/gimple-range.cc:257
> 0x12264b1f gimple_ranger::range_of_stmt(vrange&, gimple*, tree_node*)
> /home/meissner/fsf-src/work102/gcc/gimple-range.cc:318
> 0x113bdd8b range_query::value_of_stmt(gimple*, tree_node*)
> /home/meissner/fsf-src/work102/gcc/value-query.cc:134
> 0x1134838f rvrp_folder::value_of_stmt(gimple*, tree_node*)
> /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1023
> 0x111344cf substitute_and_fold_dom_walker::before_dom_children(basic_block_def*)
> /home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:819
> 0x121ecbd3 dom_walker::walk(basic_block_def*)
> /home/meissner/fsf-src/work102/gcc/domwalk.cc:311
> 0x11134ee7 substitute_and_fold_engine::substitute_and_fold(basic_block_def*)
> /home/meissner/fsf-src/work102/gcc/tree-ssa-propagate.cc:998
> 0x11346bb7 execute_ranger_vrp(function*, bool, bool)
> /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1084
> 0x11347063 execute
> /home/meissner/fsf-src/work102/gcc/tree-vrp.cc:1165
> Please submit a full bug report, with preprocessed source (by using -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
> make[1]: *** [/home/meissner/fsf-src/work102/libgcc/shared-object.mk:14: _mulkc3.o] Error 1
> make[1]: Leaving directory '/home/meissner/fsf-build-ppc64le/work102/powerpc64le-unknown-linux-gnu/libgcc'
> make: *** [Makefile:20623: all-target-libgcc] Error 2
> 
>>> I have a pa

Re: [PATCH] Fix aarch64 PR 99657: ICE with SVE types used without an error

2022-12-08 Thread Kewen.Lin via Gcc-patches
on 2022/12/8 15:43, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> on 2022/12/7 20:55, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
>>>> Hi Richard,
>>>>
>>>> on 2022/12/7 17:16, Richard Sandiford wrote:
>>>>> "Kewen.Lin"  writes:
>>>>>> Hi,
>>>>>>
>>>>>> In the recent discussion on how to make some built-in type only valid for
>>>>>> some target features efficiently[1], Andrew mentioned this patch which he
>>>>>> made previously (Thanks!).  I confirmed it can help rs6000 related issue,
>>>>>> and noticed PR99657 is still opened, so I think we still want this to
>>>>>> be reviewed.
>>>>>
>>>>> But does it work for things like:
>>>>>
>>>>> void f(foo_t *x, foo_t *y) { *x = *y; }
>>>>>
>>>>> where no variables are being created with foo_t type?
>>>>>
>>>>
>>>> I think it can work for this case as it touches build_indirect_ref.
>>>
>>> Ah, ok.  But indirecting through a pointer doesn't seem to match
>>> TCTX_AUTO_STORAGE.
>>>
>>
>> Indeed. :)
>>
>>> I guess another case is where there are global variables of the type
>>> that you want to forbid, compiled while the target feature is enabled,
>>> and then a function tries to access those variables with the target
>>> feature locally disabled (through a pragma or attribute).  Does that
>>> case work?
>>>
>>
>> Thanks for pointing out this, I tried with the below test case:
>>
>> __vector_quad a1;
>> __vector_quad a2;
>>
>> __attribute__((target("cpu=power8")))
>> void foo ()
>> {
>>   a2 = a1;
>> }
>>
>> the verify_type_context doesn't catch it as you suspected, I think
>> it needs some enhancements somewhere.
> 
> FWIW, another possible case is:
> 
>   foo_t f();
>   void g(foo_t);
>   void h() { g(f()); }
> 
> I'm not aware of any verify_type_context checks that would catch this
> for SVE (since it's valid for SVE types).


OK, thanks for the information.  MMA built-in types are not allowed to be
used as function arguments; after hacking the compiler to relax that
restriction, I confirmed that the verify_type_context check can't catch
this case.

BR,
Kewen


Re: [PATCH v4, rs6000] Enable have_cbranchcc4 on rs6000

2022-12-08 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/12/8 11:08, HAO CHEN GUI wrote:
> Hi,
>   This patch enables "have_cbranchcc4" on rs6000 by defining
> a "cbranchcc4" expander. "have_cbranchcc4" is a flag in ifcvt.cc
> to indicate if branch by CC bits is invalid or not. With this
> flag enabled, some branches can be optimized to conditional
> moves.
> 
>   Compared to last version, the main changes are on the test
> cases. Test case is renamed and comments are modified.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is this okay for trunk? Any recommendations? Thanks
> a lot.
> 

This patch is OK, thanks!

BR,
Kewen

> BR
> Gui Haochen
> 
> ChangeLog
> 2022-12-07  Haochen Gui 
> 
> gcc/
>   * config/rs6000/rs6000.md (cbranchcc4): New expander.
> 
> gcc/testsuite
>   * gcc.target/powerpc/cbranchcc4-1.c: New.
>   * gcc.target/powerpc/cbranchcc4-2.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index e9e5cd1e54d..d7ddd96cc70 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -11932,6 +11932,16 @@ (define_expand "cbranch4"
>DONE;
>  })
> 
> +(define_expand "cbranchcc4"
> +  [(set (pc)
> + (if_then_else (match_operator 0 "branch_comparison_operator"
> + [(match_operand 1 "cc_reg_operand")
> +  (match_operand 2 "zero_constant")])
> +   (label_ref (match_operand 3))
> +   (pc)))]
> +  ""
> +  "")
> +
>  (define_expand "cstore4_signed"
>[(use (match_operator 1 "signed_comparison_operator"
>   [(match_operand:P 2 "gpc_reg_operand")
> diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c
> new file mode 100644
> index 000..6c2cd130b6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify there is no ICE with cbranchcc4 enabled.  */
> +
> +int foo (double d)
> +{
> +  if (d == 0.0)
> +return 0;
> +
> +  d = ((d) >= 0 ? (d) : -(d));
> +
> +  if (d < 1.0)
> +return 1;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4-2.c b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-2.c
> new file mode 100644
> index 000..528ba1a878d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ce1" } */
> +/* { dg-final { scan-rtl-dump "noce_try_store_flag_constants" "ce1" } } */
> +
> +/* The inner branch should be detected by ifcvt then be converted to a setcc
> +   with a plus by noce_try_store_flag_constants.  */
> +
> +int test (unsigned int a, unsigned int b)
> +{
> +return (a < b ? 0 : (a > b ? 2 : 1));
> +}
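
For the record, here is a small sketch of the kind of rewrite that
cbranchcc4-2.c looks for.  It is hand-written C that mirrors the idea of
turning the inner conditional into a comparison result plus a constant; it is
not the RTL that ifcvt actually produces, and both function names are made up
for illustration.

```c
/* Same source shape as the function in cbranchcc4-2.c.  */
static int
three_way_branchy (unsigned int a, unsigned int b)
{
  return a < b ? 0 : (a > b ? 2 : 1);
}

/* Inner conditional rewritten by hand: a comparison yields 0 or 1 in C,
   so "a > b ? 2 : 1" has the same value as "(a > b) + 1".  */
static int
three_way_flag_plus (unsigned int a, unsigned int b)
{
  return a < b ? 0 : (a > b) + 1;
}
```

Both functions return 0, 1 or 2 for a < b, a == b and a > b respectively.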


Re: [PATCH v5, rs6000] Change mode and insn condition for VSX scalar extract/insert instructions

2022-12-08 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Thanks for the update, some comments are inlined as below.

on 2022/12/2 15:03, HAO CHEN GUI wrote:
> Hi,
>   For scalar extract/insert instructions, exponent field can be stored in a
> 32-bit register. So this patch changes the mode of exponent field from DI to
> SI so that these instructions can be generated in a 32-bit environment. Also
> it removes TARGET_64BIT check for these instructions.
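
As a side note, the reason SImode is wide enough can be shown with a small
portable C sketch.  It assumes double is IEEE 754 binary64 (true on Power) and
only models the value being extracted, the 11-bit biased exponent field.  The
helper name biased_exponent is made up for illustration and is not one of the
built-ins touched by this patch.

```c
#include <stdint.h>
#include <string.h>

/* Return the 11-bit biased exponent field of an IEEE 754 binary64 value.
   Assumption: double uses the binary64 format.  */
static unsigned int
biased_exponent (double d)
{
  uint64_t bits;

  /* memcpy is the well-defined way to reinterpret the object bytes.  */
  memcpy (&bits, &d, sizeof bits);

  /* Bits 52..62 hold the exponent; the mask keeps exactly 11 bits.  */
  return (unsigned int) ((bits >> 52) & 0x7ff);
}
```

The result can never exceed 0x7ff, so it fits easily in a 32-bit register.
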
> 
>   The instructions using DI registers can be invoked with -mpowerpc64 in a
> 32-bit environment. The patch changes insn condition from TARGET_64BIT to
> TARGET_POWERPC64 for those instructions.
> 
>   This patch also changes prototypes and catagories of relevant built-ins and
                                           ~~~~~~~~~~ categories
> effective target checks of test cases.
> 
>   Compared to last version, main changes are to remove 64-bit environment
> requirement for relevant built-ins in extend.texi. And to change the type of
> arguments of relevant built-ins in rs6000-overload.def.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-12-01  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_exp): Set return type to const unsigned
>   int and move it from power9-64 to power9 catatlog.
                                             ~~~~~~~~ catalog
>   (__builtin_vsx_scalar_extract_sig): Set return type to const unsigned
>   long long.
>   (__builtin_vsx_scalar_insert_exp): Set type of second argument to
>   unsigned int.
>   (__builtin_vsx_scalar_insert_exp_dp): Set type of second argument to
>   unsigned int and move it from power9-64 to power9 catatlog.
                                                      ~~~~~~~~ catalog

>   * config/rs6000/vsx.md (xsxexpdp): Set mode of first operand to
>   SImode.  Remove TARGET_64BIT from insn condition.
>   (xsxsigdp): Change insn condition from TARGET_64BIT to TARGET_POWERPC64.
>   (xsiexpdp): Change insn condition from TARGET_64BIT to
>   TARGET_POWERPC64.  Set mode of third operand to SImode.
>   (xsiexpdpf): Set mode of third operand to SImode.  Remove TARGET_64BIT
>   from insn condition.
>   * config/rs6000/rs6000-overload.def
>   (__builtin_vec_scalar_insert_exp): Set type of second argument to
>   unsigned int.
>   * doc/extend.texi (scalar_insert_exp): Set type of second argument to
>   unsigned int and remove 64-bit environment requirement when
>   significand has a float type.
>   (scalar_extract_exp): Remove 64-bit environment requirement.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-exp-0.c: Remove lp64 check.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-1.c: Remove lp64 check.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Deleted as the case is
>   invalid now.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-6.c: Replace lp64 check
>   with has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-0.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-6.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-0.c: Replace lp64 check
>   with has_arch_ppc64. Set type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-1.c: Set type of exponent
>   to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-12.c: Replace lp64 check
>   with has_arch_ppc64. Set type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-13.c: Remove lp64 check.
>   Set type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Set type of exponent to
>   unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-3.c: Remove lp64 check. Set
>   type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-4.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Deleted as the case is
>   invalid now.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index f76f54793d7..d8d67fa0cad 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2833,6 +2833,11 @@
>const signed int __builtin_dtstsfi_ov_td (const int<6>, _Decimal128);
>  TSTSFI_OV_TD dfptstsfi_unordered_td {}
> 
> +  const unsigned int __builtin_vsx_scalar_extract_exp (double);
> +VSEEDP xsxexpdp {}
> +
> +  const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned int);
> +VSIEDPF xsiexpdpf {}

This __builtin_vsx_scalar_insert_exp_dp still requires 64-bit, see further
explanation below ...

> 
>  [power9-64]
>void __builtin_altivec_xst_len_r (vsc, void *, long);
> @@ -2847,19 +2852,13 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const 

Re: [PATCH] Fix aarch64 PR 99657: ICE with SVE types used without an error

2022-12-07 Thread Kewen.Lin via Gcc-patches
on 2022/12/7 20:55, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> on 2022/12/7 17:16, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
>>>> Hi,
>>>>
>>>> In the recent discussion on how to make some built-in type only valid for
>>>> some target features efficiently[1], Andrew mentioned this patch which he
>>>> made previously (Thanks!).  I confirmed it can help rs6000 related issue,
>>>> and noticed PR99657 is still opened, so I think we still want this to
>>>> be reviewed.
>>>
>>> But does it work for things like:
>>>
>>> void f(foo_t *x, foo_t *y) { *x = *y; }
>>>
>>> where no variables are being created with foo_t type?
>>>
>>
>> I think it can work for this case as it touches build_indirect_ref.
> 
> Ah, ok.  But indirecting through a pointer doesn't seem to match
> TCTX_AUTO_STORAGE.
> 

Indeed. :)

> I guess another case is where there are global variables of the type
> that you want to forbid, compiled while the target feature is enabled,
> and then a function tries to access those variables with the target
> feature locally disabled (through a pragma or attribute).  Does that
> case work?
> 

Thanks for pointing out this, I tried with the below test case:

__vector_quad a1;
__vector_quad a2;

__attribute__((target("cpu=power8")))
void foo ()
{
  a2 = a1;
}

the verify_type_context doesn't catch it as you suspected, I think
it needs some enhancements somewhere.

> That's not an issue for SVE because global variables can't have
> sizeless type.
> 
>>> That's not to say we shouldn't have the patch.  I'm just not sure
>>> it can be the complete solution.
>>
>> I'm not sure about that either; maybe Andrew has more insight.
>> But as you pointed out in [1], I doubt that trying to find all invalid
>> uses of a built-in type is worthwhile; catching the usual cases seems
>> sufficient and practical.  So if this verify_type_context
>> framework can cover most uses, it may be a good direction
>> to pursue and extend.
> 
> IMO it depends on what we're trying to protect against.  If the
> compiler can handle these types correctly even when the target feature
> is disabled, and we're simply disallowing the types for policy rather
> than correctness reasons, then maybe just handling the usual cases is
> good enough.  But things are different if the compiler is going to ICE
> or generate invalid code when something slips through.  In that case,
> I think the niche cases matter too.
> 

Thanks for the clarification, good point, I agree!  It means we still
need some handling in movoo and movxo to avoid a possible ICE, which can
still be triggered by cases like the one above.  This
verify_type_context checking is only a nice add-on that improves the
diagnostics for invalid uses of built-in types.  I'm going to fix the
expanders; that should be independent of this patch.

BR,
Kewen


Re: [PATCH] Fix aarch64 PR 99657: ICE with SVE types used without an error

2022-12-07 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2022/12/7 17:16, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi,
>>
>> In the recent discussion on how to make some built-in type only valid for
>> some target features efficiently[1], Andrew mentioned this patch which he
>> made previously (Thanks!).  I confirmed it can help rs6000 related issue,
>> and noticed PR99657 is still opened, so I think we still want this to
>> be reviewed.
> 
> But does it work for things like:
> 
> void f(foo_t *x, foo_t *y) { *x = *y; }
> 
> where no variables are being created with foo_t type?
> 

I think it can work for this case as it touches build_indirect_ref.

> That's not to say we shouldn't have the patch.  I'm just not sure
> it can be the complete solution.

I'm not sure about that either; maybe Andrew has more insight.
But as you pointed out in [1], I doubt that trying to find all invalid
uses of a built-in type is worthwhile; catching the usual cases seems
sufficient and practical.  So if this verify_type_context
framework can cover most uses, it may be a good direction
to pursue and extend.

[1] https://gcc.gnu.org/pipermail/gcc/2022-December/240218.html

BR,
Kewen


Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-06 Thread Kewen.Lin via Gcc-patches
Hi Mike,

on 2022/12/7 14:44, Michael Meissner wrote:
> On Tue, Dec 06, 2022 at 05:36:54PM +0800, Kewen.Lin wrote:
>> Hi Mike,
>>
>> Thanks for fixing this!
>>
>> Could you help to elaborate why we need to disable it during libgcc building?
> 
> When you are building libgcc, you are building the __mulkc3, __divkc3
> functions.  The mapping in the compiler interferes with those functions,
> because at the moment, libgcc uses an alternate IEEE 128-bit type.
> 

But I'm still confused.  For __mulkc3 (__divkc3 is similar),

1) with -mabi=ieeelongdouble (TARGET_IEEEQUAD true,
   __LONG_DOUBLE_IEEE128__ defined), the types used are:

   typedef float TFtype __attribute__ ((mode (TF)));
   typedef __complex float TCtype __attribute__ ((mode (TC)));

2) with -mabi=ibmlongdouble (TARGET_IEEEQUAD false,
   __LONG_DOUBLE_IEEE128__ not defined), the types used are:

   typedef float TFtype __attribute__ ((mode (KF)));
   typedef __complex float TCtype __attribute__ ((mode (KC)));

The proposed mapping in the current patch is:

+
+  if (id == complex_multiply_builtin_code (KCmode))
+   newname = "__mulkc3";
+
+  else if (id == complex_multiply_builtin_code (ICmode))
+   newname = "__multc3";
+
+  else if (id == complex_multiply_builtin_code (TCmode))
+   newname = (TARGET_IEEEQUAD) ? "__mulkc3" : "__multc3";

for 1), TCmode && TARGET_IEEEQUAD => "__mulkc3"
for 2), KCmode => "__mulkc3"
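
To spell out the two cases, here is a minimal stand-alone model of the
proposed mapping.  It is only a sketch: the enum and the function
model_mul_name are hypothetical stand-ins rather than GCC internals, and only
the multiply names quoted above are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the complex modes named in the patch.  */
enum model_complex_mode { MODEL_KC, MODEL_IC, MODEL_TC };

/* Model of the proposed renaming for complex multiply: return the libgcc
   routine name for MODE, given whether long double is IEEE 128-bit.  */
static const char *
model_mul_name (enum model_complex_mode mode, bool target_ieeequad)
{
  switch (mode)
    {
    case MODEL_KC:
      return "__mulkc3";
    case MODEL_IC:
      return "__multc3";
    case MODEL_TC:
      return target_ieeequad ? "__mulkc3" : "__multc3";
    }
  return NULL;  /* not reached for valid enum values */
}
```

In this model, case 1) is model_mul_name (MODEL_TC, true) and case 2) is
model_mul_name (MODEL_KC, false).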

Both cases should still map to the name "__mulkc3"; am I missing anything?

BR,
Kewen

> I have a patch for making libgcc use the 'right' type that I haven't submitted
> yet.  This is because the more general fix that these 3 patches do impacts 
> other
> functions (due to __float128 and _Float128 being different in the current
> compiler when -mabi=ieeelongdouble).
> 


Re: [PATCH v3, rs6000] Enable have_cbranchcc4 on rs6000

2022-12-06 Thread Kewen.Lin via Gcc-patches
on 2022/12/7 13:24, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks so much for your review comments. I will fix them.
> 
> on 2022/12/7 11:06, Kewen.Lin wrote:
>> Does this issue, which relies on the fix for the generic part, make
>> bootstrapping fail?  If not, how many failures can it cause?  I'm wondering
>> whether we can commit this first; then, in the commit log of the fix for the
>> generic part, you can mention that it fixes the ICE exposed by this test case.
> 
> Yes, the bootstrapping fails if we enable cbranchcc4 without the generic patch.
> Actually, the testcase comes from the ICE found in bootstrapping.

Ah, thanks for confirming; so the fix for the generic part should come first.
I just noticed Richi has approved it.  When committing it, you can mention this
test case, gcc.target/powerpc/cbranchcc4-1.c, in the commit log for the record.
Thanks!

BR,
Kewen


Re: [PATCH v3, rs6000] Enable have_cbranchcc4 on rs6000

2022-12-06 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/12/6 13:44, HAO CHEN GUI wrote:
> Hi,
>   This patch enables "have_cbranchcc4" on rs6000 by defining
> a "cbranchcc4" expander. "have_cbranchcc4" is a flag in ifcvt.cc
> to indicate if branch by CC bits is invalid or not. With this
> flag enabled, some branches can be optimized to conditional
> moves.
> 
>   The patch relies on the former patch which is under review.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607810.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is this okay for trunk? Any recommendations? Thanks
> a lot.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> 2022-12-06  Haochen Gui 
> 
> gcc/
>   * config/rs6000/rs6000.md (cbranchcc4): New expander.
> 
> gcc/testsuite
>   * gcc.target/powerpc/cbranchcc4.c: New.
>   * gcc.target/powerpc/cbranchcc4-1.c: New.

Nit: "cbranchcc4.c" -> "cbranchcc4-2.c" since we already number the cases.

> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index e9e5cd1e54d..d7ddd96cc70 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -11932,6 +11932,16 @@ (define_expand "cbranch4"
>DONE;
>  })
> 
> +(define_expand "cbranchcc4"
> +  [(set (pc)
> + (if_then_else (match_operator 0 "branch_comparison_operator"
> + [(match_operand 1 "cc_reg_operand")
> +  (match_operand 2 "zero_constant")])
> +   (label_ref (match_operand 3))
> +   (pc)))]
> +  ""
> +  "")
> +
>  (define_expand "cstore4_signed"
>[(use (match_operator 1 "signed_comparison_operator"
>   [(match_operand:P 2 "gpc_reg_operand")
> diff --git a/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c
> new file mode 100644
> index 000..3c8286bf091
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/cbranchcc4-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" */
> +

Missing one " }", typo from copy/paste?

> +/* This case should be successfully compiled after cbranchcc4 is enabled.  It
> +   generates a "*cbranch_2insn" insn which makes predicate check of cbranchcc4
> +   failed and returns a NULL rtx from prepare_cmp_insn.  */

Nit: This could be shorter, like "Verify there is no ICE with cbranchcc4
enabled.", with the details moved into the commit log.

Does this issue, which relies on the fix for the generic part, make
bootstrapping fail?  If not, how many failures can it cause?  I'm wondering
whether we can commit this first; then, in the commit log of the fix for the
generic part, you can mention that it fixes the ICE exposed by this test case.

BR,
Kewen


Re: [PATCH] Fix aarch64 PR 99657: ICE with SVE types used without an error

2022-12-06 Thread Kewen.Lin via Gcc-patches
Hi,

In the recent discussion on how to make some built-in type only valid for
some target features efficiently[1], Andrew mentioned this patch which he
made previously (Thanks!).  I confirmed it can help rs6000 related issue,
and noticed PR99657 is still opened, so I think we still want this to
be reviewed.

Could some C/C++ FE experts help to review it?

Thanks in advance!

BR,
Kewen

[1] https://gcc.gnu.org/pipermail/gcc/2022-December/240220.html

on 2021/11/9 18:09, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski 
> 
> This fixes fully where SVE types were being used without sve being enabled.
> Instead of trying to fix it such that we error out during RTL time, it is
> better to error out in front-ends.  This expands verify_type_context to
> have a context of auto storage decl which is used for both auto storage
> decls and for indirection context.
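
To make the intended behaviour concrete, here is a small stand-alone model of
how the reworked finish_decl hunk in this patch picks the context.  The enum
and function names are hypothetical stand-ins for type_context_kind and the
GCC predicates; only the selection logic is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three storage contexts used in the patch.  */
enum model_context
{
  MODEL_AUTO_STORAGE,
  MODEL_STATIC_STORAGE,
  MODEL_THREAD_STORAGE
};

/* Start from automatic storage and override it for global variables, so
   that every variable declaration ends up with some context to verify.  */
static enum model_context
model_decl_context (bool is_global, bool is_thread_local)
{
  enum model_context context = MODEL_AUTO_STORAGE;

  if (is_global)
    context = is_thread_local ? MODEL_THREAD_STORAGE : MODEL_STATIC_STORAGE;

  return context;
}
```

The point of the change is the default: before it, a non-global declaration
never reached the check at all.
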
> 
> A few testcases needed to be updated for the new error message; they were
> already being rejected before hand.
> 
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> 
> PR target/99657
> gcc/c/ChangeLog:
> 
>   * c-decl.c (finish_decl): Call verify_type_context
>   for all decls and not just global_decls.
>   * c-typeck.c (build_indirect_ref): Call verify_type_context
>   to check to see if the type is ok to be used.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-sve-builtins.cc (verify_type_context):
>   Add TCTX_AUTO_STORAGE support.
>   * target.h (enum type_context_kind): Add TCTX_AUTO_STORAGE.
> 
> gcc/cp/ChangeLog:
> 
>   * decl.c (cp_finish_decl): Call verify_type_context
>   for all decls and not just global_decls.
>   * typeck.c (cp_build_indirect_ref_1): Call verify_type_context
>   to check to see if the type is ok to be used.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/acle/general/nosve_1.c: Update test.
>   * gcc.target/aarch64/sve/acle/general/nosve_4.c: Likewise.
>   * gcc.target/aarch64/sve/acle/general/nosve_5.c: Likewise.
>   * gcc.target/aarch64/sve/acle/general/nosve_6.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/nosve_2.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/nosve_3.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/nosve_4.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/nosve_5.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/nosve_6.c: Likewise.
>   * gcc.target/aarch64/sve/pcs/nosve_9.c: New test.
> ---
>  gcc/c/c-decl.c| 14 +++---
>  gcc/c/c-typeck.c  |  2 ++
>  gcc/config/aarch64/aarch64-sve-builtins.cc| 14 ++
>  gcc/cp/decl.c | 10 ++
>  gcc/cp/typeck.c   |  4 
>  gcc/target.h  |  3 +++
>  .../gcc.target/aarch64/sve/acle/general/nosve_1.c |  1 +
>  .../gcc.target/aarch64/sve/acle/general/nosve_4.c |  2 +-
>  .../gcc.target/aarch64/sve/acle/general/nosve_5.c |  2 +-
>  .../gcc.target/aarch64/sve/acle/general/nosve_6.c |  1 +
>  .../gcc.target/aarch64/sve/pcs/nosve_2.c  |  2 +-
>  .../gcc.target/aarch64/sve/pcs/nosve_3.c  |  2 +-
>  .../gcc.target/aarch64/sve/pcs/nosve_4.c  |  3 +--
>  .../gcc.target/aarch64/sve/pcs/nosve_5.c  |  3 +--
>  .../gcc.target/aarch64/sve/pcs/nosve_6.c  |  3 +--
>  .../gcc.target/aarch64/sve/pcs/nosve_9.c  | 15 +++
>  16 files changed, 60 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_9.c
> 
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index 186fa1692c1..b3583622475 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -5441,19 +5441,19 @@ finish_decl (tree decl, location_t init_loc, tree init,
>  
>if (VAR_P (decl))
>  {
> +  type_context_kind context = TCTX_AUTO_STORAGE;
>if (init && TREE_CODE (init) == CONSTRUCTOR)
>   add_flexible_array_elts_to_size (decl, init);
>  
>complete_flexible_array_elts (DECL_INITIAL (decl));
>  
>if (is_global_var (decl))
> - {
> -   type_context_kind context = (DECL_THREAD_LOCAL_P (decl)
> -? TCTX_THREAD_STORAGE
> -: TCTX_STATIC_STORAGE);
> -   if (!verify_type_context (input_location, context, TREE_TYPE (decl)))
> - TREE_TYPE (decl) = error_mark_node;
> - }
> + context = (DECL_THREAD_LOCAL_P (decl)
> +? TCTX_THREAD_STORAGE
> +: TCTX_STATIC_STORAGE);
> +
> +  if (!verify_type_context (input_location, context, TREE_TYPE (decl)))
> + TREE_TYPE (decl) = error_mark_node;
>  
>if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node
> && COMPLETE_TYPE_P (TREE_TYPE (decl)))
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 782414f8c8c..e926b7c1964 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/

Re: [PATCH 2/3] Make __float128 use the _Float128 type, PR target/107299

2022-12-06 Thread Kewen.Lin via Gcc-patches
Hi Mike,

Thanks for fixing this, some comments are inlined below.

on 2022/11/2 10:42, Michael Meissner wrote:
> This patch fixes the issue that GCC cannot build when the default long double
> is IEEE 128-bit.  It fails in building libgcc, specifically when it is trying
> to build the __mulkc3 function in libgcc.  It is failing in gimple-range-fold.cc
> during the evrp pass.  Ultimately it is failing because the code declared the
> type to use TFmode but it used F128 functions (i.e. KFmode).
> 
>   typedef float TFtype __attribute__((mode (TF)));
>   typedef __complex float TCtype __attribute__((mode (TC)));
> 
>   TCtype
>   __mulkc3_sw (TFtype a, TFtype b, TFtype c, TFtype d)
>   {
> TFtype ac, bd, ad, bc, x, y;
> TCtype res;
> 
> ac = a * c;
> bd = b * d;
> ad = a * d;
> bc = b * c;
> 
> x = ac - bd;
> y = ad + bc;
> 
> if (__builtin_isnan (x) && __builtin_isnan (y))
>   {
> _Bool recalc = 0;
> if (__builtin_isinf (a) || __builtin_isinf (b))
>   {
> 
> a = __builtin_copysignf128 (__builtin_isinf (a) ? 1 : 0, a);
> b = __builtin_copysignf128 (__builtin_isinf (b) ? 1 : 0, b);
> if (__builtin_isnan (c))
>   c = __builtin_copysignf128 (0, c);
> if (__builtin_isnan (d))
>   d = __builtin_copysignf128 (0, d);
> recalc = 1;
>   }
> if (__builtin_isinf (c) || __builtin_isinf (d))
>   {
> 
> c = __builtin_copysignf128 (__builtin_isinf (c) ? 1 : 0, c);
> d = __builtin_copysignf128 (__builtin_isinf (d) ? 1 : 0, d);
> if (__builtin_isnan (a))
>   a = __builtin_copysignf128 (0, a);
> if (__builtin_isnan (b))
>   b = __builtin_copysignf128 (0, b);
> recalc = 1;
>   }
> if (!recalc
> && (__builtin_isinf (ac) || __builtin_isinf (bd)
> || __builtin_isinf (ad) || __builtin_isinf (bc)))
>   {
> 
> if (__builtin_isnan (a))
>   a = __builtin_copysignf128 (0, a);
> if (__builtin_isnan (b))
>   b = __builtin_copysignf128 (0, b);
> if (__builtin_isnan (c))
>   c = __builtin_copysignf128 (0, c);
> if (__builtin_isnan (d))
>   d = __builtin_copysignf128 (0, d);
> recalc = 1;
>   }
> if (recalc)
>   {
> x = __builtin_inff128 () * (a * c - b * d);
> y = __builtin_inff128 () * (a * d + b * c);
>   }
>   }
> 
> __real__ res = x;
> __imag__ res = y;
> return res;
>   }
> 

One further reduced test case can be:

typedef float TFtype __attribute__((mode (TF)));

TFtype test (TFtype t)
{
  return __builtin_copysignf128 (1.0q, t);
}

Since this reduced test case is quite small, it may be worth adding it as a
test case with this patch.

> Currently GCC uses the long double type node for __float128 if long double is
> IEEE 128-bit.  It did not use the node for _Float128.
> 
> Originally this was noticed if you call the nansq function to make a signaling
> NaN (nansq is mapped to nansf128).  Because the type node for _Float128 is
> different from __float128, the machine independent code converts signaling NaNs
> to quiet NaNs if the types are not compatible.  The following tests used to
> fail when run on a system where long double is IEEE 128-bit:
> 
>   gcc.dg/torture/float128-nan.c
>   gcc.target/powerpc/nan128-1.c
> 
> This patch makes both __float128 and _Float128 use the same type node.
> 
> One side effect of not using the long double type node for __float128 is that we
> must only use KFmode for _Float128/__float128.  The libstdc++ library won't
> build if we use TFmode for _Float128 and __float128 when long double is IEEE
> 128-bit.
> 

Sorry, I didn't get the point of the last sentence.  This proposed patch
uses KFmode for _Float128 and __float128; do you mean the libstdc++ build
is fine because we no longer use TFmode for them?

> Another minor side effect is that the f128 round to odd fused multiply-add
> function will not merge negation with the FMA operation when the type is long
> double.  If the type is __float128 or _Float128, then it will continue to do the
> optimization.  The round to odd functions are defined in terms of __float128
> arguments.  For example:
> 
>   long double
>   do_fms (long double a, long double b, long double c)
>   {
>   return __builtin_fmaf128_round_to_odd (a, b, -c);
>   }
> 
> will generate (assuming -mabi=ieeelongdouble):
> 
>   xsnegqp 4,4
>   xsmaddqpo 4,2,3
>   xxlor 34,36,36
> 
> w

Re: [PATCH 1/3] Rework 128-bit complex multiply and divide, PR target/107299

2022-12-06 Thread Kewen.Lin via Gcc-patches
Hi Mike,

Thanks for fixing this!

on 2022/11/2 10:40, Michael Meissner wrote:
> This function reworks how the complex multiply and divide built-in functions are
> done.  Previously we created built-in declarations for doing long double complex
> multiply and divide when long double is IEEE 128-bit.  The old code also did not
> support __ibm128 complex multiply and divide if long double is IEEE 128-bit.
> 
> In terms of history, I wrote the original code just as I was starting to test
> GCC on systems where IEEE 128-bit long double was the default.  At the time, we
> had not yet started mangling the built-in function names as a way to bridge
> going from a system with 128-bit IBM long double to 128-bin IEEE long double.
                                                          ~~~ bit
> 
> The original code depends on there only being two 128-bit types invovled.  With
                                                                      ~~ involved.
> the next patch in this series, this assumption will no longer be true.  When
> long double is IEEE 128-bit, there will be 2 IEEE 128-bit types (one for the
> explicit __float128/_Float128 type and one for long double).
> 
> The problem is we cannot create two separate built-in functions that resolve to
> the same name.  This is a requirement of add_builtin_function and the C front
> end.  That means for the 3 possible modes (IFmode, KFmode, and TFmode), you can
> only use 2 of them.
> 
> This code does not create the built-in declaration with the changed name.
> Instead, it uses the TARGET_MANGLE_DECL_ASSEMBLER_NAME hook to change the name
> before it is written out to the assembler file like it now does for all of the
> other long double built-in functions.
> 
> We need to disable using this mapping when we are building libgcc, specifically
> when it is building the floating point 128-bit multiply and divide functions.
> The flag that is used when libgcc is built (-fbuilding-libgcc) is only available
> in the C/C++ front ends.  We need to remember that we are building libgcc in the
> rs6000-c.cc support to be able to use this later to decide whether to mangle
> the decl assembler name or not.

IIUC, the mapping still seems to work fine when building the floating point
128-bit multiply and divide functions.  Taking multiply as an example: _multc3.o
is compiled with -mabi=ibmlongdouble, so the name maps to "tc", which is what we
need.  When compiling _mulkc3*.o, we check the macro __LONG_DOUBLE_IEEE128__ and
use either KCmode or TCmode, and either mapping results in "kc".
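
To make my reading concrete, here is a minimal standalone sketch of the
multiply mapping as I understand it.  The enum and the function name are made
up for illustration and are not GCC internals; only the resulting symbol
names come from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the three 128-bit complex modes.  */
enum complex_mode { MODE_IC, MODE_KC, MODE_TC };

/* Sketch of the multiply name mapping: KCmode is always IEEE, ICmode is
   always IBM, and TCmode follows the long double format (IEEEQUAD is
   true for -mabi=ieeelongdouble).  */
static const char *
mul_name_for (enum complex_mode mode, bool ieeequad)
{
  switch (mode)
    {
    case MODE_KC:
      return "__mulkc3";
    case MODE_IC:
      return "__multc3";
    case MODE_TC:
      return ieeequad ? "__mulkc3" : "__multc3";
    }
  return NULL;
}
```

With this model, both ways of compiling _mulkc3*.o (KCmode, or TCmode with
IEEE long double) end up with the "kc" name.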

Could you help to elaborate why we need to disable it during libgcc building?

BR,
Kewen

> 
> When I wrote these patches, I discovered that __ibm128 complex multiply and
> divide had originally not been supported if long double is IEEE 128-bit as it
> would generate calls to __mulic3 and __divic3.  I added tests in the testsuite
> to verify that the correct name (i.e. __multc3 and __divtc3) is used in this
> case.
> 
> I tested all 3 patches for PR target/107299 on:
> 
> 1) LE Power10 using --with-cpu=power10 --with-long-double-format=ieee
> 2) LE Power10 using --with-cpu=power10 --with-long-double-format=ibm
> 3) LE Power9  using --with-cpu=power9  --with-long-double-format=ibm
> 4) BE Power8  using --with-cpu=power8  --with-long-double-format=ibm
> 
> Once all 3 patches have been applied, we can once again build GCC when long
> double is IEEE 128-bit.  There were no other regressions with these patches.
> Can I check these patches into the trunk?
> 
> 2022-11-01   Michael Meissner  
> 
> gcc/
> 
>   PR target/107299
>   * config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Set
>   building_libgcc.
>   * config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
>   (init_float128_ieee): Delete code to switch complex multiply and divide
>   for long double.
>   (complex_multiply_builtin_code): New helper function.
>   (complex_divide_builtin_code): Likewise.
>   (rs6000_mangle_decl_assembler_name): Add support for mangling the name
>   of complex 128-bit multiply and divide built-in functions.
>   * config/rs6000/rs6000.opt (building_libgcc): New target variable.
> 
> gcc/testsuite/
> 
>   PR target/107299
>   * gcc.target/powerpc/divic3-1.c: New test.
>   * gcc.target/powerpc/divic3-2.c: Likewise.
>   * gcc.target/powerpc/mulic3-1.c: Likewise.
>   * gcc.target/powerpc/mulic3-2.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-c.cc   |   8 ++
>  gcc/config/rs6000/rs6000.cc | 110 +++-
>  gcc/config/rs6000/rs6000.opt|   4 +
>  gcc/testsuite/gcc.target/powerpc/divic3-1.c |  18 
>  gcc/testsuite/gcc.target/powerpc/divic3-2.c |  17 +++
>  gcc/testsuite/gcc.target/powerpc/mulic3-1.c |  18 
>  gcc/testsuite/gcc.target/powerpc/mulic3-2.c | 

Re: [PATCH 2/3]rs6000: NFC use sext_hwi to replace ((v&0xf..f)^0x80..0) - 0x80..0

2022-12-02 Thread Kewen.Lin via Gcc-patches
on 2022/12/1 20:16, guojiufu wrote:
> On 2022-12-01 15:10, Jiufu Guo via Gcc-patches wrote:
>> Hi Kewen,
>>
>> On 12/1/22 2:11 PM, Kewen.Lin wrote:
>>> on 2022/12/1 13:35, Jiufu Guo wrote:
>>>> Hi Kewen,
>>>>
>>>> Thanks for your quick and insightful review!
>>>>
>>>> On 12/1/22 1:17 PM, Kewen.Lin wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> on 2022/12/1 09:36, Jiufu Guo wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This patch just uses sext_hwi to replace the expression like:
>>>>>> ((value & 0xf..f) ^ 0x80..0) - 0x80..0 for rs6000.cc and rs6000.md.
>>>>>>
>>>>>> Bootstrap & regtest pass on ppc64{,le}.
>>>>>> Is this ok for trunk?
>>>>>
>>>>> You didn't say it clearly but I guessed you have grepped in the whole
>>>>> config/rs6000 directory, right?  I noticed there are still two places
>>>>> using this kind of expression in function constant_generates_xxspltiw,
>>>>> but I assumed it's intentional as their types are not HOST_WIDE_INT.
>>>>>
>>>>> gcc/config/rs6000/rs6000.cc:  short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
>>>>> gcc/config/rs6000/rs6000.cc:  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
>>>>>
>>>>> If so, could you state it clearly in commit log like "with type
>>>>> signed/unsigned HOST_WIDE_INT" or similar?
>>>>>
>>>> Good question!
>>>>
>>>> And as you said sext_hwi is more for "signed/unsigned HOST_WIDE_INT".
>>>> For these two places, it seems sext_hwi is not needed actually!
>>>> And I did see why these expressions are used, may be just an assignment
>>>> is ok.
>>>
>>> ah, I see.  I agree using the assignment is quite enough.  Could you
>>> please also simplify them together?  Since they are with the form
>>> "((value & 0xf..f) ^ 0x80..0) - 0x80..0" too, and can be refactored
>>> in a better way.  Thanks!
>>
>> Sure, I believe just "short sign_h_word = vsx_const->half_words[0];"
>> should be correct :-), and included in the updated patch.
>>
>> Updated patch is attached,  bootstrap&regtest is on going.
> 

on 2022/12/1 15:10, Jiufu Guo wrote:
> From 8aa8e1234b6ec34473434951a3a6177253aac770 Mon Sep 17 00:00:00 2001
> From: Jiufu Guo 
> Date: Wed, 30 Nov 2022 13:13:37 +0800
> Subject: [PATCH 2/2]rs6000: update ((v&0xf..f)^0x80..0) - 0x80..0 with code: like sext_hwi
> 

Maybe shorten it to "rs6000: Update sign extension computation with sext_hwi"?

> This patch just replaces the expression like: 
> ((value & 0xf..f) ^ 0x80..0) - 0x80..0 to better code(e.g. sext_hwi) for
> rs6000.cc, rs6000.md and predicates.md (files under rs6000/).


> Bootstrap and regtest pass on ppc64{,le}.
> 

Thanks for updating and testing, this patch is OK.

BR,
Kewen


Re: [PATCH 2/3]rs6000: NFC use sext_hwi to replace ((v&0xf..f)^0x80..0) - 0x80..0

2022-11-30 Thread Kewen.Lin via Gcc-patches
on 2022/12/1 13:35, Jiufu Guo wrote:
> Hi Kewen,
> 
> Thanks for your quick and insightful review!
> 
> On 12/1/22 1:17 PM, Kewen.Lin wrote:
>> Hi Jeff,
>>
>> on 2022/12/1 09:36, Jiufu Guo wrote:
>>> Hi,
>>>
>>> This patch just uses sext_hwi to replace the expression like:
>>> ((value & 0xf..f) ^ 0x80..0) - 0x80..0 for rs6000.cc and rs6000.md.
>>>
>>> Bootstrap & regtest pass on ppc64{,le}.
>>> Is this ok for trunk? 
>>
>> You didn't say it clearly but I guessed you have grepped in the whole
>> config/rs6000 directory, right?  I noticed there are still two places
>> using this kind of expression in function constant_generates_xxspltiw,
>> but I assumed it's intentional as their types are not HOST_WIDE_INT.
>>
>> gcc/config/rs6000/rs6000.cc:  short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
>> gcc/config/rs6000/rs6000.cc:  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
>>
>> If so, could you state it clearly in commit log like "with type
>> signed/unsigned HOST_WIDE_INT" or similar?
>>
> Good question!
> 
> And as you said sext_hwi is more for "signed/unsigned HOST_WIDE_INT".
> For these two places, it seems sext_hwi is not needed actually!
> And I did see why these expressions are used, may be just an assignment
> is ok.

ah, I see.  I agree using the assignment is quite enough.  Could you
please also simplify them together?  Since they are with the form 
"((value & 0xf..f) ^ 0x80..0) - 0x80..0" too, and can be refactored
in a better way.  Thanks!
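
To double check the assignment idea, here is a small standalone sketch.  It
assumes a 16-bit short and that converting an out-of-range value to a signed
type wraps modulo 2^16, which is what GCC does but is implementation-defined
in ISO C before C23.  The function names are only for illustration.

```c
#include <assert.h>

/* The xor/subtract idiom applied to the low 16 bits of a value.  */
static int
sext16_idiom (unsigned int h_word)
{
  return (int) ((h_word & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* Plain conversion of an unsigned 16-bit value to a signed 16-bit type.  */
static int
sext16_assign (unsigned short h_word)
{
  short s = (short) h_word;
  return s;
}

/* Exhaustively compare both forms over every 16-bit input.  */
static void
check_all_halfwords (void)
{
  for (unsigned int v = 0; v <= 0xffffu; v++)
    assert (sext16_idiom (v) == sext16_assign ((unsigned short) v));
}
```

Under those assumptions the two forms agree for every half word, so the
explicit masking adds nothing when the source is already a 16-bit value.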

BR,
Kewen



Re: [PATCH 2/3]rs6000: NFC use sext_hwi to replace ((v&0xf..f)^0x80..0) - 0x80..0

2022-11-30 Thread Kewen.Lin via Gcc-patches
on 2022/12/1 13:17, Kewen.Lin via Gcc-patches wrote:
> Hi Jeff,
> 
> on 2022/12/1 09:36, Jiufu Guo wrote:
>> Hi,
>>
>> This patch just uses sext_hwi to replace the expression like:
>> ((value & 0xf..f) ^ 0x80..0) - 0x80..0 for rs6000.cc and rs6000.md.
>>
>> Bootstrap & regtest pass on ppc64{,le}.
>> Is this ok for trunk? 
> 
> You didn't say it clearly but I guessed you have grepped in the whole
> config/rs6000 directory, right?  I noticed there are still two places
> using this kind of expression in function constant_generates_xxspltiw,
> but I assumed it's intentional as their types are not HOST_WIDE_INT.
> 
> gcc/config/rs6000/rs6000.cc:  short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
> gcc/config/rs6000/rs6000.cc:  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
> 

oh, one place in gcc/config/rs6000/predicates.md got missed.

./predicates.md-756-{
./predicates.md-757-  HOST_WIDE_INT val;
...
./predicates.md-762-  val = const_vector_elt_as_int (op, elt);
./predicates.md:763:  val = ((val & 0xff) ^ 0x80) - 0x80;
./predicates.md-764-  return EASY_VECTOR_15_ADD_SELF (val);
./predicates.md-765-})

Would you mind taking a further look?

Thanks!

Kewen


Re: [PATCH 2/3]rs6000: NFC use sext_hwi to replace ((v&0xf..f)^0x80..0) - 0x80..0

2022-11-30 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/12/1 09:36, Jiufu Guo wrote:
> Hi,
> 
> This patch just uses sext_hwi to replace the expression like:
> ((value & 0xf..f) ^ 0x80..0) - 0x80..0 for rs6000.cc and rs6000.md.
> 
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk? 

You didn't say it clearly but I guessed you have grepped in the whole
config/rs6000 directory, right?  I noticed there are still two places
using this kind of expression in function constant_generates_xxspltiw,
but I assumed it's intentional as their types are not HOST_WIDE_INT.

gcc/config/rs6000/rs6000.cc:  short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
gcc/config/rs6000/rs6000.cc:  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;

If so, could you state it clearly in commit log like "with type
signed/unsigned HOST_WIDE_INT" or similar?
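
As a side note, the equivalence that makes this an NFC change can be checked
standalone.  The sketch below uses int64_t in place of HOST_WIDE_INT, and the
helper sext64 is hypothetical: it is only modelled on the shift-based shape of
sext_hwi, it is not GCC's implementation, and it relies on arithmetic right
shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for sext_hwi: sign extend the low PREC bits of SRC
   (1 <= PREC <= 64).  */
static int64_t
sext64 (int64_t src, unsigned int prec)
{
  if (prec == 64)
    return src;
  unsigned int shift = 64 - prec;
  return (int64_t) ((uint64_t) src << shift) >> shift;
}

/* The open-coded idiom for a 32-bit field.  */
static int64_t
sext32_idiom (int64_t value)
{
  return ((value & 0xffffffffLL) ^ 0x80000000LL) - 0x80000000LL;
}

/* The open-coded idiom for a 16-bit field.  */
static int64_t
sext16_idiom (int64_t value)
{
  return ((value & 0xffffLL) ^ 0x8000LL) - 0x8000LL;
}
```

The mask and the sign constant have to match the field width, which is the
detail the width argument of the helper makes explicit.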

This patch is OK once the above question gets confirmed, thanks!

BR,
Kewen

> 
> BR,
> Jeff (Jiufu)
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (num_insns_constant_gpr): Use sext_hwi.
>   (darwin_rs6000_legitimate_lo_sum_const_p): Likewise.
>   (mem_operand_gpr): Likewise.
>   (mem_operand_ds_form): Likewise.
>   (rs6000_legitimize_address): Likewise.
>   (rs6000_emit_set_const): Likewise.
>   (rs6000_emit_set_long_const): Likewise.
>   (print_operand): Likewise.
>   * config/rs6000/rs6000.md: Likewise.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 30 +-
>  gcc/config/rs6000/rs6000.md | 10 +-
>  2 files changed, 18 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 5efe9b22d8b..718072cc9a1 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -6021,7 +6021,7 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
> 
>else if (TARGET_POWERPC64)
>  {
> -  HOST_WIDE_INT low  = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
> +  HOST_WIDE_INT low = sext_hwi (value, 32);
>HOST_WIDE_INT high = value >> 31;
> 
>if (high == 0 || high == -1)
> @@ -8456,7 +8456,7 @@ darwin_rs6000_legitimate_lo_sum_const_p (rtx x, machine_mode mode)
>  }
> 
>/* We only care if the access(es) would cause a change to the high part.  
> */
> -  offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
> +  offset = sext_hwi (offset, 16);
>return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
>  }
> 
> @@ -8522,7 +8522,7 @@ mem_operand_gpr (rtx op, machine_mode mode)
>if (GET_CODE (addr) == LO_SUM)
>  /* For lo_sum addresses, we must allow any offset except one that
> causes a wrap, so test only the low 16 bits.  */
> -offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
> +offset = sext_hwi (offset, 16);
> 
>return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
>  }
> @@ -8562,7 +8562,7 @@ mem_operand_ds_form (rtx op, machine_mode mode)
>if (GET_CODE (addr) == LO_SUM)
>  /* For lo_sum addresses, we must allow any offset except one that
> causes a wrap, so test only the low 16 bits.  */
> -offset = ((offset & 0xffff) ^ 0x8000) - 0x8000;
> +offset = sext_hwi (offset, 16);
> 
>return SIGNED_16BIT_OFFSET_EXTRA_P (offset, extra);
>  }
> @@ -9136,7 +9136,7 @@ rs6000_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
>  {
>HOST_WIDE_INT high_int, low_int;
>rtx sum;
> -  low_int = ((INTVAL (XEXP (x, 1)) & 0xffff) ^ 0x8000) - 0x8000;
> +  low_int = sext_hwi (INTVAL (XEXP (x, 1)), 16);
>if (low_int >= 0x8000 - extra)
>   low_int = 0;
>high_int = INTVAL (XEXP (x, 1)) - low_int;
> @@ -10203,7 +10203,7 @@ rs6000_emit_set_const (rtx dest, rtx source)
> lo = operand_subword_force (dest, WORDS_BIG_ENDIAN != 0,
> DImode);
> emit_move_insn (hi, GEN_INT (c >> 32));
> -   c = ((c & 0xffffffff) ^ 0x80000000) - 0x80000000;
> +   c = sext_hwi (c, 32);
> emit_move_insn (lo, GEN_INT (c));
>   }
>else
> @@ -10242,7 +10242,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
> 
>if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> -emit_move_insn (dest, GEN_INT ((ud1 ^ 0x8000) - 0x8000));
> +emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
> 
>else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000))
>  || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000)))
> @@ -10250,7 +10250,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> 
>emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
> -   GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
> +   GEN_INT (sext_hwi (ud2 << 16, 32)));
>if (ud1 != 0)
>   emit_move_insn (dest,
>   gen_rtx_IOR (DImode, copy_rtx (temp),
> @@ -10261,8 +10261,7 @@ rs6000_e

Re: [PATCH 3/3]rs6000: NFC no need copy_rtx in rs6000_emit_set_long_const and rs6000_emit_set_const

2022-11-30 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/12/1 09:36, Jiufu Guo wrote:
> Hi,
> 
> Function rs6000_emit_set_const/rs6000_emit_set_long_const are only invoked from
> two "define_split"s where the target operand is limited to gpc_reg_operand or
> int_reg_operand, then the operand must be REG_P.
> And in rs6000_emit_set_const/rs6000_emit_set_long_const, to create temp rtx,
> it is using code like "gen_reg_rtx({S|D}Imode)", it must also be REG_P.
> So, copy_rtx is not needed for temp and dest.
> 
> This patch removes those "copy_rtx" for rs6000_emit_set_const and
> rs6000_emit_set_long_const.
> 
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk? 

This patch is okay, thanks!  For the subject, IMHO it's better to use something
like: "rs6000: Remove useless copy_rtx in rs6000_emit_set_{,long}_const".
I don't see NFC tag used much in GCC, though it's used a lot in llvm, but
anyway you can append (NFC)/[NFC] at the end if you like.  :)

BR,
Kewen


PING^1 [PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-11-30 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603350.html

BR,
Kewen

on 2022/10/12 16:12, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> PR106680 shows that -m32 -mpowerpc64 is different from
> -mpowerpc64 -m32, this is determined by the way how we
> handle option powerpc64 in rs6000_handle_option.
> 
> Segher pointed out this difference should be taken as
> a bug and we should ensure that option powerpc64 is
> independent of -m32/-m64.  So this patch removes the
> handlings in rs6000_handle_option and add some necessary
> supports in rs6000_option_override_internal instead.
> 
> With this patch, if users specify -m{no-,}powerpc64, the
> specified value is honoured, otherwise, for 64bit it
> always enables OPTION_MASK_POWERPC64; while for 32bit
> and TARGET_POWERPC64 and OS_MISSING_POWERPC64, it disables
> OPTION_MASK_POWERPC64.
> 
> btw, following Segher's suggestion, I did some tries to warn
> when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
> If warn for the case that powerpc64 is specified explicitly,
> there are some TCs using -m32 -mpowerpc64 on ppc64-linux,
> they need some updates, meanwhile the artificial run
> with "--target_board=unix'{-m32/-mpowerpc64}'" will have
> noisy warnings on ppc64-linux.  If warn for the case that
> it's specified implicitly, they can just be initialized by
> TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the 
> given cpu mask, we have to special case them and not to warn.
> As Segher's latest comment, I decide not to warn them and
> keep it consistent with before.
> 
> Bootstrapped and regress-tested on:
>   - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
>   - powerpc64le-linux-gnu P9 and P10
>   - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
> 
> Hi Iain, could you help to test this new patch on darwin
> again?  Thanks in advance!
> 
> Is it ok for trunk if darwin testing goes well?
> 



[PATCH v2] predict: Adjust optimize_function_for_size_p [PR105818]

2022-11-30 Thread Kewen.Lin via Gcc-patches
Hi,

Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
if fun->decl is not null but no cgraph node is available for it.
As PR105818 shows, this could cause unexpected consequences.  For
the case in PR105818, when parsing bar decl in function foo, the
cfun is the function structure for foo, for which there is no
cgraph node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect
since the context is to optimize for size, the flag optimize_size
is true.

The patch makes optimize_function_for_size_p further check
opt_for_fn (fun->decl, optimize_size) when fun->decl is available
but there is no cgraph node for it, just as function
cgraph_node::optimize_for_size_p does in its first step.
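
The intended decision order can be sketched standalone as below.  The
structs are heavily simplified, hypothetical stand-ins for GCC's function,
decl and cgraph node, and the handling of a null function or decl is my
assumption about the existing code rather than something this patch changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum optimize_size_level
{
  OPTIMIZE_SIZE_NO,
  OPTIMIZE_SIZE_BALANCED,
  OPTIMIZE_SIZE_MAX
};

/* Hypothetical stand-ins, not the real GCC data structures.  */
struct decl { bool optimize_size; };              /* per-function -Os */
struct node { enum optimize_size_level level; };  /* cgraph node answer */
struct function { struct decl *decl; struct node *node; };

static enum optimize_size_level
size_level_for (const struct function *fun, bool global_optimize_size)
{
  /* Assumed existing behaviour: no function or decl, use the global flag.  */
  if (!fun || !fun->decl)
    return global_optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
  if (fun->node)
    return fun->node->level;
  /* No cgraph node yet: fall back to the decl's own setting instead of
     unconditionally answering "not for size".  */
  return fun->decl->optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
}
```

The last return is the only part that models the change; before the patch
that path would always answer OPTIMIZE_SIZE_NO.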

One regression failure got exposed on aarch64-linux-gnu:

PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
-DPREVENT_OPTIMIZATION  line 21 x == 10 - i

The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
used in function fold_range_test during C parsing, it uses
optimize_function_for_speed_p which is equal to the inversion
of optimize_function_for_size_p.  At that time cfun->decl is valid
but there is no cgraph node for it; without this patch function
optimize_function_for_speed_p returns true eventually, while it
returns false with this patch.  Since the command line option -Os
is specified, there is no reason to interpret it as "for speed".
I think this failure is expected, so I adjusted the test case
accordingly.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596628.html

Comparing with v1, v2 adopts opt_for_fn (fun->decl, optimize_size)
instead of optimize_size as Honza's previous comments.

Besides, the reply to Honza's question "Why exactly PR105818 hits
the flag change issue?" was at the link:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596667.html

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR middle-end/105818

gcc/ChangeLog:

* predict.cc (optimize_function_for_size_p): Further check
optimize_size of fun->decl when it is valid but no cgraph node.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr105818.c: New test.
* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
---
 gcc/predict.cc  |  3 ++-
 gcc/testsuite/gcc.dg/guality/pr54693-2.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/pr105818.c | 11 +++
 3 files changed, 14 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 1bc7ab94454..ecb4aabc9df 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -268,7 +268,8 @@ optimize_function_for_size_p (struct function *fun)
   cgraph_node *n = cgraph_node::get (fun->decl);
   if (n)
 return n->optimize_for_size_p ();
-  return OPTIMIZE_SIZE_NO;
+  return opt_for_fn (fun->decl, optimize_size) ? OPTIMIZE_SIZE_MAX
+  : OPTIMIZE_SIZE_NO;
 }

 /* Return true if function FUN should always be optimized for speed.  */
diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
index 68aa6c63d71..14ca94ad37d 100644
--- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
+++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
@@ -17,7 +17,7 @@ foo (int x, int y, int z)
   int i = 0;
   while (x > 3 && y > 3 && z > 3)
 {  /* { dg-final { gdb-test .+2 "i" "v + 1" } } */
-   /* { dg-final { gdb-test .+1 "x" "10 - i" } } */
+   /* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { 
aarch64*-*-* && { any-opts "-Os" } } } } } */
   bar (i); /* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
/* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { 
aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
   i++, x--, y -= 2, z -= 3;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c
new file mode 100644
index 0000000..679647e189d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
@@ -0,0 +1,11 @@
+/* { dg-options "-Os -fno-tree-vectorize" } */
+
+/* Verify there is no ICE.  */
+
+#pragma GCC optimize "-fno-tree-vectorize"
+
+void
+foo (void)
+{
+  void bar (void);
+}
--
2.27.0


[PATCH] rs6000: Fix some issues related to Power10 fusion [PR104024]

2022-11-30 Thread Kewen.Lin via Gcc-patches
Hi,

As PR104024 shows, the option -mpower10-fusion isn't guarded by
-mcpu=power10, so the compiler fuses some patterns even without
power10 support and then ICEs unexpectedly.  This patch simply
unmasks it without power10 support and does not emit any
warnings, as this option is undocumented.

Besides, some define_insns in fusion.md which use constraint
v require the condition VECTOR_UNIT_ALTIVEC_OR_VSX_P
(<MODE>mode), otherwise they can cause ICE in reload, see test
case pr104024-2.c.

Bootstrapped and regtested on powerpc64-linux-gnu P8,
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-
PR target/104024

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Disable
TARGET_P10_FUSION if !TARGET_POWER10.
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/genfusion.pl: Add the check for define_insns
with constraint v.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr104024-1.c: New test.
* gcc.target/powerpc/pr104024-2.c: New test.
---
 gcc/config/rs6000/fusion.md   | 130 +-
 gcc/config/rs6000/genfusion.pl|  12 +-
 gcc/config/rs6000/rs6000.cc   |  11 +-
 gcc/testsuite/gcc.target/powerpc/pr104024-1.c |  16 +++
 gcc/testsuite/gcc.target/powerpc/pr104024-2.c |  18 +++
 5 files changed, 113 insertions(+), 74 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104024-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104024-2.c

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 15f0c16f705..c504f65a045 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1875,7 +1875,7 @@ (define_insn "*fuse_vand_vand"
   (match_operand:VM 1 "altivec_register_operand" 
"%v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
vand %3,%1,%0\;vand %3,%3,%2
vand %3,%1,%0\;vand %3,%3,%2
@@ -1893,7 +1893,7 @@ (define_insn "*fuse_vandc_vand"
   (match_operand:VM 1 "altivec_register_operand" 
"v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
vandc %3,%1,%0\;vand %3,%3,%2
vandc %3,%1,%0\;vand %3,%3,%2
@@ -1911,7 +1911,7 @@ (define_insn "*fuse_veqv_vand"
   (match_operand:VM 1 "altivec_register_operand" 
"v,v,v,v")))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
veqv %3,%1,%0\;vand %3,%3,%2
veqv %3,%1,%0\;vand %3,%3,%2
@@ -1929,7 +1929,7 @@ (define_insn "*fuse_vnand_vand"
   (not:VM (match_operand:VM 1 
"altivec_register_operand" "v,v,v,v")))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
vnand %3,%1,%0\;vand %3,%3,%2
vnand %3,%1,%0\;vand %3,%3,%2
@@ -1947,7 +1947,7 @@ (define_insn "*fuse_vnor_vand"
   (not:VM (match_operand:VM 1 
"altivec_register_operand" "v,v,v,v")))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
vnor %3,%1,%0\;vand %3,%3,%2
vnor %3,%1,%0\;vand %3,%3,%2
@@ -1965,7 +1965,7 @@ (define_insn "*fuse_vor_vand"
   (match_operand:VM 1 "altivec_register_operand" 
"v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
vor %3,%1,%0\;vand %3,%3,%2
vor %3,%1,%0\;vand %3,%3,%2
@@ -1983,7 +1983,7 @@ (define_insn "*fuse_vorc_vand"
   (match_operand:VM 1 "altivec_register_operand" 
"v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 "=X,X,X,&v"))]
-  "(TARGET_P10_FUSION)"
+  "(VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode) && TARGET_P10_FUSION)"
   "@
vorc %3,%1,%0\;vand %3,%3,%2
vorc %3,%1,%0\;vand %3,%3,%2
@@ -2001,7 +2001,7 @@ (define_insn "*fuse_vxor_vand"
   (match_operand:VM 1 "altivec_register_operand" 
"v,v,v,v"))
  (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
(clobber (match_scratch:VM 4 

Re: [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-27 Thread Kewen.Lin via Gcc-patches
Add more experts in CC.

on 2022/11/23 10:54, HAO CHEN GUI wrote:
> Hi,
>   I want to enable "have_cbranchcc4" on rs6000. But not all combinations of
> comparison codes and sub CC modes benefit from generating cbranchcc4 insns
> on rs6000. There is a predicate for operand0 of cbranchcc4 to bypass
> some combinations. It gets an assertion failure in prepare_cmp_insn. I think
> we shouldn't assume that all comparison codes and sub CC modes are supported
> and throw an assertion failure in prepare_cmp_insn. It might check the
> predicate and go to fail if the predicate can't be satisfied. This patch
> changes the behavior of that code.
> 
>   Bootstrapped and tested on powerpc64-linux BE/LE and x86 with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> 
> ChangeLog
> 2022-11-23  Haochen Gui 
> 
> gcc/
>   * optabs.cc (prepare_cmp_insn): Go to fail other than assert it when
>   predicate check of "cbranchcc4" operand[0] fails.
> 
> patch.diff
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 165f8d1fa22..3ec8f6b17ba 100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -4484,8 +4484,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
>  {
>enum insn_code icode = optab_handler (cbranch_optab, CCmode);
>test = gen_rtx_fmt_ee (comparison, VOIDmode, x, y);
> -  gcc_assert (icode != CODE_FOR_nothing
> -  && insn_operand_matches (icode, 0, test));
> +  gcc_assert (icode != CODE_FOR_nothing);
> +  if (!insn_operand_matches (icode, 0, test))
> + goto fail;


IMHO, this change looks consistent with the other code in prepare_cmp_insn, which
allows the preparation to fail with NULL_RTX ptest.  Its caller can make its own
decision (ICE as unexpected, or try other ways) when ptest is null.

If this direction is sensible, maybe we can make it goto fail too if the icode ==
CODE_FOR_nothing, since we already try to relax the restriction.
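
Just to illustrate the control flow I have in mind, below is a standalone
sketch.  Both functions and the returned strings are hypothetical stand-ins
and do not correspond to real optabs.cc interfaces; the point is only that a
failed preparation is reported to the caller instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CCmode path of prepare_cmp_insn:
   returns the prepared test, or NULL when preparation fails.  */
static const char *
prepare_cc_test (bool have_cbranch_pattern, bool operand0_matches)
{
  if (!have_cbranch_pattern || !operand0_matches)
    return NULL;   /* the "goto fail" path: no assertion here */
  return "cc-test";
}

/* Hypothetical caller: falls back to another strategy when the
   preparation fails instead of hitting an ICE.  */
static const char *
emit_compare (bool have_cbranch_pattern, bool operand0_matches)
{
  const char *test = prepare_cc_test (have_cbranch_pattern,
                                      operand0_matches);
  return test ? test : "fallback";
}
```

Here a missing pattern and a rejected operand are treated the same way,
which is what the CODE_FOR_nothing suggestion above amounts to.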

BR,
Kewen

>*ptest = test;
>return;
>  }




Re: [PATCH] vect: Fold LEN_{LOAD, STORE} if it's for the whole vector [PR107412]

2022-11-27 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2022/11/24 17:24, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi,
>>
>> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
>> STORE} into normal vector load/store if the given length is known
>> to be equal to the length of the whole vector.  It would help to
>> improve overall cycles as normally the latency of vector access
>> with length in bytes is bigger than normal vector access, and it
>> also saves the preparation for length if constant length can not
>> be encoded into instruction (such as on power).
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>>  PR tree-optimization/107412
>>
>> gcc/ChangeLog:
>>
>>  * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>>  (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>>  mask_p indicating it's for mask or length, and add some handlings for
>>  IFN LEN_{LOAD,STORE}.
>>  (gimple_fold_mask_load): Rename to ...
>>  (gimple_fold_partial_load): ... this, add one parameter mask_p.
>>  (gimple_fold_mask_store): Rename to ...
>>  (gimple_fold_partial_store): ... this, add one parameter mask_p.
>>  (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>>  and adjust calls on gimple_fold_mask_load_store_mem_ref to
>>  gimple_fold_partial_load_store_mem_ref.
> 
> Sorry to reply so late (still catching up on email), but:
> 
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/powerpc/pr107412.c: New test.
>>  * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>>  folded LEN_LOAD.
>> ---
>>  gcc/gimple-fold.cc| 57 ++-
>>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++
>>  3 files changed, 64 insertions(+), 14 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>>
>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
>> index a1704784bc9..e3a087defa6 100644
>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
>>return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>>  }
>>
>> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
>> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
>> for the memory it references, otherwise return null.  VECTYPE is the
>> -   type of the memory vector.  */
>> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
>> +   otherwise it's for LEN.  */
>>
>>  static tree
>> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
>> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
>>  {
>>tree ptr = gimple_call_arg (call, 0);
>>tree alias_align = gimple_call_arg (call, 1);
>> -  tree mask = gimple_call_arg (call, 2);
>> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
>> +  if (!tree_fits_uhwi_p (alias_align))
>>  return NULL_TREE;
>>
>> +  if (mask_p)
>> +{
>> +  tree mask = gimple_call_arg (call, 2);
>> +  if (!integer_all_onesp (mask))
>> +return NULL_TREE;
>> +} else {
> 
> Minor nit: }, else, and { should be on separate lines.  But the thing
> I actually wanted to say was...

Thanks for catching this; I must have forgotten to reformat these lines.

> 
>> +  tree basic_len = gimple_call_arg (call, 2);
>> +  if (!tree_fits_uhwi_p (basic_len))
>> +return NULL_TREE;
>> +  unsigned int nargs = gimple_call_num_args (call);
>> +  tree bias = gimple_call_arg (call, nargs - 1);
>> +  gcc_assert (tree_fits_uhwi_p (bias));
>> +  tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
>> +  unsigned int len = tree_to_uhwi (biased_len);
>> +  unsigned int vect_len
>> += GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
>> +  if (vect_len != len)
>> +return NULL_TREE;
> 
> Using "unsigned int" truncates the value.  I realise that's probably
> safe in this context, since large values have undefined behaviour.
> But it still seems better to use an untruncated type, so that it
> looks less like an oversight.  (I think this is one case where "auto"
> can help, since it gets the type right automatically.)
> 
> It would also be better to avoid the to_constant, since we haven't
> proven is_constant.  How about:
> 
>   tree basic_len = gimple_call_arg (call, 2);
>   if (!poly_int_tree_p (basic_len))
>   return NULL_TREE;
>   unsigned int nargs = gimple_call_num_args (call);
>   tree bias = gimple_call_arg (call, nargs - 1);
>   gcc_assert (TREE_CODE (bias) == INTEGER_CST);
>   if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
>   GET_MODE_SIZE (TYPE_MODE (vectype))))
> 
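
For reference, a minimal sketch in plain C (not GCC code) of the two points
made in the quoted review: the fold is only valid when the biased length
equals the vector size in bytes, and narrowing the length before comparing
can hide a mismatch.  The helper names and the fixed-width types are
assumptions made for illustration; the subtraction mirrors the quoted
snippet (len - bias).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when a length-controlled access covers the
   whole vector, so it could be treated as a plain vector load/store.
   The comparison is done at full width.  */
static bool
covers_whole_vector (uint64_t len, int64_t bias, uint64_t vector_bytes)
{
  return (int64_t) len - bias == (int64_t) vector_bytes;
}

/* The same test after narrowing the biased length to 32 bits first,
   which is the pattern the review comment warns about.  */
static bool
covers_whole_vector_truncated (uint64_t len, int64_t bias,
                               uint64_t vector_bytes)
{
  uint32_t narrow = (uint32_t) (len - (uint64_t) bias);
  return narrow == (uint32_t) vector_bytes;
}
```

With a 16-byte vector, a length of 2^32 + 16 is rejected by the full-width
check but accepted by the narrowed one, which is why an untruncated type
reads as the safer choice even if such lengths are undefined in practice.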

Re: [PATCH]rs6000: Load high and low part of 64bit constant independently

2022-11-27 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/11/25 23:46, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Nov 25, 2022 at 09:21:21PM +0800, Jiufu Guo wrote:
>> "Kewen.Lin"  writes:
>>> on 2022/9/15 16:30, Jiufu Guo wrote:
>>>> For a complicated 64bit constant, below is one instruction sequence to
>>>> build it:
>>>>   lis 9,0x800a
>>>>   ori 9,9,0xabcd
>>>>   sldi 9,9,32
>>>>   oris 9,9,0xc167
>>>>   ori 9,9,0xfa16
>>>>
>>>> while we can also use the sequence below to build it:
>>>>   lis 9,0xc167
>>>>   lis 10,0x800a
>>>>   ori 9,9,0xfa16
>>>>   ori 10,10,0xabcd
>>>>   rldimi 9,10,32,0
>>>> This sequence uses 2 registers to build the high and low parts first,
>>>> and then merges them.
>>>> In terms of parallelism, this sequence would be faster.  (Of course, it uses
>>>> 1 more register, with potential register pressure.)
> 
> And crucially this patch only uses two registers if can_create_pseudo_p.
> Please mention that.
> 
>>>>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit
>>>>   constant build.
> 
> If you don't give details of what this does, just say "Update." please.
> But update to what?
> 
> "Generate more parallel code if can_create_pseudo_p." maybe?
> 
>>>> +rtx H = gen_reg_rtx (DImode);
>>>> +rtx L = gen_reg_rtx (DImode);
> 
> Please don't use all-uppercase variable names, those are for macros.  In
> fact, don't use uppercase in variable (and function etc.) names at all,
> unless there is a really good reason to.
> 
> Just call it "high" and "low", or "hi" and "lo", or something?
> 
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>>>> @@ -0,0 +1,27 @@
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O2 -mdejagnu-cpu=power7  -save-temps" } */
>>>
>>> Why do we need power7 here?
>> power8/9 are also ok for this case.  Actually, I just want to
>> avoid using new p10 instructions, like "pli", and so selected
>> an old arch option.
> 
> Why does it need _at least_ p7, is the question (as I understand it).
> 

Yeah, that's what I intended to ask, since the insns to be scanned
don't actually require Power7 or later.

> To prohibit pli etc. you can do -mno-prefixed (which works on all older
> CPUs just as well), or skip the test if prefixed insns are enabled, or
> scan for the then generated code as well.  The first option is by far
> the simplest.

Yeah, using -mno-prefixed is perfect here, nice!
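
A sketch of how the test header might look with that option, assuming the
other directives stay as in the quoted patch.  The function body here is a
hypothetical placeholder (the real test body is not shown in this thread);
only the dg-options line reflects the suggestion above.

```c
/* { dg-do run } */
/* { dg-options "-O2 -mno-prefixed -save-temps" } */
/* { dg-require-effective-target has_arch_ppc64 } */

/* Hypothetical body: store one constant that needs the 5-insn sequence.  */
void __attribute__ ((noinline))
foo (unsigned long long *a)
{
  *a = 0x800aabcdc167fa16ULL;
}
```

This drops the CPU selection entirely, so the scanned lis/ori/rldimi
sequence is not tied to any particular -mcpu value.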

BR,
Kewen


Re: [PATCH]rs6000: Load high and low part of 64bit constant independently

2022-11-25 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

Sorry for the late review.

on 2022/9/15 16:30, Jiufu Guo wrote:
> Hi,
> 
> For a complicated 64bit constant, below is one instruction sequence to
> build it:
>   lis 9,0x800a
>   ori 9,9,0xabcd
>   sldi 9,9,32
>   oris 9,9,0xc167
>   ori 9,9,0xfa16
> 
> while we can also use the sequence below to build it:
>   lis 9,0xc167
>   lis 10,0x800a
>   ori 9,9,0xfa16
>   ori 10,10,0xabcd
>   rldimi 9,10,32,0
> This sequence uses 2 registers to build the high and low parts first,
> and then merges them.
> In terms of parallelism, this sequence would be faster.  (Of course, it uses
> 1 more register, with potential register pressure.)
> 
> Bootstrap and regtest pass on ppc64le.
> Is this ok for trunk?
> 
> 
> BR,
> Jeff(Jiufu)
> 
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Update 64bit
>   constant build.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/parall_5insn_const.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc   | 45 +++
>  .../gcc.target/powerpc/parall_5insn_const.c   | 27 +++
>  2 files changed, 53 insertions(+), 19 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index a656cb32a47..759c6309677 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10180,26 +10180,33 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  }
>else
>  {
> -  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> -
> -  emit_move_insn (copy_rtx (temp),
> -   GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000));
> -  if (ud3 != 0)
> - emit_move_insn (copy_rtx (temp),
> - gen_rtx_IOR (DImode, copy_rtx (temp),
> -  GEN_INT (ud3)));
> +  if (can_create_pseudo_p ())
> + {
> +   /* lis A,U4; ori A,U3; lis B,U2; ori B,U1; rldimi A,B,32,0.  */

Nit: A, B are supposed to be H, L?

> +   rtx H = gen_reg_rtx (DImode);
> +   rtx L = gen_reg_rtx (DImode);
> +   HOST_WIDE_INT num = (ud2 << 16) | ud1;
> +   rs6000_emit_set_long_const (L, (num ^ 0x80000000) - 0x80000000);
> +   num = (ud4 << 16) | ud3;
> +   rs6000_emit_set_long_const (H, (num ^ 0x80000000) - 0x80000000);
> +   emit_insn (gen_rotldi3_insert_3 (dest, H, GEN_INT (32), L,
> +GEN_INT (0xffffffff)));
> + }
> +  else
> + {
> +   /* lis A, U4; ori A,U3; rotl A,32; oris A,U2; ori A,U1.  */
   ~~~ unexpected space?

> +   emit_move_insn (dest,
> +   GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000));
> +   if (ud3 != 0)
> + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud3)));
> 
> -  emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest,
> -   gen_rtx_ASHIFT (DImode, copy_rtx (temp),
> -   GEN_INT (32)));
> -  if (ud2 != 0)
> - emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
> - gen_rtx_IOR (DImode, copy_rtx (temp),
> -  GEN_INT (ud2 << 16)));
> -  if (ud1 != 0)
> - emit_move_insn (dest,
> - gen_rtx_IOR (DImode, copy_rtx (temp),
> -  GEN_INT (ud1)));
> +   emit_move_insn (dest, gen_rtx_ASHIFT (DImode, dest, GEN_INT (32)));
> +   if (ud2 != 0)
> + emit_move_insn (dest,
> + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16)));
> +   if (ud1 != 0)
> + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1)));
> + }
>  }
>  }
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> new file mode 100644
> index 000..ed8ccc73378
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> @@ -0,0 +1,27 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power7  -save-temps" } */

Why do we need power7 here?

> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +/* { dg-final { scan-assembler-times {\mlis\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mori\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
> +
> +void __attribute__ ((noinline)) foo (unsigned long long *a)
> +{
> +  /* 2lis+2ori+1rldimi for each constant.  */

Nit: it seems easier to read as "/* 2 lis + 2 ori + 1 rldimi for ..."
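
For reference, a minimal sketch in plain C (not GCC code) checking that the
two instruction sequences from the patch description build the same value.
The helper functions are simplified models written for this illustration;
rldimi is modelled only for the operand combination used here (shift 32,
mask begin 0).

```c
#include <stdint.h>

/* lis rD,imm: load the 16-bit immediate shifted left by 16,
   sign-extended to 64 bits.  */
static uint64_t
lis (uint16_t imm)
{
  uint64_t v = (uint64_t) imm << 16;
  if (imm & 0x8000)
    v |= 0xffffffff00000000ULL;
  return v;
}

/* ori rD,rS,imm: OR the immediate into the low 16 bits.  */
static uint64_t
ori (uint64_t rs, uint16_t imm)
{
  return rs | imm;
}

/* oris rD,rS,imm: OR the immediate into bits 16..31.  */
static uint64_t
oris (uint64_t rs, uint16_t imm)
{
  return rs | ((uint64_t) imm << 16);
}

/* rldimi rA,rS,32,0: rotate rS left by 32 and insert the result into
   the high 32 bits of rA, keeping the low 32 bits of rA.  */
static uint64_t
rldimi_32_0 (uint64_t ra, uint64_t rs)
{
  uint64_t rotated = (rs << 32) | (rs >> 32);
  uint64_t mask = 0xffffffff00000000ULL;
  return (rotated & mask) | (ra & ~mask);
}

/* lis; ori; sldi 32; oris; ori, all on one register.  */
static uint64_t
build_serial (void)
{
  uint64_t r9 = lis (0x800a);
  r9 = ori (r9, 0xabcd);
  r9 <<= 32;                    /* sldi 9,9,32 */
  r9 = oris (r9, 0xc167);
  r9 = ori (r9, 0xfa16);
  return r9;
}

/* Two independent lis+ori chains merged by rldimi.  */
static uint64_t
build_parallel (void)
{
  uint64_t r9 = lis (0xc167);
  uint64_t r10 = lis (0x800a);
  r9 = ori (r9, 0xfa16);
  r10 = ori (r10, 0xabcd);
  return rldimi_32_0 (r9, r10);
}
```

Both functions yield 0x800aabcdc167fa16.  In the second one the sign
extension done by lis is harmless because rldimi discards the high half of
the low-part register and the rotated-out half of the high-part register.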

BR,
Kewen


Re: [PATCH v4, rs6000] Change mode and insn condition for VSX scalar extract/insert instructions

2022-11-25 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Sorry for the late reply.

on 2022/11/7 14:45, HAO CHEN GUI wrote:
> Hi,
>   For scalar extract/insert instructions, exponent field can be stored in a
> 32-bit register. So this patch changes the mode of exponent field from DI to
> SI. So these instructions can be generated in a 32-bit environment. The patch
> removes TARGET_64BIT check for these instructions.
> 
>   The instructions using DI registers can be invoked with -mpowerpc64 in a
> 32-bit environment. The patch changes insn condition from TARGET_64BIT to
> TARGET_POWERPC64 for those instructions.
> 
>   This patch also changes prototypes and categories of relevant built-ins and
> effective target checks of test cases.
> 
>   Compared to last version, main changes are to set categories of relevant
> built-ins from power9-64 to power9 and remove some unnecessary test cases.
> Last version: 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601196.html

Nice, but this still misses updating the documentation of scalar_extract_exp
and scalar_insert_exp; both are documented as "require a 64-bit environment".
We need some corresponding updates in gcc/doc/extend.texi.

The rest looks good to me except for one nit in the test cases ...

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> 
> ChangeLog
> 2022-11-07  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_exp): Set return type to const unsigned
>   int and move it from power9-64 to power9 catalog.
>   (__builtin_vsx_scalar_extract_sig): Set return type to const unsigned
>   long long.
>   (__builtin_vsx_scalar_insert_exp): Set type of second argument to
>   unsigned int.
>   (__builtin_vsx_scalar_insert_exp_dp): Set type of second argument to
>   unsigned int and move it from power9-64 to power9 catalog.
>   * config/rs6000/vsx.md (xsxexpdp): Set mode of first operand to
>   SImode.  Remove TARGET_64BIT from insn condition.
>   (xsxsigdp): Change insn condition from TARGET_64BIT to TARGET_POWERPC64.
>   (xsiexpdp): Change insn condition from TARGET_64BIT to
>   TARGET_POWERPC64.  Set mode of third operand to SImode.
>   (xsiexpdpf): Set mode of third operand to SImode.  Remove TARGET_64BIT
>   from insn condition.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-exp-0.c: Remove lp64 check.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-1.c: Remove lp64 check.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Deleted as case is
>   invalid now.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-6.c: Replace lp64 check
>   with has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-0.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-6.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-0.c: Replace lp64 check
>   with has_arch_ppc64. Set type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-1.c: Set type of exponent
>   to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-12.c: Replace lp64 check
>   with has_arch_ppc64. Set type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-13.c: Remove lp64 check.
>   Set type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Set type of exponent to
>   unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-3.c: Remove lp64 check. Set
>   type of exponent to unsigned int.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-4.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Deleted as case is
>   invalid now.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index f76f54793d7..d8d67fa0cad 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2833,6 +2833,11 @@
>const signed int __builtin_dtstsfi_ov_td (const int<6>, _Decimal128);
>  TSTSFI_OV_TD dfptstsfi_unordered_td {}
> 
> +  const unsigned int __builtin_vsx_scalar_extract_exp (double);
> +VSEEDP xsxexpdp {}
> +
> +  const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned int);
> +VSIEDPF xsiexpdpf {}
> 
>  [power9-64]
>void __builtin_altivec_xst_len_r (vsc, void *, long);
> @@ -2847,19 +2852,13 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const signed long __builtin_vsx_scalar_extract_exp (double);
> -VSEEDP xsxexpdp {}
> -
> -  const signed long __builtin_vsx_scalar_extract_sig (double);
> +  const unsigned long long __builtin_vsx_scalar_extract_sig (double);
>  VSESDP xsxsigdp {}
> 
>const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
> -unsigned long long);
> +

Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

2022-11-25 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

Sorry for the late reply.

on 2022/10/26 19:40, Jiufu Guo wrote:
> Hi,
> 
> PR106708 constaint some constants which can be support by li/lis + oris/xoris.
    typo?

For "li/lis + oris/xoris", I interpreted it as four combinations:

   li + oris, lis + oris, li + xoris, lis + xoris.

Not sure if it's just me interpreting it like that, but the actual combinations
which this patch adopts are:

   li + oris, li + xoris, lis + xoris.

It's a bit off, but not a big deal; it's up to you whether to reword it.  :)
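
To make the three adopted combinations concrete, here is a minimal sketch
in plain C (not GCC code) that models each instruction and builds one
sample constant per combination.  The sample constants and helper names are
chosen for this illustration only; they follow the bit-pattern descriptions
quoted below.

```c
#include <stdint.h>

/* li rD,imm: load a sign-extended 16-bit immediate.  */
static uint64_t
li (uint16_t imm)
{
  uint64_t v = imm;
  if (imm & 0x8000)
    v |= 0xffffffffffff0000ULL;
  return v;
}

/* lis rD,imm: load the immediate shifted left by 16, sign-extended.  */
static uint64_t
lis (uint16_t imm)
{
  uint64_t v = (uint64_t) imm << 16;
  if (imm & 0x8000)
    v |= 0xffffffff00000000ULL;
  return v;
}

/* oris rD,rS,imm: OR the immediate into bits 16..31.  */
static uint64_t
oris (uint64_t rs, uint16_t imm)
{
  return rs | ((uint64_t) imm << 16);
}

/* xoris rD,rS,imm: XOR the immediate into bits 16..31.  */
static uint64_t
xoris (uint64_t rs, uint16_t imm)
{
  return rs ^ ((uint64_t) imm << 16);
}

/* li + oris: high 32 bits zero, bit 31 set, bit 15 clear.  */
static uint64_t
build_li_oris (void)
{
  return oris (li (0x1234), 0x8000);
}

/* li + xoris: high 32 bits all ones, bit 15 set.  */
static uint64_t
build_li_xoris (void)
{
  return xoris (li (0x8765), 0xedcb);
}

/* lis + xoris: high 32 bits all ones, bit 31 clear, low 16 bits zero.  */
static uint64_t
build_lis_xoris (void)
{
  return xoris (lis (0x9234), 0x8000);
}
```

The results are 0x0000000080001234, 0xffffffff12348765 and
0xffffffff12340000 respectively.  In the two xoris cases the first
instruction deliberately produces a negative value so that sign extension
fills the high 32 bits, and xoris then corrects bits 16..31.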

> 
> For constant C:
> if '(c & 0x80008000ULL) == 0x8000ULL' or say:
> 32(0) || 1(1) || 15(x) || 1(0) || 15(x), we could use li+oris to
> build constant 'C'.
> Here N(M) means N continuous bit M, x for M means it is ok for either
> 1 or 0; '||' means concatenation.
> 
> if '(c & 0x8000ULL) == 0x8000ULL' or say:
> 32(1) || 16(x) || 1(1) || 15(x), using li+xoris would be ok.
> 
> if '(c & 0xULL) == 0x' or say:
> 32(1) || 1(0) || 15(x) || 16(0), using lis+xoris would be ok.
> 
> This patch update rs6000_emit_set_long_const to support these forms.
> Bootstrap and regtest pass on ppc64 and ppc64le.
> 
> Is this ok for trunk?

This updated version looks good to me, but I'd leave it to Segher for the
final say.  Thanks!

BR,
Kewen

> 
> BR,
> Jeff(Jiufu)
> 
> 
>   PR target/106708
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Support
>   constants which can be built with li + oris or li/lis + xoris.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106708-run.c: New test.
>   * gcc.target/powerpc/pr106708.c: New test.
>   * gcc.target/powerpc/pr106708.h: New file.
> 
> ---
>  gcc/config/rs6000/rs6000.cc   | 41 ++-
>  .../gcc.target/powerpc/pr106708-run.c | 17 
>  gcc/testsuite/gcc.target/powerpc/pr106708.c   | 12 ++
>  gcc/testsuite/gcc.target/powerpc/pr106708.h   |  9 
>  4 files changed, 69 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708-run.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708.h
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index d2743f7bce6..9b7a51f052d 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10228,6 +10228,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
> +  HOST_WIDE_INT orig_c = c;
> 
>ud1 = c & 0xffff;
>c = c >> 16;
> @@ -10253,21 +10254,41 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>   gen_rtx_IOR (DImode, copy_rtx (temp),
>GEN_INT (ud1)));
>  }
> +  else if ((ud4 == 0xffff && ud3 == 0xffff)
> +&& ((ud1 & 0x8000) || (ud1 == 0 && !(ud2 & 0x8000))))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +
> +  HOST_WIDE_INT imm = (ud1 & 0x8000) ? ((ud1 ^ 0x8000) - 0x8000)
> +  : ((ud2 << 16) - 0x80000000);
> +  /* li/lis + xoris */
> +  emit_move_insn (temp, GEN_INT (imm));
> +  emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
> +  GEN_INT (orig_c ^ imm)));
> +}
>else if (ud3 == 0 && ud4 == 0)
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> 
>gcc_assert (ud2 & 0x8000);
> -  emit_move_insn (copy_rtx (temp),
> -   GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
> -  if (ud1 != 0)
> - emit_move_insn (copy_rtx (temp),
> - gen_rtx_IOR (DImode, copy_rtx (temp),
> -  GEN_INT (ud1)));
> -  emit_move_insn (dest,
> -   gen_rtx_ZERO_EXTEND (DImode,
> -gen_lowpart (SImode,
> - copy_rtx (temp))));
> +
> +  if (!(ud1 & 0x8000))
> + {
> +   /* li+oris */
> +   emit_move_insn (temp, GEN_INT (ud1));
> +   emit_move_insn (dest,
> +   gen_rtx_IOR (DImode, temp, GEN_INT (ud2 << 16)));
> + }
> +  else
> + {
> +   emit_move_insn (temp,
> +   GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
> +   if (ud1 != 0)
> + emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT (ud1)));
> +   emit_move_insn (dest,
> +   gen_rtx_ZERO_EXTEND (DImode,
> +gen_lowpart (SImode, temp)));
> + }
>  }
>else if (ud1 == ud3 && ud2 == ud4)
>  {
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106708-run.c b/gcc/testsuite/gcc.target/powerpc/pr106708-run.c
> new file mode 100644
> index 000..df65c321f6b
> --- /dev/null
> +++ b/gcc/t

Re: [PATCH-1, rs6000] Generate permute index directly for little endian target [PR100866]

2022-11-24 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Sorry for the late review.

on 2022/10/11 15:38, HAO CHEN GUI wrote:
> Hi,
>   This patch modifies the helper function which generates permute index for
> vector byte reversion and generates permute index directly for little endian
> targets. It saves one "xxlnor" instruction on P8 little endian targets as
> the original process needs an "xxlnor" to calculate complement for the index.
> 

Nice.
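
As a small illustration of the index computation described above, here is a
sketch in plain C (not GCC code) that derives the little-endian selector
bytes from the big-endian ones at build time, using the same expression as
the quoted patch.  The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: given a 16-byte big-endian permute selector BE,
   store the complemented 5-bit indexes into LE.  For an index x in
   0..31, (~x & 0x1f) equals 31 - x.  */
static void
le_selector_from_be (const uint8_t be[16], uint8_t le[16])
{
  for (int i = 0; i < 16; i++)
    le[i] = (uint8_t) (~be[i] & 0x1f);
}
```

Doing the complement while the constant vector is created is what removes
the need for a separate complement instruction in the generated code.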

> Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-10-11  Haochen Gui 
> 
> gcc/
>   PR target/100866
>   * config/rs6000/rs6000-call.cc (swap_endian_selector_for_mode):
>   Generate permute index directly for little endian targets.
>   * config/rs6000/vsx.md (revb_<mode>): Call vperm directly with
>   corresponding permute indexes.
> 
> gcc/testsuite/
>   PR target/100866
>   * gcc.target/powerpc/pr100866.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
> index 551968b0995..bad8e9e0e52 100644
> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -2839,7 +2839,10 @@ swap_endian_selector_for_mode (machine_mode mode)
>  }
> 
>for (i = 0; i < 16; ++i)
> -perm[i] = GEN_INT (swaparray[i]);
> +if (BYTES_BIG_ENDIAN)
> +  perm[i] = GEN_INT (swaparray[i]);
> +else
> +  perm[i] = GEN_INT (~swaparray[i] & 0x001f);

IMHO, it would be good to add a function comment for this function; it's sad
that we didn't have one before.  With this patch, the selector (perm) is
expected to be used with vperm direct as shown below, so it would be good to
note that explicitly for other potential callers too.

> 
>return force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode,
>gen_rtvec_v (16, perm)));
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..b68eba48d2c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6096,8 +6096,8 @@ (define_expand "revb_<mode>"
>to the endian mode in use, i.e. in LE mode, put elements
>in BE order.  */
>rtx sel = swap_endian_selector_for_mode(<MODE>mode);
> -  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
> -operands[1], sel));
> +  emit_insn (gen_altivec_vperm_<mode>_direct (operands[0], operands[1],
> +   operands[1], sel));
>  }
> 
>DONE;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100866.c b/gcc/testsuite/gcc.target/powerpc/pr100866.c
> new file mode 100644
> index 000..c708dfd502e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100866.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not "xxlnor" } } */

Nit: may be better with {\mxxlnor\M}?

The others look good to me.  Thanks!

BR,
Kewen

> +
> +#include <altivec.h>
> +
> +vector unsigned short revb(vector unsigned short a)
> +{
> +   return vec_revb(a);
> +}
> 




Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-24 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2022/11/23 00:08, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> Many thanks for your review comments!
>>
>>>>> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
>>>>>> Hi,
>>>>>>
>>>>>> As discussed in PR98125, -fpatchable-function-entry with
>>>>>> SECTION_LINK_ORDER support doesn't work well on powerpc64
>>>>>> ELFv1 because the filled "Symbol" in
>>>>>>
>>>>>>   .section name,"flags"o,@type,Symbol
>>>>>>
>>>>>> sits in .opd section instead of in the function_section
>>>>>> like .text or named .text*.
>>>>>>
>>>>>> Since we already generate one label LPFE* which sits in
>>>>>> function_section of current_function_decl, this patch is
>>>>>> to reuse it as the symbol for the linked_to section.  It
>>>>>> avoids the above ABI specific issue when using the symbol
>>>>>> concluded from current_function_decl.
>>>>>>
>>>>>> Besides, with this support some previous workarounds for
>>>>>> powerpc64 ELFv1 can be reverted.
>>>>>>
>>>>>> btw, rs6000_print_patchable_function_entry can be dropped
>>>>>> but there is another rs6000 patch which needs this rs6000
>>>>>> specific hook rs6000_print_patchable_function_entry, not
>>>>>> sure which one gets landed first, so just leave it here.
>>>>>>
>>>>>> Bootstrapped and regtested on below:
>>>>>>
>>>>>>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>>>>>>  and latest binutils 2.39.
>>>>>>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>>>>>>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>>>>>>   4) x86_64-redhat-linux with default binutils 2.30
>>>>>>  and latest binutils 2.39.
>>>>>>   5) aarch64-linux-gnu  with default binutils 2.30
>>>>>>  and latest binutils 2.39.
>>>>>>
>>
>> [snip...]
>>
>>>>>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>>>>>> index 4db8506b106..d4de6e164ee 100644
>>>>>> --- a/gcc/varasm.cc
>>>>>> +++ b/gcc/varasm.cc
>>>>>> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, unsigned int flags,
>>>>>>  fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>>>>>>if (flags & SECTION_LINK_ORDER)
>>>>>>  {
>>>>>> -  tree id = DECL_ASSEMBLER_NAME (decl);
>>>>>> -  ultimate_transparent_alias_target (&id);
>>>>>> -  const char *name = IDENTIFIER_POINTER (id);
>>>>>> -  name = targetm.strip_name_encoding (name);
>>>>>> -  fprintf (asm_out_file, ",%s", name);
>>>>>> +  /* For now, only section "__patchable_function_entries"
>>>>>> + adopts flag SECTION_LINK_ORDER, internal label LPFE*
>>>>>> + was emitted in default_print_patchable_function_entry,
>>>>>> + just place it here for linked_to section.  */
>>>>>> +  gcc_assert (!strcmp (name, "__patchable_function_entries"));
>>>
>>> I like the idea of removing the rs6000 workaround in favour of making the
>>> target-independent code more robust.  But this seems a bit hackish.  What
>>> would we do if SECTION_LINK_ORDER was used for something else in future?
>>>
>>
>> Good question!  I think it depends on how we can get the symbol for the
>> linked_to section.  If adopting the name of the decl would suffer from an
>> issue similar to the one this patch wants to fix, we have to reuse the label
>> LPFE* or some kind of new artificial label in the related section; otherwise
>> we can just go with the name of the given decl, or something related to
>> that decl.  Since we can't predict any future uses, I just placed an
>> assertion here to ensure that we would revisit and adjust this part at
>> that time.  Does it sound reasonable to you?
> 
> Yeah, I guess that's good enough.  If the old scheme ends up being
> correct for some future use, we can make the new behaviour conditional
> on __patchable_function_entries.

Yes, we can check if the given section name is
"__patchable_function_entries".

> 
> So yeah, the patch LGTM to me, thanks.

Thanks again!  I rebased and re-tested it on x86/aarch64/powerpc64{,le},
and just committed it as r13-4294-gf120196382ac5a.

BR,
Kewen


Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/11/22 13:12, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> On 2022/11/22 11:11, Kewen.Lin wrote:
>> Maybe we can adjust prepare_cmp_insn to fail if the constructed cbranchcc4
>> pattern doesn't satisfy the predicate of operand 0 rather than to assert.
>> It's something like:
>>
>> if (!insn_operand_matches (icode, 0, test))
>>   goto fail;
>>
>> or only assign and return if insn_operand_matches (icode, 0, test).
>>
>> The code assumes that all cbranchcc4 patterns of this kind match what the
>> target defines for the cbranchcc4 optab, but unfortunately that doesn't
>> hold for our port, and I don't see why it should.
> 
> Thanks for your comments.
> 
> I just drafted a patch to let it go to "fail" when the predicate of operand 0 is
> not satisfied. It works and passed bootstrap on ppc64le. But my concern is that
> prepare_cmp_insn is a generic function and is used to create a cmp rtx. It
> is not only called by emit_conditional* (finally called by ifcvt) but also by
> other functions (even new functions). If we change the logic in
> prepare_cmp_insn, we may lose some potential optimization. After all, the
> branch_2insn is a valid insn.

My assumption is that, without your proposed have_cbranchcc4 change for
rs6000, the generic prepare_cmp_insn would never be called with CCmode on
rs6000, since otherwise we would get an ICE with icode CODE_FOR_nothing.
It means we don't lose anything compared to before.  Besides, except for those
conditional call sites, I doubt CCmode would be used for calling it.  Could
you check?

> 
> I think the essence of the problem is we want to exclude those comparisons
> (from cbranchcc4 used in ifcvt) which need two CC bits. So, we can change the
> logic of ifcvt - add an additional check with predicate of operand 0 when
> checking the have_cbranchcc4 flag in ifcvt.

I think that would work.  The only concern is that any future use of
prepare_cmp_insn similar to the one in ifcvt would need the same pre-check;
otherwise the ICE would happen again.

BR,
Kewen


Re: [PATCHv2, rs6000] Enable have_cbranchcc4 on rs6000

2022-11-21 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Thanks for the explanation.

on 2022/11/21 14:18, HAO CHEN GUI wrote:
> Hi Segher,
> 
> On 2022/11/18 20:18, Segher Boessenkool wrote:
>> I don't think we should pretend we have any conditional jumps the
>> machine does not actually have, in cbranchcc4.  When would this ever be
>> useful?  cror;beq can be quite expensive, compared to the code it would
>> replace anyway.
>>
>> If something generates those here (which then ICEs later), that is
>> wrong, fix *that*?  Is it ifcvt doing it?
> 
> "*cbranch_2insn" is a valid insn for rs6000. So it generates such insn
> at expand pass. The "prepare_cmp_insn" called by ifcvt just wants to verify
> that the following comparison rtx is valid.

Maybe we can adjust prepare_cmp_insn to fail if the constructed cbranchcc4
pattern doesn't satisfy the predicate of operand 0 rather than to assert.
It's something like:

if (!insn_operand_matches (icode, 0, test))
  goto fail;

or only assign and return if insn_operand_matches (icode, 0, test).

The code assumes that all cbranchcc4 patterns of this kind match what the
target defines for the cbranchcc4 optab, but unfortunately that doesn't
hold for our port, and I don't see why it should.
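
To illustrate the "fail instead of assert" shape suggested above, here is a
toy sketch in plain C.  Nothing in it is GCC's real interface: the
predicate, the integer comparison codes and the function names are all
hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a target predicate on operand 0.  In this
   toy model only comparison codes 0 and 1 are supported.  */
static bool
operand0_matches (int cmp_code)
{
  return cmp_code == 0 || cmp_code == 1;
}

/* Hypothetical stand-in for the preparation routine.  On success it
   stores the accepted code through PTEST and returns true; on a
   predicate mismatch it reports failure instead of asserting, so the
   caller can fall back to another strategy.  */
static bool
prepare_cmp (int cmp_code, int *ptest)
{
  if (!operand0_matches (cmp_code))
    goto fail;

  *ptest = cmp_code;
  return true;

 fail:
  *ptest = -1;
  return false;
}
```

The point of the shape is that an unsupported comparison becomes an
ordinary "cannot do this" result for the caller rather than an internal
error.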

BR,
Kewen


Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-21 Thread Kewen.Lin via Gcc-patches
Hi Richard,

Many thanks for your review comments!

>>> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
>>>> Hi,
>>>>
>>>> As discussed in PR98125, -fpatchable-function-entry with
>>>> SECTION_LINK_ORDER support doesn't work well on powerpc64
>>>> ELFv1 because the filled "Symbol" in
>>>>
>>>>   .section name,"flags"o,@type,Symbol
>>>>
>>>> sits in .opd section instead of in the function_section
>>>> like .text or named .text*.
>>>>
>>>> Since we already generate one label LPFE* which sits in
>>>> function_section of current_function_decl, this patch is
>>>> to reuse it as the symbol for the linked_to section.  It
>>>> avoids the above ABI specific issue when using the symbol
>>>> concluded from current_function_decl.
>>>>
>>>> Besides, with this support some previous workarounds for
>>>> powerpc64 ELFv1 can be reverted.
>>>>
>>>> btw, rs6000_print_patchable_function_entry can be dropped
>>>> but there is another rs6000 patch which needs this rs6000
>>>> specific hook rs6000_print_patchable_function_entry, not
>>>> sure which one gets landed first, so just leave it here.
>>>>
>>>> Bootstrapped and regtested on below:
>>>>
>>>>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>>>>  and latest binutils 2.39.
>>>>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>>>>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>>>>   4) x86_64-redhat-linux with default binutils 2.30
>>>>  and latest binutils 2.39.
>>>>   5) aarch64-linux-gnu  with default binutils 2.30
>>>>  and latest binutils 2.39.
>>>>

[snip...]

>>>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>>>> index 4db8506b106..d4de6e164ee 100644
>>>> --- a/gcc/varasm.cc
>>>> +++ b/gcc/varasm.cc
>>>> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, unsigned int flags,
>>>>fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>>>>if (flags & SECTION_LINK_ORDER)
>>>>{
>>>> -tree id = DECL_ASSEMBLER_NAME (decl);
>>>> -ultimate_transparent_alias_target (&id);
>>>> -const char *name = IDENTIFIER_POINTER (id);
>>>> -name = targetm.strip_name_encoding (name);
>>>> -fprintf (asm_out_file, ",%s", name);
>>>> +/* For now, only section "__patchable_function_entries"
>>>> +   adopts flag SECTION_LINK_ORDER, internal label LPFE*
>>>> +   was emitted in default_print_patchable_function_entry,
>>>> +   just place it here for linked_to section.  */
>>>> +gcc_assert (!strcmp (name, "__patchable_function_entries"));
> 
> I like the idea of removing the rs6000 workaround in favour of making the
> target-independent code more robust.  But this seems a bit hackish.  What
> would we do if SECTION_LINK_ORDER was used for something else in future?
> 

Good question!  I think it depends on how we can get the symbol for the
linked_to section.  If adopting the name of the decl would suffer from
the same issue this patch wants to fix, we would have to reuse the label
LPFE* or some new artificial label in the related section; otherwise we
could just go with the name of the given decl, or something related to
that decl.  Since we can't predict any future uses, I just placed an
assertion here to ensure that we revisit and adjust this part at that
time.  Does that sound reasonable to you?
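
To make the intent of that guard concrete, here is a minimal standalone
sketch outside of GCC.  The helper name format_linked_to_label and the
exact label spelling "LPFE<n>" are assumptions for illustration only;
the real code lives in default_elf_asm_named_section and the real label
prefix is target-specific.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the linked_to symbol for a link-order
   section into BUF.  The only known user of the link-order flag is
   the "__patchable_function_entries" section, so any other section
   name trips the assertion and forces this code to be revisited.  */
static int
format_linked_to_label (char *buf, size_t size,
                        const char *section_name, int funcdef_no)
{
  assert (strcmp (section_name, "__patchable_function_entries") == 0);
  /* Reuse the per-function internal label instead of the decl name.  */
  return snprintf (buf, size, "LPFE%d", funcdef_no);
}
```

A new user of the flag would hit the assertion immediately rather than
silently getting a label that does not belong to its section.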

BR,
Kewen


Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare

2022-11-20 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/11/18 23:10, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote:
>> on 2022/11/17 02:44, Segher Boessenkool wrote:
>>> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>>>> * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>>>> float only comparison operators.
>>>
>>> Why?  Is that correct?  Your mail says nothing about this :-(
>>>
>>> Is there any testcase that covers this, and that shows things still
>>> generate the same code?
>>>
>>
>> Sorry for the unclear description, I thought mistakenly that it's
>> probably straightforward.
>>
>> With the change in this patch, all 14 vector float comparison operators
>> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
>> would be handled early in rs6000_emit_vector_compare.
>>
>> For unordered/ordered/ltgt/uneq, the new way is exactly the same
>> as what we do in rs6000_emit_vector_compare_inner, it means there is
>> no chance to get into rs6000_emit_vector_compare_inner with any of them.
> 
> Ah!  In that case, please add an assert there.  It helps catch problems,
> but much more importantly even, it helps the reader understand what is
> going on :-)

Good idea, will do.

> 
>> For eq/ge/gt, it's the same too, but they are shared with vector integer
>> comparison, I just left them alone here.  Just noticed we can remove ge
>> safely too as it's guarded with !MODE_VECTOR_INT.
> 
> ge is nasty for float, it means something different with and without
> -ffast-math (with fast-math ge means not lt, le means not gt; both can
> be done with a simple single condition, no cror needed).  (Compare to ne
> which is the same with and without -ffast-math, that is because it has a
> "not" in its definition!)
> 

That's true for scalar float comparison, but the context here is vector
comparison: the result of the comparison is still a vector (of booleans),
and we have the corresponding vector comparison instruction for ge, so I
think it should be fine here.

>> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
>> with reverse_condition_maybe_unordered and invert the result, it's the
>> same as what we have in vector.md.
>>
>> ; unge(a,b) = ~lt(a,b)
>> ; unle(a,b) = ~gt(a,b)
>> ; ne(a,b)   = ~eq(a,b)
>> ; ungt(a,b) = ~le(a,b)
>> ; unlt(a,b) = ~ge(a,b)
> 
> But for these last two do we generate identical code still?  Since
> forever we have only used cror here (with CCEQ), not crnor etc. (and will
> CCEQ still do the correct thing always then?)

For ge (~ge), yes; for le (~le), no, as explained in the quoted text below.

> 
>> Then eq/ge/gt on the right side would match the cases that were mentioned
>> above.  So we just need to focus on lt and le then.
>>
>> For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
>> it's the same as what we have in vector.md:
>>
>> ; lt(a,b)   = gt(b,a)
>>
>> , and further matches the case mentioned above.
>>
>> As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
>> and further handle lt recursively, that is:
>>le = lt(a,b) || eq(a,b)
>>   = gt(b,a) || eq(a,b)
>>
>> actually this is worse than what vector.md supports:
>>
>> ; le(a,b)   = ge(b,a)
>>
>> In short, the function rs6000_emit_vector_compare_inner is only called
>> twice in rs6000_emit_vector_compare, there is no chance to enter
>> rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
>> any more, I think it's safe to make the change in function
>> rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
>> vector float comparison can improve slightly for UNGT and LE handlings.
> 
> Thanks for the explanation!
> 
> Can you do this in multiple steps, which will make it much easier to
> review, and to spot the problem if some unexpected problem shows up?

Sure, I'll try my best to separate it into some steps and show how it
evolves gradually.

> 
>> I constructed a test case, compiled with option -O2 -ftree-vectorize
>> -fno-vect-cost-model on ppc64le, which goes into this function
>> rs6000_emit_vector_compare with all 14 vector float comparison codes,
>> the assembly of most functions doesn't change after this patch,
>> excepting for test_UNGT_{float,double} and test_LE_{float,double}.
> 
> So, this is a separate change, a separate patch, and the other patches will
> show no changes in generated code at all.

Good point, will separate it.

> 
>> Maybe it's good to add one test case with function 
>> test_{UNGT,LE}_{float,double}
>> and scan not xvcmp{gt,eq}[sd]p.
> 
> In the patch that changes code gen for those, sure :-)
> 

Thanks for all the comments again.

BR,
Kewen


Re: [PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare

2022-11-16 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

on 2022/11/17 02:58, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 16, 2022 at 02:51:04PM +0800, Kewen.Lin wrote:
>> The current handlings in rs6000_emit_vector_compare is a bit
>> complicated to me, especially after we emit vector float
>> comparison insn with the given code directly.  This patch is
>> to refine the handlings for vector integer comparison operators,
>> it becomes not recursive, and we don't need the helper function
>> rs6000_emit_vector_compare_inner any more.
> 
> That sounds nice.
> 
>>/* In vector.md, we support all kinds of vector float point
>>   comparison operators in a comparison rtl pattern, we can
>>   just emit the comparison rtx insn directly here.  Besides,
>>   we should have a centralized place to handle the possibility
>> - of raising invalid exception.  */
>> -  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
>> + of raising invalid exception.  Also emit directly for vector
>> + integer comparison operators EQ/GT/GTU.  */
>> +  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
>> +  || rcode == EQ
>> +  || rcode == GT
>> +  || rcode == GTU)
> 
> The comment still says it handles FP only.  That would be best to keep
> imo: add a separate block of code to handle the integer stuff you want
> to add.  You get the same or better generated code, the compiler is
> smart enough.  Code is for the user to read, and C is not a portable
> assembler language.

OK, I'll make two blocks for FP and integer respectively.  I struggled
a bit updating this hunk with comments for integer comparison
consideration, someone could argue that both can share the same handling
if updating the condition.

> 
> This whole series needs to be factored better, it does way too many
> things, and only marginally related things, at every step.  Or I don't
> see it anyway :-)

OK, I was thinking patch 1/2 is to unify the current vector float
comparison handling, and patch 2/2 is to refine the remaining handling
for vector integer comparison.  I'm happy to factor it better; any
suggestions on concrete code are highly appreciated.  :)

btw, I constructed one test case as below, there is no assembly change
before and after this patch.

#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                           \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y, \
                                                     int *res, int n)  \
  {                                                                     \
    for (int i = 0; i < n; i++)                                         \
      res[i] = NAME (x[i], y[i]);                                       \
  }

#include "stdint.h"

#define TEST(TYPE) \
  TEST_VECT (GT, TYPE) \
  TEST_VECT (GE, TYPE) \
  TEST_VECT (LT, TYPE) \
  TEST_VECT (LE, TYPE) \
  TEST_VECT (EQ, TYPE) \
  TEST_VECT (NE, TYPE)

TEST (int64_t)
TEST (uint64_t)
TEST (int32_t)
TEST (uint32_t)
TEST (int16_t)
TEST (uint16_t)
TEST (int8_t)
TEST (uint8_t)



BR,
Kewen


Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare

2022-11-16 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

on 2022/11/17 02:44, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>>  * config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>>  float only comparison operators.
> 
> Why?  Is that correct?  Your mail says nothing about this :-(
> 
> Is there any testcase that covers this, and that shows things still
> generate the same code?
> 

Sorry for the unclear description; I mistakenly thought it was
straightforward.

With the change in this patch, all 14 vector float comparison operators
(unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
would be handled early in rs6000_emit_vector_compare.

For unordered/ordered/ltgt/uneq, the new way is exactly the same
as what we do in rs6000_emit_vector_compare_inner, it means there is
no chance to get into rs6000_emit_vector_compare_inner with any of them.
For eq/ge/gt, it's the same too, but they are shared with vector integer
comparison, I just left them alone here.  Just noticed we can remove ge
safely too as it's guarded with !MODE_VECTOR_INT.

For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
with reverse_condition_maybe_unordered and invert the result, it's the
same as what we have in vector.md.

; unge(a,b) = ~lt(a,b)
; unle(a,b) = ~gt(a,b)
; ne(a,b)   = ~eq(a,b)
; ungt(a,b) = ~le(a,b)
; unlt(a,b) = ~ge(a,b)

Then eq/ge/gt on the right side would match the cases that were mentioned
above.  So we just need to focus on lt and le then.

For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
it's the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

, and further matches the case mentioned above.

As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
and further handle lt recursively, that is:
   le = lt(a,b) || eq(a,b)
  = gt(b,a) || eq(a,b)

actually this is worse than what vector.md supports:

; le(a,b)   = ge(b,a)

In short, the function rs6000_emit_vector_compare_inner is only called
twice in rs6000_emit_vector_compare, and there is no longer any chance
to enter rs6000_emit_vector_compare_inner with codes
unordered/ordered/ltgt/uneq, so I think it's safe to make the change in
function rs6000_emit_vector_compare_inner.  Besides, the proposed way to
handle vector float comparison slightly improves the UNGT and LE handling.
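
The identities above can be sanity-checked lane by lane with a small
scalar sketch.  This is only an illustration under the assumption of
default IEEE semantics (no -ffast-math); the helper names are mine and
are not part of the patch.  Each function models one lane of the
corresponding vector comparison.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar stand-ins for one lane of the ordered vector comparisons.  */
static bool eq (double a, double b) { return a == b; }
static bool gt (double a, double b) { return a > b; }
static bool ge (double a, double b) { return a >= b; }
static bool lt (double a, double b) { return a < b; }
static bool le (double a, double b) { return a <= b; }

/* "un" codes: true if the operands are unordered (a NaN is involved)
   or the ordered relation holds.  */
static bool ungt (double a, double b) { return isunordered (a, b) || a > b; }
static bool unge (double a, double b) { return isunordered (a, b) || a >= b; }
static bool unlt (double a, double b) { return isunordered (a, b) || a < b; }
static bool unle (double a, double b) { return isunordered (a, b) || a <= b; }

/* Check every identity quoted above for one pair of operands.  */
static bool
identities_hold (double a, double b)
{
  return unge (a, b) == !lt (a, b)
         && unle (a, b) == !gt (a, b)
         && (a != b) == !eq (a, b)
         && ungt (a, b) == !le (a, b)
         && unlt (a, b) == !ge (a, b)
         && lt (a, b) == gt (b, a)
         && le (a, b) == ge (b, a)
         /* The old two-insn split of le gives the same answer.  */
         && (gt (b, a) || eq (a, b)) == ge (b, a);
}

/* Run the check over a few samples, including NaN and infinity.  */
static bool
identities_hold_for_samples (void)
{
  const double samples[] = { -1.0, 0.0, 1.0, INFINITY, NAN };
  const size_t n = sizeof samples / sizeof samples[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (!identities_hold (samples[i], samples[j]))
        return false;
  return true;
}
```

The last identity is the reason the le case improves: a single ge with
swapped operands replaces the gt-or-eq pair.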

I constructed a test case, compiled with options -O2 -ftree-vectorize
-fno-vect-cost-model on ppc64le, which goes into function
rs6000_emit_vector_compare with all 14 vector float comparison codes.
The assembly of most functions doesn't change after this patch, except
for test_UNGT_{float,double} and test_LE_{float,double}.

one example from 
before:

  lxvx 12,3,9
  lxvx 11,4,9
  xvcmpgtsp 0,11,12
  xvcmpeqsp 12,12,11
  xxlor 0,0,12
  xxlandc 0,32,0
  stxvx 0,5,9
  addi 9,9,16
  bdnz .L77

vs. 

after: (good to be unrolled)

  lxvx 0,4,10
  lxvx 12,3,10
  addi 9,10,16
  lxvx 11,3,9
  xvcmpgesp 12,0,12
  lxvx 0,4,9
  xvcmpgesp 0,0,11
  xxlandc 12,32,12
  stxvx 12,5,10
  addi 10,10,32
  xxlandc 0,32,0
  stxvx 0,5,9
  bdnz .L77


===
$ cat test.h

#define UNORD(a, b) (__builtin_isunordered ((a), (b)))
#define ORD(a, b) (!__builtin_isunordered ((a), (b)))
#define LTGT(a, b) (__builtin_islessgreater ((a), (b)))
#define UNEQ(a, b) (!__builtin_islessgreater ((a), (b)))
#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
#define UNGE(a, b) (!__builtin_isless ((a), (b)))
#define UNLT(a, b) (!__builtin_isgreaterequal ((a), (b)))
#define UNLE(a, b) (!__builtin_isgreater ((a), (b)))
#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                           \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y, \
                                                     int *res, int n)  \
  {                                                                     \
    for (int i = 0; i < n; i++)                                         \
      res[i] = NAME (x[i], y[i]);                                       \
  }

===
$ cat test.c

#include "test.h"

#define TEST(TYPE) \
  TEST_VECT (UNORD, TYPE)  \
  TEST_VECT (ORD, TYPE)\
  TEST_VECT (LTGT, TYPE)   \
  TEST_VECT (UNEQ, TYPE)   \
  TEST_VECT (UNGT, TYPE)  

[PATCH] Fix typo in gimple_fold_partial_load_store_mem_ref

2022-11-15 Thread Kewen.Lin via Gcc-patches
Hi,

As Robin spotted, my recent commit r13-3716 caused an ICE
on s390 if vector access with length is enabled there (his
patch for the enablement hasn't been committed yet).  The
failure is caused by a silly typo: the bias on s390 is -1,
so the assertion should use tree_fits_shwi_p rather than
tree_fits_uhwi_p.  Thanks to Robin for catching this.
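
To illustrate why the unsigned check rejects a bias of -1, here is a
minimal standalone model.  The struct int_cst and the two helpers are
hypothetical stand-ins that assume a 64-bit host-wide integer; they are
not GCC's tree API, only a sketch of the signed versus unsigned range
test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an integer constant: sign plus magnitude.  */
struct int_cst
{
  bool negative;
  uint64_t magnitude;
};

/* Does the constant fit in an unsigned 64-bit host-wide integer?
   Any strictly negative value does not.  */
static bool
fits_unsigned_hwi (struct int_cst c)
{
  return !c.negative || c.magnitude == 0;
}

/* Does the constant fit in a signed 64-bit host-wide integer?  */
static bool
fits_signed_hwi (struct int_cst c)
{
  if (c.negative)
    return c.magnitude <= (uint64_t) INT64_MAX + 1;
  return c.magnitude <= (uint64_t) INT64_MAX;
}
```

With a bias of -1 the unsigned test fails while the signed one passes,
which matches the assertion failure seen on s390; a bias of 0 passes
both, which is why the typo went unnoticed elsewhere.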

I reproduced the ICE and verified that the fix works with a
cross build; the optimized dump for the test case
gcc.target/powerpc/pr107412.c looked as expected.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* gimple-fold.cc (gimple_fold_partial_load_store_mem_ref): Use
tree_fits_shwi_p rather than tree_fits_uhwi_p as bias is signed.
---
 gcc/gimple-fold.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 0a212e6d0d4..f8a1875ea3e 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5392,7 +5392,7 @@ gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
return NULL_TREE;
   unsigned int nargs = gimple_call_num_args (call);
   tree bias = gimple_call_arg (call, nargs - 1);
-  gcc_assert (tree_fits_uhwi_p (bias));
+  gcc_assert (tree_fits_shwi_p (bias));
   tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
   unsigned int len = tree_to_uhwi (biased_len);
   unsigned int vect_len
--
2.25.1


[PATCH 2/2] rs6000: Refine integer comparison handlings in rs6000_emit_vector_compare

2022-11-15 Thread Kewen.Lin via Gcc-patches
Hi,

The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially now that we emit the vector float
comparison insn with the given code directly.  This patch
refines the handling of vector integer comparison operators:
it is no longer recursive, and we don't need the helper function
rs6000_emit_vector_compare_inner any more.
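
As a rough scalar model of the non-recursive scheme, the sketch below
maps every signed integer comparison onto the two natively available
codes (EQ and GT) using only an operand swap and a final inversion.
The names cmp_code, native_compare and emulated_compare are made up for
this illustration, and the unsigned GTU/GEU/LTU/LEU cases are assumed
to follow the same pattern.

```c
#include <stdbool.h>
#include <stdint.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_GT, CMP_GE, CMP_LT, CMP_LE };

/* Only EQ and GT are assumed to exist as native comparisons.  */
static bool
native_compare (enum cmp_code code, int32_t a, int32_t b)
{
  return code == CMP_EQ ? a == b : a > b;
}

/* One lane of the comparison, built from a native code plus an
   optional operand swap and an optional inversion of the result.  */
static bool
emulated_compare (enum cmp_code rcode, int32_t a, int32_t b)
{
  enum cmp_code code = rcode;
  bool swap_operands = false;
  bool need_invert = false;

  switch (rcode)
    {
    case CMP_EQ:
    case CMP_GT:
      break;
    case CMP_LT:                /* lt(a,b) = gt(b,a) */
      code = CMP_GT;
      swap_operands = true;
      break;
    case CMP_NE:                /* ne(a,b) = ~eq(a,b) */
      code = CMP_EQ;
      need_invert = true;
      break;
    case CMP_LE:                /* le(a,b) = ~gt(a,b) */
      code = CMP_GT;
      need_invert = true;
      break;
    case CMP_GE:                /* ge(a,b) = ~gt(b,a) */
      code = CMP_GT;
      swap_operands = true;
      need_invert = true;
      break;
    }

  bool result = swap_operands ? native_compare (code, b, a)
                              : native_compare (code, a, b);
  return need_invert ? !result : result;
}

/* Plain C comparison used as the reference.  */
static bool
reference_compare (enum cmp_code code, int32_t a, int32_t b)
{
  switch (code)
    {
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    case CMP_LT: return a < b;
    default:     return a <= b;
    }
}

/* Compare the mapping against the reference on a few samples.  */
static bool
mapping_matches_reference (void)
{
  const int32_t samples[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
  for (int code = CMP_EQ; code <= CMP_LE; code++)
    for (int i = 0; i < 5; i++)
      for (int j = 0; j < 5; j++)
        if (emulated_compare ((enum cmp_code) code, samples[i], samples[j])
            != reference_compare ((enum cmp_code) code, samples[i], samples[j]))
          return false;
  return true;
}
```

Unlike the float case, inversion is always safe here because integer
operands are never unordered.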

Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if there are no objections.

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove.
(rs6000_emit_vector_compare): Refine it by directly using the reversed
or swapped code, to avoid the recursion.
---
 gcc/config/rs6000/rs6000.cc | 159 
 1 file changed, 34 insertions(+), 125 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 56db12f08a0..21f4cda7b80 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15639,169 +15639,78 @@ output_cbranch (rtx op, const char *label, int reversed, rtx_insn *insn)
   return string;
 }

-/* Return insn for VSX or Altivec comparisons.  */
-
-static rtx
-rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
-{
-  rtx mask;
-  machine_mode mode = GET_MODE (op0);
-
-  switch (code)
-{
-default:
-  break;
-
-case GE:
-  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-   return NULL_RTX;
-  /* FALLTHRU */
-
-case EQ:
-case GT:
-case GTU:
-  mask = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
-  return mask;
-}
-
-  return NULL_RTX;
-}
-
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
-   DMODE is expected destination mode. This is a recursive function.  */
+   DMODE is expected destination mode.  */

 static rtx
 rs6000_emit_vector_compare (enum rtx_code rcode,
rtx op0, rtx op1,
machine_mode dmode)
 {
-  rtx mask;
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
+  rtx mask = gen_reg_rtx (dmode);

   /* In vector.md, we support all kinds of vector float point
  comparison operators in a comparison rtl pattern, we can
  just emit the comparison rtx insn directly here.  Besides,
  we should have a centralized place to handle the possibility
- of raising invalid exception.  */
-  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
+ of raising invalid exception.  Also emit directly for vector
+ integer comparison operators EQ/GT/GTU.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
+  || rcode == EQ
+  || rcode == GT
+  || rcode == GTU)
 {
-  mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   return mask;
 }

   bool swap_operands = false;
-  bool try_again = false;
-
-  /* See if the comparison works as is.  */
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
-  if (mask)
-return mask;
+  bool need_invert = false;
+  enum rtx_code code = UNKNOWN;

   switch (rcode)
 {
 case LT:
-  rcode = GT;
-  swap_operands = true;
-  try_again = true;
-  break;
 case LTU:
-  rcode = GTU;
+  code = swap_condition (rcode);
   swap_operands = true;
-  try_again = true;
   break;
 case NE:
-  /* Invert condition and try again.
-e.g., A != B becomes ~(A==B).  */
-  {
-   enum rtx_code rev_code;
-   enum insn_code nor_code;
-   rtx mask2;
-
-   rev_code = reverse_condition_maybe_unordered (rcode);
-   if (rev_code == UNKNOWN)
- return NULL_RTX;
-
-   nor_code = optab_handler (one_cmpl_optab, dmode);
-   if (nor_code == CODE_FOR_nothing)
- return NULL_RTX;
-
-   mask2 = rs6000_emit_vector_compare (rev_code, op0, op1, dmode);
-   if (!mask2)
- return NULL_RTX;
-
-   mask = gen_reg_rtx (dmode);
-   emit_insn (GEN_FCN (nor_code) (mask, mask2));
-   return mask;
-  }
+case LE:
+case LEU:
+  code = reverse_condition (rcode);
+  need_invert = true;
   break;
 case GE:
+  code = GT;
+  swap_operands = true;
+  need_invert = true;
+  break;
 case GEU:
-case LE:
-case LEU:
-  /* Try GT/GTU/LT/LTU OR EQ */
-  {
-   rtx c_rtx, eq_rtx;
-   enum insn_code ior_code;
-   enum rtx_code new_code;
-
-   switch (rcode)
- {
- case  GE:
-   new_code = GT;
-   break;
-
- case GEU:
-   new_code = GTU;
-   break;
-
- case LE:
-   new_code = LT;
-   break;
-
- case LEU:
-   new_code = LTU;
-   break;
-
- default:
-   gcc_unreachable ();
- }
-
-   ior_code = opt

[PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare

2022-11-15 Thread Kewen.Lin via Gcc-patches
Hi,

All kinds of vector float comparison operators are supported
by one rtl comparison pattern in vector.md, so we can just
emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This also prepares for a subsequent patch that deals with some
comparison operators with trapping math enabled or disabled,
so it's important to have one centralized place for vector
float comparison handling for better maintenance.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if there are no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
float only comparison operators.
(rs6000_emit_vector_compare): Emit vector comparison insn directly for
float modes.
---
 gcc/config/rs6000/rs6000.cc | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 635aced6105..56db12f08a0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15660,10 +15660,6 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 case EQ:
 case GT:
 case GTU:
-case ORDERED:
-case UNORDERED:
-case UNEQ:
-case LTGT:
   mask = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return mask;
@@ -15681,12 +15677,24 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
machine_mode dmode)
 {
   rtx mask;
-  bool swap_operands = false;
-  bool try_again = false;
-
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));

+  /* In vector.md, we support all kinds of vector float point
+ comparison operators in a comparison rtl pattern, we can
+ just emit the comparison rtx insn directly here.  Besides,
+ we should have a centralized place to handle the possibility
+ of raising invalid exception.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
+{
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+  return mask;
+}
+
+  bool swap_operands = false;
+  bool try_again = false;
+
   /* See if the comparison works as is.  */
   mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
   if (mask)
@@ -15705,10 +15713,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   try_again = true;
   break;
 case NE:
-case UNLE:
-case UNLT:
-case UNGE:
-case UNGT:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
   {
--
2.27.0


Re: [PATCH] rtl: Try to remove EH edges after {pro,epi}logue generation [PR90259]

2022-11-15 Thread Kewen.Lin via Gcc-patches
on 2022/11/10 11:30, Kewen.Lin wrote:
> on 2022/11/9 15:56, Eric Botcazou wrote:
>>> The previous testings on powerpc64{,le}-linux-gnu covered language Go, but
>>> not Ada.  I re-tested it with languages c,c++,fortran,objc,obj-c++,go,ada
>>> on powerpc64le-linux-gnu, the result looked good.  Both x86 and aarch64
>>> cfarm machines which I used for testing don't have gnat installed, do you
>>> think testing Ada on ppc64le is enough?
>>
>> Sure, thanks for having done it!
>>
> 
> Thanks for confirming.
> 
> I'm going to push this next Monday if there is no objection or further
> comments in this week.
> 

Committed in r13-4079.  Do we want this to be backported a week later?

BR,
Kewen


PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-10 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html

Any comments are highly appreciated.

BR,
Kewen

on 2022/9/28 13:41, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html
> 
> BR,
> Kewen
> 
> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> As discussed in PR98125, -fpatchable-function-entry with
>> SECTION_LINK_ORDER support doesn't work well on powerpc64
>> ELFv1 because the filled "Symbol" in
>>
>>   .section name,"flags"o,@type,Symbol
>>
>> sits in .opd section instead of in the function_section
>> like .text or named .text*.
>>
>> Since we already generate one label LPFE* which sits in
>> function_section of current_function_decl, this patch is
>> to reuse it as the symbol for the linked_to section.  It
>> avoids the above ABI specific issue when using the symbol
>> concluded from current_function_decl.
>>
>> Besides, with this support some previous workarounds for
>> powerpc64 ELFv1 can be reverted.
>>
>> btw, rs6000_print_patchable_function_entry can be dropped
>> but there is another rs6000 patch which needs this rs6000
>> specific hook rs6000_print_patchable_function_entry, not
>> sure which one gets landed first, so just leave it here.
>>
>> Bootstrapped and regtested on below:
>>
>>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>>  and latest binutils 2.39.
>>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>>   4) x86_64-redhat-linux with default binutils 2.30
>>  and latest binutils 2.39.
>>   5) aarch64-linux-gnu  with default binutils 2.30
>>  and latest binutils 2.39.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>>
>>  PR target/99889
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>>  Adjust to call function default_print_patchable_function_entry.
>>  * targhooks.cc (default_print_patchable_function_entry_1): Remove and
>>  move the flags preparation ...
>>  (default_print_patchable_function_entry): ... here, adjust to use
>>  current_function_funcdef_no for label no.
>>  * targhooks.h (default_print_patchable_function_entry_1): Remove.
>>  * varasm.cc (default_elf_asm_named_section): Adjust code for
>>  __patchable_function_entries section support with LPFE label.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/pr93195a.C: Remove the skip on powerpc*-*-* 64-bit.
>>  * gcc.target/aarch64/pr92424-2.c: Adjust LPFE1 with LPFE0.
>>  * gcc.target/aarch64/pr92424-3.c: Likewise.
>>  * gcc.target/i386/pr93492-2.c: Likewise.
>>  * gcc.target/i386/pr93492-3.c: Likewise.
>>  * gcc.target/i386/pr93492-4.c: Likewise.
>>  * gcc.target/i386/pr93492-5.c: Likewise.
>> ---
>>  gcc/config/rs6000/rs6000.cc  | 13 +-
>>  gcc/varasm.cc| 15 ---
>>  gcc/targhooks.cc | 45 +++-
>>  gcc/targhooks.h  |  3 --
>>  gcc/testsuite/g++.dg/pr93195a.C  |  1 -
>>  gcc/testsuite/gcc.target/aarch64/pr92424-2.c |  4 +-
>>  gcc/testsuite/gcc.target/aarch64/pr92424-3.c |  4 +-
>>  gcc/testsuite/gcc.target/i386/pr93492-2.c|  4 +-
>>  gcc/testsuite/gcc.target/i386/pr93492-3.c|  4 +-
>>  gcc/testsuite/gcc.target/i386/pr93492-4.c|  4 +-
>>  gcc/testsuite/gcc.target/i386/pr93492-5.c|  4 +-
>>  11 files changed, 40 insertions(+), 61 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..dba28b8e647 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -14771,18 +14771,9 @@ rs6000_print_patchable_function_entry (FILE *file,
>> unsigned HOST_WIDE_INT patch_area_size,
>> bool record_p)
>>  {
>> -  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
>> -  /* When .opd section is emitted, the function symbol
>> - default_print_patchable_function_entry_1 is emitted into the .opd 
>> section
>> - while the patchable area is emitted into the function section.
>> - Don't use SECTION_LINK_ORDER in that case.  */
>> -  if (!(TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
>> 

Re: [PATCH] rtl: Try to remove EH edges after {pro,epi}logue generation [PR90259]

2022-11-09 Thread Kewen.Lin via Gcc-patches
on 2022/11/9 15:56, Eric Botcazou wrote:
>> The previous testings on powerpc64{,le}-linux-gnu covered language Go, but
>> not Ada.  I re-tested it with languages c,c++,fortran,objc,obj-c++,go,ada
>> on powerpc64le-linux-gnu, the result looked good.  Both x86 and aarch64
>> cfarm machines which I used for testing don't have gnat installed, do you
>> think testing Ada on ppc64le is enough?
> 
> Sure, thanks for having done it!
> 

Thanks for confirming.

I'm going to push this next Monday if there is no objection or further
comments in this week.

BR,
Kewen


Re: [PATCH] rtl: Try to remove EH edges after {pro,epi}logue generation [PR90259]

2022-11-08 Thread Kewen.Lin via Gcc-patches
Hi Richi and Eric,

Thanks for your reviews and comments!

on 2022/11/8 18:37, Eric Botcazou wrote:
>> It looks reasonable - OK if the others CCed have no comments.
> 
> My only comment is that it needs to be tested with languages enabling -fnon-
> call-exceptions by default (Ada & Go), if not already done.
> 

The previous testing on powerpc64{,le}-linux-gnu covered Go, but
not Ada.  I re-tested with languages c,c++,fortran,objc,obj-c++,go,ada
on powerpc64le-linux-gnu, and the results looked good.  Neither the x86
nor the aarch64 cfarm machine I used for testing has gnat installed; do
you think testing Ada on ppc64le is enough?

BR,
Kewen


[PATCH] rtl: Try to remove EH edges after {pro,epi}logue generation [PR90259]

2022-11-07 Thread Kewen.Lin via Gcc-patches
Hi,

After prologue and epilogue generation, the judgement on whether
a memory access to the stack frame may trap could change,
since we have more exact stack information by then.

As PR90259 shows, some memory accesses can no longer trap
after prologue and epilogue generation.  Subsequent
optimizations may then remove them if safe, but that
results in an unexpected control flow status because the
REG_EH_REGION note goes missing.

This patch proposes to try to remove EH edges with function
purge_all_dead_edges after prologue and epilogue generation.
It simplifies the CFG as early as we can, and no fixup is
needed in downstream passes.

CFG simplification result with PR90259's case as example:

*before*

   18: %1:TF=call [`__gcc_qdiv'] argc:0
  REG_EH_REGION 0x2
   77: NOTE_INSN_BASIC_BLOCK 3
   19: NOTE_INSN_DELETED
   20: NOTE_INSN_DELETED
  110: [%31:SI+0x20]=%1:DF
  REG_EH_REGION 0x2
  116: NOTE_INSN_BASIC_BLOCK 4
  111: [%31:SI+0x28]=%2:DF
  REG_EH_REGION 0x2
   22: NOTE_INSN_BASIC_BLOCK 5
  108: %0:DF=[%31:SI+0x20]
  REG_EH_REGION 0x2
  117: NOTE_INSN_BASIC_BLOCK 6
  109: %1:DF=[%31:SI+0x28]
  REG_EH_REGION 0x2
   79: NOTE_INSN_BASIC_BLOCK 7
   26: [%31:SI+0x18]=%0:DF
  104: pc=L69
  105: barrier

*after*

   18: %1:TF=call [`__gcc_qdiv'] argc:0
  REG_EH_REGION 0x2
   77: NOTE_INSN_BASIC_BLOCK 3
   19: NOTE_INSN_DELETED
   20: NOTE_INSN_DELETED
  110: [%31:SI+0x20]=%1:DF
  111: [%31:SI+0x28]=%2:DF
  108: %0:DF=[%31:SI+0x20]
  109: %1:DF=[%31:SI+0x28]
   26: [%31:SI+0x18]=%0:DF
  104: pc=L69
  105: barrier

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen

-
PR rtl-optimization/90259

gcc/ChangeLog:

* function.cc (rest_of_handle_thread_prologue_and_epilogue): Add
parameter fun, and call function purge_all_dead_edges.
(pass_thread_prologue_and_epilogue::execute): Name unnamed parameter
as fun, and use it for rest_of_handle_thread_prologue_and_epilogue.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr90259.C: New.
---
 gcc/function.cc|  13 ++-
 gcc/testsuite/g++.target/powerpc/pr90259.C | 103 +
 2 files changed, 113 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr90259.C

diff --git a/gcc/function.cc b/gcc/function.cc
index 6474a663b30..3757ded547d 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6540,7 +6540,7 @@ make_pass_leaf_regs (gcc::context *ctxt)
 }

 static unsigned int
-rest_of_handle_thread_prologue_and_epilogue (void)
+rest_of_handle_thread_prologue_and_epilogue (function *fun)
 {
   /* prepare_shrink_wrap is sensitive to the block structure of the control
  flow graph, so clean it up first.  */
@@ -6557,6 +6557,13 @@ rest_of_handle_thread_prologue_and_epilogue (void)
  Fix that up.  */
   fixup_partitions ();

+  /* After prologue and epilogue generation, the judgement on whether
+ one memory access onto stack frame may trap or not could change,
+ since we get more exact stack information by now.  So try to
+ remove any EH edges here, see PR90259.  */
+  if (fun->can_throw_non_call_exceptions)
+purge_all_dead_edges ();
+
   /* Shrink-wrapping can result in unreachable edges in the epilogue,
  see PR57320.  */
   cleanup_cfg (optimize ? CLEANUP_EXPENSIVE : 0);
@@ -6625,9 +6632,9 @@ public:
   {}

   /* opt_pass methods: */
-  unsigned int execute (function *) final override
+  unsigned int execute (function * fun) final override
 {
-  return rest_of_handle_thread_prologue_and_epilogue ();
+  return rest_of_handle_thread_prologue_and_epilogue (fun);
 }

 }; // class pass_thread_prologue_and_epilogue
diff --git a/gcc/testsuite/g++.target/powerpc/pr90259.C b/gcc/testsuite/g++.target/powerpc/pr90259.C
new file mode 100644
index 000..db75ac7fe02
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr90259.C
@@ -0,0 +1,103 @@
+/* { dg-require-effective-target long_double_ibm128 } */
+/* { dg-options "-O2 -ffloat-store -fgcse -fnon-call-exceptions -fno-forward-propagate -fno-omit-frame-pointer -fstack-protector-all" } */
+/* { dg-add-options long_double_ibm128 } */
+
+/* Verify there is no ICE.  */
+
+template  struct b
+{
+  static constexpr int c = a;
+};
+template  using d = b;
+struct e
+{
+  int f;
+  int
+  g ()
+  {
+return __builtin_ceil (f / (long double) h);
+  }
+  float h;
+};
+template  using k = d;
+template  class n
+{
+public:
+  e ae;
+  void af ();
+};
+template 
+void
+n::af ()
+{
+  ae.g ();
+}
+template  using m = int;
+template ::c>>
+using aj = n;
+struct o
+{
+  void
+  af ()
+  {
+al.af ();
+  }
+  aj al;
+};
+template  class am;
+template  class ao
+{
+protected:
+  static i *ap (int);
+};
+template  class p;
+template  class p : ao
+{
+public:
+  static ar
+  as (const int &p1, j...)
+  {
+(*ao::ap (p1)) (j ()...);
+  }
+};
+template  class am
+{
+ 

[PATCH] testsuite: Fix gen-vect-34.c with vect_masked_load [PR106806]

2022-11-02 Thread Kewen.Lin via Gcc-patches
Hi,

This is to fix the failure on powerpc reported in PR106806.
The test case requires the tree ifcvt pass to transform that
loop, which relies on masked_load support.  The fix is to guard
the expected scan with the vect_masked_load effective target.

As tested on powerpc64{,le}-linux-gnu and aarch64-linux-gnu
(cfarm machine), the failures are gone.  But on
x86_64-redhat-linux (cfarm machine) the result changes from
PASS to N/A.  I think that's expected since that machine doesn't
support AVX by default, so both check_avx_available and
vect_masked_load fail.  It should work fine on machines with
default AVX support, or if we adjust the current
check_avx_available with current_compiler_flags.

Is it ok for trunk?

BR,
Kewen
-
PR testsuite/106806

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/gen-vect-34.c: Adjust with vect_masked_load
effective target.
---
 gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
index 41877e05efd..c2e5dfea35f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/gen-vect-34.c
@@ -13,4 +13,4 @@ float summul(int n, float *arg1, float *arg2)
 return res1;
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { ! { avr-*-* pru-*-* riscv*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_masked_load } } } */
--
2.27.0


[PATCH] vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412]

2022-11-02 Thread Kewen.Lin via Gcc-patches
Hi,

As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
STORE} into a normal vector load/store if the given length is known
to be equal to the length of the whole vector.  This helps to
reduce overall cycles, since the latency of a vector access
with length in bytes is normally bigger than that of a normal
vector access, and it also saves the preparation of the length
when a constant length cannot be encoded into the instruction
(such as on Power).
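
As a rough illustration of the folding condition, here is a standalone
sketch.  It is not the gimple-fold.cc code: the struct and function
names are hypothetical, and the bias operand of LEN_{LOAD,STORE} is
deliberately left out, so the length here is the effective length in
bytes.

```c
#include <stdbool.h>

/* Hypothetical model of a length operand: it is either a known
   compile-time constant (in bytes) or unknown until run time.  */
struct len_operand
{
  bool is_constant;
  unsigned int bytes;
};

/* Return true if an access with length LEN on a vector of VECTOR_BYTES
   bytes is known to touch the whole vector, so that it could be
   replaced by a normal vector load/store.  */
static bool
covers_whole_vector (struct len_operand len, unsigned int vector_bytes)
{
  return len.is_constant && len.bytes == vector_bytes;
}
```

With a 16-byte vector, only a constant length of 16 qualifies; a
constant 8 or a run-time length keeps the length-based access.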

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/107412

gcc/ChangeLog:

* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
mask_p indicating it's for mask or length, and add some handlings for
IFN LEN_{LOAD,STORE}.
(gimple_fold_mask_load): Rename to ...
(gimple_fold_partial_load): ... this, add one parameter mask_p.
(gimple_fold_mask_store): Rename to ...
(gimple_fold_partial_store): ... this, add one parameter mask_p.
(gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
and adjust calls on gimple_fold_mask_load_store_mem_ref to
gimple_fold_partial_load_store_mem_ref.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr107412.c: New test.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
folded LEN_LOAD.
---
 gcc/gimple-fold.cc| 57 ++-
 .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++
 3 files changed, 64 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index a1704784bc9..e3a087defa6 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree 
type,
   return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
 }

-/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
+/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
for the memory it references, otherwise return null.  VECTYPE is the
-   type of the memory vector.  */
+   type of the memory vector.  MASK_P indicates it's for MASK if true,
+   otherwise it's for LEN.  */

 static tree
-gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
+gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
 {
   tree ptr = gimple_call_arg (call, 0);
   tree alias_align = gimple_call_arg (call, 1);
-  tree mask = gimple_call_arg (call, 2);
-  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
+  if (!tree_fits_uhwi_p (alias_align))
 return NULL_TREE;

+  if (mask_p)
+{
+  tree mask = gimple_call_arg (call, 2);
+  if (!integer_all_onesp (mask))
+   return NULL_TREE;
+} else {
+  tree basic_len = gimple_call_arg (call, 2);
+  if (!tree_fits_uhwi_p (basic_len))
+   return NULL_TREE;
+  unsigned int nargs = gimple_call_num_args (call);
+  tree bias = gimple_call_arg (call, nargs - 1);
+  gcc_assert (tree_fits_uhwi_p (bias));
+  tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
+  unsigned int len = tree_to_uhwi (biased_len);
+  unsigned int vect_len
+   = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
+  if (vect_len != len)
+   return NULL_TREE;
+}
+
   unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
   if (TYPE_ALIGN (vectype) != align)
 vectype = build_aligned_type (vectype, align);
@@ -5390,16 +5410,18 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree 
vectype)
   return fold_build2 (MEM_REF, vectype, ptr, offset);
 }

-/* Try to fold IFN_MASK_LOAD call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_LOAD call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */

 static bool
-gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
+gimple_fold_partial_load (gimple_stmt_iterator *gsi, gcall *call, bool mask_p)
 {
   tree lhs = gimple_call_lhs (call);
   if (!lhs)
 return false;

-  if (tree rhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (lhs)))
+  if (tree rhs
+  = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (lhs), mask_p))
 {
   gassign *new_stmt = gimple_build_assign (lhs, rhs);
   gimple_set_location (new_stmt, gimple_location (call));
@@ -5410,13 +5432,16 @@ gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall 
*call)
   return false;
 }

-/* Try to fold IFN_MASK_STORE call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_STORE call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */

 static bool
-gimple_fold_ma

[PATCH] testsuite: Adjust vect-bitfield-read-* with vect_shift and vect_long_long [PR107240]

2022-10-27 Thread Kewen.Lin via Gcc-patches
Hi,

The test cases vect-bitfield-read-* require vector shift
target support, so they need an explicit vect_shift effective
target requirement.  Besides, the vectype for the struct
in test cases vect-bitfield-read-{2,4} is vector of long long,
so we need to check effective target vect_long_long for them.
This patch helps to fix the remaining vect-bitfield-* test
failures on powerpc.

Tested on powerpc64-linux-gnu P7 and P8, as well as
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-
PR testsuite/107240

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-bitfield-read-1.c: Add effective target checking
vect_shift.
* gcc.dg/vect/vect-bitfield-read-3.c: Likewise.
* gcc.dg/vect/vect-bitfield-read-5.c: Likewise.
* gcc.dg/vect/vect-bitfield-read-6.c: Likewise.
* gcc.dg/vect/vect-bitfield-read-7.c: Likewise.
* gcc.dg/vect/vect-bitfield-read-2.c: Add effective target checking
vect_shift and replace vect_int with vect_long_long.
* gcc.dg/vect/vect-bitfield-read-4.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c | 3 ++-
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c | 3 ++-
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c | 1 +
 gcc/testsuite/gcc.dg/vect/vect-bitfield-read-7.c | 1 +
 7 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
index 01cf34fb444..42e50d9f0c8 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */

 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
index 1a4a1579c14..a9aeefcd72c 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2.c
@@ -1,4 +1,5 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */

 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
index 849f4a017e1..c7d0fd26bad 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-3.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */

 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
index 5bc9c412e96..6a3ed8c0c6f 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-4.c
@@ -1,4 +1,5 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+/* { dg-require-effective-target vect_long_long } */

 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
index 1dc24d3eded..b2889df8a0a 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-5.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */

 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
index 7d24c299758..2445f531be2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-6.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */

 #include 
 #include "tree-vect.h"
diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-7.c 
b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-7.c
index 3b505db2bd3..4b1ec8a6dab 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-7.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-7.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */

 #include 
 #include "tree-vect.h"
--
2.27.0


Re: [PATCH] testsuite: Fix failure in test pr105586.c [PR107171]

2022-10-25 Thread Kewen.Lin via Gcc-patches
Hi Surya,

on 2022/10/14 01:02, Surya Kumari Jangala via Gcc-patches wrote:
> testsuite: Fix failure in test pr105586.c [PR107171]
> 
> The test pr105586.c fails on a big endian system when run in 32bit
> mode. The failure occurs as the test case does not guard against
> unsupported __int128.
> 

I thought this was taken as an obvious fix that didn't need a
review/approval, but I just noticed it's not committed yet, so I
assume you meant to ask for an approval?

Then this patch is OK, it's even obvious, thanks!

BR,
Kewen

> 2022-10-13  Surya Kumari Jangala  
> 
> gcc/testsuite/
>   PR testsuite/107171
>   * gcc.target/powerpc/pr105586.c: Guard against unsupported
>   __int128.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr105586.c 
> b/gcc/testsuite/gcc.target/powerpc/pr105586.c
> index bd397f5..3f88a09 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr105586.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105586.c
> @@ -1,4 +1,5 @@
>  /* { dg-options "-mdejagnu-tune=power4 -O2 -fcompare-debug 
> -fno-if-conversion -fno-guess-branch-probability" } */
> +/* { dg-require-effective-target int128 } */
>  
>  extern int bar(int i);


Re: vect: Fix wrong shift_n after widening on BE [PR107338]

2022-10-24 Thread Kewen.Lin via Gcc-patches
on 2022/10/24 20:55, Richard Biener wrote:
> On Mon, Oct 24, 2022 at 12:43 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR107338 shows, with the use of widening loads, the
>> container_type can become a wider type, which causes us to
>> get the wrong shift_n since the BIT_FIELD_REF offset actually
>> becomes bigger on BE.  Taking the case in PR107338 as an
>> example, at the beginning the container type is short and
>> the BIT_FIELD_REF offset is 8 and size is 4.  After unpacking
>> to the wider type int, the high 16 bits are zero, and viewed
>> as type int, its offset actually becomes 24.  So the
>> shift_n should be 4 (32 - 24 - 4) instead of 20 (32 - 8
>> - 4).
>>
>> I noticed that if we move the shift_n calculation early,
>> before the adjustments for widening loads (container type
>> change), it's based entirely on the original
>> container, so the shift_n calculated there is exactly what
>> we want and is independent of widening.  Besides, I
>> add a prec adjustment together with the current adjustments
>> for widening loads.  Although prec's subsequent uses don't
>> require this change for now, since the container type gets
>> changed, we should keep the corresponding prec consistent.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu, powerpc64-linux-gnu P7 and P8 and
>> powerpc64le-linux-gnu P9 and P10.
>>
>> Is it ok for trunk?
> 
> OK.

Thanks Richi, committed in r13-3474.

BR,
Kewen


vect: Fix wrong shift_n after widening on BE [PR107338]

2022-10-24 Thread Kewen.Lin via Gcc-patches
Hi,

As PR107338 shows, with the use of widening loads, the
container_type can become a wider type, which causes us to
get the wrong shift_n since the BIT_FIELD_REF offset actually
becomes bigger on BE.  Taking the case in PR107338 as an
example, at the beginning the container type is short and
the BIT_FIELD_REF offset is 8 and size is 4.  After unpacking
to the wider type int, the high 16 bits are zero, and viewed
as type int, its offset actually becomes 24.  So the
shift_n should be 4 (32 - 24 - 4) instead of 20 (32 - 8
- 4).
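
The arithmetic above can be checked with a small standalone sketch.
The helper name is hypothetical; it only mirrors the big-endian
formula "prec - shift_n - mask_width" used in the patch.

```c
/* Shift count needed to bring a bit-field down to bit 0 on a
   big-endian target.  PREC is the container precision in bits, OFFSET
   the BIT_FIELD_REF offset and WIDTH the field size in bits.  */
static unsigned int
be_shift_count (unsigned int prec, unsigned int offset, unsigned int width)
{
  return prec - offset - width;
}
```

For the PR107338 case, be_shift_count (16, 8, 4) on the original short
container and be_shift_count (32, 24, 4) on the widened int container
with the adjusted offset both give 4, while mixing the widened
precision with the stale offset, be_shift_count (32, 8, 4), gives the
wrong value 20.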

I noticed that if we move the shift_n calculation early,
before the adjustments for widening loads (container type
change), it's based entirely on the original
container, so the shift_n calculated there is exactly what
we want and is independent of widening.  Besides, I
add a prec adjustment together with the current adjustments
for widening loads.  Although prec's subsequent uses don't
require this change for now, since the container type gets
changed, we should keep the corresponding prec consistent.

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu, powerpc64-linux-gnu P7 and P8 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-

PR tree-optimization/107338

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Move
shift_n calculation before the adjustments for widening loads.
---
 gcc/tree-vect-patterns.cc | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 777ba2f5903..01094e8cb86 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1925,6 +1925,16 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);

+  /* Calculate shift_n before the adjustments for widening loads, otherwise
+ the container may change and we have to consider offset change for
+ widening loads on big endianness.  The shift_n calculated here can be
+ independent of widening.  */
+  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
+  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
+  if (BYTES_BIG_ENDIAN)
+shift_n = prec - shift_n - mask_width;
+
   /* We move the conversion earlier if the loaded type is smaller than the
  return type to enable the use of widening loads.  */
   if (TYPE_PRECISION (TREE_TYPE (container)) < TYPE_PRECISION (ret_type)
@@ -1935,6 +1945,7 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
   NOP_EXPR, container);
   container = gimple_get_lhs (pattern_stmt);
   container_type = TREE_TYPE (container);
+  prec = tree_to_uhwi (TYPE_SIZE (container_type));
   vectype = get_vectype_for_scalar_type (vinfo, container_type);
   append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
 }
@@ -1953,12 +1964,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, 
stmt_vec_info stmt_info,
shift_first = false;
 }

-  unsigned HOST_WIDE_INT shift_n = bit_field_offset (bf_ref).to_constant ();
-  unsigned HOST_WIDE_INT mask_width = bit_field_size (bf_ref).to_constant ();
-  unsigned HOST_WIDE_INT prec = tree_to_uhwi (TYPE_SIZE (container_type));
-  if (BYTES_BIG_ENDIAN)
-shift_n = prec - shift_n - mask_width;
-
   /* If we don't have to shift we only generate the mask, so just fix the
  code-path to shift_first.  */
   if (shift_n == 0)
--
2.35.4


Re: [PATCH] rs6000: using li/lis+oris/xoris to build constants

2022-10-20 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

Sorry for the late review; some comments are inline.

on 2022/8/24 16:13, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> PR106708 contains some constants which can be supported by li/lis + oris/xoris.
> 
> For constant C:
> if ((c & 0x80008000ULL) == 0x8000ULL) or say:
> 32(0)+1(1)+15(x)+1(0)+15(x), we could use li+oris to build constant 'C'.
> 
> if ((c & 0x8000ULL) == 0x8000ULL) or say:
> 32(1)+16(x)+1(1)+15(x), using li+xoris would be ok.
> 
> if ((c & 0xULL) == 0x) or say:
> 32(1)+1(0)+15(x)+16(0), using lis+xoris would be ok.
> 

It may be good to explain the proposed notation "N(M)" as N
consecutive bits of value M (with x for M meaning either 1 or 0).
I'm not sure whether it's good to use "||" for concatenation as the
ISA does; the con is that it can be misinterpreted as logical "or".

Or maybe just expand all the low 32 bits and use "1..." or "0..." for the
high 32 bits.
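
To make the li + oris and li + xoris cases concrete, here is a small
host-side sketch that emulates the three instructions on a 64-bit
value.  The emu_* and build_* names are hypothetical, and the
constants used with it are only illustrative; this is not the
rs6000_emit_set_long_const code.

```c
#include <stdint.h>

/* li: load a 16-bit immediate, sign-extended to 64 bits.  */
static uint64_t
emu_li (uint16_t imm)
{
  return ((uint64_t) imm ^ 0x8000) - 0x8000;
}

/* oris: OR the immediate, shifted left by 16 bits, into REG.  */
static uint64_t
emu_oris (uint64_t reg, uint16_t imm)
{
  return reg | ((uint64_t) imm << 16);
}

/* xoris: XOR the immediate, shifted left by 16 bits, into REG.  */
static uint64_t
emu_xoris (uint64_t reg, uint16_t imm)
{
  return reg ^ ((uint64_t) imm << 16);
}

/* Build C with li + oris.  Valid when the high 32 bits are zero,
   bit 31 is one and bit 15 is zero.  */
static uint64_t
build_li_oris (uint64_t c)
{
  return emu_oris (emu_li (c & 0xffff), (c >> 16) & 0xffff);
}

/* Build C with li + xoris.  Valid when the high 32 bits are all ones
   and bit 15 is one, so that li already yields the right high bits.  */
static uint64_t
build_li_xoris (uint64_t c)
{
  uint64_t low = emu_li (c & 0xffff);
  return emu_xoris (low, ((c ^ low) >> 16) & 0xffff);
}
```

For example, 0x0000000080001234 is rebuilt by build_li_oris and
0xffffffff12348765 by build_li_xoris; in both cases lis alone would
not do, because its sign extension would give the wrong high 32 bits.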

> This patch updates rs6000_emit_set_long_const to support these forms.
> Bootstrap and regtest pass on ppc64 and ppc64le.
> 
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu)
> 
> 
>   PR target/106708
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Using li/lis +
>   oris/xoris to build constants.

Nit: Support constants which can be built with li + oris or li/lis + xoris?

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106708.c: New test.
>   * gcc.target/powerpc/pr106708.h: New file.
>   * gcc.target/powerpc/pr106708_1.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc   | 22 +++
>  gcc/testsuite/gcc.target/powerpc/pr106708.c   | 10 +
>  gcc/testsuite/gcc.target/powerpc/pr106708.h   |  9 
>  gcc/testsuite/gcc.target/powerpc/pr106708_1.c | 17 ++
>  4 files changed, 58 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106708_1.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..243247fb838 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10112,6 +10112,7 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
> +  HOST_WIDE_INT orig_c = c;
>  
>ud1 = c & 0x;
>c = c >> 16;
> @@ -10137,6 +10138,27 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
>   gen_rtx_IOR (DImode, copy_rtx (temp),
>GEN_INT (ud1)));
>  }
> +  else if (ud4 == 0 && ud3 == 0 && (ud2 & 0x8000) && !(ud1 & 0x8000))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +
> +  /* li+oris */
> +  emit_move_insn (copy_rtx (temp), GEN_INT (ud1));

Nit: per the previous discussion on another patch, copy_rtx isn't necessary here?

> +  emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (temp),
> +  GEN_INT (ud2 << 16)));
> +}

I think the hunk above can be moved into the existing "(ud3 == 0 && ud4 == 0)"
handling branch (see the diff context below), where ud2 & 0x8000 is already
asserted, which also saves a check.

> +  else if ((ud4 == 0x && ud3 == 0x)
> +&& ((ud1 & 0x8000) || (ud1 == 0 && !(ud2 & 0x8000
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +
> +  HOST_WIDE_INT imm = (ud1 & 0x8000) ? ((ud1 ^ 0x8000) - 0x8000)
> +  : ((ud2 << 16) - 0x8000);
> +  /* li/lis + xoris */
> +  emit_move_insn (copy_rtx (temp), GEN_INT (imm));
> +  emit_move_insn (dest, gen_rtx_XOR (DImode, copy_rtx (temp),
> +  GEN_INT (orig_c ^ imm)));
> +}

Same comment for copy_rtx.

>else if (ud3 == 0 && ud4 == 0)
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106708.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106708.c
> new file mode 100644
> index 000..6445fa47747
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106708.c
> @@ -0,0 +1,10 @@
> +/* PR target/106708 */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-do compile { target has_arch_ppc64 } } */
> +

Put dg-do as the first line.  If you want has_arch_ppc64 to come after dg-options,
separate it into a dg-require-effective-target.

> +#include "pr106708.h"
> +
> +/* { dg-final { scan-assembler-times {\mli\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mlis\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\moris\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxoris\M} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106708.h 
> b/gcc/testsuite/gcc.target/powerpc/pr106708.h
> new file mode 100644
> index 000..a

Re: [PATCH v3, rs6000] Change mode and insn condition for VSX scalar extract/insert instructions

2022-10-20 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Sorry for the late review; some comments are inline.

on 2022/9/7 15:44, HAO CHEN GUI wrote:
> Hi,
> 
>   For scalar extract/insert instructions, exponent field can be stored in a
> 32-bit register. So this patch changes the mode of exponent field from DI to
> SI. The instructions using DI registers can be invoked with -mpowerpc64 in a
> 32-bit environment. The patch changes insn condition from TARGET_64BIT to
> TARGET_POWERPC64 for those instructions.
> 
>   This patch also changes prototypes of relevant built-ins and effective
> target of test cases.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-09-07  Haochen Gui  
> 
> gcc/
>   * config/rs6000/rs6000-builtins.def
>   (__builtin_vsx_scalar_extract_exp): Set return type to const unsigned
>   int.
>   (__builtin_vsx_scalar_extract_sig): Set return type to const unsigned
>   long long.
>   (__builtin_vsx_scalar_insert_exp): Set type of second argument to
>   unsigned int.
>   (__builtin_vsx_scalar_insert_exp_dp): Likewise.
>   * config/rs6000/vsx.md (xsxexpdp): Set mode of first operand to
>   SImode.  Remove TARGET_64BIT from insn condition.
>   (xsxsigdp): Change insn condition from TARGET_64BIT to TARGET_POWERPC64.
>   (xsiexpdp): Change insn condition from TARGET_64BIT to
>   TARGET_POWERPC64.  Set mode of third operand to SImode.
>   (xsiexpdpf): Set mode of third operand to SImode.  Remove TARGET_64BIT
>   from insn condition.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/bfp/scalar-extract-exp-0.c: Change effective
>   target from lp64 to has_arch_ppc64.
>   * gcc.target/powerpc/bfp/scalar-extract-exp-6.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-0.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-extract-sig-6.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-0.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-12.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-13.c: Likewise.
>   * gcc.target/powerpc/bfp/scalar-insert-exp-3.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f76f54793d7..ca2a1d7657e 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2847,17 +2847,17 @@
>pure vsc __builtin_vsx_lxvl (const void *, signed long);
>  LXVL lxvl {}
> 
> -  const signed long __builtin_vsx_scalar_extract_exp (double);
> +  const unsigned int __builtin_vsx_scalar_extract_exp (double);
>  VSEEDP xsxexpdp {}
> 

With the relevant define_insn condition change and this prototype
change, I think this bif can work in a 32-bit environment.  So
should it be moved to section [power9] instead of [power9-64]?
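
For reference, the exponent field of a double is only 11 bits wide,
which is why a 32-bit result is enough.  A portable sketch of what the
extract-exponent operation returns, as I understand it (the helper
name is hypothetical, and it assumes double is IEEE 754 binary64):

```c
#include <stdint.h>
#include <string.h>

/* Return the biased exponent field of X, i.e. the 11 bits just below
   the sign bit of its IEEE 754 binary64 representation.  */
static unsigned int
extract_exp_portable (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (unsigned int) ((bits >> 52) & 0x7ff);
}
```

The largest possible result is 0x7ff, so unsigned int is wide enough
regardless of whether the target is 32-bit or 64-bit.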

If we want this supported on 32-bit, the related documentation
and test cases need updating accordingly.

For the documentation, see for example "The *scalar_extract_exp* and
scalar_extract_sig functions require *a 64-bit environment*
supporting ISA 3.0 " in [1].

For the test cases, please see the separate comments in the test case
part below.

[1] 
https://gcc.gnu.org/onlinedocs//gcc/PowerPC-AltiVec-Built-in-Functions-Available-on-ISA-3_002e0.html

The above comments also apply to the bif
__builtin_vsx_scalar_insert_exp_dp.


> -  const signed long __builtin_vsx_scalar_extract_sig (double);
> +  const unsigned long long __builtin_vsx_scalar_extract_sig (double);
>  VSESDP xsxsigdp {}
> 
>const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
> -unsigned long long);
> + unsigned int);
>  VSIEDP xsiexpdp {}
> 
> -  const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned long 
> long);
> +  const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned int);
>  VSIEDPF xsiexpdpf {}
> 
>pure vsc __builtin_vsx_xl_len_r (void *, signed long);
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..9d3a2340a79 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5095,10 +5095,10 @@ (define_insn "xsxexpqp_"
> 
>  ;; VSX Scalar Extract Exponent Double-Precision
>  (define_insn "xsxexpdp"
> -  [(set (match_operand:DI 0 "register_operand" "=r")
> - (unspec:DI [(match_operand:DF 1 "vsx_register_operand" "wa")]
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> + (unspec:SI [(match_operand:DF 1 "vsx_register_operand" "wa")]
>UNSPEC_VSX_SXEXPDP))]
> -  "TARGET_P9_VECTOR && TARGET_64BIT"
> +  "TARGET_P9_VECTOR"
>"xsxexpdp %0,%x1"
>[(set_attr "type" "integer")])
> 
> @@ -5116,7 +5116,7 @@ (define_insn "xsxsigdp"
>[(set (match_operand:DI 0 "register_operand" "=r")
>   (unspec:DI [(match_operand:DF 1 "vsx_register_operand" "wa")]
>UN

Re: [PATCH] vect: Try folding first for shifted value generation [PR107240]

2022-10-19 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2022/10/19 15:43, Richard Biener wrote:
> On Wed, Oct 19, 2022 at 5:18 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR107240 shows, when both the value to be shifted and the
>> count used for shifting are constants, it doesn't actually
>> require the target to support vector shift operations.
>>
>> This patch tries fold_build2 first for the generation of the
>> shifted value.  If it's folded, the shift is gone;
>> otherwise it's the same as before.
>>
>> It helps to resolve the failures of vect-bitfield-write-{2,3}.c
>> on Power.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux,
>> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> PR tree-optimization/107240
>>
>> gcc/ChangeLog:
>>
>> * tree-vect-patterns.cc (vect_recog_bit_insert_pattern): Attempt to
>> fold shifted value.
>> ---
>>  gcc/tree-vect-patterns.cc | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> index 6afd57a50c4..3beda774ec3 100644
>> --- a/gcc/tree-vect-patterns.cc
>> +++ b/gcc/tree-vect-patterns.cc
>> @@ -2098,9 +2098,11 @@ vect_recog_bit_insert_pattern (vec_info *vinfo, 
>> stmt_vec_info stmt_info,
>>tree shifted = value;
>>if (shift_n)
>>  {
>> +  tree shifted_tmp
>> +   = fold_build2 (LSHIFT_EXPR, container_type, value, shift);
>>pattern_stmt
>> = gimple_build_assign (vect_recog_temp_ssa_var (container_type),
>> -  LSHIFT_EXPR, value, shift);
>> +  shifted_tmp);
> 
> The canonical way would be to use
> 
>  gimple_seq stmts = NULL;
>  shifted = gimple_build (&stmts, LSHIFT_EXPR, container_type,
> value, shift);
>  if (!gimple_seq_empty_p (stmts))
>append_pattern_def_seq (vinfo, stmt_info,
> gimple_seq_first_stmt (stmts));
> 
> That also avoids the spurious val = constant; with your patch.
> 

Thanks for the suggestion!  This works well in local testing of
those two cases.

I searched around, and it seems gimple_build doesn't provide an
interface for its users to specify an SSA name for the result.
Previously we used vect_recog_temp_ssa_var for the shifted
result, but I think that's trivial.

I'll do full testing as before and push it if
everything goes well.  Thanks again.

BR,
Kewen

> OK if that works.
> 
> thanks,
> Richard.
> 
>>append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
>>shifted = gimple_get_lhs (pattern_stmt);
>>  }
>> --
>> 2.27.0


[PATCH] rs6000/test: Support vect_long_long effective target

2022-10-18 Thread Kewen.Lin via Gcc-patches
Hi,

Currently the effective target vect_long_long doesn't have a
Power-specific check; I think it's an oversight.  This patch
adds the support, which checks for has_arch_pwr8,
since we set rs6000_vector_unit[V2DImode] as:

  (TARGET_P8_VECTOR) ? VECTOR_P8_VECTOR : VECTOR_NONE;

which means its full support starts from ISA 2.07.
Although ISA 2.06 has some instructions like lxvd2x
and stxvd2x etc., since this is used for testing, checking for
ISA 2.07 is more sensible.

Tested well on powerpc64-linux-gnu P7 and P8, as well
as powerpc64le-linux-gnu P9 and P10.

As testing results show, it adds some testing coverage.

I'm going to push this soon if no objections.

BR,
Kewen
-
gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vect_long_long): Add
support for powerpc*-*-*.
---
 gcc/testsuite/lib/target-supports.exp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index fdd88e6a516..5eb7743b53a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7059,7 +7059,10 @@ proc check_effective_target_vect_long_long { } {
 || ([istarget mips*-*-*]
 && [et-is-effective-target mips_msa])
 || ([istarget s390*-*-*]
-&& [check_effective_target_s390_vx]) }}]
+&& [check_effective_target_s390_vx])
+|| ([istarget powerpc*-*-*]
+&& ![istarget powerpc-*-linux*paired*]
+&& [check_effective_target_has_arch_pwr8]) }}]
 }



[PATCH] vect: Try folding first for shifted value generation [PR107240]

2022-10-18 Thread Kewen.Lin via Gcc-patches
Hi,

As PR107240 shows, when both the value to be shifted and the
count used for shifting are constants, it doesn't actually
require a target to support vector shift operations.

This patch tries fold_build2 first for the generation of the
shifted value; if it folds, the shift is gone, otherwise
the result is the same as before.

It makes the failures of vect-bitfield-write-{2,3}.c
go away on Power.
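
To make the idea concrete, here is a minimal toy model in plain C.
It is only a sketch: struct fold_result and try_fold_lshift are
hypothetical names for this mail and do not use GCC's tree or gimple
APIs.  It shows that when both operands are constants the shift can
be computed up front, so no shift operation has to be emitted.

```c
#include <limits.h>

/* Hypothetical result type for this sketch, not a GCC data structure.  */
struct fold_result
{
  int folded;           /* Nonzero if no shift operation is needed.  */
  unsigned long value;  /* The folded constant when FOLDED is nonzero.  */
};

/* Toy analogue of trying a fold before building a shift: fold only when
   both the value and the count are known constants and the count is in
   range for the type.  */
struct fold_result
try_fold_lshift (int value_is_const, unsigned long value,
                 int count_is_const, unsigned int count)
{
  struct fold_result r = { 0, 0 };
  if (value_is_const && count_is_const
      && count < sizeof (unsigned long) * CHAR_BIT)
    {
      r.folded = 1;
      r.value = value << count;
    }
  return r;
}
```

When the fold does not happen, a caller would fall back to emitting
the shift, which matches the "same as before" path described above.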

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/107240

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_recog_bit_insert_pattern): Attempt to
fold shifted value.
---
 gcc/tree-vect-patterns.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 6afd57a50c4..3beda774ec3 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -2098,9 +2098,11 @@ vect_recog_bit_insert_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   tree shifted = value;
   if (shift_n)
 {
+  tree shifted_tmp
+   = fold_build2 (LSHIFT_EXPR, container_type, value, shift);
   pattern_stmt
= gimple_build_assign (vect_recog_temp_ssa_var (container_type),
-  LSHIFT_EXPR, value, shift);
+  shifted_tmp);
   append_pattern_def_seq (vinfo, stmt_info, pattern_stmt);
   shifted = gimple_get_lhs (pattern_stmt);
 }
--
2.27.0


Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-18 Thread Kewen.Lin via Gcc-patches
Hi!

on 2022/10/19 00:52, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Oct 18, 2022 at 10:17:30AM -0500, will schmidt wrote:
>> On Mon, 2022-10-17 at 13:08 -0500, Segher Boessenkool wrote:
>>> It did not happen in GCC 9 obviously.  Do you want to take a
>>> shot?  It
>>> doesn't have to be all at once, it's probably best if not even -- as
>>> I
>>> wrote in the commit message, the flag always was used to mean
>>> different
>>> things.
>>
>> As long as it's OK to be removed, I'll certainly take a shot at it. 
> 
> It is.  Thanks!
> 
>> With that in mind that may simplify things for me here.
>> I expect that
>> anything currently guarded by DIRECT_MOVE should instead be guarded by
>> POWER8.
> 
> Yes.  Which works just as well for the places that actually check
> whether the direct move insns can be used, and for everything else that
> wants p8 :-)
> 

IIUC, this discussion is saying we want to replace all TARGET_DIRECT_MOVE
with TARGET_POWER8?  I may be missing something, but it sounds wrong to me.
Currently TARGET_DIRECT_MOVE is used to guard many places which rely on
vector (VSR) support.  One example is the following hunk quoted from [1]:

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe55..be4fb902049d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3407,11 +3407,11 @@ (define_insn "vsx_extract_"
>if (element == VECTOR_ELEMENT_SCALAR_64BIT)
>  {
>if (op0_regno == op1_regno)
>   return ASM_COMMENT_START " vec_extract to same register";
>  
> -  else if (INT_REGNO_P (op0_regno) && TARGET_DIRECT_MOVE
> +  else if (INT_REGNO_P (op0_regno) && TARGET_POWER8
>  && TARGET_POWERPC64)
>   return "mfvsrd %0,%x1";

Now we want TARGET_POWER8 to stay on if users specify -mcpu=power8 -mno-vsx.
If we guard the above with TARGET_POWER8, it would mean we can still
generate "mfvsrd", but it shouldn't be available as it relies on VSX support,
which has been disabled explicitly.  So shouldn't the uses of
TARGET_DIRECT_MOVE which are for actual "direct move" insns be
guarded with TARGET_P8_VECTOR instead?
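
The concern can be sketched with a toy flag model.  Everything below
is hypothetical: the SK_* bits and sk_* helpers are made up for this
mail and are not the real OPTION_MASK_* values.  The sketch assumes
that -mno-vsx also turns off P8_VECTOR while the proposed POWER8 bit
stays set.

```c
/* Hypothetical flag bits for this sketch; the real OPTION_MASK_* values
   are not reproduced here.  */
enum
{
  SK_VSX = 1 << 0,
  SK_P8_VECTOR = 1 << 1,
  SK_POWER8 = 1 << 2
};

/* Model -mcpu=power8, optionally followed by -mno-vsx.  Assumption:
   disabling VSX also disables P8_VECTOR, while POWER8 stays set.  */
unsigned int
sk_flags_for_power8 (int no_vsx)
{
  unsigned int flags = SK_VSX | SK_P8_VECTOR | SK_POWER8;
  if (no_vsx)
    flags &= ~(unsigned int) (SK_VSX | SK_P8_VECTOR);
  return flags;
}

/* Would mfvsrd be allowed if the pattern were guarded by POWER8 only?  */
int
sk_mfvsrd_ok_guarded_by_power8 (unsigned int flags)
{
  return (flags & SK_POWER8) != 0;
}

/* Would mfvsrd be allowed if the pattern were guarded by P8_VECTOR?  */
int
sk_mfvsrd_ok_guarded_by_p8_vector (unsigned int flags)
{
  return (flags & SK_P8_VECTOR) != 0;
}
```

In this model the POWER8 guard still admits mfvsrd under -mno-vsx,
while the P8_VECTOR guard rejects it, which is the behavior I would
expect for insns that need VSRs.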

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603724.html

BR,
Kewen


Re: [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-10-17 Thread Kewen.Lin via Gcc-patches
Hi Will,

Thanks for fixing this, some comments are inline as below.

on 2022/9/20 00:13, will schmidt wrote:
> [PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]
> 
> Hi,
>   The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
> and can be disabled by dependent options when it should not be.
> This manifests in the issue seen in PR101865 where -mno-vsx
> mistakenly disables _ARCH_PWR8.
> 
> This change replaces the relevant TARGET_DIRECT_MOVE references
> with a TARGET_POWER8 entry so that the direct_move and power8
> features can be enabled or disabled independently.
> 
> This is done via the OPTION_MASK definitions, so this
> means that some references to the OPTION_MASK_DIRECT_MOVE
> option are now replaced with OPTION_MASK_POWER8.
> 
> The existing (and rather lengthy) commentary for DIRECT_MOVE remains
> in place in rs6000-c.cc:rs6000_target_modify_macros().  The
> if-defined logic there will now set a __DIRECT_MOVE__ define when
> TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
> purposes, but is otherwise unused.  This can be removed in a
> subsequent patch, or in an update of this patch, depending on feedback.

The mentioned commentary for DIRECT_MOVE looks out of date: since
option direct_move is marked as Undocumented & WarnRemoved, it can't
be enabled/disabled explicitly.  Personally I'm inclined not to
introduce the __DIRECT_MOVE__ define, since we don't have a separate
option for it now, and if users want to check the availability,
they can check __VSX__ && _ARCH_PWR8 instead.
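
As a small sketch of what such a user-side check could look like
(the macro name HAVE_P8_DIRECT_MOVE and the helper function are
hypothetical, only __VSX__ and _ARCH_PWR8 are existing predefines):

```c
/* Hypothetical helper macro: nonzero only when both VSX and the
   power8 architecture level are available to the compilation.  */
#if defined (__VSX__) && defined (_ARCH_PWR8)
# define HAVE_P8_DIRECT_MOVE 1
#else
# define HAVE_P8_DIRECT_MOVE 0
#endif

/* Return the compile-time answer so callers can select a code path.  */
int
have_p8_direct_move (void)
{
  return HAVE_P8_DIRECT_MOVE;
}
```

With -mcpu=power8 -mno-vsx this would evaluate to 0 once _ARCH_PWR8
no longer depends on VSX, since __VSX__ is not defined in that case.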

> 
> This regests cleanly (power8,power9,power10), and resolves
> PR 101865 as represented in the tests from (1/2).
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/
>   PR Target/101865
>   * config/rs6000/rs6000-builtin.cc
>   (rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
>   usage with TARGET_POWER8.
>   * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
>   Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
>   conditional with OPTION_MASK_POWER8.
>   * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
>   Add OPTION_MASK_POWER8 entry.
>   (POWERPC_MASKS): Same.
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
>   (rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
>   * config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
>   * config/rs6000/vsx.md (vsx_extract_): Replace
>   TARGET_DIRECT_MOVE usage with TARGET_POWER8.
>   (define_peephole2): Same.
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 3ce729c1e6de..91a0f39bd796 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
> fncode)
>  case ENB_P7:
>return TARGET_POPCNTD;
>  case ENB_P7_64:
>return TARGET_POPCNTD && TARGET_POWERPC64;
>  case ENB_P8:
> -  return TARGET_DIRECT_MOVE;
> +  return TARGET_POWER8;
>  case ENB_P8V:
>return TARGET_P8_VECTOR;
>  case ENB_P9:
>return TARGET_MODULO;
>  case ENB_P9_64:
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index ca9cc42028f7..41d51b039061 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, 
> HOST_WIDE_INT flags)
>   turned off in any of the following conditions:
>   1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
>   disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
>   enabled.
>   2. TARGET_VSX is off.  */

As mentioned above, the comments might need some updates.

> -  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
> +  if ((OPTION_MASK_DIRECT_MOVE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
> +  if ((flags & OPTION_MASK_POWER8) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
>if ((flags & OPTION_MASK_MODULO) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> diff --git a/gcc/config/rs6000/rs6000-cpus.def 
> b/gcc/config/rs6000/rs6000-cpus.def
> index c3825bcccd84..c873f6d58989 100644
> --- a/gcc/config/rs6000/rs6000-cpus.def
> +++ b/gcc/config/rs6000/rs6000-cpus.def
> @@ -48,10 +48,11 @@
> system.  */
>  #define ISA_2_7_MASKS_SERVER (ISA_2_6_MASKS_SERVER   \
>| OPTION_MASK_P8_VECTOR\
>| OPTION_MASK_CRYPTO   \
>| OPTION_MASK_DIRECT_MOVE  \
> +  | OPTION_MASK_POWER8   \
>  

Re: [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-10-17 Thread Kewen.Lin via Gcc-patches
Hi Will,

Some comments are inline.

on 2022/9/20 00:05, will schmidt wrote:
> [PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option.
> 
> Hi,
> 
> This adds an assortment of tests to exercise the -mno-vsx option and
> confirm the impacts on the ARCH_PWR8 define.
> 
> These are based on and inspired by PR 101865, which
> reports that _ARCH_PWR8 is disabled when -mno-vsx
> is passed on the commandline.
> 
> There are a small number of failures introduced by these tests,
> those are resolved with the changes in part 2.
> 
> OK for trunk?
> Thanks,
> -Will
> 
> 
> gcc/testsuite:
>   * gcc.target/powerpc/predefine_p7-novsx.c: New test.
>   * gcc.target/powerpc/predefine_p8-noaltivec-novsx.c: New test.
>   * gcc.target/powerpc/predefine_p8-novsx.c: New test.
>   * gcc.target/powerpc/predefine_p9-novsx.c: New test.
>   * gcc.target/powerpc/predefine_pragma_vsx.c: New test.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
> new file mode 100644
> index ..e842025b4d3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
> @@ -0,0 +1,9 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines gets set

Nit: s/gets/get.

> + * when we specify power7, plus options.
> +/* This is a variation of the test at issue in GCC PR 101865 */
> +/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define 
> _ARCH_PWR8 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define __VSX__ 
> 1($|\\n)" } } */
> +/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define __ALTIVEC__ 
> 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
> new file mode 100644
> index ..c3b705ca3d48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
> @@ -0,0 +1,7 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling both 
> altivec and vsx. */
> +/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-altivec -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> _ARCH_PWR8 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> _ARCH_PWR9 1($|\\n)" } } */
> +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> __VSX__ 1($|\\n)" } } */
> +/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
> __ALTIVEC__ 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
> new file mode 100644
> index ..8b6c69b20104
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
> @@ -0,0 +1,8 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
> +   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
> +/* This is the primary test at issue in GCC PR 101865 */

Nit: the last comment is missing a period.

> +/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-vsx" } */
> +/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define _ARCH_PWR8 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p8-novsx.i "(^|\\n)#define __VSX__ 
> 1($|\\n)" } } */
> +/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define __ALTIVEC__ 
> 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c 
> b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
> new file mode 100644
> index ..eef42c111663
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
> @@ -0,0 +1,10 @@
> +/* { dg-do preprocess } */
> +/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
> +   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
> +/* This is the primary test at issue in GCC PR 101865 */

Nit: it seems this part of the comments was copied from the above case?
It would be better with "s/ARCH_PWR8 define/ARCH_PWR8 and ARCH_PWR9 defines/"
and with the last sentence removed, since power9 isn't the primary test.

> +/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */
> +/* {xfail *-*-*} */
> +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR8 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR9 
> 1($|\\n)"  } } */
> +/* { dg-final { scan-file-not predefine_p9-novsx.i "(^|\\n)#define __VSX__ 
> 1($|\\n)" } } */
> +/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define __ALTIVEC__ 
> 1($|\\n)" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p

Re: [PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-10-17 Thread Kewen.Lin via Gcc-patches
Hi Iain,

on 2022/10/13 18:09, Iain Sandoe wrote:
> 
> 
>> On 12 Oct 2022, at 09:57, Iain Sandoe  wrote:
>>> On 12 Oct 2022, at 09:12, Kewen.Lin  wrote:
>>
>>> PR106680 shows that -m32 -mpowerpc64 is different from
>>> -mpowerpc64 -m32, this is determined by the way how we
>>> handle option powerpc64 in rs6000_handle_option.
>>>
>>> Segher pointed out this difference should be taken as
>>> a bug and we should ensure that option powerpc64 is
>>> independent of -m32/-m64.  So this patch removes the
>>> handlings in rs6000_handle_option and add some necessary
>>> supports in rs6000_option_override_internal instead.
>>>
>>> With this patch, if users specify -m{no-,}powerpc64, the
>>> specified value is honoured, otherwise, for 64bit it
>>> always enables OPTION_MASK_POWERPC64; while for 32bit
>>> and TARGET_POWERPC64 and OS_MISSING_POWERPC64, it disables
>>> OPTION_MASK_POWERPC64.
>>>
>>> btw, following Segher's suggestion, I did some tries to warn
>>> when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
>>> If warn for the case that powerpc64 is specified explicitly,
>>> there are some TCs using -m32 -mpowerpc64 on ppc64-linux,
>>> they need some updates, meanwhile the artificial run
>>> with "--target_board=unix'{-m32/-mpowerpc64}'" will have
>>> noisy warnings on ppc64-linux.  If warn for the case that
>>> it's specified implicitly, they can just be initialized by
>>> TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the 
>>> given cpu mask, we have to special case them and not to warn.
>>> As Segher's latest comment, I decide not to warn them and
>>> keep it consistent with before.
>>>
>>> Bootstrapped and regress-tested on:
>>> - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
>>> - powerpc64le-linux-gnu P9 and P10
>>> - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
>>>
>>> Hi Iain, could you help to test this new patch on darwin
>>> again?  Thanks in advance!
>>
>> I kicked off a bootstrap - and 'check-gcc-c' .. if all goes well, there will 
>> be an 
>> answer in ≈ 7hours.  If something fails, the answer will be sooner ;)
> 
> bootstrapped and tested on powerpc-darwin9, with default CPU configuration.
> I have not yet tried tuning or cpu configure options.
> 
> testresults compare “nominal" against a recent set (another day elapsed time
> would be needed for a proper regtest).

Sounds good!  Many thanks again for your help!

BR,
Kewen


Re: [PATCH V4] rs6000: cannot_force_const_mem for HIGH code rtx[PR106460]

2022-10-13 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/10/12 14:48, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> As the issue in PR106460, a rtx 'high:DI (symbol_ref:DI ("var_48")' is tried
> to store into constant pool and ICE occur.  But actually, this rtx represents
> partial incomplete address and can not be put into a .rodata section.
> 
> This patch updates rs6000_cannot_force_const_mem to return true for rtx(s) 
> with
> HIGH code, because these rtx(s) indicate part of address and are not ok for
> constant pool.
> 
> Below are some examples:
> (high:DI (const:DI (plus:DI (symbol_ref:DI ("xx") (const_int 12 [0xc])
> (high:DI (symbol_ref:DI ("var_1")..)))
> 
> This patch updated the patchV3 according previous comments.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602308.html
> 
> Bootstrap and regtest pass on ppc64 and ppc64le.
> Is this ok for trunk.

This patch is ok, thanks!

BR,
Kewen

> 
> BR,
> Jeff(Jiufu)
> 
>   PR target/106460
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Return true
>   for HIGH code rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106460.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc |  7 +--
>  gcc/testsuite/gcc.target/powerpc/pr106460.c | 12 
>  2 files changed, 17 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106460.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 5f347e9574f..dab66f9213a 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9759,8 +9759,11 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* If GET_CODE (x) is HIGH, the 'X' represets the high part of a 
> symbol_ref.
> + It can not be put into a constant pool.  e.g.
> + (high:DI (unspec:DI [(symbol_ref/u:DI ("*.LC0")..)
> + (high:DI (symbol_ref:DI ("var")..)).  */
> +  if (GET_CODE (x) == HIGH)
>  return true;
>  
>/* A TLS symbol in the TOC cannot contain a sum.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106460.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> new file mode 100644
> index 000..aae4b015bba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> @@ -0,0 +1,12 @@
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O1 -mdejagnu-cpu=power10" } */
> +
> +/* (high:DI (symbol_ref:DI ("var_48")..))) should not cause ICE. */
> +extern short var_48;
> +void
> +foo (double *r)
> +{
> +  if (var_48)
> +*r = 1234.5678;
> +}
> +




Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-10-12 Thread Kewen.Lin via Gcc-patches
Hi Segher!

on 2022/10/10 21:58, Segher Boessenkool wrote:
> On Mon, Oct 10, 2022 at 10:15:58AM +0800, Kewen.Lin wrote:
>> on 2022/10/4 05:15, Segher Boessenkool wrote:
>>> Right.  If mpowerpc64 is enabled while OS_MISSING_POWERPC64, warn for
>>> that; 
>>
>> Currently if option powerpc64 is enabled explicitly while 
>> OS_MISSING_POWERPC64,
>> there is no warning.  One typical case is -m32 compilation on ppc64.  I made
>> a patch to warn for this case as you suggested (btw, this change can be taken
>> separately from this rework), it caused some test cases to fail as below:
> 
> "Explicitly" means the user says "-m32 -mpowerpc64".
> 
> I wonder what "on powerpc64" means in what you say, and why that would
> matter?

I guess you meant to ask about "on ppc64"?  I meant to say "ppc64-linux",
sorry for the confusion.  On ppc64-linux, OS_MISSING_POWERPC64 is defined as
!TARGET_64BIT.  The explicit option "-m32 -mpowerpc64" didn't warn before,
but the patch mentioned above makes it warn, so some test cases need
updates.

> 
>> gcc.dg/vect/vect-82_64.c
>> gcc.dg/vect/vect-83_64.c
>> gcc.target/powerpc/bswap64-4.c
>> gcc.target/powerpc/ppc64-double-1.c
>> gcc.target/powerpc/pr106680-4.c 
>> gcc.target/powerpc/rs6000-fpint-2.c
>>
>> It's fine to fix them with one additional option "-w" to disable the warning.
>> But IIUC one concern is that if we want to test with 
>> "--target_board=unix'{-m32,
>> -m32/-mpowerpc64}'", the latter combination will always have this warning,
>> with one extra "-w" (that is -m32/-mpowerpc64/-w) can make some cases which
>> aim to check warning msg ineffective.  So maybe we want to re-consider it
>> (like just leaving it as before)?
> 
> There will always be false positives (and negatives!) if you put any
> warning options in RUNTESTFLAGS.  -w is merely louder than most :-)
> 
> But leave this as further improvement.  Maybe put in a comment.

OK.

> 
>>> and if mpowerpc64 was only implicit, disable it as well (and say
>>> we did!)
>>
>> But on ppc64 linux, for -m32 compilation mpowerpc64 is implicitly enabled
>> since it's with bi-arch supported, I made a patch to disable it as well as
>> warn it, it can't be bootstrapped since it warned for -m32 build (-Werror)
>> and failed.  So I refined it to something like:
>>
>> +  /* With RS6000_BI_ARCH defined (bi-architecture (32/64) 
>> supported),
>> + TARGET_DEFAULT has bit MASK_POWERPC64 on by default, to keep 
>> the
>> + behavior consistent (like: no warnings for -m32 on ppc64), we
>> + just sliently disable it.  Otherwise, disable it and warn.  */
>> +  rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
>> +#ifndef RS6000_BI_ARCH
>> +  warning (0, "powerpc64 is unexpected to be enabled on the "
>> +  "current OS");
>> +#endif
> 
> It has nothing to do with biarch.  Let's just not warn if it is so much
> work to do it correctly.  We never did before, and no one complained,
> how bad can it be :-)
> 

OK, I made a patch v2 which doesn't try to warn for them, fully tested it
and just posted at:

https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603350.html

BR,
Kewen


[PATCH v2] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-10-12 Thread Kewen.Lin via Gcc-patches
Hi,

PR106680 shows that -m32 -mpowerpc64 is different from
-mpowerpc64 -m32; this is determined by the way we
handle option powerpc64 in rs6000_handle_option.

Segher pointed out this difference should be taken as
a bug and we should ensure that option powerpc64 is
independent of -m32/-m64.  So this patch removes the
handling in rs6000_handle_option and adds the necessary
support in rs6000_option_override_internal instead.

With this patch, if users specify -m{no-,}powerpc64, the
specified value is honoured.  Otherwise, for 64-bit it
always enables OPTION_MASK_POWERPC64, while for 32-bit
with TARGET_POWERPC64 and OS_MISSING_POWERPC64, it disables
OPTION_MASK_POWERPC64.

btw, following Segher's suggestion, I made some attempts to warn
when OPTION_MASK_POWERPC64 is set for OS_MISSING_POWERPC64.
If we warn for the case where powerpc64 is specified explicitly,
some test cases using -m32 -mpowerpc64 on ppc64-linux
need updates, and the artificial run
with "--target_board=unix'{-m32/-mpowerpc64}'" will have
noisy warnings on ppc64-linux.  If we warn for the case where
it's specified implicitly, the flag can simply be initialized by
TARGET_DEFAULT (like -m32 on ppc64-linux) or set from the
given cpu mask, so we would have to special-case those and not warn.
Following Segher's latest comment, I decided not to warn for them and
to keep the behavior consistent with before.
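
The intended decision can be summarized with a small stand-alone
model.  This is only a sketch of the rules described above, not the
code in rs6000_option_override_internal; the function name and its
parameters are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the final powerpc64 setting.
   EXPLICIT_P: the user passed -mpowerpc64 or -mno-powerpc64.
   CURRENT: the flag value so far (user value, or the default/cpu mask).
   IS_64BIT: compiling for 64-bit.
   OS_MISSING: stands in for OS_MISSING_POWERPC64.  */
bool
resolve_powerpc64 (bool explicit_p, bool current, bool is_64bit,
                   bool os_missing)
{
  if (explicit_p)
    return current;   /* The specified value is honoured.  */
  if (is_64bit)
    return true;      /* 64-bit always enables it.  */
  if (current && os_missing)
    return false;     /* 32-bit, implicit, OS lacks support: disable.  */
  return current;
}
```

Since the result depends only on these inputs and not on the order in
which the options appeared, -m32 -mpowerpc64 and -mpowerpc64 -m32
resolve to the same value in this model.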

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
  - powerpc64le-linux-gnu P9 and P10
  - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}

Hi Iain, could you help to test this new patch on darwin
again?  Thanks in advance!

Is it ok for trunk if darwin testing goes well?

BR,
Kewen
-
PR target/106680

gcc/ChangeLog:

* common/config/rs6000/rs6000-common.cc (rs6000_handle_option): Remove
the adjustment for option powerpc64 in -m64 handling, and remove the
whole -m32 handling.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): When no
explicit powerpc64 option is provided, enable it for -m64.  For 32 bit
and OS_MISSING_POWERPC64, disable powerpc64 if it's enabled but not
specified explicitly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106680-1.c: New test.
* gcc.target/powerpc/pr106680-2.c: New test.
* gcc.target/powerpc/pr106680-3.c: New test.
* gcc.target/powerpc/pr106680-4.c: New test.

2022-10-12  Kewen Lin  
Iain Sandoe  
---
 gcc/common/config/rs6000/rs6000-common.cc | 11 --
 gcc/config/rs6000/rs6000.cc   | 37 ++-
 gcc/testsuite/gcc.target/powerpc/pr106680-1.c | 13 +++
 gcc/testsuite/gcc.target/powerpc/pr106680-2.c | 14 +++
 gcc/testsuite/gcc.target/powerpc/pr106680-3.c | 13 +++
 gcc/testsuite/gcc.target/powerpc/pr106680-4.c | 17 +
 6 files changed, 85 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-4.c

diff --git a/gcc/common/config/rs6000/rs6000-common.cc b/gcc/common/config/rs6000/rs6000-common.cc
index 8e393d08a23..c76b5c27bb6 100644
--- a/gcc/common/config/rs6000/rs6000-common.cc
+++ b/gcc/common/config/rs6000/rs6000-common.cc
@@ -119,19 +119,8 @@ rs6000_handle_option (struct gcc_options *opts, struct gcc_options *opts_set,
 #else
 case OPT_m64:
 #endif
-  opts->x_rs6000_isa_flags |= OPTION_MASK_POWERPC64;
   opts->x_rs6000_isa_flags |= (~opts_set->x_rs6000_isa_flags
   & OPTION_MASK_PPC_GFXOPT);
-  opts_set->x_rs6000_isa_flags |= OPTION_MASK_POWERPC64;
-  break;
-
-#ifdef TARGET_USES_AIX64_OPT
-case OPT_maix32:
-#else
-case OPT_m32:
-#endif
-  opts->x_rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
-  opts_set->x_rs6000_isa_flags |= OPTION_MASK_POWERPC64;
   break;

 case OPT_mminimal_toc:
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e6fa3ad0eb7..e37d99deb61 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3648,17 +3648,12 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_pointer_size = 32;
 }

-  /* Some OSs don't support saving the high part of 64-bit registers on context
- switch.  Other OSs don't support saving Altivec registers.  On those OSs,
- we don't touch the OPTION_MASK_POWERPC64 or OPTION_MASK_ALTIVEC settings;
- if the user wants either, the user must explicitly specify them and we
- won't interfere with the user's specification.  */
+  /* Some OSs don't support saving Altivec registers.  On those OSs, we don't
+ touch the OPTION_MASK_ALTIVEC settings; if the user wants it, the user
+ must explicitly specify it and we won't interfere with the user's
+ specification.  */

   set_masks = POWERPC_MASKS;
-#ifdef OS_MIS

Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-10-09 Thread Kewen.Lin via Gcc-patches
Hi Segher!

Thanks for the comments again!

on 2022/10/4 05:15, Segher Boessenkool wrote:
> On Fri, Sep 30, 2022 at 08:15:37PM +0800, Kewen.Lin wrote:
>> on 2022/9/30 01:11, Segher Boessenkool wrote:
>>>> +#ifdef OS_MISSING_POWERPC64
>>>> +  else if (OS_MISSING_POWERPC64)
>>>> +  /* It's unexpected to have OPTION_MASK_POWERPC64 on for OSes which
>>>> + miss powerpc64 support, so disable it.  */
>>>> +  rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
>>>> +#endif
>>>
>>> All silent stuff is always bad.
>>
>> OK, with more testings for replacing warning instead of silently disablement
>> I noticed that some disablement is needed, one typical case is -m32 
>> compilation
>> on ppc64, we have OPTION_MASK_POWERPC64 on from TARGET_DEFAULT which is used
>> for initialization (It makes sense to have it on in TARGET_DEFAULT because
>> of it's 64 bit cpu).  And -m32 compilation matches OS_MISSING_POWERPC64
>> (!TARGET_64BIT), so it's the case that we have an implicit 
>> OPTION_MASK_POWERPC64
>> on and OS_MISSING_POWERPC64 holds, but it's unexpected not to disable it but
>> warn it.
> 
> Right.  If mpowerpc64 is enabled while OS_MISSING_POWERPC64, warn for
> that; 
> 

Currently if option powerpc64 is enabled explicitly while OS_MISSING_POWERPC64,
there is no warning.  One typical case is -m32 compilation on ppc64.  I made
a patch to warn for this case as you suggested (btw, this change can be taken
separately from this rework), it caused some test cases to fail as below:

gcc.dg/vect/vect-82_64.c
gcc.dg/vect/vect-83_64.c
gcc.target/powerpc/bswap64-4.c
gcc.target/powerpc/ppc64-double-1.c
gcc.target/powerpc/pr106680-4.c 
gcc.target/powerpc/rs6000-fpint-2.c

It's fine to fix them with one additional option "-w" to disable the warning.
But IIUC one concern is that if we want to test with "--target_board=unix'{-m32,
-m32/-mpowerpc64}'", the latter combination will always have this warning,
and adding an extra "-w" (that is -m32/-mpowerpc64/-w) can make some cases which
aim to check warning messages ineffective.  So maybe we want to reconsider it
(like just leaving it as before)?


> and if mpowerpc64 was only implicit, disable it as well (and say
> we did!)

But on ppc64 linux, for -m32 compilation mpowerpc64 is implicitly enabled
since bi-arch is supported.  I made a patch to disable it as well as
warn about it, but it can't be bootstrapped since it warned for the -m32
build (-Werror) and failed.  So I refined it to something like:

+  /* With RS6000_BI_ARCH defined (bi-architecture (32/64) supported),
+ TARGET_DEFAULT has bit MASK_POWERPC64 on by default, to keep the
+ behavior consistent (like: no warnings for -m32 on ppc64), we
+ just sliently disable it.  Otherwise, disable it and warn.  */
+  rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
+#ifndef RS6000_BI_ARCH
+  warning (0, "powerpc64 is unexpected to be enabled on the "
+  "current OS");
+#endif


BR,
Kewen


Re: [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-09-30 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/9/30 04:31, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 24, 2022 at 04:17:07PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -14771,18 +14771,9 @@ rs6000_print_patchable_function_entry (FILE *file,
>> unsigned HOST_WIDE_INT patch_area_size,
>> bool record_p)
>>  {
>> -  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
>> -  /* When .opd section is emitted, the function symbol
>> - default_print_patchable_function_entry_1 is emitted into the .opd 
>> section
>> - while the patchable area is emitted into the function section.
>> - Don't use SECTION_LINK_ORDER in that case.  */
>> -  if (!(TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
>> -  && HAVE_GAS_SECTION_LINK_ORDER)
>> -flags |= SECTION_LINK_ORDER;
>> -  default_print_patchable_function_entry_1 (file, patch_area_size, record_p,
>> -flags);
>> +  default_print_patchable_function_entry (file, patch_area_size, record_p);
>>  }
> 
> Please don't define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY at all,
> instead, and remove this whole function?

This hook is still needed for "ELFv2 support rework" which
was just committed in r13-2984.  There is also a note
explaining this in the original mail: 

"btw, rs6000_print_patchable_function_entry can be dropped
but there is another rs6000 patch which needs this rs6000
specific hook rs6000_print_patchable_function_entry, not
sure which one gets landed first, so just leave it here."

> 
> The rs6000 changes are okay like that, thanks!

Thanks!

BR,
Kewen


Re: [PATCH v4] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-09-30 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the review comments!

on 2022/9/28 23:22, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 25, 2022 at 01:50:28PM +0800, Kewen.Lin wrote:
>> --- a/gcc/config/rs6000/rs6000-internal.h
>> +++ b/gcc/config/rs6000/rs6000-internal.h
>> @@ -183,10 +183,15 @@ extern tree rs6000_fold_builtin (tree fndecl 
>> ATTRIBUTE_UNUSED,
>>   tree *args ATTRIBUTE_UNUSED,
>>   bool ignore ATTRIBUTE_UNUSED);
>>
>> +extern void rs6000_print_patchable_function_entry (FILE *,
>> +   unsigned HOST_WIDE_INT,
>> +   bool);
>> +
>>  extern bool rs6000_passes_float;
>>  extern bool rs6000_passes_long_double;
>>  extern bool rs6000_passes_vector;
>>  extern bool rs6000_returns_struct;
>>  extern bool cpu_builtin_p;
>>
>> +
>>  #endif
> 
> No new random empty lines please.
> 
>> + point would be 2, 6 and 14.  It's possible to support those
>> + other counts of nops by not making a local entry point, but
>> + we don't have clear user cases for them, so leave them
> 
> "use cases"
> 
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -16717,9 +16717,13 @@ the area size or to remove it completely on a 
>> single function.
>>  If @code{N=0}, no pad location is recorded.
>>
>>  The NOP instructions are inserted at---and maybe before, depending on
>> -@var{M}---the function entry address, even before the prologue.
>> +@var{M}---the function entry address, even before the prologue.  On
>> +PowerPC with the ELFv2 ABI, for one function with dual entry points,
>> +the local entry point is taken as the function entry for generation.
> 
> I think "the local entry point is this function entry address" is a bit
> clearer.
> 
>> -The maximum value of @var{N} and @var{M} is 65535.
>> +The maximum value of @var{N} and @var{M} is 65535.  On PowerPC with the
>> +ELFv2 ABI, for one function with dual entry points, the supported values
>> +for @var{M} are 0, 2, 6 and 14.
> 
> "for a function"
> 
> Okay for trunk with those trivial changes.  Thanks!
> 

Updated per all the comments above, re-tested and committed as r13-2984.  Thanks!
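
For readers wondering where the supported values of M (0, 2, 6 and 14) come
from, here is a minimal sketch.  It assumes the two-instruction addis/addi
global entry prologue shown in the patch description, and it assumes that
the distance from the global entry to the local entry must be 2, 4, 8 or 16
instructions.  The helper names are hypothetical and are not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: returns true when M (the number
   of nops placed before the local entry point) is usable for an ELFv2
   function with dual entry points.  */
bool
elfv2_dual_entry_m_supported (unsigned int m)
{
  /* Assumption: the global entry prologue is the addis/addi pair.  */
  const unsigned int global_entry_insns = 2;
  unsigned int total = global_entry_insns + m;

  /* Assumption: the local entry offset, counted in instructions, must be
     a power of two between 2 and 16.  */
  return total >= 2 && total <= 16 && (total & (total - 1)) == 0;
}

/* Checks the helper against the values quoted from invoke.texi.  */
void
check_documented_m_values (void)
{
  static const unsigned int documented[] = { 0, 2, 6, 14 };
  for (size_t i = 0; i < sizeof documented / sizeof documented[0]; i++)
    assert (elfv2_dual_entry_m_supported (documented[i]));
}
```

Under these assumptions any other M, for example 1 or 3, is rejected, which
matches the list in the documentation hunk.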


BR,
Kewen


Re: [PATCH] rs6000/test: Adjust pr104992.c with vect_int_mod [PR106516]

2022-09-30 Thread Kewen.Lin via Gcc-patches
Hi Segher!

on 2022/9/28 22:55, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 24, 2022 at 04:17:55PM +0800, Kewen.Lin wrote:
>> As PR106516 shows, we can get unexpected gimple outputs for
>> function thud on some target which supports modulus operation
>> for vector int.  This patch introduces one effective target
>> vect_int_mod for it, then adjusts the test case with it.
> 
>> +# Return 1 if the target supports vector int modulus, 0 otherwise.
>> +
>> +proc check_effective_target_vect_int_mod { } {
>> +return [check_cached_effective_target_indexed vect_int_mod {
>> +  expr { [istarget powerpc*-*-*]
>> + && [check_effective_target_power10_ok] }}]
>> +}
> 
> power10_ok does not mean the vmod[su][wdq] instructions will be
> generated.  You need to test if we have -mcpu=power10 or such, so,
> check_effective_target_has_arch_pwr10 .

Indeed, the context is different from those cases in gcc.target/powerpc
which normally have -mdejagnu-cpu=power10.  Thanks for catching this and
for the correction!

> 
> <feature>_ok tests if it is okay to enable <feature>.  <feature>_hw tests
> if the hardware can do <feature>.  has_arch_<feature> tests if the compiler
> is asked to generate code for <feature> (which is reflected in the _ARCH_*
> preprocessor symbols, hence the name).
> 
> Okay for trunk with the correct check_effective_target_* .  Thanks!
> 

Thanks, re-tested as before, committed in r13-2983.
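
To illustrate the has_arch_ distinction, here is a small sketch.  It assumes
the _ARCH_PWR10 predefined macro is what has_arch_pwr10 reflects, and it
uses the GCC vector extension for the modulus; the function names are made
up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Four 32-bit ints in one vector (GCC vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Returns 1 when this translation unit is compiled for Power10 (for
   example with -mcpu=power10), 0 otherwise.  This is a compile-time
   property, independent of what the build machine could run.  */
int
compiled_for_power10 (void)
{
#ifdef _ARCH_PWR10
  return 1;
#else
  return 0;
#endif
}

/* Element-wise vector int modulus.  Whether this becomes a single vector
   instruction depends on the code generation target, not on whether
   Power10 options merely could be enabled.  */
void
vec_int_mod (const v4si *a, const v4si *b, v4si *out)
{
  assert (a != NULL && b != NULL && out != NULL);
  *out = *a % *b;
}
```

The result of vec_int_mod is the same everywhere; only the generated code
differs, which is why the test expectation has to key off the has_arch_
style check.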

BR,
Kewen


Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-09-30 Thread Kewen.Lin via Gcc-patches
on 2022/9/30 01:11, Segher Boessenkool wrote:
> On Thu, Sep 29, 2022 at 01:45:16PM +0800, Kewen.Lin wrote:
>> I found this flag is mainly related to tune setting and spotted that we have 
>> some code
>> for tune setting when no explicit cpu is given. 
>>
>> ...
>>
>>   else
>> {
>>   size_t i;
>>   enum processor_type tune_proc
>>  = (TARGET_POWERPC64 ? PROCESSOR_DEFAULT64 : PROCESSOR_DEFAULT);
>>
>>   tune_index = -1;
>>   for (i = 0; i < ARRAY_SIZE (processor_target_table); i++)
>>  if (processor_target_table[i].processor == tune_proc)
>>{
>>  tune_index = i;
>>  break;
>>}
>> }
> 
> Ah cool, that needs fixing yes.
> 
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -3702,7 +3702,7 @@ rs6000_option_override_internal (bool global_init_p)
>>else
>>  {
>>/* PowerPC 64-bit LE requires at least ISA 2.07.  */
>> -  const char *default_cpu = (!TARGET_POWERPC64
>> +  const char *default_cpu = (!TARGET_POWERPC64 && TARGET_32BIT
>>   ? "powerpc"
>>   : (BYTES_BIG_ENDIAN
>>  ? "powerpc64"
> 
> ... but not like that.  If this snippet should happen later just move it
> later.  Or introduce a new variable to make the control flow less
> confused.  Or something else.  But don't make the code more complex,
> introducing more special cases like this.

Agreed, the diff was mainly to check whether it's the root cause.  I think
we need to place the TARGET_POWERPC64 enablement for 64 bit before this
hunk.  I've adjusted it in the new version and will post it once it's fully
tested.

> 
>> +#ifdef OS_MISSING_POWERPC64
>> +  else if (OS_MISSING_POWERPC64)
>> +/* It's unexpected to have OPTION_MASK_POWERPC64 on for OSes which
>> +   miss powerpc64 support, so disable it.  */
>> +rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
>> +#endif
> 
> All silent stuff is always bad.
> 

OK.  With more testing of replacing the silent disablement with a warning,
I noticed that some disablement is still needed.  One typical case is -m32
compilation on ppc64: OPTION_MASK_POWERPC64 is on from TARGET_DEFAULT, which
is used for initialization (it makes sense to have it on in TARGET_DEFAULT
because it's a 64-bit cpu).  And -m32 compilation matches
OS_MISSING_POWERPC64 (!TARGET_64BIT), so this is a case where
OPTION_MASK_POWERPC64 is implicitly on and OS_MISSING_POWERPC64 holds; it
would be unexpected to warn about it rather than disable it.
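
A rough model of the behaviour under discussion, as a sketch only: the
struct and function names are illustrative stand-ins rather than the real
rs6000 variables, and the warning rule for an explicit -mpowerpc64 follows
Segher's suggestion rather than any committed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the option state discussed in this thread.  */
struct opts_model
{
  bool target_64bit;         /* -m64 is in effect */
  bool powerpc64;            /* models OPTION_MASK_POWERPC64 */
  bool powerpc64_explicit;   /* user wrote -mpowerpc64 or -mno-powerpc64 */
  bool os_missing_powerpc64; /* models OS_MISSING_POWERPC64 */
};

/* Resolves the powerpc64 flag in place.  Returns true when a warning
   would be appropriate.  */
bool
resolve_powerpc64 (struct opts_model *o)
{
  assert (o != NULL);
  if (!o->powerpc64_explicit)
    {
      if (o->target_64bit)
        o->powerpc64 = true;   /* always on by default for 64 bit */
      else if (o->os_missing_powerpc64)
        o->powerpc64 = false;  /* implicit default: disable quietly */
      return false;
    }
  /* Explicit request is honoured; warn if the OS lacks support.  */
  return o->powerpc64 && o->os_missing_powerpc64;
}
```

In this model, -m32 on ppc64 with the flag inherited from TARGET_DEFAULT is
disabled without a diagnostic, while an explicit -mpowerpc64 on such an OS
keeps the flag and reports a warning.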

BR,
Kewen

> If things are done well, we will end up with *less* code than what we
> had before, not more!
> 
> 
> Segher


Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-09-30 Thread Kewen.Lin via Gcc-patches
Hi Segher & Iain!

on 2022/9/30 02:37, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 29, 2022 at 07:25:44PM +0100, Iain Sandoe wrote:
>>> On 29 Sep 2022, at 18:04, Segher Boessenkool  
>>> wrote:
>>> On Thu, Sep 29, 2022 at 09:16:33AM +0100, Iain Sandoe wrote:
>>>> Which means that we do not report an error, but a warning, and then we
>>>> force 64b on (taking the user’s intention to be specified by the explicit
>>>> ‘-m64’).
>>>
>>> And that is wrong.  Any silent overriding of what the user says is bad.
>>
>> It is not silent - it warns and then carries on, 
> 
> Yes, but I meant the status quo.  We agree :-)
> 
>>> Not overriding it (and then later ICEing) is bad as well, so it should
>>> be an error here.  And in generic code anyway.
>>
>> As noted, if that change is made we will see what the fallout is :)
> 
> Hopefully it magically makes everything fine ;-)

Okay, if we want to unify the behavior everywhere in generic code,
I'll make a separate follow-up patch for it.  :)

BR,
Kewen


Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-09-29 Thread Kewen.Lin via Gcc-patches
Hi Iain,

Thanks again for your help!!

on 2022/9/29 16:16, Iain Sandoe wrote:
> Hi Kewen,
> 
> thanks for looking at this!
> (I suspect it would also affect a 32b linux host with a 64b multilib)
> 

Quite reasonable suspicion.

> quite likely powerpc-darwin is the only 32b ppc host in regular testing.
> 
[...snip...]
>>
>> I'm testing the attached diff which can be applied on top of the previous 
>> proposed patch
>> on ppc64 and ppc64le, could you help to test it can fix the issue?
> 
> It does work on a cross from x86_64-darwin => powerpc-darwin, I can also do 
> compile-only
> tests there with a dummy board and the new tests pass with one minor tweak as 
> described
> below.
> 

Nice!  How blind I was, I should have searched for "requires.*PowerPC64".

> full regstrap on the G5 will take a day or so .. but I’ll do the C target 
> tests first to get a heads up.
> 

Thanks!  I think the C target tests are enough for now.  I just refined the
patch by addressing Segher's review comments and making some other
adjustments.  I'm going to test it on ppc64/ppc64le/aix first; if everything
goes well, I'll ask for your help with a full regstrap on the new version.

> 
> 
> OK. So one small wrinkle, 
> 
> Darwin already has 
> 
>   if (TARGET_64BIT && ! TARGET_POWERPC64)
> {
>   rs6000_isa_flags |= OPTION_MASK_POWERPC64;
>   warning (0, "%qs requires PowerPC64 architecture, enabling", "-m64");
> }
> 
> in darwin_rs6000_override_options()
> 
> Which means that we do not report an error, but a warning, and then we force 
> 64b on (taking
> the user’s intention to be specified by the explicit ‘-m64’).
> 
> If there’s a strong feeling that this should really be an error, then I could 
> make that change and
> see what fallout results.

IMHO it's fine to leave it unchanged.  AIX follows the same idea of emitting
a warning instead of an error, and there are probably some actual use cases
relying on this behavior, so changing it could affect them.  Thanks for
bringing this up anyway!

> 
> the patch below amends the test expectations to include Darwin with the 
> warning it currently
> reports.

Will incorporate!  Thanks again!

BR,
Kewen


Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-09-28 Thread Kewen.Lin via Gcc-patches
Hi Segher!

Thanks for the review comments!!

on 2022/9/29 06:04, Segher Boessenkool wrote:
> On Wed, Sep 28, 2022 at 01:30:46PM +0800, Kewen.Lin wrote:
>> PR106680 shows that -m32 -mpowerpc64 is different from
>> -mpowerpc64 -m32, this is determined by the way how we
>> handle option powerpc64 in rs6000_handle_option.
>>
>> Segher pointed out this difference should be taken as
>> a bug and we should ensure that option powerpc64 is
>> independent of -m32/-m64.  So this patch removes the
>> handlings in rs6000_handle_option and add some necessary
>> supports in rs6000_option_override_internal instead.
> 
> Thanks!
> 
>> With this patch, if users specify -m{no-,}powerpc64, the
>> specified value is honoured, otherwise, for 64bit it
>> always enables OPTION_MASK_POWERPC64 while for 32bit
>> it disables OPTION_MASK_POWERPC64 if OS_MISSING_POWERPC64.
> 
> If the user says -m64 -mno-powerpc64 it should error, and perhaps -m32
> -mpowerpc64 should warn if OS_MISSING_POWERPC64?

OK ...

> 
>> -  /* Some OSs don't support saving the high part of 64-bit registers on 
>> context
>> - switch.  Other OSs don't support saving Altivec registers.  On those 
>> OSs,
>> - we don't touch the OPTION_MASK_POWERPC64 or OPTION_MASK_ALTIVEC 
>> settings;
>> - if the user wants either, the user must explicitly specify them and we
>> - won't interfere with the user's specification.  */
>> +  /* Some OSs don't support saving Altivec registers.  On those OSs, we 
>> don't
>> + touch the OPTION_MASK_POWERPC64 or OPTION_MASK_ALTIVEC settings; if the
>> + user wants either, the user must explicitly specify them and we won't
>> + interfere with the user's specification.  */
>>
>>set_masks = POWERPC_MASKS;
>> -#ifdef OS_MISSING_POWERPC64
>> -  if (OS_MISSING_POWERPC64)
>> -set_masks &= ~OPTION_MASK_POWERPC64;
>> -#endif
> 
> As I said elsewhere, it probably is helpful if we still warn here for
> -m32 -mpowerpc64 with OS_MISSING_POWERPC64 (or without the -m32 even,
> same thing).
> 

OK ... 

>> +  /* With option powerpc64 specified explicitly (either on or off), even if
>> + being compiled for 64 bit we don't need to check if it's disabled here,
>> + since subtargets will check and raise an error message if necessary
>> + later.  But without option powerpc64 specified explicitly, we need to
>> + ensure powerpc64 enabled for 64 bit and disabled on those OSes with
>> + OS_MISSING_POWERPC64, since they don't support saving the high part of
>> + 64-bit registers on context switch.  */
>> +  if (!(rs6000_isa_flags_explicit & OPTION_MASK_POWERPC64))
>> +{
>> +  if (TARGET_64BIT)
>> +/* Make sure we always enable it by default for 64 bit.  */
>> +rs6000_isa_flags |= OPTION_MASK_POWERPC64;
>> +#ifdef OS_MISSING_POWERPC64
>> +  else if (OS_MISSING_POWERPC64)
>> +/* It's unexpected to have OPTION_MASK_POWERPC64 on for OSes which
>> +   miss powerpc64 support, so disable it.  */
>> +rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
>> +#endif
>> +}
> 
> Aha.  Please don't, just warn instead?  Silently disabling such stuff is
> the worst option :-(

... I'll update this to warn instead as you suggested. :)

> 
>> +/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
>> powerpc*-*-linux* powerpc-*-rtems* } 0 } */
> 
> Everything except AIX even?  So it will include Darwin as well (and the
> BSDs, and powerpc*-elf, etc.)

I found this message only exists in file rtems.h and in function
rs6000_linux64_override_options; the latter is used by files linux64.h and
freebsd64.h.  I guess we just want to add one more target,
powerpc*-*-freebsd*, but leave the others alone (and update this as needed
later)?

> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106680-4.c
>> @@ -0,0 +1,16 @@
>> +/* Skip this on aix, otherwise it emits the error message like "64-bit
>> +   computation with 32-bit addressing not yet supported" on aix.  */
>> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
>> +/* { dg-require-effective-target ilp32 } */
>> +/* { dg-options "-mpowerpc64 -m32 -O2" } */
> 
> If you have -m32 you don't need ilp32, and the other way around.
> 

Will update!  I was afraid the dejagnu version mattered, since the options
can end up as "-mpowerpc64 -m32 -O2 -m64" or "-m64 -mpowerpc64 -m32 -O2",
but I just realized -mpowerpc64 would always take effect, so it was a
needless worry.  :)

>> +/* Verify option -m32 doesn't override option -mpowerpc64.
>> +   If option -mpowerpc64 gets overridden, the assembly would
>> +   end up with addc and adde.  */
>> +/* { dg-final { scan-assembler-not "addc" } } */
>> +/* { dg-final { scan-assembler-not "adde" } } */
> 
> Lol, nice :-)
> 
> "adde" is a frequent substring, use \m \M please?  You will always get
> these exact insns anyway.  And you could add a -times {\madd\M} 1 ?

Will update, thanks again for all the comments!
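
The point about "adde" being a frequent substring can be shown with a small
sketch.  It approximates the \m and \M word anchors by treating letters,
digits and underscore as word characters; the function names and the sample
symbol "ladder" are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Word characters for the purpose of this sketch.  */
static bool
is_word_char (int c)
{
  return isalnum (c) || c == '_';
}

/* Plain substring search, like scanning for "adde" without anchors.  */
bool
contains_substring (const char *text, const char *word)
{
  assert (text != NULL && word != NULL);
  return strstr (text, word) != NULL;
}

/* Whole-word search, approximating the pattern \madde\M.  */
bool
contains_whole_word (const char *text, const char *word)
{
  assert (text != NULL && word != NULL);
  size_t len = strlen (word);
  if (len == 0)
    return false;
  for (const char *p = strstr (text, word); p != NULL;
       p = strstr (p + 1, word))
    {
      bool start_ok = p == text || !is_word_char ((unsigned char) p[-1]);
      bool end_ok = !is_word_char ((unsigned char) p[len]);
      if (start_ok && end_ok)
        return true;
    }
  return false;
}
```

With an unanchored scan, a line such as "bl ladder" would count as a hit for
"adde", while the anchored form only matches the instruction mnemonic itself.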

> 
> The Darwin problem might be something in darwin*.h, but I don't see it.
> Maybe it is a more generic problem?
> 

Yeah, it's

Re: [PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-09-28 Thread Kewen.Lin via Gcc-patches
Hi Iain,

Thanks very much for your time!!!

on 2022/9/29 03:09, Iain Sandoe wrote:
> Hi Kewen
> 
>> On 28 Sep 2022, at 17:18, Iain Sandoe  wrote:
>>
>> (reduced CC list, if folks want to be re-included .. please add them back).
>>
>>> On 28 Sep 2022, at 07:37, Iain Sandoe  wrote:
>>
>>>> On 28 Sep 2022, at 06:30, Kewen.Lin via Gcc-patches 
>>>>  wrote:
>>>
>>>> PR106680 shows that -m32 -mpowerpc64 is different from
>>>> -mpowerpc64 -m32, this is determined by the way how we
>>>> handle option powerpc64 in rs6000_handle_option.
>>>>
>>>> Segher pointed out this difference should be taken as
>>>> a bug and we should ensure that option powerpc64 is
>>>> independent of -m32/-m64.  So this patch removes the
>>>> handlings in rs6000_handle_option and add some necessary
>>>> supports in rs6000_option_override_internal instead.
>>>>
>>>> With this patch, if users specify -m{no-,}powerpc64, the
>>>> specified value is honoured, otherwise, for 64bit it
>>>> always enables OPTION_MASK_POWERPC64 while for 32bit
>>>> it disables OPTION_MASK_POWERPC64 if OS_MISSING_POWERPC64.
>>>>
>>>> Bootstrapped and regress-tested on:
>>>> - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
>>>> - powerpc64le-linux-gnu P9 and P10
>>>> - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}
>>>>
>>>> Hi Iain, could you help to test this on darwin to ensure
>>>> it won't break darwin's build and new tests are fine?
>>>> Thanks in advance!
>>>
>>> Will do, it will take a day or so, thanks,
>>
>> Perhaps a small exposition on the target:
>>
>> powerpc-apple-darwin, is perhaps somewhat unusual in that it is nominally a 
>> 32b kernel, but the OS supports 64b processes on suitable hardware (and the 
>> OS does preserve the upper bits of 64b regs in the context).
>>
>> -
>>
>> I bootstrapped (all supported languages) and tested r13-2892 yesterday with 
>> “nominal” results.
>>
>> Then I added this patch .. and did a clean bootstrap (same configuration).
>>
>> the bootstrap fails on the stage3 libgomp (building the ppc64 multilib) with 
>> the error below
>> What is somewhat odd here is that libgomp is bootstrapped with the compiler 
>> and, apparently,
>> openacc-init.c built OK at stage2.
>>
>> ——
>>
>> Of course, powerpc-darwin is not a blocker for anything, it should not hold 
>> you up (but sometimes it
>> manages to find a glitch missed elsewhere).  I will try to take a look at 
>> this this evening see if I can throw
>> any more light on it.
>>
>> --
>>
>> /src-local/gcc-master/libgomp/oacc-init.c:876:1: internal compiler error: 
>> ‘global_options’ are modified in local context
>>  876 | {
>>  | ^
>> 0xe940d7 cl_optimization_compare(gcc_options*, gcc_options*)
>>/scratch/10-5-leo/gcc-master/gcc/options-save.cc:14082
> 
> This repeats on a cross from x86_64-darwin to powerpc-darwin .. (makes debug 
> a bit quicker)
> 
> this is the failing case - which does not (immediately) seem directly 
> connected .. does it ring
> any bells for you?
> 
>    16649    if (ptr1->x_rs6000_sched_restricted_insns_priority != ptr2->x_rs6000_sched_restricted_insns_priority)
> -> 16650      internal_error ("%<global_options%> are modified in local context");
> 

I found this flag is mainly related to the tune setting, and spotted that we
have some code for the tune setting when no explicit cpu is given.

...

  else
    {
      size_t i;
      enum processor_type tune_proc
        = (TARGET_POWERPC64 ? PROCESSOR_DEFAULT64 : PROCESSOR_DEFAULT);

      tune_index = -1;
      for (i = 0; i < ARRAY_SIZE (processor_target_table); i++)
        if (processor_target_table[i].processor == tune_proc)
          {
            tune_index = i;
            break;
          }
    }

It checks TARGET_POWERPC64 directly here, but my proposed patch adjusts
TARGET_POWERPC64 after this hunk, so it can be problematic in some cases.

I'm testing the attached diff, which can be applied on top of the previously
proposed patch, on ppc64 and ppc64le.  Could you help to test whether it
fixes the issue?
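
The ordering hazard can be modelled in a few lines.  This is a sketch under
simplified assumptions: the enum, struct and function names are invented,
and the real code selects a table index rather than an enum value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for PROCESSOR_DEFAULT and PROCESSOR_DEFAULT64.  */
enum tune_model { TUNE_DEFAULT, TUNE_DEFAULT64 };

struct override_model
{
  bool target_64bit;    /* -m64 is in effect */
  bool powerpc64;       /* models TARGET_POWERPC64 */
  enum tune_model tune; /* chosen default tuning */
};

static enum tune_model
pick_default_tune (bool powerpc64)
{
  return powerpc64 ? TUNE_DEFAULT64 : TUNE_DEFAULT;
}

/* Problematic order: the tune choice reads the flag before the 64-bit
   default has been applied to it.  */
void
tune_then_adjust (struct override_model *o)
{
  assert (o != NULL);
  o->tune = pick_default_tune (o->powerpc64);
  if (o->target_64bit)
    o->powerpc64 = true;
}

/* Intended order: settle the flag first, then derive the tuning.  */
void
adjust_then_tune (struct override_model *o)
{
  assert (o != NULL);
  if (o->target_64bit)
    o->powerpc64 = true;
  o->tune = pick_default_tune (o->powerpc64);
}
```

With -m64 on a target whose default leaves the flag off, the first order
ends up with 32-bit default tuning and the flag on, while the second order
keeps the two consistent.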

BR,
Kewen

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 605d35893f9..3bfbb4eac21 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3702,7 +3702,7 @@ rs6000_option_override_internal (bool global_init_p)
   else
{
  /* PowerPC 64-bit LE requires at least ISA 2.07.  */
-

Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-09-27 Thread Kewen.Lin via Gcc-patches
on 2022/8/29 14:35, Kewen.Lin via Gcc-patches wrote:
> on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote:
>> on 2022/7/11 11:42, Kewen.Lin wrote:
>>> on 2022/6/15 14:20, Kewen.Lin wrote:
>>>> Hi Honza,
>>>>
>>>> Thanks for the comments!  Some replies are inlined below.
>>>>
>>>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>>>> if func->decl is not null but no cgraph node is available for it.
>>>>>> As PR105818 shows, this could give unexpected result.  For the
>>>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>>>> is a function structure for foo, for which there is no cgraph
>>>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>>>> the context is to optimize for size, the flag optimize_size is
>>>>>> true.
>>>>>>
>>>>>> The patch is to make optimize_function_for_size_p to check
>>>>>> optimize_size as what it does when func->decl is unavailable.
>>>>>>
>>>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>>>
>>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>>>>  -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>>>
>>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>>>> used in function fold_range_test during c parsing, it uses
>>>>>> optimize_function_for_speed_p which is equal to the inversion
>>>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>>>> but no cgraph node for it, w/o this patch function
>>>>>> optimize_function_for_speed_p returns true eventually, while it
>>>>>> returns false with this patch.  Since the command line option -Os
>>>>>> is specified, there is no reason to interpret it as "for speed".
>>>>>> I think this failure is expected and adjust the test case
>>>>>> accordingly.
>>>>>>
>>>>>> Is it ok for trunk?
>>>>>>
>>>>>> BR,
>>>>>> Kewen
>>>>>> -
>>>>>>
>>>>>>  PR target/105818
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>  * predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>>>>  func->decl is valid but its cgraph node is unavailable.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>  * gcc.target/powerpc/pr105818.c: New test.
>>>>>>  * gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>>>> ---
>>>>>>  gcc/predict.cc  | 2 +-
>>>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +-
>>>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +
>>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>>>
>>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>>>> index 5734e4c8516..6c60a973236 100644
>>>>>> --- a/gcc/predict.cc
>>>>>> +++ b/gcc/predict.cc
>>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>>>cgraph_node *n = cgraph_node::get (fun->decl);
>>>>>>if (n)
>>>>>>  return n->optimize_for_size_p ();
>>>>>> -  return OPTIMIZE_SIZE_NO;
>>>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>>>
>>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>>>> probably better since one can change optimize_size with optimization
>>>>> attribute.
>>>>
>>>> Good point, agree!
>>>>
>>>>> However, I think that in most cases where we check optimize_size this
>>>>> early we are doing something wrong, since at that time there is no
>>>>> profile available.  Why exactly does PR105818 hit the flag change issue?
>>>>
>>>> For PR105818, the reason why the flag changes is that:
>>>>
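
The fallback proposed in the patch above can be modelled as follows.  This
is a sketch only: the structs are simplified stand-ins for struct function
and cgraph_node, and the optimize_size flag is passed as a parameter instead
of being read from global state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enumerator names used in the patch.  */
enum optimize_size_level
{
  OPTIMIZE_SIZE_NO,
  OPTIMIZE_SIZE_BALANCED,
  OPTIMIZE_SIZE_MAX
};

/* Stand-in for a cgraph node carrying its own answer.  */
struct node_model
{
  enum optimize_size_level level;
};

/* Stand-in for struct function.  */
struct function_model
{
  bool has_decl;                 /* models fun->decl != NULL */
  const struct node_model *node; /* models cgraph_node::get (fun->decl) */
};

enum optimize_size_level
size_level_for (const struct function_model *fun, bool optimize_size)
{
  if (fun == NULL || !fun->has_decl)
    return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
  if (fun->node != NULL)
    return fun->node->level;
  /* Patched fallback: a decl without a cgraph node now honours
     optimize_size instead of always answering OPTIMIZE_SIZE_NO.  */
  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
}
```

Honza's suggestion of reading the flag per function, so that the optimize
attribute is respected, would replace the optimize_size parameter in the
last return; that variant is not modelled here.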

PING^1 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-09-27 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600190.html

BR,
Kewen

on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As discussed in PR98125, -fpatchable-function-entry with
> SECTION_LINK_ORDER support doesn't work well on powerpc64
> ELFv1 because the filled "Symbol" in
> 
>   .section name,"flags"o,@type,Symbol
> 
> sits in .opd section instead of in the function_section
> like .text or named .text*.
> 
> Since we already generate one label LPFE* which sits in
> function_section of current_function_decl, this patch is
> to reuse it as the symbol for the linked_to section.  It
> avoids the above ABI specific issue when using the symbol
> concluded from current_function_decl.
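
A small sketch of the directive shape described above, for illustration
only.  The helper name is hypothetical, and the "aw" flags, the @progbits
type and the .LPFE0 label used in the usage note are assumptions, not output
copied from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a .section directive whose flags carry the
   link-order letter "o" followed by the linked-to symbol, following the
   form  .section name,"flags"o,@type,Symbol  quoted above.  Returns the
   snprintf result.  */
int
format_link_order_section (char *buf, size_t size, const char *section,
                           const char *flags, const char *linked_to)
{
  assert (buf != NULL && section != NULL && flags != NULL
          && linked_to != NULL);
  return snprintf (buf, size, "\t.section\t%s,\"%so\",@progbits,%s",
                   section, flags, linked_to);
}
```

Calling it with "__patchable_function_entries", "aw" and ".LPFE0" yields a
directive whose linked-to symbol is the local label in the function section
rather than the function symbol, which is the change the patch describes.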
> 
> Besides, with this support some previous workarounds for
> powerpc64 ELFv1 can be reverted.
> 
> btw, rs6000_print_patchable_function_entry can be dropped
> but there is another rs6000 patch which needs this rs6000
> specific hook rs6000_print_patchable_function_entry, not
> sure which one gets landed first, so just leave it here.
> 
> Bootstrapped and regtested on below:
> 
>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>  and latest binutils 2.39.
>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>   4) x86_64-redhat-linux with default binutils 2.30
>  and latest binutils 2.39.
>   5) aarch64-linux-gnu  with default binutils 2.30
>  and latest binutils 2.39.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
> 
>   PR target/99889
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>   Adjust to call function default_print_patchable_function_entry.
>   * targhooks.cc (default_print_patchable_function_entry_1): Remove and
>   move the flags preparation ...
>   (default_print_patchable_function_entry): ... here, adjust to use
>   current_function_funcdef_no for label no.
>   * targhooks.h (default_print_patchable_function_entry_1): Remove.
>   * varasm.cc (default_elf_asm_named_section): Adjust code for
>   __patchable_function_entries section support with LPFE label.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/pr93195a.C: Remove the skip on powerpc*-*-* 64-bit.
>   * gcc.target/aarch64/pr92424-2.c: Adjust LPFE1 with LPFE0.
>   * gcc.target/aarch64/pr92424-3.c: Likewise.
>   * gcc.target/i386/pr93492-2.c: Likewise.
>   * gcc.target/i386/pr93492-3.c: Likewise.
>   * gcc.target/i386/pr93492-4.c: Likewise.
>   * gcc.target/i386/pr93492-5.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000.cc  | 13 +-
>  gcc/varasm.cc| 15 ---
>  gcc/targhooks.cc | 45 +++-
>  gcc/targhooks.h  |  3 --
>  gcc/testsuite/g++.dg/pr93195a.C  |  1 -
>  gcc/testsuite/gcc.target/aarch64/pr92424-2.c |  4 +-
>  gcc/testsuite/gcc.target/aarch64/pr92424-3.c |  4 +-
>  gcc/testsuite/gcc.target/i386/pr93492-2.c|  4 +-
>  gcc/testsuite/gcc.target/i386/pr93492-3.c|  4 +-
>  gcc/testsuite/gcc.target/i386/pr93492-4.c|  4 +-
>  gcc/testsuite/gcc.target/i386/pr93492-5.c|  4 +-
>  11 files changed, 40 insertions(+), 61 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..dba28b8e647 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14771,18 +14771,9 @@ rs6000_print_patchable_function_entry (FILE *file,
>  unsigned HOST_WIDE_INT patch_area_size,
>  bool record_p)
>  {
> -  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
> -  /* When .opd section is emitted, the function symbol
> - default_print_patchable_function_entry_1 is emitted into the .opd 
> section
> - while the patchable area is emitted into the function section.
> - Don't use SECTION_LINK_ORDER in that case.  */
> -  if (!(TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
> -  && HAVE_GAS_SECTION_LINK_ORDER)
> -flags |= SECTION_LINK_ORDER;
> -  default_print_patchable_function_entry_1 (file, patch_area_size, record_p,
> - flags);
> +  default_print_patchable_function_entry (file, patch_area_size, record_p);
>  }
> -
> 
> +
>  enum rtx_code
>  rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>  {
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4db8506b106..d4de6e164ee 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
>

PING^1 [PATCH v4] rs6000: Rework ELFv2 support for -fpatchable-function-entry* [PR99888]

2022-09-27 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600277.html

BR,
Kewen

on 2022/8/25 13:50, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR99888 and its related PRs show, the current support for
> -fpatchable-function-entry on powerpc ELFv2 doesn't work
> well when a global entry exists.  For example, with the
> command line option -fpatchable-function-entry=3,2, we get
> the below w/o this patch:
> 
>   .LPFE1:
> nop
> nop
> .type   foo, @function
>   foo:
> nop
>   .LFB0:
> .cfi_startproc
>   .LCF0:
>   0:  addis 2,12,.TOC.-.LCF0@ha
> addi 2,2,.TOC.-.LCF0@l
> .localentry foo,.-foo
> 
> The assembly is unexpected since the patched nops have
> no effect when the function is entered via the local entry.
> 
> This patch updates the nops to be patched before and after
> the local entry, so it looks like:
> 
> .type   foo, @function
>   foo:
>   .LFB0:
> .cfi_startproc
>   .LCF0:
>   0:  addis 2,12,.TOC.-.LCF0@ha
> addi 2,2,.TOC.-.LCF0@l
> nop
> nop
> .localentry foo,.-foo
> nop
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8,
> and powerpc64le-linux-gnu P9 & P10.
> 
> v4: Change the remaining NOP to nop and update documentation of option
> -fpatchable-function-entry for PowerPC ELFv2 ABI dual entry points
> as Segher suggested.
> 
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599925.html
> 
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599617.html
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599461.html
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> 
>   PR target/99888
>   PR target/105649
> 
> gcc/ChangeLog:
> 
>   * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
>   documentation for PowerPC ELFv2 ABI dual entry points.
>   * config/rs6000/rs6000-internal.h
>   (rs6000_print_patchable_function_entry): New function declaration.
>   * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>   Support patchable-function-entry by emitting nops before and after
>   local entry for the function that needs global entry.
>   * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): Skip
>   the function that needs global entry till global entry has been
>   emitted.
>   * config/rs6000/rs6000.h (struct machine_function): New bool member
>   global_entry_emitted.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr99888-1.c: New test.
>   * gcc.target/powerpc/pr99888-2.c: New test.
>   * gcc.target/powerpc/pr99888-3.c: New test.
>   * gcc.target/powerpc/pr99888-4.c: New test.
>   * gcc.target/powerpc/pr99888-5.c: New test.
>   * gcc.target/powerpc/pr99888-6.c: New test.
>   * c-c++-common/patchable_function_entry-default.c: Adjust for
>   powerpc_elfv2 to avoid compilation error.
> ---
>  gcc/config/rs6000/rs6000-internal.h   |  5 +++
>  gcc/config/rs6000/rs6000-logue.cc | 32 ++
>  gcc/config/rs6000/rs6000.cc   | 10 -
>  gcc/config/rs6000/rs6000.h|  4 ++
>  gcc/doc/invoke.texi   |  8 +++-
>  .../patchable_function_entry-default.c|  3 ++
>  gcc/testsuite/gcc.target/powerpc/pr99888-1.c  | 43 +++
>  gcc/testsuite/gcc.target/powerpc/pr99888-2.c  | 43 +++
>  gcc/testsuite/gcc.target/powerpc/pr99888-3.c  | 11 +
>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  | 13 ++
>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  | 13 ++
>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  | 14 ++
>  12 files changed, 195 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-2.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-3.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-4.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-5.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr99888-6.c
> 
> diff --git a/gcc/config/rs6000/rs6000-internal.h 
> b/gcc/config/rs6000/rs6000-internal.h
> index 8ee8c987b81..d80c04b5ae5 100644
> --- a/gcc/config/rs6000/rs6000-internal.h
> +++ b/gcc/config/rs6000/rs6000-internal.h
> @@ -183,10 +183,15 @@ extern tree rs6000_fold_builtin (tree fndecl 
> ATTRIBUTE_UNUSED,
>tree *args ATTRIBUTE_UNUSED,
>bool ignore ATTRIBUTE_UNUSED);
> 
> +extern void r

PING^1 [PATCH] rs6000/test: Adjust pr104992.c with vect_int_mod [PR106516]

2022-09-27 Thread Kewen.Lin via Gcc-patches
Hi,

I assumed the generic part introducing check_effective_target_vect_int_mod
needs the approval from global maintainers.

So gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600191.html

BR,
Kewen

on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As PR106516 shows, we can get unexpected gimple outputs for
> function thud on some target which supports modulus operation
> for vector int.  This patch introduces one effective target
> vect_int_mod for it, then adjusts the test case with it.
> 
> Tested on x86_64-redhat-linux and powerpc64{,le}-linux-gnu,
> especially powerpc64le Power10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR testsuite/106516
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr104992.c: Adjust with vect_int_mod.
>   * lib/target-supports.exp (check_effective_target_vect_int_mod): New
>   proc for effective target vect_int_mod.
> ---
>  gcc/testsuite/gcc.dg/pr104992.c   | 3 ++-
>  gcc/testsuite/lib/target-supports.exp | 8 
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr104992.c b/gcc/testsuite/gcc.dg/pr104992.c
> index 217c89a458c..82f8c75559c 100644
> --- a/gcc/testsuite/gcc.dg/pr104992.c
> +++ b/gcc/testsuite/gcc.dg/pr104992.c
> @@ -54,4 +54,5 @@ __attribute__((noipa)) unsigned waldo (unsigned x, unsigned 
> y, unsigned z) {
>  return x / y * z == x;
>  }
> 
> -/* { dg-final {scan-tree-dump-times " % " 9 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " % " 9 "optimized" { target { ! 
> vect_int_mod } } } } */
> +/* { dg-final { scan-tree-dump-times " % " 6 "optimized" { target 
> vect_int_mod } } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 04a2a8e8659..a4bdd23bed0 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -8239,6 +8239,14 @@ proc check_effective_target_vect_long_mult { } {
>  return $answer
>  }
> 
> +# Return 1 if the target supports vector int modulus, 0 otherwise.
> +
> +proc check_effective_target_vect_int_mod { } {
> +return [check_cached_effective_target_indexed vect_int_mod {
> +  expr { [istarget powerpc*-*-*]
> +  && [check_effective_target_power10_ok] }}]
> +}
> +
>  # Return 1 if the target supports vector even/odd elements extraction, 0 
> otherwise.
> 
>  proc check_effective_target_vect_extract_even_odd { } {
> --
> 2.27.0


[PATCH] rs6000: Rework option -mpowerpc64 handling [PR106680]

2022-09-27 Thread Kewen.Lin via Gcc-patches
Hi,

PR106680 shows that -m32 -mpowerpc64 behaves differently from
-mpowerpc64 -m32; this comes from the way we handle option
powerpc64 in rs6000_handle_option.

Segher pointed out this difference should be taken as
a bug and we should ensure that option powerpc64 is
independent of -m32/-m64.  So this patch removes the
handling in rs6000_handle_option and adds the necessary
support in rs6000_option_override_internal instead.

With this patch, if users specify -m{no-,}powerpc64, the
specified value is honoured, otherwise, for 64bit it
always enables OPTION_MASK_POWERPC64 while for 32bit
it disables OPTION_MASK_POWERPC64 if OS_MISSING_POWERPC64.
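
The rule above can be sketched as a small standalone C model.  This is
only an illustration, not the code in rs6000_option_override_internal:
the struct toy_opts, its fields and toy_resolve_powerpc64 are
hypothetical names, and keeping the incoming default for 32 bit
without OS_MISSING_POWERPC64 is an assumption of the sketch.

```c
#include <stdbool.h>

/* Toy model of the option rule; NOT the real GCC data structures.  */
struct toy_opts
{
  bool explicit_powerpc64;   /* User passed -mpowerpc64 or -mno-powerpc64.  */
  bool powerpc64;            /* Explicit value, or the incoming default.  */
  bool is_64bit;             /* True for -m64, false for -m32.  */
  bool os_missing_powerpc64; /* Models OS_MISSING_POWERPC64.  */
};

/* Return the final powerpc64 setting.  The inputs are plain flags, so
   the result cannot depend on command-line order.  */
static bool
toy_resolve_powerpc64 (const struct toy_opts *o)
{
  if (o->explicit_powerpc64)
    return o->powerpc64;     /* An explicit choice is honoured.  */
  if (o->is_64bit)
    return true;             /* 64 bit always enables it.  */
  if (o->os_missing_powerpc64)
    return false;            /* 32 bit on such OSes disables it.  */
  return o->powerpc64;       /* Assumption: otherwise keep the default.  */
}
```

Since the decision is made once from the collected flags rather than
while each option is parsed, "-m32 -mpowerpc64" and "-mpowerpc64 -m32"
resolve to the same value in this model.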

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P7 and P8 {-m64,-m32}
  - powerpc64le-linux-gnu P9 and P10
  - powerpc-ibm-aix7.2.0.0 {-maix64,-maix32}

Hi Iain, could you help to test this on darwin to ensure
it won't break darwin's build and new tests are fine?
Thanks in advance!

Is it ok for trunk if darwin testing goes well?

BR,
Kewen
-
PR target/106680

gcc/ChangeLog:

* common/config/rs6000/rs6000-common.cc (rs6000_handle_option): Remove
the adjustment for option powerpc64 in -m64 handling, and remove the
whole -m32 handling.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): When no
explicit powerpc64 option is provided, enable it at -m64 and disable it
for OS_MISSING_POWERPC64.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106680-1.c: New test.
* gcc.target/powerpc/pr106680-2.c: New test.
* gcc.target/powerpc/pr106680-3.c: New test.
* gcc.target/powerpc/pr106680-4.c: New test.
---
 gcc/common/config/rs6000/rs6000-common.cc | 11 ---
 gcc/config/rs6000/rs6000.cc   | 33 ++-
 gcc/testsuite/gcc.target/powerpc/pr106680-1.c | 12 +++
 gcc/testsuite/gcc.target/powerpc/pr106680-2.c | 13 
 gcc/testsuite/gcc.target/powerpc/pr106680-3.c | 12 +++
 gcc/testsuite/gcc.target/powerpc/pr106680-4.c | 16 +
 6 files changed, 77 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106680-4.c

diff --git a/gcc/common/config/rs6000/rs6000-common.cc 
b/gcc/common/config/rs6000/rs6000-common.cc
index 8e393d08a23..c76b5c27bb6 100644
--- a/gcc/common/config/rs6000/rs6000-common.cc
+++ b/gcc/common/config/rs6000/rs6000-common.cc
@@ -119,19 +119,8 @@ rs6000_handle_option (struct gcc_options *opts, struct 
gcc_options *opts_set,
 #else
 case OPT_m64:
 #endif
-  opts->x_rs6000_isa_flags |= OPTION_MASK_POWERPC64;
   opts->x_rs6000_isa_flags |= (~opts_set->x_rs6000_isa_flags
   & OPTION_MASK_PPC_GFXOPT);
-  opts_set->x_rs6000_isa_flags |= OPTION_MASK_POWERPC64;
-  break;
-
-#ifdef TARGET_USES_AIX64_OPT
-case OPT_maix32:
-#else
-case OPT_m32:
-#endif
-  opts->x_rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
-  opts_set->x_rs6000_isa_flags |= OPTION_MASK_POWERPC64;
   break;

 case OPT_mminimal_toc:
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e6fa3ad0eb7..605d35893f9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3648,17 +3648,12 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_pointer_size = 32;
 }

-  /* Some OSs don't support saving the high part of 64-bit registers on context
- switch.  Other OSs don't support saving Altivec registers.  On those OSs,
- we don't touch the OPTION_MASK_POWERPC64 or OPTION_MASK_ALTIVEC settings;
- if the user wants either, the user must explicitly specify them and we
- won't interfere with the user's specification.  */
+  /* Some OSs don't support saving Altivec registers.  On those OSs, we don't
+ touch the OPTION_MASK_POWERPC64 or OPTION_MASK_ALTIVEC settings; if the
+ user wants either, the user must explicitly specify them and we won't
+ interfere with the user's specification.  */

   set_masks = POWERPC_MASKS;
-#ifdef OS_MISSING_POWERPC64
-  if (OS_MISSING_POWERPC64)
-set_masks &= ~OPTION_MASK_POWERPC64;
-#endif
 #ifdef OS_MISSING_ALTIVEC
   if (OS_MISSING_ALTIVEC)
 set_masks &= ~(OPTION_MASK_ALTIVEC | OPTION_MASK_VSX
@@ -3753,6 +3748,26 @@ rs6000_option_override_internal (bool global_init_p)
error ("AltiVec not supported in this target");
 }

+  /* With option powerpc64 specified explicitly (either on or off), even if
+ being compiled for 64 bit we don't need to check if it's disabled here,
+ since subtargets will check and raise an error message if necessary
+ later.  But without option powerpc64 specified explicitly, we need to
+ ensure powerpc64 enabled for 64 bit and disabled on those OSes with
+ OS_MISSING_POWE

Re: [PATCH V3] rs6000: cannot_force_const_mem for HIGH code rtx[PR106460]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2022/9/7 15:08, Jiufu Guo via Gcc-patches wrote:
> Hi,
> 
> As PR106460 shows, GCC tries to store the rtx
> 'high:DI (symbol_ref:DI ("var_48"))' into the constant pool and an ICE
> occurs.  But this rtx represents a partial address and cannot be put
> into a .rodata section.
> 
> This patch updates rs6000_cannot_force_const_mem to return true for rtx(s) 
> with
> HIGH code, because these rtx(s) indicate part of address and are not ok for
> constant pool.
> 
> Below are some examples:
> (high:DI (const:DI (plus:DI (symbol_ref:DI ("xx") (const_int 12 [0xc])
> (high:DI (symbol_ref:DI ("var_1")..)))
> 
> This patch updates the previous patch, and drafts a test case which ICEs
> without the patch and is associated with one PR.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597712.html
> This patch also updated the message for previous patch V2.
> 
> I would ask help to review this patch one more time.
> 
> Bootstrap and regtest pass on ppc64 and ppc64le.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu)
> 
>   PR target/106460
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Return true
>   for HIGH code rtx.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106460.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc |  7 +--
>  gcc/testsuite/gcc.target/powerpc/pr106460.c | 11 +++
>  2 files changed, 16 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106460.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2f3146e56f8..04e3a393147 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9643,8 +9643,11 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* If GET_CODE (x) is HIGH, the 'X' represets the high part of a 
> symbol_ref.
> + It indicates partial address,  which can not be put into a constant 
> pool.
> + e.g.  (high:DI (unspec:DI [(symbol_ref/u:DI ("*.LC0")..)
> + (high:DI (symbol_ref:DI ("var")..)).  */

Nit: Maybe it's good to align these two "(high:DI ... ?

> +  if (GET_CODE (x) == HIGH)
>  return true;
>  
>/* A TLS symbol in the TOC cannot contain a sum.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106460.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> new file mode 100644
> index 000..dfaffcb6e28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106460.c
> @@ -0,0 +1,11 @@

Need a power10_ok effective target here.

/* { dg-require-effective-target power10_ok } */

> +/* { dg-options "-O1 -mdejagnu-cpu=power10" } */

Nit: As Segher noted when reviewing one of my patches, -O2 is preferred over
-O1 if it still reproduces this issue.  The point is to keep the test effective
even if some related optimization (routines or passes) is disabled at -O1 one day.

BR,
Kewen




Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/9/22 22:05, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 22, 2022 at 10:28:23AM +0800, Kewen.Lin wrote:
>> on 2022/9/22 05:56, Segher Boessenkool wrote:
>>> On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
>>> In the other direction I am worried that the unspecs will degrade
>>> performance (relative to smin/smax) when -ffast-math *is* active (and
>>> this new builtin code and pattern doesn't blow up).
>>
>> For fmin/fmax it would be fine, since they are transformed to {MAX,MIN}
>> EXPR in middle end, and yes, it can degrade for the bifs, although IMHO
>> the previous expansion to smin/smax contradicts the bif names (users
>> expect them to map to xs{min,max}dp rather than anything else).
> 
> But builtins *never* say to generate any particular instruction.  They
> say to generate code that implements certain functionality.  For many
> builtins this does of course boil down to specific instructions, but
> even then it could be optimised away completely or replace with
> something more specific if things can be folded or such.

Ah, your explanation refreshed my mind, thanks!  Previously I thought that
bifs with a specific mnemonic in their names should generate that specific
instruction, to save users the effort of writing inline asm, and that if we
wanted a bif to represent generic functionality (not bound to a specific
instruction) we would give it a generic name instead.  Per your explanation,
binding at fast-math isn't needed, so I think Haochen's patch v7 with gimple
folding can avoid the concern about degradation at fast-math (still
smax/smin), nice. :)

BR,
Kewen


Re: [PATCH] rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

on 2022/9/23 06:13, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Sep 22, 2022 at 09:41:42AM +0800, Kewen.Lin wrote:
>>  * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
>>  condition for adding REG_CFA_DEF_CFA reg note with
>>  frame_pointer_needed_indeed.
> 
>> --- a/gcc/config/rs6000/rs6000-logue.cc
>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>> @@ -4956,7 +4956,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>>   a REG_CFA_DEF_CFA note, but that's OK;  A duplicate is
>>   discarded by dwarf2cfi.cc/dwarf2out.cc, and in any case would
>>   be harmless if emitted.  */
>> -  if (frame_pointer_needed)
>> +  if (frame_pointer_needed_indeed)
>>  {
>>insn = get_last_insn ();
> 
> I thought about adding an assert here, but the very next insn gives a
> clear enough message anyway, so it would be just noise :-)
> 
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96072.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr96072.c
>> new file mode 100644
>> index 000..23d1cc74ffd
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96072.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-options "-O1" } */
>> +
>> +/* Verify there is no ICE on 32 bit environment.  */
> 
> /* This used to ICE with the SYSV ABI (PR96072).  */

Updated.

> 
> Please use -O2 if that works here.
> 

Updated too.

> Okay for trunk.  Thank you!
> 

Committed in r13-2846.  Since it's a regression causing an ICE, I think we
want to backport this?  Is it ok to backport it after the burn-in time?

Thanks again!

BR,
Kewen


Re: [PATCH] rs6000: Fix condition of define_expand vec_shr_ [PR100645]

2022-09-25 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

on 2022/9/23 05:39, Segher Boessenkool wrote:
> Hi!
> 
> Heh, I first thought I had mistyped the PR #, but it is this one after
> all :-)
> 
> On Thu, Sep 22, 2022 at 09:41:34AM +0800, Kewen.Lin wrote:
>> PR100645 exposes one latent bug in define_expand vec_shr_
>> that the current condition TARGET_ALTIVEC is too loose.  The
>> mode iterator VEC_L contains a few modes, they are not always
>> supported as vector mode, VECTOR_UNIT_ALTIVEC_OR_VSX_P should
>> be used like some other VEC_L usages.
> 
>> --- a/gcc/config/rs6000/vector.md
>> +++ b/gcc/config/rs6000/vector.md
>> @@ -1475,7 +1475,7 @@ (define_expand "vec_shr_"
>>[(match_operand:VEC_L 0 "vlogical_operand")
>> (match_operand:VEC_L 1 "vlogical_operand")
>> (match_operand:QI 2 "reg_or_short_operand")]
>> -  "TARGET_ALTIVEC"
>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> 
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr100645.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-mdejagnu-cpu=power6 -maltivec" } */
> 
> This is a strange choice: we normally do not enable VMX on p6.  Just use
> p7 instead?  There is no need for altivec_ok in any case, the -mcpu=
> guarantees it is satisfied.

Unfortunately a single power7 doesn't work for this case, since it (VSX) makes
rs6000_vector_mem[TImode] not VECTOR_NONE any more, we need one extra -mno-vsx
to reproduce this.

As you mentioned above, power6 doesn't enable altivec by default, I noticed
altivec_ok excludes some envs like aix 5.3 etc., and also ensures it's fine
to have an explicit maltivec there, so I added it for robustness.

> 
>> +/* It's to verify no ICE here.  */
> 
> "This used to ICE."?

Updated.

> 
> Please commit this now, looks good.  Thanks!
> 

Committed in r13-2844.  Thanks!

BR,
Kewen


Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-21 Thread Kewen.Lin via Gcc-patches
on 2022/9/22 05:56, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
>>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
>> of smin/max. So the builtins always generate xs[min/max]dp on all
>> platforms.
> 
> But how does this not blow up with -ffast-math?

Indeed.  Since the pattern is guarded with "TARGET_VSX && !flag_finite_math_only",
the bifs seem to cause an ICE at -ffast-math.

Haochen, could you double check it?

> 
> In the other direction I am worried that the unspecs will degrade
> performance (relative to smin/smax) when -ffast-math *is* active (and
> this new builtin code and pattern doesn't blow up).

For fmin/fmax it would be fine, since they are transformed to {MAX,MIN}
EXPR in middle end, and yes, it can degrade for the bifs, although IMHO
the previous expansion to smin/smax contradicts the bif names (users
expect them to map to xs{min,max}dp rather than anything else).

> 
> I still think we should get RTL codes for this, to have access to proper
> floating point min/max semantics always and everywhere.  "fmin" and
> "fmax" seem to be good names :-)

It would be good, especially if we have observed some uses of these bifs
and further opportunities around them.  :)
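
To make the semantic difference under discussion concrete, here is a
minimal C sketch, assuming only the C99 rule that fmin returns the
non-NaN operand when exactly one operand is a NaN.  toy_fmin and
toy_naive_min are hypothetical helpers written for illustration; they
are not the rs6000 patterns and make no claim about what xs{min,max}dp
does in hardware.

```c
#include <math.h>     /* Only for the NAN macro.  */
#include <stdbool.h>

/* Portable NaN test that needs no libm call.  */
static bool
toy_isnan (double x)
{
  return x != x;
}

/* C99-style fmin: if exactly one operand is a NaN, return the other.  */
static double
toy_fmin (double a, double b)
{
  if (toy_isnan (a))
    return b;
  if (toy_isnan (b))
    return a;
  return a < b ? a : b;
}

/* Naive compare-and-select minimum.  With a NaN operand the result
   depends on operand order, which is one behaviour a min operation
   may show when NaN handling is left open.  */
static double
toy_naive_min (double a, double b)
{
  return a < b ? a : b;
}
```

With these helpers, toy_fmin (NAN, 1.0) and toy_fmin (1.0, NAN) both
give 1.0, while toy_naive_min (1.0, NAN) gives a NaN.  The two only
agree once NaNs are excluded, as with -ffinite-math-only.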

BR,
Kewen


[PATCH] rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

2022-09-21 Thread Kewen.Lin via Gcc-patches
Hi,

As PR96072 shows, the code adding the REG_CFA_DEF_CFA reg note
assumes that we have previously emitted an insn which
restores the frame pointer.  That insn used to be guarded
with flag frame_pointer_needed as well, which was consistent,
but since commit r10-7981 it has been guarded with flag
frame_pointer_needed_indeed instead.  The mismatch caused an
ICE due to an unexpected NULL insn.  This patch makes the
conditions consistent.
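
The failure mode can be modelled with a tiny standalone C sketch.  It
is only an illustration under assumed names: struct toy_insn,
toy_emit_restore and toy_note_is_safe are hypothetical and do not
exist in rs6000-logue.cc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for an emitted instruction; hypothetical.  */
struct toy_insn
{
  int id;
};

/* Producer: the "restore frame pointer" insn is only emitted under
   the stricter flag.  */
static struct toy_insn *
toy_emit_restore (bool needed_indeed, struct toy_insn *storage)
{
  if (!needed_indeed)
    return NULL;
  storage->id = 1;
  return storage;
}

/* Consumer: report whether attaching a note to LAST is safe when the
   consumer is guarded by CONSUMER_GUARD.  */
static bool
toy_note_is_safe (bool consumer_guard, const struct toy_insn *last)
{
  if (!consumer_guard)
    return true;          /* No note is attached at all.  */
  return last != NULL;    /* Attaching to a NULL insn would crash.  */
}
```

When the looser flag is true but the stricter one is false, the
producer returns NULL while a consumer guarded by the looser flag
still tries to use it.  Guarding both sides with the same flag removes
that case.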

Bootstrapped and regtested on powerpc64-linux-gnu P7 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-
PR target/96072

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr96072.c: New test.
---
 gcc/config/rs6000/rs6000-logue.cc  |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr96072.c | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96072.c

diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index 51f55d1d527..41daf6ee646 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4956,7 +4956,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
 a REG_CFA_DEF_CFA note, but that's OK;  A duplicate is
 discarded by dwarf2cfi.cc/dwarf2out.cc, and in any case would
 be harmless if emitted.  */
-  if (frame_pointer_needed)
+  if (frame_pointer_needed_indeed)
{
  insn = get_last_insn ();
  add_reg_note (insn, REG_CFA_DEF_CFA,
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96072.c 
b/gcc/testsuite/gcc.target/powerpc/pr96072.c
new file mode 100644
index 000..23d1cc74ffd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96072.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O1" } */
+
+/* Verify there is no ICE on 32 bit environment.  */
+
+void
+he (int jn)
+{
+  {
+int bh[jn];
+if (jn != 0)
+  goto wa;
+  }
+wa:;
+}
--
2.27.0


[PATCH] rs6000: Fix condition of define_expand vec_shr_ [PR100645]

2022-09-21 Thread Kewen.Lin via Gcc-patches
Hi,
 
PR100645 exposes one latent bug in define_expand vec_shr_:
the current condition TARGET_ALTIVEC is too loose.  The
mode iterator VEC_L contains a few modes which are not always
supported as vector modes, so VECTOR_UNIT_ALTIVEC_OR_VSX_P
should be used, as in some other VEC_L usages.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push it a week later if no objections.

BR,
Kewen
-
PR target/100645

gcc/ChangeLog:

* config/rs6000/vector.md (vec_shr_): Replace condition
TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr100645.c: New test.
---
 gcc/config/rs6000/vector.md |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr100645.c | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100645.c

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index a0d33d2f604..0171705803c 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1475,7 +1475,7 @@ (define_expand "vec_shr_"
   [(match_operand:VEC_L 0 "vlogical_operand")
(match_operand:VEC_L 1 "vlogical_operand")
(match_operand:QI 2 "reg_or_short_operand")]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
 {
   rtx bitshift = operands[2];
   rtx shift;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100645.c 
b/gcc/testsuite/gcc.target/powerpc/pr100645.c
new file mode 100644
index 000..e221287c0f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100645.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-mdejagnu-cpu=power6 -maltivec" } */
+
+/* It's to verify no ICE here.  */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+v2di
+foo_v2di_l (v2di x)
+{
+  return __builtin_shuffle ((v2di){0, 0}, x, (v2di){3, 0});
+}
+
--
2.27.0


Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-21 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/6/24 10:02, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max. So the builtins always generate xs[min/max]dp on all
> platforms.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-06-24 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * config/rs6000/rs6000.md (FMINMAX): New.
>   (minmax_op): New.
>   (f3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.

Nit: UNSPEC_FMAX and UNSPEC_FMIN are missing here.

>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set
>   pattern to fmaxdf3.
>   (__builtin_vsx_xsmindp): Set pattern to fmindf3.
> 
> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/powerpc/pr103605.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f4a9f24bcc5..8b735493b40 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,10 +1613,10 @@
>  XSCVSPDP vsx_xscvspdp {}
> 
>const double __builtin_vsx_xsmaxdp (double, double);
> -XSMAXDP smaxdf3 {}
> +XSMAXDP fmaxdf3 {}
> 
>const double __builtin_vsx_xsmindp (double, double);
> -XSMINDP smindf3 {}
> +XSMINDP fmindf3 {}
> 
>const double __builtin_vsx_xsrdpi (double);
>  XSRDPI vsx_xsrdpi {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bf85baa5370..ae0dd98f0f9 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +
> +(define_insn "f3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +  FMINMAX))]
> +  "TARGET_VSX && !flag_finite_math_only"
> +  "xsdp %x0,%x1,%x2"
> +  [(set_attr "type" "fp")]
> +)
> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> new file mode 100644
> index 000..1c938d40e61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */

Nit: This dg-do line isn't needed.  OK with or without the two nits fixed.  Thanks!

BR,
Kewen


Re: [PATCH v2] Handle OPAQUE_TYPE specially in verify_type [PR106833]

2022-09-09 Thread Kewen.Lin via Gcc-patches
on 2022/9/9 15:25, Richard Biener wrote:
> On Fri, Sep 9, 2022 at 8:51 AM Kewen.Lin  wrote:
>>
>> Hi Richi,
>>
>> Thanks for the review comments!
>>
>> on 2022/9/8 15:36, Richard Biener wrote:
>>>
>>>
>>>> On 08.09.2022 at 07:53, Kewen.Lin wrote:
>>>>
>>>> Hi,
>>>>
>>>> As PR106833 shows, cv-qualified opaque type can cause ICE
>>>> during LTO.  It exposes that we missed handling OPAQUE_TYPE
>>>> well in type verification.  As Richi pointed out, also
>>>> assuming that target will always define TYPE_MAIN_VARIANT
>>>> and TYPE_CANONICAL for opaque type, this patch is to check
>>>> both are OPAQUE_TYPE_P.  Besides, it also checks the only
>>>> available size and alignment information as well as type
>>>> mode for TYPE_MAIN_VARIANT.

>> ...
>>>> +
>>>> +  if (t != tv)
>>>> +{
>>>> +  verify_match (TREE_CODE, t, tv);
>>>> +  verify_match (TYPE_MODE, t, tv);
>>>> +  verify_match (TYPE_SIZE, t, tv);
>>>
>>> TYPE_SIZE is a tree, you should probably
>>> Compare this with operand_equal_p.  It’s
>>> Not documented to be a constant size?
>>> Thus some VLA vector mode might be allowed ( a poly_int size),
>>
>> Thanks for catching, I was referencing the code in function
>> verify_type_variant, that corresponding part seems imperfect:
>>
>>   if (TREE_CODE (TYPE_SIZE (t)) != PLACEHOLDER_EXPR
>>   && TREE_CODE (TYPE_SIZE (tv)) != PLACEHOLDER_EXPR)
>> verify_variant_match (TYPE_SIZE);
>>
>> I agree poly_int size is allowed, the patch was updated for it.
>>
>> BLKmode
>>> Is ruled out(?),
>>
>> Yes, it requires a mode of MODE_OPAQUE class.
>>
>> the docs say we have
>>> ‚An MODE_Opaque‘ here but I don’t see
>>> This being verified?
>>>
>>
>> There is a MODE equality check, I assumed the given t already
>> has one MODE_OPAQUE mode, but the patch was updated to make
>> it explicit as you concerned.
>>
>>> The macro makes this a bit unworldly
>>> For the only benefit of elaborate diagnostic
>>> Which I think isn’t really necessary
>>
>> OK, fixed!
>>
>> The previous version makes just one check on TYPE_CANONICAL to
>> be cheap as gimple_canonical_types_compatible_p said, but
>> since there are just several fields to be check, this updated
>> version adjusted it to be the same as what's for TYPE_MAIN_VARIANT.
>> Hope it's fine. :)
> 
> I think we'll call verify_type on the main variant as well so that would be
> redundant (ensured by transitivity), can you check?

I just checked and found that we don't always call verify_type
on the main variant.  For example, with one case like:

__attribute__((noipa))
int foo(c){
  return 0;
}

int main ()
{
  const __vector_quad c;
  int r = foo(c);
  return r;
}

Checking during LTO WPA, verify_type only gets type "const
__vector_quad", no type "__vector_quad".

btw, it needs some hacking in rs6000_function_arg to make this
opaque type valid for function arg.

> 
>> Tested as before.
>>
>> Does this updated patch look good to you?
> 
> Yes, please remove the checks against the main variant if the above holds,
> OK with or without that change depending on this outcome.
> 

Committed in r13-2562, thanks!

BR,
Kewen


[PATCH v2] Handle OPAQUE_TYPE specially in verify_type [PR106833]

2022-09-08 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for the review comments!

on 2022/9/8 15:36, Richard Biener wrote:
> 
> 
>> On 08.09.2022 at 07:53, Kewen.Lin wrote:
>>
>> Hi,
>>
>> As PR106833 shows, cv-qualified opaque type can cause ICE
>> during LTO.  It exposes that we missed handling OPAQUE_TYPE
>> well in type verification.  As Richi pointed out, also
>> assuming that target will always define TYPE_MAIN_VARIANT
>> and TYPE_CANONICAL for opaque type, this patch is to check
>> both are OPAQUE_TYPE_P.  Besides, it also checks the only
>> available size and alignment information as well as type
>> mode for TYPE_MAIN_VARIANT.
>>
...
>> +
>> +  if (t != tv)
>> +{
>> +  verify_match (TREE_CODE, t, tv);
>> +  verify_match (TYPE_MODE, t, tv);
>> +  verify_match (TYPE_SIZE, t, tv);
> 
> TYPE_SIZE is a tree, you should probably 
> Compare this with operand_equal_p.  It’s 
> Not documented to be a constant size?
> Thus some VLA vector mode might be allowed ( a poly_int size), 

Thanks for catching, I was referencing the code in function
verify_type_variant, that corresponding part seems imperfect:

  if (TREE_CODE (TYPE_SIZE (t)) != PLACEHOLDER_EXPR
  && TREE_CODE (TYPE_SIZE (tv)) != PLACEHOLDER_EXPR)
verify_variant_match (TYPE_SIZE);

I agree poly_int size is allowed, the patch was updated for it.
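
The point about comparing trees by value rather than by pointer can be
shown with a small standalone C sketch.  It is not operand_equal_p and
does not use GCC's tree type; struct toy_node and both helpers are
hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a size expression node.  */
struct toy_node
{
  int code;     /* Kind of node.  */
  long value;   /* Payload, e.g. a size in bits.  */
};

/* What a plain "flag (t) != flag (tv)" style check compares when the
   accessor returns a pointer.  */
static bool
toy_same_pointer (const struct toy_node *a, const struct toy_node *b)
{
  return a == b;
}

/* Structural comparison: two distinct nodes with the same contents
   are treated as equal.  */
static bool
toy_structurally_equal (const struct toy_node *a, const struct toy_node *b)
{
  if (a == b)
    return true;
  if (a == NULL || b == NULL)
    return false;
  return a->code == b->code && a->value == b->value;
}
```

Two separately built nodes describing the same size fail the pointer
check but pass the structural one, which is why a pointer comparison
alone can report a mismatch that isn't there.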

BLKmode
> Is ruled out(?), 

Yes, it requires a mode of MODE_OPAQUE class.

the docs say we have
> ‚An MODE_Opaque‘ here but I don’t see
> This being verified?
> 

There is a MODE equality check; I assumed the given t already
has a MODE_OPAQUE mode, but the patch was updated to make
it explicit, to address your concern.

> The macro makes this a bit unworldly
> For the only benefit of elaborate diagnostic
> Which I think isn’t really necessary

OK, fixed!

The previous version made just one check on TYPE_CANONICAL to
be cheap as gimple_canonical_types_compatible_p said, but
since there are just several fields to check, this updated
version makes it the same as what's done for TYPE_MAIN_VARIANT.
Hope it's fine. :)

Tested as before.

Does this updated patch look good to you?

BR,
Kewen
--
From 4a905fcb2abcc4e488d90011dd2c2125fb9e14b2 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Thu, 8 Sep 2022 21:34:29 -0500
Subject: [PATCH] Handle OPAQUE_TYPE specially in verify_type [PR106833]

As PR106833 shows, cv-qualified opaque type can cause ICE
during LTO.  It exposes that we missed handling OPAQUE_TYPE
well in type verification.  As Richi pointed out, also
assuming that target will always define TYPE_MAIN_VARIANT
and TYPE_CANONICAL for opaque type, this patch is to check
both are OPAQUE_TYPE_P and their modes are of MODE_OPAQUE
class.  Besides, it also checks the only available size
and alignment information.

PR middle-end/106833

gcc/ChangeLog:

* tree.cc (verify_opaque_type): New function.
(verify_type): Call verify_opaque_type for OPAQUE_TYPE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106833.c: New test.
---
 gcc/testsuite/gcc.target/powerpc/pr106833.c | 14 
 gcc/tree.cc | 74 -
 2 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106833.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr106833.c 
b/gcc/testsuite/gcc.target/powerpc/pr106833.c
new file mode 100644
index 000..968d75184ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106833.c
@@ -0,0 +1,14 @@
+/* { dg-do link } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -mdejagnu-cpu=power10" } */
+
+/* Verify there is no ICE in LTO mode.  */
+
+int main ()
+{
+  float *b;
+  const __vector_quad c;
+  __builtin_mma_disassemble_acc (b, &c);
+  return 0;
+}
diff --git a/gcc/tree.cc b/gcc/tree.cc
index fed1434d141..b755cd5083a 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -13670,6 +13670,71 @@ gimple_canonical_types_compatible_p (const_tree t1, 
const_tree t2,
 }
 }
 
+/* For OPAQUE_TYPE T, it should have only size and alignment information
+   and its mode should be of class MODE_OPAQUE.  This function verifies
+   these properties of T match TV which is the main variant of T and TC
+   which is the canonical of T.  */
+
+static void
+verify_opaque_type (const_tree t, tree tv, tree tc)
+{
+  gcc_assert (OPAQUE_TYPE_P (t));
+  gcc_assert (tv && tv == TYPE_MAIN_VARIANT (tv));
+  gcc_assert (tc && tc == TYPE_CANONICAL (tc));
+
+  /* For an opaque type T1, check if some of its properties match
+ the corresponding ones of the other opaque type T2, emit some
+ error messages for those inconsistent ones.  */
+  auto check_properties_for_opaque_type = [](const_tree t1, tree t2,
+const char *kind_msg)
+  {
+if (!OPAQUE_TYPE_P (t2))
+  {
+   error ("type %s is not an opaque type", kind_msg);
+   debug_tree (t2);
+   return;
+  }
+if (!OPAQUE_MODE_P (TYPE

[PATCH] Handle OPAQUE_TYPE specially in verify_type [PR106833]

2022-09-07 Thread Kewen.Lin via Gcc-patches
Hi,

As PR106833 shows, cv-qualified opaque type can cause ICE
during LTO.  It exposes that we missed handling OPAQUE_TYPE
well in type verification.  As Richi pointed out, also
assuming that target will always define TYPE_MAIN_VARIANT
and TYPE_CANONICAL for opaque type, this patch is to check
both are OPAQUE_TYPE_P.  Besides, it also checks the only
available size and alignment information as well as type
mode for TYPE_MAIN_VARIANT.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-
PR middle-end/106833

gcc/ChangeLog:

* tree.cc (verify_opaque_type): New function.
(verify_match): New macro.
(verify_type): Call verify_opaque_type for OPAQUE_TYPE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106833.c: New test.
---
 gcc/testsuite/gcc.target/powerpc/pr106833.c | 14 +++
 gcc/tree.cc | 45 -
 2 files changed, 58 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106833.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr106833.c b/gcc/testsuite/gcc.target/powerpc/pr106833.c
new file mode 100644
index 000..968d75184ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106833.c
@@ -0,0 +1,14 @@
+/* { dg-do link } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target lto } */
+/* { dg-options "-flto -mdejagnu-cpu=power10" } */
+
+/* Verify there is no ICE in LTO mode.  */
+
+int main ()
+{
+  float *b;
+  const __vector_quad c;
+  __builtin_mma_disassemble_acc (b, &c);
+  return 0;
+}
diff --git a/gcc/tree.cc b/gcc/tree.cc
index fed1434d141..e67caa8f85d 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -13670,6 +13670,42 @@ gimple_canonical_types_compatible_p (const_tree t1, const_tree t2,
 }
 }

+/* For OPAQUE_TYPE T, it has only size and alignment information, so verify
+   these properties of T match TV which is the main variant of T.  Also
+   verify the type of TC, which is the canonical of T, is OPAQUE_TYPE.  */
+
+static void
+verify_opaque_type (const_tree t, tree tv, tree tc)
+{
+  gcc_assert (OPAQUE_TYPE_P (t));
+  gcc_assert (tv && tv == TYPE_MAIN_VARIANT (tv));
+  gcc_assert (tc && tc == TYPE_CANONICAL (tc));
+
+#define verify_match(flag, t, tv)                       \
+  do                                                    \
+    {                                                   \
+      if (flag (t) != flag (tv))                        \
+        {                                               \
+          error ("opaque type differs by %s", #flag);   \
+          debug_tree (tv);                              \
+        }                                               \
+    }                                                   \
+  while (false)
+
+  if (t != tv)
+{
+  verify_match (TREE_CODE, t, tv);
+  verify_match (TYPE_MODE, t, tv);
+  verify_match (TYPE_SIZE, t, tv);
+  verify_match (TYPE_ALIGN, t, tv);
+  verify_match (TYPE_USER_ALIGN, t, tv);
+}
+
+  if (t != tc)
+verify_match (TREE_CODE, t, tc);
+#undef verify_match
+}
+
 /* Verify type T.  */

 void
@@ -13677,6 +13713,14 @@ verify_type (const_tree t)
 {
   bool error_found = false;
   tree mv = TYPE_MAIN_VARIANT (t);
+  tree ct = TYPE_CANONICAL (t);
+
+  if (OPAQUE_TYPE_P (t))
+{
+  verify_opaque_type (t, mv, ct);
+  return;
+}
+
   if (!mv)
 {
   error ("main variant is not defined");
@@ -13691,7 +13735,6 @@ verify_type (const_tree t)
   else if (t != mv && !verify_type_variant (t, mv))
 error_found = true;

-  tree ct = TYPE_CANONICAL (t);
   if (!ct)
 ;
   else if (TYPE_CANONICAL (ct) != ct)
--
2.27.0



Re: [PATCH] Using pli(paddi) and rotate to build 64bit constants

2022-09-06 Thread Kewen.Lin via Gcc-patches
Hi!

> +
> +   /* Use paddi for the low 32 bits.  */
> +   if (ud2 != 0 && ud1 != 0 && can_use_paddi)
> + emit_move_insn (dest, gen_rtx_PLUS (DImode, dest,
> + GEN_INT ((ud2 << 16) | ud1)));
> +
> +   /* Use oris, ori for low 32 bits.  */
> +   if (ud2 != 0 && (ud1 == 0 || !can_use_paddi))
> + emit_move_insn (ud1 != 0 ? dest : dest,

Nit: "ud1 != 0 ? dest : dest" => dest

> + gen_rtx_IOR (DImode, dest, GEN_INT (ud2 << 16)));
> +   if (ud1 != 0 && (ud2 == 0 || !can_use_paddi))
> + emit_move_insn (dest, gen_rtx_IOR (DImode, dest, GEN_INT (ud1)));
> + }
> +}
>else
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> new file mode 100644
> index 000..d023fac4676
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> @@ -0,0 +1,14 @@
> +/* PR target/106550 */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target power10_ok } */
> +
> +void
> +foo (unsigned long long *a)
> +{
> +  *a++ = 0x020805006106003; /* pli+pli+rldimi */
> +  *a++ = 0x2351847027482577;/* pli+pli+rldimi */  
> +}
> +
> +/* { dg-final { scan-assembler-times {\mpli\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
> new file mode 100644
> index 000..48f76ca3da9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
> @@ -0,0 +1,22 @@
> +/* PR target/106550 */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */
> +/* force the constant splitter run after RA: -fdisable-rtl-split1.  */
> +
> +void
> +foo (unsigned long long *a)
> +{
> +  /* Test oris/ori is used where paddi does not work with 'r0'. */
> +  register long long d asm("r0") = 0x1245abcef9240dec; /* pli+sldi+oris+ori */
> +  long long n;
> +  asm("cntlzd %0, %1" : "=r"(n) : "r"(d));
> +  *a++ = n;
> +
> +  *a++ = 0x235a8470a748ULL; /* pli+sldi+oris*/
> +  *a++ = 0x23a18470b677ULL; /* pli+sldi+ori*/

Nit: I guess you want a space before the closing "*/" of these two comments,
since the comments elsewhere have one.  :)

BR,
Kewen


Re: [PATCH v2] rs6000/test: Fix empty TU in some cases of effective targets [PR106345]

2022-09-05 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Gentle ping on this patch, as you wanted the empty TU issue to be fixed
first in the discussion [1].  Thanks in advance for your time!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600927.html

BR,
Kewen

on 2022/7/25 14:26, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As the failure of test case gcc.target/powerpc/pr92398.p9-.c in
> PR106345 shows, the test sources for some powerpc effective
> targets wrongly use an empty translation unit.  The test sources
> can be compiled with options like "-ansi -pedantic-errors", and then
> those effective target checks fail unexpectedly with
> error messages like:
> 
>   error: ISO C forbids an empty translation unit [-Wpedantic]
> 
> This patch is to fix empty TUs with one dummy function definition
> accordingly.
> 
> Besides fixing the failures on gcc.target/powerpc/pr92398.p9-.c,
> I can see it helps to bring back some testing coverage like:
> 
> NA->PASS: gcc.target/powerpc/pr92398.p9+.c
> NA->PASS: gcc.target/powerpc/pr93453-1.c
> 
> Tested as before.
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598602.html
> v2: Use dummy function instead of dummy int as Segher suggested.
> 
> Segher, does this v2 look good to you?
> 
> BR,
> Kewen
> -
>   PR testsuite/106345
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp (check_effective_target_powerpc_sqrt): Add
>   a function definition to avoid pedwarn about empty translation unit.
>   (check_effective_target_has_arch_pwr5): Likewise.
>   (check_effective_target_has_arch_pwr6): Likewise.
>   (check_effective_target_has_arch_pwr7): Likewise.
>   (check_effective_target_has_arch_pwr8): Likewise.
>   (check_effective_target_has_arch_pwr9): Likewise.
>   (check_effective_target_has_arch_pwr10): Likewise.
>   (check_effective_target_has_arch_ppc64): Likewise.
>   (check_effective_target_ppc_float128): Likewise.
>   (check_effective_target_ppc_float128_insns): Likewise.
>   (check_effective_target_powerpc_vsx): Likewise.
> ---
>  gcc/testsuite/lib/target-supports.exp | 33 +++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 4ed7b25b9a4..06484330178 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6259,9 +6259,12 @@ proc check_effective_target_powerpc_sqrt { } {
>  }
> 
>  return [check_no_compiler_messages powerpc_sqrt object {
> + void test (void)
> + {
>   #ifndef _ARCH_PPCSQ
>   #error _ARCH_PPCSQ is not defined
>   #endif
> + }
>  } {}]
>  }
> 
> @@ -6369,71 +6372,92 @@ proc check_effective_target_powerpc_p9modulo_ok { } {
>  # as provided by the test.
>  proc check_effective_target_has_arch_pwr5 { } {
>   return [check_no_compiler_messages_nocache arch_pwr5 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR5
>   #error does not have power5 support.
>   #else
>   /* "has power5 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr6 { } {
>   return [check_no_compiler_messages_nocache arch_pwr6 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR6
>   #error does not have power6 support.
>   #else
>   /* "has power6 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr7 { } {
>   return [check_no_compiler_messages_nocache arch_pwr7 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR7
>   #error does not have power7 support.
>   #else
>   /* "has power7 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr8 { } {
>   return [check_no_compiler_messages_nocache arch_pwr8 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR8
>   #error does not have power8 support.
>   #else
>   /* "has power8 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr9 { } {
>   return [check_no_compiler_messages_nocache arch

Re: [PATCH] rs6000: Don't ICE when we disassemble an MMA variable [PR101322]

2022-09-05 Thread Kewen.Lin via Gcc-patches
on 2022/9/1 22:17, Peter Bergner wrote:
> On 9/1/22 3:29 AM, Kewen.Lin wrote:
>>> I have no idea why ptr_vector_*_type would behave differently here than
>>> build_pointer_type (vector_*_type_node).  Using the build_pointer_type()
>>> fixed it for me, so that's why I went with it. :-)  Maybe this is a bug
>>> in lto???
>>
>> Thanks for your time to reproduce this!
>>
>> The only difference is that ptr_vector_*_type are built from the
>> qualified_type based on vector_*_type_node, instead of directly from
>> vector_*_type_node.  I'm interested to have a further look at this later.
> 
> If you look into this, please let me know.  I'd like to know what you
> find out.

I just filed PR106833 for it.

BR,
Kewen


Re: [PATCH] rs6000/test: Fix bswap64-4.c with has_arch_ppc64 [PR106680]

2022-09-04 Thread Kewen.Lin via Gcc-patches
on 2022/9/3 01:44, Segher Boessenkool wrote:
> On Fri, Sep 02, 2022 at 08:51:01AM +0800, Kewen.Lin wrote:
>> on 2022/9/1 23:04, Segher Boessenkool wrote:
>>> On Thu, Sep 01, 2022 at 05:05:44PM +0800, Kewen.Lin wrote:
>>>> Without any explicit -mpowerpc64 (and -mno-), I think we all agree
>>>> that -m64 should set OPTION_MASK_POWERPC64 in opts, conversely -m32
>>>> should unset OPTION_MASK_POWERPC64 in opts.
>>>
>>> The latter only for OSes that do not handle -mpowerpc64 correctly.
>>
>> I think it's the same for the OSes that handle -mpowerpc64 correctly.
> 
> No.  -m32 should not set or unset POWERPC64.  The two options are
> independent.
> 
> -m64 on the other hand forces POWERPC64 to on.  -m64 -mno-powerpc64 is
> invalid (and we do indeed error on that).  But we do allow
>   -m32 -mno-powerpc64 -m64
> (silently enabling it again), urgh.

I just realized the discussion here depends on how we implement it;
I can understand what you mean now.  Yes, if we implemented it like the
handling of other options in rs6000_option_override_internal, we would only
see -m32 or -m64 eventually; we wouldn't need to do anything for -m32 but
would need to force POWERPC64 on for -m64 if it's not set in opts_set.  The
current implementation, setting and unsetting it during command line option
handling, is bad: it makes us have to set/unset along the way.

> 
>>
>> Note that it's for the context without any explicit -mpowerpc64 (and
>> -mno-), assuming we don't "unset OPTION_MASK_POWERPC64 in opts" for
>> -m32, then the command line "-m64 -m32" would not be the same as
>> "-m32", since the previous "-m64" sets OPTION_MASK_POWERPC64 in opts
>> and it's still kept, it's unexpected.
> 
> No.  -m64 -m32 does not set POWERPC64!  Or it shouldn't, in any case :-(
> 

Yeah, (... does not set/unset), thanks again for your clarification. :)

BR,
Kewen


Re: [PATCH] rs6000/test: Fix bswap64-4.c with has_arch_ppc64 [PR106680]

2022-09-04 Thread Kewen.Lin via Gcc-patches
on 2022/9/3 01:36, Segher Boessenkool wrote:
> On Fri, Sep 02, 2022 at 08:50:52AM +0800, Kewen.Lin wrote:
>> on 2022/9/1 22:57, Segher Boessenkool wrote:
>>> These two are independent, but apparently we have a bug here, which will
>>> make what you did malfunction in some cases -- the test will not run for
>>> ilp32 if you have RUNTESTFLAGS {-m32,-m64}.
>>
>> Yeah, because of the bug (or call it surprised behavior),
> 
> No, I call it a bug.  Because that is what it is!
> 

OK. :)

>> the test case can
>> fail for some dejaGnu version like 1.5.1 (how it places the dg-options 
>> matters).
> 
> Yes, but that is only one way to expose the problem.
> 
> The bug just should be fixed.

Agreed.

> 
>> But to be clarified, the order of 
>>
>>   /* { dg-options "-O2 -mpowerpc64" } */
>>
>> and 
>>   
>>   /* { dg-require-effective-target has_arch_ppc64 } */
>>
>> matters in this proposed fix, not for the line with ilp32.
> 
> Of course :-)
> 
>> has_arch_ppc64 uses current_compiler_flags which only incorporates dg-options
>> which is placed before the dg-require-effective-target.  I guess it's related
>> to how dejaGnu parses lines and sets global variables, for this kind of case,
>> we have to put the expected order for now.
> 
> Even just to avoid having to uselessly edit hundreds of testcases, it
> would be better to just fix the bug!

I think "the bug" here means the "-mpowerpc64" with "-m32/-m64" thing, not the
dejaGnu thing mentioned above; if so, then yes.

BR,
Kewen


Re: [PATCH v2, rs6000] Put dg-options before effective target checks

2022-09-01 Thread Kewen.Lin via Gcc-patches
on 2022/9/2 11:23, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> On 1/9/2022 5:34 PM, Kewen.Lin wrote:
>> Thanks for the updated patch!
>>
>> I just found that it seems all the three test cases suffer the empty
>> TU error issue from those has_arch* effective target checks?
>>
>> If yes, it looks we don't need to bother this once patch [1] gets
>> landed?
>>
>> Sorry, I didn't notice and ask when reviewing the previous version.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598748.html
> 
> Yes, those 3 test cases all suffer from the "empty translation unit" problem.
> My patch just has a side effect which avoids the "empty translation unit"
> problem. But the real problem is still there.

OK, thanks for the information!  If so, I would prefer to leave them
alone for now, the issues should be fixed once [1] gets landed.

> 
> pr92398.p9+.c has another problem. It's a compiling case and it should be
> compiled on any platform when "-mdejagnu-cpu=power9" is set in dg-options
> or RUNTESTFLAGS. Putting dg-options before "has_arch_pwr9" check achieves
> this target.

OK, then go ahead to enhance it separately.  :)

BR,
Kewen


Re: [PATCH 1/2] Using pli(paddi) and rotate to build 64bit constants

2022-09-01 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

Thanks for the patch, some comments on nits are inline.

on 2022/9/1 11:24, Jiufu Guo wrote:
> Hi,
> 
> As mentioned in PR106550, since pli supports a 34-bit immediate, we could
> use fewer instructions (3 insns would be ok) to build a 64-bit constant with pli.
> 
> For example, for constant 0x020805006106003, we could generate it with:
> asm code1:
> pli 9,101736451 (0x6106003)
> sldi 9,9,32
> paddi 9,9, 2130000 (0x0208050)
> 
> or asm code2:
> pli 10, 2130000
> pli 9, 101736451
> rldimi 9, 10, 32, 0
> 
> Testing with simple cases as below, run them a lot of times:
> f1.c
> long __attribute__ ((noinline)) foo (long *arg,long *,long*)
> {
>   *arg = 0x2351847027482577;
> }
> 5insns: base
> pli+sldi+paddi: similar -0.08%
> pli+pli+rldimi: faster +0.66%
> 
> f2.c
> long __attribute__ ((noinline)) foo (long *arg, long *arg2, long *arg3)
> {
>   *arg = 0x2351847027482577;
>   *arg2 = 0x3257845024384680;
>   *arg3 = 0x1245abcef9240dec;
> }
> 5insns: base
> pli+sldi+paddi: faster +1.35%
> pli+pli+rldimi: faster +5.49%
> 
> f2.c would be more meaningful.  Because 'sched passes' are effective for
> f2.c, but 'scheds' do less thing for f1.c.
> 
> Compared with the previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599525.html
> this one updates the code slightly and extracts the changes to rs6000.md into a
> separate patch.
> 
> This patch passes bootstrap and regtest on ppc64le (including p10).
> Is it ok for trunk?
> 
> BR,
> Jeff(Jiufu)
> 
> 
>   PR target/106550
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add 'pli' for
>   constant building.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr106550.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc | 39 +
>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 14 
>  2 files changed, 53 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..1ccb2ff30a1 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10181,6 +10181,45 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>   gen_rtx_IOR (DImode, copy_rtx (temp),
>GEN_INT (ud1)));
>  }
> +  else if (TARGET_PREFIXED)
> +{
> +  /* pli 9,high32 + pli 10,low32 + rldimi 9,10,32,0.  */
> +  if (can_create_pseudo_p ())
> + {
> +   temp = gen_reg_rtx (DImode);
> +   rtx temp1 = gen_reg_rtx (DImode);
> +   emit_move_insn (copy_rtx (temp), GEN_INT ((ud4 << 16) | ud3));
> +   emit_move_insn (copy_rtx (temp1), GEN_INT ((ud2 << 16) | ud1));
> +

Nit: copy_rtx seems unnecessary here, as both temp and temp1 have code REG,
and copy_rtx returns the given rtx unchanged for code REG.

> +   emit_insn (gen_rotldi3_insert_3 (dest, temp, GEN_INT (32), temp1,
> +GEN_INT (0x)));
> + }
> +
> +  /* pli 9,high32 + sldi 9,32 + paddi 9,9,low32.  */
> +  else
> + {
> +   emit_move_insn (copy_rtx (dest), GEN_INT ((ud4 << 16) | ud3));
> +
> +   emit_move_insn (copy_rtx (dest),
> +   gen_rtx_ASHIFT (DImode, copy_rtx (dest),
> +   GEN_INT (32)));
> +
> +   bool can_use_paddi = REGNO (dest) != FIRST_GPR_REGNO;
> +

The use of REGNO already assumes that dest has code REG.  If that's always true,
I don't see why we need copy_rtx around it.  Or am I missing something?

> +   /* Use paddi for the low32 bits.  */
> +   if (ud2 != 0 && ud1 != 0 && can_use_paddi)
> + emit_move_insn (dest, gen_rtx_PLUS (DImode, copy_rtx (dest),
> + GEN_INT ((ud2 << 16) | ud1)));
> +   /* Use oris, ori for low32 bits.  */
> +   if (ud2 != 0 && (ud1 == 0 || !can_use_paddi))
> + emit_move_insn (ud1 != 0 ? copy_rtx (dest) : dest,
> + gen_rtx_IOR (DImode, copy_rtx (dest),
> +  GEN_INT (ud2 << 16)));
> +   if (ud1 != 0 && (ud2 == 0 || !can_use_paddi))
> + emit_move_insn (dest, gen_rtx_IOR (DImode, copy_rtx (dest),
> +GEN_INT (ud1)));
> + }
> +}
>else
>  {
>temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> new file mode 100644
> index 000..c6f4116bb9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> @@ -0,0 +1,14 @@
> +/* PR target/106550 */
> +/* { dg-options "-O2 -std=c99 -mdejagnu-cpu=power10" } */
> +

Need to check power10_ok, like:
/* { dg-require-effective-target power10_ok } */

Nit: -std=c99 is not needed?

BR,
Kewen


Re: [PATCH] rs6000/test: Fix bswap64-4.c with has_arch_ppc64 [PR106680]

2022-09-01 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/9/1 23:04, Segher Boessenkool wrote:
> On Thu, Sep 01, 2022 at 05:05:44PM +0800, Kewen.Lin wrote:
>>> On Wed, Aug 31, 2022 at 05:33:28PM +0800, Kewen.Lin wrote:
>>> *Should* -mpowerpc64  be disabled by -m32?  
>>
>> I think the reason to disable -mpowerpc64 at -m32 is that we have
>> -mpowerpc64 explicitly specified at -m64 (equivalent behavior).
> 
> *Im*plicitly.  Explicit means the user has it on the command line.
> 

aha, let me reword it. :)  ... is that when -m64 is specified we make
it act like -mpowerpc64 is specified explicitly too even if user doesn't
actually specify -mpowerpc64.

>> In the current implementation, when -m64 is specified, we set the
>> bit OPTION_MASK_POWERPC64 in both opts and opts_set.  Since we
>> set OPTION_MASK_POWERPC64 in opts_set for -m64, when we find the
>> OPTION_MASK_POWERPC64 is ON in opts_set, we don't know if there
>> is one actual cmd-line option -mpowerpc64 or just -m64.
> 
> Yes.  That is what _explicit is for :-)
> 
>> Without any explicit -mpowerpc64 (and -mno-), I think we all agree
>> that -m64 should set OPTION_MASK_POWERPC64 in opts, conversely -m32
>> should unset OPTION_MASK_POWERPC64 in opts.
> 
> The latter only for OSes that do not handle -mpowerpc64 correctly.

I think it's the same for the OSes that handle -mpowerpc64 correctly.

Note that it's for the context without any explicit -mpowerpc64 (and
-mno-), assuming we don't "unset OPTION_MASK_POWERPC64 in opts" for
-m32, then the command line "-m64 -m32" would not be the same as
"-m32", since the previous "-m64" sets OPTION_MASK_POWERPC64 in opts
and it's still kept, it's unexpected.

> 
>> To make -m32/-m64 and -mpowerpc64 orthogonal, IMHO we should not
>> set bit OPTION_MASK_POWERPC64 in opts_set for -m64.
> 
> No.  Instead, we should not touch it if the user has explicitly set it
> or unset it.  Just like with all other flags :-)

I may miss something, but I think what we said here is consistent.
"should not set bit OPTION_MASK_POWERPC64 in opts_set" means we should
not make it act as -mpowerpc64 is specified explicitly, (once we won't
do the "unexpected" thing for -m64, then no reason to unset it for -m32
conversely, so explicit set/unset -mpowerpc64 is independent of -m32/-m64). 

BR,
Kewen


Re: [PATCH] rs6000/test: Fix bswap64-4.c with has_arch_ppc64 [PR106680]

2022-09-01 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2022/9/1 22:57, Segher Boessenkool wrote:
> On Thu, Sep 01, 2022 at 04:57:59PM +0800, Kewen.Lin wrote:
>> on 2022/8/31 22:13, Peter Bergner wrote:
>>> On 8/31/22 4:33 AM, Kewen.Lin wrote:
>>>> @@ -1,7 +1,8 @@
>>>>  /* { dg-do compile { target { powerpc*-*-* } } } */
>>>>  /* { dg-skip-if "" { powerpc*-*-aix* } } */
>>>> -/* { dg-options "-O2 -mpowerpc64" } */
>>>>  /* { dg-require-effective-target ilp32 } */
>>>> +/* { dg-options "-O2 -mpowerpc64" } */
>>>> +/* { dg-require-effective-target has_arch_ppc64 } */
>>>
>>> With many of our recent patches moving the dg-options before any
>>> dg-require-effective-target so it affects the results of the
>>> dg-require-effective-target test, this looks like it's backwards
>>> from that process.  I understand why, so I think an explicit comment
>>> here in the test case explaining why it's after in this case would help.
>>> Just so in a few years when we come back to this test case, we
>>> won't accidentally undo this change.
>>
>> Oops, the diff shows it's like "after", but it's actually still "before". :)
>> The dg-options is meant to be placed before the succeeding has_arch_ppc64
>> effective target which is supposed to use dg-options to compile.  I felt
>> good to let ilp32 checking go first then has_arch_ppc64, so moved dg-option
>> downward.
> 
> These two are independent, but apparently we have a bug here, which will
> make what you did malfunction in some cases -- the test will not run for
> ilp32 if you have RUNTESTFLAGS {-m32,-m64}.

Yeah, because of the bug (or call it surprising behavior), the test case can
fail with some dejaGnu versions like 1.5.1 (where it places the dg-options matters).
What I proposed is to detect this kind of test environment with has_arch_ppc64
and turn the failure into unsupported, so that the test case can survive with
any dejaGnu version.  But based on the discussions, I'd like to try to fix
the bug first and abandon this testing fix.

> 
> It should not make a difference, -mpowerpc64 and -m32 should be wholly
> independent, and their order should not matter.  So the order of the
>   /* { dg-require-effective-target ilp32 } */
>   /* { dg-options "-O2 -mpowerpc64" } */
> lines should not make a difference either.  But it does :-(
> 

I agree with the point that the order of lines should not make a difference.  :)
But to clarify, the order of 

  /* { dg-options "-O2 -mpowerpc64" } */

and 
  
  /* { dg-require-effective-target has_arch_ppc64 } */

matters in this proposed fix, not for the line with ilp32.

has_arch_ppc64 uses current_compiler_flags, which only incorporates the dg-options
placed before the dg-require-effective-target.  I guess it's related
to how dejaGnu parses lines and sets global variables; for this kind of case
we have to keep the directives in the expected order for now.
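
To make the ordering concrete, here is a minimal sketch of a test header in
that expected order.  The directives are the ones discussed in this thread;
the function body (swap_bytes) is a hypothetical placeholder for illustration
and is not taken from bswap64-4.c.

```c
/* { dg-do compile } */
/* dg-options comes first so that has_arch_ppc64 sees -mpowerpc64
   through current_compiler_flags.  */
/* { dg-options "-O2 -mpowerpc64" } */
/* { dg-require-effective-target has_arch_ppc64 } */

/* Hypothetical test body, for illustration only.  */
unsigned long long
swap_bytes (unsigned long long x)
{
  return __builtin_bswap64 (x);
}
```

With the two directive lines swapped, the effective-target check would be
compiled without -mpowerpc64, as described above.
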

BR,
Kewen


Re: [PATCH v2, rs6000] Put dg-options before effective target checks

2022-09-01 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/9/1 13:30, HAO CHEN GUI wrote:
> Hi,
>   This patch changes the sequence of test directives for 3 test cases.
> Originally, these 3 cases got failed or unsupported on some platforms, as
> their effective target checks depend on compiling options.
> 

Thanks for the updated patch!

I just found that it seems all the three test cases suffer the empty
TU error issue from those has_arch* effective target checks?

If yes, it looks we don't need to bother this once patch [1] gets
landed?

Sorry, I didn't notice and ask when reviewing the previous version.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598748.html

BR,
Kewen

>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> 2022-08-31  Haochen Gui  
> 
> rs6000: Change the sequence of test directives for some test cases.  Put
> dg-options before effective target checks as those has_arch_* adopt
> current_compiler_flags in their checks and rely on compiling options to get an
> accurate check.  dg-options setting before dg-require-effective-target are
> added into current_compiler_flags, but not added if they're after.  So
> adjusting the location of dg-options makes the check more robust.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/pr92398.p9+.c: Put dg-options before effective
>   target check.  Replace lp64 check with has_arch_ppc64 and int128.
>   * gcc.target/powerpc/pr92398.p9-.c: Likewise.
>   * gcc.target/powerpc/pr93453-1.c: Put dg-options before effective
>   target check.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c 
> b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> index 72dd1d9a274..b4f5c7f4b82 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> @@ -1,6 +1,10 @@
> -/* { dg-do compile { target { lp64 && has_arch_pwr9 } } } */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +/* { dg-require-effective-target int128 } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
> -/* { dg-options "-O2 -mvsx" } */
> +/* The test case can be compiled on all platforms with compiling option
> +   -mdejagnu-cpu=power9.  */
> 
>  /* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c 
> b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
> index bd7fa98af51..4e6a8c8cb8e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
> @@ -1,6 +1,8 @@
> -/* { dg-do compile { target { lp64 && {! has_arch_pwr9} } } } */
> -/* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mvsx" } */
> +/* { dg-do compile { target { ! has_arch_pwr9 } } } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
>  /* { dg-final { scan-assembler-times {\mnot\M} 2 { xfail be } } } */
>  /* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { { {! 
> has_arch_pwr9} && has_arch_pwr8 } && be } } } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr93453-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
> index b396458ba12..6f4d899c114 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr93453-1.c
> @@ -1,5 +1,6 @@
> -/* { dg-do compile { target has_arch_ppc64 } } */
> +/* { dg-do compile } */
>  /* { dg-options "-mdejagnu-cpu=power6 -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
>  unsigned long load_byte_reverse (unsigned long *in)
>  {



Re: [PATCH] rs6000/test: Fix bswap64-4.c with has_arch_ppc64 [PR106680]

2022-09-01 Thread Kewen.Lin via Gcc-patches
Hi Segher and Peter,

Thanks a lot for your insightful comments on this.

I just read through all discussions and plan to give a
try as replied below.

on 2022/8/31 23:24, Segher Boessenkool wrote:
> On Wed, Aug 31, 2022 at 05:33:28PM +0800, Kewen.Lin wrote:
>> Test case bswap64-4.c suffers the issue as its comments:
>>
>> /* On some versions of dejagnu this test will fail when
>>biarch testing with RUNTESTFLAGS="--target_board=unix
>>'{-m64,-m32}'" due to -m32 being added on the command
>>line after the dg-options -mpowerpc64.
>>common/config/rs6000/rs6000-common.c:
>>rs6000_handle_option disables -mpowerpc64 for -m32.  */
>>
>> As tested, on test machine with dejaGnu 1.6.2, the compilation
>> option order looks like: -m32 ... -mpowerpc64, option
>> -mpowerpc64 still takes effect;  While on test machine with
>> dejaGnu 1.5.1, the option order looks like: -mpowerpc64 ... -m32,
>> option -mpowerpc64 is disabled by -m32, then the case fails.
> 
> *Should* -mpowerpc64  be disabled by -m32?  

I think the reason to disable -mpowerpc64 at -m32 is that we have
-mpowerpc64 explicitly specified at -m64 (equivalent behavior).

In the current implementation, when -m64 is specified, we set the
bit OPTION_MASK_POWERPC64 in both opts and opts_set.  Since we
set OPTION_MASK_POWERPC64 in opts_set for -m64, when we find the
OPTION_MASK_POWERPC64 is ON in opts_set, we don't know if there
is one actual cmd-line option -mpowerpc64 or just -m64.

Assuming there is -m32 given after -m64 in cmd-line option, it's
also unclear how OPTION_MASK_POWERPC64 in opts_set is set, so
to keep conservative it has to disable -mpowerpc64 to ensure
the options like "-m64 -m32" not to have OPTION_MASK_POWERPC64
ON, just like what we have when just specifying "-m32".

Without any explicit -mpowerpc64 (and -mno-), I think we all agree
that -m64 should set OPTION_MASK_POWERPC64 in opts, conversely -m32
should unset OPTION_MASK_POWERPC64 in opts.

To make -m32/-m64 and -mpowerpc64 orthogonal, IMHO we should not
set bit OPTION_MASK_POWERPC64 in opts_set for -m64.  I'm not sure
if there is some particular reason why we set OPTION_MASK_POWERPC64
in opts_set, I hope not. :)  One possible reason I can imagine is
that we want the cmd-line options "-mno-powerpc64 -m64" not to
raise an error, but I think having them error out makes more sense.

So if no objections I'm going to give it a shot like:

```
Iff -mpowerpc64 (or -mno-powerpc64) is specified, the bit
OPTION_MASK_POWERPC64 in opts_set is set.  In that case both -m64
and -m32 leave OPTION_MASK_POWERPC64 in opts alone, so only the
explicitly specified option is honored, and we raise an error for
"-m64" + "-mno-powerpc64" (either order).

When no explicit -mpowerpc64 (or -mno-powerpc64) is provided,
for -m64, set bit OPTION_MASK_POWERPC64 in opts; while for -m32,
unset bit OPTION_MASK_POWERPC64 in opts.  Both will not touch
OPTION_MASK_POWERPC64 in opts_set.
```
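
The rules above can be sketched as a small stand-alone C model.  This is only
an illustration under stated assumptions, not rs6000 code: MASK_POWERPC64
stands in for OPTION_MASK_POWERPC64, the names model_options, opt_state and
the OPT_* enumerators are hypothetical, and the model assumes -m32 is the
default when neither -m32 nor -m64 is given.

```c
#include <stdbool.h>

/* Stand-in for OPTION_MASK_POWERPC64; everything else here is a
   hypothetical model, not code from rs6000.  */
#define MASK_POWERPC64 1u

enum opt { OPT_M32, OPT_M64, OPT_MPOWERPC64, OPT_MNO_POWERPC64 };

struct opt_state
{
  unsigned opts;      /* effective flags */
  unsigned opts_set;  /* flags the user gave explicitly */
  bool error;         /* "-m64" combined with "-mno-powerpc64" */
};

struct opt_state
model_options (const enum opt *args, int n)
{
  struct opt_state s = { 0, 0, false };
  bool is_64bit = false;  /* assumption: the model defaults to -m32 */

  for (int i = 0; i < n; i++)
    switch (args[i])
      {
      case OPT_M32:
        is_64bit = false;
        break;
      case OPT_M64:
        is_64bit = true;
        break;
      case OPT_MPOWERPC64:
        s.opts |= MASK_POWERPC64;
        s.opts_set |= MASK_POWERPC64;
        break;
      case OPT_MNO_POWERPC64:
        s.opts &= ~MASK_POWERPC64;
        s.opts_set |= MASK_POWERPC64;
        break;
      }

  if (!(s.opts_set & MASK_POWERPC64))
    /* No explicit choice: follow -m64/-m32, leave opts_set untouched.  */
    s.opts = is_64bit ? (s.opts | MASK_POWERPC64) : (s.opts & ~MASK_POWERPC64);
  else if (is_64bit && !(s.opts & MASK_POWERPC64))
    /* Explicit -mno-powerpc64 together with -m64, in either order.  */
    s.error = true;

  return s;
}
```

Because the decision is made only after all options have been seen, the
result does not depend on whether -mpowerpc64 comes before or after -m32,
which is the property the dejaGnu ordering problem needs.
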

btw, I guess the option compatibility isn't a blocking issue
here, right?

BR,
Kewen


Re: [PATCH] rs6000/test: Fix bswap64-4.c with has_arch_ppc64 [PR106680]

2022-09-01 Thread Kewen.Lin via Gcc-patches
on 2022/8/31 22:13, Peter Bergner wrote:
> On 8/31/22 4:33 AM, Kewen.Lin wrote:
>> @@ -1,7 +1,8 @@
>>  /* { dg-do compile { target { powerpc*-*-* } } } */
>>  /* { dg-skip-if "" { powerpc*-*-aix* } } */
>> -/* { dg-options "-O2 -mpowerpc64" } */
>>  /* { dg-require-effective-target ilp32 } */
>> +/* { dg-options "-O2 -mpowerpc64" } */
>> +/* { dg-require-effective-target has_arch_ppc64 } */
> 
> With many of our recent patches moving the dg-options before any
> dg-require-effective-target so it affects the results of the
> dg-require-effective-target test, this looks like it's backwards
> from that process.  I understand why, so I think an explicit comment
> here in the test case explaining why it's after in this case would
> help, just so in a few years when we come back to this test case, we
> won't accidentally undo this change.

Oops, the diff makes it look like "after", but it's actually still "before". :)
The dg-options line is meant to come before the has_arch_ppc64 effective
target that follows it, since that check is supposed to compile with the
dg-options flags.  I preferred to let the ilp32 check go first and
has_arch_ppc64 second, so I moved dg-options downward.

Sorry for the confusion, I should have placed the has_arch_ppc64
effective target just after the dg-options.  Anyway, it's a good idea
to add more comments in test case source!  Thanks!
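
For illustration, a header along these lines would carry such a comment.
The directive order is the one from the diff quoted above; the comment
wording and the function body are hypothetical, not the committed test:

```c
/* { dg-do compile { target { powerpc*-*-* } } } */
/* { dg-skip-if "" { powerpc*-*-aix* } } */
/* { dg-require-effective-target ilp32 } */
/* Keep dg-options before the has_arch_ppc64 check below: that check
   is meant to be compiled with these options, so do not move it
   above dg-options.  */
/* { dg-options "-O2 -mpowerpc64" } */
/* { dg-require-effective-target has_arch_ppc64 } */

/* Hypothetical body, only here to make the sketch a complete file.  */
unsigned long long
swap64 (unsigned long long x)
{
  return __builtin_bswap64 (x);
}
```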

BR,
Kewen


Re: [PATCH] rs6000: Don't ICE when we disassemble an MMA variable [PR101322]

2022-09-01 Thread Kewen.Lin via Gcc-patches
>>> ...and of course, now I can't recreate that issue at all and the
>>> ptr_vector_*_type use work fine now.  Strange! ...so ok, changed.
>>> Maybe the behavior changed since my PR106017 fix went in???
>>
>> That is my best guess as well.  But, how did that help this test?
> 
> It didn't. :-)   During my bootstrap, I hit the gimple verification issue
> I mentioned seeing earlier.  My problem was I thought I hit it with the
> test case, but it was exposed on a different test case in the testsuite.
> Here's what I'm seeing, which only happens when using -O0 -flto:
> 
> rain6p1% gcc -O0 -mcpu=power10 -flto pr102347.c 
> lto1: internal compiler error: in gimple_canonical_types_compatible_p, at tree.cc:13677
> 0x11930a97 gimple_canonical_types_compatible_p(tree_node const*, tree_node const*, bool)
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/tree.cc:13677
> 0x1192f1ab verify_type_variant
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/tree.cc:13377
> 0x11930beb verify_type(tree_node const*)
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/tree.cc:13700
> 0x106bbd37 lto_fixup_state
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/lto/lto-common.cc:2629
> 0x106bbff3 lto_fixup_decls
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/lto/lto-common.cc:2660
> 0x106bce13 read_cgraph_and_symbols(unsigned int, char const**)
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/lto/lto-common.cc:2901
> 0x1067bcbf lto_main()
>   /home/bergner/gcc/gcc-fsf-mainline-pr101322/gcc/lto/lto.cc:656
> Please submit a full bug report, with preprocessed source (by using 
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
> lto-wrapper: fatal error: /home/bergner/gcc/build/gcc-fsf-mainline-pr101322-debug/gcc/xgcc returned 1 exit status
> compilation terminated.
> /home/bergner/binutils/install/binutils-power10/bin/ld: error: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> 
> The problem goes away if I use -O1 or above, drop -flto, or use
> the code I originally posted without the ptr_vector_*_type.
> 
> The assert in gimple_canonical_types_compatible_p() we're hitting is:
> 13673 default:
> 13674   /* Consider all types with language specific trees in them mutually
> 13675  compatible.  This is executed only from verify_type and false
> 13676  positives can be tolerated.  */
> 13677   gcc_assert (!in_lto_p);
> 13678   return true;
> 
> I have no idea why ptr_vector_*_type would behave differently here than
> build_pointer_type (vector_*_type_node).  Using the build_pointer_type()
> fixed it for me, so that's why I went with it. :-)  Maybe this is a bug
> in lto???

Thanks for your time to reproduce this!

The only difference is that the ptr_vector_*_type nodes are built from the
qualified_type based on vector_*_type_node, instead of directly from
vector_*_type_node.  I'd like to take a further look at this later.

BR,
Kewen


Re: [PATCH] rs6000: Don't ICE when we disassemble an MMA variable [PR101322]

2022-09-01 Thread Kewen.Lin via Gcc-patches
>>>> +  if (TREE_TYPE (TREE_TYPE (src_ptr)) != src_type)
>>>
>>> This line looks unexpected, the former is type char while the latter is 
>>> type __vector_pair *.
>>>
>>> I guess you meant to compare the type of pointer type like: 
>>>
>>>TREE_TYPE (TREE_TYPE (src_ptr)) != TREE_TYPE (src_type)
>>
>> Maybe?  However, if that is the case, how can it be working for me?
>> Let me throw this in the debugger and verify the types and I'll report
>> back with what I find.
> 
> Ok, you are correct.  Thanks for catching that!  I don't think we need
> those matching outer TREE_TYPE() uses.  I think just a simple:
> 
>   if (TREE_TYPE (src_ptr) != src_type)
> 
> ...should suffice.
> 

Yeah, it's enough for the associated test case.  :)

> 
>>> or even with mode like:
>>>
>>>TYPE_MODE (TREE_TYPE (TREE_TYPE (src_ptr))) != TYPE_MODE (TREE_TYPE (src_type))
> 
> I'd rather not look at the mode here, since OOmode/XOmode doesn't necessarily
> mean __vector_{pair,quad}, so I'll go with the modified test above.

Good point.  I thought the cv qualifier could affect the type equality check,
and assumed that for a test case like:

void
foo (char *resp, const __vector_pair *vpp)
{
  __builtin_vsx_disassemble_pair (resp, (__vector_pair *) vpp);
}

we wouldn't want the conversion there.  The ICE also seemed related to the
underlying mode, so I thought maybe you wanted to use TYPE_MODE.
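
To make the concern about qualifiers concrete, here is a toy model in
plain C.  It is only a sketch: struct toy_type and both helpers are
hypothetical and far simpler than GCC's tree nodes.  It shows how a
comparison by node identity treats "const T *" and "T *" as different
types, while a comparison through the unqualified pointee does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a type node; not GCC's representation.  */
struct toy_type
{
  const char *name;
  const struct toy_type *main_variant; /* unqualified form of this type */
  const struct toy_type *pointee;      /* non-NULL for pointer types */
};

static const struct toy_type vpair
  = { "__vector_pair", &vpair, NULL };
static const struct toy_type const_vpair
  = { "const __vector_pair", &vpair, NULL };
static const struct toy_type ptr_vpair
  = { "__vector_pair *", &ptr_vpair, &vpair };
static const struct toy_type ptr_const_vpair
  = { "const __vector_pair *", &ptr_const_vpair, &const_vpair };

/* Identity check, in the spirit of comparing two type nodes with !=.  */
static bool
same_type_node (const struct toy_type *a, const struct toy_type *b)
{
  return a == b;
}

/* Looser check: two pointer types whose pointees differ only in
   qualifiers are treated as equal.  */
static bool
same_pointee_ignoring_quals (const struct toy_type *a,
                             const struct toy_type *b)
{
  return a->pointee != NULL && b->pointee != NULL
         && a->pointee->main_variant == b->pointee->main_variant;
}
```

In this model the identity check would request a conversion for a
"const __vector_pair *" argument, and the looser check would not.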


>>>> +  src_ptr = build1 (VIEW_CONVERT_EXPR, src_type, src_ptr);
>>>
>>> Nit: NOP_EXPR seems to be better suited here for pointer conversion.
> 
> Ok, this works too, so code changed to use it.  Thanks!
> 
> Question for my own education, when would you use VIEW_CONVERT_EXPR over 
> NOP_EXPR?

tree.def has some notes about VIEW_CONVERT_EXPR, and they match what Segher
replied quite well.
In my experience, VIEW_CONVERT_EXPR is used a lot for vector type conversion.
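
As a rough source-level analogy (plain C, not GCC internals; the helper
names are hypothetical): a view conversion reads the same bits as another
type of the same size, while a pointer-to-pointer cast keeps the value
and changes only its type.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
                "sketch assumes a 32-bit float");

/* Analogue of a view conversion: the bits of F are reinterpreted as an
   integer of the same size; no numeric conversion takes place.  */
static uint32_t
view_float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Analogue of a no-op conversion: the address is unchanged, only the
   pointer type differs.  */
static const unsigned char *
as_bytes (const uint32_t *p)
{
  return (const unsigned char *) p;
}
```

Assuming IEEE 754 single precision, view_float_bits (1.0f) yields
0x3f800000, whereas a value conversion (uint32_t) 1.0f yields 1.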

BR,
Kewen


Re: [PATCH] rs6000/test: Fix typo in pr86731-fwrapv-longlong.c [PR106682]

2022-09-01 Thread Kewen.Lin via Gcc-patches
Hi Segher & Peter,

Thanks for your reviews!

on 2022/8/31 23:12, Segher Boessenkool wrote:
> On Wed, Aug 31, 2022 at 05:33:21PM +0800, Kewen.Lin wrote:
>> It's meant to update "lxv" to "p?lxv" and should leave the
>> "lvx" unchanged.  So this is to fix the typo accordingly.
>>
>> I'll push this soon if no objections.
> 
> Please go ahead.  Out of interest, did you see failures from this, was
> it just by visual inspection,  something else?
> 

I did reproduce the failure for this test case on a ppc64 P8 machine. :)
For the other test cases updated by commit r12-2266, I did a quick visual
inspection instead of actually testing them; there were some other typos,
but they have been fixed by r12-2889-g8464894c86b03e.

To make sure none escaped, I also tested the other cases on ppc64 P8 and
ppc64le P9 and P10; no failures were found.

So committed as r13-2332-g023c5b36e47697.  Thanks!
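
For reference, the difference between the two mnemonics under the
pattern "p?lxv" can be checked with a small stand-alone program.  This
sketch uses POSIX extended regular expressions as a stand-in for the Tcl
regexps the testsuite uses (the optional "p?" behaves the same way for a
pattern this simple); the matches helper and the sample assembly lines
are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN (POSIX ERE) matches anywhere in TEXT.  */
static bool
matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool hit = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return hit;
}
```

Here matches ("p?lxv", ...) accepts both "lxv 0,0(3)" and
"plxv 0,16(3)" but rejects "lvx 0,0,3", so a scan meant for lvx has to
keep the plain "lvx" spelling.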

BR,
Kewen

