[Bug target/115713] rs6000: Miss warning for incompatible no-altivec and vsx in target attribute

2024-07-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115713

--- Comment #6 from Kewen Lin  ---
(In reply to Richard Biener from comment #5)
> The docs are at least imprecise.  Surely command-line -maltivec with
> target ("no-vsx") shouldn't revert to whatever is default with the target
> opts.

Thanks for confirming, I'll update the affected test case.

[Bug target/115713] rs6000: Miss warning for incompatible no-altivec and vsx in target attribute

2024-07-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115713

Kewen Lin  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org

--- Comment #4 from Kewen Lin  ---
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 76bbb3a28ea..4638c34cc24 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -24638,8 +24638,11 @@ rs6000_inner_target_options (tree args, bool attr_p)
>{
>  if (mask == OPTION_MASK_VSX)
>{
> -mask |= OPTION_MASK_ALTIVEC;
> -TARGET_AVOID_XFORM = 0;
> +if (!(rs6000_isa_flags_explicit
> +  & OPTION_MASK_ALTIVEC))
> +  mask |= OPTION_MASK_ALTIVEC;
> +if (!OPTION_SET_P (TARGET_AVOID_XFORM))
> +  TARGET_AVOID_XFORM = 0;
>}
>}

Testing this patch found one regression failure:

gcc/testsuite/gcc.target/powerpc/ppc-target-4.c

It fails because the command line (dg-options) specifies -mno-altivec:

/* { dg-options "-O2 -ffast-math -mdejagnu-cpu=power5 -mno-altivec -mabi=altivec -fno-unroll-loops" } */

while the #pragma GCC target only specifies "vsx":

#pragma GCC target("vsx")
#include 


Without this patch, "vsx" enables altivec implicitly even when the command
line specifies -mno-altivec, while with this patch "vsx" no longer enables
altivec in that case.  It's like the case in #c0; the only difference is whether
-mno-altivec comes from the command line or from the target attribute itself.
It can be fixed by explicitly specifying "-maltivec" in the target attribute.

But on checking the manual, I noticed that the documentation on the target
attribute says "The original target command-line options are ignored."  I think
that contradicts what we currently implement for target attribute parsing?  It
seems the other targets don't conform to this very well either, since
target_option_default_node has taken all command-line options into account?  I
wonder whether the documentation is intentionally defined like this.

[Bug target/115466] rs6000 vec_ld built-in works on BE but not LE

2024-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115466

Kewen Lin  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #7 from Kewen Lin  ---
Per all the discussion above, resolving this as invalid.

[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496

2024-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Kewen Lin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #25 from Kewen Lin  ---
Should be fixed on trunk and affected release branches now.

[Bug target/106069] [12/13/14/15 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2024-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

Kewen Lin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Status|NEW |RESOLVED

--- Comment #52 from Kewen Lin  ---
Should be fixed on trunk and affected release branches now.

[Bug target/115739] Building cross-compiler to sparc-wrs-vxworks fails since r15-1594-g55947b32c38a40

2024-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115739

Kewen Lin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Kewen Lin  ---
Should be fixed on trunk.

[Bug target/115739] Building cross-compiler to sparc-wrs-vxworks fails since r15-1594-g55947b32c38a40

2024-07-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115739

--- Comment #4 from Kewen Lin  ---
(In reply to Eric Botcazou from comment #3)
> The fix is OK for mainline, thanks!

Thanks Eric! btw, a formal patch was sent at
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656136.html

[Bug target/115739] Building cross-compiler to sparc-wrs-vxworks fails since r15-1594-g55947b32c38a40

2024-07-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115739

Kewen Lin  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot gnu.org

--- Comment #2 from Kewen Lin  ---
The commit dropped the define of SPARC_LONG_DOUBLE_TYPE_SIZE in sparc.cc and we
don't have a default one in sparc.h:

  /* SPARC_LONG_DOUBLE_TYPE_SIZE is defined per OS even though the
 SPARC ABI says that it is 128-bit wide.  LONG_DOUBLE_TYPE_SIZE
 get poisoned, so add SPARC_ prefix.  */
  /* #define SPARC_LONG_DOUBLE_TYPE_SIZE  128 */

Although we could bring the define back to sparc.cc, per the above comment I
think we want to define it in vxworks.h:

diff --git a/gcc/config/sparc/vxworks.h b/gcc/config/sparc/vxworks.h
index c1a9310fb3f..4cdb3b1685d 100644
--- a/gcc/config/sparc/vxworks.h
+++ b/gcc/config/sparc/vxworks.h
@@ -62,3 +62,7 @@ along with GCC; see the file COPYING3.  If not see
 /* This platform supports the probing method of stack checking (RTP mode).
    8K is reserved in the stack to propagate exceptions in case of overflow.  */
 #define STACK_CHECK_PROTECT 8192
+
+/* SPARC_LONG_DOUBLE_TYPE_SIZE should be defined per OS.  */
+#undef SPARC_LONG_DOUBLE_TYPE_SIZE
+#define SPARC_LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)

It builds fine with the above fix.

[Bug target/115739] Building cross-compiler to sparc-wrs-vxworks fails since r15-1594-g55947b32c38a40

2024-07-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115739

Kewen Lin  changed:

   What|Removed |Added

   Last reconfirmed||2024-07-02
   Target Milestone|--- |15.0
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Kewen Lin  ---
Thanks for reporting! I'll take a look at this.

[Bug target/115713] rs6000: Miss warning for incompatible no-altivec and vsx in target attribute

2024-07-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115713

--- Comment #3 from Kewen Lin  ---
(In reply to Peter Bergner from comment #2)
> (In reply to Kewen Lin from comment #0)
> > As Peter found in the PR115688, there isn't a warning for:
> > 
> > long __attribute__ ((target ("no-altivec,vsx")))
> > foo (void)
> > {
> >   return 0;
> > }
> > 
> > It's expected to see warning like:
> > 
> > warning: ‘-mvsx’ and ‘-mno-altivec’ are incompatible
> 
> I think Segher and I mentioned in the other bug, that conflicting options
> like this should be an error, rather than a warning.

Ah, thanks for noting this, sorry that I missed that part.  I agree that an error
sounds better, as both are explicitly specified.  I thought the reason it's a
warning is that the whole processing hunk is for warnings:

  /* Add some warnings for VSX.  */
  if (TARGET_VSX)
    {
      const char *msg = NULL;
      if (!TARGET_HARD_FLOAT)
        {
          if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
            msg = N_("%<-mvsx%> requires hardware floating point");
          else
            {
              rs6000_isa_flags &= ~ OPTION_MASK_VSX;
              rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
            }
        }
      else if (TARGET_AVOID_XFORM > 0)
        msg = N_("%<-mvsx%> needs indexed addressing");
      else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
                                   & OPTION_MASK_ALTIVEC))
        {
          if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
            msg = N_("%<-mvsx%> and %<-mno-altivec%> are incompatible");
          else
            msg = N_("%<-mno-altivec%> disables vsx");
        }

      if (msg)
        {
          warning (0, msg);
          rs6000_isa_flags &= ~ OPTION_MASK_VSX;
          rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
        }
    }

I think we would still like to leave the others as warnings, so I'll add a flag
specific to the arm where both vsx and altivec are explicit, to make it an error.
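
The decision order in the hunk above can be modelled in isolation.  This is
only a sketch: `struct vsx_inputs`, `vsx_diagnostic` and `wants_error` are
hypothetical names, the booleans stand in for the TARGET_* macros and the
explicit-flag bits, and `wants_error` reflects one reading of the proposal
(an error only when both options are explicit), not the committed change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the TARGET_* macros and explicit bits.  */
struct vsx_inputs
{
  bool hard_float;        /* TARGET_HARD_FLOAT */
  bool avoid_xform;       /* TARGET_AVOID_XFORM > 0 */
  bool altivec;           /* TARGET_ALTIVEC */
  bool vsx_explicit;      /* explicit flags contain OPTION_MASK_VSX */
  bool altivec_explicit;  /* explicit flags contain OPTION_MASK_ALTIVEC */
};

/* Return the diagnostic text chosen when VSX is enabled, or NULL.  */
static const char *
vsx_diagnostic (const struct vsx_inputs *in)
{
  if (!in->hard_float)
    return in->vsx_explicit ? "-mvsx requires hardware floating point" : NULL;
  if (in->avoid_xform)
    return "-mvsx needs indexed addressing";
  if (!in->altivec && in->altivec_explicit)
    return (in->vsx_explicit
            ? "-mvsx and -mno-altivec are incompatible"
            : "-mno-altivec disables vsx");
  return NULL;
}

/* Assumed policy: only the "both explicit" arm becomes an error.  */
static bool
wants_error (const struct vsx_inputs *in)
{
  const char *msg = vsx_diagnostic (in);
  return msg != NULL
         && strcmp (msg, "-mvsx and -mno-altivec are incompatible") == 0;
}

/* Small self-check of the two AltiVec arms.  */
static void
check_vsx_diagnostic (void)
{
  struct vsx_inputs both = { true, false, false, true, true };
  struct vsx_inputs only_altivec = { true, false, false, false, true };
  assert (wants_error (&both));
  assert (!wants_error (&only_altivec));
}
```

Keeping the message selection separate from the severity decision means the
other arms can stay warnings while a single arm is promoted.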

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-07-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #10 from Kewen Lin  ---
(In reply to Richard Biener from comment #9)
> I think the inversion code wants to check invert_tree_comparison and see if
> the inverted compare is supported and only if not fall back to inverting the
> comparison result (there is of course the multi-use case to consider).

OK.  For now all/most targets claim to support all comparisons (doing the
swapping, inversion etc. in the expanders themselves), so it seems that we have
to handle this until we have some generic handling for them.

> I also think that incrementally improving the /* Try to fold x CMP y ? -1 :
> 0 to x CMP y.  */ is fine we don't have to handle everything in one patch.
> 
> Thanks for working on this.  The x86 folks seem to be able to handle most
> things within the backend which is also fine, handling common problems in
> the middle-end is of course better.

Thanks for the suggestions, I posted two patches for review and comments.  Yes,
I realized that some define_insn_and_split patterns in the backend can also
catch some of these patterns and generate the expected code.

[Bug target/115688] [15 regression] ICE on simple test case from r15-703-gb390b011569635

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115688

--- Comment #8 from Kewen Lin  ---
> > -mabi={no-,}altivec is only for the 32-bit ABIs.  All the 64-bit ABIs had
> > either only compatible changes to support VMX, or only ever had support for
> > it in the first place.
> In that case, -mabi=no-altivec should also be a hard error if -m64 is in
> effect.

Filed PR115714 to track.

[Bug target/115714] rs6000: Refine option -mabi={no-}altivec handlings with some related option

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115714

Kewen Lin  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
   Last reconfirmed||2024-06-30
 Target||powerpc*
   Keywords||diagnostic
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Ever confirmed|0   |1
   Target Milestone|--- |15.0

[Bug target/115714] New: rs6000: Refine option -mabi={no-}altivec handlings with some related option

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115714

Bug ID: 115714
   Summary: rs6000: Refine option -mabi={no-}altivec handlings
with some related option
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

As Peter found in [1], even with the altivec flag explicitly unset, we can still
have altivec_abi set, which is unexpected.

And we want to raise an error when -mabi=no-altivec is specified for the Linux
64-bit ABI, per Segher's comments in [2].

Besides, we need to sort out the combinations of -mabi={no-}altivec and
-m{no-,}altivec (or -mvsx etc.) and raise a warning/error if needed.

So filing this to track these.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654546.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115688#c5

[Bug target/115713] rs6000: Miss warning for incompatible no-altivec and vsx in target attribute

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115713

--- Comment #1 from Kewen Lin  ---
There IS a warning for:

long __attribute__ ((target ("vsx,no-altivec")))
foo1 (void)
{
  return 0;
}

, interesting. :)

It's because we enable altivec when parsing vsx in the target attribute, but
don't check whether altivec has been explicitly set, so the fix can be:

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 76bbb3a28ea..4638c34cc24 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24638,8 +24638,11 @@ rs6000_inner_target_options (tree args, bool attr_p)
   {
 if (mask == OPTION_MASK_VSX)
   {
-mask |= OPTION_MASK_ALTIVEC;
-TARGET_AVOID_XFORM = 0;
+if (!(rs6000_isa_flags_explicit
+  & OPTION_MASK_ALTIVEC))
+  mask |= OPTION_MASK_ALTIVEC;
+if (!OPTION_SET_P (TARGET_AVOID_XFORM))
+  TARGET_AVOID_XFORM = 0;
   }
   }
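
The order dependence described above ("vsx,no-altivec" warns while
"no-altivec,vsx" does not) can be reproduced with a small model.  This is only
a sketch under stated assumptions: `FLAG_ALTIVEC`, `FLAG_VSX`,
`struct isa_state`, `apply_item` and `conflict_p` are hypothetical names, not
the real rs6000 data structures, and the `patched` parameter switches between
the old and the proposed behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real option masks.  */
enum { FLAG_ALTIVEC = 1 << 0, FLAG_VSX = 1 << 1 };

struct isa_state
{
  unsigned flags;     /* Features currently enabled.  */
  unsigned explicit_; /* Features the user mentioned explicitly.  */
};

/* Apply one target-attribute item such as "vsx" or "no-altivec".  */
static void
apply_item (struct isa_state *s, unsigned mask, bool enable, bool patched)
{
  /* Enabling VSX pulls in AltiVec.  With the patch, this happens only if
     the user has not already said something explicit about AltiVec.  */
  if (mask == FLAG_VSX && enable
      && (!patched || !(s->explicit_ & FLAG_ALTIVEC)))
    mask |= FLAG_ALTIVEC;

  s->explicit_ |= mask;
  if (enable)
    s->flags |= mask;
  else
    s->flags &= ~mask;
}

/* True for the state the diagnostic looks for: VSX on, AltiVec off,
   and AltiVec explicitly mentioned.  */
static bool
conflict_p (const struct isa_state *s)
{
  return (s->flags & FLAG_VSX) != 0
         && (s->flags & FLAG_ALTIVEC) == 0
         && (s->explicit_ & FLAG_ALTIVEC) != 0;
}

/* Model target ("no-altivec,vsx").  */
static bool
no_altivec_then_vsx_conflicts (bool patched)
{
  struct isa_state s = { 0, 0 };
  apply_item (&s, FLAG_ALTIVEC, false, patched);
  apply_item (&s, FLAG_VSX, true, patched);
  return conflict_p (&s);
}

/* Model target ("vsx,no-altivec").  */
static bool
vsx_then_no_altivec_conflicts (bool patched)
{
  struct isa_state s = { 0, 0 };
  apply_item (&s, FLAG_VSX, true, patched);
  apply_item (&s, FLAG_ALTIVEC, false, patched);
  return conflict_p (&s);
}

/* Self-check: the old behaviour hides the conflict for one ordering.  */
static void
check_orderings (void)
{
  assert (!no_altivec_then_vsx_conflicts (false));
  assert (no_altivec_then_vsx_conflicts (true));
}
```

In this model the unpatched parser re-enables AltiVec when it sees "vsx", so
the later check never observes the conflict; the patched variant leaves the
explicit "no-altivec" in place and both orderings are detected.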

[Bug target/115688] [15 regression] ICE on simple test case from r15-703-gb390b011569635

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115688

Kewen Lin  changed:

   What|Removed |Added

 CC||meissner at gcc dot gnu.org

--- Comment #7 from Kewen Lin  ---
(In reply to Segher Boessenkool from comment #3)
> Something like that.
> 
> But why would we want to disable generation of VSX or VMX insns at all?
> This is similar to disabling generation of popcntd insns if you do not like
> those!
> 
> Having generation of V*X insns enabled is completely independent of whether
> something special is done for them for inter-procedural things (ABI things
> or similar).  It sounds like the actual problem this code wants to tackle is
> one of those things, but instead it uses a heavy hammer?

This adjustment has been there since target attribute/pragma support was added
(r0-104781-gfd438373cdd2a5); Mike may have more insightful comments on this.
According to the surrounding comments, it aims to avoid the error message when
users specify a target attribute like cpu=power7 while the command line
specifies something like -m32 -mcpu=power6.  Without this adjustment, the
following check would raise the error "target attribute or pragma changes
AltiVec ABI".

  if (TARGET_ELF)
    {
      if (!OPTION_SET_P (rs6000_altivec_abi)
          && (TARGET_64BIT || TARGET_ALTIVEC || TARGET_VSX))
        {
          if (main_target_opt != NULL &&
              !main_target_opt->x_rs6000_altivec_abi)
            error ("target attribute or pragma changes AltiVec ABI");
          else
            rs6000_altivec_abi = 1;
        }
    }

This adjustment silently avoids that error, as it masks off altivec and vsx when
they are not explicitly specified.

(In reply to Peter Bergner from comment #4)
> (In reply to Kewen Lin from comment #2)
> 
> > +  /* Don't mask off ALTIVEC if it is enabled by an explicit VSX.  */
> > +  if (!TARGET_VSX || !(rs6000_isa_flags_explicit & OPTION_MASK_VSX))
> 
> TARGET_VSX is only true here if it was explicitly used, so I think you can
> drop the "|| !(rs6000_isa_flags_explicit & OPTION_MASK_VSX)" part of this
> test.

Good point, will adjust it accordingly.

> That said, how does your patch handle the following test case?
> 
> long __attribute__ ((target ("no-altivec,vsx")))
> foo (void)
> {
>   return 0;
> }
> 
> ...currently, this compiles with no error or warning message which
> seems wrong to me.

Good finding, but it is a separate issue: it shows a bug in our target
attribute handling.  Filed PR115713.

[Bug target/115713] rs6000: Miss warning for incompatible no-altivec and vsx in target attribute

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115713

Kewen Lin  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-30
 Target||powerpc*
 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |15.0
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 CC||bergner at gcc dot gnu.org,
   ||meissner at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
 Ever confirmed|0   |1
   Keywords||diagnostic

[Bug target/115713] New: rs6000: Miss warning for incompatible no-altivec and vsx in target attribute

2024-06-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115713

Bug ID: 115713
   Summary: rs6000: Miss warning for incompatible no-altivec and
vsx in target attribute
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

As Peter found in the PR115688, there isn't a warning for:

long __attribute__ ((target ("no-altivec,vsx")))
foo (void)
{
  return 0;
}

It's expected to see warning like:

warning: ‘-mvsx’ and ‘-mno-altivec’ are incompatible

[Bug target/115688] [15 regression] ICE on simple test case from r15-703-gb390b011569635

2024-06-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115688

--- Comment #2 from Kewen Lin  ---
The assertion does expose an inconsistent combination, !TARGET_ALTIVEC but
TARGET_VSX, with the 32-bit target attribute -mvsx.  There is one special
handling for altivec_abi:

  /* Disable VSX and Altivec silently if the user switched cpus to power7 in a
     target attribute or pragma which automatically enables both options,
     unless the altivec ABI was set.  This is set by default for 64-bit, but
     not for 32-bit.  Don't move this before the above code using ignore_masks,
     since it can reset the cleared VSX/ALTIVEC flag again.  */
  if (main_target_opt && !main_target_opt->x_rs6000_altivec_abi)
    rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC)
                          & ~rs6000_isa_flags_explicit);

// Only 32-bit has altivec_abi unset, so that's why it doesn't ICE at -m64.

It masks off the altivec and vsx flag bits if they are not specified explicitly
for 32-bit (which has altivec_abi unset).  For the given case, vsx is explicitly
specified, while altivec is implicitly enabled as it's part of
ISA_2_6_MASKS_SERVER.  When hitting the above hunk, vsx is kept as it's
explicitly enabled, but altivec gets masked off.  This results in an unexpected
state where we have vsx but not altivec.  The fix is to guard the masking off of
altivec by checking whether vsx is explicitly specified.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cd14e5a34ed..a8a3b79dda0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3925,8 +3925,12 @@ rs6000_option_override_internal (bool global_init_p)
  not for 32-bit.  Don't move this before the above code using ignore_masks,
  since it can reset the cleared VSX/ALTIVEC flag again.  */
   if (main_target_opt && !main_target_opt->x_rs6000_altivec_abi)
-rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC)
-  & ~rs6000_isa_flags_explicit);
+{
+  rs6000_isa_flags &= ~(OPTION_MASK_VSX & ~rs6000_isa_flags_explicit);
+  /* Don't mask off ALTIVEC if it is enabled by an explicit VSX.  */
+  if (!TARGET_VSX || !(rs6000_isa_flags_explicit & OPTION_MASK_VSX))
+rs6000_isa_flags &= ~(OPTION_MASK_ALTIVEC & ~rs6000_isa_flags_explicit);
+}

   if (TARGET_CRYPTO && !TARGET_ALTIVEC)
 {

[Bug target/115688] ICE on simple test case from r15-703-gb390b011569635

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115688

Kewen Lin  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #1 from Kewen Lin  ---
Mine, thanks for reporting.  It seems to expose something inconsistent; I'll
look into it soon.

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #8 from Kewen Lin  ---
Inspired by Andrew's comments, it looks like we can have:

   c = x CMP y
   r = c ?  0 :  z   =>  r =  ~c & z  (1)
   r = c ?  z :  0   =>  r =   c & z  (2)
   r = c ? -1 :  z   =>  r =   c | z  (3)
   r = c ?  z : -1   =>  r =  ~c | z  (4)

so if the target supports vector "or" and "and", (2) and (3) are clearly an
improvement (a basic logical operation should not be slower than a vector
select), while (1) and (4) may need further cost comparison (or, if the target
supports the compound operation, a query of optab support).
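
The four identities can be checked per lane.  This is only a sketch: it
assumes each mask lane is a 32-bit integer that is either 0 or -1 (all ones),
and `select_lane` / `identities_hold` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a vector select: c is a mask lane, either 0 or -1.  */
static int32_t
select_lane (int32_t c, int32_t t, int32_t f)
{
  return c ? t : f;
}

/* Return nonzero if identities (1)-(4) hold for this mask lane and z.  */
static int
identities_hold (int32_t c, int32_t z)
{
  return select_lane (c, 0, z) == (~c & z)      /* (1) */
         && select_lane (c, z, 0) == (c & z)    /* (2) */
         && select_lane (c, -1, z) == (c | z)   /* (3) */
         && select_lane (c, z, -1) == (~c | z); /* (4) */
}

/* Self-check over both mask values and a few data values.  */
static void
check_identities (void)
{
  const int32_t zs[] = { 0, 1, -1, 0x1234, INT32_MIN };
  for (unsigned i = 0; i < sizeof zs / sizeof zs[0]; i++)
    {
      assert (identities_hold (0, zs[i]));
      assert (identities_hold (-1, zs[i]));
    }
}
```

The identities rely on the mask being exactly 0 or -1 in each lane; for any
other lane value they do not hold in general.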

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #7 from Kewen Lin  ---
> > > (simplify
> > >  (vec_cond @0 @1 integer_all_ones_p)
> > >  (bit_ior (view_convert @0) @1))
> > > ```
> > 
> > Missing negate for the vector one?
> 
> No because vector true is already -1 :).

I could be wrong, but this vector transformation seems wrong: when @0 is -1, the
original wants @1 but this simplification returns -1, while when @0 is 0, the
original wants -1 but this simplification returns @1.  The results get
switched?
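
A per-lane counterexample makes the concern concrete.  This is only a sketch,
assuming mask lanes are 0 or -1; the helper names are hypothetical and model
one lane of `vec_cond @0 @1 all-ones` and of the two candidate replacements.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of (vec_cond c a -1): a when the mask is set, else -1.  */
static int32_t
vec_cond_lane (int32_t c, int32_t a)
{
  return c ? a : -1;
}

/* Candidate replacement without inverting the mask.  */
static int32_t
ior_lane (int32_t c, int32_t a)
{
  return c | a;
}

/* Candidate replacement with the mask inverted first.  */
static int32_t
ior_not_lane (int32_t c, int32_t a)
{
  return ~c | a;
}

/* Self-check: only the inverted form matches the select.  */
static void
check_vec_cond_lane (void)
{
  assert (vec_cond_lane (-1, 5) == 5);
  assert (ior_lane (-1, 5) == -1);     /* Differs from the select.  */
  assert (ior_not_lane (-1, 5) == 5);  /* Matches the select.  */
}
```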

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #5 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #2)
> Note I think this could help scalar code too:
> ```
> int a[1], b[1], c[1];
> 
> void
> test (void)
> {
>   a[0] = (b[0] == c[0]) ? -1 : a[0];
> }
> 
> void
> test1 (void)
> {
>   a[0] = (-(b[0] == c[0])) | a[0];
> }
> 
> ```
> 

Good catch!

> So this could be something like:
> ```
> (simplify
>  (cond @0 @1 integer_all_ones_p)
>  (bit_ior (negate (convert @0)) @1))
> (simplify
>  (vec_cond @0 @1 integer_all_ones_p)
>  (bit_ior (view_convert @0) @1))
> ```

Missing negate for the vector one?

> The second one might need a target_supports_op_p for the bit_ior.

Thanks for the hints!  This looks simpler than still keeping vec_cond; do we
need to consider the target costing of cond (conditional select) vs.
negate + or?
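
For the scalar case, the two functions quoted above compute the same value
because a C comparison yields 0 or 1, so its negation is 0 or -1.  The
following is only a sketch with hypothetical function names that take the
operands as parameters instead of using the global arrays.

```c
#include <assert.h>

/* Conditional form, as in test () above.  */
static int
cond_form (int a, int b, int c)
{
  return (b == c) ? -1 : a;
}

/* Negate + or form, as in test1 () above.  */
static int
ior_form (int a, int b, int c)
{
  return (-(b == c)) | a;
}

/* Self-check on an equal and an unequal pair.  */
static void
check_scalar_forms (void)
{
  assert (cond_form (7, 1, 1) == ior_form (7, 1, 1));
  assert (cond_form (7, 1, 2) == ior_form (7, 1, 2));
}
```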

[Bug tree-optimization/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-27 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #4 from Kewen Lin  ---
(In reply to Richard Biener from comment #3)
>c = x CMP y 
>r = c ? -1 : z  =>  r = c ? c : z
>r = c ?  z : 0  =>  r = c ? z : c
> 
> this is probably best left for ISEL.  I agree the transforms eliminating
> the COND are useful in general and suitable also for match.pd.  Watch
> out for vectorizer patterns though which creates scalar COND_EXPRs for
> bool mask <-> bool value transforms.

Thanks for the suggestion!  If going with ISEL, the patch would look like:

-
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 54c1801038b..abb18932228 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,16 +240,34 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
 can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
  tcode);

-  /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
  if (can_compute_op0
- && integer_minus_onep (op1)
- && integer_zerop (op2)
  && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
{
- tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
- gassign *new_stmt = gimple_build_assign (lhs, conv_op);
- gsi_replace (gsi, new_stmt, true);
- return new_stmt;
+ bool op1_minus_onep = integer_minus_onep (op1);
+ bool op2_zerop = integer_zerop (op2);
+ /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
+ if (op1_minus_onep && op2_zerop)
+   {
+ tree conv_op
+   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+   }
+ /* Try to fold x CMP y ? -1 : z to x CMP y ? x CMP y : z,
+or x CMP y ? z : 0 to x CMP y ? z : x CMP y.  */
+ if (op1_minus_onep || op2_zerop)
+   {
+ tree conv_op
+   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
+ tree new_op = make_ssa_name (TREE_TYPE (lhs));
+ gassign *new_stmt = gimple_build_assign (new_op, conv_op);
+ if (op1_minus_onep)
+   op1 = new_op;
+ else
+   op2 = new_op;
+ gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
+   }
}

  /* When the compare has EH we do not want to forward it when

-

But this doesn't help with the exposed failure, as it belongs to the latter
case.  Going further with some hacks for inversion:

-
diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index abb18932228..afc2c9f1386 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,6 +240,15 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
 tcode);

+ auto need_inverted_p = [](tree_code c, machine_mode m) {
+   if (GET_MODE_CLASS (m) == MODE_VECTOR_INT)
+ return (c == NE_EXPR || c == GE_EXPR || c == LE_EXPR);
+   gcc_assert (GET_MODE_CLASS (m) == MODE_VECTOR_FLOAT);
+   return (c == NE_EXPR || c == UNLE_EXPR || c == UNLT_EXPR
+   || c == UNGE_EXPR || c == UNGT_EXPR || c == UNORDERED_EXPR
+   || c == UNEQ_EXPR);
+ };
+
  if (can_compute_op0
  && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
{
@@ -254,6 +263,23 @@ gimple_expand_vec_cond_expr (struct function *fun, gimple_stmt_iterator *gsi,
  gsi_replace (gsi, new_stmt, true);
  return new_stmt;
}
+ bool inverted_p = need_inverted_p (tcode, TYPE_MODE (op0a_type));
+ bool op1_zerop = integer_zerop (op1);
+ bool op2_minus_onep = integer_minus_onep (op2);
+ /* Try to fold x CMP y ? 0 : -1 to ~(x CMP y), it can reuse
+the comparison before the inversion.  */
+ if (inverted_p && op1_zerop && op2_minus_onep)
+   {
+ tree inv_op0 = make_ssa_name (TREE_TYPE (op0));
+ gassign *inv_stmt
+   = gimple_build_assign (inv_op0, BIT_NOT_EXPR, op0);
+ gsi_insert_seq_before (gsi, inv_stmt, GSI_SAME_STMT);
+ tree conv_op
+   = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), inv_op0);
+ gassign *new_stmt = gimple_build_assign (lhs, conv_op);
+ gsi_replace (gsi, new_stmt, true);
+ return new_stmt;
+   }
  /* Try to fold x CMP y ? 

[Bug target/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

--- Comment #1 from Kewen Lin  ---
isel now has some handling that folds x CMP y ? -1 : 0 to x CMP y:

  /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
  if (can_compute_op0
      && integer_minus_onep (op1)
      && integer_zerop (op2)
      && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
    {
      tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
      gassign *new_stmt = gimple_build_assign (lhs, conv_op);
      gsi_replace (gsi, new_stmt, true);
      return new_stmt;
    }

It looks like it can be extended to cover:

   c = x CMP y 
   r = c ? -1 : z  =>  r = c ? c : z
   r = c ?  z : 0  =>  r = c ? z : c

, but better to be supported in match.pd?

The handling in rs6000_emit_vector_cond_expr already knows whether inversion
happens or not, so it further handles cases like:

   c = x CMP y  // c' = x OP y, c = ~c'

   r = c ?  0 : z   =>  r = c' ?  z  : c'
   r = c ?  z : -1  =>  r = c' ?  c' : z

It seems we need a helper to query whether the target would expand a given
comparison operator with inversion?
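
Both sets of rewrites above can be checked per lane.  This is only a sketch:
it assumes mask lanes are 0 or -1, models c' as the mask the hardware
comparison produces and c = ~c' as the inverted mask, and uses hypothetical
helper names.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a vector select on a 0 / -1 mask.  */
static int32_t
sel (int32_t c, int32_t t, int32_t f)
{
  return c ? t : f;
}

/* Direct case: the mask c itself can stand in for the constant.  */
static int
direct_rewrites_hold (int32_t c, int32_t z)
{
  return sel (c, -1, z) == sel (c, c, z)
         && sel (c, z, 0) == sel (c, z, c);
}

/* Inverted case: only c' = ~c is computed, so select on c' instead.  */
static int
inverted_rewrites_hold (int32_t c_prime, int32_t z)
{
  int32_t c = ~c_prime;
  return sel (c, 0, z) == sel (c_prime, z, c_prime)
         && sel (c, z, -1) == sel (c_prime, c_prime, z);
}

/* Self-check over both mask values.  */
static void
check_rewrites (void)
{
  assert (direct_rewrites_hold (0, 9) && direct_rewrites_hold (-1, 9));
  assert (inverted_rewrites_hold (0, 9) && inverted_rewrites_hold (-1, 9));
}
```

In both cases the constant operand is replaced by a value that is already
live, which is what lets the expander skip materializing -1 or 0.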

[Bug target/115659] powerpc fallout from removing vcond{,u,eq} patterns

2024-06-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

Kewen Lin  changed:

   What|Removed |Added

 Blocks||114189
 CC||bergner at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
 Target||powerpc*
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-06-26
   Keywords||missed-optimization
   Target Milestone|--- |15.0
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Ever confirmed|0   |1


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114189
[Bug 114189] Target implements obsolete vcond{,u,eq} expanders

[Bug target/115659] New: powerpc fallout from removing vcond{,u,eq} patterns

2024-06-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115659

Bug ID: 115659
   Summary: powerpc fallout from removing vcond{,u,eq} patterns
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

Applying the patch dropping vcond{,u,eq}_optab support
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114189#c2), there is only one
failure on both BE and LE:

FAIL: gcc.target/powerpc/pr66144-3.c scan-assembler-not mvspltiswM

Previously I blindly took it as a false alarm, but after further checking, I
realized it exposed a missed optimization.

In function rs6000_emit_vector_cond_expr, there is one optimization

  /* Optimize vec1 == vec2, to know the mask generates -1/0.  */
  if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT
  && (GET_CODE (op_true) == CONST_VECTOR
  || GET_CODE (op_false) == CONST_VECTOR))

  ...

, it's some special handling for

   1) op_true -1 and op_false 0
   2) op_true 0 and op_false -1
   3) op_true -1
   4) op_false 0

by reusing the result of vector comparison as it returns -1 or 0.

[Bug target/115612] powerpc: define_insn_and_splits calling gen_reg_rtx unconditionally (-flate-combine disabled by default for powerpc port)

2024-06-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115612

--- Comment #1 from Kewen Lin  ---
Thanks for filing this!

For the given example, split1 previously splits the ordered test into an
unordered test + xor; the late-combine pass then recombines them into an ordered
test, and split2 fails to create a pseudo after RA.  There seem to be two
alternatives: adding can_create_pseudo_p () to the define_insn condition to stop
late-combine from recombining it after RA, or appending a match_scratch to get a
temp register for rtx tmp after RA.  IMHO we don't expect to see an ordered test
after RA any more, so the former is preferred?  But people can argue that the
latter is more flexible.

Segher originated these define_insn_and_split patterns; looking forward to his
opinion.

The other failures need more checking.

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-06-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

Kewen Lin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #18 from Kewen Lin  ---
Should be fixed on trunk and all active release branches.

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-06-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

--- Comment #17 from Kewen Lin  ---

(In reply to Peter Bergner from comment #11)
> Have we done the backports so we can just mark this bug a FIXED?  ...or do
> we still need to push the backports?

(In reply to Segher Boessenkool from comment #12)
> The backports have not been done yet.
> 

Yeah, I just backported it to all active release branches.  Previously, when I
wanted to backport it, I noticed the 12.4 release was ongoing, so I held it for a
while.

> It would be good if the blockage / barrier would get some comment btw, saying
> what exactly it is intended to do!  It is very much cargo-cult the way it is.

Good idea, sorry that I didn't specify it originally.  The reason for adding the
barrier is that the bb verifier requires a returnjump_p insn to be followed by a
barrier:

  if (JUMP_P (x)
      && returnjump_p (x) && ! condjump_p (x)
      && ! ((y = next_nonnote_nondebug_insn (x))
            && BARRIER_P (y)))
    fatal_insn ("return not followed by barrier", x);

Otherwise we will encounter an ICE.  Adding the blockage follows the existing
practice when calling rs6000_emit_epilogue, to respect the option
-mno-sched-epilog (it puts a scheduling boundary on the insns emitted by
rs6000_emit_epilogue).

[Bug target/115466] rs6000 vec_ld built-in works on BE but not LE

2024-06-13 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115466

Kewen Lin  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org

--- Comment #2 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #1)
> >  int ia[8] = {1, 2, 3, 4, 5, 6, 7, 8};
> >  float fa[8] = {10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0};
> 
> The way vec_ld works is (a+b)&~0xf is the address which is being loaded.
> 
> Both ia and fa are not specified as being aligned to the 16byte boundary so
> it could be loading from before hand.
> 
> What happens if you do:
>   int ia[8] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};
>   float fa[8] __attribute__((aligned(16))) = {10.0, 20.0, 30.0, 40.0, 50.0,
> 60.0, 70.0, 80.0};
> 
> Instead?

Good point. I can't reproduce the reported issue on two LE machines, so I'll
leave it to Carl to confirm. But judging from the output (shifted by two
elements from the expected values), I believe this is the root cause.
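
To make the address masking concrete, here is a minimal sketch in portable C.
It only models the arithmetic quoted above, (a+b)&~0xf; it does not use the
real vec_ld built-in, and the function names are hypothetical. It assumes
4-byte elements, as in the int and float arrays from the report.

```c
#include <stdint.h>

/* Effective address of a vec_ld-style load: base plus offset with the
   low four bits cleared, i.e. rounded down to a 16-byte boundary.  */
uintptr_t
vec_ld_effective_address (uintptr_t base, uintptr_t offset)
{
  return (base + offset) & ~(uintptr_t) 0xf;
}

/* Number of 4-byte elements that sit between the rounded-down address
   and the address the programmer asked for (offset 0).  */
unsigned
elements_before_base (uintptr_t base)
{
  uintptr_t loaded_from = vec_ld_effective_address (base, 0);
  return (unsigned) ((base - loaded_from) / sizeof (int32_t));
}
```

Under these assumptions, an array that happens to start 8 bytes past a
16-byte boundary would be loaded from 8 bytes earlier, so the data would
appear shifted by two elements.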

[Bug testsuite/115262] [15 regression] gcc.target/powerpc/pr66144-3.c fails after r15-831-g05daf617ea22e1

2024-06-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115262

--- Comment #3 from Kewen Lin  ---
(In reply to Peter Bergner from comment #2)
> (In reply to Jeffrey A. Law from comment #1)
> > It looks like the test wants to see xxsel, but after that change we get
> > xxlor and what looks like a slight difference in register allocation.  I
> > can't really judge if the new code is better, worse or equivalent.
> 
> xxsel XT,XA,XB,XC computes XT = (XA & ~XC) | (XB & XC).  Using De Morgan's
> law given XB == XC, that seems to simplify to XT = XA | XB which is what
> you're producing and an xxlor (a simple logical or) is not going to be
> slower than a xxsel and is probably faster.  I agree with Bill that this
> looks like an example of needing to update the expected results of the test
> case.  I'll let Segher and/or Ke Wen comment though.

I agree they are equivalent here. According to the scheduling descriptions,
xxsel and xxlor execute in the same unit.
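
The equivalence can be checked with a small sketch in plain C. It models the
select formula quoted above on 64-bit integers rather than on 128-bit vector
registers, which is enough because the operation is purely bitwise; the
function names are hypothetical.

```c
#include <stdint.h>

/* Bitwise select following the quoted formula:
   XT = (XA & ~XC) | (XB & XC).  */
uint64_t
sel (uint64_t xa, uint64_t xb, uint64_t xc)
{
  return (xa & ~xc) | (xb & xc);
}

/* Returns 1 if sel (xa, xb, xb) equals xa | xb for every pair of 8-bit
   values.  Each bit position is independent, so this covers all
   per-bit input combinations.  */
int
sel_is_or_when_mask_equals_xb (void)
{
  for (unsigned xa = 0; xa < 256; xa++)
    for (unsigned xb = 0; xb < 256; xb++)
      if (sel (xa, xb, xb) != (uint64_t) (xa | xb))
        return 0;
  return 1;
}
```

With XC == XB the second term becomes XB & XB = XB, and (XA & ~XB) | XB is
XA | XB, which is the result the exhaustive check confirms.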

[Bug tree-optimization/115427] fallback for interclass mathfn bifs like isinf, isfinite, isnormal

2024-06-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427

--- Comment #5 from Kewen Lin  ---
(In reply to rguent...@suse.de from comment #4)
> On Tue, 11 Jun 2024, linkw at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427
> > 
> > --- Comment #3 from Kewen Lin  ---
> > (In reply to Richard Biener from comment #2)
> > > The canonical way would be to handle these in the ISEL pass and remove
> > > the (fallback) expansion.  But then we can see whether the expander FAILs
> > > (ideally expanders would never be allowed to FAIL, and for FAILing expanders
> > > we'd have a way to query the target like we have the vec_perm_const hook).
> > > 
> > > But I'll note that currently the expanders may FAIL but then we expand to
> > > a call rather than the inline-expansion (and for example AVR relies on this
> > > now to avoid early folding of isnan).
> > > 
> > > So - for the cases of isfinite and friends without a fallback call I
> > > would suggest to expand from ISEL to see if it FAILs and throw away
> > > the result (similar as how IVOPTs probes things).  Or make those _not_
> > > allowed to FAIL?  Why would they fail to expand anyway?
> > 
> > Thanks for the suggestion! IIUC, considering the AVR example, we still want
> > *isinf* to fall back to the library call (so not to inline expansion in
> > that case).  Currently, at least for the rs6000 port, there is no case where
> > we want to make it FAIL, but I'm not sure whether some other targets will
> > have such a need in the future.  Per the review comment [1], the
> > documentation doesn't say it's not allowed to FAIL, so we probably need to
> > ensure there is some handling for FAIL in case some future FAIL causes an
> > unexpected failure.  Do you prefer not allowing it to FAIL, and then
> > re-opening this and going with ISEL if some port wants it to FAIL?
> 
> I think it would be cleaner to not allow it to FAIL since there's no library
> fallback.

Fair enough!

> FAILing patterns are a hassle when it comes to GIMPLE
> optimizations.

Yeah. In some cases a port isn't able to express a condition as part of the
HAVE_* condition (such as further checks on special operand values), so FAIL
has to be used.

> 
> As said, there should be a good reason why patterns FAIL - what's
> the idea behind this feature anyway?

I have no solid input on this. Since the proposed documentation implicitly
indicates that FAIL can be used (as in some other existing expanders), I didn't
consider carefully whether there is a good reason for it and just assumed it
can happen. :(  Whether there will be a need for it is a really good question.

[Bug tree-optimization/115427] fallback for interclass mathfn bifs like isinf, isfinite, isnormal

2024-06-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427

--- Comment #3 from Kewen Lin  ---
(In reply to Richard Biener from comment #2)
> The canonical way would be to handle these in the ISEL pass and remove
> the (fallback) expansion.  But then we can see whether the expander FAILs
> (ideally expanders would never be allowed to FAIL, and for FAILing expanders
> we'd have a way to query the target like we have the vec_perm_const hook).
> 
> But I'll note that currently the expanders may FAIL but then we expand to
> a call rather than the inline-expansion (and for example AVR relies on this
> now to avoid early folding of isnan).
> 
> So - for the cases of isfinite and friends without a fallback call I
> would suggest to expand from ISEL to see if it FAILs and throw away
> the result (similar as how IVOPTs probes things).  Or make those _not_
> allowed to FAIL?  Why would they fail to expand anyway?

Thanks for the suggestion! IIUC, considering the AVR example, we still want
*isinf* to fall back to the library call (so not to inline expansion in that
case).  Currently, at least for the rs6000 port, there is no case where we want
to make it FAIL, but I'm not sure whether some other targets will have such a
need in the future.  Per the review comment [1], the documentation doesn't say
it's not allowed to FAIL, so we probably need to ensure there is some handling
for FAIL in case some future FAIL causes an unexpected failure.  Do you prefer
not allowing it to FAIL, and then re-opening this and going with ISEL if some
port wants it to FAIL?

[1]
https://inbox.sourceware.org/gcc-patches/CAFiYyc3wE=xdkrzuvf1kttdrkvaaw-dyw+ztryc1p6+6nmt...@mail.gmail.com/

[Bug tree-optimization/115427] fallback for interclass mathfn bifs like isinf, isfinite, isnormal

2024-06-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427

Kewen Lin  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
   Keywords||internal-improvement
 CC||bergner at gcc dot gnu.org,
   ||guihaoc at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #1 from Kewen Lin  ---
We now have expand_builtin_interclass_mathfn to expand these functions; if they
don't have an optab defined, it seems fine to generate RTL there equivalent to
what fold_builtin_interclass_mathfn produces. However, considering
maintainability, IMHO it's better to reuse the tree expression from
fold_builtin_interclass_mathfn, so that we only have one place for such
folding. It would look something like:

@@ -2534,6 +2536,20 @@ expand_builtin_interclass_mathfn (tree exp, rtx target)
   && maybe_emit_unop_insn (icode, ops[0].value, op0, UNKNOWN))
 return ops[0].value;

+  location_t loc = EXPR_LOCATION (exp);
+  tree fold_res
+= fold_builtin_interclass_mathfn (loc, fndecl, orig_arg, false);
+
+  if (fold_res)
+{
+  op0 = expand_expr (fold_res, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  tree rtype = TREE_TYPE (TREE_TYPE (fndecl));
+  machine_mode rmode = TYPE_MODE (rtype);
+  if (rmode != GET_MODE (op0))
+op0 = convert_to_mode (rmode, op0, 0);
+  return op0;
+}
+
   delete_insns_since (last);
   CALL_EXPR_ARG (exp, 0) = orig_arg;

Unfortunately, since fold_builtin_interclass_mathfn serves both the front end
and the middle end, it can produce tree codes like TRUTH_NOT_EXPR, which
expand_expr doesn't support. To make it work, we can replace TRUTH_NOT_EXPR
with BIT_NOT_EXPR (as in fold_builtin_unordered_cmp), but other codes like
TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR (for ibmlongdouble) can't be replaced with
BIT_AND_EXPR and BIT_OR_EXPR because of their short-circuit semantics. I tried
to use COND_EXPR for them instead, but testing a case with ibmlongdouble shows
there are still some gaps from the original folding code.
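
As a source-level illustration of why the short-circuit codes can't simply be
swapped for the bitwise ones, here is a minimal sketch in plain C. It is not
GCC-internal code; the helpers and the evaluation counter are hypothetical and
only show the difference in how often the second operand is evaluated.

```c
/* Counts how many times the second operand has been evaluated.  */
static int second_operand_evals;

static int
second_operand (int value)
{
  second_operand_evals++;
  return value;
}

/* Logical AND, analogous to TRUTH_ANDIF_EXPR: the second operand is
   skipped when the first one is false.  */
int
and_short_circuit (int first, int second)
{
  return first && second_operand (second);
}

/* Bitwise AND on 0/1 values, analogous to BIT_AND_EXPR: both operands
   are always evaluated.  */
int
and_bitwise (int first, int second)
{
  return (first != 0) & (second_operand (second) != 0);
}

int evals (void) { return second_operand_evals; }
void reset_evals (void) { second_operand_evals = 0; }
```

Both functions return the same value, but only the bitwise form evaluates the
second operand when the first is false, which matters whenever that evaluation
has an observable effect.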

I also tried a hackish approach: force the tree expression into gimple stmts
and expand these stmts one by one. But it adds more SSA names than before and
ICEs in the SSA-to-RTX code, so I'm not sure whether it's a direction worth
digging into.

I'm looking for suggestions here. Is there some existing practice to follow?
Which is preferred: expanding from the folded tree expression, or generating
equivalent RTL directly?  If the former, should we allow some difference from
the original folding (FAIL should be rare), or experiment with some other
approach?

[Bug tree-optimization/115427] New: fallback for interclass mathfn bifs like isinf, isfinite, isnormal

2024-06-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427

Bug ID: 115427
   Summary: fallback for interclass mathfn bifs like isinf,
isfinite, isnormal
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

This is filed as follow up for the discussion in [1].

The optabs for isfinite and isnormal will land soon. The documentation allows
the optab expansion to fail (as it doesn't say it's not allowed to), but with
an artificial FAIL in the define_expand for these optabs, there are two cases:
  1) for isinf, it would result in a call to isinf, but in fact
fold_builtin_interclass_mathfn is able to fold it if there is no target
specific implementation.
  2) for isfinite and isnormal, since there is no library call registered, it
would result in a call to __builtin_{isfinite, isnormal}, which is completely
wrong.

So, following Richi's suggestion, this PR tracks the follow-up work on the
fallback handling.

[1]
https://inbox.sourceware.org/gcc-patches/17c9ab5d-f1d4-9447-fccf-d9aa0ad56...@linux.ibm.com/
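
For readers unfamiliar with what such a generic fallback can look like, here is
a minimal sketch in plain C for double. It is an assumption about the general
shape of the folding (comparisons of fabs(x) against the largest and smallest
normal values), not a copy of fold_builtin_interclass_mathfn, and the sketch_*
names are hypothetical. The quiet comparison macros are used so that a NaN
operand yields false without raising an exception.

```c
#include <float.h>
#include <math.h>

/* Infinite: magnitude greater than the largest finite double.  */
int
sketch_isinf (double x)
{
  return isgreater (fabs (x), DBL_MAX);
}

/* Finite: magnitude no greater than the largest finite double.  */
int
sketch_isfinite (double x)
{
  return islessequal (fabs (x), DBL_MAX);
}

/* Normal: magnitude between the smallest normal and the largest
   finite double, inclusive.  */
int
sketch_isnormal (double x)
{
  double ax = fabs (x);
  return isgreaterequal (ax, DBL_MIN) && islessequal (ax, DBL_MAX);
}
```

A fallback of this kind needs no library call, which is why falling back to a
call for isfinite and isnormal is unnecessary as well as wrong.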

[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496

2024-06-07 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #11 from Kewen Lin  ---
(In reply to Jens Seifert from comment #10)
> Does this affect loop vectorize and slp vectorize ?
> 
> -fno-tree-loop-vectorize avoids loop vectorization to be performed and
> workarounds this issue. Does the same problems also affect SLP
> vectorization, which does not take place in this sample.
> 
> In other words, do I need
> -fno-tree-loop-vectorize
> or
> -fno-tree-vectorize
> to workaround this bug ?

Since this is an issue in the vector merge insn patterns in target code, and
vectorization merely exposes it, it's hard to work around this bug completely
just by disabling both loop and SLP vectorization. As the related bug PR106069
shows, it's still possible to hit this issue without vectorization when using
some vec merge built-ins.  But I'd expect that disabling both loop and SLP
vectorization (-fno-tree-vectorize) greatly reduces the likelihood of
encountering it.

[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496

2024-06-06 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #9 from Kewen Lin  ---
(In reply to Peter Bergner from comment #7)
> The test fails when setToIdentityBAD's index var is unsigned int.  It passes
> when using unsigned long long, unsigned long, unsigned short and unsigned
> char.  When using unsigned long long/unsigned long, we do not vectorize the

unsigned {long ,}long fails to vectorize due to cost modeling:

  missed:  cost model: the vector iteration cost = 2 divided by the scalar
iteration cost = 1 is greater or equal to the vectorization factor = 2.
  missed:  not vectorized: vectorization not profitable.

it can be forced with -fno-vect-cost-model.
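
The rule stated in the dump message can be written out as a tiny sketch in
plain C. This only restates the message; the function name is hypothetical, and
the use of integer division is an assumption rather than something taken from
the vectorizer sources.

```c
/* Returns 1 when, per the rule in the dump message, the vector
   iteration cost divided by the scalar iteration cost is greater
   than or equal to the vectorization factor, i.e. vectorization is
   judged not profitable.  */
int
vect_not_profitable (int vec_iter_cost, int scalar_iter_cost, int vf)
{
  return (vec_iter_cost / scalar_iter_cost) >= vf;
}
```

With the numbers from the message (vector cost 2, scalar cost 1, vectorization
factor 2) the ratio equals the factor, so the loop is rejected.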

> loop.  We vectorize the loop when using unsigned int/short/char.  The
> vectorized code is a little strange, in that the smaller the integer type we
> use for the index var, the more code we generate.  
> 
> The vectorized code for unsigned char is truly huge!  ...although it does
> seem to work correctly.  I'm attaching the "unsigned char i" code gen for
> setToIdentityBAD for people to examine.  Even though it gives "correct"
> results, it can't really be the code we want to generate, correct???

It's due to aggressive unrolling: there is one early check that the loop bound
is between 16 and 255, and then cunroll completely unrolls the loop for each
multiple of 16 (15 loops in total). A more compact version of the code can be
generated with -fdisable-tree-cunroll.

[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496

2024-06-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

--- Comment #8 from Kewen Lin  ---
(In reply to Peter Bergner from comment #5)
> FYI, fails for me with gcc 12 and later and works with gcc 11.  It also
> fails with -O3 -mcpu=power10.

Thanks for the information. Bisection shows r12-4496 is the culprit commit. I
just tested and confirmed that Xionghu's latest patch for PR106069 also fixes
this one.

  - latest rev. for his fix:
https://inbox.sourceware.org/gcc-patches/20230210025952.1887696-1-xionghu...@tencent.com/,
which was resent from
https://inbox.sourceware.org/gcc-patches/37b57a54-f98e-96a3-edff-866c8aae4...@gmail.com/

  - original thread and some discussions:
https://inbox.sourceware.org/gcc-patches/20220808034247.2618809-1-xionghu...@tencent.com/

The latest rev. looked fine to me, as noted in
(https://inbox.sourceware.org/gcc-patches/e8e69f0c-7f36-e671-6c3b-74401e4d8...@linux.ibm.com/);
I'm still looking forward to Segher's review and approval on this.

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-06-05
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin  ---
Thanks for reporting, I'll have a look first.

[Bug target/115282] [15 regression] gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c fails after r15-812-gc71886f2ca2e46

2024-05-31 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115282

Kewen Lin  changed:

   What|Removed |Added

   Last reconfirmed||2024-05-31
 Status|UNCONFIRMED |NEW
 CC||linkw at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Kewen Lin  ---
(In reply to Richard Biener from comment #1)
> I don't see a good reason why, but I don't have a BE cross around to check
> myself.  Does BE vect maybe not have unsigned integer vector multiplication
> support?

BE should have int vector mult too; I noticed it's guarded with TARGET_ALTIVEC.

The first loop (line 17) causes the difference. Previously it did the
splitting like:

test.c:16:17: note:   Splitting SLP group at stmt 6
test.c:16:17: note:   Split group into 6 and 2

but now it doesn't, and vectorization then seems to fail because of that:

test.c:16:17: note:   ==> examining statement: _14 = in[_13];
test.c:16:17: missed:   permutation requires at least three vectors _2 = in[_1];
test.c:16:17: missed:   unsupported load permutation
test.c:25:14: missed:   not vectorized: relevant stmt not supported: _14 = in[_13];
test.c:16:17: note:   Cannot vectorize all-constant op node 0x140dd450
test.c:16:17: note:   removing SLP instance operations starting from: out[_1] = _17;
test.c:16:17: missed:  unsupported SLP instances
test.c:16:17: note:  re-trying with SLP disabled
test.c:16:17: note:  vectorization_factor = 4, niters = 8

I couldn't figure out why it would pass on LE, so I ran a test on LE and found
it fails on LE too!?

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-05-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

--- Comment #10 from Kewen Lin  ---
(In reply to Peter Bergner from comment #9)
> (In reply to Kewen Lin from comment #8)
> > Should be fixed on trunk. It's not a regression, but we probably want to
> > backport this?
> 
> For code correctness bugs, yes, we want them backported.

Thanks for confirming!  Will do backporting after burn-in time.

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-05-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

--- Comment #8 from Kewen Lin  ---
Should be fixed on trunk. It's not a regression, but we probably want to
backport this?

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-05-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

Kewen Lin  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html

--- Comment #18 from Kewen Lin  ---
A formal patch has been sent out, as the URL field shows; still waiting for
review.

[Bug target/114402] rs6000: ICE when long double is ieee128 format by default but without vsx

2024-05-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114402

Kewen Lin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Kewen Lin  ---
Not a regression; it should be rare to adopt IEEE long double while disabling
VSX, so this is not backported.  Should be fixed on trunk.

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-05-14 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

Kewen Lin  changed:

   What|Removed |Added

  Attachment #58067 is obsolete|0   |1

--- Comment #6 from Kewen Lin  ---
Created attachment 58201
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58201&action=edit
tested patch

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-04-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

--- Comment #5 from Kewen Lin  ---
Created attachment 58067
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58067&action=edit
untested patch

[Bug testsuite/113535] rs6000, testsuite: Re-visit the current vect_* for Power

2024-04-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113535

--- Comment #1 from Kewen Lin  ---
One issue: https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650171.html

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-04-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

Kewen Lin  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Kewen Lin from comment #2)
> > As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114843#c8, we may
> > need handling similar to r14-6440-g4b421728289e6f.
> 
> Note rs6000_emit_epilogue mostly handles eh_returns so it might not be as
> hard as other targets.

Yes, making a patch.

[Bug target/44793] [11/12/13/14/15 Regression] libgcc does not include t-ppccomm on rtems

2024-04-28 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44793

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WORKSFORME
 CC||linkw at gcc dot gnu.org

--- Comment #26 from Kewen Lin  ---
libgcc/config.host on gcc-11 has:

powerpc-*-rtems*)
  tmake_file="${tmake_file} rs6000/t-ppccomm rs6000/t-savresfgpr rs6000/t-crtstuff t-crtstuff-pic t-fdpbit"
  extra_parts="$extra_parts crtbeginS.o crtendS.o crtbeginT.o ecrti.o ecrtn.o ncrti.o ncrtn.o"
  ;;

I think this has already been fixed by r0-119741-g6f28886030623a.

Please feel free to reopen it if it still occurs on active releases. Thanks!

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-04-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

--- Comment #2 from Kewen Lin  ---
As noted in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114843#c8, we may need
handling similar to r14-6440-g4b421728289e6f.

[Bug target/114846] powerpc: epilogue in _Unwind_RaiseException corrupts return value due to __builtin_eh_return

2024-04-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114846

Kewen Lin  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2024-04-25
 Status|UNCONFIRMED |NEW
 CC||bergner at gcc dot gnu.org,
   ||linkw at gcc dot gnu.org,
   ||segher at gcc dot gnu.org
 Target|powerpc64-linux-gnu |powerpc64*-linux-gnu
   |powerpc-linux-gnu   |powerpc-linux-gnu

--- Comment #1 from Kewen Lin  ---
Thanks for reporting, confirmed, it also fails on LE (ppc64le-linux).

[Bug testsuite/114842] rs6000: Adjust some test cases with powerpc_vsx_ok

2024-04-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114842

--- Comment #1 from Kewen Lin  ---
We can extend powerpc_vsx to consider current_compiler_flags. That means a test
case with an explicit -mvsx can still be tested even if users specify -mno-vsx,
as long as the powerpc_vsx check concludes that VSX is enabled. This keeps some
of the previous testing coverage.

[Bug testsuite/114842] rs6000: Adjust some test cases with powerpc_vsx_ok

2024-04-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114842

Kewen Lin  changed:

   What|Removed |Added

 Target||powerpc*-linux-gnu
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
   Last reconfirmed||2024-04-25
   Target Milestone|--- |15.0
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

[Bug testsuite/114842] New: rs6000: Adjust some test cases with powerpc_vsx_ok

2024-04-25 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114842

Bug ID: 114842
   Summary: rs6000: Adjust some test cases with powerpc_vsx_ok
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

The current effective target powerpc_vsx_ok mainly checks whether it's fine to
specify -mvsx (without any warnings etc.) and whether that finally results in
an object file (meaning the underlying environment, such as the assembler,
supports VSX insns). But most of the test cases guarded with this check
actually want to check whether the VSX feature is enabled, e.g. the wanted
behavior only happens with VSX enabled. When users specify -mno-vsx in
RUNTESTFLAGS, it can disable the VSX feature (with some old runtest, -mno-vsx
comes after -mvsx), but the powerpc_vsx_ok check will still pass, as it's fine
to specify -mvsx. So if the test case doesn't have an explicit -mvsx, the given
-mno-vsx disables VSX and makes that test case fail; and even if the test case
specifies -mvsx explicitly, it fails with old runtest, as -mno-vsx comes last.
We already have another effective target, powerpc_vsx, which effectively checks
that VSX is enabled, so we should update most of the test cases to adopt it
instead.
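
The distinction between "-mvsx is accepted" and "VSX is enabled" can be seen
from inside a translation unit with a minimal sketch in plain C. It relies on
GCC predefining __VSX__ when VSX is in effect for the compilation; the function
name is hypothetical, and whether the powerpc_vsx effective target is
implemented with exactly this kind of check is an assumption.

```c
/* Reports whether VSX is enabled for this compilation, i.e. after all
   command-line options (including a trailing -mno-vsx) have been
   taken into account.  */
int
vsx_feature_enabled (void)
{
#ifdef __VSX__
  return 1;
#else
  return 0;
#endif
}
```

A check of this form reflects the final option state, whereas a check that only
tries compiling with -mvsx says nothing about what other flags do afterwards.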

[Bug target/88309] [11/12/13/14 Regression] ICE: Floating point exception (in is_miss_rate_acceptable), target assigning alignent of 4 bits(!) to vector

2024-04-24 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88309

Kewen Lin  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Kewen Lin  ---
Should be fixed on trunk and active release branches.

[Bug target/105359] _Float128 expanders and builtins disabled on ppc targets with 64-bit long double

2024-04-23 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105359

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-04-23
   Keywords||missed-optimization
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 CC||linkw at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Kewen Lin  ---
Thanks for reporting, I'll have a look.

[Bug testsuite/114744] test case gcc.target/powerpc/builtins-6-p9-runnable.c fails

2024-04-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114744

Kewen Lin  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Kewen Lin  ---
Should be fixed on trunk. Since it's a test issue, no backporting is needed.

[Bug testsuite/114744] test case gcc.target/powerpc/builtins-6-p9-runnable.c fails

2024-04-16 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114744

Kewen Lin  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2024-04-17
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #2 from Kewen Lin  ---
This is very likely a test issue, due to endianness, which the vector load
should take into account. I'll have a look.

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-04-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #17 from Kewen Lin  ---
(In reply to Michael Matz from comment #16)
> (In reply to Kewen Lin from comment #15)
> > I agree, thanks for the comments! btw, I'm not fighting for the current
> > implementation, just want to know more details why users are unable to make
> > use of the current implementation, is it just due to its inefficiency (like
> > the above sequence) or un-usability (unused at all). As your comments, I
> > think it's due to the former (inefficiency)?!
> 
> Okay.  So, yeah, I _think_ that other way (with NOPs between GEP and LEP,
> plus a jump around them) could be made to work with userspace live patching.
> It would just be inefficient.  But do note that that jump around was _not_
> part of the original way of -fpatchable-function-entry, so a change to
> codegen would have to have happened anyway to make that other way usable.
> And it has the (perhaps theoretical, who knows :) ) problem of not using the
> normal 8-byte difference between GEP and LEP.
> 

Thanks again for confirming this understanding!

> I think your current proposal from comment #10 is the better from all
> perspectives.

Yeah, I agree. When this support was reworked previously, an implementation
like the one in comment #10 was considered the better one, but it wasn't
adopted in the end out of concern that it could break the assumption that the
NOPs are consecutive. Based on all the input here, I think it's time to "fix"
it by simply highlighting these special non-consecutive NOPs in the
documentation.

[Bug target/114567] rs6000: explicit _Float128 doesn't generate optimal code

2024-04-10 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114567

--- Comment #1 from Kewen Lin  ---
This is Power8 LE specific. For KFmode, the mov expander calls
rs6000_emit_le_vsx_move, so it goes through a V1TI subreg; the rs6000-specific
swaps pass then generates one MEM with AND -16, which makes combine unable to
optimize it with the *signbit<mode>2_dm_mem pattern, due to
mode_dependent_address_p always returning false for AND. Although it looks to
me like we could extend mode_dependent_address_p to consider the to-mode in
that context, the result would still be sub-optimal due to the AND -16, which
results in an explicit "and".

[Bug testsuite/114662] [14 regression] new test case c_lto_pr113359-2 from r14-9841-g1e3312a25a7b34 fails

2024-04-10 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114662

Kewen Lin  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Kewen Lin  ---
Should be fixed on latest trunk.

[Bug testsuite/114662] [14 regression] new test case c_lto_pr113359-2 from r14-9841-g1e3312a25a7b34 fails

2024-04-10 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114662

Kewen Lin  changed:

   What|Removed |Added

  Component|lto |testsuite
   Target Milestone|--- |14.0
   Keywords||testsuite-fail

[Bug lto/114662] [14 regression] new test case c_lto_pr113359-2 from r14-9841-g1e3312a25a7b34 fails

2024-04-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114662

Kewen Lin  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 CC||linkw at gcc dot gnu.org
   Last reconfirmed||2024-04-10
 Status|UNCONFIRMED |ASSIGNED

--- Comment #2 from Kewen Lin  ---
I think this is a test issue. With -m32, unsigned long is 4 bytes while CL1 and
CL2 are 8-byte constants, so the compiler concludes that some checks always
fail and the abort always happens. Since the optimization aggressively removes
the call to getb, there is no chance to further check "semantic equality". The
IR for main at *.015t.cfg looks like:

int main (int argc, char * * argv)
{
  struct SB b;
  struct SA a;
  int D.3983;

   :
  init ();
  geta (, );
  _1 = a.ax;
  if (_1 != 3735928559)
goto ; [INV]
  else
goto ; [INV]

   :
  __builtin_abort ();

   :
  __builtin_abort ();

}
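
The size mismatch can be reproduced in isolation with a minimal sketch in plain
C. It is not the test case itself: the constant, the struct and the function
names are hypothetical, and a uint32_t field stands in for a 4-byte unsigned
long as seen with -m32.

```c
#include <stdint.h>

/* Hypothetical 8-byte constant standing in for CL1 from the test.  */
#define CL1_SKETCH 0x0123456789abcdefULL

/* Models a struct whose field is a 4-byte unsigned long.  */
struct sa32
{
  uint32_t ax;
};

/* Storing the constant into the 4-byte field drops the upper 32 bits.  */
void
init32 (struct sa32 *a)
{
  a->ax = (uint32_t) CL1_SKETCH;
}

/* Comparing the field with the full constant zero-extends the field
   to 64 bits first, so the two can never be equal.  */
int
matches_constant (const struct sa32 *a)
{
  return (uint64_t) a->ax == CL1_SKETCH;
}
```

Because the comparison result is known at compile time, a compiler is free to
fold the check away and keep only the failing path, which matches the shape of
the IR above.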

[Bug rtl-optimization/114664] -fno-omit-frame-pointer causes an ICE during the build of the greenlet package

2024-04-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114664

--- Comment #8 from Kewen Lin  ---
(In reply to Peter Bergner from comment #7)
> (In reply to Andrew Pinski from comment #6)
> > Pre-IRA fix was done to specifically reject this:
> > https://inbox.sourceware.org/gcc-patches/ab3a61990702021658w4dc049cap53de8010a7d86...@mail.gmail.com/
> 
> Then that would seem to indicate that mentioning the frame pointer reg in
> the asm clobber list is an error, but how are users supposed to know whether
> -fno-omit-frame-pointer is in effect or not?  I've looked and there is no
> pre-defined macro a user could check.

I noticed that even without -fno-omit-frame-pointer, the test case still fails
with the same symptom (with an error message rather than an ICE). Did I miss
something?

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-04-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #15 from Kewen Lin  ---
(In reply to Michael Matz from comment #14)
> Hmm?  But this is not how the global-to-local hand-off is implemented (and
> expected by tooling): a fall-through.  The global entry sets up the GOT
> register, there simply is no '[b localentry]'.
> 
> If you mean to imply that also the '[b localentry]' should be patched in at
> live-patch application time (and hence the GOT setup would need to be moved
> to still somewhere else), then you have the problem that (in the
> not-yet-patched 
> case) as long as the L1-nops sit between global and local entry they will
> always 
> be executed when the global entry is called.

Sorry for the confusion, I meant a sequence like:

global entry:
  [TOC base setup] // always here
  [b localentry] // which is added when patching
L1:
  [patched code] // from patching
  localentry: 
  [b L1] // from patching

> That's wasteful.

I agree, nops are not zero cost on Power8/Power9.

> 
> Additionally tooling will be surprised if the address difference between
> global and local entry isn't exactly 8 (i.e. two instructions).  The psABI
> allows for different values, of course.  But I'm willing to bet that there
> are
> bugs in the wild when different values would be actually used.
> 

It's possible that some tooling doesn't conform to the ABI doc, but I think
the tooling should be fixed if that is the case. :)
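
For reference, a minimal sketch of how tooling can read the global-to-local
entry distance instead of assuming 8. The bit position and mask are written
from my recollection of the ELFv2 ABI (binutils names them
STO_PPC64_LOCAL_*), so please treat them as assumptions and check the ABI doc:

```c
#include <assert.h>

/* Bit position and mask of the local-entry field in st_other, as I recall
   them from the ELFv2 ABI (assumption, please verify).  */
#define LOCAL_ENTRY_SHIFT 5
#define LOCAL_ENTRY_MASK  0xe0u

/* Return the byte distance from the global to the local entry point
   encoded in ST_OTHER.  Field values 0 and 1 mean the two entry points
   coincide; 7 is reserved.  */
unsigned
local_entry_offset (unsigned char st_other)
{
  unsigned field = (st_other & LOCAL_ENTRY_MASK) >> LOCAL_ENTRY_SHIFT;
  assert (field != 7);              /* reserved encoding */
  return ((1u << field) >> 2) << 2; /* 0, 0, 4, 8, 16, 32, 64 */
}
```

The usual two-instruction TOC setup corresponds to field value 3 (8 bytes),
but the encoding allows other distances as well.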

> So, the nops-between-gep-and-lep could probably be somehow made to work with
> userspace live patching, but your most recent patch here makes this all moot.
> It generates exactly the sequence we want: a single nop at the LEP, and
> a configurable patching area outside of, but near to, the function (here: in
> front of the GEP).

I agree, thanks for the comments! btw, I'm not fighting for the current
implementation, I just want to know in more detail why users are unable to make
use of it: is it just due to its inefficiency (like the above sequence) or is
it unusable (not used at all)? From your comments, I think it's due to the
former (inefficiency)?

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-04-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #13 from Kewen Lin  ---
(In reply to Giuliano Belinassi from comment #12)
> With your patch we have:
> 
> > .LPFE0:
> > ...
> Which seems what is expected.

Hi Giuliano, thanks for taking the time to test it!  Could you explain a bit
why "In such way we can't use the this space to place a trampoline to the new
function"? Is it due to inefficient code, like needing more branches?

global entry:
  [b localentry]
L1:
  [patched code]

localentry:
  [b L1]

Or is there some other reason that makes it unusable altogether?

[Bug testsuite/114614] New test case gcc.misc-tests/gcov-20.c from r14-9789-g08a52331803f66 fails

2024-04-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114614

Kewen Lin  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Kewen Lin  ---
Should be fixed on latest trunk.

[Bug testsuite/114642] new test case gcc.dg/debug/btf/btf-datasec-3.c from r14-6195-gb8cf266f4ca4ff fails for 32 bits

2024-04-08 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114642

Kewen Lin  changed:

   What|Removed |Added

URL||https://gcc.gnu.org/piperma
   ||il/gcc-patches/2024-April/6
   ||48994.html
 CC||linkw at gcc dot gnu.org
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |david.faust at oracle dot com

--- Comment #2 from Kewen Lin  ---
David posted a fix (see URL).

[Bug testsuite/114614] New test case gcc.misc-tests/gcov-20.c from r14-9789-g08a52331803f66 fails

2024-04-07 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114614

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
   Last reconfirmed||2024-04-08
 CC||linkw at gcc dot gnu.org

--- Comment #1 from Kewen Lin  ---
It requires effective target profile_update_atomic.

[Bug target/114567] rs6000: explicit _Float128 doesn't generate optimal code

2024-04-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114567

Kewen Lin  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
   Keywords||missed-optimization
 Target||powerpc64*-linux-gnu
   Last reconfirmed||2024-04-03
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

[Bug target/114567] New: rs6000: explicit _Float128 doesn't generate optimal code

2024-04-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114567

Bug ID: 114567
   Summary: rs6000: explicit _Float128 doesn't generate optimal
code
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

I happened to spot this issue while working on patches for PR112993.

=== test case ===

#define TYPE _Float128

#ifdef LD
#undef TYPE
#define TYPE long double
#endif

int sbm (TYPE *a) { return __builtin_signbit (*a); }

==

/opt/gcc-nightly/trunk/bin/gcc -mcpu=power8 -mvsx -O2 -mabi=ieeelongdouble
-Wno-psabi test.c -DLD -S -o ref.s
/opt/gcc-nightly/trunk/bin/gcc -mcpu=power8 -mvsx -O2 -mabi=ibmlongdouble
-Wno-psabi test.c -S -o float128.s

diff -Nur ref.s float128.s
--- ref.s   2024-03-18 05:41:00.302208975 -0400
+++ float128.s  2024-03-18 05:41:00.392205513 -0400
@@ -9,7 +9,10 @@
 sbm:
 .LFB0:
.cfi_startproc
-   ld 3,8(3)
+   rldicr 3,3,0,59
+   lxvd2x 0,0,3
+   xxpermdi 0,0,0,2
+   mfvsrd 3,0
srdi 3,3,63
blr
.long 0
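
The shorter sequence in ref.s only needs the high doubleword of the value. A
minimal sketch of that idea in C, assuming the little-endian layout that
"ld 3,8(3)" suggests (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Return the sign bit of a 128-bit IEEE value stored little-endian at P
   by reading only its high doubleword (bytes 8..15), mirroring the
   "ld 3,8(3); srdi 3,3,63" sequence in ref.s.  */
int
signbit_from_high_dword (const unsigned char p[16])
{
  uint64_t hi = 0;

  assert (p != 0);
  for (int i = 0; i < 8; i++)
    hi |= (uint64_t) p[8 + i] << (8 * i);   /* byte 15 is most significant */
  return (int) (hi >> 63);
}
```

Nothing here needs a vector register, which is why the lxvd2x/xxpermdi/mfvsrd
round trip in float128.s looks like a missed optimization.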

[Bug target/88309] [11/12/13/14 Regression] ICE: Floating point exception (in is_miss_rate_acceptable), target assigning alignent of 4 bits(!) to vector

2024-04-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88309

Kewen Lin  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

--- Comment #6 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Kewen Lin from comment #4)
> > Hi Andrew, thanks for digging into this!  William has not worked on GCC
> > project any more, will you make a patch for this?
> 
> I don't have time to test it really.

No problem, I'll work on this.

[Bug target/88309] [11/12/13/14 Regression] ICE: Floating point exception (in is_miss_rate_acceptable), target assigning alignent of 4 bits(!) to vector

2024-04-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88309

Kewen Lin  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #3)
> Found it:
>   /* In GIMPLE the type of the MEM_REF specifies the alignment.  The
> required alignment (power) is 4 bytes regardless of data type.  */
>   tree align_ltype = build_aligned_type (lhs_type, 4);
> 
> That should be 4*8 instead of just 4.
> 
> There are 2 build_aligned_type in rs6000-builtins.cc which uses the wrong
> alignment; thinking it was the alignment argument was bytes rather than bits.
> 
> Introduced by r9-2375-g3f7a77cd20d07c which means this is a regression.

Hi Andrew, thanks for digging into this!  William no longer works on the GCC
project, will you make a patch for this?
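
A tiny model of the bits-versus-bytes mix-up Andrew describes. This is only an
illustration with made-up helper names, not the rs6000-builtins.cc code:

```c
#include <assert.h>

#define BITS_PER_UNIT 8   /* bits per byte, as on the targets discussed */

/* Convert an alignment given in bits to whole bytes, rounding down.  */
unsigned
align_in_bytes (unsigned align_in_bits)
{
  return align_in_bits / BITS_PER_UNIT;
}

void
alignment_example (void)
{
  assert (align_in_bytes (4) == 0);                  /* "4" read as bits */
  assert (align_in_bytes (4 * BITS_PER_UNIT) == 4);  /* the intended 4 bytes */
}
```

So passing 4 where bits are expected asks for a sub-byte alignment, while
4*8 gives the 4-byte alignment the comment in the source talks about.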

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-04-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #11 from Kewen Lin  ---
(In reply to Giuliano Belinassi from comment #9)
> Yes, this is for userspace livepatching.
> 
> Assume the following example:
> https://godbolt.org/z/b9M8nMbo1
> 
> As one can see, the sequence of 14 nops are generated after the global
> function entry point. In such way we can't use the this space to place a
> trampoline to the new function. We need this sequence of nops to be placed
> *before* the global function entry point.
> 

Hi Giuliano, thanks for the inputs!

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-04-02 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #10 from Kewen Lin  ---
Created attachment 57844
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57844&action=edit
patch changing the current implementation

Considering the current implementation is not useful at all for either kernel
or userspace uses, I'm inclined to change the current implementation instead of
introducing another option, while updating the documentation to emphasize that
the NOPs may not be consecutive in this case.

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-04-01 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #8 from Kewen Lin  ---
Hi @Michael, @Martin, could you confirm/clarify what got you interested in
this feature? Is it for some user space usage or not?

[Bug target/114402] rs6000: ICE when long double is ieee128 format by default but without vsx

2024-03-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114402

--- Comment #1 from Kewen Lin  ---
Currently the only pattern to match IEEE128 comparison is:

;; IEEE 128-bit comparisons
(define_insn "*cmp<mode>_hw"
  [(set (match_operand:CCFP 0 "cc_reg_operand" "=y")
        (compare:CCFP (match_operand:IEEE128 1 "altivec_register_operand" "v")
                      (match_operand:IEEE128 2 "altivec_register_operand" "v")))]
  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "xscmpuqp %0,%1,%2"
  [(set_attr "type" "veccmp")
   (set_attr "size" "128")])

It requires TARGET_FLOAT128_HW, so nothing can be used for matching.

The patch below fixes this ICE: it makes no-vsx IEEE128 comparisons also go
through a libfunc call, as is already done for
!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode).

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 5d975dab921..237d138faec 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15329,7 +15329,7 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
   rtx op0 = XEXP (cmp, 0);
   rtx op1 = XEXP (cmp, 1);

-  if (!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode))
+  if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
 comp_mode = CCmode;
   else if (FLOAT_MODE_P (mode))
 comp_mode = CCFPmode;
@@ -15361,7 +15361,7 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)

   /* IEEE 128-bit support in VSX registers when we do not have hardware
  support.  */
-  if (!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode))
+  if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
 {
   rtx libfunc = NULL_RTX;
   bool check_nan = false;
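
To spell out what the condition change does, here is a small model. It assumes
(from my reading of rs6000.h, please double check) that FLOAT128_VECTOR_P is
FLOAT128_IEEE_P plus a check that the float128 type support is enabled, which
in turn needs vsx. The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two target flags involved.  */
struct target_model
{
  bool float128_type;   /* IEEE128 kept in vector regs; needs vsx */
  bool float128_hw;     /* hardware IEEE128 insns available */
};

/* Old condition: !TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode).  */
bool
uses_libfunc_before (struct target_model t, bool mode_is_ieee128)
{
  return !t.float128_hw && (t.float128_type && mode_is_ieee128);
}

/* New condition: !TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode).  */
bool
uses_libfunc_after (struct target_model t, bool mode_is_ieee128)
{
  return !t.float128_hw && mode_is_ieee128;
}

void
compare_example (void)
{
  struct target_model no_vsx = { false, false };
  /* Without vsx the old check misses the libfunc path, the new one takes it.  */
  assert (!uses_libfunc_before (no_vsx, true));
  assert (uses_libfunc_after (no_vsx, true));
}
```

With vsx enabled the two conditions agree, so only the no-vsx case changes.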

[Bug target/114402] rs6000: ICE when long double is ieee128 format by default but without vsx

2024-03-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114402

Kewen Lin  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org,
   ||g...@the-meissners.org,
   ||segher at gcc dot gnu.org
   Last reconfirmed||2024-03-21
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

[Bug target/114402] rs6000: ICE when long double is ieee128 format by default but without vsx

2024-03-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114402

Kewen Lin  changed:

   What|Removed |Added

 Target||powerpc64*-linux-gnu
   Keywords||ice-on-valid-code
   Target Milestone|--- |15.0
  Known to fail||12.3.1, 13.2.1

[Bug target/112980] 64-bit powerpc ELFv2 does not allow nops to be generated before function global entry point

2024-03-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112980

--- Comment #6 from Kewen Lin  ---
(In reply to Martin Jambor from comment #5)
> I'd like to ping this, are there plans to implement this in the near-ish
> term?

Some weeks ago, Naveen was doing some experiments to see if there is a
better way to support the function tracer. If the idea works and the
results are promising, he may request something different, so we are
still waiting for that. @Naveen, feel free to correct me if I have
misunderstood anything.

[Bug target/114402] New: rs6000: ICE when long double is ieee128 format by default but without vsx

2024-03-20 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114402

Bug ID: 114402
   Summary: rs6000: ICE when long double is ieee128 format by
default but without vsx
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: linkw at gcc dot gnu.org
  Target Milestone: ---

When I was working on a patch to make us have only two 128bit fp on rs6000, I
found that long double can have ieee128 format by default even without vsx
support, but a simple test case with a comparison triggers an ICE as below:

long double a;
long double b;

int foo() {
  if (a > b)
return 0;
  else
return 1;
}

/opt/gcc-nightly/trunk/bin/gcc test.c -mno-vsx

test.c: In function ‘foo’:
test.c:9:1: error: unrecognizable insn:
9 | }
  | ^
(insn 9 8 10 2 (set (reg:CCFP 123)
(compare:CCFP (reg:TF 117 [ a.0_1 ])
(reg:TF 118 [ b.1_2 ]))) "test.c":5:6 -1
 (nil))
during RTL pass: vregs
test.c:9:1: internal compiler error: in extract_insn, at recog.cc:2812
0x102b7353 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
/home/gccbuild/gcc_trunk_git/gcc/gcc/rtl-error.cc:108
0x102b73a7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/home/gccbuild/gcc_trunk_git/gcc/gcc/rtl-error.cc:116
0x10c6636b extract_insn(rtx_insn*)
/home/gccbuild/gcc_trunk_git/gcc/gcc/recog.cc:2812
0x107ef797 instantiate_virtual_regs_in_insn
/home/gccbuild/gcc_trunk_git/gcc/gcc/function.cc:1611
0x107ef797 instantiate_virtual_regs
/home/gccbuild/gcc_trunk_git/gcc/gcc/function.cc:1994
0x107ef797 execute
/home/gccbuild/gcc_trunk_git/gcc/gcc/function.cc:2041
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Note that GCC should be configured with --with-long-double-format=ieee, since
specifying -mabi=ieeelongdouble requires vsx to be enabled.

[Bug testsuite/114320] New test case in r14-9439-g4aa87b856067d4 fails

2024-03-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114320

--- Comment #3 from Kewen Lin  ---
(In reply to Nathaniel Shead from comment #2)
> Sorry about that. I've not been able to work out what configure flags I need
> to pass to cause this to error in the first place (I don't normally develop
> for powerpc and the machine I'm using doesn't seem to fail no matter what

I guess the machine you are using (were referring to) doesn't have a powerpc
chip. cfarm provides some powerpc machines
(https://portal.cfarm.net/machines/list/), both ppc64le (LE -m64) and ppc64 (BE
-m32/-m64), and it's recommended to use them for building/testing. :)

> flags I try), but am I correct in understanding that just adding
> "-Wno-psabi" to the tests should stop them from failing? If so I'm happy to
> push a patch to that effect.

I think so. For now we don't have an effective target dedicated to the __ibm128
type, but it's guarded the same way as the __float128 type (that may be relaxed
in future, though even then using ppc_float128_sw would just be stricter than
needed).  Ideally we could add the effective target powerpc_vsx_ok (it should
be powerpc_vsx) to ensure VSX is enabled, but considering we are going to
rework it in the next release and we don't normally disable vsx explicitly,
this can be optional.

[Bug testsuite/114320] New test case in r14-9439-g4aa87b856067d4 fails

2024-03-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114320

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-03-13
 Ever confirmed|0   |1
 CC||linkw at gcc dot gnu.org

--- Comment #1 from Kewen Lin  ---
These new test cases require "-Wno-psabi" to suppress the warning.

[Bug testsuite/101461] [12/13/14 regression] gcc.target/powerpc/fold-vec-load-builtin_vec_xl test cases fail after r12-2266

2024-03-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101461

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
 CC||linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin  ---
Already fixed by r12-2889-g8464894c86b03e.

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-02-26 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

Kewen Lin  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #6 from Kewen Lin  ---
Segher will clean up this rs6000-*-* thing in next release, please use
powerpc*-*-* instead.

[Bug testsuite/106680] Test gcc.target/powerpc/bswap64-4.c fails on 32-bit BE

2024-02-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106680

--- Comment #12 from Kewen Lin  ---
(In reply to Sebastian Huber from comment #10)
> (In reply to Kewen Lin from comment #9)
> > Note that now we only disable implicit powerpc64 for -m32 when the
> > OS_MISSING_POWERPC64 is set.
> > 
> >   /* Don't expect powerpc64 enabled on those OSes with OS_MISSING_POWERPC64,
> >  since they do not save and restore the high half of the GPRs correctly
> >  in all cases.  If the user explicitly specifies it, we won't interfere
> >  with the user's specification.  */
> > #ifdef OS_MISSING_POWERPC64
> >   if (OS_MISSING_POWERPC64
> >   && TARGET_32BIT
> >   && TARGET_POWERPC64
> >   && !(rs6000_isa_flags_explicit & OPTION_MASK_POWERPC64))
> > rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
> > #endif
> > 
> > But rtems.h doesn't define OS_MISSING_POWERPC64
> 
> RTEMS supports the 64-bit PowerPC for the 64-bit multilibs.
> 

A 64-bit kernel should support 64-bit PowerPC, but does the 32-bit kernel
support saving and restoring 64-bit regs?

The current rtems.h says yes. If the answer is no, we should fix rtems.h and
then you won't need the explicit -mno-powerpc64.


btw, take the comment in freebsd64.h as an example:

/* FreeBSD doesn't support saving and restoring 64-bit regs with a 32-bit
   kernel. This is supported when running on a 64-bit kernel with
   COMPAT_FREEBSD32, but tell GCC it isn't so that our 32-bit binaries
   are compatible. */
#define OS_MISSING_POWERPC64 !TARGET_64BIT

[Bug testsuite/106680] Test gcc.target/powerpc/bswap64-4.c fails on 32-bit BE

2024-02-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106680

--- Comment #11 from Kewen Lin  ---
(In reply to Sebastian Huber from comment #8)
> Yes, it seems that -mcpu=e6500 -mno-powerpc64 yields the right code for the
> attached test case (with or without the -m32).

The default is -m32 I guess? :)

> 
> I am now a bit confused what the purpose of the -m32 and -m64 options is.

For -m32/-m64, the manual says:

Generate code for 32-bit or 64-bit environments of Darwin and SVR4 targets
(including GNU/Linux). The 32-bit environment sets int, long and pointer to 32
bits and generates code that runs on any PowerPC variant. The 64-bit
environment sets int to 32 bits and long and pointer to 64 bits, and generates
code for PowerPC64, as for -mpowerpc64.

But it can interact with the powerpc64 option. For example, cpu e6500 supports
powerpc64 by default, and if the OS in use is able to perform the necessary
context switches, we want -mpowerpc64 kept, since it allows generating more
efficient code (leveraging insns guarded with the powerpc64 flag).

[Bug testsuite/106680] Test gcc.target/powerpc/bswap64-4.c fails on 32-bit BE

2024-02-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106680

--- Comment #9 from Kewen Lin  ---
Note that now we only disable implicit powerpc64 for -m32 when the
OS_MISSING_POWERPC64 is set.

  /* Don't expect powerpc64 enabled on those OSes with OS_MISSING_POWERPC64,
 since they do not save and restore the high half of the GPRs correctly
 in all cases.  If the user explicitly specifies it, we won't interfere
 with the user's specification.  */
#ifdef OS_MISSING_POWERPC64
  if (OS_MISSING_POWERPC64
  && TARGET_32BIT
  && TARGET_POWERPC64
  && !(rs6000_isa_flags_explicit & OPTION_MASK_POWERPC64))
rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
#endif

But rtems.h doesn't define OS_MISSING_POWERPC64

gcc/config/rs6000/linux.h:#define OS_MISSING_POWERPC64 1
gcc/config/rs6000/freebsd64.h:#define OS_MISSING_POWERPC64 !TARGET_64BIT
gcc/config/rs6000/aix.h:#define OS_MISSING_POWERPC64 1
gcc/config/rs6000/linux64.h:#define OS_MISSING_POWERPC64 !TARGET_64BIT

meanwhile cpu "e6500" has MASK_POWERPC64 set by default (it's 64bit core).

That's why you still have powerpc64 flag set when you specify -m32 on rtems.
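
The same logic as a standalone model, in case it helps to see the cases side
by side. The mask bit position and the function name are hypothetical; only
the condition is taken from the snippet above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_POWERPC64 (UINT64_C (1) << 0)   /* hypothetical bit position */

/* Return ISA_FLAGS with the implicit powerpc64 bit cleared when the OS
   cannot handle it for 32-bit code and the user did not ask for it.  */
uint64_t
adjust_for_32bit (uint64_t isa_flags, uint64_t isa_flags_explicit,
                  bool os_missing_powerpc64, bool target_32bit)
{
  if (os_missing_powerpc64
      && target_32bit
      && (isa_flags & MASK_POWERPC64)
      && !(isa_flags_explicit & MASK_POWERPC64))
    isa_flags &= ~MASK_POWERPC64;
  return isa_flags;
}

void
powerpc64_example (void)
{
  uint64_t e6500_default = MASK_POWERPC64;   /* 64-bit core sets it by default */

  /* rtems: OS_MISSING_POWERPC64 not defined, so -m32 keeps the flag.  */
  assert (adjust_for_32bit (e6500_default, 0, false, true) & MASK_POWERPC64);
  /* An OS that defines it: the implicit flag is dropped for -m32.  */
  assert (!(adjust_for_32bit (e6500_default, 0, true, true) & MASK_POWERPC64));
}
```

An explicit -mpowerpc64 sets the bit in the explicit mask as well, so the flag
is then left alone whatever the OS says.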

[Bug testsuite/106680] Test gcc.target/powerpc/bswap64-4.c fails on 32-bit BE

2024-02-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106680

--- Comment #7 from Kewen Lin  ---
(In reply to Sebastian Huber from comment #6)
> It seems that the change
> 
> commit acc727cf02a1446dc00f8772f3f479fa3a508f8e
> Author: Kewen Lin 
> Date:   Tue Dec 27 04:13:07 2022 -0600
> 
> rs6000: Rework option -mpowerpc64 handling [PR106680]
> 
> causes a regression for -mcpu=e6500 -m32, for example:
> 
> gcc -fpreprocessed -O2 -S -mcpu=e6500 -m32 -S imfs_add_node.c.67.s
> imfs_add_node.c.67.i
> 
> diff -u imfs_add_node.c.67.s.good.e2acff49fb2962b921bf8b73984b89878b61492c
> imfs_add_node.c.67.s.bad.acc727cf02a1446dc00f8772f3f479fa3a508f8e
> --- imfs_add_node.c.67.s.good.e2acff49fb2962b921bf8b73984b89878b61492c 
> 2024-01-20 12:15:15.143182571 +0100
> +++ imfs_add_node.c.67.s.bad.acc727cf02a1446dc00f8772f3f479fa3a508f8e  
> 2024-01-20 12:11:46.804204927 +0100
> @@ -52,8 +52,8 @@
> bne- 0,.L4
>  .L2:
> mr 4,29
> -   addi 3,1,8
> li 5,24
> +   addi 3,1,8
> bl rtems_filesystem_eval_path_start
> lis 9,IMFS_node_clone@ha
> lwz 10,20(3)
> @@ -63,12 +63,12 @@
> cmpw 0,10,9
> beq- 0,.L24
> li 4,134
> -   addi 3,1,8
> +   li 3,0
> bl rtems_filesystem_eval_path_error
>  .L9:
> li 31,-1
>  .L10:
> -   addi 3,1,8
> +   li 3,0
> bl rtems_filesystem_eval_path_cleanup
>  .L1:
> lwz 0,116(1)
> @@ -93,7 +93,7 @@
> lwz 9,12(31)
> li 8,96
> lhz 10,16(31)
> -   addi 3,1,8
> +   li 3,0
> stw 8,24(1)
> stw 9,8(1)
> stw 10,12(1)
> @@ -105,7 +105,7 @@
> cmpwi 0,9,0
> beq- 0,.L9
> li 4,22
> -   addi 3,1,8
> +   li 3,0
> bl rtems_filesystem_eval_path_error
> b .L9
> .p2align 4,,15
> @@ -129,12 +129,9 @@
> stw 9,0(10)
> stw 10,4(9)
> bl _Timecounter_Getbintime
> -   lwz 10,64(1)
> -   lwz 11,68(1)
> -   stw 10,40(30)
> -   stw 11,44(30)
> -   stw 10,48(30)
> -   stw 11,52(30)
> +   ld 9,64(1)
> +   std 9,40(30)
> +   std 9,48(30)
> b .L10
> .cfi_endproc
>  .LFE351:
> 
> For the call to rtems_filesystem_eval_path_cleanup() the register 3 should
> point to a structure on the stack. Correct is:
> 
> -   addi 3,1,8
> 
> Wrong is:
> 
> +   li 3,0
> 
> It seems that for the -mcpu=e6500 the -m32 option has not the right effect
> and some 64-bit instructions are generated, for example ld and std plus the

As the commit log says, the previous behavior, where -m32 also disabled
-mpowerpc64, was wrong: -m{no,}powerpc64 should be independent of -m32/-m64.

> wrong function parameters.

I suppose the behavior you want with -m32 is not to enable powerpc64 (since
previously -m32 also disabled -mpowerpc64), so I think you can get the
previous behavior by specifying an explicit -mno-powerpc64 together with -m32.

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-30 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-30

--- Comment #13 from Kewen Lin  ---
One more finding: with -mvsx but without an explicit cpu type, gcc already
passes -mpower7 to the assembler, but it doesn't do that when a cpu type is
explicitly specified. I think the reason it isn't always passed is that only
the last cpu type wins, so passing it could unexpectedly override a higher cpu
type.

The candidate fixes seem to be:

diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index b09b5664af0..47b06d3c30d 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -74,7 +74,7 @@ fp128_includes = $(srcdir)/soft-fp/double.h \
   $(srcdir)/soft-fp/soft-fp.h

 # Build the emulator without ISA 3.0 hardware support.
-FP128_CFLAGS_SW  = -Wno-type-limits -mvsx -mfloat128 \
+FP128_CFLAGS_SW  = -Wno-type-limits -mvsx -mfloat128 -mcpu=power7 \
-mno-float128-hardware -mno-gnu-attribute \
-I$(srcdir)/soft-fp \
-I$(srcdir)/config/rs6000 \

Or

diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index b09b5664af0..bf4a5e6aaf0 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -74,7 +74,7 @@ fp128_includes = $(srcdir)/soft-fp/double.h \
   $(srcdir)/soft-fp/soft-fp.h

 # Build the emulator without ISA 3.0 hardware support.
-FP128_CFLAGS_SW  = -Wno-type-limits -mvsx -mfloat128 \
+FP128_CFLAGS_SW  = -Wno-type-limits -mvsx -mfloat128 -Wa,-many \
-mno-float128-hardware -mno-gnu-attribute \
-I$(srcdir)/soft-fp \
-I$(srcdir)/config/rs6000 \

The reason is that gcc considers -mvsx to imply -mcpu=power7 (added on top of
the currently specified cpu type if there is one), while the assembler doesn't.

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

Kewen Lin  changed:

   What|Removed |Added

Summary|Failed bootstrap on ppc |[14 regression] Failed
   |unrecognized opcode:|bootstrap on ppc
   |`lfiwzx' with -mcpu=7450|unrecognized opcode:
   ||`lfiwzx' with -mcpu=7450

--- Comment #12 from Kewen Lin  ---
(In reply to Sam James from comment #10)
> (In reality, I think it is a regression, given:
> a) it regresses non-release checking (which we sometimes use even for
> released versions, it's opt-in though);

But I assumed that non-release checking on old releases would also fail, so
comparing non-release vs. non-release, the behavior doesn't change.

> b) it blocks further testing with GCC 14
> 

Sorry about that, feel free to put it back. :)

> but I understand the argument that if a release were made with it, it
> wouldn't be the end of the world by itself and it only affects a specific
> configuration.)

[Bug target/113652] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #11 from Kewen Lin  ---
In gcc, lfiwzx is guarded with TARGET_LFIWZX => TARGET_POPCNTD (ISA 2.06), while
-mvsx guarantees TARGET_POPCNTD (ISA_2_6_MASKS_SERVER) is set, so gcc considers
lfiwzx supported. IMHO the underlying philosophy is that having vsx capability
means the supported ISA level is at least 2.06, and lfiwzx is available from
2.06 on, so it's supported.

But binutils doesn't seem to follow it:
{"xvadddp", XX3(60,96), XX3_MASK,PPCVSX,PPCVLE, {XT6, XA6, XB6}},
{"lfiwzx",  X(31,887),  X_MASK,   POWER7|PPCA2, 0,  {FRT, RA0, RB}},
The two are guarded with different masks and apparently PPCVSX doesn't enable
POWER7.
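
A simplified model of that mask check, to make the mismatch concrete. The bit
values are made up and CPU_ANY is only a stand-in for what -many enables; the
real binutils lookup does more than this:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit values; only the names follow the table entries.  */
#define CPU_PPCVSX (1u << 0)
#define CPU_POWER7 (1u << 1)
#define CPU_PPCA2  (1u << 2)
#define CPU_ANY    (~0u)   /* stand-in for what -many enables */

struct opcode
{
  const char *name;
  unsigned flags;   /* cpu sets that provide this insn */
};

static const struct opcode opcodes[] = {
  { "xvadddp", CPU_PPCVSX },
  { "lfiwzx",  CPU_POWER7 | CPU_PPCA2 },
};

/* Return true if NAME is accepted when the cpu sets in ENABLED are on.  */
bool
accepts (const char *name, unsigned enabled)
{
  assert (name != 0);
  for (size_t i = 0; i < sizeof opcodes / sizeof opcodes[0]; i++)
    if (strcmp (opcodes[i].name, name) == 0)
      return (opcodes[i].flags & enabled) != 0;
  return false;
}
```

With only CPU_PPCVSX enabled, "xvadddp" is accepted and "lfiwzx" is rejected;
adding CPU_POWER7 (or using CPU_ANY) accepts both, which is the behavior gcc
assumes when it sees -mvsx.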

Hi Alan and Peter,

I wonder if the assembler can enable POWER7 when PPCVSX gets enabled, like gcc
does now?

[Bug target/113652] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

Kewen Lin  changed:

   What|Removed |Added

Summary|[14 regression] Failed  |Failed bootstrap on ppc
   |bootstrap on ppc|unrecognized opcode:
   |unrecognized opcode:|`lfiwzx' with -mcpu=7450
   |`lfiwzx' with -mcpu=7450|

--- Comment #9 from Kewen Lin  ---
(In reply to Andrew Pinski from comment #8)
> So t-float128 has this line:
> # Build the emulator without ISA 3.0 hardware support.
> FP128_CFLAGS_SW  = -Wno-type-limits -mvsx -mfloat128 \
> ...
> 
> Which gets added to some of the libgcc object files while compiling:
> $(fp128_softfp_obj)  : INTERNAL_CFLAGS += $(FP128_CFLAGS_SW)
> $(fp128_ppc_obj) : INTERNAL_CFLAGS += $(FP128_CFLAGS_SW)
> 
> 
> The problem is CFLAGS gets added also. It seems like passing -mvsx enables
> some other instructions in GCC's code generation BUT does not enable it for
> the assembler ...

ah, just noticed that it's bootstrapping gcc. Stripping the regression tag since
I don't think it's actually a regression, per the comments above.

I found that the libgcc_cv_powerpc_float128 check can pass with -mcpu=7450
-mabi=altivec -mvsx -mfloat128; the assembler options "-a32 -mppc -mvsx
-maltivec -mbig" are actually the same as those used for the compilation in
#c5. So it looks like -mvsx is supposed to tell the assembler to recognize
vsx instructions, but somehow "lfiwzx" is not counted as a vsx instruction.

More specifically, "xvadddp" is recognized by the assembler with -mvsx while
"lfiwzx" isn't.

$ cat t1.s
.machine "7450"
lfiwzx 1,0,9

$ cat t2.s
.machine "7450"
xvadddp 34,34,35

$ as -a32 -mppc -mvsx t1.s -o t1.o
t1.s: Assembler messages:
t1.s:2: Error: unrecognized opcode: `lfiwzx'
$ as -a32 -mppc -mvsx t2.s -o t2.o
$ echo $?
0

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #7 from Kewen Lin  ---
oops, I meant --enable-checking rather than --checking.

[Bug target/113652] [14 regression] Failed bootstrap on ppc unrecognized opcode: `lfiwzx' with -mcpu=7450

2024-01-29 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113652

--- Comment #6 from Kewen Lin  ---
I think this is related to r10-580-ge154242724b084 and this failure is expected
and a use error.

With it applied, we don't always pass -many to assembler with CHECKING_P
enabled. Actually compilers (gcc-13, gcc-12, gcc-11 or trunk) generate the same
assembly, but because gcc-11/gcc-12/gcc-13 is built with --checking=release by
default which doesn't set CHECKING_P while trunk is built with
--checking=yes,extra by default which set CHECKING_P. So it causes the
different behaviors so that further considered as regression unexpectedly.

The issue should be gone if trunk gets released as gcc-14 or it's built with
--checking=release. IMO Alan's commit aims to help to expose more and more such
unexpected use cases and users can fix them in place. As #c3 "PowerPC 7450 (aka
PowerPC G4) is only capable of -maltivec but not -mvsx", so it's unexpected to
have -mcpu=7450 meanwhile having -mvsx, could you check where the -mvsx comes
from and fix it instead?  Thanks!

btw, a workaround is to add -Wa,-many, which restores the previous behavior of
passing -many to the assembler.
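
To make the reasoning above concrete, here is a toy model of it in Python. It
is only a sketch under stated assumptions, not GCC's driver code: the function
name assembler_flags and the checking_enabled parameter (standing in for
CHECKING_P) are hypothetical. The one real behavior it relies on is that the
driver forwards the comma-separated options after -Wa, to the assembler.

```python
"""Toy model of the -many behavior discussed above (not GCC's real code)."""


def assembler_flags(checking_enabled, driver_args=()):
    """Return the flags a hypothetical driver would hand to the assembler.

    checking_enabled stands in for CHECKING_P. When it is set, -many is no
    longer added implicitly, so the assembler only accepts opcodes that are
    valid for the selected machine.
    """
    flags = []
    for arg in driver_args:
        # -Wa,opt1,opt2 forwards opt1 and opt2 to the assembler.
        if arg.startswith("-Wa,"):
            flags.extend(opt for opt in arg[len("-Wa,"):].split(",") if opt)
    # Without checking, the driver in this model still adds -many itself.
    if not checking_enabled and "-many" not in flags:
        flags.append("-many")
    return flags


if __name__ == "__main__":
    print("release-style build:", assembler_flags(False))
    print("checking build:", assembler_flags(True))
    print("checking build with workaround:",
          assembler_flags(True, ["-Wa,-many"]))
```

In this model only the checking build without the workaround ends up without
-many, which matches the case where the assembler rejects `lfiwzx'.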

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-01-22 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||segher at gcc dot gnu.org
   Last reconfirmed||2024-01-23
 Ever confirmed|0   |1

--- Comment #5 from Kewen Lin  ---
(In reply to H.J. Lu from comment #3)
> (In reply to Kewen Lin from comment #2)
> > Guessing /usr/local/bin/ld is a gnu ld? Based on what I heard before, gnu ld
> > has some problems on aix, people pass object files to aix system and use aix
> > ld there. Not sure if the understanding still holds.
> 
> I am building a cross compiler.  No AIX tools are involved.

Thanks for clarifying, I was dull and misunderstood it.

Confirmed. Some symbols are from rs6000-builtin.cc (which is not generated),
which in turn requires some symbols from rs6000-builtins.cc (which is
generated). Neither object file is included in linking. The diff below can fix
it:

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b2d7d7dd475..6b62e4fe56c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -557,8 +557,10 @@ rs6000*-*-*)
 extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
 extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
 extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+extra_objs="${extra_objs} rs6000-builtin.o rs6000-builtins.o"
 target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
 target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
+target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
 ;;
 sparc*-*-*)
 cpu_type=sparc

According to David's comment, "rs6000-ibm-aix doesn't exist any more", and I
vaguely remember Segher also mentioning that the rs6000*-*-*) case has become
stale, so maybe we can aggressively drop the whole rs6000*-*-*) case handling?

[Bug target/113507] can't build a cross compiler to rs6000-ibm-aix7.2

2024-01-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113507

Kewen Lin  changed:

   What|Removed |Added

 CC||bergner at gcc dot gnu.org,
   ||dje at gcc dot gnu.org,
   ||linkw at gcc dot gnu.org

--- Comment #2 from Kewen Lin  ---
Guessing /usr/local/bin/ld is a gnu ld? Based on what I heard before, gnu ld
has some problems on aix, people pass object files to aix system and use aix ld
there. Not sure if the understanding still holds.

[Bug testsuite/109705] [14 regression] gcc.dg/vect/pr25413a.c fails after r14-333-g6d4b59a9356ac4

2024-01-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109705

Kewen Lin  changed:

   What|Removed |Added

 CC||linkw at gcc dot gnu.org

--- Comment #7 from Kewen Lin  ---
(In reply to Peter Bergner from comment #6)
> (In reply to GCC Commits from comment #5)
> > commit r14-7270-g39fa71a0882928a25bd170580e3e9e89a69dce36
> > Author: Kewen Lin 
> > Date:   Mon Jan 15 20:55:40 2024 -0600
> > 
> > testsuite: Fix vect_long_mult on Power [PR109705]
> > 
> > As pointed out by the discussion in PR109705, the current
> > vect_long_mult effective target check on Power is broken.
> > This patch is to fix it accordingly.
> 
> Does this need backporting?

I guess not: the only use of vect_long_mult on release branches is
gcc/testsuite/gcc.dg/vect/pr60656.c, which also has another check,
vect_widen_mult_si_to_di_pattern, that is unsupported on Power.

[Bug testsuite/113535] rs6000, testsuite: Re-visit the current vect_* for Power

2024-01-21 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113535

Kewen Lin  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-22
   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org
 CC||bergner at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
