[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #17 from GCC Commits  ---
The master branch has been updated by Andreas Krebbel :

https://gcc.gnu.org/g:42189f21b22c43ac8ab46edf5f6a7b4d99bc86a5

commit r14-10087-g42189f21b22c43ac8ab46edf5f6a7b4d99bc86a5
Author: Andreas Krebbel 
Date:   Tue Apr 23 10:05:46 2024 +0200

s390x: Fix vec_xl/vec_xst type aliasing [PR114676]

The requirements of the vec_xl/vec_xst intrinsincs wrt aliasing of the
pointer argument are not really documented.  As it turns out, users
are likely to get it wrong.  With this patch we let the pointer
argument alias everything in order to make it more robust for users.

gcc/ChangeLog:

PR target/114676
* config/s390/s390-c.cc (s390_expand_overloaded_builtin): Use a
MEM_REF with an addend of type ptr_type_node.

gcc/testsuite/ChangeLog:

PR target/114676
* gcc.target/s390/zvector/pr114676.c: New test.

Suggested-by: Jakub Jelinek 

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #16 from Andreas Krebbel  ---
(In reply to Aleksei Nikiforov from comment #15)
> I think fixing compiled code should be possible. I'm not sure if this bug
> should be just closed.

In addition to fixing the PyTorch usage of the builtin, I also plan to change
GCC to the "alias everything" approach now. Although the documentation does not
strictly requires us to, it prevents other users from falling into the same
trap and makes GCC to match what Clang already does. The documentation anyway
discourages everyone from using these builtins. So it should not be a big deal,
if we sacrifice a bit of performance to make it more robust.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-18 Thread aleksei.nikiforov at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #15 from Aleksei Nikiforov  
---
I think fixing compiled code should be possible. I'm not sure if this bug
should be just closed.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #14 from Jakub Jelinek  ---
(In reply to Andreas Krebbel from comment #13)
> We will go and fix PyTorch instead. Although it is not clearly documented,
> the way PyTorch uses the builtin right now is probably not what was
> intended. It is pretty clear that the element type pointer needs to alias
> vectors of the same element type, but there is no saying about aliasing
> everything.
> 
> I'm just wondering how to improve the diagnostics in our backend to catch
> this. The example below is similar to what PyTorch does today. Casting mem
> to (float*) prevents our builtin code from complaining about the type
> mismatch and by that opens the door for the much harder to debug TBAA
> problem.

We need a TBAA analyzer among sanitizers (but writing it is really hard).

> #include 
> 
> void __attribute__((noinline)) foo (int *mem)
> {
>   vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem);

So use
  *(vector float __attribute__((__may_alias__)) *)mem = (vector float){ 1.0f,
2.0f, 3.0f, 4.0f };
instead?  Sure, GCC extension, not an intrinsic in that case...

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-17 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #13 from Andreas Krebbel  ---
We will go and fix PyTorch instead. Although it is not clearly documented, the
way PyTorch uses the builtin right now is probably not what was intended. It is
pretty clear that the element type pointer needs to alias vectors of the same
element type, but there is no saying about aliasing everything.

I'm just wondering how to improve the diagnostics in our backend to catch this.
The example below is similar to what PyTorch does today. Casting mem to
(float*) prevents our builtin code from complaining about the type mismatch and
by that opens the door for the much harder to debug TBAA problem.

#include 

void __attribute__((noinline)) foo (int *mem)
{
  vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem);
}

int
main ()
{
  int m[4] = { 0 };
  foo (m);
  if (m[3] == 0)
__builtin_abort ();
  return 0;
}

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-12 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Sam James  changed:

   What|Removed |Added

   Priority|P1  |P2

--- Comment #12 from Sam James  ---
P1->P2 after IRC discussion (we released with it & unclear what the intrinsic
behaviour is even supposed to be wrt tbaa).

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||law at gcc dot gnu.org
   Priority|P3  |P1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-04-12

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #11 from Andreas Krebbel  ---
The documentation of vec_xl and vec_xst doesn't seem to mention anything
special with regard to that. So I understand the memory is only accessed
through pointers which are compatible to the ones used when invoking the
builtin.

That particular usage within pytorch looks ok to me.

I'm already testing a patch which matches what you are proposing. I hope to be
able to reduce the testcase somewhat.

Thanks for your help!

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #10 from Jakub Jelinek  ---
I admit I haven't studied what exactly pytorch does there.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #9 from Jakub Jelinek  ---
It depends on what the vec_xl*/vec_xst* documentation requires/user expect.
If the expectation is that the loads/stores should alias the scalar pointee of
the pointer type passed to those intrinsics, then
--- gcc/config/s390/s390-c.cc.jj2024-01-03 11:51:54.847407588 +0100
+++ gcc/config/s390/s390-c.cc   2024-04-10 16:30:31.896197832 +0200
@@ -498,8 +498,8 @@ s390_expand_overloaded_builtin (location
/* Build a vector type with the alignment of the source
   location in order to enable correct alignment hints to be
   generated for vl.  */
-   tree mem_type = build_aligned_type (return_type,
-   TYPE_ALIGN (TREE_TYPE (TREE_TYPE
((*arglist)[1];
+   unsigned align = TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[1])));
+   tree mem_type = build_aligned_type (return_type, align);
return build2 (MEM_REF, mem_type,
   fold_build_pointer_plus ((*arglist)[1], (*arglist)[0]),
   build_int_cst (TREE_TYPE ((*arglist)[1]), 0));
@@ -511,11 +511,13 @@ s390_expand_overloaded_builtin (location
/* Build a vector type with the alignment of the target
   location in order to enable correct alignment hints to be
   generated for vst.  */
-   tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]),
-   TYPE_ALIGN (TREE_TYPE (TREE_TYPE
((*arglist)[2];
+   unsigned align = TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[2])));
+   tree mem_type = build_aligned_type (TREE_TYPE ((*arglist)[0]), align);
return build2 (MODIFY_EXPR, mem_type,
-  build1 (INDIRECT_REF, mem_type,
-  fold_build_pointer_plus ((*arglist)[2],
(*arglist)[1])),
+  build2 (MEM_REF, mem_type,
+  fold_build_pointer_plus ((*arglist)[2],
+   (*arglist)[1]),
+  build_int_cst (TREE_TYPE ((*arglist)[2]), 0)),
   (*arglist)[0]);
   }
 case S390_OVERLOADED_BUILTIN_s390_vec_load_pair:
might be all that is needed (if it is needed at all).
If the expectation is that one can read what has been written by those
intrinsics or write
what will be read by those intrinsics though unrelated effective pointers, then
it should alias everything,
which could be say just using ptr_type_node instead of the current types in
both of the build_int_cst (..., 0) calls above.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #8 from Andreas Krebbel  ---
Apparently, I decided to go with a MEM_REF already for the load variant of the
builtin - vec_xl. I've to check whether there was any reason not to do this
also for vec_xst.

Making it a pointer which aliases everything might be too big of a hammer I
guess?!

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #7 from Richard Biener  ---
The question is what the intrinsic constraints are on TBAA.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Jakub Jelinek  changed:

   What|Removed |Added

 CC||iii at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org,
   ||krebbel at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
(In reply to Andrew Pinski from comment #1)
> /* Build a vector type with the alignment of the target
>location in order to enable correct alignment hints to be
>generated for vst.  */
> tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]),
> TYPE_ALIGN (TREE_TYPE (TREE_TYPE
> ((*arglist)[2];
> return build2 (MODIFY_EXPR, mem_type,
>build1 (INDIRECT_REF, mem_type,
>fold_build_pointer_plus ((*arglist)[2],
> (*arglist)[1])),
>(*arglist)[0]);
> 
> 
> Does -fno-strict-aliasing help? I wonder if the above (which is
> S390_OVERLOADED_BUILTIN_s390_vec_xst from s390_expand_overloaded_builtin)
> should be have an may_alias variant .

Building MEM_REF is one option, another one is to cast the operand of
INDIRECT_REF to
build_pointer_type (mem_type, VOIDmode, true) type.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #5 from Andrew Pinski  ---
> I've bisected gcc, and issue first appears with gcc commit
> 32955416d8040b1fa1ba21cd4179b3264e6c5bd6.

This just improved DSE but I suspect there might be a way to reproduce it
before that.

Note the rs6000 backend works correctly with vec_sxt/vec_xl because it builds a
MEM_REF directly instead of an INDIRECT_REF.
```
/* Since arg1 may be cast to a different type, just use ptr_type_node
   here instead of trying to enforce TBAA on pointer types.  */
tree arg1_type = ptr_type_node;

...
/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
   take an offset, but since we've already incorporated the offset
   above, here we just pass in a zero.  */
gimple *g
  = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
  build_int_cst (arg1_type, 0)));

```
arg1_type is used for the aliasing set ...

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread aleksei.nikiforov at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #4 from Aleksei Nikiforov  
---
Yes, it doesn't reproduce on x86_64, and previously getting rid of -O3 or using
-fno-tree-dse also helped.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #3 from Andrew Pinski  ---
(In reply to Aleksei Nikiforov from comment #2)
> Yes, -fno-strict-aliasing helped.

Then the issue is in s390_expand_overloaded_builtin where it should be more
aliasing friendly. and it is a target issue ...

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread aleksei.nikiforov at linux dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

--- Comment #2 from Aleksei Nikiforov  
---
Yes, -fno-strict-aliasing helped.

[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later

2024-04-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
  Component|tree-optimization   |target
   Target Milestone|--- |12.4

--- Comment #1 from Andrew Pinski  ---
/* Build a vector type with the alignment of the target
   location in order to enable correct alignment hints to be
   generated for vst.  */
tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]),
TYPE_ALIGN (TREE_TYPE (TREE_TYPE
((*arglist)[2];
return build2 (MODIFY_EXPR, mem_type,
   build1 (INDIRECT_REF, mem_type,
   fold_build_pointer_plus ((*arglist)[2],
(*arglist)[1])),
   (*arglist)[0]);


Does -fno-strict-aliasing help? I wonder if the above (which is
S390_OVERLOADED_BUILTIN_s390_vec_xst from s390_expand_overloaded_builtin)
should be have an may_alias variant .