[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #17 from GCC Commits --- The master branch has been updated by Andreas Krebbel : https://gcc.gnu.org/g:42189f21b22c43ac8ab46edf5f6a7b4d99bc86a5 commit r14-10087-g42189f21b22c43ac8ab46edf5f6a7b4d99bc86a5 Author: Andreas Krebbel Date: Tue Apr 23 10:05:46 2024 +0200 s390x: Fix vec_xl/vec_xst type aliasing [PR114676] The requirements of the vec_xl/vec_xst intrinsincs wrt aliasing of the pointer argument are not really documented. As it turns out, users are likely to get it wrong. With this patch we let the pointer argument alias everything in order to make it more robust for users. gcc/ChangeLog: PR target/114676 * config/s390/s390-c.cc (s390_expand_overloaded_builtin): Use a MEM_REF with an addend of type ptr_type_node. gcc/testsuite/ChangeLog: PR target/114676 * gcc.target/s390/zvector/pr114676.c: New test. Suggested-by: Jakub Jelinek
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #16 from Andreas Krebbel --- (In reply to Aleksei Nikiforov from comment #15) > I think fixing compiled code should be possible. I'm not sure if this bug > should be just closed. In addition to fixing the PyTorch usage of the builtin, I also plan to change GCC to the "alias everything" approach now. Although the documentation does not strictly requires us to, it prevents other users from falling into the same trap and makes GCC to match what Clang already does. The documentation anyway discourages everyone from using these builtins. So it should not be a big deal, if we sacrifice a bit of performance to make it more robust.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #15 from Aleksei Nikiforov --- I think fixing compiled code should be possible. I'm not sure if this bug should be just closed.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #14 from Jakub Jelinek --- (In reply to Andreas Krebbel from comment #13) > We will go and fix PyTorch instead. Although it is not clearly documented, > the way PyTorch uses the builtin right now is probably not what was > intended. It is pretty clear that the element type pointer needs to alias > vectors of the same element type, but there is no saying about aliasing > everything. > > I'm just wondering how to improve the diagnostics in our backend to catch > this. The example below is similar to what PyTorch does today. Casting mem > to (float*) prevents our builtin code from complaining about the type > mismatch and by that opens the door for the much harder to debug TBAA > problem. We need a TBAA analyzer among sanitizers (but writing it is really hard). > #include > > void __attribute__((noinline)) foo (int *mem) > { > vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem); So use *(vector float __attribute__((__may_alias__)) *)mem = (vector float){ 1.0f, 2.0f, 3.0f, 4.0f }; instead? Sure, GCC extension, not an intrinsic in that case...
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #13 from Andreas Krebbel --- We will go and fix PyTorch instead. Although it is not clearly documented, the way PyTorch uses the builtin right now is probably not what was intended. It is pretty clear that the element type pointer needs to alias vectors of the same element type, but there is no saying about aliasing everything. I'm just wondering how to improve the diagnostics in our backend to catch this. The example below is similar to what PyTorch does today. Casting mem to (float*) prevents our builtin code from complaining about the type mismatch and by that opens the door for the much harder to debug TBAA problem. #include void __attribute__((noinline)) foo (int *mem) { vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem); } int main () { int m[4] = { 0 }; foo (m); if (m[3] == 0) __builtin_abort (); return 0; }
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 Sam James changed: What|Removed |Added Priority|P1 |P2 --- Comment #12 from Sam James --- P1->P2 after IRC discussion (we released with it & unclear what the intrinsic behaviour is even supposed to be wrt tbaa).
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 CC||law at gcc dot gnu.org Priority|P3 |P1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-04-12
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #11 from Andreas Krebbel --- The documentation of vec_xl and vec_xst doesn't seem to mention anything special with regard to that. So I understand the memory is only accessed through pointers which are compatible to the ones used when invoking the builtin. That particular usage within pytorch looks ok to me. I'm already testing a patch which matches what you are proposing. I hope to be able to reduce the testcase somewhat. Thanks for your help!
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #10 from Jakub Jelinek --- I admit I haven't studied what exactly pytorch does there.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #9 from Jakub Jelinek --- It depends on what the vec_xl*/vec_xst* documentation requires/user expect. If the expectation is that the loads/stores should alias the scalar pointee of the pointer type passed to those intrinsics, then --- gcc/config/s390/s390-c.cc.jj2024-01-03 11:51:54.847407588 +0100 +++ gcc/config/s390/s390-c.cc 2024-04-10 16:30:31.896197832 +0200 @@ -498,8 +498,8 @@ s390_expand_overloaded_builtin (location /* Build a vector type with the alignment of the source location in order to enable correct alignment hints to be generated for vl. */ - tree mem_type = build_aligned_type (return_type, - TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[1]; + unsigned align = TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[1]))); + tree mem_type = build_aligned_type (return_type, align); return build2 (MEM_REF, mem_type, fold_build_pointer_plus ((*arglist)[1], (*arglist)[0]), build_int_cst (TREE_TYPE ((*arglist)[1]), 0)); @@ -511,11 +511,13 @@ s390_expand_overloaded_builtin (location /* Build a vector type with the alignment of the target location in order to enable correct alignment hints to be generated for vst. */ - tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]), - TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[2]; + unsigned align = TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[2]))); + tree mem_type = build_aligned_type (TREE_TYPE ((*arglist)[0]), align); return build2 (MODIFY_EXPR, mem_type, - build1 (INDIRECT_REF, mem_type, - fold_build_pointer_plus ((*arglist)[2], (*arglist)[1])), + build2 (MEM_REF, mem_type, + fold_build_pointer_plus ((*arglist)[2], + (*arglist)[1]), + build_int_cst (TREE_TYPE ((*arglist)[2]), 0)), (*arglist)[0]); } case S390_OVERLOADED_BUILTIN_s390_vec_load_pair: might be all that is needed (if it is needed at all). If the expectation is that one can read what has been written by those intrinsics or write what will be read by those intrinsics though unrelated effective pointers, then it should alias everything, which could be say just using ptr_type_node instead of the current types in both of the build_int_cst (..., 0) calls above.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #8 from Andreas Krebbel --- Apparently, I decided to go with a MEM_REF already for the load variant of the builtin - vec_xl. I've to check whether there was any reason not to do this also for vec_xst. Making it a pointer which aliases everything might be too big of a hammer I guess?!
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #7 from Richard Biener --- The question is what the intrinsic constraints are on TBAA.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 Jakub Jelinek changed: What|Removed |Added CC||iii at gcc dot gnu.org, ||jakub at gcc dot gnu.org, ||krebbel at gcc dot gnu.org --- Comment #6 from Jakub Jelinek --- (In reply to Andrew Pinski from comment #1) > /* Build a vector type with the alignment of the target >location in order to enable correct alignment hints to be >generated for vst. */ > tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]), > TYPE_ALIGN (TREE_TYPE (TREE_TYPE > ((*arglist)[2]; > return build2 (MODIFY_EXPR, mem_type, >build1 (INDIRECT_REF, mem_type, >fold_build_pointer_plus ((*arglist)[2], > (*arglist)[1])), >(*arglist)[0]); > > > Does -fno-strict-aliasing help? I wonder if the above (which is > S390_OVERLOADED_BUILTIN_s390_vec_xst from s390_expand_overloaded_builtin) > should be have an may_alias variant . Building MEM_REF is one option, another one is to cast the operand of INDIRECT_REF to build_pointer_type (mem_type, VOIDmode, true) type.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #5 from Andrew Pinski --- > I've bisected gcc, and issue first appears with gcc commit > 32955416d8040b1fa1ba21cd4179b3264e6c5bd6. This just improved DSE but I suspect there might be a way to reproduce it before that. Note the rs6000 backend works correctly with vec_sxt/vec_xl because it builds a MEM_REF directly instead of an INDIRECT_REF. ``` /* Since arg1 may be cast to a different type, just use ptr_type_node here instead of trying to enforce TBAA on pointer types. */ tree arg1_type = ptr_type_node; ... /* Use the build2 helper to set up the mem_ref. The MEM_REF could also take an offset, but since we've already incorporated the offset above, here we just pass in a zero. */ gimple *g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr, build_int_cst (arg1_type, 0))); ``` arg1_type is used for the aliasing set ...
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #4 from Aleksei Nikiforov --- Yes, it doesn't reproduce on x86_64, and previously getting rid of -O3 or using -fno-tree-dse also helped.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #3 from Andrew Pinski --- (In reply to Aleksei Nikiforov from comment #2) > Yes, -fno-strict-aliasing helped. Then the issue is in s390_expand_overloaded_builtin where it should be more aliasing friendly. and it is a target issue ...
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #2 from Aleksei Nikiforov --- Yes, -fno-strict-aliasing helped.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 Andrew Pinski changed: What|Removed |Added Keywords||wrong-code Component|tree-optimization |target Target Milestone|--- |12.4 --- Comment #1 from Andrew Pinski --- /* Build a vector type with the alignment of the target location in order to enable correct alignment hints to be generated for vst. */ tree mem_type = build_aligned_type (TREE_TYPE((*arglist)[0]), TYPE_ALIGN (TREE_TYPE (TREE_TYPE ((*arglist)[2]; return build2 (MODIFY_EXPR, mem_type, build1 (INDIRECT_REF, mem_type, fold_build_pointer_plus ((*arglist)[2], (*arglist)[1])), (*arglist)[0]); Does -fno-strict-aliasing help? I wonder if the above (which is S390_OVERLOADED_BUILTIN_s390_vec_xst from s390_expand_overloaded_builtin) should be have an may_alias variant .