[Bug libstdc++/115420] Default constructor of unordered_map deleted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115420 Andreas Krebbel changed: What|Removed |Added Resolution|FIXED |INVALID
[Bug libstdc++/115420] Default constructor of unordered_map deleted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115420 Andreas Krebbel changed: What|Removed |Added Status|RESOLVED|CLOSED Resolution|INVALID |FIXED --- Comment #2 from Andreas Krebbel --- Oh right, didn't make the connection from the error message to the header file rearrangements. Thanks for the pointer!
[Bug c++/115420] New: Default constructor of unordered_map deleted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115420 Bug ID: 115420 Summary: Default constructor of unordered_map deleted Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- t.cc: #include #include class Foo { std::unordered_map bar; Foo() : bar() {} }; g++ -std=c++11 -c t.cc fails with: t.cc: In constructor ‘Foo::Foo()’: t.cc:6:11: error: use of deleted function ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map() [with _Key = std::__cxx11::basic_string; _Tp = long unsigned int; _Hash = std::hash >; _Pred = std::equal_to >; _Alloc = std::allocator, long unsigned int> >]’ 6 | Foo() : bar() {} | ^ In file included from /home2/andreas/bisect/gcc-02dab998665-install/include/c++/13.0.0/unordered_map:41, from t.cc:1: /home2/andreas/bisect/gcc-02dab998665-install/include/c++/13.0.0/bits/unordered_map.h:146:7: note: ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map() [with _Key = std::__cxx11::basic_string; _Tp = long unsigned int; _Hash = std::hash >; _Pred = std::equal_to >; _Alloc = std::allocator, long unsigned int> >]’ is implicitly deleted because the default definition would be ill-formed: 146 | unordered_map() = default; | ^ starting with: commit 227351345d0caa596eff8325144f15b15f704c08 Author: Jonathan Wakely Date: Thu Jan 12 13:03:01 2023 + libstdc++: Do not include in concurrency headers The , , and headers use std::errc constants, but don't use std::system_error itself. They only use the __throw_system_error(int) function, which is defined in . By including the header for the errc constants instead of the whole of we avoid depending on the whole std::string definition. libstdc++-v3/ChangeLog: * include/bits/std_mutex.h: Remove include. * include/std/condition_variable: Add include. * include/std/mutex: Likewise. * include/std/shared_mutex: Likewise. 
The testcase works fine with gcc 12 and clang. There are similar BZs on that topic but these do not seem to fit exactly, as I understand it (e.g. 107022)
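The follow-up comment attributes the error to the header rearrangements, which suggests that the testcase relied on a transitive include. A minimal sketch of a self-sufficient variant, assuming the missing piece is `<string>` (which supplies `std::string` and `std::hash<std::string>`); the `size()` and `add()` members are hypothetical additions for illustration and are not part of the original t.cc:

```cpp
#include <cassert>
#include <cstddef>
#include <string>         // assumption: this is the include the testcase lacked
#include <unordered_map>

class Foo {
public:
  // Default-constructing the map needs a complete std::hash<std::string>.
  Foo() : bar() {}

  // Hypothetical helpers, only here so the sketch can be exercised.
  std::size_t size() const { return bar.size(); }
  void add(const std::string& key, unsigned long value) { bar[key] = value; }

private:
  std::unordered_map<std::string, unsigned long> bar;
};
```

With the explicit include the class no longer depends on which other headers happen to pull in the string definitions.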
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #16 from Andreas Krebbel --- (In reply to Aleksei Nikiforov from comment #15) > I think fixing compiled code should be possible. I'm not sure if this bug > should be just closed. In addition to fixing the PyTorch usage of the builtin, I also plan to change GCC to the "alias everything" approach now. Although the documentation does not strictly require us to, it prevents other users from falling into the same trap and makes GCC match what Clang already does. The documentation discourages everyone from using these builtins anyway, so it should not be a big deal if we sacrifice a bit of performance to make it more robust.
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #13 from Andreas Krebbel --- We will go and fix PyTorch instead. Although it is not clearly documented, the way PyTorch uses the builtin right now is probably not what was intended. It is pretty clear that the element type pointer needs to alias vectors of the same element type, but nothing is said about aliasing everything. I'm just wondering how to improve the diagnostics in our backend to catch this. The example below is similar to what PyTorch does today. Casting mem to (float*) prevents our builtin code from complaining about the type mismatch and thereby opens the door for the much harder to debug TBAA problem. #include void __attribute__((noinline)) foo (int *mem) { vec_xst ((vector float){ 1.0f, 2.0f, 3.0f, 4.0f }, 0, (float*)mem); } int main () { int m[4] = { 0 }; foo (m); if (m[3] == 0) __builtin_abort (); return 0; }
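For comparison, a portable sketch of a store that does not rely on a pointer cast. It is not the vec_xst builtin and not the planned PyTorch change; it only shows that a byte-wise copy through `std::memcpy` sidesteps the type-based aliasing question raised above. The helper name `store_floats` is hypothetical, and the sketch assumes `int` and `float` have the same size:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: write four floats into an int buffer byte-wise.
// std::memcpy may alias any object type, so the compiler cannot assume the
// int array is left untouched by this store.
inline void store_floats(int* mem, const float (&values)[4]) {
  static_assert(sizeof(int) == sizeof(float),
                "sketch assumes int and float have the same size");
  std::memcpy(mem, values, sizeof values);
}
```

A caller that later reads the buffer as `int` then observes the stored bit patterns instead of a stale zero.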
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #11 from Andreas Krebbel --- The documentation of vec_xl and vec_xst doesn't seem to mention anything special with regard to that. So I understand the memory is only accessed through pointers which are compatible with the ones used when invoking the builtin. That particular usage within PyTorch looks ok to me. I'm already testing a patch which matches what you are proposing. I hope to be able to reduce the testcase somewhat. Thanks for your help!
[Bug target/114676] [12/13/14 Regression] DSE removes assignment that is used later
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 --- Comment #8 from Andreas Krebbel --- Apparently, I decided to go with a MEM_REF already for the load variant of the builtin - vec_xl. I have to check whether there was any reason not to do this for vec_xst as well. Making it a pointer which aliases everything might be too big of a hammer I guess?!
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #33 from Andreas Krebbel --- (In reply to Andrew Pinski from comment #26) ... > I suspect if we change the s390 backend just slightly to set the cost when > there is an index to the address to 1 for the MEM, combine won't be acting > up here. > Basically putting in sync the 2 cost methods. I've tried that but this didn't change anything. As you have expected the problem goes away when letting s390_address_cost always return 0.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #32 from Andreas Krebbel --- (In reply to Segher Boessenkool from comment #25) > So this testcase compiles on powerpc64-linux (-O2) in about 34s. Is s390x > way worse, or is this in line with what you are seeing? Way worse. See #c22: 20s before your commit and 5min with it.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #23 from Andreas Krebbel --- Created attachment 57646 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57646&action=edit Testcase for comment #22
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #22 from Andreas Krebbel --- I did a git bisect which ended up pointing at this commit, somewhere between GCC 8 and 9: commit c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306 (bad) Author: Segher Boessenkool Date: Mon Jul 30 15:18:17 2018 +0200 combine: Allow combining two insns to two insns This patch allows combine to combine two insns into two. This helps in many cases, by reducing instruction path length, and also allowing further combinations to happen. PR85160 is a typical example of code that it can improve. This patch does not allow such combinations if either of the original instructions was a simple move instruction. In those cases combining the two instructions increases register pressure without improving the code. With this move test register pressure does no longer increase noticeably as far as I can tell. (At first I also didn't allow either of the resulting insns to be a move instruction. But that is actually a very good thing to have, as should have been obvious). With this command line: cc1plus -O2 -march=z196 -fpreprocessed Q111-8.ii -quiet before: 20s compile-time and 21846 total combine attempts after: > 5min compile-time and 43175686 total combine attempts
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #21 from Andreas Krebbel --- (In reply to Segher Boessenkool from comment #16) ... > When some insns have changed (or might have changed, combine does not always > know > the details), combinations of the insn with later insns are tried again. > Sometimes > this finds new combination opportunities. > > Not retrying combinations after one of the insns has changed would be a > regression. Wouldn't it in this particular case be possible to recognize already in try_combine that separating the move out of the parallel cannot lead to additional optimization opportunities? To me it looks like we are just recreating the situation we had before merging the INSNs into a parallel. Is there a situation where this could lead to any improvement in the end?
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #20 from Andreas Krebbel --- (In reply to Segher Boessenkool from comment #17) ... > So what is really happening? And, when did this start, anyway, because > apparently at some point in time all was fine? Due to the C++ constructs used the testcase doesn't compile with much older GCCs. However, I can confirm that the problem can already be reproduced with GCC 11.1.0.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #19 from Andreas Krebbel --- (In reply to Sarah Julia Kriesch from comment #15) > (In reply to Segher Boessenkool from comment #13) > > (In reply to Sarah Julia Kriesch from comment #12) > > A bigger case of what? What do you mean? > Not only one software package is affected by this bug. "Most" software > builds are affected. As Andreas mentioned correctly, the fix is also > beneficial for other projects/target software. I don't think we have any evidence yet that this is the problem which also hits us with other package builds. If you have other cases, please open separate BZs for them and we will try to figure out whether they are actually DUPs of this one. With "targets" I meant other GCC build targets. This pattern doesn't look s390x-specific to me, although I haven't tried to reproduce it anywhere else.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #14 from Andreas Krebbel --- If my analysis from comment #1 is correct, combine does superfluous steps here. Getting rid of this should not cause any harm, but should be beneficial for other targets as well. I agree that the patch I've proposed is kind of a hack. Do you think this could be turned into a proper fix?
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 --- Comment #10 from Andreas Krebbel --- Created attachment 57599 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57599&action=edit Testcase - somewhat reduced from libecpint Verified with rev 146f16c97f6 cc1plus -O2 t.cc try_combine invocations: x86: 3 27262 27603 s390x: 8 40439657 40440339
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 Andreas Krebbel changed: What|Removed |Added CC||stefansf at linux dot ibm.com --- Comment #4 from Andreas Krebbel --- Hi Segher, any guidance on how to proceed with that? This recently was brought up by distro people again because it is causing actual problems in their build setups.
[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986 Andreas Krebbel changed: What|Removed |Added CC||shinwogud12 at gmail dot com --- Comment #7 from Andreas Krebbel --- *** Bug 112665 has been marked as a duplicate of this bug. ***
[Bug target/112665] I am getting incorrect output values at optimization level 2 in GCC for the s390x architecture.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112665 Andreas Krebbel changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE CC||krebbel at gcc dot gnu.org --- Comment #2 from Andreas Krebbel --- I can confirm this when running the program with qemu but not on real hardware. The code is also using the chrl instruction so I guess this is another instance of PR112986 *** This bug has been marked as a duplicate of bug 112986 ***
[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986 Andreas Krebbel changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #6 from Andreas Krebbel --- No problem. Thanks for testing s390x! I've requested the qemu fix to be included into Ubuntu 22.04. Closing the BZ now.
[Bug target/112996] Improperly evaluated value of the s390x conditional expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112996 Andreas Krebbel changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||iii at linux dot ibm.com Resolution|--- |DUPLICATE --- Comment #2 from Andreas Krebbel --- Same as with PR112986. Confirmed with qemu. Runs fine on real hardware. *** This bug has been marked as a duplicate of bug 112986 ***
[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986 --- Comment #3 from Andreas Krebbel --- *** Bug 112996 has been marked as a duplicate of this bug. ***
[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986 Andreas Krebbel changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2023-12-13 CC||iii at linux dot ibm.com --- Comment #2 from Andreas Krebbel --- I can confirm the failure when running the binaries with qemu. However, the binaries run as expected on real hardware. So it might rather be a qemu issue. @Ilya: Ubuntu 22.04 is using qemu 6.2.0. Is this perhaps something you have fixed already?
[Bug pch/112319] New: segfault with pch and #pragma GCC diagnostic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112319 Bug ID: 112319 Summary: segfault with pch and #pragma GCC diagnostic Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: pch Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- touch s.h touch u.h main.cpp: #include "s.h" #include "u.h" g++ s.h g++ -c main.cpp --save-temps In file included from main.cpp:2: u.h:1:9: internal compiler error: Segmentation fault 1 | #pragma GCC diagnostic | ^~~ 0x1764fbf crash_signal /home/andreas/build/../gcc/gcc/toplev.cc:314 0x10201cd maybe_read_tokens_for_pragma_lex /home/andreas/build/../gcc/gcc/cp/parser.cc:49713 0x10201cd pragma_lex(tree_node**, unsigned int*) /home/andreas/build/../gcc/gcc/cp/parser.cc:49735 0x11a962c pragma_diagnostic_lex /home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:851 0x11a9d78 handle_pragma_diagnostic_impl /home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:879 0x11a9d78 handle_pragma_diagnostic_early_pp /home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:1039 0x11abdbd c_pp_invoke_early_pragma_handler(unsigned int) /home/andreas/build/../gcc/gcc/c-family/c-pragma.cc:1769 0x11a8180 token_streamer::stream(cpp_reader*, cpp_token const*, unsigned int) /home/andreas/build/../gcc/gcc/c-family/c-ppoutput.cc:293 0x11a8461 scan_translation_unit /home/andreas/build/../gcc/gcc/c-family/c-ppoutput.cc:351 0x11a8461 preprocess_file(cpp_reader*) /home/andreas/build/../gcc/gcc/c-family/c-ppoutput.cc:106 0x11a693d c_common_init() /home/andreas/build/../gcc/gcc/c-family/c-opts.cc:1236 0xfa35be cxx_init() /home/andreas/build/../gcc/gcc/cp/lex.cc:338 0xe94153 lang_dependent_init /home/andreas/build/../gcc/gcc/toplev.cc:1816 0xe94153 do_compile /home/andreas/build/../gcc/gcc/toplev.cc:2111 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. 
Can be reproduced on x86 and s390x since: e664ea960a200aac88ffc3c7fb9fe55ea4df2011 is the first bad commit commit e664ea960a200aac88ffc3c7fb9fe55ea4df2011 Author: Lewis Hyatt Date: Fri Jun 30 18:23:24 2023 -0400 c-family: Implement pragma_lex () for preprocess-only mode
[Bug tree-optimization/111039] New: Unable to coalesce ssa_names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111039 Bug ID: 111039 Summary: Unable to coalesce ssa_names Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- compiler_corruption_function(flags) { int nowait = flags & 1048576, isexpand = flags & 8388608; abcd(); _setjmp(flags); if (nowait && isexpand) flags &= 0; abcde(); } gcc -mbranch-cost=0 -O t.c verified with recent GCC: 02ecc9a2632 t.c: In function ‘compiler_corruption_function’: t.c:1:1: internal compiler error: SSA corruption 1 | compiler_corruption_function(flags) { | ^~~~ 0x15a657c fail_abnormal_edge_coalesce /home/andreas/build/../gcc/gcc/tree-ssa-coalesce.cc:1003 0x15a657c coalesce_partitions /home/andreas/build/../gcc/gcc/tree-ssa-coalesce.cc:1425 0x15a657c coalesce_ssa_name(_var_map*) /home/andreas/build/../gcc/gcc/tree-ssa-coalesce.cc:1755 0x153d6cf remove_ssa_form /home/andreas/build/../gcc/gcc/tree-outof-ssa.cc:1065 0x153d6cf rewrite_out_of_ssa(ssaexpand*) /home/andreas/build/../gcc/gcc/tree-outof-ssa.cc:1323 0xf42073 execute /home/andreas/build/../gcc/gcc/cfgexpand.cc:6610 This very much looks like another instance of PR71020.
[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 Andreas Krebbel changed: What|Removed |Added Version|13.0|12.2.1 --- Comment #16 from Andreas Krebbel --- The testcase fails on GCC 12.2.1 as well. Should we apply it there as well after giving it some time in mainline?
[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 Andreas Krebbel changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #15 from Andreas Krebbel --- Your patch fixes the problem for me. Thanks for the quick fix!
[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 --- Comment #7 from Andreas Krebbel --- (In reply to Andrew Pinski from comment #6) > (In reply to Andreas Krebbel from comment #5) > > In: > > > > _1 = src_6(D)->a; > > dst$val_9 = _1; > > _2 = BIT_FIELD_REF ; > > _3 = _2 & 64; > > if (_3 != 0) > > There is only 2 accesses going on in the above IR because SRA removed the > 3rd when it replaced the access of dst.val with dst$val but didn't update > BIT_FIELD_REF to remove the byteswap ... Ok, got it. It isn't the removal of the assignment. As you say it happens in early SRA when changing dst.val to dst$val and with that going from the union with the storage order marker to a long int without it. The marker on the BIT_FIELD_REF needs to be in sync with the marker on its inner reference. Dropping one without adjusting the other is the problem here. Thanks for the pointer! The following change helps with that testcase: diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 8dfc923ed7e..6b1ce6e8b4a 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -3815,8 +3815,13 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) } } else - *expr = repl; - sra_stats.exprs++; + { + if (bfr && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (*expr))) + REF_REVERSE_STORAGE_ORDER (bfr) = 0; + + *expr = repl; + sra_stats.exprs++; + } } else if (write && access->grp_to_be_debug_replaced) {
[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 --- Comment #5 from Andreas Krebbel --- In: _1 = src_6(D)->a; dst$val_9 = _1; _2 = BIT_FIELD_REF ; _3 = _2 & 64; if (_3 != 0) src, dst and the BIT_FIELD_REF carry storage order flags which result in either bswaps being emitted or, in case of the bitfield, the constant for the compare to be adjusted. So from reading "src" to evaluating "_2" 3 "bswaps" will be applied. After dropping the assignment to dst only two remain which cancel each other out. So in the end we access the value without any adjustments. Just to check I did: diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index b36dd97802b..b858194a432 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -1820,6 +1820,7 @@ handle_scalar_storage_order_attribute (tree *node, tree name, tree args, } TYPE_REVERSE_STORAGE_ORDER (type) = reverse; + TYPE_VOLATILE (type) = reverse; return NULL_TREE; } As expected this "fixes" the problem but is probably too big of a hammer here since it basically voids many of the advantages of the attribute which is folding away many of the bswaps.
[Bug tree-optimization/108199] Bitfields and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 --- Comment #3 from Andreas Krebbel --- Moving the local definition of dst out of the function to global scope prevents the store from getting eliminated. union DST dst; As expected the store is still in the FRE dump: _1 = src_6(D)->a; dst.val = _1; <--- _2 = BIT_FIELD_REF ; _3 = _2 & 64; if (_3 != 0) ... and the first byte is accessed: bar: movq (%rdi), %rax movq %rax, dst(%rip) testb $64, %al jne .L11
[Bug tree-optimization/108199] Bitfields and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 Andreas Krebbel changed: What|Removed |Added Target||x86_64 Build||x86_64 Keywords||wrong-code Host||x86_64 --- Comment #2 from Andreas Krebbel --- The testcase does an assignment between two structs whose endianness differs from the host endianness (assumed to be little). Here the required byteswaps are supposed to cancel each other out. After that a bitfield comparison on the target struct is done. This comparison uses the wrong byte offset into the bitfield: testb $64, 7(%rdi) jne .L11 On a big-endian target the first bits in the bitfield are supposed to reside in the first bytes in memory. The problem appears to get introduced when dead store elimination removes the assignment to the target struct in FRE. Before FRE we have the following: _1 = src_6(D)->a; bswap dst$val_9 = _1; bswap _2 = BIT_FIELD_REF ; bswap _3 = _2 & 64; if (_3 != 0) ... This would result in 3 bswaps chained to each other. However, after FRE we have only two because the dead store to dst$val is removed. _1 = src_6(D)->a; _2 = BIT_FIELD_REF <_1, 8, 0>; _3 = _2 & 64; if (_3 != 0) Now we have only two bswaps, which cancel each other out. Looks like we have to prevent dependent stores/loads with different endianness from getting removed - perhaps by also making them volatile? I think we have to keep the number of memory accesses with foreign endianness constant across the optimizations.
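The parity argument in this comment can be sketched in portable C++: an even number of chained byte swaps is the identity, an odd number equals a single swap, so dropping one swap out of three changes which byte a test such as `& 64` looks at. The functions `bswap64` and `apply_swaps` are hypothetical stand-ins for illustration, not GCC internals:

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 64-bit value (portable stand-in for a bswap).
constexpr std::uint64_t bswap64(std::uint64_t x) {
  return ((x & 0x00000000000000FFull) << 56) |
         ((x & 0x000000000000FF00ull) << 40) |
         ((x & 0x0000000000FF0000ull) << 24) |
         ((x & 0x00000000FF000000ull) << 8)  |
         ((x & 0x000000FF00000000ull) >> 8)  |
         ((x & 0x0000FF0000000000ull) >> 24) |
         ((x & 0x00FF000000000000ull) >> 40) |
         ((x & 0xFF00000000000000ull) >> 56);
}

// Hypothetical helper: apply `n` chained byte swaps to `x`.
constexpr std::uint64_t apply_swaps(std::uint64_t x, unsigned n) {
  return n == 0 ? x : apply_swaps(bswap64(x), n - 1);
}

// Two swaps cancel out; three swaps behave like a single swap.
static_assert(apply_swaps(0x0102030405060708ull, 2) == 0x0102030405060708ull,
              "an even number of swaps is the identity");
static_assert(apply_swaps(0x0102030405060708ull, 3) == 0x0807060504030201ull,
              "an odd number of swaps equals one swap");
```

For a value with only bit 6 of the lowest byte set, the `& 64` test succeeds after two swaps and fails after three, which mirrors the wrong-byte comparison described above.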
[Bug tree-optimization/108199] Bitfields and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 --- Comment #1 from Andreas Krebbel --- Created attachment 54150 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54150&action=edit Testcase
[Bug tree-optimization/108199] New: Bitfields and storage_order_attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199 Bug ID: 108199 Summary: Bitfields and storage_order_attribute Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: ---
[Bug c++/107632] New: has_facet does not work with -mlong-double-64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107632 Bug ID: 107632 Summary: has_facet does not work with -mlong-double-64 Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- #include #include using namespace std; int main(int argc, char *argv[]) { locale oGlobalLocale; if (!has_facet< num_get > > >( oGlobalLocale )) __builtin_abort (); } g++ t.cpp -o t && ./t -> works as expected g++ t.cpp -o t -mlong-double-64 && ./t -> aborts
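The template arguments of the testcase were lost in this archive, so the following is only a sketch of the kind of check it performs, assuming the facet in question is `std::num_get<char>` with its default iterator type. The helper name `has_default_num_get` is hypothetical:

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper: reports whether the locale carries the standard
// numeric-parsing facet for char. Every locale is required to contain it,
// so a conforming library build is expected to return true here.
inline bool has_default_num_get(const std::locale& loc) {
  return std::has_facet<std::num_get<char>>(loc);
}
```

Calling it with a default-constructed `std::locale` corresponds to the `oGlobalLocale` check in the report.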
[Bug tree-optimization/107372] Loop distribution create memcpy between structs with different storage order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107372 --- Comment #1 from Andreas Krebbel --- Created attachment 53764 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53764&action=edit Experimental Fix Looks like the error while analyzing the data ref is not propagated to the upper layers to actually prevent the optimization. This patch fixes this for me.
[Bug tree-optimization/107372] New: Loop distribution create memcpy between structs with different storage order
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107372 Bug ID: 107372 Summary: Loop distribution create memcpy between structs with different storage order Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- For t.c with "gcc -O3 t.c": struct L { unsigned int val[256]; } __attribute__((scalar_storage_order ("little-endian"))); struct B { unsigned int val[256]; } __attribute__((scalar_storage_order ("big-endian"))); void foo (struct L *restrict l, struct B *restrict b) { int i; for (i = 0; i < 256; i++) l->val[i] = b->val[i]; } The loop distribution pass currently generates a memcpy although it recognizes correctly that both sides of the assignment have different storage order: Analyzing # of iterations of loop 1 exit condition [255, + , 4294967295] != 0 bounds on difference of bases: -255 ... -255 result: # of iterations 255, bounded by 255 Creating dr for *b_5(D).val[i_11] analyze_innermost: t.c:16:23: missed: failed: reverse storage order. ... void foo (struct L * restrict l, struct B * restrict b) { int i; [local count: 10737416]: __builtin_memcpy (l_6(D), b_5(D), 1024); return; }
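Why a plain memcpy is wrong here can be shown without the scalar_storage_order attribute. The sketch below models the two structs as raw byte buffers with explicit big-endian and little-endian encodings; all helper names are hypothetical. An element-wise copy converts each value, while a raw byte copy leaves every element byte-reversed from the destination's point of view:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Encode/decode a 32-bit value with an explicit byte order.
inline void store_be32(unsigned char* p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >> 8);
  p[3] = static_cast<unsigned char>(v);
}
inline std::uint32_t load_be32(const unsigned char* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}
inline void store_le32(unsigned char* p, std::uint32_t v) {
  p[0] = static_cast<unsigned char>(v);
  p[1] = static_cast<unsigned char>(v >> 8);
  p[2] = static_cast<unsigned char>(v >> 16);
  p[3] = static_cast<unsigned char>(v >> 24);
}
inline std::uint32_t load_le32(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Element-wise copy from a big-endian buffer to a little-endian buffer:
// the analogue of the loop in foo(), which must not become a memcpy.
inline void copy_be_to_le(unsigned char* dst_le, const unsigned char* src_be,
                          std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    store_le32(dst_le + 4 * i, load_be32(src_be + 4 * i));
}
```

Reading a memcpy'd buffer with `load_le32` yields the byte-reversed value, which is the miscompilation the report describes.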
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101 --- Comment #22 from Andreas Krebbel --- The longer I have been looking at these STRICT_LOW_PART issues, the more I think that STRICT_LOW_PART is an awful way to express what we need:
- The information needed to understand what it is doing is distributed across 3 RTXs (strict_low_part (subreg:mode1 (reg:mode2 xx) OFS)).
- The big problems arise since the involved RTXs are separately optimized and we might end up with partial information without a clear definition of how to deal with that.
- It is really hard to handle the RTXs as one unit. Recursively walking RTXs needs to record whether we are in a STRICT_LOW_PART or not.
I think it might make sense to explore other ways to express this:
1. SUBREG flag - Looks easy, but it would be hard to catch all places which should care about that flag.
2. Introduce a new RTX code which has a mode and an offset attached but does not require an additional SUBREG anymore.
3. Since a STRICT_LOW_PART is essentially a bit insertion operation, we could always express it with a ZERO_EXTRACT target operand and get rid of STRICT_LOW_PART entirely. A ZERO_EXTRACT would be somewhat more cumbersome to deal with, since it would always require checking the bit width and offset for all the cases which just use mode boundaries. But at least most passes know how to deal with them already.
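The "bit insertion" reading in option 3 can be sketched on plain integers. This models only the value semantics, not GCC's RTL: `insert_bits` and `set_low_part` are hypothetical names, bit positions are counted from the least significant bit (an assumption; the RTL numbering is target-dependent), and `pos` is assumed to be below 64:

```cpp
#include <cassert>
#include <cstdint>

// Mask with the lowest `width` bits set (a width of 64 or more means all bits).
constexpr std::uint64_t low_mask(unsigned width) {
  return width >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << width) - 1);
}

// Insert the low `width` bits of `src` into `dest` at bit offset `pos`,
// leaving every other bit of `dest` unchanged.
constexpr std::uint64_t insert_bits(std::uint64_t dest, std::uint64_t src,
                                    unsigned width, unsigned pos) {
  return (dest & ~(low_mask(width) << pos)) | ((src & low_mask(width)) << pos);
}

// A strict-low-part style store is the special case with offset 0.
constexpr std::uint64_t set_low_part(std::uint64_t dest, std::uint64_t src,
                                     unsigned width) {
  return insert_bits(dest, src, width, 0);
}

static_assert(set_low_part(0xAAAAAAAABBBBCCCCull, 0x1234ull, 16) ==
                  0xAAAAAAAABBBB1234ull,
              "only the low 16 bits are replaced");
```

The explicit width and offset are exactly the extra operands the comment says a ZERO_EXTRACT would force every user to check.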
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101 --- Comment #21 from Andreas Krebbel --- I have committed a patch now which accepts only SUBREGs before reload and then also REGs to deal with how LRA operates right now. I've continued a bit with the patch from Comment 18. It bootstraps on s390x and x86-64. On s390x the testsuite is also clean. However, I see a few failures in the arch-specific tests on x86-64. The cases I looked at so far are the result of several peepholes and splitters not being triggered anymore. I've fixed most of them, I think, but there are also cases where I'm not sure what to do exactly. One such case is a matching constraint between a strict_low_part operand and a normal operand: reload now (with the patch from Comment 18) removes the subreg on the operand with the matching constraint and leaves it in for the strict_low_part operand. (insn 9 8 16 2 (parallel [ (set (strict_low_part (subreg:QI (reg/v:SI 0 ax [orig:86 a ] [86]) 0)) (and:QI (reg:QI 0 ax [orig:86 a ] [86]) (reg:SI 4 si [92]))) (clobber (reg:CC 17 flags)) ]) "/home/andreas/gcc/gcc/testsuite/gcc.target/i386/pr91188-1a.c":20:10 553 {*andqi_1_slp} (nil)) I think this should be addressed separately. Once we have solved it I will adjust the s390x backend again if necessary.
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101 --- Comment #18 from Andreas Krebbel --- (In reply to Segher Boessenkool from comment #17) ... > Yes, but that says the high 48 bits of the hardware reg are untouched, which > is not true (only the high 16 of the low 32 are guaranteed unmodified). Right, if the original register mode does not match the mode of the full hardreg, we continue to need that mode as the upper bound. So with the subreg folding in reload we appear to lose information we need to interpret the STRICT_LOW_PART correctly. I'm testing the following patch in combination with my other fix now: diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc index 4ddbe477d92..9c125a9ce38 100644 --- a/gcc/lra-spills.cc +++ b/gcc/lra-spills.cc @@ -855,6 +855,7 @@ lra_final_code_change (void) for (i = id->insn_static_data->n_operands - 1; i >= 0; i--) if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator) + && ! static_id->operand[i].strict_low && alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn))) { lra_update_dup (id, i); With that change the SUBREG folding from comment #11 happens later in final (cleanup_subreg_operands). I'm not sure whether we would have to prevent it there as well?!
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101 --- Comment #16 from Andreas Krebbel --- (In reply to Segher Boessenkool from comment #15) > (In reply to Andreas Krebbel from comment #14) > > > So you are suggesting that every strict_low_part after reload can just be > > > removed? If that is true, should we not just do exactly that then? > > > > I think we have 3 options: > > (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs. > > (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG > > (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. ... > > (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode > > for UNITS_PER_WORD) which are not covered by MODE are not touched by the > > SET. > > But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware > is 64-bit. That only means the low 32 bits of the reg aren't clobbered, the > high 32 bits are fair game. That does not agree with your proposed > semantics. In that case I would have expected reload to turn this into (strict_low_part (reg:HI xx)) already.
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101 --- Comment #14 from Andreas Krebbel --- (In reply to Segher Boessenkool from comment #13) > (Sorry I missed this) > > (In reply to Andreas Krebbel from comment #11) > > I've tried to change our movstrict backend patterns to use a predicate on > > the dest operand which enforces a subreg. However, since reload strips the > > subreg away when assigning hard regs we end up with a STRICT_LOW_PART of a > > reg again. At least after reload something like this should be acceptable - > > right? > > > > 298r.ira: > > (insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4)) > > (const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi} > > (nil)) > > > > 299r.reload: > > (insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64])) > > (mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S4 A32])) > > "t.cc":37:17 1485 {movstrictsi} > > (nil)) > > So you are suggesting that every strict_low_part after reload can just be > removed? If that is true, should we not just do exactly that then? I think we have 3 options: (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs. (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. (1) For that, all passes after reload must be able to deal with these SUBREGs. Since SUBREGs are rare after reload it is hard to say how robust that handling is right now. (2) Here the question to me is which passes after reload currently do something with the strict-low-part info. Clearly a non-option if we would lose any optimizations with that. (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode for UNITS_PER_WORD) which are not covered by MODE are not touched by the SET.
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101 --- Comment #11 from Andreas Krebbel --- I've tried to change our movstrict backend patterns to use a predicate on the dest operand which enforces a subreg. However, since reload strips the subreg away when assigning hard regs we end up with a STRICT_LOW_PART of a reg again. At least after reload something like this should be acceptable - right? 298r.ira: (insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4)) (const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi} (nil)) 299r.reload: (insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64])) (mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S4 A32])) "t.cc":37:17 1485 {movstrictsi} (nil))
[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101
--- Comment #10 from Andreas Krebbel ---
We generate the movstrict target operand with gen_lowpart. If the operand for gen_lowpart is already a paradoxical subreg the two subregs cancel each other out and we end up with a plain reg.

I'm testing the following patch right now. It falls back to a normal move in that case and fixes the testcase:

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5aaf76a9490..d90ec1a6de1 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -6523,6 +6523,14 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
       rtx low_dest = gen_lowpart (smode, dest);
       rtx low_src = gen_lowpart (smode, src);
 
+      /* In case two subregs cancelled each other out, do a normal
+         move.  */
+      if (!SUBREG_P (low_dest))
+        {
+          emit_move_insn (low_dest, low_src);
+          return true;
+        }
+
       switch (smode)
         {
         case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src)); return true;
[Bug tree-optimization/105175] [12 Regression] Pointless warning about missed vector optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105175
--- Comment #2 from Andreas Krebbel ---
I would expect the vectorizer to only generate vector modes which would fit into word mode if no hardware vector support is available. E.g. for:

struct { unsigned a, b, c, d; } s;

foo()
{
  s.a &= 42;
  s.b &= 42;
  s.c &= 42;
  s.d &= 42;
}

I see two "vector 2 unsigned" operations being generated when compiling with -mno-sse but with sse I get a 4 element vector as expected.
[Bug rtl-optimization/105175] New: [12 Regression] Pointless warning about missed vector optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105175 Bug ID: 105175 Summary: [12 Regression] Pointless warning about missed vector optimization Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- For this code snippet extracted from Qemu source: enum { QEMU_MIGRATION_COOKIE_PERSISTENT = 1 }; struct { unsigned flags; unsigned flagsMandatory } qemuMigrationCookieGetPersistent_mig; qemuMigrationCookieGetPersistent() { qemuMigrationCookieGetPersistent_mig.flags &= QEMU_MIGRATION_COOKIE_PERSISTENT; qemuMigrationCookieGetPersistent_mig.flagsMandatory &= QEMU_MIGRATION_COOKIE_PERSISTENT; } cc1 -O3 -mno-sse t.c -Wvector-operation-performance gives me: t.c: In function ‘qemuMigrationCookieGetPersistent’: t.c:7:46: warning: vector operation will be expanded with a single scalar operation [-Wvector-operation-performance] 7 | qemuMigrationCookieGetPersistent_mig.flags &= The generated code actually looks quite decent. Both integer AND operations are merged into a 64 bit AND since https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=f31da42e047e8018ca6ad9809273bc7efb6ffcaf This appears to be a nice optimization to me. However, in tree-vect-generic.cc we then complain about this being implemented with just a scalar instruction. Apart from this being pretty confusing for the programmer who never requested anything to be vectorized I also don't see why it is a bad thing to implement a vector operation with a scalar operation as long as it is able to cover the entire vector with that. With GCC 12 we have auto-vectorization enabled already with -O2, so I expect this warning to surface much more frequently now. In particular on targets like s390 where older distros still have to build everything without hardware vector support this might be annoying. Also I'm not sure whether this warning ever points at an actual problem. 
To me it looks like we should just drop it altogether.
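The merged 64-bit AND described in this report can be checked by hand. The following is a minimal sketch in C under stated assumptions: the struct and function names are hypothetical, the two members are assumed to be adjacent 32-bit values without padding, and the code does not reflect how the vectorizer itself represents the operation. Because the same mask is replicated into both halves, the result is independent of byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two adjacent flag words in the report.  */
struct flags_pair
{
  uint32_t flags;
  uint32_t flags_mandatory;
};

/* Two separate 32-bit ANDs, as written in the source.  */
static void
and_scalar (struct flags_pair *p, uint32_t mask)
{
  p->flags &= mask;
  p->flags_mandatory &= mask;
}

/* One 64-bit AND covering both members.  memcpy is used so the example
   does not depend on type punning through pointers.  */
static void
and_merged (struct flags_pair *p, uint32_t mask)
{
  uint64_t wide;
  uint64_t wide_mask = ((uint64_t) mask << 32) | mask;

  memcpy (&wide, p, sizeof wide);
  wide &= wide_mask;
  memcpy (p, &wide, sizeof wide);
}
```

Both functions leave the struct in the same state, which is why a single scalar instruction covering the whole "vector" is not a performance problem in this case.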
[Bug target/104327] [12 Regression] Inlining error on s390x since r12-1039
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104327 --- Comment #8 from Andreas Krebbel --- I will work on a patch. Thanks for the hint! I agree for HTM. VX is an ABI switch since it changes the calling conventions for vector types.
[Bug target/104327] [12 Regression] Inlining error on s390x since r12-1039
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104327 --- Comment #5 from Andreas Krebbel --- Yes, that's the right fix I think. Thanks! MVCLE is a shorter version of a loop doing MVCs but has some startup overhead.
[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364

Andreas Krebbel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID
[Bug rtl-optimization/104034] Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034

Andreas Krebbel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-01-14
     Ever confirmed|0                           |1
           Priority|P3                          |P2
           Keywords|                            |wrong-code
               Host|                            |s390x
             Status|UNCONFIRMED                 |NEW
[Bug rtl-optimization/104034] New: Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104034

            Bug ID: 104034
           Summary: Miscompilation of LLVM on s390x with -march=z13 -mtune=z14 in GCC 8.x
           Product: gcc
           Version: 8.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 52194
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52194&action=edit
Testcase

Initial analysis done by Jakub Jelinek as part of:
https://bugzilla.redhat.com/show_bug.cgi?id=2028609

The following testcase is miscompiled on s390x with

g++ -fPIC -fvisibility-inlines-hidden -ffunction-sections -fdata-sections -O2 -fPIC -fno-exceptions -fno-rtti -std=c++14 -mlong-double-128 -march=z13 -mtune=z14

both with the RHEL gcc 8.x and with upstream 8.5.0.

When miscompiled, it prints something like

__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0xdeadbeefcafebabe 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe

rather than

__insertion_sort 0x3ffd74fd310 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe
__insertion_sort 0x3ffd74fd348 0x3ffd74fd348 0x10006b8 0xdeadbeefcafebabe

The interesting part is below, .cfi_* directives removed for brevity. On entry, this function has 3 pointers in %r2, %r3 and %r4 registers, and %r5 is pointer to the 16-byte function_ref - object with trivially copyable class containing 2 8-byte members.

_ZSt24__merge_sort_with_bufferIPPvS1_N4llvm12function_refIFbS0_S0_vT_S6_T0_T1_:
        stmg    %r6,%r15,48(%r15)
        lgr     %r14,%r15
        lay     %r15,-248(%r15)
        aghi    %r14,-32
        std     %f8,0(%r14)
        std     %f12,8(%r14)
        std     %f14,16(%r14)
        std     %f9,24(%r14)
        sgrk    %r11,%r3,%r2
        lgr     %r1,%r4
        srag    %r13,%r11,3
        agr     %r1,%r11
        lmg     %r8,%r9,0(%r5)
        stmg    %r8,%r9,160(%r15)
! The above stores the whole 16-byte function_ref correctly to %r15+160
        cgijle  %r11,48,.L13
        vlvgp   %v0,%r8,%r9
        ldgr    %f9,%r1
        ldgr    %f12,%r4
        la      %r1,200(%r15)
        lgr     %r10,%r3
        stg     %r11,176(%r15)
        ldgr    %f8,%r2
        lgr     %r6,%r9
        vlgvg   %r7,%v0,1
        stmg    %r8,%r9,184(%r15)
! So does the above
        lgr     %r8,%r1
.L14:
        la      %r11,56(%r2)
        lgr     %r4,%r8
        lgr     %r3,%r11
        stmg    %r6,%r7,200(%r15)
! But this one actually stores both 8-byte words the same to %r15+160, and %r15+200 is passed as %r4 to the function
        brasl   %r14,_ZSt16__insertion_sortIPPvN4llvm12function_refIFbS0_S0_vT_S6_T0_@PLT

In *.postreload, we have still correct:

(insn 16 12 166 2 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
        (reg:TI 8 %r8)) 1268 {movti}
     (nil))
...
(insn 137 136 140 3 (set (reg/v:TI 6 %r6 [orig:69 __comp ] [69])
        (reg/v:TI 16 %f0 [orig:69 __comp ] [69])) 1268 {movti}
     (nil))

The code spills it to 128-bit %f0 register and loads it back from it. Next, split2 pass splits the latter (but not the former) into:

(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
        (reg:DI 16 %f0)) 1269 {*movdi_64}
     (nil))
(insn 168 167 140 3 (set (reg:DI 7 %r7 [orig:69 __comp+8 ] [69])
        (unspec:DI [
                (reg:V2DI 16 %f0)
                (const_int 1 [0x1])
            ] UNSPEC_VEC_EXTRACT)) 402 {*vec_extractv2di}
     (nil))

and finally cprop_hardreg seeing

(insn 187 188 186 3 (set (reg/v:TI 16 %f0 [orig:69 __comp ] [69])
        (reg:TI 8 %r8)) 1268 {movti}
     (nil))

changes insn 167 to:

(insn 167 136 168 3 (set (reg:DI 6 %r6 [ __comp ])
        (reg:DI 9 %r9 [16])) 1269 {*movdi_64}
     (nil))

I'm not sure if this is a bug in the

; Split a VR -> GPR TImode move into 2 vector load GR from VR element.
; For the higher order bits we do simply a DImode move while the
; second part is done via vec extract.  Both will end up as vlgvg.
(define_split
  [(set (match_operand:TI 0 "register_operand" "")
        (match_operand:TI 1 "register_operand" ""))]
  "TARGET_VX && reload_completed
   && GENERAL_REG_P (operands[0])
   && VECTOR_REG_P (operands[1])"
  [(set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (unspec:DI [(match_dup 5) (const_int 1)]
                                 UNSPEC_VEC_EXTRACT))]
{
  operands[2] = operand_subword (operands[0], 0, 0, TImode);
  operands[3] = operand_subword (operands[0], 1, 0, TImode);
  operands[4] = gen_rtx_REG (DImode, REGNO (operands[1]));
  operands[5] = gen_rtx_REG (V2DImode, REGNO (operands[1]));
})

splitter, in cprop_hardreg or the s390x representation of those TImodes in floating point registers.

In GCC 9 it got "fixed" with https://gcc.gnu.org/r9-3763-gef976be1a23a517 but that just means it went lat
[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364
--- Comment #22 from Andreas Krebbel ---
(In reply to Sarah Julia Kriesch from comment #21)
> Did you use a mainframe as a local system?

I ran these commands on a z15 LPAR with Fedora 33 installed.
[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364
--- Comment #20 from Andreas Krebbel ---
(In reply to Sarah Julia Kriesch from comment #18)
...
> sudo zypper in osc build obs-service-format_spec_file bsdtar #also possible
> with other Linux distributions
> osc co openSUSE:Factory:zSystems/postgresql14
> cd openSUSE\:Factory\:zSystems/postgresql14/
> osc build --vm-type=kvm --vm-memory=4G

Tried with these commands. The build fails due to the OOM killer with 4GB and 8GB. The package builds fine starting with 12GB. In none of the cases did I get the ld error.
[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364 --- Comment #17 from Andreas Krebbel --- (In reply to Sarah Julia Kriesch from comment #12) > that is happening during the build process in OBS with a really minimal > openSUSE Tumbleweed. We are using VMs using QEMU and with 4GB of memory. Why only 4GB? Isn't this way too low for building things like rust with lto and everything? I've successfully built rust1.54 and postgresql14 several times in an opensuse tumbleweed container. So I would suspect either the kernel or the guest setup you are using. Could it perhaps be that ld processes got oom killed and have left half-complete binaries which triggered the error then? In the current logs I don't see the ld issue anymore. Apparently you already gave it more memory and the behavior changed due to that?
[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364 --- Comment #15 from Andreas Krebbel --- (In reply to Sarah Julia Kriesch from comment #0) ... > Full PostgreSQL log: > https://build.opensuse.org/build/openSUSE:Factory:zSystems/standard/s390x/ > postgresql14/_log > > Full Rust log: > https://build.opensuse.org/build/openSUSE:Factory:zSystems/standard/s390x/ > rust1.54/_log No access
[Bug middle-end/103364] s390x: TLS reference in /usr/lib64/libLLVM.so mismatches non-TLS reference in /usr/lib64/libLLVM.so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103364 --- Comment #11 from Andreas Krebbel --- Could you please provide the steps to reproduce the issue. I just tried real quick with a container image and couldn't reproduce it.
[Bug target/103028] ICE in extract_constrain_insn, at recog.c:2670
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103028
--- Comment #3 from Andreas Krebbel ---
So I think what is needed is something like this:

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 017944f4f79a..1f5b9476ac2e 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -4341,7 +4341,8 @@ find_if_header (basic_block test_bb, int pass)
       && cond_exec_find_if_block (&ce_info))
     goto success;
 
-  if (targetm.have_trap ()
+  if (!reload_completed
+      && targetm.have_trap ()
       && optab_handler (ctrap_optab, word_mode) != CODE_FOR_nothing
       && find_cond_trap (test_bb, then_edge, else_edge))
     goto success;
[Bug target/103028] ICE in extract_constrain_insn, at recog.c:2670
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103028
--- Comment #2 from Andreas Krebbel ---
IF-convert generates the compare *after* reload. The operands get checked for validity only by invoking the predicates. That means everything which is accepted by TARGET_LEGITIMATE_CONSTANT_P is ok for a general_operand. However, we have several patterns where the union of all constraints would accept fewer operands than the predicate, assuming that reload is able to sort this out. The ICE is triggered by emitting a pattern which actually would need to be fixed by reload.

The problem could easily be avoided by e.g. enforcing operand1 to satisfy the constraint used in the pattern. However, I'm wondering how this is supposed to work in general. Couldn't this trigger all sorts of problems? Are we the only backend relying on LRA sorting out this kind of issue for us?

Btw. I couldn't trigger the problem without -fharden-conditional-branches so far.
[Bug target/102222] ICE on s390 (internal compiler error: in extract_insn, at recog.c:2770)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102222
--- Comment #6 from Andreas Krebbel ---
(insn 9 8 10 2 (set (strict_low_part (reg:SI 66))
        (mem/c:SI (plus:SI (reg/f:SI 64)
                (const_int 4 [0x4])) [1 read_inode_val+0 S4 A32]))

With -mesa this should be a simple move. However, in that case it apparently is emitted via insv.
[Bug target/102222] ICE on s390 (internal compiler error: in extract_insn, at recog.c:2770)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102222

Andreas Krebbel changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |krebbel at gcc dot gnu.org

--- Comment #5 from Andreas Krebbel ---
Created attachment 51461
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51461&action=edit
Experimental patch
[Bug target/96127] ICE in extract_insn, at recog.c:2294
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96127 --- Comment #4 from Andreas Krebbel --- The testcase does not appear to fail on current GCC 10 branch. So I would just close it as fixed in GCC 11.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523
--- Comment #2 from Andreas Krebbel ---
Created attachment 51174
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51174&action=edit
Experimental Fix

With that patch the number of combine attempts goes back to normal.
[Bug rtl-optimization/101523] Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523
--- Comment #1 from Andreas Krebbel ---
This appears to be triggered by try_combine unnecessarily setting back the position by returning the i2 insn.

When 866 is inserted into 973, 866 still needs to be kept around for other users. So try_combine first merges the two sets into a parallel and immediately notices that this can't be recognized. Because none of the sets is a trivial move it is split again into two separate insns. Although the new i2 pattern exactly matches the input i2, combine considers this to be a new insn, triggers all the scanning and log link creation, and eventually returns it, which lets combine start all over at 866.

Due to that, combine tries many of the substitutions more than 400x.

Trying 866 -> 973:
  866: r22393:DI=r22391:DI+r22392:DI
  973: r22499:DF=r22498:DF*[r22393:DI]
      REG_DEAD r22498:DF
Failed to match this instruction:
(parallel [
        (set (reg:DF 22499)
            (mult:DF (reg:DF 22498)
                (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
                        (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
        (set (reg/f:DI 22393 [ _85087 ])
            (plus:DI (reg/f:DI 22391 [ _85085 ])
                (reg:DI 22392 [ _85086 ])))
    ])
Failed to match this instruction:
(parallel [
        (set (reg:DF 22499)
            (mult:DF (reg:DF 22498)
                (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
                        (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
        (set (reg/f:DI 22393 [ _85087 ])
            (plus:DI (reg/f:DI 22391 [ _85085 ])
                (reg:DI 22392 [ _85086 ])))
    ])
Successfully matched this instruction:
(set (reg/f:DI 22393 [ _85087 ])
    (plus:DI (reg/f:DI 22391 [ _85085 ])
        (reg:DI 22392 [ _85086 ])))
Successfully matched this instruction:
(set (reg:DF 22499)
    (mult:DF (reg:DF 22498)
        (mem:DF (plus:DI (reg/f:DI 22391 [ _85085 ])
                (reg:DI 22392 [ _85086 ])) [17 *_85087+0 S8 A64])))
allowing combination of insns 866 and 973
original costs 4 + 4 = 8
replacement costs 4 + 4 = 8
modifying insn i2   866: r22393:DI=r22391:DI+r22392:DI
deferring rescan insn with uid = 866.
modifying insn i3   973: r22499:DF=r22498:DF*[r22391:DI+r22392:DI]
      REG_DEAD r22498:DF
deferring rescan insn with uid = 973.
[Bug rtl-optimization/101523] New: Huge number of combine attempts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101523 Bug ID: 101523 Summary: Huge number of combine attempts Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: krebbel at gcc dot gnu.org Target Milestone: --- Compiling the attached testcase on s390x with: cc1plus -fpreprocessed t.ii -quiet -march=z196 -g -O2 -std=c++11 produces a huge amount of combine attempts compared to x86 consuming more than 11GB of memory: x86: 27264 combine attempts for 170631 insns s390x: 40009540 combine attempts for 164327 insns gcc g:6d4da4aeef5b20f7f9693ddc27d26740d0dbe36c
[Bug target/86681] ICE in extract_insn, at recog.c:2304 on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86681 --- Comment #6 from Andreas Krebbel --- Do you have the command line for the tattr-1.c test? The verbose options line appears to contain the options for a different test. I could not reproduce the problem with these options.
[Bug rtl-optimization/101426] Wrong code redirecting IPA thunk parms to tail-call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101426
--- Comment #1 from Andreas Krebbel ---
Created attachment 51136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51136&action=edit
Experimental Fix

With this patch the address is copied to a pseudo first. That way the register allocator will sort out the dependencies resulting in the following code being generated:

        lgr     %r2,%r3
        lgr     %r3,%r4
        lgr     %r4,%r5
        jg      _ZN1r6NSPACE6AShrOp5buildERNS_9OpBuilderERNS_14OperationStateENS_10ValueRangeEN6nspace8ArrayRefISt4pairINS_10IdentifierENS_9Attribute.constprop.0
[Bug rtl-optimization/101426] New: Wrong code redirecting IPA thunk parms to tail-call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101426

            Bug ID: 101426
           Summary: Wrong code redirecting IPA thunk parms to tail-call
           Product: gcc
           Version: 10.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 51135
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51135&action=edit
Testcase

Building the attached testcase with:

cc1plus -fpreprocessed t.cc -quiet -m64 -mzarch -O2

produces wrong code with GCC commits up to g:9725df0233b

A specialized clone of AShrOp::build without the first parameter is created. The calls in the foo* functions to the clone have to shift the function parameters one hard reg down to fit the signature of the clone:

void r::NSPACE::foo1 (struct OpBuilder & D.2900, struct OperationState & state, struct ValueRange operands, struct ArrayRef attributes)
{
  [local count: 1073741824]:
  r::NSPACE::AShrOp::build.constprop (state_3(D), operands, attributes); [tail call]
  return;
}

The generated code overwrites the second and the third parameter with the fourth parameter of the caller:

        lgr     %r2,%r3
        lgr     %r4,%r5
        lgr     %r3,%r5
        jg      _ZN1r6NSPACE6AShrOp5buildERNS_9OpBuilderERNS_14OperationStateENS_10ValueRangeEN6nspace8ArrayRefISt4pairINS_10IdentifierENS_9Attribute.constprop.0

The problem does not occur on head and 10.3 after this commit g:defafb78cbc

With this the parameters are always copied. The fix was done for PR90448 to fix an ICE triggered when building an address operand based on the DECL_RTL of the parameter which wasn't addressable at that point.

I think the situation is a bit different here. The code wires up the incoming hardregs with the callee parms without considering that the resulting moves might affect each other.
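The hazard described above is the classic problem of turning a set of simultaneous register copies into a sequence. The following is a minimal sketch in C, not GCC code: the register file is modelled as a small array, all names are hypothetical, and the "clobbering" order is only one assumed sequentialization that would end with the same register contents as the reported code (both %r3 and %r4 holding the caller's fourth argument).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the four argument registers involved.  */
enum { R2, R3, R4, R5, NREGS };

struct move
{
  int dst;
  int src;
};

/* Execute register-to-register copies strictly one after another.  */
static void
run_moves (uint64_t regs[NREGS], const struct move *moves, int count)
{
  for (int i = 0; i < count; i++)
    regs[moves[i].dst] = regs[moves[i].src];
}

/* Intended effect: r2 <- r3, r3 <- r4, r4 <- r5, all at once.  Emitting
   each copy before its source is overwritten keeps that meaning.  */
static const struct move safe_order[] = {
  { R2, R3 }, { R3, R4 }, { R4, R5 }
};

/* Assumed bad order: r4 is overwritten before it has been copied to r3,
   so r3 ends up with the old value of r5.  */
static const struct move clobbering_order[] = {
  { R2, R3 }, { R4, R5 }, { R3, R4 }
};
```

Starting from r2..r5 = 2, 3, 4, 5, the safe order yields 3, 4, 5 in r2..r4, whereas the clobbering order yields 3, 5, 5. Copying through fresh temporaries (pseudos) and letting the register allocator pick the order avoids having to reason about this by hand.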
[Bug middle-end/100908] asan clobberes register asm variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100908 --- Comment #1 from Andreas Krebbel --- https://gcc.gnu.org/pipermail/gcc/2021-June/236269.html
[Bug middle-end/100908] New: asan clobberes register asm variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100908

            Bug ID: 100908
           Summary: asan clobberes register asm variables
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50933
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50933&action=edit
Testcase

Compiling the testcase with either:

gcc -O3 t1.c -o t -fsanitize=address --param asan-instrumentation-with-call-threshold=0

or

gcc -O3 t1.c -o t -fsanitize=kernel-address -lasan

aborts because dereferencing y triggers the address sanitizer to introduce a function call. That a function call might clobber registers assigned with register asm is a documented limitation of the register asm construct:
https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html

However, in combination with the address sanitizer this becomes even less obvious making even the most experienced kernel developers trip over it:
https://lkml.org/lkml/2020/10/23/908

For IBM Z quite a few cases like this have been reported to me. Here is just one I could find quickly:
https://lore.kernel.org/patchwork/patch/1413907/

Btw. clang appears to handle this more gracefully and preserves the value of the variable around function calls. The attached testcase works fine with clang.

I think it would be much better to find a solution which allows naming hard registers directly as inline assembly constraints. I'll post an RFC on the mailing list.
[Bug c++/100281] ICE with SImode pointer assignment in C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100281

Andreas Krebbel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #50685|0                           |1
        is obsolete|                            |

--- Comment #5 from Andreas Krebbel ---
Created attachment 50689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50689&action=edit
Fixed patch with testcase
[Bug c++/100281] ICE with SImode pointer assignment in C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100281
--- Comment #3 from Andreas Krebbel ---
This is a hard requirement for the z/TPF operating system supported as part of our IBM Z backend. It has happened to work for many years already and they make extensive use of it.
[Bug c++/100281] New: ICE with SImode pointer assignment in C++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100281

            Bug ID: 100281
           Summary: ICE with SImode pointer assignment in C++
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

Created attachment 50685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50685&action=edit
Experimental Fix

typedef void * __attribute__((mode (SI))) __ptr32_t;

void foo(){
  unsigned int b = 100;
  __ptr32_t a;
  a = b;
}

Building with "cc1plus t.cpp" ICEs on s390x:

 void foo()
in strip_typedefs, at cp/tree.c:1770
    6 |   a = b;
      |       ^
0x156f731 strip_typedefs(tree_node*, bool*, unsigned int)
        /home2/andreas/build/../gcc/gcc/cp/tree.c:1770
0x135c827 type_to_string
        /home2/andreas/build/../gcc/gcc/cp/error.c:3298
0x136c723 cxx_format_postprocessor::handle(pretty_printer*)
        /home2/andreas/build/../gcc/gcc/cp/error.c:4242
0x291f171 pp_format(pretty_printer*, text_info*)
        /home2/andreas/build/../gcc/gcc/pretty-print.c:1496
0x28ffecb diagnostic_report_diagnostic(diagnostic_context*, diagnostic_info*)
        /home2/andreas/build/../gcc/gcc/diagnostic.c:1244
0x2902cef diagnostic_impl
        /home2/andreas/build/../gcc/gcc/diagnostic.c:1406
0x2902cef permerror(rich_location*, char const*, ...)
        /home2/andreas/build/../gcc/gcc/diagnostic.c:1688
0x12441f7 convert_like_internal
        /home2/andreas/build/../gcc/gcc/cp/call.c:7581
0x12460e1 convert_like
        /home2/andreas/build/../gcc/gcc/cp/call.c:8114
0x12463b3 convert_like
        /home2/andreas/build/../gcc/gcc/cp/call.c:8126
0x12463b3 perform_implicit_conversion_flags(tree_node*, tree_node*, int, int)
        /home2/andreas/build/../gcc/gcc/cp/call.c:12303
0x1599687 cp_build_modify_expr(unsigned int, tree_node*, tree_code, tree_node*, int)
        /home2/andreas/build/../gcc/gcc/cp/typeck.c:8887
0x159b66d build_x_modify_expr(unsigned int, tree_node*, tree_code, tree_node*, int)
        /home2/andreas/build/../gcc/gcc/cp/typeck.c:8978
0x1435d8d cp_parser_assignment_expression
        /home2/andreas/build/../gcc/gcc/cp/parser.c:10184
0x1437661 cp_parser_expression
        /home2/andreas/build/../gcc/gcc/cp/parser.c:10313
0x143b5c1 cp_parser_expression_statement
        /home2/andreas/build/../gcc/gcc/cp/parser.c:12041
0x1449a71 cp_parser_statement
        /home2/andreas/build/../gcc/gcc/cp/parser.c:11837
0x144bac7 cp_parser_statement_seq_opt
        /home2/andreas/build/../gcc/gcc/cp/parser.c:12189
0x144bbc7 cp_parser_compound_statement
        /home2/andreas/build/../gcc/gcc/cp/parser.c:12138
0x146ef03 cp_parser_function_body
        /home2/andreas/build/../gcc/gcc/cp/parser.c:24080
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

The problem appears to be triggered by two locations in the front-end where non-POINTER_SIZE pointers aren't handled right now.

1. An assertion in strip_typedefs is triggered because the alignments of the types don't match. This in turn is caused by creating the new type with build_pointer_type instead of taking the type of the original pointer into account.

2. An assertion in cp_convert_to_pointer is triggered which expects the target type to always have POINTER_SIZE.
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973
--- Comment #5 from Andreas Krebbel ---
Created attachment 50132
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50132&action=edit
RTL dump from store motion pass
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973 --- Comment #4 from Andreas Krebbel --- The update of global variable c is moved out of the loop. Due to that c stays at 8 although it should be counted down to 2.
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973
--- Comment #3 from Andreas Krebbel ---
Created attachment 50131
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50131&action=edit
RTL GCSE dump without -fgcse-sm
[Bug rtl-optimization/98973] [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973
--- Comment #2 from Andreas Krebbel ---
Created attachment 50130
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50130&action=edit
RTL GCSE dump with -fgcse-sm
[Bug rtl-optimization/98973] New: [11 regression] Wrong code with gcse store motion pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98973

            Bug ID: 98973
           Summary: [11 regression] Wrong code with gcse store motion pass
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

This test aborts when compiled on IBM Z with:

gcc -O3 t.c -o t -fgcse-sm

it succeeds with -O2 or without -fgcse-sm

Tested with Commit ID: 072f20c5559
It works with GCC 10 branch: eb15f761bc7

long a;
int b, c;
short d;
int e[] = { 1, 1, 0, 1, 1, 1, 1, 1, 1, 1 };

void
f ()
{
g:
  c = 9;
  for (; c >= 3; c--)
    {
      int h[5];
      for (; d; d--)
        ;
      for (; a;)
        if (e[c])
          b = h[4];
      if (e[c])
        continue;
      goto g;
    }
}

int
main ()
{
  f ();
  if (c != 2)
    __builtin_abort();
}
[Bug inline-asm/98847] Miscompilation with c++17, templates, and register keyword
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98847 Andreas Krebbel changed: What|Removed |Added CC||krebbel at gcc dot gnu.org --- Comment #6 from Andreas Krebbel --- Thanks for fixing this. When I had a look at it in 2015 I found that template instantiation explicitly zeroes out the asm name. Solution for me was to prevent that for hard reg decls. Not sure what approach is preferable here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33661#c13
[Bug tree-optimization/98736] Wrong partition order generated in loop distribution pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

Andreas Krebbel changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
           Priority|P3                          |P2
             Target|                            |s390x
[Bug tree-optimization/98736] New: Wrong partition order generated in loop distribution pass
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98736

            Bug ID: 98736
           Summary: Wrong partition order generated in loop distribution pass
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krebbel at gcc dot gnu.org
  Target Milestone: ---

int a[6];
char b, c;

int
main()
{
  int d[4] = {0, 0, 0, 0};
  for (c = 0; c <= 5; c++)
    {
      for (b = 2; b != 0; b++)
        a[c] = 8;
      a[c] = d[3];
    }
  if (a[0] != 0)
    __builtin_abort();
}

Aborts when compiled with:
gcc -Os -march=z13 t.c -o t

Succeeds with:
gcc -O3 -march=z13 t.c -o t

The outer loop is recognized as clearmem. Unfortunately it is generated before the inner loop body.
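Why the partition order matters can be shown with a hand-written model of the two orders. This is only a sketch in C under stated assumptions: the function names are hypothetical, the inner loop is reduced to its last store, and the "clear first" variant is my reading of "generated before the inner loop body", not the actual output of the loop distribution pass.

```c
#include <assert.h>
#include <string.h>

enum { N = 6 };

/* Source order: for every element the store of 8 comes first and the
   store of 0 (d[3] in the testcase) comes last, so 0 is what remains.  */
static void
store_in_source_order (int a[N])
{
  for (int c = 0; c < N; c++)
    {
      a[c] = 8;
      a[c] = 0;
    }
}

/* Assumed wrong order: the clearing partition (a memset) runs before the
   partition holding the stores of 8, so 8 is what remains.  */
static void
store_with_clear_first (int a[N])
{
  memset (a, 0, N * sizeof (int));
  for (int c = 0; c < N; c++)
    a[c] = 8;
}
```

The first function leaves every element at 0, matching what the testcase expects. The second leaves every element at 8, which corresponds to the abort on `a[0] != 0`.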
[Bug target/98550] [11 Regression] ICE in exact_div, at poly-int.h:2219 on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98550 Andreas Krebbel changed: What|Removed |Added Status|WAITING |NEW --- Comment #5 from Andreas Krebbel --- With the patch the sign extension of 6 shift count operands from int to long int is now marked as vect_external_def. This makes the vectype field in the slp node to be bumped from "vector 2 int" to a "vector 4 int" in: vectorizable_conversion->vect_maybe_update_slp_op_vectype This then triggers the ICE when trying to divide vf*group_size (which is 1*6 here) by the number of elements in the vector type (now 4) in vect_slp_analyze_node_operations. Is changing the vectype field of an slp node to a type with a different number of elements actually valid? slp1: bb$dh_5 = D.4123.dh; _10 = MEM[(int *)bb$dh_5]; pretmp_62 = a.cp[1]; pretmp_79 = a.cp[2]; pretmp_31 = a.cp[3]; pretmp_39 = a.cp[4]; pretmp_16 = a.cp[5]; pretmp_19 = a.cp[6]; goto ; [100.00%] [local count: 1014686041]: _20 = prephitmp_78 >> _10; a.cp[1] = _20; _22 = prephitmp_80 >> _10; a.cp[2] = _22; _24 = prephitmp_32 >> _10; a.cp[3] = _24; _26 = prephitmp_40 >> _10; a.cp[4] = _26; _28 = prephitmp_17 >> _10; a.cp[5] = _28; _30 = prephitmp_11 >> _10; a.cp[6] = _30; cn ={v} {CLOBBER};
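The arithmetic behind the ICE is small enough to spell out. The following is a minimal sketch in C, not the poly-int implementation: exact_div_checked is a hypothetical helper that only succeeds when the division leaves no remainder, which is the property the failing exact_div call relies on.

```c
#include <assert.h>

/* Hypothetical helper: store the quotient and return 1 only if
   dividend is an exact multiple of divisor; return 0 otherwise.  */
static int
exact_div_checked (unsigned dividend, unsigned divisor, unsigned *quotient)
{
  if (divisor == 0 || dividend % divisor != 0)
    return 0;
  *quotient = dividend / divisor;
  return 1;
}
```

With vf * group_size = 1 * 6, a "vector 2 int" type divides exactly into 3 vectors, while the bumped "vector 4 int" type leaves a remainder of 2, so no exact quotient exists.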
[Bug target/98550] [11 Regression] ICE in exact_div, at poly-int.h:2219 on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98550 --- Comment #4 from Andreas Krebbel --- The problem occurs starting with: commit 1e1e1edf88a7c40ae4ae0de9e6077179e13ccf6d Author: Richard Biener Date: Thu Oct 29 08:48:15 2020 +0100 More BB vectorization tweaks This tweaks the op build from splats to allow loads marked as not vectorizable. It also amends some dump prints with the address of the SLP node or the instance to better be able to debug things. 2020-10-29 Richard Biener * tree-vect-slp.c (vect_build_slp_tree_2): Allow splatting not vectorizable loads. (vect_build_slp_instance): Amend dumping with address. (vect_slp_convert_to_external): Likewise. * gcc.dg/vect/bb-slp-pr65935.c: Adjust.
[Bug target/98550] [11 Regression] ICE in exact_div, at poly-int.h:2219 on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98550 --- Comment #3 from Andreas Krebbel --- Created attachment 49944 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49944&action=edit Reduced testcase This testcase fails on bcb3065b2ba with cc1plus t.cpp -march=z13 -O3
[Bug rtl-optimization/78559] [7 Regression] wrong code due to tree if-conversion?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78559 Andreas Krebbel changed: What|Removed |Added CC||stli at linux dot ibm.com --- Comment #15 from Andreas Krebbel --- *** Bug 98269 has been marked as a duplicate of this bug. ***
[Bug c/98269] gcc 6.5.0 __builtin_add_overflow() with small uint32_t values incorrectly detects overflow
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98269 Andreas Krebbel changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE CC||krebbel at gcc dot gnu.org --- Comment #4 from Andreas Krebbel --- The problem is a CC mode mismatch generated by combine. After splitting, the add insn 135 generates a CCL1mode cc while the conditional jump consumes it as CCUmode. This leads to the wrong condition code mask being generated in the end. (insn 135 56 136 7 (parallel [ (set (reg:CCL1 33 %cc) (compare:CCL1 (plus:SI (reg:SI 108) (mem:SI (plus:DI (reg:DI 88 [ ivtmp.10 ]) (const_int 12 [0xc])) [3 MEM[base: previous_25, offset: 12B]+0 S4 A32])) (reg:SI 108))) (set (reg:SI 109) (plus:SI (reg:SI 108) (mem:SI (plus:DI (reg:DI 88 [ ivtmp.10 ]) (const_int 12 [0xc])) [3 MEM[base: previous_25, offset: 12B]+0 S4 A32]))) ]) t.c:31 1358 {*addsi3_carry1_cc} (expr_list:REG_DEAD (reg:SI 108) (nil))) (note 136 135 64 7 NOTE_INSN_DELETED) (insn 64 136 65 7 (set (mem:SI (plus:DI (reg:DI 88 [ ivtmp.10 ]) (const_int 28 [0x1c])) [3 MEM[base: previous_25, offset: 28B]+0 S4 A32]) (reg:SI 109)) t.c:31 1077 {*movsi_zarch} (nil)) (note 65 64 66 7 NOTE_INSN_DELETED) (jump_insn 66 65 67 7 (set (pc) (if_then_else (geu (reg:CCU 33 %cc) (const_int 0 [0])) (label_ref 78) (pc))) t.c:31 1661 {*cjump_64} (int_list:REG_BR_PROB 9500 (expr_list:REG_DEAD (reg:CCZ 33 %cc) (nil))) The failure disappears with: commit bf7499197fbb065123257c374064f6bb715c951b Author: Dominik Vogt Date: Mon Jul 4 14:25:22 2016 + S/390: Add support for z13 instructions lochi and locghi. The attached patch adds patterns to make use of the z13 LOCHI and LOCGHI instructions. ... But that one only hides the problem. The mere presence of the lochi alternatives leads to different RTL being emitted (although the alternative is not enabled for -march=z196). The split then doesn't happen anymore. Reverting the patch and continuing to bisect.
The failure finally disappears with: 3f54004b095d1cd513e63753ee0f8f9f13698347 is the first bad commit commit 3f54004b095d1cd513e63753ee0f8f9f13698347 Author: Bin Cheng Date: Fri Jan 27 14:42:23 2017 + re PR rtl-optimization/78559 (wrong code due to tree if-conversion?) PR rtl-optimization/78559 * combine.c (try_combine): Discard REG_EQUAL and REG_EQUIV for other_insn in combine. This looks like the actual fix to me. The wrong CC mode survives as part of a REG_EQUAL note: Successfully matched this instruction: (set (reg:SI 93 [ _27+4 ]) (if_then_else:SI (geu (reg:CCL1 33 %cc) (const_int 0 [0])) (reg:SI 93 [ _27+4 ]) (reg:SI 118))) allowing combination of insns 56 and 135 original costs 4 + 4 = 16 replacement cost 8 deferring deletion of insn with uid = 56. modifying other_insn 136: r93:SI={(geu(%cc:CCL1,0))?r93:SI:r118:SI} REG_DEAD %cc:CCU REG_EQUAL ltu(%cc:CCU,0) deferring rescan insn with uid = 136. modifying insn i3 135: {%cc:CCL1=cmp(r108:SI+[r88:DI+0xc],r108:SI);r109:SI=r108:SI+[r88:DI+0xc];} REG_DEAD r108:SI deferring rescan insn with uid = 135. So we probably should mark it as duplicate of PR78559. *** This bug has been marked as a duplicate of bug 78559 ***
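For reference, the source-level operation behind bug 98269 can be sketched like this. `__builtin_add_overflow` is the GCC builtin named in the bug title. The wrapper names and the portable comparison are assumptions added for illustration, not the reporter's testcase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 32-bit add using the builtin; returns true when the sum wraps. */
bool add_overflows_builtin(uint32_t a, uint32_t b, uint32_t *sum)
{
  return __builtin_add_overflow(a, b, sum);
}

/* Portable equivalent: an unsigned add wrapped exactly when the result is
   smaller than one of the operands. This mirrors the compare of the sum
   against the first operand visible in insn 135 above. */
bool add_overflows_portable(uint32_t a, uint32_t b, uint32_t *sum)
{
  *sum = a + b;
  return *sum < a;
}
```

Both functions should agree for all inputs. For small values such as 1 + 2 neither may report overflow, and that is the case the miscompiled code got wrong.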
[Bug tree-optimization/98221] [10/11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98221 --- Comment #3 from Andreas Krebbel --- tree-vect-loop-manip.c: vect_maybe_permute_loop_masks also emits VEC_UNPACKS_HI/LO dependent on BYTES_BIG_ENDIAN. What is the expectation wrt the meaning of hi/lo in RTL standard names? I couldn't find it clearly documented for this either. Well, for things like 'smulm3_highpart' we say it is about the 'most significant half' but I don't see anything for the vector hi/lo.
[Bug tree-optimization/98221] [11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98221 Andreas Krebbel changed: What|Removed |Added Priority|P3 |P2 Keywords||wrong-code Target||s390x
[Bug tree-optimization/98221] New: [11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98221

Bug ID: 98221
Summary: [11 regression] Wrong unpack operation emitted in tree-ssa-forwprop.c
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: krebbel at gcc dot gnu.org
Target Milestone: ---

Created attachment 49728 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49728&action=edit
Fix

The vec-abi-varargs-1.c testcase on IBM Z currently fails. While adding an SI mode vector to a DI mode vector the first is unpacked using:

  _28 = BIT_INSERT_EXPR <{ 0, 0, 0, 0 }, _2, 0>;
  _34 = [vec_unpack_lo_expr] _28;

However, on big endian targets lo refers to the right hand side of the vector - in this case the zeroes.

This appears to be triggered by this patch:

commit 78307657cf9675bc4aa2e77561c823834714b4c8
Author: Richard Biener
Date: Thu Nov 28 12:22:04 2019 +

re PR tree-optimization/92645 (Hand written vector code is 450 times slower when compiled with GCC compared to Clang)

2019-11-28 Richard Biener

PR tree-optimization/92645
* tree-ssa-forwprop.c (get_bit_field_ref_def): Also handle conversions inside a mode class. Remove restriction on preserving the element size.
(simplify_vector_constructor): Deal with the above and for identity permutes also try using VEC_UNPACK_[FLOAT_]LO_EXPR and VEC_PACK_TRUNC_EXPR.

* gcc.target/i386/pr92645-4.c: New testcase.
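The endianness dependence described above can be modelled with a small scalar sketch. It assumes the convention stated in the report: element 0 is the left hand side of the vector, and "lo" selects the right hand half on big endian targets. The function name and the `big_endian` flag are hypothetical and only stand in for BYTES_BIG_ENDIAN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of a widening "unpack lo" on a 4 x int32 vector.
   On little endian, lo is elements 0..1; on big endian, elements 2..3. */
void unpack_lo_model(const int32_t in[4], int64_t out[2], bool big_endian)
{
  int first = big_endian ? 2 : 0;
  out[0] = in[first];
  out[1] = in[first + 1];
}
```

For an input of { x, 0, 0, 0 }, which is what the BIT_INSERT_EXPR produces, the little endian selection yields x while the big endian selection yields only the zeroes.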
[Bug target/98124] Z: Load and test LTDBR instruction gets not used for comparison against 0.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98124 --- Comment #2 from Andreas Krebbel --- (In reply to Andreas Krebbel from comment #1) > LTDBR turns SNaNs into QNaNs and that's not supposed to happen in your > testcase. We emit LTDBR only with -fno-trapping-math ... or if the result of LTDBR isn't used.
[Bug target/98124] Z: Load and test LTDBR instruction gets not used for comparison against 0.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98124 Andreas Krebbel changed: What|Removed |Added CC||krebbel at gcc dot gnu.org Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #1 from Andreas Krebbel --- LTDBR turns SNaNs into QNaNs and that's not supposed to happen in your testcase. We emit LTDBR only with -fno-trapping-math
[Bug target/97326] [11 Regression] s390: ICE in do_store_flag after 10843f830350
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326 Andreas Krebbel changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Andreas Krebbel --- Fixed
[Bug target/97326] [11 Regression] s390: ICE in do_store_flag after 10843f830350
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326 Andreas Krebbel changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #5 from Andreas Krebbel --- closing
[Bug middle-end/97326] [11 Regression] s390: ICE in do_store_flag after 10843f830350
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326 --- Comment #2 from Andreas Krebbel --- Probably my fault. I forgot to support floats in vec_cmp. I'm testing a patch.
[Bug target/96456] [10/11 Regression] ICE in expand_insn, at optabs.c:7511 on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96456 Andreas Krebbel changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #5 from Andreas Krebbel --- closing
[Bug target/96456] [10/11 Regression] ICE in expand_insn, at optabs.c:7511 on s390x-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96456 Andreas Krebbel changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Andreas Krebbel --- Fixed for trunk and GCC 10
[Bug bootstrap/97502] [11 Regression] PGO bootstrap failure on s390x-linux with -march=z13 starting with r11-3426
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97502 Andreas Krebbel changed: What|Removed |Added Last reconfirmed||2020-10-21 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #7 from Andreas Krebbel --- The vec_cmp* expanders in vx-builtins.md are only supposed to be used for expanding the builtins. Unfortunately the names appear to collide with the rtx standard names to some degree. I will try to implement the standard name patterns and direct builtin expansion to them instead.
[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497 --- Comment #6 from Andreas Krebbel --- Alternatively I could also mark r12 as preserved across function calls for -fpic in the backend. In fact all the bits we care about are preserved. Since the register is fixed all the accesses do come from the backend itself. That's similar to what I was trying with the fixed_regs hack. But I agree that this might not be correct in general. The full fix is probably to track the exact parts of partially clobbered regs which stay live but this would be a major change.
[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497 --- Comment #4 from Andreas Krebbel --- Reading from symbol t uses the GOT pointer in r12. The call then partially clobbers r12 but does not affect the lower 32 bits where the GOT pointer resides. So the GOT pointer stays in fact live across the call. The way this is currently handled in gcse the second access of t reads a different value than the first and this is wrong I think. This leads to a disagreement between the pre_delete_map and the insert locations. The later read of t is removed because it is an anticipated use but no copy of the value is inserted after the first read of t because the expression is not considered to be available anymore at the second location. The availability of the entire expression is broken by the set of r12 at the call insns. I didn't know how to solve this without being able to keep track of what parts of hard regs are clobbered. The idea behind the fixed_reg check is to trust the backend that it does not emit uninitialized uses of hard regs in the first place. t.c.250r.cprop1: (insn 6 3 7 2 (set (reg/f:SI 65) (mem/u/c:SI (plus:SI (reg:SI 12 %r12) (const:SI (unspec:SI [ (symbol_ref:SI ("t") [flags 0x6c0] ) ] UNSPEC_GOT))) [0 S4 A8])) "t.c":8:3 1387 {*movsi_zarch} (nil)) (insn 7 6 8 2 (set (reg:SI 3 %r3) (mem/f/c:SI (reg/f:SI 65) [1 t+0 S4 A32])) "t.c":8:3 1387 {*movsi_zarch} (expr_list:REG_DEAD (reg/f:SI 65) (nil))) (insn 8 7 9 2 (set (reg:SI 2 %r2) (const_int 1 [0x1])) "t.c":8:3 1387 {*movsi_zarch} (nil)) (call_insn 9 8 10 2 (parallel [ (call (mem:QI (const:SI (unspec:SI [ (symbol_ref:SI ("bar") [flags 0x41] ) ] UNSPEC_PLT)) [0 bar S1 A8]) (const_int 0 [0])) (clobber (reg:SI 14 %r14)) ]) "t.c":8:3 2053 {*brasl} (expr_list:REG_DEAD (reg:SI 3 %r3) (expr_list:REG_DEAD (reg:SI 2 %r2) (expr_list:REG_CALL_DECL (symbol_ref:SI ("bar") [flags 0x41] ) (nil (expr_list (use (reg:SI 12 %r12)) (expr_list:SI (use (reg:SI 2 %r2)) (expr_list:SI (use (reg:SI 3 %r3)) (nil) ... 
(insn 13 12 14 4 (set (reg/f:SI 66) (mem/u/c:SI (plus:SI (reg:SI 12 %r12) (const:SI (unspec:SI [ (symbol_ref:SI ("t") [flags 0x6c0] ) ] UNSPEC_GOT))) [0 S4 A8])) "t.c":10:5 1387 {*movsi_zarch} (nil))
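The notion of a partial clobber used in this comment can be illustrated with a tiny model: a 64-bit register whose upper half is trashed by a call while the lower 32 bits, where the GOT pointer lives, stay intact. This is only a sketch. The function names and the junk value are assumptions, and no real register is involved.

```c
#include <stdint.h>

/* Model of a call that clobbers only the upper 32 bits of a 64-bit
   hard register and preserves the lower 32 bits. */
uint64_t call_clobbers_high_half(uint64_t reg, uint32_t junk)
{
  return ((uint64_t)junk << 32) | (reg & 0xffffffffu);
}

/* An SImode read of the register sees only the lower 32 bits. */
uint32_t read_low_half(uint64_t reg)
{
  return (uint32_t)reg;
}
```

The full 64-bit value changes across the call, yet an SImode read returns the same value before and after. Treating such a register as fully clobbered is conservative for the full register but, as described above, makes the expression for t look unavailable at the second access.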
[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497 --- Comment #3 from Andreas Krebbel --- Created attachment 49405 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49405&action=edit testcase
[Bug rtl-optimization/97497] gcse wrong code generation with partial register clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497 --- Comment #1 from Andreas Krebbel --- Created attachment 49402 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49402&action=edit Proposed fix With the patch, only registers that are not "fixed" are considered, on the assumption that for fixed_regs the backend takes care to use only the well-defined part of the hard regs.
[Bug rtl-optimization/97497] New: gcse wrong code generation with partial register clobbers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97497

Bug ID: 97497
Summary: gcse wrong code generation with partial register clobbers
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: krebbel at gcc dot gnu.org
Target Milestone: ---

Compiling the attached testcase produces wrong code on IBM Z:

cc1 t.c -m31 -mzarch -march=z900 -O2 -fpic -o t.s

foo:
        stm     %r11,%r15,44(%r15)
        larl    %r12,_GLOBAL_OFFSET_TABLE_
        lr      %r11,%r2
        l       %r1,t@GOT(%r12)
        ahi     %r15,-96
        lhi     %r2,1
        l       %r3,0(%r1)
        brasl   %r14,bar@PLT
        ltr     %r11,%r11
        jne     .L8
        lhi     %r2,1
        l       %r3,0           <--- dereference address 0
        brasl   %r14,bar@PLT
        l       %r4,152(%r15)
        lm      %r11,%r15,140(%r15)
        br      %r4
.L8:
        lhi     %r3,1
        l       %r2,0           <--- dereference address 0
        brasl   %r14,baz@PLT
        lhi     %r2,1
        l       %r3,0
        brasl   %r14,bar@PLT
        l       %r4,152(%r15)
        lm      %r11,%r15,140(%r15)
        br      %r4

gcse decides to remove the load from t in the subsequent bbs but does not generate the load into a temp reg in the first bb leaving the bbs loading from an uninitialized pseudo.

With -mzarch -m31 we have the GOT pointer marked as partially clobbered. The loads from t use the GOT pointer explicitly in the RTX.

Since this patch r12 is considered to be fully clobbered by call insns:

commit a4dfaad2e5594d871fe00a1116005e28f95d644e (refs/bisect/bad)
Author: Richard Sandiford
Date: Mon Sep 30 16:20:44 2019 +

Remove global call sets: gcse.c

This is another case in which we can conservatively treat partial kills as full kills. Again this is in principle a bug fix for TARGET_HARD_REGNO_CALL_PART_CLOBBERED targets, but in practice it probably doesn't make a difference.

2019-09-30 Richard Sandiford

gcc/
* gcse.c: Include function-abi.h.
(compute_hash_table_work): Use insn_callee_abi to get the ABI of the call insn target. Invalidate partially call-clobbered registers as well as fully call-clobbered ones.
From-SVN: r276323 Now the RTX for t which references r12 is considered to be not available anymore in the later bbs due to r12 being clobbered by the calls. Hence no load of the original expression is being emitted.