[Bug sanitizer/115461] lsan doesn't work on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115461 --- Comment #6 from Ilya Leoshkevich ---
Forgot to add: since the runtime is shared, this observation applies to both GCC and LLVM.

    $ gcc k.c -fsanitize=leak; ./a.out
    0x5080
    $ LSAN_OPTIONS=use_stacks=0 ./a.out
    0x5080
    =
    ==948446==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 123 byte(s) in 1 object(s) allocated from:
        #0 0x3fff7a16caf in malloc (/lib64/liblsan.so.0+0x16caf) (BuildId: 58eab4a667c0b1f8c0ff7fe7ac931e0eaa86cd5e)
        #1 0x1001219 in main (/tmp/a.out+0x1001219) (BuildId: 277d8d1498d2a3f76a547ae04af127173f8a2c76)

    SUMMARY: LeakSanitizer: 123 byte(s) leaked in 1 allocation(s).
[Bug sanitizer/115461] lsan doesn't work on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115461 --- Comment #5 from Ilya Leoshkevich ---
The LLVM testsuite still passes. Looking a bit deeper:

    $ LSAN_OPTIONS=verbosity=1,log_pointers=1 ./a.out
    [...]
    0x5080
    ==1522380==LeakSanitizer: checking for leaks
    [...]
    ==1522381==Scanning STACK range 0x03ffa3d8-0x03ffb000.
    ==1522381==0x03ffa820: found 0x5080 pointing into chunk 0x5080-0x5080007b of size 123.

So something spilled the pointer value onto the stack, and LSan thinks that it is still referenced. And indeed, turning stack scanning off resolves the issue:

    $ LSAN_OPTIONS=use_stacks=0 ./a.out
    0x5080
    =
    ==1522412==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 123 byte(s) in 1 object(s) allocated from:
        #0 0x2aa00045bbd in malloc [...]/llvm-project/compiler-rt/lib/lsan/lsan_interceptors.cpp:75:3
        #1 0x2aa0004779d in main ([...]/llvm-project/build/a.out+0x4779d)

    SUMMARY: LeakSanitizer: 123 byte(s) leaked in 1 allocation(s).
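The report does not include k.c, so the following is only a guess at the kind of program involved: it allocates 123 bytes, prints the address, and drops the only named reference. The function name `leak_123_bytes` is hypothetical. Whether a stale copy of the pointer survives in a stack slot depends on the compiler and target, which is exactly what the stack scan above trips over.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for k.c (the original is not shown in the report):
   allocate 123 bytes, print the address, and drop the only reference. */
int leak_123_bytes(void)
{
    void *volatile p = malloc(123);
    int ok = (p != NULL);

    printf("%p\n", p);
    p = NULL;   /* the block is no longer reachable through this variable */
    return ok;  /* 1 if the allocation succeeded, 0 otherwise */
}
```

Called once from `main` in a program built with `-fsanitize=leak`, this should produce a "Direct leak of 123 byte(s)" report when no stale copy of the pointer is found; running with `LSAN_OPTIONS=use_stacks=0`, as in the comment above, takes the stack out of the picture.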
[Bug sanitizer/115461] lsan doesn't work on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115461 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #4 from Ilya Leoshkevich --- It doesn't work for me anymore either. I will take a look at both GCC and LLVM issues.
[Bug sanitizer/79341] Many Asan tests fail on s390
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341 --- Comment #77 from Ilya Leoshkevich --- Apparently fixing the message in GCC will produce maintenance overhead [1]. If that's not very important to you, I'd rather leave this message as is. [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/648775.html
[Bug sanitizer/79341] Many Asan tests fail on s390
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79341 --- Comment #76 from Ilya Leoshkevich --- It's because the sanitizer runtime was copied from LLVM to GCC. I will post a patch removing the unsupported MSan and DFSan from the error message.
[Bug target/114404] [11] GCC reorders stores when it probably shouldn't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114404 --- Comment #4 from Ilya Leoshkevich --- Thanks, cherry-picking https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a98d5130a6dcff2ed4db371e500550134777b8cf helped both with the minimized testcase and the actual kernel bug. What you describe there - reassociation causing a wrong base term to be selected - matches what I've seen during debugging as well. So let's close this as a duplicate.
[Bug c/114404] [11] GCC reorders stores when it probably shouldn't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114404 --- Comment #2 from Ilya Leoshkevich ---
Created attachment 57745
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57745&action=edit
cc1 invocation
[Bug c/114404] [11] GCC reorders stores when it probably shouldn't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114404 --- Comment #1 from Ilya Leoshkevich ---
Created attachment 57744
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57744&action=edit
preprocessed source
[Bug c/114404] New: [11] GCC reorders stores when it probably shouldn't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114404

Bug ID: 114404
Summary: [11] GCC reorders stores when it probably shouldn't
Product: gcc
Version: 11.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---

Reproducible with gcc commit 1b5510a59163. I'm writing this up as a result of the following Linux kernel discussion:

https://lore.kernel.org/bpf/c9923c1d-971d-4022-8dc8-1364e929d...@gmail.com/
https://lore.kernel.org/bpf/20240320015515.11883-1-...@linux.ibm.com/

In the following code:

    extern const char bpf_plt[];
    extern const char bpf_plt_ret[];
    extern const char bpf_plt_target[];

    static void bpf_jit_plt(void *plt, void *ret, void *target)
    {
        memcpy(plt, bpf_plt, BPF_PLT_SIZE);
        *(void **)((char *)plt + (bpf_plt_ret - bpf_plt)) = ret;
        *(void **)((char *)plt + (bpf_plt_target - bpf_plt)) = target ?: ret;
    }

GCC 11's sched1 pass reorders the memcpy() and the assignments. In GCC 12 this behavior is gone after

    commit 2e96b5f14e4025691b57d2301d71aa6092ed44bc
    Author: Aldy Hernandez
    Date:   Tue Jun 15 12:32:51 2021 +0200

        Backwards jump threader rewrite with ranger.

but this seems to be accidental.

Internally, output_dependence() for the respective mems returns false, because it believes that they are based on different SYMBOL_REFs. This may be because on the C level we are not allowed to subtract pointers to different objects. Casting the pointers to longs should be a possible solution, since the C pointer subtraction rules would then no longer apply, but in practice it changes nothing. In the attached minimized preprocessed source with long casts we get:

    stg     %r3,232(%r2,%r15)
    ltgr    %r11,%r11
    locgrne %r3,%r11
    stg     %r3,232(%r1,%r15)
    la      %r2,0(%r1,%r9)
    la      %r3,232(%r1,%r15)
    mvc     232(16,%r15),0(%r5)
    mvc     248(16,%r15),16(%r5)
    lghi    %r4,8
    brasl   %r14,s390_kernel_write@PLT

so the assignments are placed before the memcpy().
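The copy-then-patch pattern from the report can be illustrated with a self-contained sketch. This is not the kernel code: `PLT_SIZE`, `plt_template`, `plt_ret_label`, `plt_target_label` and `jit_plt` are hypothetical names, and the slot offsets (16 and 24) are made up. Unlike the three separate extern arrays in the report, all labels here point into one object, so the offset arithmetic is well defined; the sketch only shows the ordering requirement and makes no claim about avoiding the GCC 11 behavior.

```c
#include <stdint.h>
#include <string.h>

#define PLT_SIZE 32 /* hypothetical; the report's BPF_PLT_SIZE is not shown */

/* Hypothetical stand-ins for bpf_plt, bpf_plt_ret and bpf_plt_target.
   All three labels point into a single array. */
static const unsigned char plt_template[PLT_SIZE] = { 0 };
static const unsigned char *const plt_ret_label = plt_template + 16;
static const unsigned char *const plt_target_label = plt_template + 24;

/* Copy the template first, then patch the two pointer slots. If the
   patching stores were moved above the first memcpy(), the copy would
   overwrite them. The slots are written with memcpy() to avoid unaligned
   or type-punned stores in this sketch. */
void jit_plt(void *plt, void *ret, void *target)
{
    size_t ret_off = (size_t)((uintptr_t)plt_ret_label - (uintptr_t)plt_template);
    size_t target_off = (size_t)((uintptr_t)plt_target_label - (uintptr_t)plt_template);
    void *effective_target = target ? target : ret;

    memcpy(plt, plt_template, PLT_SIZE);
    memcpy((unsigned char *)plt + ret_off, &ret, sizeof ret);
    memcpy((unsigned char *)plt + target_off, &effective_target,
           sizeof effective_target);
}
```

After a call such as `jit_plt(buf, &a, &b)` on a `PLT_SIZE`-byte buffer, the slot at offset 16 holds `&a` and the slot at offset 24 holds `&b`; passing a null `target` makes both slots hold `ret`, mirroring the `target ?: ret` expression in the report.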
[Bug sanitizer/113284] [14 regression] many failures in asan after r14-6946-ge66dc37b299cac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113284 --- Comment #6 from Ilya Leoshkevich ---
Created attachment 57014
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57014&action=edit
patch v2

Thanks for the correction. I've noticed the function label and got tunnel vision; .opd does indeed contain no code, but only function and toc pointers, and we don't want that in ASAN reports. Would the attached patch be okay? It's basically your proposal, but with some code reuse.
[Bug sanitizer/113284] [14 regression] many failures in asan after r14-6946-ge66dc37b299cac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113284 --- Comment #4 from Ilya Leoshkevich ---
Thanks, I can repro this on cfarm203 now. Apparently I missed

    diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
    index 94fbf46f2b6..fd9bb807957 100644
    --- a/gcc/config/rs6000/rs6000.cc
    +++ b/gcc/config/rs6000/rs6000.cc
    @@ -21334,7 +21334,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
       if (TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
         {
           fputs ("\t.section\t\".opd\",\"aw\"\n\t.align 3\n", file);
    -      ASM_OUTPUT_LABEL (file, name);
    +      ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
           fputs (DOUBLE_INT_ASM_OP, file);
           rs6000_output_function_entry (file, name);
           fputs (",.TOC.@tocbase,0\n\t.previous\n", file);

in commit c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL"). I will start a full regtest and send a patch shortly.
[Bug sanitizer/113284] [14 regression] many failures in asan after r14-6946-ge66dc37b299cac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113284 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #1 from Ilya Leoshkevich --- Could you please share the configure command that you use? I originally regtested that patch on cfarm120 (POWER10) with `./configure --enable-checking=yes,rtl`, and I cannot reproduce the issue there.
[Bug target/113273] [14 Regression][x86][asan] ICE Segmentation fault since r14-6946-ge66dc37b299cac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113273 --- Comment #4 from Ilya Leoshkevich --- I've pushed the fix. This can be closed as a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113251.
[Bug target/113273] [14 Regression][x86][asan] ICE Segmentation fault since r14-6946-ge66dc37b299cac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113273 --- Comment #3 from Ilya Leoshkevich --- Thank you for the confirmation. I will push the fix as soon as my regtests finish.
[Bug target/113273] [14 Regression][x86][asan] ICE Segmentation fault since r14-6946-ge66dc37b299cac
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113273 --- Comment #1 from Ilya Leoshkevich --- Hi, sorry about the regression. Could you please check if https://inbox.sourceware.org/gcc-patches/20240108092434.554918-1-...@linux.ibm.com/ fixes that for you?
[Bug sanitizer/99476] 'PATH_MAX' was not declared in this scope
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99476 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #3 from Ilya Leoshkevich --- I had a similar issue when compiling GCC targeting i686-linux on x86_64 debian, and --includedir= helped, thanks! I had to do the following: ../configure --target=i686-linux-gnu --disable-bootstrap --prefix=/usr --includedir=/usr/i686-linux-gnu/include
[Bug sanitizer/113251] [14 Regression] ICE on gcc.dg/asan/pr63845.c on i686-linux since r14-6946
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113251

Ilya Leoshkevich changed:

    What    |Removed |Added
    CC      |        |iii at linux dot ibm.com

--- Comment #1 from Ilya Leoshkevich ---
I can reproduce this manually and will work on a fix. Surprisingly, this does not show in my test results. I.e.:

    $ make check-gcc RUNTESTFLAGS="asan.exp=pr63845.c --debug"
    === gcc Summary ===
    # of expected passes 7
    $ cat gcc/testsuite/gcc/gcc.sum
    PASS: gcc.dg/asan/pr63845.c -O0 (test for excess errors)
    PASS: gcc.dg/asan/pr63845.c -O1 (test for excess errors)
    PASS: gcc.dg/asan/pr63845.c -O2 (test for excess errors)
    PASS: gcc.dg/asan/pr63845.c -O3 -g (test for excess errors)
    PASS: gcc.dg/asan/pr63845.c -Os (test for excess errors)
    PASS: gcc.dg/asan/pr63845.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
    PASS: gcc.dg/asan/pr63845.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)

But!

    $ cat gcc/testsuite/gcc/dbg.log
    expect: does "fPIC170653.c:3:13: internal compiler error: Segmentation fault\r\n" (spawn_id exp7) match regular expression ".+"? (No Gate, RE only) gate=yes re=yes
    compiler exited with status 1

So the problem manifests itself during the test run, but the runner fails to recognize it for some reason.
[Bug target/112986] s390x gcc O2, O3: Incorrect logic operation in < comparison with the same values
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112986 --- Comment #4 from Ilya Leoshkevich --- Hi, Nina fixed this in v8.0.0 (https://gitlab.com/qemu-project/qemu/-/commit/54fce97cfcaf5463ee5f325bc1f1d4adc2772f38). The fix was backported to v7.2.2 (https://gitlab.com/qemu-project/qemu/-/commit/17b032c6598ea756889f25e8d3e4cd9f2036669b), but not to v6.
[Bug target/106342] [12/13/14 Regression] internal compiler error: in extract_insn, at recog.cc:2791 since r12-4240-g2b8453c401b699
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106342 --- Comment #10 from Ilya Leoshkevich ---
This bug was fixed and backported to gcc-12:

    commit 06254d97b8fa3a5d1c8b6b4e091d851700801385
    Author: Ilya Leoshkevich
    Date:   Fri Jul 29 16:14:10 2022 +0200

        PR106342 - IBM zSystems: Provide vsel for all vector modes

        dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3 produces an insn that vsel is supposed to recognize, but can't, because it's not defined for V2SF. Fix by defining it for all vector modes supported by copysign3.

        gcc/ChangeLog:

            * config/s390/vector.md (V_HW_FT): New iterator.
            * config/s390/vx-builtins.md (vsel): Use V_HW_FT instead of V_HW.

        (cherry picked from commit 2f17f489de47d46626ed85804c3b810547ef550e)

I think it should be closed.
[Bug target/93242] [MIPS] patchable-function-entry broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93242 --- Comment #11 from Ilya Leoshkevich --- I see. It would be good to update https://gcc.gnu.org/gcc-9/ then - e.g. https://gcc.gnu.org/gcc-8/ says "This release series is no longer supported".
[Bug target/93242] [MIPS] patchable-function-entry broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93242 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #9 from Ilya Leoshkevich --- Would it be possible to backport this to gcc-9? Linux kernel will start using patchable_function_entry soon, and there are problems with s390x, which this patch fixes as well: https://lore.kernel.org/bpf/9099057e-124c-8f30-c29d-54be85eee...@iogearbox.net/ There is a workaround for now, but it would be good to have this fixed in all the maintained gccs (gcc-8 is no longer maintained as far as I can see, so this leaves only gcc-9).
[Bug target/106342] [12/13 Regression] internal compiler error: in extract_insn, at recog.cc:2791 since r12-4240-g2b8453c401b699
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106342 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #6 from Ilya Leoshkevich --- Maybe that's something obvious, but still: wouldn't adding V1SF, V2SF, and V1DF to vsel also work? E.g. by changing it from using V_HW to using VT.
[Bug c++/100853] internal compiler error: in cp_tree_equal, at cp/tree.c:4148
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100853 --- Comment #1 from Ilya Leoshkevich ---
Created attachment 50903
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50903&action=edit
repro
[Bug c++/100853] New: internal compiler error: in cp_tree_equal, at cp/tree.c:4148
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100853

Bug ID: 100853
Summary: internal compiler error: in cp_tree_equal, at cp/tree.c:4148
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---

    $ cat cp-tree-equal.cpp
    typedef struct a *b; template struct c { d({ b *e; __typeof (*({ __typeof *e f; f})).g const f (({ __typeof (*({ int h; f} )).g const f (

    $ gcc/cc1plus -fno-PIE -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables cp-tree-equal.cpp
    ...
    cp-tree-equal.cpp:8:37: internal compiler error: in cp_tree_equal, at cp/tree.c:4148
        8 | f} )).g const f (
          | ^
    0x7062e9 cp_tree_equal(tree_node*, tree_node*) ../../gcc/cp/tree.c:4148
    0xbd1d3e cp_tree_equal(tree_node*, tree_node*) ../../gcc/cp/tree.c:4138
    0xbd1d3e cp_tree_equal(tree_node*, tree_node*) ../../gcc/cp/tree.c:4138
    0xbd1d3e cp_tree_equal(tree_node*, tree_node*) ../../gcc/cp/tree.c:4138
    0xbd1d3e cp_tree_equal(tree_node*, tree_node*) ../../gcc/cp/tree.c:4138
    0xbdc784 structural_comptypes ../../gcc/cp/typeck.c:1491
    0xae2ffc check_local_shadow ../../gcc/cp/name-lookup.c:3264
    0xae2ffc do_pushdecl ../../gcc/cp/name-lookup.c:3773
    0xae39b4 pushdecl(tree_node*, bool) ../../gcc/cp/name-lookup.c:3852
    0xa3995e start_decl(cp_declarator const*, cp_decl_specifier_seq*, int, tree_node*, tree_node*, tree_node**) ../../gcc/cp/decl.c:5591
    0xb2dd61 cp_parser_init_declarator ../../gcc/cp/parser.c:21802
    0xb093cd cp_parser_simple_declaration ../../gcc/cp/parser.c:14487
    0xb0b0a9 cp_parser_declaration_statement ../../gcc/cp/parser.c:13622
    0xb0bc0b cp_parser_statement ../../gcc/cp/parser.c:11848
    0xb0ce0e cp_parser_statement_seq_opt ../../gcc/cp/parser.c:12215
    0xb0cee8 cp_parser_compound_statement ../../gcc/cp/parser.c:12164
    0xb10473 cp_parser_statement_expr ../../gcc/cp/parser.c:5142
    0xb10473 cp_parser_primary_expression ../../gcc/cp/parser.c:5549
    0xb11b80 cp_parser_postfix_expression ../../gcc/cp/parser.c:7528
    0xb243af cp_parser_unary_expression ../../gcc/cp/parser.c:8849

Found when reducing a testcase for another problem.
[Bug middle-end/100278] IBM Z: Segmentation fault when building valgrind with -march=z14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100278 Ilya Leoshkevich changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #6 from Ilya Leoshkevich --- Fixed, thanks!
[Bug middle-end/100278] New: IBM Z: Segmentation fault when building valgrind with -march=z14
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100278

Bug ID: 100278
Summary: IBM Z: Segmentation fault when building valgrind with -march=z14
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---

Minimized test:

    $ cat test.c
    a() {
      register b asm("");
      if (b)
        b = 1;
      for (; b;)
        ;
    }

    $ $HOME/gcc/build/dist/bin/gcc -m64 -O2 -g -finline-functions -fno-stack-protector -fno-builtin -fomit-frame-pointer -fstrict-aliasing -march=z14 -c test.c
    test.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
        1 | a() {
          | ^
    test.c: In function ‘a’:
    test.c:2:12: warning: type defaults to ‘int’ in declaration of ‘b’ [-Wimplicit-int]
        2 |   register b asm("");
          |            ^
    during GIMPLE pass: pre
    test.c:1:1: internal compiler error: Segmentation fault
        1 | a() {
          | ^
    0x1a33499 crash_signal ../../gcc/toplev.c:327
    0x11e9bf2 contains_struct_check(tree_node*, tree_node_structure_enum, char const*, int, char const*) ../../gcc/tree.h:3466
    0x1c57673 compute_avail ../../gcc/tree-ssa-pre.c:4163
    0x1c580d9 execute ../../gcc/tree-ssa-pre.c:4370

Bisect points to:

    commit 577d05fc914338cd7ddc254f3bee4532331f5c13
    Author: Richard Biener
    Date:   Tue Mar 9 09:29:29 2021 +0100

        tree-optimization/99473 - more cselim
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #9 from Ilya Leoshkevich ---
Created attachment 50679
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50679&action=edit
regtesting this patch now
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #8 from Ilya Leoshkevich ---
Yeah, inline asm seems to be problematic:

    /home/iii/gcc/build/gcc/xgcc -B/home/iii/gcc/build/gcc/ /home/iii/gcc/gcc/testsuite/gcc.target/s390/vector/long-double-asm-hardreg.c -fdiagnostics-plain-output -O2 -march=z14 -mzarch -S -o long-double-asm-hardreg.s

with the patch from comment 2 produces:

    foo:
    .LFB0:
            .cfi_startproc
            larl    %r5,.L4
            vl      %v0,.L5-.L4(%r5),3
    #APP
    # 10 "/home/iii/gcc/gcc/testsuite/gcc.target/s390/vector/long-double-asm-hardreg.c" 1
            # %v0
    # 0 "" 2
    #NO_APP
            br      %r14

`vl %v0,.L5-.L4(%r5),3` loads 1.0L into %v0[0:128]. However, it should be loaded into %v0[0:64] . %v2[0:64]. With the patch from comment 3 I get:

    foo:
    .LFB0:
            .cfi_startproc
            larl    %r5,.L4
            ld      %f0,.L5-.L4(%r5)
            ld      %f2,.L5-.L4+8(%r5)
    #APP
    # 10 "/home/iii/gcc/gcc/testsuite/gcc.target/s390/vector/long-double-asm-hardreg.c" 1
            # %f0
    # 0 "" 2
    #NO_APP
            br      %r14

which is correct, but in the general case the exact register that the user requested is not honored.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217 --- Comment #5 from Ilya Leoshkevich ---
That would be an ideal solution, but I wonder how to implement it? Suppose we find a way to convince expand to pick FPRX2mode for such a long double. What if the following comes up?

    register long double x asm ("v0"); /* FPRX2mode */
    long double y;                     /* TFmode */
    x += y;                            /* convert? */

Would it be feasible to also teach expand to do the mode conversions?

One other alternative might be to detect `register long double asm("fN")` declarations and go back to using floating point register pairs for functions that contain them.
[Bug target/100217] [11/12 Regression] ICE when building valgrind testsuite with -march=z14 since r11-7552
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100217

Ilya Leoshkevich changed:

    What    |Removed |Added
    CC      |        |iii at linux dot ibm.com

--- Comment #3 from Ilya Leoshkevich ---
The main problem here is that `register long double f0 asm ("f0")` does not make sense on z14 anymore. long doubles are stored in vector registers now, not in floating-point register pairs. If we skip the hard reg, the code will end up having the following semantics:

    vr0[0:128] = 1.0L;
    asm("/* expect the value in vr0[0:64] . vr2[0:64] */");

and fail at run time. So I think it's better to use the "best effort" approach and force it into a pseudo, even if this would mean that the user-specified register is not honored:

    --- a/gcc/config/s390/s390.c
    +++ b/gcc/config/s390/s390.c
    @@ -16814,6 +16814,12 @@ s390_md_asm_adjust (vec , vec ,
           gcc_assert (allows_reg);
           /* Copy input value from a vector register into a FPR pair. */
           rtx fprx2 = gen_reg_rtx (FPRX2mode);
    +      if (REG_P (inputs[i]) && HARD_REGISTER_P (inputs[i]))
    +        {
    +          rtx orig_input = inputs[i];
    +          inputs[i] = gen_reg_rtx (TFmode);
    +          emit_move_insn (inputs[i], orig_input);
    +        }
           emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
           inputs[i] = fprx2;
           input_modes[i] = FPRX2mode;

I need to check whether we can keep the output logic as is.

Ideally the code should be adapted and use the __LONG_DOUBLE_VX__ macro like this:

    #ifdef __LONG_DOUBLE_VX__
    register long double f0 asm ("v0");
    #else
    register long double f0 asm ("f0");
    #endif
    f0 = 1.0L;
    #ifdef __LONG_DOUBLE_VX__
    asm("" : : "v" (f0));
    #else
    asm("" : : "f" (f0));
    #endif

Maybe a warning recommending to do this should be printed.
[Bug libgomp/98738] task-detach-6.f90 hangs intermittently
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98738 --- Comment #1 from Ilya Leoshkevich --- I realized I didn't post the command line I used to build task-detach-6.exe (there are multiple variants of this test); here it is: gcc/build/x86_64-pc-linux-gnu/libgomp/testsuite$ ../../../../build/./gcc/xgcc -B../../../../build/./gcc/ -B../../../../install/x86_64-pc-linux-gnu/bin/ -B../../../../install/x86_64-pc-linux-gnu/lib/ -isystem ../../../../install/x86_64-pc-linux-gnu/include -isystem ../../../../install/x86_64-pc-linux-gnu/sys-include -fchecking=1 ../../../../libgomp/testsuite/libgomp.fortran/task-detach-6.f90 -B../../../../build/x86_64-pc-linux-gnu/./libgomp/ -B../../../../build/x86_64-pc-linux-gnu/./libgomp/.libs -I../../../../build/x86_64-pc-linux-gnu/./libgomp -I../../../../libgomp/testsuite/../../include -I../../../../libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -B../../../../build/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ -O1 -B../../../../build/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs -fintrinsic-modules-path=../../../../build/x86_64-pc-linux-gnu/./libgomp -L../../../../build/x86_64-pc-linux-gnu/./libgomp/.libs -L../../../../build/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ -L../../../../build/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs -lgfortran -foffload=-lgfortran -lquadmath -lm -o ./task-detach-6.exe
[Bug libgomp/98738] New: task-detach-6.f90 hangs intermittently
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98738

Bug ID: 98738
Summary: task-detach-6.f90 hangs intermittently
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
CC: jakub at gcc dot gnu.org
Target Milestone: ---

I'm currently on commit 2e43880dbd4c. Building task-detach-6.exe and running it in a loop eventually leads to a hang (this might take a while; during the last run it happened after 7k runs):

    gcc/build/x86_64-pc-linux-gnu/libgomp/testsuite$ while true; do LD_LIBRARY_PATH=../../../../install/lib64 ./task-detach-6.exe; echo -n .; done

I first spotted this on s390 and then checked on x86_64; the issue is reproducible on both.
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208 Ilya Leoshkevich changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #11 from Ilya Leoshkevich --- I've committed the fix: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=057dc81f820b I think I messed up the commit message and Bugzilla did not link the commit to this bug. Anyway, marking this as RESOLVED/FIXED now.
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208 --- Comment #10 from Ilya Leoshkevich --- I've posted the combined fixincludes/tests/base/sys/types.h + genfixes patch here: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561601.html
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208 --- Comment #8 from Ilya Leoshkevich ---
Hm, can it be that fixincludes/tests/base/sys/types.h simply needs to be updated? For example, here is a similar commit:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=081b3517b4df826ac917147eb906bbb8fc6528b1

There, both fixincludes/inclhack.def and fixincludes/tests/base/sys/inttypes.h are updated. I tried the following and it helped:

    diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
    index 683b5e9..a318f9b 100644
    --- a/fixincludes/tests/base/sys/types.h
    +++ b/fixincludes/tests/base/sys/types.h
    @@ -9,6 +9,11 @@
    +#if defined( AIX_PHYSADR_T_CHECK )
    +typedef struct __physadr_s {
    +#endif /* AIX_PHYSADR_T_CHECK */
    +
    +
     #if defined( GNU_TYPES_CHECK )
     #if !defined(_GCC_PTRDIFF_T)
     #define _GCC_PTRDIFF_T
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208 --- Comment #7 from Ilya Leoshkevich ---
Still a similar error:

    sys/types.h /home/iii/gcc/fixincludes/tests/base/sys/types.h differ: byte 243, line 12
    *** sys/types.h 2020-12-09 15:57:57.575959676 +
    --- /home/iii/gcc/fixincludes/tests/base/sys/types.h 2020-04-14 11:43:52.317860128 +
    ***
    *** 9,20
    - #if defined( AIX_PHYSADR_T_CHECK )
    - typedef struct __physadr_s { int r[1]; } *physadr_t;
    -
    - #endif /* AIX_PHYSADR_T_CHECK */
    -
    -
      #if defined( GNU_TYPES_CHECK )
      #if !defined(_GCC_PTRDIFF_T)
      #define _GCC_PTRDIFF_T
    --- 9,14
    There were fixinclude test FAILURES

What I don't quite get is why this kicks in on Linux. It seems to be gated by `mach = "*-*-aix*"`, just like other similar fixes which don't cause issues.
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208 --- Comment #5 from Ilya Leoshkevich --- Oh, just in case: gcc121 is x86_64 CentOS Linux 7, not AIX.
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208 --- Comment #4 from Ilya Leoshkevich ---
Unfortunately not, with this patch I get:

    sys/types.h gcc/fixincludes/tests/base/sys/types.h differ: byte 243, line 12
    *** sys/types.h 2020-12-09 15:46:15.843503181 +
    --- gcc/fixincludes/tests/base/sys/types.h 2020-04-14 11:43:52.317860128 +
    ***
    *** 9,19
    - #if defined( AIX_PHYSADR_T_CHECK )
    - typedef struct __physadr_s { random text } *physadr_t;
    - #endif /* AIX_PHYSADR_T_CHECK */
    -
    -
      #if defined( GNU_TYPES_CHECK )
      #if !defined(_GCC_PTRDIFF_T)
      #define _GCC_PTRDIFF_T
    --- 9,14
    There were fixinclude test FAILURES
[Bug testsuite/98208] make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208

Ilya Leoshkevich changed:

    What    |Removed |Added
    CC      |        |nathan at acm dot org

--- Comment #1 from Ilya Leoshkevich ---
Bisect points to:

    commit 92648faa1cb2b28685f3b3dccfdfc4b1ed2c5a7b
    Author: Nathan Sidwell
    Date:   Wed Nov 18 10:33:30 2020 -0800

        aix: Fixinclude
[Bug testsuite/98208] New: make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98208

Bug ID: 98208
Summary: make check's check-fixincludes fails in sys/types.h around AIX_PHYSADR_T_CHECK
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: testsuite
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---

With the recent master (f1b6e17b3f75) make check fails (on gcc121 machine) as follows:

    sys/types.h gcc/regtest-f1b6e17b3f75/fixincludes/tests/base/sys/types.h differ: byte 243, line 12
    *** sys/types.h 2020-12-08 20:08:54.944208838 +
    --- gcc/regtest-f1b6e17b3f75/fixincludes/tests/base/sys/types.h 2020-12-08 18:36:20.011729819 +
    ***
    *** 9,19
    - #if defined( AIX_PHYSADR_T_CHECK )
    - typedef struct __physadr_s {
    - #endif /* AIX_PHYSADR_T_CHECK */
    -
    -
      #if defined( GNU_TYPES_CHECK )
      #if !defined(_GCC_PTRDIFF_T)
      #define _GCC_PTRDIFF_T
    --- 9,14
    There were fixinclude test FAILURES

Might be related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97865, but I haven't bisected yet.
[Bug tree-optimization/98113] [11 Regression] popcnt is not vectorized on s390 since f5e18dd9c7da
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113 --- Comment #6 from Ilya Leoshkevich --- With the patch, vxe/popcount-1.c works on s390 again: vpopctf: .LFB2: .cfi_startproc vpopctf %v24,%v24 br %r14 Thanks!
[Bug tree-optimization/98113] New: [11 Regression] popcnt is not vectorized on s390 since f5e18dd9c7da
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98113

Bug ID: 98113
Summary: [11 Regression] popcnt is not vectorized on s390 since f5e18dd9c7da
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---

s390's vxe/popcount-1.c began to fail after the PR96789 fix. The reason is that for the following source code

    uv4si __attribute__((noinline))
    vpopctf (uv4si a)
    {
      uv4si r;
      int i;

      for (i = 0; i < 4; i++)
        r[i] = __builtin_popcount (a[i]);

      return r;
    }

FRE turned

    _4 = BIT_FIELD_REF ;
    _11 = __builtin_popcountD.1211 (_4);
    _18 = (unsigned intD.9) _11;
    BIT_FIELD_REF = _18;
    i_20 = 1;
    ivtmp_21 = 3;
    _25 = VIEW_CONVERT_EXPR(aD.2283)[i_20];
    _26 = __builtin_popcountD.1211 (_25);
    _27 = (unsigned intD.9) _26;
    VIEW_CONVERT_EXPR(rD.2286)[i_20] = _27;
    i_29 = i_20 + 1;
    ivtmp_30 = ivtmp_21 + 4294967295;
    _34 = VIEW_CONVERT_EXPR(aD.2283)[i_29];
    _35 = __builtin_popcountD.1211 (_34);
    _36 = (unsigned intD.9) _35;
    VIEW_CONVERT_EXPR(rD.2286)[i_29] = _36;
    i_38 = i_29 + 1;
    ivtmp_39 = ivtmp_30 + 4294967295;
    _1 = VIEW_CONVERT_EXPR(aD.2283)[i_38];
    _2 = __builtin_popcountD.1211 (_1);
    _3 = (unsigned intD.9) _2;
    VIEW_CONVERT_EXPR(rD.2286)[i_38] = _3;
    i_10 = i_38 + 1;
    ivtmp_16 = ivtmp_39 + 4294967295;
    _7 = rD.2286;
    rD.2286 ={v} {CLOBBER};
    return _7;

into

    _4 = BIT_FIELD_REF ;
    _11 = __builtin_popcountD.1211 (_4);
    _18 = (unsigned intD.9) _11;
    r_14 = BIT_INSERT_EXPR ;
    _25 = BIT_FIELD_REF ;
    _26 = __builtin_popcountD.1211 (_25);
    _27 = (unsigned intD.9) _26;
    r_33 = BIT_INSERT_EXPR ;
    _34 = BIT_FIELD_REF ;
    _35 = __builtin_popcountD.1211 (_34);
    _36 = (unsigned intD.9) _35;
    r_32 = BIT_INSERT_EXPR ;
    _1 = BIT_FIELD_REF ;
    _2 = __builtin_popcountD.1211 (_1);
    _3 = (unsigned intD.9) _2;
    r_31 = BIT_INSERT_EXPR ;
    _7 = r_31;
    return _7;

that is, it replaced a sequence of stores with a sequence of BIT_INSERT_EXPRs. slp1 now says: "missed: not vectorized: no grouped stores in basic block", presumably because it doesn't understand BIT_INSERT_EXPRs.
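For experimenting outside the testsuite, the function from the report can be made self-contained. The typedef below is an assumption: the report does not show how `uv4si` is declared, so this sketch uses the GCC/Clang `vector_size` extension for a 16-byte vector of four unsigned ints.

```c
/* Assumption: uv4si is a 16-byte vector of four unsigned ints; the report
   does not include the typedef. */
typedef unsigned int uv4si __attribute__((vector_size(16)));

/* Per-element population count, as in the report. Each lane is read,
   passed to __builtin_popcount, and written back by subscript. */
uv4si __attribute__((noinline))
vpopctf(uv4si a)
{
    uv4si r;
    int i;

    for (i = 0; i < 4; i++)
        r[i] = __builtin_popcount(a[i]);

    return r;
}
```

For an input of `{ 0, 1, 0xff, 0xffffffff }` the result is `{ 0, 1, 8, 32 }` regardless of whether the loop is vectorized. Whether it is vectorized can be checked with GCC's `-fopt-info-vec-missed`, which reports missed vectorization opportunities.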
[Bug target/97866] [11 Regression] bootstrap error in libasan building a s390x-linux-gnu cross compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97866 --- Comment #3 from Ilya Leoshkevich ---
I believe it's already fixed by:

    commit 253c415a1acba50711c82693426391743ac18040
    Author: Vladimir N. Makarov
    Date:   Sun Nov 15 11:22:19 2020 -0500

        Do not put reload insns in the last empty BB.

        gcc/
            * lra.c (lra_process_new_insns): Don't put reload insns in the last empty BB.

Cherry-picking it helps, and the comment from this commit describes what is happening here: "Do not put reload insns if it is the last BB without actual insns. In this case the reload insns can get null BB after emitting".
[Bug target/97866] [11 Regression] bootstrap error in libasan building a s390x-linux-gnu cross compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97866 --- Comment #2 from Ilya Leoshkevich --- Never mind, I managed to reproduce it now: ubuntu-focal-amd64$ git rev-parse --short HEAD 77f67db2a47 ubuntu-focal-amd64$ ../configure --target=s390x-linux-gnu --exec-prefix=/usr --disable-bootstrap --disable-multilib --enable-languages=c,c++ ubuntu-focal-amd64$ cat test.cpp typedef long a; typedef void (*b)(a, a, void *); class c { unsigned char *m_fn1(); char d(a); a e(a); void f(); }; b g; void *h; void c::f() { for (a j; j < 6; j++) { unsigned char *flags = m_fn1(); for (a i, k; i < k; i++) { if (flags) continue; int *ff = reinterpret_cast<int *>(d(i)); g(a(ff), e(j), h); } } } ubuntu-focal-amd64$ gcc/xgcc -Bgcc -std=gnu++14 -O2 -c test.cpp during RTL pass: reload test.cpp: In member function ‘void c::f()’: test.cpp:21:1: internal compiler error: in get_insn_freq, at lra.c:1554
[Bug target/97866] [11 Regression] bootstrap error in libasan building a s390x-linux-gnu cross compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97866 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #1 from Ilya Leoshkevich --- Could you please share your configure flags? On x86_64 Ubuntu 20.04 the following worked fine: ../configure --target=s390x-linux-gnu --exec-prefix=/usr --disable-bootstrap --disable-multilib --enable-languages=c,c++
[Bug rtl-optimization/97326] New: [11 Regression] s390: ICE in do_store_flag after 10843f830350
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97326 Bug ID: 97326 Summary: [11 Regression] s390: ICE in do_store_flag after 10843f830350 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: iii at linux dot ibm.com Target Milestone: --- The following (minimized from gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c) produces an ICE on s390: build$ cat x.c long *a; double *b; c; d() { for (; c; c++) a[c] = 0 <= b[c] && 0 >= b[c]; } build$ gcc/cc1 -O3 -march=z15 x.c during RTL pass: expand x.c: In function 'd': x.c:4:1: internal compiler error: in do_store_flag, at expr.c:12388 0x150dd15 do_store_flag ../../gcc/expr.c:12388 0x1505e1b expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../gcc/expr.c:9621 0x14ec50d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/expr.c:10165 0x14ef981 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) ../../gcc/expr.c:8480 0x1635191 expand_normal ../../gcc/expr.h:288 0x1635191 expand_vect_cond_optab_fn ../../gcc/internal-fn.c:2602 0x136d83d expand_call_stmt ../../gcc/cfgexpand.c:2612 0x136d83d expand_gimple_stmt_1 ../../gcc/cfgexpand.c:3686 0x136d83d expand_gimple_stmt ../../gcc/cfgexpand.c:3851 0x1374f4b expand_gimple_basic_block ../../gcc/cfgexpand.c:5892 0x1377963 execute ../../gcc/cfgexpand.c:6576 Bisect points to: commit 10843f8303509fcba880c6c05c08e4b4ccd24f36 Author: Richard Biener Date: Thu Sep 24 10:14:33 2020 +0200 tree-optimization/97085 - fold some trivial bool vector ?:
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #17 from Ilya Leoshkevich --- Created attachment 48917 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48917=edit aarch64 native build fix Could you please try the attached patch? It fixed the issue for me, and survived bootstrap/regtest on x86_64.
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #16 from Ilya Leoshkevich --- I finally managed to reproduce this by doing `./configure --host=aarch64-none-linux-gnu` on gcc113. The problem is that `CXX_FOR_BUILD` doesn't seem to be set correctly - normally it's `g++-4.8.1 -std=gnu++11`, but in this case it's just `g++`. I'm currently trying to wrap my head around autotools build setup in order to figure out where exactly things went wrong.
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #14 from Ilya Leoshkevich --- gcc113 has 4.8.4, which is a bit newer. But in any case, according to https://gcc.gnu.org/projects/cxx-status.html, gcc should support nullptr since 4.6. Could you please post the failing compiler invocation command? In the meantime I will build gcc 4.8.1 on gcc113 and try to build master with it.
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #12 from Ilya Leoshkevich --- I managed to bootstrap and regtest upstream commit 6e41c27bf549 on the gcc113 farm machine. Two questions: - What is your system compiler version? For GCC 11, a C++11 compiler is required: https://gcc.gnu.org/install/prerequisites.html - What exactly is a "native aarch64 build" - is it simply building the compiler on aarch64, or something else?
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #11 from Ilya Leoshkevich --- Sorry about that! I will have a look.
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #8 from Ilya Leoshkevich --- Created attachment 48750 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48750=edit proposed patch (tests are running)
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #7 from Ilya Leoshkevich --- Would it then be OK to change the last arguments in calls to functions with __attribute__((sentinel)) from NULL to nullptr? I can make a patch for this.
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #5 from Ilya Leoshkevich --- I'm sorry, I should not have written (uintptr_t)0 - I just used it as a synonym for a "pointer-sized int". Would allowing 0L as a sentinel value be a reasonable thing?
[Bug c++/95700] read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 --- Comment #4 from Ilya Leoshkevich --- Created attachment 48740 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48740=edit preprocessed output In the preprocessed output I see that gcc's stddef.h is used, but most likely `#define NULL 0L` comes from some other musl header, since musl defines it in like 8 places.
[Bug c++/95700] New: read-md.c: "missing sentinel in function call" when building gcc with musl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95700 Bug ID: 95700 Summary: read-md.c: "missing sentinel in function call" when building gcc with musl Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: iii at linux dot ibm.com Target Milestone: --- I'm trying to bootstrap gcc on gcc301 with --disable-multilib --build=x86_64-alpine-linux-musl. The following error occurs: /home/iii/gcc/regtest-f8a59086423e/build/./prev-gcc/xg++ -B/home/iii/gcc/regtest-f8a59086423e/build/./prev-gcc/ -B/home/iii/gcc/regtest-f8a59086423e/install/x86_64-alpine-linux-musl/bin/ -nostdinc++ -B/home/iii/gcc/regtest-f8a59086423e/build/prev-x86_64-alpine-linux-musl/libstdc++-v3/src/.libs -B/home/iii/gcc/regtest-f8a59086423e/build/prev-x86_64-alpine-linux-musl/libstdc++-v3/libsupc++/.libs -I/home/iii/gcc/regtest-f8a59086423e/build/prev-x86_64-alpine-linux-musl/libstdc++-v3/include/x86_64-alpine-linux-musl -I/home/iii/gcc/regtest-f8a59086423e/build/prev-x86_64-alpine-linux-musl/libstdc++-v3/include -I/home/iii/gcc/regtest-f8a59086423e/libstdc++-v3/libsupc++ -L/home/iii/gcc/regtest-f8a59086423e/build/prev-x86_64-alpine-linux-musl/libstdc++-v3/src/.libs -L/home/iii/gcc/regtest-f8a59086423e/build/prev-x86_64-alpine-linux-musl/libstdc++-v3/libsupc++/.libs -c -g -O2 -fno-checking -gtoggle -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. 
-Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include -I../../gcc/../libcpp/include \ -o build/read-md.o ../../gcc/read-md.c ../../gcc/read-md.c: In member function ‘const char* md_reader::join_c_conditions(const char*, const char*)’: ../../gcc/read-md.c:158:58: error: missing sentinel in function call [-Werror=format=] 158 | result = concat ("(", cond1, ") && (", cond2, ")", NULL); | ^ ../../gcc/read-md.c: In member function ‘void md_reader::handle_enum(file_location, bool)’: ../../gcc/read-md.c:947:58: error: missing sentinel in function call [-Werror=format=] 947 |value_name = concat (def->name, "_", name.string, NULL); | ^ ../../gcc/read-md.c: In member function ‘void md_reader::handle_include(file_location)’: ../../gcc/read-md.c:1072:57: error: missing sentinel in function call [-Werror=format=] 1072 |pathname = concat (stackp->fname, sep, filename, NULL); | ^ ../../gcc/read-md.c:1085:47: error: missing sentinel in function call [-Werror=format=] 1085 | pathname = concat (m_base_dir, filename, NULL); | ^ cc1plus: all warnings being treated as errors musl has the following commit: https://git.musl-libc.org/cgit/musl/commit/?id=c8a9c22173f485c8c053709e1dfa0a617cb6be1a, which suggests that C++ (as opposed to plain C) should allow plain (uintptr_t)0 as a sentinel value. gcc wants either a pointer or __null though: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/c-family/c-common.c;h=b1379faa412e3646a443969c0067f5c4fb23e107;hb=929fd91ba975eebf9e57f7f092041271dcaf0c34#l5385 Would it be possible to allow (uintptr_t)0 as a valid sentinel value for C++? Or is it musl that is wrong here?
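To illustrate the sentinel question above, here is a minimal sketch of a variadic helper in the style of concat (), terminated by nullptr rather than NULL. The names join_strings and join_condition are hypothetical and this is not libiberty's implementation; the sketch only assumes a compiler that understands __attribute__((sentinel)), such as GCC or Clang. Since nullptr is never an integer, the call should not depend on how the C library happens to define NULL for C++.

```cpp
#include <cstdarg>
#include <string>

// Hypothetical stand-in for a concat()-style helper (not libiberty's code).
// The sentinel attribute asks the compiler to diagnose calls whose last
// argument is not a null pointer.
__attribute__((sentinel))
std::string join_strings(const char *first, ...) {
    std::string result;
    va_list ap;
    va_start(ap, first);
    // Append arguments until the terminating null pointer is reached.
    for (const char *s = first; s != nullptr; s = va_arg(ap, const char *))
        result += s;
    va_end(ap);
    return result;
}

// Mirrors the shape of the call in join_c_conditions, but terminates the
// argument list with nullptr, which is passed through "..." as a null void *.
std::string join_condition(const char *cond1, const char *cond2) {
    return join_strings("(", cond1, ") && (", cond2, ")", nullptr);
}
```

With this shape, join_condition ("a", "b") yields the string "(a) && (b)", and a libc that defines NULL as 0L no longer matters for the call site.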
[Bug tree-optimization/94792] New: Missed SLP optimization in pr65930-2.c variation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94792 Bug ID: 94792 Summary: Missed SLP optimization in pr65930-2.c variation Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: iii at linux dot ibm.com Target Milestone: --- gcc commit cf3a909cf455. Consider the following variation of pr65930-2.c: $ cat pr65930-2b.c #include "tree-vect.h" int __attribute__((noipa)) bar (unsigned int *x, int n) { unsigned int sum = 4; x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__); for (int i = 0; i < n; ++i) sum += x[i*4+0]+ x[i*4 + 1] + x[i*4 + 2] + x[i*4 + 3]; return sum; } int main () { static int a[16] __attribute__((aligned(__BIGGEST_ALIGNMENT__))) = { 1, 3, 5, 8, 9, 10, 17, 18, 23, 29, 30, 55, 42, 2, 3, 1 }; check_vect (); if (bar (a, 4) != 260) abort (); return 0; } This differs from pr65930-2.c only in that the type of sum is unsigned int, which should mean one cast less. And yet: $ gcc pr65930-2b.c -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -fdiagnostics-urls=never -msse2 -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./pr65930-2.exe ; grep SLP pr65930-2b.c.161t.vect | wc -l 0 whereas for the original version: $ gcc pr65930-2.c -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -fdiagnostics-urls=never -msse2 -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -lm -o ./pr65930-2.exe ; grep SLP pr65930-2.c.161t.vect | wc -l 33 The resulting assembly is also noticeably larger and uses regular adds for at least part of the data.
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #29 from Ilya Leoshkevich --- Created attachment 47463 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47463=edit nop plugin Hi Maxim, Just to clear my conscience, could you please try the nop trick in your setup? I normally use the attached plugin for that. Just build it and add e.g. -fplugin=$HOME/gcc-nop-plugin/libgcc_nop_plugin.so -fplugin-arg-libgcc_nop_plugin-S_regmatch=4 to the compiler flags. Best regards, Ilya
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #27 from Ilya Leoshkevich --- With -DSPEC_CPU -DNDEBUG -DPERL_CORE -O3 -save-temps=obj -fopt-info-vec-optimized -DSPEC_CPU_LP64 -DSPEC_CPU_LINUX_X64 -fgnu89-inline on gcc113 I can see a 2% slowdown: r277511 (without this fix): 880.09s r277515 (with this fix): 897.85s The function that degraded the most is indeed S_regmatch: $ perf diff perf-9760321.data perf-44b2b4c.data 32.24% exe [.] S_regmatch 8.92% exe [.] S_find_byclass.isra.0 6.80% +0.28% libc-2.19.so [.] 0x0007dec0 5.20% exe [.] S_regtry However, the "shape" of S_regmatch did not change, that is, when all offsets and register numbers are replaced with "x" in the objdump output, the old and the new versions are identical. This hints at some microarchitectural effect - aliasing in the branch predictor maybe? From my perspective, this happens too often, so I use the following test to rule this out: just add a nop at the beginning of the problematic function. This changes all the offsets and makes the aliasing situation completely different. And indeed, by adding a single nop to S_regmatch, I get wildly different results (for now this is just 1 repeat, I will run best-of-3 overnight): r277511 (without this fix): 929.1s r277515 (with this fix): 931.48s
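The "shape" comparison described above could be sketched roughly as follows. This is not the script actually used; normalize_disassembly_line is a hypothetical helper, and it assumes AArch64-style register names (x0..x30, w0..w30), numbers written either as decimal or with a 0x prefix, and an address column that has already been stripped from the objdump output.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace register numbers and numeric offsets in one
// line of disassembly with "x", so that two versions of a function can be
// compared for identical instruction sequences.
std::string normalize_disassembly_line(const std::string &line) {
    // Assumption: AArch64-style registers such as x1 or w12.
    static const std::regex reg(R"(\b[xw][0-9]+\b)");
    // Assumption: immediates and offsets are decimal or 0x-prefixed hex.
    static const std::regex num(R"(\b(0x[0-9a-fA-F]+|[0-9]+)\b)");
    std::string out = std::regex_replace(line, reg, "x");
    return std::regex_replace(out, num, "x");
}
```

Running every line of both disassemblies through such a function and diffing the results shows whether only offsets and register allocation changed, which is the situation described for S_regmatch.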
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #26 from Ilya Leoshkevich --- Whoops, I accidentally used a script I normally use for big-endian machines (s390 actually). But gcc113 is an APM X-Gene Mustang board. I'll try again with your flags and see if I can reproduce the regression there.
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #24 from Ilya Leoshkevich --- I got the following results on gcc113: 400.perlbench Compiler flags: -DSPEC_CPU -DNDEBUG -DPERL_CORE -march=native -g -O3 -funroll-loops -fopt-info-vec-optimized -DSPEC_CPU -DNDEBUG -DPERL_CORE -DSPEC_CPU_LINUX -DSPEC_CPU_BIGENDIAN -D_GNU_SOURCE -DSPEC_CPU_LP64 -fno-strict-aliasing -std=gnu90 r277511 (without this fix): 884.11s r277515 (with this fix): 874.93s Maxim, could you please share the compiler flags with which you are seeing the regression?
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #22 from Ilya Leoshkevich --- Hello Maxim, Sorry about that! I don't think it's possible to simply move jump threading back, since it's not correct to have it where it used to be. So I will build and run the new and the old 400.perlbench on gcc compile farm and see what else I can do about the difference. Best regards, Ilya
[Bug rtl-optimization/92430] [9/10 Regression] Compile-time hog w/ -Os -fno-if-conversion -fno-tree-dce -fno-tree-loop-optimize -fno-tree-vrp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92430 --- Comment #3 from Ilya Leoshkevich --- Findings so far: when we forward an edge like this: #0 redirect_edge_succ (e=0x76d73cc0, new_succ=0x76c2aa90) at ../.././gcc/cfg.c:368 #1 0x00a776ff in redirect_edge_succ_nodup (e=0x76d73cc0, new_succ=0x76c2aa90) at ../.././gcc/cfghooks.c:469 #2 0x00a9c18a in cfg_layout_redirect_edge_and_branch (e=0x76d73cc0, dest=0x76c2aa90) at ../.././gcc/cfgrtl.c:4500 #3 0x00a77419 in redirect_edge_and_branch (e=0x76d73cc0, dest=0x76c2aa90) at ../.././gcc/cfghooks.c:373 #4 0x02496e8d in try_forward_edges (mode=40, b=0x76d86680) at ../.././gcc/cfgcleanup.c:563 #5 0x024a2654 in try_optimize_cfg (mode=40) at ../.././gcc/cfgcleanup.c:2961 #6 0x024a2d1a in cleanup_cfg (mode=40) at ../.././gcc/cfgcleanup.c:3175 #7 0x024a2f29 in (anonymous namespace)::pass_jump_after_combine::execute (this=0x38a2b00) at ../.././gcc/cfgcleanup.c:3315 we don't seem to correctly update dominance info (if at all), making it inconsistent with the actual CFG. In this particular case, inconsistency makes the following call chain produce a loop in the dominator tree: #3 0x00b37638 in redirect_immediate_dominators (dir=CDI_DOMINATORS, bb=0x76c2ab60, to=0x76d867b8) at ../.././gcc/dominance.c:995 #4 0x00a7838c in merge_blocks (a=0x76d867b8, b=0x76c2ab60) at ../.././gcc/cfghooks.c:852 #5 0x024a1a1d in try_optimize_cfg (mode=40) at ../.././gcc/cfgcleanup.c:2825 #6 0x024a2d1a in cleanup_cfg (mode=40) at ../.././gcc/cfgcleanup.c:3175 #7 0x024a2f29 in (anonymous namespace)::pass_jump_after_combine::execute (this=0x38a2b00) at ../.././gcc/cfgcleanup.c:3315 which ultimately leads to the hang that we are observing.
[Bug rtl-optimization/92430] [9/10 Regression] Compile-time hog w/ -Os -fno-if-conversion -fno-tree-dce -fno-tree-loop-optimize -fno-tree-vrp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92430 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com, ||krebbel1 at de dot ibm.com --- Comment #2 from Ilya Leoshkevich --- I'm looking into this.
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #15 from Ilya Leoshkevich --- Created attachment 47059 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47059=edit proposed fix (without renaming the pass so far)
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #14 from Ilya Leoshkevich --- Created attachment 47058 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47058=edit temporary patch for finding out the number of threaded edges
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #12 from Ilya Leoshkevich --- > Well, it apparently has found new jump threading opportunities after > partition_blocks. Are such changes useful? Does it happen often? It's still combine that was responsible for this particular opportunity. I've added a simple counter of threaded edges and built two compiler versions: with and without the patch from comment 3. The value of the counter is the same in both cases for the code from this bug report. Furthermore, I've built SPEC 2006 and SPEC 2017 with vanilla and patched compilers and aggregated the counter values. When doing jump threading right after reload, 3889 edges are threaded. When doing jump threading right after combine, 3918 edges are threaded. Both figures are more or less the same; we even end up losing some opportunities (29 edges) if we delay jump threading until after reload.
[Bug tree-optimization/92115] [10 Regression] ICE in gimple_cond_get_ops_from_tree, at gimple-expr.c:577
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92115 --- Comment #6 from Ilya Leoshkevich --- > Am 16.10.2019 um 16:32 schrieb asolokha at gmx dot com > : > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92115 > > --- Comment #4 from Arseny Solokha --- > (In reply to Ilya Leoshkevich from comment #3) >> Arseny, how did you find this? Did you just run the regtest? I wonder why >> didn't I see it during my test runs. > > My test harness continuously compiles a corpus of C, C++, and Fortran code > with > the latest weekly trunk snapshot, picking a random set of compiler options and > parameters for each file. gcc and libstdc++ test suites constitute an > important > part of that corpus. When compiling files from these test suites, my test > harness ignores compiler options specified there for DejaGNU and uses its own > randomly chosen ones instead. Of course, this approach is not suitable for > testing run-time correctness. > > So, if there are no testcases in the test suite yet which could trigger that > specific code path in gcc internals, probably due to an unusual set of > compiler > options, your testing won't reveal a problem reported here. That is probably > OK, as regression testing have to be deterministic, after all. Hi Arseny, Did you per chance open-source it? Best regards, Ilya
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #10 from Ilya Leoshkevich --- > Question is how to figure out which to do when. I would always do the former before reload, and always the latter after reload. However, I have a concern regarding this approach: in more complicated cases instead of just a single 11/COLD we might have a larger lump of cold basic blocks. In order to avoid introducing new crossing edges, we would have to make them all hot (using e.g. a simple worklist algorithm). Is such an end result desirable? I'd also still like to understand the motivation behind doing this pass after reload. When I introduced it in r266734, the only goal was to clean up the CFG after combine. I was advised to put it where it is now, and back then I did not see any downsides to doing so. But now that this problem has arisen - what is the advantage of doing this after the following 16 additional passes? What would be the downside of doing it between pass_combine and pass_partition_blocks? NEXT_PASS (pass_combine); -- NEXT_PASS (pass_if_after_combine); NEXT_PASS (pass_partition_blocks); NEXT_PASS (pass_outof_cfg_layout_mode); NEXT_PASS (pass_split_all_insns); NEXT_PASS (pass_lower_subreg3); NEXT_PASS (pass_df_initialize_no_opt); NEXT_PASS (pass_stack_ptr_mod); NEXT_PASS (pass_mode_switching); NEXT_PASS (pass_match_asm_constraints); NEXT_PASS (pass_sms); NEXT_PASS (pass_live_range_shrinkage); NEXT_PASS (pass_sched); NEXT_PASS (pass_early_remat); NEXT_PASS (pass_ira); NEXT_PASS (pass_reload); NEXT_PASS (pass_postreload); PUSH_INSERT_PASSES_WITHIN (pass_postreload) -- NEXT_PASS (pass_postreload_jump);
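The "simple worklist algorithm" mentioned above might look roughly like the sketch below. It is only one possible reading of the idea, not GCC code: Block and promote_cold_predecessors are hypothetical, the CFG is a plain vector indexed by block number, and the promotion rule (walk backwards from a hot block and make every cold block reached through predecessor edges hot, stopping at blocks that are already hot) is an assumption about what "make them all hot" would mean.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified basic block: predecessor indices plus partition.
struct Block {
    std::vector<std::size_t> preds;
    bool cold = false;
};

// Starting from block `start`, walk predecessor edges and move every cold
// block that is reached into the hot partition. Each block is promoted at
// most once, so the loop terminates. Returns the number of promoted blocks.
std::size_t promote_cold_predecessors(std::vector<Block> &cfg,
                                      std::size_t start) {
    std::vector<std::size_t> worklist{start};
    std::size_t promoted = 0;
    while (!worklist.empty()) {
        std::size_t bb = worklist.back();
        worklist.pop_back();
        for (std::size_t pred : cfg[bb].preds) {
            if (cfg[pred].cold) {
                cfg[pred].cold = false;   // promote to hot
                ++promoted;
                worklist.push_back(pred); // keep walking backwards
            }
        }
    }
    return promoted;
}
```

The return value gives a feel for the concern raised above: for a large lump of cold blocks, the number of promoted blocks is the amount of cold code that would end up in the hot section.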
[Bug tree-optimization/92115] [10 Regression] ICE in gimple_cond_get_ops_from_tree, at gimple-expr.c:577
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92115 --- Comment #3 from Ilya Leoshkevich --- Thanks again, Jakub. Arseny, how did you find this? Did you just run the regtest? I wonder why I didn't see it during my test runs.
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #7 from Ilya Leoshkevich --- How can we do this here? When we make a decision to eliminate bb 5, all the "nearby" edges are hot. Having eliminated bb 5, we cannot avoid making bb 6 cold, since this would violate CFG integrity: as far as I understand, it's important to maintain the property that cold bbs cannot dominate hot bbs. So we would have to avoid eliminating bb 5 in the first place, and for that we would need to analyze which consequences that would have w.r.t. dominators and partitioning, and that might be costly.
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #5 from Ilya Leoshkevich --- +1 regarding renaming, I just wanted to keep it simple here. Landing pad issue aside, I'm beginning to wonder if we can have a jump pass after reload at all? For example, if hotness of a basic block changes, and a related jump becomes a crossing one: can it be that on some targets we would have to change a "simple" branching instruction to a sequence that first fetches a target address from a literal pool? And then, since reload has already completed, how do we allocate a register for that?
[Bug middle-end/92063] [10 Regression] ICE in operation_could_trap_p, at tree-eh.c:2528 when compiling Python's Python/_warnings.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92063 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #10 from Ilya Leoshkevich --- Hi Jakub, thanks for fixing this! FWIW the patch looks good to me. I have also run my signaling comparison tests on S/390 with it, and they still work. Is there something else I need to look at in context of this problem?
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 --- Comment #3 from Ilya Leoshkevich --- Jump threading has converted this: +-- 2/HOT + | | v v 3/HOT --> 5/HOT --> 8/HOT --> 11/COLD --> 6/HOT --EH--> 16/HOT | ^ | | +---+ into this: +-- 2/HOT --+ | | v v 3/HOT --> 8/HOT --> 11/COLD --> 6/COLD --EH--> 16/HOT by eliminating bb 5. This made bb 6 dominated by cold bb 11, and because of this fixup_partitions made bb 6 cold as well, which in turn made EH edge 6->16 a crossing one. According to https://gcc.gnu.org/viewcvs/gcc?view=revision=176696 in this situation we need to create a cold landing pad. But I wonder whether we could just do the following instead? --- a/gcc/passes.def +++ b/gcc/passes.def @@ -439,6 +439,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_ud_rtl_dce); NEXT_PASS (pass_combine); NEXT_PASS (pass_if_after_combine); + NEXT_PASS (pass_postreload_jump); NEXT_PASS (pass_partition_blocks); NEXT_PASS (pass_outof_cfg_layout_mode); NEXT_PASS (pass_split_all_insns); @@ -455,7 +456,6 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_reload); NEXT_PASS (pass_postreload); PUSH_INSERT_PASSES_WITHIN (pass_postreload) - NEXT_PASS (pass_postreload_jump); NEXT_PASS (pass_postreload_cse); NEXT_PASS (pass_gcse2); NEXT_PASS (pass_split_after_reload); This will fix this problem while retaining the benefits of the additional jump threading pass.
[Bug target/91323] LTGT rtx produces UCOMISS instead of COMISS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91323 Ilya Leoshkevich changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #21 from Ilya Leoshkevich --- I'm happy with x86 and spec improvements; the latter has also helped me to make progress with S/390 implementation of signaling comparisons. Thanks!
[Bug rtl-optimization/92007] [9/10 Regression] ICE: verify_flow_info failed (error: EH edge crosses section boundary in bb 7)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92007 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #2 from Ilya Leoshkevich --- I will have a look at this and try to adjust the CLEANUP_THREADING code.
[Bug target/88082] ICE in change_address_1, at emit-rtl.c:2286
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88082 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #1 from Ilya Leoshkevich --- Hello Martin, do you per chance remember the failing revision? With r274945 and stable gcc 9.1.1 it seems to work fine: $ ./build/gcc/cc1 gcc/testsuite/c-c++-common/pr59037.c -Os -march=z14 ; echo $? 0 $ gcc-9 gcc/testsuite/c-c++-common/pr59037.c -Os -march=z14 ; echo $? 0
[Bug target/87206] Suboptimal code generation for __atomic_compare_exchange_n followed by a comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87206 --- Comment #1 from Ilya Leoshkevich --- Gentle ping. Is there a way to make this work? I could look into implementing this if someone points me in the right direction.
[Bug target/91323] New: LTGT rtx produces UCOMISS instead of COMISS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91323 Bug ID: 91323 Summary: LTGT rtx produces UCOMISS instead of COMISS Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: iii at linux dot ibm.com Target Milestone: --- I'm implementing signaling comparisons on S/390 and I'm trying to figure out whether or not LTGT is supposed to be signaling. I've decided to check what Intel does, and ran into what appears to be a bug. Consider the following functions: int f1(float a, float b) { return a < b || a > b; } int f2(float a, float b) { return __builtin_isless(a, b) || __builtin_isgreater(a, b); } int f3(float a, float b) { return __builtin_islessgreater(a, b); } gcc creates LTGT rtx for f1 and UNEQ for f2 and f3. However, for all 3 variants it then emits UCOMISS instruction. I would expect f1 to be compiled to COMISS, since I believe that comparison operators in C are supposed to be signaling.
[Bug target/89233] [9 Regression] ICE in change_address_1, at emit-rtl.c:2286
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89233 Ilya Leoshkevich changed: What|Removed |Added CC||iii at linux dot ibm.com --- Comment #2 from Ilya Leoshkevich --- I'll look into this.
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902 --- Comment #7 from Ilya Leoshkevich --- Apparently, for this specific case doing more of hard register copy propagation is enough. I've just tried running pass_cprop_hardreg before pass_thread_prologue_and_epilogue, and it helped. So, would running a mini-cprop_hardreg instead of just copyprop_hardreg_forward_bb_without_debug_insn (entry_block) be reasonable here? Something along the lines of: - Do something like pre_and_rev_post_order_compute_fn (), but do not go further from bbs which contain insns satisfying requires_stack_frame_p (), since shrink-wrapping cannot happen past those anyway. Same for bbs which have more than 1 predecessor, since cprop_hardreg forgets everything it saw when it encounters those. Not sure if a reasonable merge function can be defined for struct value_data to improve this? Maybe also stop completely when a certain number of bbs is found. - Do something like pass_cprop_hardreg::execute (), but use only bbs computed during the previous step. Btw, would reverse postorder be the "more intelligent queuing of blocks" mentioned in pass_cprop_hardreg::execute ()? When you say that what IRA does is not effective, do you mean just the need to track indirect hard register copies, or can it be improved even further?
[Bug target/87762] [9 Regression] extract_constrain_insn, at recog.c:2206 on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87762 --- Comment #5 from Ilya Leoshkevich --- Martin, I believe I fixed this one. Could you please give it another try?
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902 --- Comment #5 from Ilya Leoshkevich --- By the time shrink-wrapping is performed, which is after LRA (pass_thread_prologue_and_epilogue, to be precise), aren't all spilling decisions already made? Because if that's true, we have to be conservative in prepare_shrink_wrap () anyway, and move down copies only when the parameter register still contains the parameter value.
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902 --- Comment #3 from Ilya Leoshkevich --- Judging by the following comment in lra-coalesce.c, RA doesn't do this intentionally: Here we coalesce only spilled pseudos. Coalescing non-spilled pseudos (with different hard regs) might result in spilling additional pseudos because of possible conflicts with other non-spilled pseudos and, as a consequence, in more constraint passes and even LRA infinite cycling. Trivial the same hard register moves will be removed by subsequent compiler passes. In which cases would moving copies down in prepare_shrink_wrap () make the code worse?
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902 --- Comment #1 from Ilya Leoshkevich --- Bisect points to r265398: combine: Do not combine moves from hard registers. I wonder what would be the best place to fix this? I was thinking about making shrink-wrapping try harder by not limiting the processing to the first basic block.
[Bug rtl-optimization/87902] New: [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902
Bug ID: 87902
Summary: [9 Regression] Shrink-wrapping multiple conditions
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
CC: krebbel at gcc dot gnu.org
Target Milestone: ---
Target: s390x-linux-gnu

I've noticed that r265830 fails to shrink-wrap multiple early returns in gcc/testsuite/gcc.target/s390/nobp-return-mem-z900.c, while r264877 managed to do so just fine. After reload we end up with the following code for those conditions:

;; basic block 2
(note 5 1 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 3 5 2 2 NOTE_INSN_FUNCTION_BEG)
(insn 2 3 7 2 (set (reg/v:DI 12 %r12 [orig:63 aD.2191+-4 ] [63]) (reg:DI 2 %r2 [72])) "gcc/testsuite/gcc.target/s390/nobp-return-mem-z900.c":14:1 1269 {*movdi_64} (nil))
(insn 7 2 8 2 (set (reg:CCZ 33 %cc) (compare:CCZ (reg:SI 12 %r12 [orig:63 aD.2191 ] [63]) (const_int 42 [0x2a]))) "gcc/testsuite/gcc.target/s390/nobp-return-mem-z900.c":17:6 1232 {*cmpsi_cct} (nil))
(jump_insn 8 7 9 2 (set (pc) (if_then_else (eq (reg:CCZ 33 %cc) (const_int 0 [0])) (label_ref:DI 33) (pc))) "gcc/testsuite/gcc.target/s390/nobp-return-mem-z900.c":17:6 1896 {*cjump_64} (int_list:REG_BR_PROB 225163668 (nil)) -> 33)
;; basic block 3
(note 9 8 12 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 12 9 13 3 (set (reg:CCS 33 %cc) (compare:CCS (reg:SI 12 %r12 [orig:63 aD.2191 ] [63]) (const_int 0 [0]))) "gcc/testsuite/gcc.target/s390/nobp-return-mem-z900.c":20:3 1222 {*tstsi_cconly2} (nil))
(jump_insn 13 12 14 3 (set (pc) (if_then_else (le (reg:CCS 33 %cc) (const_int 0 [0])) (label_ref:DI 33) (pc))) "gcc/testsuite/gcc.target/s390/nobp-return-mem-z900.c":20:3 1896 {*cjump_64} (int_list:REG_BR_PROB 118111604 (nil)) -> 33)

Note that comparisons use a copy in callee-saved %r12, and not %r2. Then, prepare_shrink_wrap () successfully propagates it to basic block 2. Basic block 3 is not affected - this seems to be by design, since prepare_shrink_wrap () only concerns itself with the first basic block. In the past reload used to eliminate the copy and use %r2 directly in both comparisons, but this seems to be no longer the case.
[Bug target/87762] [9 Regression] extract_constrain_insn, at recog.c:2206 on s390x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87762 Ilya Leoshkevich changed:
What |Removed |Added
CC | |iii at linux dot ibm.com
--- Comment #2 from Ilya Leoshkevich --- This must have slipped through, because I tested the movdi patch on top of the outdated trunk (r264877). Candidate fix: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01793.html
[Bug bootstrap/87747] [9 regression] Bootstrap failure if using gcc-4.6 as stage1 compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87747 --- Comment #2 from Ilya Leoshkevich --- This is a little bit more complicated, because EQ_ATTR_ALT is valid only for GENERATOR_FILEs. The regtest has just finished, so I will post the patch to the mailing list now.
[Bug tree-optimization/87687] New: s390x gcc 9 ICE in value_range::check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87687
Bug ID: 87687
Summary: s390x gcc 9 ICE in value_range::check
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---
Target: s390x-redhat-linux

SVN r265373 / git f9fd74d64e9:

$ f9fd74d64e9-install/bin/gcc -x c -O2 -c -
void b() {
  int c = 1, d, e = 4096;
  for (; c; c--) {
    d = 1;
    for (; d; d--)
      e--;
  }
}
during GIMPLE pass: evrp
<stdin>: In function ‘b’:
<stdin>:8:1: internal compiler error: in check, at tree-vrp.c:155
0x1ab6019 value_range::check()
        /home/iii/ibm/gcc-bisect/src/gcc/tree-vrp.c:155
0x1ab9a35 value_range::value_range(value_range_kind, tree_node*, tree_node*, bitmap_head*)
        /home/iii/ibm/gcc-bisect/src/gcc/tree-vrp.c:110
0x1ab9a35 set_value_range_with_overflow
        /home/iii/ibm/gcc-bisect/src/gcc/tree-vrp.c:1422
0x1ab9a35 extract_range_from_binary_expr_1(value_range*, tree_code, tree_node*, value_range const*, value_range const*)
        /home/iii/ibm/gcc-bisect/src/gcc/tree-vrp.c:1679
0x1b48af7 vr_values::extract_range_from_binary_expr(value_range*, tree_code, tree_node*, tree_node*, tree_node*)
        /home/iii/ibm/gcc-bisect/src/gcc/vr-values.c:734
0x1b4b0d1 vr_values::extract_range_from_assignment(value_range*, gassign*)
        /home/iii/ibm/gcc-bisect/src/gcc/vr-values.c:1389
0x1f03e29 evrp_range_analyzer::record_ranges_from_stmt(gimple*, bool)
        /home/iii/ibm/gcc-bisect/src/gcc/gimple-ssa-evrp-analyze.c:285
0x1f0228f evrp_dom_walker::before_dom_children(basic_block_def*)
        /home/iii/ibm/gcc-bisect/src/gcc/gimple-ssa-evrp.c:139
0x1edb47d dom_walker::walk(basic_block_def*)
        /home/iii/ibm/gcc-bisect/src/gcc/domwalk.c:353
0x1f02dc9 execute_early_vrp
        /home/iii/ibm/gcc-bisect/src/gcc/gimple-ssa-evrp.c:311
0x1f02dc9 execute
        /home/iii/ibm/gcc-bisect/src/gcc/gimple-ssa-evrp.c:348
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Bug bootstrap/87417] [9 regression] Internal error: abort in attr_alt_intersection, at genattrtab.c:2357
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87417 --- Comment #6 from Ilya Leoshkevich --- Candidate patch here: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01382.html
[Bug bootstrap/87417] [9 regression] Internal error: abort in attr_alt_intersection, at genattrtab.c:2357
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87417 --- Comment #5 from Ilya Leoshkevich --- Ok, makes sense. I've just made a patch that adds the 5th, but it had to be special-cased for GENERATOR_FILE, and thus doesn't look too nice. FORMAT[0] == 'w' looks much cleaner.
[Bug bootstrap/87417] [9 regression] Internal error: abort in attr_alt_intersection, at genattrtab.c:2357
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87417 --- Comment #3 from Ilya Leoshkevich --- Valgrind has found an issue:

==12738== Invalid write of size 4
==12738==    at 0x804CC48: attr_rtx_1 (genattrtab.c:518)
==12738==    by 0x804CC48: attr_rtx(rtx_code, ...) (genattrtab.c:588)
==12738==    by 0x804EA6D: mk_attr_alt (genattrtab.c:2406)
==12738==    by 0x804EA6D: check_attr_test(file_location, rtx_def*, attr_desc*) (genattrtab.c:709)
==12738==    by 0x804EBBF: check_attr_value(file_location, rtx_def*, attr_desc*) (genattrtab.c:945)
==12738==    by 0x804A0AA: check_defs (genattrtab.c:1108)
==12738==    by 0x804A0AA: main (genattrtab.c:5253)
==12738==  Address 0x6d79aa0 is 0 bytes after a block of size 16 alloc'd
==12738==    at 0x402E27C: malloc (vg_replace_malloc.c:299)
==12738==    by 0x8064FD3: xmalloc (xmalloc.c:147)
==12738==    by 0x805233E: ggc_internal_alloc (ggc.h:130)
==12738==    by 0x805233E: ggc_alloc_rtx_def_stat (ggc.h:275)
==12738==    by 0x805233E: rtx_alloc_stat_v(rtx_code, int) (rtl.c:209)
==12738==    by 0x805236D: rtx_alloc(rtx_code) (rtl.c:233)
==12738==    by 0x804CC39: attr_rtx_1 (genattrtab.c:516)
==12738==    by 0x804CC39: attr_rtx(rtx_code, ...) (genattrtab.c:588)
==12738==    by 0x804EA6D: mk_attr_alt (genattrtab.c:2406)
==12738==    by 0x804EA6D: check_attr_test(file_location, rtx_def*, attr_desc*) (genattrtab.c:709)
==12738==    by 0x804EBBF: check_attr_value(file_location, rtx_def*, attr_desc*) (genattrtab.c:945)
==12738==    by 0x804A0AA: check_defs (genattrtab.c:1108)
==12738==    by 0x804A0AA: main (genattrtab.c:5253)

Apparently allocated EQ_ATTR_ALT is smaller than I expect: 16 bytes are clearly not enough to contain rtx_def and 2 HOST_WIDE_INTs.
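To make the size mismatch concrete, here is a small arithmetic sketch. It is an illustration under stated assumptions, not genattrtab's actual layout: the object is modelled as a fixed 8-byte header followed by operand slots, an ordinary slot on a 32-bit host is taken to be 4 bytes and a HOST_WIDE_INT slot 8 bytes. TOY_HDR_SIZE and toy_rtx_size () are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header size of the modelled object, in bytes (hypothetical).  */
enum { TOY_HDR_SIZE = 8 };

/* Size of a modelled object with N_SLOTS operand slots of SLOT_SIZE
   bytes each, following the fixed header.  */
size_t
toy_rtx_size (size_t n_slots, size_t slot_size)
{
  return TOY_HDR_SIZE + n_slots * slot_size;
}
```

Under these assumptions, toy_rtx_size (2, sizeof (int32_t)) is 16, which matches the block size Valgrind reports, while toy_rtx_size (2, sizeof (int64_t)) is 24, which is what two 8-byte values would need. A 4-byte write right at offset 16 is then "0 bytes after" the 16-byte block.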
[Bug bootstrap/87417] [9 regression] Internal error: abort in attr_alt_intersection, at genattrtab.c:2357
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87417 --- Comment #2 from Ilya Leoshkevich --- Fails on i686-linux-gnu:

*** Error in `build/genattrtab': malloc(): memory corruption: 0x08e56da0 ***
=== Backtrace: =
/lib/i386-linux-gnu/libc.so.6(+0x6738a)[0xf755c38a]
/lib/i386-linux-gnu/libc.so.6(+0x6dfc7)[0xf7562fc7]
/lib/i386-linux-gnu/libc.so.6(+0x6ff82)[0xf7564f82]
/lib/i386-linux-gnu/libc.so.6(__libc_malloc+0xc5)[0xf7566bf5]
build/genattrtab[0x8064fd4]
build/genattrtab[0x805233f]
build/genattrtab[0x805236e]
build/genattrtab[0x804cc3a]
build/genattrtab[0x804ea6e]
build/genattrtab[0x804ebc0]
build/genattrtab[0x804a0ab]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf6)[0xf750d286]
build/genattrtab[0x804ba27]
=== Memory map:
08048000-08091000 r-xp 00:21 29450225 /Users/beep/ibm/gcc/host-x86_64-pc-linux-gnu/gcc/build/genattrtab
08092000-08093000 r--p 00049000 00:21 29450225 /Users/beep/ibm/gcc/host-x86_64-pc-linux-gnu/gcc/build/genattrtab
08093000-08097000 rw-p 0004a000 00:21 29450225 /Users/beep/ibm/gcc/host-x86_64-pc-linux-gnu/gcc/build/genattrtab
08097000-0809b000 rw-p 00:00 0
08452000-08eef000 rw-p 00:00 0 [heap]
f710-f7121000 rw-p 00:00 0
f7121000-f720 ---p 00:00 0
f72af000-f73b1000 rw-p 00:00 0
f73ce000-f73ea000 r-xp 00:2b 74 /lib/i386-linux-gnu/libgcc_s.so.1
f73ea000-f73eb000 r--p 0001b000 00:2b 74 /lib/i386-linux-gnu/libgcc_s.so.1
f73eb000-f73ec000 rw-p 0001c000 00:2b 74 /lib/i386-linux-gnu/libgcc_s.so.1
f73f1000-f74f5000 rw-p 00:00 0
f74f5000-f76a6000 r-xp 00:2b 43 /lib/i386-linux-gnu/libc-2.24.so
f76a6000-f76a8000 r--p 001b 00:2b 43 /lib/i386-linux-gnu/libc-2.24.so
f76a8000-f76a9000 rw-p 001b2000 00:2b 43 /lib/i386-linux-gnu/libc-2.24.so
f76a9000-f76ac000 rw-p 00:00 0
f76ac000-f76ff000 r-xp 00:2b 88 /lib/i386-linux-gnu/libm-2.24.so
f76ff000-f770 r--p 00052000 00:2b 88 /lib/i386-linux-gnu/libm-2.24.so
f770-f7701000 rw-p 00053000 00:2b 88 /lib/i386-linux-gnu/libm-2.24.so
f7705000-f7709000 rw-p 00:00 0
f7709000-f770b000 r--p 00:00 0 [vvar]
f770b000-f770c000 r-xp 00:00 0 [vdso]
f770c000-f772f000 r-xp 00:2b 36 /lib/i386-linux-gnu/ld-2.24.so
f772f000-f773 r--p 00022000 00:2b 36 /lib/i386-linux-gnu/ld-2.24.so
f773-f7731000 rw-p 00023000 00:2b 36 /lib/i386-linux-gnu/ld-2.24.so
fffc5000-fffe7000 rw-p 00:00 0 [stack]
[Bug bootstrap/87417] [9 regression] Internal error: abort in attr_alt_intersection, at genattrtab.c:2357
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87417 --- Comment #1 from Ilya Leoshkevich --- Ouch! Somehow s2 got corrupted (the 2nd value can be either 0 or 1). I'm looking at this now.
[Bug tree-optimization/87309] [9 Regression] Spurious note: messages when building with -fopt-info-vec-optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87309 --- Comment #7 from Ilya Leoshkevich --- Thanks!
[Bug tree-optimization/87309] [9 Regression] Spurious note: messages when building with -fopt-info-vec-optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87309 --- Comment #4 from Ilya Leoshkevich --- Do we also need to test m_test_pp_flags? At least dump_context::emit_item does it.
[Bug tree-optimization/87309] Spurious note: messages when building with -fopt-info-vec-optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87309 Ilya Leoshkevich changed:
What |Removed |Added
CC | |iii at linux dot ibm.com
--- Comment #1 from Ilya Leoshkevich --- Created attachment 44693 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44693&action=edit patch

This fixes the problem for me, but I'm not sure if this is the right solution.
[Bug tree-optimization/87309] New: Spurious note: messages when building with -fopt-info-vec-optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87309
Bug ID: 87309
Summary: Spurious note: messages when building with -fopt-info-vec-optimized
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
Target Milestone: ---

$ cat test.cpp
void a() {}
$ g++ -c test.cpp -fopt-info-vec-optimized -O3
test.cpp:1:6: note:
test.cpp:1:11: note:

This is coming from DUMP_VECT_SCOPE ("vect_analyze_data_refs"); in vect_analyze_data_refs(). I suspect that alt_flags check around dump_loc call is missing in dump_context::begin_scope.
[Bug tree-optimization/87206] New: Suboptimal code generation for __atomic_compare_exchange_n followed by a comparison
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87206
Bug ID: 87206
Summary: Suboptimal code generation for __atomic_compare_exchange_n followed by a comparison
Product: gcc
Version: 8.2.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: iii at linux dot ibm.com
CC: krebbel at gcc dot gnu.org
Target Milestone: ---

I tried to build the example #5 from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 on x86_64 and observed a similar issue:

$ cat 1.c
extern void bar (int *);
void foo5(int *mem) {
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1, 1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);
}
$ gcc-8 -c 1.c -O3 -g
$ objdump -d 1.o
# skip
<_foo5>:
   0: 31 c0             xor    %eax,%eax
   2: ba 01 00 00 00    mov    $0x1,%edx
   7: f0 0f b1 17       lock cmpxchg %edx,(%rdi)
   b: 85 c0             test   %eax,%eax
   d: 75 01             jne    10 <_foo5+0x10>
   f: c3                retq
  10: e9 00 00 00 00    jmpq   15 <_foo5+0x15>

We don't have to do "test %eax,%eax", because this information is already available through ZF, which is set by CMPXCHG. I wonder if it would be possible to come up with a common solution for all architectures, including x86_64 and s390?
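For reference, the two ways of asking "did the exchange fail?" can be put side by side at the source level. This is only a sketch: the function names are hypothetical, and it uses a strong compare-and-exchange (weak argument 0) rather than the weak one in the report, because only a strong exchange guarantees that a failure means *mem differed from the expected value. Whether the second form actually avoids the extra test instruction depends on the compiler and target and is not verified here.

```c
#include <stdbool.h>

/* Detect failure by looking at the value written back into OLDVAL,
   as the reported test case does.  */
bool
cas_failed_by_oldval (int *mem)
{
  int oldval = 0;

  __atomic_compare_exchange_n (mem, &oldval, 1, 0,
                               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  return oldval != 0;
}

/* Detect failure from the builtin's boolean result, which is the
   information the hardware instruction already provides.  */
bool
cas_failed_by_result (int *mem)
{
  int oldval = 0;

  return !__atomic_compare_exchange_n (mem, &oldval, 1, 0,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```

With a strong exchange and an expected value of 0, both functions return the same answer for any starting value of *mem, which is the equivalence the optimizer would need to know about.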
[Bug target/80080] S390: Isses with emitted cs-instructions for __atomic builtins.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 --- Comment #12 from Ilya Leoshkevich --- I've investigated foo3, foo4 and foo5, and came to the following conclusions:

When foo3 is compiled with -march=z10 or later, cprop1 pass propagates global's SYMBOL_REF value into UNSPECV_CAS. On previous machines it does not happen, because the result is rejected by insn_invalid_p (). Then, reload realizes that SYMBOL_REF cannot be a legitimate UNSPECV_CAS argument, and loads it into a pseudo right before. The net result is that loading of SYMBOL_REF is moved from outside of the loop into the loop. So we need to somehow inhibit constant propagation for this case.

Jump threading in foo4 does not work, because it's done only during `jump' pass, at which point there are insns with side-effects in the basic block of the 2nd jump. They are later deleted by the `combine' pass, but we don't request CLEANUP_THREADING after that. I wonder if we could introduce it? In addition, when foo4 is compiled with -O2 or -O3, we don't use conditional return, because our return sequence contains a PARALLEL, which is rejected by bb_is_just_return (). This can also be improved.

Finally, in foo5 `cs' is generated by s390_expand_cs_tdsi (), and comparison is generated by common expansion logic, so it doesn't look possible to improve the situation solely in the back-end. We need to somehow make gcc aware that (oldval == 0) and (retval != 0) are equivalent after `cs', but I'm not sure at which point we could and should do this - in theory doing this on tree rather than RTL level can help other architectures.