[Bug target/112962] [14 Regression] ICE: SIGSEGV in operator() (recog.h:431) with -fexceptions -mssse3 and __builtin_ia32_pabsd128()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112962

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|ubizjak at gmail dot com     |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #10 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #7)
> but with -fexceptions (and probably because we incorrectly don't mark the
> builtins nothrow?) this doesn't happen.

Maybe we should finally fix the above nothrow issue?
[Bug target/112962] [14 Regression] ICE: SIGSEGV in operator() (recog.h:431) with -fexceptions -mssse3 and __builtin_ia32_pabsd128()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112962 --- Comment #9 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #8) > Of course, yet another option is: This goes out of my (limited) area of expertise, so if my proposed (trivial) patch is papering over some other issue, I'll happily leave the solution to you.
[Bug target/112962] [14 Regression] ICE: SIGSEGV in operator() (recog.h:431) with -fexceptions -mssse3 and __builtin_ia32_pabsd128()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112962

--- Comment #6 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #3)
> I was thinking whether it wouldn't be better to expand x86 const or pure
> builtins when lhs is ignored to nothing in the expanders.

Something like this?

--cut here--
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a53d69d5400..0f3d6108d77 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -13032,6 +13032,9 @@ ix86_expand_builtin (tree exp, rtx target, rtx subtarget,
   unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl);
   HOST_WIDE_INT bisa, bisa2;
 
+  if (ignore && (TREE_READONLY (fndecl) || DECL_PURE_P (fndecl)))
+    return const0_rtx;
+
   /* For CPU builtins that can be folded, fold first and expand the fold.  */
   switch (fcode)
     {
@@ -14401,9 +14404,6 @@ rdseed_step:
       return target;
 
     case IX86_BUILTIN_READ_FLAGS:
-      if (ignore)
-        return const0_rtx;
-
       emit_insn (gen_pushfl ());
 
       if (optimize
--cut here--
[Bug target/112962] [14 Regression] ICE: SIGSEGV in operator() (recog.h:431) with -fexceptions -mssse3 and __builtin_ia32_pabsd128()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112962 --- Comment #4 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #3) > I was thinking whether it wouldn't be better to expand x86 const or pure > builtins when lhs is ignored to nothing in the expanders. Yes, this could be a better solution.
[Bug target/112962] [14 Regression] ICE: SIGSEGV in operator() (recog.h:431) with -fexceptions -mssse3 and __builtin_ia32_pabsd128()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112962 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Last reconfirmed||2023-12-12 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 --- Comment #1 from Uroš Bizjak --- Created attachment 56862 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56862&action=edit Proposed patch Patch in testing.
[Bug rtl-optimization/112760] [14 Regression] wrong code with -O2 -fno-dce -fno-guess-branch-probability -m8bit-idiv -mavx --param=max-cse-insns=0 and __builtin_add_overflow_p()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112760

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
   Last reconfirmed|                            |2023-11-29
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |14.0

--- Comment #2 from Uroš Bizjak ---
With the original testcase, the ce1 pass is if-converting:

   20: flags:CCZ=cmp(r110:SI,r111:SI)
      REG_DEAD r111:SI
      REG_DEAD r110:SI
   21: pc={(flags:CCZ==0)?L23:pc}
      REG_DEAD flags:CCZ
   39: NOTE_INSN_BASIC_BLOCK 5
   22: r103:HI=0x1
   23: L23:

with:

IF-THEN-JOIN block found, pass 2, test 2, then 5, join 6
scanning new insn with uid = 45.
scanning new insn with uid = 44.
scanning new insn with uid = 46.
if-conversion succeeded through noce_try_cmove
Removing jump 21.
deleting insn with uid = 21.
deleting insn with uid = 22.

to:

   20: flags:CCZ=cmp(r110:SI,r111:SI)
      REG_DEAD r111:SI
      REG_DEAD r110:SI
   45: r118:HI=0x1
   44: flags:CCZ=cmp(r110:SI,r111:SI)
   46: r103:HI={(flags:CCZ==0)?r103:HI:r118:HI}

And things go downhill from here. Before postreload we have:

   20: flags:CCZ=cmp(ax:SI,dx:SI)
      REG_UNUSED flags:CCZ
   44: flags:CCZ=cmp(ax:SI,dx:SI)
      REG_DEAD dx:SI
      REG_DEAD ax:SI
   62: ax:HI=0x1
      REG_EQUIV 0x1
   46: bx:HI={(flags:CCZ==0)?bx:HI:ax:HI}
      REG_DEAD flags:CCZ
      REG_DEAD ax:HI

and in the postreload pass (insn 44) is removed:

   20: flags:CCZ=cmp(ax:SI,dx:SI)
      REG_UNUSED flags:CCZ
   62: ax:HI=0x1
      REG_EQUIV 0x1
   46: bx:HI={(flags:CCZ==0)?bx:HI:ax:HI}
      REG_DEAD flags:CCZ
      REG_DEAD ax:HI

Then comes the pro_and_epilogue pass, which detects the "unused" (insn 20) and removes it:

df_analyze called
deleting insn with uid = 20.

Confirmed as an RTL optimization problem.
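
For readers less used to RTL dumps, the transformation in this comment can be sketched at the C level. This is only an illustration: the function and variable names below are hypothetical and are not taken from the PR's testcase. The first function mirrors insns 20 to 23 (the jump skips the store when the operands compare equal), the second mirrors insns 45, 44 and 46 after noce_try_cmove.

```c
/* Branchy form, as before ce1: the store of 1 is skipped when A == B.  */
short
branchy (int a, int b, short r)
{
  if (a != b)
    r = 1;
  return r;
}

/* If-converted form, as after ce1: the constant is materialized
   unconditionally (insn 45), the compare is repeated (insn 44) and a
   conditional move selects the result (insn 46).  */
short
if_converted (int a, int b, short r)
{
  short one = 1;
  return (a == b) ? r : one;
}
```

Both functions compute the same value. The wrong code in the PR comes later: after the conversion there are two identical compares, postreload drops the second one as redundant, and the first one still carries a REG_UNUSED note, so it is deleted as well and the conditional move reads stale flags.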
[Bug middle-end/112560] [14 Regression] ICE in try_combine on pr112494.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560 Uroš Bizjak changed: What|Removed |Added Keywords||patch --- Comment #4 from Uroš Bizjak --- Patch at [1]. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638589.html
[Bug middle-end/112560] [14 Regression] ICE in try_combine on pr112494.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Status|NEW |ASSIGNED --- Comment #3 from Uroš Bizjak --- Created attachment 56705 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56705&action=edit Proposed patch The code assumes that cc_use_loc represents a comparison operator. Skip the modification of CC-using operation if this is not the case.
[Bug target/112494] ICE in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Target Milestone|--- |14.0 Resolution|--- |FIXED --- Comment #10 from Uroš Bizjak --- Fixed for 14.0.
[Bug middle-end/112560] [14 Regression] ICE in try_combine on pr112494.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560 Bug 112560 depends on bug 112494, which changed state. Bug 112494 Summary: ICE in ix86_cc_mode, at config/i386/i386.cc:16477 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug target/112686] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1176 with -fsplit-stack -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112686 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED CC|uros at gcc dot gnu.org| --- Comment #5 from Uroš Bizjak --- Fixed.
[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672 Uroš Bizjak changed: What|Removed |Added Target Milestone|14.0|11.5 --- Comment #9 from Uroš Bizjak --- Fixed everywhere.
[Bug target/112686] [14 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1176 with -fsplit-stack -mcmodel=large
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112686

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
             Status|NEW                         |ASSIGNED

--- Comment #3 from Uroš Bizjak ---
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 7b922857d80..50e8826dbe5 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -10503,7 +10503,7 @@ ix86_expand_split_stack_prologue (void)
          fn = copy_to_suggested_reg (x, reg11, Pmode);
        }
       else
-       fn = split_stack_fn_large;
+       fn = copy_to_suggested_reg (split_stack_fn_large, reg11, Pmode);
 
       /* When using the large model we need to load the address
         into a register, and we've run out of registers.  So we
[Bug target/89316] ICE with -mforce-indirect-call and -fsplit-stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89316 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Target Milestone|--- |14.0 Resolution|--- |FIXED --- Comment #16 from Uroš Bizjak --- Fixed for gcc-14.
[Bug target/112672] [14 Regression] wrong code with __builtin_parityl() at -O and above on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112672

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com

--- Comment #4 from Uroš Bizjak ---
(In reply to Andrew Pinski from comment #3)
> parityhi2 should have:
>   rtx extra = gen_reg_rtx (HImode);
>   emit_move_insn (extra, operands[1]);
>   emit_insn (gen_parityhi2_cmp (extra));
>
> Or something similar because parityqi2_cmp clobbers its argument.

Exactly. I have a patch in testing.
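
The point of the quoted suggestion can be illustrated in plain C. The sketch below is not the expander code; `parity64_fold` is a hypothetical name, and the folding sequence is only a rough model of how a wide parity is reduced to a narrow one. What matters is the scratch copy: the folding destroys the value it works on, so it must operate on a copy and not on the caller's operand.

```c
#include <stdint.h>

/* Parity of a 64-bit value, computed by XOR-folding the halves together.
   The folding steps overwrite their operand, so they run on a scratch
   copy, in the same spirit as the extra pseudo in the quoted snippet.  */
int
parity64_fold (uint64_t x)
{
  uint64_t t = x;   /* scratch copy; X itself stays intact */

  t ^= t >> 32;
  t ^= t >> 16;
  t ^= t >> 8;
  t ^= t >> 4;
  t ^= t >> 2;
  t ^= t >> 1;
  return (int) (t & 1);
}
```

The result can be cross-checked against `__builtin_parityll`, which GCC and Clang both provide.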
[Bug target/112445] [14 Regression] ICE: in lra_split_hard_reg_for, at lra-assigns.cc:1861 unable to find a register to spill: {*umulditi3_1} with -O -march=cascadelake -fwrapv since r14-4968-g89e5d90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112445

--- Comment #6 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #4)
> I think this goes wrong during combine.

Combine does not / should not combine moves from hard registers, because doing so extends the hard register's live range. It looks like this should also include zero-extracts and other "pseudo-move" instructions.

The relevant patch and discussion is at [1].

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-10/msg01356.html
[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657 --- Comment #6 from Uroš Bizjak --- This is by design, CMOV should not be used instead of well predicted jumps. FYI, CMOV is quite problematic on x86, there are several PRs where conversion to CMOV resulted in 2x slower execution. Please see e.g.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309#c26
[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #5 from Uroš Bizjak ---
Digging a bit further: if_info.max_seq_cost is calculated via targetm.max_noce_ifcvt_seq_cost, where without params set we return:

  return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (2);

with:

#define BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 2 : (predictable_p) ? 0 : ix86_branch_cost)

So, the conversion is clearly not desirable for well predicted jumps.
[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #4 from Uroš Bizjak ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Andrew Pinski from comment #2)
>
> > Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> > on aarch64. Note there are some new changes to ifcvt.cc in review which
> > might improve this, though I am not sure.
>
> x86_64 targetm.noce_conversion_profitable_p returns false for:

Actually, the cost function goes to default_noce_conversion_profitable_p, where:

(gdb) p cost
$1 = 16
(gdb) p if_info->original_cost
$2 = 8
(gdb) p if_info->max_seq_cost
$3 = 0

For some reason, max_seq_cost remains zero, while on aarch64:

(gdb) p cost
$2 = 12
(gdb) p if_info->original_cost
$3 = 8
(gdb) p if_info->max_seq_cost
$4 = 12

So, x86_64 returns false from the default cost function:

  /* When compiling for size, we can make a reasonably accurately guess
     at the size growth.  When compiling for speed, use the maximum.  */
  return speed_p && cost <= if_info->max_seq_cost;
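
The arithmetic discussed in this thread can be replayed with a small standalone model. It is a sketch under stated assumptions, not GCC code: the `*_model` function names are hypothetical, the `branch_cost` parameter stands in for `ix86_branch_cost` (whose actual value depends on tuning and is not given in the thread), and `COSTS_N_INSNS (N)` is taken as `(N) * 4`, as defined in GCC's rtl.h.

```c
#include <stdbool.h>

/* GCC's rtl.h scales instruction counts by 4.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Model of the x86 BRANCH_COST macro; BRANCH_COST stands in for
   ix86_branch_cost, which is an assumption of this sketch.  */
unsigned
branch_cost_model (bool speed_p, bool predictable_p, unsigned branch_cost)
{
  return !speed_p ? 2 : predictable_p ? 0 : branch_cost;
}

/* Model of the default max_noce_ifcvt_seq_cost computation.  */
unsigned
max_seq_cost_model (bool predictable_p, unsigned branch_cost)
{
  return branch_cost_model (true, predictable_p, branch_cost)
         * COSTS_N_INSNS (2);
}

/* Model of the final test in default_noce_conversion_profitable_p.  */
bool
conversion_profitable_model (bool speed_p, unsigned cost,
                             unsigned max_seq_cost)
{
  return speed_p && cost <= max_seq_cost;
}
```

With a predictable branch the budget is 0 regardless of the branch cost, so the x86_64 numbers from the gdb session (cost 16, max_seq_cost 0) are rejected, while the aarch64 numbers (cost 12, max_seq_cost 12) pass.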
[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657

--- Comment #3 from Uroš Bizjak ---
(In reply to Andrew Pinski from comment #2)
> Someone will have to debug ifcvt.cc to see why it fails on x86_64 but works
> on aarch64. Note there are some new changes to ifcvt.cc in review which
> might improve this, though I am not sure.

x86_64 targetm.noce_conversion_profitable_p returns false for:

(insn 20 0 19 (set (reg:SI 101)
        (const_int -9 [0xfffffffffffffff7])) 85 {*movsi_internal}
     (nil))

(insn 19 20 21 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 99 [ c ])
            (const_int 14 [0xe]))) 11 {*cmpsi_1}
     (nil))

(insn 21 19 0 (set (reg/v:SI 99 [ c ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:SI 99 [ c ])
            (reg:SI 101))) 1438 {*movsicc_noc}
     (nil))
[Bug target/89316] ICE with -mforce-indirect-call and -fsplit-stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89316 Uroš Bizjak changed: What|Removed |Added Keywords||patch --- Comment #14 from Uroš Bizjak --- Patch at [1]. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637478.html
[Bug target/89316] ICE with -mforce-indirect-call and -fsplit-stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89316 Uroš Bizjak changed: What|Removed |Added Attachment #56637|0 |1 is obsolete|| --- Comment #13 from Uroš Bizjak --- Created attachment 56647 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56647&action=edit Proposed patch v2 New version, also fixes "-fsplit-stack -fpic -mforce-indirect-call" on 32-bit targets.
[Bug target/89316] ICE with -mforce-indirect-call and -fsplit-stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89316 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Status|NEW |ASSIGNED --- Comment #12 from Uroš Bizjak --- Created attachment 56637 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56637&action=edit Proposed patch Patch that implements ideas from Comment 7 and Comment 8.
[Bug target/111657] Memory copy with structure assignment from named address space should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657 --- Comment #9 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #8) > I'd say it is a user error to invoke memcpy/memset etc. with pointers to > non-default address spaces, and for aggregate copies the middle-end should > ensure that the copying is not done using library calls; is that the case > and the problem was just that optab expansion was allowed for the structure > copies and the backend decided to use libcall in that case? Yes, the stringop selection mechanism chose libcall strategy. However, the call to memcpy is unavailable for non-default address space, so the middle-end expanded the call into most trivial byte-copy loop. The patch just teaches stringop selection to use optimized copy loop as a last resort with non-default address spaces instead.
[Bug middle-end/112581] [14 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu (generated code hangs)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112581 --- Comment #3 from Uroš Bizjak --- (In reply to Andrew Pinski from comment #1) > It might be one of the x86 specific target patches ... I don't think so, these patches deal specifically with high registers, and: $ grep %.h pr112581.s finds none.
[Bug target/112567] [14 regression] ICE in RTL pass: split2: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112567 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Uroš Bizjak --- Fixed.
[Bug target/112567] [14 regression] ICE in RTL pass: split2: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112567 Uroš Bizjak changed: What|Removed |Added Last reconfirmed||2023-11-16 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #1 from Uroš Bizjak --- Mine, due to [1], this time I managed to split to invalid RTX... I have a patch. [1] https://gcc.gnu.org/pipermail/gcc-cvs/2023-November/393104.html
[Bug target/112540] [14 regression] ICE in extract_insn, at recog.cc:2804 since r14-5456-gb42a09b258c3ed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112540 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from Uroš Bizjak --- Fixed.
[Bug target/112540] [14 regression] ICE in extract_insn, at recog.cc:2804 since r14-5456-gb42a09b258c3ed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112540 Uroš Bizjak changed: What|Removed |Added Last reconfirmed||2023-11-15 Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Target Milestone|--- |14.0 Ever confirmed|0 |1 Host||x86 Status|UNCONFIRMED |ASSIGNED --- Comment #4 from Uroš Bizjak --- https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636593.html
[Bug target/112494] ICE in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494

--- Comment #7 from Uroš Bizjak ---
It looks to me that gcc_unreachable is problematic in SELECT_CC_MODE. We should simply return CCmode for all unrecognised RTX:

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 2c80fd8ebf3..5b87361e2e1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -16469,12 +16504,9 @@ ix86_cc_mode (enum rtx_code code, rtx op0, rtx op1)
        return CCNOmode;
       else
        return CCGCmode;
-      /* strcmp pattern do (use flags) and combine may ask us for proper
-        mode.  */
-    case USE:
-      return CCmode;
     default:
-      gcc_unreachable ();
+      /* CCmode should be used in all other cases.  */
+      return CCmode;
     }
 }
 

Using the above patch, we can also define cmpstrnqi_1 to what it really does:

@@ -22954,9 +22958,8 @@ (define_expand "cmpstrnqi_1"
                               (const_int 0))
                  (compare:CC (match_operand 4 "memory_operand")
                              (match_operand 5 "memory_operand"))
-                 (const_int 0)))
+                 (reg:CC FLAGS_REG)))
              (use (match_operand:SI 3 "immediate_operand"))
-             (use (reg:CC FLAGS_REG))
              (clobber (match_operand 0 "register_operand"))
              (clobber (match_operand 1 "register_operand"))
              (clobber (match_dup 2))])]
[Bug target/112494] ICE in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494

--- Comment #6 from Uroš Bizjak ---
Now we have:

#1  0x0286a3aa in try_combine (i3=0x7fffe3c18100, i2=0x7fffe3c18000, i1=0x0,
    i0=0x0, new_direct_jump_p=0x7fffd8eb, last_combined_insn=0x7fffe3c18100)
    at ../../git/gcc/gcc/combine.cc:3207
3207            = SELECT_CC_MODE (compare_code, op0, op1);
(gdb) p compare_code
$1 = UNSPEC

compare_code = UNSPEC won't fly...
[Bug target/112494] ICE in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494 --- Comment #5 from Uroš Bizjak --- Created attachment 56567 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56567&action=edit Proposed patch Nope, even with the above patch the compiler ICEs at the same place: 0x1956968 ix86_cc_mode(rtx_code, rtx_def*, rtx_def*) ../../git/gcc/gcc/config/i386/i386.cc:16508 0x286a3a9 try_combine ../../git/gcc/gcc/combine.cc:3207 0x2864cbf combine_instructions ../../git/gcc/gcc/combine.cc:1264 Trying 5 -> 8: 5: r98:DI=0xd7 8: flags:CCZ=cmp(r98:DI,0) REG_EQUAL cmp(0xd7,0) (insn 5 2 6 2 (set (reg/v:DI 98 [ flags ]) (const_int 215 [0xd7])) "pr112494.c":10:15 84 {*movdi_internal} (nil)) (insn 6 5 7 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (const_int 215 [0xd7])) "/hdd/uros/gcc-build-fast/gcc/include/ia32intrin.h":270:3 58 {*pushdi2_rex64} (nil)) (insn 7 6 8 2 (set (reg:CC 17 flags) (unspec:CC [ (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8]) ] UNSPEC_SET_FLAGS)) "/hdd/uros/gcc-build-fast/gcc/include/ia32intrin.h":270:3 72 {*popfldi1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 8 7 11 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:DI 98 [ flags ]) (const_int 0 [0]))) "pr112494.c":12:9 8 {*cmpdi_ccno_1} (expr_list:REG_EQUAL (compare:CCZ (const_int 215 [0xd7]) (const_int 0 [0])) (nil))) (insn 11 8 12 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (unspec:DI [ (reg:CC 17 flags) ] UNSPEC_GET_FLAGS)) "/hdd/uros/gcc-build-fast/gcc/include/ia32intrin.h":262:10 70 {*pushfldi2} (expr_list:REG_DEAD (reg:CC 17 flags) (nil))) There is nothing suspicious in target code anymore (IMO, the above patch should be applied nevertheless, the register modes are now fully correct)
[Bug target/112494] ICE in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494 Uroš Bizjak changed: What|Removed |Added Component|rtl-optimization|target Status|NEW |ASSIGNED
[Bug rtl-optimization/112494] ICE in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494 --- Comment #4 from Uroš Bizjak --- (In reply to Andrew Pinski from comment #3) > I almost want to say this is a bug in the x86 back-end where it pushes the > flags onto the stack. Yes, could be - let me look into this a bit more.
[Bug rtl-optimization/112494] GCC: 14: internal compiler error: in ix86_cc_mode, at config/i386/i386.cc:16477
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494 Uroš Bizjak changed: What|Removed |Added Last reconfirmed||2023-11-12 Ever confirmed|0 |1 Component|target |rtl-optimization Status|UNCONFIRMED |NEW --- Comment #1 from Uroš Bizjak --- Combine pass is trying to combine: Trying 5 -> 8: 5: r98:DI=0xd7 8: flags:CCZ=cmp(r98:DI,0) REG_EQUAL cmp(0xd7,0) where: (insn 5 2 6 2 (set (reg/v:DI 98 [ flags ]) (const_int 215 [0xd7])) "pr112494.c":10:26 84 {*movdi_internal} (nil)) (insn 8 7 11 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:DI 98 [ flags ]) (const_int 0 [0]))) "pr112494.c":12:9 8 {*cmpdi_ccno_1} (expr_list:REG_EQUAL (compare:CCZ (const_int 215 [0xd7]) (const_int 0 [0])) (nil))) and calls ix86_cc_mode with: Breakpoint 1, ix86_cc_mode (code=code@entry=SET, op0=0x7fffe3e37680, op1=0x7fffea209490) code = SET will trigger gcc_unreachable() at the end of the ix86_cc_mode function. Confirmed as a generic RTL optimization problem.
[Bug target/110790] [14 Regression] gcc -m32 generates invalid bit test code on gmp-6.2.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110790

--- Comment #9 from Uroš Bizjak ---
(In reply to Andrew Pinski from comment #8)
> I need some code generation help for gcc.target/i386/pr110790-2.c, I have a
> patch where we now generate:
> ```
>         movq    (%rdi,%rax,8), %rax
>         shrq    %cl, %rax
>         andl    $1, %eax
> ```
>
> instead of previously:
> ```
>         movq    (%rdi,%rax,8), %rax
>         btq     %rsi, %rax
>         setc    %al
>         movzbl  %al, %eax
> ```
>
> I suspect the sequence that contains shrq/and is better but I am 100% sure.
> We still get btq when used with a conditional too.

The new sequence is better. It does not create a partial reg write (setc needs a clearing XOR in front of the CC-setting instruction).
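
For orientation, the kind of source that leads to either sequence is a plain "extract bit N of a word in an array" expression. The function below is a hypothetical illustration of that shape; it is not the content of gcc.target/i386/pr110790-2.c and not the gmp code from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Return bit N of the 64-bit word WORDS[INDEX] as 0 or 1.
   The value form maps naturally to load, shift right by N, and with 1;
   a bt/setc/movzbl sequence computes the same result through the
   carry flag.  */
int
test_bit (const uint64_t *words, unsigned long index, unsigned n)
{
  assert (n < 64);   /* shifting by 64 or more would be undefined in C */
  return (int) ((words[index] >> n) & 1);
}
```

When the result feeds a branch instead of being materialized as a value, the bit-test form remains attractive, which matches the remark in the quoted text that btq is still used with a conditional.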
[Bug target/97503] Suboptimal use of cntlzw and cntlzd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503 --- Comment #7 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #6) > (In reply to LIU Hao from comment #4) > > Are there any reasons why this was not done for 64? > > (https://gcc.godbolt.org/z/7vddPdxaP) > > There is zero-extension from the result of __builtin_clzll that confuses > optimizers. Actually, sign-extension, but the result is never sign-extended.
[Bug target/97503] Suboptimal use of cntlzw and cntlzd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503 --- Comment #6 from Uroš Bizjak --- (In reply to LIU Hao from comment #4) > Are there any reasons why this was not done for 64? > (https://gcc.godbolt.org/z/7vddPdxaP) There is zero-extension from the result of __builtin_clzll that confuses optimizers.
[Bug target/112332] [14 regression] ICE: internal compiler error: in extract_constrain_insn, at recog.cc:2705
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112332 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Target||x86 Status|UNCONFIRMED |RESOLVED Target Milestone|--- |14.0 --- Comment #5 from Uroš Bizjak --- Fixed.
[Bug target/112332] [14 regression] ICE: internal compiler error: in extract_constrain_insn, at recog.cc:2705
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112332

--- Comment #3 from Uroš Bizjak ---
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 35d073c9a21..75c75f610c2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25748,7 +25748,7 @@ (define_peephole2
      (set (match_operand:W 2 "general_reg_operand") (const_int 0))
      (clobber (reg:CC FLAGS_REG))])
    (set (match_operand:SWI48 3 "general_reg_operand")
-       (match_operand:SWI48 4 "general_operand"))]
+       (match_operand:SWI48 4 "general_gr_operand"))]
   "peep2_reg_dead_p (0, operands[3])
    && peep2_reg_dead_p (1, operands[2])"
   [(parallel [(set (match_dup 0)
[Bug target/112332] [14 regression] ICE: internal compiler error: in extract_constrain_insn, at recog.cc:2705
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112332

--- Comment #2 from Uroš Bizjak ---
(In reply to Sergei Trofimovich from comment #1)
> Slightly shorter example:
>
> typedef union {
>   double d;
>   int L[2];
> } U;
> void d2b(int*);
> void _Py_dg_dtoa(double dd) {
>   int be;
>   U u;
>   u.d = dd;
>   if ((&u)->L[1])
>     d2b(&be);
> }

Let's put back those extra constraints...
[Bug target/110551] [11/12/13/14 Regression] an extra mov when doing 128bit multiply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551 --- Comment #7 from Uroš Bizjak --- (In reply to CVS Commits from comment #5) > The master branch has been updated by Roger Sayle : > > https://gcc.gnu.org/g:89e5d902fc55ad375f149f25a84c516ad360a606 > > commit r14-4968-g89e5d902fc55ad375f149f25a84c516ad360a606 > Author: Roger Sayle > Date: Fri Oct 27 10:03:53 2023 +0100 Looks like the patch regressed -march=cascadelake. https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634660.html
[Bug target/111698] Narrow memory access of compare to byte width
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111698 Uroš Bizjak changed: What|Removed |Added Target|x86_64-*-* |x86-*-* Target Milestone|--- |14.0 Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Uroš Bizjak --- Implemented for gcc-14.
[Bug target/111698] Narrow memory access of compare to byte width
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111698

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-24

--- Comment #3 from Uroš Bizjak ---
Created attachment 56187
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56187&action=edit
Proposed patch
[Bug sanitizer/111736] New: Address sanitizer is not compatible with named address spaces
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111736

            Bug ID: 111736
           Summary: Address sanitizer is not compatible with named address
                    spaces
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: sanitizer
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
                CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
                    jakub at gcc dot gnu.org, kcc at gcc dot gnu.org,
                    marxin at gcc dot gnu.org
  Target Milestone: ---

From [1], gcc is doing a KASAN check on a percpu address (when percpu access is implemented using named address spaces). This is not a "real" address, just an offset from the segment register.

The testcase

--cut here--
int __seg_gs m;

int foo (void) { return m; }
--cut here--

does not show any special handling that would handle segment registers.

[1] https://lore.kernel.org/lkml/CAHk-=wi6u-o1wdpoesuce6qo2oapu0hezaig0udou4l5cre...@mail.gmail.com/
[Bug target/111657] Memory copy with structure assignment from named address space should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Target Milestone|--- |14.0 Resolution|--- |FIXED --- Comment #7 from Uroš Bizjak --- Fixed.
[Bug target/111698] Narrow memory access of compare to byte width
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111698

--- Comment #2 from Uroš Bizjak ---
(In reply to Richard Biener from comment #1)
> I guess we could do this even on GIMPLE and in general to aligned sub-word
> accesses (where byte accesses are always aligned).
>
> It might be also a good fit for RTL forwprop or that mem-offset pass in
> development.

I don't think this optimization should be universally enabled. According to Agner Fog, older x86 cores suffer from a store forwarding stall when a smaller read doesn't start at the same address as the store. Intel Sandybridge and AMD Steamroller families relaxed this constraint.
[Bug target/111698] New: Narrow memory access of compare to byte width
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111698

            Bug ID: 111698
           Summary: Narrow memory access of compare to byte width
           Product: gcc
           Version: 12.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase:

--cut here--
int m;

_Bool foo (void) { return m & 0x0f0000; }
--cut here--

compiles to:

   0:   f7 05 00 00 00 00 00    testl  $0xf0000,0x0(%rip)
   7:   00 0f 00

The test instruction can be demoted to a byte test from addr+2.

Currently, the demotion works for the lowest byte, so the testcase:

--cut here--
int m;

_Bool foo (void) { return m & 0x0f; }
--cut here--

compiles to:

   0:   f6 05 00 00 00 00 0f    testb  $0xf,0x0(%rip)

which is three bytes shorter.

Any half-way modern Intel and AMD cores will forward any fully contained load, so there is no danger of forwarding stall with recent CPU cores.
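
The equivalence behind the requested demotion can be checked with a small C sketch. It assumes a little-endian target with a 32-bit int (true for x86) and a mask of 0x0f0000, which is what the encoded immediate bytes 00 00 0f 00 and the "addr+2" remark point to; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Full-width test: is any of bits 16..19 of M set?  */
bool
test_word (int m)
{
  return (m & 0x0f0000) != 0;
}

/* Narrowed test: on a little-endian target, bits 16..19 of a 32-bit int
   are the low nibble of the byte at offset 2, so a byte-sized test of
   that address with mask 0x0f gives the same answer.  */
bool
test_byte (int m)
{
  unsigned char bytes[sizeof m];

  assert (sizeof m == 4);   /* assumption of this sketch */
  memcpy (bytes, &m, sizeof m);
  return (bytes[2] & 0x0f) != 0;
}
```

The narrowed form reads one byte that is fully contained in the original four-byte object, which is the case the report says recent cores forward without a stall.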
[Bug target/111657] Memory copy with structure assignment from named address space should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657

--- Comment #5 from Uroš Bizjak ---
I have tried to compile with -mtune=nocona that has:

static stringop_algs nocona_memcpy[2] = {
  {libcall, {{12, loop_1_byte, false}, {-1, rep_prefix_4_byte, false}}},
  {libcall, {{32, loop, false}, {20000, rep_prefix_8_byte, false},
             {100000, unrolled_loop, false}, {-1, libcall, false}}}};

and the compiler produces code as expected in both cases (use unrolled_loop when rep movsq is unavailable):

foo:
        movq    %fs:0, %rdx
        leaq    t@tpoff(%rdx), %rsi
        movl    $30, %ecx
        rep movsq
        ret

bar:
        xorl    %edx, %edx
.L4:
        movl    %edx, %eax
        movq    %gs:s(%rax), %r9
        movq    %gs:s+8(%rax), %r8
        movq    %gs:s+16(%rax), %rsi
        movq    %gs:s+24(%rax), %rcx
        movq    %r9, (%rdi,%rax)
        movq    %r8, 8(%rdi,%rax)
        movq    %rsi, 16(%rdi,%rax)
        movq    %rcx, 24(%rdi,%rax)
        addl    $32, %edx
        cmpl    $224, %edx
        jb      .L4
        addq    %rdx, %rdi
        movq    %gs:s(%rdx), %rax
        movq    %rax, (%rdi)
        movq    %gs:s+8(%rdx), %rax
        movq    %rax, 8(%rdi)
        ret
[Bug target/111657] Memory copy with structure assignment from named address space should be improved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
             Status|NEW                         |ASSIGNED

--- Comment #4 from Uroš Bizjak ---
Created attachment 56030
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56030&action=edit
Proposed patch

The proposed patch declares the libcall algorithm unavailable to non-default address spaces and falls back to a loop if everything else fails.

The following testcase:

--cut here--
struct a { long arr[30]; };

__thread struct a t;
void foo (struct a *dst) { *dst = t; }

__seg_gs struct a s;
void bar (struct a *dst) { *dst = s; }
--cut here--

now compiles (-O2 -mno-sse) to:

foo:
        movq    %fs:0, %rdx
        movl    $30, %ecx
        leaq    t@tpoff(%rdx), %rsi
        rep movsq
        ret

bar:
        xorl    %eax, %eax
.L4:
        movl    %eax, %edx
        addl    $8, %eax
        movq    %gs:s(%rdx), %rcx
        movq    %rcx, (%rdi,%rdx)
        cmpl    $240, %eax
        jb      .L4
        ret

(rep movsq copies only from the default ds: address space)
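
In C terms, the loop emitted for bar corresponds roughly to a word-at-a-time copy of the 30-element array. The sketch below is only a model of that loop shape: it uses ordinary pointers in the default address space so that it runs anywhere, and `copy_words` is a hypothetical name, not something from the patch.

```c
#include <stddef.h>

struct a { long arr[30]; };

/* Copy the structure one long at a time, the shape of the fallback loop
   in the assembly above (8 bytes per iteration on x86_64, 240 bytes in
   total).  The real case reads the source through %gs.  */
void
copy_words (struct a *dst, const struct a *src)
{
  for (size_t i = 0; i < sizeof src->arr / sizeof src->arr[0]; i++)
    dst->arr[i] = src->arr[i];
}
```

Compared with the byte-at-a-time loop from the original report, this does one eighth of the iterations on x86_64 for the same 240 bytes.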
[Bug middle-end/111657] Memory copy with structure assignment from named address space is not working
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657 Uroš Bizjak changed: What|Removed |Added Depends on||79649 --- Comment #1 from Uroš Bizjak --- Looks like another issue with IVopts (PR79649). Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79649 [Bug 79649] Memset pattern in named address space crashes compiler or generates wrong code
[Bug middle-end/111657] New: Memory copy with structure assignment from named address space is not working
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111657

            Bug ID: 111657
           Summary: Memory copy with structure assignment from named
                    address space is not working
           Product: gcc
           Version: 12.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Taken from [1]. Compile the following testcase with -O2 -mno-sse:

--cut here--
struct a { long arr[30]; };

__seg_gs struct a m;
void foo (struct a *dst) { *dst = m; }
--cut here--

the produced assembly:

foo:
.LFB0:
        xorl    %eax, %eax
        cmpq    $240, %rax
        jnb     .L5
.L2:
        movzbl  %gs:m(%rax), %edx
        movb    %dl, (%rdi,%rax)
        addq    $1, %rax
        cmpq    $240, %rax
        jb      .L2
.L5:
        ret

As rightfully said in [1]:

"...and look at the end result. It's complete and utter sh*t:

<...>

to the point that I can only go "WTF"? I mean, it's not just that it does the copy one byte at a time. It literally compares %rax to $240 just after it has cleared it. I look at that code, and I go "a five-year old with a crayon could have done better"."

[1] https://lore.kernel.org/lkml/CAHk-=wh+cfn58xxmlng6dh+eb9-2dyfabxjf2ftsz+vfqvv...@mail.gmail.com/
[Bug target/111340] gcc.dg/bitint-12.c fails on x86_64-apple-darwin or fails on x86_64-linux-gnu with -fPIE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111340 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|--- |11.5 Status|ASSIGNED|RESOLVED --- Comment #11 from Uroš Bizjak --- Fixed.
[Bug target/111340] gcc.dg/bitint-12.c fails on x86_64-apple-darwin or fails on x86_64-linux-gnu with -fPIE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111340 Uroš Bizjak changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com CC|uros at gcc dot gnu.org| --- Comment #5 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #4) > Of course, what exactly falls under the "g" constraint is target specific. > Though, because that constraint also allows the constant to be reload into a > register, > if such constant isn't valid, then RA should have reloaded it into register > or memory. > > Seems the failure is that i386.cc (output_pic_addr_const) doesn't have the > CONST_WIDE_INT case unlike output_addr_const. Indeed. Patch in testing: --cut here-- diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 1cef7ee8f1a..477e6cecc38 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -12344,8 +12344,8 @@ output_pic_addr_const (FILE *file, rtx x, int code) assemble_name (asm_out_file, buf); break; -case CONST_INT: - fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (x)); +CASE_CONST_SCALAR_INT: + output_addr_const (file, x); break; case CONST: --cut here--
[Bug target/111165] [13 regression] builtin strchr miscompiles on Debian/x32 with dietlibc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111165 Uroš Bizjak changed: What|Removed |Added CC||hjl.tools at gmail dot com --- Comment #14 from Uroš Bizjak --- (In reply to Thorsten Glaser from comment #13) > The interesting part is around the occurrence of… > > # eval.c:399: sp = cstrchr(sp, '\0') + 1; > > … in the .s files (it occurs thrice, the first is the beginning of the setup > part, the second and third surround the strlen call, so they’re all within a > bunch of lines). Unfortunately, a runtime bug requires a test that fails at runtime; the attached dumps are not that usable. The fact that the compiler fails for a not so common target makes things even harder. I think that the best way forward is to create a minimized standalone testcase (from Comment #11 it looks like the issue is independent of dietlibc) that can be compiled with -mx32 in a kind of cross-compiler fashion. You can use -maddress-mode=long with -mx32 to create a .s assembly file that is compatible with x86_64, as far as stack handling is concerned. The resulting .s assembly can then be compiled and linked with a C wrapper, so a testcase that eventually fails on x86_64 can be produced. IOW, does the testcase fail when -maddress-mode=long is used?
[Bug target/110762] [11/12/13 Regression] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Target Milestone|13.3|14.0 Resolution|--- |FIXED --- Comment #25 from Uroš Bizjak --- Let's keep this patch to gcc-14+. The compiler now sanitizes every partial vector input to potentially trapping instructions. OTOH, the patch introduced a noticeable runtime regression, so in a follow-up patch (PR110832) -fno-trapping-math removes sanitization fixups (and the documentation documents possible issues with assembler and builtins passing non-conformant FP values), and the -m[no-]partial-vector-fp-math option is introduced to completely disable potentially trapping instructions for partial vectors. So, fixed for gcc-14+.
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 Uroš Bizjak changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #13 from Uroš Bizjak --- Let's keep this patch to gcc-14+. The runtime regression is now due to strict IEEE compliance, where the compiler sanitizes every partial vector input to potentially trapping instructions. OTOH, -fno-trapping-math removes sanitization fixups (and the documentation documents possible issues with assembler and builtins passing non-conformant FP values), and the -m[no-]partial-vector-fp-math option is introduced to completely disable potentially trapping instructions for partial vectors. So, fixed for gcc-14+.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 Uroš Bizjak changed: What|Removed |Added Target Milestone|--- |14.0 Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #10 from Uroš Bizjak --- Implemented for gcc-14.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 --- Comment #7 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #6) > > So, the compiler still expects vec_concat/vec_select patterns to be present. > > v2df foo_v2df (v2df x) > { >return __builtin_shuffle (x, (v2df) { 0, 0 }, (v2di) { 0, 2 }); > } > > The testcase is not a typical vec_merge case, for vec_merge, the shuffle > index should be {0, 3}. Here it happened to be a vec_merge because the > second vector is all zero. And yes for this case, we still need to > vec_concat:vec_select pattern. I guess the original patch is the way to go then.
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #20 from Uroš Bizjak --- Fixed for gcc-13.3+
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 --- Comment #5 from Uroš Bizjak --- Created attachment 55778 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55778&action=edit Failing patch, for reference Patch that converts vec_concat/vec_select sse2_movq128 patterns to vec_merge.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 --- Comment #4 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #3) > in x86 backend expand_vec_perm_1, we always tries vec_merge frist for > !one_operand_p, expand_vselect_vconcat is only tried when vec_merge failed > which means we'd better to use vec_merge instead of vec_select:vec_concat > when available in out backend pattern match. In fact, I tried to convert existing sse2_movq128 patterns to vec_merge, but the patch regressed: -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler movq -FAIL: gcc.target/i386/sse2-pr94680-2.c scan-assembler-not pxor -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-not pxor -FAIL: gcc.target/i386/sse2-pr94680.c scan-assembler-times (?n)(?:mov|psrldq).*%xmm[0-9] 12 So, the compiler still expects vec_concat/vec_select patterns to be present.
[Bug target/94866] Failure to optimize pinsrq of 0 with index 1 into movq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94866 Uroš Bizjak changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #2 from Uroš Bizjak --- Created attachment 55776 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55776&action=edit Proposed patch Patch that introduces alternative MOVQ RTX definition.
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 Uroš Bizjak changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #17 from Uroš Bizjak --- (In reply to r...@cebitec.uni-bielefeld.de from comment #16) > >> Regtested on i386-pc-solaris2.11; compiles both the reduced and the full > >> testcase with ICE. > > > > *WITH* ICE? > > With*out* ICE. Sorry for being too dumb to type ;-) Oh, thanks. I'll take care of the bug later today/tomorrow.
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 --- Comment #15 from Uroš Bizjak --- (In reply to r...@cebitec.uni-bielefeld.de from comment #13) > > --- Comment #11 from Uroš Bizjak --- > > Created attachment 55772 [details] > > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55772&action=edit > > The correct proposed patch > > > > Eh, sorry for wrong attachment. This is the correct one. > > Regtested on i386-pc-solaris2.11; compiles both the reduced and the full > testcase with ICE. *WITH* ICE?
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 --- Comment #12 from Uroš Bizjak --- gcc-13 version: --cut here-- diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 5363b37d448..df476763f85 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -11527,7 +11527,8 @@ (define_insn_and_split "*concat3_3" { split_double_concat (mode, operands[0], operands[3], operands[1]); DONE; -}) +} + [(set_attr "isa" "*,*,*,x64")]) (define_insn_and_split "*concat3_4" [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r") @@ -11545,7 +11546,8 @@ (define_insn_and_split "*concat3_4" { split_double_concat (mode, operands[0], operands[1], operands[2]); DONE; -}) +} + [(set_attr "isa" "*,*,*,x64")]) (define_insn_and_split "*concat3_5" [(set (match_operand:DWI 0 "nonimmediate_operand" "=r,o,o") --cut here--
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 Uroš Bizjak changed: What|Removed |Added Attachment #55771|0 |1 is obsolete|| --- Comment #11 from Uroš Bizjak --- Created attachment 55772 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55772&action=edit The correct proposed patch Eh, sorry for wrong attachment. This is the correct one.
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 --- Comment #10 from Uroš Bizjak --- Created attachment 55771 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55771&action=edit Proposed patch This (untested) patch should solve the PR on trunk.
[Bug target/111010] [13/14 regression] error: unable to find a register to spill compiling GCDAProfiling.c since r13-5092-g4e0b504f26f78f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111010 --- Comment #9 from Uroš Bizjak --- (In reply to r...@cebitec.uni-bielefeld.de from comment #8) > > --- Comment #7 from Richard Biener --- > > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > > index f3a3305ac4f..d38b9d764d8 100644 > > --- a/gcc/config/i386/i386.md > > +++ b/gcc/config/i386/i386.md > > @@ -11511,7 +11511,7 @@ > > }) > > > > (define_insn_and_split "*concat3_3" > > - [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,&r") > > + [(set (match_operand: 0 "nonimmediate_operand" "=ro,r,r,!&r") > > (any_or_plus: > > (ashift: > > (zero_extend: > > > > fixes the issue for me, this disparages the &r,m,m alternative since > > that makes any reloading difficult(?) and the early-clobber output > > makes register pressure even harder to deal with. > > On the gcc-13 branch, it does indeed, both for the reduced testcase and > the original one. I've also successfully regtested the patch just in > case. I think you should add: (set_attr "isa" "*,*,*,x64") attribute to hard disable 32bit targets from having two memory operands.
[Bug target/111023] missing extendv4siv4hi (and friends)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023 Uroš Bizjak changed: What|Removed |Added Assignee|ubizjak at gmail dot com |unassigned at gcc dot gnu.org Status|ASSIGNED|NEW CC||ubizjak at gmail dot com --- Comment #7 from Uroš Bizjak --- The target part is now implemented (even for SSE2). Should we keep this PR open as a tree-vectorizer enhancement?
[Bug target/111023] missing extendv4siv4hi (and friends)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023 --- Comment #4 from Uroš Bizjak --- Created attachment 55753 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55753&action=edit Proposed patch Patch that implements zero/sign extend of <= 64byte vector modes to a wider vector mode also for SSE2.
[Bug target/111023] missing extendv4siv4hi (and friends)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2023-08-18 Ever confirmed|0 |1 --- Comment #3 from Uroš Bizjak --- The idea of implementing some sign/zero extensions using PUNPCKL?? is quite interesting. We can implement extensions for all <= 64byte vector modes that extend to wider vector mode also for SSE2. I have a patch.
[Bug target/111023] missing extendv4siv4hi (and friends)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111023 --- Comment #1 from Uroš Bizjak --- (In reply to Richard Biener from comment #0) > We could vectorize gcc.dg/vect/pr65947-7.c if we implement the > extendv4siv4hi pattern (sign-extend V4HI to V4SI). We can already do > vec_unpacks_lo via > > pcmpgtw %xmm0, %xmm1 > movdqa %xmm0, %xmm2 > punpcklwd %xmm1, %xmm2 > > and that would trivially extend to the required pattern - just the > input is v4hi instead of v8hi. > > Other related patterns are probably missing as well, where we can do > vec_unpack[s]_lo we should be able to implement [zero_]extend. We have: (define_expand "v4hiv4si2" [(set (match_operand:V4SI 0 "register_operand") (any_extend:V4SI (match_operand:V4HI 1 "nonimmediate_operand")))] "TARGET_SSE4_1" in sse.md, so the testcase should be vectorized using -msse4.1. Is there any other pattern missing for efficient vectorization?
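To illustrate why the vec_unpacks_lo sequence quoted above already amounts to a sign extension, here is a scalar C model of it. This is only a sketch: the function name is hypothetical, it is not the sse.md pattern, and it assumes the compare is done against a zeroed register.

```c
#include <stdint.h>

/* Scalar model of pcmpgtw + punpcklwd on four 16-bit lanes.
   Hypothetical helper for illustration only.  */
static void sign_extend_v4hi_model(const int16_t in[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* pcmpgtw against zero: all-ones where 0 > in[i], else zero.  */
        uint16_t mask = (0 > in[i]) ? 0xFFFFu : 0x0000u;
        /* punpcklwd: the value becomes the low word, the mask the high word.  */
        out[i] = (uint32_t)(uint16_t)in[i] | ((uint32_t)mask << 16);
    }
}
```

Each output lane holds the two's complement bit pattern of the sign-extended input, which is what the extend pattern has to produce for a V4HI input.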
[Bug tree-optimization/110991] [14 Regression] Dead Code Elimination Regression at -O2 since r14-1135-gc53f51005de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110991 Uroš Bizjak changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2023-08-11 --- Comment #1 from Uroš Bizjak --- For gcc-13, fre4 pass is able to simplify the scalar code, but nothing simplifies vectorized code in gcc-14.
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 Uroš Bizjak changed: What|Removed |Added Last reconfirmed||2023-08-09 Keywords|needs-bisection | Ever confirmed|0 |1 Status|UNCONFIRMED |NEW
[Bug fortran/110957] New: -ffpe-trap and -ffpe-summary options issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110957 Bug ID: 110957 Summary: -ffpe-trap and -ffpe-summary options issues Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: ubizjak at gmail dot com Target Milestone: --- A couple of issues with -ffpe-trap and -ffpe-summary options: a) Invalid argument report should be switched: $ gfortran -ffpe-summary=aaa ac.f90 f951: Fatal Error: Argument to ‘-ffpe-trap’ is not valid: aaa compilation terminated. $ gfortran -ffpe-trap=aaa ac.f90 f951: Fatal Error: Argument to ‘-ffpe-summary’ is not valid: aaa compilation terminated. b) Specifying also -fno-trapping-math should be detected and handled $ gfortran -ffpe-trap=invalid -fno-trapping-math ac.f90 [no diagnostics] The issue b) should either report incompatibility between options, or force -ftrapping-math (probably with a warning). Ideally, -ffpe-* should always set flag_trapping_math, in case the compiler switches to no trapping math by default in future.
[Bug rtl-optimization/110587] [14 regression] 96% pr28071.c compile time regression since r14-2337-g37a231cc7594d1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110587 --- Comment #20 from Uroš Bizjak --- Can we revert the Comment #13 kludge now?
[Bug target/110762] [11/12/13 Regression] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #22 from Uroš Bizjak --- It looks to me that partial vector half-float instructions have the same issue.
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 --- Comment #10 from Uroš Bizjak --- (In reply to Hongtao.liu from comment #9) > for mov_internal, we can just set alternative (v,v) with mode DI, then > it will use vmovq, for other alternatives which set sse_regs, the > instructions has already cleared the upper bits. Move instructions can be sanitized in ix86_expand_vector_move. If the target is in V2SFmode and the source is a subreg register, then movq_v2sf_to_sse should be emitted. However, we would still like to emit MOVAPS reg, reg for V2SF to V2SF moves, because MOVAPS may be eliminated by hardware, while MOVQ won't be.
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 --- Comment #8 from Uroš Bizjak --- (In reply to Richard Biener from comment #6) > Do we know whether we could in theory improve the sanitizing by optimization > without -funsafe-math-optimizations (I think -fno-trapping-math, > -ffinite-math-only -fno-signalling-nans should be a better guard?)? Regarding the sanitizing, we can remove all sanitizing MOVQ instructions between trapping instructions (IOW, the result of ADDPS is guaranteed to have zeros in the high part outside V2SF, so MOVQ is unnecessary in front of a follow-up MULPS). I think that some instruction back-walking pass on the RTL insn stream would be able to identify these unnecessary instructions and remove them. Also, as mentioned elsewhere, it is really hard to get non-zero value to the highpart of XMM register. The compiler takes great care to always load values via MOVQ, so one has to craft a special code that works around all these fences. OTOH, in two years since gcc-11 was released with the V2SF support, not a single PR involving spurious exceptions was reported. Even capacita benchmark enables: Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL without problems. As an example here, it looks that polyhedron capacita greatly benefits from V2SF vectors, and I was surprised that sanitizing MOVQ has such an effect here.
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 --- Comment #7 from Uroš Bizjak --- (In reply to Richard Biener from comment #6) > Do we know whether we could in theory improve the sanitizing by optimization > without -funsafe-math-optimizations (I think -fno-trapping-math, > -ffinite-math-only -fno-signalling-nans should be a better guard?)? I was looking at -funsafe-math-optimizations because the compiler links in crtfastmath.c which sets DAZ and FTZ flags, so eventual denormals won't bother us. -fu-m-o also enables -fno-trapping-math, which assumes masked FP exceptions, so we can still allow V2SF infinities and NaNs. FYI, clang enables this optimization by default, since it defaults to -fno-trapping-math. It seems to me that they don't care about denormals.
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 Uroš Bizjak changed: What|Removed |Added CC||ubizjak at gmail dot com --- Comment #5 from Uroš Bizjak --- (In reply to Richard Biener from comment #3) > Maybe r14-2786-gade30fad6669e5 Yes. This is the cost to sanitize operands before every operation. However, we can recover the performance for -funsafe-math-optimizations with the patch, attached to the previous message, from: 21,592075559 seconds time elapsed to: 20,047717312 seconds time elapsed
[Bug middle-end/110832] [14 Regression] 14% capacita -O2 regression between g:9fdbd7d6fa5e0a76 (2023-07-26 01:45) and g:ca912a39cccdd990 (2023-07-27 03:44) on zen3 and core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110832 --- Comment #4 from Uroš Bizjak --- Created attachment 55652 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55652&action=edit Patch to recover performance for -funsafe-math-optimizations This patch will recover performance with -funsafe-math-optimizations.
[Bug target/110762] [11/12/13 Regression] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Uroš Bizjak changed: What|Removed |Added CC|uros at gcc dot gnu.org| Target Milestone|11.5|13.3 --- Comment #21 from Uroš Bizjak --- (In reply to Richard Biener from comment #20) > Thanks a lot. So this should now be fully fixed in GCC 14. The original > testcase is also broken in GCC 11, 12 and 13 but not 10, but I'm not sure > how far we'd want to backport this change - I'd consider the 13 branch but > that's probably it. After some time soaking, that is. The issue can be triggered only with specially crafted code (such as the one in Comment #0 / Comment #12) that deliberately exposes the problem. Otherwise, the approach from PR 95046 is quite robust, and there have been no PRs in this area reported, although V2SF is auto-vectorized by default. The patch is written in such a way as to minimize exposure to subregs (the temporary V4SFmode output register is used and later copied via subreg to the target V2SFmode operand) to avoid eventual problems in RA. GCC 13.2 was just released, so I think the patch could be backported to the gcc-13 branch in the first week of August, but as you propose, only to the gcc-13 branch, and not any further.
[Bug rtl-optimization/91838] [8/9 Regression] incorrect use of shr and shrx to shift by 64, missed optimization of vector shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91838 --- Comment #18 from Uroš Bizjak --- (In reply to Richard Biener from comment #17) > Interestingly even with -mno-sse we somehow have a shift for V2QImode. This is implemented by a combination of shl rl,cl and shl rh,cl, so no XMM registers are needed.
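A scalar C sketch of the idea, assuming the two V2QI elements live in the low and high byte of one 16-bit register (the rl/rh pair mentioned above). The helper name is hypothetical, and the 5-bit count mask reflects how x86 treats shift counts for 8-bit operands:

```c
#include <stdint.h>

/* Model of "shl rl,cl ; shl rh,cl": each byte is shifted on its own,
   so no bit crosses from the low element into the high element.
   Hypothetical helper for illustration only.  */
static uint16_t shl_v2qi_model(uint16_t packed, unsigned count)
{
    uint8_t lo = (uint8_t)(packed & 0xFFu);
    uint8_t hi = (uint8_t)(packed >> 8);

    count &= 31u;                          /* x86 masks the count to 5 bits */
    lo = (uint8_t)((unsigned)lo << count); /* shl rl, cl */
    hi = (uint8_t)((unsigned)hi << count); /* shl rh, cl */

    return (uint16_t)(lo | ((unsigned)hi << 8));
}
```

For the input 0x81C3 and a count of 1, a plain 16-bit shift would give 0x0386, while the per-byte model gives 0x0286, because the bit shifted out of the low byte is dropped instead of leaking into the high byte.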
[Bug target/110788] Spilling to mask register for GPR vec_duplicate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110788 --- Comment #3 from Uroš Bizjak --- (In reply to Richard Biener from comment #0) > I suppose it could also be a missed optimization in REE since I think > the HImode regs should already be zero-extended? No, only SImode moves have implicit zero extensions. Plain HImode and QImode moves behave as inserts into the lowpart of the wide register.
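The difference can be sketched with a small C model of a 64-bit general register. The helper names are hypothetical and only mirror the semantics described above; they are not GCC code:

```c
#include <stdint.h>

/* SImode move: the upper 32 bits are implicitly zeroed.  */
static uint64_t write_si_model(uint64_t reg, uint32_t value)
{
    (void)reg;                      /* old contents are irrelevant */
    return (uint64_t)value;
}

/* HImode move: behaves as an insert into the low 16 bits.  */
static uint64_t write_hi_model(uint64_t reg, uint16_t value)
{
    return (reg & ~UINT64_C(0xFFFF)) | value;
}

/* QImode move: behaves as an insert into the low 8 bits.  */
static uint64_t write_qi_model(uint64_t reg, uint8_t value)
{
    return (reg & ~UINT64_C(0xFF)) | value;
}
```

Starting from a register holding all ones, only the SImode write leaves a zero-extended value behind; the HImode and QImode writes keep the stale upper bits, which is why an explicit extension is still needed.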
[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #18 from Uroš Bizjak --- (In reply to Richard Biener from comment #17) > > compiles to: > > > > movq %xmm1, %xmm1 # 8 [c=4 l=4] *vec_concatv4sf_0 > > movq %xmm0, %xmm0 # 9 [c=4 l=4] *vec_concatv4sf_0 > > movq %xmm2, %xmm2 # 12 [c=4 l=4] *vec_concatv4sf_0 > > mulps %xmm1, %xmm0 # 10 [c=16 l=3] *mulv4sf3/0 > > movq %xmm0, %xmm0 # 13 [c=4 l=4] *vec_concatv4sf_0 > > so this one is obviously redundant - I suppose at the RTL level we have > no chance of noticing this. I hope for integer vector operations we > avoid these ops? I think this will make epilog vectorization with V2SFmode > a bad idea, we'd need to appropriately disqualify this in the costing > hooks. Yes, the redundant movq is emitted only in front of V2SFmode trapping operations. So, all integer, V2SF logic and swizzling operations are still implemented directly with "emulated" instructions. > > I wonder if combine could for example combine a v2sf load with the > upper half zeroing for the next use? Likewise for arithmetics. The patch already does that. We know that V2SF load zeroes the upper half, so there is no additional MOVQ emitted. To illustrate, the testcase: --cut here-- typedef float __attribute__((vector_size(8))) v2sf; v2sf m; v2sf test (v2sf a) { return a - m; } --cut here-- compiles to: movq m(%rip), %xmm1 # 6 [c=4 l=8] *vec_concatv4sf_0 movq %xmm0, %xmm0 # 7 [c=4 l=4] *vec_concatv4sf_0 subps %xmm1, %xmm0 # 8 [c=12 l=3] *subv4sf3/0 As far as arithmetic is concerned, perhaps some back-walking RTL optimization pass can figure out that the preceding trapping V2SFmode operation guarantees zeros in the upper half and remove clearing insn. However, MOVQ xmm,xmm is an extremely fast instruction with latency of 1 and reciprocal throughput of 0.33, so I guess it is not of much concern.
[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Status|NEW |ASSIGNED --- Comment #16 from Uroš Bizjak --- Created attachment 55636 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55636&action=edit Proposed patch Proposed patch clears the upper half of a V4SFmode operand register before all potentially trapping instructions. The testcase from comment #12 now compiles to: movq %xmm1, %xmm1 # 9 [c=4 l=4] *vec_concatv4sf_0 movq %xmm0, %xmm0 # 10 [c=4 l=4] *vec_concatv4sf_0 addps %xmm1, %xmm0 # 11 [c=12 l=3] *addv4sf3/0 This approach addresses issues with traps (Comment #0), as well as with denormal/invalid values (Comment #14). An obvious exception to the rule is a division, where the value != 0.0 should be loaded into the upper half of the denominator. The patch effectively tightens the solution from PR95046 by clearing upper halves of all operand registers before every potentially trapping instruction. The testcase: --cut here-- typedef float __attribute__((vector_size(8))) v2sf; v2sf test (v2sf a, v2sf b, v2sf c) { return a * b - c; } --cut here-- compiles to: movq %xmm1, %xmm1 # 8 [c=4 l=4] *vec_concatv4sf_0 movq %xmm0, %xmm0 # 9 [c=4 l=4] *vec_concatv4sf_0 movq %xmm2, %xmm2 # 12 [c=4 l=4] *vec_concatv4sf_0 mulps %xmm1, %xmm0 # 10 [c=16 l=3] *mulv4sf3/0 movq %xmm0, %xmm0 # 13 [c=4 l=4] *vec_concatv4sf_0 subps %xmm2, %xmm0 # 14 [c=12 l=3] *subv4sf3/0 The implementation simply calls V4SFmode operation, so we can remove all "emulated" SSE2 V2SFmode instructions and SSE2 V2SFmode alternatives from 3dNOW! insn patterns.
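The effect of the sanitizing MOVQ can be modelled in portable C. This is a sketch under stated assumptions: the struct and function names are hypothetical, the XMM register is modelled as four floats, and the V2SF value is assumed to occupy lanes 0 and 1.

```c
/* Model of an XMM register holding four single-precision lanes.
   Hypothetical type for illustration only.  */
typedef struct { float lane[4]; } v4sf_model;

/* MOVQ xmm, xmm: keep the low 64 bits, clear the upper 64 bits.  */
static v4sf_model movq_model(v4sf_model x)
{
    x.lane[2] = 0.0f;
    x.lane[3] = 0.0f;
    return x;
}

/* MULPS: operates on all four lanes, including the unused upper half.  */
static v4sf_model mulps_model(v4sf_model a, v4sf_model b)
{
    v4sf_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* V2SF multiply as described above: sanitize both inputs, then use
   the full-width V4SF operation.  */
static v4sf_model mul_v2sf_sanitized(v4sf_model a, v4sf_model b)
{
    return mulps_model(movq_model(a), movq_model(b));
}
```

Without the two movq_model calls, leftover values in lanes 2 and 3 would take part in the multiply and could overflow or produce invalid results; with them, the upper lanes only ever compute 0 * 0.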
[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #13 from Uroš Bizjak --- I think we should put all partial vector V2SF operations under !flag_trapping_math.
[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #10 from Uroš Bizjak --- (In reply to Richard Biener from comment #7) > I guess for the specific usage we need to wrap this in an UNSPEC? Probably, so a MOVQ xmm, xmm insn should be emitted for __builtin_ia32_storelps (AKA _mm_storel_pi), so the top 64bits will be cleared. There is already *vec_concatv4sf_0 that looks appropriate to implement the move.
[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #3 from Uroš Bizjak --- (In reply to Richard Biener from comment #1) > So what's the issue? That this is wrong for -ftrapping-math? Or that the > return value has undefined contents in the upper half? (I don't think the > ABI specifies how V2SF is returned) __m64 is classified as SSE class, returned in XMM register.
[Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717 Uroš Bizjak changed: What|Removed |Added Assignee|ubizjak at gmail dot com |unassigned at gcc dot gnu.org Status|ASSIGNED|NEW --- Comment #8 from Uroš Bizjak --- (In reply to CVS Commits from comment #7) > The master branch has been updated by Uros Bizjak : The patch implements the transform for x86 targets only. Due to the eventual STV transformation, x86 targets handle double-word operations in their own way. I'll leave the target-independent implementation to someone else.
[Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717 Uroš Bizjak changed: What|Removed |Added Target Milestone|--- |14.0 CC|uros at gcc dot gnu.org| --- Comment #6 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #5) > Thanks. > Shouldn't > INTVAL (operands[2]) < * BITS_PER_UNIT > be > UINTVAL (operands[2]) < * BITS_PER_UNIT > just to make sure it doesn't trigger for negative? Ah, yes, I'll change it.
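The point of the suggested change can be shown with plain integers. In this sketch the helper names are hypothetical, `int64_t` stands in for HOST_WIDE_INT, and `bits` stands in for the width bound from the condition:

```c
#include <stdbool.h>
#include <stdint.h>

/* INTVAL-style check: a signed compare, so negative counts pass.  */
static bool count_ok_signed(int64_t count, int64_t bits)
{
    return count < bits;
}

/* UINTVAL-style check: the count is reinterpreted as unsigned, so a
   negative value becomes a huge number and fails the bound.  */
static bool count_ok_unsigned(int64_t count, int64_t bits)
{
    return (uint64_t)count < (uint64_t)bits;
}
```

With a bound of 32, a count of -1 passes the signed check but is rejected by the unsigned one, while counts 0 through 31 pass both.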
[Bug rtl-optimization/110717] Double-word sign-extension missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110717 Uroš Bizjak changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com --- Comment #4 from Uroš Bizjak --- Created attachment 55578 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55578&action=edit Proposed patch Patch in testing.
[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206 Uroš Bizjak changed: What|Removed |Added Resolution|--- |FIXED Target Milestone|14.0|12.4 Status|ASSIGNED|RESOLVED --- Comment #20 from Uroš Bizjak --- Fixed for gcc-12.4+.
[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206 --- Comment #16 from Uroš Bizjak --- v2 patch at [1]. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624491.html
[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206 --- Comment #15 from Uroš Bizjak --- Created attachment 55537 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55537&action=edit Proposed patch. v2 patch in testing. This version prevents emission of an invalid REG_EQUAL note in cprop.cc/try_replace_reg when the original, non-simplified RTX contains a SUBREG. The patch is in effect a one-liner: @@ -795,7 +796,8 @@ try_replace_reg (rtx from, rtx to, rtx_insn *insn) /* If we've failed perform the replacement, have a single SET to a REG destination and don't yet have a note, add a REG_EQUAL note to not lose information. */ - if (!success && note == 0 && set != 0 && REG_P (SET_DEST (set))) + if (!success && note == 0 && set != 0 && REG_P (SET_DEST (set)) + && !contains_paradoxical_subreg_p (SET_SRC (set))) note = set_unique_reg_note (insn, REG_EQUAL, copy_rtx (src)); } but we have to move contains_paradoxical_subreg_p to rtlanal.cc.
[Bug target/106966] [12/13/14 Regression] alpha cross build crashes gcc-12 "internal compiler error: in emit_move_insn"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106966 Uroš Bizjak changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #17 from Uroš Bizjak --- Thanks for helping with tests! Fixed for gcc-12.4+
[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206 --- Comment #14 from Uroš Bizjak --- (In reply to Uroš Bizjak from comment #10) > (In reply to Uroš Bizjak from comment #9) > > and simplify_replace_rtx simplifies the above to: > > > > (gdb) p debug_rtx (src) > > (const_vector:V8HI [ > > (const_int 204 [0xcc]) repeated x8 > > ]) > > Patched compiler simplifies to: > > (gdb) p debug_rtx (src) > (const_vector:V8HI [ > (const_int 204 [0xcc]) repeated x4 > (const_int 0 [0]) repeated x4 > ]) The patched compiler puts the above in REG_EQUAL note. While the value is "more correct", I don't think the compiler has the right to set REG_EQUAL note when the top 4 bytes are actually undefined (as a result of an operation with an undefined input, which is the case with paradoxical subreg).
[Bug rtl-optimization/110206] [14 Regression] wrong code with -Os -march=cascadelake since r14-1246
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110206

--- Comment #13 from Uroš Bizjak ---
(In reply to Richard Biener from comment #12)
> I can see cprop1 adds the REG_EQUAL note:
>
> (insn 22 21 23 4 (set (reg:V8HI 100)
>         (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
>                 (parallel [
>                         (const_int 0 [0])
>                         (const_int 1 [0x1])
>                         (const_int 2 [0x2])
>                         (const_int 3 [0x3])
>                         (const_int 4 [0x4])
>                         (const_int 5 [0x5])
>                         (const_int 6 [0x6])
>                         (const_int 7 [0x7])
>                     ] "t.c":12:42 7557 {sse4_1_zero_extendv8qiv8hi2}
> - (expr_list:REG_DEAD (reg:V4QI 98)
> -(nil)))
> + (expr_list:REG_EQUAL (const_vector:V8HI [
> +(const_int 204 [0xcc]) repeated x8
> +])
> +(expr_list:REG_DEAD (reg:V4QI 98)
> +(nil
>
> but I don't see yet what the actual wrong transform based on this REG_EQUAL
> note is?

We constant fold a V4QImode const_vector to a V8HImode const_vector with 8 defined elements. We started with undefined top four bytes, but now we magically define them.

>
> It looks like we CSE the above with
>
> - 46: r122:V8QI=[`*.LC3']
> - REG_EQUAL const_vector
> - 48: r125:V8HI=zero_extend(vec_select(r122:V8QI#0,parallel))
> - REG_EQUAL const_vector
> - REG_DEAD r122:V8QI
> - 49: r126:V8HI=r124:V8HI*r125:V8HI
> - REG_DEAD r125:V8HI
> + 49: r126:V8HI=r124:V8HI*r100:V8HI
>
> but otherwise do nothing. So the issue is that we rely on the "undefined"
> vals to have a specific value (from the earlier REG_EQUAL note) but actual
> code generation doesn't ensure this (it doesn't need to). That said,
> the issue isn't the constant folding per-se but that we do not actually
> constant fold but register an equality that doesn't hold.

The above CSE is the consequence of the REG_EQUAL note that the compiler set on the insn. The compiler claims that the value of (insn 22) equals an array of 8 consts { 204 , ... , 204 }, but in reality (cf. comment #3) the value in the register %xmm4 before the VPMULLW insn is { 0, 0, 0, 0, 204, 204, 204, 204 }.
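The mismatch described above can be reproduced with a small scalar model. This is only a sketch under stated assumptions: the toy_* functions are hypothetical, lane 0 is taken to be the low-order element, and the fill parameter stands in for whatever the hardware happens to leave in the bytes that the paradoxical subreg does not define.

```c
#include <stdint.h>
#include <string.h>

/* Model of insn 22: view a 4-byte value as 16 bytes (paradoxical
   subreg), select the low 8 bytes and zero-extend each to 16 bits.
   FILL is an assumption standing in for the undefined upper bytes.  */
static void
toy_zero_extend_low8 (const uint8_t src[4], uint8_t fill, uint16_t out[8])
{
  uint8_t wide[16];
  memset (wide, fill, sizeof wide);  /* Undefined bytes: any value is allowed.  */
  memcpy (wide, src, 4);             /* Only the low 4 bytes are defined.  */
  for (int i = 0; i < 8; i++)
    out[i] = wide[i];
}

/* Lane-wise 16-bit multiply keeping the low 16 bits of each product,
   which is what VPMULLW computes per element.  */
static void
toy_mullw (const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
  for (int i = 0; i < 8; i++)
    out[i] = (uint16_t) ((uint32_t) a[i] * b[i]);
}
```

Calling toy_zero_extend_low8 on four 0xcc bytes with fill 0 gives 204 in lanes 0 to 3 and 0 in lanes 4 to 7, while fill 0xcc gives 204 in all eight lanes. Both are legal outcomes of the same insn, so a note claiming eight lanes of 204 is an equality that need not hold, and a multiply that is CSEd against that constant can differ from the multiply on the real register contents in lanes 4 to 7.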