https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107099
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-10-06
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
uncprop should really be merged into the out-of-SSA coalescing machinery; its
sole purpose is to avoid edge copies (which for constants you always have to
perform) when there's the possibility to generate reg-reg moves or coalescing
instead. As such it's a heuristic that needs to be weighed against other
heuristics in place.

In this case we emit a copy instead of a zeroing (and zeroing is cheap on the
target, so zero might be a good candidate to special-case anyway), but we have
the zeroing in the loop body now. The odd thing is that we perform the desired
coalescing:

Partition 0 (_1(D) - 1 6 11 )

(_1(D) is created for DECL_RESULT) but expansion of __builtin_ia32_ptestz128
does

;; _11 = __builtin_ia32_ptestz128 (_4, _4);

(insn 17 16 18 (set (reg:SI 90)
        (const_int 0 [0])) "include/smmintrin.h":69:10 -1
     (nil))

(insn 18 17 19 (set (reg:CC 17 flags)
        (unspec:CC [
                (reg:V2DI 82 [ _4 ]) repeated x2
            ] UNSPEC_PTEST)) "include/smmintrin.h":69:10 -1
     (nil))

(insn 19 18 20 (set (strict_low_part (subreg:QI (reg:SI 90) 0))
        (eq:QI (reg:CC 17 flags)
            (const_int 0 [0]))) "include/smmintrin.h":69:10 -1
     (nil))

(insn 20 19 0 (set (reg:SI 86 [ <retval> ])
        (reg:SI 90)) "include/smmintrin.h":69:10 -1
     (nil))

which eventually ends up as

(insn 21 20 22 4 (set (reg:SI 86 [ <retval> ])
        (eq:SI (reg:CC 17 flags)
            (const_int 0 [0]))) "t.c":7:12 940 {*setcc_si_1_movzbl}
     (nil))
(jump_insn 22 21 23 4 (set (pc)
        (if_then_else (ne (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 32)
            (pc))) "t.c":7:12 946 {*jcc}
     (int_list:REG_BR_PROB 59055804 (expr_list:REG_DEAD (reg:CCZ 17 flags)
        (nil)))

That's not foreseen by coalescing: _11 is actually not needed inside the loop,
only the CC result is.

So heuristically we might want to disable uncprop when definitions from calls
(to [target] builtins) would be used.
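
For readers less familiar with uncprop, the following is a minimal source-level sketch of the idea discussed above. It is not the testcase from this PR, and the function names are made up for illustration. The real pass works on GIMPLE PHI arguments rather than on C source. Where an edge is only taken when some SSA name equals a constant, the constant PHI argument on that edge can be replaced by the name. The copy then has a chance of being coalesced away instead of needing a constant load on the edge.

```c
#include <assert.h>

/* Before: on the x == 0 edge the merged value is the constant 0, so
   out-of-SSA has to materialize that constant on the edge. */
int phi_with_constant(int x, int y)
{
    int r;
    if (x == 0)
        r = 0;          /* edge copy of a constant */
    else
        r = y;
    return r;
}

/* After (conceptually): x is known to be 0 on that edge, so the constant
   is replaced by x, giving a reg-reg copy that may be coalesced. */
int phi_after_uncprop(int x, int y)
{
    int r;
    if (x == 0)
        r = x;          /* same value, now a use of x instead of a constant */
    else
        r = y;
    return r;
}

/* Returns 1 when both forms agree for every (x, y) pair in [lo, hi]. */
int forms_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (phi_with_constant(x, y) != phi_after_uncprop(x, y))
                return 0;
    return 1;
}

/* Small self-check over a hypothetical input range. */
void check_forms(void)
{
    assert(forms_agree(-8, 8));
}
```

The two forms always compute the same value; the difference is only in which operand feeds the merge. The second form pays off when x already lives in a register that can be coalesced with the result. As the comment above describes, it can lose when it keeps a value such as _11 live that would otherwise not be needed.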