https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107099

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-10-06
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
uncprop should really be merged into the out-of-SSA coalescing machinery; its
sole purpose is to avoid edge copies (which for constants you always have to
perform) when there's the possibility to generate reg-reg moves or coalescing
instead.
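A source-level analogy of what uncprop does may help here. The pass itself rewrites PHI arguments in GIMPLE. These C functions only mimic the two shapes, and compiling them will not necessarily show either shape in the generated code. Both function names are hypothetical, for illustration only. On the edge where x == 0 is known, the constant 0 flowing into r is replaced by x itself, so r and x become candidates for coalescing instead of needing a constant materialized on that edge:

```c
#include <assert.h>

/* Shape after constant propagation: on the x == 0 edge the "PHI" for r
   receives the constant 0, which out-of-SSA must materialize as an edge copy. */
int before_uncprop(int x, int y)
{
    int r;
    if (x == 0)
        r = 0;      /* constant PHI argument */
    else
        r = y;
    return r;
}

/* Shape after uncprop: x is known to equal 0 on this edge, so the constant
   is replaced by x.  r and x may then be coalesced, leaving at most a
   reg-reg move instead of a constant load. */
int after_uncprop(int x, int y)
{
    int r;
    if (x == 0)
        r = x;      /* same value as 0 on this edge */
    else
        r = y;
    return r;
}

/* The rewrite is value-preserving: both shapes agree for every input. */
void check_same(int x, int y)
{
    assert(before_uncprop(x, y) == after_uncprop(x, y));
}
```

The trade-off discussed below is that extending x's live range this way is only a win if x is needed in a register anyway. If x is, for example, a builtin's result whose only real use is a condition-code test, keeping the cheap constant may be better.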

As such it's a heuristic that needs to be weighed against other heuristics
in place.

In this case we emit a copy instead of a zeroing (and zeroing is cheap on the
target, so zero might be a good candidate to special-case anyway), but as a
result the zeroing now ends up inside the loop body.

The odd thing is that we do perform the desired coalescing:

Partition 0 (_1(D) - 1 6 11 )

(_1(D) is created for DECL_RESULT)

but the expansion of __builtin_ia32_ptestz128 produces

;; _11 = __builtin_ia32_ptestz128 (_4, _4);

(insn 17 16 18 (set (reg:SI 90)
        (const_int 0 [0])) "include/smmintrin.h":69:10 -1
     (nil))

(insn 18 17 19 (set (reg:CC 17 flags)
        (unspec:CC [
                (reg:V2DI 82 [ _4 ]) repeated x2
            ] UNSPEC_PTEST)) "include/smmintrin.h":69:10 -1
     (nil))

(insn 19 18 20 (set (strict_low_part (subreg:QI (reg:SI 90) 0))
        (eq:QI (reg:CC 17 flags)
            (const_int 0 [0]))) "include/smmintrin.h":69:10 -1
     (nil))

(insn 20 19 0 (set (reg:SI 86 [ <retval> ])
        (reg:SI 90)) "include/smmintrin.h":69:10 -1
     (nil))

which eventually ends up as

(insn 21 20 22 4 (set (reg:SI 86 [ <retval> ])
        (eq:SI (reg:CC 17 flags)
            (const_int 0 [0]))) "t.c":7:12 940 {*setcc_si_1_movzbl}
     (nil))
(jump_insn 22 21 23 4 (set (pc)
        (if_then_else (ne (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 32)
            (pc))) "t.c":7:12 946 {*jcc}
     (int_list:REG_BR_PROB 59055804 (expr_list:REG_DEAD (reg:CCZ 17 flags)
            (nil)))

That is not foreseen by coalescing: _11 itself is not actually needed inside
the loop, only the CC result is.

So heuristically we might want to disable uncprop when definitions from
calls (to [target] builtins) would be used.
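The proposed heuristic can be sketched on a toy model. Everything below is hypothetical: the struct, enum and function names are invented for illustration and are NOT GCC's internal API. The comment also leaves open whether all call results or only calls to (target) builtins should block the rewrite, so that choice is a parameter here:

```c
#include <stdbool.h>

/* Hypothetical toy model of an SSA name's definition; NOT GCC's internal API. */
enum toy_def_kind { TOY_DEF_CONST, TOY_DEF_ASSIGN, TOY_DEF_CALL };

struct toy_ssa_name {
    enum toy_def_kind kind;
    bool callee_is_builtin;   /* meaningful only when kind == TOY_DEF_CALL */
};

/* Hypothetical check: may a constant PHI argument be replaced by an SSA name
   known to equal it on the incoming edge?  When only_builtins is true, only
   results of calls to builtins block the rewrite; otherwise any call result does. */
bool toy_allow_uncprop(const struct toy_ssa_name *equiv, bool only_builtins)
{
    if (equiv->kind != TOY_DEF_CALL)
        return true;    /* not a call result: keep the existing behavior */
    if (only_builtins && !equiv->callee_is_builtin)
        return true;    /* ordinary call, and only builtins are excluded */
    return false;       /* e.g. a ptestz builtin result: keep the constant */
}
```

With only_builtins set, a value like _11 from __builtin_ia32_ptestz128 would keep the constant PHI argument, so its live range is not extended into the loop.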
