https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86891
Jakub Jelinek <jakub at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |collison at gcc dot gnu.org, | |jakub at gcc dot gnu.org, | |ktkachov at gcc dot gnu.org, | |rth at gcc dot gnu.org Component|tree-optimization |target --- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> --- Started with r262890. I believe the bug is in the RTL of the *sub<mode>3_carryinCV* patterns (though I admit I don't know exactly what those insns do on aarch64). Through -O and disabling of some GIMPLE opts we end up with arguments to SUB_OVERFLOW that aren't constant during expansion, but will be turned into constants during RTL optimizations. We effectively are checking if unsigned subtraction 0xfff9U - 0xfffffffffffffffffffffffffffffff8U overflows. The aarch64 expander has a pattern for usubvti4, and on input we have that 0xfff9U constant in (reg:TI 111) and the -(unsigned __int128) 8 constant in (reg:TI 117). The expander emits: (insn 23 22 24 2 (parallel [ (set (reg:CC 66 cc) (compare:CC (subreg:DI (reg:TI 111) 0) (subreg:DI (reg:TI 117) 0))) (set (reg:DI 121) (minus:DI (subreg:DI (reg:TI 111) 0) (subreg:DI (reg:TI 117) 0))) ]) "pr86891.c":9 298 {subdi3_compare1} (nil)) (insn 24 23 25 2 (set (reg:DI 123) (subreg:DI (reg:TI 111) 8)) "pr86891.c":9 47 {*movdi_aarch64} (nil)) (insn 25 24 26 2 (parallel [ (set (reg:CC 66 cc) (compare:CC (sign_extend:TI (reg:DI 123)) (plus:TI (sign_extend:TI (subreg:DI (reg:TI 117) 8)) (ltu:TI (reg:CC 66 cc) (const_int 0 [0]))))) (set (reg:DI 122) (minus:DI (minus:DI (reg:DI 123) (subreg:DI (reg:TI 117) 8)) (ltu:DI (reg:CC 66 cc) (const_int 0 [0])))) ]) "pr86891.c":9 367 {*subdi3_carryinCV} (nil)) (insn 26 25 27 2 (set (subreg:DI (reg:TI 120) 0) (reg:DI 121)) "pr86891.c":9 47 {*movdi_aarch64} (nil)) (insn 27 26 28 2 (set (subreg:DI (reg:TI 120) 8) (reg:DI 122)) "pr86891.c":9 47 {*movdi_aarch64} (nil)) (jump_insn 28 27 71 2 (set (pc) (if_then_else (ltu (reg:CC 66 cc) (const_int 0 [0])) (label_ref 31) (pc))) "pr86891.c":9 9 {condjump} (int_list:REG_BR_PROB 536868 (nil)) -> 31) I assume these instructions DTRT at runtime. Now consider what happens when you propagate those constants into that RTL. insn 23 sets pseudo 121 to 0xfff9ULL - (-8ULL), i.e. 0x10001ULL with overflow (first argument is smaller than second argument), so I expect carry is set. Now, insn 25 I'd hope performs r122 = 0ULL - (-1ULL) - 1 (the last being carry), which is I think modelled properly by the (set (reg:DI 122) (minus ...)) part of the insn. What is incorrect is the compare part of insn 25. Because it is compare 0 (-(unsigned __int128)1 + carry), i.e. comparison of 0 and 0, so no CC is set, even when I hope the instruction actually should set carry (if it doesn't it couldn't be used even at runtime, because 0xfff9ULL - (-(unsigned __int128)8) does overflow. Note, the i386.md pattern which I'd think would be similar in what it does at runtime, is: (define_insn "sub<mode>3_carry_ccc" [(set (reg:CCC FLAGS_REG) (compare:CCC (zero_extend:<DWI> (match_operand:DWIH 1 "register_operand" "0")) (plus:<DWI> (ltu:<DWI> (reg:CC FLAGS_REG) (const_int 0)) (zero_extend:<DWI> (match_operand:DWIH 2 "x86_64_sext_operand" "rmWe"))))) (clobber (match_scratch:DWIH 0 "=r"))] "" "sbb{<imodesuffix>}\t{%2, %0|%0, %2}" [(set_attr "type" "alu") (set_attr "mode" "<MODE>")]) i.e. instead of comparing (sign_extend:TI (operand 1)) with (plus:TI (sign_extend:TI (operand 2)) (ltu:TI (cc) (const_int 0))) in aarch64 backend it compares (zero_extend:TI (operand 1)) with (plus:TI (ltu:TI (cc) (const_int 0)) (zero_extend:TI (operand 2)). Not really sure if the order of carry and extension of operand 2 matters that much (needs to be checked in combine log if it should be matches by combiner in some cases), but I think the zero_extend vs. sign_extend is significant. Now, looking at what aarch64 does for add with carry, there are separate patterns like add<mode>3_carryinC which set CC_C mode and use zero_extend and add<mode>3_carryinV which sets CC_V mode and uses sign_extend. So, shouldn't sub<mode>3_carryin{C,V} be split similarly and if we check carry flag, we should use subdi3_carryinC?