[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |10.0 Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #33 from Andrew Pinski --- Fixed.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #32 from Paweł Bylica --- For what it's worth, the original code is compiled the same as in Clang since GCC 10. https://godbolt.org/z/vxorYW815
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #31 from Segher Boessenkool --- (In reply to Segher Boessenkool from comment #29) > Does it help the i386 port if we disallow a hard reg dest as well? > RA should be able to handle that just fine as well. I tried this out, and it creates much worse code (much bigger, anyway), on all targets.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #30 from Uroš Bizjak --- (In reply to Segher Boessenkool from comment #29) > > Does it help the i386 port if we disallow a hard reg dest as well? > RA should be able to handle that just fine as well. I don't know from the top of my head, but the current approach where invalid/unwanted propagation is rejected by ix86_legitimate_combined_insn works well. So, to answer your question, the effects of a proposed patch should be analysed on a case-by-case basis. There are several instructions that use fixed output registers (mostly double-word insns that use rax/rdx pair, e.g. mult and div) which are usually tied to the input operand, but nobody ever analysed how additional RA freedom would affect them. > > This still will not get rid of *all* (non-fixed) hard registers, you > still can get them from explicit register variables, and target code > (or even generic code) can still put some in non-copies, which combine > will happily propagate further. Thanks for this info, so it looks that a safety net in the form of ix86_legitimate_combined_insn (and insn constraints) is still needed.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #29 from Segher Boessenkool --- (In reply to Uroš Bizjak from comment #27) > FYI, these constraints were used in the past (when combine was allowed to > propagate hard registers into combined insn) to prevent reload failures, > where reload was not able to e.g. reload wrong hard reg in the place of > count reg in the shift insn pattern to %ecx. Nowadays, constraints in the > patterns of pre-reload insn_and_split are not necessary anymore, and can be > removed together with ix86_legitimate_combined_insn target hook. combine still propagates hard registers. The only thing it does not combine is single-set register-register copies from a non-fixed hard register; everything else with hard registers it still does, like if the *destination* of the move is a hard reg (e.g., in the function return value). Does it help the i386 port if we disallow a hard reg dest as well? RA should be able to handle that just fine as well. This still will not get rid of *all* (non-fixed) hard registers, you still can get them from explicit register variables, and target code (or even generic code) can still put some in non-copies, which combine will happily propagate further.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #28 from Jakub Jelinek --- Author: jakub Date: Sat Oct 19 12:46:57 2019 New Revision: 277203 URL: https://gcc.gnu.org/viewcvs?rev=277203=gcc=rev Log: PR target/92140 * config/i386/predicates.md (int_nonimmediate_operand): New special predicate. * config/i386/i386.md (*add3_eq, *add3_ne, *add3_eq_0, *add3_ne_0, *sub3_eq, *sub3_ne, *sub3_eq_1, *sub3_eq_0, *sub3_ne_0): New define_insn_and_split patterns. * gcc.target/i386/pr92140.c: New test. * gcc.c-torture/execute/pr92140.c: New test. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr92140.c trunk/gcc/testsuite/gcc.target/i386/pr92140.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md trunk/gcc/config/i386/predicates.md trunk/gcc/testsuite/ChangeLog
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #27 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #26) > Created attachment 47070 [details] > gcc10-prereload-splitters.patch > > Updated patch for the pre-reload splitters, which uses a new predicate and > additionally removes constraints and attributes from those > define_insn_and_split that had them (most of them didn't), because nothing > will really look at constraints before reload and similarly for enabled etc. > attributes. FYI, these constraints were used in the past (when combine was allowed to propagate hard registers into combined insn) to prevent reload failures, where reload was not able to e.g. reload wrong hard reg in the place of count reg in the shift insn pattern to %ecx. Nowadays, constraints in the patterns of pre-reload insn_and_split are not necessary anymore, and can be removed together with ix86_legitimate_combined_insn target hook.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 Jakub Jelinek changed: What|Removed |Added Attachment #47069|0 |1 is obsolete|| --- Comment #26 from Jakub Jelinek --- Created attachment 47070 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47070=edit gcc10-prereload-splitters.patch Updated patch for the pre-reload splitters, which uses a new predicate and additionally removes constraints and attributes from those define_insn_and_split that had them (most of them didn't), because nothing will really look at constraints before reload and similarly for enabled etc. attributes.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #25 from Jakub Jelinek --- The define_insn part of define_insn_and_split needs constraints if it is meant to match during or after reload, the patterns are just written with the assumption that they are split before reload. At least that is my understanding of them.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #24 from Segher Boessenkool --- A dumb question I'm sure, but I don't see it: if the rest of your define_insn doesn't need constraints, why would the match_scratch need some? (A define_split never uses constraints).
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #23 from Jakub Jelinek --- (In reply to Segher Boessenkool from comment #22) > Hrm, I don't see how this is nicer than just adding a scratch in the > pattern? What makes that a worse option? Most of the patterns don't have constraints and don't want to deal with that. See the ugliness I had to play with the enabled attribute in the earlier version of the patch to deal properly with the constraints. Many of them actually don't create any pseudos, just want to be matched only before reload, split there and not match afterwards.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #22 from Segher Boessenkool --- Hrm, I don't see how this is nicer than just adding a scratch in the pattern? What makes that a worse option?
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #21 from Jakub Jelinek --- Created attachment 47069 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47069=edit gcc10-prereload-splitters.patch Ah, apparently we already have for ~ 2 years a property to handle this safely. So perhaps following incremental (so far completely untested) patch? It unfortunately requires the two generic changes, the alternative would be to add a helper function somewhere in i386*.c which would return can_create_pseudo_p () && !(cfun->curr_properties & PROP_rtl_split_insns); declare it in i386-protos.h and just use it in config/i386/*.md instead. Any preferences? From maintainance POV, it might be cleaner to have the wrapper, but then the question is what is the best name for it.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #20 from Segher Boessenkool --- Ah, okay. So it is either one or two insns (zero can not be handled, but you can do a noop, a move of a reg to itself, and that will be optimised away just fine). Three insns is not something combine ever handles at all: it's always {2,3,4}->{1,2}. Since some years new pseudos *can* be created during combine, but there are various problems with that still.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #19 from Uroš Bizjak --- (In reply to Segher Boessenkool from comment #18) > (In reply to Uroš Bizjak from comment #15) > > Is it possible to lift the limitation from the combine pass, where the > > combine tries to split the insn, but expects exactly two new insn patterns > > to be generated. From the docs: > > > > --q-- > > When the combiner phase tries to split an insn pattern, it is always > > the case that the pattern is _not_ matched by any 'define_insn'. The > > combiner pass first tries to split a single 'set' expression and then > > the same 'set' expression inside a 'parallel', but followed by a > > 'clobber' of a pseudo-reg to use as a scratch register. In these cases, > > the combiner expects exactly two new insn patterns to be generated. It > > will verify that these patterns match some 'define_insn' definitions, so > > you need not do this test in the 'define_split' (of course, there is no > > point in writing a 'define_split' that will never produce insns that > > match). > > --/q-- > > > > Many define_and_split insns in i386.md that use can_create_pseudo_p have > > this unwanted condition in their insn condition just because of the > > mentioned limitation. > > What unwanted condition? What limitation? "unwanted condition" refers to the usage of can_create_pseudo_p in the insn condition. "Limitation" refers to "the combiner expects exactly two new insn patterns to be generated".
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #18 from Segher Boessenkool --- (In reply to Uroš Bizjak from comment #15) > Is it possible to lift the limitation from the combine pass, where the > combine tries to split the insn, but expects exactly two new insn patterns > to be generated. From the docs: > > --q-- > When the combiner phase tries to split an insn pattern, it is always > the case that the pattern is _not_ matched by any 'define_insn'. The > combiner pass first tries to split a single 'set' expression and then > the same 'set' expression inside a 'parallel', but followed by a > 'clobber' of a pseudo-reg to use as a scratch register. In these cases, > the combiner expects exactly two new insn patterns to be generated. It > will verify that these patterns match some 'define_insn' definitions, so > you need not do this test in the 'define_split' (of course, there is no > point in writing a 'define_split' that will never produce insns that > match). > --/q-- > > Many define_and_split insns in i386.md that use can_create_pseudo_p have > this unwanted condition in their insn condition just because of the > mentioned limitation. What unwanted condition? What limitation? From combine.c: /* If we were combining three insns and the result is a simple SET with no ASM_OPERANDS that wasn't recognized, try to split it into two insns. There are two ways to do this. It can be split using a machine-specific method (like when you have an addition of a large constant) or by combine in the function find_split_point. */ /* See if the MD file can split NEWPAT. If it can't, see if letting it use I2DEST as a scratch register will help. In the latter case, convert I2DEST to the mode of the source of NEWPAT if we can. */ /* ??? Reusing i2dest without resetting the reg_stat entry for it (temporarily, until we are committed to this instruction combination) does not work: for example, any call to nonzero_bits on the register (from a splitter in the MD file, for example) will get the old information, which is invalid. Since nowadays we can create registers during combine just fine, we should just create a new one here, not reuse i2dest. */ Exactly two new insns... Exactly one is also okay. I'll fix the docs.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #17 from Jakub Jelinek --- I've tried to change the patch to use define_split instead of define_insn_and_split, with all of them changed, it creates worse code for f8/f12/f15 (the last one is expected, because we split into 3 instructions instead of two), while if I change only the *{add,sub}3_{eq,ne} define_insn_and_split into define_split and keep the others, those 3 functions are fixed again, but f5/f6/f7/f9/f10/f11/f13/f14 result in worse code.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #16 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #13) > Created attachment 47067 [details] > gcc10-pr92140.patch > > So what about this version then? I've changed back a couple of > to nonimmediate_operand and removed corresponding > force_reg, because it would be in spots where the there is already one > possible immediate which would be in operands[2] rather than operands[1], > changed the eq/ne_0_operator to the define_special_predicate you've > suggested and added testcase coverage. LGTM.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #15 from Uroš Bizjak --- (In reply to Segher Boessenkool from comment #12) > (In reply to Uroš Bizjak from comment #10) > > Regarding reliability of pre-reload splitters, IIRC they should be safe, but > > I'll leave the final verdict to RTL experts. > > The *splitters* are just fine; a define_insn with a condition that turns off > in a later pass generally is not. Is it possible to lift the limitation from the combine pass, where the combine tries to split the insn, but expects exactly two new insn patterns to be generated. From the docs: --q-- When the combiner phase tries to split an insn pattern, it is always the case that the pattern is _not_ matched by any 'define_insn'. The combiner pass first tries to split a single 'set' expression and then the same 'set' expression inside a 'parallel', but followed by a 'clobber' of a pseudo-reg to use as a scratch register. In these cases, the combiner expects exactly two new insn patterns to be generated. It will verify that these patterns match some 'define_insn' definitions, so you need not do this test in the 'define_split' (of course, there is no point in writing a 'define_split' that will never produce insns that match). --/q-- Many define_and_split insns in i386.md that use can_create_pseudo_p have this unwanted condition in their insn condition just because of the mentioned limitation.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #14 from Jakub Jelinek --- And as for the define_insn_and_split without constraints that don't expect to be matched post split1, I think we can try to figure out something incrementally and change all of them at once, e.g. a property set by expand pass and cleared by split1 pass, or property set by split1 pass or something similar.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #13 from Jakub Jelinek --- Created attachment 47067 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47067=edit gcc10-pr92140.patch So what about this version then? I've changed back a couple of to nonimmediate_operand and removed corresponding force_reg, because it would be in spots where the there is already one possible immediate which would be in operands[2] rather than operands[1], changed the eq/ne_0_operator to the define_special_predicate you've suggested and added testcase coverage. I'm not sure trying to do something here in peephole2 would catch as many cases as the combiner patterns can handle.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #12 from Segher Boessenkool --- (In reply to Uroš Bizjak from comment #10) > Regarding reliability of pre-reload splitters, IIRC they should be safe, but > I'll leave the final verdict to RTL experts. The *splitters* are just fine; a define_insn with a condition that turns off in a later pass generally is not.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #11 from Segher Boessenkool --- If an insn condition uses can_create_pseudo_p, the insn will suddenly stop to match after reload --> kaboom. If your insn always splits ("&& 1"), this means that if any of these: NEXT_PASS (pass_lower_subreg3); NEXT_PASS (pass_df_initialize_no_opt); NEXT_PASS (pass_stack_ptr_mod); NEXT_PASS (pass_mode_switching); NEXT_PASS (pass_match_asm_constraints); NEXT_PASS (pass_sms); NEXT_PASS (pass_live_range_shrinkage); NEXT_PASS (pass_sched); NEXT_PASS (pass_early_remat); NEXT_PASS (pass_ira); NEXT_PASS (pass_reload); passes creates a new such insn, you ICE (or worse, do the wrong thing). That limits exposure quite a bit, but OTOH it is really hard to prove some of these passes will not create such insns.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #10 from Uroš Bizjak --- (In reply to Jakub Jelinek from comment #9) > Created attachment 47065 [details] > gcc10-pr92140-wip.patch > > If pre-reload splitters are reliable, my patch can be greatly simplified and > using the formatting, && can_create_pseudos (), && 1 and + > force_reg ideas from your patch this is the merge of both, which can handle > all f1-f14 plus tst1-3. You can use define_special_predicate to check the mode of the operand inside relational operator. To avoid pattern explosion, maybe compound instructions should be synthesized by peephole pass. Regarding reliability of pre-reload splitters, IIRC they should be safe, but I'll leave the final verdict to RTL experts.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #9 from Jakub Jelinek --- Created attachment 47065 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47065=edit gcc10-pr92140-wip.patch If pre-reload splitters are reliable, my patch can be greatly simplified and using the formatting, && can_create_pseudos (), && 1 and + force_reg ideas from your patch this is the merge of both, which can handle all f1-f14 plus tst1-3.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #8 from Jakub Jelinek --- Comparing the two patches, your patch handles f1-f4 in /* PR target/92140 */ char c; int v; __attribute__((noipa)) void f1 (void) { v += c != 0; } __attribute__((noipa)) void f2 (void) { v -= c != 0; } __attribute__((noipa)) void f3 (void) { v += c == 0; } __attribute__((noipa)) void f4 (void) { v -= c == 0; } __attribute__((noipa)) void f5 (void) { v += (c != 0) - 26; } __attribute__((noipa)) void f6 (void) { v -= (c != 0) - 26; } __attribute__((noipa)) void f7 (void) { v += (c == 0) - 26; } __attribute__((noipa)) void f8 (void) { v -= (c == 0) - 26; } __attribute__((noipa)) void f9 (void) { v += (c != 0) + 42; } __attribute__((noipa)) void f10 (void) { v -= (c != 0) + 42; } __attribute__((noipa)) void f11 (void) { v += (c == 0) + 42; } __attribute__((noipa)) void f12 (void) { v -= (c == 0) + 42; } __attribute__((noipa)) void f13 (int z) { v += (c == 0) + z; } __attribute__((noipa)) void f14 (int z) { v -= (c == 0) + z; } int main () { int i; for (i = 0; i < 2; i++) { v = 15; if (i == 1) c = 37; f1 (); if (v != 15 + i) __builtin_abort (); f2 (); if (v != 15) __builtin_abort (); f3 (); if (v != 16 - i) __builtin_abort (); f4 (); if (v != 15) __builtin_abort (); f5 (); if (v != 15 + i - 26) __builtin_abort (); f6 (); if (v != 15) __builtin_abort (); f7 (); if (v != 16 - i - 26) __builtin_abort (); f8 (); if (v != 15) __builtin_abort (); f9 (); if (v != 15 + i + 42) __builtin_abort (); f10 (); if (v != 15) __builtin_abort (); f11 (); if (v != 16 - i + 42) __builtin_abort (); f12 (); if (v != 15) __builtin_abort (); f13 (173); if (v != 16 - i + 173) __builtin_abort (); f14 (173); if (v != 15) __builtin_abort (); f13 (-35); if (v != 16 - i - 35) __builtin_abort (); f14 (-35); if (v != 15) __builtin_abort (); } return 0; } and all cases in the #c0 testcase, while my patch handles f1-f14 in the above testcase and only tst1/tst2 in #c0 testcase. For the pre-reload only splitters, I'm never sure if it is safe. If it is matched pre-split1 (usually during combine), it is of course fine, but what if it appears after split1 and before end of reload, do we have a guarantee that nothing will try to match those? If it is safe, I can of course add the && !reload_completed or similar to the conditions (or && can_create_pseudo_p ()).
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #7 from Uroš Bizjak --- Created attachment 47064 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47064=edit Proposed patch with pre-reload splitters Maybe we should use pre-reload splitters as with the attached patch that converts all test cases.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 Jakub Jelinek changed: What|Removed |Added Attachment #47062|0 |1 is obsolete|| --- Comment #6 from Jakub Jelinek --- Created attachment 47063 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47063=edit gcc10-pr92140-wip.patch Updated patch that implements that, so when expanded only adds 36 define_insn and 36 define_split rather than 144 each.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #5 from Jakub Jelinek --- The patch adds 144 define_insn and 144 define_split to tmp-mddump.md though, to total 6217 define_insn and 733 define_split. Maybe a better way to deal with it would be to have x86_ne_0_operator and x86_eq_0_operator with a mode-less operand inside of it, where the predicate would ensure the mode of the operand is QI/HI/SI/DI (last one only if TARGET_64BIT) in addition checking it is a ne (resp. eq). Unfortunately there is the problem with the constraints, where !TARGET_64BIT requires q constraint instead of r for the comparison operand, maybe use for that enabled attribute?
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #4 from Jakub Jelinek --- Created attachment 47062 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47062=edit gcc10-pr92140-wip.patch Slightly extended untested patch, which handles all the cases in the testcase at the start of the file. Originally reported tst3 still unhandled though.
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 --- Comment #3 from Jakub Jelinek --- Untested patch that handles tst1 and tst2 and some more, but doesn't handle tst3 yet and is still missing some patterns. Unfortunately, it can result in quite a lot of define_insn_and_split patterns, while for !TARGET_64BIT it is just 9 per one in the source, for TARGET_64BIT 16, and we need always one for eq and one for ne, because those need to be handled quite differently, eq being easier because we don't need to change any other arguments, for ne we need to require the other argument to be a constant that we can change, as we need to negate it. Uros, does this look like a way forward? --- gcc/config/i386/i386.md.jj 2019-09-20 12:25:48.0 +0200 +++ gcc/config/i386/i386.md 2019-10-18 10:28:57.953445277 +0200 @@ -1001,6 +1001,9 @@ (define_mode_iterator SWI24 [HI SI]) ;; Single word integer modes. (define_mode_iterator SWI [QI HI SI (DI "TARGET_64BIT")]) +;; The same, for the case where we need to iterate on SWI x SWI +(define_mode_iterator SWIALT [QI HI SI (DI "TARGET_64BIT")]) + ;; Single word integer modes without QImode. (define_mode_iterator SWI248 [HI SI (DI "TARGET_64BIT")]) @@ -6843,6 +6846,100 @@ (define_insn "*addsi3_zext_cc_overflow_2 [(set_attr "type" "alu") (set_attr "mode" "SI")]) +(define_insn_and_split "*add3_eq" + [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,") + (plus:SWI + (plus:SWI + (eq:SWI (match_operand:SWIALT 3 "nonimmediate_operand" + "m,m") + (const_int 0)) + (match_operand:SWI 1 "nonimmediate_operand" "%0,0")) + (match_operand:SWI 2 "" ",m"))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (PLUS, mode, operands)" + "#" + "" + [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 3) (const_int 1))) + (parallel [(set (match_dup 0) + (plus:SWI +(plus:SWI (ltu:SWI (reg:CC FLAGS_REG) (const_int 0)) + (match_dup 1)) +(match_dup 2))) + (clobber (reg:CC FLAGS_REG))])]) + +(define_insn_and_split "*add3_eq_0" + [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") + (plus:SWI + (eq:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m") + (const_int 0)) + (match_operand:SWI 1 "nonimmediate_operand" "0"))) + (clobber (reg:CC FLAGS_REG))] + "ix86_unary_operator_ok (PLUS, mode, operands)" + "#" + "" + [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 2) (const_int 1))) + (parallel [(set (match_dup 0) + (plus:SWI (ltu:SWI (reg:CC FLAGS_REG) (const_int 0)) +(match_dup 1))) + (clobber (reg:CC FLAGS_REG))])]) + +(define_insn_and_split "*add3_ne_0" + [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") + (plus:SWI + (ne:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m") + (const_int 0)) + (match_operand:SWI 1 "nonimmediate_operand" "0"))) + (clobber (reg:CC FLAGS_REG))] + "ix86_unary_operator_ok (PLUS, mode, operands)" + "#" + "" + [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 2) (const_int 1))) + (parallel [(set (match_dup 0) + (minus:SWI (minus:SWI + (match_dup 1) + (ltu:SWI (reg:CC FLAGS_REG) (const_int 0))) + (const_int -1))) + (clobber (reg:CC FLAGS_REG))])]) + +(define_insn_and_split "*sub3_eq_0" + [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") + (minus:SWI + (match_operand:SWI 1 "nonimmediate_operand" "0") + (eq:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m") + (const_int 0 + (clobber (reg:CC FLAGS_REG))] + "ix86_unary_operator_ok (MINUS, mode, operands)" + "#" + "" + [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 2) (const_int 1))) + (parallel [(set (match_dup 0) + (minus:SWI (match_dup 1) + (ltu:SWI (reg:CC FLAGS_REG) (const_int 0 + (clobber (reg:CC FLAGS_REG))])]) + +(define_insn_and_split "*sub3_ne_0" + [(set (match_operand:SWI 0 "nonimmediate_operand" "=m") + (minus:SWI + (match_operand:SWI 1 "nonimmediate_operand" "0") + (ne:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m") + (const_int 0 + (clobber (reg:CC FLAGS_REG))] + "ix86_unary_operator_ok (MINUS, mode, operands)" + "#" + "" + [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 2) (const_int 1))) + (parallel [(set (match_dup 0) + (plus:SWI (plus:SWI + (ltu:SWI (reg:CC FLAGS_REG) (const_int 0)) + (match_dup 1)) +(const_int -1))) + (clobber (reg:CC FLAGS_REG))])]) + ;; The patterns that
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- So, what combine attempts here and fails recognizing is in tst1: (parallel [ (set (mem/c:SI (symbol_ref:DI ("v") [flags 0x40] ) [1 v+0 S4 A32]) (plus:SI (ne:SI (mem:QI (plus:DI (reg:DI 89 [ c ]) (symbol_ref:DI ("table") [flags 0x40] )) [0 table S1 A8]) (const_int 0 [0])) (mem/c:SI (symbol_ref:DI ("v") [flags 0x40] ) [1 v+0 S4 A32]))) (clobber (reg:CC 17 flags)) ]) in tst2: (parallel [ (set (mem/c:SI (symbol_ref:DI ("v") [flags 0x40] ) [1 v+0 S4 A32]) (minus:SI (mem/c:SI (symbol_ref:DI ("v") [flags 0x40] ) [1 v+0 S4 A32]) (ne:SI (mem:QI (plus:DI (reg:DI 89 [ c ]) (symbol_ref:DI ("table") [flags 0x40] )) [0 table S1 A8]) (const_int 0 [0] (clobber (reg:CC 17 flags)) ]) and in tst3: (parallel [ (set (reg:SI 82 [ ]) (plus:SI (ne:SI (reg:SI 86) (const_int 0 [0])) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags)) ])
[Bug target/92140] clang vs gcc optimizing with adc/sbb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140 Uroš Bizjak changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-10-17 Ever confirmed|0 |1 Severity|normal |enhancement --- Comment #1 from Uroš Bizjak --- This optimization builds on the equivalence of: cmpl$0, %edi sete%al where zero flag is used to set %al and cmpl$1, %edi setc%al where carry flag is set to the "borrow" from the unsigned subtract of the $1 from %edi. The carry flag (aka "borrow") is set only for (0 - 1), when %edi == 0. The above conversion is beneficial when the following insn consumes carry flag (e.g. adc/sbb, but also rcr and rcl). IMO, the combine pass should be the most appropriate place for the above transformation.