[Bug target/92140] clang vs gcc optimizing with adc/sbb

2023-05-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

Andrew Pinski  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #33 from Andrew Pinski  ---
Fixed.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2023-05-07 Thread chfast at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #32 from Paweł Bylica  ---
For what it's worth, the original code is compiled the same as in Clang since
GCC 10. https://godbolt.org/z/vxorYW815

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-11-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #31 from Segher Boessenkool  ---
(In reply to Segher Boessenkool from comment #29)
> Does it help the i386 port if we disallow a hard reg dest as well?
> RA should be able to handle that just fine as well.

I tried this out, and it creates much worse code (much bigger, anyway),
on all targets.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-21 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #30 from Uroš Bizjak  ---
(In reply to Segher Boessenkool from comment #29)
>
> Does it help the i386 port if we disallow a hard reg dest as well?
> RA should be able to handle that just fine as well.

I don't know from the top of my head, but the current approach where
invalid/unwanted propagation is rejected by ix86_legitimate_combined_insn works
well.

So, to answer your question, the effects of a proposed patch should be analysed
on a case-by-case basis. There are several instructions that use fixed output
registers (mostly double-word insns that use rax/rdx pair, e.g. mult and div)
which are usually tied to the input operand, but nobody ever analysed how
additional RA freedom would affect them.

>
> This still will not get rid of *all* (non-fixed) hard registers, you
> still can get them from explicit register variables, and target code
> (or even generic code) can still put some in non-copies, which combine
> will happily propagate further.

Thanks for this info, so it looks that a safety net in the form of
ix86_legitimate_combined_insn (and insn constraints) is still needed.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-19 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #29 from Segher Boessenkool  ---
(In reply to Uroš Bizjak from comment #27)
> FYI, these constraints were used in the past (when combine was allowed to
> propagate hard registers into combined insn) to prevent reload failures,
> where reload was not able to e.g. reload wrong hard reg in the place of
> count reg in the shift insn pattern to %ecx. Nowadays, constraints in the
> patterns of pre-reload insn_and_split are not necessary anymore, and can be
> removed together with ix86_legitimate_combined_insn target hook.

combine still propagates hard registers.  The only thing it does not
combine is single-set register-register copies from a non-fixed hard
register; everything else with hard registers it still does, like if
the *destination* of the move is a hard reg (e.g., in the function
return value).

Does it help the i386 port if we disallow a hard reg dest as well?
RA should be able to handle that just fine as well.

This still will not get rid of *all* (non-fixed) hard registers, you
still can get them from explicit register variables, and target code
(or even generic code) can still put some in non-copies, which combine
will happily propagate further.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #28 from Jakub Jelinek  ---
Author: jakub
Date: Sat Oct 19 12:46:57 2019
New Revision: 277203

URL: https://gcc.gnu.org/viewcvs?rev=277203=gcc=rev
Log:
PR target/92140
* config/i386/predicates.md (int_nonimmediate_operand): New special
predicate.
* config/i386/i386.md (*add3_eq, *add3_ne,
*add3_eq_0, *add3_ne_0, *sub3_eq, *sub3_ne,
*sub3_eq_1, *sub3_eq_0, *sub3_ne_0): New
define_insn_and_split patterns.

* gcc.target/i386/pr92140.c: New test.
* gcc.c-torture/execute/pr92140.c: New test.

Added:
trunk/gcc/testsuite/gcc.c-torture/execute/pr92140.c
trunk/gcc/testsuite/gcc.target/i386/pr92140.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/predicates.md
trunk/gcc/testsuite/ChangeLog

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-19 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #27 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #26)
> Created attachment 47070 [details]
> gcc10-prereload-splitters.patch
> 
> Updated patch for the pre-reload splitters, which uses a new predicate and
> additionally removes constraints and attributes from those
> define_insn_and_split that had them (most of them didn't), because nothing
> will really look at constraints before reload and similarly for enabled etc.
> attributes.

FYI, these constraints were used in the past (when combine was allowed to
propagate hard registers into combined insn) to prevent reload failures, where
reload was not able to e.g. reload wrong hard reg in the place of count reg in
the shift insn pattern to %ecx. Nowadays, constraints in the patterns of
pre-reload insn_and_split are not necessary anymore, and can be removed
together with ix86_legitimate_combined_insn target hook.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #47069|0   |1
is obsolete||

--- Comment #26 from Jakub Jelinek  ---
Created attachment 47070
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47070=edit
gcc10-prereload-splitters.patch

Updated patch for the pre-reload splitters, which uses a new predicate and
additionally removes constraints and attributes from those
define_insn_and_split that had them (most of them didn't), because nothing will
really look at constraints before reload and similarly for enabled etc.
attributes.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #25 from Jakub Jelinek  ---
The define_insn part of define_insn_and_split needs constraints if it is meant
to match during or after reload, the patterns are just written with the
assumption that they are split before reload.  At least that is my
understanding of them.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #24 from Segher Boessenkool  ---
A dumb question I'm sure, but I don't see it: if the rest of your
define_insn doesn't need constraints, why would the match_scratch
need some?  (A define_split never uses constraints).

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #23 from Jakub Jelinek  ---
(In reply to Segher Boessenkool from comment #22)
> Hrm, I don't see how this is nicer than just adding a scratch in the
> pattern?  What makes that a worse option?

Most of the patterns don't have constraints and don't want to deal with that. 
See the ugliness I had to play with the enabled attribute in the earlier
version of the patch to deal properly with the constraints.  Many of them
actually don't create any pseudos, just want to be matched only before reload,
split there and not match afterwards.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #22 from Segher Boessenkool  ---
Hrm, I don't see how this is nicer than just adding a scratch in the
pattern?  What makes that a worse option?

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #21 from Jakub Jelinek  ---
Created attachment 47069
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47069=edit
gcc10-prereload-splitters.patch

Ah, apparently we already have for ~ 2 years a property to handle this safely.
So perhaps following incremental (so far completely untested) patch?
It unfortunately requires the two generic changes, the alternative would be
to add a helper function somewhere in i386*.c which would
return can_create_pseudo_p () && !(cfun->curr_properties &
PROP_rtl_split_insns);
declare it in i386-protos.h and just use it in config/i386/*.md instead.
Any preferences?  From maintainance POV, it might be cleaner to have the
wrapper, but then the question is what is the best name for it.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #20 from Segher Boessenkool  ---
Ah, okay.  So it is either one or two insns (zero can not be handled, but you
can do a noop, a move of a reg to itself, and that will be optimised away just
fine).  Three insns is not something combine ever handles at all: it's always
{2,3,4}->{1,2}.

Since some years new pseudos *can* be created during combine, but there are
various problems with that still.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #19 from Uroš Bizjak  ---
(In reply to Segher Boessenkool from comment #18)
> (In reply to Uroš Bizjak from comment #15)
> > Is it possible to lift the limitation from the combine pass, where the
> > combine tries to split the insn, but expects exactly two new insn patterns
> > to be generated. From the docs:
> > 
> > --q--
> >  When the combiner phase tries to split an insn pattern, it is always
> > the case that the pattern is _not_ matched by any 'define_insn'.  The
> > combiner pass first tries to split a single 'set' expression and then
> > the same 'set' expression inside a 'parallel', but followed by a
> > 'clobber' of a pseudo-reg to use as a scratch register.  In these cases,
> > the combiner expects exactly two new insn patterns to be generated.  It
> > will verify that these patterns match some 'define_insn' definitions, so
> > you need not do this test in the 'define_split' (of course, there is no
> > point in writing a 'define_split' that will never produce insns that
> > match).
> > --/q--
> > 
> > Many define_and_split insns in i386.md that use can_create_pseudo_p have
> > this unwanted condition in their insn condition just because of the
> > mentioned limitation.
> 
> What unwanted condition?  What limitation?

"unwanted condition" refers to the usage of can_create_pseudo_p in the insn
condition. "Limitation" refers to "the combiner expects exactly two new insn
patterns to be generated".

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #18 from Segher Boessenkool  ---
(In reply to Uroš Bizjak from comment #15)
> Is it possible to lift the limitation from the combine pass, where the
> combine tries to split the insn, but expects exactly two new insn patterns
> to be generated. From the docs:
> 
> --q--
>  When the combiner phase tries to split an insn pattern, it is always
> the case that the pattern is _not_ matched by any 'define_insn'.  The
> combiner pass first tries to split a single 'set' expression and then
> the same 'set' expression inside a 'parallel', but followed by a
> 'clobber' of a pseudo-reg to use as a scratch register.  In these cases,
> the combiner expects exactly two new insn patterns to be generated.  It
> will verify that these patterns match some 'define_insn' definitions, so
> you need not do this test in the 'define_split' (of course, there is no
> point in writing a 'define_split' that will never produce insns that
> match).
> --/q--
> 
> Many define_and_split insns in i386.md that use can_create_pseudo_p have
> this unwanted condition in their insn condition just because of the
> mentioned limitation.

What unwanted condition?  What limitation?

From combine.c:

  /* If we were combining three insns and the result is a simple SET
 with no ASM_OPERANDS that wasn't recognized, try to split it into two
 insns.  There are two ways to do this.  It can be split using a
 machine-specific method (like when you have an addition of a large
 constant) or by combine in the function find_split_point.  */

  /* See if the MD file can split NEWPAT.  If it can't, see if letting it
 use I2DEST as a scratch register will help.  In the latter case,
 convert I2DEST to the mode of the source of NEWPAT if we can.  */

  /* ??? Reusing i2dest without resetting the reg_stat entry for it
 (temporarily, until we are committed to this instruction
 combination) does not work: for example, any call to nonzero_bits
 on the register (from a splitter in the MD file, for example)
 will get the old information, which is invalid.

 Since nowadays we can create registers during combine just fine,
 we should just create a new one here, not reuse i2dest.  */

Exactly two new insns...  Exactly one is also okay.  I'll fix the docs.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #17 from Jakub Jelinek  ---
I've tried to change the patch to use define_split instead of
define_insn_and_split, with all of them changed, it creates worse code for
f8/f12/f15 (the last one is expected, because we split into 3 instructions
instead of two), while if I change only the *{add,sub}3_{eq,ne}
define_insn_and_split into define_split and keep the others, those 3 functions
are fixed again, but f5/f6/f7/f9/f10/f11/f13/f14 result in worse code.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #16 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #13)
> Created attachment 47067 [details]
> gcc10-pr92140.patch
> 
> So what about this version then?  I've changed back a couple of
>  to nonimmediate_operand and removed corresponding
> force_reg, because it would be in spots where the there is already one
> possible immediate which would be in operands[2] rather than operands[1],
> changed the eq/ne_0_operator to the define_special_predicate you've
> suggested and added testcase coverage.

LGTM.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #15 from Uroš Bizjak  ---
(In reply to Segher Boessenkool from comment #12)
> (In reply to Uroš Bizjak from comment #10)
> > Regarding reliability of pre-reload splitters, IIRC they should be safe, but
> > I'll leave the final verdict to RTL experts.
> 
> The *splitters* are just fine; a define_insn with a condition that turns off
> in a later pass generally is not.

Is it possible to lift the limitation from the combine pass, where the combine
tries to split the insn, but expects exactly two new insn patterns to be
generated. From the docs:

--q--
 When the combiner phase tries to split an insn pattern, it is always
the case that the pattern is _not_ matched by any 'define_insn'.  The
combiner pass first tries to split a single 'set' expression and then
the same 'set' expression inside a 'parallel', but followed by a
'clobber' of a pseudo-reg to use as a scratch register.  In these cases,
the combiner expects exactly two new insn patterns to be generated.  It
will verify that these patterns match some 'define_insn' definitions, so
you need not do this test in the 'define_split' (of course, there is no
point in writing a 'define_split' that will never produce insns that
match).
--/q--

Many define_and_split insns in i386.md that use can_create_pseudo_p have this
unwanted condition in their insn condition just because of the mentioned
limitation.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #14 from Jakub Jelinek  ---
And as for the define_insn_and_split without constraints that don't expect to
be matched post split1, I think we can try to figure out something
incrementally and change all of them at once, e.g. a property set by expand
pass and cleared by split1 pass, or property set by split1 pass or something
similar.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #13 from Jakub Jelinek  ---
Created attachment 47067
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47067=edit
gcc10-pr92140.patch

So what about this version then?  I've changed back a couple of
 to nonimmediate_operand and removed corresponding force_reg,
because it would be in spots where the there is already one possible immediate
which would be in operands[2] rather than operands[1], changed the
eq/ne_0_operator to the define_special_predicate you've suggested and added
testcase coverage.
I'm not sure trying to do something here in peephole2 would catch as many cases
as the combiner patterns can handle.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #12 from Segher Boessenkool  ---
(In reply to Uroš Bizjak from comment #10)
> Regarding reliability of pre-reload splitters, IIRC they should be safe, but
> I'll leave the final verdict to RTL experts.

The *splitters* are just fine; a define_insn with a condition that turns off
in a later pass generally is not.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #11 from Segher Boessenkool  ---
If an insn condition uses can_create_pseudo_p, the insn will suddenly stop
to match after reload --> kaboom.

If your insn always splits ("&& 1"), this means that if any of these:
  NEXT_PASS (pass_lower_subreg3);
  NEXT_PASS (pass_df_initialize_no_opt);
  NEXT_PASS (pass_stack_ptr_mod);
  NEXT_PASS (pass_mode_switching);
  NEXT_PASS (pass_match_asm_constraints);
  NEXT_PASS (pass_sms);
  NEXT_PASS (pass_live_range_shrinkage);
  NEXT_PASS (pass_sched);
  NEXT_PASS (pass_early_remat);
  NEXT_PASS (pass_ira);
  NEXT_PASS (pass_reload);
passes creates a new such insn, you ICE (or worse, do the wrong thing).

That limits exposure quite a bit, but OTOH it is really hard to prove some
of these passes will not create such insns.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #10 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #9)
> Created attachment 47065 [details]
> gcc10-pr92140-wip.patch
> 
> If pre-reload splitters are reliable, my patch can be greatly simplified and
> using the formatting, && can_create_pseudos (), && 1 and  +
> force_reg ideas from your patch this is the merge of both, which can handle
> all f1-f14 plus tst1-3.

You can use define_special_predicate to check the mode of the operand inside
relational operator. To avoid pattern explosion, maybe compound instructions
should be synthesized by peephole pass.

Regarding reliability of pre-reload splitters, IIRC they should be safe, but
I'll leave the final verdict to RTL experts.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #9 from Jakub Jelinek  ---
Created attachment 47065
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47065=edit
gcc10-pr92140-wip.patch

If pre-reload splitters are reliable, my patch can be greatly simplified and
using the formatting, && can_create_pseudos (), && 1 and  +
force_reg ideas from your patch this is the merge of both, which can handle all
f1-f14 plus tst1-3.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #8 from Jakub Jelinek  ---
Comparing the two patches, your patch handles f1-f4 in
/* PR target/92140 */

char c;
int v;

__attribute__((noipa)) void f1 (void) { v += c != 0; }
__attribute__((noipa)) void f2 (void) { v -= c != 0; }
__attribute__((noipa)) void f3 (void) { v += c == 0; }
__attribute__((noipa)) void f4 (void) { v -= c == 0; }
__attribute__((noipa)) void f5 (void) { v += (c != 0) - 26; }
__attribute__((noipa)) void f6 (void) { v -= (c != 0) - 26; }
__attribute__((noipa)) void f7 (void) { v += (c == 0) - 26; }
__attribute__((noipa)) void f8 (void) { v -= (c == 0) - 26; }
__attribute__((noipa)) void f9 (void) { v += (c != 0) + 42; }
__attribute__((noipa)) void f10 (void) { v -= (c != 0) + 42; }
__attribute__((noipa)) void f11 (void) { v += (c == 0) + 42; }
__attribute__((noipa)) void f12 (void) { v -= (c == 0) + 42; }
__attribute__((noipa)) void f13 (int z) { v += (c == 0) + z; }
__attribute__((noipa)) void f14 (int z) { v -= (c == 0) + z; }

int
main ()
{
  int i;
  for (i = 0; i < 2; i++)
{
  v = 15;
  if (i == 1)
c = 37;
  f1 ();
  if (v != 15 + i)
__builtin_abort ();
  f2 ();
  if (v != 15)
__builtin_abort ();
  f3 ();
  if (v != 16 - i)
__builtin_abort ();
  f4 ();
  if (v != 15)
__builtin_abort ();
  f5 ();
  if (v != 15 + i - 26)
__builtin_abort ();
  f6 ();
  if (v != 15)
__builtin_abort ();
  f7 ();
  if (v != 16 - i - 26)
__builtin_abort ();
  f8 ();
  if (v != 15)
__builtin_abort ();
  f9 ();
  if (v != 15 + i + 42)
__builtin_abort ();
  f10 ();
  if (v != 15)
__builtin_abort ();
  f11 ();
  if (v != 16 - i + 42)
__builtin_abort ();
  f12 ();
  if (v != 15)
__builtin_abort ();
  f13 (173);
  if (v != 16 - i + 173)
__builtin_abort ();
  f14 (173);
  if (v != 15)
__builtin_abort ();
  f13 (-35);
  if (v != 16 - i - 35)
__builtin_abort ();
  f14 (-35);
  if (v != 15)
__builtin_abort ();
}
  return 0;
}
and all cases in the #c0 testcase, while my patch handles f1-f14 in the above
testcase and only tst1/tst2 in #c0 testcase.

For the pre-reload only splitters, I'm never sure if it is safe.  If it is
matched pre-split1 (usually during combine), it is of course fine, but what if
it appears after split1  and before end of reload, do we have a guarantee that
nothing will try to match those?  If it is safe, I can of course add the &&
!reload_completed or similar to the conditions (or && can_create_pseudo_p ()).

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #7 from Uroš Bizjak  ---
Created attachment 47064
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47064=edit
Proposed patch with pre-reload splitters

Maybe we should use pre-reload splitters as with the attached patch that
converts all test cases.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #47062|0   |1
is obsolete||

--- Comment #6 from Jakub Jelinek  ---
Created attachment 47063
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47063=edit
gcc10-pr92140-wip.patch

Updated patch that implements that, so when expanded only adds 36 define_insn
and 36 define_split rather than 144 each.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #5 from Jakub Jelinek  ---
The patch adds 144 define_insn and 144 define_split to tmp-mddump.md though, to
total 6217 define_insn and 733 define_split.
Maybe a better way to deal with it would be to have x86_ne_0_operator and
x86_eq_0_operator with a mode-less operand inside of it, where the predicate
would ensure the mode of the operand is QI/HI/SI/DI (last one only if
TARGET_64BIT) in addition checking it is a ne (resp. eq).
Unfortunately there is the problem with the constraints, where !TARGET_64BIT
requires q constraint instead of r for the comparison operand, maybe use for
that enabled attribute?

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #4 from Jakub Jelinek  ---
Created attachment 47062
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47062=edit
gcc10-pr92140-wip.patch

Slightly extended untested patch, which handles all the cases in the testcase
at the start of the file.  Originally reported tst3 still unhandled though.

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #3 from Jakub Jelinek  ---
Untested patch that handles tst1 and tst2 and some more, but doesn't handle
tst3 yet and is still missing some patterns.
Unfortunately, it can result in quite a lot of define_insn_and_split patterns,
while for !TARGET_64BIT it is just 9 per one in the source, for TARGET_64BIT
16,
and we need always one for eq and one for ne, because those need to be handled
quite differently, eq being easier because we don't need to change any other
arguments, for ne we need to require the other argument to be a constant that
we can change, as we need to negate it.

Uros, does this look like a way forward?

--- gcc/config/i386/i386.md.jj  2019-09-20 12:25:48.0 +0200
+++ gcc/config/i386/i386.md 2019-10-18 10:28:57.953445277 +0200
@@ -1001,6 +1001,9 @@ (define_mode_iterator SWI24 [HI SI])
 ;; Single word integer modes.
 (define_mode_iterator SWI [QI HI SI (DI "TARGET_64BIT")])

+;; The same, for the case where we need to iterate on SWI x SWI
+(define_mode_iterator SWIALT [QI HI SI (DI "TARGET_64BIT")])
+
 ;; Single word integer modes without QImode.
 (define_mode_iterator SWI248 [HI SI (DI "TARGET_64BIT")])

@@ -6843,6 +6846,100 @@ (define_insn "*addsi3_zext_cc_overflow_2
   [(set_attr "type" "alu")
(set_attr "mode" "SI")])

+(define_insn_and_split "*add3_eq"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
+   (plus:SWI
+ (plus:SWI
+   (eq:SWI (match_operand:SWIALT 3 "nonimmediate_operand"
+   "m,m")
+   (const_int 0))
+   (match_operand:SWI 1 "nonimmediate_operand" "%0,0"))
+ (match_operand:SWI 2 "" ",m")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (PLUS, mode, operands)"
+  "#"
+  ""
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 3) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (plus:SWI
+(plus:SWI (ltu:SWI (reg:CC FLAGS_REG) (const_int 0))
+  (match_dup 1))
+(match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_insn_and_split "*add3_eq_0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (plus:SWI
+ (eq:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m")
+ (const_int 0))
+ (match_operand:SWI 1 "nonimmediate_operand" "0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_unary_operator_ok (PLUS, mode, operands)"
+  "#"
+  ""
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 2) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (plus:SWI (ltu:SWI (reg:CC FLAGS_REG) (const_int 0))
+(match_dup 1)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_insn_and_split "*add3_ne_0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (plus:SWI
+ (ne:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m")
+ (const_int 0))
+ (match_operand:SWI 1 "nonimmediate_operand" "0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_unary_operator_ok (PLUS, mode, operands)"
+  "#"
+  ""
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 2) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (minus:SWI (minus:SWI
+   (match_dup 1)
+   (ltu:SWI (reg:CC FLAGS_REG) (const_int 0)))
+ (const_int -1)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_insn_and_split "*sub3_eq_0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (minus:SWI
+ (match_operand:SWI 1 "nonimmediate_operand" "0")
+ (eq:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m")
+ (const_int 0
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_unary_operator_ok (MINUS, mode, operands)"
+  "#"
+  ""
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 2) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (minus:SWI (match_dup 1)
+ (ltu:SWI (reg:CC FLAGS_REG) (const_int 0
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_insn_and_split "*sub3_ne_0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m")
+   (minus:SWI
+ (match_operand:SWI 1 "nonimmediate_operand" "0")
+ (ne:SWI (match_operand:SWIALT 2 "nonimmediate_operand" "m")
+ (const_int 0
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_unary_operator_ok (MINUS, mode, operands)"
+  "#"
+  ""
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 2) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (plus:SWI (plus:SWI
+  (ltu:SWI (reg:CC FLAGS_REG) (const_int 0))
+  (match_dup 1))
+(const_int -1)))
+ (clobber (reg:CC FLAGS_REG))])])
+
 ;; The patterns that 

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
So, what combine attempts here and fails recognizing is in tst1:
(parallel [
(set (mem/c:SI (symbol_ref:DI ("v") [flags 0x40]  ) [1 v+0 S4 A32])
(plus:SI (ne:SI (mem:QI (plus:DI (reg:DI 89 [ c ])
(symbol_ref:DI ("table") [flags 0x40]  )) [0 table S1 A8])
(const_int 0 [0]))
(mem/c:SI (symbol_ref:DI ("v") [flags 0x40]  ) [1 v+0 S4 A32])))
(clobber (reg:CC 17 flags))
])
in tst2:
(parallel [
(set (mem/c:SI (symbol_ref:DI ("v") [flags 0x40]  ) [1 v+0 S4 A32])
(minus:SI (mem/c:SI (symbol_ref:DI ("v") [flags 0x40]  ) [1 v+0 S4 A32])
(ne:SI (mem:QI (plus:DI (reg:DI 89 [ c ])
(symbol_ref:DI ("table") [flags 0x40]  )) [0 table S1 A8])
(const_int 0 [0]
(clobber (reg:CC 17 flags))
])
and in tst3:
(parallel [
(set (reg:SI 82 [  ])
(plus:SI (ne:SI (reg:SI 86)
(const_int 0 [0]))
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
])

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-17 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

Uroš Bizjak  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-10-17
 Ever confirmed|0   |1
   Severity|normal  |enhancement

--- Comment #1 from Uroš Bizjak  ---
This optimization builds on the equivalence of:

cmpl$0, %edi
sete%al

where zero flag is used to set %al and

cmpl$1, %edi
setc%al

where carry flag is set to the "borrow" from the unsigned subtract of the $1
from %edi. The carry flag (aka "borrow") is set only for (0 - 1), when %edi ==
0.


The above conversion is beneficial when the following insn consumes carry flag
(e.g. adc/sbb, but also rcr and rcl).

IMO, the combine pass should be the most appropriate place for the above
transformation.