[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-17 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #17 from Jakub Jelinek  ---
Should be fixed now.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-17 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:1dfc50232dcb703454db4f54c538042a32be2138

commit r10-7773-g1dfc50232dcb703454db4f54c538042a32be2138
Author: Jakub Jelinek 
Date:   Fri Apr 17 16:56:12 2020 +0200

i386: Fix up *testqi_ext_3 define_insn_and_split [PR94567]

As the testcase shows, there are unfortunately more problematic cases
in *testqi_ext_3 if the mode is not CCZmode, because the sign flag might
not behave the same between the insn with zero_extract and what we split it
into.

The previous fix to the insn condition was because *testdi_1 for mask with
upper 32-bits clear and bit 31 set is implemented using SImode test and
thus
SF is set depending on that bit 31 rather than on always cleared.

But we can have other cases.  On the zero_extract (which has mode),
we can have either the pos + len == precision of mode, or
pos + len < precision of mode cases.  The former one copies the most
significant bit into SF, the latter will have SF always cleared.

For the former case, either it is a zero_extract from a larger mode, but
then when we perform test in that larger mode, SF will be always clear and
thus mismatch from the zero_extract case (so we need to enforce CCZmode),
or it will be a zero_extract from same mode with pos 0 and len equal to
mode precision, such zero_extracts should have been really simplified
into their first operand.

For the latter case, when SF is always clear on the define_insn with
zero_extract, we need to split into something that doesn't sometimes set
SF, i.e. it has to be a test with mask that doesn't have the most
significant bit set.  In some cases it can be achieved through using test
in a wider mode (e.g. in the testcase, there is
(zero_extract:SI (reg:HI) (const_int 13) (const_int 3))
which will always set SF to 0, but we split it into
(and:HI (reg:HI) (const_int -8))
which will copy the MSB of (reg:HI) into SF, but we can do:
(and:SI (subreg:SI (reg:HI) 0) (const_int 0xfff8))
which will keep SF always cleared), but there are various cases where we
can't (when already using DImode, or when SImode and we'd turned it into
the problematic *testdi_1 implemented using SImode test, or when
the val operand is a MEM (we don't want to read from memory more than
the user originally wanted), paradoxical subreg of MEM could be problematic
too if we through the narrowing end up with a MEM).

So, the patch attempts to require CCZmode (and not CCNOmode) if it can't
really ensure the SF will have same meaning between the define_insn and
what
we split it into, and if we decide we allow CCNOmode, it needs to avoid
performing narrowing and/or widen if pos + len would indicate we'd have MSB
set in the mask.

2020-04-17  Jakub Jelinek  
Jeff Law  

PR target/94567
* config/i386/i386.md (*testqi_ext_3): Use CCZmode rather than
CCNOmode in ix86_match_ccmode if len is equal to mode
precision,
or pos + len >= 32, or pos + len is equal to operands[2] precision
and operands[2] is not a register operand.  During splitting
perform
SImode AND if operands[0] doesn't have CCZmode and pos + len is
equal to mode precision.

* gcc.c-torture/execute/pr94567.c: New test.

Co-Authored-By: Jeff Law 

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-16 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #15 from Jeffrey A. Law  ---
Comment on attachment 48288
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48288
gcc10-pr94567.patch

I think that'll work. If it passes, consider it approved.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-16 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #14 from Jakub Jelinek  ---
Created attachment 48288
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48288=edit
gcc10-pr94567.patch

So perhaps this?  In the condition exclude cases where we can't widen the
problematic cases (and, as I figured out today, we can't widen if val is a MEM
because we may not read from memory more than was read originally and also
paradoxical subregs of MEM which are accepted as register_operand before reload
could be problematic) and during splitting ensure we don't narrow if we
couldn't widen again and widen if possible?

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-15 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #13 from Jeffrey A. Law  ---
Sigh.  That code is good in that it's rejecting matching the pattern for the
SImode sign bit that we can't implement.   For some dumb reason I was thinking
it was changing how we split, but it's actually the main condition.  So calling
it a "hack" was a mistake.

The only time we have to widen is when the pos + len exactly hits the sign bit
in the operand's mode, which is what I thought my patch did.  Certainly we
don't want to be changing sizes unnecessarily, that's a given.  I guess what I
did could be refined to allow that case for CCZmode.

Your test of:

+|| (INTVAL (operands[3]) + INTVAL (operands[4])
+== GET_MODE (operands[2]))

Looks wrong.  Didn't you mean to get the precision of the mode of operand2?

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #12 from Jakub Jelinek  ---
(In reply to Jeffrey A. Law from comment #11)
> Rather than extending that hack, I think just widening the mode when the
> sign bit is being tested (c#5) is simpler and easier to understand.  The
> bits you're changing should be killed rather than extended to handle more
> cases.

That is not a hack.  x86_64 doesn't have an instruction that ands a 64-bit
reg/mem with a 32-bit zero extended immediate (nor 64-bit immediate), so when
we want to use reg against 32-bit zero extended, we need to use 32-bit
instruction and that has different behavior for the SF, so we must ensure we
don't care about that most significant bit.

Now for these other cases, perhaps we could instead user wider mode if
possible, but it isn't possible always.
So, combine my patch with the last condition changed into >= 32 from
== GET_MODE (operands[2]) and then apply what you have in #c5, but with
hardcoded SImode instead of wider mode - we want to avoid introducing HImode
stuff when there wasn't before, it is both larger and slower - and only do it
if GET_MODE (operands[0]) isn't CCZmode.
If pos + len >= 32, such in the pos + len == 64 special case where we turn
testing bits pos (< 32) up to most significant into a testq which has that
sign-extended 32-bit immediate, then the pattern before splitting would have
always SF zero (unless pos is 0) but split pattern would always copy there the
MSB of the 64-bit register.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-15 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #11 from Jeffrey A. Law  ---
Rather than extending that hack, I think just widening the mode when the sign
bit is being tested (c#5) is simpler and easier to understand.  The bits you're
changing should be killed rather than extended to handle more cases.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #10 from Jakub Jelinek  ---
So something like:
--- gcc/config/i386/i386.md.jj  2020-03-16 22:56:55.556043275 +0100
+++ gcc/config/i386/i386.md 2020-04-15 19:07:04.405933639 +0200
@@ -8732,8 +8732,20 @@
&& ix86_match_ccmode (insn,
 /* *testdi_1 requires CCZmode if the mask has bit
31 set and all bits above it clear.  */
-GET_MODE (operands[2]) == DImode
-&& INTVAL (operands[3]) + INTVAL (operands[4]) == 32
+(GET_MODE (operands[2]) == DImode
+ && INTVAL (operands[3]) + INTVAL (operands[4]) == 32)
+/* If zero_extract mode precision is the same
+   as len, the SF of the zero_extract
+   comparison will be the most significant
+   extracted bit, but this could be matched
+   after splitting only for pos 0 len all bits
+   trivial extractions.  Require CCZmode.  */
+|| (GET_MODE_PRECISION (mode)
+== INTVAL (operands[3]))
+/* Otherwise, require CCZmode if we'd use a mask
+   with the most significant bit set.  */
+|| (INTVAL (operands[3]) + INTVAL (operands[4])
+== GET_MODE (operands[2]))
 ? CCZmode : CCNOmode)"
   "#"
   "&& 1"
@@ -8758,7 +8770,12 @@
 }

   /* Small HImode tests can be converted to QImode.  */
-  if (register_operand (val, HImode) && pos + len <= 8)
+  if (register_operand (val, HImode)
+  && (pos + len < 8
+ /* If the mask would include all bits, ensure we don't
+care about the SF.  */
+ || (pos + len == 8
+ && GET_MODE (operands[0]) == CCZmode)))
 {
   val = gen_lowpart (QImode, val);
   mode = QImode;

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-15 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Jakub Jelinek  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org,
   ||uros at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek  ---
(In reply to Richard Biener from comment #6)
> There's also
> 
>&& ix86_match_ccmode (insn,
>  /* *testdi_1 requires CCZmode if the mask has bit
> 31 set and all bits above it clear.  */
>  GET_MODE (operands[2]) == DImode
>  && INTVAL (operands[3]) + INTVAL (operands[4]) == 32
>  ? CCZmode : CCNOmode)"
> 
> which is likely where the original correctness issue lies - it fails
> to check for the similar cases and HImode (and SImode and QImode).

At least in my understanding of this PR, the above addition in PR94088 isn't
where the problem is, but the problem is that we shouldn't require CCZmode just
in that case, but also in other cases where the splitting would change the
meaning of the sign flag.
If (GET_MODE_PRECISION (mode) < INTVAL (operands[3])), then the unsplit
pattern will have SF always false, so we need to ensure that either the pattern
is using CCZmode in which case we don't care about the sign, or that the mask
we'll use doesn't have the most significant bit set.
If (GET_MODE_PRECISION (mode) == INTVAL (operands[3])), then the SF will
be equal to the most significant bit before splitting.  But then we'd need the
resulting insn to use that exact mode, which would mean only trivial zero
extraction from position 0 and all bits, combiner should have simplified that
IMNSHO.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-15 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #8 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #7)
> I think it's trying to use smaller modes because the encodings can be
> smaller.  In other cases it changes the mode to avoid partial register
> stalls.  It's a bit of a mess.
> 
> WRT the fragment you mentioned, I looked at that repeatedly trying to
> ascertain the real motivation and whether or not that code needed
> generalization to handle this case or was a misguided attempt to fix another
> instance of this issue.
> 
> The conclusion I came to was that hunk of code may well be working around
> another instance of this same problem, but it was neither generalizable to
> this BZ nor would my approach totally fix that instance.
> 
> We may be able to remove the hack in the testqi_ext_3 pattern, but I think
> the corresponding hack in testdi_1 would have to stay unless we found a way
> to merge testdi_1 into the more general test_1 pattern.  Neither of
> those seems terribly appropriate right now.

agreed

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-14 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #7 from Jeffrey A. Law  ---
I think it's trying to use smaller modes because the encodings can be smaller. 
In other cases it changes the mode to avoid partial register stalls.  It's a
bit of a mess.

WRT the fragment you mentioned, I looked at that repeatedly trying to ascertain
the real motivation and whether or not that code needed generalization to
handle this case or was a misguided attempt to fix another instance of this
issue.

The conclusion I came to was that hunk of code may well be working around
another instance of this same problem, but it was neither generalizable to this
BZ nor would my approach totally fix that instance.

We may be able to remove the hack in the testqi_ext_3 pattern, but I think the
corresponding hack in testdi_1 would have to stay unless we found a way to
merge testdi_1 into the more general test_1 pattern.  Neither of those
seems terribly appropriate right now.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #6 from Richard Biener  ---
(In reply to Jeffrey A. Law from comment #5)
> I've pondered just killing that pattern, but I'm pretty sure there'll be
> notable regressions.  There was a clear regression we fixed in gcc-6 due to
> not handling QImode operands in that pattern.
> 
> What I'm playing with is looking at pos + len and if it hits the sign bit in
> the operand's mode, then widening the mode.  Something like this:
> 
> +  /* If the mask is going to have the sign bit set in the mode
> + we want to do the comparison, then we must widen the target
> + mode.  Otherwise the flags will be incorrect when we split
> + this into a (compare (and (op0) (mask))) and a subsequent
> + test like LE will get the wrong result.  */
> +  if (mode < E_DImode
> +  && pos + len == GET_MODE_PRECISION (mode))
> +{
> +  mode = GET_MODE_WIDER_MODE (mode).require ();
> +  val = gen_lowpart (mode, val);
> +}
> +
> 
> Which I think is roughly what you were suggesting.  Mine does it with a
> SUBREG, so it matches existing patterns...  A ZERO_EXTEND may well require
> new patterns.

I see.  A subreg should work as well indeed.  Note I think simply using
the original mode combine used is safer than using some random other(?)
then we know the comparison semantics are unchanged.  Of course your
suggested change above will cause less changes at this point.

There's also

   && ix86_match_ccmode (insn,
 /* *testdi_1 requires CCZmode if the mask has bit
31 set and all bits above it clear.  */
 GET_MODE (operands[2]) == DImode
 && INTVAL (operands[3]) + INTVAL (operands[4]) == 32
 ? CCZmode : CCNOmode)"

which is likely where the original correctness issue lies - it fails
to check for the similar cases and HImode (and SImode and QImode).

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-14 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #5 from Jeffrey A. Law  ---
I've pondered just killing that pattern, but I'm pretty sure there'll be
notable regressions.  There was a clear regression we fixed in gcc-6 due to not
handling QImode operands in that pattern.

What I'm playing with is looking at pos + len and if it hits the sign bit in
the operand's mode, then widening the mode.  Something like this:

+  /* If the mask is going to have the sign bit set in the mode
+ we want to do the comparison, then we must widen the target
+ mode.  Otherwise the flags will be incorrect when we split
+ this into a (compare (and (op0) (mask))) and a subsequent
+ test like LE will get the wrong result.  */
+  if (mode < E_DImode
+  && pos + len == GET_MODE_PRECISION (mode))
+{
+  mode = GET_MODE_WIDER_MODE (mode).require ();
+  val = gen_lowpart (mode, val);
+}
+

Which I think is roughly what you were suggesting.  Mine does it with a SUBREG,
so it matches existing patterns...  A ZERO_EXTEND may well require new
patterns.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

--- Comment #4 from Richard Biener  ---
Wonder if the simplest thing is to turn the zero_extract into a zero_extend
so we can maintain the and in SImode.  And whether combine will ever
generate a zero_extract that extracts the SImode MSB.

[Bug target/94567] [10 Regression] wrong code at -O2 and -O3 on x86_64-linux-gnu

2020-04-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94567

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-* i?86-*-*
  Component|rtl-optimization|target

--- Comment #3 from Richard Biener  ---
Target issue then.