[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-11 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #11 from Eric Botcazou  ---
> Does that also fix PR89795?

Yes, I'm going to fix PR89795 and let you check what happens for this one.

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #10 from Jakub Jelinek  ---
(In reply to Eric Botcazou from comment #9)
> > That change added the && !REG_P in there, but unless the reg is loaded from
> > memory, it is unclear how it can guarantee that the upper bits are zero
> > (resp. sign) extended after arbitrary operations.
> 
> Precisely because WORD_REGISTER_OPERATIONS is defined, i.e. the processor is
> supposed to operate only a full registers.

True, but that still doesn't imply that the upper bits are zeros.
If something is loaded from memory, we do get that guarantee.  But if we do say
PLUS, the lowest upper bit might be 0 or 1, the bits above it zeros.
If a constant is loaded into a register, the upper bits might be all zeros or
all ones.  For a multiplication, the upper bits could be anything, etc.

>  But, OK, I'm going to remove the
> problematic && !REG_P and XFAIL the associated testcase.

Does that also fix PR89795?

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-11 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

Eric Botcazou  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-09-11
 Ever confirmed|0   |1

--- Comment #9 from Eric Botcazou  ---
> That change added the && !REG_P in there, but unless the reg is loaded from
> memory, it is unclear how it can guarantee that the upper bits are zero
> (resp. sign) extended after arbitrary operations.

Precisely because WORD_REGISTER_OPERATIONS is defined, i.e. the processor is
supposed to operate only a full registers.  But, OK, I'm going to remove the
problematic && !REG_P and XFAIL the associated testcase.

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread wilson at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

Jim Wilson  changed:

   What|Removed |Added

 CC||wilson at gcc dot gnu.org

--- Comment #8 from Jim Wilson  ---
Created attachment 46867
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46867=edit
simplest fix: don't define WORD_REGISTER_OPERATIONS

I've had problems with combine, LOAD_EXTEND_OP, and WORD_REGISTER_OPERATIONS
before.  I think combine is making unsafe assumptions in this case, but I'm
getting tired of arguing about this.  So the simple solution is to just stop
defining WORD_REGISTER_OPERATIONS.  It makes this testcase work.

I get about a 0.1% code size increase when I try it with a 64-bit toolchain,
looking at libc and libstdc++ library file sizes.  If this affects dhrystone or
coremark code size or performance, a lot of people will be unhappy, so I will
have to double check that.  For a 32-bit toolchain it is more like 0.01% which
is not as bad.

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #7 from Jakub Jelinek  ---
Looks very similar to PR89795 on arm.

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #6 from Jakub Jelinek  ---
I think this has been introduced in PR59461 change and is contrary to what is
documented:
"The high-order bits of rvalues are defined in the following circumstances:

@itemize
@item @code{subreg}s of @code{mem}
When @var{m2} is smaller than a word, the macro @code{LOAD_EXTEND_OP},
can control how the high-order bits are defined.

@item @code{subreg} of @code{reg}s
The upper bits are defined when @code{SUBREG_PROMOTED_VAR_P} is true.
@code{SUBREG_PROMOTED_UNSIGNED_P} describes what the upper bits hold.
Such subregs usually represent local variables, register variables
and parameter pseudo variables that have been promoted to a wider mode."

That change added the && !REG_P in there, but unless the reg is loaded from
memory, it is unclear how it can guarantee that the upper bits are zero (resp.
sign) extended after arbitrary operations.

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #5 from Jakub Jelinek  ---
(In reply to Eric Botcazou from comment #3)
> > So, the bug is either in nonzero_bits that it for the
> > WORD_REGISTER_OPERATIONS and load_extend_op (QImode) == ZERO_EXTEND returns
> > 0s in the upper bits, or in
> > simplify_and_const_int trusting nonzero_bits even on paradoxical subregs, 
> > ...
> 
> As Jim said on gcc-patches@, let's concede defeat for
> WORD_REGISTER_OPERATIONS and just remove it from the compiler.  This will
> pessimize but I'm personally fed up with this endless stream of nonsensical
> bug reports with dozens of improbable -fno- switches...

The "nonsensical bug reports" just happen to uncover bugs in the RTL optimizers
and backends that are usually latent because GIMPLE optimizations optimize
stuff before it has a chance to get into RTL.
It isn't clear to me how WORD_REGISTER_OPERATIONS is supposed to work, while
loads from memory will have those upper bits of the paradoxical regs perhaps
clear, the registers can be set by other operations; do we zero extend say any
addition afterwards to ensure that?  And in this case, do all immediate loads
need to be also zero extended for LOAD_EXTEND_OP ZERO_EXTEND?

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #4 from Jakub Jelinek  ---
If W_O_R with load_extend_op (QImode) == ZERO_EXTEND says that the upper bits
are all clear, then
(insn 34 33 35 (set (reg:QI 15 a5 [orig:94 iftmp.0_7 ] [94])
(const_int -128 [0xff80])) "pr91720.c":12:5 142
{*movqi_internal}
 (nil))
needs to be actually emitted as a5 = 128 rather than a5 = -128, otherwise I
don't see how it can work.

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread ebotcazou at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #3 from Eric Botcazou  ---
> So, the bug is either in nonzero_bits that it for the
> WORD_REGISTER_OPERATIONS and load_extend_op (QImode) == ZERO_EXTEND returns
> 0s in the upper bits, or in
> simplify_and_const_int trusting nonzero_bits even on paradoxical subregs, ...

As Jim said on gcc-patches@, let's concede defeat for WORD_REGISTER_OPERATIONS
and just remove it from the compiler.  This will pessimize but I'm personally
fed up with this endless stream of nonsensical bug reports with dozens of
improbable -fno- switches...

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

--- Comment #2 from Segher Boessenkool  ---
Isn't this *exactly* what WORD_REGISTER_OPERATIONS says is okay to do?

[Bug target/91720] [10 Regression] wrong code with -Og -fno-forward-propagate -frerun-cse-after-loop -fno-tree-fre

2019-09-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91720

Jakub Jelinek  changed:

   What|Removed |Added

 CC||ebotcazou at gcc dot gnu.org,
   ||jakub at gcc dot gnu.org,
   ||segher at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
The problem is that try_combine for:
(gdb) p debug_rtx (i2)
(insn 35 34 36 6 (set (reg:SI 95 [ iftmp.0_7 ])
(zero_extend:SI (reg:QI 94 [ iftmp.0_7 ]))) "pr91720.c":12:5 88
{zero_extendqisi2}
 (nil))
(gdb) p debug_rtx (i3)
(insn 36 35 37 6 (set (reg:SI 96)
(lshiftrt:SI (reg:SI 95 [ iftmp.0_7 ])
(const_int 1 [0x1]))) "pr91720.c":12:5 152 {lshrsi3}
 (expr_list:REG_DEAD (reg:SI 95 [ iftmp.0_7 ])
(nil)))
combine_simplify_rtx "simplifies"
(lshiftrt:SI (zero_extend:SI (reg:QI 94 [ iftmp.0_7 ]))
(const_int 1 [0x1]))
into:
(lshiftrt:SI (subreg:SI (reg:QI 94 [ iftmp.0_7 ]) 0)
(const_int 1 [0x1]))
which is not equivalent, because it pushes random undefined bits to the right.
expand_compound_operation has:
7437  tem = gen_lowpart (mode, XEXP (x, 0));
7438  if (!tem || GET_CODE (tem) == CLOBBER)
7439return x;
7440  tem = simplify_shift_const (NULL_RTX, ASHIFT, mode,
7441  tem, modewidth - pos - len);
7442  tem = simplify_shift_const (NULL_RTX, unsignedp ? LSHIFTRT :
ASHIFTRT,
7443  mode, tem, modewidth - len);
and while the first simplify_shift_const (properly) returns
(ashift:SI (subreg:SI (reg:QI 94 [ iftmp.0_7 ]) 0)
(const_int 24 [0x18]))
because the upper bits of the register are all shifted out, the second
simplify_shift_const returns the paradoxical subreg.
What is specific to this testcase is that reg:QI 94 has nonzero_bits equal to
128.
simplify_shift_const_1 then will simplify it to the (subreg:SI (reg:QI 94 [
iftmp.0_7 ]) 0) with desired AND outer operation and outer_const 128, that
still seems to be correct, but then
p debug_rtx (simplify_and_const_int (0, int_result_mode, x, 128))
(subreg:SI (reg:QI 94 [ iftmp.0_7 ]) 0)
where int_result_mode is SImode and this is already wrong, because paradoxical
subregs don't have defined content of the upper bits.

  /* See what bits may be nonzero in VAROP.  Unlike the general case of
 a call to nonzero_bits, here we don't care about bits outside
 MODE.  */

  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);

  /* Turn off all bits in the constant that are known to already be zero.
 Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
 which is tested below.  */

  constop &= nonzero;

...
  /* If we are only masking insignificant bits, return VAROP.  */
  if (constop == nonzero)
return varop;

So, the bug is either in nonzero_bits that it for the WORD_REGISTER_OPERATIONS
and load_extend_op (QImode) == ZERO_EXTEND returns 0s in the upper bits, or in
simplify_and_const_int trusting nonzero_bits even on paradoxical subregs, ...