[Bug target/67462] [6 Regression] FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"

2016-01-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Jakub Jelinek  ---
Testcase fixed.  For GCC 7, I've cloned this into PR69231 to track the
rtx_cost of a SUBREG issue.

2016-01-11 Thread jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek  ---
Author: jakub
Date: Mon Jan 11 19:07:31 2016
New Revision: 232242

URL: https://gcc.gnu.org/viewcvs?rev=232242&root=gcc&view=rev
Log:
PR target/67462
* gcc.dg/ifcvt-3.c: Only compile on lp64 targets, include also i?86
if lp64.

Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/ifcvt-3.c
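
As a rough illustration of what such a restriction can look like, the sketch
below uses DejaGnu target selectors in the style of GCC's testsuite.  The
directive text of r232242 is not quoted in this thread, so the selector, the
options line and the function body (reconstructed from comment #3) are
assumptions.  Only the scan-rtl-dump string is taken from the bug title.

```c
/* Sketch only; not the committed test.  The target selector and the
   options below are assumed, not copied from r232242.  */
/* { dg-do compile { target { { aarch64*-*-* i?86-*-* x86_64-*-* } && lp64 } } } */
/* { dg-options "-fdump-rtl-ce1 -O2" } */

typedef long long s64;

/* Function body assumed from the narrowed form shown in comment #3.  */
int
foo (s64 a, s64 b, s64 c)
{
  s64 d = a - b;

  if (d == 0)
    return a + c;
  else
    return b + d + c;
}

/* The dump string comes from the bug title.  */
/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
```

With a selector of this shape, an x86_64 run with -m32 would report the test
as unsupported instead of failing the ce1 dump scan.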

2016-01-11 Thread jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
(In reply to Bernd Schmidt from comment #6)
> That does look dodgy. It's also really old, from when rtx_cost was part of
> cse.c. Kenner added it along with many other changes in r754 in 1992.
> 
> See what happens to codegen if you just strip SUBREGs before this switch and
> lose the MODES_TIEABLE thing?

Well, if the SUBREG modes are not tieable and reload is expected to have to
add reload insns to read or store those subregs, I think it is appropriate to
give them some non-zero cost.  Even for the integral modes, if the RA chooses,
say on x86-64, to allocate them in SSE registers, then reading smaller modes
out of them might have some cost.  The cost is zero only if they happen to be
allocated in i?86 GPRs.  And with the stv pass, that doesn't have to happen as
often as in the past.
I'd say it is too dangerous to change the costs now, so late in stage3; for
GCC 7 perhaps we should just allow the target hook to guess the cost of the
subreg.
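
A minimal sketch of the "let the target guess" idea follows.  It is written
as a toy model and does not use GCC's internal API.  Every name in it
(toy_subreg_cost_hook, toy_subreg_cost, toy_i386_hook) is hypothetical, modes
are represented only by their size in bytes, and the generic cost is passed
in by the caller.

```c
#include <stddef.h>

/* Hypothetical hook type: returns a cost >= 0 for a subreg of
   OUTER_BYTES taken from a value of INNER_BYTES, or -1 when the
   target has no opinion.  */
typedef int (*toy_subreg_cost_hook) (int outer_bytes, int inner_bytes);

/* Ask the target first and fall back to the generic cost only when
   there is no hook or the hook declines to answer.  */
static int
toy_subreg_cost (toy_subreg_cost_hook hook, int outer_bytes,
                 int inner_bytes, int generic_cost)
{
  if (hook != NULL)
    {
      int cost = hook (outer_bytes, inner_bytes);
      if (cost >= 0)
        return cost;
    }
  return generic_cost;
}

/* Example hook for an imagined 32-bit x86 target: a word-sized piece
   of a double-word integer is treated as free, on the assumption that
   the value ends up in a GPR pair or a stack slot.  */
static int
toy_i386_hook (int outer_bytes, int inner_bytes)
{
  if (outer_bytes == 4 && inner_bytes == 8)
    return 0;
  return -1;
}
```

The -1 "no opinion" return keeps the generic cost as the default, so a target
that says nothing about subregs would see no change in behaviour.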

2016-01-09 Thread bernds at gcc dot gnu.org

--- Comment #6 from Bernd Schmidt  ---
That does look dodgy. It's also really old, from when rtx_cost was part of
cse.c. Kenner added it along with many other changes in r754 in 1992.

See what happens to codegen if you just strip SUBREGs before this switch and
lose the MODES_TIEABLE thing?

2016-01-09 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bernds at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
BTW, the reason why the costs are different is not something in the backend's
control.
rtx_cost has:
    case SUBREG:
      total = 0;
      /* If we can't tie these modes, make this expensive.  The larger
         the mode, the more expensive it is.  */
      if (! MODES_TIEABLE_P (mode, GET_MODE (SUBREG_REG (x))))
        return COSTS_N_INSNS (2 + factor);
      break;
without any possibility for the target to override this, and since on 32-bit
arches HARD_REGNO_MODE_NREGS differs between SImode and DImode, those modes
are required not to be tieable.
I fail to see, though, why at least on i686/x86_64 a word mode integral subreg
of an integral double word mode should cost any more than a simple REG
(i.e. 0).  If the pseudo the subreg is of is given a hard register, then
reload turns it into an access of just one register of the GPR pair, and if it
lives in a stack slot, then reload can just load and/or store one half of the
memory slot.
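
To make the numbers concrete, here is a small stand-alone model of that
SUBREG case.  It is a sketch under stated assumptions and not GCC code: modes
are reduced to byte sizes, tieability is approximated as "both modes need the
same number of word-sized registers" (the real MODES_TIEABLE_P is defined by
the target), and the cost scale of 4 per insn and the derivation of factor
from the mode size are assumed to match rtx_cost.  All toy_* names are
invented for the illustration.

```c
/* Assumed to mirror the scale of COSTS_N_INSNS.  */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Number of word-sized registers needed for a mode of MODE_BYTES.  */
static int
toy_nregs (int mode_bytes, int word_bytes)
{
  return (mode_bytes + word_bytes - 1) / word_bytes;
}

/* Simplified stand-in for MODES_TIEABLE_P: tieable only when both
   modes occupy the same number of registers.  */
static int
toy_modes_tieable (int mode1_bytes, int mode2_bytes, int word_bytes)
{
  return toy_nregs (mode1_bytes, word_bytes)
         == toy_nregs (mode2_bytes, word_bytes);
}

/* Model of the SUBREG case: zero cost when the modes tie, otherwise
   an expensive value that grows with the size of the outer mode.  */
static int
toy_subreg_cost (int outer_bytes, int inner_bytes, int word_bytes)
{
  int factor = outer_bytes / word_bytes;
  if (factor == 0)
    factor = 1;

  if (!toy_modes_tieable (outer_bytes, inner_bytes, word_bytes))
    return TOY_COSTS_N_INSNS (2 + factor);
  return 0;
}
```

Under these assumptions, a 4-byte subreg of an 8-byte value costs
TOY_COSTS_N_INSNS (3) with 4-byte words and 0 with 8-byte words, which
matches the -m32 versus -m64 difference discussed in this bug.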

2016-01-09 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Created attachment 37294
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37294&action=edit
gcc6-pr67462.patch

This is not just a cost issue: x86_64 -m32 is a 32-bit wordsize target, so
some of the instructions ce1 sees are still DImode, but e.g. the comparisons
are ors of the subreg parts.  This really should not be expected to be
optimized at the RTL level; you'd need to optimize it at the gimple level.

2015-09-14 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |6.0

--- Comment #3 from Richard Biener  ---
The tree level should indeed do a better job here, but it gets "confused"
because the return expressions are narrowed to int before it gets a chance to
do that optimization.  To the tree optimizers the function looks like

  s64 d = a - b;

  if (d == 0)
    return (unsigned)a + (unsigned)c;
  else
    return (unsigned)b + (unsigned)d + (unsigned)c;

and 'd' is not handled the same way because the shortening happens
in the frontend.
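
For reference, the missed optimization can be checked with a small
stand-alone sketch.  Since d is a - b, the sum b + d equals a modulo the width
of unsigned, so both arms reduce to the same expression.  The function names
and the use of uint64_t for the subtraction (to keep the sketch free of
signed-overflow undefined behaviour) are additions for the illustration.

```c
#include <stdint.h>

typedef int64_t s64;

/* The narrowed form quoted above.  The subtraction is done in
   uint64_t so that it wraps instead of overflowing.  */
static unsigned
narrowed (s64 a, s64 b, s64 c)
{
  uint64_t d = (uint64_t) a - (uint64_t) b;

  if (d == 0)
    return (unsigned) a + (unsigned) c;
  else
    return (unsigned) b + (unsigned) d + (unsigned) c;
}

/* What both arms reduce to once b + d is recognised as a.  */
static unsigned
folded (s64 a, s64 c)
{
  return (unsigned) a + (unsigned) c;
}
```

The two functions agree for every input, which is why a single
unconditional addition is the result one would hope for.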

You might want to file a separate PR about this missed optimization.


2015-09-08 Thread ktkachov at gcc dot gnu.org

ktkachov at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target

--- Comment #2 from ktkachov at gcc dot gnu.org ---
IMO this is a target issue.
If you think if-conversion should happen for -m32 then the backend costs should
be fixed.
If not, then this test should be skipped.