[Bug target/67462] [6 Regression] FAIL: gcc.dg/ifcvt-3.c scan-rtl-dump ce1 "3 true changes made"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

Jakub Jelinek changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |FIXED

--- Comment #9 from Jakub Jelinek ---
Testcase fixed. For GCC 7, I've cloned this into PR69231 for the rtx_cost of a SUBREG issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

--- Comment #8 from Jakub Jelinek ---
Author: jakub
Date: Mon Jan 11 19:07:31 2016
New Revision: 232242

URL: https://gcc.gnu.org/viewcvs?rev=232242&root=gcc&view=rev
Log:
	PR target/67462
	* gcc.dg/ifcvt-3.c: Only compile on lp64 targets, include also i?86
	if lp64.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/ifcvt-3.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

--- Comment #7 from Jakub Jelinek ---
(In reply to Bernd Schmidt from comment #6)
> That does look dodgy. It's also really old, from when rtx_cost was part of
> cse.c. Kenner added it along with many other changes in r754 in 1992.
>
> See what happens to codegen if you just strip SUBREGs before this switch and
> lose the MODES_TIEABLE thing?

Well, if the SUBREG modes are not tieable and reload is expected to add reload insns to read or store those subregs, I think it is appropriate for them to have some non-zero cost. Even for the integral modes, if the RA chooses, say on x86-64, to allocate them in SSE registers, then reading smaller modes out of them might have some cost. It is only when they happen to be allocated in i?86 GPRs that the cost is zero, and with the stv pass that does not necessarily happen as often as in the past. I'd say it is too dangerous to change the costs this late in stage3; for GCC 7, perhaps we should just allow the target hook to guess the cost of the subreg.
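The GCC 7 idea above, letting a target hook guess the SUBREG cost before the generic rule applies, might look roughly like the standalone sketch below. It is not GCC code: `toy_target`, `subreg_cost_hook`, `subreg_cost` and `i686_like_subreg_cost` are hypothetical names, modes are represented only by their byte size, tieability is simplified to "needs the same number of word registers", `factor` is assumed to be the outer size in words (at least 1), and COSTS_N_INSNS follows GCC's `((N) * 4)` definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same scaling as GCC's COSTS_N_INSNS in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical hook: return true and store a cost in *COST if the target
   has an opinion about an OUTER_SIZE-byte SUBREG of an INNER_SIZE-byte
   value; return false to fall back to the generic rule.  */
typedef bool (*subreg_cost_hook) (int outer_size, int inner_size, int *cost);

/* Hypothetical, minimal target description.  */
struct toy_target
{
  int units_per_word;
  subreg_cost_hook subreg_cost;   /* NULL means "no opinion".  */
};

/* Simplified tieability: same number of word-sized registers.  */
static bool
toy_modes_tieable (const struct toy_target *t, int a, int b)
{
  int wa = (a + t->units_per_word - 1) / t->units_per_word;
  int wb = (b + t->units_per_word - 1) / t->units_per_word;
  return wa == wb;
}

static int
subreg_cost (const struct toy_target *t, int outer_size, int inner_size)
{
  int cost;

  /* Ask the target first.  */
  if (t->subreg_cost != NULL && t->subreg_cost (outer_size, inner_size, &cost))
    return cost;

  /* Otherwise mimic the generic rule: non-tieable SUBREGs are expensive.  */
  if (!toy_modes_tieable (t, outer_size, inner_size))
    {
      int factor = outer_size / t->units_per_word;
      if (factor == 0)
        factor = 1;
      return COSTS_N_INSNS (2 + factor);
    }
  return 0;
}

/* Example hook for an i686-like target: a 4-byte half of an 8-byte
   integer is assumed free (one register of a GPR pair, or half a stack
   slot).  Everything else is left to the generic rule.  */
static bool
i686_like_subreg_cost (int outer_size, int inner_size, int *cost)
{
  if (outer_size == 4 && inner_size == 8)
    {
      *cost = 0;
      return true;
    }
  return false;
}
```

A target without the hook keeps the old COSTS_N_INSNS (3) for a word-sized half of a double-word value, while the hooked target reports 0. As the comment notes, the register allocator's choice (GPRs versus SSE registers) is not known when costs are computed, so such a hook could only ever guess.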
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

--- Comment #6 from Bernd Schmidt ---
That does look dodgy. It's also really old, from when rtx_cost was part of cse.c. Kenner added it along with many other changes in r754 in 1992.

See what happens to codegen if you just strip SUBREGs before this switch and lose the MODES_TIEABLE thing?
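The experiment suggested above, stripping SUBREGs before the switch so the MODES_TIEABLE check disappears, can be illustrated on a toy expression type. This is a hypothetical model, not GCC's rtx or rtx_cost: `toy_rtx`, `toy_code` and `toy_cost` are invented names, and COSTS_N_INSNS follows GCC's `((N) * 4)` definition. It only shows the control flow: a SUBREG ends up costing exactly what its inner expression costs.

```c
#include <stddef.h>

/* Same scaling as GCC's COSTS_N_INSNS in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Toy expression codes: just enough to show the idea.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;   /* Inner expression for TOY_SUBREG.  */
  const struct toy_rtx *op1;
};

static int
toy_cost (const struct toy_rtx *x)
{
  /* Look through SUBREGs first, so no mode-tieability test is applied.  */
  while (x->code == TOY_SUBREG)
    x = x->op0;

  switch (x->code)
    {
    case TOY_REG:
      return 0;
    case TOY_PLUS:
      return COSTS_N_INSNS (1) + toy_cost (x->op0) + toy_cost (x->op1);
    default:
      return COSTS_N_INSNS (1);
    }
}
```

Under this variant an addition of two SUBREGs of a register costs the same as an addition of two plain registers, which is the codegen change the comment proposes measuring.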
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

Jakub Jelinek changed:

What |Removed |Added
CC   |        |bernds at gcc dot gnu.org, law at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek ---
BTW, the reason the costs differ is not under the backend's control. rtx_cost has:

    case SUBREG:
      total = 0;
      /* If we can't tie these modes, make this expensive.  The larger
         the mode, the more expensive it is.  */
      if (! MODES_TIEABLE_P (mode, GET_MODE (SUBREG_REG (x))))
        return COSTS_N_INSNS (2 + factor);
      break;

with no way for the target to override this, and since on 32-bit arches HARD_REGNO_MODE_NREGS differs between SImode and DImode, those modes are required not to be tieable. I fail to see, though, why, at least on i686/x86_64, a word-mode integral SUBREG of an integral double-word-mode value should cost any more than a simple REG (i.e. 0). If the pseudo the SUBREG refers to is given a hard register, reload turns it into an access to just one register of the GPR pair; if it lives in a stack slot, reload can just load and/or store one half of the slot.
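A standalone model of the quoted rule makes the -m32 versus 64-bit difference concrete. Every piece is a simplifying assumption rather than GCC's implementation: modes are represented by their byte size, two integer modes count as tieable exactly when they need the same number of word-sized registers (the register-count argument above), `factor` is taken to be the outer mode's size in words with a minimum of 1, and COSTS_N_INSNS follows GCC's `((N) * 4)` definition. `nregs` and `generic_subreg_cost` are hypothetical helpers.

```c
/* Same scaling as GCC's COSTS_N_INSNS in rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Number of word-sized registers a SIZE-byte value occupies.  */
static int
nregs (int size, int units_per_word)
{
  return (size + units_per_word - 1) / units_per_word;
}

/* Simplified model of the SUBREG case: free when the modes tie,
   COSTS_N_INSNS (2 + factor) otherwise.  */
static int
generic_subreg_cost (int outer_size, int inner_size, int units_per_word)
{
  int factor = outer_size / units_per_word;
  if (factor == 0)
    factor = 1;

  if (nregs (outer_size, units_per_word) != nregs (inner_size, units_per_word))
    return COSTS_N_INSNS (2 + factor);
  return 0;
}
```

With 4-byte words, an SImode (4-byte) SUBREG of a DImode (8-byte) pseudo costs COSTS_N_INSNS (3), i.e. 12, because the two modes need one and two registers respectively. With 8-byte words both fit in one register and the same SUBREG is free, which is the cost difference discussed in this comment.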
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

Jakub Jelinek changed:

What |Removed |Added
CC   |        |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
Created attachment 37294
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37294&action=edit
gcc6-pr67462.patch

This is not just a cost issue. x86_64 -m32 is a 32-bit word-size target: some of the instructions ce1 sees are still DImode, but others, e.g. the comparisons, are ORs of the SUBREG parts. This really should not be expected to be optimized at the RTL level; you'd need to optimize it at the GIMPLE level.
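A C-level analogy of what ce1 has to work with on a 32-bit word-size target: a 64-bit value is held as two 32-bit halves, so a test such as `d == 0` becomes an OR of the two halves compared against zero. This is source-level C rather than an RTL dump, and `low_word`, `high_word` and `is_zero_split` are hypothetical helpers standing in for the word-mode SUBREG parts of a DImode pseudo.

```c
#include <stdint.h>

/* Low 32-bit word of a 64-bit value (analogous to the lowpart SUBREG).  */
static uint32_t
low_word (uint64_t x)
{
  return (uint32_t) x;
}

/* High 32-bit word of a 64-bit value (analogous to the highpart SUBREG).  */
static uint32_t
high_word (uint64_t x)
{
  return (uint32_t) (x >> 32);
}

/* x == 0 evaluated the way a 32-bit target must: OR the halves together
   and compare the single word-sized result against zero.  */
static int
is_zero_split (uint64_t x)
{
  return (low_word (x) | high_word (x)) == 0;
}
```

Once the comparison has this shape, the RTL passes see word-sized operations on SUBREG parts rather than one DImode test, which is why the comment argues the simplification belongs at the GIMPLE level.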
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

Richard Biener changed:

What             |Removed |Added
Target Milestone |---     |6.0

--- Comment #3 from Richard Biener ---
The tree level should indeed do a better job here, but it gets "confused" by the narrowing of the return expressions to int before it gets a chance to do that optimization. To the tree optimizers, the code is

    s64 d = a - b;
    if (d == 0)
      return (unsigned)a + (unsigned)c;
    else
      return (unsigned)b + (unsigned)d + (unsigned)c;

and 'd' is not handled the same way because the shortening happens in the frontend. You might want to file a separate PR about this missed optimization.
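The optimization the tree level is missing is plain arithmetic: in the else arm d == a - b, so b + d + c equals a + c, and the then arm already returns a + c, so the whole `if` reduces to a single addition. The sketch below checks this for the narrowed (unsigned) forms quoted above. `branchy` and `folded` are hypothetical names, `s64` is assumed to be `long long`, and inputs are assumed not to overflow in `a - b` (signed overflow would be undefined).

```c
/* Assumed definition of the testcase's s64 type.  */
typedef long long s64;

/* The narrowed form the tree optimizers see (hypothetical name).  */
static unsigned
branchy (s64 a, s64 b, s64 c)
{
  s64 d = a - b;
  if (d == 0)
    return (unsigned) a + (unsigned) c;
  else
    return (unsigned) b + (unsigned) d + (unsigned) c;
}

/* What both arms reduce to: unsigned arithmetic wraps modulo 2^32, and
   (unsigned) b + (unsigned) (a - b) is congruent to (unsigned) a.  */
static unsigned
folded (s64 a, s64 c)
{
  return (unsigned) a + (unsigned) c;
}
```

Because the conversions to `unsigned` are applied per operand, the identity still holds after the frontend's shortening; the difficulty described in the comment is that `d` has already been computed in the wide type, so the optimizers do not see the same narrowed expression for it.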
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67462

ktkachov at gcc dot gnu.org changed:

What      |Removed          |Added
Component |rtl-optimization |target

--- Comment #2 from ktkachov at gcc dot gnu.org ---
IMO this is a target issue. If you think if-conversion should happen for -m32, then the backend costs should be fixed. If not, then this test should be skipped.