"H.J. Lu" <hjl.to...@gmail.com> writes: > On Wed, Sep 14, 2011 at 8:24 AM, Richard Sandiford > <rdsandif...@googlemail.com> wrote: >> At the moment, fwprop will propagate constants and registers >> even if no further rtl simplifications are possible: >> >> if (REG_P (new_rtx) || CONSTANT_P (new_rtx)) >> flags |= PR_CAN_APPEAR; >> >> What do you think about extending this to subregs? The reason for >> asking is that on NEON, vector loads like vld4 are represented as a load >> of a single monolithic register followed by subreg extractions of each >> vector: >> >> (set (reg:OI FULL) (...)) >> (set (reg:V2SI V0) (subreg:V2SI (reg:OI FULL) 0)) >> (set (reg:V2SI V1) (subreg:V2SI (reg:OI FULL) 16)) >> (set (reg:V2SI V2) (subreg:V2SI (reg:OI FULL) 32)) >> (set (reg:V2SI V3) (subreg:V2SI (reg:OI FULL) 48)) >> >> Nothing ever propagates these subregs, so the separate moves >> survive until IRA. This has three problems: >> >> - We generally want the registers allocated to V0...V3 to be the same >> as FULL, so that the four subreg moves become nops. And this often >> happens in simple examples. But if register pressure is relatively >> high, these moves can sometimes cause IRA to spill in cases where >> it doesn't if the subregs are used instead of each Vi. >> >> - Perhaps related, register pressure becomes harder to estimate. >> >> - These moves can interfere with pre-reload scheduling. >> >> In combination with the MODES_TIEABLE_P patch that I posted here: >> >> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00626.html >> >> this patch significantly improves the code generated for several libav >> loops. Unfortunately, I don't have a setup that can do meaningful >> x86_64 performance measurements, but a diff of the before and after >> output for libav showed many cases where the patch removed moves. >> >> What do you think? Alternatives include propagating in lower-subreg, >> or maybe only in the second fwprop pass. 
>> >> Richard >> >> >> gcc/ >> * fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs. >> >> Index: gcc/fwprop.c >> =================================================================== >> --- gcc/fwprop.c 2011-08-26 09:58:28.829540497 +0100 >> +++ gcc/fwprop.c 2011-08-26 10:14:03.767707504 +0100 >> @@ -664,7 +664,7 @@ propagate_rtx (rtx x, enum machine_mode >> return NULL_RTX; >> >> flags = 0; >> - if (REG_P (new_rtx) || CONSTANT_P (new_rtx)) >> + if (REG_P (new_rtx) || CONSTANT_P (new_rtx) || GET_CODE (new_rtx) == >> SUBREG) >> flags |= PR_CAN_APPEAR; >> if (!for_each_rtx (&new_rtx, varying_mem_p, NULL)) >> flags |= PR_HANDLE_MEM; >> > > A SUBREG may not be REG nor CONSTANT. Don't you need > to check REG_P/CONSTANT_P on SUBREG?
Yeah, good point.  There should be a "&& REG_P (SUBREG_REG (new_rtx))"
in there.  Probably also worth checking for non-paradoxical subregs.

Richard
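
For illustration only, here is a rough standalone sketch of the tightened
condition discussed above: accept a subreg only when it wraps a plain
register and is not paradoxical (that is, the outer mode is not wider than
the inner one).  It is a toy model, not fwprop.c code: the toy_rtx struct,
the TOY_* codes and the toy_* helpers are all hypothetical stand-ins for
GCC's rtx, REG_P, CONSTANT_P and SUBREG_REG, and mode sizes are reduced to
plain byte counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GCC's rtx codes; not the real definitions.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_SUBREG, TOY_MEM };

/* Hypothetical stand-in for an rtx: a code, the size of its mode in
   bytes, and (for TOY_SUBREG only) the expression being wrapped.  */
struct toy_rtx
{
  enum toy_code code;
  unsigned int size;
  const struct toy_rtx *inner;
};

/* Return true if X is a subreg whose outer mode is wider than the mode
   of the expression it wraps, i.e. a paradoxical subreg.  */
static bool
toy_paradoxical_subreg_p (const struct toy_rtx *x)
{
  assert (x->code == TOY_SUBREG && x->inner != NULL);
  return x->size > x->inner->size;
}

/* Model of the proposed test: registers and constants may always
   appear; a subreg may appear only if it wraps a register and is
   not paradoxical.  */
static bool
toy_can_appear_p (const struct toy_rtx *x)
{
  if (x->code == TOY_REG || x->code == TOY_CONST_INT)
    return true;

  return (x->code == TOY_SUBREG
          && x->inner != NULL
          && x->inner->code == TOY_REG
          && !toy_paradoxical_subreg_p (x));
}
```

In this model a subreg of a memory reference, or an 8-byte subreg of a
4-byte register, is rejected, while an 8-byte subreg of a 32-byte register
(the vld4 case above) is accepted.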