"H.J. Lu" <hjl.to...@gmail.com> writes:
> On Wed, Sep 14, 2011 at 8:24 AM, Richard Sandiford
> <rdsandif...@googlemail.com> wrote:
>> At the moment, fwprop will propagate constants and registers
>> even if no further rtl simplifications are possible:
>>
>>  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>>    flags |= PR_CAN_APPEAR;
>>
>> What do you think about extending this to subregs?  The reason for
>> asking is that on NEON, vector loads like vld4 are represented as a load
>> of a single monolithic register followed by subreg extractions of each
>> vector:
>>
>>  (set (reg:OI FULL) (...))
>>  (set (reg:V2SI V0) (subreg:V2SI (reg:OI FULL) 0))
>>  (set (reg:V2SI V1) (subreg:V2SI (reg:OI FULL) 8))
>>  (set (reg:V2SI V2) (subreg:V2SI (reg:OI FULL) 16))
>>  (set (reg:V2SI V3) (subreg:V2SI (reg:OI FULL) 24))
>>
>> Nothing ever propagates these subregs, so the separate moves
>> survive until IRA.  This has three problems:
>>
>>  - We generally want the registers allocated to V0...V3 to be the same
>>    as FULL, so that the four subreg moves become nops.  And this often
>>    happens in simple examples.  But if register pressure is relatively
>>    high, these moves can sometimes cause IRA to spill in cases where
>>    it doesn't if the subregs are used instead of each Vi.
>>
>>  - Perhaps related, register pressure becomes harder to estimate.
>>
>>  - These moves can interfere with pre-reload scheduling.
>>
>> In combination with the MODES_TIEABLE_P patch that I posted here:
>>
>>    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00626.html
>>
>> this patch significantly improves the code generated for several libav
>> loops.  Unfortunately, I don't have a setup that can do meaningful
>> x86_64 performance measurements, but a diff of the before and after
>> output for libav showed many cases where the patch removed moves.
>>
>> What do you think?  Alternatives include propagating in lower-subreg,
>> or maybe only in the second fwprop pass.
>>
>> Richard
>>
>>
>> gcc/
>>        * fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs.
>>
>> Index: gcc/fwprop.c
>> ===================================================================
>> --- gcc/fwprop.c        2011-08-26 09:58:28.829540497 +0100
>> +++ gcc/fwprop.c        2011-08-26 10:14:03.767707504 +0100
>> @@ -664,7 +664,7 @@ propagate_rtx (rtx x, enum machine_mode
>>     return NULL_RTX;
>>
>>   flags = 0;
>> -  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
>> +  if (REG_P (new_rtx) || CONSTANT_P (new_rtx) || GET_CODE (new_rtx) == SUBREG)
>>     flags |= PR_CAN_APPEAR;
>>   if (!for_each_rtx (&new_rtx, varying_mem_p, NULL))
>>     flags |= PR_HANDLE_MEM;
>>
>
> The operand of a SUBREG need not be a REG or a CONSTANT.  Don't you
> need to check REG_P/CONSTANT_P on SUBREG_REG?

Yeah, good point.  There should be a "&& REG_P (SUBREG_REG (new_rtx))"
in there.  Probably also worth checking for non-paradoxical subregs.
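
To make the combined condition concrete, here is a minimal standalone
sketch of the check I have in mind.  It does not use GCC's headers: the
toy_* types and helpers are hypothetical stand-ins for rtx, REG_P,
CONSTANT_P and SUBREG_REG, and "paradoxical" is modelled simply as the
outer mode being wider than the inner one.  Treat it as an illustration
of the logic rather than as the final patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a few rtx codes; NOT GCC's real definitions.  */
enum toy_code { TOY_REG, TOY_CONST_INT, TOY_SUBREG, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  unsigned size;               /* size of the expression's mode, in bytes */
  const struct toy_rtx *inner; /* operand of a SUBREG, otherwise NULL */
};

/* Models REG_P.  */
static bool
toy_reg_p (const struct toy_rtx *x)
{
  return x->code == TOY_REG;
}

/* Models CONSTANT_P (only one constant code exists in this toy).  */
static bool
toy_constant_p (const struct toy_rtx *x)
{
  return x->code == TOY_CONST_INT;
}

/* A subreg is treated as paradoxical when it is wider than its operand.  */
static bool
toy_paradoxical_subreg_p (const struct toy_rtx *x)
{
  assert (x->code == TOY_SUBREG && x->inner != NULL);
  return x->size > x->inner->size;
}

/* Models the condition under which PR_CAN_APPEAR would be set:
   registers, constants, and non-paradoxical subregs of registers.  */
static bool
toy_can_appear_p (const struct toy_rtx *x)
{
  if (toy_reg_p (x) || toy_constant_p (x))
    return true;
  return (x->code == TOY_SUBREG
          && x->inner != NULL
          && toy_reg_p (x->inner)
          && !toy_paradoxical_subreg_p (x));
}
```

With this shape, an 8-byte subreg of a 32-byte register (the vld4 case)
is accepted, while a subreg of a MEM or a subreg wider than its
register is rejected.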

Richard
