Re: extend fwprop optimization

Oleg Endo Sun, 24 Mar 2013 05:33:09 -0700

Hi,

On Sat, 2013-03-23 at 21:18 -0700, Wei Mi wrote:
> This is the patch to add shift-count truncation to
> simplify_binary_operation_1. I added a new hook,
> TARGET_SHIFT_COUNT_TRUNCATED, which uses an enum rtx_code to decide
> whether shift truncation can be done. I didn't use
> TARGET_SHIFT_TRUNCATION_MASK in simplify_binary_operation_1 because it
> uses the macro SHIFT_COUNT_TRUNCATED. If I changed
> SHIFT_COUNT_TRUNCATED to targetm.shift_count_truncated in
> TARGET_SHIFT_TRUNCATION_MASK, I would need to give
> TARGET_SHIFT_TRUNCATION_MASK an enum rtx_code parameter, which isn't
> trivial to obtain in many places in the existing code.
>


During 4.8 development there was a similar issue with the
TARGET_CANONICALIZE_COMPARISON hook.  As a temporary solution, the
rtx_code was passed as an int.  I think the story started here:
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00379.html

The conclusion regarding rtx_code ...
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00646.html

Maybe this should be addressed first, since now there are at least two
cases where it's in the way.
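The int-passing workaround amounts to declaring the hook parameter as a plain `int`, so the hook's declaration does not depend on `enum rtx_code`, and letting the target cast it back. A minimal sketch under that reading; every name below (`sketch_rtx_code`, `sketch_target_hooks`, `example_shift_count_truncated`) is a hypothetical stand-in, not GCC's real target.def declaration, and treating only plain shifts as truncating is an arbitrary example choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's enum rtx_code (the real enum is
   generated from rtl.def and has many more codes).  */
enum sketch_rtx_code { SK_ASHIFT, SK_ASHIFTRT, SK_LSHIFTRT, SK_ROTATE, SK_PLUS };

/* Hook table whose declaration does not mention the enum: the code
   travels as a plain int.  */
struct sketch_target_hooks
{
  bool (*shift_count_truncated) (int code);
};

/* Target-side implementation: cast the int back to the enum.
   Example policy only: plain shifts truncate their count.  */
static bool
example_shift_count_truncated (int code)
{
  switch ((enum sketch_rtx_code) code)
    {
    case SK_ASHIFT:
    case SK_ASHIFTRT:
    case SK_LSHIFTRT:
      return true;
    default:
      return false;
    }
}

static const struct sketch_target_hooks sketch_targetm = {
  example_shift_count_truncated
};

/* Caller side: the enum converts to int implicitly.  */
static bool
shift_count_truncated_p (enum sketch_rtx_code code)
{
  assert (sketch_targetm.shift_count_truncated != 0);
  return sketch_targetm.shift_count_truncated (code);
}
```

The cost of this design is that the compiler can no longer type-check the code argument at the hook boundary, which is why it is described above as temporary.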

Cheers,
Oleg



> patch.1 ~ patch.4 pass regression testing and bootstrap on x86_64-unknown-linux-gnu.
> 
> Thanks,
> Wei.
> 
> On Sun, Mar 17, 2013 at 12:15 AM, Wei Mi wrote:
> > Hi,
> >
> > On Sat, Mar 16, 2013 at 3:48 PM, Steven Bosscher wrote:
> >> On Tue, Mar 12, 2013 at 8:18 AM, Wei Mi wrote:
> >>> For the motivational case, I need insn splitting to get the cost
> >>> right. Insn splitting is not very intrusive: all I need is to call
> >>> the split_insns function.
> >>
> >> It may not look very intrusive, but there's a lot happening in the
> >> background. You're creating a lot of new RTL and then just throwing it
> >> away again. You fool the compiler into thinking you're much deeper in
> >> the pipeline than you really are. You're assuming there are no
> >> side-effects other than that some insn gets split, but there are back
> >> ends where splitters may have side-effects.
> >
> > Ok, then I will remove the split_insns call.
> >
> >>
> >> Even though I've asked twice now, you still have not explained this
> >> motivational case, except to say that there is one. *What* are you
> >> trying to do, *what* is not happening without the splits, and *what*
> >> happens if you split. Only if you explain that in a lot more detail
> >> than "I have a motivational case" can we look into what a
> >> proper solution is.
> >
> > :-) Sorry, I didn't explain it clearly. The motivational case is the one
> > mentioned in the following posts (split_insns changes a << (b & 63) to
> > a << b):
> > http://gcc.gnu.org/ml/gcc/2013-01/msg00181.html
> > http://gcc.gnu.org/ml/gcc-patches/2013-02/msg01144.html
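Why the mask can be dropped: on a target such as x86-64, whose 64-bit shift instructions use only the low 6 bits of the count, masking the count with 63 first is redundant. Plain C leaves `a << b` undefined for `b >= 64`, so this sketch models the hardware truncation explicitly; `hw_shl64` and `mask_is_redundant` are hypothetical helpers, not GCC functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 64-bit shift instruction that uses only the low 6 bits
   of its count.  */
static uint64_t
hw_shl64 (uint64_t a, uint64_t count)
{
  return a << (count & 63);
}

/* Brute-force check that hw_shl64 (a, b & mask) == hw_shl64 (a, b)
   for every count b in [0, limit), i.e. that the explicit AND with
   MASK changes nothing on this target.  */
static bool
mask_is_redundant (uint64_t a, uint64_t mask, uint64_t limit)
{
  for (uint64_t b = 0; b < limit; b++)
    if (hw_shl64 (a, b & mask) != hw_shl64 (a, b))
      return false;
  return true;
}
```

Any mask whose low 6 bits are all set (63, 127, ...) is redundant under this model; a mask such as 31 is not, because it changes a count of 32 into 0.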
> >
> > If I remove the split_insns call and the related cost-estimation
> > adjustment, fwprop of 18-->22 and 18-->23 will punt: because fwprop
> > here acts like the reverse of CSE, the total cost increases after the
> > fwprop change.
> >
> > Def insn 18:
> >         Use insn 23
> >         Use insn 22
> >
> > If we include the split_insns cost estimation adjustment:
> >   extra benefit by removing def insn 18 = 5
> >   change[0]: benefit = 0, verified - ok  // The cost of insn 22 does not change after fwprop + insn splitting.
> >   change[1]: benefit = 0, verified - ok  // Insn 23 is the same as insn 22.
> > Total benefit is 5, so fwprop will go on.
> >
> > If we remove the split_insns cost estimation adjustment:
> >   extra benefit by removing def insn 18 = 5
> >   change[0]: benefit = -4, verified - ok   // The costs of insns 22 and 23 increase after fwprop.
> >   change[1]: benefit = -4, verified - ok   // Insn 23 is the same as insn 22.
> > Total benefit is 5 - 4 - 4 = -3, so fwprop will punt.
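The two cost tables above reduce to one sum: the benefit of deleting the def insn plus the (possibly negative) benefit of each rewritten use. A minimal sketch of that bookkeeping; the helper names are hypothetical, and the "propagate only if the total is positive" rule is an assumption consistent with the two outcomes described, not code taken from fwprop.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Total benefit of propagating one def into several uses.  */
static int
total_benefit (int def_benefit, const int *use_benefits, int n_uses)
{
  int total = def_benefit;
  assert (n_uses >= 0);
  for (int i = 0; i < n_uses; i++)
    total += use_benefits[i];
  return total;
}

/* Assumed decision rule: go on only when the net benefit is positive.  */
static bool
should_propagate (int def_benefit, const int *use_benefits, int n_uses)
{
  return total_benefit (def_benefit, use_benefits, n_uses) > 0;
}
```

With the splitting adjustment the uses contribute {0, 0} and the total is 5; without it they contribute {-4, -4} and the total is -3.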
> >
> > How about adding the (a << (b&63) ==> a << b) transformation to
> > simplify_binary_operation_1, since it is a kind of architecture-specific
> > expression simplification? Then fwprop could do the propagation as I
> > expect.
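A toy version of the proposed rule, to make its precondition concrete: `(code X (and Y (const_int M))) -> (code X Y)` is only valid when the target truncates the count for that code to log2(bitsize) bits and M keeps all of those bits. This is a sketch over a hypothetical mini expression type (`sk_expr`) with a hypothetical stand-in for the proposed hook, not GCC's rtx API or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression node.  Real RTL uses
   GCC's rtx type with accessors such as XEXP and INTVAL.  */
enum sk_code { SK_REG, SK_CONST_INT, SK_AND, SK_ASHIFT };

struct sk_expr
{
  enum sk_code code;
  long value;                       /* register number or constant */
  const struct sk_expr *op0, *op1;
};

/* Hypothetical stand-in for the proposed hook: assume the target
   truncates the count of ASHIFT only.  */
static bool
sk_shift_count_truncated (enum sk_code code)
{
  return code == SK_ASHIFT;
}

/* For (CODE X COUNT), return the simplified count if
   COUNT = (and Y (const_int M)) and the mask is redundant, else NULL.
   BITSIZE is the mode's bit size and must be a power of two.  */
static const struct sk_expr *
simplify_shift_count (enum sk_code code, const struct sk_expr *count,
                      unsigned bitsize)
{
  assert (bitsize != 0 && (bitsize & (bitsize - 1)) == 0);
  if (!sk_shift_count_truncated (code))
    return NULL;
  if (count->code == SK_AND
      && count->op1->code == SK_CONST_INT
      && ((unsigned long) count->op1->value & (bitsize - 1)) == bitsize - 1)
    return count->op0;            /* drop the redundant AND */
  return NULL;
}
```

Returning NULL when nothing applies mirrors the general convention of GCC's simplify routines, which return NULL_RTX when they cannot simplify.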
> >
> >>
> >> The problem with some of the splitters is that they exist to break up
> >> RTL that 'expand' initially keeps together as one pattern, so that the
> >> code transformation passes can handle the pattern as one instruction.
> >> This made sense when RTL was the only intermediate representation and
> >> splitting too early would inhibit some optimizations. But I would
> >> expect most (if not all) such cases to be less relevant because of the
> >> GIMPLE middle-end work. The only splitters you can trigger are the
> >> pre-reload splitters (all the reload_completed conditions obviously
> >> can't trigger if you're splitting from fwprop). Perhaps those
> >> splitters can/should run earlier, or be made obsolete by expanding
> >> directly to the post-splitting insns.
> >>
> >> Unfortunately, it's not possible to tell for your case, because you
> >> haven't explained it yet...
> >>
> >>
> >>> So how about keeping split_insns and removing the peephole from the
> >>> cost estimation function?
> >>
> >> I'd strongly oppose this. I do not believe this is necessary, and I
> >> think it's conceptually wrong.
> >>
> >>
> >>>> What happens if you propagate into an insn that uses the same register
> >>>> twice? Will the DU chains still be valid (I don't think that's
> >>>> guaranteed)?
> >>>
> >>> I think the DU chains are still valid. If we propagate into an insn
> >>> that uses the same register twice, both uses are replaced when the
> >>> first use is seen (propagate_rtx_1 propagates into all occurrences
> >>> of the same reg in the use insn). When the second use is seen, the
> >>> df_use and the use insn in its insn_info are still available, and
> >>> forward_propagate_into returns early after checking reg_mentioned_p
> >>> (DF_REF_REG (use), parent) and finding that the reg is no longer used.
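The argument above can be modeled with a toy insn whose operands are register numbers: the first use rewrites every occurrence, and the second use of the same register finds it gone and returns early. A minimal sketch; `sk_insn` and the helpers are hypothetical, and the model deliberately does not represent DF_REF_LOC, which is exactly what Steven's objection concerns.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_OPS 4

/* Hypothetical toy insn: a list of register operands.  */
struct sk_insn
{
  int n_ops;
  int ops[SK_MAX_OPS];
};

/* Toy analogue of reg_mentioned_p.  */
static bool
sk_reg_mentioned_p (int reg, const struct sk_insn *insn)
{
  for (int i = 0; i < insn->n_ops; i++)
    if (insn->ops[i] == reg)
      return true;
  return false;
}

/* Replace every occurrence of REG, as propagate_rtx_1 is described
   to do; returns how many operands were rewritten.  */
static int
sk_replace_all (struct sk_insn *insn, int reg, int new_reg)
{
  int n = 0;
  for (int i = 0; i < insn->n_ops; i++)
    if (insn->ops[i] == reg)
      {
        insn->ops[i] = new_reg;
        n++;
      }
  return n;
}

/* Called once per recorded use.  Returns 0 early if REG no longer
   appears, which is what happens for the second use of a register
   that the first call already replaced.  */
static int
sk_forward_propagate_into (struct sk_insn *insn, int reg, int new_reg)
{
  assert (insn->n_ops <= SK_MAX_OPS);
  if (!sk_reg_mentioned_p (reg, insn))
    return 0;
  return sk_replace_all (insn, reg, new_reg);
}
```

For an insn with operands {5, 5, 7}, propagating 9 for register 5 rewrites two operands on the first use and does nothing on the second.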
> >>
> >> With reg_mentioned_p you cannot verify that the DF_REF_LOC of USE is
> >> still valid.
> >
> > I think DF_REF_LOC of USE may become invalid if the dangling rtx is
> > recycled by garbage collection soon (I don't know when GC will
> > happen). Although DF_REF_LOC of USE may be invalid, the early return in
> > forward_propagate_into ensures that it will not cause any correctness
> > problem.
> >
> >>
> >> In any case, returning to the RD problem for DU/UD chains is probably
> >> a good idea, now that RD is not such a hog anymore. In effect fwprop.c
> >> would return to what it looked like before the patch of r149010.
> >
> > I have removed the MD problem and use DU/UD chains instead.
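The RD problem is the textbook reaching-definitions dataflow problem, and UD/DU chains are read off its solution. A minimal sketch of the classic iterative algorithm with one bit per definition; it is not GCC's df implementation, and `struct cfg`, `reaching_defs` and `ud_chain_at_entry` are hypothetical. Uses are assumed to sit at block entry to keep it short.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 8
#define MAX_DEFS 32

struct def { int block; int reg; };

struct cfg
{
  int n_blocks, n_defs;
  int n_preds[MAX_BLOCKS];
  int preds[MAX_BLOCKS][MAX_BLOCKS];
  struct def defs[MAX_DEFS];   /* program order within each block */
};

/* IN[b] = union of OUT[p] over predecessors p;
   OUT[b] = GEN[b] | (IN[b] & ~KILL[b]).  Bit i stands for defs[i].  */
static void
reaching_defs (const struct cfg *g, uint32_t in[], uint32_t out[])
{
  uint32_t gen[MAX_BLOCKS] = { 0 }, kill[MAX_BLOCKS] = { 0 };
  assert (g->n_blocks <= MAX_BLOCKS && g->n_defs <= MAX_DEFS);

  for (int i = 0; i < g->n_defs; i++)
    {
      int b = g->defs[i].block;
      for (int j = 0; j < g->n_defs; j++)
        {
          if (g->defs[j].reg != g->defs[i].reg)
            continue;
          if (g->defs[j].block == b)
            gen[b] &= ~(UINT32_C (1) << j);  /* later def supersedes */
          kill[b] |= UINT32_C (1) << j;      /* all defs of this reg */
        }
      gen[b] |= UINT32_C (1) << i;
    }

  for (int b = 0; b < g->n_blocks; b++)
    {
      in[b] = 0;
      out[b] = gen[b];
    }

  int changed = 1;
  while (changed)              /* iterate to a fixed point */
    {
      changed = 0;
      for (int b = 0; b < g->n_blocks; b++)
        {
          uint32_t new_in = 0;
          for (int k = 0; k < g->n_preds[b]; k++)
            new_in |= out[g->preds[b][k]];
          uint32_t new_out = gen[b] | (new_in & ~kill[b]);
          if (new_in != in[b] || new_out != out[b])
            changed = 1;
          in[b] = new_in;
          out[b] = new_out;
        }
    }
}

/* UD chain of a use of REG at the entry of block B.  */
static uint32_t
ud_chain_at_entry (const struct cfg *g, const uint32_t in[], int b, int reg)
{
  uint32_t mask = 0;
  for (int i = 0; i < g->n_defs; i++)
    if (g->defs[i].reg == reg)
      mask |= UINT32_C (1) << i;
  return in[b] & mask;
}
```

In a diamond where block 0 defines r1 (def 0) and r2 (def 1), block 1 redefines r1 (def 2), and block 3 joins blocks 1 and 2, a use of r1 at the entry of block 3 is reached by defs 0 and 2, and a use of r2 only by def 1. DU chains are the same relation read in the other direction.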
> >
> >>
> >> As a way forward on all of this, I'd suggest the following steps, each
> >> with a separate patch:
> >
> > Thanks for the suggestion!
> >
> >> 1. replace the MD problem with RD again, and build full DU/UD chains.
> >
> > patch.1 is attached.
> >
> >> 2. post all the recog changes separately, with minimum impact on the
> >> parts of the compiler you don't really change. (For apply_change_group
> >> you could even choose to overload it, or use a NUM argument with a
> >> default value -- not sure if default argument values are OK for GCC
> >> tho'.)
> >
> > patch.2 attached.
> >
> >> 3. implement propagation into multiple USEs, but without the splitting
> >> and peepholing.
> >
> > patch.3 attached.
> >
> >> 4. see about fixing the back end to either split earlier or expand to
> >> the desired patterns directly.
> >
> > I haven't included this part. If you agree with the proposal to add the
> > transformation (a << (b&63) ==> a << b) to
> > simplify_binary_operation_1, I will send out another patch for it.
> >
> > Thanks,
> > Wei.
