Hi, On Sat, 2013-03-23 at 21:18 -0700, Wei Mi wrote: > This is the patch to add the shift truncation in > simplify_binary_operation_1. I add a new hook > TARGET_SHIFT_COUNT_TRUNCATED which uses enum rtx_code to decide > whether we can do shift truncation. I didn't use > TARGET_SHIFT_TRUNCATION_MASK in simplify_binary_operation_1 because it > uses the macro SHIFT_COUNT_TRUNCATED. If I change > SHIFT_COUNT_TRUNCATED to targetm.shift_count_truncated in > TARGET_SHIFT_TRUNCATION_MASK, I need to give > TARGET_SHIFT_TRUNCATION_MASK a enum rtx_code param, which wasn't > trivial to get at many places in existing code. >
During 4.8 development there was a similar issue with the TARGET_CANONICALIZE_COMPARISON hook. As a temporary solution the rtx_code has been passed as int. I think the story started here: http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00379.html The conclusion regarding rtx_code ... http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00646.html Maybe this should be addressed first, since now there are at least two cases where it's in the way. Cheers, Oleg > patch.1 ~ patch.4 pass regression and bootstrap on x86_64-unknown-linux-gnu. > > Thanks, > Wei. > > On Sun, Mar 17, 2013 at 12:15 AM, Wei Mi <w...@google.com> wrote: > > Hi, > > > > On Sat, Mar 16, 2013 at 3:48 PM, Steven Bosscher <stevenb....@gmail.com> > > wrote: > >> On Tue, Mar 12, 2013 at 8:18 AM, Wei Mi wrote: > >>> For the motivational case, I need insn splitting to get the cost > >>> right. insn splitting is not very intrusive. All I need is to call > >>> split_insns func. > >> > >> It may not look very intrusive, but there's a lot happening in the > >> back ground. You're creating a lot of new RTL, and then just throw it > >> away again. You fake the compiler into thinking you're much deeper in > >> the pipeline than you really are. You're assuming there are no > >> side-effects other than that some insn gets split, but there are back > >> ends where splitters may have side-effects. > > > > Ok, then I will remove the split_insns call. > > > >> > >> Even though I've asked twice now, you still have not explained this > >> motivational case, except to say that there is one. *What* are you > >> trying to do, *what* is not happening without the splits, and *what* > >> happens if you split. Only if you explain that in a lot more detail > >> than "I have a motivational case" then we can look into what is a > >> proper solution. > > > > :-). Sorry, I didn't say it clearly. The motivational case is the one > > mentioned in the following posts (split_insns changes a << (b & 63) to > > a << b). > > http://gcc.gnu.org/ml/gcc/2013-01/msg00181.html > > http://gcc.gnu.org/ml/gcc-patches/2013-02/msg01144.html > > > > If I remove the split_insns call and related cost estimation > > adjustment, the fwprop 18-->22 and 18-->23 will punt, because fwprop > > here looks like a reverse process of cse, the total cost after fwprop > > change is increased. > > > > Def insn 18: > > Use insn 23 > > Use insn 22 > > > > If we include the split_insns cost estimation adjustment. > > extra benefit by removing def insn 18 = 5 > > change[0]: benefit = 0, verified - ok // The cost of insn 22 will > > not change after fwprop + insn splitting. > > change[1]: benefit = 0, verified - ok // The insn 23 is the same with > > insn 22 > > Total benefit is 5, fwprop will go on. > > > > If we remove the split_insns cost estimation adjustment. > > extra benefit by removing def insn 18 = 5 > > change[0]: benefit = -4, verified - ok // The costs of insn 22 and > > insn 23 will increase after fwprop. > > change[1]: benefit = -4, verified - ok // The insn 23 is the same > > with insn 22 > > Total benefit is -3, fwprop will punt. > > > > How about adding the (a << (b&63) ==> a << b) transformation in > > simplify_binary_operation_1, becuase (a << (b&63) ==> a << b) is a > > kind of architecture specific expr simplification? Then fwprop could > > do the propagation as I expect. > > > >> > >> The problem with some of the splitters is that they exist to break up > >> RTL from 'expand' to initially keep some pattern together to allow the > >> code transformation passes to handle the pattern as one instruction. > >> This made sense when RTL was the only intermediate representation and > >> splitting too early would inhibit some optimizations. But I would > >> expect most (if not all) such cases to be less relevant because of the > >> GIMPLE middle-end work. The only splitters you can trigger are the > >> pre-reload splitters (all the reload_completed conditions obviously > >> can't trigger if you're splitting from fwprop). Perhaps those > >> splitters can/should run earlier, or be made obsolete by expanding > >> directly to the post-splitting insns. > >> > >> Unfortunately, it's not possible to tell for your case, because you > >> haven't explained it yet... > >> > >> > >>> So how about keep split_insns and remove peephole in the cost estimation > >>> func? > >> > >> I'd strongly oppose this. I do not believe this is necessary, and I > >> think it's conceptually wrong. > >> > >> > >>>> What happens if you propagate into an insn that uses the same register > >>>> twice? Will the DU chains still be valid (I don't think that's > >>>> guaranteed)? > >>> > >>> I think the DU chains still be valid. If propagate into the insn that > >>> uses the same register twice, the two uses will be replaced when the > >>> first use is seen (propagate_rtx_1 will propagate all the occurrances > >>> of the same reg in the use insn). When the second use is seen, the > >>> df_use and use insn in its insn_info are still available. > >>> forward_propagate_into will early return after check reg_mentioned_p > >>> (DF_REF_REG (use), parent) and find out no reg is used any more. > >> > >> With reg_mentioned_p you cannot verify that the DF_REF_LOC of USE is > >> still valid. > > > > I think DF_REF_LOC of USE may be invalid if dangling rtx will be > > recycled by garbage collection very soon (I don't know when GC will > > happen). Although DF_REF_LOC of USE maybe invalid, the early return in > > forward_propagate_into ensure it will not cause any correctness > > problem. > > > >> > >> In any case, returning to the RD problem for DU/UD chains is probably > >> a good idea, now that RD is not such a hog anymore. In effect fwprop.c > >> would return to what it looked like before the patch of r149010. > > > > I remove MD problem and use DU/UD instead. > > > >> > >> As a way forward on all of this, I'd suggest the following steps, each > >> with a separate patch: > > > > Thanks for the suggestion! > > > >> 1. replace the MD problem with RD again, and build full DU/UD chains. > > > > I include patch.1 attached. > > > >> 2. post all the recog changes separately, with minimum impact on the > >> parts of the compiler you don't really change. (For apply_change_group > >> you could even choose to overload it, or use a NUM argument with a > >> default value -- not sure if default argument values are OK for GCC > >> tho'.) > > > > patch.2 attached. > > > >> 3. implement propagation into multiple USEs, but without the splitting > >> and peepholing. > > > > patch.3 attached. > > > >> 4. see about fixing the back end to either split earlier or expand to > >> the desired patterns directly. > > > > I havn't included this part. If you agree with the proposal to add the > > transformation (a << (b&63) ==> a << b) in > > simplify_binary_operation_1, I will send out another patch about it. > > > > Thanks, > > Wei.