1.c attached. On Mon, Apr 1, 2013 at 10:43 PM, Wei Mi <w...@google.com> wrote: > I attached the patch.4 based on r197308. r197308 changes shift-and > type truncation from define_insn_and_split to define_insn. patch.4 > changes ix86_rtx_costs for shift-and type rtx to get the correct cost > for the result after the shift-and truncation. > > With the patch.1 ~ patch.4, fwprop extension could handle the > motivational case 1.c attached by removing all the redundent "x & 63" > operations. > > patch.1~patch.4 regression and bootstrap ok on > x86_64-unknown-linux-gnu. Is it ok for trunk? > > Thanks, > Wei. > > On Sun, Mar 17, 2013 at 12:15 AM, Wei Mi <w...@google.com> wrote: >> Hi, >> >> On Sat, Mar 16, 2013 at 3:48 PM, Steven Bosscher <stevenb....@gmail.com> >> wrote: >>> On Tue, Mar 12, 2013 at 8:18 AM, Wei Mi wrote: >>>> For the motivational case, I need insn splitting to get the cost >>>> right. insn splitting is not very intrusive. All I need is to call >>>> split_insns func. >>> >>> It may not look very intrusive, but there's a lot happening in the >>> back ground. You're creating a lot of new RTL, and then just throw it >>> away again. You fake the compiler into thinking you're much deeper in >>> the pipeline than you really are. You're assuming there are no >>> side-effects other than that some insn gets split, but there are back >>> ends where splitters may have side-effects. >> >> Ok, then I will remove the split_insns call. >> >>> >>> Even though I've asked twice now, you still have not explained this >>> motivational case, except to say that there is one. *What* are you >>> trying to do, *what* is not happening without the splits, and *what* >>> happens if you split. Only if you explain that in a lot more detail >>> than "I have a motivational case" then we can look into what is a >>> proper solution. >> >> :-). Sorry, I didn't say it clearly. The motivational case is the one >> mentioned in the following posts (split_insns changes a << (b & 63) to >> a << b). >> http://gcc.gnu.org/ml/gcc/2013-01/msg00181.html >> http://gcc.gnu.org/ml/gcc-patches/2013-02/msg01144.html >> >> If I remove the split_insns call and related cost estimation >> adjustment, the fwprop 18-->22 and 18-->23 will punt, because fwprop >> here looks like a reverse process of cse, the total cost after fwprop >> change is increased. >> >> Def insn 18: >> Use insn 23 >> Use insn 22 >> >> If we include the split_insns cost estimation adjustment. >> extra benefit by removing def insn 18 = 5 >> change[0]: benefit = 0, verified - ok // The cost of insn 22 will >> not change after fwprop + insn splitting. >> change[1]: benefit = 0, verified - ok // The insn 23 is the same with >> insn 22 >> Total benefit is 5, fwprop will go on. >> >> If we remove the split_insns cost estimation adjustment. >> extra benefit by removing def insn 18 = 5 >> change[0]: benefit = -4, verified - ok // The costs of insn 22 and >> insn 23 will increase after fwprop. >> change[1]: benefit = -4, verified - ok // The insn 23 is the same >> with insn 22 >> Total benefit is -3, fwprop will punt. >> >> How about adding the (a << (b&63) ==> a << b) transformation in >> simplify_binary_operation_1, becuase (a << (b&63) ==> a << b) is a >> kind of architecture specific expr simplification? Then fwprop could >> do the propagation as I expect. >> >>> >>> The problem with some of the splitters is that they exist to break up >>> RTL from 'expand' to initially keep some pattern together to allow the >>> code transformation passes to handle the pattern as one instruction. >>> This made sense when RTL was the only intermediate representation and >>> splitting too early would inhibit some optimizations. But I would >>> expect most (if not all) such cases to be less relevant because of the >>> GIMPLE middle-end work. The only splitters you can trigger are the >>> pre-reload splitters (all the reload_completed conditions obviously >>> can't trigger if you're splitting from fwprop). Perhaps those >>> splitters can/should run earlier, or be made obsolete by expanding >>> directly to the post-splitting insns. >>> >>> Unfortunately, it's not possible to tell for your case, because you >>> haven't explained it yet... >>> >>> >>>> So how about keep split_insns and remove peephole in the cost estimation >>>> func? >>> >>> I'd strongly oppose this. I do not believe this is necessary, and I >>> think it's conceptually wrong. >>> >>> >>>>> What happens if you propagate into an insn that uses the same register >>>>> twice? Will the DU chains still be valid (I don't think that's >>>>> guaranteed)? >>>> >>>> I think the DU chains still be valid. If propagate into the insn that >>>> uses the same register twice, the two uses will be replaced when the >>>> first use is seen (propagate_rtx_1 will propagate all the occurrances >>>> of the same reg in the use insn). When the second use is seen, the >>>> df_use and use insn in its insn_info are still available. >>>> forward_propagate_into will early return after check reg_mentioned_p >>>> (DF_REF_REG (use), parent) and find out no reg is used any more. >>> >>> With reg_mentioned_p you cannot verify that the DF_REF_LOC of USE is >>> still valid. >> >> I think DF_REF_LOC of USE may be invalid if dangling rtx will be >> recycled by garbage collection very soon (I don't know when GC will >> happen). Although DF_REF_LOC of USE maybe invalid, the early return in >> forward_propagate_into ensure it will not cause any correctness >> problem. >> >>> >>> In any case, returning to the RD problem for DU/UD chains is probably >>> a good idea, now that RD is not such a hog anymore. In effect fwprop.c >>> would return to what it looked like before the patch of r149010. >> >> I remove MD problem and use DU/UD instead. >> >>> >>> As a way forward on all of this, I'd suggest the following steps, each >>> with a separate patch: >> >> Thanks for the suggestion! >> >>> 1. replace the MD problem with RD again, and build full DU/UD chains. >> >> I include patch.1 attached. >> >>> 2. post all the recog changes separately, with minimum impact on the >>> parts of the compiler you don't really change. (For apply_change_group >>> you could even choose to overload it, or use a NUM argument with a >>> default value -- not sure if default argument values are OK for GCC >>> tho'.) >> >> patch.2 attached. >> >>> 3. implement propagation into multiple USEs, but without the splitting >>> and peepholing. >> >> patch.3 attached. >> >>> 4. see about fixing the back end to either split earlier or expand to >>> the desired patterns directly. >> >> I havn't included this part. If you agree with the proposal to add the >> transformation (a << (b&63) ==> a << b) in >> simplify_binary_operation_1, I will send out another patch about it. >> >> Thanks, >> Wei.
typedef unsigned long long uint64; typedef unsigned int uint32;
class Decoder { public: Decoder() : k_minus_1_(0), buf_(0), bits_left_(0) {} ~Decoder() {} uint32 ExtractBits(uint64 end, uint64 start); inline uint32 GetBits(int bits) { uint32 val = ExtractBits(bits, 0); buf_ >>= bits; bits_left_ -= bits; return val; } uint32 Get(uint32 bits); uint32 k_minus_1_; uint64 buf_; unsigned long bits_left_; }; uint32 Decoder::ExtractBits(uint64 end, uint64 start) { return (buf_ << (-end & 63)) >> ((start - end) & 63); } uint32 Decoder::Get(uint32 bits) { bits += k_minus_1_; uint32 msbit = (bits > (k_minus_1_ + 1)); return GetBits(bits - msbit) | (msbit << (bits - 1)); }