Hello Alex: On 12/04/24 8:15 pm, Alex Coplan wrote: > On 12/04/2024 20:02, Ajit Agarwal wrote: >> Hello Alex: >> >> On 11/04/24 7:55 pm, Alex Coplan wrote: >>> On 10/04/2024 23:48, Ajit Agarwal wrote: >>>> Hello Alex: >>>> >>>> On 10/04/24 7:52 pm, Alex Coplan wrote: >>>>> Hi Ajit, >>>>> >>>>> On 10/04/2024 15:31, Ajit Agarwal wrote: >>>>>> Hello Alex: >>>>>> >>>>>> On 10/04/24 1:42 pm, Alex Coplan wrote: >>>>>>> Hi Ajit, >>>>>>> >>>>>>> On 09/04/2024 20:59, Ajit Agarwal wrote: >>>>>>>> Hello Alex: >>>>>>>> >>>>>>>> On 09/04/24 8:39 pm, Alex Coplan wrote: >>>>>>>>> On 09/04/2024 20:01, Ajit Agarwal wrote: >>>>>>>>>> Hello Alex: >>>>>>>>>> >>>>>>>>>> On 09/04/24 7:29 pm, Alex Coplan wrote: >>>>>>>>>>> On 09/04/2024 17:30, Ajit Agarwal wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 05/04/24 10:03 pm, Alex Coplan wrote: >>>>>>>>>>>>> On 05/04/2024 13:53, Ajit Agarwal wrote: >>>>>>>>>>>>>> Hello Alex/Richard: >>>>>>>>>>>>>> >>>>>>>>>>>>>> All review comments are incorporated. >>>>> <snip> >>>>>>>>>>>>>> @@ -2890,8 +3018,8 @@ ldp_bb_info::merge_pairs (insn_list_t >>>>>>>>>>>>>> &left_list, >>>>>>>>>>>>>> // of accesses. If we find two sets of adjacent accesses, call >>>>>>>>>>>>>> // merge_pairs. >>>>>>>>>>>>>> void >>>>>>>>>>>>>> -ldp_bb_info::transform_for_base (int encoded_lfs, >>>>>>>>>>>>>> - access_group &group) >>>>>>>>>>>>>> +pair_fusion_bb_info::transform_for_base (int encoded_lfs, >>>>>>>>>>>>>> + access_group &group) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> const auto lfs = decode_lfs (encoded_lfs); >>>>>>>>>>>>>> const unsigned access_size = lfs.size; >>>>>>>>>>>>>> @@ -2909,7 +3037,7 @@ ldp_bb_info::transform_for_base (int >>>>>>>>>>>>>> encoded_lfs, >>>>>>>>>>>>>> access.cand_insns, >>>>>>>>>>>>>> lfs.load_p, >>>>>>>>>>>>>> access_size); >>>>>>>>>>>>>> - skip_next = access.cand_insns.empty (); >>>>>>>>>>>>>> + skip_next = bb_state->cand_insns_empty_p >>>>>>>>>>>>>> (access.cand_insns); >>>>>>>>>>>>> >>>>>>>>>>>>> As above, why is this needed? 
>>>>>>>>>>>>
>>>>>>>>>>>> For rs6000 we want to always return true, as load/store pairs
>>>>>>>>>>>> that are to be merged with 8/16, 16/32, 32/64 occur for rs6000,
>>>>>>>>>>>> and we want load/store pairs at 8/16 and 32/64.  That's why we
>>>>>>>>>>>> want to always generate true for rs6000, to skip pairs as above.
>>>>>>>>>>>
>>>>>>>>>>> Hmm, sorry, I'm not sure I follow.  Are you saying that for rs6000
>>>>>>>>>>> you have load/store pair instructions where the two arms of the
>>>>>>>>>>> access are storing operands of different sizes?  Or something else?
>>>>>>>>>>>
>>>>>>>>>>> As it stands the logic is to skip the next iteration only if we
>>>>>>>>>>> exhausted all the candidate insns for the current access.  In the
>>>>>>>>>>> case that we didn't exhaust all such candidates, then the idea is
>>>>>>>>>>> that when access becomes prev_access, we can attempt to use those
>>>>>>>>>>> candidates as the "left-hand side" of a pair in the next iteration
>>>>>>>>>>> since we failed to use them as the "right-hand side" of a pair in
>>>>>>>>>>> the current iteration.  I don't see why you wouldn't want that
>>>>>>>>>>> behaviour.  Please can you explain?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In merge_pairs we get the two load candidates: one load from offset
>>>>>>>>>> 0 and the other load from offset 16.  Then in the next iteration we
>>>>>>>>>> get a load from offset 16 and the other load from offset 32.  In the
>>>>>>>>>> next iteration we get a load from offset 32 and the other load from
>>>>>>>>>> offset 48.
>>>>>>>>>>
>>>>>>>>>> For example, currently we get the load candidates as follows.
>>>>>>>>>>
>>>>>>>>>> pairs:
>>>>>>>>>>
>>>>>>>>>> load from offset 0.
>>>>>>>>>> load from offset 16.
>>>>>>>>>>
>>>>>>>>>> next pairs:
>>>>>>>>>>
>>>>>>>>>> load from offset 16.
>>>>>>>>>> load from offset 32.
>>>>>>>>>>
>>>>>>>>>> next pairs:
>>>>>>>>>>
>>>>>>>>>> load from offset 32.
>>>>>>>>>> load from offset 48.
>>>>>>>>>>
>>>>>>>>>> Instead, for rs6000 we should get:
>>>>>>>>>>
>>>>>>>>>> pairs:
>>>>>>>>>>
>>>>>>>>>> load from offset 0.
>>>>>>>>>> load from offset 16.
>>>>>>>>>>
>>>>>>>>>> next pairs:
>>>>>>>>>>
>>>>>>>>>> load from offset 32.
>>>>>>>>>> load from offset 48.
>>>>>>>>>
>>>>>>>>> Hmm, so then I guess my question is: why wouldn't you consider
>>>>>>>>> merging the pair with offsets (16,32) for rs6000?  Is it because you
>>>>>>>>> have a stricter alignment requirement on the base pair offsets
>>>>>>>>> (i.e. they have to be a multiple of 32 when the operand size is 16)?
>>>>>>>>> So the pair offsets have to be a multiple of the entire pair size
>>>>>>>>> rather than a single operand size?
>>>>>>>>
>>>>>>>> We get a load pair at a certain program point with (0,16), and at
>>>>>>>> another program point we get the load pair (32,48).
>>>>>>>>
>>>>>>>> The current implementation takes the load offsets as (0,16),
>>>>>>>> (16,32), (32,48).
>>>>>>>>
>>>>>>>> But for rs6000 we want the load pairs to be merged at different
>>>>>>>> points, as (0,16) and (32,48): for (0,16) we want to replace the
>>>>>>>> loads with an lxvp at offset 0, and the other loads (32,48) with an
>>>>>>>> lxvp at offset 32.
>>>>>>>>
>>>>>>>> In the current case it will merge with an lxvp at offset 0 and an
>>>>>>>> lxvp at offset 16, then an lxvp at offset 32 and an lxvp at offset
>>>>>>>> 48, which is incorrect in our case: in the (16,32) case the
>>>>>>>> offset-16 load would not load into an even register and would break
>>>>>>>> for rs6000.
>>>>>>>
>>>>>>> Sorry, I think I'm still missing something here.  Why does the
>>>>>>> address offset affect the parity of the transfer register?  ISTM they
>>>>>>> needn't be related at all (and indeed we can't even know the parity
>>>>>>> of the transfer register before RA, but perhaps you're only intending
>>>>>>> on running the pass after RA?)
>>>>>>>
>>>>>> We have the load pair with (0,16) wherein these loads are adjacent
>>>>>> and replaced with lxvp.
>>>>>>
>>>>>> The semantics of the lxvp instruction are that it loads the adjacent
>>>>>> load pair into an even register and even_register + 1.
>>>>>>
>>>>>> We replace the above load pair with an lxvp instruction, and then we
>>>>>> don't need to merge (16,32) as (0,16) is already merged; instead we
>>>>>> merge (32,48).
>>>>>
>>>>> Ok, but the existing logic should already account for this.  I.e. if we
>>>>> successfully merge (0,16), then we don't attempt to merge (16,32).
>>>>> We'd only attempt to merge (16,32) if the merge of (0,16) failed (for
>>>>> whatever reason).  So I don't see that there's anything specific to
>>>>> lxvp that requires this logic to change, _unless_ you have a stricter
>>>>> alignment requirement on the offsets as I mentioned before.
>>>>>
>>>>
>>>> Thanks for the suggestion.  It worked for rs6000 also with the current
>>>> changes.  Sorry for the confusion.
>>>
>>> Alright, glad we got to the bottom of this!
>>
>> Thanks.
>>>
>>>>
>>>>>>
>>>>>> Yes, you are correct: the address offset doesn't affect the parity of
>>>>>> the register transfer, as we are doing the fusion pass before RA.
>>>>>>
>>>>>>>>> If that is the case then I think it would be better to introduce a
>>>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base
>>>>>>>>> offset of the pair (prev_access->offset in transform_for_base).  I
>>>>>>>>> guess it would also take access_size as a parameter and for aarch64
>>>>>>>>> it should check:
>>>>>>>>>
>>>>>>>>> multiple_p (offset, access_size)
>>>>>>>>>
>>>>>>>>> and for rs6000 it could check:
>>>>>>>>>
>>>>>>>>> multiple_p (offset, access_size * 2)
>>>>>>>>>
>>>>>>>>> and we would then incorporate a call to that predicate in the else
>>>>>>>>> if condition of transform_for_base.
>>>>>>>>>
>>>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR
>>>>>>>>> offset alignment is a good proxy for RTL offset alignment, but we
>>>>>>>>> already depend on that with the multiple_p check in
>>>>>>>>> track_via_mem_expr.
>>>>>>>>>
>>>>>> I have addressed the above hooks and it worked fine with both rs6000
>>>>>> and aarch64.  I am sending a subsequent patch in some time that
>>>>>> addresses the above.
>>>>>>
>>>>>>> How do you plan on handling this even-odd requirement for rs6000?
>>>>>>>
>>>>>>
>>>>>> We plan to handle it with V16QI subreg 0 and V16QI subreg 16 to
>>>>>> generate the register pair, and that's what we generate and implement
>>>>>> in the rs6000 target code.
>>>>>
>>>>> Ah, this is coming back to me now.  Sorry, I should have remembered
>>>>> this from the previous discussion with Richard S.
>>>>>
>>>>> Apologies for going on a slight tangent here, but if you're running
>>>>> before RA are you planning to create a new OImode pseudo register for
>>>>> the lxvp insn and then somehow update uses of the old transfer
>>>>> registers to replace them with subregs of that OImode pseudo?
>>>>
>>>> Yes, I do the same as you have mentioned.  We generate register pairs
>>>> with a 256-bit mode with two subregs of 128-bit modes at offsets 0 and
>>>> 16.
>>>>
>>>>> Or do you just plan on replacing the individual loads with moves
>>>>> (instead of deleting them)?  I guess the latter would be simpler and
>>>>> might work better in the presence of hard regs.
>>>>>
>>>>
>>>> Would you mind explaining how to generate register pairs with lxvp by
>>>> replacing loads with moves?
>>>
>>> Yeah, so suppose you have something like:
>>>
>>> (set (reg:V16QI v1) (mem:V16QI addr))
>>> (set (reg:V16QI v2) (mem:V16QI addr+16))
>>>
>>> then when you insert the lxvp you can then (logically) replace the
>>> original load instructions with moves from the appropriate subreg, as
>>> follows:
>>>
>>> (set (reg:OI pair-pseudo) (mem:OI addr)) ; lxvp
>>> (set (reg:V16QI v1) (subreg:V16QI (reg:OI pair-pseudo) 0))
>>> (set (reg:V16QI v2) (subreg:V16QI (reg:OI pair-pseudo) 16))
>>>
>>
>> Any pseudo created with gen_rtx_REG, e.g.
>> gen_rtx_REG (OOmode, REGNO (dest_exp)), gets errored out with an
>> unrecognized insn by LRA.
>
> I'm not surprised that goes wrong: you can't just create a new REG
> rtx in a different mode and reuse the regno of an existing pseudo.
>
>> If I create a pseudo with gen_reg_rtx (OOmode), it errors out because
>> the pseudo register is not found in change->new_defs.
>
> Yeah, I suppose you'd need to add an RTL-SSA def for the new pseudo.
>
Would you mind explaining how I can add an RTL-SSA def for the new pseudo?

>>
>> Also, the sequential register pairs are not generated by the
>> Register Allocator.
>
> So how do you get the sequential pairs with your current approach?  My
> understanding was that what I suggested above doesn't really change what
> you're doing w.r.t the lxvp insn itself, but maybe I didn't properly
> understand the approach taken in your initial patchset.
>

I generate

(set (reg:OO pair-pseudo) (mem:OO addr)) ; lxvp

and then at the use points of pair-pseudo generate the following:

(subreg:V16QI (reg:OO pair-pseudo) 0)
(subreg:V16QI (reg:OO pair-pseudo) 16)

Thanks & Regards
Ajit

> Thanks,
> Alex
>
>>
>> That's why I haven't used the above method, though I also thought it
>> through.
>>
>> Please let me know what you think.
>>
>> Thanks & Regards
>> Ajit
>>> now I'm not sure off-hand if this exact combination of subregs and mode
>>> changes is valid, but hopefully you get the idea.  The benefit of this
>>> approach is that it keeps the transformation local and is more
>>> compile-time efficient (we don't have to look through all mentions of
>>> v1/v2 and replace them with subregs).  I'd hope that the RA can then
>>> clean up any redundant moves (especially in the case that v1/v2 are
>>> pseudos).
>>>
>>> That would mean that you don't need all the grubbing around with DF
>>> looking for occurrences of the transfer regs.  Instead we'd just need
>>> some logic to insert the extra moves after the lxvp for rs6000, and I
>>> think this might fit more cleanly into the current pass.
>>>
>>> Does that make sense?  In any case, this shouldn't really affect the
>>> preparatory aarch64 patch because we were planning to defer adding any
>>> hooks that are only needed for rs6000 from the initial aarch64/generic
>>> split.
>>> >>> Thanks, >>> Alex >>> >>>> >>>> Thanks & Regards >>>> Ajit >>>> >>>>> Thanks, >>>>> Alex >>>>> >>>>>> >>>>>> Thanks & Regards >>>>>> Ajit >>>>>>> Thanks, >>>>>>> Alex >>>>>>> >>>>>>>> >>>>>>>> lxvp should load from even registers and then loaded value will >>>>>>>> be in even register and even register +1 (which is odd). >>>>>>>> >>>>>>>> Thanks & Regards >>>>>>>> Ajit >>>>>>>>> If that is the case then I think it would be better to introduce a >>>>>>>>> virtual function (say pair_offset_alignment_ok_p) that vets the base >>>>>>>>> offset of the pair (prev_access->offset in transform_for_base). I >>>>>>>>> guess >>>>>>>>> it would also take access_size as a parameter and for aarch64 it >>>>>>>>> should >>>>>>>>> check: >>>>>>>>> >>>>>>>>> multiple_p (offset, access_size) >>>>>>>>> >>>>>>>>> and for rs6000 it could check: >>>>>>>>> >>>>>>>>> multiple_p (offset, access_size * 2) >>>>>>>>> >>>>>>>>> and we would then incorporate a call to that predicate in the else if >>>>>>>>> condition of tranform_for_base. >>>>>>>>> >>>>>>>>> It would have the same limitation whereby we assume that MEM_EXPR >>>>>>>>> offset >>>>>>>>> alignment is a good proxy for RTL offset alignment, but we already >>>>>>>>> depend on that with the multiple_p check in track_via_mem_expr. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Alex >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks & Regards >>>>>>>>>> Ajit >>>>>>>>>>> Thanks, >>>>>>>>>>> Alex >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> } >>>>>>>>>>>>>> prev_access = &access; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> @@ -2919,7 +3047,7 @@ ldp_bb_info::transform_for_base (int >>>>>>>>>>>>>> encoded_lfs, >>>>>>>>>>>>>> // and remove all the tombstone insns, being sure to reparent >>>>>>>>>>>>>> any uses >>>>>>>>>>>>>> // of mem to previous defs when we do this. 
>>>>>>>>>>>>>> void >>>>>>>>>>>>>> -ldp_bb_info::cleanup_tombstones () >>>>>>>>>>>>>> +pair_fusion_bb_info::cleanup_tombstones () >>>>>>>>>>>>>> { >>>>>>>>>>>>>> // No need to do anything if we didn't emit a tombstone insn >>>>>>>>>>>>>> for this BB. >>>>>>>>>>>>>> if (!m_emitted_tombstone) >>>>>>>>>>>>>> @@ -2947,7 +3075,7 @@ ldp_bb_info::cleanup_tombstones () >>>>>>>>>>>>>> >>>>>>>>>>>>>> template<typename Map> >>>>>>>>>>>>>> void >>>>>>>>>>>>>> -ldp_bb_info::traverse_base_map (Map &map) >>>>>>>>>>>>>> +pair_fusion_bb_info::traverse_base_map (Map &map) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> for (auto kv : map) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> @@ -2958,7 +3086,7 @@ ldp_bb_info::traverse_base_map (Map &map) >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> void >>>>>>>>>>>>>> -ldp_bb_info::transform () >>>>>>>>>>>>>> +pair_fusion_bb_info::transform () >>>>>>>>>>>>>> { >>>>>>>>>>>>>> traverse_base_map (expr_map); >>>>>>>>>>>>>> traverse_base_map (def_map); >>>>>>>>>>>>>> @@ -3167,14 +3295,13 @@ try_promote_writeback (insn_info *insn) >>>>>>>>>>>>>> // for load/store candidates. If running after RA, also try >>>>>>>>>>>>>> and promote >>>>>>>>>>>>>> // non-writeback pairs to use writeback addressing. Then try >>>>>>>>>>>>>> to fuse >>>>>>>>>>>>>> // candidates into pairs. 
>>>>>>>>>>>>>> -void ldp_fusion_bb (bb_info *bb) >>>>>>>>>>>>>> +void pair_fusion::ldp_fusion_bb (bb_info *bb) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> - const bool track_loads >>>>>>>>>>>>>> - = aarch64_tune_params.ldp_policy_model != >>>>>>>>>>>>>> AARCH64_LDP_STP_POLICY_NEVER; >>>>>>>>>>>>>> - const bool track_stores >>>>>>>>>>>>>> - = aarch64_tune_params.stp_policy_model != >>>>>>>>>>>>>> AARCH64_LDP_STP_POLICY_NEVER; >>>>>>>>>>>>>> + const bool track_loads = track_load_p (); >>>>>>>>>>>>>> + const bool track_stores = track_store_p (); >>>>>>>>>>>>>> >>>>>>>>>>>>>> - ldp_bb_info bb_state (bb); >>>>>>>>>>>>> >>>>>>>>>>>>> This: >>>>>>>>>>>>> >>>>>>>>>>>>>> + aarch64_pair_fusion derived; >>>>>>>>>>>>> >>>>>>>>>>>>> can be deleted and then: >>>>>>>>>>>>> >>>>>>>>>>>>>> + pair_fusion_bb_info bb_info (bb, &derived); >>>>>>>>>>>>> >>>>>>>>>>>>> can just be: >>>>>>>>>>>>> >>>>>>>>>>>>> pair_fusion_bb_info bb_info (bb, this); >>>>>>>>>>>>> >>>>>>>>>>>>> (or you can pass *this if you make bb_info take a reference). >>>>>>>>>>>>> >>>>>>>>>>>>> I don't think there's a particular need to change the variable >>>>>>>>>>>>> name >>>>>>>>>>>>> (bb_state -> bb_info). I chose the former because it doens't >>>>>>>>>>>>> clash >>>>>>>>>>>>> with the RTL-SSA structure of the same name as the latter. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Addressed. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> for (auto insn : bb->nondebug_insns ()) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> @@ -3184,31 +3311,31 @@ void ldp_fusion_bb (bb_info *bb) >>>>>>>>>>>>>> continue; >>>>>>>>>>>>>> >>>>>>>>>>>>>> rtx pat = PATTERN (rti); >>>>>>>>>>>>>> - if (reload_completed >>>>>>>>>>>>>> - && aarch64_ldp_writeback > 1 >>>>>>>>>>>>>> - && GET_CODE (pat) == PARALLEL >>>>>>>>>>>>>> - && XVECLEN (pat, 0) == 2) >>>>>>>>>>>>>> + if (pair_mem_promote_writeback_p (pat)) >>>>>>>>>>>>>> try_promote_writeback (insn); >>>>>>>>>>>>> >>>>>>>>>>>>> It looks like try_promote_writeback itself will need some further >>>>>>>>>>>>> work >>>>>>>>>>>>> to make it target-independent. I suppose this check: >>>>>>>>>>>>> >>>>>>>>>>>>> auto rti = insn->rtl (); >>>>>>>>>>>>> const auto attr = get_attr_ldpstp (rti); >>>>>>>>>>>>> if (attr == LDPSTP_NONE) >>>>>>>>>>>>> return; >>>>>>>>>>>>> >>>>>>>>>>>>> bool load_p = (attr == LDPSTP_LDP); >>>>>>>>>>>>> gcc_checking_assert (load_p || attr == LDPSTP_STP); >>>>>>>>>>>>> >>>>>>>>>>>>> will need to become part of the pair_mem_promote_writeback_p hook >>>>>>>>>>>>> that you >>>>>>>>>>>>> added, potentially changing it to return a boolean for load_p. >>>>>>>>>>>>> >>>>>>>>>>>>> Then I guess we will need hooks for destructuring the pair insn >>>>>>>>>>>>> and >>>>>>>>>>>>> another hook to wrap aarch64_gen_writeback_pair. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Addressed. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> if (GET_CODE (pat) != SET) >>>>>>>>>>>>>> continue; >>>>>>>>>>>>>> >>>>>>>>>>>>>> if (track_stores && MEM_P (XEXP (pat, 0))) >>>>>>>>>>>>>> - bb_state.track_access (insn, false, XEXP (pat, 0)); >>>>>>>>>>>>>> + bb_info.track_access (insn, false, XEXP (pat, 0)); >>>>>>>>>>>>>> else if (track_loads && MEM_P (XEXP (pat, 1))) >>>>>>>>>>>>>> - bb_state.track_access (insn, true, XEXP (pat, 1)); >>>>>>>>>>>>>> + bb_info.track_access (insn, true, XEXP (pat, 1)); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> - bb_state.transform (); >>>>>>>>>>>>>> - bb_state.cleanup_tombstones (); >>>>>>>>>>>>>> + bb_info.transform (); >>>>>>>>>>>>>> + bb_info.cleanup_tombstones (); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> void ldp_fusion () >>>>>>>>>>>>>> { >>>>>>>>>>>>>> ldp_fusion_init (); >>>>>>>>>>>>>> + pair_fusion *pfuse; >>>>>>>>>>>>>> + aarch64_pair_fusion derived; >>>>>>>>>>>>>> + pfuse = &derived; >>>>>>>>>>>>> >>>>>>>>>>>>> This is indeed the one place where I think it is acceptable to >>>>>>>>>>>>> instantiate aarch64_pair_fusion. But again there's no need to >>>>>>>>>>>>> create a >>>>>>>>>>>>> pointer to the parent class, just call any function you like >>>>>>>>>>>>> directly. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Addressed. >>>>>>>>>>>>>> >>>>>>>>>>>>>> for (auto bb : crtl->ssa->bbs ()) >>>>>>>>>>>>>> - ldp_fusion_bb (bb); >>>>>>>>>>>>>> + pfuse->ldp_fusion_bb (bb); >>>>>>>>>>>>> >>>>>>>>>>>>> I think even the code to iterate over bbs should itself be a >>>>>>>>>>>>> member >>>>>>>>>>>>> function of pair_fusion (say "run") and then that becomes part of >>>>>>>>>>>>> the >>>>>>>>>>>>> generic code. >>>>>>>>>>>>> >>>>>>>>>>>>> So this function would just become: >>>>>>>>>>>>> >>>>>>>>>>>>> aarch64_pair_fusion pass; >>>>>>>>>>>>> pass.run (); >>>>>>>>>>>>> >>>>>>>>>>>>> and could be inlined into the caller. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Addressed. 
>>>>>>>>>>>>> Perhaps you could also add an early return like: >>>>>>>>>>>>> >>>>>>>>>>>>> if (!track_loads_p () && !track_stores_p ()) >>>>>>>>>>>>> return; >>>>>>>>>>>>> >>>>>>>>>>>>> in pair_fusion::run () and then remove the corresponding code from >>>>>>>>>>>>> pass_ldp_fusion::gate? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Addressed. >>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> ldp_fusion_destroy (); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> 2.39.3 >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Alex >>>>>>>>>>>> >>>>>>>>>>>> Thanks & Regards >>>>>>>>>>>> Ajit