On Wed, Sep 2, 2020 at 11:50 AM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi Bin, > > >> 2) This case makes me think we should exclude ainc candidates in function > >> mark_reg_offset_candidates. The justification is that: ainc candidate > >> handles step update itself and when we calculate the cost for it against > >> its ainc_use, the cost_step has been reduced. When unrolling happens, > >> the ainc computations are replicated and it doesn't save step updates > >> like normal reg_offset_p candidates. > > Though auto-inc candidate embeds stepping operation into memory > > access, we might want to avoid it in case of unroll if there are many > > sequences of memory accesses, and if the unroll factor is big. The > > rationale is embedded stepping is a u-arch operation and does have its > > cost. > > > > Thanks for the comments! Agree! Excluding them from reg_offset_p > candidates here is consistent with this expectation, it makes us > consider the unroll factor effect when checking the corresponding > step cost and the embedded stepping cost (in group/candidate cost, > minus step cost and use the cost from the address_cost hook). > > >> > >> I've updated the patch to punt ainc_use candidates as below: > >> > >>> + /* Skip AINC candidate since it contains address update itself, > >>> + the replicated AINC computations when unrolling still have > >>> + updates, unlike reg_offset_p candidates can save step updates > >>> + when unrolling. */ > >>> + if (cand->ainc_use) > >>> + continue; > >>> + > >> > >> For this new attached patch, it's bootstrapped and regress-tested without > >> explicit unrolling, while the only one failure has been identified as > >> rs6000 specific address_cost issue in bootstrapping and regression testing > >> with explicit unrolling. > >> > >> By the way, with above simple hack of address_cost, I also did one > >> bootstrapping and regression testing with explicit unrolling, the above > >> sms-4.c did pass as I expected but had two failures instead: > >> > >> PASS->FAIL: gcc.dg/sms-compare-debug-1.c (test for excess errors) > >> PASS->FAIL: gcc.dg/tree-ssa/ivopts-lt.c scan-tree-dump-times ivopts > >> "PHI" 2 > >> > >> By further investigation, the 2nd one is expected due to the adddress_cost > >> hook > >> hack, while the 1st one one exposed -fcompare-debug issue in sms. The RTL > >> sequence starts to off from sms, just some NOTE_INSN_DELETED positions > >> change. > >> I believe it's just exposed by this combination unluckily/luckily ;-) I > >> will > >> send a patch separately for it once I got time to look into it, but it > >> should > >> be unrelated to this patch series for sure. > > This is the kind of situation I intended to avoid before. IMHO, this > > isn't a neat change (it can't be given we are predicting the future > > transformation in compilation pipeline), accumulation of such changes > > could make IVOPTs break in one way or another. So as long as you make > > sure it doesn't have functional impact in case of no-rtl_unroll, I am > > fine. > > Yeah, I admit it's not neat, but the proposals in the previous discussions > without predicting unroll factor can not work well for all scenarios with > different unroll factors, they could over-blame some kind of candidates. > For the case of no-rtl_unroll, unroll factor estimation should set > loop->estimated_unroll to zero, all these changes won't take effect. The > estimation function follows the same logics as that of RTL unroll factor > calculation, I did test with explicit unrolling disablement before, it > worked expectedly. Thanks for working on this, also sorry for being nitpicking.
Thanks, bin