On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.ch...@arm.com> wrote: >> Hi, >> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is >> prevented from representing address of one memory object using address of >> another memory object. Also as I commented in that PR, we have two possible >> fixes for this: >> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj >> using another one, while not breaking PR50955. >> 2) Add iv candidates with base object stripped. In this way, we use the >> common base-stripped part to represent all address expressions, in the form >> of [base_1 + common], [base_2 + common], ..., [base_n + common]. >> >> In terms of code generation, method 2) is at least as good as 1), actually >> better in my opinion. The problem of 2) is we need to tell when iv >> candidates should be added for the common part and when shouldn't. This >> issue can be generalized and described as: We know IVO tries to add >> candidates by deriving from iv uses. One disadvantage is that candidates >> are derived from iv use independently. It doesn't take common sub >> expression among different iv uses into consideration. As a result, >> candidate for common sub expression is not added, while many useless >> candidates are added. >> >> As a matter of fact, candidate derived from iv use is useful only if it's >> common enough and could be shared among different uses. A candidate is most >> likely useless if it's derived from a single use and could not be shared by >> others. This patch works in this way by firstly recording all kinds >> candidates derived from iv uses, then adding candidates for common ones. >> >> The patch improves 410.bwaves by 3-4% on x86_64. I also saw regression for >> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm >> they are false alarms caused by align issues. >> For aarch64, fp cases are obviously improved for both spec2000 and spec2006. >> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is >> another irrelevant issue caused by heuristic candidate selecting algorithm. >> Unfortunately, I don't have fix to it currently. >> >> This patch may add more candidates in some cases, but generally candidates >> number is smaller because we don't need to add useless candidates now. >> Statistic data shows there are quite fewer loops with more than 30 >> candidates when building spec2k6 on x86_64 using this patch. >> >> Bootstrap and test on x86_64. I will re-test it against latest trunk on >> AArch64. Is it OK? > > +inline bool > +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, > + const iv_common_cand *ccand2) > +{ > + return ccand1->hash == ccand2->hash > + && operand_equal_p (ccand1->base, ccand2->base, 0) > + && operand_equal_p (ccand1->step, ccand2->step, 0) > + && TYPE_PRECISION (TREE_TYPE (ccand1->base)) > + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); > Hi Richard, Thanks for reviewing.
> I'm wondering on the TYPE_PRECISION check. a) why is that needed? Because operand_equal_p doesn't check type precision for constant int nodes, and IVO needs to take precision into consideration. > and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION > unconditionally? Both SCEV and IVO work on expressions with type satisfying POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision unconditionally? > > + slot = data->iv_common_cand_tab->find_slot (&ent, INSERT); > + if (*slot == NULL) > + { > + *slot = XNEW (struct iv_common_cand); > > allocate from the IV obstack instead? I see we do a lot of heap allocations > in IVOPTs, so we can improve that as followup as well. > Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand are better to be allocated in obstack. Actually I have already make that change to struct iv. others will be followup too. Thanks, bin > We probably should empty the obstack after each processed loop. > > Thanks, > Richard. > > >> Thanks, >> bin >> >> 2015-11-03 Bin Cheng <bin.ch...@arm.com> >> >> PR tree-optimization/52272 >> * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct. >> (struct iv_common_cand_hasher): New struct. >> (iv_common_cand_hasher::hash): New function. >> (iv_common_cand_hasher::equal): New function. >> (struct ivopts_data): New fields, iv_common_cand_tab and >> iv_common_cands. >> (tree_ssa_iv_optimize_init): Initialize above fields. >> (record_common_cand, common_cand_cmp): New functions. >> (add_iv_candidate_derived_from_uses): New function. >> (add_iv_candidate_for_use): Record iv_common_cands derived from >> iv use in hash table, instead of adding candidates directly. >> (add_iv_candidate_for_uses): Call >> add_iv_candidate_derived_from_uses. >> (record_important_candidates): Add important candidates to iv uses' >> related_cands. Always keep related_cands for future use. >> (try_add_cand_for): Use iv uses' related_cands. >> (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields >> in struct ivopts_data, iv_common_cand_tab and iv_common_cands.