On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
>> prevented from representing address of one memory object using address of
>> another memory object.  Also as I commented in that PR, we have two possible
>> fixes for this:
>> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
>> using another one, while not breaking PR50955.
>> 2) Add iv candidates with base object stripped.  In this way, we use the
>> common base-stripped part to represent all address expressions, in the form
>> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>>
>> In terms of code generation, method 2) is at least as good as 1), actually
>> better in my opinion.  The problem of 2) is we need to tell when iv
>> candidates should be added for the common part and when shouldn't.  This
>> issue can be generalized and described as: We know IVO tries to add
>> candidates by deriving from iv uses.  One disadvantage is that candidates
>> are derived from iv use independently.  It doesn't take common sub
>> expression among different iv uses into consideration.  As a result,
>> candidate for common sub expression is not added, while many useless
>> candidates are added.
>>
>> As a matter of fact, candidate derived from iv use is useful only if it's
>> common enough and could be shared among different uses.  A candidate is most
>> likely useless if it's derived from a single use and could not be shared by
>> others.  This patch works in this way by firstly recording all kinds
>> candidates derived from iv uses, then adding candidates for common ones.
>>
>> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
>> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
>> they are false alarms caused by align issues.
>> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
>> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
>> another irrelevant issue caused by heuristic candidate selecting algorithm.
>> Unfortunately, I don't have fix to it currently.
>>
>> This patch may add more candidates in some cases, but generally candidates
>> number is smaller because we don't need to add useless candidates now.
>> Statistic data shows there are quite fewer loops with more than 30
>> candidates when building spec2k6 on x86_64 using this patch.
>>
>> Bootstrap and test on x86_64.  I will re-test it against latest trunk on
>> AArch64.  Is it OK?
>
> +inline bool
> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
> +                          const iv_common_cand *ccand2)
> +{
> +  return ccand1->hash == ccand2->hash
> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>
Hi Richard,
Thanks for reviewing.

> I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
Because operand_equal_p doesn't check type precision for constant int
nodes, and IVO needs to take precision into consideration.

> and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
> unconditionally?
Both SCEV and IVO work on expressions with type satisfying
POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision
unconditionally?

>
> +  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
> +  if (*slot == NULL)
> +    {
> +      *slot = XNEW (struct iv_common_cand);
>
> allocate from the IV obstack instead?  I see we do a lot of heap allocations
> in IVOPTs, so we can improve that as followup as well.
>
Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand
are better to be allocated in obstack.  Actually I have already make
that change to struct iv.  others will be followup too.

Thanks,
bin
> We probably should empty the obstack after each processed loop.
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> bin
>>
>> 2015-11-03  Bin Cheng  <bin.ch...@arm.com>
>>
>>         PR tree-optimization/52272
>>         * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
>>         (struct iv_common_cand_hasher): New struct.
>>         (iv_common_cand_hasher::hash): New function.
>>         (iv_common_cand_hasher::equal): New function.
>>         (struct ivopts_data): New fields, iv_common_cand_tab and
>>         iv_common_cands.
>>         (tree_ssa_iv_optimize_init): Initialize above fields.
>>         (record_common_cand, common_cand_cmp): New functions.
>>         (add_iv_candidate_derived_from_uses): New function.
>>         (add_iv_candidate_for_use): Record iv_common_cands derived from
>>         iv use in hash table, instead of adding candidates directly.
>>         (add_iv_candidate_for_uses): Call
>> add_iv_candidate_derived_from_uses.
>>         (record_important_candidates): Add important candidates to iv uses'
>>         related_cands.  Always keep related_cands for future use.
>>         (try_add_cand_for): Use iv uses' related_cands.
>>         (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
>>         in struct ivopts_data, iv_common_cand_tab and iv_common_cands.

Reply via email to