On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" <amker.ch...@gmail.com> 
wrote:
>On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener
><richard.guent...@gmail.com> wrote:
>> On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> PR52272 reported a performance regression in spec2006/410.bwaves
>once GCC is
>>> prevented from representing address of one memory object using
>address of
>>> another memory object.  Also as I commented in that PR, we have two
>possible
>>> fixes for this:
>>> 1) Improve how TMR.base is deduced, so that we can represent addr of
>mem obj
>>> using another one, while not breaking PR50955.
>>> 2) Add iv candidates with base object stripped.  In this way, we use
>the
>>> common base-stripped part to represent all address expressions, in
>the form
>>> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>>>
>>> In terms of code generation, method 2) is at least as good as 1),
>actually
>>> better in my opinion.  The problem of 2) is we need to tell when iv
>>> candidates should be added for the common part and when shouldn't. 
>This
>>> issue can be generalized and described as: We know IVO tries to add
>>> candidates by deriving from iv uses.  One disadvantage is that
>candidates
>>> are derived from iv use independently.  It doesn't take common sub
>>> expression among different iv uses into consideration.  As a result,
>>> candidate for common sub expression is not added, while many useless
>>> candidates are added.
>>>
>>> As a matter of fact, candidate derived from iv use is useful only if
>it's
>>> common enough and could be shared among different uses.  A candidate
>is most
>>> likely useless if it's derived from a single use and could not be
>shared by
>>> others.  This patch works in this way by firstly recording all kinds
>>> candidates derived from iv uses, then adding candidates for common
>ones.
>>>
>>> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw
>regression for
>>> 400.perlbench and small regression for 401.bzip on x86_64, but I can
>confirm
>>> they are false alarms caused by align issues.
>>> For aarch64, fp cases are obviously improved for both spec2000 and
>spec2006.
>>> Also the patch causes 2-3% regression for 459.GemsFDTD, which I
>think is
>>> another irrelevant issue caused by heuristic candidate selecting
>algorithm.
>>> Unfortunately, I don't have fix to it currently.
>>>
>>> This patch may add more candidates in some cases, but generally
>candidates
>>> number is smaller because we don't need to add useless candidates
>now.
>>> Statistic data shows there are quite fewer loops with more than 30
>>> candidates when building spec2k6 on x86_64 using this patch.
>>>
>>> Bootstrap and test on x86_64.  I will re-test it against latest
>trunk on
>>> AArch64.  Is it OK?
>>
>> +inline bool
>> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
>> +                          const iv_common_cand *ccand2)
>> +{
>> +  return ccand1->hash == ccand2->hash
>> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
>> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
>> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
>> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>>
>Hi Richard,
>Thanks for reviewing.
>
>> I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
>Because operand_equal_p doesn't check type precision for constant int
>nodes, and IVO needs to take precision into consideration.

Ok

>> and b) what kind of tree is base so that it is safe to inspect
>TYPE_PRECISION
>> unconditionally?
>Both SCEV and IVO work on expressions with type satisfying
>POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision
>unconditionally?

Yes.  Patch is OK then.

Richard.
>
>>
>> +  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
>> +  if (*slot == NULL)
>> +    {
>> +      *slot = XNEW (struct iv_common_cand);
>>
>> allocate from the IV obstack instead?  I see we do a lot of heap
>allocations
>> in IVOPTs, so we can improve that as followup as well.
>>
>Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand
>are better to be allocated in obstack.  Actually I have already make
>that change to struct iv.  others will be followup too.
>
>Thanks,
>bin
>> We probably should empty the obstack after each processed loop.
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks,
>>> bin
>>>
>>> 2015-11-03  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>         PR tree-optimization/52272
>>>         * tree-ssa-loop-ivopts.c (struct iv_common_cand): New
>struct.
>>>         (struct iv_common_cand_hasher): New struct.
>>>         (iv_common_cand_hasher::hash): New function.
>>>         (iv_common_cand_hasher::equal): New function.
>>>         (struct ivopts_data): New fields, iv_common_cand_tab and
>>>         iv_common_cands.
>>>         (tree_ssa_iv_optimize_init): Initialize above fields.
>>>         (record_common_cand, common_cand_cmp): New functions.
>>>         (add_iv_candidate_derived_from_uses): New function.
>>>         (add_iv_candidate_for_use): Record iv_common_cands derived
>from
>>>         iv use in hash table, instead of adding candidates directly.
>>>         (add_iv_candidate_for_uses): Call
>>> add_iv_candidate_derived_from_uses.
>>>         (record_important_candidates): Add important candidates to
>iv uses'
>>>         related_cands.  Always keep related_cands for future use.
>>>         (try_add_cand_for): Use iv uses' related_cands.
>>>         (free_loop_data, tree_ssa_iv_optimize_finalize): Release new
>fields
>>>         in struct ivopts_data, iv_common_cand_tab and
>iv_common_cands.


Reply via email to