On Mon, May 9, 2016 at 11:54 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, May 6, 2016 at 11:42 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Fri, May 6, 2016 at 10:40 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Tue, May 3, 2016 at 11:08 AM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Tue, May 3, 2016 at 12:01 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>>> On Mon, May 2, 2016 at 10:00 AM, Richard Biener
>>>>> <richard.guent...@gmail.com> wrote:
>>>>>> On Fri, Apr 29, 2016 at 5:05 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>>>>> On Fri, Apr 29, 2016 at 12:16 PM, Richard Biener
>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>> On Thu, Apr 28, 2016 at 2:56 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>> Tree if-conversion sometimes cannot convert conditional array 
>>>>>>>>> reference into unconditional one.  Root cause is GCC conservatively 
>>>>>>>>> assumes newly introduced array reference could be out of array bound 
>>>>>>>>> and thus trapping.  This patch improves the situation by proving the 
>>>>>>>>> converted unconditional array reference is within array bound using 
>>>>>>>>> loop niter information.  To be specific, it checks every index of 
>>>>>>>>> array reference to see if it's within bound in 
>>>>>>>>> ifcvt_memrefs_wont_trap.  This patch also factors out 
>>>>>>>>> base_object_writable checking if the base object is writable or not.
>>>>>>>>> Bootstrap and test on x86_64 and aarch64, is it OK?
>>>>>>>>
>>>>>>>> I think you miss to handle the case optimally where the only
>>>>>>>> non-ARRAY_REF idx is the dereference of the
>>>>>>>> base-pointer for, say, p->a[i].  In this case we can use
>>>>>>>> base_master_dr to see if p is unconditionally dereferenced
>>>>>>> Yes, will pick up this case.
>>>>>>>
>>>>>>>> in the loop.  You also fail to handle the case where we have
>>>>>>>> MEM_REF[&x].a[i] that is, you see a decl base.
>>>>>>> I am having difficulty in creating this case for ifcvt, any advices?  
>>>>>>> Thanks.
>>>>>>
>>>>>> Sth like
>>>>>>
>>>>>> float a[128];
>>>>>> float foo (int n, int i)
>>>>>> {
>>>>>>   return (*((float(*)[n])a))[i];
>>>>>> }
>>>>>>
>>>>>> should do the trick (w/o the component-ref).  Any other type-punning
>>>>>> would do it, too.
>>>>>>
>>>>>>>> I suppose for_each_index should be fixed for this particular case (to
>>>>>>>> return true), same for TARGET_MEM_REF TMR_BASE.
>>>>>>>>
>>>>>>>> +  /* The case of nonconstant bounds could be handled, but it would be
>>>>>>>> +     complicated.  */
>>>>>>>> +  if (TREE_CODE (low) != INTEGER_CST || !integer_zerop (low)
>>>>>>>> +      || !high || TREE_CODE (high) != INTEGER_CST)
>>>>>>>> +    return false;
>>>>>>>> +
>>>>>>>>
>>>>>>>> handling of a non-zero but constant low bound is important - otherwise
>>>>>>>> all this is a no-op for Fortran.  It
>>>>>>>> shouldn't be too difficult to handle after all.  In fact I think your
>>>>>>>> code does handle it correctly already.
>>>>>>>>
>>>>>>>> +  if (!init || TREE_CODE (init) != INTEGER_CST
>>>>>>>> +      || !step || TREE_CODE (step) != INTEGER_CST || integer_zerop 
>>>>>>>> (step))
>>>>>>>> +    return false;
>>>>>>>>
>>>>>>>> step == 0 should be easy to handle as well, no?  The index will simply
>>>>>>>> always be 'init' ...
>>>>>>>>
>>>>>>>> +  /* In case the relevant bound of the array does not fit in type, or
>>>>>>>> +     it does, but bound + step (in type) still belongs into the range 
>>>>>>>> of the
>>>>>>>> +     array, the index may wrap and still stay within the range of the 
>>>>>>>> array
>>>>>>>> +     (consider e.g. if the array is indexed by the full range of
>>>>>>>> +     unsigned char).
>>>>>>>> +
>>>>>>>> +     To make things simpler, we require both bounds to fit into type, 
>>>>>>>> although
>>>>>>>> +     there are cases where this would not be strictly necessary.  */
>>>>>>>> +  if (!int_fits_type_p (high, type) || !int_fits_type_p (low, type))
>>>>>>>> +    return false;
>>>>>>>> +
>>>>>>>> +  low = fold_convert (type, low);
>>>>>>>>
>>>>>>>> please use wide_int for all of this.
>>>>>>> Now I use wi:fits_to_tree_p instead of int_fits_type_p. But I am not
>>>>>>> sure what's the meaning by "handle "low = fold_convert (type, low);"
>>>>>>> related code in wide_int".   Do you mean to use tree_int_cst_compare
>>>>>>> instead of tree_int_cst_compare in the following code?
>>>>>>
>>>>>> I don't think you need any kind of fits-to-type check here.  You'd simply
>>>>>> use to_widest () when operating on / comparing with high/low.
>>>>> But what would happen if low/high and init/step are different in type
>>>>> sign-ness?  Anything special I need to do before using wi::ltu_p or
>>>>> wi::lts_p directly?
>>>>
>>>> You want to use to_widest (min) which extends according to sign to
>>>> an "infinite" precision signed integer.  So you can then use the new
>>>> operator< overloads as well.
>>>>
>>> Hi,
>>> Here is the updated patch.  It includes below changes according to
>>> review comments:
>>>
>>> 1) It uses widest_int for all INTEGER_CST tree computations, which
>>> simplifies the patch a lot.
>>> 2) It covers array with non-zero low bound, which is important for Fortran.
>>> 3) It picks up a boundary case so that ifc-11.c/vect-23.c/vect-24.c
>>> can be handled.
>>> 4) It also checks within bound array reference inside a structure like
>>> p->a[i] by using base_master_dr in tree-if-conv.c so that ifc-12.c can
>>> be handled.
>>>
>>> It leaves two other review comments not addressed:
>>> 1) It doesn't handle array reference whose idx is a wrapping SCEV.
>>> Because only read is safe and vectorizer itself may be confused by it
>>> now.
>>> 2) It doesn't handle array reference in the form of
>>> "MEM[(float[0:(sizetype) ((long int) SAVE_EXPR <m.2> + -1)]
>>> *)&b][i_1];". To handle this case, existing code in
>>> array_at_struct_end_p as well as this patch need to be improved.  I
>>> tend to handle it in an independent patch.
>>>
>>> With these changes, now cases pr61194.c and vect-23.c can be
>>> vectorized, I removed XFAIL from them.  Also vect-mask-store-move-1.c
>>> is affected and not vectorized now, this is tricky and it exposes a
>>> known bug PR65206 in vectorizer's dependence analyzer.  This should be
>>> handled independently too.  Also gcc.dg/vect/vect-24.c now is XPASS on
>>> AArch64, but not x86_64.  Root cause is dom2 jump threads first
>>> iteration of loop thus idx_within_array_bound is failed.  I didn't
>>> check if jump threading is good in this case, either I remove the
>>> XFAIL mark.  I tend to improve idx_within_array_bound by using VRP
>>> information in a following patch.
>>
>> Hmm, I accidentally included removal of XFAIL for
>> gcc.dg/vect/vect-24.c in the patch.  Anyway, it can be included or
>> excluded wrto reviewers opinion.
>>
>> Thanks,
>> bin
>>> Otherwise bootstrap and test on x86_64 and AArch64 are fine.  Is this
>>> version OK?
>
> Ok.  Adjusting gcc.dg/vect/vect-24.c is your choice, if things change again
> with a followup you might want to defer the change to that.
Right.  I looked into VRP for vect-24.c and found it's not that easy.
The range_info computed for the index is [1, 64] which is useless
because the array bound is [0, 63].  We need to improve VRP info to
[1, 63].  Another possible issue is tricky.  In other optimizers we
may want to use control-flow sensitive VRP information.  However in
tree-if-conversion, we need to get rid of VRP information coming from
the condition of if-statement that is being converted right now.  It
doesn't matter for now, but it will if we compute flow-sensitive VRP
in the future.
For this case maybe it's worth investigating why dom2 threads the
first loop iteration.  After all, it's a canonical loop, threading the
first iteration leaves 63 iterations, which isn't friendly at least to
both vectorizer and unroller.
I will leave gcc.dg/vect/vect-24.c as it is.

Thanks,
bin
>
> Thanks,
> Richard.
>
>>> Thanks,
>>> bin
>>>
>>> 2016-05-04  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>     * tree-if-conv.c (tree-ssa-loop.h): Include header file.
>>>     (tree-ssa-loop-niter.h): Ditto.
>>>     (idx_within_array_bound, ref_within_array_bound): New functions.
>>>     (ifcvt_memrefs_wont_trap): Check if array ref is within bound.
>>>     Factor out check on writable base object to ...
>>>     (base_object_writable): ... here.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2016-05-04  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>     * gcc.dg/tree-ssa/ifc-9.c: New test.
>>>     * gcc.dg/tree-ssa/ifc-10.c: New test.
>>>     * gcc.dg/tree-ssa/ifc-11.c: New test.
>>>     * gcc.dg/tree-ssa/ifc-12.c: New test.
>>>     * gcc.dg/vect/pr61194.c: Remove XFAIL.
>>>     * gcc.dg/vect/vect-23.c: Remove XFAIL.
>>>     * gcc.dg/vect/vect-mask-store-move-1.c: Revise test check.

Reply via email to