On Tue, Jul 25, 2017 at 1:57 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>> Hi,
>>>>> This is a follow-up patch that better handles the case below:
>>>>>      for (i = 0; i < n; i++)
>>>>>        {
>>>>>          a[i] = 1;
>>>>>          a[i+2] = 2;
>>>>>        }
>>>>> Instead of generating root variables by loading from memory and
>>>>> propagating them with PHI nodes, like:
>>>>>      t0 = a[0];
>>>>>      t1 = a[1];
>>>>>      for (i = 0; i < n; i++)
>>>>>        {
>>>>>          a[i] = 1;
>>>>>          t2 = 2;
>>>>>          t0 = t1;
>>>>>          t1 = t2;
>>>>>        }
>>>>>      a[n] = t0;
>>>>>      a[n+1] = t1;
>>>>> We can simply store the loop-invariant values after the loop body if we
>>>>> know the loop iterates more than chain->length times, like:
>>>>>      for (i = 0; i < n; i++)
>>>>>        {
>>>>>          a[i] = 1;
>>>>>        }
>>>>>      a[n] = 2;
>>>>>      a[n+1] = 2;
>>>>>
>>>>> Bootstrapped (O2/O3) as part of the patch series on x86_64 and AArch64.  Is it OK?
>>>> Updated patch w.r.t. changes in the previous patch.
>>>> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?
>>>
>>> +      if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>>> +       continue;
>>>
>>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>>> FIXED_CST would be ok as well for example.
>>>
>>> Ok with that change.  Did we eventually optimize this in followup
>>> passes previously?
>> Probably not?  Given the test below:
>>
>> int a[10000], b[10000], c[10000];
>> int f(void)
>> {
>>   int i, n = 100;
>>   int t0 = a[0];
>>   int t1 = a[1];
>>   for (i = 0; i < n; i++)
>>     {
>>       a[i] = 1;
>>       int t2 = 2;
>>       t0 = t1;
>>       t1 = t2;
>>     }
>>   a[n] = t0;
>>   a[n+1] = t1;
>>   return 0;
>> }
>> The optimized dump is:
>>
>>   <bb 2> [1.00%] [count: INV]:
>>   t1_8 = a[1];
>>   ivtmp.9_17 = (unsigned long) &a;
>>   _16 = ivtmp.9_17 + 400;
>>
>>   <bb 3> [99.00%] [count: INV]:
>>   # t1_20 = PHI <2(3), t1_8(2)>
>>   # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
>>   _15 = (void *) ivtmp.9_2;
>>   MEM[base: _15, offset: 0B] = 1;
>>   ivtmp.9_1 = ivtmp.9_2 + 4;
>>   if (ivtmp.9_1 != _16)
>>     goto <bb 3>; [98.99%] [count: INV]
>>   else
>>     goto <bb 4>; [1.01%] [count: INV]
>>
>>   <bb 4> [1.00%] [count: INV]:
>>   a[100] = t1_20;
>>   a[101] = 2;
>>   return 0;
>>
>> We now eliminate one PHI and leave the other behind.  The PHI that
>> goes away is eliminated in vrp1/dce2.
>
> Ok, I see.  Maybe worth filing a missed optimization PR.
Right, PR81549 filed.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
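
To make the transformation discussed in this thread concrete, here is a
minimal, self-contained C sketch of the three loop forms from the original
mail, with helpers that compare their effect on an array.  It is only an
illustration under stated assumptions, not the patch itself: the names
fill, store_in_loop, store_via_roots, store_after_loop, roots_form_agrees,
after_loop_form_agrees and the array size N are all hypothetical and do not
come from GCC.

```c
#include <assert.h>
#include <string.h>

enum { N = 64 };  /* hypothetical array size, chosen only for this sketch */

/* Fill the array with distinct values that are neither 1 nor 2,
   so a stale element is easy to tell apart from a stored one.  */
static void
fill (int *a)
{
  for (int i = 0; i < N; i++)
    a[i] = -(i + 1);
}

/* The loop as written in the original mail.  */
static void
store_in_loop (int *a, int n)
{
  assert (n >= 0 && n + 2 <= N);
  for (int i = 0; i < n; i++)
    {
      a[i] = 1;
      a[i + 2] = 2;
    }
}

/* Root variables loaded before the loop and rotated in the body.  */
static void
store_via_roots (int *a, int n)
{
  assert (n >= 0 && n + 2 <= N);
  int t0 = a[0];
  int t1 = a[1];
  for (int i = 0; i < n; i++)
    {
      a[i] = 1;
      int t2 = 2;
      t0 = t1;
      t1 = t2;
    }
  a[n] = t0;
  a[n + 1] = t1;
}

/* The invariant value stored once after the loop.  */
static void
store_after_loop (int *a, int n)
{
  assert (n >= 0 && n + 2 <= N);
  for (int i = 0; i < n; i++)
    a[i] = 1;
  a[n] = 2;
  a[n + 1] = 2;
}

/* Return 1 if the root-variable form leaves the same array contents
   as the original loop for this n.  */
int
roots_form_agrees (int n)
{
  int x[N], y[N];
  fill (x);
  fill (y);
  store_in_loop (x, n);
  store_via_roots (y, n);
  return memcmp (x, y, sizeof x) == 0;
}

/* Return 1 if the store-after-loop form leaves the same array contents
   as the original loop for this n.  */
int
after_loop_form_agrees (int n)
{
  int x[N], y[N];
  fill (x);
  fill (y);
  store_in_loop (x, n);
  store_after_loop (y, n);
  return memcmp (x, y, sizeof x) == 0;
}
```

In this sketch the root-variable form matches the original loop for every
n, including n = 0, because the preloaded values are written back
unchanged.  The store-after-loop form matches only once the loop runs often
enough that both trailing elements have been overwritten with 2 (from n = 2
in this example), which is why the patch needs a guard on the iteration
count; the exact guard is the one stated in the mail above and may be more
conservative than what this example requires.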
