On Tue, Jul 25, 2017 at 1:57 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Tue, Jul 25, 2017 at 2:38 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>> Hi,
>>>>> This is a followup patch better handling below case:
>>>>>     for (i = 0; i < n; i++)
>>>>>       {
>>>>>         a[i] = 1;
>>>>>         a[i+2] = 2;
>>>>>       }
>>>>> Instead of generating root variables by loading from memory and
>>>>> propagating with PHI
>>>>> nodes, like:
>>>>>     t0 = a[0];
>>>>>     t1 = a[1];
>>>>>     for (i = 0; i < n; i++)
>>>>>       {
>>>>>         a[i] = 1;
>>>>>         t2 = 2;
>>>>>         t0 = t1;
>>>>>         t1 = t2;
>>>>>       }
>>>>>     a[n] = t0;
>>>>>     a[n+1] = t1;
>>>>> We can simply store loop invariant values after loop body if we know loop
>>>>> iterates more
>>>>> than chain->length times, like:
>>>>>     for (i = 0; i < n; i++)
>>>>>       {
>>>>>         a[i] = 1;
>>>>>       }
>>>>>     a[n] = 2;
>>>>>     a[n+1] = 2;
>>>>>
>>>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>>> Update patch wrto changes in previous patch.
>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> + if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
>>> +   continue;
>>>
>>> Please use CONSTANT_CLASS_P (val) instead.  I suppose VECTOR_CST or
>>> FIXED_CST would be ok as well for example.
>>>
>>> Ok with that change.  Did we eventually optimize this in followup
>>> passes previously?
>> Probably not?  Given below test:
>>
>> int a[10000], b[10000], c[10000];
>> int f(void)
>> {
>>   int i, n = 100;
>>   int t0 = a[0];
>>   int t1 = a[1];
>>   for (i = 0; i < n; i++)
>>     {
>>       a[i] = 1;
>>       int t2 = 2;
>>       t0 = t1;
>>       t1 = t2;
>>     }
>>   a[n] = t0;
>>   a[n+1] = t1;
>>   return 0;
>> }
>> The optimized dump is as:
>>
>> <bb 2> [1.00%] [count: INV]:
>>   t1_8 = a[1];
>>   ivtmp.9_17 = (unsigned long) &a;
>>   _16 = ivtmp.9_17 + 400;
>>
>> <bb 3> [99.00%] [count: INV]:
>>   # t1_20 = PHI <2(3), t1_8(2)>
>>   # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
>>   _15 = (void *) ivtmp.9_2;
>>   MEM[base: _15, offset: 0B] = 1;
>>   ivtmp.9_1 = ivtmp.9_2 + 4;
>>   if (ivtmp.9_1 != _16)
>>     goto <bb 3>; [98.99%] [count: INV]
>>   else
>>     goto <bb 4>; [1.01%] [count: INV]
>>
>> <bb 4> [1.00%] [count: INV]:
>>   a[100] = t1_20;
>>   a[101] = 2;
>>   return 0;
>>
>> We now eliminate one phi and leave another behind.  It is vrp1/dce2
>> when the phi is eliminated.
>
> Ok, I see.  Maybe worth filing a missed optimization PR.
Right, PR81549 filed.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
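
For reference, the three loop forms from the patch description can be compared at the source level. The following is only a minimal sketch in plain C, not the pass itself: the function names, the N_MAX bound and the sentinel fill are hypothetical helpers introduced for illustration. It suggests why the "store after the loop" form is only safe once the loop is known to run at least twice for this two-element chain, while the form with root variables matches the original for any trip count.

```c
#include <assert.h>
#include <string.h>

#define N_MAX 64          /* hypothetical upper bound on n for this sketch */
#define LEN (N_MAX + 2)   /* the loops touch a[0] .. a[n+1] */

/* Original loop from the patch description. */
void store_original(int *a, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = 1;
      a[i + 2] = 2;
    }
}

/* Root variables loaded before the loop and rotated every iteration. */
void store_rotating(int *a, int n)
{
  int t0 = a[0];
  int t1 = a[1];
  for (int i = 0; i < n; i++)
    {
      a[i] = 1;
      int t2 = 2;
      t0 = t1;
      t1 = t2;
    }
  a[n] = t0;
  a[n + 1] = t1;
}

/* Loop-invariant value stored after the loop body; assumes n >= 2. */
void store_after_loop(int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = 1;
  a[n] = 2;
  a[n + 1] = 2;
}

/* Fill with distinct negative sentinels so stale values are visible. */
void fill(int *a)
{
  for (int i = 0; i < LEN; i++)
    a[i] = -1 - i;
}

/* Return 1 if `variant` leaves the array exactly as the original loop does.
   Requires 0 <= n <= N_MAX.  */
int matches_original(void (*variant)(int *, int), int n)
{
  int ref[LEN], out[LEN];
  fill(ref);
  fill(out);
  store_original(ref, n);
  variant(out, n);
  return memcmp(ref, out, sizeof ref) == 0;
}
```

Calling matches_original (store_rotating, n) should hold for every n in range, including 0 and 1, whereas matches_original (store_after_loop, n) should hold only for n >= 2: with a shorter trip count the post-loop stores overwrite elements the original loop never touched.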