[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |7.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #14 from Andrew Pinski ---
So GCC 7 is able to optimize this loop fully and split it into two at -O3
(r7-3966) after my comment #12.
Also starting with GCC 7, we were able to vectorize the loop at
-O2 -ftree-vectorize since tree-if-conv.c can do the ifconversion (I don't
have the revision).

So this is all fixed anyways.
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Severity|normal                      |enhancement
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #13 from Andrew Pinski ---
The improvement in comment #12 is something which I am working on.
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #12 from Andrew Pinski ---
.L8:
        ldr     q4, [x9, x8]
        cmgt    v2.4s, v6.4s, v0.4s
        ldr     q3, [x10, x8]
        add     w12, w12, 1
        ldr     q1, [x2, x8]
        add     v0.4s, v0.4s, v5.4s
        add     v3.4s, v3.4s, v4.4s   << this one
        add     v1.4s, v1.4s, v4.4s   << this one
        bit     v1.16b, v3.16b, v2.16b
        str     q1, [x9, x8]
        add     x8, x8, 16
        cmp     w7, w12
        bhi     .L8

This is the trunk on aarch64-linux-gnu.
Range splitting is not there, but there is more that can be done even without
range splitting; there is one extra add.

PRE produces:
  :
  _2 = b[i_18];
  _3 = _2 + pretmp_14;
  goto ;

  :
  _5 = c[i_18];
  _6 = _5 + pretmp_14;

  :
  # cstore_17 = PHI <_3(4), _6(5)>

But we could do better and do:
  :
  _2 = b[i_18];
  goto ;

  :
  _5 = c[i_18];

  :
  # _N = PHI <_2(4), _5(5)>
  _cstore_17 = _N + pretmp_14;

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
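
The transformation suggested in comment #12 can be sketched at the C level.
This is only an illustration of the idea, not GCC's implementation: the
function names are hypothetical, and `pretmp` stands in for the value that
the GIMPLE dump calls pretmp_14. The first function mirrors what PRE
produces (one add in each arm), the second selects the operand first and
adds once, which in the vector loop would correspond to one select followed
by a single add instead of two adds followed by a select.

```c
#include <assert.h>

/* Shape that PRE produces: the same add appears in both arms. */
int add_in_both_arms(int cond, int bi, int ci, int pretmp)
{
    int cstore;
    if (cond)
        cstore = bi + pretmp;
    else
        cstore = ci + pretmp;
    return cstore;
}

/* Suggested shape: merge the operands first, then add once. */
int add_after_select(int cond, int bi, int ci, int pretmp)
{
    int merged = cond ? bi : ci;   /* plays the role of the PHI node */
    return merged + pretmp;
}

/* Hypothetical self-check: both shapes agree on a small range of inputs. */
void check_equivalence(void)
{
    for (int cond = 0; cond <= 1; cond++)
        for (int v = -3; v <= 3; v++)
            assert(add_in_both_arms(cond, v, 10 * v, 7) ==
                   add_after_select(cond, v, 10 * v, 7));
}
```

Both forms compute the same value; the difference is only in how many adds
remain after if-conversion.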
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
--- Comment #11 from spop at gcc dot gnu dot org  2010-05-25 23:33 ---
This is not an IV type problem: the number of iterations may be zero when
mid == 0 or mid == n, so the number of iterations analysis has a condition
under which niter may_be_zero.

I sent out a patch that makes niter return a COND_EXPR instead of a
chrec_dont_know: http://gcc.gnu.org/ml/gcc-patches/2010-05/msg01927.html

With that patch I now get

note: not vectorized: data ref analysis failed D.2726_51 = a[var.9_55];

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
--- Comment #10 from spop at gcc dot gnu dot org  2010-05-24 23:02 ---
note: not vectorized: number of iterations cannot be computed.

Graphite has a problem with the generation of induction variable types
that makes the code harder to analyze after Graphite.

I will try to get this fixed to make this loop vectorized with the iteration
range splitting that Graphite does by default.

Sebastian

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
--- Comment #9 from changpeng dot fang at amd dot com  2010-05-24 22:47 ---
(In reply to comment #8)
> -fgraphite-identity does iteration splitting for this case.

Do you know why it could not be vectorized after iteration range splitting?

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
--- Comment #8 from spop at gcc dot gnu dot org  2010-05-24 22:44 ---
-fgraphite-identity does iteration splitting for this case.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
--- Comment #7 from changpeng dot fang at amd dot com  2010-05-07 21:41 ---
(In reply to comment #4)
> (In reply to comment #3)
> > Subject: Re:  gcc should vectorize this loop
> >  through "iteration range splitting"
> > You mean that the problem is the if-conversion of the stores
> > "a[i] = ..."
>
> If we rewrite the code like:
> int a[100], b[100], c[100];
>
> void foo(int n, int mid)
> {
>   int i;
>   for(i=0; i<n; i++)
>     {
>       int t;
>       int ai = a[i], bi = b[i], ci = c[i];
>       if (i < mid)
>         t = ai + bi;
>       else
>         t = ai + ci;
>       a[i] = t;
>     }
> }
>
> --- CUT ---
> This gets vectorized as we produce an if-cvt first.
>

There are both correctness and performance issues in the re-written code.
b[i] or c[i] may not be executed in the original loop.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
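
The objection in comment #7 can be made concrete by counting loads. The
sketch below is an illustration under assumptions: `foo_original` is a
reconstruction of the loop the report is about (it is not quoted in this
part of the thread), and the `load_b`/`load_c` helpers and their counters
are hypothetical instrumentation. In the original shape each iteration reads
only one of b[i] or c[i]; in the rewritten shape both are read on every
iteration, which costs extra memory traffic and would be unsafe if one of
the arrays were not readable over the whole range.

```c
#include <assert.h>

#define N 100
int a[N], b[N], c[N];
int b_loads, c_loads;              /* hypothetical load counters */

static int load_b(int i) { b_loads++; return b[i]; }
static int load_c(int i) { c_loads++; return c[i]; }

/* Assumed original shape: only one of b[i], c[i] is read per iteration. */
void foo_original(int n, int mid)
{
    assert(n <= N);
    for (int i = 0; i < n; i++) {
        if (i < mid)
            a[i] = a[i] + load_b(i);
        else
            a[i] = a[i] + load_c(i);
    }
}

/* Rewritten shape from comment #4: both arrays are read unconditionally. */
void foo_rewritten(int n, int mid)
{
    assert(n <= N);
    for (int i = 0; i < n; i++) {
        int t;
        int ai = a[i], bi = load_b(i), ci = load_c(i);
        if (i < mid)
            t = ai + bi;
        else
            t = ai + ci;
        a[i] = t;
    }
}
```

With n = 10 and mid = 4, the first version performs 4 loads from b and 6
from c, while the second performs 10 from each.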
[Bug tree-optimization/43423] gcc should vectorize this loop through if-conversion
--- Comment #6 from spop at gcc dot gnu dot org  2010-04-08 17:47 ---
I changed the title of this bug to match the comments in the PR: we should
vectorize this loop using if-conversion, and not "iteration range splitting".

Also note that in general, by doing an "iteration range splitting" the data
locality in the two loops could be worse than in the if-converted loop.

-- 

spop at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|gcc should vectorize this   |gcc should vectorize this
                   |loop through "iteration     |loop through if-conversion
                   |range splitting"            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43423
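
For readers unfamiliar with the term, "iteration range splitting" can be
sketched by hand as follows. This is an illustration only, not what GCC or
Graphite emits: `foo_branchy` is an assumed reconstruction of the loop under
discussion, and `foo_split` and the `split` variable are hypothetical names.
The branch on i < mid is removed by running one loop over [0, split) and a
second over [split, n), where split is mid clamped into [0, n]. Either loop
may execute zero times, which is the may_be_zero condition mentioned in
comment #11.

```c
#include <assert.h>

#define N 100
int a[N], b[N], c[N];

/* Assumed shape of the loop under discussion: one branch per iteration. */
void foo_branchy(int n, int mid)
{
    assert(n <= N);
    for (int i = 0; i < n; i++) {
        if (i < mid)
            a[i] = a[i] + b[i];
        else
            a[i] = a[i] + c[i];
    }
}

/* Hand-written iteration range splitting: two branch-free loops. */
void foo_split(int n, int mid)
{
    assert(n <= N);
    /* Clamp the split point into [0, n]; either loop may run zero times. */
    int split = mid < 0 ? 0 : (mid > n ? n : mid);
    for (int i = 0; i < split; i++)
        a[i] = a[i] + b[i];
    for (int i = split; i < n; i++)
        a[i] = a[i] + c[i];
}
```

Each of the two loops has a straight-line body and touches only two arrays,
whereas the if-converted single loop reads a, b and c in one pass; that
difference in access pattern is what the data locality remark in comment #6
refers to.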