[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
      Known to work|                            |7.3.1, 8.1.0
         Resolution|---                         |FIXED

--- Comment #15 from Richard Biener ---
Fixed.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #14 from Richard Biener ---
Author: rguenth
Date: Tue Nov 20 14:47:49 2018
New Revision: 266318

URL: https://gcc.gnu.org/viewcvs?rev=266318&root=gcc&view=rev
Log:
2018-11-20  Richard Biener

        Backport from mainline
        2018-03-12  Richard Biener

        PR tree-optimization/84777
        * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): For
        force-vectorize loops ignore whether we are optimizing for size.

        2018-01-26  Richard Biener

        PR rtl-optimization/84003
        * dse.c (record_store): Only record redundant stores when
        the earlier store aliases at least all accesses the later one does.

        * g++.dg/torture/pr77745.C: Mark foo noinline to trigger
        latent bug in DSE if NOINLINE is appropriately defined.
        * g++.dg/torture/pr77745-2.C: New testcase including pr77745.C
        and defining NOINLINE.

Added:
    branches/gcc-7-branch/gcc/testsuite/g++.dg/torture/pr77745-2.C
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/dse.c
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
    branches/gcc-7-branch/gcc/testsuite/g++.dg/torture/pr77745.C
    branches/gcc-7-branch/gcc/tree-ssa-loop-ch.c
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #13 from Richard Biener ---
I will backport.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2018-11-20
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #12 from Martin Liška ---
Richi: Planning to backport or should we close it?
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #11 from Richard Biener ---
Author: rguenth
Date: Mon Mar 12 08:45:54 2018
New Revision: 258444

URL: https://gcc.gnu.org/viewcvs?rev=258444&root=gcc&view=rev
Log:
2018-03-12  Richard Biener

        PR tree-optimization/84777
        * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): For
        force-vectorize loops ignore whether we are optimizing for size.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-ch.c
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #10 from Richard Biener ---
GCC 8 has the patch now.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #9 from rguenther at suse dot de ---
On Fri, 9 Mar 2018, linux at carewolf dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777
>
> --- Comment #8 from Allan Jensen ---
> Yes, those I say are missing are compared to -O2. I was investigating this in
> relation to Qt. We either build these files with -O3, or with -Os for customers
> that are binary size sensitive. Since some of the image handling routines are
> quite heavy and have been written for auto-vectorization I was just checking
> if I could get it to work and the results with your patch are quite good:
>
> Normal sizes of qdrawhelper.o with -O3/-O2/-Os:
> 277704 / 198984 / 168440
>
> With -O2 -ftree-vectorize: 242224
> With -O2 -fopenmp: 219536
> With -Os -ftree-loop-vectorize: 168440 (no change)
> With -Os -fopenmp: 177144 (with your patch)
>
> So most of the -Os benefit and still many of the central draw loops
> auto-vectorized.

That looks indeed good.  We have enough infrastructure already to support
a #pragma GCC vectorize as well (it was added for Ada), just nobody bothered
to add C/C++ support.

> Haven't benchmarked it yet though.

I will test and post the patch, I think it makes sense.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #8 from Allan Jensen ---
Yes, those I say are missing are compared to -O2. I was investigating this in
relation to Qt. We either build these files with -O3, or with -Os for customers
that are binary size sensitive. Since some of the image handling routines are
quite heavy and have been written for auto-vectorization I was just checking
if I could get it to work and the results with your patch are quite good:

Normal sizes of qdrawhelper.o with -O3/-O2/-Os:
277704 / 198984 / 168440

With -O2 -ftree-vectorize: 242224
With -O2 -fopenmp: 219536
With -Os -ftree-loop-vectorize: 168440 (no change)
With -Os -fopenmp: 177144 (with your patch)

So most of the -Os benefit and still many of the central draw loops
auto-vectorized.

Haven't benchmarked it yet though.
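The size figures quoted in this comment can be compared with a little arithmetic. The following is only an illustrative sketch: the constants are the byte counts reported above, while the enum and function names are hypothetical and introduced just for this example.

```c
#include <assert.h>

/* qdrawhelper.o sizes in bytes, copied from the comment above. */
enum {
    SIZE_O3 = 277704,        /* -O3 */
    SIZE_O2 = 198984,        /* -O2 */
    SIZE_OS = 168440,        /* -Os */
    SIZE_OS_OPENMP = 177144  /* -Os -fopenmp, with the patch */
};

/* Growth of `size` over `base`, in bytes (hypothetical helper). */
long growth_bytes(long base, long size)
{
    return size - base;
}

/* Growth of `size` over `base`, as a percentage of `base`. */
double growth_percent(long base, long size)
{
    assert(base > 0);
    return 100.0 * (double)(size - base) / (double)base;
}
```

With these numbers, -Os -fopenmp costs 8704 bytes over plain -Os (a little over 5%), and the result is still 21840 bytes smaller than the -O2 build.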
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #7 from rguenther at suse dot de ---
On Fri, 9 Mar 2018, linux at carewolf dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777
>
> --- Comment #6 from Allan Jensen ---
> Great. Your patch worked with 90% of the marked loops!

Good!

> The remaining report things like this with -fopt-info-vec-missed:
>
> note: not vectorized: relevant stmt not supported: idisty.872_437 = (unsigned
> int) idisty_386;
> note: bad operation or unsupported loop bound.
>
> But the result is already pretty good for -fopenmp with manually marked loops.

So is it any better if you use -O2 rather than -Os?  Do you really need
-Os?  GCC's -O2 isn't as excessive code-size wise as competitors like ICC.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #6 from Allan Jensen ---
Great. Your patch worked with 90% of the marked loops!

The remaining report things like this with -fopt-info-vec-missed:

note: not vectorized: relevant stmt not supported: idisty.872_437 = (unsigned int) idisty_386;
note: bad operation or unsupported loop bound.

But the result is already pretty good for -fopenmp with manually marked loops.
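For readers unfamiliar with "manually marked loops", the sketch below shows one way a loop can be marked, assuming the marking is done with `#pragma omp simd`, as comment #2 suggests. The function `scale_pixels` is a hypothetical kernel written for this illustration and is not taken from Qt. Whether the loop is actually vectorized depends on the GCC version and target; compiling with something like `gcc -Os -fopenmp -fopt-info-vec-missed` would report loops that were not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not from Qt: scale every value by alpha/256.
   The pragma marks the loop as one the compiler should vectorize; it is
   ignored when OpenMP support is not enabled, so the function behaves
   the same either way. */
void scale_pixels(unsigned *restrict dst, const unsigned *restrict src,
                  size_t n, unsigned alpha)
{
    assert(dst != NULL && src != NULL);
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] * alpha) >> 8;
}
```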
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #5 from rguenther at suse dot de ---
On Fri, 9 Mar 2018, linux at carewolf dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777
>
> --- Comment #4 from Allan Jensen ---
> I will try the patch. I just tried -fopt-info-vec-missed and the message
> reported for every loop was:
>
> note: not vectorized: latch block not empty.
> note: bad loop form.

Yeah, that's the effect for while () / for () style loops that haven't
been transformed to do {} while () style by loop header copying.
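The difference between the two loop shapes mentioned here can be shown at source level. This is a hand-written sketch of the idea, not output from GCC: `sum_while` keeps the exit test in the loop header, while `sum_rotated` is roughly what the loop looks like after header copying, with one copy of the test guarding entry and the loop itself in do {} while () form. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* while () style: the exit test is in the loop header and is reached
   again from the bottom of the body on every iteration. */
long sum_while(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* do {} while () style: the header test is duplicated once in front of
   the loop, and the loop proper tests its condition at the bottom. */
long sum_rotated(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long s = 0;
    size_t i = 0;
    if (i < n) {
        do {
            s += a[i];
            i++;
        } while (i < n);
    }
    return s;
}
```

The duplicated test is the code-size cost that -Os avoids by default, which is why the transformation was skipped for these loops.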
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #4 from Allan Jensen ---
I will try the patch. I just tried -fopt-info-vec-missed and the message
reported for every loop was:

note: not vectorized: latch block not empty.
note: bad loop form.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #3 from Richard Biener ---
FDO might also help given important loops should show up as hot.
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

--- Comment #2 from Richard Biener ---
Hmm, patch can't help.  Instead try the following which should make the
omp simd case work.

Index: gcc/tree-ssa-loop-ch.c
===
--- gcc/tree-ssa-loop-ch.c      (revision 258380)
+++ gcc/tree-ssa-loop-ch.c      (working copy)
@@ -57,7 +57,8 @@ should_duplicate_loop_header_p (basic_bl
      be true, since quite often it is possible to verify that the condition is
      satisfied in the first iteration and therefore to eliminate it.  Jump
      threading handles these cases now.  */
-  if (optimize_loop_for_size_p (loop))
+  if (optimize_loop_for_size_p (loop)
+      && !loop->force_vectorize)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
[Bug tree-optimization/84777] -Os inhibits all vectorization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84777

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
IIRC the issue is that -Os inhibits most loop-header copying.  Can you provide
a testcase that shows the issue please?  Can you check if the following patch
fixes things for you?

Index: gcc/tree-ssa-loop-ch.c
===
--- gcc/tree-ssa-loop-ch.c      (revision 258380)
+++ gcc/tree-ssa-loop-ch.c      (working copy)
@@ -257,8 +257,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
     {
-      return flag_tree_ch != 0
-            && (flag_tree_loop_vectorize != 0 || fun->has_force_vectorize_loops);
+      return flag_tree_loop_vectorize != 0 || fun->has_force_vectorize_loops;
     }

   /* Just copy headers, no initialization/finalization of loop structures.  */