[Bug tree-optimization/65443] Don't peel last iteration from loop in transform_to_exit_first_loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443 --- Comment #16 from vries at gcc dot gnu.org --- ping: - https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00763.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #17 from vries at gcc dot gnu.org ---
Author: vries
Date: Thu May 28 21:23:54 2015
New Revision: 223848

URL: https://gcc.gnu.org/viewcvs?rev=223848&root=gcc&view=rev
Log:
Add transform_to_exit_first_loop_alt

2015-05-28  Tom de Vries

    PR tree-optimization/65443
    * tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by)
    (replace_uses_in_bbs_by, transform_to_exit_first_loop_alt)
    (try_transform_to_exit_first_loop_alt): New function.
    (transform_to_exit_first_loop): Use try_transform_to_exit_first_loop_alt.

    * gcc.dg/parloops-exit-first-loop-alt-2.c: New test.
    * gcc.dg/parloops-exit-first-loop-alt-3.c: New test.
    * gcc.dg/parloops-exit-first-loop-alt.c: New test.

    * testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test.
    * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test.
    * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test.

Added:
    branches/gomp-4_0-branch/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
    branches/gomp-4_0-branch/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
    branches/gomp-4_0-branch/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c
    branches/gomp-4_0-branch/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c
Modified:
    branches/gomp-4_0-branch/gcc/ChangeLog.gomp
    branches/gomp-4_0-branch/gcc/testsuite/ChangeLog.gomp
    branches/gomp-4_0-branch/gcc/tree-parloops.c
    branches/gomp-4_0-branch/libgomp/ChangeLog.gomp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #18 from vries at gcc dot gnu.org ---
Author: vries
Date: Fri Jun 5 15:57:34 2015
New Revision: 224154

URL: https://gcc.gnu.org/viewcvs?rev=224154&root=gcc&view=rev
Log:
Add transform_to_exit_first_loop_alt

2015-06-05  Tom de Vries

    merge from gomp4 branch:
    2015-05-28  Tom de Vries

    PR tree-optimization/65443
    * tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by)
    (replace_uses_in_bbs_by, transform_to_exit_first_loop_alt)
    (try_transform_to_exit_first_loop_alt): New function.
    (transform_to_exit_first_loop): Use try_transform_to_exit_first_loop_alt.

    * gcc.dg/parloops-exit-first-loop-alt-2.c: New test.
    * gcc.dg/parloops-exit-first-loop-alt-3.c: New test.
    * gcc.dg/parloops-exit-first-loop-alt.c: New test.

    * testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test.
    * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test.
    * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test.

Added:
    trunk/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
    trunk/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
    trunk/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
    trunk/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c
    trunk/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c
    trunk/libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-parloops.c
    trunk/libgomp/ChangeLog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #19 from vries at gcc dot gnu.org ---
Patch with test-cases committed to trunk, marking resolved-fixed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443
Bug 65443 depends on bug 66442, which changed state.

Bug 66442 Summary: [6 regression] FAIL: gcc.dg/autopar/pr46885.c (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66442

           What    |Removed                     |Added
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #1 from vries at gcc dot gnu.org ---
Consider test.c, compiled with -O2 -ftree-parallelize-loops=2:
...
#include

extern unsigned int *a;

void
f (unsigned int n)
{
  int i;
  unsigned int sum = 1;

#pragma omp parallel
  {
#pragma omp for
    for (i = 0; i < n; ++i)
      sum += a[i];
  }

  printf ("%u\n", sum);
}
...

Before transform_to_exit_first_loop, the loop looks like this:
...
  :
  # sum_18 = PHI <1(11), sum_11(6)>
  # ivtmp_25 = PHI <0(11), ivtmp_6(6)>
  i_17 = (int) ivtmp_25;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_18;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;
  if (ivtmp_25 < _20)
    goto ;
  else
    goto ;

  :
  # sum_21 = PHI
  goto ;

  :
  ivtmp_6 = ivtmp_25 + 1;
  goto ;
...

You might say that the transformation applied by transform_to_exit_first_loop is that all the statements in bb4 before the if are moved past the if, into both bb5 and bb6.

After the transformation, it looks like:
...
  :
  # sum_28 = PHI <1(11), sum_11(6)>
  # ivtmp_29 = PHI <0(11), ivtmp_6(6)>
  if (ivtmp_29 < _20)
    goto ;
  else
    goto ;

  :
  # sum_18 = PHI
  # ivtmp_25 = PHI
  i_17 = (int) ivtmp_25;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_18;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;
  goto ;

  :
  # sum_30 = PHI
  ivtmp_31 = _20;
  i_32 = (int) ivtmp_31;
  _33 = (long unsigned int) i_32;
  _34 = _33 * 4;
  _35 = pretmp_24 + _34;
  _36 = *_35;
  sum_37 = _36 + sum_30;
  i_38 = i_32 + 1;
  i.1_39 = (unsigned int) i_38;

  :
  # sum_21 = PHI
  goto ;

  :
  ivtmp_6 = ivtmp_25 + 1;
  goto ;
...
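A rough C model of the two loop shapes in comment #1 may help. It is a sketch, not code from tree-parloops.c: the function names are hypothetical, `nit` stands for _20 (the loop body runs nit + 1 times), and the PHI nodes are replaced by ordinary variables. The first function keeps the exit test after the body statements, as in the dump before the transformation; the second tests first and runs a peeled copy of the body with ivtmp = _20 on the exit path, mirroring the sum_30 ... i.1_39 block.

```c
/* Hypothetical model of the loop in comment #1.  nit stands for _20;
   the loop body runs nit + 1 times.  */

/* Before: the exit test comes after the body statements (bb4).  */
static unsigned int
sum_before (const unsigned int *a, unsigned int nit)
{
  unsigned int sum = 1;
  unsigned int ivtmp = 0;

  for (;;)
    {
      sum += a[ivtmp];        /* statements before the if */
      if (ivtmp < nit)        /* if (ivtmp_25 < _20) */
        ivtmp = ivtmp + 1;    /* latch: ivtmp_6 = ivtmp_25 + 1 */
      else
        break;                /* exit */
    }
  return sum;
}

/* After: the exit test comes first, and a copy of the body is peeled off
   on the exit path with ivtmp = nit (ivtmp_31 = _20).  */
static unsigned int
sum_after (const unsigned int *a, unsigned int nit)
{
  unsigned int sum = 1;
  unsigned int ivtmp = 0;

  while (ivtmp < nit)         /* if (ivtmp_29 < _20) */
    {
      sum += a[ivtmp];        /* body moved past the if */
      ivtmp = ivtmp + 1;      /* latch */
    }
  sum += a[nit];              /* peeled last iteration */
  return sum;
}
```

Both functions compute the same sum for any nit; the final `sum += a[nit]` is the peeled last iteration that this PR is about avoiding.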
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #2 from vries at gcc dot gnu.org ---
AFAIU, this is what is meant by the todo:
...
  :
  goto ;

  :
  i_17 = (int) ivtmp_6;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_y;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;

  :
  # sum_y = PHI <1(x), sum_11(4)>
  # ivtmp_y = PHI <0(x), ivtmp_6(4)>
  if (ivtmp_y < _20 + 1)
    goto ;
  else
    goto ;

  :
  # sum_21 = PHI
  goto ;

  :
  ivtmp_6 = ivtmp_y + 1;
  goto ;
...

So, sort of:
- Split bb 4 before the loop condition, creating bb y.
- Don't enter the loop at bb 4 as before; instead, jump to just before the loop condition, to bb y (creating bb x in the process).
- For each phi in bb 4, add a corresponding phi to bb y:
  - For the values for entry from bb x, use the values in the phis in bb 4 for entry from bb 11.
  - For the values for entry from bb 4, use the reaching definitions.
- Increase the loop bound by 1 (_20 + 1).
- Simplify the phis in bb 4.
- Use the new phis in bb y as defs for the reachable uses.

The problem with this transformation is that '_20 + 1' might overflow, that's what the comment 'This may need some additional preconditioning in case NIT = ~0' refers to.
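To make the overflow concern concrete, here is a small C sketch. It assumes a 32-bit unsigned int (matching _20 here), and all helper names are hypothetical; alt_bound_is_safe is only an illustrative stand-in for the "additional preconditioning". When nit (_20) is UINT_MAX, the loop needs 2^32 iterations, but nit + 1 wraps to 0, so an exit test of ivtmp < nit + 1 fails immediately.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Iterations of "for (ivtmp = 0; ivtmp < nit + 1; ivtmp++)" when nit + 1
   is evaluated in unsigned int: the bound wraps modulo 2^32 (assuming a
   32-bit unsigned int), and the loop runs exactly `bound` times.  */
static uint64_t
alt_trip_count (unsigned int nit)
{
  unsigned int bound = nit + 1u;   /* 0 when nit == UINT_MAX */
  return bound;
}

/* Iterations the original loop needs: nit + 1, computed without wrapping.  */
static uint64_t
required_trip_count (unsigned int nit)
{
  return (uint64_t) nit + 1u;
}

/* Hypothetical precondition: the "_20 + 1" bound is only correct when
   nit + 1 does not wrap, i.e. when NIT != ~0.  */
static bool
alt_bound_is_safe (unsigned int nit)
{
  return nit != UINT_MAX;
}
```

For every nit except UINT_MAX the two counts agree; for UINT_MAX the "alt" loop would execute zero times instead of 2^32.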
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #3 from vries at gcc dot gnu.org ---
(In reply to vries from comment #2)
> The problem with this transformation is that '_20 + 1' might overflow,
> that's what the comment 'This may need some additional preconditioning in
> case NIT = ~0' refers to.

AFAIU, we might also move 'ivtmp_6 = ivtmp_y + 1' to the end of bb4. That way the increment isn't executed on loop entry (just as before the transformation), eliminating the need for '_20 + 1'.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #4 from vries at gcc dot gnu.org ---
(In reply to vries from comment #3)
> (In reply to vries from comment #2)
> > The problem with this transformation is that '_20 + 1' might overflow,
> > that's what the comment 'This may need some additional preconditioning in
> > case NIT = ~0' refers to.
>
> AFAIU, we might also move 'ivtmp_6 = ivtmp_y + 1' to the end of bb4. That
> way it's not triggered at loop entry, as before the transformation,
> eliminating the need for '_20 + 1'.

One thing I overlooked there:

  _20 = n_4(D) + 4294967295;

If n == 0, we don't reach the loop.

If n == 1, we reach the loop, and _20 == 0. When we reach the loop condition from loop entry with ivtmp == 0, ivtmp < _20 evaluates to false, so we don't execute the loop body at all, even though one iteration is required. That's the problem we're trying to solve using '_20 + 1', and moving 'ivtmp_6 = ivtmp_y + 1' to the end of bb4 doesn't fix it.
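The n == 1 case can be checked with a small C sketch that counts how often an exit-first loop body runs under each bound. It assumes a 32-bit unsigned int, so that n + 4294967295 wraps to n - 1 just like _20 above; the function names are hypothetical.

```c
/* Hypothetical model of comment #4: nit mirrors _20 = n_4(D) + 4294967295,
   which equals n - 1 for a 32-bit unsigned int.  The loop is only reached
   for n != 0.  Each function returns how often the body runs.  */

/* Exit-first loop that keeps the original bound _20.  */
static unsigned int
body_runs_with_bound_nit (unsigned int n)
{
  unsigned int nit = n + 4294967295u;   /* n - 1, via unsigned wraparound */
  unsigned int runs = 0;

  for (unsigned int ivtmp = 0; ivtmp < nit; ivtmp++)
    runs++;
  return runs;
}

/* Exit-first loop with the bound raised to _20 + 1.  */
static unsigned int
body_runs_with_bound_nit_plus_1 (unsigned int n)
{
  unsigned int nit = n + 4294967295u;
  unsigned int runs = 0;

  for (unsigned int ivtmp = 0; ivtmp < nit + 1u; ivtmp++)
    runs++;
  return runs;
}
```

For n == 1 the first variant runs the body zero times, whereas the original loop runs it once; only the '_20 + 1' bound gives the right count without peeling.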
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

Richard Biener changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-03-18
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener ---
parloops needs a _lot_ of TLC!  Confirmed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #6 from vries at gcc dot gnu.org ---
After looking into it a bit further, I think what we're trying to get is:
...
  :
  goto ;

  :
  i_17 = (int) ivtmp_y;
  _7 = (long unsigned int) i_17;
  _8 = _7 * 4;
  _9 = pretmp_24 + _8;
  _10 = *_9;
  sum_11 = _10 + sum_y;
  i_12 = i_17 + 1;
  i.1_3 = (unsigned int) i_12;
  goto ;

  :
  ivtmp_6 = ivtmp_y + 1;
  goto ;

  :
  # sum_y = PHI <1(x), sum_11(6)>
  # ivtmp_y = PHI <0(x), ivtmp_6(6)>
  if (ivtmp_y < _20 + 1)
    goto ;
  else
    goto ;

  :
  # sum_21 = PHI
  goto ;
...
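The CFG in comment #6 can be rendered in C with labels standing in for blocks. This is a hypothetical sketch for illustration, not GCC output: `nit` stands for _20, the entry `goto` plays the role of bb x, `header` is bb y, and the PHIs become plain variables. There is no peeled copy of the body, and the exit test still relies on nit + 1 not wrapping.

```c
/* Hypothetical C rendering of the CFG in comment #6.  nit stands for _20.  */
static unsigned int
sum_exit_first_alt (const unsigned int *a, unsigned int nit)
{
  unsigned int sum = 1;      /* sum_y: value 1 on entry from bb x */
  unsigned int ivtmp = 0;    /* ivtmp_y: value 0 on entry from bb x */

  goto header;               /* bb x: enter the loop at the exit test */

body:                        /* former bb4: the loop body */
  sum += a[ivtmp];           /* sum_11 = _10 + sum_y */
  ivtmp = ivtmp + 1;         /* latch: ivtmp_6 = ivtmp_y + 1 */

header:                      /* bb y: PHIs plus the exit test */
  if (ivtmp < nit + 1u)      /* if (ivtmp_y < _20 + 1) */
    goto body;

  return sum;                /* exit block (sum_21) */
}
```

With nit == 4 the body runs five times and no statement follows the loop, which is the shape the patch aims for.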
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

--- Comment #7 from vries at gcc dot gnu.org ---
Created attachment 35078
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35078&action=edit
WIP patch

WIP patch, works on included testcase only.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443 --- Comment #8 from vries at gcc dot gnu.org --- Created attachment 35079 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35079&action=edit parloops dump with -fno-try
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443 --- Comment #9 from vries at gcc dot gnu.org --- Created attachment 35080 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35080&action=edit parloops dump with -ftry
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
  Attachment #35078|0                           |1
        is obsolete|                            |

--- Comment #10 from vries at gcc dot gnu.org ---
Created attachment 35092
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35092&action=edit
WIP patch

Updated patch, which fixes the probability/frequency. The generated code for the loopfn is now identical at the optimized dump (previously we were sinking loads into the loop nest due to the broken probability/frequency).

The main difference in generated code at the optimized dump is this:
...
   :
+  n_24 = n_5(D);
   .paral_data_store.6.a = &a;
   .paral_data_store.6.b = &b;
   .paral_data_store.6.c = &c;
-  .paral_data_store.6.D.1854 = _12;
+  .paral_data_store.6.D.1854 = n_5(D);
   __builtin_GOMP_parallel (f._loopfn.0, &.paral_data_store.6, 2, 0);
-  ivtmp_27 = (signed int) _12;
-  _29 = a[ivtmp_27];
-  _30 = b[ivtmp_27];
-  _31 = _29 + _30;
-  c[ivtmp_27] = _31;
...

That is, we increase the number of iterations by one (from n - 1 to n), and remove the peeled-off last loop iteration (the code after the __builtin_GOMP_parallel).
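A hypothetical C model of the change for the c[i] = a[i] + b[i] loop in the diff above. Here `loopfn_model` stands in for f._loopfn.0 and, purely as an assumption for illustration, splits the iterations into NTHREADS contiguous chunks that are executed one after another; this is not how libgomp or the generated loopfn actually divides work. The old scheme passes n - 1 iterations (_12) and then executes the peeled last iteration after the parallel region; the new scheme passes n and peels nothing.

```c
/* Hypothetical model; NTHREADS matches the 2 passed to
   __builtin_GOMP_parallel in the diff.  */
#define NTHREADS 2

/* Stand-in for f._loopfn.0: each "thread" handles one contiguous chunk
   of [0, niters); the chunks run sequentially here.  */
static void
loopfn_model (const int *a, const int *b, int *c, unsigned int niters)
{
  unsigned int chunk = (niters + NTHREADS - 1) / NTHREADS;

  for (unsigned int t = 0; t < NTHREADS; t++)
    {
      unsigned int lo = t * chunk;
      unsigned int hi = lo + chunk < niters ? lo + chunk : niters;
      for (unsigned int i = lo; i < hi; i++)
        c[i] = a[i] + b[i];
    }
}

/* Old scheme: n - 1 iterations in parallel, then the peeled last
   iteration (ivtmp_27 = _12).  Assumes n >= 1.  */
static void
vadd_peeled (const int *a, const int *b, int *c, unsigned int n)
{
  loopfn_model (a, b, c, n - 1);
  c[n - 1] = a[n - 1] + b[n - 1];
}

/* New scheme: all n iterations in parallel; nothing is peeled.  */
static void
vadd_alt (const int *a, const int *b, int *c, unsigned int n)
{
  loopfn_model (a, b, c, n);
}
```

Both variants fill c identically; the difference is only whether the last element is computed inside or after the parallel region.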
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
  Attachment #35092|0                           |1
        is obsolete|                            |

--- Comment #11 from vries at gcc dot gnu.org ---
Created attachment 35103
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35103&action=edit
WIP patch

Updated patch. Skips cases that it can't handle, so it's on by default now. Bootstrapped and reg-tested on x86_64, no issues found.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
  Attachment #35103|0                           |1
        is obsolete|                            |

--- Comment #12 from vries at gcc dot gnu.org ---
Created attachment 35142
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35142&action=edit
WIP patch

Updated patch. Now handles both constant and variable bounds, and lists the test-cases with variable bounds it doesn't handle. Built and reg-tested on x86_64. Still todo: reductions.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443 --- Comment #13 from vries at gcc dot gnu.org --- Created attachment 35145 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35145&action=edit WIP patch Added reduction example to testcases. Patch runs test-cases successfully.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
           Keywords|                            |patch

--- Comment #14 from vries at gcc dot gnu.org ---
https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01441.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443 --- Comment #15 from vries at gcc dot gnu.org --- Submitted updated patch: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00115.html