Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On 30/11/15 14:32, Jakub Jelinek wrote: On Mon, Nov 30, 2015 at 02:24:18PM +0100, Richard Biener wrote: OK for stage3 trunk if bootstrap and reg-test succeeds? -|| node->address_taken); +|| (node->address_taken +&& !node->parallelized_function)); please add a comment here on why this is safe. Ok with this change. BTW, __builting_GOMP_task supposedly can be treated similarly if the third argument is NULL (if 3rd arg is non-NULL, then the caller passes a different structure from what the callee receives, but perhaps it could be emulated as pretending that cpyfn is called first with address of a temporary var and the data argument and then fn is called with the address of the temporary var). Filed as PR68673 - Handle __builtin_GOMP_task optimally in ipa-pta. Can you provide testcases for both (3rd arg NULL/non-NULL) cases? I'm not fluent in openmp. Thanks, - Tom
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On 01/12/15 13:29, Jakub Jelinek wrote: On Tue, Dec 01, 2015 at 01:27:32PM +0100, Christophe Lyon wrote: >I've committed the attached patch as obvious: it adds >dg-require-effective-target fopenmp to these new tests >so that they are skipped e.g. on arm bare-metal targets >(using newlib). > >Note that pr46032.c has some failures: >FAIL: gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1 >on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*, >and on armeb-none-linux-gnueabihf > >I haven't looked at the details yet; see >http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html >for more information. Supposedly pr46032-{2,3}.c should go into testsuite/gcc.dg/gomp/ instead and pr46032.c into testsuite/gcc.dg/vect/ (with the fopenmp effective target and perhaps some other effective target conditions)? I've moved the tests, and added dg-require-effective-target vect_int in pr46032.c. Committed to trunk as obvious. Thanks, - Tom Move pr46032*.c tests 2015-12-01 Tom de Vries * gcc.dg/pr46032.c: Move to ... * gcc.dg/vect/pr46032.c: here. Add dg-require-effective-target vect_int. * gcc.dg/pr46032-2.c: Move to ... * gcc.dg/gomp/pr46032-2.c: ... here. Drop dg-require-effective-target fopenmp. * gcc.dg/pr46032-3.c: Move to ... * gcc.dg/gomp/pr46032-3.c: ... here. Drop dg-require-effective-target fopenmp. --- gcc/testsuite/gcc.dg/gomp/pr46032-2.c | 29 + gcc/testsuite/gcc.dg/gomp/pr46032-3.c | 28 gcc/testsuite/gcc.dg/pr46032-2.c | 30 - gcc/testsuite/gcc.dg/pr46032-3.c | 29 - gcc/testsuite/gcc.dg/pr46032.c| 48 -- gcc/testsuite/gcc.dg/vect/pr46032.c | 49 +++ 6 files changed, 106 insertions(+), 107 deletions(-) diff --git a/gcc/testsuite/gcc.dg/gomp/pr46032-2.c b/gcc/testsuite/gcc.dg/gomp/pr46032-2.c new file mode 100644 index 000..e110880 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/pr46032-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ + +#define N 2 + +int +foo (void) +{ + int a[N], b[N], c[N]; + int *ap = &a[0]; + int *bp = &b[0]; + int *cp = &c[0]; + +#pragma omp parallel for + for (unsigned int idx = 0; idx < N; idx++) +{ + ap[idx] = 1; + bp[idx] = 2; + cp[idx] = ap[idx]; +} + + return *cp; +} + +/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/gomp/pr46032-3.c b/gcc/testsuite/gcc.dg/gomp/pr46032-3.c new file mode 100644 index 000..a4af7ec --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/pr46032-3.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ + +#define N 2 + +int +foo (void) +{ + int a[N], c[N]; + int *ap = &a[0]; + int *bp = &a[0]; + int *cp = &c[0]; + +#pragma omp parallel for + for (unsigned int idx = 0; idx < N; idx++) +{ + ap[idx] = 1; + bp[idx] = 2; + cp[idx] = ap[idx]; +} + + return *cp; +} + +/* { dg-final { scan-tree-dump-times "\\] = 1;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr46032-2.c b/gcc/testsuite/gcc.dg/pr46032-2.c deleted file mode 100644 index d769597..000 --- a/gcc/testsuite/gcc.dg/pr46032-2.c +++ /dev/null @@ -1,30 +0,0 @@ -/* { dg-do compile } */ -/* { dg-require-effective-target fopenmp } */ -/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ - -#define N 2 - -int -foo (void) -{ - int a[N], b[N], c[N]; - int *ap = &a[0]; - int *bp = &b[0]; - int *cp = &c[0]; - -#pragma omp parallel for - for (unsigned int idx = 0; idx < N; idx++) -{ - ap[idx] = 1; - bp[idx] = 2; - cp[idx] = ap[idx]; -} - - return *cp; -} - -/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */ - diff --git a/gcc/testsuite/gcc.dg/pr46032-3.c b/gcc/testsuite/gcc.dg/pr46032-3.c deleted file mode 100644 index a9e74d0..000 --- a/gcc/testsuite/gcc.dg/pr46032-3.c +++ /dev/null @@ -1,29 +0,0 @@ -/* { dg-do compile } */ -/* { dg-require-effective-target fopenmp } */ -/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ - -#define N 2 - -int
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On Tue, Dec 01, 2015 at 01:27:32PM +0100, Christophe Lyon wrote: > I've committed the attached patch as obvious: it adds > dg-require-effective-target fopenmp to these new tests > so that they are skipped e.g. on arm bare-metal targets > (using newlib). > > Note that pr46032.c has some failures: > FAIL: gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1 > on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*, > and on armeb-none-linux-gnueabihf > > I haven't looked at the details yet; see > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html > for more information. Supposedly pr46032-{2,3}.c should go into testsuite/gcc.dg/gomp/ instead and pr46032.c into testsuite/gcc.dg/vect/ (with the fopenmp effective target and perhaps some other effective target conditions)? > 2015-12-01 Christophe Lyon > > * gcc.dg/pr46032.c: Add dg-require-effective-target fopenmp. > * gcc.dg/pr46032-2.c: Likewise. > * gcc.dg/pr46032-3.c: Likewise. Jakub
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On 30 November 2015 at 18:55, Tom de Vries wrote: > On 30/11/15 17:48, Jakub Jelinek wrote: >> >> On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote: >>> >>> +int >>> +main (void) >>> +{ >>> + unsigned results[nEvents]; >>> + unsigned pData[nEvents]; >>> + unsigned coeff = 2; >>> + >>> + init (&results[0], &pData[0]); >>> + >>> +#pragma omp parallel for >>> + for (int idx = 0; idx < (int)nEvents; idx++) >>> +results[idx] = coeff * pData[idx]; >> >> >> Could you please add another testcase, where you have say pData >> and some other pointer that init sets to alias with pData, and verify >> that such loop (would need to be say normal loop inside #pragma omp single >> or master) is not vectorized? > > > I've: > - added a simpler (not vectorizer-based) version of the testcase as > pr46032-2.c, and > - copied pr46032-2.c to pr46032-3.c and modified it such that two > pointers are aliasing > > Committed to trunk. > Hi, I've committed the attached patch as obvious: it adds dg-require-effective-target fopenmp to these new tests so that they are skipped e.g. on arm bare-metal targets (using newlib). Note that pr46032.c has some failures: FAIL: gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1 on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*, and on armeb-none-linux-gnueabihf I haven't looked at the details yet; see http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html for more information. Thanks, Christophe. 2015-12-01 Christophe Lyon * gcc.dg/pr46032.c: Add dg-require-effective-target fopenmp. * gcc.dg/pr46032-2.c: Likewise. * gcc.dg/pr46032-3.c: Likewise. > Thanks, > - Tom > Index: gcc/testsuite/gcc.dg/pr46032-2.c === --- gcc/testsuite/gcc.dg/pr46032-2.c(revision 231108) +++ gcc/testsuite/gcc.dg/pr46032-2.c(working copy) @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target fopenmp } */ /* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ #define N 2 Index: gcc/testsuite/gcc.dg/pr46032-3.c === --- gcc/testsuite/gcc.dg/pr46032-3.c(revision 231108) +++ gcc/testsuite/gcc.dg/pr46032-3.c(working copy) @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target fopenmp } */ /* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ #define N 2 Index: gcc/testsuite/gcc.dg/pr46032.c === --- gcc/testsuite/gcc.dg/pr46032.c (revision 231108) +++ gcc/testsuite/gcc.dg/pr46032.c (working copy) @@ -1,4 +1,5 @@ /* { dg-do compile } */ +/* { dg-require-effective-target fopenmp } */ /* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */ extern void abort (void);
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On 30/11/15 17:48, Jakub Jelinek wrote: On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote: +int +main (void) +{ + unsigned results[nEvents]; + unsigned pData[nEvents]; + unsigned coeff = 2; + + init (&results[0], &pData[0]); + +#pragma omp parallel for + for (int idx = 0; idx < (int)nEvents; idx++) +results[idx] = coeff * pData[idx]; Could you please add another testcase, where you have say pData and some other pointer that init sets to alias with pData, and verify that such loop (would need to be say normal loop inside #pragma omp single or master) is not vectorized? I've: - added a simpler (not vectorizer-based) version of the testcase as pr46032-2.c, and - copied pr46032-2.c to pr46032-3.c and modified it such that two pointers are aliasing Committed to trunk. Thanks, - Tom Add gcc.dg/pr46032-{2,3}.c test-cases 2015-11-30 Tom de Vries * gcc.dg/pr46032-2.c: New test. * gcc.dg/pr46032-3.c: New test. --- gcc/testsuite/gcc.dg/pr46032-2.c | 29 + gcc/testsuite/gcc.dg/pr46032-3.c | 28 2 files changed, 57 insertions(+) diff --git a/gcc/testsuite/gcc.dg/pr46032-2.c b/gcc/testsuite/gcc.dg/pr46032-2.c new file mode 100644 index 000..e110880 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr46032-2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ + +#define N 2 + +int +foo (void) +{ + int a[N], b[N], c[N]; + int *ap = &a[0]; + int *bp = &b[0]; + int *cp = &c[0]; + +#pragma omp parallel for + for (unsigned int idx = 0; idx < N; idx++) +{ + ap[idx] = 1; + bp[idx] = 2; + cp[idx] = ap[idx]; +} + + return *cp; +} + +/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */ + diff --git a/gcc/testsuite/gcc.dg/pr46032-3.c b/gcc/testsuite/gcc.dg/pr46032-3.c new file mode 100644 index 000..a4af7ec --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr46032-3.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */ + +#define N 2 + +int +foo (void) +{ + int a[N], c[N]; + int *ap = &a[0]; + int *bp = &a[0]; + int *cp = &c[0]; + +#pragma omp parallel for + for (unsigned int idx = 0; idx < N; idx++) +{ + ap[idx] = 1; + bp[idx] = 2; + cp[idx] = ap[idx]; +} + + return *cp; +} + +/* { dg-final { scan-tree-dump-times "\\] = 1;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote: > +int > +main (void) > +{ > + unsigned results[nEvents]; > + unsigned pData[nEvents]; > + unsigned coeff = 2; > + > + init (&results[0], &pData[0]); > + > +#pragma omp parallel for > + for (int idx = 0; idx < (int)nEvents; idx++) > +results[idx] = coeff * pData[idx]; Could you please add another testcase, where you have say pData and some other pointer that init sets to alias with pData, and verify that such loop (would need to be say normal loop inside #pragma omp single or master) is not vectorized? Jakub
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On 30/11/15 14:24, Richard Biener wrote: On Mon, 30 Nov 2015, Tom de Vries wrote: On 30/11/15 10:16, Richard Biener wrote: On Mon, 30 Nov 2015, Tom de Vries wrote: Hi, this patch fixes PR46032. It handles a call: ... __builtin_GOMP_parallel (fn, data, num_threads, flags) ... as: ... fn (data) ... in ipa-pta. This improves ipa-pta alias analysis in the parallelized function fn, and allows vectorization in the testcase without a runtime alias test. Bootstrapped and reg-tested on x86_64. OK for stage3 trunk? + /* Assign the passed argument to the appropriate incoming +parameter of the function. */ + struct constraint_expr lhs ; + lhs = get_function_part_constraint (fi, fi_parm_base + 0); + auto_vec rhsc; + struct constraint_expr *rhsp; + get_constraint_for_rhs (arg, &rhsc); + while (rhsc.length () != 0) + { + rhsp = &rhsc.last (); + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.pop (); + } please use style used elsewhere with FOR_EACH_VEC_ELT (rhsc, j, rhsp) process_constraint (new_constraint (lhs, *rhsp)); rhsc.truncate (0); That code was copied from find_func_aliases_for_call. I've factored out the bit that I copied as find_func_aliases_for_call_arg, and fixed the style there (and dropped 'rhsc.truncate (0)' since AFAIU it's redundant at the end of a function). + /* Parameter passed by value is used. */ + lhs = get_function_part_constraint (fi, fi_uses); + struct constraint_expr *rhsp; + get_constraint_for_address_of (arg, &rhsc); This isn't correct - you want to use get_constraint_for (arg, &rhsc). After all rhs is already an ADDR_EXPR. Can we add an assert somewhere to detect this incorrect usage? + FOR_EACH_VEC_ELT (rhsc, j, rhsp) + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.truncate (0); + + /* The caller clobbers what the callee does. */ + lhs = get_function_part_constraint (fi, fi_clobbers); + rhs = get_function_part_constraint (cfi, fi_clobbers); + process_constraint (new_constraint (lhs, rhs)); + + /* The caller uses what the callee does. */ + lhs = get_function_part_constraint (fi, fi_uses); + rhs = get_function_part_constraint (cfi, fi_uses); + process_constraint (new_constraint (lhs, rhs)); I don't see why you need those. The solver should compute these in even better precision (context sensitive on the call side). The same is true for the function parameter. That is, the only needed part of the patch should be that making sure we see the "direct" call and assign parameters correctly. Dropped this bit. OK for stage3 trunk if bootstrap and reg-test succeeds? -|| node->address_taken); +|| (node->address_taken +&& !node->parallelized_function)); please add a comment here on why this is safe. Ok with this change. Updated with comment, committed as attached. Thanks, - Tom Handle BUILT_IN_GOMP_PARALLEL in ipa-pta 2015-11-30 Tom de Vries PR tree-optimization/46032 * tree-ssa-structalias.c (find_func_aliases_for_call_arg): New function, factored out of ... (find_func_aliases_for_call): ... here. (find_func_aliases_for_builtin_call, find_func_clobbers): Handle BUILT_IN_GOMP_PARALLEL. (ipa_pta_execute): Same. Handle node->parallelized_function as a local function. * gcc.dg/pr46032.c: New test. * testsuite/libgomp.c/pr46032.c: New test. --- gcc/testsuite/gcc.dg/pr46032.c| 47 +++ gcc/tree-ssa-structalias.c| 71 --- libgomp/testsuite/libgomp.c/pr46032.c | 44 ++ 3 files changed, 149 insertions(+), 13 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr46032.c b/gcc/testsuite/gcc.dg/pr46032.c new file mode 100644 index 000..b91190e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr46032.c @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */ + +extern void abort (void); + +#define nEvents 1000 + +static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize"))) +init (unsigned *results, unsigned *pData) +{ + unsigned int i; + for (i = 0; i < nEvents; ++i) +pData[i] = i % 3; +} + +static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize"))) +check (unsigned *results) +{ + unsigned sum = 0; + for (int idx = 0; idx < (int)nEvents; idx++) +sum += results[idx]; + + if (sum != 1998) +abort (); +} + +int +main (void) +{ + unsigned results[nEvents]; + unsigned pData[nEvents]; + unsigned coeff = 2; + + init (&results[0], &pData[
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On Mon, Nov 30, 2015 at 02:24:18PM +0100, Richard Biener wrote: > > OK for stage3 trunk if bootstrap and reg-test succeeds? > > -|| node->address_taken); > +|| (node->address_taken > +&& !node->parallelized_function)); > > please add a comment here on why this is safe. > > Ok with this change. BTW, __builting_GOMP_task supposedly can be treated similarly if the third argument is NULL (if 3rd arg is non-NULL, then the caller passes a different structure from what the callee receives, but perhaps it could be emulated as pretending that cpyfn is called first with address of a temporary var and the data argument and then fn is called with the address of the temporary var). Jakub
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On Mon, 30 Nov 2015, Tom de Vries wrote: > On 30/11/15 10:16, Richard Biener wrote: > > On Mon, 30 Nov 2015, Tom de Vries wrote: > > > > > Hi, > > > > > > this patch fixes PR46032. > > > > > > It handles a call: > > > ... > > >__builtin_GOMP_parallel (fn, data, num_threads, flags) > > > ... > > > as: > > > ... > > >fn (data) > > > ... > > > in ipa-pta. > > > > > > This improves ipa-pta alias analysis in the parallelized function fn, and > > > allows vectorization in the testcase without a runtime alias test. > > > > > > Bootstrapped and reg-tested on x86_64. > > > > > > OK for stage3 trunk? > > > > + /* Assign the passed argument to the appropriate incoming > > +parameter of the function. */ > > + struct constraint_expr lhs ; > > + lhs = get_function_part_constraint (fi, fi_parm_base + 0); > > + auto_vec rhsc; > > + struct constraint_expr *rhsp; > > + get_constraint_for_rhs (arg, &rhsc); > > + while (rhsc.length () != 0) > > + { > > + rhsp = &rhsc.last (); > > + process_constraint (new_constraint (lhs, *rhsp)); > > + rhsc.pop (); > > + } > > > > please use style used elsewhere with > > > > FOR_EACH_VEC_ELT (rhsc, j, rhsp) > > process_constraint (new_constraint (lhs, *rhsp)); > > rhsc.truncate (0); > > > > That code was copied from find_func_aliases_for_call. > I've factored out the bit that I copied as find_func_aliases_for_call_arg, and > fixed the style there (and dropped 'rhsc.truncate (0)' since AFAIU it's > redundant at the end of a function). > > > + /* Parameter passed by value is used. */ > > + lhs = get_function_part_constraint (fi, fi_uses); > > + struct constraint_expr *rhsp; > > + get_constraint_for_address_of (arg, &rhsc); > > > > This isn't correct - you want to use get_constraint_for (arg, &rhsc). > > After all rhs is already an ADDR_EXPR. > > > > Can we add an assert somewhere to detect this incorrect usage? > > > + FOR_EACH_VEC_ELT (rhsc, j, rhsp) > > + process_constraint (new_constraint (lhs, *rhsp)); > > + rhsc.truncate (0); > > + > > + /* The caller clobbers what the callee does. */ > > + lhs = get_function_part_constraint (fi, fi_clobbers); > > + rhs = get_function_part_constraint (cfi, fi_clobbers); > > + process_constraint (new_constraint (lhs, rhs)); > > + > > + /* The caller uses what the callee does. */ > > + lhs = get_function_part_constraint (fi, fi_uses); > > + rhs = get_function_part_constraint (cfi, fi_uses); > > + process_constraint (new_constraint (lhs, rhs)); > > > > I don't see why you need those. The solver should compute these > > in even better precision (context sensitive on the call side). > > > > The same is true for the function parameter. That is, the only > > needed part of the patch should be that making sure we see > > the "direct" call and assign parameters correctly. > > > > Dropped this bit. > > OK for stage3 trunk if bootstrap and reg-test succeeds? -|| node->address_taken); +|| (node->address_taken +&& !node->parallelized_function)); please add a comment here on why this is safe. Ok with this change. Thanks, Richard.
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On 30/11/15 10:16, Richard Biener wrote: On Mon, 30 Nov 2015, Tom de Vries wrote: Hi, this patch fixes PR46032. It handles a call: ... __builtin_GOMP_parallel (fn, data, num_threads, flags) ... as: ... fn (data) ... in ipa-pta. This improves ipa-pta alias analysis in the parallelized function fn, and allows vectorization in the testcase without a runtime alias test. Bootstrapped and reg-tested on x86_64. OK for stage3 trunk? + /* Assign the passed argument to the appropriate incoming +parameter of the function. */ + struct constraint_expr lhs ; + lhs = get_function_part_constraint (fi, fi_parm_base + 0); + auto_vec rhsc; + struct constraint_expr *rhsp; + get_constraint_for_rhs (arg, &rhsc); + while (rhsc.length () != 0) + { + rhsp = &rhsc.last (); + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.pop (); + } please use style used elsewhere with FOR_EACH_VEC_ELT (rhsc, j, rhsp) process_constraint (new_constraint (lhs, *rhsp)); rhsc.truncate (0); That code was copied from find_func_aliases_for_call. I've factored out the bit that I copied as find_func_aliases_for_call_arg, and fixed the style there (and dropped 'rhsc.truncate (0)' since AFAIU it's redundant at the end of a function). + /* Parameter passed by value is used. */ + lhs = get_function_part_constraint (fi, fi_uses); + struct constraint_expr *rhsp; + get_constraint_for_address_of (arg, &rhsc); This isn't correct - you want to use get_constraint_for (arg, &rhsc). After all rhs is already an ADDR_EXPR. Can we add an assert somewhere to detect this incorrect usage? + FOR_EACH_VEC_ELT (rhsc, j, rhsp) + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.truncate (0); + + /* The caller clobbers what the callee does. */ + lhs = get_function_part_constraint (fi, fi_clobbers); + rhs = get_function_part_constraint (cfi, fi_clobbers); + process_constraint (new_constraint (lhs, rhs)); + + /* The caller uses what the callee does. */ + lhs = get_function_part_constraint (fi, fi_uses); + rhs = get_function_part_constraint (cfi, fi_uses); + process_constraint (new_constraint (lhs, rhs)); I don't see why you need those. The solver should compute these in even better precision (context sensitive on the call side). The same is true for the function parameter. That is, the only needed part of the patch should be that making sure we see the "direct" call and assign parameters correctly. Dropped this bit. OK for stage3 trunk if bootstrap and reg-test succeeds? Thanks, - Tom Handle BUILT_IN_GOMP_PARALLEL in ipa-pta 2015-11-30 Tom de Vries PR tree-optimization/46032 * tree-ssa-structalias.c (find_func_aliases_for_call_arg): New function, factored out of ... (find_func_aliases_for_call): ... here. (find_func_aliases_for_builtin_call, find_func_clobbers): Handle BUILT_IN_GOMP_PARALLEL. (ipa_pta_execute): Same. Handle node->parallelized_function as a local function. * gcc.dg/pr46032.c: New test. * testsuite/libgomp.c/pr46032.c: New test. --- gcc/testsuite/gcc.dg/pr46032.c| 47 +++ gcc/tree-ssa-structalias.c| 60 +++ libgomp/testsuite/libgomp.c/pr46032.c | 44 + 3 files changed, 138 insertions(+), 13 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr46032.c b/gcc/testsuite/gcc.dg/pr46032.c new file mode 100644 index 000..b91190e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr46032.c @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */ + +extern void abort (void); + +#define nEvents 1000 + +static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize"))) +init (unsigned *results, unsigned *pData) +{ + unsigned int i; + for (i = 0; i < nEvents; ++i) +pData[i] = i % 3; +} + +static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize"))) +check (unsigned *results) +{ + unsigned sum = 0; + for (int idx = 0; idx < (int)nEvents; idx++) +sum += results[idx]; + + if (sum != 1998) +abort (); +} + +int +main (void) +{ + unsigned results[nEvents]; + unsigned pData[nEvents]; + unsigned coeff = 2; + + init (&results[0], &pData[0]); + +#pragma omp parallel for + for (int idx = 0; idx < (int)nEvents; idx++) +results[idx] = coeff * pData[idx]; + + check (&results[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loop" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" } } */ + diff --git a/gcc/tree-ssa-struc
Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
On Mon, 30 Nov 2015, Tom de Vries wrote: > Hi, > > this patch fixes PR46032. > > It handles a call: > ... > __builtin_GOMP_parallel (fn, data, num_threads, flags) > ... > as: > ... > fn (data) > ... > in ipa-pta. > > This improves ipa-pta alias analysis in the parallelized function fn, and > allows vectorization in the testcase without a runtime alias test. > > Bootstrapped and reg-tested on x86_64. > > OK for stage3 trunk? + /* Assign the passed argument to the appropriate incoming +parameter of the function. */ + struct constraint_expr lhs ; + lhs = get_function_part_constraint (fi, fi_parm_base + 0); + auto_vec rhsc; + struct constraint_expr *rhsp; + get_constraint_for_rhs (arg, &rhsc); + while (rhsc.length () != 0) + { + rhsp = &rhsc.last (); + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.pop (); + } please use style used elsewhere with FOR_EACH_VEC_ELT (rhsc, j, rhsp) process_constraint (new_constraint (lhs, *rhsp)); rhsc.truncate (0); + /* Parameter passed by value is used. */ + lhs = get_function_part_constraint (fi, fi_uses); + struct constraint_expr *rhsp; + get_constraint_for_address_of (arg, &rhsc); This isn't correct - you want to use get_constraint_for (arg, &rhsc). After all rhs is already an ADDR_EXPR. + FOR_EACH_VEC_ELT (rhsc, j, rhsp) + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.truncate (0); + + /* The caller clobbers what the callee does. */ + lhs = get_function_part_constraint (fi, fi_clobbers); + rhs = get_function_part_constraint (cfi, fi_clobbers); + process_constraint (new_constraint (lhs, rhs)); + + /* The caller uses what the callee does. */ + lhs = get_function_part_constraint (fi, fi_uses); + rhs = get_function_part_constraint (cfi, fi_uses); + process_constraint (new_constraint (lhs, rhs)); I don't see why you need those. The solver should compute these in even better precision (context sensitive on the call side). The same is true for the function parameter. That is, the only needed part of the patch should be that making sure we see the "direct" call and assign parameters correctly. Richard.
[PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta
Hi, this patch fixes PR46032. It handles a call: ... __builtin_GOMP_parallel (fn, data, num_threads, flags) ... as: ... fn (data) ... in ipa-pta. This improves ipa-pta alias analysis in the parallelized function fn, and allows vectorization in the testcase without a runtime alias test. Bootstrapped and reg-tested on x86_64. OK for stage3 trunk? Thanks, - Tom Handle BUILT_IN_GOMP_PARALLEL in pta 2015-11-30 Tom de Vries PR tree-optimization/46032 * tree-ssa-structalias.c (find_func_aliases_for_builtin_call) (find_func_clobbers): Handle BUILT_IN_GOMP_PARALLEL. (ipa_pta_execute): Same. Handle node->parallelized_function as a local function. * gcc.dg/pr46032.c: New test. * testsuite/libgomp.c/pr46032.c: New test. --- gcc/testsuite/gcc.dg/pr46032.c| 47 ++ gcc/tree-ssa-structalias.c| 73 ++- libgomp/testsuite/libgomp.c/pr46032.c | 44 + 3 files changed, 162 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr46032.c b/gcc/testsuite/gcc.dg/pr46032.c new file mode 100644 index 000..b91190e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr46032.c @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */ + +extern void abort (void); + +#define nEvents 1000 + +static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize"))) +init (unsigned *results, unsigned *pData) +{ + unsigned int i; + for (i = 0; i < nEvents; ++i) +pData[i] = i % 3; +} + +static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize"))) +check (unsigned *results) +{ + unsigned sum = 0; + for (int idx = 0; idx < (int)nEvents; idx++) +sum += results[idx]; + + if (sum != 1998) +abort (); +} + +int +main (void) +{ + unsigned results[nEvents]; + unsigned pData[nEvents]; + unsigned coeff = 2; + + init (&results[0], &pData[0]); + +#pragma omp parallel for + for (int idx = 0; idx < (int)nEvents; idx++) +results[idx] = coeff * pData[idx]; + + check (&results[0]); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "note: vectorized 1 loop" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" } } */ + diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c index f24ebeb..3fe538b 100644 --- a/gcc/tree-ssa-structalias.c +++ b/gcc/tree-ssa-structalias.c @@ -4488,6 +4488,39 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t) } return true; } + case BUILT_IN_GOMP_PARALLEL: + { + /* Handle + __builtin_GOMP_parallel (fn, data, num_threads, flags). */ + if (in_ipa_mode) + { + tree fnarg = gimple_call_arg (t, 0); + gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); + tree fndecl = TREE_OPERAND (fnarg, 0); + varinfo_t fi = get_vi_for_tree (fndecl); + tree arg = gimple_call_arg (t, 1); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR); + + /* Assign the passed argument to the appropriate incoming + parameter of the function. */ + struct constraint_expr lhs ; + lhs = get_function_part_constraint (fi, fi_parm_base + 0); + auto_vec rhsc; + struct constraint_expr *rhsp; + get_constraint_for_rhs (arg, &rhsc); + while (rhsc.length () != 0) + { + rhsp = &rhsc.last (); + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.pop (); + } + + return true; + } + /* Else fallthru to generic handling which will let + the frame escape. */ + break; + } /* printf-style functions may have hooks to set pointers to point to somewhere into the generated string. Leave them for a later exercise... */ @@ -5036,6 +5069,37 @@ find_func_clobbers (struct function *fn, gimple *origt) case BUILT_IN_VA_START: case BUILT_IN_VA_END: return; + case BUILT_IN_GOMP_PARALLEL: + { + /* Handle + __builtin_GOMP_parallel (fn, data, num_threads, flags). */ + tree fnarg = gimple_call_arg (t, 0); + gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR); + tree fndecl = TREE_OPERAND (fnarg, 0); + varinfo_t cfi = get_vi_for_tree (fndecl); + tree arg = gimple_call_arg (t, 1); + gcc_assert (TREE_CODE (arg) == ADDR_EXPR); + + /* Parameter passed by value is used. */ + lhs = get_function_part_constraint (fi, fi_uses); + struct constraint_expr *rhsp; + get_constraint_for_address_of (arg, &rhsc); + FOR_EACH_VEC_ELT (rhsc, j, rhsp) + process_constraint (new_constraint (lhs, *rhsp)); + rhsc.truncate (0); + + /* The caller clobbers what the callee does. */ + lhs = get_function_part_constraint (fi, fi_clobbers); + rhs = get_function_part_constraint (cfi, fi_clobbers); + process_constraint (new_constraint (lhs, rhs)); + + /* The caller uses what the callee does. */ + lhs = get_function_part_constraint (fi, fi