Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-03 Thread Tom de Vries

On 30/11/15 14:32, Jakub Jelinek wrote:

On Mon, Nov 30, 2015 at 02:24:18PM +0100, Richard Biener wrote:

OK for stage3 trunk if bootstrap and reg-test succeeds?


-|| node->address_taken);
+|| (node->address_taken
+&& !node->parallelized_function));

please add a comment here on why this is safe.

Ok with this change.


BTW, __builting_GOMP_task supposedly can be treated similarly
if the third argument is NULL (if 3rd arg is non-NULL, then
the caller passes a different structure from what the callee receives,
but perhaps it could be emulated as pretending that cpyfn is called first
with address of a temporary var and the data argument and then fn
is called with the address of the temporary var).


Filed as PR68673 - Handle __builtin_GOMP_task optimally in ipa-pta.

Can you provide testcases for both (3rd arg NULL/non-NULL) cases? I'm 
not fluent in openmp.


Thanks,
- Tom


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-01 Thread Tom de Vries

On 01/12/15 13:29, Jakub Jelinek wrote:

On Tue, Dec 01, 2015 at 01:27:32PM +0100, Christophe Lyon wrote:

>I've committed the attached patch as obvious: it adds
>dg-require-effective-target fopenmp to these new tests
>so that they are skipped e.g. on arm bare-metal targets
>(using newlib).
>
>Note that pr46032.c has some failures:
>FAIL:  gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1
>on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*,
>and on armeb-none-linux-gnueabihf
>
>I haven't looked at the details yet; see
>http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html
>for more information.



Supposedly pr46032-{2,3}.c should go into testsuite/gcc.dg/gomp/ instead and
pr46032.c into testsuite/gcc.dg/vect/ (with the fopenmp effective target and
perhaps some other effective target conditions)?


I've moved the tests, and added dg-require-effective-target vect_int in 
pr46032.c.


Committed to trunk as obvious.

Thanks,
- Tom
Move pr46032*.c tests

2015-12-01  Tom de Vries  

	* gcc.dg/pr46032.c: Move to ...
	* gcc.dg/vect/pr46032.c: here.  Add dg-require-effective-target
	vect_int.
	* gcc.dg/pr46032-2.c: Move to ...
	* gcc.dg/gomp/pr46032-2.c: ... here.  Drop dg-require-effective-target fopenmp.
	* gcc.dg/pr46032-3.c: Move to ...
	* gcc.dg/gomp/pr46032-3.c: ... here.  Drop dg-require-effective-target fopenmp.

---
 gcc/testsuite/gcc.dg/gomp/pr46032-2.c | 29 +
 gcc/testsuite/gcc.dg/gomp/pr46032-3.c | 28 
 gcc/testsuite/gcc.dg/pr46032-2.c  | 30 -
 gcc/testsuite/gcc.dg/pr46032-3.c  | 29 -
 gcc/testsuite/gcc.dg/pr46032.c| 48 --
 gcc/testsuite/gcc.dg/vect/pr46032.c   | 49 +++
 6 files changed, 106 insertions(+), 107 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/gomp/pr46032-2.c b/gcc/testsuite/gcc.dg/gomp/pr46032-2.c
new file mode 100644
index 000..e110880
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr46032-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+int
+foo (void)
+{
+  int a[N], b[N], c[N];
+  int *ap = &a[0];
+  int *bp = &b[0];
+  int *cp = &c[0];
+
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+{
+  ap[idx] = 1;
+  bp[idx] = 2;
+  cp[idx] = ap[idx];
+}
+
+  return *cp;
+}
+
+/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/pr46032-3.c b/gcc/testsuite/gcc.dg/gomp/pr46032-3.c
new file mode 100644
index 000..a4af7ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/pr46032-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+int
+foo (void)
+{
+  int a[N], c[N];
+  int *ap = &a[0];
+  int *bp = &a[0];
+  int *cp = &c[0];
+
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+{
+  ap[idx] = 1;
+  bp[idx] = 2;
+  cp[idx] = ap[idx];
+}
+
+  return *cp;
+}
+
+/* { dg-final { scan-tree-dump-times "\\] = 1;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr46032-2.c b/gcc/testsuite/gcc.dg/pr46032-2.c
deleted file mode 100644
index d769597..000
--- a/gcc/testsuite/gcc.dg/pr46032-2.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target fopenmp } */
-/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
-
-#define N 2
-
-int
-foo (void)
-{
-  int a[N], b[N], c[N];
-  int *ap = &a[0];
-  int *bp = &b[0];
-  int *cp = &c[0];
-
-#pragma omp parallel for
-  for (unsigned int idx = 0; idx < N; idx++)
-{
-  ap[idx] = 1;
-  bp[idx] = 2;
-  cp[idx] = ap[idx];
-}
-
-  return *cp;
-}
-
-/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
-
diff --git a/gcc/testsuite/gcc.dg/pr46032-3.c b/gcc/testsuite/gcc.dg/pr46032-3.c
deleted file mode 100644
index a9e74d0..000
--- a/gcc/testsuite/gcc.dg/pr46032-3.c
+++ /dev/null
@@ -1,29 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target fopenmp } */
-/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
-
-#define N 2
-
-int

Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 01:27:32PM +0100, Christophe Lyon wrote:
> I've committed the attached patch as obvious: it adds
> dg-require-effective-target fopenmp to these new tests
> so that they are skipped e.g. on arm bare-metal targets
> (using newlib).
> 
> Note that pr46032.c has some failures:
> FAIL:  gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1
> on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*,
> and on armeb-none-linux-gnueabihf
> 
> I haven't looked at the details yet; see
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html
> for more information.

Supposedly pr46032-{2,3}.c should go into testsuite/gcc.dg/gomp/ instead and
pr46032.c into testsuite/gcc.dg/vect/ (with the fopenmp effective target and
perhaps some other effective target conditions)?
> 2015-12-01  Christophe Lyon  
> 
> * gcc.dg/pr46032.c: Add dg-require-effective-target fopenmp.
> * gcc.dg/pr46032-2.c: Likewise.
> * gcc.dg/pr46032-3.c: Likewise.

Jakub


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-12-01 Thread Christophe Lyon
On 30 November 2015 at 18:55, Tom de Vries  wrote:
> On 30/11/15 17:48, Jakub Jelinek wrote:
>>
>> On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote:
>>>
>>> +int
>>> +main (void)
>>> +{
>>> +  unsigned results[nEvents];
>>> +  unsigned pData[nEvents];
>>> +  unsigned coeff = 2;
>>> +
>>> +  init (&results[0], &pData[0]);
>>> +
>>> +#pragma omp parallel for
>>> +  for (int idx = 0; idx < (int)nEvents; idx++)
>>> +results[idx] = coeff * pData[idx];
>>
>>
>> Could you please add another testcase, where you have say pData
>> and some other pointer that init sets to alias with pData, and verify
>> that such loop (would need to be say normal loop inside #pragma omp single
>> or master) is not vectorized?
>
>
> I've:
> - added a simpler (not vectorizer-based) version of the testcase as
>   pr46032-2.c, and
> - copied pr46032-2.c to pr46032-3.c and modified it such that two
>   pointers are aliasing
>
> Committed to trunk.
>

Hi,

I've committed the attached patch as obvious: it adds
dg-require-effective-target fopenmp to these new tests
so that they are skipped e.g. on arm bare-metal targets
(using newlib).

Note that pr46032.c has some failures:
FAIL:  gcc.dg/pr46032.c scan-tree-dump-times vect "note: vectorized 1 loop" 1
on arm-none-linux-gnueabi, on arm-none-linux-gnueabihf with -mfpu=vfp*,
and on armeb-none-linux-gnueabihf

I haven't looked at the details yet; see
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/231076/report-build-info.html
for more information.

Thanks,

Christophe.

2015-12-01  Christophe Lyon  

* gcc.dg/pr46032.c: Add dg-require-effective-target fopenmp.
* gcc.dg/pr46032-2.c: Likewise.
* gcc.dg/pr46032-3.c: Likewise.


> Thanks,
> - Tom
>
Index: gcc/testsuite/gcc.dg/pr46032-2.c
===
--- gcc/testsuite/gcc.dg/pr46032-2.c(revision 231108)
+++ gcc/testsuite/gcc.dg/pr46032-2.c(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
 
 #define N 2
Index: gcc/testsuite/gcc.dg/pr46032-3.c
===
--- gcc/testsuite/gcc.dg/pr46032-3.c(revision 231108)
+++ gcc/testsuite/gcc.dg/pr46032-3.c(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
 
 #define N 2
Index: gcc/testsuite/gcc.dg/pr46032.c
===
--- gcc/testsuite/gcc.dg/pr46032.c  (revision 231108)
+++ gcc/testsuite/gcc.dg/pr46032.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
 /* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta 
-fdump-tree-vect-all" } */
 
 extern void abort (void);


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Tom de Vries

On 30/11/15 17:48, Jakub Jelinek wrote:

On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote:

+int
+main (void)
+{
+  unsigned results[nEvents];
+  unsigned pData[nEvents];
+  unsigned coeff = 2;
+
+  init (&results[0], &pData[0]);
+
+#pragma omp parallel for
+  for (int idx = 0; idx < (int)nEvents; idx++)
+results[idx] = coeff * pData[idx];


Could you please add another testcase, where you have say pData
and some other pointer that init sets to alias with pData, and verify
that such loop (would need to be say normal loop inside #pragma omp single
or master) is not vectorized?


I've:
- added a simpler (not vectorizer-based) version of the testcase as
  pr46032-2.c, and
- copied pr46032-2.c to pr46032-3.c and modified it such that two
  pointers are aliasing

Committed to trunk.

Thanks,
- Tom

Add gcc.dg/pr46032-{2,3}.c test-cases

2015-11-30  Tom de Vries  

	* gcc.dg/pr46032-2.c: New test.
	* gcc.dg/pr46032-3.c: New test.

---
 gcc/testsuite/gcc.dg/pr46032-2.c | 29 +
 gcc/testsuite/gcc.dg/pr46032-3.c | 28 
 2 files changed, 57 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr46032-2.c b/gcc/testsuite/gcc.dg/pr46032-2.c
new file mode 100644
index 000..e110880
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr46032-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+int
+foo (void)
+{
+  int a[N], b[N], c[N];
+  int *ap = &a[0];
+  int *bp = &b[0];
+  int *cp = &c[0];
+
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+{
+  ap[idx] = 1;
+  bp[idx] = 2;
+  cp[idx] = ap[idx];
+}
+
+  return *cp;
+}
+
+/* { dg-final { scan-tree-dump-times "\\] = 1;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/pr46032-3.c b/gcc/testsuite/gcc.dg/pr46032-3.c
new file mode 100644
index 000..a4af7ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr46032-3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -std=c99 -fipa-pta -fdump-tree-optimized" } */
+
+#define N 2
+
+int
+foo (void)
+{
+  int a[N], c[N];
+  int *ap = &a[0];
+  int *bp = &a[0];
+  int *cp = &c[0];
+
+#pragma omp parallel for
+  for (unsigned int idx = 0; idx < N; idx++)
+{
+  ap[idx] = 1;
+  bp[idx] = 2;
+  cp[idx] = ap[idx];
+}
+
+  return *cp;
+}
+
+/* { dg-final { scan-tree-dump-times "\\] = 1;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = 2;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = _\[0-9\]*;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\] = " 3 "optimized" } } */


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 05:36:25PM +0100, Tom de Vries wrote:
> +int
> +main (void)
> +{
> +  unsigned results[nEvents];
> +  unsigned pData[nEvents];
> +  unsigned coeff = 2;
> +
> +  init (&results[0], &pData[0]);
> +
> +#pragma omp parallel for
> +  for (int idx = 0; idx < (int)nEvents; idx++)
> +results[idx] = coeff * pData[idx];

Could you please add another testcase, where you have say pData
and some other pointer that init sets to alias with pData, and verify
that such loop (would need to be say normal loop inside #pragma omp single
or master) is not vectorized?

Jakub


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Tom de Vries

On 30/11/15 14:24, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


On 30/11/15 10:16, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


Hi,

this patch fixes PR46032.

It handles a call:
...
__builtin_GOMP_parallel (fn, data, num_threads, flags)
...
as:
...
fn (data)
...
in ipa-pta.

This improves ipa-pta alias analysis in the parallelized function fn, and
allows vectorization in the testcase without a runtime alias test.

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?


+ /* Assign the passed argument to the appropriate incoming
+parameter of the function.  */
+ struct constraint_expr lhs ;
+ lhs = get_function_part_constraint (fi, fi_parm_base + 0);
+ auto_vec rhsc;
+ struct constraint_expr *rhsp;
+ get_constraint_for_rhs (arg, &rhsc);
+ while (rhsc.length () != 0)
+   {
+ rhsp = &rhsc.last ();
+ process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.pop ();
+   }

please use style used elsewhere with

   FOR_EACH_VEC_ELT (rhsc, j, rhsp)
 process_constraint (new_constraint (lhs, *rhsp));
   rhsc.truncate (0);



That code was copied from find_func_aliases_for_call.
I've factored out the bit that I copied as find_func_aliases_for_call_arg, and
fixed the style there (and dropped 'rhsc.truncate (0)' since AFAIU it's
redundant at the end of a function).


+ /* Parameter passed by value is used.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ get_constraint_for_address_of (arg, &rhsc);

This isn't correct - you want to use get_constraint_for (arg, &rhsc).
After all rhs is already an ADDR_EXPR.



Can we add an assert somewhere to detect this incorrect usage?


+ FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.truncate (0);
+
+ /* The caller clobbers what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_clobbers);
+ rhs = get_function_part_constraint (cfi, fi_clobbers);
+ process_constraint (new_constraint (lhs, rhs));
+
+ /* The caller uses what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ rhs = get_function_part_constraint (cfi, fi_uses);
+ process_constraint (new_constraint (lhs, rhs));

I don't see why you need those.  The solver should compute these
in even better precision (context sensitive on the call side).

The same is true for the function parameter.  That is, the only
needed part of the patch should be that making sure we see
the "direct" call and assign parameters correctly.



Dropped this bit.

OK for stage3 trunk if bootstrap and reg-test succeeds?


-|| node->address_taken);
+|| (node->address_taken
+&& !node->parallelized_function));

please add a comment here on why this is safe.

Ok with this change.


Updated with comment, committed as attached.

Thanks,
- Tom


Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30  Tom de Vries  

	PR tree-optimization/46032
	* tree-ssa-structalias.c (find_func_aliases_for_call_arg): New function,
	factored out of ...
	(find_func_aliases_for_call): ... here.
	(find_func_aliases_for_builtin_call, find_func_clobbers): Handle
	BUILT_IN_GOMP_PARALLEL.
	(ipa_pta_execute): Same.  Handle node->parallelized_function as a local
	function.

	* gcc.dg/pr46032.c: New test.

	* testsuite/libgomp.c/pr46032.c: New test.

---
 gcc/testsuite/gcc.dg/pr46032.c| 47 +++
 gcc/tree-ssa-structalias.c| 71 ---
 libgomp/testsuite/libgomp.c/pr46032.c | 44 ++
 3 files changed, 149 insertions(+), 13 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr46032.c b/gcc/testsuite/gcc.dg/pr46032.c
new file mode 100644
index 000..b91190e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr46032.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */
+
+extern void abort (void);
+
+#define nEvents 1000
+
+static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize")))
+init (unsigned *results, unsigned *pData)
+{
+  unsigned int i;
+  for (i = 0; i < nEvents; ++i)
+pData[i] = i % 3;
+}
+
+static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize")))
+check (unsigned *results)
+{
+  unsigned sum = 0;
+  for (int idx = 0; idx < (int)nEvents; idx++)
+sum += results[idx];
+
+  if (sum != 1998)
+abort ();
+}
+
+int
+main (void)
+{
+  unsigned results[nEvents];
+  unsigned pData[nEvents];
+  unsigned coeff = 2;
+
+  init (&results[0], &pData[

Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 02:24:18PM +0100, Richard Biener wrote:
> > OK for stage3 trunk if bootstrap and reg-test succeeds?
> 
> -|| node->address_taken);
> +|| (node->address_taken
> +&& !node->parallelized_function));
> 
> please add a comment here on why this is safe.
> 
> Ok with this change.

BTW, __builting_GOMP_task supposedly can be treated similarly
if the third argument is NULL (if 3rd arg is non-NULL, then
the caller passes a different structure from what the callee receives,
but perhaps it could be emulated as pretending that cpyfn is called first
with address of a temporary var and the data argument and then fn
is called with the address of the temporary var).

Jakub


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Richard Biener
On Mon, 30 Nov 2015, Tom de Vries wrote:

> On 30/11/15 10:16, Richard Biener wrote:
> > On Mon, 30 Nov 2015, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > this patch fixes PR46032.
> > > 
> > > It handles a call:
> > > ...
> > >__builtin_GOMP_parallel (fn, data, num_threads, flags)
> > > ...
> > > as:
> > > ...
> > >fn (data)
> > > ...
> > > in ipa-pta.
> > > 
> > > This improves ipa-pta alias analysis in the parallelized function fn, and
> > > allows vectorization in the testcase without a runtime alias test.
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > OK for stage3 trunk?
> > 
> > + /* Assign the passed argument to the appropriate incoming
> > +parameter of the function.  */
> > + struct constraint_expr lhs ;
> > + lhs = get_function_part_constraint (fi, fi_parm_base + 0);
> > + auto_vec rhsc;
> > + struct constraint_expr *rhsp;
> > + get_constraint_for_rhs (arg, &rhsc);
> > + while (rhsc.length () != 0)
> > +   {
> > + rhsp = &rhsc.last ();
> > + process_constraint (new_constraint (lhs, *rhsp));
> > + rhsc.pop ();
> > +   }
> > 
> > please use style used elsewhere with
> > 
> >   FOR_EACH_VEC_ELT (rhsc, j, rhsp)
> > process_constraint (new_constraint (lhs, *rhsp));
> >   rhsc.truncate (0);
> > 
> 
> That code was copied from find_func_aliases_for_call.
> I've factored out the bit that I copied as find_func_aliases_for_call_arg, and
> fixed the style there (and dropped 'rhsc.truncate (0)' since AFAIU it's
> redundant at the end of a function).
> 
> > + /* Parameter passed by value is used.  */
> > + lhs = get_function_part_constraint (fi, fi_uses);
> > + struct constraint_expr *rhsp;
> > + get_constraint_for_address_of (arg, &rhsc);
> > 
> > This isn't correct - you want to use get_constraint_for (arg, &rhsc).
> > After all rhs is already an ADDR_EXPR.
> > 
> 
> Can we add an assert somewhere to detect this incorrect usage?
> 
> > + FOR_EACH_VEC_ELT (rhsc, j, rhsp)
> > +   process_constraint (new_constraint (lhs, *rhsp));
> > + rhsc.truncate (0);
> > +
> > + /* The caller clobbers what the callee does.  */
> > + lhs = get_function_part_constraint (fi, fi_clobbers);
> > + rhs = get_function_part_constraint (cfi, fi_clobbers);
> > + process_constraint (new_constraint (lhs, rhs));
> > +
> > + /* The caller uses what the callee does.  */
> > + lhs = get_function_part_constraint (fi, fi_uses);
> > + rhs = get_function_part_constraint (cfi, fi_uses);
> > + process_constraint (new_constraint (lhs, rhs));
> > 
> > I don't see why you need those.  The solver should compute these
> > in even better precision (context sensitive on the call side).
> > 
> > The same is true for the function parameter.  That is, the only
> > needed part of the patch should be that making sure we see
> > the "direct" call and assign parameters correctly.
> > 
> 
> Dropped this bit.
> 
> OK for stage3 trunk if bootstrap and reg-test succeeds?

-|| node->address_taken);
+|| (node->address_taken
+&& !node->parallelized_function));

please add a comment here on why this is safe.

Ok with this change.

Thanks,
Richard.


Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Tom de Vries

On 30/11/15 10:16, Richard Biener wrote:

On Mon, 30 Nov 2015, Tom de Vries wrote:


Hi,

this patch fixes PR46032.

It handles a call:
...
   __builtin_GOMP_parallel (fn, data, num_threads, flags)
...
as:
...
   fn (data)
...
in ipa-pta.

This improves ipa-pta alias analysis in the parallelized function fn, and
allows vectorization in the testcase without a runtime alias test.

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?


+ /* Assign the passed argument to the appropriate incoming
+parameter of the function.  */
+ struct constraint_expr lhs ;
+ lhs = get_function_part_constraint (fi, fi_parm_base + 0);
+ auto_vec rhsc;
+ struct constraint_expr *rhsp;
+ get_constraint_for_rhs (arg, &rhsc);
+ while (rhsc.length () != 0)
+   {
+ rhsp = &rhsc.last ();
+ process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.pop ();
+   }

please use style used elsewhere with

  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
process_constraint (new_constraint (lhs, *rhsp));
  rhsc.truncate (0);



That code was copied from find_func_aliases_for_call.
I've factored out the bit that I copied as 
find_func_aliases_for_call_arg, and fixed the style there (and dropped 
'rhsc.truncate (0)' since AFAIU it's redundant at the end of a function).



+ /* Parameter passed by value is used.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ get_constraint_for_address_of (arg, &rhsc);

This isn't correct - you want to use get_constraint_for (arg, &rhsc).
After all rhs is already an ADDR_EXPR.



Can we add an assert somewhere to detect this incorrect usage?


+ FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.truncate (0);
+
+ /* The caller clobbers what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_clobbers);
+ rhs = get_function_part_constraint (cfi, fi_clobbers);
+ process_constraint (new_constraint (lhs, rhs));
+
+ /* The caller uses what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ rhs = get_function_part_constraint (cfi, fi_uses);
+ process_constraint (new_constraint (lhs, rhs));

I don't see why you need those.  The solver should compute these
in even better precision (context sensitive on the call side).

The same is true for the function parameter.  That is, the only
needed part of the patch should be that making sure we see
the "direct" call and assign parameters correctly.



Dropped this bit.

OK for stage3 trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom


Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30  Tom de Vries  

	PR tree-optimization/46032
	* tree-ssa-structalias.c (find_func_aliases_for_call_arg): New function,
	factored out of ...
	(find_func_aliases_for_call): ... here.
	(find_func_aliases_for_builtin_call, find_func_clobbers): Handle
	BUILT_IN_GOMP_PARALLEL.
	(ipa_pta_execute): Same.  Handle node->parallelized_function as a local
	function.

	* gcc.dg/pr46032.c: New test.

	* testsuite/libgomp.c/pr46032.c: New test.

---
 gcc/testsuite/gcc.dg/pr46032.c| 47 +++
 gcc/tree-ssa-structalias.c| 60 +++
 libgomp/testsuite/libgomp.c/pr46032.c | 44 +
 3 files changed, 138 insertions(+), 13 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr46032.c b/gcc/testsuite/gcc.dg/pr46032.c
new file mode 100644
index 000..b91190e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr46032.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */
+
+extern void abort (void);
+
+#define nEvents 1000
+
+static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize")))
+init (unsigned *results, unsigned *pData)
+{
+  unsigned int i;
+  for (i = 0; i < nEvents; ++i)
+pData[i] = i % 3;
+}
+
+static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize")))
+check (unsigned *results)
+{
+  unsigned sum = 0;
+  for (int idx = 0; idx < (int)nEvents; idx++)
+sum += results[idx];
+
+  if (sum != 1998)
+abort ();
+}
+
+int
+main (void)
+{
+  unsigned results[nEvents];
+  unsigned pData[nEvents];
+  unsigned coeff = 2;
+
+  init (&results[0], &pData[0]);
+
+#pragma omp parallel for
+  for (int idx = 0; idx < (int)nEvents; idx++)
+results[idx] = coeff * pData[idx];
+
+  check (&results[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loop" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" } } */
+
diff --git a/gcc/tree-ssa-struc

Re: [PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Richard Biener
On Mon, 30 Nov 2015, Tom de Vries wrote:

> Hi,
> 
> this patch fixes PR46032.
> 
> It handles a call:
> ...
>   __builtin_GOMP_parallel (fn, data, num_threads, flags)
> ...
> as:
> ...
>   fn (data)
> ...
> in ipa-pta.
> 
> This improves ipa-pta alias analysis in the parallelized function fn, and
> allows vectorization in the testcase without a runtime alias test.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage3 trunk?

+ /* Assign the passed argument to the appropriate incoming
+parameter of the function.  */
+ struct constraint_expr lhs ;
+ lhs = get_function_part_constraint (fi, fi_parm_base + 0);
+ auto_vec rhsc;
+ struct constraint_expr *rhsp;
+ get_constraint_for_rhs (arg, &rhsc);
+ while (rhsc.length () != 0)
+   {
+ rhsp = &rhsc.last ();
+ process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.pop ();
+   }

please use style used elsewhere with

 FOR_EACH_VEC_ELT (rhsc, j, rhsp)
   process_constraint (new_constraint (lhs, *rhsp));
 rhsc.truncate (0);

+ /* Parameter passed by value is used.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ struct constraint_expr *rhsp;
+ get_constraint_for_address_of (arg, &rhsc);

This isn't correct - you want to use get_constraint_for (arg, &rhsc).
After all rhs is already an ADDR_EXPR.

+ FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+   process_constraint (new_constraint (lhs, *rhsp));
+ rhsc.truncate (0);
+
+ /* The caller clobbers what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_clobbers);
+ rhs = get_function_part_constraint (cfi, fi_clobbers);
+ process_constraint (new_constraint (lhs, rhs));
+
+ /* The caller uses what the callee does.  */
+ lhs = get_function_part_constraint (fi, fi_uses);
+ rhs = get_function_part_constraint (cfi, fi_uses);
+ process_constraint (new_constraint (lhs, rhs));

I don't see why you need those.  The solver should compute these
in even better precision (context sensitive on the call side).

The same is true for the function parameter.  That is, the only
needed part of the patch should be that making sure we see
the "direct" call and assign parameters correctly.

Richard.


[PATCH, PR46032] Handle BUILT_IN_GOMP_PARALLEL in ipa-pta

2015-11-30 Thread Tom de Vries

Hi,

this patch fixes PR46032.

It handles a call:
...
  __builtin_GOMP_parallel (fn, data, num_threads, flags)
...
as:
...
  fn (data)
...
in ipa-pta.

This improves ipa-pta alias analysis in the parallelized function fn, 
and allows vectorization in the testcase without a runtime alias test.


Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom
Handle BUILT_IN_GOMP_PARALLEL in pta

2015-11-30  Tom de Vries  

	PR tree-optimization/46032
	* tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
	(find_func_clobbers): Handle BUILT_IN_GOMP_PARALLEL.
	(ipa_pta_execute): Same.  Handle node->parallelized_function as a local
	function.

	* gcc.dg/pr46032.c: New test.

	* testsuite/libgomp.c/pr46032.c: New test.

---
 gcc/testsuite/gcc.dg/pr46032.c| 47 ++
 gcc/tree-ssa-structalias.c| 73 ++-
 libgomp/testsuite/libgomp.c/pr46032.c | 44 +
 3 files changed, 162 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr46032.c b/gcc/testsuite/gcc.dg/pr46032.c
new file mode 100644
index 000..b91190e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr46032.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -ftree-vectorize -std=c99 -fipa-pta -fdump-tree-vect-all" } */
+
+extern void abort (void);
+
+#define nEvents 1000
+
+static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize")))
+init (unsigned *results, unsigned *pData)
+{
+  unsigned int i;
+  for (i = 0; i < nEvents; ++i)
+pData[i] = i % 3;
+}
+
+static void __attribute__((noinline, noclone, optimize("-fno-tree-vectorize")))
+check (unsigned *results)
+{
+  unsigned sum = 0;
+  for (int idx = 0; idx < (int)nEvents; idx++)
+sum += results[idx];
+
+  if (sum != 1998)
+abort ();
+}
+
+int
+main (void)
+{
+  unsigned results[nEvents];
+  unsigned pData[nEvents];
+  unsigned coeff = 2;
+
+  init (&results[0], &pData[0]);
+
+#pragma omp parallel for
+  for (int idx = 0; idx < (int)nEvents; idx++)
+results[idx] = coeff * pData[idx];
+
+  check (&results[0]);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loop" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" } } */
+
diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index f24ebeb..3fe538b 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -4488,6 +4488,39 @@ find_func_aliases_for_builtin_call (struct function *fn, gcall *t)
 	}
 	  return true;
 	}
+  case BUILT_IN_GOMP_PARALLEL:
+	{
+	  /* Handle
+	   __builtin_GOMP_parallel (fn, data, num_threads, flags).  */
+	  if (in_ipa_mode)
+	{
+	  tree fnarg = gimple_call_arg (t, 0);
+	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
+	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  varinfo_t fi = get_vi_for_tree (fndecl);
+	  tree arg = gimple_call_arg (t, 1);
+	  gcc_assert (TREE_CODE (arg) == ADDR_EXPR);
+
+	  /* Assign the passed argument to the appropriate incoming
+		 parameter of the function.  */
+	  struct constraint_expr lhs ;
+	  lhs = get_function_part_constraint (fi, fi_parm_base + 0);
+	  auto_vec rhsc;
+	  struct constraint_expr *rhsp;
+	  get_constraint_for_rhs (arg, &rhsc);
+	  while (rhsc.length () != 0)
+		{
+		  rhsp = &rhsc.last ();
+		  process_constraint (new_constraint (lhs, *rhsp));
+		  rhsc.pop ();
+		}
+
+	  return true;
+	}
+	  /* Else fallthru to generic handling which will let
+	 the frame escape.  */
+	  break;
+	}
   /* printf-style functions may have hooks to set pointers to
 	 point to somewhere into the generated string.  Leave them
 	 for a later exercise...  */
@@ -5036,6 +5069,37 @@ find_func_clobbers (struct function *fn, gimple *origt)
 	  case BUILT_IN_VA_START:
 	  case BUILT_IN_VA_END:
 	return;
+	  case BUILT_IN_GOMP_PARALLEL:
+	{
+	  /* Handle
+		   __builtin_GOMP_parallel (fn, data, num_threads, flags).  */
+	  tree fnarg = gimple_call_arg (t, 0);
+	  gcc_assert (TREE_CODE (fnarg) == ADDR_EXPR);
+	  tree fndecl = TREE_OPERAND (fnarg, 0);
+	  varinfo_t cfi = get_vi_for_tree (fndecl);
+	  tree arg = gimple_call_arg (t, 1);
+	  gcc_assert (TREE_CODE (arg) == ADDR_EXPR);
+
+	  /* Parameter passed by value is used.  */
+	  lhs = get_function_part_constraint (fi, fi_uses);
+	  struct constraint_expr *rhsp;
+	  get_constraint_for_address_of (arg, &rhsc);
+	  FOR_EACH_VEC_ELT (rhsc, j, rhsp)
+		process_constraint (new_constraint (lhs, *rhsp));
+	  rhsc.truncate (0);
+
+	  /* The caller clobbers what the callee does.  */
+	  lhs = get_function_part_constraint (fi, fi_clobbers);
+	  rhs = get_function_part_constraint (cfi, fi_clobbers);
+	  process_constraint (new_constraint (lhs, rhs));
+
+	  /* The caller uses what the callee does.  */
+	  lhs = get_function_part_constraint (fi, fi