Re: [PATCH] Fix PR84859

Richard Biener Tue, 20 Mar 2018 05:42:55 -0700

On Tue, 20 Mar 2018, Richard Biener wrote:

> On Tue, 20 Mar 2018, Bin.Cheng wrote:
> 
> > On Mon, Mar 19, 2018 at 11:57 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> > > On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote:
> > >> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?
> > >
> > > Ok, thanks.
> > Hi Richard,
> > 
> > This change causes below test failure on aarch64/arm linux:
> > FAIL: libgomp.graphite/force-parallel-4.c scan-tree-dump-times
> > graphite "2 loops carried no dependency" 1
> 
> So it looks like the conditional stores to A[i] in
> 
>   for (i = 0; i < N; i++)
>     {
>       if (i < T)
>         /* loop i1: carried no dependency.  */
>         A[i] = A[i+T];
>       else
>         /* loop i2: carried dependency.  */
>         A[i] = A[i+T+1];
>     }
> 
> vs. the stores commoned in the merge block causes the merge
> block to appear in the schedule and thus
> 
>  [scheduler] original ast:
>  {
>    for (int c0 = 0; c0 <= 19999; c0 += 1)
>      S_3(c0);
> -  for (int c0 = 0; c0 <= 999; c0 += 1)
> -    S_5(c0);
> -  for (int c0 = 1000; c0 <= 9999; c0 += 1)
> -    S_6(c0);
> +  for (int c0 = 0; c0 <= 9999; c0 += 1) {
> +    if (c0 >= 1000) {
> +      S_6(c0);
> +    } else {
> +      S_5(c0);
> +    }
> +    S_7(c0);
> +  }
>  }
> 
>  [scheduler] AST generated by isl:
>  {
>    for (int c0 = 0; c0 <= 19999; c0 += 1)
>      S_3(c0);
> -  for (int c0 = 0; c0 <= 999; c0 += 1)
> +  for (int c0 = 0; c0 <= 999; c0 += 1) {
>      S_5(c0);
> -  for (int c0 = 1000; c0 <= 9999; c0 += 1)
> +    S_7(c0);
> +  }
> +  for (int c0 = 1000; c0 <= 9999; c0 += 1) {
>      S_6(c0);
> +    S_7(c0);
> +  }
>  }
> 
> not sure why neither of the two loops end up parallelizable then.
> Possibly because of the dependences we generate for the cross-BB
> scalar deps which store and load to a "scalar" memory location
> (in the end those are just SSA dependences).  So it's an artificial
> limitation caused by how we represent scheduling dependences
> I guess:
> 
> +Adding scalar write: _4
> +From stmt: _4 = A[_3];
> +Adding scalar write: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> ...
> +Adding scalar write: _2
> +From stmt: _2 = A[_1];
> +Adding scalar write: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> ...
> +Adding scalar read: cstore_13
> +From stmt: cstore_13 = PHI <_2(5), _4(6)>
> 
> I have no idea/plan to improve that on the GRAPHITE/ISL side right now
> but maybe Sebastian has?  We'd need to tell ISL that we can "elide"
> the dependence or rather that it is inner-iteration only and doesn't
> leak cross-iteration - kind of clobbering it at the end?
> 
> So I'm going to XFAIL the testcase.


2018-03-20  Richard Biener  <rguent...@suse.de>

        * testsuite/libgomp.graphite/force-parallel-4.c: XFAIL one
        parallelizable loop.

Index: libgomp/testsuite/libgomp.graphite/force-parallel-4.c
===================================================================
--- libgomp/testsuite/libgomp.graphite/force-parallel-4.c       (revision 
258678)
+++ libgomp/testsuite/libgomp.graphite/force-parallel-4.c       (working copy)
@@ -46,7 +46,10 @@ int main(void)
   return 0;
 }
 
-/* Check that parallel code generation part make the right answer.  */
-/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 
"graphite" } } */
+/* Check that parallel code generation part make the right answer.
+   ???  XFAILed for i1 because conditional store elimination wrecks
+   our dependence representation.  */
+/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 
"graphite" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "1 loops carried no dependency" 1 
"graphite" } } */
 /* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */

Re: [PATCH] Fix PR84859

Reply via email to