On Tue, 20 Mar 2018, Richard Biener wrote: > On Tue, 20 Mar 2018, Bin.Cheng wrote: > > > On Mon, Mar 19, 2018 at 11:57 AM, Jakub Jelinek <ja...@redhat.com> wrote: > > > On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote: > > >> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok? > > > > > > Ok, thanks. > > Hi Richard, > > > > This change causes below test failure on aarch64/arm linux: > > FAIL: libgomp.graphite/force-parallel-4.c scan-tree-dump-times > > graphite "2 loops carried no dependency" 1 > > So it looks like the conditional stores to A[i] in > > for (i = 0; i < N; i++) > { > if (i < T) > /* loop i1: carried no dependency. */ > A[i] = A[i+T]; > else > /* loop i2: carried dependency. */ > A[i] = A[i+T+1]; > } > > vs. the stores commoned in the merge block causes the merge > block to appear in the schedule and thus > > [scheduler] original ast: > { > for (int c0 = 0; c0 <= 19999; c0 += 1) > S_3(c0); > - for (int c0 = 0; c0 <= 999; c0 += 1) > - S_5(c0); > - for (int c0 = 1000; c0 <= 9999; c0 += 1) > - S_6(c0); > + for (int c0 = 0; c0 <= 9999; c0 += 1) { > + if (c0 >= 1000) { > + S_6(c0); > + } else { > + S_5(c0); > + } > + S_7(c0); > + } > } > > [scheduler] AST generated by isl: > { > for (int c0 = 0; c0 <= 19999; c0 += 1) > S_3(c0); > - for (int c0 = 0; c0 <= 999; c0 += 1) > + for (int c0 = 0; c0 <= 999; c0 += 1) { > S_5(c0); > - for (int c0 = 1000; c0 <= 9999; c0 += 1) > + S_7(c0); > + } > + for (int c0 = 1000; c0 <= 9999; c0 += 1) { > S_6(c0); > + S_7(c0); > + } > } > > not sure why neither of the two loops end up parallelizable then. > Possibly because of the dependences we generate for the cross-BB > scalar deps which store and load to a "scalar" memory location > (in the end those are just SSA dependences). So it's an artificial > limitation caused by how we represent scheduling dependences > I guess: > > +Adding scalar write: _4 > +From stmt: _4 = A[_3]; > +Adding scalar write: cstore_13 > +From stmt: cstore_13 = PHI <_2(5), _4(6)> > ... > +Adding scalar write: _2 > +From stmt: _2 = A[_1]; > +Adding scalar write: cstore_13 > +From stmt: cstore_13 = PHI <_2(5), _4(6)> > ... > +Adding scalar read: cstore_13 > +From stmt: cstore_13 = PHI <_2(5), _4(6)> > > I have no idea/plan to improve that on the GRAPHITE/ISL side right now > but maybe Sebastian has? We'd need to tell ISL that we can "elide" > the dependence or rather that it is inner-iteration only and doesn't > leak cross-iteration - kind of clobbering it at the end? > > So I'm going to XFAIL the testcase.
2018-03-20 Richard Biener <rguent...@suse.de> * testsuite/libgomp.graphite/force-parallel-4.c: XFAIL one parallelizable loop. Index: libgomp/testsuite/libgomp.graphite/force-parallel-4.c =================================================================== --- libgomp/testsuite/libgomp.graphite/force-parallel-4.c (revision 258678) +++ libgomp/testsuite/libgomp.graphite/force-parallel-4.c (working copy) @@ -46,7 +46,10 @@ int main(void) return 0; } -/* Check that parallel code generation part make the right answer. */ -/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" } } */ +/* Check that parallel code generation part make the right answer. + ??? XFAILed for i1 because conditional store elimination wrecks + our dependence representation. */ +/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump-times "1 loops carried no dependency" 1 "graphite" } } */ /* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */ /* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */