On Sun, Jul 26, 2015 at 4:21 PM, Tom de Vries <tom_devr...@mentor.com> wrote: > I wrote an equivalent test-case in C: > ... > $ cat src/gcc/testsuite/gcc.dg/autopar/outer-7.c > /* { dg-do compile } */ > /* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops-details > -fdump-tree-optimized" } */ > > void abort (void); > > #define N 500 > > int > main (void) > { > int i, j; > int x[N][N]; > int *y = &x[0][0]; > > for (i = 0; i < N; i++) > for (j = 0; j < N; j++) > /* y[i * N + j] == x[i][j]. */ > y[i * N + j] = i + j + 3; > > for (i = 0; i < N; i++) > for (j = 0; j < N; j++) > if (x[i][j] != i + j + 3) > abort (); > > return 0; > } > > /* Check that outer loop is parallelized. */ > /* { dg-final { scan-tree-dump-times "parallelizing outer loop" 1 "parloops" > } } */ > /* { dg-final { scan-tree-dump-times "loopfn" 4 "optimized" } } */ > ... > > With -fno-tree-loop-ivcanon to keep original iteration order we get: > ... > #(Data Ref: > # bb: 4 > # stmt: *_15 = _17; > # ref: *_15; > # base_object: MEM[(int *)&x]; > # Access function 0: {{0B, +, 2000}_1, +, 4}_4 > #) > #(Data Ref: > # bb: 4 > # stmt: *_15 = _17; > # ref: *_15; > # base_object: MEM[(int *)&x]; > # Access function 0: {{0B, +, 2000}_1, +, 4}_4 > #) > access_fn_A: {{0B, +, 2000}_1, +, 4}_4 > access_fn_B: {{0B, +, 2000}_1, +, 4}_4 > > (subscript > iterations_that_access_an_element_twice_in_A: [0] > last_conflict: scev_not_known > iterations_that_access_an_element_twice_in_B: [0] > last_conflict: scev_not_known > (Subscript distance: 0 )) > inner loop index: 0 > loop nest: (1 4 ) > distance_vector: 0 0 > distance_vector: 1 -500 > direction_vector: = = > direction_vector: + - > ) > FAILED: data dependencies exist across iterations > ... > > If we replace the y[i * N + j] with x[i][j] we get instead: > ... > #(Data Ref: > # bb: 4 > # stmt: x[i_7][j_8] = _12; > # ref: x[i_7][j_8]; > # base_object: x; > # Access function 0: {0, +, 1}_4 > # Access function 1: {0, +, 1}_1 > #) > #(Data Ref: > # bb: 4 > # stmt: x[i_7][j_8] = _12; > # ref: x[i_7][j_8]; > # base_object: x; > # Access function 0: {0, +, 1}_4 > # Access function 1: {0, +, 1}_1 > #) > access_fn_A: {0, +, 1}_4 > access_fn_B: {0, +, 1}_4 > > (subscript > iterations_that_access_an_element_twice_in_A: [0] > last_conflict: scev_not_known > iterations_that_access_an_element_twice_in_B: [0] > last_conflict: scev_not_known > (Subscript distance: 0 )) > access_fn_A: {0, +, 1}_1 > access_fn_B: {0, +, 1}_1 > > (subscript > iterations_that_access_an_element_twice_in_A: [0] > last_conflict: scev_not_known > iterations_that_access_an_element_twice_in_B: [0] > last_conflict: scev_not_known > (Subscript distance: 0 )) > inner loop index: 0 > loop nest: (1 4 ) > distance_vector: 0 0 > direction_vector: = = > ) > SUCCESS: may be parallelized > parallelizing outer loop 8 > ...
It looks like a delinearization pass could help reconstruct a two dimension array reference, and make the Banerjee dependence test succeed. Note that Graphite works in this case just because the loop bounds are statically defined: N is 500. Now if you have N passed in as a function parameter, Graphite would also fail to compute the dependence, as it cannot represent "i * N", so we would also need the delinearization pass for Graphite. Here is a bug that I recently opened for that: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66981 Sebastian