On 18/06/15 12:48, Richard Biener wrote:
On Thu, 18 Jun 2015, Tom de Vries wrote:

Hi,

I ran into a problem with fortran loops in oacc kernels regions not being
parallelized, after introducing transform_to_exit_first_loop_alt.

For gfortran.dg/goacc/kernels-loop.f95, we get:
...
#pragma omp target oacc_parallel num_gangs(1)
...
instead of the desired num_gangs (32).

transform_to_exit_first_loop_alt fails because nit is _135, where nit is
defined by:
...
*_105 = 0;
D__lsm.27_50 = *_105;
_32 = (unsigned int) D__lsm.27_50;
_135 = 1023 - _32;
...
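
As a rough illustration of why this matters, the sequence above can be rendered in C. This is only a sketch: the function and variable names are hypothetical, and it models the four statements by hand rather than anything GCC emits. Once the stored 0 is forwarded to the load, the iteration count folds to the constant 1023.

```c
/* Illustrative C rendering of the GIMPLE sequence; all names are hypothetical. */

/* Before forwarding: the load reads back the value that was just stored. */
unsigned int nit_with_reload(int *p)
{
    *p = 0;                               /* *_105 = 0;                         */
    int lsm = *p;                         /* D__lsm.27_50 = *_105;              */
    unsigned int u = (unsigned int) lsm;  /* _32 = (unsigned int) D__lsm.27_50; */
    return 1023u - u;                     /* _135 = 1023 - _32;                 */
}

/* After forwarding the stored 0 to the load and folding the constant. */
unsigned int nit_folded(int *p)
{
    *p = 0;
    return 1023u;
}
```

Both functions compute the same value, but only in the second form is the iteration count visibly a constant.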

pass_fre would manage to propagate the '*_105 = 0' assignment. But in the
current pass order, pass_fre is run before pass_lim, where this pattern is
introduced:
...
               NEXT_PASS (pass_ch_oacc_kernels);
               NEXT_PASS (pass_fre);
               NEXT_PASS (pass_tree_loop_init);
               NEXT_PASS (pass_lim);
               NEXT_PASS (pass_copy_prop);
               NEXT_PASS (pass_scev_cprop);
               NEXT_PASS (pass_parallelize_loops_oacc_kernels);
               NEXT_PASS (pass_expand_omp_ssa);
               NEXT_PASS (pass_tree_loop_done);
...
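
To sketch how store motion in pass_lim can leave a store immediately followed by a load of the same location, here is a hand-written C analogue. The loop is hypothetical and not taken from kernels-loop.f95; it only assumes the usual store-motion shape, where a memory location updated in the loop is replaced by a temporary that is loaded before the loop and stored back after it.

```c
/* Hypothetical loop, not taken from kernels-loop.f95. */

/* Before store motion: the counter is loaded and stored on every iteration. */
int count_in_memory(int *p, int n)
{
    *p = 0;
    for (int i = 0; i < n; i++)
        *p = *p + 1;
    return *p;
}

/* After store motion: the counter lives in a temporary inside the loop. */
int count_with_lsm(int *p, int n)
{
    *p = 0;
    int lsm = *p;          /* load in the preheader, right after the store */
    for (int i = 0; i < n; i++)
        lsm = lsm + 1;
    *p = lsm;              /* single store on loop exit */
    return *p;
}
```

The preheader load in the second function has the same shape as 'D__lsm.27_50 = *_105' following '*_105 = 0', which is the redundancy a later pass_fre can remove.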

The patch moves pass_fre to the position of pass_copy_prop, replacing pass_copy_prop.
Furthermore, it adds scans to the fortran test-cases to make sure they get
properly parallelized.

You may now figure out that LIM needs FRE to detect equal memory
references to apply store-motion.  But maybe the issues oacc
lowering introduces are limited and under your control.


To show the context of the pass group, after this commit the pass group looks like this:
...
          NEXT_PASS (pass_sra_early);
          NEXT_PASS (pass_build_ealias);
          NEXT_PASS (pass_fre);
          NEXT_PASS (pass_oacc_kernels);
          PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels)
              NEXT_PASS (pass_ch_oacc_kernels);
              NEXT_PASS (pass_tree_loop_init);
              NEXT_PASS (pass_lim);
              NEXT_PASS (pass_tree_loop_done);
              NEXT_PASS (pass_fre);
              NEXT_PASS (pass_tree_loop_init);
              NEXT_PASS (pass_scev_cprop);
              NEXT_PASS (pass_parallelize_loops_oacc_kernels);
              NEXT_PASS (pass_expand_omp_ssa);
              NEXT_PASS (pass_tree_loop_done);
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_merge_phi);
          NEXT_PASS (pass_dse);
          NEXT_PASS (pass_cd_dce);
...
In other words, the pass group is run directly after pass_fre.

When I move the pass_fre that precedes the pass group to directly after the pass group, I start seeing the failure mode you describe.

Thanks,
- Tom
