On Tue, Dec 01, 2015 at 06:28:27PM +0300, Alexander Monakov wrote:
> @@ -10218,12 +10218,37 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>  
>    n1 = fd->loop.n1;
>    n2 = fd->loop.n2;
> +  step = fd->loop.step;
> +  bool do_simt_transform
> +    = (cgraph_node::get (current_function_decl)->offloadable
> +       && !broken_loop
> +       && !safelen
> +       && !simduid
> +       && !(fd->collapse > 1));

expand_omp does depth-first expansion, so when the simd region is
lexically (directly or indirectly) nested inside a target region,
the above will not trigger.  You'd need to
use cgraph_node::get (current_function_decl)->offloadable or
walk up through the outer fields of region and check whether it is in
a target region.
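
A minimal sketch of the walk over the outer links, in plain C.  The
struct and enum below are simplified, hypothetical stand-ins (they are
not GCC's struct omp_region or its gimple codes); only the idea of
following outer pointers until a target region is found is taken from
the text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region kinds; stand-ins for the real gimple codes.  */
enum region_kind { REGION_PARALLEL, REGION_FOR, REGION_SIMD, REGION_TARGET };

/* Hypothetical, simplified region: just the link to the enclosing
   region and the kind of construct.  */
struct region
{
  struct region *outer;
  enum region_kind kind;
};

/* Return true if REGION is nested, directly or indirectly, inside a
   target region.  */
bool
region_in_target_p (const struct region *region)
{
  for (const struct region *r = region->outer; r != NULL; r = r->outer)
    if (r->kind == REGION_TARGET)
      return true;
  return false;
}

/* The simd loop may end up in offloaded code if the containing function
   is offloadable or if the region is enclosed by a target region.  */
bool
maybe_offloaded_p (bool fn_offloadable, const struct region *region)
{
  return fn_offloadable || region_in_target_p (region);
}
```

With a chain target -> parallel -> simd, region_in_target_p returns true
for the simd region even though the enclosing function itself is not
marked offloadable.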

Also, please consider privatized variables in the simd loops.
int
foo (int *p)
{
  int r = 0, i;
  #pragma omp simd reduction(+:r)
  for (i = 0; i < 32; i++)
    {
      p[i] += i;
      r += i;
    }
  return r;
}
#pragma omp declare target to (foo)

int
main ()
{
  int p[32], err, i;
  for (i = 0; i < 32; i++)
    p[i] = i;
  #pragma omp target map(tofrom:p) map(from:err)
  {
    int r = 0;
    #pragma omp simd reduction(+:r)
    for (i = 0; i < 32; i++)
    {
      p[i] += i;
      r += i;
    }
    err = r != 31 * 32 / 2;
    err |= foo (p) != 31 * 32 / 2;
  }
  if (err)
    __builtin_abort ();
  for (i = 0; i < 32; i++)
    if (p[i] != 3 * i)
      __builtin_abort ();
  return 0;
}

Here, it would be nice to extend omp_max_vf in the host compiler so that,
if PTX offloading is enabled and optimize && !optimize_debug (and the
vectorizer on the host is not disabled, otherwise it won't be cleaned up
on the host), it returns MIN (32, whatever it would return otherwise).
Then arrange for the stores to and other operations on the "omp simd array"
attributed arrays before/after the simd loop to be handled specially for
SIMT: basically you want those to be .local (or, if non-addressable, handled
like any other scalars), the loops up to GOMP_SIMD_LANES to run exactly once,
and the various reductions or lastprivate selection to reduce the SIMT way or
pick the value from the thread in the warp that had the last SIMT lane, etc.
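
To make the intended semantics concrete, here is a host-side emulation
in plain C of the loop from foo above.  It is only a sketch under
assumptions: SIMT_LANES, simt_emulate_sum and simt_last_lane are
made-up names, the lanes are run one after another instead of in
lockstep, and the strided distribution of iterations (lane L takes
iterations L, L + 32, ...) is my reading of what the patch is aiming
for, not something GCC does as written here.

```c
#include <assert.h>

#define SIMT_LANES 32  /* assumed warp width, matching the MIN (32, ...) cap */

/* Emulate SIMT execution of
     for (i = 0; i < n; i++) { p[i] += i; r += i; }
   with reduction(+:r), and return the reduced value of r.  */
int
simt_emulate_sum (int *p, int n)
{
  /* Plays the role of the "omp simd array": one private copy per lane.
     On a real SIMT target each lane would keep just its own scalar.  */
  int r_priv[SIMT_LANES];

  for (int lane = 0; lane < SIMT_LANES; lane++)
    {
      r_priv[lane] = 0;
      for (int i = lane; i < n; i += SIMT_LANES)
        {
          p[i] += i;
          r_priv[lane] += i;
        }
    }

  /* Butterfly reduction: in each step every lane adds the value held by
     lane ^ offset, the way a shuffle-based warp reduction would.  */
  for (int offset = SIMT_LANES / 2; offset > 0; offset /= 2)
    {
      int next[SIMT_LANES];
      for (int lane = 0; lane < SIMT_LANES; lane++)
        next[lane] = r_priv[lane] + r_priv[lane ^ offset];
      for (int lane = 0; lane < SIMT_LANES; lane++)
        r_priv[lane] = next[lane];
    }

  /* After the last step every lane holds the full sum.  */
  return r_priv[0];
}

/* Lane that executes the sequentially last iteration; a lastprivate
   value would be taken from this lane.  */
int
simt_last_lane (int n)
{
  assert (n > 0);
  return (n - 1) % SIMT_LANES;
}
```

For n equal to 32 each lane runs exactly one iteration, and the result
matches the 31 * 32 / 2 that the testcase checks for.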

> +  if (do_simt_transform)
> +    {
> +      tree simt_lane
> +     = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_LANE,
> +                                     integer_type_node, 0);
> +      simt_lane = fold_convert (TREE_TYPE (step), simt_lane);
> +      simt_lane = fold_build2 (MULT_EXPR, TREE_TYPE (step), step, simt_lane);
> +      cfun->curr_properties &= ~PROP_gimple_lomp_dev;

How does this even compile?  simt_lane is a local variable in the if
(do_simt_transform) body.

> +    }
> +
>    if (gimple_omp_for_combined_into_p (fd->for_stmt))
>      {
>        tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
>                                    OMP_CLAUSE__LOOPTEMP_);
>        gcc_assert (innerc);
>        n1 = OMP_CLAUSE_DECL (innerc);
> +      if (do_simt_transform)
> +     {
> +       n1 = fold_convert (type, n1);
> +       if (POINTER_TYPE_P (type))
> +         n1 = fold_build_pointer_plus (n1, simt_lane);

And then you use it here, outside of its scope.
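
One way to fix this is to declare the variable in the enclosing scope
(in the patch that would presumably be something like
tree simt_lane = NULL_TREE; before the first if) and only assign it
inside the conditional.  A plain C analogue, using long instead of tree
and a hypothetical helper name lane_start:

```c
/* Hypothetical helper: compute the first iteration value for a lane.
   simt_lane is declared in the enclosing scope so that both
   conditional blocks can see it.  */
long
lane_start (long n1, long step, int lane, int do_simt_transform)
{
  long simt_lane = 0;

  if (do_simt_transform)
    /* Offset of this lane's first iteration, step * lane.  */
    simt_lane = step * lane;

  /* ... other setup code would sit here ... */

  if (do_simt_transform)
    /* Still in scope here, unlike in the quoted patch.  */
    n1 += simt_lane;

  return n1;
}
```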

BTW, again, it would help if you posted a simple *.ompexp dump showing
what exactly you want it to look like.

        Jakub
