On Fri, 14 Jun 2019, Jakub Jelinek wrote:

> Hi!
> 
> OpenMP 5.0 introduced scan reductions, like:
>   #pragma omp simd reduction (inscan, +:r)
>   for (int i = 0; i < 1024; i++)
>     {
>       r += a[i];
>       #pragma omp scan inclusive(r)
>       b[i] = r;
>     }
> where each iteration has two parts: one that computes the value of the
> privatized reduction variable (the private copy is initialized with the
> neutral element of the operation at the start of that part), and then
> the #pragma omp scan changes that private variable to hold (in this
> case) the inclusive partial sums.  E.g. the PSTL we now have in
> libstdc++-v3/include/pstl/ makes use of these, when available, to
> implement std::*_scan.  It can also be done in worksharing loops, but
> I'll get to that later.
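> For reference, a plain serial sketch of what the loop above computes
> (b[i] ends up holding the inclusive prefix sum a[0] + ... + a[i]):
>   int r = 0;
>   for (int i = 0; i < 1024; i++)
>     {
>       r += a[i];   /* input phase: accumulate into the private copy */
>       b[i] = r;    /* scan phase: r holds the inclusive partial sum */
>     }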
> 
> Anyway, the problem is that e.g. with OpenMP user defined reductions,
> the initializer and combiner of the reduction aren't simple operations
> during OpenMP lowering; they can be a method call or constructor call
> etc.  So we need something that preserves those initializer and
> combiner snippets in the IL so that the vectorizer can optimize them if
> they end up simple enough, while on the other hand it needs to be
> something that the normal optimizers are able to optimize and that
> actually works even when the vectorization isn't performed.
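> As an illustration (the type S and its members here are made up), such
> a user defined reduction can look like:
>   struct S { int s; S () : s (0) {} void merge (const S &o) { s += o.s; } };
>   #pragma omp declare reduction (merge : S : omp_out.merge (omp_in)) \
>     initializer (omp_priv = S ())
> where the initializer is a constructor call and the combiner a method
> call rather than a plain arithmetic operation.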
> 
> The following patch (incomplete, but far enough that it handles the
> inclusive scan for non-user defined reductions) handles that by using
> more magic: it adds variants to the .GOMP_SIMD_LANE internal function,
> where the old one (0) is used in the user code, another variant (1) in
> the initializer and another variant (2) in the combiner pattern, which
> the vectorizer then needs to pattern recognize and either vectorize or
> punt on.
> If it vectorizes it, it emits code like (optimized dump):
>   <bb 5> [local count: 708669599]:
>   # ivtmp.27_45 = PHI <0(4), ivtmp.27_12(5)>
>   # D__lsm.39_80 = PHI <D__lsm.39_47(4), _64(5)>
>   vect__4.15_49 = MEM[base: a_23(D), index: ivtmp.27_45, offset: 0B];
>   _57 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, vect__4.15_49, { 0, 8, 9, 10, 11, 12, 13, 14 }>;
>   _58 = vect__4.15_49 + _57;
>   _59 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _58, { 0, 1, 8, 9, 10, 11, 12, 13 }>;
>   _60 = _58 + _59;
>   _61 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _60, { 0, 1, 2, 3, 8, 9, 10, 11 }>;
>   _62 = _60 + _61;
>   _63 = _62 + D__lsm.39_80;
>   _64 = VEC_PERM_EXPR <_63, _63, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>   MEM[base: b_32(D), index: ivtmp.27_45, offset: 0B] = _63;
>   ivtmp.27_12 = ivtmp.27_45 + 32;
>   if (ivtmp.27_12 != 4096)
>     goto <bb 5>; [83.33%]
>   else
>     goto <bb 6>; [16.67%]
> where the _57 ... _64 sequence is the implementation of the scan directive.
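> In scalar terms the _57 ... _64 sequence is the usual log2-step
> (Hillis-Steele style) inclusive scan; a sketch for one 8-lane vector,
> with v[] standing in for the vector register and carry for the
> loop-carried D__lsm.39 value:
>   for (int step = 1; step < 8; step *= 2)  /* 3 shift-and-add steps */
>     for (int j = 7; j >= step; j--)
>       v[j] += v[j - step];                  /* _57/_58 ... _61/_62 */
>   for (int j = 0; j < 8; j++)
>     v[j] += carry;                          /* _63 */
>   carry = v[7];                             /* the _64 broadcast */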
> 
> Does this look reasonable?

Ugh, not pretty, but probably the best we can do.  Btw, can you please
add support for the SLP case with group_size == 1?  I know I'm slow
with the branch ripping out the non-SLP path, but it would save me
some extra work (possibly).

Thanks,
Richard.

> BTW, unfortunately SSE2 can't handle these permutations, so I'll
> probably need to optionally emit some other sequence when they aren't
> supported (only SSE4 handles them).  In particular, one could use whole
> vector shifts and VEC_COND_EXPR to blend the neutral element in.
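> For 4 x int in an SSE2 register that could look roughly like the
> following sketch (the function name is made up; _mm_slli_si128 shifts
> the whole register left by bytes, shifting in zeros, which happens to
> be the neutral element of addition -- other operations would need a
> VEC_COND_EXPR-style blend of the neutral element instead):
>   #include <emmintrin.h>
>
>   static __m128i
>   prefix_sum_epi32 (__m128i v)
>   {
>     v = _mm_add_epi32 (v, _mm_slli_si128 (v, 4));  /* shift in 1 lane */
>     v = _mm_add_epi32 (v, _mm_slli_si128 (v, 8));  /* shift in 2 lanes */
>     return v;
>   }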
> 
> --- gcc/tree-vect-stmts.c.jj  2019-06-13 13:28:36.636155362 +0200
> +++ gcc/tree-vect-stmts.c     2019-06-14 19:05:18.150502242 +0200
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-ssa-loop-niter.h"
>  #include "gimple-fold.h"
>  #include "regs.h"
> +#include "attribs.h"
>  
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -3257,7 +3258,7 @@ vectorizable_call (stmt_vec_info stmt_in
>    if (nargs == 0 || nargs > 4)
>      return false;
>  
> -  /* Ignore the argument of IFN_GOMP_SIMD_LANE, it is magic.  */
> +  /* Ignore the arguments of IFN_GOMP_SIMD_LANE, they are magic.  */
>    combined_fn cfn = gimple_call_combined_fn (stmt);
>    if (cfn == CFN_GOMP_SIMD_LANE)
>      {
> @@ -6320,6 +6321,456 @@ get_group_alias_ptr_type (stmt_vec_info
>  }
>  
>  
> +/* Function check_scan_store.
> +
> +   Check magic stores for #pragma omp scan {in,ex}clusive reductions.  */
> +
> +static bool
> +check_scan_store (stmt_vec_info stmt_info, tree vectype,
> +               enum vect_def_type rhs_dt, bool slp, tree mask,
> +               vect_memory_access_type memory_access_type)
> +{
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +  tree ref_type;
> +
> +  gcc_assert (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1);
> +  if (slp
> +      || mask
> +      || memory_access_type != VMAT_CONTIGUOUS
> +      || TREE_CODE (DR_BASE_ADDRESS (dr_info->dr)) != ADDR_EXPR
> +      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0))
> +      || loop_vinfo == NULL
> +      || LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +      || STMT_VINFO_GROUPED_ACCESS (stmt_info)
> +      || !integer_zerop (DR_OFFSET (dr_info->dr))
> +      || !integer_zerop (DR_INIT (dr_info->dr))
> +      || !(ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr)))
> +      || !alias_sets_conflict_p (get_alias_set (vectype),
> +                              get_alias_set (TREE_TYPE (ref_type))))
> +    {
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                      "unsupported OpenMP scan store.\n");
> +      return false;
> +    }
> +
> +  /* We need to pattern match code built by OpenMP lowering and simplified
> +     by following optimizations into something we can handle.
> +     #pragma omp simd reduction(inscan,+:r)
> +     for (...)
> +       {
> +      r += something ();
> +      #pragma omp scan inclusive (r)
> +      use (r);
> +       }
> +     shall have body with:
> +       // Initialization for input phase, store the reduction initializer:
> +       _20 = .GOMP_SIMD_LANE (simduid.3_14(D), 0);
> +       _21 = .GOMP_SIMD_LANE (simduid.3_14(D), 1);
> +       D.2042[_21] = 0;
> +       // Actual input phase:
> +       ...
> +       r.0_5 = D.2042[_20];
> +       _6 = _4 + r.0_5;
> +       D.2042[_20] = _6;
> +       // Initialization for scan phase:
> +       _25 = .GOMP_SIMD_LANE (simduid.3_14(D), 2);
> +       _26 = D.2043[_25];
> +       _27 = D.2042[_25];
> +       _28 = _26 + _27;
> +       D.2043[_25] = _28;
> +       D.2042[_25] = _28;
> +       // Actual scan phase:
> +       ...
> +       r.1_8 = D.2042[_20];
> +       ...
> +     The "omp simd array" variable D.2042 holds the privatized copy used
> +     inside of the loop and D.2043 is another one that holds copies of
> +     the current original list item.  The separate GOMP_SIMD_LANE ifn
> +     kinds are there in order to allow optimizing the initializer store
> +     and combiner sequence, e.g. if it is originally some C++ish user
> +     defined reduction, but allow the vectorizer to pattern recognize it
> +     and turn it into the appropriate vectorized scan.  */
> +
> +  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 2)
> +    {
> +      /* Match the D.2042[_21] = 0; store above.  Just require that
> +      it is a constant or external definition store.  */
> +      if (rhs_dt != vect_constant_def && rhs_dt != vect_external_def)
> +     {
> +      fail_init:
> +       if (dump_enabled_p ())
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "unsupported OpenMP scan initializer store.\n");
> +       return false;
> +     }
> +
> +      if (! loop_vinfo->scan_map)
> +     loop_vinfo->scan_map = new hash_map<tree, tree>;
> +      tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +      tree &cached = loop_vinfo->scan_map->get_or_insert (var);
> +      if (cached)
> +     goto fail_init;
> +      cached = gimple_assign_rhs1 (STMT_VINFO_STMT (stmt_info));
> +
> +      /* These stores can be vectorized normally.  */
> +      return true;
> +    }
> +
> +  if (rhs_dt != vect_internal_def)
> +    {
> +     fail:
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                      "unsupported OpenMP scan combiner pattern.\n");
> +      return false;
> +    }
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    goto fail;
> +
> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  gimple *other_store_stmt = NULL;
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, rhs)
> +    {
> +      gimple *use_stmt = USE_STMT (use_p);
> +      if (use_stmt == stmt || is_gimple_debug (use_stmt))
> +     continue;
> +      if (gimple_bb (use_stmt) != gimple_bb (stmt)
> +       || !gimple_store_p (use_stmt)
> +       || other_store_stmt)
> +     goto fail;
> +      other_store_stmt = use_stmt;
> +    }
> +  if (other_store_stmt == NULL)
> +    goto fail;
> +  stmt_vec_info other_store_stmt_info
> +    = loop_vinfo->lookup_stmt (other_store_stmt);
> +  if (other_store_stmt_info == NULL
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (other_store_stmt_info) != 3)
> +    goto fail;
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> +  if (gimple_bb (def_stmt) != gimple_bb (stmt)
> +      || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_class (def_stmt) != GIMPLE_BINARY_RHS)
> +    goto fail;
> +
> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> +  /* For pointer addition, we should use the normal plus for the vector
> +     operation.  */
> +  switch (code)
> +    {
> +    case POINTER_PLUS_EXPR:
> +      code = PLUS_EXPR;
> +      break;
> +    case MULT_HIGHPART_EXPR:
> +      goto fail;
> +    default:
> +      break;
> +    }
> +  if (TREE_CODE_LENGTH (code) != binary_op || !commutative_tree_code (code))
> +    goto fail;
> +
> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> +  if (TREE_CODE (rhs1) != SSA_NAME
> +      || TREE_CODE (rhs2) != SSA_NAME)
> +    goto fail;
> +
> +  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> +  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> +  if (gimple_bb (load1_stmt) != gimple_bb (stmt)
> +      || !gimple_assign_load_p (load1_stmt)
> +      || gimple_bb (load2_stmt) != gimple_bb (stmt)
> +      || !gimple_assign_load_p (load2_stmt))
> +    goto fail;
> +
> +  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> +  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> +  if (load1_stmt_info == NULL
> +      || load2_stmt_info == NULL
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (load1_stmt_info) != 3
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (load2_stmt_info) != 3)
> +    goto fail;
> +
> +  if (operand_equal_p (gimple_assign_lhs (stmt),
> +                    gimple_assign_rhs1 (load2_stmt), 0))
> +    {
> +      std::swap (rhs1, rhs2);
> +      std::swap (load1_stmt, load2_stmt);
> +      std::swap (load1_stmt_info, load2_stmt_info);
> +    }
> +  if (!operand_equal_p (gimple_assign_lhs (stmt),
> +                     gimple_assign_rhs1 (load1_stmt), 0)
> +      || !operand_equal_p (gimple_assign_lhs (other_store_stmt),
> +                        gimple_assign_rhs1 (load2_stmt), 0))
> +    goto fail;
> +
> +  dr_vec_info *other_dr_info = STMT_VINFO_DR_INFO (other_store_stmt_info);
> +  if (TREE_CODE (DR_BASE_ADDRESS (other_dr_info->dr)) != ADDR_EXPR
> +      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0)))
> +    goto fail;
> +
> +  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0);
> +  if (!lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var1))
> +      || !lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var2))
> +      || (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +      == (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var2))))
> +    goto fail;
> +
> +  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +    std::swap (var1, var2);
> +
> +  if (loop_vinfo->scan_map == NULL)
> +    goto fail;
> +  tree *init = loop_vinfo->scan_map->get (var1);
> +  if (init == NULL)
> +    goto fail;
> +
> +  /* The IL is as expected, now check if we can actually vectorize it.
> +       _26 = D.2043[_25];
> +       _27 = D.2042[_25];
> +       _28 = _26 + _27;
> +       D.2043[_25] = _28;
> +       D.2042[_25] = _28;
> +     should be vectorized as (where _40 is the vectorized rhs
> +     from the D.2042[_21] = 0; store):
> +       _30 = MEM <vector(8) int> [(int *)&D.2043];
> +       _31 = MEM <vector(8) int> [(int *)&D.2042];
> +       _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
> +       _33 = _31 + _32;
> +       // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
> +       _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
> +       _35 = _33 + _34;
> +       // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> +       //         _31[1]+.._31[4], ... _31[4]+.._31[7] };
> +       _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
> +       _37 = _35 + _36;
> +       // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> +       //         _31[0]+.._31[4], ... _31[0]+.._31[7] };
> +       _38 = _30 + _37;
> +       _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
> +       MEM <vector(8) int> [(int *)&D.2043] = _39;
> +       MEM <vector(8) int> [(int *)&D.2042] = _38;  */
> +  enum machine_mode vec_mode = TYPE_MODE (vectype);
> +  optab optab = optab_for_tree_code (code, vectype, optab_default);
> +  if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
> +    goto fail;
> +
> +  unsigned HOST_WIDE_INT nunits;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    goto fail;
> +  int units_log2 = exact_log2 (nunits);
> +  if (units_log2 <= 0)
> +    goto fail;
> +
> +  for (int i = 0; i <= units_log2; ++i)
> +    {
> +      unsigned HOST_WIDE_INT j, k;
> +      vec_perm_builder sel (nunits, nunits, 1);
> +      sel.quick_grow (nunits);
> +      if (i == units_log2)
> +     {
> +       for (j = 0; j < nunits; ++j)
> +         sel[j] = nunits - 1;
> +     }
> +      else
> +     {
> +       for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> +         sel[j] = nunits + j;
> +       for (k = 0; j < nunits; ++j, ++k)
> +         sel[j] = k;
> +     }
> +      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> +      if (!can_vec_perm_const_p (vec_mode, indices))
> +     goto fail;
> +    }
> +
> +  return true;
> +}
> +
> +
> +/* Function vectorizable_scan_store.
> +
> +   Helper of vectorizable_store, arguments like on vectorizable_store.
> +   Handle only the transformation, checking is done in check_scan_store.  */
> +
> +static bool
> +vectorizable_scan_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> +                      stmt_vec_info *vec_stmt, int ncopies)
> +{
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +  tree ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr));
> +  vec_info *vinfo = stmt_info->vinfo;
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +                  "transform scan store. ncopies = %d\n", ncopies);
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  gcc_assert (TREE_CODE (rhs) == SSA_NAME);
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> +  if (code == POINTER_PLUS_EXPR)
> +    code = PLUS_EXPR;
> +  gcc_assert (TREE_CODE_LENGTH (code) == binary_op
> +           && commutative_tree_code (code));
> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> +  gcc_assert (TREE_CODE (rhs1) == SSA_NAME && TREE_CODE (rhs2) == SSA_NAME);
> +  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> +  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> +  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> +  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> +  dr_vec_info *load1_dr_info = STMT_VINFO_DR_INFO (load1_stmt_info);
> +  dr_vec_info *load2_dr_info = STMT_VINFO_DR_INFO (load2_stmt_info);
> +  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (load1_dr_info->dr), 0);
> +  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (load2_dr_info->dr), 0);
> +
> +  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +    {
> +      std::swap (rhs1, rhs2);
> +      std::swap (var1, var2);
> +    }
> +
> +  tree *init = loop_vinfo->scan_map->get (var1);
> +  gcc_assert (init);
> +
> +  tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +  bool inscan_var_store
> +    = lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var)) != NULL;
> +
> +  unsigned HOST_WIDE_INT nunits;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    gcc_unreachable ();
> +  int units_log2 = exact_log2 (nunits);
> +  gcc_assert (units_log2 > 0);
> +  auto_vec<tree, 16> perms;
> +  perms.quick_grow (units_log2 + 1);
> +  for (int i = 0; i <= units_log2; ++i)
> +    {
> +      unsigned HOST_WIDE_INT j, k;
> +      vec_perm_builder sel (nunits, nunits, 1);
> +      sel.quick_grow (nunits);
> +      if (i == units_log2)
> +     {
> +       for (j = 0; j < nunits; ++j)
> +         sel[j] = nunits - 1;
> +     }
> +      else
> +     {
> +       for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> +         sel[j] = nunits + j;
> +       for (k = 0; j < nunits; ++j, ++k)
> +         sel[j] = k;
> +     }
> +      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> +      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
> +    }
> +
> +  stmt_vec_info prev_stmt_info = NULL;
> +  tree vec_oprnd1 = NULL_TREE;
> +  tree vec_oprnd2 = NULL_TREE;
> +  tree vec_oprnd3 = NULL_TREE;
> +  tree dataref_ptr = unshare_expr (DR_BASE_ADDRESS (dr_info->dr));
> +  tree dataref_offset = build_int_cst (ref_type, 0);
> +  tree bump = vect_get_data_ptr_increment (dr_info, vectype, VMAT_CONTIGUOUS);
> +  tree orig = NULL_TREE;
> +  for (int j = 0; j < ncopies; j++)
> +    {
> +      stmt_vec_info new_stmt_info;
> +      if (j == 0)
> +     {
> +       vec_oprnd1 = vect_get_vec_def_for_operand (*init, stmt_info);
> +       vec_oprnd2 = vect_get_vec_def_for_operand (rhs1, stmt_info);
> +       vec_oprnd3 = vect_get_vec_def_for_operand (rhs2, stmt_info);
> +       orig = vec_oprnd3;
> +     }
> +      else
> +     {
> +       vec_oprnd1 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd1);
> +       vec_oprnd2 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd2);
> +       vec_oprnd3 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd3);
> +       if (!inscan_var_store)
> +         dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +     }
> +
> +      tree v = vec_oprnd2;
> +      for (int i = 0; i < units_log2; ++i)
> +     {
> +       tree new_temp = make_ssa_name (vectype);
> +       gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
> +                                        vec_oprnd1, perms[i]);
> +       new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +       if (prev_stmt_info == NULL)
> +         STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
> +       else
> +         STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +       prev_stmt_info = new_stmt_info;
> +
> +       tree new_temp2 = make_ssa_name (vectype);
> +       g = gimple_build_assign (new_temp2, code, v, new_temp);
> +       new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +       STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +       prev_stmt_info = new_stmt_info;
> +
> +       v = new_temp2;
> +     }
> +
> +      tree new_temp = make_ssa_name (vectype);
> +      gimple *g = gimple_build_assign (new_temp, code, orig, v);
> +      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +      prev_stmt_info = new_stmt_info;
> +
> +      orig = make_ssa_name (vectype);
> +      g = gimple_build_assign (orig, VEC_PERM_EXPR, new_temp, new_temp,
> +                            perms[units_log2]);
> +      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +      prev_stmt_info = new_stmt_info;
> +
> +      if (!inscan_var_store)
> +     {
> +       tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> +                                    dataref_offset);
> +       vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> +       g = gimple_build_assign (data_ref, new_temp);
> +       new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +       STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +       prev_stmt_info = new_stmt_info;
> +     }
> +    }
> +
> +  if (inscan_var_store)
> +    for (int j = 0; j < ncopies; j++)
> +      {
> +     if (j != 0)
> +       dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +
> +     tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> +                                  dataref_offset);
> +     vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> +     gimple *g = gimple_build_assign (data_ref, orig);
> +     stmt_vec_info new_stmt_info
> +       = vect_finish_stmt_generation (stmt_info, g, gsi);
> +     STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +     prev_stmt_info = new_stmt_info;
> +      }
> +  return true;
> +}
> +
> +
>  /* Function vectorizable_store.
>  
>     Check if STMT_INFO defines a non scalar data-ref (array/pointer/structure)
> @@ -6514,6 +6965,13 @@ vectorizable_store (stmt_vec_info stmt_i
>        group_size = vec_num = 1;
>      }
>  
> +  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1 && !vec_stmt)
> +    {
> +      if (!check_scan_store (stmt_info, vectype, rhs_dt, slp, mask,
> +                          memory_access_type))
> +     return false;
> +    }
> +
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
> @@ -6737,6 +7195,8 @@ vectorizable_store (stmt_vec_info stmt_i
>       }
>        return true;
>      }
> +  else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 3)
> +    return vectorizable_scan_store (stmt_info, gsi, vec_stmt, ncopies);
>  
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))++;
> @@ -7162,7 +7622,7 @@ vectorizable_store (stmt_vec_info stmt_i
>         gcc_assert (useless_type_conversion_p (vectype,
>                                                TREE_TYPE (vec_oprnd)));
>         bool simd_lane_access_p
> -         = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> +         = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
>         if (simd_lane_access_p
>             && !loop_masks
>             && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> @@ -8347,7 +8807,7 @@ vectorizable_load (stmt_vec_info stmt_in
>        if (j == 0)
>       {
>         bool simd_lane_access_p
> -         = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> +         = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
>         if (simd_lane_access_p
>             && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
>             && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> --- gcc/tree-vect-data-refs.c.jj      2019-06-13 12:06:17.786472401 +0200
> +++ gcc/tree-vect-data-refs.c 2019-06-14 09:52:14.920718040 +0200
> @@ -3003,6 +3003,13 @@ vect_analyze_data_ref_accesses (vec_info
>             || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
>           break;
>  
> +       /* Different .GOMP_SIMD_LANE calls still give the same lane;
> +          they just hold extra information.  */
> +       if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_a)
> +           && STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_b)
> +           && data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb)) == 0)
> +         break;
> +
>         /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>         HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>         HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> @@ -4101,7 +4108,8 @@ vect_find_stmt_data_reference (loop_p lo
>                         DR_STEP_ALIGNMENT (newdr)
>                           = highest_pow2_factor (step);
>                         /* Mark as simd-lane access.  */
> -                       newdr->aux = (void *)-1;
> +                       tree arg2 = gimple_call_arg (def, 1);
> +                       newdr->aux = (void *) (-1 - tree_to_uhwi (arg2));
>                         free_data_ref (dr);
>                         datarefs->safe_push (newdr);
>                         return opt_result::success ();
> @@ -4210,14 +4218,17 @@ vect_analyze_data_refs (vec_info *vinfo,
>          }
>  
>        /* See if this was detected as SIMD lane access.  */
> -      if (dr->aux == (void *)-1)
> +      if (dr->aux == (void *)-1
> +       || dr->aux == (void *)-2
> +       || dr->aux == (void *)-3)
>       {
>         if (nested_in_vect_loop_p (loop, stmt_info))
>           return opt_result::failure_at (stmt_info->stmt,
>                                          "not vectorized:"
>                                          " data ref analysis failed: %G",
>                                          stmt_info->stmt);
> -       STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
> +       STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info)
> +         = -(uintptr_t) dr->aux;
>       }
>  
>        tree base = get_base_address (DR_REF (dr));
> --- gcc/tree-vectorizer.h.jj  2019-06-13 12:50:31.597926603 +0200
> +++ gcc/tree-vectorizer.h     2019-06-14 16:51:53.155792356 +0200
> @@ -491,6 +491,10 @@ typedef struct _loop_vec_info : public v
>    /* Map of IV base/step expressions to inserted name in the preheader.  */
>    hash_map<tree_operand_hash, tree> *ivexpr_map;
>  
> +  /* Map of OpenMP "omp simd array" scan variables to corresponding
> +     rhs of the store of the initializer.  */
> +  hash_map<tree, tree> *scan_map;
> +
>    /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
>       applied to the loop, i.e., no unrolling is needed, this is 1.  */
>    poly_uint64 slp_unrolling_factor;
> @@ -913,7 +917,7 @@ struct _stmt_vec_info {
>    bool strided_p;
>  
>    /* For both loads and stores.  */
> -  bool simd_lane_access_p;
> +  unsigned simd_lane_access_p : 2;
>  
>    /* Classifies how the load or store is going to be implemented
>       for loop vectorization.  */
> --- gcc/tree-ssa-dce.c.jj     2019-06-13 13:28:36.763153374 +0200
> +++ gcc/tree-ssa-dce.c        2019-06-13 14:20:14.889711910 +0200
> @@ -1339,14 +1339,14 @@ eliminate_unnecessary_stmts (void)
>                 update_stmt (stmt);
>                 release_ssa_name (name);
>  
> -               /* GOMP_SIMD_LANE (unless two argument) or ASAN_POISON
> +               /* GOMP_SIMD_LANE (unless three argument) or ASAN_POISON
>                    without lhs is not needed.  */
>                 if (gimple_call_internal_p (stmt))
>                   switch (gimple_call_internal_fn (stmt))
>                     {
>                     case IFN_GOMP_SIMD_LANE:
> -                     if (gimple_call_num_args (stmt) >= 2
> -                         && !integer_nonzerop (gimple_call_arg (stmt, 1)))
> +                     if (gimple_call_num_args (stmt) >= 3
> +                         && !integer_nonzerop (gimple_call_arg (stmt, 2)))
>                         break;
>                       /* FALLTHRU */
>                     case IFN_ASAN_POISON:
> --- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj        2019-06-14 19:00:40.918765225 +0200
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c   2019-06-14 19:01:43.755798987 +0200
> @@ -0,0 +1,66 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fopenmp-simd" } */
> +
> +#include "tree-vect.h"
> +
> +int r, a[1024], b[1024];
> +
> +__attribute__((noipa)) void
> +foo (int *a, int *b)
> +{
> +  #pragma omp simd reduction (inscan, +:r)
> +  for (int i = 0; i < 1024; i++)
> +    {
> +      r += a[i];
> +      #pragma omp scan inclusive(r)
> +      b[i] = r;
> +    }
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> +  int s = 0;
> +  #pragma omp simd reduction (inscan, +:s)
> +  for (int i = 0; i < 1024; i++)
> +    {
> +      s += 2 * a[i];
> +      #pragma omp scan inclusive(s)
> +      b[i] = s;
> +    }
> +  return s;
> +}
> +
> +int
> +main ()
> +{
> +  int s = 0;
> +  check_vect ();
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = -1;
> +      asm ("" : "+g" (i));
> +    }
> +  foo (a, b);
> +  if (r != 1024 * 1023 / 2)
> +    abort ();
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      s += i;
> +      if (b[i] != s)
> +     abort ();
> +      else
> +     b[i] = 25;
> +    }
> +  if (bar () != 1024 * 1023)
> +    abort ();
> +  s = 0;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      s += 2 * i;
> +      if (b[i] != s)
> +     abort ();
> +    }
> +  return 0;
> +}
> --- gcc/omp-low.c.jj  2019-06-13 13:28:36.611155753 +0200
> +++ gcc/omp-low.c     2019-06-14 18:54:14.976699854 +0200
> @@ -141,6 +141,9 @@ struct omp_context
>    /* True if lower_omp_1 should look up lastprivate conditional in parent
>       context.  */
>    bool combined_into_simd_safelen0;
> +
> +  /* True if there is nested scan context with inclusive clause.  */
> +  bool scan_inclusive;
>  };
>  
>  static splay_tree all_contexts;
> @@ -3329,11 +3332,15 @@ scan_omp_1_stmt (gimple_stmt_iterator *g
>        scan_omp_single (as_a <gomp_single *> (stmt), ctx);
>        break;
>  
> +    case GIMPLE_OMP_SCAN:
> +      if (tree clauses = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> +     if (OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_INCLUSIVE)
> +       ctx->scan_inclusive = true;
> +      /* FALLTHRU */
>      case GIMPLE_OMP_SECTION:
>      case GIMPLE_OMP_MASTER:
>      case GIMPLE_OMP_ORDERED:
>      case GIMPLE_OMP_CRITICAL:
> -    case GIMPLE_OMP_SCAN:
>      case GIMPLE_OMP_GRID_BODY:
>        ctx = new_omp_context (stmt, ctx);
>        scan_omp (gimple_omp_body_ptr (stmt), ctx);
> @@ -3671,6 +3678,7 @@ struct omplow_simd_context {
>    omplow_simd_context () { memset (this, 0, sizeof (*this)); }
>    tree idx;
>    tree lane;
> +  tree lastlane;
>    vec<tree, va_heap> simt_eargs;
>    gimple_seq simt_dlist;
>    poly_uint64_pod max_vf;
> @@ -3682,7 +3690,8 @@ struct omplow_simd_context {
>  
>  static bool
>  lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
> -                           omplow_simd_context *sctx, tree &ivar, tree &lvar)
> +                           omplow_simd_context *sctx, tree &ivar,
> +                           tree &lvar, tree *rvar = NULL)
>  {
>    if (known_eq (sctx->max_vf, 0U))
>      {
> @@ -3738,7 +3747,27 @@ lower_rec_simd_input_clauses (tree new_v
>       = tree_cons (get_identifier ("omp simd array"), NULL,
>                    DECL_ATTRIBUTES (avar));
>        gimple_add_tmp_var (avar);
> -      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
> +      tree iavar = avar;
> +      if (rvar)
> +     {
> +       /* For inscan reductions, create another array temporary,
> +          which will hold the reduced value.  */
> +       iavar = create_tmp_var_raw (atype);
> +       if (TREE_ADDRESSABLE (new_var))
> +         TREE_ADDRESSABLE (iavar) = 1;
> +       DECL_ATTRIBUTES (iavar)
> +         = tree_cons (get_identifier ("omp simd array"), NULL,
> +                      tree_cons (get_identifier ("omp simd inscan"), NULL,
> +                                 DECL_ATTRIBUTES (iavar)));
> +       gimple_add_tmp_var (iavar);
> +       ctx->cb.decl_map->put (avar, iavar);
> +       if (sctx->lastlane == NULL_TREE)
> +         sctx->lastlane = create_tmp_var (unsigned_type_node);
> +       *rvar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar,
> +                       sctx->lastlane, NULL_TREE, NULL_TREE);
> +       TREE_THIS_NOTRAP (*rvar) = 1;
> +     }
> +      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar, sctx->idx,
>                    NULL_TREE, NULL_TREE);
>        lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
>                    NULL_TREE, NULL_TREE);
> @@ -3814,7 +3843,7 @@ lower_rec_input_clauses (tree clauses, g
>    omplow_simd_context sctx = omplow_simd_context ();
>    tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
>    tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
> -  gimple_seq llist[3] = { };
> +  gimple_seq llist[4] = { };
>    tree nonconst_simd_if = NULL_TREE;
>  
>    copyin_seq = NULL;
> @@ -5324,12 +5353,32 @@ lower_rec_input_clauses (tree clauses, g
>                     new_vard = TREE_OPERAND (new_var, 0);
>                     gcc_assert (DECL_P (new_vard));
>                   }
> +               tree rvar = NULL_TREE, *rvarp = NULL;
> +               if (is_simd
> +                   && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +                   && OMP_CLAUSE_REDUCTION_INSCAN (c))
> +                 rvarp = &rvar;
>                 if (is_simd
>                     && lower_rec_simd_input_clauses (new_var, ctx, &sctx,
> -                                                    ivar, lvar))
> +                                                    ivar, lvar, rvarp))
>                   {
> +                   if (new_vard != new_var)
> +                     {
> +                       SET_DECL_VALUE_EXPR (new_vard,
> +                                            build_fold_addr_expr (lvar));
> +                       DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> +                     }
> +
>                     tree ref = build_outer_var_ref (var, ctx);
>  
> +                   if (rvarp)
> +                     {
> +                       gimplify_assign (ivar, ref, &llist[0]);
> +                       ref = build_outer_var_ref (var, ctx);
> +                       gimplify_assign (ref, rvar, &llist[3]);
> +                       break;
> +                     }
> +
>                     gimplify_assign (unshare_expr (ivar), x, &llist[0]);
>  
>                     if (sctx.is_simt)
> @@ -5346,12 +5395,6 @@ lower_rec_input_clauses (tree clauses, g
>                     ref = build_outer_var_ref (var, ctx);
>                     gimplify_assign (ref, x, &llist[1]);
>  
> -                   if (new_vard != new_var)
> -                     {
> -                       SET_DECL_VALUE_EXPR (new_vard,
> -                                            build_fold_addr_expr (lvar));
> -                       DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> -                     }
>                   }
>                 else
>                   {
> @@ -5456,14 +5499,23 @@ lower_rec_input_clauses (tree clauses, g
>    if (sctx.lane)
>      {
>        gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
> -                                           1 + (nonconst_simd_if != NULL),
> -                                           uid, nonconst_simd_if);
> +                                           2 + (nonconst_simd_if != NULL),
> +                                           uid, integer_zero_node,
> +                                           nonconst_simd_if);
>        gimple_call_set_lhs (g, sctx.lane);
>        gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
>        gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
>        g = gimple_build_assign (sctx.lane, INTEGER_CST,
>                              build_int_cst (unsigned_type_node, 0));
>        gimple_seq_add_stmt (ilist, g);
> +      if (sctx.lastlane)
> +     {
> +       g = gimple_build_call_internal (IFN_GOMP_SIMD_LAST_LANE,
> +                                       2, uid, sctx.lane);
> +       gimple_call_set_lhs (g, sctx.lastlane);
> +       gimple_seq_add_stmt (dlist, g);
> +       gimple_seq_add_seq (dlist, llist[3]);
> +     }
>        /* Emit reductions across SIMT lanes in log_2(simt_vf) steps.  */
>        if (llist[2])
>       {
> @@ -5865,6 +5917,7 @@ lower_lastprivate_clauses (tree clauses,
>                 new_var = build4 (ARRAY_REF, TREE_TYPE (val),
>                                   TREE_OPERAND (val, 0), lastlane,
>                                   NULL_TREE, NULL_TREE);
> +               TREE_THIS_NOTRAP (new_var) = 1;
>               }
>           }
>         else if (maybe_simt)
> @@ -8371,6 +8424,108 @@ lower_omp_ordered (gimple_stmt_iterator
>  }
>  
>  
> +/* Expand code for an OpenMP scan directive and the structured block
> +   before the scan directive.  */
> +
> +static void
> +lower_omp_scan (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> +{
> +  gimple *stmt = gsi_stmt (*gsi_p);
> +  bool has_clauses
> +    = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)) != NULL;
> +  tree lane = NULL_TREE;
> +  gimple_seq before = NULL;
> +  omp_context *octx = ctx->outer;
> +  gcc_assert (octx);
> +  bool input_phase = has_clauses ^ octx->scan_inclusive;
> +  if (gimple_code (octx->stmt) == GIMPLE_OMP_FOR
> +      && (gimple_omp_for_kind (octx->stmt) & GF_OMP_FOR_SIMD)
> +      && !gimple_omp_for_combined_into_p (octx->stmt)
> +      && octx->scan_inclusive)
> +    {
> +      if (tree c = omp_find_clause (gimple_omp_for_clauses (octx->stmt),
> +                                 OMP_CLAUSE__SIMDUID_))
> +     {
> +       tree uid = OMP_CLAUSE__SIMDUID__DECL (c);
> +       lane = create_tmp_var (unsigned_type_node);
> +       tree t = build_int_cst (integer_type_node, 1 + !input_phase);
> +       gimple *g
> +         = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 2, uid, t);
> +       gimple_call_set_lhs (g, lane);
> +       gimple_seq_add_stmt (&before, g);
> +     }
> +      for (tree c = gimple_omp_for_clauses (octx->stmt);
> +        c; c = OMP_CLAUSE_CHAIN (c))
> +     if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +         && OMP_CLAUSE_REDUCTION_INSCAN (c))
> +       {
> +         tree var = OMP_CLAUSE_DECL (c);
> +         tree new_var = lookup_decl (var, octx);
> +         tree val = new_var;
> +         tree var2 = NULL_TREE;
> +         if (DECL_HAS_VALUE_EXPR_P (new_var))
> +           {
> +             val = DECL_VALUE_EXPR (new_var);
> +             if (TREE_CODE (val) == ARRAY_REF
> +                 && VAR_P (TREE_OPERAND (val, 0)))
> +               {
> +                 tree v = TREE_OPERAND (val, 0);
> +                 if (lookup_attribute ("omp simd array",
> +                                       DECL_ATTRIBUTES (v)))
> +                   {
> +                     val = unshare_expr (val);
> +                     TREE_OPERAND (val, 1) = lane;
> +                     if (!input_phase)
> +                       {
> +                         var2 = lookup_decl (v, octx);
> +                         var2 = build4 (ARRAY_REF, TREE_TYPE (val),
> +                                        var2, lane, NULL_TREE, NULL_TREE);
> +                         TREE_THIS_NOTRAP (var2) = 1;
> +                       }
> +                     else
> +                       var2 = val;
> +                   }
> +               }
> +           }
> +         if (var2 == NULL_TREE)
> +           gcc_unreachable ();
> +         if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> +           {
> +             gcc_unreachable ();
> +           }
> +         else
> +           {
> +             if (input_phase)
> +               {
> +                 /* Input phase.  Set val to the initializer before
> +                    the body.  */
> +                 tree x = omp_reduction_init (c, TREE_TYPE (new_var));
> +                 gimplify_assign (val, x, &before);
> +               }
> +             else
> +               {
> +                 /* Scan phase.  */
> +                 enum tree_code code = OMP_CLAUSE_REDUCTION_CODE (c);
> +                 if (code == MINUS_EXPR)
> +                   code = PLUS_EXPR;
> +
> +                 tree x = build2 (code, TREE_TYPE (var2),
> +                                  unshare_expr (var2), unshare_expr (val));
> +                 gimplify_assign (unshare_expr (var2), x, &before);
> +                 gimplify_assign (val, var2, &before);
> +               }
> +           }
> +       }
> +    }
> +  else if (has_clauses)
> +    sorry_at (gimple_location (stmt),
> +           "%<#pragma omp scan%> not supported yet");
> +  gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> +  gsi_insert_seq_after (gsi_p, before, GSI_SAME_STMT);
> +  gsi_replace (gsi_p, gimple_build_nop (), true);
> +}
> +
> +
>  /* Gimplify a GIMPLE_OMP_CRITICAL statement.  This is a relatively simple
>     substitution of a couple of function calls.  But in the NAMED case,
>     requires that languages coordinate a symbol name.  It is therefore
> @@ -10843,11 +10998,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p
>      case GIMPLE_OMP_SCAN:
>        ctx = maybe_lookup_ctx (stmt);
>        gcc_assert (ctx);
> -      gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> -      if (gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> -     sorry_at (gimple_location (stmt),
> -               "%<#pragma omp scan%> not supported yet");
> -      gsi_replace (gsi_p, gimple_build_nop (), true);
> +      lower_omp_scan (gsi_p, ctx);
>        break;
>      case GIMPLE_OMP_CRITICAL:
>        ctx = maybe_lookup_ctx (stmt);
> --- gcc/tree-vect-loop.c.jj   2019-06-13 13:28:36.581156223 +0200
> +++ gcc/tree-vect-loop.c      2019-06-14 14:53:10.734986707 +0200
> @@ -824,6 +824,7 @@ _loop_vec_info::_loop_vec_info (struct l
>      peeling_for_alignment (0),
>      ptr_mask (0),
>      ivexpr_map (NULL),
> +    scan_map (NULL),
>      slp_unrolling_factor (1),
>      single_scalar_iteration_cost (0),
>      vectorizable (false),
> @@ -863,8 +864,8 @@ _loop_vec_info::_loop_vec_info (struct l
>         gimple *stmt = gsi_stmt (si);
>         gimple_set_uid (stmt, 0);
>         add_stmt (stmt);
> -       /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments, the
> -          second argument is the #pragma omp simd if (x) condition, when 0,
> +       /* If .GOMP_SIMD_LANE call for the current loop has 3 arguments, the
> +          third argument is the #pragma omp simd if (x) condition, when 0,
>            loop shouldn't be vectorized, when non-zero constant, it should
>            be vectorized normally, otherwise versioned with vectorized loop
>            done if the condition is non-zero at runtime.  */
> @@ -872,12 +873,12 @@ _loop_vec_info::_loop_vec_info (struct l
>             && is_gimple_call (stmt)
>             && gimple_call_internal_p (stmt)
>             && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
> -           && gimple_call_num_args (stmt) >= 2
> +           && gimple_call_num_args (stmt) >= 3
>             && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
>             && (loop_in->simduid
>                 == SSA_NAME_VAR (gimple_call_arg (stmt, 0))))
>           {
> -           tree arg = gimple_call_arg (stmt, 1);
> +           tree arg = gimple_call_arg (stmt, 2);
>             if (integer_zerop (arg) || TREE_CODE (arg) == SSA_NAME)
>               simd_if_cond = arg;
>             else
> @@ -959,6 +960,7 @@ _loop_vec_info::~_loop_vec_info ()
>  
>    release_vec_loop_masks (&masks);
>    delete ivexpr_map;
> +  delete scan_map;
>  
>    loop->aux = NULL;
>  }
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
