Re: gomp-nvptx branch - middle-end changes

2016-11-22 Thread Jakub Jelinek
On Tue, Nov 22, 2016 at 08:25:45PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> > with one nit below that needs immediate action and the rest can be resolved
> > incrementally.  I'd like to check in afterwards the attached patch, at least
> > for now, so that non-offloaded SIMD code is less affected.
> 
> Testing your patch revealed an issue in Fortran offloaded code; types of
> boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
> runs) don't match.  I'm attaching a revised patch that addresses it by simply
> using an integer type (there are also two other minor issues, below).

Ok.

> > Please change this into
> > (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> > for now, so that we don't waste compile time even when clearly it
> > isn't needed, and incrementally change the inliner to propagate
> > the property.
> 
> As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
> needs to accept ACCEL_COMPILER.  Applied like this:
> 
> +  virtual bool gate (function *ARG_UNUSED (fun))
> +    {
> +      /* FIXME: this should use PROP_gimple_lomp_dev.  */
> +#ifdef ACCEL_COMPILER
> +      return true;
> +#else
> +      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
> +#endif
> +    }

Makes sense.

> > @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
> >if (max_vf == 0)
> >  {
> >max_vf = omp_max_vf ();
> > +  if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> > +  OMP_CLAUSE__SIMT_))
> > +   {
> > + int max_simt = omp_max_simt_vf ();
> > + max_vf = MAX (max_vf, max_simt);
> > +   }
> 
> I don't believe here there's a need to take a maximum.  Cloning the loop upfront
> means that SIMD+SIMT styles are not going to mix within a single loop.  I've
> simplified it to an if-then-else in the revised patch.

Ok.

> > @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
> >bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
> >for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
> >  offloaded = rgn->type == GIMPLE_OMP_TARGET;
> > -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> > +  bool is_simt
> > += (offloaded
> > +   && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> > +  OMP_CLAUSE__SIMT_)
> > +   && safelen_int > 1);
> 
> Here computation of 'offloaded' is no longer needed, because presence of
> OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.
> 
> I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
> Not sure if that's intentional -- as I understand it's possible to look at the
> enclosing context's clauses because 'omp ordered' must be closely nested with

Right now omp ordered simd for non-simt basically causes vf 1, because the
vectorizer isn't ready for having non-vectorized portions of code within
vectorized loop.

> the corresponding loop.  I've added a FIXME in the patch.

Ok for trunk, thanks.

Jakub


Re: gomp-nvptx branch - middle-end changes

2016-11-22 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> with one nit below that needs immediate action and the rest can be resolved
> incrementally.  I'd like to check in afterwards the attached patch, at least
> for now, so that non-offloaded SIMD code is less affected.

Testing your patch revealed an issue in Fortran offloaded code; types of
boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
runs) don't match.  I'm attaching a revised patch that addresses it by simply
using an integer type (there are also two other minor issues, below).

> Please change this into
> (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> for now, so that we don't waste compile time even when clearly it
> isn't needed, and incrementally change the inliner to propagate
> the property.

As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
needs to accept ACCEL_COMPILER.  Applied like this:

+  virtual bool gate (function *ARG_UNUSED (fun))
+    {
+      /* FIXME: this should use PROP_gimple_lomp_dev.  */
+#ifdef ACCEL_COMPILER
+      return true;
+#else
+      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
+#endif
+    }
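
For illustration only, the gating logic above can be modelled outside GCC
roughly as follows.  This is a stand-alone sketch, not the committed code:
device_lower_gate and its parameters are hypothetical, and ENABLE_OFFLOADING
is defined locally on the assumption of a host compiler configured with
offloading.  Compiling the sketch with -DACCEL_COMPILER models the
offloading compiler, where the gate is unconditionally true.

```cpp
// Stand-alone model of the pass gate discussed above; not GCC code.
// ENABLE_OFFLOADING mirrors the configure-time macro of the same name.
#ifndef ENABLE_OFFLOADING
#define ENABLE_OFFLOADING 1   // assumption: host compiler built with offloading
#endif

// flag_openmp and in_lto_p are globals in GCC; here they are passed in
// as parameters (hypothetical signature) so the sketch is self-contained.
bool
device_lower_gate (bool flag_openmp, bool in_lto_p)
{
#ifdef ACCEL_COMPILER
  // The offloading compiler always needs the pass.
  (void) flag_openmp;
  (void) in_lto_p;
  return true;
#else
  // Host compiler: run only if offloading is configured and the
  // translation unit can contain OpenMP code at all.
  return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
#endif
}
```

Without ACCEL_COMPILER, the pass is skipped for a plain compilation that
uses neither -fopenmp nor LTO, which is the compile-time saving requested
in the review.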


In your GOMP_USE_SIMT() patch,

> @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
>if (max_vf == 0)
>  {
>max_vf = omp_max_vf ();
> +  if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> +OMP_CLAUSE__SIMT_))
> + {
> +   int max_simt = omp_max_simt_vf ();
> +   max_vf = MAX (max_vf, max_simt);
> + }

I don't believe here there's a need to take a maximum.  Cloning the loop upfront
means that SIMD+SIMT styles are not going to mix within a single loop.  I've
simplified it to an if-then-else in the revised patch.
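
A minimal sketch of that if-then-else, assuming the clause lookup and the
two target queries are reduced to plain parameters (choose_max_vf and its
arguments are hypothetical names, not part of the patch):

```cpp
// has_simt_clause stands in for the find_omp_clause (...,
// OMP_CLAUSE__SIMT_) test; simd_max_vf and simt_max_vf stand in for the
// results of omp_max_vf () and omp_max_simt_vf ().
int
choose_max_vf (bool has_simt_clause, int simd_max_vf, int simt_max_vf)
{
  // The loop has been cloned upfront, so each copy is either SIMT or
  // SIMD, never both: pick the matching width instead of the maximum.
  if (has_simt_clause)
    return simt_max_vf;
  else
    return simd_max_vf;
}
```

The difference from MAX shows up when the SIMD width exceeds the SIMT
width: with a SIMD width of 64 and a SIMT width of 32, the SIMT copy of the
loop gets 32 rather than 64.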

> @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
>bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
>for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
>  offloaded = rgn->type == GIMPLE_OMP_TARGET;
> -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> +  bool is_simt
> += (offloaded
> +   && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> +OMP_CLAUSE__SIMT_)
> +   && safelen_int > 1);

Here computation of 'offloaded' is no longer needed, because presence of
OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.

I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
Not sure if that's intentional -- as I understand it's possible to look at the
enclosing context's clauses because 'omp ordered' must be closely nested with
the corresponding loop.  I've added a FIXME in the patch.

Alexander

	* internal-fn.c (expand_GOMP_USE_SIMT): New function.
	* tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands.
	(omp_clause_code_name): Add _simt_ name.
	(walk_tree_1): Handle OMP_CLAUSE__SIMT_.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_.
	(scan_omp_simd): New function.
	(scan_omp_1_stmt): Use it in target regions if needed.
	(omp_max_vf): Don't max with omp_max_simt_vf.
	(lower_rec_simd_input_clauses): Use omp_max_simt_vf if
	OMP_CLAUSE__SIMT_ is present.
	(lower_rec_input_clauses): Compute maybe_simt from presence of
	OMP_CLAUSE__SIMT_.
	(lower_lastprivate_clauses): Likewise.
	(expand_omp_simd): Likewise.
	(execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT.
	* internal-fn.def (GOMP_USE_SIMT): New internal function.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 6cd8522..b1dbc98 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -158,6 +158,14 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_USE_SIMT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
without SIMT execution this should be expanded in omp_device_lower pass.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index f055230..9a03e17 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -141,6 +141,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6c52bff..eab0af5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -278,6 

Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 12:28:16PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> 
> > On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > > [...]
> > > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > > best place to do that is in omp expansion, not gimplification)
> > > 
> > > Sorry, I couldn't find a good way to implement that during omp expansion.  The
> > > reason I went for gimplification is automatic discovery of sharing clauses -
> > > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> > > gimplifier's help.  Does this sound sensible?
> > 
> > Sure, for discovery of needed sharing clauses the gimplifier has the right
> > infrastructure.  But that doesn't mean you can't add those clauses at
> > gimplification time and do the outlining at omp expansion time.
> > That is what is done for omp parallel, task etc. as well.  If the standard
> > OpenMP clauses can't serve that purpose, there is always the possibility of
> > adding further internal clauses, that would e.g. be only considered for the
> > SIMT stuff.  For the outlining, our current infrastructure really wants to
> > have CFG etc., something you don't have at gimplification time.
> 
> Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
> a parallel region with an artificial _simtreg_ clause, transforming
> 
>   #pragma omp simd
>   for (...)
> 
> into
> 
>   #pragma omp parallel _simtreg_
>     #pragma omp simd
>     for (...)
> 
> and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
> emit a direct call rather than an invocation of GOMP_parallel.

Well, I meant keep #pragma omp simd as is, just add some data-sharing-like
clauses _simt_shared_(x) or whatever you need, then the omplower versioning
patch I've posted could e.g. drop those _simt_shared_ or whatever else you
need clauses for the omp simd without _simt_ clause, omp lowering then would
do whatever is needed for those _simt_shared_ clauses and finally omp
expansion would outline it.  Adding omp parallel around the omp simd is just
weird, it has nothing to do with omp parallel.

Jakub


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:

> On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > [...]
> > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > best place to do that is in omp expansion, not gimplification)
> > 
> > Sorry, I couldn't find a good way to implement that during omp expansion.  The
> > reason I went for gimplification is automatic discovery of sharing clauses -
> > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> > gimplifier's help.  Does this sound sensible?
> 
> Sure, for discovery of needed sharing clauses the gimplifier has the right
> infrastructure.  But that doesn't mean you can't add those clauses at
> gimplification time and do the outlining at omp expansion time.
> That is what is done for omp parallel, task etc. as well.  If the standard
> OpenMP clauses can't serve that purpose, there is always the possibility of
> adding further internal clauses, that would e.g. be only considered for the
> SIMT stuff.  For the outlining, our current infrastructure really wants to
> have CFG etc., something you don't have at gimplification time.

Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
a parallel region with an artificial _simtreg_ clause, transforming

  #pragma omp simd
  for (...)

into

  #pragma omp parallel _simtreg_
    #pragma omp simd
    for (...)

and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
emit a direct call rather than an invocation of GOMP_parallel.

(a few days ago I sent you a patch privately that implements the above)

Thanks.
Alexander


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> [...]
> > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > best place to do that is in omp expansion, not gimplification)
> 
> Sorry, I couldn't find a good way to implement that during omp expansion.  The
> reason I went for gimplification is automatic discovery of sharing clauses -
> I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> gimplifier's help.  Does this sound sensible?

Sure, for discovery of needed sharing clauses the gimplifier has the right
infrastructure.  But that doesn't mean you can't add those clauses at
gimplification time and do the outlining at omp expansion time.
That is what is done for omp parallel, task etc. as well.  If the standard
OpenMP clauses can't serve that purpose, there is always the possibility of
adding further internal clauses, that would e.g. be only considered for the
SIMT stuff.  For the outlining, our current infrastructure really wants to
have CFG etc., something you don't have at gimplification time.

Jakub


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Alexander Monakov
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
[...]
> the intended outlining of SIMT regions for PTX offloading done (IMHO the
> best place to do that is in omp expansion, not gimplification)

Sorry, I couldn't find a good way to implement that during omp expansion.  The
reason I went for gimplification is automatic discovery of sharing clauses -
I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
gimplifier's help.  Does this sound sensible?

Thanks.
Alexander


Re: gomp-nvptx branch - middle-end changes

2016-11-11 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 08:12:27PM +0300, Alexander Monakov wrote:
> gcc/
>   * internal-fn.c (expand_GOMP_SIMT_LANE): New.
>   (expand_GOMP_SIMT_VF): New.
>   (expand_GOMP_SIMT_LAST_LANE): New.
>   (expand_GOMP_SIMT_ORDERED_PRED): New.
>   (expand_GOMP_SIMT_VOTE_ANY): New.
>   (expand_GOMP_SIMT_XCHG_BFLY): New.
>   (expand_GOMP_SIMT_XCHG_IDX): New.
>   * internal-fn.def (GOMP_SIMT_LANE): New.
>   (GOMP_SIMT_VF): New.
>   (GOMP_SIMT_LAST_LANE): New.
>   (GOMP_SIMT_ORDERED_PRED): New.
>   (GOMP_SIMT_VOTE_ANY): New.
>   (GOMP_SIMT_XCHG_BFLY): New.
>   (GOMP_SIMT_XCHG_IDX): New.
>   * omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
>   (create_omp_child_function): ...here.  Set "omp target entrypoint"
>   or "omp declare target" attribute based on is_gimple_omp_offloaded.
>   (omp_max_simt_vf): New.  Use it...
>   (omp_max_vf): ...here.
>   (lower_rec_input_clauses): Add reduction lowering for SIMT execution.
>   (lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
>   (lower_omp_ordered): Likewise, for "ordered" lowering.
>   (expand_omp_simd): Add SIMT transforms.
>   (pass_data_lower_omp): Add PROP_gimple_lomp_dev.
>   (execute_omp_device_lower): New.
>   (pass_data_omp_device_lower): New.
>   (pass_omp_device_lower): New pass.
>   (make_pass_omp_device_lower): New.
>   * passes.def (pass_omp_device_lower): Position new pass.
>   * tree-pass.h (PROP_gimple_lomp_dev): Define.
>   (make_pass_omp_device_lower): Declare.

Ok for trunk, once the needed corresponding config/nvptx bits are committed,
with one nit below that needs immediate action and the rest can be resolved
incrementally.  I'd like to check in afterwards the attached patch, at least
for now, so that non-offloaded SIMD code is less affected.  Once you have
the intended outlining of SIMT regions for PTX offloading done (IMHO the
best place to do that is in omp expansion, not gimplification), you can
either base it on that, or revert and do earlier.

> +
> +/* Return maximum SIMT width if offloading may target SIMT hardware.  */
> +
> +static int
> +omp_max_simt_vf (void)
> +{
> +  if (!optimize)
> +return 0;
> +  if (ENABLE_OFFLOADING)
> +for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c; )
> +  {
> + if (!strncmp (c, "nvptx", strlen ("nvptx")))
> +   return 32;
> + else if ((c = strchr (c, ',')))
> +   c++;
> +  }
> +  return 0;
> +}

As discussed privately, this means one has to manually set OFFLOAD_TARGET_NAMES
in the environment when invoking ./cc1 or ./cc1plus in order to match ./gcc -B ./
etc. behavior.  I think it would be better to change the driver so that
it sets OFFLOAD_TARGET_NAMES= in the environment when ENABLE_OFFLOADING, but
-foffload option is used to disable all offloading and then in this function
use the configured-in offloading targets if ENABLE_OFFLOADING and
OFFLOAD_TARGET_NAMES is not in the environment.  Can be done incrementally.
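
A rough sketch of the suggested behavior, under stated assumptions: the
environment value and the configured-in target list are passed in as plain
strings (max_simt_vf, list_has_nvptx and the configured_targets parameter
are hypothetical), the scan loop is copied from the quoted
omp_max_simt_vf, and the !optimize and ENABLE_OFFLOADING checks are left
out for brevity.

```cpp
#include <cstring>

// Scan a comma-separated target list for an entry starting with "nvptx".
// Same loop shape as in the quoted omp_max_simt_vf; a null list matches
// nothing.
static bool
list_has_nvptx (const char *names)
{
  for (const char *c = names; c; )
    {
      if (!std::strncmp (c, "nvptx", std::strlen ("nvptx")))
        return true;
      else if ((c = std::strchr (c, ',')))
        c++;
    }
  return false;
}

// env_value: what getenv ("OFFLOAD_TARGET_NAMES") returned, or nullptr if
// the variable is not set.  configured_targets: hypothetical stand-in for
// the list of offload targets the compiler was configured with.
int
max_simt_vf (const char *env_value, const char *configured_targets)
{
  // An empty but present variable (as the driver would set it when
  // offloading is disabled) wins over the configured-in list.
  const char *names = env_value ? env_value : configured_targets;
  return list_has_nvptx (names) ? 32 : 0;
}
```

With this shape, running cc1 directly with no OFFLOAD_TARGET_NAMES in the
environment falls back to the configured-in list, while an empty value set
by the driver yields 0.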

> +
>  /* Return maximum possible vectorization factor for the target.  */
>  
>  static int
> @@ -4277,16 +4306,18 @@ omp_max_vf (void)
>|| global_options_set.x_flag_tree_vectorize)))
>  return 1;
>  
> +  int vf = 1;
>int vs = targetm.vectorize.autovectorize_vector_sizes ();
>if (vs)
> +vf = 1 << floor_log2 (vs);
> +  else
>  {
> -  vs = 1 << floor_log2 (vs);
> -  return vs;
> +  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> +  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> + vf = GET_MODE_NUNITS (vqimode);
>  }
> -  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> -  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> -return GET_MODE_NUNITS (vqimode);
> -  return 1;
> +  int svf = omp_max_simt_vf ();
> +  return MAX (vf, svf);

Increasing the vf even for host in non-offloaded regions is undesirable.
Can be partly solved by the attached patch I'm planning to apply
incrementally, the other part is for the simd modifier of schedule clause,
there I think what we want is use conditional expression (GOMP_USE_SIMT () ?
omp_max_simt_vf () : omp_max_vf).  I'll try to handle the schedule clause
later.

> +class pass_omp_device_lower : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_device_lower (gcc::context *ctxt)
> +: gimple_opt_pass (pass_data_omp_device_lower, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +{
> +  /* FIXME: inlining does not propagate the lomp_dev property.  */
> +  return 1 || !(fun->curr_properties & PROP_gimple_lomp_dev);

Please change this into
(ENABLE_OFFLOADING && (flag_openmp || in_lto))
for now, so that we don't waste compile time even when clearly it
isn't needed, and incrementally change the inliner to propagate
the property.

Jakub

2016-11-11  Jakub Jelinek  

* internal-fn.c (expand_GOMP_USE

Re: gomp-nvptx branch - middle-end changes

2016-11-10 Thread Alexander Monakov
gcc/
* internal-fn.c (expand_GOMP_SIMT_LANE): New.
(expand_GOMP_SIMT_VF): New.
(expand_GOMP_SIMT_LAST_LANE): New.
(expand_GOMP_SIMT_ORDERED_PRED): New.
(expand_GOMP_SIMT_VOTE_ANY): New.
(expand_GOMP_SIMT_XCHG_BFLY): New.
(expand_GOMP_SIMT_XCHG_IDX): New.
* internal-fn.def (GOMP_SIMT_LANE): New.
(GOMP_SIMT_VF): New.
(GOMP_SIMT_LAST_LANE): New.
(GOMP_SIMT_ORDERED_PRED): New.
(GOMP_SIMT_VOTE_ANY): New.
(GOMP_SIMT_XCHG_BFLY): New.
(GOMP_SIMT_XCHG_IDX): New.
* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
(create_omp_child_function): ...here.  Set "omp target entrypoint"
or "omp declare target" attribute based on is_gimple_omp_offloaded.
(omp_max_simt_vf): New.  Use it...
(omp_max_vf): ...here.
(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
(lower_omp_ordered): Likewise, for "ordered" lowering.
(expand_omp_simd): Add SIMT transforms.
(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
(execute_omp_device_lower): New.
(pass_data_omp_device_lower): New.
(pass_omp_device_lower): New pass.
(make_pass_omp_device_lower): New.
* passes.def (pass_omp_device_lower): Position new pass.
* tree-pass.h (PROP_gimple_lomp_dev): Define.
(make_pass_omp_device_lower): Declare.
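
For readers unfamiliar with the butterfly exchange that GOMP_SIMT_XCHG_BFLY
exposes (source lane index is destination lane index XOR the given offset),
here is a host-side simulation of how such an exchange can drive a sum
reduction across lanes.  It is only a sketch of the general technique, not
the lowering this patch emits: kLanes, xchg_bfly and butterfly_sum are
hypothetical names, and the halving order of offsets is one common choice.

```cpp
#include <array>
#include <cstddef>

// Assumption: 32 lanes, matching the width omp_max_simt_vf returns for nvptx.
constexpr std::size_t kLanes = 32;

// One butterfly exchange step: lane i receives the value of lane i ^ offset.
std::array<int, kLanes>
xchg_bfly (const std::array<int, kLanes> &v, std::size_t offset)
{
  std::array<int, kLanes> out{};
  for (std::size_t i = 0; i < kLanes; i++)
    out[i] = v[i ^ offset];
  return out;
}

// Sum reduction: after log2 (kLanes) exchange-and-add steps every lane
// holds the sum of all lanes' original values.
std::array<int, kLanes>
butterfly_sum (std::array<int, kLanes> v)
{
  for (std::size_t offset = kLanes / 2; offset > 0; offset /= 2)
    {
      std::array<int, kLanes> other = xchg_bfly (v, offset);
      for (std::size_t i = 0; i < kLanes; i++)
        v[i] += other[i];
    }
  return v;
}
```

Because every lane ends up with the full result, no separate broadcast step
is needed after the reduction.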

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index cbee97e..fd1cd8b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -157,6 +157,132 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
+   without SIMT execution this should be expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (targetm.have_omp_simt_lane ());
+  emit_insn (targetm.gen_omp_simt_lane (target));
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_VF (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+/* Lane index of the first SIMT lane that supplies a non-zero argument.
+   This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
+   lane that executed the last iteration for handling OpenMP lastprivate.  */
+
+static void
+expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_last_lane ());
+  expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops);
+}
+
+/* Non-transparent predicate used in SIMT lowering of OpenMP "ordered".  */
+
+static void
+expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx ctr = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], ctr, mode);
+  gcc_assert (targetm.have_omp_simt_ordered ());
+  expand_insn (targetm.code_for_omp_simt_ordered, 2, ops);
+}
+
+/* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if
+   any lane supplies a non-zero argument.  */
+
+static void
+expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_vote_any ());
+  expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops);
+}
+
+/* Exchange between SIMT lanes with a "butterfly" pattern: source lane index
+   is destination lane index XOR given offset.  */
+
+static void
+expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx src = expand_normal (gimple_call_arg (stmt, 0));
+  rtx idx = expand_normal (gimple_call_arg (stmt, 1));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_op