Re: gomp-nvptx branch - middle-end changes
On Tue, Nov 22, 2016 at 08:25:45PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> > with one nit below that needs immediate action and the rest can be resolved
> > incrementally. I'd like to check in afterwards the attached patch, at least
> > for now, so that non-offloaded SIMD code is less affected.
>
> Testing your patch revealed an issue in Fortran offloaded code; types of
> boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
> runs) don't match. I'm attaching a revised patch that addresses it by simply
> using an integer type (there are also two other minor issues, below).

Ok.

> > Please change this into
> > (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> > for now, so that we don't waste compile time even when clearly it
> > isn't needed, and incrementally change the inliner to propagate
> > the property.
>
> As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
> needs to accept ACCEL_COMPILER. Applied like this:
>
> +  virtual bool gate (function *ARG_UNUSED (fun))
> +    {
> +      /* FIXME: this should use PROP_gimple_lomp_dev.  */
> +#ifdef ACCEL_COMPILER
> +      return true;
> +#else
> +      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
> +#endif
> +    }

Makes sense.

> > @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
> >    if (max_vf == 0)
> >      {
> >        max_vf = omp_max_vf ();
> > +      if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> > +                           OMP_CLAUSE__SIMT_))
> > +        {
> > +          int max_simt = omp_max_simt_vf ();
> > +          max_vf = MAX (max_vf, max_simt);
> > +        }
>
> I don't believe here there's a need to take a maximum. Cloning the loop upfront
> means that SIMD+SIMT styles are not going to mix within a single loop. I've
> simplified it to an if-then-else in the revised patch.

Ok.

> > @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
> >    bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
> >    for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
> >      offloaded = rgn->type == GIMPLE_OMP_TARGET;
> > -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> > +  bool is_simt
> > +    = (offloaded
> > +       && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> > +                           OMP_CLAUSE__SIMT_)
> > +       && safelen_int > 1);
>
> Here computation of 'offloaded' is no longer needed, because presence of
> OMP_CLAUSE__SIMT_ would imply that. Removed in the revised patch.
>
> I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
> Not sure if that's intentional -- as I understand it's possible to look at the
> enclosing context's clauses because 'omp ordered' must be closely nested with

Right now omp ordered simd for non-simt basically causes vf 1, because the
vectorizer isn't ready for having non-vectorized portions of code within
vectorized loop.

> the corresponding loop. I've added a FIXME in the patch.

Ok for trunk, thanks.

Jakub
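
For readers following the max_vf exchange above, here is a minimal standalone sketch of the "if-then-else instead of MAX" idea. It is not the code from the revised patch: choose_max_vf and its parameters are hypothetical names, and the two widths are passed in as plain integers rather than obtained from omp_max_vf () and omp_max_simt_vf ().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.  Once the loop has been cloned
   into a SIMD copy and a SIMT copy, each copy uses exactly one style,
   so the width is selected rather than maximized.  */
static int
choose_max_vf (bool has_simt_clause, int simd_vf, int simt_vf)
{
  assert (simd_vf >= 1 && simt_vf >= 0);
  if (has_simt_clause)
    return simt_vf;   /* SIMT copy: use the SIMT width only.  */
  else
    return simd_vf;   /* SIMD copy: use the host vector width only.  */
}
```

With a MAX, the SIMD copy of a loop would pick up the SIMT width whenever it happened to be larger; with the selection above the two copies stay independent.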
Re: gomp-nvptx branch - middle-end changes
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> Ok for trunk, once the needed corresponding config/nvptx bits are committed,
> with one nit below that needs immediate action and the rest can be resolved
> incrementally. I'd like to check in afterwards the attached patch, at least
> for now, so that non-offloaded SIMD code is less affected.

Testing your patch revealed an issue in Fortran offloaded code; types of
boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
runs) don't match. I'm attaching a revised patch that addresses it by simply
using an integer type (there are also two other minor issues, below).

> Please change this into
> (ENABLE_OFFLOADING && (flag_openmp || in_lto))
> for now, so that we don't waste compile time even when clearly it
> isn't needed, and incrementally change the inliner to propagate
> the property.

As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
needs to accept ACCEL_COMPILER. Applied like this:

+  virtual bool gate (function *ARG_UNUSED (fun))
+    {
+      /* FIXME: this should use PROP_gimple_lomp_dev.  */
+#ifdef ACCEL_COMPILER
+      return true;
+#else
+      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
+#endif
+    }

In your GOMP_USE_SIMT() patch,

> @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v
>    if (max_vf == 0)
>      {
>        max_vf = omp_max_vf ();
> +      if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
> +                           OMP_CLAUSE__SIMT_))
> +        {
> +          int max_simt = omp_max_simt_vf ();
> +          max_vf = MAX (max_vf, max_simt);
> +        }

I don't believe here there's a need to take a maximum. Cloning the loop upfront
means that SIMD+SIMT styles are not going to mix within a single loop. I've
simplified it to an if-then-else in the revised patch.

> @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi
>    bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
>    for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
>      offloaded = rgn->type == GIMPLE_OMP_TARGET;
> -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
> +  bool is_simt
> +    = (offloaded
> +       && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
> +                           OMP_CLAUSE__SIMT_)
> +       && safelen_int > 1);

Here computation of 'offloaded' is no longer needed, because presence of
OMP_CLAUSE__SIMT_ would imply that. Removed in the revised patch.

I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
Not sure if that's intentional -- as I understand it's possible to look at the
enclosing context's clauses because 'omp ordered' must be closely nested with
the corresponding loop. I've added a FIXME in the patch.

Alexander

	* internal-fn.c (expand_GOMP_USE_SIMT): New function.
	* tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands.
	(omp_clause_code_name): Add _simt_ name.
	(walk_tree_1): Handle OMP_CLAUSE__SIMT_.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_.
	(scan_omp_simd): New function.
	(scan_omp_1_stmt): Use it in target regions if needed.
	(omp_max_vf): Don't max with omp_max_simt_vf.
	(lower_rec_simd_input_clauses): Use omp_max_simt_vf if
	OMP_CLAUSE__SIMT_ is present.
	(lower_rec_input_clauses): Compute maybe_simt from presence of
	OMP_CLAUSE__SIMT_.
	(lower_lastprivate_clauses): Likewise.
	(expand_omp_simd): Likewise.
	(execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT.
	* internal-fn.def (GOMP_USE_SIMT): New internal function.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 6cd8522..b1dbc98 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -158,6 +158,14 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_USE_SIMT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
    without SIMT execution this should be expanded in omp_device_lower pass.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index f055230..9a03e17 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -141,6 +141,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6c52bff..eab0af5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -278,6
Re: gomp-nvptx branch - middle-end changes
On Fri, Nov 11, 2016 at 12:28:16PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > > [...]
> > > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > > best place to do that is in omp expansion, not gimplification)
> > >
> > > Sorry, I couldn't find a good way to implement that during omp expansion. The
> > > reason I went for gimplification is automatic discovery of sharing clauses -
> > > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> > > gimplifier's help. Does this sound sensible?
> >
> > Sure, for discovery of needed sharing clauses the gimplifier has the right
> > infrastructure. But that doesn't mean you can't add those clauses at
> > gimplification time and do the outlining at omp expansion time.
> > That is what is done for omp parallel, task etc. as well. If the standard
> > OpenMP clauses can't serve that purpose, there is always the possibility of
> > adding further internal clauses, that would e.g. be only considered for the
> > SIMT stuff. For the outlining, our current infrastructure really wants to
> > have CFG etc., something you don't have at gimplification time.
>
> Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
> a parallel region with an artificial _simtreg_ clause, transforming
>
>   #pragma omp simd
>   for (...)
>
> into
>
>   #pragma omp parallel _simtreg_
>   #pragma omp simd
>   for (...)
>
> and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
> emit a direct call rather than an invocation of GOMP_parallel.

Well, I meant keep #pragma omp simd as is, just add some data-sharing-like
clauses _simt_shared_(x) or whatever you need, then the omplower versioning
patch I've posted could e.g. drop those _simt_shared_ or whatever else you need
clauses for the omp simd without _simt_ clause, omp lowering then would do
whatever is needed for those _simt_shared_ clauses and finally omp expansion
would outline it. Adding omp parallel around the omp simd is just weird, it has
nothing to do with omp parallel.

Jakub
Re: gomp-nvptx branch - middle-end changes
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> > On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> > [...]
> > > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > > best place to do that is in omp expansion, not gimplification)
> >
> > Sorry, I couldn't find a good way to implement that during omp expansion. The
> > reason I went for gimplification is automatic discovery of sharing clauses -
> > I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> > gimplifier's help. Does this sound sensible?
>
> Sure, for discovery of needed sharing clauses the gimplifier has the right
> infrastructure. But that doesn't mean you can't add those clauses at
> gimplification time and do the outlining at omp expansion time.
> That is what is done for omp parallel, task etc. as well. If the standard
> OpenMP clauses can't serve that purpose, there is always the possibility of
> adding further internal clauses, that would e.g. be only considered for the
> SIMT stuff. For the outlining, our current infrastructure really wants to
> have CFG etc., something you don't have at gimplification time.

Yes, that is exactly what I'm doing. I'm first tweaking the gimplifier to inject
a parallel region with an artificial _simtreg_ clause, transforming

  #pragma omp simd
  for (...)

into

  #pragma omp parallel _simtreg_
  #pragma omp simd
  for (...)

and then expansion of 'omp parallel' can check presence of _simtreg_ clause and
emit a direct call rather than an invocation of GOMP_parallel.

(a few days ago I've sent you privately a patch implementing the above)

Thanks.
Alexander
Re: gomp-nvptx branch - middle-end changes
On Fri, Nov 11, 2016 at 11:52:58AM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> [...]
> > the intended outlining of SIMT regions for PTX offloading done (IMHO the
> > best place to do that is in omp expansion, not gimplification)
>
> Sorry, I couldn't find a good way to implement that during omp expansion. The
> reason I went for gimplification is automatic discovery of sharing clauses -
> I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
> gimplifier's help. Does this sound sensible?

Sure, for discovery of needed sharing clauses the gimplifier has the right
infrastructure. But that doesn't mean you can't add those clauses at
gimplification time and do the outlining at omp expansion time.
That is what is done for omp parallel, task etc. as well. If the standard
OpenMP clauses can't serve that purpose, there is always the possibility of
adding further internal clauses, that would e.g. be only considered for the
SIMT stuff. For the outlining, our current infrastructure really wants to
have CFG etc., something you don't have at gimplification time.

Jakub
Re: gomp-nvptx branch - middle-end changes
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
[...]
> the intended outlining of SIMT regions for PTX offloading done (IMHO the
> best place to do that is in omp expansion, not gimplification)

Sorry, I couldn't find a good way to implement that during omp expansion. The
reason I went for gimplification is automatic discovery of sharing clauses -
I'm assuming in expansion it's very hard to try and fill omp_data_[sio] without
gimplifier's help. Does this sound sensible?

Thanks.
Alexander
Re: gomp-nvptx branch - middle-end changes
On Thu, Nov 10, 2016 at 08:12:27PM +0300, Alexander Monakov wrote:
> gcc/
> 	* internal-fn.c (expand_GOMP_SIMT_LANE): New.
> 	(expand_GOMP_SIMT_VF): New.
> 	(expand_GOMP_SIMT_LAST_LANE): New.
> 	(expand_GOMP_SIMT_ORDERED_PRED): New.
> 	(expand_GOMP_SIMT_VOTE_ANY): New.
> 	(expand_GOMP_SIMT_XCHG_BFLY): New.
> 	(expand_GOMP_SIMT_XCHG_IDX): New.
> 	* internal-fn.def (GOMP_SIMT_LANE): New.
> 	(GOMP_SIMT_VF): New.
> 	(GOMP_SIMT_LAST_LANE): New.
> 	(GOMP_SIMT_ORDERED_PRED): New.
> 	(GOMP_SIMT_VOTE_ANY): New.
> 	(GOMP_SIMT_XCHG_BFLY): New.
> 	(GOMP_SIMT_XCHG_IDX): New.
> 	* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
> 	(create_omp_child_function): ...here. Set "omp target entrypoint"
> 	or "omp declare target" attribute based on is_gimple_omp_offloaded.
> 	(omp_max_simt_vf): New. Use it...
> 	(omp_max_vf): ...here.
> 	(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
> 	(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
> 	(lower_omp_ordered): Likewise, for "ordered" lowering.
> 	(expand_omp_simd): Add SIMT transforms.
> 	(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
> 	(execute_omp_device_lower): New.
> 	(pass_data_omp_device_lower): New.
> 	(pass_omp_device_lower): New pass.
> 	(make_pass_omp_device_lower): New.
> 	* passes.def (pass_omp_device_lower): Position new pass.
> 	* tree-pass.h (PROP_gimple_lomp_dev): Define.
> 	(make_pass_omp_device_lower): Declare.

Ok for trunk, once the needed corresponding config/nvptx bits are committed,
with one nit below that needs immediate action and the rest can be resolved
incrementally. I'd like to check in afterwards the attached patch, at least
for now, so that non-offloaded SIMD code is less affected. Once you have
the intended outlining of SIMT regions for PTX offloading done (IMHO the
best place to do that is in omp expansion, not gimplification), you can either
base it on that, or revert and do earlier.

> +
> +/* Return maximum SIMT width if offloading may target SIMT hardware.  */
> +
> +static int
> +omp_max_simt_vf (void)
> +{
> +  if (!optimize)
> +    return 0;
> +  if (ENABLE_OFFLOADING)
> +    for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c; )
> +      {
> +        if (!strncmp (c, "nvptx", strlen ("nvptx")))
> +          return 32;
> +        else if ((c = strchr (c, ',')))
> +          c++;
> +      }
> +  return 0;
> +}

As discussed privately, this means one has to manually set OFFLOAD_TARGET_NAMES
in the environment when invoking ./cc1 or ./cc1plus in order to match
./gcc -B ./ etc. behavior. I think it would be better to change the driver
so that it sets OFFLOAD_TARGET_NAMES= in the environment when ENABLE_OFFLOADING,
but -foffload option is used to disable all offloading and then in this
function use the configured in offloading targets if ENABLE_OFFLOADING and
OFFLOAD_TARGET_NAMES is not in the environment. Can be done incrementally.

> +
>  /* Return maximum possible vectorization factor for the target.  */
>
>  static int
> @@ -4277,16 +4306,18 @@ omp_max_vf (void)
>                || global_options_set.x_flag_tree_vectorize)))
>      return 1;
>
> +  int vf = 1;
>    int vs = targetm.vectorize.autovectorize_vector_sizes ();
>    if (vs)
> +    vf = 1 << floor_log2 (vs);
> +  else
>      {
> -      vs = 1 << floor_log2 (vs);
> -      return vs;
> +      machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> +      if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> +        vf = GET_MODE_NUNITS (vqimode);
>      }
> -  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
> -  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
> -    return GET_MODE_NUNITS (vqimode);
> -  return 1;
> +  int svf = omp_max_simt_vf ();
> +  return MAX (vf, svf);

Increasing the vf even for host in non-offloaded regions is undesirable.
Can be partly solved by the attached patch I'm planning to apply incrementally,
the other part is for the simd modifier of schedule clause, there I think
what we want is use conditional expression (GOMP_USE_SIMT () ? omp_max_simt_vf ()
: omp_max_vf). I'll try to handle the schedule clause later.

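
The environment fallback suggested above for omp_max_simt_vf could be sketched as follows. This is only an illustration under stated assumptions, not driver or omp-low.c code: target_list_has_prefix and max_simt_vf_sketch are hypothetical names, the caller passes the value of OFFLOAD_TARGET_NAMES (or NULL when unset) instead of the function calling getenv, and configured_names stands in for whatever list of offload targets the compiler was configured with. The comma separator simply follows the quoted code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if some entry of the comma-separated list NAMES starts
   with PREFIX.  A NULL or empty list has no matching entry.  */
static int
target_list_has_prefix (const char *names, const char *prefix)
{
  size_t len = strlen (prefix);
  assert (len > 0);
  for (const char *c = names; c; )
    {
      if (!strncmp (c, prefix, len))
        return 1;
      else if ((c = strchr (c, ',')))
        c++;   /* Step past the comma to the next entry.  */
    }
  return 0;
}

/* Hypothetical sketch: ENV_NAMES is the value of OFFLOAD_TARGET_NAMES, or
   NULL if the variable is not set.  An empty string means offloading was
   explicitly disabled, so the configured list is consulted only when the
   variable is absent.  */
static int
max_simt_vf_sketch (const char *env_names, const char *configured_names)
{
  const char *names = env_names ? env_names : configured_names;
  return target_list_has_prefix (names, "nvptx") ? 32 : 0;
}
```

For instance, max_simt_vf_sketch (NULL, "nvptx-none") yields 32 because the configured list is used, while max_simt_vf_sketch ("", "nvptx-none") yields 0 because an explicitly empty variable wins over the configured default.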
> +class pass_omp_device_lower : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_device_lower (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_omp_device_lower, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +    {
> +      /* FIXME: inlining does not propagate the lomp_dev property.  */
> +      return 1 || !(fun->curr_properties & PROP_gimple_lomp_dev);

Please change this into
(ENABLE_OFFLOADING && (flag_openmp || in_lto))
for now, so that we don't waste compile time even when clearly it
isn't needed, and incrementally change the inliner to propagate
the property.

Jakub

2016-11-11 Jakub Jelinek

	* internal-fn.c (expand_GOMP_USE
Re: gomp-nvptx branch - middle-end changes
gcc/
	* internal-fn.c (expand_GOMP_SIMT_LANE): New.
	(expand_GOMP_SIMT_VF): New.
	(expand_GOMP_SIMT_LAST_LANE): New.
	(expand_GOMP_SIMT_ORDERED_PRED): New.
	(expand_GOMP_SIMT_VOTE_ANY): New.
	(expand_GOMP_SIMT_XCHG_BFLY): New.
	(expand_GOMP_SIMT_XCHG_IDX): New.
	* internal-fn.def (GOMP_SIMT_LANE): New.
	(GOMP_SIMT_VF): New.
	(GOMP_SIMT_LAST_LANE): New.
	(GOMP_SIMT_ORDERED_PRED): New.
	(GOMP_SIMT_VOTE_ANY): New.
	(GOMP_SIMT_XCHG_BFLY): New.
	(GOMP_SIMT_XCHG_IDX): New.
	* omp-low.c (omp_maybe_offloaded_ctx): New, outlined from...
	(create_omp_child_function): ...here. Set "omp target entrypoint"
	or "omp declare target" attribute based on is_gimple_omp_offloaded.
	(omp_max_simt_vf): New. Use it...
	(omp_max_vf): ...here.
	(lower_rec_input_clauses): Add reduction lowering for SIMT execution.
	(lower_lastprivate_clauses): Likewise, for "lastprivate" lowering.
	(lower_omp_ordered): Likewise, for "ordered" lowering.
	(expand_omp_simd): Add SIMT transforms.
	(pass_data_lower_omp): Add PROP_gimple_lomp_dev.
	(execute_omp_device_lower): New.
	(pass_data_omp_device_lower): New.
	(pass_omp_device_lower): New pass.
	(make_pass_omp_device_lower): New.
	* passes.def (pass_omp_device_lower): Position new pass.
	* tree-pass.h (PROP_gimple_lomp_dev): Define.
	(make_pass_omp_device_lower): Declare.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index cbee97e..fd1cd8b 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -157,6 +157,132 @@ expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
+   without SIMT execution this should be expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  gcc_assert (targetm.have_omp_simt_lane ());
+  emit_insn (targetm.gen_omp_simt_lane (target));
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_VF (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
+/* Lane index of the first SIMT lane that supplies a non-zero argument.
+   This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
+   lane that executed the last iteration for handling OpenMP lastprivate.  */
+
+static void
+expand_GOMP_SIMT_LAST_LANE (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_last_lane ());
+  expand_insn (targetm.code_for_omp_simt_last_lane, 2, ops);
+}
+
+/* Non-transparent predicate used in SIMT lowering of OpenMP "ordered".  */
+
+static void
+expand_GOMP_SIMT_ORDERED_PRED (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx ctr = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], ctr, mode);
+  gcc_assert (targetm.have_omp_simt_ordered ());
+  expand_insn (targetm.code_for_omp_simt_ordered, 2, ops);
+}
+
+/* "Or" boolean reduction across SIMT lanes: return non-zero in all lanes if
+   any lane supplies a non-zero argument.  */
+
+static void
+expand_GOMP_SIMT_VOTE_ANY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx cond = expand_normal (gimple_call_arg (stmt, 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_operand ops[2];
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], cond, mode);
+  gcc_assert (targetm.have_omp_simt_vote_any ());
+  expand_insn (targetm.code_for_omp_simt_vote_any, 2, ops);
+}
+
+/* Exchange between SIMT lanes with a "butterfly" pattern: source lane index
+   is destination lane index XOR given offset.  */
+
+static void
+expand_GOMP_SIMT_XCHG_BFLY (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  if (!lhs)
+    return;
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  rtx src = expand_normal (gimple_call_arg (stmt, 0));
+  rtx idx = expand_normal (gimple_call_arg (stmt, 1));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+  struct expand_op