On Mon, Jul 10, 2017 at 3:50 PM, Matt Turner <matts...@gmail.com> wrote: > On Mon, Jul 10, 2017 at 1:10 PM, Connor Abbott <cwabbo...@gmail.com> wrote: >> On Thu, Jul 6, 2017 at 4:48 PM, Matt Turner <matts...@gmail.com> wrote: >>> We already had a channel_num system value, which I'm renaming to >>> subgroup_invocation to match the rest of the new system values. >>> >>> Note that while ballotARB(true) will return zeros in the high 32-bits on >>> systems where gl_SubGroupSizeARB <= 32, the gl_SubGroup??MaskARB >>> variables do not consider whether channels are enabled. See issue (1) of >>> ARB_shader_ballot. >>> --- >>> src/compiler/nir/nir.c | 4 ++++ >>> src/compiler/nir/nir_intrinsics.h | 8 +++++++- >>> src/compiler/nir/nir_lower_system_values.c | 28 >>> ++++++++++++++++++++++++++++ >>> src/intel/compiler/brw_fs_nir.cpp | 2 +- >>> src/intel/compiler/brw_nir_intrinsics.c | 4 ++-- >>> 5 files changed, 42 insertions(+), 4 deletions(-) >>> >>> diff --git a/src/compiler/nir/nir.c b/src/compiler/nir/nir.c >>> index 491b908396..9827e129ca 100644 >>> --- a/src/compiler/nir/nir.c >>> +++ b/src/compiler/nir/nir.c >>> @@ -1908,6 +1908,10 @@ nir_intrinsic_from_system_value(gl_system_value val) >>> return nir_intrinsic_load_helper_invocation; >>> case SYSTEM_VALUE_VIEW_INDEX: >>> return nir_intrinsic_load_view_index; >>> + case SYSTEM_VALUE_SUBGROUP_SIZE: >>> + return nir_intrinsic_load_subgroup_size; >>> + case SYSTEM_VALUE_SUBGROUP_INVOCATION: >>> + return nir_intrinsic_load_subgroup_invocation; >>> default: >>> unreachable("system value does not directly correspond to >>> intrinsic"); >>> } >>> diff --git a/src/compiler/nir/nir_intrinsics.h >>> b/src/compiler/nir/nir_intrinsics.h >>> index 6c6ba4cf59..96ecfbc338 100644 >>> --- a/src/compiler/nir/nir_intrinsics.h >>> +++ b/src/compiler/nir/nir_intrinsics.h >>> @@ -344,10 +344,16 @@ SYSTEM_VALUE(work_group_id, 3, 0, xx, xx, xx) >>> SYSTEM_VALUE(user_clip_plane, 4, 1, UCP_ID, xx, xx) >>> SYSTEM_VALUE(num_work_groups, 3, 0, xx, xx, xx) >>> SYSTEM_VALUE(helper_invocation, 1, 0, xx, xx, xx) >>> -SYSTEM_VALUE(channel_num, 1, 0, xx, xx, xx) >>> SYSTEM_VALUE(alpha_ref_float, 1, 0, xx, xx, xx) >>> SYSTEM_VALUE(layer_id, 1, 0, xx, xx, xx) >>> SYSTEM_VALUE(view_index, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_size, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_invocation, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_eq_mask, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_ge_mask, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_gt_mask, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_le_mask, 1, 0, xx, xx, xx) >>> +SYSTEM_VALUE(subgroup_lt_mask, 1, 0, xx, xx, xx) >>> >>> /* Blend constant color values. Float values are clamped. */ >>> SYSTEM_VALUE(blend_const_color_r_float, 1, 0, xx, xx, xx) >>> diff --git a/src/compiler/nir/nir_lower_system_values.c >>> b/src/compiler/nir/nir_lower_system_values.c >>> index 810100a081..faf0c3c9da 100644 >>> --- a/src/compiler/nir/nir_lower_system_values.c >>> +++ b/src/compiler/nir/nir_lower_system_values.c >>> @@ -116,6 +116,34 @@ convert_block(nir_block *block, nir_builder *b) >>> nir_load_base_instance(b)); >>> break; >>> >>> + case SYSTEM_VALUE_SUBGROUP_EQ_MASK: >>> + case SYSTEM_VALUE_SUBGROUP_GE_MASK: >>> + case SYSTEM_VALUE_SUBGROUP_GT_MASK: >>> + case SYSTEM_VALUE_SUBGROUP_LE_MASK: >>> + case SYSTEM_VALUE_SUBGROUP_LT_MASK: { >>> + nir_ssa_def *count = nir_load_subgroup_invocation(b); >>> + >>> + switch (var->data.location) { >>> + case SYSTEM_VALUE_SUBGROUP_EQ_MASK: >>> + sysval = nir_ishl(b, nir_imm_int64(b, 1ull), count); >>> + break; >>> + case SYSTEM_VALUE_SUBGROUP_GE_MASK: >>> + sysval = nir_ishl(b, nir_imm_int64(b, ~0ull), count); >>> + break; >>> + case SYSTEM_VALUE_SUBGROUP_GT_MASK: >>> + sysval = nir_ishl(b, nir_imm_int64(b, ~1ull), count); >>> + break; >>> + case SYSTEM_VALUE_SUBGROUP_LE_MASK: >>> + sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~1ull), >>> count)); >>> + break; >>> + case SYSTEM_VALUE_SUBGROUP_LT_MASK: >>> + sysval = nir_inot(b, nir_ishl(b, nir_imm_int64(b, ~0ull), >>> count)); >>> + break; >>> + default: >>> + unreachable("you seriously can't tell this is unreachable?"); >>> + } >>> + } >>> + >> >> While this fine to do for both Intel and AMD, Nvidia actually has >> special system values for these, and AMD has special instructions for >> bitCount(foo & gl_SubGroupLtMask), so I think we should have actual > > So, just add this to the above switch statement? > > if (!b->shader->options->lower_subgroup_masks) > break; > > I'll also add the missing cases to nir_intrinsic_from_system_value() > and nir_system_value_from_intrinsic().
Well, it gets a little more complicated... with SPIR-V, you also have variants of these system values that return 4 32-bit integers. My plan was to lower them to the normal 64-bit intrinsics, and then have another pass lower those if necessary. So it might be better if you don't do any lowering here, and lower these to shifts in your intrinsic opt pass instead -- that makes it a little easier to share the lowering to shifts between GLSL and SPIR-V. I can adapt to whatever you want to do though. > >> nir_load_subgroup_*_mask intrinsics for these. Also, that way you can >> use the same shrinking logic to turn these into 32-bit shifts on >> Intel. > > The channel liveness doesn't affect SubGroup*Mask. See Issue (1) of > ARB_shader_ballot (note Note in my commit message mentions this). It > cites NV_shader_thread_group, which says > > The value of gl_ThreadEqMaskNV, gl_ThreadGeMaskNV, gl_ThreadGtMaskNV, > gl_ThreadLeMaskNV and gl_ThreadLtMaskNV are derived from the value of > gl_ThreadInWarpNV using simple bit-shift arithmetic, they don't take into > account the value of the thread group active mask. For example, if the > application wants a bitfield in which bits lower or equal to the current > thread id are set only for active threads, the result of gl_ThreadLeMaskNV > will need to be ANDed with the thread group active mask. > > So it needs to be a 64-bit shift at least for GT/GE. That's true, but on Intel, no thread has an ID greater than 31, so the high 32 bits of all the masks are still pre-determined. So my suggestion was to turn gl_SubGroupLtMask into pack2x32Uint(gl_SubGroupLtMask32Bit, 0), etc. and then implement gl_SubGroupLtMask32Bit etc. with 32-bit shifts. On the face of it, it seems like more instructions, but since most use-cases of these system values are going to involve and-ing them with something, it'll probably help in practice. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev