On Tue, Oct 31, 2017 at 10:55 AM, Neil Roberts <[email protected]> wrote:
> Previously the values were calculated by just shifting ~0 by the > invocation ID. This would end up including bits that are higher than > gl_SubGroupSizeARB. The corresponding CTS test effectively requires that > these high bits be zero so it was failing. There is a Piglit test as > well but this appears to checking the wrong values so it passes. > > For the two greater-than bitmasks, this patch adds an extra mask with > (~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero. > We had a big discussion about this in the office yesterday. :-) From my reading of the GL_ARB_shader_ballot and GL_NV_shader_thread_group specs, it looked like the current behavior was correct. However, I'm very glad to see that I was wrong because it means that Vulkan and GL are consistent with each other. :) It'll be a bit painful to rebase my subgroup work on top of this but I think I'd prefer it if we land this first as it will actually make some things a bit simpler even though it will conflict madly. > Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3 > Signed-off-by: Neil Roberts <[email protected]> > --- > src/compiler/nir/nir_opt_intrinsics.c | 24 ++++++++++++++++++++++-- > 1 file changed, 22 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/nir/nir_opt_intrinsics.c > b/src/compiler/nir/nir_opt_intrinsics.c > index 26a0f96..d5fdc51 100644 > --- a/src/compiler/nir/nir_opt_intrinsics.c > +++ b/src/compiler/nir/nir_opt_intrinsics.c > @@ -28,6 +28,26 @@ > * \file nir_opt_intrinsics.c > */ > > +static nir_ssa_def * > +high_subgroup_mask(nir_builder *b, > + nir_ssa_def *count, > + uint64_t base_mask) > +{ > + /* group_mask could probably be calculated more efficiently but we > want to > + * be sure not to shift by 64 if the subgroup size is 64 because the > GLSL > + * shift operator is undefined in that case. Yeah, I'm pretty sure our hardware will just not shift in that case. > In any case if we were worried > + * about efficency this should probably be done further down because > the > + * subgroup size is likely to be known at compile time. > I wouldn't be worried about efficiency. As I said in an earlier patch, this becomes a constant with my series. Reviewed-by: Jason Ekstrand <[email protected]> Cc: [email protected] Please push soon or maybe I'll just push it myself. > + */ > + nir_ssa_def *subgroup_size = nir_load_subgroup_size(b); > + nir_ssa_def *all_bits = nir_imm_int64(b, ~0ull); > + nir_ssa_def *shift = nir_isub(b, nir_imm_int(b, 64), subgroup_size); > + nir_ssa_def *group_mask = nir_ushr(b, all_bits, shift); > + nir_ssa_def *higher_bits = nir_ishl(b, nir_imm_int64(b, base_mask), > count); > + > + return nir_iand(b, higher_bits, group_mask); > +} > + > static bool > opt_intrinsics_impl(nir_function_impl *impl) > { > @@ -95,10 +115,10 @@ opt_intrinsics_impl(nir_function_impl *impl) > replacement = nir_ishl(&b, nir_imm_int64(&b, 1ull), count); > break; > case nir_intrinsic_load_subgroup_ge_mask: > - replacement = nir_ishl(&b, nir_imm_int64(&b, ~0ull), > count); > + replacement = high_subgroup_mask(&b, count, ~0ull); > break; > case nir_intrinsic_load_subgroup_gt_mask: > - replacement = nir_ishl(&b, nir_imm_int64(&b, ~1ull), > count); > + replacement = high_subgroup_mask(&b, count, ~1ull); > break; > case nir_intrinsic_load_subgroup_le_mask: > replacement = nir_inot(&b, nir_ishl(&b, nir_imm_int64(&b, > ~1ull), count)); > -- > 2.9.5 > > _______________________________________________ > mesa-dev mailing list > [email protected] > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
