Re: [Mesa-dev] [PATCH 01/26] nir: Add a writemask to store intrinsics.

2015-12-04 Thread Jason Ekstrand
On Wed, Dec 2, 2015 at 4:15 PM, Kenneth Graunke  wrote:
> Tessellation control shaders need to be careful when writing outputs.
> Because multiple threads can concurrently write the same output
> variables, we need to only write the exact components we were told.
>
> Traditionally, for sub-vector writes, we've read the whole vector,
> updated the temporary, and written the whole vector back.  This breaks
> down with concurrent access.
>
> This patch prepares the way for a solution by adding a writemask field
> to store_var intrinsics, as well as the other store intrinsics.  It then
> updates all produces to emit a writemask of "all channels enabled".  It
> updates nir_lower_io to copy the writemask to output store intrinsics.
>
> Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks
> by doing a read-modify-write cycle (which is safe, because local
> variables are specific to a single thread).
>
> This should have no functional change, since no one actually emits
> partial writemasks yet.
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/gallium/auxiliary/nir/tgsi_to_nir.c|  1 +
>  .../drivers/freedreno/ir3/ir3_compiler_nir.c   |  6 +++
>  src/glsl/nir/glsl_to_nir.cpp   |  2 +
>  src/glsl/nir/nir_builder.h |  4 +-
>  src/glsl/nir/nir_intrinsics.h  | 14 +++
>  src/glsl/nir/nir_lower_gs_intrinsics.c |  3 +-
>  src/glsl/nir/nir_lower_io.c|  2 +
>  src/glsl/nir/nir_lower_locals_to_regs.c|  2 +-
>  src/glsl/nir/nir_lower_var_copies.c|  1 +
>  src/glsl/nir/nir_lower_vars_to_ssa.c   | 46 
> --
>  src/glsl/nir/nir_validate.c|  1 +
>  src/mesa/program/prog_to_nir.c |  2 +
>  12 files changed, 63 insertions(+), 21 deletions(-)
>
> Technically, I don't think I need the ir3 changes here - it should work
> without any changes.  I think I was just overzealously preparing things
> to handle writemasks before I realized there shouldn't be any.
>
> But, it might be worth doing anyway.  Not sure.
>
> Jason and I have already talked about the fact that we're conflicting.
> I like what he's doing, and he suggested this; we just need to figure out
> what lands first.  I decided to send these out anyway for people to look
> at and start reviewing, since there's a lot to do there.  We'll sort it
> out in v2.
>
> diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
> b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> index 86c2ffa..f5547c8 100644
> --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
> +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> @@ -1894,6 +1894,7 @@ ttn_emit_instruction(struct ttn_compile *c)
> &tgsi_dst->Indirect : NULL;
>
>store->num_components = 4;
> +  store->const_index[0] = 0xf;
>store->variables[0] = ttn_array_deref(c, store, var, offset, indirect);
>store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
>
> diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
> b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> index 8617704..3beed0e 100644
> --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> @@ -1314,6 +1314,9 @@ emit_intrinisic_store_var(struct ir3_compile *ctx, 
> nir_intrinsic_instr *intr)
> case nir_deref_array_type_direct:
> /* direct access does not require anything special: */
> for (int i = 0; i < intr->num_components; i++) {
> +   if (!(intr->const_index[0] & (1 << i)))
> +   continue;
> +
> unsigned n = darr->base_offset * 4 + i;
> compile_assert(ctx, n < arr->length);
> arr->arr[n] = src[i];
> @@ -1326,6 +1329,9 @@ emit_intrinisic_store_var(struct ir3_compile *ctx, 
> nir_intrinsic_instr *intr)
> struct ir3_instruction *addr =
> get_addr(ctx, get_src(ctx, 
> &darr->indirect)[0]);
> for (int i = 0; i < intr->num_components; i++) {
> +   if (!(intr->const_index[0] & (1 << i)))
> +   continue;
> +
> struct ir3_instruction *store;
> unsigned n = darr->base_offset * 4 + i;
> compile_assert(ctx, n < arr->length);
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 45d045c..5c84b0d 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -982,6 +982,7 @@ nir_visitor::visit(ir_call *ir)
>   nir_intrinsic_instr *store_instr =
>  nir_intrinsic_instr_create(shader, nir_intrinsic_store_var);
>   store_instr->num_components = 
> ir->return_deref->type->vector_elements;
> + store_instr->cons

Re: [Mesa-dev] [PATCH 01/26] nir: Add a writemask to store intrinsics.

2015-12-04 Thread Kenneth Graunke
On Friday, December 04, 2015 04:45:03 PM Jason Ekstrand wrote:
> On Wed, Dec 2, 2015 at 4:15 PM, Kenneth Graunke  wrote:
> > Tessellation control shaders need to be careful when writing outputs.
> > Because multiple threads can concurrently write the same output
> > variables, we need to only write the exact components we were told.
> >
> > Traditionally, for sub-vector writes, we've read the whole vector,
> > updated the temporary, and written the whole vector back.  This breaks
> > down with concurrent access.
> >
> > This patch prepares the way for a solution by adding a writemask field
> > to store_var intrinsics, as well as the other store intrinsics.  It then
> > updates all produces to emit a writemask of "all channels enabled".  It
> > updates nir_lower_io to copy the writemask to output store intrinsics.
> >
> > Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks
> > by doing a read-modify-write cycle (which is safe, because local
> > variables are specific to a single thread).
> >
> > This should have no functional change, since no one actually emits
> > partial writemasks yet.
> >
> > Signed-off-by: Kenneth Graunke 
> > ---
> >  src/gallium/auxiliary/nir/tgsi_to_nir.c|  1 +
> >  .../drivers/freedreno/ir3/ir3_compiler_nir.c   |  6 +++
> >  src/glsl/nir/glsl_to_nir.cpp   |  2 +
> >  src/glsl/nir/nir_builder.h |  4 +-
> >  src/glsl/nir/nir_intrinsics.h  | 14 +++
> >  src/glsl/nir/nir_lower_gs_intrinsics.c |  3 +-
> >  src/glsl/nir/nir_lower_io.c|  2 +
> >  src/glsl/nir/nir_lower_locals_to_regs.c|  2 +-
> >  src/glsl/nir/nir_lower_var_copies.c|  1 +
> >  src/glsl/nir/nir_lower_vars_to_ssa.c   | 46 
> > --
> >  src/glsl/nir/nir_validate.c|  1 +
> >  src/mesa/program/prog_to_nir.c |  2 +
> >  12 files changed, 63 insertions(+), 21 deletions(-)
> >
> > Technically, I don't think I need the ir3 changes here - it should work
> > without any changes.  I think I was just overzealously preparing things
> > to handle writemasks before I realized there shouldn't be any.
> >
> > But, it might be worth doing anyway.  Not sure.
> >
> > Jason and I have already talked about the fact that we're conflicting.
> > I like what he's doing, and he suggested this; we just need to figure out
> > what lands first.  I decided to send these out anyway for people to look
> > at and start reviewing, since there's a lot to do there.  We'll sort it
> > out in v2.
> >
> > diff --git a/src/gallium/auxiliary/nir/tgsi_to_nir.c 
> > b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> > index 86c2ffa..f5547c8 100644
> > --- a/src/gallium/auxiliary/nir/tgsi_to_nir.c
> > +++ b/src/gallium/auxiliary/nir/tgsi_to_nir.c
> > @@ -1894,6 +1894,7 @@ ttn_emit_instruction(struct ttn_compile *c)
> > &tgsi_dst->Indirect : NULL;
> >
> >store->num_components = 4;
> > +  store->const_index[0] = 0xf;
> >store->variables[0] = ttn_array_deref(c, store, var, offset, 
> > indirect);
> >store->src[0] = nir_src_for_reg(dest.dest.reg.reg);
> >
> > diff --git a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c 
> > b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> > index 8617704..3beed0e 100644
> > --- a/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> > +++ b/src/gallium/drivers/freedreno/ir3/ir3_compiler_nir.c
> > @@ -1314,6 +1314,9 @@ emit_intrinisic_store_var(struct ir3_compile *ctx, 
> > nir_intrinsic_instr *intr)
> > case nir_deref_array_type_direct:
> > /* direct access does not require anything special: */
> > for (int i = 0; i < intr->num_components; i++) {
> > +   if (!(intr->const_index[0] & (1 << i)))
> > +   continue;
> > +
> > unsigned n = darr->base_offset * 4 + i;
> > compile_assert(ctx, n < arr->length);
> > arr->arr[n] = src[i];
> > @@ -1326,6 +1329,9 @@ emit_intrinisic_store_var(struct ir3_compile *ctx, 
> > nir_intrinsic_instr *intr)
> > struct ir3_instruction *addr =
> > get_addr(ctx, get_src(ctx, 
> > &darr->indirect)[0]);
> > for (int i = 0; i < intr->num_components; i++) {
> > +   if (!(intr->const_index[0] & (1 << i)))
> > +   continue;
> > +
> > struct ir3_instruction *store;
> > unsigned n = darr->base_offset * 4 + i;
> > compile_assert(ctx, n < arr->length);
> > diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> > index 45d045c..5c84b0d 100644
> > --- a/src/glsl/nir/glsl_to_nir.cpp
> > +++ b/src/glsl/nir/glsl_to_nir.cpp
> > @@ -982,6 +982,7 @@ nir_visito