Re: [Mesa-dev] [PATCH] nir: fix bit_size in lower indirect derefs.

2019-04-25 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Thu, Apr 25, 2019 at 9:50 PM Dave Airlie  wrote:

> From: Dave Airlie 
>
> This fixes a case where we are expecting 64-bit but generate
> 32-bit consts and validate gets angry.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/compiler/nir/nir_lower_indirect_derefs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c
> b/src/compiler/nir/nir_lower_indirect_derefs.c
> index 1e1e84fb3b0..58365628885 100644
> --- a/src/compiler/nir/nir_lower_indirect_derefs.c
> +++ b/src/compiler/nir/nir_lower_indirect_derefs.c
> @@ -51,7 +51,7 @@ emit_indirect_load_store_deref(nir_builder *b,
> nir_intrinsic_instr *orig_instr,
>nir_deref_instr *deref = *deref_arr;
>assert(deref->deref_type == nir_deref_type_array);
>
> -  nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_int(b,
> mid)));
> +  nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_intN_t(b,
> mid, parent->dest.ssa.bit_size)));
>emit_indirect_load_store_deref(b, orig_instr, parent, deref_arr,
>   start, mid, _dest, src);
>nir_push_else(b, NULL);
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] nir: fix bit_size in lower indirect derefs.

2019-04-25 Thread Dave Airlie
From: Dave Airlie 

This fixes a case where we are expecting 64-bit but generate
32-bit consts and validate gets angry.

Signed-off-by: Dave Airlie 
---
 src/compiler/nir/nir_lower_indirect_derefs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c 
b/src/compiler/nir/nir_lower_indirect_derefs.c
index 1e1e84fb3b0..58365628885 100644
--- a/src/compiler/nir/nir_lower_indirect_derefs.c
+++ b/src/compiler/nir/nir_lower_indirect_derefs.c
@@ -51,7 +51,7 @@ emit_indirect_load_store_deref(nir_builder *b, 
nir_intrinsic_instr *orig_instr,
   nir_deref_instr *deref = *deref_arr;
   assert(deref->deref_type == nir_deref_type_array);
 
-  nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_int(b, mid)));
+  nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_intN_t(b, mid, 
parent->dest.ssa.bit_size)));
   emit_indirect_load_store_deref(b, orig_instr, parent, deref_arr,
  start, mid, _dest, src);
   nir_push_else(b, NULL);
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Alyssa Rosenzweig
> iand and ior are commutative, so you don't need both.

--Wait, woaaah, the algebraic generator respects that? Super neat, thank
you!

> Especially without instruction count data

(I'm assuming I won't be able to do shader-db on my hw at this point..)

> For example, if the only use of inot(...some logic...) is an
> if-condition, we don't want to re-write it.  Our backend will just
> invert the "polarity" of the conditional branch. 

Out of curiousity, what's the specific issue? Midgard also specifies
branch polarity, though I currently hardcode to true since I'm not
convinced it makes a difference.

> My gut tells me that doing this in
> the backend with something like Eric's NOLTIS is the right way to go.

I'm not sure what NOLTIS is, sorry.

Would you ack a change adding the ops to nir_opcode.py but not adding an
opt passes? I have a backend algebra pass which I'm happy to move this
into (and I think it would be a win on Midgard regardless); I just
(AFAIK) need the ops in NIR for the algebra pass to work at all. [I
don't want to duplicate this infrastructure to run over the machine IR,
which is quite limited since I trust the input NIR to be good.]

Thank you for the comments.

> This is also part of the reason that I've never sent out some other
> patches that I have that convert certain kinds of logic operations into
> other things.  For example,
> 
># True iff a == -1 and b == 0
>(('iand', 'a@bool32', ('inot', 'b@bool32)), ('ilt', a, b)),

...Cute. Point taken :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Alyssa Rosenzweig
> We can support all of these with source modifiers because the above three
> aren't really "dest invertable"...  For us, they'd be
> 
> ~src0 | ~src1
> ~src0 & ~src1
> ~src0 ^ ~src1
> 
> Is it really dest_invertable or both_srcs_invertable? :-)

Sure, I wasn't sure how other drivers would want to handle this, if at
all. I'm rather regretting not slapping an "RFC" tag on this, oh well :)

> Also worth noting that I've considered adding a not modifier to NIR (if
> source modifiers are a thing we actually want at that level).  Over-all,
> I'm a little uncertain if these need to be their own ops or not...

It's worth noting that, as far as I know, the instructions added here
are the _only_ ones that Midgard can do -- our source modifiers only
apply to floating-point ops (int ops use the bits to distinguish
sign-ext/zero-ext). The not's in these ops is baked right into the
instruction.

It'd be a simple lowering pass either way, of course.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] st/mesa: fix assertion failure in st_tgsi_lower_yuv

2019-04-25 Thread Marek Olšák
From: Marek Olšák 

src/mesa/state_tracker/st_tgsi_lower_yuv.c:68: void reg_dst(struct
 tgsi_full_dst_register *, const struct tgsi_full_dst_register *, unsigned
 int): assertion "dst->Register.WriteMask" failed

Cc: 19.0 
---
 src/mesa/state_tracker/st_tgsi_lower_yuv.c | 46 +-
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/src/mesa/state_tracker/st_tgsi_lower_yuv.c 
b/src/mesa/state_tracker/st_tgsi_lower_yuv.c
index 6acd173adc9..b998118897f 100644
--- a/src/mesa/state_tracker/st_tgsi_lower_yuv.c
+++ b/src/mesa/state_tracker/st_tgsi_lower_yuv.c
@@ -262,45 +262,53 @@ yuv_to_rgb(struct tgsi_transform_context *tctx,
inst.Instruction.Saturate = 0;
inst.Instruction.NumDstRegs = 1;
inst.Instruction.NumSrcRegs = 2;
reg_dst([0], >tmp[A].dst, TGSI_WRITEMASK_XYZ);
reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, _));
reg_src([1], >imm[3], SWIZ(X, Y, Z, _));
inst.Src[1].Register.Negate = 1;
tctx->emit_instruction(tctx, );
 
/* DP3 dst.x, tmpA, imm[0] */
-   inst = dp3_instruction();
-   reg_dst([0], dst, TGSI_WRITEMASK_X);
-   reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W));
-   reg_src([1], >imm[0], SWIZ(X, Y, Z, W));
-   tctx->emit_instruction(tctx, );
+   if (dst->Register.WriteMask & TGSI_WRITEMASK_X) {
+  inst = dp3_instruction();
+  reg_dst([0], dst, TGSI_WRITEMASK_X);
+  reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W));
+  reg_src([1], >imm[0], SWIZ(X, Y, Z, W));
+  tctx->emit_instruction(tctx, );
+   }
 
/* DP3 dst.y, tmpA, imm[1] */
-   inst = dp3_instruction();
-   reg_dst([0], dst, TGSI_WRITEMASK_Y);
-   reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W));
-   reg_src([1], >imm[1], SWIZ(X, Y, Z, W));
-   tctx->emit_instruction(tctx, );
+   if (dst->Register.WriteMask & TGSI_WRITEMASK_Y) {
+  inst = dp3_instruction();
+  reg_dst([0], dst, TGSI_WRITEMASK_Y);
+  reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W));
+  reg_src([1], >imm[1], SWIZ(X, Y, Z, W));
+  tctx->emit_instruction(tctx, );
+   }
 
/* DP3 dst.z, tmpA, imm[2] */
-   inst = dp3_instruction();
-   reg_dst([0], dst, TGSI_WRITEMASK_Z);
-   reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W));
-   reg_src([1], >imm[2], SWIZ(X, Y, Z, W));
-   tctx->emit_instruction(tctx, );
+   if (dst->Register.WriteMask & TGSI_WRITEMASK_Z) {
+  inst = dp3_instruction();
+  reg_dst([0], dst, TGSI_WRITEMASK_Z);
+  reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W));
+  reg_src([1], >imm[2], SWIZ(X, Y, Z, W));
+  tctx->emit_instruction(tctx, );
+   }
 
/* MOV dst.w, imm[0].x */
-   inst = mov_instruction();
-   reg_dst([0], dst, TGSI_WRITEMASK_W);
-   reg_src([0], >imm[3], SWIZ(_, _, _, W));
-   tctx->emit_instruction(tctx, );
+   if (dst->Register.WriteMask & TGSI_WRITEMASK_W) {
+  inst = mov_instruction();
+  reg_dst([0], dst, TGSI_WRITEMASK_W);
+  reg_src([0], >imm[3], SWIZ(_, _, _, W));
+  tctx->emit_instruction(tctx, );
+   }
 }
 
 static void
 lower_nv12(struct tgsi_transform_context *tctx,
struct tgsi_full_instruction *originst)
 {
struct tgsi_yuv_transform *ctx = tgsi_yuv_transform(tctx);
struct tgsi_full_instruction inst;
struct tgsi_full_src_register *coord = >Src[0];
unsigned samp = originst->Src[1].Register.Index;
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers

2019-04-25 Thread Marek Olšák
From: Marek Olšák 

It's done by:
- decrease the number of frames in flight by 1
- flush before throttling in SwapBuffers
  (instead of wait-then-flush, do flush-then-wait)

The improvement is apparent with Unigine Heaven.

Previously:
draw frame 2
wait frame 0
flush frame 2
present frame 2

The input lag is 2 frames.

Now:
draw frame 2
flush frame 2
wait frame 1
present frame 2

The input lag is 1 frame. Flushing is done before waiting, because
otherwise the device would be idle after waiting.

Nine is affected because it also uses the pipe cap.
---
 src/gallium/auxiliary/util/u_screen.c |  2 +-
 src/gallium/state_trackers/dri/dri_drawable.c | 20 +--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_screen.c 
b/src/gallium/auxiliary/util/u_screen.c
index 27f51e0898e..410f17421e6 100644
--- a/src/gallium/auxiliary/util/u_screen.c
+++ b/src/gallium/auxiliary/util/u_screen.c
@@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen 
*pscreen,
case PIPE_CAP_MAX_VARYINGS:
   return 8;
 
case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK:
   return 0;
 
case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES:
   return 0;
 
case PIPE_CAP_MAX_FRAMES_IN_FLIGHT:
-  return 2;
+  return 1;
 
case PIPE_CAP_DMABUF:
 #ifdef PIPE_OS_LINUX
   return 1;
 #else
   return 0;
 #endif
 
default:
   unreachable("bad PIPE_CAP_*");
diff --git a/src/gallium/state_trackers/dri/dri_drawable.c 
b/src/gallium/state_trackers/dri/dri_drawable.c
index 26bfdbecc53..c1de3bed9dd 100644
--- a/src/gallium/state_trackers/dri/dri_drawable.c
+++ b/src/gallium/state_trackers/dri/dri_drawable.c
@@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv,
*
* This pulls a fence off the throttling queue and waits for it if the
* number of fences on the throttling queue has reached the desired
* number.
*
* Then flushes to insert a fence at the current rendering position, and
* pushes that fence on the queue. This requires that the 
st_context_iface
* flush method returns a fence even if there are no commands to flush.
*/
   struct pipe_screen *screen = drawable->screen->base.screen;
-  struct pipe_fence_handle *fence;
+  struct pipe_fence_handle *oldest_fence, *new_fence = NULL;
 
-  fence = swap_fences_pop_front(drawable);
-  if (fence) {
- (void) screen->fence_finish(screen, NULL, fence, 
PIPE_TIMEOUT_INFINITE);
- screen->fence_reference(screen, , NULL);
-  }
+  st->flush(st, flush_flags, _fence);
 
-  st->flush(st, flush_flags, );
+  oldest_fence = swap_fences_pop_front(drawable);
+  if (oldest_fence) {
+ screen->fence_finish(screen, NULL, oldest_fence, 
PIPE_TIMEOUT_INFINITE);
+ screen->fence_reference(screen, _fence, NULL);
+  }
 
-  if (fence) {
- swap_fences_push_back(drawable, fence);
- screen->fence_reference(screen, , NULL);
+  if (new_fence) {
+ swap_fences_push_back(drawable, new_fence);
+ screen->fence_reference(screen, _fence, NULL);
   }
}
else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) {
   st->flush(st, flush_flags, NULL);
}
 
if (drawable) {
   drawable->flushing = FALSE;
}
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH] radeonsi: don't ignore PIPE_FLUSH_ASYNC

2019-04-25 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_fence.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_fence.c 
b/src/gallium/drivers/radeonsi/si_fence.c
index 3d23597413c..ffda98d2834 100644
--- a/src/gallium/drivers/radeonsi/si_fence.c
+++ b/src/gallium/drivers/radeonsi/si_fence.c
@@ -566,21 +566,21 @@ static void si_flush_from_st(struct pipe_context *ctx,
multi_fence->fine = fine;
fine.buf = NULL;
 
if (flags & TC_FLUSH_ASYNC) {
util_queue_fence_signal(_fence->ready);

tc_unflushed_batch_token_reference(_fence->tc_token, NULL);
}
}
assert(!fine.buf);
 finish:
-   if (!(flags & PIPE_FLUSH_DEFERRED)) {
+   if (!(flags & (PIPE_FLUSH_DEFERRED | PIPE_FLUSH_ASYNC))) {
if (sctx->dma_cs)
ws->cs_sync_flush(sctx->dma_cs);
ws->cs_sync_flush(sctx->gfx_cs);
}
 }
 
 static void si_fence_server_signal(struct pipe_context *ctx,
   struct pipe_fence_handle *fence)
 {
struct si_context *sctx = (struct si_context *)ctx;
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Ian Romanick
On 4/25/19 3:37 PM, Alyssa Rosenzweig wrote:
> In addition to the familiar iand/ior/ixor, some architectures feature
> destination-inverted versions inand/inor/inxor. Certain
> architectures also have source-inverted forms, dubbed iandnot/iornot
> here. Midgard has the all of these opcodes natively. Many arches have
> comparible features to implement some/all of the above. Paired with De
> Morgan's Laws, these opcodes allow anything of the form
> "~? (~?a [&|] ~?b)" to complete in one instruction.
> 
> This can be used to simplify some backend-specific code on affected
> architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on
> Gen8+").
> 
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Ian Romanick 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/nir/nir.h|  4 
>  src/compiler/nir/nir_opcodes.py   | 18 ++
>  src/compiler/nir/nir_opt_algebraic.py | 12 
>  3 files changed, 34 insertions(+)
> 
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index e878a63409d..3e01ec2cc06 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options {
> bool lower_hadd;
> bool lower_add_sat;
>  
> +   /* Set if inand/inor/inxor and iandnot/iornot supported respectively */
> +   bool bitwise_dest_invertable;
> +   bool bitwise_src_invertable;
> +
> /**
>  * Should nir_lower_io() create load_interpolated_input intrinsics?
>  *
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d35d820aa5b..f9d92afb53e 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & 
> src1")
>  binop("ior", tuint, commutative + associative, "src0 | src1")
>  binop("ixor", tuint, commutative + associative, "src0 ^ src1")
>  
> +# inverted bitwise logic operators
> +#
> +# These variants of the above include bitwise NOTs either on the result of 
> the
> +# whole expression or on the latter operand. On some hardware (e.g. Midgard),
> +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be
> +# implemented as modifiers of the standard three. Along with appropriate
> +# algebraic passes, these should permit any permutation of inverses on AND/OR
> +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = 
> ~a
> +# | b = b | ~a = iornot(b, a).
> +
> +binop("inand", tuint, commutative, "~(src0 & src1)")
> +binop("inor", tuint, commutative, "~(src0 | src1)")
> +binop("inxor", tuint, commutative, "~(src0 ^ src1)")
> +binop("iandnot", tuint, "", "src0 & (~src1)")
> +binop("iornot", tuint, "", "src0 & (~src1)")
> +
> +
> +
>  
>  # floating point logic operators
>  #
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index dad0545594f..6cb3e8cb950 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -1052,6 +1052,18 @@ late_optimizations = [
> (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', 
> b)), ('fadd', c, ('fmax', a, b))),
>  
> (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', 
> ('ior', a, b,
> +
> +   # We don't want to deal with inverted forms, so run this late. Any
> +   # combination of inverts on flags or output should result in a single
> +   # instruction if these are supported; cases not explicitly handled would
> +   # have been simplified via De Morgan's Law
> +   (('inot', ('iand', a, b)), ('inand', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ior', a, b)), ('inor', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ixor', a, b)), ('inxor', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('iand', ('inot', a), b), ('iandnot', b, a), 
> 'options->bitwise_src_invertable'),
> +   (('iand', a, ('inot', b)), ('iandnot', a, b), 
> 'options->bitwise_src_invertable'),

iand and ior are commutative, so you don't need both.

> +   (('ior', a, ('inot', b)), ('iornot', a, b), 
> 'options->bitwise_src_invertable'),
> +   (('ior', ('inot', a), b), ('iornot', b, a), 
> 'options->bitwise_src_invertable'),

Especially without instruction count data, I'm not very excited about
this.  When / how to do these translations is more complex than this.
For example, if the only use of inot(...some logic...) is an
if-condition, we don't want to re-write it.  Our backend will just
invert the "polarity" of the conditional branch.  What if the "...some
logic..." is used more than once?  My gut tells me that doing this in
the backend with something like Eric's NOLTIS is the right way to go.

I can try a shader-db run later.

This is also part of the reason that I've never sent out some other
patches that I have that convert certain kinds of logic operations into
other things.  For example,

   # True iff a == -1 and b == 0
   (('iand', 

Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Jason Ekstrand
On Thu, Apr 25, 2019 at 5:37 PM Alyssa Rosenzweig 
wrote:

> In addition to the familiar iand/ior/ixor, some architectures feature
> destination-inverted versions inand/inor/inxor. Certain
> architectures also have source-inverted forms, dubbed iandnot/iornot
> here. Midgard has the all of these opcodes natively. Many arches have
> comparible features to implement some/all of the above. Paired with De
> Morgan's Laws, these opcodes allow anything of the form
> "~? (~?a [&|] ~?b)" to complete in one instruction.
>
> This can be used to simplify some backend-specific code on affected
> architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on
> Gen8+").
>
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Ian Romanick 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/nir/nir.h|  4 
>  src/compiler/nir/nir_opcodes.py   | 18 ++
>  src/compiler/nir/nir_opt_algebraic.py | 12 
>  3 files changed, 34 insertions(+)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index e878a63409d..3e01ec2cc06 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options {
> bool lower_hadd;
> bool lower_add_sat;
>
> +   /* Set if inand/inor/inxor and iandnot/iornot supported respectively */
> +   bool bitwise_dest_invertable;
> +   bool bitwise_src_invertable;
>

Do we really need two booleans?  Commentary on that below


> +
> /**
>  * Should nir_lower_io() create load_interpolated_input intrinsics?
>  *
> diff --git a/src/compiler/nir/nir_opcodes.py
> b/src/compiler/nir/nir_opcodes.py
> index d35d820aa5b..f9d92afb53e 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0
> & src1")
>  binop("ior", tuint, commutative + associative, "src0 | src1")
>  binop("ixor", tuint, commutative + associative, "src0 ^ src1")
>
> +# inverted bitwise logic operators
> +#
> +# These variants of the above include bitwise NOTs either on the result
> of the
> +# whole expression or on the latter operand. On some hardware (e.g.
> Midgard),
> +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be
> +# implemented as modifiers of the standard three. Along with appropriate
> +# algebraic passes, these should permit any permutation of inverses on
> AND/OR
> +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b)))
> = ~a
> +# | b = b | ~a = iornot(b, a).
> +
> +binop("inand", tuint, commutative, "~(src0 & src1)")
> +binop("inor", tuint, commutative, "~(src0 | src1)")
> +binop("inxor", tuint, commutative, "~(src0 ^ src1)")
>

We can support all of these with source modifiers because the above three
aren't really "dest invertable"...  For us, they'd be

~src0 | ~src1
~src0 & ~src1
~src0 ^ ~src1

Is it really dest_invertable or both_srcs_invertable? :-)

Also worth noting that I've considered adding a not modifier to NIR (if
source modifiers are a thing we actually want at that level).  Over-all,
I'm a little uncertain if these need to be their own ops or not...

--Jason


> +binop("iandnot", tuint, "", "src0 & (~src1)")
> +binop("iornot", tuint, "", "src0 & (~src1)")
> +
> +
> +
>
>  # floating point logic operators
>  #
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index dad0545594f..6cb3e8cb950 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -1052,6 +1052,18 @@ late_optimizations = [
> (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)',
> '#c', b)), ('fadd', c, ('fmax', a, b))),
>
> (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot',
> ('ior', a, b,
> +
> +   # We don't want to deal with inverted forms, so run this late. Any
> +   # combination of inverts on flags or output should result in a single
> +   # instruction if these are supported; cases not explicitly handled
> would
> +   # have been simplified via De Morgan's Law
> +   (('inot', ('iand', a, b)), ('inand', a, b),
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ior', a, b)), ('inor', a, b),
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ixor', a, b)), ('inxor', a, b),
> 'options->bitwise_dest_invertable'),
> +   (('iand', ('inot', a), b), ('iandnot', b, a),
> 'options->bitwise_src_invertable'),
> +   (('iand', a, ('inot', b)), ('iandnot', a, b),
> 'options->bitwise_src_invertable'),
> +   (('ior', a, ('inot', b)), ('iornot', a, b),
> 'options->bitwise_src_invertable'),
> +   (('ior', ('inot', a), b), ('iornot', b, a),
> 'options->bitwise_src_invertable'),
>  ]
>
>  print(nir_algebraic.AlgebraicPass("nir_opt_algebraic",
> optimizations).render())
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> 

Re: [Mesa-dev] [PATCH 0/2] Add inverted bitwise forms to NIR

2019-04-25 Thread Alyssa Rosenzweig
Alright, good to know, thank you! :)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Rob Clark
On Thu, Apr 25, 2019 at 3:37 PM Alyssa Rosenzweig  wrote:
>
> In addition to the familiar iand/ior/ixor, some architectures feature
> destination-inverted versions inand/inor/inxor. Certain
> architectures also have source-inverted forms, dubbed iandnot/iornot
> here. Midgard has the all of these opcodes natively. Many arches have
> comparible features to implement some/all of the above. Paired with De
> Morgan's Laws, these opcodes allow anything of the form
> "~? (~?a [&|] ~?b)" to complete in one instruction.
>
> This can be used to simplify some backend-specific code on affected
> architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on
> Gen8+").
>
> Signed-off-by: Alyssa Rosenzweig 
> Cc: Ian Romanick 
> Cc: Kenneth Graunke 
> ---
>  src/compiler/nir/nir.h|  4 
>  src/compiler/nir/nir_opcodes.py   | 18 ++
>  src/compiler/nir/nir_opt_algebraic.py | 12 
>  3 files changed, 34 insertions(+)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index e878a63409d..3e01ec2cc06 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options {
> bool lower_hadd;
> bool lower_add_sat;
>
> +   /* Set if inand/inor/inxor and iandnot/iornot supported respectively */
> +   bool bitwise_dest_invertable;
> +   bool bitwise_src_invertable;


neat, this trick seems to work on a6xx (and I'd assume earlier gens
that use ir3 although I'd have to dust off some older blobs to check..
blob is less kind when it comes to retaining support for older hw)..

very-bikeshed-comment: split into two comments per compiler_options flag?

Reviewed-by: Rob Clark 


> +
> /**
>  * Should nir_lower_io() create load_interpolated_input intrinsics?
>  *
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index d35d820aa5b..f9d92afb53e 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & 
> src1")
>  binop("ior", tuint, commutative + associative, "src0 | src1")
>  binop("ixor", tuint, commutative + associative, "src0 ^ src1")
>
> +# inverted bitwise logic operators
> +#
> +# These variants of the above include bitwise NOTs either on the result of 
> the
> +# whole expression or on the latter operand. On some hardware (e.g. Midgard),
> +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be
> +# implemented as modifiers of the standard three. Along with appropriate
> +# algebraic passes, these should permit any permutation of inverses on AND/OR
> +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = 
> ~a
> +# | b = b | ~a = iornot(b, a).
> +
> +binop("inand", tuint, commutative, "~(src0 & src1)")
> +binop("inor", tuint, commutative, "~(src0 | src1)")
> +binop("inxor", tuint, commutative, "~(src0 ^ src1)")
> +binop("iandnot", tuint, "", "src0 & (~src1)")
> +binop("iornot", tuint, "", "src0 & (~src1)")
> +
> +
> +
>
>  # floating point logic operators
>  #
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index dad0545594f..6cb3e8cb950 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -1052,6 +1052,18 @@ late_optimizations = [
> (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', 
> b)), ('fadd', c, ('fmax', a, b))),
>
> (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', 
> ('ior', a, b,
> +
> +   # We don't want to deal with inverted forms, so run this late. Any
> +   # combination of inverts on flags or output should result in a single
> +   # instruction if these are supported; cases not explicitly handled would
> +   # have been simplified via De Morgan's Law
> +   (('inot', ('iand', a, b)), ('inand', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ior', a, b)), ('inor', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('inot', ('ixor', a, b)), ('inxor', a, b), 
> 'options->bitwise_dest_invertable'),
> +   (('iand', ('inot', a), b), ('iandnot', b, a), 
> 'options->bitwise_src_invertable'),
> +   (('iand', a, ('inot', b)), ('iandnot', a, b), 
> 'options->bitwise_src_invertable'),
> +   (('ior', a, ('inot', b)), ('iornot', a, b), 
> 'options->bitwise_src_invertable'),
> +   (('ior', ('inot', a), b), ('iornot', b, a), 
> 'options->bitwise_src_invertable'),
>  ]
>
>  print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", 
> optimizations).render())
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/2] Add inverted bitwise forms to NIR

2019-04-25 Thread Ilia Mirkin
On Thu, Apr 25, 2019 at 6:37 PM Alyssa Rosenzweig  wrote:
> Various combinations of these instructions are found on many
> architectures. They appear directly as-is on Midgard; some of them
> should be implementable on Intel Gen8+ via the source modifiers; I was
> told over IRC that AMD/Nouveau might have some of these as well. Rather

FWIW, many instructions NVIDIA GPUs support modifiers on either or
both parameters to an operation (esp the bitwise and arithmetic ops).
Whether the modifier is supported depends on various oddness (e.g.
whether the second arg is GPR vs const/imm20 vs imm32), so I doubt
this would be particularly useful there. Furthermore, the nouveau
compiler already has a modifier propagation pass which attempts to
combine the NOT/NEG/SAT ops with uses that can consume them directly.

It's unclear to me whether NIR should directly support all 16 boolean
operations or not. I can see arguments both for and against. Just
wanted to point out the NVIDIA situation since it was referenced
explicitly.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)

2019-04-25 Thread Dave Airlie
I ran this via the intel-ci and it has 0 regressions, I've also looked
at this before in v1.

Reviewed-by: Dave Airlie 

On Tue, 23 Apr 2019 at 06:08, Marek Olšák  wrote:
>
> Ping. Thanks.
>
> On Tue, Apr 16, 2019 at 10:16 AM Marek Olšák  wrote:
>>
>> From: Marek Olšák 
>>
>> This fixes KHR-GL45.compute_shader.resources-max on radeonsi.
>>
>> Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it 
>> to gallium"
>>
>> v2: use is_interface_array, protect again assertion failures in 
>> u_bit_consecutive
>> ---
>>  src/compiler/glsl/link_uniforms.cpp | 22 +++---
>>  1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/compiler/glsl/link_uniforms.cpp 
>> b/src/compiler/glsl/link_uniforms.cpp
>> index ef124111991..aa96227a7e1 100644
>> --- a/src/compiler/glsl/link_uniforms.cpp
>> +++ b/src/compiler/glsl/link_uniforms.cpp
>> @@ -515,44 +515,60 @@ public:
>>this->record_next_bindless_sampler = new string_to_uint_map;
>>this->record_next_image = new string_to_uint_map;
>>this->record_next_bindless_image = new string_to_uint_map;
>>
>>buffer_block_index = -1;
>>if (var->is_in_buffer_block()) {
>>   struct gl_uniform_block *blks = var->is_in_shader_storage_block() ?
>>  prog->data->ShaderStorageBlocks : prog->data->UniformBlocks;
>>   unsigned num_blks = var->is_in_shader_storage_block() ?
>>  prog->data->NumShaderStorageBlocks : 
>> prog->data->NumUniformBlocks;
>> + bool is_interface_array =
>> +var->is_interface_instance() && var->type->is_array();
>>
>> - if (var->is_interface_instance() && var->type->is_array()) {
>> + if (is_interface_array) {
>>  unsigned l = strlen(var->get_interface_type()->name);
>>
>>  for (unsigned i = 0; i < num_blks; i++) {
>> if (strncmp(var->get_interface_type()->name, blks[i].Name, l)
>> == 0 && blks[i].Name[l] == '[') {
>>buffer_block_index = i;
>>break;
>> }
>>  }
>>   } else {
>>  for (unsigned i = 0; i < num_blks; i++) {
>> if (strcmp(var->get_interface_type()->name, blks[i].Name) == 
>> 0) {
>>buffer_block_index = i;
>>break;
>> }
>>  }
>>   }
>>   assert(buffer_block_index != -1);
>>
>>   if (var->is_in_shader_storage_block() &&
>> - !var->data.memory_read_only)
>> -shader_storage_blocks_write_access |= 1 << buffer_block_index;
>> + !var->data.memory_read_only) {
>> +unsigned array_size = is_interface_array ?
>> + var->type->array_size() : 1;
>> +
>> +STATIC_ASSERT(MAX_SHADER_STORAGE_BUFFERS <= 32);
>> +
>> +/* Shaders that use too many SSBOs will fail to compile, which
>> + * we don't care about.
>> + *
>> + * This is true for shaders that do not use too many SSBOs:
>> + */
>> +if (buffer_block_index + array_size <= 32) {
>> +   shader_storage_blocks_write_access |=
>> +  u_bit_consecutive(buffer_block_index, array_size);
>> +}
>> + }
>>
>>   /* Uniform blocks that were specified with an instance name must be
>>* handled a little bit differently.  The name of the variable is 
>> the
>>* name used to reference the uniform block instead of being the 
>> name
>>* of a variable within the block.  Therefore, searching for the 
>> name
>>* within the block will fail.
>>*/
>>   if (var->is_interface_instance()) {
>>  ubo_byte_offset = 0;
>>  process(var->get_interface_type(),
>> --
>> 2.17.1
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()

2019-04-25 Thread Dylan Baker
Thanks!

Quoting Timothy Arceri (2019-04-25 15:36:44)
> On 26/4/19 6:50 am, Dylan Baker wrote:
> > Hi Tim,
> > 
> > I had to make a couple of small tweaks to get this to apply against 19.0 
> > (namely
> > that the glsl_type_is_struct -> glsl_type_is_struct_or_ifc doesn't exist in
> > 19.0), could you take a look at the patch in the staging/19.0 branch and 
> > let me
> > know if it looks okay?
> 
> Looks good to me. Thanks!
> 
> > 
> > Thanks,
> > Dylan
> > 
> > Quoting Timothy Arceri (2019-04-24 18:17:42)
> >> We were only setting the used mask for the first component of a
> >> varying. Since the linking opts split vectors into scalars this
> >> has mostly worked ok.
> >>
> >> However this causes an issue where for example if we split a
> >> struct on one side of the interface but not the other, then we
> >> can possibly end up removing the first components on the side
> >> that was split and then incorrectly remove the whole struct
> >> on the other side of the varying.
> >>
> >> With this change we simply mark all 4 components for each slot
> >> used by a struct. We could possibly make this more fine gained
> >> but that would require a more complex change.
> >>
> >> This fixes a bug in Strange Brigade on RADV when tessellation
> >> is enabled, all credit goes to Samuel Pitoiset for tracking down
> >> the cause of the bug.
> >>
> >> Fixes: f1eb5e639997 ("nir: add component level support to 
> >> remove_unused_io_vars()")
> >> ---
> >>   src/compiler/nir/nir_linking_helpers.c | 51 +-
> >>   1 file changed, 33 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/src/compiler/nir/nir_linking_helpers.c 
> >> b/src/compiler/nir/nir_linking_helpers.c
> >> index f4494c78f98..ef0f94baf47 100644
> >> --- a/src/compiler/nir/nir_linking_helpers.c
> >> +++ b/src/compiler/nir/nir_linking_helpers.c
> >> @@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage 
> >> stage)
> >>  return ((1ull << slots) - 1) << location;
> >>   }
> >>   
> >> +static uint8_t
> >> +get_num_components(nir_variable *var)
> >> +{
> >> +   if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type)))
> >> +  return 4;
> >> +
> >> +   return glsl_get_vector_elements(glsl_without_array(var->type));
> >> +}
> >> +
> >>   static void
> >>   tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t 
> >> *patches_read)
> >>   {
> >> @@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t 
> >> *read, uint64_t *patches_read)
> >>  continue;
> >>   
> >>   nir_variable *var = nir_deref_instr_get_variable(deref);
> >> -if (var->data.patch) {
> >> -   patches_read[var->data.location_frac] |=
> >> -  get_variable_io_mask(var, shader->info.stage);
> >> -} else {
> >> -   read[var->data.location_frac] |=
> >> -  get_variable_io_mask(var, shader->info.stage);
> >> +for (unsigned i = 0; i < get_num_components(var); i++) {
> >> +   if (var->data.patch) {
> >> +  patches_read[var->data.location_frac + i] |=
> >> + get_variable_io_mask(var, shader->info.stage);
> >> +   } else {
> >> +  read[var->data.location_frac + i] |=
> >> + get_variable_io_mask(var, shader->info.stage);
> >> +   }
> >>   }
> >>}
> >> }
> >> @@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, 
> >> nir_shader *consumer)
> >>  uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 };
> >>   
> >>  nir_foreach_variable(var, >outputs) {
> >> -  if (var->data.patch) {
> >> - patches_written[var->data.location_frac] |=
> >> -get_variable_io_mask(var, producer->info.stage);
> >> -  } else {
> >> - written[var->data.location_frac] |=
> >> -get_variable_io_mask(var, producer->info.stage);
> >> +  for (unsigned i = 0; i < get_num_components(var); i++) {
> >> + if (var->data.patch) {
> >> +patches_written[var->data.location_frac + i] |=
> >> +   get_variable_io_mask(var, producer->info.stage);
> >> + } else {
> >> +written[var->data.location_frac + i] |=
> >> +   get_variable_io_mask(var, producer->info.stage);
> >> + }
> >> }
> >>  }
> >>   
> >>  nir_foreach_variable(var, >inputs) {
> >> -  if (var->data.patch) {
> >> - patches_read[var->data.location_frac] |=
> >> -get_variable_io_mask(var, consumer->info.stage);
> >> -  } else {
> >> - read[var->data.location_frac] |=
> >> -get_variable_io_mask(var, consumer->info.stage);
> >> +  for (unsigned i = 0; i < get_num_components(var); i++) {
> >> + if (var->data.patch) {
> >> +patches_read[var->data.location_frac + i] |=
> >> +   get_variable_io_mask(var, 

Re: [Mesa-dev] [PATCH 2/3] radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)

2019-04-25 Thread Dylan Baker
Awesome, thanks.

Quoting Marek Olšák (2019-04-25 14:50:52)
> Thanks. It looks good.
> 
> Marek
> 
> On Thu, Apr 25, 2019, 5:17 PM Dylan Baker  wrote:
> 
> Hi Marek,
> 
> I've tried to apply this to 19.0, I had to pull "radeonsi: add
> si_debug_options
> for convenient adding/removing of options", which is fine, but this patch
> also
> assumes your si compute-queue only patches, which aren't present in 19.0.
> I've
> made a small change to get it compiling, but I'm sure it's not the right
> fix, so
> if you could take a look at the staging/19.0 branch and let me know what
> you'd
> like to do I'd appreciate it.
> 
> Thanks,
> Dylan
> 
> Quoting Marek Olšák (2019-04-18 14:46:27)
> > From: Marek Olšák 
> >
> > Needed to track context rolls caused by streamout and ACQUIRE_MEM.
> > ACQUIRE_MEM can occur outside of draw calls.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355
> >
> > v2: squashed patches and done more rework
> >
> > Cc: 19.0 
> > ---
> >  src/gallium/drivers/radeonsi/si_pipe.c        |  2 +
> >  src/gallium/drivers/radeonsi/si_pipe.h        |  3 +-
> >  src/gallium/drivers/radeonsi/si_state.c       |  8 +-
> >  .../drivers/radeonsi/si_state_binning.c       |  4 +-
> >  src/gallium/drivers/radeonsi/si_state_draw.c  | 86 +++
> >  .../drivers/radeonsi/si_state_shaders.c       | 10 +--
> >  .../drivers/radeonsi/si_state_streamout.c     |  1 +
> >  .../drivers/radeonsi/si_state_viewport.c      |  2 +-
> >  8 files changed, 68 insertions(+), 48 deletions(-)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
> b/src/gallium/drivers
> /radeonsi/si_pipe.c
> > index fa96ce34224..7209db9fb37 100644
> > --- a/src/gallium/drivers/radeonsi/si_pipe.c
> > +++ b/src/gallium/drivers/radeonsi/si_pipe.c
> > @@ -1072,20 +1072,22 @@ struct pipe_screen 
> *radeonsi_screen_create(struct
> radeon_winsys *ws,
> > 
> >         sscreen->has_out_of_order_rast = sscreen->info.chip_class >= VI 
> &
> &
> >                                          sscreen->info.max_se >= 2 &&
> >                                          !(sscreen->debug_flags & DBG
> (NO_OUT_OF_ORDER));
> >         sscreen->assume_no_z_fights =
> >                 driQueryOptionb(config->options,
> "radeonsi_assume_no_z_fights");
> >         sscreen->commutative_blend_add =
> >                 driQueryOptionb(config->options,
> "radeonsi_commutative_blend_add");
> >         sscreen->clear_db_cache_before_clear =
> >                 driQueryOptionb(config->options,
> "radeonsi_clear_db_cache_before_clear");
> > +       sscreen->has_gfx9_scissor_bug = sscreen->info.family ==
> CHIP_VEGA10 ||
> > +                                       sscreen->info.family ==
> CHIP_RAVEN;
> >         sscreen->has_msaa_sample_loc_bug = (sscreen->info.family >=
> CHIP_POLARIS10 &&
> >                                             sscreen->info.family <=
> CHIP_POLARIS12) ||
> >                                            sscreen->info.family ==
> CHIP_VEGA10 ||
> >                                            sscreen->info.family ==
> CHIP_RAVEN;
> >         sscreen->has_ls_vgpr_init_bug = sscreen->info.family ==
> CHIP_VEGA10 ||
> >                                         sscreen->info.family ==
> CHIP_RAVEN;
> >         sscreen->has_dcc_constant_encode = sscreen->info.family ==
> CHIP_RAVEN2;
> > 
> >         /* Only enable primitive binning on APUs by default. */
> >         sscreen->dpbb_allowed = sscreen->info.family == CHIP_RAVEN ||
> > diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
> b/src/gallium/drivers
> /radeonsi/si_pipe.h
> > index aaa95f32d20..a4c90a4f69f 100644
> > --- a/src/gallium/drivers/radeonsi/si_pipe.h
> > +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> > @@ -463,20 +463,21 @@ struct si_screen {
> >         unsigned                        eqaa_force_coverage_samples;
> >         unsigned                        eqaa_force_z_samples;
> >         unsigned                        eqaa_force_color_samples;
> >         bool                            has_clear_state;
> >         bool                            has_distributed_tess;
> >         bool                            has_draw_indirect_multi;
> >         bool                            has_out_of_order_rast;
> >         bool                            assume_no_z_fights;
> >         bool                            commutative_blend_add;
> >         bool                            clear_db_cache_before_clear;
> > +       bool                            has_gfx9_scissor_bug;
> >         bool                            has_msaa_sample_loc_bug;
> >         bool                       

[Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops

2019-04-25 Thread Alyssa Rosenzweig
In addition to the familiar iand/ior/ixor, some architectures feature
destination-inverted versions inand/inor/inxor. Certain
architectures also have source-inverted forms, dubbed iandnot/iornot
here. Midgard has the all of these opcodes natively. Many arches have
comparible features to implement some/all of the above. Paired with De
Morgan's Laws, these opcodes allow anything of the form
"~? (~?a [&|] ~?b)" to complete in one instruction.

This can be used to simplify some backend-specific code on affected
architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on
Gen8+").

Signed-off-by: Alyssa Rosenzweig 
Cc: Ian Romanick 
Cc: Kenneth Graunke 
---
 src/compiler/nir/nir.h|  4 
 src/compiler/nir/nir_opcodes.py   | 18 ++
 src/compiler/nir/nir_opt_algebraic.py | 12 
 3 files changed, 34 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index e878a63409d..3e01ec2cc06 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options {
bool lower_hadd;
bool lower_add_sat;
 
+   /* Set if inand/inor/inxor and iandnot/iornot supported respectively */
+   bool bitwise_dest_invertable;
+   bool bitwise_src_invertable;
+
/**
 * Should nir_lower_io() create load_interpolated_input intrinsics?
 *
diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index d35d820aa5b..f9d92afb53e 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & 
src1")
 binop("ior", tuint, commutative + associative, "src0 | src1")
 binop("ixor", tuint, commutative + associative, "src0 ^ src1")
 
+# inverted bitwise logic operators
+#
+# These variants of the above include bitwise NOTs either on the result of the
+# whole expression or on the latter operand. On some hardware (e.g. Midgard),
+# these are native ops. On other hardware (e.g. Intel Gen8+), these can be
+# implemented as modifiers of the standard three. Along with appropriate
+# algebraic passes, these should permit any permutation of inverses on AND/OR
+# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = ~a
+# | b = b | ~a = iornot(b, a).
+
+binop("inand", tuint, commutative, "~(src0 & src1)")
+binop("inor", tuint, commutative, "~(src0 | src1)")
+binop("inxor", tuint, commutative, "~(src0 ^ src1)")
+binop("iandnot", tuint, "", "src0 & (~src1)")
+binop("iornot", tuint, "", "src0 & (~src1)")
+
+
+
 
 # floating point logic operators
 #
diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index dad0545594f..6cb3e8cb950 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -1052,6 +1052,18 @@ late_optimizations = [
(('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', 
b)), ('fadd', c, ('fmax', a, b))),
 
(('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', ('ior', 
a, b,
+
+   # We don't want to deal with inverted forms, so run this late. Any
+   # combination of inverts on flags or output should result in a single
+   # instruction if these are supported; cases not explicitly handled would
+   # have been simplified via De Morgan's Law
+   (('inot', ('iand', a, b)), ('inand', a, b), 
'options->bitwise_dest_invertable'),
+   (('inot', ('ior', a, b)), ('inor', a, b), 
'options->bitwise_dest_invertable'),
+   (('inot', ('ixor', a, b)), ('inxor', a, b), 
'options->bitwise_dest_invertable'),
+   (('iand', ('inot', a), b), ('iandnot', b, a), 
'options->bitwise_src_invertable'),
+   (('iand', a, ('inot', b)), ('iandnot', a, b), 
'options->bitwise_src_invertable'),
+   (('ior', a, ('inot', b)), ('iornot', a, b), 
'options->bitwise_src_invertable'),
+   (('ior', ('inot', a), b), ('iornot', b, a), 
'options->bitwise_src_invertable'),
 ]
 
 print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render())
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/2] panfrost/midgard: Use inverted forms

2019-04-25 Thread Alyssa Rosenzweig
Signed-off-by: Alyssa Rosenzweig 
---
 src/gallium/drivers/panfrost/midgard/midgard_compile.c | 5 +
 src/gallium/drivers/panfrost/midgard/midgard_compile.h | 5 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.c 
b/src/gallium/drivers/panfrost/midgard/midgard_compile.c
index 742217c5a97..8e8f563732e 100644
--- a/src/gallium/drivers/panfrost/midgard/midgard_compile.c
+++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.c
@@ -1182,8 +1182,13 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr)
 ALU_CASE(fcos, fcos);
 
 ALU_CASE(iand, iand);
+ALU_CASE(iandnot, iandnot);
+ALU_CASE(inand, inand);
 ALU_CASE(ior, ior);
+ALU_CASE(iornot, iornot);
+ALU_CASE(inor, inor);
 ALU_CASE(ixor, ixor);
+ALU_CASE(inxor, inxor);
 ALU_CASE(inot, inand);
 ALU_CASE(ishl, ishl);
 ALU_CASE(ishr, iasr);
diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.h 
b/src/gallium/drivers/panfrost/midgard/midgard_compile.h
index 0724582d62c..6a894566a20 100644
--- a/src/gallium/drivers/panfrost/midgard/midgard_compile.h
+++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.h
@@ -107,7 +107,10 @@ static const nir_shader_compiler_options 
midgard_nir_options = {
 .lower_extract_byte = true,
 .lower_extract_word = true,
 
-.native_integers = true
+.native_integers = true,
+
+.bitwise_dest_invertable = true,
+.bitwise_src_invertable = true,
 };
 
 #endif
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/2] Add inverted bitwise forms to NIR

2019-04-25 Thread Alyssa Rosenzweig
Various combinations of these instructions are found on many
architectures. They appear directly as-is on Midgard; some of them
should be implementable on Intel Gen8+ via the source modifiers; I was
told over IRC that AMD/Nouveau might have some of these as well. Rather
than forcing backends to grapple with these directly, let's add some new
opcodes to NIR to support them so we can use a proper algebraic pass.

Alyssa Rosenzweig (2):
  nir: Add inverted bitwise ops
  panfrost/midgard: Use inverted forms

 src/compiler/nir/nir.h |  4 
 src/compiler/nir/nir_opcodes.py| 18 ++
 src/compiler/nir/nir_opt_algebraic.py  | 12 
 .../drivers/panfrost/midgard/midgard_compile.c |  5 +
 .../drivers/panfrost/midgard/midgard_compile.h |  5 -
 5 files changed, 43 insertions(+), 1 deletion(-)

-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()

2019-04-25 Thread Timothy Arceri

On 26/4/19 6:50 am, Dylan Baker wrote:

Hi Tim,

I had to make a couple of small tweaks to get this to apply against 19.0 (namely
that the glsl_type_is_struct -> glsl_type_is_struct_or_ifc doesn't exist in
19.0), could you take a look at the patch in the staging/19.0 branch and let me
know if it looks okay?


Looks good to me. Thanks!



Thanks,
Dylan

Quoting Timothy Arceri (2019-04-24 18:17:42)

We were only setting the used mask for the first component of a
varying. Since the linking opts split vectors into scalars this
has mostly worked ok.

However this causes an issue where for example if we split a
struct on one side of the interface but not the other, then we
can possibly end up removing the first components on the side
that was split and then incorrectly remove the whole struct
on the other side of the varying.

With this change we simply mark all 4 components for each slot
used by a struct. We could possibly make this more fine gained
but that would require a more complex change.

This fixes a bug in Strange Brigade on RADV when tessellation
is enabled, all credit goes to Samuel Pitoiset for tracking down
the cause of the bug.

Fixes: f1eb5e639997 ("nir: add component level support to 
remove_unused_io_vars()")
---
  src/compiler/nir/nir_linking_helpers.c | 51 +-
  1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/src/compiler/nir/nir_linking_helpers.c 
b/src/compiler/nir/nir_linking_helpers.c
index f4494c78f98..ef0f94baf47 100644
--- a/src/compiler/nir/nir_linking_helpers.c
+++ b/src/compiler/nir/nir_linking_helpers.c
@@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage 
stage)
 return ((1ull << slots) - 1) << location;
  }
  
+static uint8_t

+get_num_components(nir_variable *var)
+{
+   if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type)))
+  return 4;
+
+   return glsl_get_vector_elements(glsl_without_array(var->type));
+}
+
  static void
  tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t 
*patches_read)
  {
@@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t *read, 
uint64_t *patches_read)
 continue;
  
  nir_variable *var = nir_deref_instr_get_variable(deref);

-if (var->data.patch) {
-   patches_read[var->data.location_frac] |=
-  get_variable_io_mask(var, shader->info.stage);
-} else {
-   read[var->data.location_frac] |=
-  get_variable_io_mask(var, shader->info.stage);
+for (unsigned i = 0; i < get_num_components(var); i++) {
+   if (var->data.patch) {
+  patches_read[var->data.location_frac + i] |=
+ get_variable_io_mask(var, shader->info.stage);
+   } else {
+  read[var->data.location_frac + i] |=
+ get_variable_io_mask(var, shader->info.stage);
+   }
  }
   }
}
@@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, 
nir_shader *consumer)
 uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 };
  
 nir_foreach_variable(var, >outputs) {

-  if (var->data.patch) {
- patches_written[var->data.location_frac] |=
-get_variable_io_mask(var, producer->info.stage);
-  } else {
- written[var->data.location_frac] |=
-get_variable_io_mask(var, producer->info.stage);
+  for (unsigned i = 0; i < get_num_components(var); i++) {
+ if (var->data.patch) {
+patches_written[var->data.location_frac + i] |=
+   get_variable_io_mask(var, producer->info.stage);
+ } else {
+written[var->data.location_frac + i] |=
+   get_variable_io_mask(var, producer->info.stage);
+ }
}
 }
  
 nir_foreach_variable(var, >inputs) {

-  if (var->data.patch) {
- patches_read[var->data.location_frac] |=
-get_variable_io_mask(var, consumer->info.stage);
-  } else {
- read[var->data.location_frac] |=
-get_variable_io_mask(var, consumer->info.stage);
+  for (unsigned i = 0; i < get_num_components(var); i++) {
+ if (var->data.patch) {
+patches_read[var->data.location_frac + i] |=
+   get_variable_io_mask(var, consumer->info.stage);
+ } else {
+read[var->data.location_frac + i] |=
+   get_variable_io_mask(var, consumer->info.stage);
+ }
}
 }
  
--

2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)

2019-04-25 Thread Marek Olšák
Thanks. It looks good.

Marek

On Thu, Apr 25, 2019, 5:17 PM Dylan Baker  wrote:

> Hi Marek,
>
> I've tried to apply this to 19.0, I had to pull "radeonsi: add
> si_debug_options
> for convenient adding/removing of options", which is fine, but this patch
> also
> assumes your si compute-queue only patches, which aren't present in 19.0.
> I've
> made a small change to get it compiling, but I'm sure it's not the right
> fix, so
> if you could take a look at the staging/19.0 branch and let me know what
> you'd
> like to do I'd appreciate it.
>
> Thanks,
> Dylan
>
> Quoting Marek Olšák (2019-04-18 14:46:27)
> > From: Marek Olšák 
> >
> > Needed to track context rolls caused by streamout and ACQUIRE_MEM.
> > ACQUIRE_MEM can occur outside of draw calls.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355
> >
> > v2: squashed patches and done more rework
> >
> > Cc: 19.0 
> > ---
> >  src/gallium/drivers/radeonsi/si_pipe.c|  2 +
> >  src/gallium/drivers/radeonsi/si_pipe.h|  3 +-
> >  src/gallium/drivers/radeonsi/si_state.c   |  8 +-
> >  .../drivers/radeonsi/si_state_binning.c   |  4 +-
> >  src/gallium/drivers/radeonsi/si_state_draw.c  | 86 +++
> >  .../drivers/radeonsi/si_state_shaders.c   | 10 +--
> >  .../drivers/radeonsi/si_state_streamout.c |  1 +
> >  .../drivers/radeonsi/si_state_viewport.c  |  2 +-
> >  8 files changed, 68 insertions(+), 48 deletions(-)
> >
> > diff --git a/src/gallium/drivers/radeonsi/si_pipe.c
> b/src/gallium/drivers/radeonsi/si_pipe.c
> > index fa96ce34224..7209db9fb37 100644
> > --- a/src/gallium/drivers/radeonsi/si_pipe.c
> > +++ b/src/gallium/drivers/radeonsi/si_pipe.c
> > @@ -1072,20 +1072,22 @@ struct pipe_screen
> *radeonsi_screen_create(struct radeon_winsys *ws,
> >
> > sscreen->has_out_of_order_rast = sscreen->info.chip_class >= VI
> &&
> >  sscreen->info.max_se >= 2 &&
> >  !(sscreen->debug_flags &
> DBG(NO_OUT_OF_ORDER));
> > sscreen->assume_no_z_fights =
> > driQueryOptionb(config->options,
> "radeonsi_assume_no_z_fights");
> > sscreen->commutative_blend_add =
> > driQueryOptionb(config->options,
> "radeonsi_commutative_blend_add");
> > sscreen->clear_db_cache_before_clear =
> > driQueryOptionb(config->options,
> "radeonsi_clear_db_cache_before_clear");
> > +   sscreen->has_gfx9_scissor_bug = sscreen->info.family ==
> CHIP_VEGA10 ||
> > +   sscreen->info.family ==
> CHIP_RAVEN;
> > sscreen->has_msaa_sample_loc_bug = (sscreen->info.family >=
> CHIP_POLARIS10 &&
> > sscreen->info.family <=
> CHIP_POLARIS12) ||
> >sscreen->info.family ==
> CHIP_VEGA10 ||
> >sscreen->info.family ==
> CHIP_RAVEN;
> > sscreen->has_ls_vgpr_init_bug = sscreen->info.family ==
> CHIP_VEGA10 ||
> > sscreen->info.family ==
> CHIP_RAVEN;
> > sscreen->has_dcc_constant_encode = sscreen->info.family ==
> CHIP_RAVEN2;
> >
> > /* Only enable primitive binning on APUs by default. */
> > sscreen->dpbb_allowed = sscreen->info.family == CHIP_RAVEN ||
> > diff --git a/src/gallium/drivers/radeonsi/si_pipe.h
> b/src/gallium/drivers/radeonsi/si_pipe.h
> > index aaa95f32d20..a4c90a4f69f 100644
> > --- a/src/gallium/drivers/radeonsi/si_pipe.h
> > +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> > @@ -463,20 +463,21 @@ struct si_screen {
> > unsignedeqaa_force_coverage_samples;
> > unsignedeqaa_force_z_samples;
> > unsignedeqaa_force_color_samples;
> > boolhas_clear_state;
> > boolhas_distributed_tess;
> > boolhas_draw_indirect_multi;
> > boolhas_out_of_order_rast;
> > boolassume_no_z_fights;
> > boolcommutative_blend_add;
> > boolclear_db_cache_before_clear;
> > +   boolhas_gfx9_scissor_bug;
> > boolhas_msaa_sample_loc_bug;
> > boolhas_ls_vgpr_init_bug;
> > boolhas_dcc_constant_encode;
> > booldpbb_allowed;
> > booldfsm_allowed;
> > boolllvm_has_working_vgpr_indexing;
> >
> > /* Whether shaders are monolithic (1-part) or separate (3-part).
> */
> > booluse_monolithic_shaders;
> > bool   

[Mesa-dev] [Bug 107022] [RADV] The Witcher 3: Trembling of trees

2019-04-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107022

zefkerri...@gmail.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #34 from zefkerri...@gmail.com ---
Hi! I just tested it again, but this time using Mesa 19.0.3 with LLVM 8.0.0 and
now it is fully fixed and works well, just like on Windows. This is similar to
the fact that it was a LLVM regression that fixed now in LLVM 8, and not a Mesa
bug at all.
Thank you very much. GOOD LUCK!

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)

2019-04-25 Thread Dylan Baker
Hi Marek,

I've tried to apply this to 19.0, I had to pull "radeonsi: add si_debug_options
for convenient adding/removing of options", which is fine, but this patch also
assumes your si compute-queue only patches, which aren't present in 19.0. I've
made a small change to get it compiling, but I'm sure it's not the right fix, so
if you could take a look at the staging/19.0 branch and let me know what you'd
like to do I'd appreciate it.

Thanks,
Dylan

Quoting Marek Olšák (2019-04-18 14:46:27)
> From: Marek Olšák 
> 
> Needed to track context rolls caused by streamout and ACQUIRE_MEM.
> ACQUIRE_MEM can occur outside of draw calls.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355
> 
> v2: squashed patches and done more rework
> 
> Cc: 19.0 
> ---
>  src/gallium/drivers/radeonsi/si_pipe.c|  2 +
>  src/gallium/drivers/radeonsi/si_pipe.h|  3 +-
>  src/gallium/drivers/radeonsi/si_state.c   |  8 +-
>  .../drivers/radeonsi/si_state_binning.c   |  4 +-
>  src/gallium/drivers/radeonsi/si_state_draw.c  | 86 +++
>  .../drivers/radeonsi/si_state_shaders.c   | 10 +--
>  .../drivers/radeonsi/si_state_streamout.c |  1 +
>  .../drivers/radeonsi/si_state_viewport.c  |  2 +-
>  8 files changed, 68 insertions(+), 48 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
> b/src/gallium/drivers/radeonsi/si_pipe.c
> index fa96ce34224..7209db9fb37 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.c
> +++ b/src/gallium/drivers/radeonsi/si_pipe.c
> @@ -1072,20 +1072,22 @@ struct pipe_screen *radeonsi_screen_create(struct 
> radeon_winsys *ws,
>  
> sscreen->has_out_of_order_rast = sscreen->info.chip_class >= VI &&
>  sscreen->info.max_se >= 2 &&
>  !(sscreen->debug_flags & 
> DBG(NO_OUT_OF_ORDER));
> sscreen->assume_no_z_fights =
> driQueryOptionb(config->options, 
> "radeonsi_assume_no_z_fights");
> sscreen->commutative_blend_add =
> driQueryOptionb(config->options, 
> "radeonsi_commutative_blend_add");
> sscreen->clear_db_cache_before_clear =
> driQueryOptionb(config->options, 
> "radeonsi_clear_db_cache_before_clear");
> +   sscreen->has_gfx9_scissor_bug = sscreen->info.family == CHIP_VEGA10 ||
> +   sscreen->info.family == CHIP_RAVEN;
> sscreen->has_msaa_sample_loc_bug = (sscreen->info.family >= 
> CHIP_POLARIS10 &&
> sscreen->info.family <= 
> CHIP_POLARIS12) ||
>sscreen->info.family == 
> CHIP_VEGA10 ||
>sscreen->info.family == CHIP_RAVEN;
> sscreen->has_ls_vgpr_init_bug = sscreen->info.family == CHIP_VEGA10 ||
> sscreen->info.family == CHIP_RAVEN;
> sscreen->has_dcc_constant_encode = sscreen->info.family == 
> CHIP_RAVEN2;
>  
> /* Only enable primitive binning on APUs by default. */
> sscreen->dpbb_allowed = sscreen->info.family == CHIP_RAVEN ||
> diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
> b/src/gallium/drivers/radeonsi/si_pipe.h
> index aaa95f32d20..a4c90a4f69f 100644
> --- a/src/gallium/drivers/radeonsi/si_pipe.h
> +++ b/src/gallium/drivers/radeonsi/si_pipe.h
> @@ -463,20 +463,21 @@ struct si_screen {
> unsignedeqaa_force_coverage_samples;
> unsignedeqaa_force_z_samples;
> unsignedeqaa_force_color_samples;
> boolhas_clear_state;
> boolhas_distributed_tess;
> boolhas_draw_indirect_multi;
> boolhas_out_of_order_rast;
> boolassume_no_z_fights;
> boolcommutative_blend_add;
> boolclear_db_cache_before_clear;
> +   boolhas_gfx9_scissor_bug;
> boolhas_msaa_sample_loc_bug;
> boolhas_ls_vgpr_init_bug;
> boolhas_dcc_constant_encode;
> booldpbb_allowed;
> booldfsm_allowed;
> boolllvm_has_working_vgpr_indexing;
>  
> /* Whether shaders are monolithic (1-part) or separate (3-part). */
> booluse_monolithic_shaders;
> boolrecord_llvm_ir;
> @@ -1062,21 +1063,21 @@ struct si_context {
> unsignednum_vs_flushes;
> unsignednum_ps_flushes;
> unsignednum_cs_flushes;
> 

Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()

2019-04-25 Thread Dylan Baker
Hi Tim,

I had to make a couple of small tweaks to get this to apply against 19.0 (namely
that the glsl_type_is_struct -> glsl_type_is_struct_or_ifc doesn't exist in
19.0), could you take a look at the patch in the staging/19.0 branch and let me
know if it looks okay?

Thanks,
Dylan

Quoting Timothy Arceri (2019-04-24 18:17:42)
> We were only setting the used mask for the first component of a
> varying. Since the linking opts split vectors into scalars this
> has mostly worked ok.
> 
> However this causes an issue where for example if we split a
> struct on one side of the interface but not the other, then we
> can possibly end up removing the first components on the side
> that was split and then incorrectly remove the whole struct
> on the other side of the varying.
> 
> With this change we simply mark all 4 components for each slot
> used by a struct. We could possibly make this more fine gained
> but that would require a more complex change.
> 
> This fixes a bug in Strange Brigade on RADV when tessellation
> is enabled, all credit goes to Samuel Pitoiset for tracking down
> the cause of the bug.
> 
> Fixes: f1eb5e639997 ("nir: add component level support to 
> remove_unused_io_vars()")
> ---
>  src/compiler/nir/nir_linking_helpers.c | 51 +-
>  1 file changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_linking_helpers.c 
> b/src/compiler/nir/nir_linking_helpers.c
> index f4494c78f98..ef0f94baf47 100644
> --- a/src/compiler/nir/nir_linking_helpers.c
> +++ b/src/compiler/nir/nir_linking_helpers.c
> @@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage 
> stage)
> return ((1ull << slots) - 1) << location;
>  }
>  
> +static uint8_t
> +get_num_components(nir_variable *var)
> +{
> +   if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type)))
> +  return 4;
> +
> +   return glsl_get_vector_elements(glsl_without_array(var->type));
> +}
> +
>  static void
>  tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t 
> *patches_read)
>  {
> @@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t *read, 
> uint64_t *patches_read)
> continue;
>  
>  nir_variable *var = nir_deref_instr_get_variable(deref);
> -if (var->data.patch) {
> -   patches_read[var->data.location_frac] |=
> -  get_variable_io_mask(var, shader->info.stage);
> -} else {
> -   read[var->data.location_frac] |=
> -  get_variable_io_mask(var, shader->info.stage);
> +for (unsigned i = 0; i < get_num_components(var); i++) {
> +   if (var->data.patch) {
> +  patches_read[var->data.location_frac + i] |=
> + get_variable_io_mask(var, shader->info.stage);
> +   } else {
> +  read[var->data.location_frac + i] |=
> + get_variable_io_mask(var, shader->info.stage);
> +   }
>  }
>   }
>}
> @@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, 
> nir_shader *consumer)
> uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 };
>  
> nir_foreach_variable(var, >outputs) {
> -  if (var->data.patch) {
> - patches_written[var->data.location_frac] |=
> -get_variable_io_mask(var, producer->info.stage);
> -  } else {
> - written[var->data.location_frac] |=
> -get_variable_io_mask(var, producer->info.stage);
> +  for (unsigned i = 0; i < get_num_components(var); i++) {
> + if (var->data.patch) {
> +patches_written[var->data.location_frac + i] |=
> +   get_variable_io_mask(var, producer->info.stage);
> + } else {
> +written[var->data.location_frac + i] |=
> +   get_variable_io_mask(var, producer->info.stage);
> + }
>}
> }
>  
> nir_foreach_variable(var, >inputs) {
> -  if (var->data.patch) {
> - patches_read[var->data.location_frac] |=
> -get_variable_io_mask(var, consumer->info.stage);
> -  } else {
> - read[var->data.location_frac] |=
> -get_variable_io_mask(var, consumer->info.stage);
> +  for (unsigned i = 0; i < get_num_components(var); i++) {
> + if (var->data.patch) {
> +patches_read[var->data.location_frac + i] |=
> +   get_variable_io_mask(var, consumer->info.stage);
> + } else {
> +read[var->data.location_frac + i] |=
> +   get_variable_io_mask(var, consumer->info.stage);
> + }
>}
> }
>  
> -- 
> 2.20.1
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: signature
___
mesa-dev mailing list

[Mesa-dev] [Bug 110456] Land YCBCR extensions

2019-04-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110456

Bas Nieuwenhuizen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Bas Nieuwenhuizen  ---
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/593 has been merged.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 110439] [TRACKER] Mesa 19.1 feature tracker

2019-04-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110439
Bug 110439 depends on bug 110456, which changed state.

Bug 110456 Summary: Land YCBCR extensions
https://bugs.freedesktop.org/show_bug.cgi?id=110456

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 110516] OpenGLCurveEvaluator does not return affine coordinates

2019-04-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110516

Volker Enderlein  changed:

   What|Removed |Added

 Attachment #144094|0   |1
is obsolete||

--- Comment #1 from Volker Enderlein  ---
Created attachment 144095
  --> https://bugs.freedesktop.org/attachment.cgi?id=144095=edit
Correct patch

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 1/3] llvmpipe: add lp_fence_timedwait() helper

2019-04-25 Thread Gustaw Smolarczyk
czw., 25 kwi 2019 o 20:11 Gustaw Smolarczyk  napisał(a):
>
> czw., 25 kwi 2019 o 19:42 Emil Velikov  napisał(a):
> >
> > The function is analogous to lp_fence_wait() while taking at timeout
> > (ns) parameter, as needed for EGL fence/sync.
> >
> > v2:
> >  - use absolute UTC time, as per spec (Gustaw)
> >  - bail out on cnd_timedwait() failure (Gustaw)
> >
> > Cc: Gustaw Smolarczyk 
> > Cc: Roland Scheidegger 
> > Signed-off-by: Emil Velikov 
> > Reviewed-by: Roland Scheidegger  (v1)
> > ---
> >  src/gallium/drivers/llvmpipe/lp_fence.c | 30 +
> >  src/gallium/drivers/llvmpipe/lp_fence.h |  3 +++
> >  2 files changed, 33 insertions(+)
> >
> > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c 
> > b/src/gallium/drivers/llvmpipe/lp_fence.c
> > index 20cd91cd63d..b79d773bf6c 100644
> > --- a/src/gallium/drivers/llvmpipe/lp_fence.c
> > +++ b/src/gallium/drivers/llvmpipe/lp_fence.c
> > @@ -125,3 +125,33 @@ lp_fence_wait(struct lp_fence *f)
> >  }
> >
> >
> > +boolean
> > +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout)
> > +{
> > +   struct timespec ts;
> > +   int ret;
> > +
> > +   timespec_get(, TIME_UTC);
> > +
> > +   ts.tv_nsec += timeout % 10L;
> > +   ts.tv_sec += timeout / 10L;
> > +   if (ts.tv_nsec >= 10L) {
> > +  ts.tv_sec++;
> > +  ts.tv_nsec -= 10L;
> > +   }
> > +
> > +   if (LP_DEBUG & DEBUG_FENCE)
> > +  debug_printf("%s %d\n", __FUNCTION__, f->id);
> > +
> > +   mtx_lock(>mutex);
> > +   assert(f->issued);
> > +   while (f->count < f->rank) {
> > +  ret = cnd_timedwait(>signalled, >mutex, );
> > +  if (ret != thrd_success)
> > + break;
> > +   }
> > +   mtx_unlock(>mutex);
> > +   return (f->count >= f->rank && ret == thrd_success);

Hmm, you are reading from the fence object outside of the critical
section, which doesn't sound safe. Maybe compute the return value
before the mutex is unlocked?

   const boolean result = (f->count >= f->rank);
   mtx_unlock(>mutex);
   return result;

Since f->rank is immutable and f->count never decreases, it might
still be ok without this change, though it is racy.

>
> Is checking for ret == thrd_success here really necessary? If the
> first part is true we already know that the fence has been signalled.
>
> With this changed or not:
>
> Reviewed-by: Gustaw Smolarczyk 
>
> > +}
> > +
> > +
> > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h 
> > b/src/gallium/drivers/llvmpipe/lp_fence.h
> > index b72026492c6..5ba746d22d1 100644
> > --- a/src/gallium/drivers/llvmpipe/lp_fence.h
> > +++ b/src/gallium/drivers/llvmpipe/lp_fence.h
> > @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence);
> >  void
> >  lp_fence_wait(struct lp_fence *fence);
> >
> > +boolean
> > +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout);
> > +
> >  void
> >  llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen);
> >
> > --
> > 2.20.1
> >
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 110516] OpenGLCurveEvaluator does not return affine coordinates

2019-04-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110516

Bug ID: 110516
   Summary: OpenGLCurveEvaluator does not return affine
coordinates
   Product: Mesa
   Version: 19.0
  Hardware: All
OS: All
Status: NEW
  Severity: major
  Priority: medium
 Component: GLU
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: volkerenderl...@hotmail.com
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 144094
  --> https://bugs.freedesktop.org/attachment.cgi?id=144094=edit
fix for bug report

According to the spec the GLU_NURBS_VERTEX and GLU_NURBS_VERTEX_DATA callbacks
should always return affine coordinates. But in case of a NURBS curve
definition in homogeneous coordinates it returns homogeneous coordinates. The
transformation to affine coordinates is missing. See accompanying patch for a
fix.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 1/3] llvmpipe: add lp_fence_timedwait() helper

2019-04-25 Thread Gustaw Smolarczyk
czw., 25 kwi 2019 o 19:42 Emil Velikov  napisał(a):
>
> The function is analogous to lp_fence_wait() while taking at timeout
> (ns) parameter, as needed for EGL fence/sync.
>
> v2:
>  - use absolute UTC time, as per spec (Gustaw)
>  - bail out on cnd_timedwait() failure (Gustaw)
>
> Cc: Gustaw Smolarczyk 
> Cc: Roland Scheidegger 
> Signed-off-by: Emil Velikov 
> Reviewed-by: Roland Scheidegger  (v1)
> ---
>  src/gallium/drivers/llvmpipe/lp_fence.c | 30 +
>  src/gallium/drivers/llvmpipe/lp_fence.h |  3 +++
>  2 files changed, 33 insertions(+)
>
> diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c 
> b/src/gallium/drivers/llvmpipe/lp_fence.c
> index 20cd91cd63d..b79d773bf6c 100644
> --- a/src/gallium/drivers/llvmpipe/lp_fence.c
> +++ b/src/gallium/drivers/llvmpipe/lp_fence.c
> @@ -125,3 +125,33 @@ lp_fence_wait(struct lp_fence *f)
>  }
>
>
> +boolean
> +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout)
> +{
> +   struct timespec ts;
> +   int ret;
> +
> +   timespec_get(, TIME_UTC);
> +
> +   ts.tv_nsec += timeout % 10L;
> +   ts.tv_sec += timeout / 10L;
> +   if (ts.tv_nsec >= 10L) {
> +  ts.tv_sec++;
> +  ts.tv_nsec -= 10L;
> +   }
> +
> +   if (LP_DEBUG & DEBUG_FENCE)
> +  debug_printf("%s %d\n", __FUNCTION__, f->id);
> +
> +   mtx_lock(>mutex);
> +   assert(f->issued);
> +   while (f->count < f->rank) {
> +  ret = cnd_timedwait(>signalled, >mutex, );
> +  if (ret != thrd_success)
> + break;
> +   }
> +   mtx_unlock(>mutex);
> +   return (f->count >= f->rank && ret == thrd_success);

Is checking for ret == thrd_success here really necessary? If the
first part is true we already know that the fence has been signalled.

With this changed or not:

Reviewed-by: Gustaw Smolarczyk 

> +}
> +
> +
> diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h 
> b/src/gallium/drivers/llvmpipe/lp_fence.h
> index b72026492c6..5ba746d22d1 100644
> --- a/src/gallium/drivers/llvmpipe/lp_fence.h
> +++ b/src/gallium/drivers/llvmpipe/lp_fence.h
> @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence);
>  void
>  lp_fence_wait(struct lp_fence *fence);
>
> +boolean
> +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout);
> +
>  void
>  llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen);
>
> --
> 2.20.1
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 110356] install_megadrivers.py creates new dangling symlink [bisected]

2019-04-25 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=110356

--- Comment #5 from Fabio Pedretti  ---
Any plan to backport these (up to 5d310015) to 18.3 (I know it is EOL) and
release a 18.3.7?

See Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926857

-- 
You are receiving this mail because:
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 2/3] llvmpipe: correctly handle waiting in llvmpipe_fence_finish

2019-04-25 Thread Emil Velikov
Currently if the timeout differs from 0, we'll end up with infinite
wait... even if the user is perfectly clear they don't want that.

Use the new lp_fence_timedwait() helper guarding both waits in an
!lp_fence_signalled block like the rest of llvmpipe.

Signed-off-by: Emil Velikov 
Reviewed-by: Roland Scheidegger 
---
 src/gallium/drivers/llvmpipe/lp_screen.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c 
b/src/gallium/drivers/llvmpipe/lp_screen.c
index 8426427e397..510346d2abf 100644
--- a/src/gallium/drivers/llvmpipe/lp_screen.c
+++ b/src/gallium/drivers/llvmpipe/lp_screen.c
@@ -637,7 +637,12 @@ llvmpipe_fence_finish(struct pipe_screen *screen,
if (!timeout)
   return lp_fence_signalled(f);
 
-   lp_fence_wait(f);
+   if (!lp_fence_signalled(f)) {
+  if (timeout != PIPE_TIMEOUT_INFINITE)
+ return lp_fence_timedwait(f, timeout);
+
+  lp_fence_wait(f);
+   }
return TRUE;
 }
 
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 1/3] llvmpipe: add lp_fence_timedwait() helper

2019-04-25 Thread Emil Velikov
The function is analogous to lp_fence_wait() while taking at timeout
(ns) parameter, as needed for EGL fence/sync.

v2:
 - use absolute UTC time, as per spec (Gustaw)
 - bail out on cnd_timedwait() failure (Gustaw)

Cc: Gustaw Smolarczyk 
Cc: Roland Scheidegger 
Signed-off-by: Emil Velikov 
Reviewed-by: Roland Scheidegger  (v1)
---
 src/gallium/drivers/llvmpipe/lp_fence.c | 30 +
 src/gallium/drivers/llvmpipe/lp_fence.h |  3 +++
 2 files changed, 33 insertions(+)

diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c 
b/src/gallium/drivers/llvmpipe/lp_fence.c
index 20cd91cd63d..b79d773bf6c 100644
--- a/src/gallium/drivers/llvmpipe/lp_fence.c
+++ b/src/gallium/drivers/llvmpipe/lp_fence.c
@@ -125,3 +125,33 @@ lp_fence_wait(struct lp_fence *f)
 }
 
 
+boolean
+lp_fence_timedwait(struct lp_fence *f, uint64_t timeout)
+{
+   struct timespec ts;
+   int ret;
+
+   timespec_get(, TIME_UTC);
+
+   ts.tv_nsec += timeout % 10L;
+   ts.tv_sec += timeout / 10L;
+   if (ts.tv_nsec >= 10L) {
+  ts.tv_sec++;
+  ts.tv_nsec -= 10L;
+   }
+
+   if (LP_DEBUG & DEBUG_FENCE)
+  debug_printf("%s %d\n", __FUNCTION__, f->id);
+
+   mtx_lock(>mutex);
+   assert(f->issued);
+   while (f->count < f->rank) {
+  ret = cnd_timedwait(>signalled, >mutex, );
+  if (ret != thrd_success)
+ break;
+   }
+   mtx_unlock(>mutex);
+   return (f->count >= f->rank && ret == thrd_success);
+}
+
+
diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h 
b/src/gallium/drivers/llvmpipe/lp_fence.h
index b72026492c6..5ba746d22d1 100644
--- a/src/gallium/drivers/llvmpipe/lp_fence.h
+++ b/src/gallium/drivers/llvmpipe/lp_fence.h
@@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence);
 void
 lp_fence_wait(struct lp_fence *fence);
 
+boolean
+lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout);
+
 void
 llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen);
 
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH v2 3/3] llvmpipe: Always return some fence in flush (v2)

2019-04-25 Thread Emil Velikov
From: Tomasz Figa 

If there is no last fence, due to no rendering happening yet, just
create a new signaled fence and return it, to match the expectations of
the EGL sync fence API.

Fixes random "Could not create sync fence 0x3003" assertion failures from
Skia on Android, coming from the following code:

https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427

Reproducible especially with thread count >= 4.

One could make the driver always keep the reference to the last fence,
but:

 - the driver seems to explicitly destroy the fence whenever a rendering
   pass completes and changing that would require a significant functional
   change to the code. (Specifically, in lp_scene_end_rasterization().)

 - it still wouldn't solve the problem of an EGL sync fence being created
   and waited on without any rendering happening at all, which is
   also likely to happen with Android code pointed to in the commit.

Therefore, the simple approach of always creating a fence is taken,
similarly to other drivers, such as radeonsi.

Tested with piglit llvmpipe suite with no regressions and following
tests fixed:

egl_khr_fence_sync
 conformance
  eglclientwaitsynckhr_flag_sync_flush
  eglclientwaitsynckhr_nonzero_timeout
  eglclientwaitsynckhr_zero_timeout
  eglcreatesynckhr_default_attributes
  eglgetsyncattribkhr_invalid_attrib
  eglgetsyncattribkhr_sync_status

v2:
 - remove the useless lp_fence_reference() dance (Nicolai),
 - explain why creating the dummy fence is the right approach.

Signed-off-by: Tomasz Figa 
Reviewed-by: Roland Scheidegger 
---
 src/gallium/drivers/llvmpipe/lp_setup.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c 
b/src/gallium/drivers/llvmpipe/lp_setup.c
index b0873694732..e72e119c8a1 100644
--- a/src/gallium/drivers/llvmpipe/lp_setup.c
+++ b/src/gallium/drivers/llvmpipe/lp_setup.c
@@ -361,6 +361,8 @@ lp_setup_flush( struct lp_setup_context *setup,
 
if (fence) {
   lp_fence_reference((struct lp_fence **)fence, setup->last_fence);
+  if (!*fence)
+ *fence = (struct pipe_fence_handle *)lp_fence_create(0);
}
 }
 
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/3] llvmpipe: add lp_fence_timedwait() helper

2019-04-25 Thread Emil Velikov
On Tue, 16 Apr 2019 at 11:50, Gustaw Smolarczyk  wrote:
>
> wt., 16 kwi 2019 o 12:11 Emil Velikov  napisał(a):
> >
> > On Thu, 11 Apr 2019 at 17:55, Gustaw Smolarczyk  
> > wrote:
> > >
> > > czw., 11 kwi 2019 o 18:06 Emil Velikov  
> > > napisał(a):
> > > >
> > > > The function is analogous to lp_fence_wait() while taking at timeout
> > > > (ns) parameter, as needed for EGL fence/sync.
> > > >
> > > > Cc: Roland Scheidegger 
> > > > Signed-off-by: Emil Velikov 
> > > > ---
> > > >  src/gallium/drivers/llvmpipe/lp_fence.c | 22 ++
> > > >  src/gallium/drivers/llvmpipe/lp_fence.h |  3 +++
> > > >  2 files changed, 25 insertions(+)
> > > >
> > > > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c 
> > > > b/src/gallium/drivers/llvmpipe/lp_fence.c
> > > > index 20cd91cd63d..f8b31a9d6a5 100644
> > > > --- a/src/gallium/drivers/llvmpipe/lp_fence.c
> > > > +++ b/src/gallium/drivers/llvmpipe/lp_fence.c
> > > > @@ -125,3 +125,25 @@ lp_fence_wait(struct lp_fence *f)
> > > >  }
> > > >
> > > >
> > > > +boolean
> > > > +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout)
> > > > +{
> > > > +   struct timespec ts = {
> > > > +  .tv_nsec = timeout % 10L,
> > > > +  .tv_sec = timeout / 10L,
> > > > +   };
> > >
> > > According to the documentation [1] and looking at the implementation
> > > in mesa [2], cnd_timedwait accepts an absolute time in UTC, not
> > > duration. It seems that the fence_finish callback accepts duration.
> > >
> > Agreed, not sure how I missed that one.
> >
> > > [1] https://en.cppreference.com/w/c/thread/cnd_timedwait
> > > [2] 
> > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/include/c11/threads_posix.h#L135
> > >
> > > > +   int ret;
> > > > +
> > > > +   if (LP_DEBUG & DEBUG_FENCE)
> > > > +  debug_printf("%s %d\n", __FUNCTION__, f->id);
> > > > +
> > > > +   mtx_lock(>mutex);
> > > > +   assert(f->issued);
> > > > +   while (f->count < f->rank) {
> > > > +  ret = cnd_timedwait(>signalled, >mutex, );
> > >
> > > Shouldn't ret be checked for thrd_busy here as well? Otherwise, the
> > > function will busy-wait after the timeout is reached instead of
> > > returning.
> > >
> > Actually this should be tweaked to:
> >  - only wait for the requested timeout
> >  - _regardless_ of how meany threads (and hence ranks) llvmpipe has
> >
> > Something like the following (warning pre-coffee code) should work.
> >
> >mtx_lock(>mutex);
> >assert(f->issued);
> >if (f->count < f->rank)
> >   ret = cnd_timedwait(>signalled, >mutex, );
> >
> >mtx_unlock(>mutex);
> >return (f->count >= f->rank && ret == thrd_success);
> >
>
> Is it a good idea not to perform cnd_timedwait in a loop? A spurious
> wake up could could shorten the wait time.
>
Thanks Gustaw - had no idea cnd_timedwait can spuriously return.

For anyone else wondering - the C11 spec does not mention anything,
but there's a bug [1] to fix that.
Underlying implementations (such as pthreads [2]) are already pretty
clear about this "interesting" behaviour.

Will send out and updated revision in a moment.

Thanks again.
Emil

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_480
[2] https://linux.die.net/man/3/pthread_cond_timedwait
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [MR] glsl: Use default precision on struct/interface members when nothing specified

2019-04-25 Thread Neil Roberts
Mesa already keeps track of the GLES precision for variables and stores
it in the ir_variable. When no precision is explicitly specified it
takes the default precision for the corresponding type. However, when
the variable is a struct or interface, the precision of each individual
member is attached to the glsl_type instead. The code to make it use the
default precision was missing so this branch adds it in.

Only the last patch actually makes this change. The rest of the patches
are to fix regressions in Piglit and CTS. The underlying problem is that
Mesa was considering types that have different precisions to be
different when comparing interstage interfaces (varyings and UBOs).
According to the spec it should be ignored. Presumably this problem
already existed if mismatched precisions were explicitly specified but
we didn’t have any tests to test it. Storing the default precision makes
some tests fail because the default precision for ints is different in
the vertex and fragment stages so it’s easier to accidentally make a
test case that tests this.

The tests that regressed are:

dEQP-GLES31.functional.shaders.opaque_type_indexing.* (12 tests)
piglit.spec.ext_transform_feedback.structs_gles3 basic-struct run
piglit.spec.glsl-es-3_00.execution.varying-struct-centroid_gles3
piglit.spec.ext_transform_feedback.structs_gles3 basic-struct get

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/736


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 0/3] radeonsi: handle unaligned vertex buffers in hardware

2019-04-25 Thread Nicolai Hähnle
Hi all,

the following patches contain code to implement all vertex fetches
using plain, non-format loads plus explicit shader arithmetic for
format conversion.

This allows us to remove the software workaround for unaligned vertex
buffers on SI, because we can just load individual bytes on the GPU.
CI+ will still use short/dword loads even in the unaligned case.

The format conversion code was tested by running with
radeonsi_vs_fetch_always_opencode=true on both Verde and Vega.

Please review!

Thanks,
Nicolai
--
 src/amd/common/ac_llvm_build.c   | 313 +
 src/amd/common/ac_llvm_build.h   |  30 ++
 .../drivers/radeonsi/si_debug_options.h  |   1 +
 src/gallium/drivers/radeonsi/si_get.c|   2 +-
 src/gallium/drivers/radeonsi/si_pipe.h   |   1 +
 src/gallium/drivers/radeonsi/si_shader.c | 249 +
 src/gallium/drivers/radeonsi/si_shader.h |  46 +--
 src/gallium/drivers/radeonsi/si_state.c  | 233 +++-
 src/gallium/drivers/radeonsi/si_state.h  |  19 +
 .../drivers/radeonsi/si_state_shaders.c  |  37 +-
 10 files changed, 645 insertions(+), 286 deletions(-)


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] radeonsi: store sctx->vertex_elements in a local in si_shader_selector_key_vs

2019-04-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Purely as a shorthand in the remainder of the function.
---
 src/gallium/drivers/radeonsi/si_state_shaders.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index f57e7730905..583d7c9d3ca 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -1372,33 +1372,32 @@ static unsigned si_get_alpha_test_func(struct 
si_context *sctx)
 
 static void si_shader_selector_key_vs(struct si_context *sctx,
  struct si_shader_selector *vs,
  struct si_shader_key *key,
  struct si_vs_prolog_bits *prolog_key)
 {
if (!sctx->vertex_elements ||
vs->info.properties[TGSI_PROPERTY_VS_BLIT_SGPRS])
return;
 
-   prolog_key->instance_divisor_is_one =
-   sctx->vertex_elements->instance_divisor_is_one;
-   prolog_key->instance_divisor_is_fetched =
-   sctx->vertex_elements->instance_divisor_is_fetched;
+   struct si_vertex_elements *elts = sctx->vertex_elements;
+
+   prolog_key->instance_divisor_is_one = elts->instance_divisor_is_one;
+   prolog_key->instance_divisor_is_fetched = 
elts->instance_divisor_is_fetched;
 
/* Prefer a monolithic shader to allow scheduling divisions around
 * VBO loads. */
if (prolog_key->instance_divisor_is_fetched)
key->opt.prefer_mono = 1;
 
-   unsigned count = MIN2(vs->info.num_inputs,
- sctx->vertex_elements->count);
-   memcpy(key->mono.vs_fix_fetch, sctx->vertex_elements->fix_fetch, count);
+   unsigned count = MIN2(vs->info.num_inputs, elts->count);
+   memcpy(key->mono.vs_fix_fetch, elts->fix_fetch, count);
 }
 
 static void si_shader_selector_key_hw_vs(struct si_context *sctx,
 struct si_shader_selector *vs,
 struct si_shader_key *key)
 {
struct si_shader_selector *ps = sctx->ps_shader.cso;
 
key->opt.clip_disable =
sctx->queued.named.rasterizer->clip_plane_enable == 0 &&
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 3/3] radeonsi: overhaul the vertex fetch fixup mechanism

2019-04-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The overall goal is to support unaligned loads from vertex buffers
natively on SI.

In the unaligned case, we fall back to the general case implementation in
ac_build_opencoded_load_format. Since this function is fully general,
we will also use it going forward for cases requiring fully manual format
conversions of dwords anyway.

This requires a different encoding of the fix_fetch array, which will now
contain the entire format information if a fixup is required.

Having to check the alignment of vertex buffers is awkward. To keep the
impact on the fast path minimal, the si_context will keep track of which
vertex buffers are (not) at least dword-aligned, while the
si_vertex_elements will note which vertex buffers have some (at most dword)
alignment requirement. Vertex buffers should be dword-aligned most of the
time, which allows a fast early-out in almost all cases.

Add the radeonsi_vs_fetch_always_opencode configuration variable for
testing purposes. Note that it can only be used reliably on LLVM >= 9,
because support for byte and short load is required.
---
 .../drivers/radeonsi/si_debug_options.h   |   1 +
 src/gallium/drivers/radeonsi/si_get.c |   2 +-
 src/gallium/drivers/radeonsi/si_pipe.h|   1 +
 src/gallium/drivers/radeonsi/si_shader.c  | 249 ++
 src/gallium/drivers/radeonsi/si_shader.h  |  46 ++--
 src/gallium/drivers/radeonsi/si_state.c   | 233 +---
 src/gallium/drivers/radeonsi/si_state.h   |  19 ++
 .../drivers/radeonsi/si_state_shaders.c   |  26 +-
 8 files changed, 297 insertions(+), 280 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_debug_options.h 
b/src/gallium/drivers/radeonsi/si_debug_options.h
index 019256ca1d1..0bde7910fc6 100644
--- a/src/gallium/drivers/radeonsi/si_debug_options.h
+++ b/src/gallium/drivers/radeonsi/si_debug_options.h
@@ -1,6 +1,7 @@
 OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth 
clear")
 OPT_BOOL(enable_nir, false, "Enable NIR")
 OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context")
 OPT_BOOL(sync_compile, false, "Always compile synchronously (will cause 
stalls)")
+OPT_BOOL(vs_fetch_always_opencode, false, "Always open code vertex fetches 
(less efficient, purely for testing)")
 
 #undef OPT_BOOL
diff --git a/src/gallium/drivers/radeonsi/si_get.c 
b/src/gallium/drivers/radeonsi/si_get.c
index 4e23d283ab7..ff825c5e30a 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -190,21 +190,21 @@ static int si_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
/* Optimal number for good TexSubImage performance on 
Polaris10. */
return 64 * 1024 * 1024;
 
case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE:
case PIPE_CAP_MAX_SHADER_BUFFER_SIZE:
return MIN2(sscreen->info.max_alloc_size, INT_MAX);
 
case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY:
case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY:
-   return !sscreen->info.has_unaligned_shader_loads;
+   return HAVE_LLVM < 0x0900 && 
!sscreen->info.has_unaligned_shader_loads;
 
case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
return sscreen->info.has_sparse_vm_mappings ?
RADEON_SPARSE_PAGE_SIZE : 0;
 
case PIPE_CAP_PACKED_UNIFORMS:
if (sscreen->options.enable_nir)
return 1;
return 0;
 
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 7fc0319973b..1d241436a6d 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -938,20 +938,21 @@ struct si_context {
union pipe_color_union  *border_color_map; /* in VRAM (slow 
access), little endian */
unsignedborder_color_count;
unsignednum_vs_blit_sgprs;
uint32_t
vs_blit_sh_data[SI_VS_BLIT_SGPRS_POS_TEXCOORD];
uint32_tcs_user_data[4];
 
/* Vertex and index buffers. */
boolvertex_buffers_dirty;
boolvertex_buffer_pointer_dirty;
struct pipe_vertex_buffer   vertex_buffer[SI_NUM_VERTEX_BUFFERS];
+   uint16_tvertex_buffer_unaligned; /* bitmask of 
not dword-aligned buffers */
 
/* MSAA config state. */
int ps_iter_samples;
boolps_uses_fbfetch;
boolsmoothing_enabled;
 
/* DB render state. */
unsignedps_db_shader_control;
unsigneddbcb_copy_sample;
bool

[Mesa-dev] [PATCH 1/3] amd/common: add ac_build_opencoded_fetch_format

2019-04-25 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Implement software emulation of buffer_load_format for all types required
by vertex buffer fetches.
---
 src/amd/common/ac_llvm_build.c | 313 +
 src/amd/common/ac_llvm_build.h |  30 
 2 files changed, 343 insertions(+)

diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c
index 4fdf73c99ba..197c58a8e45 100644
--- a/src/amd/common/ac_llvm_build.c
+++ b/src/amd/common/ac_llvm_build.c
@@ -1667,20 +1667,333 @@ ac_build_tbuffer_load_byte(struct ac_llvm_context *ctx,
 
res = ac_build_raw_tbuffer_load(ctx, rsrc, voffset, soffset,
immoffset, 1, dfmt, nfmt, glc, 
false,
false);
 
res = LLVMBuildTrunc(ctx->builder, res, ctx->i8, "");
}
 
return res;
 }
+
+/**
+ * Convert an 11- or 10-bit unsigned floating point number to an f32.
+ *
+ * The input exponent is expected to be biased analogous to IEEE-754, i.e. by
+ * 2^(exp_bits-1) - 1 (as defined in OpenGL and other graphics APIs).
+ */
+static LLVMValueRef
+ac_ufN_to_float(struct ac_llvm_context *ctx, LLVMValueRef src, unsigned 
exp_bits, unsigned mant_bits)
+{
+   assert(LLVMTypeOf(src) == ctx->i32);
+
+   LLVMValueRef tmp;
+   LLVMValueRef mantissa;
+   mantissa = LLVMBuildAnd(ctx->builder, src, LLVMConstInt(ctx->i32, (1 << 
mant_bits) - 1, false), "");
+
+   /* Converting normal numbers is just a shift + correcting the exponent 
bias */
+   unsigned normal_shift = 23 - mant_bits;
+   unsigned bias_shift = 127 - ((1 << (exp_bits - 1)) - 1);
+   LLVMValueRef shifted, normal;
+
+   shifted = LLVMBuildShl(ctx->builder, src, LLVMConstInt(ctx->i32, 
normal_shift, false), "");
+   normal = LLVMBuildAdd(ctx->builder, shifted, LLVMConstInt(ctx->i32, 
bias_shift << 23, false), "");
+
+   /* Converting nan/inf numbers is the same, but with a different 
exponent update */
+   LLVMValueRef naninf;
+   naninf = LLVMBuildOr(ctx->builder, normal, LLVMConstInt(ctx->i32, 0xff 
<< 23, false), "");
+
+   /* Converting denormals is the complex case: determine the leading 
zeros of the
+* mantissa to obtain the correct shift for the mantissa and exponent 
correction.
+*/
+   LLVMValueRef denormal;
+   LLVMValueRef params[2] = {
+   mantissa,
+   ctx->i1true, /* result can be undef when arg is 0 */
+   };
+   LLVMValueRef ctlz = ac_build_intrinsic(ctx, "llvm.ctlz.i32", ctx->i32,
+ params, 2, AC_FUNC_ATTR_READNONE);
+
+   /* Shift such that the leading 1 ends up as the LSB of the exponent 
field. */
+   tmp = LLVMBuildSub(ctx->builder, ctlz, LLVMConstInt(ctx->i32, 8, 
false), "");
+   denormal = LLVMBuildShl(ctx->builder, mantissa, tmp, "");
+
+   unsigned denormal_exp = bias_shift + (32 - mant_bits) - 1;
+   tmp = LLVMBuildSub(ctx->builder, LLVMConstInt(ctx->i32, denormal_exp, 
false), ctlz, "");
+   tmp = LLVMBuildShl(ctx->builder, tmp, LLVMConstInt(ctx->i32, 23, 
false), "");
+   denormal = LLVMBuildAdd(ctx->builder, denormal, tmp, "");
+
+   /* Select the final result. */
+   LLVMValueRef result;
+
+   tmp = LLVMBuildICmp(ctx->builder, LLVMIntUGE, src,
+   LLVMConstInt(ctx->i32, ((1 << exp_bits) - 1) << 
mant_bits, false), "");
+   result = LLVMBuildSelect(ctx->builder, tmp, naninf, normal, "");
+
+   tmp = LLVMBuildICmp(ctx->builder, LLVMIntUGE, src,
+   LLVMConstInt(ctx->i32, 1 << mant_bits, false), "");
+   result = LLVMBuildSelect(ctx->builder, tmp, result, denormal, "");
+
+   tmp = LLVMBuildICmp(ctx->builder, LLVMIntNE, src, ctx->i32_0, "");
+   result = LLVMBuildSelect(ctx->builder, tmp, result, ctx->i32_0, "");
+
+   return ac_to_float(ctx, result);
+}
+
+/**
+ * Generate a fully general open coded buffer format fetch with all required
+ * fixups suitable for vertex fetch, using non-format buffer loads.
+ *
+ * Some combinations of argument values have special interpretations:
+ * - size = 8 bytes, format = fixed indicates PIPE_FORMAT_R11G11B10_FLOAT
+ * - size = 8 bytes, format != {float,fixed} indicates a 2_10_10_10 data format
+ *
+ * \param log_size log(size of channel in bytes)
+ * \param num_channels number of channels (1 to 4)
+ * \param format AC_FETCH_FORMAT_xxx value
+ * \param reverse whether XYZ channels are reversed
+ * \param known_aligned whether the source is known to be aligned to hardware's
+ *  effective element size for loading the given format
+ *  (note: this means dword alignment for 8_8_8_8, 16_16, 
etc.)
+ * \param rsrc buffer resource descriptor
+ * \return the resulting vector of floats or integers bitcast to <4 x i32>
+ */
+LLVMValueRef
+ac_build_opencoded_load_format(struct ac_llvm_context *ctx,
+  

Re: [Mesa-dev] [PATCH v2] vulkan/wsi: check if the display_fd given is master

2019-04-25 Thread Bas Nieuwenhuizen
r-b

On Thu, Apr 25, 2019 at 12:22 PM Emil Velikov  wrote:
>
> On Fri, 19 Apr 2019 at 16:01, Emil Velikov  wrote:
> >
> > From: Emil Velikov 
> >
> > As effectively required by the extension, we need to ensure we're master
> >
> > Currently drivers employ vendor specific solutions, which check if the
> > device behind the fd is capable*, yet none of them do the master check.
> >
> > *In the radv case, if acceleration is available.
> >
> > Instead of duplicating the check in each driver, keep it where it's
> > needed and used.
> >
> > Note this copies libdrm's drmIsMaster() to avoid depending on bleeding
> > edge version of the library.
> >
> > v2: set the fd to -1 if not master (Bas)
> >
> > Cc: Keith Packard 
> > Cc: Jason Ekstrand 
> > Cc: Bas Nieuwenhuizen 
> > Cc: Andres Rodriguez 
> > Reported-by: Andres Rodriguez 
> > Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]")
> > Signed-off-by: Emil Velikov 
> > ---
> >  src/vulkan/wsi/wsi_common_display.c | 27 +++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/src/vulkan/wsi/wsi_common_display.c 
> > b/src/vulkan/wsi/wsi_common_display.c
> > index 74ed36ed646..2be20e85046 100644
> > --- a/src/vulkan/wsi/wsi_common_display.c
> > +++ b/src/vulkan/wsi/wsi_common_display.c
> > @@ -1812,6 +1812,30 @@ fail_attr_init:
> > return ret;
> >  }
> >
> > +
> > +/*
> > + * Local version fo the libdrm helper. Added to avoid depending on bleeding
> > + * edge version of the library.
> > + */
> > +static int
> > +local_drmIsMaster(int fd)
> > +{
> > +   /* Detect master by attempting something that requires master.
> > +*
> > +* Authenticating magic tokens requires master and 0 is an
> > +* internal kernel detail which we could use. Attempting this on
> > +* a master fd would fail therefore fail with EINVAL because 0
> > +* is invalid.
> > +*
> > +* A non-master fd will fail with EACCES, as the kernel checks
> > +* for master before attempting to do anything else.
> > +*
> > +* Since we don't want to leak implementation details, use
> > +* EACCES.
> > +*/
> > +   return drmAuthMagic(fd, 0) != -EACCES;
> > +}
> > +
> >  VkResult
> >  wsi_display_init_wsi(struct wsi_device *wsi_device,
> >   const VkAllocationCallbacks *alloc,
> > @@ -1827,6 +1851,9 @@ wsi_display_init_wsi(struct wsi_device *wsi_device,
> > }
> >
> > wsi->fd = display_fd;
> > +   if (wsi->fd != -1 && !local_drmIsMaster(wsi->fd))
> > +  wsi->fd = -1;
> > +
> > wsi->alloc = alloc;
> >
> Humble ping?
>
> -Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles

2019-04-25 Thread Bas Nieuwenhuizen
r-b

On Thu, Apr 25, 2019 at 12:22 PM Emil Velikov  wrote:
>
> On Fri, 19 Apr 2019 at 16:03, Emil Velikov  wrote:
> >
> > From: Emil Velikov 
> >
> > Currently we get normal GEM handles from PrimeFDToHandle, yet we close
> > then with DUMB_CLOSE. Use GEM_CLOSE instead.
> >
> > Cc: Keith Packard 
> > Cc: Jason Ekstrand 
> > Cc: Bas Nieuwenhuizen 
> > Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]")
> > Signed-off-by: Emil Velikov 
> > ---
> >  src/vulkan/wsi/wsi_common_display.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/vulkan/wsi/wsi_common_display.c 
> > b/src/vulkan/wsi/wsi_common_display.c
> > index 2be20e85046..66e191906fc 100644
> > --- a/src/vulkan/wsi/wsi_common_display.c
> > +++ b/src/vulkan/wsi/wsi_common_display.c
> > @@ -974,8 +974,8 @@ static void
> >  wsi_display_destroy_buffer(struct wsi_display *wsi,
> > uint32_t buffer)
> >  {
> > -   (void) drmIoctl(wsi->fd, DRM_IOCTL_MODE_DESTROY_DUMB,
> > -   &((struct drm_mode_destroy_dumb) { .handle = buffer }));
> > +   (void) drmIoctl(wsi->fd, DRM_IOCTL_GEM_CLOSE,
> > +   &((struct drm_gem_close) { .handle = buffer }));
> >  }
> >
> Humble ping anyone?
>
> AFAICT closing handles from PrimeFDToHandle() with DUMB_CLOSE is a
> violation, even if it somehow works today.
>
> Thanks
> Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/8] radeonsi: add si_debug_options for convenient adding/removing of options

2019-04-25 Thread Nicolai Hähnle

On 25.04.19 04:45, Marek Olšák wrote:
[snip]

-       bool                            clear_db_cache_before_clear;
         bool                            has_msaa_sample_loc_bug;
         bool                            has_ls_vgpr_init_bug;
         bool                            has_dcc_constant_encode;
         bool                            dpbb_allowed;
         bool                            dfsm_allowed;
         bool                            llvm_has_working_vgpr_indexing;

+       struct {
+#define OPT_BOOL(name, dflt, description) uint8_t name:1;


Why not bool instead of uint8_t?


Sure.



+#include "si_debug_options.inc"


Why not use the .h file extension?


The original intention was to distinguish it from header files where 
including is supposed idempotent, i.e. including the file a second time 
makes no difference. Anyway, I'm changing it.


Cheers,
Nicolai




Other than those, this is:

Reviewed-by: Marek Olšák mailto:marek.ol...@amd.com>>

Marek



--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2] vulkan/wsi: check if the display_fd given is master

2019-04-25 Thread Emil Velikov
On Fri, 19 Apr 2019 at 16:01, Emil Velikov  wrote:
>
> From: Emil Velikov 
>
> As effectively required by the extension, we need to ensure we're master
>
> Currently drivers employ vendor specific solutions, which check if the
> device behind the fd is capable*, yet none of them do the master check.
>
> *In the radv case, if acceleration is available.
>
> Instead of duplicating the check in each driver, keep it where it's
> needed and used.
>
> Note this copies libdrm's drmIsMaster() to avoid depending on bleeding
> edge version of the library.
>
> v2: set the fd to -1 if not master (Bas)
>
> Cc: Keith Packard 
> Cc: Jason Ekstrand 
> Cc: Bas Nieuwenhuizen 
> Cc: Andres Rodriguez 
> Reported-by: Andres Rodriguez 
> Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]")
> Signed-off-by: Emil Velikov 
> ---
>  src/vulkan/wsi/wsi_common_display.c | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/src/vulkan/wsi/wsi_common_display.c 
> b/src/vulkan/wsi/wsi_common_display.c
> index 74ed36ed646..2be20e85046 100644
> --- a/src/vulkan/wsi/wsi_common_display.c
> +++ b/src/vulkan/wsi/wsi_common_display.c
> @@ -1812,6 +1812,30 @@ fail_attr_init:
> return ret;
>  }
>
> +
> +/*
> + * Local version fo the libdrm helper. Added to avoid depending on bleeding
> + * edge version of the library.
> + */
> +static int
> +local_drmIsMaster(int fd)
> +{
> +   /* Detect master by attempting something that requires master.
> +*
> +* Authenticating magic tokens requires master and 0 is an
> +* internal kernel detail which we could use. Attempting this on
> +* a master fd would fail therefore fail with EINVAL because 0
> +* is invalid.
> +*
> +* A non-master fd will fail with EACCES, as the kernel checks
> +* for master before attempting to do anything else.
> +*
> +* Since we don't want to leak implementation details, use
> +* EACCES.
> +*/
> +   return drmAuthMagic(fd, 0) != -EACCES;
> +}
> +
>  VkResult
>  wsi_display_init_wsi(struct wsi_device *wsi_device,
>   const VkAllocationCallbacks *alloc,
> @@ -1827,6 +1851,9 @@ wsi_display_init_wsi(struct wsi_device *wsi_device,
> }
>
> wsi->fd = display_fd;
> +   if (wsi->fd != -1 && !local_drmIsMaster(wsi->fd))
> +  wsi->fd = -1;
> +
> wsi->alloc = alloc;
>
Humble ping?

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles

2019-04-25 Thread Emil Velikov
On Fri, 19 Apr 2019 at 16:03, Emil Velikov  wrote:
>
> From: Emil Velikov 
>
> Currently we get normal GEM handles from PrimeFDToHandle, yet we close
> then with DUMB_CLOSE. Use GEM_CLOSE instead.
>
> Cc: Keith Packard 
> Cc: Jason Ekstrand 
> Cc: Bas Nieuwenhuizen 
> Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]")
> Signed-off-by: Emil Velikov 
> ---
>  src/vulkan/wsi/wsi_common_display.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/vulkan/wsi/wsi_common_display.c 
> b/src/vulkan/wsi/wsi_common_display.c
> index 2be20e85046..66e191906fc 100644
> --- a/src/vulkan/wsi/wsi_common_display.c
> +++ b/src/vulkan/wsi/wsi_common_display.c
> @@ -974,8 +974,8 @@ static void
>  wsi_display_destroy_buffer(struct wsi_display *wsi,
> uint32_t buffer)
>  {
> -   (void) drmIoctl(wsi->fd, DRM_IOCTL_MODE_DESTROY_DUMB,
> -   &((struct drm_mode_destroy_dumb) { .handle = buffer }));
> +   (void) drmIoctl(wsi->fd, DRM_IOCTL_GEM_CLOSE,
> +   &((struct drm_gem_close) { .handle = buffer }));
>  }
>
Humble ping anyone?

AFAICT closing handles from PrimeFDToHandle() with DUMB_CLOSE is a
violation, even if it somehow works today.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Mesa (master): radeonsi: delay adding BOs at the beginning of IBs until the first draw

2019-04-25 Thread Michel Dänzer
On 2019-04-24 11:36 p.m., Marek Olšák wrote:
> It should be fixed by the patch "radeonsi: add BOs after need_cs_space".

Looks like that did the trick, thanks!


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/6] Add support for NV12

2019-04-25 Thread Lucas Stach
Hi Christian,

Am Mittwoch, den 24.04.2019, 08:36 +0200 schrieb Christian Gmeiner:
> This patch series goes a complete different route then the one from
> Lucas Stach. I am using the integrated YUV tiler instead of using
> the 2D core for format conversion. I am reusing some patches from
> Lucas and this series sits on-top of Lucas "st/dri: YUV" patches.

We specifically opted to use the 2D GPU to do a format conversion, as
this yields a RGB internal representation, which means the texture has
the same properties as a normal GL texture (e.g. glReadPixels works).
This way we can expose YUV format imports as non-external textures.

This provides a number of benefits in texture lifetime handling in the
upper layers of the stack, which are used to drive those video use-
cases, like GStreamer. I don't really care what the blob does, but I do
care about having the highest performing solution, which is to have the
2D GPU work in parallel with the 3D GPU and allow efficient texture
imports with GStreamer.

I would really appreciate a review of my patch series.

Regards,
Lucas

> Christian Gmeiner (3):
>   etnaviv: direct YUYV/UYVY support
>   etnaviv: update headers from rnndb
>   etnaviv: add multi-planar YUV support
> 
> Lucas Stach (3):
>   etnaviv: clear out next pointer when allocating resource
>   etnaviv: remember data offset into BO
>   etnaviv: improve PIPE_BIND_LINEAR handling
> 
>  .../drivers/etnaviv/etnaviv_clear_blit.c  |   2 +-
>  src/gallium/drivers/etnaviv/etnaviv_format.c  |   5 +-
>  .../drivers/etnaviv/etnaviv_resource.c|  24 +++-
>  src/gallium/drivers/etnaviv/etnaviv_rs.c  |   5 +
>  src/gallium/drivers/etnaviv/etnaviv_screen.c  |   4 +
>  src/gallium/drivers/etnaviv/etnaviv_texture.c |   8 ++
>  src/gallium/drivers/etnaviv/etnaviv_yuv.c | 123
> ++
>  src/gallium/drivers/etnaviv/etnaviv_yuv.h |  44 +++
>  src/gallium/drivers/etnaviv/hw/common.xml.h   |   2 +-
>  .../drivers/etnaviv/hw/common_3d.xml.h|   2 +-
>  src/gallium/drivers/etnaviv/hw/state.xml.h|   4 +-
>  src/gallium/drivers/etnaviv/hw/state_3d.xml.h |  35 +++--
>  .../drivers/etnaviv/hw/state_blt.xml.h|   4 +-
>  .../drivers/etnaviv/hw/texdesc_3d.xml.h   |   2 +-
>  src/gallium/drivers/etnaviv/meson.build   |   2 +
>  15 files changed, 240 insertions(+), 26 deletions(-)
>  create mode 100644 src/gallium/drivers/etnaviv/etnaviv_yuv.c
>  create mode 100644 src/gallium/drivers/etnaviv/etnaviv_yuv.h
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] gpu/docs: Clarify what userspace means for gl

2019-04-25 Thread Daniel Vetter
On Thu, Apr 25, 2019 at 11:28:40AM +0800, zhoucm1 wrote:
> 
> 
> On 2019年04月25日 03:22, Eric Anholt wrote:
> > "Zhou, David(ChunMing)"  writes:
> > 
> > > Will linux be only mesa-linux? I thought linux is an  open linux.
> > > Which will impact our opengl/amdvlk(MIT open source), not sure Rocm:
> > > 1. how to deal with one uapi that opengl/amdvlk needs but mesa dont need? 
> > > reject?
> > > 2. one hw feature that opengl/amdvlk developers work on that but no mesa
> > > developers work on, cannot upstream as well?
> > I believe these questions are already covered by
> > 
> > "+Other userspace is only admissible if exposing a given feature through 
> > OpenGL
> > or
> > +OpenGL ES would result in a technically unsound design, incomplete driver 
> > or
> > +an implementation which isn't useful in real world usage."
> > 
> > If OpenGL needs the interface, then you need a Mesa implementation.
> > It's time for you to work with the community to build that or get it
> > built.  Or, in AMD's case, work with the Mesa developers that you
> > already employ.
> > 
> > If OpenGL doesn't need it, but Vulkan needs it, then we don't have a
> > clear policy in place, and this patch doesn't change that.  I would
> > personally say that AMDVLK doesn't qualify given that as far as I know
> > there is not open review of proposed patches to the project as they're
> > being developed.
> Can I understand what you mean is, as soon as the stack is openly developed,
> then which will be able to drive new UAPI?

I think the only clear thing here is that the answer is complicated, and
need to be decided on a case by case basis. That's what I tried to clarify
with my patch, but I think there's not enough clearly defined common
ground. So it'll stay complicated.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 1/2] radeonsi/nir: create si_nir_opts() helper

2019-04-25 Thread Timothy Arceri
We will make use of this in the following commit.
---
 src/gallium/drivers/radeonsi/si_shader.h |  1 +
 src/gallium/drivers/radeonsi/si_shader_nir.c | 78 +++-
 2 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.h 
b/src/gallium/drivers/radeonsi/si_shader.h
index f9f81a7bc1e..71ce27b2f55 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -710,6 +710,7 @@ void si_nir_scan_shader(const struct nir_shader *nir,
 void si_nir_scan_tess_ctrl(const struct nir_shader *nir,
   struct tgsi_tessctrl_info *out);
 void si_lower_nir(struct si_shader_selector *sel);
+void si_nir_opts(struct nir_shader *nir);
 
 /* Inline helpers. */
 
diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 4fe01ba0607..87100fbed19 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -811,6 +811,47 @@ void si_nir_scan_shader(const struct nir_shader *nir,
}
 }
 
+void
+si_nir_opts(struct nir_shader *nir)
+{
+   bool progress;
+   do {
+   progress = false;
+
+   NIR_PASS_V(nir, nir_lower_vars_to_ssa);
+
+   NIR_PASS(progress, nir, nir_opt_copy_prop_vars);
+   NIR_PASS(progress, nir, nir_opt_dead_write_vars);
+
+   NIR_PASS_V(nir, nir_lower_alu_to_scalar);
+   NIR_PASS_V(nir, nir_lower_phis_to_scalar);
+
+   /* (Constant) copy propagation is needed for txf with offsets. 
*/
+   NIR_PASS(progress, nir, nir_copy_prop);
+   NIR_PASS(progress, nir, nir_opt_remove_phis);
+   NIR_PASS(progress, nir, nir_opt_dce);
+   if (nir_opt_trivial_continues(nir)) {
+   progress = true;
+   NIR_PASS(progress, nir, nir_copy_prop);
+   NIR_PASS(progress, nir, nir_opt_dce);
+   }
+   NIR_PASS(progress, nir, nir_opt_if, true);
+   NIR_PASS(progress, nir, nir_opt_dead_cf);
+   NIR_PASS(progress, nir, nir_opt_cse);
+   NIR_PASS(progress, nir, nir_opt_peephole_select, 8, true, true);
+
+   /* Needed for algebraic lowering */
+   NIR_PASS(progress, nir, nir_opt_algebraic);
+   NIR_PASS(progress, nir, nir_opt_constant_folding);
+
+   NIR_PASS(progress, nir, nir_opt_undef);
+   NIR_PASS(progress, nir, nir_opt_conditional_discard);
+   if (nir->options->max_unroll_iterations) {
+   NIR_PASS(progress, nir, nir_opt_loop_unroll, 0);
+   }
+   } while (progress);
+}
+
 /**
  * Perform "lowering" operations on the NIR that are run once when the shader
  * selector is created.
@@ -861,42 +902,7 @@ si_lower_nir(struct si_shader_selector* sel)
 
ac_lower_indirect_derefs(sel->nir, sel->screen->info.chip_class);
 
-   bool progress;
-   do {
-   progress = false;
-
-   NIR_PASS_V(sel->nir, nir_lower_vars_to_ssa);
-
-   NIR_PASS(progress, sel->nir, nir_opt_copy_prop_vars);
-   NIR_PASS(progress, sel->nir, nir_opt_dead_write_vars);
-
-   NIR_PASS_V(sel->nir, nir_lower_alu_to_scalar);
-   NIR_PASS_V(sel->nir, nir_lower_phis_to_scalar);
-
-   /* (Constant) copy propagation is needed for txf with offsets. 
*/
-   NIR_PASS(progress, sel->nir, nir_copy_prop);
-   NIR_PASS(progress, sel->nir, nir_opt_remove_phis);
-   NIR_PASS(progress, sel->nir, nir_opt_dce);
-   if (nir_opt_trivial_continues(sel->nir)) {
-   progress = true;
-   NIR_PASS(progress, sel->nir, nir_copy_prop);
-   NIR_PASS(progress, sel->nir, nir_opt_dce);
-   }
-   NIR_PASS(progress, sel->nir, nir_opt_if, true);
-   NIR_PASS(progress, sel->nir, nir_opt_dead_cf);
-   NIR_PASS(progress, sel->nir, nir_opt_cse);
-   NIR_PASS(progress, sel->nir, nir_opt_peephole_select, 8, true, 
true);
-
-   /* Needed for algebraic lowering */
-   NIR_PASS(progress, sel->nir, nir_opt_algebraic);
-   NIR_PASS(progress, sel->nir, nir_opt_constant_folding);
-
-   NIR_PASS(progress, sel->nir, nir_opt_undef);
-   NIR_PASS(progress, sel->nir, nir_opt_conditional_discard);
-   if (sel->nir->options->max_unroll_iterations) {
-   NIR_PASS(progress, sel->nir, nir_opt_loop_unroll, 0);
-   }
-   } while (progress);
+   si_nir_opts(sel->nir);
 
NIR_PASS_V(sel->nir, nir_lower_bool_to_int32);
 
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 2/2] radeonsi/nir: call radeonsi nir opts before the scan pass

2019-04-25 Thread Timothy Arceri
Some of the opts are not called in the general optimastion loop
in the state trackers glsl -> nir conversion. We need to call
the radeonsi specific optimisation once before scanning over
the nir otherwise we can end up gathering info on code that
is later removed.

Fixes an assert in the piglit test:

./bin/varying-struct-centroid_gles3
---
 src/gallium/drivers/radeonsi/si_compute.c   | 1 +
 src/gallium/drivers/radeonsi/si_state_shaders.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c 
b/src/gallium/drivers/radeonsi/si_compute.c
index 541d7e6f118..f1a433b72df 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -106,6 +106,7 @@ static void si_create_compute_state_async(void *job, int 
thread_index)
assert(program->ir_type == PIPE_SHADER_IR_NIR);
sel.nir = program->ir.nir;
 
+   si_nir_opts(sel.nir);
si_nir_scan_shader(sel.nir, );
si_lower_nir();
}
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 5bdfd4f6ac1..7250b40c5db 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -2240,6 +2240,7 @@ static void *si_create_shader_selector(struct 
pipe_context *ctx,
 
sel->nir = state->ir.nir;
 
+   si_nir_opts(sel->nir);
si_nir_scan_shader(sel->nir, >info);
si_nir_scan_tess_ctrl(sel->nir, >tcs_info);
}
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()

2019-04-25 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

Thanks fo the fix Tim!

On 4/25/19 3:17 AM, Timothy Arceri wrote:

We were only setting the used mask for the first component of a
varying. Since the linking opts split vectors into scalars this
has mostly worked ok.

However this causes an issue where for example if we split a
struct on one side of the interface but not the other, then we
can possibly end up removing the first components on the side
that was split and then incorrectly remove the whole struct
on the other side of the varying.

With this change we simply mark all 4 components for each slot
used by a struct. We could possibly make this more fine gained
but that would require a more complex change.

This fixes a bug in Strange Brigade on RADV when tessellation
is enabled, all credit goes to Samuel Pitoiset for tracking down
the cause of the bug.

Fixes: f1eb5e639997 ("nir: add component level support to 
remove_unused_io_vars()")
---
  src/compiler/nir/nir_linking_helpers.c | 51 +-
  1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/src/compiler/nir/nir_linking_helpers.c 
b/src/compiler/nir/nir_linking_helpers.c
index f4494c78f98..ef0f94baf47 100644
--- a/src/compiler/nir/nir_linking_helpers.c
+++ b/src/compiler/nir/nir_linking_helpers.c
@@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage 
stage)
 return ((1ull << slots) - 1) << location;
  }
  
+static uint8_t

+get_num_components(nir_variable *var)
+{
+   if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type)))
+  return 4;
+
+   return glsl_get_vector_elements(glsl_without_array(var->type));
+}
+
  static void
  tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t 
*patches_read)
  {
@@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t *read, 
uint64_t *patches_read)
 continue;
  
  nir_variable *var = nir_deref_instr_get_variable(deref);

-if (var->data.patch) {
-   patches_read[var->data.location_frac] |=
-  get_variable_io_mask(var, shader->info.stage);
-} else {
-   read[var->data.location_frac] |=
-  get_variable_io_mask(var, shader->info.stage);
+for (unsigned i = 0; i < get_num_components(var); i++) {
+   if (var->data.patch) {
+  patches_read[var->data.location_frac + i] |=
+ get_variable_io_mask(var, shader->info.stage);
+   } else {
+  read[var->data.location_frac + i] |=
+ get_variable_io_mask(var, shader->info.stage);
+   }
  }
   }
}
@@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, 
nir_shader *consumer)
 uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 };
  
 nir_foreach_variable(var, >outputs) {

-  if (var->data.patch) {
- patches_written[var->data.location_frac] |=
-get_variable_io_mask(var, producer->info.stage);
-  } else {
- written[var->data.location_frac] |=
-get_variable_io_mask(var, producer->info.stage);
+  for (unsigned i = 0; i < get_num_components(var); i++) {
+ if (var->data.patch) {
+patches_written[var->data.location_frac + i] |=
+   get_variable_io_mask(var, producer->info.stage);
+ } else {
+written[var->data.location_frac + i] |=
+   get_variable_io_mask(var, producer->info.stage);
+ }
}
 }
  
 nir_foreach_variable(var, >inputs) {

-  if (var->data.patch) {
- patches_read[var->data.location_frac] |=
-get_variable_io_mask(var, consumer->info.stage);
-  } else {
- read[var->data.location_frac] |=
-get_variable_io_mask(var, consumer->info.stage);
+  for (unsigned i = 0; i < get_num_components(var); i++) {
+ if (var->data.patch) {
+patches_read[var->data.location_frac + i] |=
+   get_variable_io_mask(var, consumer->info.stage);
+ } else {
+read[var->data.location_frac + i] |=
+   get_variable_io_mask(var, consumer->info.stage);
+ }
}
 }
  

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev