Re: [Mesa-dev] [PATCH] nir: fix bit_size in lower indirect derefs.
Reviewed-by: Jason Ekstrand On Thu, Apr 25, 2019 at 9:50 PM Dave Airlie wrote: > From: Dave Airlie > > This fixes a case where we are expecting 64-bit but generate > 32-bit consts and validate gets angry. > > Signed-off-by: Dave Airlie > --- > src/compiler/nir/nir_lower_indirect_derefs.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c > b/src/compiler/nir/nir_lower_indirect_derefs.c > index 1e1e84fb3b0..58365628885 100644 > --- a/src/compiler/nir/nir_lower_indirect_derefs.c > +++ b/src/compiler/nir/nir_lower_indirect_derefs.c > @@ -51,7 +51,7 @@ emit_indirect_load_store_deref(nir_builder *b, > nir_intrinsic_instr *orig_instr, >nir_deref_instr *deref = *deref_arr; >assert(deref->deref_type == nir_deref_type_array); > > - nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_int(b, > mid))); > + nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_intN_t(b, > mid, parent->dest.ssa.bit_size))); >emit_indirect_load_store_deref(b, orig_instr, parent, deref_arr, > start, mid, _dest, src); >nir_push_else(b, NULL); > -- > 2.20.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir: fix bit_size in lower indirect derefs.
From: Dave Airlie This fixes a case where we are expecting 64-bit but generate 32-bit consts and validate gets angry. Signed-off-by: Dave Airlie --- src/compiler/nir/nir_lower_indirect_derefs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c b/src/compiler/nir/nir_lower_indirect_derefs.c index 1e1e84fb3b0..58365628885 100644 --- a/src/compiler/nir/nir_lower_indirect_derefs.c +++ b/src/compiler/nir/nir_lower_indirect_derefs.c @@ -51,7 +51,7 @@ emit_indirect_load_store_deref(nir_builder *b, nir_intrinsic_instr *orig_instr, nir_deref_instr *deref = *deref_arr; assert(deref->deref_type == nir_deref_type_array); - nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_int(b, mid))); + nir_push_if(b, nir_ilt(b, deref->arr.index.ssa, nir_imm_intN_t(b, mid, parent->dest.ssa.bit_size))); emit_indirect_load_store_deref(b, orig_instr, parent, deref_arr, start, mid, _dest, src); nir_push_else(b, NULL); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops
> iand and ior are commutative, so you don't need both. --Wait, woaaah, the algebraic generator respects that? Super neat, thank you! > Especially without instruction count data (I'm assuming I won't be able to do shader-db on my hw at this point..) > For example, if the only use of inot(...some logic...) is an > if-condition, we don't want to re-write it. Our backend will just > invert the "polarity" of the conditional branch. Out of curiousity, what's the specific issue? Midgard also specifies branch polarity, though I currently hardcode to true since I'm not convinced it makes a difference. > My gut tells me that doing this in > the backend with something like Eric's NOLTIS is the right way to go. I'm not sure what NOLTIS is, sorry. Would you ack a change adding the ops to nir_opcode.py but not adding an opt passes? I have a backend algebra pass which I'm happy to move this into (and I think it would be a win on Midgard regardless); I just (AFAIK) need the ops in NIR for the algebra pass to work at all. [I don't want to duplicate this infrastructure to run over the machine IR, which is quite limited since I trust the input NIR to be good.] Thank you for the comments. > This is also part of the reason that I've never sent out some other > patches that I have that convert certain kinds of logic operations into > other things. For example, > ># True iff a == -1 and b == 0 >(('iand', 'a@bool32', ('inot', 'b@bool32)), ('ilt', a, b)), ...Cute. Point taken :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops
> We can support all of these with source modifiers because the above three > aren't really "dest invertable"... For us, they'd be > > ~src0 | ~src1 > ~src0 & ~src1 > ~src0 ^ ~src1 > > Is it really dest_invertable or both_srcs_invertable? :-) Sure, I wasn't sure how other drivers would want to handle this, if at all. I'm rather regretting not slapping an "RFC" tag on this, oh well :) > Also worth noting that I've considered adding a not modifier to NIR (if > source modifiers are a thing we actually want at that level). Over-all, > I'm a little uncertain if these need to be their own ops or not... It's worth noting that, as far as I know, the instructions added here are the _only_ ones that Midgard can do -- our source modifiers only apply to floating-point ops (int ops use the bits to distinguish sign-ext/zero-ext). The not's in these ops is baked right into the instruction. It'd be a simple lowering pass either way, of course. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] st/mesa: fix assertion failure in st_tgsi_lower_yuv
From: Marek Olšák src/mesa/state_tracker/st_tgsi_lower_yuv.c:68: void reg_dst(struct tgsi_full_dst_register *, const struct tgsi_full_dst_register *, unsigned int): assertion "dst->Register.WriteMask" failed Cc: 19.0 --- src/mesa/state_tracker/st_tgsi_lower_yuv.c | 46 +- 1 file changed, 27 insertions(+), 19 deletions(-) diff --git a/src/mesa/state_tracker/st_tgsi_lower_yuv.c b/src/mesa/state_tracker/st_tgsi_lower_yuv.c index 6acd173adc9..b998118897f 100644 --- a/src/mesa/state_tracker/st_tgsi_lower_yuv.c +++ b/src/mesa/state_tracker/st_tgsi_lower_yuv.c @@ -262,45 +262,53 @@ yuv_to_rgb(struct tgsi_transform_context *tctx, inst.Instruction.Saturate = 0; inst.Instruction.NumDstRegs = 1; inst.Instruction.NumSrcRegs = 2; reg_dst([0], >tmp[A].dst, TGSI_WRITEMASK_XYZ); reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, _)); reg_src([1], >imm[3], SWIZ(X, Y, Z, _)); inst.Src[1].Register.Negate = 1; tctx->emit_instruction(tctx, ); /* DP3 dst.x, tmpA, imm[0] */ - inst = dp3_instruction(); - reg_dst([0], dst, TGSI_WRITEMASK_X); - reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W)); - reg_src([1], >imm[0], SWIZ(X, Y, Z, W)); - tctx->emit_instruction(tctx, ); + if (dst->Register.WriteMask & TGSI_WRITEMASK_X) { + inst = dp3_instruction(); + reg_dst([0], dst, TGSI_WRITEMASK_X); + reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W)); + reg_src([1], >imm[0], SWIZ(X, Y, Z, W)); + tctx->emit_instruction(tctx, ); + } /* DP3 dst.y, tmpA, imm[1] */ - inst = dp3_instruction(); - reg_dst([0], dst, TGSI_WRITEMASK_Y); - reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W)); - reg_src([1], >imm[1], SWIZ(X, Y, Z, W)); - tctx->emit_instruction(tctx, ); + if (dst->Register.WriteMask & TGSI_WRITEMASK_Y) { + inst = dp3_instruction(); + reg_dst([0], dst, TGSI_WRITEMASK_Y); + reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W)); + reg_src([1], >imm[1], SWIZ(X, Y, Z, W)); + tctx->emit_instruction(tctx, ); + } /* DP3 dst.z, tmpA, imm[2] */ - inst = dp3_instruction(); - reg_dst([0], dst, TGSI_WRITEMASK_Z); - reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W)); - reg_src([1], >imm[2], SWIZ(X, Y, Z, W)); - tctx->emit_instruction(tctx, ); + if (dst->Register.WriteMask & TGSI_WRITEMASK_Z) { + inst = dp3_instruction(); + reg_dst([0], dst, TGSI_WRITEMASK_Z); + reg_src([0], >tmp[A].src, SWIZ(X, Y, Z, W)); + reg_src([1], >imm[2], SWIZ(X, Y, Z, W)); + tctx->emit_instruction(tctx, ); + } /* MOV dst.w, imm[0].x */ - inst = mov_instruction(); - reg_dst([0], dst, TGSI_WRITEMASK_W); - reg_src([0], >imm[3], SWIZ(_, _, _, W)); - tctx->emit_instruction(tctx, ); + if (dst->Register.WriteMask & TGSI_WRITEMASK_W) { + inst = mov_instruction(); + reg_dst([0], dst, TGSI_WRITEMASK_W); + reg_src([0], >imm[3], SWIZ(_, _, _, W)); + tctx->emit_instruction(tctx, ); + } } static void lower_nv12(struct tgsi_transform_context *tctx, struct tgsi_full_instruction *originst) { struct tgsi_yuv_transform *ctx = tgsi_yuv_transform(tctx); struct tgsi_full_instruction inst; struct tgsi_full_src_register *coord = >Src[0]; unsigned samp = originst->Src[1].Register.Index; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] st/dri: decrease input lag by syncing sooner in SwapBuffers
From: Marek Olšák It's done by: - decrease the number of frames in flight by 1 - flush before throttling in SwapBuffers (instead of wait-then-flush, do flush-then-wait) The improvement is apparent with Unigine Heaven. Previously: draw frame 2 wait frame 0 flush frame 2 present frame 2 The input lag is 2 frames. Now: draw frame 2 flush frame 2 wait frame 1 present frame 2 The input lag is 1 frame. Flushing is done before waiting, because otherwise the device would be idle after waiting. Nine is affected because it also uses the pipe cap. --- src/gallium/auxiliary/util/u_screen.c | 2 +- src/gallium/state_trackers/dri/dri_drawable.c | 20 +-- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/src/gallium/auxiliary/util/u_screen.c b/src/gallium/auxiliary/util/u_screen.c index 27f51e0898e..410f17421e6 100644 --- a/src/gallium/auxiliary/util/u_screen.c +++ b/src/gallium/auxiliary/util/u_screen.c @@ -349,21 +349,21 @@ u_pipe_screen_get_param_defaults(struct pipe_screen *pscreen, case PIPE_CAP_MAX_VARYINGS: return 8; case PIPE_CAP_COMPUTE_GRID_INFO_LAST_BLOCK: return 0; case PIPE_CAP_COMPUTE_SHADER_DERIVATIVES: return 0; case PIPE_CAP_MAX_FRAMES_IN_FLIGHT: - return 2; + return 1; case PIPE_CAP_DMABUF: #ifdef PIPE_OS_LINUX return 1; #else return 0; #endif default: unreachable("bad PIPE_CAP_*"); diff --git a/src/gallium/state_trackers/dri/dri_drawable.c b/src/gallium/state_trackers/dri/dri_drawable.c index 26bfdbecc53..c1de3bed9dd 100644 --- a/src/gallium/state_trackers/dri/dri_drawable.c +++ b/src/gallium/state_trackers/dri/dri_drawable.c @@ -555,33 +555,33 @@ dri_flush(__DRIcontext *cPriv, * * This pulls a fence off the throttling queue and waits for it if the * number of fences on the throttling queue has reached the desired * number. * * Then flushes to insert a fence at the current rendering position, and * pushes that fence on the queue. This requires that the st_context_iface * flush method returns a fence even if there are no commands to flush. */ struct pipe_screen *screen = drawable->screen->base.screen; - struct pipe_fence_handle *fence; + struct pipe_fence_handle *oldest_fence, *new_fence = NULL; - fence = swap_fences_pop_front(drawable); - if (fence) { - (void) screen->fence_finish(screen, NULL, fence, PIPE_TIMEOUT_INFINITE); - screen->fence_reference(screen, , NULL); - } + st->flush(st, flush_flags, _fence); - st->flush(st, flush_flags, ); + oldest_fence = swap_fences_pop_front(drawable); + if (oldest_fence) { + screen->fence_finish(screen, NULL, oldest_fence, PIPE_TIMEOUT_INFINITE); + screen->fence_reference(screen, _fence, NULL); + } - if (fence) { - swap_fences_push_back(drawable, fence); - screen->fence_reference(screen, , NULL); + if (new_fence) { + swap_fences_push_back(drawable, new_fence); + screen->fence_reference(screen, _fence, NULL); } } else if (flags & (__DRI2_FLUSH_DRAWABLE | __DRI2_FLUSH_CONTEXT)) { st->flush(st, flush_flags, NULL); } if (drawable) { drawable->flushing = FALSE; } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: don't ignore PIPE_FLUSH_ASYNC
From: Marek Olšák --- src/gallium/drivers/radeonsi/si_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_fence.c b/src/gallium/drivers/radeonsi/si_fence.c index 3d23597413c..ffda98d2834 100644 --- a/src/gallium/drivers/radeonsi/si_fence.c +++ b/src/gallium/drivers/radeonsi/si_fence.c @@ -566,21 +566,21 @@ static void si_flush_from_st(struct pipe_context *ctx, multi_fence->fine = fine; fine.buf = NULL; if (flags & TC_FLUSH_ASYNC) { util_queue_fence_signal(_fence->ready); tc_unflushed_batch_token_reference(_fence->tc_token, NULL); } } assert(!fine.buf); finish: - if (!(flags & PIPE_FLUSH_DEFERRED)) { + if (!(flags & (PIPE_FLUSH_DEFERRED | PIPE_FLUSH_ASYNC))) { if (sctx->dma_cs) ws->cs_sync_flush(sctx->dma_cs); ws->cs_sync_flush(sctx->gfx_cs); } } static void si_fence_server_signal(struct pipe_context *ctx, struct pipe_fence_handle *fence) { struct si_context *sctx = (struct si_context *)ctx; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops
On 4/25/19 3:37 PM, Alyssa Rosenzweig wrote: > In addition to the familiar iand/ior/ixor, some architectures feature > destination-inverted versions inand/inor/inxor. Certain > architectures also have source-inverted forms, dubbed iandnot/iornot > here. Midgard has the all of these opcodes natively. Many arches have > comparible features to implement some/all of the above. Paired with De > Morgan's Laws, these opcodes allow anything of the form > "~? (~?a [&|] ~?b)" to complete in one instruction. > > This can be used to simplify some backend-specific code on affected > architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on > Gen8+"). > > Signed-off-by: Alyssa Rosenzweig > Cc: Ian Romanick > Cc: Kenneth Graunke > --- > src/compiler/nir/nir.h| 4 > src/compiler/nir/nir_opcodes.py | 18 ++ > src/compiler/nir/nir_opt_algebraic.py | 12 > 3 files changed, 34 insertions(+) > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index e878a63409d..3e01ec2cc06 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options { > bool lower_hadd; > bool lower_add_sat; > > + /* Set if inand/inor/inxor and iandnot/iornot supported respectively */ > + bool bitwise_dest_invertable; > + bool bitwise_src_invertable; > + > /** > * Should nir_lower_io() create load_interpolated_input intrinsics? > * > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py > index d35d820aa5b..f9d92afb53e 100644 > --- a/src/compiler/nir/nir_opcodes.py > +++ b/src/compiler/nir/nir_opcodes.py > @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & > src1") > binop("ior", tuint, commutative + associative, "src0 | src1") > binop("ixor", tuint, commutative + associative, "src0 ^ src1") > > +# inverted bitwise logic operators > +# > +# These variants of the above include bitwise NOTs either on the result of > the > +# whole expression or on the latter operand. On some hardware (e.g. Midgard), > +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be > +# implemented as modifiers of the standard three. Along with appropriate > +# algebraic passes, these should permit any permutation of inverses on AND/OR > +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = > ~a > +# | b = b | ~a = iornot(b, a). > + > +binop("inand", tuint, commutative, "~(src0 & src1)") > +binop("inor", tuint, commutative, "~(src0 | src1)") > +binop("inxor", tuint, commutative, "~(src0 ^ src1)") > +binop("iandnot", tuint, "", "src0 & (~src1)") > +binop("iornot", tuint, "", "src0 & (~src1)") > + > + > + > > # floating point logic operators > # > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index dad0545594f..6cb3e8cb950 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -1052,6 +1052,18 @@ late_optimizations = [ > (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', > b)), ('fadd', c, ('fmax', a, b))), > > (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', > ('ior', a, b, > + > + # We don't want to deal with inverted forms, so run this late. Any > + # combination of inverts on flags or output should result in a single > + # instruction if these are supported; cases not explicitly handled would > + # have been simplified via De Morgan's Law > + (('inot', ('iand', a, b)), ('inand', a, b), > 'options->bitwise_dest_invertable'), > + (('inot', ('ior', a, b)), ('inor', a, b), > 'options->bitwise_dest_invertable'), > + (('inot', ('ixor', a, b)), ('inxor', a, b), > 'options->bitwise_dest_invertable'), > + (('iand', ('inot', a), b), ('iandnot', b, a), > 'options->bitwise_src_invertable'), > + (('iand', a, ('inot', b)), ('iandnot', a, b), > 'options->bitwise_src_invertable'), iand and ior are commutative, so you don't need both. > + (('ior', a, ('inot', b)), ('iornot', a, b), > 'options->bitwise_src_invertable'), > + (('ior', ('inot', a), b), ('iornot', b, a), > 'options->bitwise_src_invertable'), Especially without instruction count data, I'm not very excited about this. When / how to do these translations is more complex than this. For example, if the only use of inot(...some logic...) is an if-condition, we don't want to re-write it. Our backend will just invert the "polarity" of the conditional branch. What if the "...some logic..." is used more than once? My gut tells me that doing this in the backend with something like Eric's NOLTIS is the right way to go. I can try a shader-db run later. This is also part of the reason that I've never sent out some other patches that I have that convert certain kinds of logic operations into other things. For example, # True iff a == -1 and b == 0 (('iand',
Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops
On Thu, Apr 25, 2019 at 5:37 PM Alyssa Rosenzweig wrote: > In addition to the familiar iand/ior/ixor, some architectures feature > destination-inverted versions inand/inor/inxor. Certain > architectures also have source-inverted forms, dubbed iandnot/iornot > here. Midgard has the all of these opcodes natively. Many arches have > comparible features to implement some/all of the above. Paired with De > Morgan's Laws, these opcodes allow anything of the form > "~? (~?a [&|] ~?b)" to complete in one instruction. > > This can be used to simplify some backend-specific code on affected > architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on > Gen8+"). > > Signed-off-by: Alyssa Rosenzweig > Cc: Ian Romanick > Cc: Kenneth Graunke > --- > src/compiler/nir/nir.h| 4 > src/compiler/nir/nir_opcodes.py | 18 ++ > src/compiler/nir/nir_opt_algebraic.py | 12 > 3 files changed, 34 insertions(+) > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index e878a63409d..3e01ec2cc06 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options { > bool lower_hadd; > bool lower_add_sat; > > + /* Set if inand/inor/inxor and iandnot/iornot supported respectively */ > + bool bitwise_dest_invertable; > + bool bitwise_src_invertable; > Do we really need two booleans? Commentary on that below > + > /** > * Should nir_lower_io() create load_interpolated_input intrinsics? > * > diff --git a/src/compiler/nir/nir_opcodes.py > b/src/compiler/nir/nir_opcodes.py > index d35d820aa5b..f9d92afb53e 100644 > --- a/src/compiler/nir/nir_opcodes.py > +++ b/src/compiler/nir/nir_opcodes.py > @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 > & src1") > binop("ior", tuint, commutative + associative, "src0 | src1") > binop("ixor", tuint, commutative + associative, "src0 ^ src1") > > +# inverted bitwise logic operators > +# > +# These variants of the above include bitwise NOTs either on the result > of the > +# whole expression or on the latter operand. On some hardware (e.g. > Midgard), > +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be > +# implemented as modifiers of the standard three. Along with appropriate > +# algebraic passes, these should permit any permutation of inverses on > AND/OR > +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) > = ~a > +# | b = b | ~a = iornot(b, a). > + > +binop("inand", tuint, commutative, "~(src0 & src1)") > +binop("inor", tuint, commutative, "~(src0 | src1)") > +binop("inxor", tuint, commutative, "~(src0 ^ src1)") > We can support all of these with source modifiers because the above three aren't really "dest invertable"... For us, they'd be ~src0 | ~src1 ~src0 & ~src1 ~src0 ^ ~src1 Is it really dest_invertable or both_srcs_invertable? :-) Also worth noting that I've considered adding a not modifier to NIR (if source modifiers are a thing we actually want at that level). Over-all, I'm a little uncertain if these need to be their own ops or not... --Jason > +binop("iandnot", tuint, "", "src0 & (~src1)") > +binop("iornot", tuint, "", "src0 & (~src1)") > + > + > + > > # floating point logic operators > # > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index dad0545594f..6cb3e8cb950 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -1052,6 +1052,18 @@ late_optimizations = [ > (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', > '#c', b)), ('fadd', c, ('fmax', a, b))), > > (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', > ('ior', a, b, > + > + # We don't want to deal with inverted forms, so run this late. Any > + # combination of inverts on flags or output should result in a single > + # instruction if these are supported; cases not explicitly handled > would > + # have been simplified via De Morgan's Law > + (('inot', ('iand', a, b)), ('inand', a, b), > 'options->bitwise_dest_invertable'), > + (('inot', ('ior', a, b)), ('inor', a, b), > 'options->bitwise_dest_invertable'), > + (('inot', ('ixor', a, b)), ('inxor', a, b), > 'options->bitwise_dest_invertable'), > + (('iand', ('inot', a), b), ('iandnot', b, a), > 'options->bitwise_src_invertable'), > + (('iand', a, ('inot', b)), ('iandnot', a, b), > 'options->bitwise_src_invertable'), > + (('ior', a, ('inot', b)), ('iornot', a, b), > 'options->bitwise_src_invertable'), > + (('ior', ('inot', a), b), ('iornot', b, a), > 'options->bitwise_src_invertable'), > ] > > print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", > optimizations).render()) > -- > 2.20.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org >
Re: [Mesa-dev] [PATCH 0/2] Add inverted bitwise forms to NIR
Alright, good to know, thank you! :) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops
On Thu, Apr 25, 2019 at 3:37 PM Alyssa Rosenzweig wrote: > > In addition to the familiar iand/ior/ixor, some architectures feature > destination-inverted versions inand/inor/inxor. Certain > architectures also have source-inverted forms, dubbed iandnot/iornot > here. Midgard has the all of these opcodes natively. Many arches have > comparible features to implement some/all of the above. Paired with De > Morgan's Laws, these opcodes allow anything of the form > "~? (~?a [&|] ~?b)" to complete in one instruction. > > This can be used to simplify some backend-specific code on affected > architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on > Gen8+"). > > Signed-off-by: Alyssa Rosenzweig > Cc: Ian Romanick > Cc: Kenneth Graunke > --- > src/compiler/nir/nir.h| 4 > src/compiler/nir/nir_opcodes.py | 18 ++ > src/compiler/nir/nir_opt_algebraic.py | 12 > 3 files changed, 34 insertions(+) > > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h > index e878a63409d..3e01ec2cc06 100644 > --- a/src/compiler/nir/nir.h > +++ b/src/compiler/nir/nir.h > @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options { > bool lower_hadd; > bool lower_add_sat; > > + /* Set if inand/inor/inxor and iandnot/iornot supported respectively */ > + bool bitwise_dest_invertable; > + bool bitwise_src_invertable; neat, this trick seems to work on a6xx (and I'd assume earlier gens that use ir3 although I'd have to dust off some older blobs to check.. blob is less kind when it comes to retaining support for older hw).. very-bikeshed-comment: split into two comments per compiler_options flag? Reviewed-by: Rob Clark > + > /** > * Should nir_lower_io() create load_interpolated_input intrinsics? > * > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py > index d35d820aa5b..f9d92afb53e 100644 > --- a/src/compiler/nir/nir_opcodes.py > +++ b/src/compiler/nir/nir_opcodes.py > @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & > src1") > binop("ior", tuint, commutative + associative, "src0 | src1") > binop("ixor", tuint, commutative + associative, "src0 ^ src1") > > +# inverted bitwise logic operators > +# > +# These variants of the above include bitwise NOTs either on the result of > the > +# whole expression or on the latter operand. On some hardware (e.g. Midgard), > +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be > +# implemented as modifiers of the standard three. Along with appropriate > +# algebraic passes, these should permit any permutation of inverses on AND/OR > +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = > ~a > +# | b = b | ~a = iornot(b, a). > + > +binop("inand", tuint, commutative, "~(src0 & src1)") > +binop("inor", tuint, commutative, "~(src0 | src1)") > +binop("inxor", tuint, commutative, "~(src0 ^ src1)") > +binop("iandnot", tuint, "", "src0 & (~src1)") > +binop("iornot", tuint, "", "src0 & (~src1)") > + > + > + > > # floating point logic operators > # > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index dad0545594f..6cb3e8cb950 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -1052,6 +1052,18 @@ late_optimizations = [ > (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', > b)), ('fadd', c, ('fmax', a, b))), > > (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', > ('ior', a, b, > + > + # We don't want to deal with inverted forms, so run this late. Any > + # combination of inverts on flags or output should result in a single > + # instruction if these are supported; cases not explicitly handled would > + # have been simplified via De Morgan's Law > + (('inot', ('iand', a, b)), ('inand', a, b), > 'options->bitwise_dest_invertable'), > + (('inot', ('ior', a, b)), ('inor', a, b), > 'options->bitwise_dest_invertable'), > + (('inot', ('ixor', a, b)), ('inxor', a, b), > 'options->bitwise_dest_invertable'), > + (('iand', ('inot', a), b), ('iandnot', b, a), > 'options->bitwise_src_invertable'), > + (('iand', a, ('inot', b)), ('iandnot', a, b), > 'options->bitwise_src_invertable'), > + (('ior', a, ('inot', b)), ('iornot', a, b), > 'options->bitwise_src_invertable'), > + (('ior', ('inot', a), b), ('iornot', b, a), > 'options->bitwise_src_invertable'), > ] > > print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", > optimizations).render()) > -- > 2.20.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] Add inverted bitwise forms to NIR
On Thu, Apr 25, 2019 at 6:37 PM Alyssa Rosenzweig wrote: > Various combinations of these instructions are found on many > architectures. They appear directly as-is on Midgard; some of them > should be implementable on Intel Gen8+ via the source modifiers; I was > told over IRC that AMD/Nouveau might have some of these as well. Rather FWIW, many instructions NVIDIA GPUs support modifiers on either or both parameters to an operation (esp the bitwise and arithmetic ops). Whether the modifier is supported depends on various oddness (e.g. whether the second arg is GPR vs const/imm20 vs imm32), so I doubt this would be particularly useful there. Furthermore, the nouveau compiler already has a modifier propagation pass which attempts to combine the NOT/NEG/SAT ops with uses that can consume them directly. It's unclear to me whether NIR should directly support all 16 boolean operations or not. I can see arguments both for and against. Just wanted to point out the NVIDIA situation since it was referenced explicitly. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)
I ran this via the intel-ci and it has 0 regressions, I've also looked at this before in v1. Reviewed-by: Dave Airlie On Tue, 23 Apr 2019 at 06:08, Marek Olšák wrote: > > Ping. Thanks. > > On Tue, Apr 16, 2019 at 10:16 AM Marek Olšák wrote: >> >> From: Marek Olšák >> >> This fixes KHR-GL45.compute_shader.resources-max on radeonsi. >> >> Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it >> to gallium" >> >> v2: use is_interface_array, protect again assertion failures in >> u_bit_consecutive >> --- >> src/compiler/glsl/link_uniforms.cpp | 22 +++--- >> 1 file changed, 19 insertions(+), 3 deletions(-) >> >> diff --git a/src/compiler/glsl/link_uniforms.cpp >> b/src/compiler/glsl/link_uniforms.cpp >> index ef124111991..aa96227a7e1 100644 >> --- a/src/compiler/glsl/link_uniforms.cpp >> +++ b/src/compiler/glsl/link_uniforms.cpp >> @@ -515,44 +515,60 @@ public: >>this->record_next_bindless_sampler = new string_to_uint_map; >>this->record_next_image = new string_to_uint_map; >>this->record_next_bindless_image = new string_to_uint_map; >> >>buffer_block_index = -1; >>if (var->is_in_buffer_block()) { >> struct gl_uniform_block *blks = var->is_in_shader_storage_block() ? >> prog->data->ShaderStorageBlocks : prog->data->UniformBlocks; >> unsigned num_blks = var->is_in_shader_storage_block() ? >> prog->data->NumShaderStorageBlocks : >> prog->data->NumUniformBlocks; >> + bool is_interface_array = >> +var->is_interface_instance() && var->type->is_array(); >> >> - if (var->is_interface_instance() && var->type->is_array()) { >> + if (is_interface_array) { >> unsigned l = strlen(var->get_interface_type()->name); >> >> for (unsigned i = 0; i < num_blks; i++) { >> if (strncmp(var->get_interface_type()->name, blks[i].Name, l) >> == 0 && blks[i].Name[l] == '[') { >>buffer_block_index = i; >>break; >> } >> } >> } else { >> for (unsigned i = 0; i < num_blks; i++) { >> if (strcmp(var->get_interface_type()->name, blks[i].Name) == >> 0) { >>buffer_block_index = i; >>break; >> } >> } >> } >> assert(buffer_block_index != -1); >> >> if (var->is_in_shader_storage_block() && >> - !var->data.memory_read_only) >> -shader_storage_blocks_write_access |= 1 << buffer_block_index; >> + !var->data.memory_read_only) { >> +unsigned array_size = is_interface_array ? >> + var->type->array_size() : 1; >> + >> +STATIC_ASSERT(MAX_SHADER_STORAGE_BUFFERS <= 32); >> + >> +/* Shaders that use too many SSBOs will fail to compile, which >> + * we don't care about. >> + * >> + * This is true for shaders that do not use too many SSBOs: >> + */ >> +if (buffer_block_index + array_size <= 32) { >> + shader_storage_blocks_write_access |= >> + u_bit_consecutive(buffer_block_index, array_size); >> +} >> + } >> >> /* Uniform blocks that were specified with an instance name must be >>* handled a little bit differently. The name of the variable is >> the >>* name used to reference the uniform block instead of being the >> name >>* of a variable within the block. Therefore, searching for the >> name >>* within the block will fail. >>*/ >> if (var->is_interface_instance()) { >> ubo_byte_offset = 0; >> process(var->get_interface_type(), >> -- >> 2.17.1 >> > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()
Thanks! Quoting Timothy Arceri (2019-04-25 15:36:44) > On 26/4/19 6:50 am, Dylan Baker wrote: > > Hi Tim, > > > > I had to make a couple of small tweaks to get this to apply against 19.0 > > (namely > > that the glsl_type_is_struct -> glsl_type_is_struct_or_ifc doesn't exist in > > 19.0), could you take a look at the patch in the staging/19.0 branch and > > let me > > know if it looks okay? > > Looks good to me. Thanks! > > > > > Thanks, > > Dylan > > > > Quoting Timothy Arceri (2019-04-24 18:17:42) > >> We were only setting the used mask for the first component of a > >> varying. Since the linking opts split vectors into scalars this > >> has mostly worked ok. > >> > >> However this causes an issue where for example if we split a > >> struct on one side of the interface but not the other, then we > >> can possibly end up removing the first components on the side > >> that was split and then incorrectly remove the whole struct > >> on the other side of the varying. > >> > >> With this change we simply mark all 4 components for each slot > >> used by a struct. We could possibly make this more fine gained > >> but that would require a more complex change. > >> > >> This fixes a bug in Strange Brigade on RADV when tessellation > >> is enabled, all credit goes to Samuel Pitoiset for tracking down > >> the cause of the bug. > >> > >> Fixes: f1eb5e639997 ("nir: add component level support to > >> remove_unused_io_vars()") > >> --- > >> src/compiler/nir/nir_linking_helpers.c | 51 +- > >> 1 file changed, 33 insertions(+), 18 deletions(-) > >> > >> diff --git a/src/compiler/nir/nir_linking_helpers.c > >> b/src/compiler/nir/nir_linking_helpers.c > >> index f4494c78f98..ef0f94baf47 100644 > >> --- a/src/compiler/nir/nir_linking_helpers.c > >> +++ b/src/compiler/nir/nir_linking_helpers.c > >> @@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage > >> stage) > >> return ((1ull << slots) - 1) << location; > >> } > >> > >> +static uint8_t > >> +get_num_components(nir_variable *var) > >> +{ > >> + if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type))) > >> + return 4; > >> + > >> + return glsl_get_vector_elements(glsl_without_array(var->type)); > >> +} > >> + > >> static void > >> tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t > >> *patches_read) > >> { > >> @@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t > >> *read, uint64_t *patches_read) > >> continue; > >> > >> nir_variable *var = nir_deref_instr_get_variable(deref); > >> -if (var->data.patch) { > >> - patches_read[var->data.location_frac] |= > >> - get_variable_io_mask(var, shader->info.stage); > >> -} else { > >> - read[var->data.location_frac] |= > >> - get_variable_io_mask(var, shader->info.stage); > >> +for (unsigned i = 0; i < get_num_components(var); i++) { > >> + if (var->data.patch) { > >> + patches_read[var->data.location_frac + i] |= > >> + get_variable_io_mask(var, shader->info.stage); > >> + } else { > >> + read[var->data.location_frac + i] |= > >> + get_variable_io_mask(var, shader->info.stage); > >> + } > >> } > >>} > >> } > >> @@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, > >> nir_shader *consumer) > >> uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 }; > >> > >> nir_foreach_variable(var, >outputs) { > >> - if (var->data.patch) { > >> - patches_written[var->data.location_frac] |= > >> -get_variable_io_mask(var, producer->info.stage); > >> - } else { > >> - written[var->data.location_frac] |= > >> -get_variable_io_mask(var, producer->info.stage); > >> + for (unsigned i = 0; i < get_num_components(var); i++) { > >> + if (var->data.patch) { > >> +patches_written[var->data.location_frac + i] |= > >> + get_variable_io_mask(var, producer->info.stage); > >> + } else { > >> +written[var->data.location_frac + i] |= > >> + get_variable_io_mask(var, producer->info.stage); > >> + } > >> } > >> } > >> > >> nir_foreach_variable(var, >inputs) { > >> - if (var->data.patch) { > >> - patches_read[var->data.location_frac] |= > >> -get_variable_io_mask(var, consumer->info.stage); > >> - } else { > >> - read[var->data.location_frac] |= > >> -get_variable_io_mask(var, consumer->info.stage); > >> + for (unsigned i = 0; i < get_num_components(var); i++) { > >> + if (var->data.patch) { > >> +patches_read[var->data.location_frac + i] |= > >> + get_variable_io_mask(var,
Re: [Mesa-dev] [PATCH 2/3] radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)
Awesome, thanks. Quoting Marek Olšák (2019-04-25 14:50:52) > Thanks. It looks good. > > Marek > > On Thu, Apr 25, 2019, 5:17 PM Dylan Baker wrote: > > Hi Marek, > > I've tried to apply this to 19.0, I had to pull "radeonsi: add > si_debug_options > for convenient adding/removing of options", which is fine, but this patch > also > assumes your si compute-queue only patches, which aren't present in 19.0. > I've > made a small change to get it compiling, but I'm sure it's not the right > fix, so > if you could take a look at the staging/19.0 branch and let me know what > you'd > like to do I'd appreciate it. > > Thanks, > Dylan > > Quoting Marek Olšák (2019-04-18 14:46:27) > > From: Marek Olšák > > > > Needed to track context rolls caused by streamout and ACQUIRE_MEM. > > ACQUIRE_MEM can occur outside of draw calls. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355 > > > > v2: squashed patches and done more rework > > > > Cc: 19.0 > > --- > > src/gallium/drivers/radeonsi/si_pipe.c | 2 + > > src/gallium/drivers/radeonsi/si_pipe.h | 3 +- > > src/gallium/drivers/radeonsi/si_state.c | 8 +- > > .../drivers/radeonsi/si_state_binning.c | 4 +- > > src/gallium/drivers/radeonsi/si_state_draw.c | 86 +++ > > .../drivers/radeonsi/si_state_shaders.c | 10 +-- > > .../drivers/radeonsi/si_state_streamout.c | 1 + > > .../drivers/radeonsi/si_state_viewport.c | 2 +- > > 8 files changed, 68 insertions(+), 48 deletions(-) > > > > diff --git a/src/gallium/drivers/radeonsi/si_pipe.c > b/src/gallium/drivers > /radeonsi/si_pipe.c > > index fa96ce34224..7209db9fb37 100644 > > --- a/src/gallium/drivers/radeonsi/si_pipe.c > > +++ b/src/gallium/drivers/radeonsi/si_pipe.c > > @@ -1072,20 +1072,22 @@ struct pipe_screen > *radeonsi_screen_create(struct > radeon_winsys *ws, > > > > sscreen->has_out_of_order_rast = sscreen->info.chip_class >= VI > & > & > > sscreen->info.max_se >= 2 && > > !(sscreen->debug_flags & DBG > (NO_OUT_OF_ORDER)); > > sscreen->assume_no_z_fights = > > driQueryOptionb(config->options, > "radeonsi_assume_no_z_fights"); > > sscreen->commutative_blend_add = > > driQueryOptionb(config->options, > "radeonsi_commutative_blend_add"); > > sscreen->clear_db_cache_before_clear = > > driQueryOptionb(config->options, > "radeonsi_clear_db_cache_before_clear"); > > + sscreen->has_gfx9_scissor_bug = sscreen->info.family == > CHIP_VEGA10 || > > + sscreen->info.family == > CHIP_RAVEN; > > sscreen->has_msaa_sample_loc_bug = (sscreen->info.family >= > CHIP_POLARIS10 && > > sscreen->info.family <= > CHIP_POLARIS12) || > > sscreen->info.family == > CHIP_VEGA10 || > > sscreen->info.family == > CHIP_RAVEN; > > sscreen->has_ls_vgpr_init_bug = sscreen->info.family == > CHIP_VEGA10 || > > sscreen->info.family == > CHIP_RAVEN; > > sscreen->has_dcc_constant_encode = sscreen->info.family == > CHIP_RAVEN2; > > > > /* Only enable primitive binning on APUs by default. */ > > sscreen->dpbb_allowed = sscreen->info.family == CHIP_RAVEN || > > diff --git a/src/gallium/drivers/radeonsi/si_pipe.h > b/src/gallium/drivers > /radeonsi/si_pipe.h > > index aaa95f32d20..a4c90a4f69f 100644 > > --- a/src/gallium/drivers/radeonsi/si_pipe.h > > +++ b/src/gallium/drivers/radeonsi/si_pipe.h > > @@ -463,20 +463,21 @@ struct si_screen { > > unsigned eqaa_force_coverage_samples; > > unsigned eqaa_force_z_samples; > > unsigned eqaa_force_color_samples; > > bool has_clear_state; > > bool has_distributed_tess; > > bool has_draw_indirect_multi; > > bool has_out_of_order_rast; > > bool assume_no_z_fights; > > bool commutative_blend_add; > > bool clear_db_cache_before_clear; > > + bool has_gfx9_scissor_bug; > > bool has_msaa_sample_loc_bug; > > bool
[Mesa-dev] [PATCH 1/2] nir: Add inverted bitwise ops
In addition to the familiar iand/ior/ixor, some architectures feature destination-inverted versions inand/inor/inxor. Certain architectures also have source-inverted forms, dubbed iandnot/iornot here. Midgard has the all of these opcodes natively. Many arches have comparible features to implement some/all of the above. Paired with De Morgan's Laws, these opcodes allow anything of the form "~? (~?a [&|] ~?b)" to complete in one instruction. This can be used to simplify some backend-specific code on affected architectures, e.f. 8eb36c91 ("intel/fs: Emit logical-not of operands on Gen8+"). Signed-off-by: Alyssa Rosenzweig Cc: Ian Romanick Cc: Kenneth Graunke --- src/compiler/nir/nir.h| 4 src/compiler/nir/nir_opcodes.py | 18 ++ src/compiler/nir/nir_opt_algebraic.py | 12 3 files changed, 34 insertions(+) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index e878a63409d..3e01ec2cc06 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -2318,6 +2318,10 @@ typedef struct nir_shader_compiler_options { bool lower_hadd; bool lower_add_sat; + /* Set if inand/inor/inxor and iandnot/iornot supported respectively */ + bool bitwise_dest_invertable; + bool bitwise_src_invertable; + /** * Should nir_lower_io() create load_interpolated_input intrinsics? * diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py index d35d820aa5b..f9d92afb53e 100644 --- a/src/compiler/nir/nir_opcodes.py +++ b/src/compiler/nir/nir_opcodes.py @@ -690,6 +690,24 @@ binop("iand", tuint, commutative + associative, "src0 & src1") binop("ior", tuint, commutative + associative, "src0 | src1") binop("ixor", tuint, commutative + associative, "src0 ^ src1") +# inverted bitwise logic operators +# +# These variants of the above include bitwise NOTs either on the result of the +# whole expression or on the latter operand. On some hardware (e.g. Midgard), +# these are native ops. On other hardware (e.g. Intel Gen8+), these can be +# implemented as modifiers of the standard three. Along with appropriate +# algebraic passes, these should permit any permutation of inverses on AND/OR +# to execute in a single cycle. For example, ~(a & ~b) = ~(~(~a | ~(~b))) = ~a +# | b = b | ~a = iornot(b, a). + +binop("inand", tuint, commutative, "~(src0 & src1)") +binop("inor", tuint, commutative, "~(src0 | src1)") +binop("inxor", tuint, commutative, "~(src0 ^ src1)") +binop("iandnot", tuint, "", "src0 & (~src1)") +binop("iornot", tuint, "", "src0 & (~src1)") + + + # floating point logic operators # diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index dad0545594f..6cb3e8cb950 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -1052,6 +1052,18 @@ late_optimizations = [ (('fmax', ('fadd(is_used_once)', '#c', a), ('fadd(is_used_once)', '#c', b)), ('fadd', c, ('fmax', a, b))), (('bcsel', a, 0, ('b2f32', ('inot', 'b@bool'))), ('b2f32', ('inot', ('ior', a, b, + + # We don't want to deal with inverted forms, so run this late. Any + # combination of inverts on flags or output should result in a single + # instruction if these are supported; cases not explicitly handled would + # have been simplified via De Morgan's Law + (('inot', ('iand', a, b)), ('inand', a, b), 'options->bitwise_dest_invertable'), + (('inot', ('ior', a, b)), ('inor', a, b), 'options->bitwise_dest_invertable'), + (('inot', ('ixor', a, b)), ('inxor', a, b), 'options->bitwise_dest_invertable'), + (('iand', ('inot', a), b), ('iandnot', b, a), 'options->bitwise_src_invertable'), + (('iand', a, ('inot', b)), ('iandnot', a, b), 'options->bitwise_src_invertable'), + (('ior', a, ('inot', b)), ('iornot', a, b), 'options->bitwise_src_invertable'), + (('ior', ('inot', a), b), ('iornot', b, a), 'options->bitwise_src_invertable'), ] print(nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()) -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] panfrost/midgard: Use inverted forms
Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/midgard/midgard_compile.c | 5 + src/gallium/drivers/panfrost/midgard/midgard_compile.h | 5 - 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.c b/src/gallium/drivers/panfrost/midgard/midgard_compile.c index 742217c5a97..8e8f563732e 100644 --- a/src/gallium/drivers/panfrost/midgard/midgard_compile.c +++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.c @@ -1182,8 +1182,13 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr) ALU_CASE(fcos, fcos); ALU_CASE(iand, iand); +ALU_CASE(iandnot, iandnot); +ALU_CASE(inand, inand); ALU_CASE(ior, ior); +ALU_CASE(iornot, iornot); +ALU_CASE(inor, inor); ALU_CASE(ixor, ixor); +ALU_CASE(inxor, inxor); ALU_CASE(inot, inand); ALU_CASE(ishl, ishl); ALU_CASE(ishr, iasr); diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.h b/src/gallium/drivers/panfrost/midgard/midgard_compile.h index 0724582d62c..6a894566a20 100644 --- a/src/gallium/drivers/panfrost/midgard/midgard_compile.h +++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.h @@ -107,7 +107,10 @@ static const nir_shader_compiler_options midgard_nir_options = { .lower_extract_byte = true, .lower_extract_word = true, -.native_integers = true +.native_integers = true, + +.bitwise_dest_invertable = true, +.bitwise_src_invertable = true, }; #endif -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/2] Add inverted bitwise forms to NIR
Various combinations of these instructions are found on many architectures. They appear directly as-is on Midgard; some of them should be implementable on Intel Gen8+ via the source modifiers; I was told over IRC that AMD/Nouveau might have some of these as well. Rather than forcing backends to grapple with these directly, let's add some new opcodes to NIR to support them so we can use a proper algebraic pass. Alyssa Rosenzweig (2): nir: Add inverted bitwise ops panfrost/midgard: Use inverted forms src/compiler/nir/nir.h | 4 src/compiler/nir/nir_opcodes.py| 18 ++ src/compiler/nir/nir_opt_algebraic.py | 12 .../drivers/panfrost/midgard/midgard_compile.c | 5 + .../drivers/panfrost/midgard/midgard_compile.h | 5 - 5 files changed, 43 insertions(+), 1 deletion(-) -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()
On 26/4/19 6:50 am, Dylan Baker wrote: Hi Tim, I had to make a couple of small tweaks to get this to apply against 19.0 (namely that the glsl_type_is_struct -> glsl_type_is_struct_or_ifc doesn't exist in 19.0), could you take a look at the patch in the staging/19.0 branch and let me know if it looks okay? Looks good to me. Thanks! Thanks, Dylan Quoting Timothy Arceri (2019-04-24 18:17:42) We were only setting the used mask for the first component of a varying. Since the linking opts split vectors into scalars this has mostly worked ok. However this causes an issue where for example if we split a struct on one side of the interface but not the other, then we can possibly end up removing the first components on the side that was split and then incorrectly remove the whole struct on the other side of the varying. With this change we simply mark all 4 components for each slot used by a struct. We could possibly make this more fine gained but that would require a more complex change. This fixes a bug in Strange Brigade on RADV when tessellation is enabled, all credit goes to Samuel Pitoiset for tracking down the cause of the bug. Fixes: f1eb5e639997 ("nir: add component level support to remove_unused_io_vars()") --- src/compiler/nir/nir_linking_helpers.c | 51 +- 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index f4494c78f98..ef0f94baf47 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage stage) return ((1ull << slots) - 1) << location; } +static uint8_t +get_num_components(nir_variable *var) +{ + if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type))) + return 4; + + return glsl_get_vector_elements(glsl_without_array(var->type)); +} + static void tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t *patches_read) { @@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t *patches_read) continue; nir_variable *var = nir_deref_instr_get_variable(deref); -if (var->data.patch) { - patches_read[var->data.location_frac] |= - get_variable_io_mask(var, shader->info.stage); -} else { - read[var->data.location_frac] |= - get_variable_io_mask(var, shader->info.stage); +for (unsigned i = 0; i < get_num_components(var); i++) { + if (var->data.patch) { + patches_read[var->data.location_frac + i] |= + get_variable_io_mask(var, shader->info.stage); + } else { + read[var->data.location_frac + i] |= + get_variable_io_mask(var, shader->info.stage); + } } } } @@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, nir_shader *consumer) uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 }; nir_foreach_variable(var, >outputs) { - if (var->data.patch) { - patches_written[var->data.location_frac] |= -get_variable_io_mask(var, producer->info.stage); - } else { - written[var->data.location_frac] |= -get_variable_io_mask(var, producer->info.stage); + for (unsigned i = 0; i < get_num_components(var); i++) { + if (var->data.patch) { +patches_written[var->data.location_frac + i] |= + get_variable_io_mask(var, producer->info.stage); + } else { +written[var->data.location_frac + i] |= + get_variable_io_mask(var, producer->info.stage); + } } } nir_foreach_variable(var, >inputs) { - if (var->data.patch) { - patches_read[var->data.location_frac] |= -get_variable_io_mask(var, consumer->info.stage); - } else { - read[var->data.location_frac] |= -get_variable_io_mask(var, consumer->info.stage); + for (unsigned i = 0; i < get_num_components(var); i++) { + if (var->data.patch) { +patches_read[var->data.location_frac + i] |= + get_variable_io_mask(var, consumer->info.stage); + } else { +read[var->data.location_frac + i] |= + get_variable_io_mask(var, consumer->info.stage); + } } } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)
Thanks. It looks good. Marek On Thu, Apr 25, 2019, 5:17 PM Dylan Baker wrote: > Hi Marek, > > I've tried to apply this to 19.0, I had to pull "radeonsi: add > si_debug_options > for convenient adding/removing of options", which is fine, but this patch > also > assumes your si compute-queue only patches, which aren't present in 19.0. > I've > made a small change to get it compiling, but I'm sure it's not the right > fix, so > if you could take a look at the staging/19.0 branch and let me know what > you'd > like to do I'd appreciate it. > > Thanks, > Dylan > > Quoting Marek Olšák (2019-04-18 14:46:27) > > From: Marek Olšák > > > > Needed to track context rolls caused by streamout and ACQUIRE_MEM. > > ACQUIRE_MEM can occur outside of draw calls. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355 > > > > v2: squashed patches and done more rework > > > > Cc: 19.0 > > --- > > src/gallium/drivers/radeonsi/si_pipe.c| 2 + > > src/gallium/drivers/radeonsi/si_pipe.h| 3 +- > > src/gallium/drivers/radeonsi/si_state.c | 8 +- > > .../drivers/radeonsi/si_state_binning.c | 4 +- > > src/gallium/drivers/radeonsi/si_state_draw.c | 86 +++ > > .../drivers/radeonsi/si_state_shaders.c | 10 +-- > > .../drivers/radeonsi/si_state_streamout.c | 1 + > > .../drivers/radeonsi/si_state_viewport.c | 2 +- > > 8 files changed, 68 insertions(+), 48 deletions(-) > > > > diff --git a/src/gallium/drivers/radeonsi/si_pipe.c > b/src/gallium/drivers/radeonsi/si_pipe.c > > index fa96ce34224..7209db9fb37 100644 > > --- a/src/gallium/drivers/radeonsi/si_pipe.c > > +++ b/src/gallium/drivers/radeonsi/si_pipe.c > > @@ -1072,20 +1072,22 @@ struct pipe_screen > *radeonsi_screen_create(struct radeon_winsys *ws, > > > > sscreen->has_out_of_order_rast = sscreen->info.chip_class >= VI > && > > sscreen->info.max_se >= 2 && > > !(sscreen->debug_flags & > DBG(NO_OUT_OF_ORDER)); > > sscreen->assume_no_z_fights = > > driQueryOptionb(config->options, > "radeonsi_assume_no_z_fights"); > > sscreen->commutative_blend_add = > > driQueryOptionb(config->options, > "radeonsi_commutative_blend_add"); > > sscreen->clear_db_cache_before_clear = > > driQueryOptionb(config->options, > "radeonsi_clear_db_cache_before_clear"); > > + sscreen->has_gfx9_scissor_bug = sscreen->info.family == > CHIP_VEGA10 || > > + sscreen->info.family == > CHIP_RAVEN; > > sscreen->has_msaa_sample_loc_bug = (sscreen->info.family >= > CHIP_POLARIS10 && > > sscreen->info.family <= > CHIP_POLARIS12) || > >sscreen->info.family == > CHIP_VEGA10 || > >sscreen->info.family == > CHIP_RAVEN; > > sscreen->has_ls_vgpr_init_bug = sscreen->info.family == > CHIP_VEGA10 || > > sscreen->info.family == > CHIP_RAVEN; > > sscreen->has_dcc_constant_encode = sscreen->info.family == > CHIP_RAVEN2; > > > > /* Only enable primitive binning on APUs by default. */ > > sscreen->dpbb_allowed = sscreen->info.family == CHIP_RAVEN || > > diff --git a/src/gallium/drivers/radeonsi/si_pipe.h > b/src/gallium/drivers/radeonsi/si_pipe.h > > index aaa95f32d20..a4c90a4f69f 100644 > > --- a/src/gallium/drivers/radeonsi/si_pipe.h > > +++ b/src/gallium/drivers/radeonsi/si_pipe.h > > @@ -463,20 +463,21 @@ struct si_screen { > > unsignedeqaa_force_coverage_samples; > > unsignedeqaa_force_z_samples; > > unsignedeqaa_force_color_samples; > > boolhas_clear_state; > > boolhas_distributed_tess; > > boolhas_draw_indirect_multi; > > boolhas_out_of_order_rast; > > boolassume_no_z_fights; > > boolcommutative_blend_add; > > boolclear_db_cache_before_clear; > > + boolhas_gfx9_scissor_bug; > > boolhas_msaa_sample_loc_bug; > > boolhas_ls_vgpr_init_bug; > > boolhas_dcc_constant_encode; > > booldpbb_allowed; > > booldfsm_allowed; > > boolllvm_has_working_vgpr_indexing; > > > > /* Whether shaders are monolithic (1-part) or separate (3-part). > */ > > booluse_monolithic_shaders; > > bool
[Mesa-dev] [Bug 107022] [RADV] The Witcher 3: Trembling of trees
https://bugs.freedesktop.org/show_bug.cgi?id=107022 zefkerri...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #34 from zefkerri...@gmail.com --- Hi! I just tested it again, but this time using Mesa 19.0.3 with LLVM 8.0.0 and now it is fully fixed and works well, just like on Windows. This is similar to the fact that it was a LLVM regression that fixed now in LLVM 8, and not a Mesa bug at all. Thank you very much. GOOD LUCK! -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)
Hi Marek, I've tried to apply this to 19.0, I had to pull "radeonsi: add si_debug_options for convenient adding/removing of options", which is fine, but this patch also assumes your si compute-queue only patches, which aren't present in 19.0. I've made a small change to get it compiling, but I'm sure it's not the right fix, so if you could take a look at the staging/19.0 branch and let me know what you'd like to do I'd appreciate it. Thanks, Dylan Quoting Marek Olšák (2019-04-18 14:46:27) > From: Marek Olšák > > Needed to track context rolls caused by streamout and ACQUIRE_MEM. > ACQUIRE_MEM can occur outside of draw calls. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355 > > v2: squashed patches and done more rework > > Cc: 19.0 > --- > src/gallium/drivers/radeonsi/si_pipe.c| 2 + > src/gallium/drivers/radeonsi/si_pipe.h| 3 +- > src/gallium/drivers/radeonsi/si_state.c | 8 +- > .../drivers/radeonsi/si_state_binning.c | 4 +- > src/gallium/drivers/radeonsi/si_state_draw.c | 86 +++ > .../drivers/radeonsi/si_state_shaders.c | 10 +-- > .../drivers/radeonsi/si_state_streamout.c | 1 + > .../drivers/radeonsi/si_state_viewport.c | 2 +- > 8 files changed, 68 insertions(+), 48 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_pipe.c > b/src/gallium/drivers/radeonsi/si_pipe.c > index fa96ce34224..7209db9fb37 100644 > --- a/src/gallium/drivers/radeonsi/si_pipe.c > +++ b/src/gallium/drivers/radeonsi/si_pipe.c > @@ -1072,20 +1072,22 @@ struct pipe_screen *radeonsi_screen_create(struct > radeon_winsys *ws, > > sscreen->has_out_of_order_rast = sscreen->info.chip_class >= VI && > sscreen->info.max_se >= 2 && > !(sscreen->debug_flags & > DBG(NO_OUT_OF_ORDER)); > sscreen->assume_no_z_fights = > driQueryOptionb(config->options, > "radeonsi_assume_no_z_fights"); > sscreen->commutative_blend_add = > driQueryOptionb(config->options, > "radeonsi_commutative_blend_add"); > sscreen->clear_db_cache_before_clear = > driQueryOptionb(config->options, > "radeonsi_clear_db_cache_before_clear"); > + sscreen->has_gfx9_scissor_bug = sscreen->info.family == CHIP_VEGA10 || > + sscreen->info.family == CHIP_RAVEN; > sscreen->has_msaa_sample_loc_bug = (sscreen->info.family >= > CHIP_POLARIS10 && > sscreen->info.family <= > CHIP_POLARIS12) || >sscreen->info.family == > CHIP_VEGA10 || >sscreen->info.family == CHIP_RAVEN; > sscreen->has_ls_vgpr_init_bug = sscreen->info.family == CHIP_VEGA10 || > sscreen->info.family == CHIP_RAVEN; > sscreen->has_dcc_constant_encode = sscreen->info.family == > CHIP_RAVEN2; > > /* Only enable primitive binning on APUs by default. */ > sscreen->dpbb_allowed = sscreen->info.family == CHIP_RAVEN || > diff --git a/src/gallium/drivers/radeonsi/si_pipe.h > b/src/gallium/drivers/radeonsi/si_pipe.h > index aaa95f32d20..a4c90a4f69f 100644 > --- a/src/gallium/drivers/radeonsi/si_pipe.h > +++ b/src/gallium/drivers/radeonsi/si_pipe.h > @@ -463,20 +463,21 @@ struct si_screen { > unsignedeqaa_force_coverage_samples; > unsignedeqaa_force_z_samples; > unsignedeqaa_force_color_samples; > boolhas_clear_state; > boolhas_distributed_tess; > boolhas_draw_indirect_multi; > boolhas_out_of_order_rast; > boolassume_no_z_fights; > boolcommutative_blend_add; > boolclear_db_cache_before_clear; > + boolhas_gfx9_scissor_bug; > boolhas_msaa_sample_loc_bug; > boolhas_ls_vgpr_init_bug; > boolhas_dcc_constant_encode; > booldpbb_allowed; > booldfsm_allowed; > boolllvm_has_working_vgpr_indexing; > > /* Whether shaders are monolithic (1-part) or separate (3-part). */ > booluse_monolithic_shaders; > boolrecord_llvm_ir; > @@ -1062,21 +1063,21 @@ struct si_context { > unsignednum_vs_flushes; > unsignednum_ps_flushes; > unsignednum_cs_flushes; >
Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()
Hi Tim, I had to make a couple of small tweaks to get this to apply against 19.0 (namely that the glsl_type_is_struct -> glsl_type_is_struct_or_ifc doesn't exist in 19.0), could you take a look at the patch in the staging/19.0 branch and let me know if it looks okay? Thanks, Dylan Quoting Timothy Arceri (2019-04-24 18:17:42) > We were only setting the used mask for the first component of a > varying. Since the linking opts split vectors into scalars this > has mostly worked ok. > > However this causes an issue where for example if we split a > struct on one side of the interface but not the other, then we > can possibly end up removing the first components on the side > that was split and then incorrectly remove the whole struct > on the other side of the varying. > > With this change we simply mark all 4 components for each slot > used by a struct. We could possibly make this more fine gained > but that would require a more complex change. > > This fixes a bug in Strange Brigade on RADV when tessellation > is enabled, all credit goes to Samuel Pitoiset for tracking down > the cause of the bug. > > Fixes: f1eb5e639997 ("nir: add component level support to > remove_unused_io_vars()") > --- > src/compiler/nir/nir_linking_helpers.c | 51 +- > 1 file changed, 33 insertions(+), 18 deletions(-) > > diff --git a/src/compiler/nir/nir_linking_helpers.c > b/src/compiler/nir/nir_linking_helpers.c > index f4494c78f98..ef0f94baf47 100644 > --- a/src/compiler/nir/nir_linking_helpers.c > +++ b/src/compiler/nir/nir_linking_helpers.c > @@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage > stage) > return ((1ull << slots) - 1) << location; > } > > +static uint8_t > +get_num_components(nir_variable *var) > +{ > + if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type))) > + return 4; > + > + return glsl_get_vector_elements(glsl_without_array(var->type)); > +} > + > static void > tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t > *patches_read) > { > @@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t *read, > uint64_t *patches_read) > continue; > > nir_variable *var = nir_deref_instr_get_variable(deref); > -if (var->data.patch) { > - patches_read[var->data.location_frac] |= > - get_variable_io_mask(var, shader->info.stage); > -} else { > - read[var->data.location_frac] |= > - get_variable_io_mask(var, shader->info.stage); > +for (unsigned i = 0; i < get_num_components(var); i++) { > + if (var->data.patch) { > + patches_read[var->data.location_frac + i] |= > + get_variable_io_mask(var, shader->info.stage); > + } else { > + read[var->data.location_frac + i] |= > + get_variable_io_mask(var, shader->info.stage); > + } > } > } >} > @@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, > nir_shader *consumer) > uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 }; > > nir_foreach_variable(var, >outputs) { > - if (var->data.patch) { > - patches_written[var->data.location_frac] |= > -get_variable_io_mask(var, producer->info.stage); > - } else { > - written[var->data.location_frac] |= > -get_variable_io_mask(var, producer->info.stage); > + for (unsigned i = 0; i < get_num_components(var); i++) { > + if (var->data.patch) { > +patches_written[var->data.location_frac + i] |= > + get_variable_io_mask(var, producer->info.stage); > + } else { > +written[var->data.location_frac + i] |= > + get_variable_io_mask(var, producer->info.stage); > + } >} > } > > nir_foreach_variable(var, >inputs) { > - if (var->data.patch) { > - patches_read[var->data.location_frac] |= > -get_variable_io_mask(var, consumer->info.stage); > - } else { > - read[var->data.location_frac] |= > -get_variable_io_mask(var, consumer->info.stage); > + for (unsigned i = 0; i < get_num_components(var); i++) { > + if (var->data.patch) { > +patches_read[var->data.location_frac + i] |= > + get_variable_io_mask(var, consumer->info.stage); > + } else { > +read[var->data.location_frac + i] |= > + get_variable_io_mask(var, consumer->info.stage); > + } >} > } > > -- > 2.20.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: signature ___ mesa-dev mailing list
[Mesa-dev] [Bug 110456] Land YCBCR extensions
https://bugs.freedesktop.org/show_bug.cgi?id=110456 Bas Nieuwenhuizen changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #1 from Bas Nieuwenhuizen --- https://gitlab.freedesktop.org/mesa/mesa/merge_requests/593 has been merged. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 110439] [TRACKER] Mesa 19.1 feature tracker
https://bugs.freedesktop.org/show_bug.cgi?id=110439 Bug 110439 depends on bug 110456, which changed state. Bug 110456 Summary: Land YCBCR extensions https://bugs.freedesktop.org/show_bug.cgi?id=110456 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 110516] OpenGLCurveEvaluator does not return affine coordinates
https://bugs.freedesktop.org/show_bug.cgi?id=110516 Volker Enderlein changed: What|Removed |Added Attachment #144094|0 |1 is obsolete|| --- Comment #1 from Volker Enderlein --- Created attachment 144095 --> https://bugs.freedesktop.org/attachment.cgi?id=144095=edit Correct patch -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/3] llvmpipe: add lp_fence_timedwait() helper
czw., 25 kwi 2019 o 20:11 Gustaw Smolarczyk napisał(a): > > czw., 25 kwi 2019 o 19:42 Emil Velikov napisał(a): > > > > The function is analogous to lp_fence_wait() while taking at timeout > > (ns) parameter, as needed for EGL fence/sync. > > > > v2: > > - use absolute UTC time, as per spec (Gustaw) > > - bail out on cnd_timedwait() failure (Gustaw) > > > > Cc: Gustaw Smolarczyk > > Cc: Roland Scheidegger > > Signed-off-by: Emil Velikov > > Reviewed-by: Roland Scheidegger (v1) > > --- > > src/gallium/drivers/llvmpipe/lp_fence.c | 30 + > > src/gallium/drivers/llvmpipe/lp_fence.h | 3 +++ > > 2 files changed, 33 insertions(+) > > > > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c > > b/src/gallium/drivers/llvmpipe/lp_fence.c > > index 20cd91cd63d..b79d773bf6c 100644 > > --- a/src/gallium/drivers/llvmpipe/lp_fence.c > > +++ b/src/gallium/drivers/llvmpipe/lp_fence.c > > @@ -125,3 +125,33 @@ lp_fence_wait(struct lp_fence *f) > > } > > > > > > +boolean > > +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout) > > +{ > > + struct timespec ts; > > + int ret; > > + > > + timespec_get(, TIME_UTC); > > + > > + ts.tv_nsec += timeout % 10L; > > + ts.tv_sec += timeout / 10L; > > + if (ts.tv_nsec >= 10L) { > > + ts.tv_sec++; > > + ts.tv_nsec -= 10L; > > + } > > + > > + if (LP_DEBUG & DEBUG_FENCE) > > + debug_printf("%s %d\n", __FUNCTION__, f->id); > > + > > + mtx_lock(>mutex); > > + assert(f->issued); > > + while (f->count < f->rank) { > > + ret = cnd_timedwait(>signalled, >mutex, ); > > + if (ret != thrd_success) > > + break; > > + } > > + mtx_unlock(>mutex); > > + return (f->count >= f->rank && ret == thrd_success); Hmm, you are reading from the fence object outside of the critical section, which doesn't sound safe. Maybe compute the return value before the mutex is unlocked? const boolean result = (f->count >= f->rank); mtx_unlock(>mutex); return result; Since f->rank is immutable and f->count never decreases, it might still be ok without this change, though it is racy. > > Is checking for ret == thrd_success here really necessary? If the > first part is true we already know that the fence has been signalled. > > With this changed or not: > > Reviewed-by: Gustaw Smolarczyk > > > +} > > + > > + > > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h > > b/src/gallium/drivers/llvmpipe/lp_fence.h > > index b72026492c6..5ba746d22d1 100644 > > --- a/src/gallium/drivers/llvmpipe/lp_fence.h > > +++ b/src/gallium/drivers/llvmpipe/lp_fence.h > > @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence); > > void > > lp_fence_wait(struct lp_fence *fence); > > > > +boolean > > +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout); > > + > > void > > llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen); > > > > -- > > 2.20.1 > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 110516] OpenGLCurveEvaluator does not return affine coordinates
https://bugs.freedesktop.org/show_bug.cgi?id=110516 Bug ID: 110516 Summary: OpenGLCurveEvaluator does not return affine coordinates Product: Mesa Version: 19.0 Hardware: All OS: All Status: NEW Severity: major Priority: medium Component: GLU Assignee: mesa-dev@lists.freedesktop.org Reporter: volkerenderl...@hotmail.com QA Contact: mesa-dev@lists.freedesktop.org Created attachment 144094 --> https://bugs.freedesktop.org/attachment.cgi?id=144094=edit fix for bug report According to the spec the GLU_NURBS_VERTEX and GLU_NURBS_VERTEX_DATA callbacks should always return affine coordinates. But in case of a NURBS curve definition in homogeneous coordinates it returns homogeneous coordinates. The transformation to affine coordinates is missing. See accompanying patch for a fix. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/3] llvmpipe: add lp_fence_timedwait() helper
czw., 25 kwi 2019 o 19:42 Emil Velikov napisał(a): > > The function is analogous to lp_fence_wait() while taking at timeout > (ns) parameter, as needed for EGL fence/sync. > > v2: > - use absolute UTC time, as per spec (Gustaw) > - bail out on cnd_timedwait() failure (Gustaw) > > Cc: Gustaw Smolarczyk > Cc: Roland Scheidegger > Signed-off-by: Emil Velikov > Reviewed-by: Roland Scheidegger (v1) > --- > src/gallium/drivers/llvmpipe/lp_fence.c | 30 + > src/gallium/drivers/llvmpipe/lp_fence.h | 3 +++ > 2 files changed, 33 insertions(+) > > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c > b/src/gallium/drivers/llvmpipe/lp_fence.c > index 20cd91cd63d..b79d773bf6c 100644 > --- a/src/gallium/drivers/llvmpipe/lp_fence.c > +++ b/src/gallium/drivers/llvmpipe/lp_fence.c > @@ -125,3 +125,33 @@ lp_fence_wait(struct lp_fence *f) > } > > > +boolean > +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout) > +{ > + struct timespec ts; > + int ret; > + > + timespec_get(, TIME_UTC); > + > + ts.tv_nsec += timeout % 10L; > + ts.tv_sec += timeout / 10L; > + if (ts.tv_nsec >= 10L) { > + ts.tv_sec++; > + ts.tv_nsec -= 10L; > + } > + > + if (LP_DEBUG & DEBUG_FENCE) > + debug_printf("%s %d\n", __FUNCTION__, f->id); > + > + mtx_lock(>mutex); > + assert(f->issued); > + while (f->count < f->rank) { > + ret = cnd_timedwait(>signalled, >mutex, ); > + if (ret != thrd_success) > + break; > + } > + mtx_unlock(>mutex); > + return (f->count >= f->rank && ret == thrd_success); Is checking for ret == thrd_success here really necessary? If the first part is true we already know that the fence has been signalled. With this changed or not: Reviewed-by: Gustaw Smolarczyk > +} > + > + > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h > b/src/gallium/drivers/llvmpipe/lp_fence.h > index b72026492c6..5ba746d22d1 100644 > --- a/src/gallium/drivers/llvmpipe/lp_fence.h > +++ b/src/gallium/drivers/llvmpipe/lp_fence.h > @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence); > void > lp_fence_wait(struct lp_fence *fence); > > +boolean > +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout); > + > void > llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen); > > -- > 2.20.1 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 110356] install_megadrivers.py creates new dangling symlink [bisected]
https://bugs.freedesktop.org/show_bug.cgi?id=110356 --- Comment #5 from Fabio Pedretti --- Any plan to backport these (up to 5d310015) to 18.3 (I know it is EOL) and release a 18.3.7? See Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926857 -- You are receiving this mail because: You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 2/3] llvmpipe: correctly handle waiting in llvmpipe_fence_finish
Currently if the timeout differs from 0, we'll end up with infinite wait... even if the user is perfectly clear they don't want that. Use the new lp_fence_timedwait() helper guarding both waits in an !lp_fence_signalled block like the rest of llvmpipe. Signed-off-by: Emil Velikov Reviewed-by: Roland Scheidegger --- src/gallium/drivers/llvmpipe/lp_screen.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 8426427e397..510346d2abf 100644 --- a/src/gallium/drivers/llvmpipe/lp_screen.c +++ b/src/gallium/drivers/llvmpipe/lp_screen.c @@ -637,7 +637,12 @@ llvmpipe_fence_finish(struct pipe_screen *screen, if (!timeout) return lp_fence_signalled(f); - lp_fence_wait(f); + if (!lp_fence_signalled(f)) { + if (timeout != PIPE_TIMEOUT_INFINITE) + return lp_fence_timedwait(f, timeout); + + lp_fence_wait(f); + } return TRUE; } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 1/3] llvmpipe: add lp_fence_timedwait() helper
The function is analogous to lp_fence_wait() while taking at timeout (ns) parameter, as needed for EGL fence/sync. v2: - use absolute UTC time, as per spec (Gustaw) - bail out on cnd_timedwait() failure (Gustaw) Cc: Gustaw Smolarczyk Cc: Roland Scheidegger Signed-off-by: Emil Velikov Reviewed-by: Roland Scheidegger (v1) --- src/gallium/drivers/llvmpipe/lp_fence.c | 30 + src/gallium/drivers/llvmpipe/lp_fence.h | 3 +++ 2 files changed, 33 insertions(+) diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c b/src/gallium/drivers/llvmpipe/lp_fence.c index 20cd91cd63d..b79d773bf6c 100644 --- a/src/gallium/drivers/llvmpipe/lp_fence.c +++ b/src/gallium/drivers/llvmpipe/lp_fence.c @@ -125,3 +125,33 @@ lp_fence_wait(struct lp_fence *f) } +boolean +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout) +{ + struct timespec ts; + int ret; + + timespec_get(, TIME_UTC); + + ts.tv_nsec += timeout % 10L; + ts.tv_sec += timeout / 10L; + if (ts.tv_nsec >= 10L) { + ts.tv_sec++; + ts.tv_nsec -= 10L; + } + + if (LP_DEBUG & DEBUG_FENCE) + debug_printf("%s %d\n", __FUNCTION__, f->id); + + mtx_lock(>mutex); + assert(f->issued); + while (f->count < f->rank) { + ret = cnd_timedwait(>signalled, >mutex, ); + if (ret != thrd_success) + break; + } + mtx_unlock(>mutex); + return (f->count >= f->rank && ret == thrd_success); +} + + diff --git a/src/gallium/drivers/llvmpipe/lp_fence.h b/src/gallium/drivers/llvmpipe/lp_fence.h index b72026492c6..5ba746d22d1 100644 --- a/src/gallium/drivers/llvmpipe/lp_fence.h +++ b/src/gallium/drivers/llvmpipe/lp_fence.h @@ -65,6 +65,9 @@ lp_fence_signalled(struct lp_fence *fence); void lp_fence_wait(struct lp_fence *fence); +boolean +lp_fence_timedwait(struct lp_fence *fence, uint64_t timeout); + void llvmpipe_init_screen_fence_funcs(struct pipe_screen *screen); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 3/3] llvmpipe: Always return some fence in flush (v2)
From: Tomasz Figa If there is no last fence, due to no rendering happening yet, just create a new signaled fence and return it, to match the expectations of the EGL sync fence API. Fixes random "Could not create sync fence 0x3003" assertion failures from Skia on Android, coming from the following code: https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427 Reproducible especially with thread count >= 4. One could make the driver always keep the reference to the last fence, but: - the driver seems to explicitly destroy the fence whenever a rendering pass completes and changing that would require a significant functional change to the code. (Specifically, in lp_scene_end_rasterization().) - it still wouldn't solve the problem of an EGL sync fence being created and waited on without any rendering happening at all, which is also likely to happen with Android code pointed to in the commit. Therefore, the simple approach of always creating a fence is taken, similarly to other drivers, such as radeonsi. Tested with piglit llvmpipe suite with no regressions and following tests fixed: egl_khr_fence_sync conformance eglclientwaitsynckhr_flag_sync_flush eglclientwaitsynckhr_nonzero_timeout eglclientwaitsynckhr_zero_timeout eglcreatesynckhr_default_attributes eglgetsyncattribkhr_invalid_attrib eglgetsyncattribkhr_sync_status v2: - remove the useless lp_fence_reference() dance (Nicolai), - explain why creating the dummy fence is the right approach. Signed-off-by: Tomasz Figa Reviewed-by: Roland Scheidegger --- src/gallium/drivers/llvmpipe/lp_setup.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/gallium/drivers/llvmpipe/lp_setup.c b/src/gallium/drivers/llvmpipe/lp_setup.c index b0873694732..e72e119c8a1 100644 --- a/src/gallium/drivers/llvmpipe/lp_setup.c +++ b/src/gallium/drivers/llvmpipe/lp_setup.c @@ -361,6 +361,8 @@ lp_setup_flush( struct lp_setup_context *setup, if (fence) { lp_fence_reference((struct lp_fence **)fence, setup->last_fence); + if (!*fence) + *fence = (struct pipe_fence_handle *)lp_fence_create(0); } } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] llvmpipe: add lp_fence_timedwait() helper
On Tue, 16 Apr 2019 at 11:50, Gustaw Smolarczyk wrote: > > wt., 16 kwi 2019 o 12:11 Emil Velikov napisał(a): > > > > On Thu, 11 Apr 2019 at 17:55, Gustaw Smolarczyk > > wrote: > > > > > > czw., 11 kwi 2019 o 18:06 Emil Velikov > > > napisał(a): > > > > > > > > The function is analogous to lp_fence_wait() while taking at timeout > > > > (ns) parameter, as needed for EGL fence/sync. > > > > > > > > Cc: Roland Scheidegger > > > > Signed-off-by: Emil Velikov > > > > --- > > > > src/gallium/drivers/llvmpipe/lp_fence.c | 22 ++ > > > > src/gallium/drivers/llvmpipe/lp_fence.h | 3 +++ > > > > 2 files changed, 25 insertions(+) > > > > > > > > diff --git a/src/gallium/drivers/llvmpipe/lp_fence.c > > > > b/src/gallium/drivers/llvmpipe/lp_fence.c > > > > index 20cd91cd63d..f8b31a9d6a5 100644 > > > > --- a/src/gallium/drivers/llvmpipe/lp_fence.c > > > > +++ b/src/gallium/drivers/llvmpipe/lp_fence.c > > > > @@ -125,3 +125,25 @@ lp_fence_wait(struct lp_fence *f) > > > > } > > > > > > > > > > > > +boolean > > > > +lp_fence_timedwait(struct lp_fence *f, uint64_t timeout) > > > > +{ > > > > + struct timespec ts = { > > > > + .tv_nsec = timeout % 10L, > > > > + .tv_sec = timeout / 10L, > > > > + }; > > > > > > According to the documentation [1] and looking at the implementation > > > in mesa [2], cnd_timedwait accepts an absolute time in UTC, not > > > duration. It seems that the fence_finish callback accepts duration. > > > > > Agreed, not sure how I missed that one. > > > > > [1] https://en.cppreference.com/w/c/thread/cnd_timedwait > > > [2] > > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/include/c11/threads_posix.h#L135 > > > > > > > + int ret; > > > > + > > > > + if (LP_DEBUG & DEBUG_FENCE) > > > > + debug_printf("%s %d\n", __FUNCTION__, f->id); > > > > + > > > > + mtx_lock(>mutex); > > > > + assert(f->issued); > > > > + while (f->count < f->rank) { > > > > + ret = cnd_timedwait(>signalled, >mutex, ); > > > > > > Shouldn't ret be checked for thrd_busy here as well? Otherwise, the > > > function will busy-wait after the timeout is reached instead of > > > returning. > > > > > Actually this should be tweaked to: > > - only wait for the requested timeout > > - _regardless_ of how meany threads (and hence ranks) llvmpipe has > > > > Something like the following (warning pre-coffee code) should work. > > > >mtx_lock(>mutex); > >assert(f->issued); > >if (f->count < f->rank) > > ret = cnd_timedwait(>signalled, >mutex, ); > > > >mtx_unlock(>mutex); > >return (f->count >= f->rank && ret == thrd_success); > > > > Is it a good idea not to perform cnd_timedwait in a loop? A spurious > wake up could could shorten the wait time. > Thanks Gustaw - had no idea cnd_timedwait can spuriously return. For anyone else wondering - the C11 spec does not mention anything, but there's a bug [1] to fix that. Underlying implementations (such as pthreads [2]) are already pretty clear about this "interesting" behaviour. Will send out and updated revision in a moment. Thanks again. Emil [1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm#dr_480 [2] https://linux.die.net/man/3/pthread_cond_timedwait ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [MR] glsl: Use default precision on struct/interface members when nothing specified
Mesa already keeps track of the GLES precision for variables and stores it in the ir_variable. When no precision is explicitly specified it takes the default precision for the corresponding type. However, when the variable is a struct or interface, the precision of each individual member is attached to the glsl_type instead. The code to make it use the default precision was missing so this branch adds it in. Only the last patch actually makes this change. The rest of the patches are to fix regressions in Piglit and CTS. The underlying problem is that Mesa was considering types that have different precisions to be different when comparing interstage interfaces (varyings and UBOs). According to the spec it should be ignored. Presumably this problem already existed if mismatched precisions were explicitly specified but we didn’t have any tests to test it. Storing the default precision makes some tests fail because the default precision for ints is different in the vertex and fragment stages so it’s easier to accidentally make a test case that tests this. The tests that regressed are: dEQP-GLES31.functional.shaders.opaque_type_indexing.* (12 tests) piglit.spec.ext_transform_feedback.structs_gles3 basic-struct run piglit.spec.glsl-es-3_00.execution.varying-struct-centroid_gles3 piglit.spec.ext_transform_feedback.structs_gles3 basic-struct get https://gitlab.freedesktop.org/mesa/mesa/merge_requests/736 signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] radeonsi: handle unaligned vertex buffers in hardware
Hi all, the following patches contain code to implement all vertex fetches using plain, non-format loads plus explicit shader arithmetic for format conversion. This allows us to remove the software workaround for unaligned vertex buffers on SI, because we can just load individual bytes on the GPU. CI+ will still use short/dword loads even in the unaligned case. The format conversion code was tested by running with radeonsi_vs_fetch_always_opencode=true on both Verde and Vega. Please review! Thanks, Nicolai -- src/amd/common/ac_llvm_build.c | 313 + src/amd/common/ac_llvm_build.h | 30 ++ .../drivers/radeonsi/si_debug_options.h | 1 + src/gallium/drivers/radeonsi/si_get.c| 2 +- src/gallium/drivers/radeonsi/si_pipe.h | 1 + src/gallium/drivers/radeonsi/si_shader.c | 249 + src/gallium/drivers/radeonsi/si_shader.h | 46 +-- src/gallium/drivers/radeonsi/si_state.c | 233 +++- src/gallium/drivers/radeonsi/si_state.h | 19 + .../drivers/radeonsi/si_state_shaders.c | 37 +- 10 files changed, 645 insertions(+), 286 deletions(-) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] radeonsi: store sctx->vertex_elements in a local in si_shader_selector_key_vs
From: Nicolai Hähnle Purely as a shorthand in the remainder of the function. --- src/gallium/drivers/radeonsi/si_state_shaders.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index f57e7730905..583d7c9d3ca 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -1372,33 +1372,32 @@ static unsigned si_get_alpha_test_func(struct si_context *sctx) static void si_shader_selector_key_vs(struct si_context *sctx, struct si_shader_selector *vs, struct si_shader_key *key, struct si_vs_prolog_bits *prolog_key) { if (!sctx->vertex_elements || vs->info.properties[TGSI_PROPERTY_VS_BLIT_SGPRS]) return; - prolog_key->instance_divisor_is_one = - sctx->vertex_elements->instance_divisor_is_one; - prolog_key->instance_divisor_is_fetched = - sctx->vertex_elements->instance_divisor_is_fetched; + struct si_vertex_elements *elts = sctx->vertex_elements; + + prolog_key->instance_divisor_is_one = elts->instance_divisor_is_one; + prolog_key->instance_divisor_is_fetched = elts->instance_divisor_is_fetched; /* Prefer a monolithic shader to allow scheduling divisions around * VBO loads. */ if (prolog_key->instance_divisor_is_fetched) key->opt.prefer_mono = 1; - unsigned count = MIN2(vs->info.num_inputs, - sctx->vertex_elements->count); - memcpy(key->mono.vs_fix_fetch, sctx->vertex_elements->fix_fetch, count); + unsigned count = MIN2(vs->info.num_inputs, elts->count); + memcpy(key->mono.vs_fix_fetch, elts->fix_fetch, count); } static void si_shader_selector_key_hw_vs(struct si_context *sctx, struct si_shader_selector *vs, struct si_shader_key *key) { struct si_shader_selector *ps = sctx->ps_shader.cso; key->opt.clip_disable = sctx->queued.named.rasterizer->clip_plane_enable == 0 && -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] radeonsi: overhaul the vertex fetch fixup mechanism
From: Nicolai Hähnle The overall goal is to support unaligned loads from vertex buffers natively on SI. In the unaligned case, we fall back to the general case implementation in ac_build_opencoded_load_format. Since this function is fully general, we will also use it going forward for cases requiring fully manual format conversions of dwords anyway. This requires a different encoding of the fix_fetch array, which will now contain the entire format information if a fixup is required. Having to check the alignment of vertex buffers is awkward. To keep the impact on the fast path minimal, the si_context will keep track of which vertex buffers are (not) at least dword-aligned, while the si_vertex_elements will note which vertex buffers have some (at most dword) alignment requirement. Vertex buffers should be dword-aligned most of the time, which allows a fast early-out in almost all cases. Add the radeonsi_vs_fetch_always_opencode configuration variable for testing purposes. Note that it can only be used reliably on LLVM >= 9, because support for byte and short load is required. --- .../drivers/radeonsi/si_debug_options.h | 1 + src/gallium/drivers/radeonsi/si_get.c | 2 +- src/gallium/drivers/radeonsi/si_pipe.h| 1 + src/gallium/drivers/radeonsi/si_shader.c | 249 ++ src/gallium/drivers/radeonsi/si_shader.h | 46 ++-- src/gallium/drivers/radeonsi/si_state.c | 233 +--- src/gallium/drivers/radeonsi/si_state.h | 19 ++ .../drivers/radeonsi/si_state_shaders.c | 26 +- 8 files changed, 297 insertions(+), 280 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_debug_options.h b/src/gallium/drivers/radeonsi/si_debug_options.h index 019256ca1d1..0bde7910fc6 100644 --- a/src/gallium/drivers/radeonsi/si_debug_options.h +++ b/src/gallium/drivers/radeonsi/si_debug_options.h @@ -1,6 +1,7 @@ OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before fast depth clear") OPT_BOOL(enable_nir, false, "Enable NIR") OPT_BOOL(aux_debug, false, "Generate ddebug_dumps for the auxiliary context") OPT_BOOL(sync_compile, false, "Always compile synchronously (will cause stalls)") +OPT_BOOL(vs_fetch_always_opencode, false, "Always open code vertex fetches (less efficient, purely for testing)") #undef OPT_BOOL diff --git a/src/gallium/drivers/radeonsi/si_get.c b/src/gallium/drivers/radeonsi/si_get.c index 4e23d283ab7..ff825c5e30a 100644 --- a/src/gallium/drivers/radeonsi/si_get.c +++ b/src/gallium/drivers/radeonsi/si_get.c @@ -190,21 +190,21 @@ static int si_get_param(struct pipe_screen *pscreen, enum pipe_cap param) /* Optimal number for good TexSubImage performance on Polaris10. */ return 64 * 1024 * 1024; case PIPE_CAP_MAX_TEXTURE_BUFFER_SIZE: case PIPE_CAP_MAX_SHADER_BUFFER_SIZE: return MIN2(sscreen->info.max_alloc_size, INT_MAX); case PIPE_CAP_VERTEX_BUFFER_OFFSET_4BYTE_ALIGNED_ONLY: case PIPE_CAP_VERTEX_BUFFER_STRIDE_4BYTE_ALIGNED_ONLY: case PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY: - return !sscreen->info.has_unaligned_shader_loads; + return HAVE_LLVM < 0x0900 && !sscreen->info.has_unaligned_shader_loads; case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE: return sscreen->info.has_sparse_vm_mappings ? RADEON_SPARSE_PAGE_SIZE : 0; case PIPE_CAP_PACKED_UNIFORMS: if (sscreen->options.enable_nir) return 1; return 0; diff --git a/src/gallium/drivers/radeonsi/si_pipe.h b/src/gallium/drivers/radeonsi/si_pipe.h index 7fc0319973b..1d241436a6d 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.h +++ b/src/gallium/drivers/radeonsi/si_pipe.h @@ -938,20 +938,21 @@ struct si_context { union pipe_color_union *border_color_map; /* in VRAM (slow access), little endian */ unsignedborder_color_count; unsignednum_vs_blit_sgprs; uint32_t vs_blit_sh_data[SI_VS_BLIT_SGPRS_POS_TEXCOORD]; uint32_tcs_user_data[4]; /* Vertex and index buffers. */ boolvertex_buffers_dirty; boolvertex_buffer_pointer_dirty; struct pipe_vertex_buffer vertex_buffer[SI_NUM_VERTEX_BUFFERS]; + uint16_tvertex_buffer_unaligned; /* bitmask of not dword-aligned buffers */ /* MSAA config state. */ int ps_iter_samples; boolps_uses_fbfetch; boolsmoothing_enabled; /* DB render state. */ unsignedps_db_shader_control; unsigneddbcb_copy_sample; bool
[Mesa-dev] [PATCH 1/3] amd/common: add ac_build_opencoded_fetch_format
From: Nicolai Hähnle Implement software emulation of buffer_load_format for all types required by vertex buffer fetches. --- src/amd/common/ac_llvm_build.c | 313 + src/amd/common/ac_llvm_build.h | 30 2 files changed, 343 insertions(+) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 4fdf73c99ba..197c58a8e45 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1667,20 +1667,333 @@ ac_build_tbuffer_load_byte(struct ac_llvm_context *ctx, res = ac_build_raw_tbuffer_load(ctx, rsrc, voffset, soffset, immoffset, 1, dfmt, nfmt, glc, false, false); res = LLVMBuildTrunc(ctx->builder, res, ctx->i8, ""); } return res; } + +/** + * Convert an 11- or 10-bit unsigned floating point number to an f32. + * + * The input exponent is expected to be biased analogous to IEEE-754, i.e. by + * 2^(exp_bits-1) - 1 (as defined in OpenGL and other graphics APIs). + */ +static LLVMValueRef +ac_ufN_to_float(struct ac_llvm_context *ctx, LLVMValueRef src, unsigned exp_bits, unsigned mant_bits) +{ + assert(LLVMTypeOf(src) == ctx->i32); + + LLVMValueRef tmp; + LLVMValueRef mantissa; + mantissa = LLVMBuildAnd(ctx->builder, src, LLVMConstInt(ctx->i32, (1 << mant_bits) - 1, false), ""); + + /* Converting normal numbers is just a shift + correcting the exponent bias */ + unsigned normal_shift = 23 - mant_bits; + unsigned bias_shift = 127 - ((1 << (exp_bits - 1)) - 1); + LLVMValueRef shifted, normal; + + shifted = LLVMBuildShl(ctx->builder, src, LLVMConstInt(ctx->i32, normal_shift, false), ""); + normal = LLVMBuildAdd(ctx->builder, shifted, LLVMConstInt(ctx->i32, bias_shift << 23, false), ""); + + /* Converting nan/inf numbers is the same, but with a different exponent update */ + LLVMValueRef naninf; + naninf = LLVMBuildOr(ctx->builder, normal, LLVMConstInt(ctx->i32, 0xff << 23, false), ""); + + /* Converting denormals is the complex case: determine the leading zeros of the +* mantissa to obtain the correct shift for the mantissa and exponent correction. +*/ + LLVMValueRef denormal; + LLVMValueRef params[2] = { + mantissa, + ctx->i1true, /* result can be undef when arg is 0 */ + }; + LLVMValueRef ctlz = ac_build_intrinsic(ctx, "llvm.ctlz.i32", ctx->i32, + params, 2, AC_FUNC_ATTR_READNONE); + + /* Shift such that the leading 1 ends up as the LSB of the exponent field. */ + tmp = LLVMBuildSub(ctx->builder, ctlz, LLVMConstInt(ctx->i32, 8, false), ""); + denormal = LLVMBuildShl(ctx->builder, mantissa, tmp, ""); + + unsigned denormal_exp = bias_shift + (32 - mant_bits) - 1; + tmp = LLVMBuildSub(ctx->builder, LLVMConstInt(ctx->i32, denormal_exp, false), ctlz, ""); + tmp = LLVMBuildShl(ctx->builder, tmp, LLVMConstInt(ctx->i32, 23, false), ""); + denormal = LLVMBuildAdd(ctx->builder, denormal, tmp, ""); + + /* Select the final result. */ + LLVMValueRef result; + + tmp = LLVMBuildICmp(ctx->builder, LLVMIntUGE, src, + LLVMConstInt(ctx->i32, ((1 << exp_bits) - 1) << mant_bits, false), ""); + result = LLVMBuildSelect(ctx->builder, tmp, naninf, normal, ""); + + tmp = LLVMBuildICmp(ctx->builder, LLVMIntUGE, src, + LLVMConstInt(ctx->i32, 1 << mant_bits, false), ""); + result = LLVMBuildSelect(ctx->builder, tmp, result, denormal, ""); + + tmp = LLVMBuildICmp(ctx->builder, LLVMIntNE, src, ctx->i32_0, ""); + result = LLVMBuildSelect(ctx->builder, tmp, result, ctx->i32_0, ""); + + return ac_to_float(ctx, result); +} + +/** + * Generate a fully general open coded buffer format fetch with all required + * fixups suitable for vertex fetch, using non-format buffer loads. + * + * Some combinations of argument values have special interpretations: + * - size = 8 bytes, format = fixed indicates PIPE_FORMAT_R11G11B10_FLOAT + * - size = 8 bytes, format != {float,fixed} indicates a 2_10_10_10 data format + * + * \param log_size log(size of channel in bytes) + * \param num_channels number of channels (1 to 4) + * \param format AC_FETCH_FORMAT_xxx value + * \param reverse whether XYZ channels are reversed + * \param known_aligned whether the source is known to be aligned to hardware's + * effective element size for loading the given format + * (note: this means dword alignment for 8_8_8_8, 16_16, etc.) + * \param rsrc buffer resource descriptor + * \return the resulting vector of floats or integers bitcast to <4 x i32> + */ +LLVMValueRef +ac_build_opencoded_load_format(struct ac_llvm_context *ctx, +
Re: [Mesa-dev] [PATCH v2] vulkan/wsi: check if the display_fd given is master
r-b On Thu, Apr 25, 2019 at 12:22 PM Emil Velikov wrote: > > On Fri, 19 Apr 2019 at 16:01, Emil Velikov wrote: > > > > From: Emil Velikov > > > > As effectively required by the extension, we need to ensure we're master > > > > Currently drivers employ vendor specific solutions, which check if the > > device behind the fd is capable*, yet none of them do the master check. > > > > *In the radv case, if acceleration is available. > > > > Instead of duplicating the check in each driver, keep it where it's > > needed and used. > > > > Note this copies libdrm's drmIsMaster() to avoid depending on bleeding > > edge version of the library. > > > > v2: set the fd to -1 if not master (Bas) > > > > Cc: Keith Packard > > Cc: Jason Ekstrand > > Cc: Bas Nieuwenhuizen > > Cc: Andres Rodriguez > > Reported-by: Andres Rodriguez > > Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]") > > Signed-off-by: Emil Velikov > > --- > > src/vulkan/wsi/wsi_common_display.c | 27 +++ > > 1 file changed, 27 insertions(+) > > > > diff --git a/src/vulkan/wsi/wsi_common_display.c > > b/src/vulkan/wsi/wsi_common_display.c > > index 74ed36ed646..2be20e85046 100644 > > --- a/src/vulkan/wsi/wsi_common_display.c > > +++ b/src/vulkan/wsi/wsi_common_display.c > > @@ -1812,6 +1812,30 @@ fail_attr_init: > > return ret; > > } > > > > + > > +/* > > + * Local version fo the libdrm helper. Added to avoid depending on bleeding > > + * edge version of the library. > > + */ > > +static int > > +local_drmIsMaster(int fd) > > +{ > > + /* Detect master by attempting something that requires master. > > +* > > +* Authenticating magic tokens requires master and 0 is an > > +* internal kernel detail which we could use. Attempting this on > > +* a master fd would fail therefore fail with EINVAL because 0 > > +* is invalid. > > +* > > +* A non-master fd will fail with EACCES, as the kernel checks > > +* for master before attempting to do anything else. > > +* > > +* Since we don't want to leak implementation details, use > > +* EACCES. > > +*/ > > + return drmAuthMagic(fd, 0) != -EACCES; > > +} > > + > > VkResult > > wsi_display_init_wsi(struct wsi_device *wsi_device, > > const VkAllocationCallbacks *alloc, > > @@ -1827,6 +1851,9 @@ wsi_display_init_wsi(struct wsi_device *wsi_device, > > } > > > > wsi->fd = display_fd; > > + if (wsi->fd != -1 && !local_drmIsMaster(wsi->fd)) > > + wsi->fd = -1; > > + > > wsi->alloc = alloc; > > > Humble ping? > > -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles
r-b On Thu, Apr 25, 2019 at 12:22 PM Emil Velikov wrote: > > On Fri, 19 Apr 2019 at 16:03, Emil Velikov wrote: > > > > From: Emil Velikov > > > > Currently we get normal GEM handles from PrimeFDToHandle, yet we close > > then with DUMB_CLOSE. Use GEM_CLOSE instead. > > > > Cc: Keith Packard > > Cc: Jason Ekstrand > > Cc: Bas Nieuwenhuizen > > Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]") > > Signed-off-by: Emil Velikov > > --- > > src/vulkan/wsi/wsi_common_display.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/src/vulkan/wsi/wsi_common_display.c > > b/src/vulkan/wsi/wsi_common_display.c > > index 2be20e85046..66e191906fc 100644 > > --- a/src/vulkan/wsi/wsi_common_display.c > > +++ b/src/vulkan/wsi/wsi_common_display.c > > @@ -974,8 +974,8 @@ static void > > wsi_display_destroy_buffer(struct wsi_display *wsi, > > uint32_t buffer) > > { > > - (void) drmIoctl(wsi->fd, DRM_IOCTL_MODE_DESTROY_DUMB, > > - &((struct drm_mode_destroy_dumb) { .handle = buffer })); > > + (void) drmIoctl(wsi->fd, DRM_IOCTL_GEM_CLOSE, > > + &((struct drm_gem_close) { .handle = buffer })); > > } > > > Humble ping anyone? > > AFAICT closing handles from PrimeFDToHandle() with DUMB_CLOSE is a > violation, even if it somehow works today. > > Thanks > Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] radeonsi: add si_debug_options for convenient adding/removing of options
On 25.04.19 04:45, Marek Olšák wrote: [snip] - bool clear_db_cache_before_clear; bool has_msaa_sample_loc_bug; bool has_ls_vgpr_init_bug; bool has_dcc_constant_encode; bool dpbb_allowed; bool dfsm_allowed; bool llvm_has_working_vgpr_indexing; + struct { +#define OPT_BOOL(name, dflt, description) uint8_t name:1; Why not bool instead of uint8_t? Sure. +#include "si_debug_options.inc" Why not use the .h file extension? The original intention was to distinguish it from header files where including is supposed idempotent, i.e. including the file a second time makes no difference. Anyway, I'm changing it. Cheers, Nicolai Other than those, this is: Reviewed-by: Marek Olšák mailto:marek.ol...@amd.com>> Marek -- Lerne, wie die Welt wirklich ist, Aber vergiss niemals, wie sie sein sollte. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] vulkan/wsi: check if the display_fd given is master
On Fri, 19 Apr 2019 at 16:01, Emil Velikov wrote: > > From: Emil Velikov > > As effectively required by the extension, we need to ensure we're master > > Currently drivers employ vendor specific solutions, which check if the > device behind the fd is capable*, yet none of them do the master check. > > *In the radv case, if acceleration is available. > > Instead of duplicating the check in each driver, keep it where it's > needed and used. > > Note this copies libdrm's drmIsMaster() to avoid depending on bleeding > edge version of the library. > > v2: set the fd to -1 if not master (Bas) > > Cc: Keith Packard > Cc: Jason Ekstrand > Cc: Bas Nieuwenhuizen > Cc: Andres Rodriguez > Reported-by: Andres Rodriguez > Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]") > Signed-off-by: Emil Velikov > --- > src/vulkan/wsi/wsi_common_display.c | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/src/vulkan/wsi/wsi_common_display.c > b/src/vulkan/wsi/wsi_common_display.c > index 74ed36ed646..2be20e85046 100644 > --- a/src/vulkan/wsi/wsi_common_display.c > +++ b/src/vulkan/wsi/wsi_common_display.c > @@ -1812,6 +1812,30 @@ fail_attr_init: > return ret; > } > > + > +/* > + * Local version fo the libdrm helper. Added to avoid depending on bleeding > + * edge version of the library. > + */ > +static int > +local_drmIsMaster(int fd) > +{ > + /* Detect master by attempting something that requires master. > +* > +* Authenticating magic tokens requires master and 0 is an > +* internal kernel detail which we could use. Attempting this on > +* a master fd would fail therefore fail with EINVAL because 0 > +* is invalid. > +* > +* A non-master fd will fail with EACCES, as the kernel checks > +* for master before attempting to do anything else. > +* > +* Since we don't want to leak implementation details, use > +* EACCES. > +*/ > + return drmAuthMagic(fd, 0) != -EACCES; > +} > + > VkResult > wsi_display_init_wsi(struct wsi_device *wsi_device, > const VkAllocationCallbacks *alloc, > @@ -1827,6 +1851,9 @@ wsi_display_init_wsi(struct wsi_device *wsi_device, > } > > wsi->fd = display_fd; > + if (wsi->fd != -1 && !local_drmIsMaster(wsi->fd)) > + wsi->fd = -1; > + > wsi->alloc = alloc; > Humble ping? -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles
On Fri, 19 Apr 2019 at 16:03, Emil Velikov wrote: > > From: Emil Velikov > > Currently we get normal GEM handles from PrimeFDToHandle, yet we close > then with DUMB_CLOSE. Use GEM_CLOSE instead. > > Cc: Keith Packard > Cc: Jason Ekstrand > Cc: Bas Nieuwenhuizen > Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]") > Signed-off-by: Emil Velikov > --- > src/vulkan/wsi/wsi_common_display.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/vulkan/wsi/wsi_common_display.c > b/src/vulkan/wsi/wsi_common_display.c > index 2be20e85046..66e191906fc 100644 > --- a/src/vulkan/wsi/wsi_common_display.c > +++ b/src/vulkan/wsi/wsi_common_display.c > @@ -974,8 +974,8 @@ static void > wsi_display_destroy_buffer(struct wsi_display *wsi, > uint32_t buffer) > { > - (void) drmIoctl(wsi->fd, DRM_IOCTL_MODE_DESTROY_DUMB, > - &((struct drm_mode_destroy_dumb) { .handle = buffer })); > + (void) drmIoctl(wsi->fd, DRM_IOCTL_GEM_CLOSE, > + &((struct drm_gem_close) { .handle = buffer })); > } > Humble ping anyone? AFAICT closing handles from PrimeFDToHandle() with DUMB_CLOSE is a violation, even if it somehow works today. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mesa (master): radeonsi: delay adding BOs at the beginning of IBs until the first draw
On 2019-04-24 11:36 p.m., Marek Olšák wrote: > It should be fixed by the patch "radeonsi: add BOs after need_cs_space". Looks like that did the trick, thanks! -- Earthling Michel Dänzer | https://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/6] Add support for NV12
Hi Christian, Am Mittwoch, den 24.04.2019, 08:36 +0200 schrieb Christian Gmeiner: > This patch series goes a complete different route then the one from > Lucas Stach. I am using the integrated YUV tiler instead of using > the 2D core for format conversion. I am reusing some patches from > Lucas and this series sits on-top of Lucas "st/dri: YUV" patches. We specifically opted to use the 2D GPU to do a format conversion, as this yields a RGB internal representation, which means the texture has the same properties as a normal GL texture (e.g. glReadPixels works). This way we can expose YUV format imports as non-external textures. This provides a number of benefits in texture lifetime handling in the upper layers of the stack, which are used to drive those video use- cases, like GStreamer. I don't really care what the blob does, but I do care about having the highest performing solution, which is to have the 2D GPU work in parallel with the 3D GPU and allow efficient texture imports with GStreamer. I would really appreciate a review of my patch series. Regards, Lucas > Christian Gmeiner (3): > etnaviv: direct YUYV/UYVY support > etnaviv: update headers from rnndb > etnaviv: add multi-planar YUV support > > Lucas Stach (3): > etnaviv: clear out next pointer when allocating resource > etnaviv: remember data offset into BO > etnaviv: improve PIPE_BIND_LINEAR handling > > .../drivers/etnaviv/etnaviv_clear_blit.c | 2 +- > src/gallium/drivers/etnaviv/etnaviv_format.c | 5 +- > .../drivers/etnaviv/etnaviv_resource.c| 24 +++- > src/gallium/drivers/etnaviv/etnaviv_rs.c | 5 + > src/gallium/drivers/etnaviv/etnaviv_screen.c | 4 + > src/gallium/drivers/etnaviv/etnaviv_texture.c | 8 ++ > src/gallium/drivers/etnaviv/etnaviv_yuv.c | 123 > ++ > src/gallium/drivers/etnaviv/etnaviv_yuv.h | 44 +++ > src/gallium/drivers/etnaviv/hw/common.xml.h | 2 +- > .../drivers/etnaviv/hw/common_3d.xml.h| 2 +- > src/gallium/drivers/etnaviv/hw/state.xml.h| 4 +- > src/gallium/drivers/etnaviv/hw/state_3d.xml.h | 35 +++-- > .../drivers/etnaviv/hw/state_blt.xml.h| 4 +- > .../drivers/etnaviv/hw/texdesc_3d.xml.h | 2 +- > src/gallium/drivers/etnaviv/meson.build | 2 + > 15 files changed, 240 insertions(+), 26 deletions(-) > create mode 100644 src/gallium/drivers/etnaviv/etnaviv_yuv.c > create mode 100644 src/gallium/drivers/etnaviv/etnaviv_yuv.h > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gpu/docs: Clarify what userspace means for gl
On Thu, Apr 25, 2019 at 11:28:40AM +0800, zhoucm1 wrote: > > > On 2019年04月25日 03:22, Eric Anholt wrote: > > "Zhou, David(ChunMing)" writes: > > > > > Will linux be only mesa-linux? I thought linux is an open linux. > > > Which will impact our opengl/amdvlk(MIT open source), not sure Rocm: > > > 1. how to deal with one uapi that opengl/amdvlk needs but mesa dont need? > > > reject? > > > 2. one hw feature that opengl/amdvlk developers work on that but no mesa > > > developers work on, cannot upstream as well? > > I believe these questions are already covered by > > > > "+Other userspace is only admissible if exposing a given feature through > > OpenGL > > or > > +OpenGL ES would result in a technically unsound design, incomplete driver > > or > > +an implementation which isn't useful in real world usage." > > > > If OpenGL needs the interface, then you need a Mesa implementation. > > It's time for you to work with the community to build that or get it > > built. Or, in AMD's case, work with the Mesa developers that you > > already employ. > > > > If OpenGL doesn't need it, but Vulkan needs it, then we don't have a > > clear policy in place, and this patch doesn't change that. I would > > personally say that AMDVLK doesn't qualify given that as far as I know > > there is not open review of proposed patches to the project as they're > > being developed. > Can I understand what you mean is, as soon as the stack is openly developed, > then which will be able to drive new UAPI? I think the only clear thing here is that the answer is complicated, and need to be decided on a case by case basis. That's what I tried to clarify with my patch, but I think there's not enough clearly defined common ground. So it'll stay complicated. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] radeonsi/nir: create si_nir_opts() helper
We will make use of this in the following commit. --- src/gallium/drivers/radeonsi/si_shader.h | 1 + src/gallium/drivers/radeonsi/si_shader_nir.c | 78 +++- 2 files changed, 43 insertions(+), 36 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h index f9f81a7bc1e..71ce27b2f55 100644 --- a/src/gallium/drivers/radeonsi/si_shader.h +++ b/src/gallium/drivers/radeonsi/si_shader.h @@ -710,6 +710,7 @@ void si_nir_scan_shader(const struct nir_shader *nir, void si_nir_scan_tess_ctrl(const struct nir_shader *nir, struct tgsi_tessctrl_info *out); void si_lower_nir(struct si_shader_selector *sel); +void si_nir_opts(struct nir_shader *nir); /* Inline helpers. */ diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c b/src/gallium/drivers/radeonsi/si_shader_nir.c index 4fe01ba0607..87100fbed19 100644 --- a/src/gallium/drivers/radeonsi/si_shader_nir.c +++ b/src/gallium/drivers/radeonsi/si_shader_nir.c @@ -811,6 +811,47 @@ void si_nir_scan_shader(const struct nir_shader *nir, } } +void +si_nir_opts(struct nir_shader *nir) +{ + bool progress; + do { + progress = false; + + NIR_PASS_V(nir, nir_lower_vars_to_ssa); + + NIR_PASS(progress, nir, nir_opt_copy_prop_vars); + NIR_PASS(progress, nir, nir_opt_dead_write_vars); + + NIR_PASS_V(nir, nir_lower_alu_to_scalar); + NIR_PASS_V(nir, nir_lower_phis_to_scalar); + + /* (Constant) copy propagation is needed for txf with offsets. */ + NIR_PASS(progress, nir, nir_copy_prop); + NIR_PASS(progress, nir, nir_opt_remove_phis); + NIR_PASS(progress, nir, nir_opt_dce); + if (nir_opt_trivial_continues(nir)) { + progress = true; + NIR_PASS(progress, nir, nir_copy_prop); + NIR_PASS(progress, nir, nir_opt_dce); + } + NIR_PASS(progress, nir, nir_opt_if, true); + NIR_PASS(progress, nir, nir_opt_dead_cf); + NIR_PASS(progress, nir, nir_opt_cse); + NIR_PASS(progress, nir, nir_opt_peephole_select, 8, true, true); + + /* Needed for algebraic lowering */ + NIR_PASS(progress, nir, nir_opt_algebraic); + NIR_PASS(progress, nir, nir_opt_constant_folding); + + NIR_PASS(progress, nir, nir_opt_undef); + NIR_PASS(progress, nir, nir_opt_conditional_discard); + if (nir->options->max_unroll_iterations) { + NIR_PASS(progress, nir, nir_opt_loop_unroll, 0); + } + } while (progress); +} + /** * Perform "lowering" operations on the NIR that are run once when the shader * selector is created. @@ -861,42 +902,7 @@ si_lower_nir(struct si_shader_selector* sel) ac_lower_indirect_derefs(sel->nir, sel->screen->info.chip_class); - bool progress; - do { - progress = false; - - NIR_PASS_V(sel->nir, nir_lower_vars_to_ssa); - - NIR_PASS(progress, sel->nir, nir_opt_copy_prop_vars); - NIR_PASS(progress, sel->nir, nir_opt_dead_write_vars); - - NIR_PASS_V(sel->nir, nir_lower_alu_to_scalar); - NIR_PASS_V(sel->nir, nir_lower_phis_to_scalar); - - /* (Constant) copy propagation is needed for txf with offsets. */ - NIR_PASS(progress, sel->nir, nir_copy_prop); - NIR_PASS(progress, sel->nir, nir_opt_remove_phis); - NIR_PASS(progress, sel->nir, nir_opt_dce); - if (nir_opt_trivial_continues(sel->nir)) { - progress = true; - NIR_PASS(progress, sel->nir, nir_copy_prop); - NIR_PASS(progress, sel->nir, nir_opt_dce); - } - NIR_PASS(progress, sel->nir, nir_opt_if, true); - NIR_PASS(progress, sel->nir, nir_opt_dead_cf); - NIR_PASS(progress, sel->nir, nir_opt_cse); - NIR_PASS(progress, sel->nir, nir_opt_peephole_select, 8, true, true); - - /* Needed for algebraic lowering */ - NIR_PASS(progress, sel->nir, nir_opt_algebraic); - NIR_PASS(progress, sel->nir, nir_opt_constant_folding); - - NIR_PASS(progress, sel->nir, nir_opt_undef); - NIR_PASS(progress, sel->nir, nir_opt_conditional_discard); - if (sel->nir->options->max_unroll_iterations) { - NIR_PASS(progress, sel->nir, nir_opt_loop_unroll, 0); - } - } while (progress); + si_nir_opts(sel->nir); NIR_PASS_V(sel->nir, nir_lower_bool_to_int32); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
[Mesa-dev] [PATCH 2/2] radeonsi/nir: call radeonsi nir opts before the scan pass
Some of the opts are not called in the general optimastion loop in the state trackers glsl -> nir conversion. We need to call the radeonsi specific optimisation once before scanning over the nir otherwise we can end up gathering info on code that is later removed. Fixes an assert in the piglit test: ./bin/varying-struct-centroid_gles3 --- src/gallium/drivers/radeonsi/si_compute.c | 1 + src/gallium/drivers/radeonsi/si_state_shaders.c | 1 + 2 files changed, 2 insertions(+) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 541d7e6f118..f1a433b72df 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -106,6 +106,7 @@ static void si_create_compute_state_async(void *job, int thread_index) assert(program->ir_type == PIPE_SHADER_IR_NIR); sel.nir = program->ir.nir; + si_nir_opts(sel.nir); si_nir_scan_shader(sel.nir, ); si_lower_nir(); } diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c index 5bdfd4f6ac1..7250b40c5db 100644 --- a/src/gallium/drivers/radeonsi/si_state_shaders.c +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c @@ -2240,6 +2240,7 @@ static void *si_create_shader_selector(struct pipe_context *ctx, sel->nir = state->ir.nir; + si_nir_opts(sel->nir); si_nir_scan_shader(sel->nir, >info); si_nir_scan_tess_ctrl(sel->nir, >tcs_info); } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: fix nir_remove_unused_varyings()
Reviewed-by: Samuel Pitoiset Thanks fo the fix Tim! On 4/25/19 3:17 AM, Timothy Arceri wrote: We were only setting the used mask for the first component of a varying. Since the linking opts split vectors into scalars this has mostly worked ok. However this causes an issue where for example if we split a struct on one side of the interface but not the other, then we can possibly end up removing the first components on the side that was split and then incorrectly remove the whole struct on the other side of the varying. With this change we simply mark all 4 components for each slot used by a struct. We could possibly make this more fine gained but that would require a more complex change. This fixes a bug in Strange Brigade on RADV when tessellation is enabled, all credit goes to Samuel Pitoiset for tracking down the cause of the bug. Fixes: f1eb5e639997 ("nir: add component level support to remove_unused_io_vars()") --- src/compiler/nir/nir_linking_helpers.c | 51 +- 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/src/compiler/nir/nir_linking_helpers.c b/src/compiler/nir/nir_linking_helpers.c index f4494c78f98..ef0f94baf47 100644 --- a/src/compiler/nir/nir_linking_helpers.c +++ b/src/compiler/nir/nir_linking_helpers.c @@ -59,6 +59,15 @@ get_variable_io_mask(nir_variable *var, gl_shader_stage stage) return ((1ull << slots) - 1) << location; } +static uint8_t +get_num_components(nir_variable *var) +{ + if (glsl_type_is_struct_or_ifc(glsl_without_array(var->type))) + return 4; + + return glsl_get_vector_elements(glsl_without_array(var->type)); +} + static void tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t *patches_read) { @@ -80,12 +89,14 @@ tcs_add_output_reads(nir_shader *shader, uint64_t *read, uint64_t *patches_read) continue; nir_variable *var = nir_deref_instr_get_variable(deref); -if (var->data.patch) { - patches_read[var->data.location_frac] |= - get_variable_io_mask(var, shader->info.stage); -} else { - read[var->data.location_frac] |= - get_variable_io_mask(var, shader->info.stage); +for (unsigned i = 0; i < get_num_components(var); i++) { + if (var->data.patch) { + patches_read[var->data.location_frac + i] |= + get_variable_io_mask(var, shader->info.stage); + } else { + read[var->data.location_frac + i] |= + get_variable_io_mask(var, shader->info.stage); + } } } } @@ -161,22 +172,26 @@ nir_remove_unused_varyings(nir_shader *producer, nir_shader *consumer) uint64_t patches_read[4] = { 0 }, patches_written[4] = { 0 }; nir_foreach_variable(var, >outputs) { - if (var->data.patch) { - patches_written[var->data.location_frac] |= -get_variable_io_mask(var, producer->info.stage); - } else { - written[var->data.location_frac] |= -get_variable_io_mask(var, producer->info.stage); + for (unsigned i = 0; i < get_num_components(var); i++) { + if (var->data.patch) { +patches_written[var->data.location_frac + i] |= + get_variable_io_mask(var, producer->info.stage); + } else { +written[var->data.location_frac + i] |= + get_variable_io_mask(var, producer->info.stage); + } } } nir_foreach_variable(var, >inputs) { - if (var->data.patch) { - patches_read[var->data.location_frac] |= -get_variable_io_mask(var, consumer->info.stage); - } else { - read[var->data.location_frac] |= -get_variable_io_mask(var, consumer->info.stage); + for (unsigned i = 0; i < get_num_components(var); i++) { + if (var->data.patch) { +patches_read[var->data.location_frac + i] |= + get_variable_io_mask(var, consumer->info.stage); + } else { +read[var->data.location_frac + i] |= + get_variable_io_mask(var, consumer->info.stage); + } } } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev