[Mesa-dev] [PATCH 2/2] i965/skl: Break down SIMD16 3-source instructions when required.
Several steppings of Skylake fail when using SIMD16 with 3-source instructions (such as MAD). This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit tests. Based on a patch by Neil Roberts. Signed-off-by: Kenneth Graunke Cc: Neil Roberts Cc: Matt Turner --- src/mesa/drivers/dri/i965/brw_shader.cpp | 6 ++ 1 file changed, 6 insertions(+) Neil, what do you think of this approach? It's a bit smaller of a hammer than turning off SIMD16 altogether, and pretty simple. I haven't tested it at all, though. Feel free to --reset-author and claim authorship on this patch - it's really your code, I just moved it over a bit. diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp index fed4ba3..74c0e50 100644 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp @@ -400,6 +400,12 @@ bool brw_instruction_supports_simd16(const struct brw_context *brw, enum opcode op) { bool supports_3src = brw->is_haswell || brw->gen >= 8; + /* WaDisableSIMD16On3SrcInstr: 3-source instructions don't work in SIMD16 +* on a few steppings of Skylake. +*/ + if (brw->gen == 9 && (brw->revision == 2 || brw->revision == 3 || + brw->revision == -1)) + supports_3src = false; switch (op) { case BRW_OPCODE_MAD: -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Reserve more batch space to accomodate Gen6 perfmonitors.
Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's actually 5 DWords on Gen6-7. We've been reserving insufficient space for performance monitoring on Sandybridge, which means it would likely break if you used that functionality. (Thankfully, no one does...) Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) We should probably Cc this to stable unless we delete the broken performance monitoring code altogether. diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index 7bdd836..5a16456 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -22,12 +22,12 @@ extern "C" { * - Disabling OA counters on Gen6+ (3 DWords = 12 bytes) * - Ending MI_REPORT_PERF_COUNT on Gen5+, plus associated PIPE_CONTROLs: * - Two sets of PIPE_CONTROLs, which become 3 PIPE_CONTROLs each on SNB, - * which are 4 DWords each ==> 2 * 3 * 4 * 4 = 96 bytes + * which are 5 DWords each ==> 2 * 3 * 5 * 4 = 120 bytes * - 3 DWords for MI_REPORT_PERF_COUNT itself on Gen6+. ==> 12 bytes. * On Ironlake, it's 6 DWords, but we have some slack due to the lack of * Sandybridge PIPE_CONTROL madness. */ -#define BATCH_RESERVED 146 +#define BATCH_RESERVED 152 struct intel_batchbuffer; -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Split Gen4-5 BlitFramebuffer code; prefer BLT over Meta.
A while back I switched intel_blit_framebuffer to prefer Meta over the BLT. This meant that Gen8 platforms would start using the 3D engine for blits, just like we do on Gen6-7.5. However, I hadn't considered Gen4-5 when making that change. The BLT engine appears to be substantially faster on 965GM than using Meta to drive the 3D engine. This isn't too surprising: original Gen4 doesn't support tile offsets (that came on G45), and the level/layer fields don't work for cubemap rendering, so for inconvenient miplevel alignments, we end up blitting or copying data to/from temporaries in order to render to it. We may as well just use the blitter. I chose to use the BLT on Gen4-5 because they use the same ring for both 3D and BLT; Gen6+ splits it out. Fixes regressions on 965GM due to botched tile offset code (we should fix those properly as well, but they're longstanding bugs - for now, put things back to the status quo). Signed-off-by: Kenneth Graunke Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89430 Cc: "10.5" Cc: Mark Janes --- src/mesa/drivers/dri/i965/intel_fbo.c | 50 ++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c index 04e5030..121c97f 100644 --- a/src/mesa/drivers/dri/i965/intel_fbo.c +++ b/src/mesa/drivers/dri/i965/intel_fbo.c @@ -916,6 +916,51 @@ intel_blit_framebuffer(struct gl_context *ctx, } /** + * Gen4-5 implementation of glBlitFrameBuffer(). + * + * Tries BLT, Meta, then swrast. + * + * Gen4-5 have a single ring for both 3D and BLT operations, so there's no + * inter-ring synchronization issues like on Gen6+. It is apparently faster + * than using the 3D pipeline. Original Gen4 also has to rebase and copy + * miptree slices in order to render to unaligned locations. + */ +static void +gen4_blit_framebuffer(struct gl_context *ctx, + struct gl_framebuffer *readFb, + struct gl_framebuffer *drawFb, + GLint srcX0, GLint srcY0, GLint srcX1, GLint srcY1, + GLint dstX0, GLint dstY0, GLint dstX1, GLint dstY1, + GLbitfield mask, GLenum filter) +{ + /* Page 679 of OpenGL 4.4 spec says: +*"Added BlitFramebuffer to commands affected by conditional rendering in +* section 10.10 (Bug 9562)." +*/ + if (!_mesa_check_conditional_render(ctx)) + return; + + mask = intel_blit_framebuffer_with_blitter(ctx, readFb, drawFb, + srcX0, srcY0, srcX1, srcY1, + dstX0, dstY0, dstX1, dstY1, + mask, filter); + if (mask == 0x0) + return; + + mask = _mesa_meta_BlitFramebuffer(ctx, readFb, drawFb, + srcX0, srcY0, srcX1, srcY1, + dstX0, dstY0, dstX1, dstY1, + mask, filter); + if (mask == 0x0) + return; + + _swrast_BlitFramebuffer(ctx, readFb, drawFb, + srcX0, srcY0, srcX1, srcY1, + dstX0, dstY0, dstX1, dstY1, + mask, filter); +} + +/** * Does the renderbuffer have hiz enabled? */ bool @@ -1049,7 +1094,10 @@ intel_fbo_init(struct brw_context *brw) dd->UnmapRenderbuffer = intel_unmap_renderbuffer; dd->RenderTexture = intel_render_texture; dd->ValidateFramebuffer = intel_validate_framebuffer; - dd->BlitFramebuffer = intel_blit_framebuffer; + if (brw->gen >= 6) + dd->BlitFramebuffer = intel_blit_framebuffer; + else + dd->BlitFramebuffer = gen4_blit_framebuffer; dd->EGLImageTargetRenderbufferStorage = intel_image_target_renderbuffer_storage; -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix URB size for CHV
On Thursday, March 05, 2015 07:41:29 PM Ville Syrjälä wrote: > On Fri, Jan 23, 2015 at 12:12:56PM +0200, ville.syrj...@linux.intel.com wrote: > > From: Ville Syrjälä > > > > Increase the device info .urb.size for CHV to match the default URB > > size (192kB). > > > > Signed-off-by: Ville Syrjälä > > Ping? Oh, sorry! I thought I'd reviewed this. It does indeed appear to be 192kB. Reviewed-by: Kenneth Graunke Have you tested it? Assuming it doesn't explode, feel free to push this. Thanks for catching the mistake! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir: Make the printer include nir_variable::location too.
Being able to see both location and driver_location can be useful when debugging IO mistakes. Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir_print.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_print.c b/src/glsl/nir/nir_print.c index 6a3c6a0..30d4821 100644 --- a/src/glsl/nir/nir_print.c +++ b/src/glsl/nir/nir_print.c @@ -228,7 +228,7 @@ print_var_decl(nir_variable *var, print_var_state *state, FILE *fp) if (var->data.mode == nir_var_shader_in || var->data.mode == nir_var_shader_out || var->data.mode == nir_var_uniform) { - fprintf(fp, " (%u)", var->data.driver_location); + fprintf(fp, " (%u, %u)", var->data.location, var->data.driver_location); } fprintf(fp, "\n"); -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 6/6] i965: Don't write past the end of the application supplied buffer
On Wednesday, March 04, 2015 09:55:46 AM Ian Romanick wrote: > From: Ian Romanick > > Both the AMD and Intel APIs provide a dataSize parameter, and this > function would merrily ignore it. Neither API specifies what to do when > the buffer isn't big enough. I take the easy route of writing all the > complete bits of data that will fit. With more complete specs, we could > probably do something different. > > I noticed this while looking into an unused parameter warning. The > warning was actually useful! > > brw_performance_monitor.c: In function 'brw_get_perf_monitor_result': > brw_performance_monitor.c:1261:37: warning: unused parameter 'data_size' > [-Wunused-parameter] > GLsizei data_size, > ^ > > v2: Fix checks to include offset in the calculation. Noticed by Jan. > > Signed-off-by: Ian Romanick > Cc: Kenneth Graunke > Cc: Jan Vesely Huh. I could've sworn I reviewed this patch a while back, but I can't find any evidence of that. Thanks for fixing this! The spec isn't particularly clear about whether we should write up to the very end of the byte limit (i.e. write the group name and counter name but not the value), or omit the entire last entry when there isn't space...but I think what you did is quite sensible. I'm pretty sure we're deleting most of this code in favor of Rob's efforts, but we may as well land this fix in the meantime. Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/nir: Resolve source modifiers on Gen8+ logic operations.
On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and negate changes meaning to bitwise-not (~, not -). This isn't what NIR expects, so we should resolve the source modifers via a MOV. +30 Piglits (fs-op-bit{and,or,xor}-not-abs-*). Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs.cpp | 11 +++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 15 +++ 3 files changed, 27 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d6acc23..428234f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1561,6 +1561,17 @@ fs_visitor::emit_sampleid_setup() return reg; } +void +fs_visitor::resolve_source_modifiers(fs_reg *src) +{ + if (!src->abs && !src->negate) + return; + + fs_reg temp = retype(vgrf(1), src->type); + emit(MOV(temp, *src)); + *src = temp; +} + fs_reg fs_visitor::fix_math_operand(fs_reg src) { diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 70098d8..ec77962 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -299,6 +299,7 @@ public: int texunit); fs_reg emit_mcs_fetch(fs_reg coordinate, int components, fs_reg sampler); void emit_gen6_gather_wa(uint8_t wa, fs_reg dst); + void resolve_source_modifiers(fs_reg *src); fs_reg fix_math_operand(fs_reg src); fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0); fs_inst *emit_math(enum opcode op, fs_reg dst, fs_reg src0, fs_reg src1); diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 66f7918..a0300aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -935,15 +935,30 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) break; case nir_op_inot: + if (brw->gen >= 8) { + resolve_source_modifiers(&op[0]); + } emit(NOT(result, op[0])); break; case nir_op_ixor: + if (brw->gen >= 8) { + resolve_source_modifiers(&op[0]); + resolve_source_modifiers(&op[1]); + } emit(XOR(result, op[0], op[1])); break; case nir_op_ior: + if (brw->gen >= 8) { + resolve_source_modifiers(&op[0]); + resolve_source_modifiers(&op[1]); + } emit(OR(result, op[0], op[1])); break; case nir_op_iand: + if (brw->gen >= 8) { + resolve_source_modifiers(&op[0]); + resolve_source_modifiers(&op[1]); + } emit(AND(result, op[0], op[1])); break; -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Implement SIMD16 dual source blending.
On Thursday, March 05, 2015 09:39:58 PM Jason Ekstrand wrote: > This looks fine to me. I just kicked off a build on our test farm and, > assuming that looks good (I'll send another e-mail in the morning if it > does), > > Reviewed-by: Jason Ekstrand > > I ran shader-db on the change and I was kind of surprised to see that it > doesn't really do anything. > > GAINED: shaders/dolphin/smg.1.shader_test FS SIMD16 > > total instructions in shared programs: 5769629 -> 5769629 (0.00%) > instructions in affected programs: 0 -> 0 > helped:0 > HURT: 0 > GAINED:1 > LOST: 0 > > Perhaps shader-db doesn't account for some other GL state required for > dual-source because I doubt only one shader uses it. Ken? > > --Jason That would be dolphin/smg.1.shader_test - the one lonely shader that uses layout qualifiers to specify the dual source color output index: layout(location = 0, index = 1) out vec4 ocol1; Other applications (such as Unigine) most likely call glBindFragDataLocationIndexed to assign the location and index. Unfortunately, shader-db doesn't capture this, as it's tied to API calls, and not part of the shader itself. Eric's new shader-db-2 project that uses apitrace would catch this (but at a large cost). We could probably capture this somehow - add some kind of annotations to the file with the locations/indexes of each shader input/output, then make the API calls after compiling the shader...relink to make them take effect, which would also cause a new precompile, then replace the original results...seems like a pain, but probably doable... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] glsl: Mark array access when copying to a temporary for the ?: operator.
Piglit's spec/glsl-1.20/compiler/structure-and-array-operations/ array-selection.vert test contains the following code: gl_Position = (pick_from_a_or_b ? a : b)[i]; where "a" and "b" are uniform vec4[2] variables. ast_to_hir creates a temporary vec4[2] variable, conditional_tmp, and generates an if-block to copy one or the other: (declare (temporary) (array vec4 2) conditional_tmp) (if (var_ref pick_from_a_or_b) ((assign () (var_ref conditional_tmp) (var_ref a))) ((assign () (var_ref conditional_tmp) (var_ref b However, we failed to update max_array_access for "a" and "b", so it remained 0 - here, the whole array is being accessed. At link time, update_array_sizes() used this bogus information to change the types of "a" and "b" to vec4[1]. We then had assignments from a vec4[1] to a vec4[2], which is highly illegal. This tripped assertions in nir_split_var_copies with scalar VS. Signed-off-by: Kenneth Graunke Cc: mesa-sta...@lists.freedesktop.org --- src/glsl/ast_to_hir.cpp | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index acb5c76..d387b2e 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1617,6 +1617,12 @@ ast_expression::do_hir(exec_list *instructions, && cond_val != NULL) { result = cond_val->value.b[0] ? op[1] : op[2]; } else { + /* The copy to conditional_tmp reads the whole array. */ + if (type->is_array()) { +mark_whole_array_access(op[1]); +mark_whole_array_access(op[2]); + } + ir_variable *const tmp = new(ctx) ir_variable(type, "conditional_tmp", ir_var_temporary); instructions->push_tail(tmp); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] nir: Only do gl_FrontFacing workaround in glsl_to_nir for the FS.
Vertex shaders can have shader inputs where location happens to be VARYING_SLOT_FACE. Without predicating this on the shader stage, we suddenly end up with load_front_face intrinsics in vertex shaders, which is nonsensical. Fixes spec/arb_vertex_buffer_object/pos-array when using NIR for VS. Signed-off-by: Kenneth Graunke --- src/glsl/nir/glsl_to_nir.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index ddad207..047cb51 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -251,7 +251,8 @@ nir_visitor::visit(ir_variable *ir) break; case ir_var_shader_in: - if (ir->data.location == VARYING_SLOT_FACE) { + if (stage == MESA_SHADER_FRAGMENT && + ir->data.location == VARYING_SLOT_FACE) { /* For whatever reason, GLSL IR makes gl_FrontFacing an input */ var->data.location = SYSTEM_VALUE_FRONT_FACE; var->data.mode = nir_var_system_value; -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] nir: Delete nir_shader::user_structures and num_user_structures.
Nothing actually uses these, and the only caller of glsl_to_nir() (brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state pointer, meaning they'll always be NULL and 0, respectively. Just delete them. Signed-off-by: Kenneth Graunke --- src/glsl/nir/glsl_to_nir.cpp | 11 --- src/glsl/nir/nir.c | 3 --- src/glsl/nir/nir.h | 4 src/glsl/nir/nir_print.c | 4 4 files changed, 22 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index adef19c..b82e5c7 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -154,17 +154,6 @@ glsl_to_nir(exec_list *ir, _mesa_glsl_parse_state *state, nir_shader *shader = nir_shader_create(NULL, options); - if (state) { - shader->num_user_structures = state->num_user_structures; - shader->user_structures = ralloc_array(shader, glsl_type *, - shader->num_user_structures); - memcpy(shader->user_structures, state->user_structures, -shader->num_user_structures * sizeof(glsl_type *)); - } else { - shader->num_user_structures = 0; - shader->user_structures = NULL; - } - nir_visitor v1(shader, native_integers); nir_function_visitor v2(&v1); v2.run(ir); diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index ab57fd4..abad3f8 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -42,9 +42,6 @@ nir_shader_create(void *mem_ctx, const nir_shader_compiler_options *options) shader->options = options; - shader->num_user_structures = 0; - shader->user_structures = NULL; - exec_list_make_empty(&shader->functions); exec_list_make_empty(&shader->registers); exec_list_make_empty(&shader->globals); diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index d5df596..b935354 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1400,10 +1400,6 @@ typedef struct nir_shader { /** list of global register in the shader */ struct exec_list registers; - /** structures used in this shader */ - unsigned num_user_structures; - struct glsl_type **user_structures; - /** next available global register index */ unsigned reg_alloc; diff --git a/src/glsl/nir/nir_print.c b/src/glsl/nir/nir_print.c index 30d4821..f8b14a1 100644 --- a/src/glsl/nir/nir_print.c +++ b/src/glsl/nir/nir_print.c @@ -844,10 +844,6 @@ nir_print_shader(nir_shader *shader, FILE *fp) print_var_state state; init_print_state(&state); - for (unsigned i = 0; i < shader->num_user_structures; i++) { - glsl_print_struct(shader->user_structures[i], fp); - } - struct hash_entry *entry; hash_table_foreach(shader->uniforms, entry) { -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] nir: Plumb the shader stage into glsl_to_nir().
The next commit needs to know the shader stage in glsl_to_nir(). To facilitate that, we pass the gl_shader rather than the raw exec_list of instructions. This has both the exec_list and the stage. Signed-off-by: Kenneth Graunke --- src/glsl/nir/glsl_to_nir.cpp | 14 -- src/glsl/nir/glsl_to_nir.h | 2 +- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 2 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index 0d96e03..ddad207 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -43,7 +43,7 @@ namespace { class nir_visitor : public ir_visitor { public: - nir_visitor(nir_shader *shader); + nir_visitor(nir_shader *shader, gl_shader_stage stage); ~nir_visitor(); virtual void visit(ir_variable *); @@ -83,6 +83,7 @@ private: bool supports_ints; nir_shader *shader; + gl_shader_stage stage; nir_function_impl *impl; exec_list *cf_node_list; nir_instr *result; /* result of the expression tree last visited */ @@ -125,22 +126,23 @@ private: }; /* end of anonymous namespace */ nir_shader * -glsl_to_nir(exec_list *ir, const nir_shader_compiler_options *options) +glsl_to_nir(struct gl_shader *sh, const nir_shader_compiler_options *options) { nir_shader *shader = nir_shader_create(NULL, options); - nir_visitor v1(shader); + nir_visitor v1(shader, sh->Stage); nir_function_visitor v2(&v1); - v2.run(ir); - visit_exec_list(ir, &v1); + v2.run(sh->ir); + visit_exec_list(sh->ir, &v1); return shader; } -nir_visitor::nir_visitor(nir_shader *shader) +nir_visitor::nir_visitor(nir_shader *shader, gl_shader_stage stage) { this->supports_ints = shader->options->native_integers; this->shader = shader; + this->stage = stage; this->is_global = true; this->var_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer, _mesa_key_pointer_equal); diff --git a/src/glsl/nir/glsl_to_nir.h b/src/glsl/nir/glsl_to_nir.h index dd62793..3801e8c 100644 --- a/src/glsl/nir/glsl_to_nir.h +++ b/src/glsl/nir/glsl_to_nir.h @@ -32,7 +32,7 @@ extern "C" { #endif -nir_shader *glsl_to_nir(exec_list *ir, +nir_shader *glsl_to_nir(struct gl_shader *sh, const nir_shader_compiler_options *options); #ifdef __cplusplus diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index ccb5cea..3bb6806 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -87,7 +87,7 @@ fs_visitor::emit_nir_code() /* first, lower the GLSL IR shader to NIR */ lower_output_reads(shader->base.ir); - nir_shader *nir = glsl_to_nir(shader->base.ir, options); + nir_shader *nir = glsl_to_nir(&shader->base, options); nir_validate_shader(nir); nir_lower_global_vars_to_local(nir); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] nir: Add native_integers to nir_shader_compiler_options.
glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the driver supports native integers. Presumably other passes may as well. Adding this to nir_shader_compiler_options is an easy way to provide that information, as it's accessible via nir_shader::options. Signed-off-by: Kenneth Graunke --- src/glsl/nir/glsl_to_nir.cpp | 11 +-- src/glsl/nir/glsl_to_nir.h | 2 +- src/glsl/nir/nir.h | 6 ++ src/mesa/drivers/dri/i965/brw_context.c | 4 +++- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 2 +- 5 files changed, 16 insertions(+), 9 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index 7e40ef4..0d96e03 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -43,7 +43,7 @@ namespace { class nir_visitor : public ir_visitor { public: - nir_visitor(nir_shader *shader, bool supports_ints); + nir_visitor(nir_shader *shader); ~nir_visitor(); virtual void visit(ir_variable *); @@ -125,12 +125,11 @@ private: }; /* end of anonymous namespace */ nir_shader * -glsl_to_nir(exec_list *ir, bool native_integers, -const nir_shader_compiler_options *options) +glsl_to_nir(exec_list *ir, const nir_shader_compiler_options *options) { nir_shader *shader = nir_shader_create(NULL, options); - nir_visitor v1(shader, native_integers); + nir_visitor v1(shader); nir_function_visitor v2(&v1); v2.run(ir); visit_exec_list(ir, &v1); @@ -138,9 +137,9 @@ glsl_to_nir(exec_list *ir, bool native_integers, return shader; } -nir_visitor::nir_visitor(nir_shader *shader, bool supports_ints) +nir_visitor::nir_visitor(nir_shader *shader) { - this->supports_ints = supports_ints; + this->supports_ints = shader->options->native_integers; this->shader = shader; this->is_global = true; this->var_table = _mesa_hash_table_create(NULL, _mesa_hash_pointer, diff --git a/src/glsl/nir/glsl_to_nir.h b/src/glsl/nir/glsl_to_nir.h index 7300945..dd62793 100644 --- a/src/glsl/nir/glsl_to_nir.h +++ b/src/glsl/nir/glsl_to_nir.h @@ -32,7 +32,7 @@ extern "C" { #endif -nir_shader *glsl_to_nir(exec_list *ir, bool native_integers, +nir_shader *glsl_to_nir(exec_list *ir, const nir_shader_compiler_options *options); #ifdef __cplusplus diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index b935354..669a26e 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1370,6 +1370,12 @@ typedef struct nir_shader_compiler_options { bool lower_fsqrt; /** lowers fneg and ineg to fsub and isub. */ bool lower_negate; + + /** +* Does the driver support real 32-bit integers? (Otherwise, integers +* are simulated by floats.) +*/ + bool native_integers; } nir_shader_compiler_options; typedef struct nir_shader { diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 892d4ca..03547e9 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -551,7 +551,9 @@ brw_initialize_context_constants(struct brw_context *brw) ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents = 128; } - static const nir_shader_compiler_options nir_options = {}; + static const nir_shader_compiler_options nir_options = { + .native_integers = true, + }; /* We want the GLSL compiler to emit code that uses condition codes */ for (int i = 0; i < MESA_SHADER_STAGES; i++) { diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index e24bf92..ccb5cea 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -87,7 +87,7 @@ fs_visitor::emit_nir_code() /* first, lower the GLSL IR shader to NIR */ lower_output_reads(shader->base.ir); - nir_shader *nir = glsl_to_nir(shader->base.ir, true, options); + nir_shader *nir = glsl_to_nir(shader->base.ir, options); nir_validate_shader(nir); nir_lower_global_vars_to_local(nir); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] nir: Try to make sense of the nir_shader_compiler_options code.
The code in glsl_to_nir is entirely dead, as we translate from GLSL to NIR at link time, when there isn't a _mesa_glsl_parse_state to pass, so every caller passes NULL. glsl_to_nir seems like the wrong place to try and create the shader compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other translators all would have to duplicate that code. The driver should set this up once with whatever settings it wants, and pass it in. Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[] and left a comment saying: "The memory for the options is expected to be kept in a single static copy by the driver." This suggests the plan was to do exactly that. That pointer was not marked const, however, and the dead code used a mix of static structures and ralloced ones. This patch deletes the dead code in glsl_to_nir, instead making it take the shader compiler options as a mandatory argument. It creates an (empty) options struct in the i965 driver, and makes NirOptions point to that. It marks the pointer const so that we can actually do so without generating "discards const qualifier" compiler warnings. Signed-off-by: Kenneth Graunke Cc: Eric Anholt --- src/glsl/nir/glsl_to_nir.cpp | 28 ++-- src/glsl/nir/glsl_to_nir.h | 4 ++-- src/mesa/drivers/dri/i965/brw_context.c | 5 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 5 - src/mesa/main/mtypes.h | 2 +- 5 files changed, 14 insertions(+), 30 deletions(-) Eric, does this look reasonable to you? diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index b82e5c7..7e40ef4 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -124,34 +124,10 @@ private: }; /* end of anonymous namespace */ -static const nir_shader_compiler_options default_options = { -}; - nir_shader * -glsl_to_nir(exec_list *ir, _mesa_glsl_parse_state *state, -bool native_integers) +glsl_to_nir(exec_list *ir, bool native_integers, +const nir_shader_compiler_options *options) { - const nir_shader_compiler_options *options; - - if (state) { - struct gl_context *ctx = state->ctx; - struct gl_shader_compiler_options *gl_options = - &ctx->Const.ShaderCompilerOptions[state->stage]; - - if (!gl_options->NirOptions) { - nir_shader_compiler_options *new_options = -rzalloc(ctx, nir_shader_compiler_options); - options = gl_options->NirOptions = new_options; - - if (gl_options->EmitNoPow) -new_options->lower_fpow = true; - } else { - options = gl_options->NirOptions; - } - } else { - options = &default_options; - } - nir_shader *shader = nir_shader_create(NULL, options); nir_visitor v1(shader, native_integers); diff --git a/src/glsl/nir/glsl_to_nir.h b/src/glsl/nir/glsl_to_nir.h index 58b2cee..7300945 100644 --- a/src/glsl/nir/glsl_to_nir.h +++ b/src/glsl/nir/glsl_to_nir.h @@ -32,8 +32,8 @@ extern "C" { #endif -nir_shader *glsl_to_nir(exec_list * ir, _mesa_glsl_parse_state *state, -bool native_integers); +nir_shader *glsl_to_nir(exec_list *ir, bool native_integers, +const nir_shader_compiler_options *options); #ifdef __cplusplus } diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 786e6f5..892d4ca 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -68,6 +68,8 @@ #include "tnl/t_pipeline.h" #include "util/ralloc.h" +#include "glsl/nir/nir.h" + /*** * Mesa's Driver Functions ***/ @@ -549,6 +551,8 @@ brw_initialize_context_constants(struct brw_context *brw) ctx->Const.Program[MESA_SHADER_FRAGMENT].MaxInputComponents = 128; } + static const nir_shader_compiler_options nir_options = {}; + /* We want the GLSL compiler to emit code that uses condition codes */ for (int i = 0; i < MESA_SHADER_STAGES; i++) { ctx->Const.ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 16 : UINT_MAX; @@ -562,6 +566,7 @@ brw_initialize_context_constants(struct brw_context *brw) (i == MESA_SHADER_FRAGMENT); ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectUniform = false; ctx->Const.ShaderCompilerOptions[i].LowerClipDistance = true; + ctx->Const.ShaderCompilerOptions[i].NirOptions = &nir_options; } ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = true; diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index a0300aa..e24bf92 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @
Re: [Mesa-dev] [PATCH 05/13] i965: Fix the untyped surface opcodes to deal with indirect surface access.
On Friday, March 06, 2015 03:20:26 PM Pohjolainen, Topi wrote: > On Fri, Mar 06, 2015 at 02:46:51PM +0200, Francisco Jerez wrote: > > "Pohjolainen, Topi" writes: > > > > > On Fri, Mar 06, 2015 at 02:29:15PM +0200, Francisco Jerez wrote: > > >> "Pohjolainen, Topi" writes: > > >> > > >> > On Fri, Feb 27, 2015 at 05:34:48PM +0200, Francisco Jerez wrote: > > >> >> Change brw_untyped_atomic() and brw_untyped_surface_read() to take the > > >> >> surface index as a register instead of a constant and to use > > >> >> brw_send_indirect_message() to emit the indirect variant of send with > > >> >> a dynamically calculated message descriptor. This will be required to > > >> >> support variable indexing of image arrays for > > >> >> ARB_shader_image_load_store. > > >> >> --- > > >> >> src/mesa/drivers/dri/i965/brw_eu.h | 10 +- > > >> >> src/mesa/drivers/dri/i965/brw_eu_emit.c | 158 > > >> >> +-- > > >> >> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 4 +- > > >> >> src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 +- > > >> >> 4 files changed, 96 insertions(+), 80 deletions(-) > > >> >> > > >> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h > > >> >> b/src/mesa/drivers/dri/i965/brw_eu.h > > >> >> index 87a9f3f..9cc9123 100644 > > >> >> --- a/src/mesa/drivers/dri/i965/brw_eu.h > > >> >> +++ b/src/mesa/drivers/dri/i965/brw_eu.h > > >> >> @@ -398,18 +398,18 @@ void brw_CMP(struct brw_compile *p, > > >> >> > > >> >> void > > >> >> brw_untyped_atomic(struct brw_compile *p, > > >> >> - struct brw_reg dest, > > >> >> + struct brw_reg dst, > > >> >> struct brw_reg payload, > > >> >> + struct brw_reg surface, > > >> >> unsigned atomic_op, > > >> >> - unsigned bind_table_index, > > >> >> unsigned msg_length, > > >> >> bool response_expected); > > >> >> > > >> >> void > > >> >> brw_untyped_surface_read(struct brw_compile *p, > > >> >> - struct brw_reg dest, > > >> >> - struct brw_reg mrf, > > >> >> - unsigned bind_table_index, > > >> >> + struct brw_reg dst, > > >> >> + struct brw_reg payload, > > >> >> + struct brw_reg surface, > > >> >> unsigned msg_length, > > >> >> unsigned num_channels); > > >> >> > > >> >> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > >> >> b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > >> >> index 0b655d4..34695bf 100644 > > >> >> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > >> >> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > >> >> @@ -2518,6 +2518,48 @@ brw_send_indirect_message(struct brw_compile > > >> >> *p, > > >> >> return setup; > > >> >> } > > >> >> > > >> >> +static struct brw_inst * > > >> >> +brw_send_indirect_surface_message(struct brw_compile *p, > > >> >> + unsigned sfid, > > >> >> + struct brw_reg dst, > > >> >> + struct brw_reg payload, > > >> >> + struct brw_reg surface, > > >> >> + unsigned message_len, > > >> >> + unsigned response_len, > > >> >> + bool header_present) > > >> >> +{ > > >> >> + const struct brw_context *brw = p->brw; > > >> >> + struct brw_inst *insn; > > >> >> + > > >> >> + if (surface.file != BRW_IMMEDIATE_VALUE) { > > >> >> + struct brw_reg addr = retype(brw_address_reg(0), > > >> >> BRW_REGISTER_TYPE_UD); > > >> >> + > > >> >> + brw_push_insn_state(p); > > >> >> + brw_set_default_access_mode(p, BRW_ALIGN_1); > > >> >> + brw_set_default_mask_control(p, BRW_MASK_DISABLE); > > >> >> + brw_set_default_predicate_control(p, BRW_PREDICATE_NONE); > > >> >> + > > >> >> + /* Mask out invalid bits from the surface index to avoid hangs > > >> >> e.g. when > > >> >> + * some surface array is accessed out of bounds. > > >> >> + */ > > >> >> + insn = brw_AND(p, addr, > > >> >> + suboffset(vec1(retype(surface, > > >> >> BRW_REGISTER_TYPE_UD)), > > >> >> + BRW_GET_SWZ(surface.dw1.bits.swizzle, > > >> >> 0)), > > >> >> + brw_imm_ud(0xff)); > > >> >> + > > >> >> + brw_pop_insn_state(p); > > >> >> + > > >> >> + surface = addr; > > >> >> + } > > >> >> + > > >> >> + insn = brw_send_indirect_message(p, sfid, dst, payload, surface); > > >> >> + brw_inst_set_mlen(brw, insn, message_len); > > >> >> + brw_inst_set_rlen(brw, insn, response_len); > > >> >> + brw_inst_set_header_present(brw, insn, header_present); > > >> > > > >> > I'll continue the discussion we started with
Re: [Mesa-dev] [PATCH 01/13] i965: Factor out logic to build a send message instruction with indirect descriptor.
On Friday, February 27, 2015 05:34:44 PM Francisco Jerez wrote: > --- > src/mesa/drivers/dri/i965/brw_eu.h | 19 ++-- > src/mesa/drivers/dri/i965/brw_eu_emit.c | 58 > ++-- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 55 +- > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 37 --- > 4 files changed, 77 insertions(+), 92 deletions(-) FYI, I glanced over this series briefly, and it looks reasonable to me. It looks like Topi's done a pretty thorough review (thanks!) - if he's happy with it, that's good enough for me. (I suppose we could call that an Acked-by.) signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4] mesa: improve ARB_copy_image internal format compat check
On Saturday, March 07, 2015 10:09:15 PM Jason Ekstrand wrote: > LGTM. Jason, Sean doesn't have commit access (it's actually his first patch) - if this is good to go, would you mind pushing it for him? Nice work, Sean! --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/4] i965/fs: Make get_timestamp() return an fs_inst * rather than emitting.
On Friday, February 27, 2015 11:37:01 PM Pohjolainen, Topi wrote: > On Fri, Feb 27, 2015 at 11:15:35AM -0800, Kenneth Graunke wrote: > > This makes another part of the INTEL_DEBUG=shader_time code emittable > > at arbitrary locations, rather than just at the end of the instruction > > stream. > > > > v2: Don't lose smear! Caught by Topi Pohjolainen. > > > > Signed-off-by: Kenneth Graunke > > --- > > src/mesa/drivers/dri/i965/brw_fs.cpp | 24 +--- > > src/mesa/drivers/dri/i965/brw_fs.h | 2 +- > > 2 files changed, 14 insertions(+), 12 deletions(-) > > > > Yikes, good catch! Thanks for the review, Topi! > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > index 9c6f084..d65f1f1 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > @@ -680,8 +680,8 @@ fs_visitor::type_size(const struct glsl_type *type) > > return 0; > > } > > > > -fs_reg > > -fs_visitor::get_timestamp() > > +fs_inst * > > +fs_visitor::timestamp_read() > > { > > assert(brw->gen >= 7); > > > > @@ -692,12 +692,6 @@ fs_visitor::get_timestamp() > > > > fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 4); > > > > - fs_inst *mov = emit(MOV(dst, ts)); > > - /* We want to read the 3 fields we care about even if it's not enabled > > in > > -* the dispatch. > > -*/ > > - mov->force_writemask_all = true; > > - > > /* The caller wants the low 32 bits of the timestamp. Since it's > > running > > * at the GPU clock rate of ~1.2ghz, it will roll over every ~3 seconds, > > * which is plenty of time for our purposes. It is identical across the > > @@ -710,14 +704,21 @@ fs_visitor::get_timestamp() > > */ > > dst.set_smear(0); > > > > - return dst; > > + fs_inst *mov = MOV(dst, ts); > > Previously the smear wasn't set for the destination in the instruction > itself. I had to check what set_smear() really does. It also sets stride to > zero which the original logic left to the init value of one. I guess this is > not what you intented? Augh. Good catch - that totally breaks things. It makes the timestamp read a mov(1) instead of a mov(4). I rebased this code, and discovered that shader_time was just totally hosed again. I think I've patched it up, and will send out a respin shortly. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/5] i965/fs: Set force_writemask_all on shader_time instructions.
These computations don't have anything to do with the currently executing channels, so they should use force_writemask_all. This fixes assert failures. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 Signed-off-by: Kenneth Graunke Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d6acc23..45a5793 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -759,18 +759,23 @@ fs_visitor::emit_shader_time_end() reset.set_smear(2); fs_inst *test = emit(AND(reg_null_d, reset, fs_reg(1u))); test->conditional_mod = BRW_CONDITIONAL_Z; + test->force_writemask_all = true; emit(IF(BRW_PREDICATE_NORMAL)); fs_reg start = shader_start_time; start.negate = true; fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1); - emit(ADD(diff, start, shader_end_time)); + fs_inst *add = ADD(diff, start, shader_end_time); + add->force_writemask_all = true; + emit(add); /* If there were no instructions between the two timestamp gets, the diff * is 2 cycles. Remove that overhead, so I can forget about that when * trying to determine the time taken for single instructions. */ - emit(ADD(diff, diff, fs_reg(-2u))); + add = ADD(diff, diff, fs_reg(-2u)); + add->force_writemask_all = true; + emit(add); emit_shader_time_write(type, diff); emit_shader_time_write(written_type, fs_reg(1u)); -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/5] i965/fs: Make emit_shader_time_write return rather than emit.
Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)). The advantage is that we can also insert a shader time write at an arbitrary location in the instruction stream, rather than being restricted to emitting at the end. Signed-off-by: Kenneth Graunke Reviewed-by: Topi Pohjolainen --- src/mesa/drivers/dri/i965/brw_fs.cpp | 15 +++ src/mesa/drivers/dri/i965/brw_fs.h | 3 +-- 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index db8affa..682841b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -778,16 +778,15 @@ fs_visitor::emit_shader_time_end() add->force_writemask_all = true; emit(add); - emit_shader_time_write(type, diff); - emit_shader_time_write(written_type, fs_reg(1u)); + emit(SHADER_TIME_ADD(type, diff)); + emit(SHADER_TIME_ADD(written_type, fs_reg(1u))); emit(BRW_OPCODE_ELSE); - emit_shader_time_write(reset_type, fs_reg(1u)); + emit(SHADER_TIME_ADD(reset_type, fs_reg(1u))); emit(BRW_OPCODE_ENDIF); } -void -fs_visitor::emit_shader_time_write(enum shader_time_shader_type type, - fs_reg value) +fs_inst * +fs_visitor::SHADER_TIME_ADD(enum shader_time_shader_type type, fs_reg value) { int shader_time_index = brw_get_shader_time_index(brw, shader_prog, prog, type); @@ -799,8 +798,8 @@ fs_visitor::emit_shader_time_write(enum shader_time_shader_type type, else payload = vgrf(glsl_type::uint_type); - emit(new(mem_ctx) fs_inst(SHADER_OPCODE_SHADER_TIME_ADD, - fs_reg(), payload, offset, value)); + return new(mem_ctx) fs_inst(SHADER_OPCODE_SHADER_TIME_ADD, + fs_reg(), payload, offset, value); } void diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 70098d8..be1c8a1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -378,8 +378,7 @@ public: void emit_shader_time_begin(); void emit_shader_time_end(); - void emit_shader_time_write(enum shader_time_shader_type type, - fs_reg value); + fs_inst *SHADER_TIME_ADD(enum shader_time_shader_type type, fs_reg value); void emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, fs_reg dst, fs_reg offset, fs_reg src0, -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] i965/fs: Make get_timestamp() pass back the MOV rather than emitting it.
This makes another part of the INTEL_DEBUG=shader_time code emittable at arbitrary locations, rather than just at the end of the instruction stream. v2: Don't lose smear! Caught by Topi Pohjolainen. v3: Don't set smear on the destination of the MOV. Thanks Topi! Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs.cpp | 19 +++ src/mesa/drivers/dri/i965/brw_fs.h | 2 +- 2 files changed, 16 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 682841b..8f11600 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -677,8 +677,14 @@ fs_visitor::type_size(const struct glsl_type *type) return 0; } +/** + * Create a MOV to read the timestamp register. + * + * The caller is responsible for emitting the MOV. The return value is + * the destination of the MOV, with extra parameters set. + */ fs_reg -fs_visitor::get_timestamp() +fs_visitor::get_timestamp(fs_inst **out_mov) { assert(brw->gen >= 7); @@ -689,7 +695,7 @@ fs_visitor::get_timestamp() fs_reg dst = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 4); - fs_inst *mov = emit(MOV(dst, ts)); + fs_inst *mov = MOV(dst, ts); /* We want to read the 3 fields we care about even if it's not enabled in * the dispatch. */ @@ -707,6 +713,7 @@ fs_visitor::get_timestamp() */ dst.set_smear(0); + *out_mov = mov; return dst; } @@ -714,7 +721,9 @@ void fs_visitor::emit_shader_time_begin() { current_annotation = "shader time start"; - shader_start_time = get_timestamp(); + fs_inst *mov; + shader_start_time = get_timestamp(&mov); + emit(mov); } void @@ -750,7 +759,9 @@ fs_visitor::emit_shader_time_end() unreachable("fs_visitor::emit_shader_time_end missing code"); } - fs_reg shader_end_time = get_timestamp(); + fs_inst *tm_read; + fs_reg shader_end_time = get_timestamp(&tm_read); + emit(tm_read); /* Check that there weren't any timestamp reset events (assuming these * were the only two timestamp reads that happened). diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index be1c8a1..f68a476 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -399,7 +399,7 @@ public: void resolve_ud_negate(fs_reg *reg); void resolve_bool_comparison(ir_rvalue *rvalue, fs_reg *reg); - fs_reg get_timestamp(); + fs_reg get_timestamp(fs_inst **out_mov); struct brw_reg interp_reg(int location, int channel); void setup_uniform_values(ir_variable *ir); -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] INTEL_DEBUG=shader_time scalar backend fixes
Welcome to the rabbit trail. In order to fix Football Manager, I had to rework INTEL_DEBUG=shader_time in the FS backend. While doing that, I hit two assertion failures. After fixing that, I compared numbers. I noticed that VS again accounted for 0 cycles. Matt - could you take a look at brw_vec4_dead_code_elimination.cpp? Since 5df88c2096281f (the rewrite to use live intervals), it's again completely eating the atomic buffer offset initialization, resulting in us using garbage data in the first register (of an mlen 2 message). (Sounds like the same bug I fixed in d0575d98fc595dcc17706dc73d1eb4.) With these patches, and the new vec4 DCE pass reverted, 10.3 vs my branch produce basically the same numbers on an openarena timedemo. Available in the 'football-v3' branch of my tree. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/5] i965/fs: Make emit_shader_time_end() insert before EOT.
Previously, we emitted the shader-time epilogue from emit_fb_writes(), during the middle of looping through color regions (or emit_urb_writes for the VS). This is duplicated several times and rather awkward. I need to fix a bug in our FB write handling, and it will be a lot easier if we move emit_shader_time_end() out of there. Now, we simply emit FB writes/URB writes, and subsequently have emit_shader_time_end() insert instructions before the final SEND with EOT. Not only is this simpler, it's actually a slight improvement: we now include the MOVs to set up the final FB write payload in our shader-time measurements. Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses send-from-GRF. (In the past, we might have hit trouble where both attempt to use MRFs for messages; that's not a problem now.) v2: Rebase on v3 of the previous patch and other shader_time fixes. Signed-off-by: Kenneth Graunke Reviewed-by: Topi Pohjolainen [v1] --- src/mesa/drivers/dri/i965/brw_fs.cpp | 28 ++-- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 13 - 2 files changed, 18 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 8f11600..af48f4b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -759,19 +759,24 @@ fs_visitor::emit_shader_time_end() unreachable("fs_visitor::emit_shader_time_end missing code"); } + /* Insert our code just before the final SEND with EOT. */ + exec_node *end = this->instructions.get_tail(); + assert(end && ((fs_inst *) end)->eot); + fs_inst *tm_read; fs_reg shader_end_time = get_timestamp(&tm_read); - emit(tm_read); + end->insert_before(tm_read); /* Check that there weren't any timestamp reset events (assuming these * were the only two timestamp reads that happened). */ fs_reg reset = shader_end_time; reset.set_smear(2); - fs_inst *test = emit(AND(reg_null_d, reset, fs_reg(1u))); + fs_inst *test = AND(reg_null_d, reset, fs_reg(1u)); test->conditional_mod = BRW_CONDITIONAL_Z; test->force_writemask_all = true; - emit(IF(BRW_PREDICATE_NORMAL)); + end->insert_before(test); + end->insert_before(IF(BRW_PREDICATE_NORMAL)); fs_reg start = shader_start_time; start.negate = true; @@ -779,7 +784,7 @@ fs_visitor::emit_shader_time_end() diff.set_smear(0); fs_inst *add = ADD(diff, start, shader_end_time); add->force_writemask_all = true; - emit(add); + end->insert_before(add); /* If there were no instructions between the two timestamp gets, the diff * is 2 cycles. Remove that overhead, so I can forget about that when @@ -787,13 +792,13 @@ fs_visitor::emit_shader_time_end() */ add = ADD(diff, diff, fs_reg(-2u)); add->force_writemask_all = true; - emit(add); + end->insert_before(add); - emit(SHADER_TIME_ADD(type, diff)); - emit(SHADER_TIME_ADD(written_type, fs_reg(1u))); - emit(BRW_OPCODE_ELSE); - emit(SHADER_TIME_ADD(reset_type, fs_reg(1u))); - emit(BRW_OPCODE_ENDIF); + end->insert_before(SHADER_TIME_ADD(type, diff)); + end->insert_before(SHADER_TIME_ADD(written_type, fs_reg(1u))); + end->insert_before(new(mem_ctx) fs_inst(BRW_OPCODE_ELSE, dispatch_width)); + end->insert_before(SHADER_TIME_ADD(reset_type, fs_reg(1u))); + end->insert_before(new(mem_ctx) fs_inst(BRW_OPCODE_ENDIF, dispatch_width)); } fs_inst * @@ -3915,6 +3920,9 @@ fs_visitor::run_fs() emit_fb_writes(); + if (INTEL_DEBUG & DEBUG_SHADER_TIME) + emit_shader_time_end(); + calculate_cfg(); optimize(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 6b48f70..6766583 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -3645,9 +3645,6 @@ fs_visitor::emit_fb_writes() fs_inst *inst; if (do_dual_src) { - if (INTEL_DEBUG & DEBUG_SHADER_TIME) - emit_shader_time_end(); - this->current_annotation = ralloc_asprintf(this->mem_ctx, "FB dual-source write"); inst = emit_single_fb_write(this->outputs[0], this->dual_src_output, @@ -3663,19 +3660,12 @@ fs_visitor::emit_fb_writes() if (brw->gen >= 6 && key->replicate_alpha && target != 0) src0_alpha = offset(outputs[0], 3); - if (target == key->nr_color_regions - 1 && - (INTEL_DEBUG & DEBUG_SHADER_TIME)) -emit_shader_time_end(); - inst = emit_single_fb_write(this->outputs[target], reg_undef, src0_alpha, this->output_components[target]);
[Mesa-dev] [PATCH 2/5] i965/fs: Set smear on shader_time diff register.
The ADD(diff, diff, fs_reg(-2u)) instruction reads diff, which is a width 1 register. We need to read it as <0,1,0> with a subreg of 0, which is what smear accomplishes. Fixes assertion: brw_eu_emit.c:285: validate_reg: Assertion `hstride == 0' failed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 Signed-off-by: Kenneth Graunke Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/dri/i965/brw_fs.cpp | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 45a5793..db8affa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -765,6 +765,7 @@ fs_visitor::emit_shader_time_end() fs_reg start = shader_start_time; start.negate = true; fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1); + diff.set_smear(0); fs_inst *add = ADD(diff, start, shader_end_time); add->force_writemask_all = true; emit(add); -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] i965/fs: Set force_writemask_all on shader_time instructions.
On Sunday, March 08, 2015 01:32:33 PM Matt Turner wrote: > On Sun, Mar 8, 2015 at 1:08 AM, Kenneth Graunke wrote: > > These computations don't have anything to do with the currently > > executing channels, so they should use force_writemask_all. > > > > This fixes assert failures. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974 > > Signed-off-by: Kenneth Graunke > > Cc: mesa-sta...@lists.freedesktop.org > > --- > > src/mesa/drivers/dri/i965/brw_fs.cpp | 9 +++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > > b/src/mesa/drivers/dri/i965/brw_fs.cpp > > index d6acc23..45a5793 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > > @@ -759,18 +759,23 @@ fs_visitor::emit_shader_time_end() > > reset.set_smear(2); > > fs_inst *test = emit(AND(reg_null_d, reset, fs_reg(1u))); > > test->conditional_mod = BRW_CONDITIONAL_Z; > > + test->force_writemask_all = true; > > emit(IF(BRW_PREDICATE_NORMAL)); > > > > fs_reg start = shader_start_time; > > start.negate = true; > > fs_reg diff = fs_reg(GRF, alloc.allocate(1), BRW_REGISTER_TYPE_UD, 1); > > - emit(ADD(diff, start, shader_end_time)); > > + fs_inst *add = ADD(diff, start, shader_end_time); > > + add->force_writemask_all = true; > > + emit(add); > > Emit returns a pointer to the instruction it emitted, so you can cut > one line by wrapping the ADD with emit(). > > Either way's fine. Yeah, I didn't bother because patch 5 changes it to: end->insert_before(add) and that doesn't return a pointer to the inserted instruction. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] meta: Plug memory leak in blit shader creation
On Friday, March 06, 2015 06:22:00 PM Ben Widawsky wrote: > It looks like this has existed since: > commit f5a477ab76b6e0b268387699cd2253a43db0dfae > Author: Ian Romanick > Date: Mon Dec 16 11:54:08 2013 -0800 > > meta: Refactor shader generation code out of mipmap generation path > > Valgrind was complaining on the piglit test: > fbo-generatemipmap-formats GL_ARB_texture_float -auto -fbo > > Cc: Ian Romanick > Cc: Brian Paul > Cc: Eric Anholt > Reported-by: Mark Janes (Jenkins) > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/common/meta.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c > index fdc4cf1..2c1abe3 100644 > --- a/src/mesa/drivers/common/meta.c > +++ b/src/mesa/drivers/common/meta.c > @@ -270,6 +270,7 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx, > > if (shader->shader_prog != 0) { >_mesa_UseProgram(shader->shader_prog); > + ralloc_free(mem_ctx); >return; > } Hi Ben, You're right - in the case where the shader already exists (due to us hitting this path once before), we do create a ralloc context and fail to free it. We don't actually need one, since we're not constructing shader code. I think it would be better to pull 'mem_ctx = ralloc_context(NULL)' out of the variable declaration, and put it after this early return. The early return is the common case, and it'd be nice to avoid allocating and freeing pointless heap memory for no reason. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/9] i965/fs: Store a pointer to brw_sampler_prog_key_data in the visitor.
The NIR backend hardcodes brw_wm_prog_key at the moment, which won't work when we support scalar VS. We could use get_tex(), but it's a static method. I was going to promote it to fs_visitor, but then realized that both parameters (stage and key) are already members. It then occured to me that we could just set up a pointer in the constructor, and skip having a function altogether. This patch also converts all existing users to use key_tex. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs.h | 2 + src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 +- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 55 3 files changed, 27 insertions(+), 33 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index ee6ba98..7f4916a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -418,6 +418,8 @@ public: void visit_atomic_counter_intrinsic(ir_call *ir); const void *const key; + struct brw_sampler_prog_key_data *key_tex; + struct brw_stage_prog_data *prog_data; unsigned int sanity_param_count; diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 3bb6806..249e59c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1612,7 +1612,6 @@ fs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) void fs_visitor::nir_emit_texture(nir_tex_instr *instr) { - brw_wm_prog_key *key = (brw_wm_prog_key*) this->key; unsigned sampler = instr->sampler_index; fs_reg sampler_reg(sampler); @@ -1709,7 +1708,7 @@ fs_visitor::nir_emit_texture(nir_tex_instr *instr) } if (instr->op == nir_texop_txf_ms) { - if (brw->gen >= 7 && key->tex.compressed_multisample_layout_mask & (1<gen >= 7 && key_tex->compressed_multisample_layout_mask & (1 << sampler)) mcs = emit_mcs_fetch(coordinate, instr->coord_components, sampler_reg); else mcs = fs_reg(0u); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index e413ae3..2adb299 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1860,19 +1860,6 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, return inst; } -static struct brw_sampler_prog_key_data * -get_tex(gl_shader_stage stage, const void *key) -{ - switch (stage) { - case MESA_SHADER_FRAGMENT: - return &((brw_wm_prog_key*) key)->tex; - case MESA_SHADER_VERTEX: - return &((brw_vue_prog_key*) key)->tex; - default: - unreachable("unhandled shader stage"); - } -} - fs_reg fs_visitor::rescale_texcoord(fs_reg coordinate, int coord_components, bool is_rect, uint32_t sampler, int texunit) @@ -1880,7 +1867,6 @@ fs_visitor::rescale_texcoord(fs_reg coordinate, int coord_components, fs_inst *inst = NULL; bool needs_gl_clamp = true; fs_reg scale_x, scale_y; - struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key); /* The 965 requires the EU to do the normalization of GL rectangle * texture coordinates. We use the program parameter state @@ -1888,8 +1874,8 @@ fs_visitor::rescale_texcoord(fs_reg coordinate, int coord_components, */ if (is_rect && (brw->gen < 6 || -(brw->gen >= 6 && (tex->gl_clamp_mask[0] & (1 << sampler) || - tex->gl_clamp_mask[1] & (1 << sampler) { +(brw->gen >= 6 && (key_tex->gl_clamp_mask[0] & (1 << sampler) || + key_tex->gl_clamp_mask[1] & (1 << sampler) { struct gl_program_parameter_list *params = prog->Parameters; int tokens[STATE_LENGTH] = { STATE_INTERNAL, @@ -1950,7 +1936,7 @@ fs_visitor::rescale_texcoord(fs_reg coordinate, int coord_components, needs_gl_clamp = false; for (int i = 0; i < 2; i++) { -if (tex->gl_clamp_mask[i] & (1 << sampler)) { +if (key_tex->gl_clamp_mask[i] & (1 << sampler)) { fs_reg chan = coordinate; chan = offset(chan, i); @@ -1975,7 +1961,7 @@ fs_visitor::rescale_texcoord(fs_reg coordinate, int coord_components, if (coord_components > 0 && needs_gl_clamp) { for (int i = 0; i < MIN2(coord_components, 3); i++) { -if (tex->gl_clamp_mask[i] & (1 << sampler)) { +if (key_tex->gl_clamp_mask[i] & (1 << sampler)) { fs_reg chan = coordinate; chan = offset(chan, i); @@ -2033,14 +2019,13 @@ fs_visitor::emit_texture(ir_texture_opcode op, uint32_t sampler,
[Mesa-dev] [PATCH 6/9] i965/fs: Refactor fs_visitor::nir_setup_inputs().
No functional change. In preparation for supporting vertex shaders, this adds a switch statement on shader stage (since vertex attributes and fragment shader varyings will need different handling). It also renames "varying" to "input", to be more general. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 29 +++-- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index d700523..3baafc4 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -199,18 +199,27 @@ fs_visitor::nir_setup_inputs(nir_shader *shader) struct hash_entry *entry; hash_table_foreach(shader->inputs, entry) { nir_variable *var = (nir_variable *) entry->data; - fs_reg varying = offset(nir_inputs, var->data.driver_location); + fs_reg input = offset(nir_inputs, var->data.driver_location); fs_reg reg; - if (var->data.location == VARYING_SLOT_POS) { - reg = *emit_fragcoord_interpolation(var->data.pixel_center_integer, - var->data.origin_upper_left); - emit_percomp(MOV(varying, reg), 0xF); - } else { - emit_general_interpolation(varying, var->name, var->type, -(glsl_interp_qualifier) var->data.interpolation, -var->data.location, var->data.centroid, -var->data.sample); + switch (stage) { + case MESA_SHADER_VERTEX: + case MESA_SHADER_GEOMETRY: + case MESA_SHADER_COMPUTE: + unreachable("fs_visitor not used for these stages yet."); + break; + case MESA_SHADER_FRAGMENT: + if (var->data.location == VARYING_SLOT_POS) { +reg = *emit_fragcoord_interpolation(var->data.pixel_center_integer, +var->data.origin_upper_left); +emit_percomp(MOV(input, reg), 0xF); + } else { +emit_general_interpolation(input, var->name, var->type, + (glsl_interp_qualifier) var->data.interpolation, + var->data.location, var->data.centroid, + var->data.sample); + } + break; } } } -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 8/9] i965/fs: Add VS output support to nir_setup_outputs().
Adapted from fs_visitor::visit(ir_variable *). Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 1734d03..d03a337 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -255,7 +255,17 @@ fs_visitor::nir_setup_outputs(nir_shader *shader) nir_variable *var = (nir_variable *) entry->data; fs_reg reg = offset(nir_outputs, var->data.driver_location); - if (var->data.index > 0) { + int vector_elements = + var->type->is_array() ? var->type->fields.array->vector_elements + : var->type->vector_elements; + + if (stage == MESA_SHADER_VERTEX) { + for (int i = 0; i < ALIGN(type_size(var->type), 4) / 4; i++) { +int output = var->data.location + i; +this->outputs[output] = offset(reg, 4 * i); +this->output_components[output] = vector_elements; + } + } else if (var->data.index > 0) { assert(var->data.location == FRAG_RESULT_DATA0); assert(var->data.index == 1); this->dual_src_output = reg; @@ -275,10 +285,6 @@ fs_visitor::nir_setup_outputs(nir_shader *shader) assert(var->data.location >= FRAG_RESULT_DATA0 && var->data.location < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS); - int vector_elements = -var->type->is_array() ? var->type->fields.array->vector_elements - : var->type->vector_elements; - /* General color output. */ for (unsigned int i = 0; i < MAX2(1, var->type->length); i++) { int output = var->data.location - FRAG_RESULT_DATA0 + i; -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/9] i965/nir: Optimize after nir_lower_var_copies().
Array variable copy splitting generates a bunch of stuff we want to clean up before proceeding. Signed-off-by: Kenneth Graunke Cc: Jason Ekstrand --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 249e59c..fbdfc22 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -102,6 +102,9 @@ fs_visitor::emit_nir_code() nir_lower_var_copies(nir); nir_validate_shader(nir); + /* Get rid of split copies */ + nir_optimize(nir); + nir_lower_io(nir); nir_validate_shader(nir); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 9/9] i965: Use NIR for scalar VS when INTEL_USE_NIR is set.
Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs.cpp | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 428234f..ee5bc4a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3818,12 +3818,17 @@ fs_visitor::run_vs() if (INTEL_DEBUG & DEBUG_SHADER_TIME) emit_shader_time_begin(); - foreach_in_list(ir_instruction, ir, shader->base.ir) { - base_ir = ir; - this->result = reg_undef; - ir->accept(this); + if (getenv("INTEL_USE_NIR") != NULL) { + emit_nir_code(); + } else { + foreach_in_list(ir_instruction, ir, shader->base.ir) { + base_ir = ir; + this->result = reg_undef; + ir->accept(this); + } + base_ir = NULL; } - base_ir = NULL; + if (failed) return false; -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/9] nir: Add intrinsics for SYSTEM_VALUE_BASE_VERTEX and VERTEX_ID_ZERO_BASE
Ian and I added these around the time Connor was developing NIR. Now that both exist, we should make them work together! Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir_intrinsics.h | 2 ++ src/glsl/nir/nir_lower_system_values.c | 6 ++ 2 files changed, 8 insertions(+) diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index 3bf102f..8e28765 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -95,6 +95,8 @@ ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE) SYSTEM_VALUE(front_face, 1) SYSTEM_VALUE(vertex_id, 1) +SYSTEM_VALUE(vertex_id_zero_base, 1) +SYSTEM_VALUE(base_vertex, 1) SYSTEM_VALUE(instance_id, 1) SYSTEM_VALUE(sample_id, 1) SYSTEM_VALUE(sample_pos, 2) diff --git a/src/glsl/nir/nir_lower_system_values.c b/src/glsl/nir/nir_lower_system_values.c index 328d4f1..a6eec65 100644 --- a/src/glsl/nir/nir_lower_system_values.c +++ b/src/glsl/nir/nir_lower_system_values.c @@ -49,6 +49,12 @@ convert_instr(nir_intrinsic_instr *instr) case SYSTEM_VALUE_VERTEX_ID: op = nir_intrinsic_load_vertex_id; break; + case SYSTEM_VALUE_VERTEX_ID_ZERO_BASE: + op = nir_intrinsic_load_vertex_id_zero_base; + break; + case SYSTEM_VALUE_BASE_VERTEX: + op = nir_intrinsic_load_base_vertex; + break; case SYSTEM_VALUE_INSTANCE_ID: op = nir_intrinsic_load_instance_id; break; -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/9] i965/nir: Lower to registers a bit later.
We can't safely call nir_optimize() with register present, since several passes called in the loop can't handle registers, and will fail asserts. Notably, nir_lower_vec_alus() and nir_opt_algebraic() really don't want registers. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index fbdfc22..c5ed55c 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -108,9 +108,6 @@ fs_visitor::emit_nir_code() nir_lower_io(nir); nir_validate_shader(nir); - nir_lower_locals_to_regs(nir); - nir_validate_shader(nir); - nir_remove_dead_variables(nir); nir_validate_shader(nir); @@ -125,6 +122,9 @@ fs_visitor::emit_nir_code() nir_optimize(nir); + nir_lower_locals_to_regs(nir); + nir_validate_shader(nir); + nir_lower_to_source_mods(nir); nir_validate_shader(nir); nir_copy_prop(nir); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/9] i965/fs: Handle VS inputs in the NIR backend.
Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 3baafc4..1734d03 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -199,11 +199,32 @@ fs_visitor::nir_setup_inputs(nir_shader *shader) struct hash_entry *entry; hash_table_foreach(shader->inputs, entry) { nir_variable *var = (nir_variable *) entry->data; + enum brw_reg_type type = brw_type_for_base_type(var->type); fs_reg input = offset(nir_inputs, var->data.driver_location); fs_reg reg; switch (stage) { - case MESA_SHADER_VERTEX: + case MESA_SHADER_VERTEX: { + /* Our ATTR file is indexed by VERT_ATTRIB_*, which is the value + * stored in nir_variable::location. + * + * However, NIR's load_input intrinsics use a different index - an + * offset into a single contiguous array containing all inputs. + * This index corresponds to the nir_variable::driver_location field. + * + * So, we need to copy from fs_reg(ATTR, var->location) to + * offset(nir_inputs, var->data.driver_location). + */ + unsigned components = var->type->without_array()->components(); + unsigned array_length = var->type->is_array() ? var->type->length : 1; + for (unsigned i = 0; i < array_length; i++) { +for (unsigned j = 0; j < components; j++) { + emit(MOV(retype(offset(input, components * i + j), type), +offset(fs_reg(ATTR, var->data.location + i, type), j))); +} + } + break; + } case MESA_SHADER_GEOMETRY: case MESA_SHADER_COMPUTE: unreachable("fs_visitor not used for these stages yet."); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/9] i965: Implement NIR intrinsics for loading VS system values.
Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 51 1 file changed, 51 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index c5ed55c..d700523 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -363,6 +363,30 @@ emit_system_values_block(nir_block *block, void *void_visitor) nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); switch (intrin->intrinsic) { + case nir_intrinsic_load_vertex_id: + unreachable("should be lowered by lower_vertex_id()."); + + case nir_intrinsic_load_vertex_id_zero_base: + assert(v->stage == MESA_SHADER_VERTEX); + reg = &v->nir_system_values[SYSTEM_VALUE_VERTEX_ID_ZERO_BASE]; + if (reg->file == BAD_FILE) +*reg = *v->emit_vs_system_value(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE); + break; + + case nir_intrinsic_load_base_vertex: + assert(v->stage == MESA_SHADER_VERTEX); + reg = &v->nir_system_values[SYSTEM_VALUE_BASE_VERTEX]; + if (reg->file == BAD_FILE) +*reg = *v->emit_vs_system_value(SYSTEM_VALUE_BASE_VERTEX); + break; + + case nir_intrinsic_load_instance_id: + assert(v->stage == MESA_SHADER_VERTEX); + reg = &v->nir_system_values[SYSTEM_VALUE_INSTANCE_ID]; + if (reg->file == BAD_FILE) +*reg = *v->emit_vs_system_value(SYSTEM_VALUE_INSTANCE_ID); + break; + case nir_intrinsic_load_sample_pos: assert(v->stage == MESA_SHADER_FRAGMENT); reg = &v->nir_system_values[SYSTEM_VALUE_SAMPLE_POS]; @@ -1344,6 +1368,33 @@ fs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) *emit_frontfacing_interpolation())); break; + case nir_intrinsic_load_vertex_id: + unreachable("should be lowered by lower_vertex_id()"); + + case nir_intrinsic_load_vertex_id_zero_base: { + fs_reg vertex_id = nir_system_values[SYSTEM_VALUE_VERTEX_ID_ZERO_BASE]; + assert(vertex_id.file != BAD_FILE); + dest.type = vertex_id.type; + emit(MOV(dest, vertex_id)); + break; + } + + case nir_intrinsic_load_base_vertex: { + fs_reg base_vertex = nir_system_values[SYSTEM_VALUE_BASE_VERTEX]; + assert(base_vertex.file != BAD_FILE); + dest.type = base_vertex.type; + emit(MOV(dest, base_vertex)); + break; + } + + case nir_intrinsic_load_instance_id: { + fs_reg instance_id = nir_system_values[SYSTEM_VALUE_INSTANCE_ID]; + assert(instance_id.file != BAD_FILE); + dest.type = instance_id.type; + emit(MOV(dest, instance_id)); + break; + } + case nir_intrinsic_load_sample_mask_in: { fs_reg sample_mask_in = nir_system_values[SYSTEM_VALUE_SAMPLE_MASK_IN]; assert(sample_mask_in.file != BAD_FILE); -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] [v2] meta: Plug memory leak
On Monday, March 09, 2015 11:44:18 AM Ben Widawsky wrote: > It looks like this has existed since > commit f5a477ab76b6e0b268387699cd2253a43db0dfae > Author: Ian Romanick > Date: Mon Dec 16 11:54:08 2013 -0800 > > meta: Refactor shader generation code out of mipmap generation path > > Valgrind was complaining on fbo-generatemipmap-formats > > v2: Instead, do the allocation after the early return block (v2) > > Cc: Ian Romanick > Cc: Brian Paul > Cc: Eric Anholt > Cc: Kenneth Graunke > Signed-off-by: Ben Widawsky > --- > > Thanks Ken. I wasn't sure if this path was common or not, and I had opted for > the standard, define variables at the top, style. If it occurs more than > infrequently, then I like this solution better. > > (FYI: Jenkins results, http://otc-gfxtest-01.jf.intel.com/job/bwidawsk/100/) > --- > src/mesa/drivers/common/meta.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c > index fdc4cf1..fa800ec 100644 > --- a/src/mesa/drivers/common/meta.c > +++ b/src/mesa/drivers/common/meta.c > @@ -247,7 +247,6 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx, > struct blit_shader_table *table) > { > char *vs_source, *fs_source; > - void *const mem_ctx = ralloc_context(NULL); > struct blit_shader *shader = choose_blit_shader(target, table); > const char *vs_input, *vs_output, *fs_input, *vs_preprocess, > *fs_preprocess; > > @@ -273,6 +272,8 @@ _mesa_meta_setup_blit_shader(struct gl_context *ctx, >return; > } > > + void *const mem_ctx = ralloc_context(NULL); > + > vs_source = ralloc_asprintf(mem_ctx, > "%s\n" > "%s vec2 position;\n" > I'm not clear whether we can get away with C99 mixed declarations and code yet (in the past, it's been a problem for MSVC). Assuming you move the declaration back to the top, and leave the initialization here, this would get a: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] nir: Fix non-determinism in nir_lower_vars_to_ssa().
Previously, we stored derefs in a hash table, using the malloc'd pointer as the key. Then, we walked through the hash table and generated code, based on the order of the hash table's elements. Memory addresses returned by malloc are pretty much random, which meant that the hash was random, and the hash table's elements would be walked in some random order. This led to successive compiles of the same shader using different variable names and slightly different orderings of phi-nodes. Code could not be diff'd, and the final assembly would sometimes change slightly too. It turns out the only point of the hash table was to avoid inserting the same node multiple times for different dereferences. We never actually searched the hash table! This patch uses an intrusive linked list instead. Since exec_list uses head and tail sentinels, checking prev or next against NULL will tell us whether the node is already in the list. Pair programming with Jason Ekstrand. Signed-off-by: Jason Ekstrand Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir_lower_vars_to_ssa.c | 123 --- 1 file changed, 26 insertions(+), 97 deletions(-) diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c b/src/glsl/nir/nir_lower_vars_to_ssa.c index 9e9a418..86e6ab4 100644 --- a/src/glsl/nir/nir_lower_vars_to_ssa.c +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c @@ -35,6 +35,13 @@ struct deref_node { bool lower_to_ssa; + /* Only valid for things that end up in the direct list. +* Note that multiple nir_deref_vars may correspond to this node, but they +* will all be equivalent, so any is as good as the other. +*/ + nir_deref_var *deref; + struct exec_node direct_derefs_link; + struct set *loads; struct set *stores; struct set *copies; @@ -69,7 +76,7 @@ struct lower_variables_state { * wildcards and no indirects, these are precisely the derefs that we * can actually consider lowering. */ - struct hash_table *direct_deref_nodes; + struct exec_list direct_deref_nodes; /* Controls whether get_deref_node will add variables to the * direct_deref_nodes table. This is turned on when we are initially @@ -83,88 +90,6 @@ struct lower_variables_state { struct hash_table *phi_table; }; -/* The following two functions implement a hash and equality check for - * variable dreferences. When the hash or equality function encounters an - * array, all indirects are treated as equal and are never equal to a - * direct dereference or a wildcard. - */ -static uint32_t -hash_deref(const void *void_deref) -{ - uint32_t hash = _mesa_fnv32_1a_offset_bias; - - const nir_deref_var *deref_var = void_deref; - hash = _mesa_fnv32_1a_accumulate(hash, deref_var->var); - - for (const nir_deref *deref = deref_var->deref.child; -deref; deref = deref->child) { - switch (deref->deref_type) { - case nir_deref_type_array: { - nir_deref_array *deref_array = nir_deref_as_array(deref); - - hash = _mesa_fnv32_1a_accumulate(hash, deref_array->deref_array_type); - - if (deref_array->deref_array_type == nir_deref_array_type_direct) -hash = _mesa_fnv32_1a_accumulate(hash, deref_array->base_offset); - break; - } - case nir_deref_type_struct: { - nir_deref_struct *deref_struct = nir_deref_as_struct(deref); - hash = _mesa_fnv32_1a_accumulate(hash, deref_struct->index); - break; - } - default: - assert("Invalid deref chain"); - } - } - - return hash; -} - -static bool -derefs_equal(const void *void_a, const void *void_b) -{ - const nir_deref_var *a_var = void_a; - const nir_deref_var *b_var = void_b; - - if (a_var->var != b_var->var) - return false; - - for (const nir_deref *a = a_var->deref.child, *b = b_var->deref.child; -a != NULL; a = a->child, b = b->child) { - if (a->deref_type != b->deref_type) - return false; - - switch (a->deref_type) { - case nir_deref_type_array: { - nir_deref_array *a_arr = nir_deref_as_array(a); - nir_deref_array *b_arr = nir_deref_as_array(b); - - if (a_arr->deref_array_type != b_arr->deref_array_type) -return false; - - if (a_arr->deref_array_type == nir_deref_array_type_direct && - a_arr->base_offset != b_arr->base_offset) -return false; - break; - } - case nir_deref_type_struct: - if (nir_deref_as_struct(a)->index != nir_deref_as_struct(b)->index) -return false; - break; - default: - assert("Invalid deref chain"); - return false; - } - - assert((a->child == NULL) == (b->child == NULL)); - if((a->child == NULL) != (b->child == NULL)) - return false; - } - - return true; -} - static int typ
[Mesa-dev] [PATCH 1/2] util: Fix foreach_list_typed_safe when exec_node is not at offset 0.
From: Jason Ekstrand __next and __prev are pointers to the structure containing the exec_node link, not the embedded exec_node. NULL checks would fail unless the embedded exec_node happened to be at offset 0 in the parent struct. Signed-off-by: Jason Ekstrand Reviewed-by: Kenneth Graunke --- src/glsl/list.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/glsl/list.h b/src/glsl/list.h index ddb98f7..680e963 100644 --- a/src/glsl/list.h +++ b/src/glsl/list.h @@ -684,7 +684,7 @@ inline void exec_node::insert_before(exec_list *before) exec_node_data(__type, (__list)->head, __field),\ * __next = \ exec_node_data(__type, (__node)->__field.next, __field);\ -__next != NULL;\ +&__next->__field != NULL; \ __node = __next, __next = \ exec_node_data(__type, (__next)->__field.next, __field)) @@ -693,7 +693,7 @@ inline void exec_node::insert_before(exec_list *before) exec_node_data(__type, (__list)->tail_pred, __field), \ * __prev = \ exec_node_data(__type, (__node)->__field.prev, __field);\ -__prev != NULL;\ +&__prev->__field != NULL; \ __node = __prev, __prev = \ exec_node_data(__type, (__prev)->__field.prev, __field)) -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Allow copy propagation on ATTR file registers.
This especially helps with NIR because we currently emit MOVs at the top of the shader to copy from various ATTR registers to a giant VGRF array of all inputs. (This could potentially be done better, but since there's only ever one write to each register, it should be trivial to copy propagate away...) With NIR - only vertex shaders: total instructions in shared programs: 3083505 -> 2868128 (-6.98%) instructions in affected programs: 2977237 -> 2761860 (-7.23%) helped:18653 Without NIR - only vertex shaders: total instructions in shared programs: 2724564 -> 2708831 (-0.58%) instructions in affected programs: 593696 -> 577963 (-2.65%) helped:3134 Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_fs.cpp | 1 + src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 5 - 2 files changed, 5 insertions(+), 1 deletion(-) No apparent Piglit regressions. Please scrutinize though, I spent very little actual thought on this, so I may have missed something stupid. diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 9a415ff..6fa90d2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3235,6 +3235,7 @@ fs_visitor::lower_load_payload() * destination may be used. */ assert(inst->src[i].file == IMM || + inst->src[i].file == ATTR || inst->src[i].file == UNIFORM); mov->force_writemask_all = true; } diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index 764741d..56f4fa1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -283,7 +283,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry) if (entry->src.file == IMM) return false; - assert(entry->src.file == GRF || entry->src.file == UNIFORM); + assert(entry->src.file == GRF || entry->src.file == UNIFORM || + entry->src.file == ATTR); if (entry->opcode == SHADER_OPCODE_LOAD_PAYLOAD && inst->opcode == SHADER_OPCODE_LOAD_PAYLOAD) @@ -382,6 +383,7 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry) inst->src[arg].reg_offset = entry->src.reg_offset; inst->src[arg].subreg_offset = entry->src.subreg_offset; break; + case ATTR: case GRF: { assert(entry->src.width % inst->src[arg].width == 0); @@ -598,6 +600,7 @@ can_propagate_from(fs_inst *inst) ((inst->src[0].file == GRF && (inst->src[0].reg != inst->dst.reg || inst->src[0].reg_offset != inst->dst.reg_offset)) || +inst->src[0].file == ATTR || inst->src[0].file == UNIFORM || inst->src[0].file == IMM) && inst->src[0].type == inst->dst.type && -- 2.2.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-dev, 1/2] i965: fix up asserts in brw_inst_set_jip()
On Thu, Jan 26, 2017 at 01:50:41PM +1100, Timothy Arceri wrote: > We are casting from a signed 32bit int to an unsigned 16bit int > so shift 15 bits rather than 16. > --- > src/mesa/drivers/dri/i965/brw_inst.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_inst.h > b/src/mesa/drivers/dri/i965/brw_inst.h > index 13fce97..9a777e5 100644 > --- a/src/mesa/drivers/dri/i965/brw_inst.h > +++ b/src/mesa/drivers/dri/i965/brw_inst.h > @@ -283,8 +283,8 @@ brw_inst_set_jip(const struct gen_device_info *devinfo, > if (devinfo->gen >= 8) { >brw_inst_set_bits(inst, 127, 96, (uint32_t)value); > } else { > - assert(value <= (1 << 16) - 1); > - assert(value > -(1 << 16)); > + assert(value <= (1 << 15) - 1); > + assert(value > -(1 << 15)); >brw_inst_set_bits(inst, 111, 96, (uint16_t)value); > } > } Isn't -32768 legal? It would be excluded by the above assertions. A patch to make them: assert(value <= (1 << 15) - 1); assert(value >= -(1 << 15)); would be: Reviewed-by: Kenneth Graunke ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-dev, 2/2] i965: add assert to while_jumps_before_offset()
On Thu, Jan 26, 2017 at 01:50:42PM +1100, Timothy Arceri wrote: > jip should always be negative here as its the result of > do instruction - while instruction. > --- > src/mesa/drivers/dri/i965/brw_eu_emit.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > index 257757f..f4bec33 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > @@ -2714,6 +2714,7 @@ while_jumps_before_offset(const struct gen_device_info > *devinfo, > int scale = 16 / brw_jump_scale(devinfo); > int jip = devinfo->gen == 6 ? brw_inst_gen6_jump_count(devinfo, insn) > : brw_inst_jip(devinfo, insn); > + assert(jip < 0); > return while_offset + jip * scale <= start_offset; > } > This one is: Reviewed-by: Kenneth Graunke ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Implement another VF cache invalidate workaround on Gen8+.
...and provide a better citation for the existing one. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_pipe_control.c | 32 +--- 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_pipe_control.c b/src/mesa/drivers/dri/i965/brw_pipe_control.c index b8f740640f2..3e08841e0a9 100644 --- a/src/mesa/drivers/dri/i965/brw_pipe_control.c +++ b/src/mesa/drivers/dri/i965/brw_pipe_control.c @@ -118,14 +118,30 @@ brw_emit_pipe_control_flush(struct brw_context *brw, uint32_t flags) if (brw->gen == 8) gen8_add_cs_stall_workaround_bits(&flags); - if (brw->gen == 9 && - (flags & PIPE_CONTROL_VF_CACHE_INVALIDATE)) { - /* Hardware workaround: SKL - * - * Emit Pipe Control with all bits set to zero before emitting - * a Pipe Control with VF Cache Invalidate set. - */ - brw_emit_pipe_control_flush(brw, 0); + if (flags & PIPE_CONTROL_VF_CACHE_INVALIDATE) { + if (brw->gen >= 9) { +/* The PIPE_CONTROL "VF Cache Invalidation Enable" bit description + * lists several workarounds: + * + * "Projects: SKL, KBL, BXT + * If the VF Cache Invalidation Enable is set to a 1 in a + * PIPE_CONTROL, a separate Null PIPE_CONTROL, all bitfields sets + * to 0, with the VF Cache Invalidation Enable set to 0 needs to + * be sent prior to the PIPE_CONTROL with VF Cache Invalidation + * Enable set to a 1." + */ +brw_emit_pipe_control_flush(brw, 0); + +/* "Projects: BDW+ + * When VF Cache Invalidate is set “Post Sync Operation” must + * be enabled to “Write Immediate Data” or “Write PS Depth Count” + * or “Write Timestamp”. + */ +brw_emit_pipe_control_write(brw, +flags | PIPE_CONTROL_WRITE_IMMEDIATE, +brw->workaround_bo, 0, 0, 0); +return; + } } BEGIN_BATCH(6); -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] Revert "i915: Always enable GL 2.0 support."
On Sunday, January 29, 2017 6:20:10 PM PST Matt Turner wrote: > This partially reverts commit 97217a40f97cdeae0304798b607f704deb0c3558. > It leaves ES 2.0 support in place per Ian's suggestion, because ES 2.0 > is designed to work on hardware like i915. Your commit message should mention why dropping from OpenGL 2.1 to 1.4 is a good thing. (IIRC it's because Chrome (and other apps?) use really slow paths with 2.1, and so the general usability of the system is likely to be worse.) > The piglit results look like: > >name: before-revert-i915 after-revert-i915 > -- - >pass: 7171 2169 >fail:933 201 > crash: 8 7 >skip: 32997 38676 > timeout: 0 0 >warn: 3 1 > incomplete: 0 0 > dmesg-warn: 0 0 > dmesg-fail: 0 0 > changes: 0 6040 > fixes: 0 292 > regressions: 0 2 > total: 41112 41054 Why are these interesting? > Cc: "17.0" > --- > src/mesa/drivers/dri/i915/intel_extensions.c | 8 ++-- > src/mesa/drivers/dri/i915/intel_screen.c | 21 +++-- > 2 files changed, 25 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i915/intel_extensions.c > b/src/mesa/drivers/dri/i915/intel_extensions.c > index ab7820f..4f2c6fa 100644 > --- a/src/mesa/drivers/dri/i915/intel_extensions.c > +++ b/src/mesa/drivers/dri/i915/intel_extensions.c > @@ -92,8 +92,12 @@ intelInitExtensions(struct gl_context *ctx) >ctx->Extensions.ATI_separate_stencil = true; >ctx->Extensions.ATI_texture_env_combine3 = true; >ctx->Extensions.NV_texture_env_combine4 = true; > - ctx->Extensions.ARB_fragment_shader = true; > - ctx->Extensions.ARB_occlusion_query = true; > + > + if (driQueryOptionb(&intel->optionCache, "fragment_shader")) > + ctx->Extensions.ARB_fragment_shader = true; I get dropping GLSL support, but isn't GL_ARB_fragment_shader sort of reasonable for this hardware? > + > + if (driQueryOptionb(&intel->optionCache, "stub_occlusion_query")) > + ctx->Extensions.ARB_occlusion_query = true; > } > > if (intel->ctx.Mesa_DXTn > diff --git a/src/mesa/drivers/dri/i915/intel_screen.c > b/src/mesa/drivers/dri/i915/intel_screen.c > index 5c7c06a..fe86179 100644 > --- a/src/mesa/drivers/dri/i915/intel_screen.c > +++ b/src/mesa/drivers/dri/i915/intel_screen.c > @@ -62,6 +62,10 @@ DRI_CONF_BEGIN >DRI_CONF_DESC(en, "Enable early Z in classic mode (unstable, > 945-only).") >DRI_CONF_OPT_END > > + DRI_CONF_OPT_BEGIN_B(fragment_shader, "true") > + DRI_CONF_DESC(en, "Enable limited ARB_fragment_shader support on > 915/945.") > + DRI_CONF_OPT_END > + > DRI_CONF_SECTION_END > DRI_CONF_SECTION_QUALITY >DRI_CONF_FORCE_S3TC_ENABLE("false") > @@ -75,6 +79,10 @@ DRI_CONF_BEGIN >DRI_CONF_DISABLE_GLSL_LINE_CONTINUATIONS("false") >DRI_CONF_DISABLE_BLEND_FUNC_EXTENDED("false") > > + DRI_CONF_OPT_BEGIN_B(stub_occlusion_query, "false") > + DRI_CONF_DESC(en, "Enable stub ARB_occlusion_query support on > 915/945.") > + DRI_CONF_OPT_END > + >DRI_CONF_OPT_BEGIN_B(shader_precompile, "true") >DRI_CONF_DESC(en, "Perform code generation at shader link time.") >DRI_CONF_OPT_END > @@ -1125,12 +1133,21 @@ set_max_gl_versions(struct intel_screen *screen) > __DRIscreen *psp = screen->driScrnPriv; > > switch (screen->gen) { > - case 3: > + case 3: { > + bool has_fragment_shader = driQueryOptionb(&screen->optionCache, > "fragment_shader"); > + bool has_occlusion_query = driQueryOptionb(&screen->optionCache, > "stub_occlusion_query"); > + >psp->max_gl_core_version = 0; >psp->max_gl_es1_version = 11; > - psp->max_gl_compat_version = 21; >psp->max_gl_es2_version = 20; > + > + if (has_fragment_shader && has_occlusion_query) { > + psp->max_gl_compat_version = 21; > + } else { > + psp->max_gl_compat_version = 14; > + } >break; > + } > case 2: >psp->max_gl_core_version = 0; >psp->max_gl_compat_version = 13; > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] anv: Improve flushing around STATE_BASE_ADDRESS
On Tuesday, January 31, 2017 10:33:41 AM PST Jason Ekstrand wrote: > It is not clear from the docs exactly how pipelined STATE_BASE_ADDRESS > actually is. Previously, we knew we needed to flush prior to re-emitting > STATE_BASE_ADDRESS on gen8+ but we had never confirmed it on gen7 so we > left it alone and avoided the flush. Recently, Mark found hangs on gen7 > which appear to be STATE_BASE_ADDRESS related which this patch fixes. > The theory is that SURFACE_STATE structures are cached in texture cache > but relative addressing nature of things doesn't work nicely with the > cache. This means that you need to invalidate the texture cache when This description doesn't match the actual change you're making. We already invalidate the texture cache after STATE_BASE_ADDRESS, on all generations. You're making it flush the render cache. According to the comment already in that function, it sounds like the render cache may also include surface states, so...same explanation, just s/texture/render/g. It seems like a fine thing to do. > you change STATE_BASE_ADDRESS and interesting things can happen if there > are any EU threads in-flight when surface state base address changes. > > While we're here, we also add data cache flushing in order to ensure > that any compute shaders running are finished before we change the > surface state base address. It's unknown whether or not this would > actually be a problem but, given how hard these things are to debug, we > might as well make sure. That's kind of a separate change from applying the flushes on Gen7/7.5. It also seems like a fine thing to do. Might make sense to split into two patches. At minimum, assuming you update the commit message, Reviewed-by: Kenneth Graunke > > Reported-by: Mark Janes > Tested-by: Mark Janes > --- > src/intel/vulkan/genX_cmd_buffer.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c > b/src/intel/vulkan/genX_cmd_buffer.c > index f7894a0..7b43c6f 100644 > --- a/src/intel/vulkan/genX_cmd_buffer.c > +++ b/src/intel/vulkan/genX_cmd_buffer.c > @@ -55,8 +55,6 @@ genX(cmd_buffer_emit_state_base_address)(struct > anv_cmd_buffer *cmd_buffer) > { > struct anv_device *device = cmd_buffer->device; > > -/* XXX: Do we need this on more than just BDW? */ > -#if (GEN_GEN >= 8) > /* Emit a render target cache flush. > * > * This isn't documented anywhere in the PRM. However, it seems to be > @@ -65,9 +63,10 @@ genX(cmd_buffer_emit_state_base_address)(struct > anv_cmd_buffer *cmd_buffer) > * clear depth, reset state base address, and then go render stuff. > */ > anv_batch_emit(&cmd_buffer->batch, GENX(PIPE_CONTROL), pc) { > + pc.DCFlushEnable = true; >pc.RenderTargetCacheFlushEnable = true; > + pc.CommandStreamerStallEnable = true; > } > -#endif > > anv_batch_emit(&cmd_buffer->batch, GENX(STATE_BASE_ADDRESS), sba) { >sba.GeneralStateBaseAddress = (struct anv_address) { NULL, 0 }; > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] intel/blorp: Handle clearing of A4B4G4R4 on all platforms
On Friday, January 27, 2017 2:18:34 PM PST Jason Ekstrand wrote: > Cc: "13.0 17.0" > --- > src/intel/blorp/blorp_clear.c | 19 +++ > 1 file changed, 19 insertions(+) > > diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c > index afc505d..f8ac6dc 100644 > --- a/src/intel/blorp/blorp_clear.c > +++ b/src/intel/blorp/blorp_clear.c > @@ -349,6 +349,25 @@ blorp_clear(struct blorp_batch *batch, > if (format == ISL_FORMAT_R9G9B9E5_SHAREDEXP) { >clear_color.u32[0] = float3_to_rgb9e5(clear_color.f32); >format = ISL_FORMAT_R32_UINT; > + } else if (format == ISL_FORMAT_A4B4G4R4_UNORM) { > + /* Broadwell and earlier cannot render to this format so we need to > work > + * around it by swapping the colors around and using B4G4R4A4 instead. > + */ > + > + /* First, we apply the swizzle. */ > + union isl_color_value old; > + old.u32[swizzle.r - ISL_CHANNEL_SELECT_RED] = clear_color.u32[0]; > + old.u32[swizzle.g - ISL_CHANNEL_SELECT_RED] = clear_color.u32[1]; > + old.u32[swizzle.b - ISL_CHANNEL_SELECT_RED] = clear_color.u32[2]; > + old.u32[swizzle.a - ISL_CHANNEL_SELECT_RED] = clear_color.u32[3]; > + swizzle = ISL_SWIZZLE_IDENTITY; Are ISL_CHANNEL_SELECT_{ZERO,ONE} disallowed by a higher level? If not, this will break rather spectacularly (old.u32[-4] = ...). Maybe add asserts? Assuming that isn't a problem, Reviewed-by: Kenneth Graunke > + > + /* Now we re-order for the new format */ > + clear_color.u32[0] = old.u32[1]; > + clear_color.u32[1] = old.u32[2]; > + clear_color.u32[2] = old.u32[3]; > + clear_color.u32[3] = old.u32[0]; > + format = ISL_FORMAT_B4G4R4A4_UNORM; > } > > memcpy(¶ms.wm_inputs.clear_color, clear_color.f32, sizeof(float) * 4); > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] isl/formats: Only advertise sampling for A4B4G4R4 on Broadwell
On Friday, January 27, 2017 2:18:35 PM PST Jason Ekstrand wrote: > This causes hangs on Broadwell if you try to render to it. I have no > idea how we managed to not hit this earlier. > > Cc: "13.0 17.0" > --- > src/intel/isl/isl_format.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/src/intel/isl/isl_format.c b/src/intel/isl/isl_format.c > index c8daece..bc157d5 100644 > --- a/src/intel/isl/isl_format.c > +++ b/src/intel/isl/isl_format.c > @@ -218,9 +218,10 @@ static const struct surface_format_info format_info[] = { > SF(50, 50, x, x, x, x, x, x, x,x, P8A8_UNORM_PALETTE1) > SF( x, x, x, x, x, x, x, x, x,x, A1B5G5R5_UNORM) > /* According to the PRM, A4B4G4R4_UNORM isn't supported until Sky Lake > -* but empirical testing indicates that it works just fine on Broadwell. > +* but empirical testing indicates that at least sampling works just fine > +* on Broadwell. > */ > - SF(80, 80, x, x, 80, x, x, x, x,x, A4B4G4R4_UNORM) > + SF(80, 80, x, x, 90, x, x, x, x,x, A4B4G4R4_UNORM) > SF(90, x, x, x, x, x, x, x, x,x, L8A8_UINT) > SF(90, x, x, x, x, x, x, x, x,x, L8A8_SINT) > SF( Y, Y, x, 45, Y, Y, Y, x, x,x, R8_UNORM) > Still somewhat grumpy that there are mandatory 4-bit formats that aren't supported in hardware. We definitely don't want to advertise rendering to this if it doesn't work. You can always do your clever hacks in BLORP to make it work for e.g. copyimage etc, like you did for RGB9E5 and friends. Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] docs: add link to http://mesamatrix.net/
On Wednesday, February 1, 2017 2:36:02 PM PST Brian Paul wrote: > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95460 > --- > docs/features.txt | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/docs/features.txt b/docs/features.txt > index 55b1fbb..300442b 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -335,3 +335,6 @@ we DO NOT WANT implementations of these extensions for > Mesa. > > More info about these features and the work involved can be found at > http://dri.freedesktop.org/wiki/MissingFunctionality > + > +Also, a graphical representation of this information can be found at > +http://mesamatrix.net/ > Sounds good to me. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] Revert "i915: Always enable GL 2.0 support."
On Wednesday, February 1, 2017 12:35:15 PM PST Eero Tamminen wrote: > Hi, > > On 31.01.2017 21:12, Matt Turner wrote: > > On Sun, Jan 29, 2017 at 8:29 PM, Kenneth Graunke > > wrote: > >> On Sunday, January 29, 2017 6:20:10 PM PST Matt Turner wrote: > >>> This partially reverts commit 97217a40f97cdeae0304798b607f704deb0c3558. > >>> It leaves ES 2.0 support in place per Ian's suggestion, because ES 2.0 > >>> is designed to work on hardware like i915. > >> > >> Your commit message should mention why dropping from OpenGL 2.1 to 1.4 > >> is a good thing. > >> > >> (IIRC it's because Chrome (and other apps?) use really slow paths with > >> 2.1, and so the general usability of the system is likely to be worse.) > > > > Yeah, I'll add > > Has that been profiled? I assume the issue is the constant uploads > Chrome does for the Browser's CPU rendered page content updates. There > are not many other workloads that utilize GPU as inefficiently as > current browsers. > > > - Eero I figured it was just software rendering everything, given that there's no vertex shading support and only 64 fragment shader instructions... prog_execute isn't exactly a state-of-the-art interpreter, either. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure: Only require libdrm 2.4.75 for intel.
On Thursday, February 2, 2017 7:35:20 AM PST Chad Versace wrote: > On Thu 02 Feb 2017, Dave Airlie wrote: > > On 2 February 2017 at 13:09, Emil Velikov wrote: > > > On 2 February 2017 at 02:58, Michel Dänzer wrote: > > >> On 02/02/17 09:10 AM, Emil Velikov wrote: > > >>> On 1 February 2017 at 23:28, Vinson Lee wrote: > > Fixes: b8acb6b17981 ("configure: Require libdrm >= 2.4.75") > > Signed-off-by: Vinson Lee > > >>> Are you sure that's correct ? > > >>> > > >>> Afaict the follow-up commits make use of updated i915_drm.h which > > >>> should be provided by your distro's libdrm-dev package. > > >> > > >> This seems to be at the heart of the confusion here: Is i915_drm.h part > > >> of libdrm or of libdrm_intel? I'd argue it's the latter, and the fact > > >> that some or even all downstreams ship a single package with all libdrm* > > >> headers is irrelevant. That package also contains all the libdrm_*.pc > > >> files, so Vinson's patch works as intended either way. > > >> > > > Are you saying that there's a single -dev package [libdrm-dev] for > > > everything libdrm* related ? > > > That sounds like a broken distro package... which would explain some > > > of the assumptions/discussions on #dri-devel :-) > > > > That is how all distros ship it. > > As Dänzer said, "Vinson's patch works as intended either way". > > If this small patch fixes Vinson's problem; breaks no one's setup; and > causes no maintenance burden; then the patch is good. > > Is anyone *opposed* to Vinson's patch? (It's hard to tell because all of > the discussion about what distro's do, don't do, and should do). I'm not opposed. Normally, this is what we do. Bumping LIBDRM_INTEL_REQUIRED when we need a new i915_drm.h seems totally reasonable to me. I don't know of any setup that ships multiple libdrm (why?!)...but it seems like if you have a new enough libdrm_intel, you'll have a new enough i915_drm.h. That said...this case is a /little/ different...because we're introducing a dependency on libsync.h, which is part of core libdrm. I don't think it's an Intel-specific file, though it is currently only used in i965... I don't know that it makes much difference. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: describe what all the LIBDRM_*REQUIRED macros mean
On Thursday, February 2, 2017 9:15:47 AM PST Chad Versace wrote: > On Thu 02 Feb 2017, Emil Velikov wrote: > > From: Emil Velikov > > > > They are versions of the respective libdrm package. They are _not_ the > > version of libdrm[.so] required for driver X. > > > > Doing the latter will lead to combinatoric explosion and in all fairness > > things will likely be broken most of the time. > > > > To make things even more confusing the kernel UAPI is provided by libdrm > > itself. > > > > Cc: Vinson Lee > > Cc: Kenneth Graunke > > Signed-off-by: Emil Velikov > > --- > > Ken, you/Chad have things spot on. Yet semes like other people struggle > > deeply with these. > > Actually, I agree with airlied and imirkin. I made a mistake when > I bumped LIBDRM_REQUIRED. Me too... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Don't crash when destroying contexts created with no visual.
dEQP-EGL.functional.create_context.no_config tries to create a context with no config, then immediately destroys it. The drawbuffer is never set up, so we can't dereference it asking if it's double buffered, or we'll crash on a null pointer dereference. Just bail early. Signed-off-by: Kenneth Graunke --- src/mesa/main/context.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c index 7ecd241e6eb..76763489b9f 100644 --- a/src/mesa/main/context.c +++ b/src/mesa/main/context.c @@ -1550,7 +1550,7 @@ _mesa_check_init_viewport(struct gl_context *ctx, GLuint width, GLuint height) static void handle_first_current(struct gl_context *ctx) { - if (ctx->Version == 0) { + if (ctx->Version == 0 || !ctx->DrawBuffer) { /* probably in the process of tearing down the context */ return; } -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] anv/pipeline: set ThreadDispatchEnable conditionally
On Friday, February 3, 2017 1:19:16 PM PST Juan A. Suarez Romero wrote: > Set 3DSTATE_WM/ThreadDispatchEnable bit on/off based on the same > conditions as used in the GL version. > > Signed-off-by: Juan A. Suarez Romero > --- > src/intel/vulkan/genX_pipeline.c | 49 > +--- > 1 file changed, 26 insertions(+), 23 deletions(-) looks good to me. Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] docs: replace URL in features.txt
On Friday, February 3, 2017 11:48:50 AM PST Brian Paul wrote: > Replace unmaintained http://dri.freedesktop.org/wiki/MissingFunctionality > URL with http://mesamatrix.net/ > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95460 > --- > docs/features.txt | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/docs/features.txt b/docs/features.txt > index 55b1fbb..2f2d41d 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -333,5 +333,6 @@ we DO NOT WANT implementations of these extensions for > Mesa. >GL_ARB_shadow_ambient Superseded by > GL_ARB_fragment_program >GL_ARB_vertex_blend Superseded by > GL_ARB_vertex_program > > -More info about these features and the work involved can be found at > -http://dri.freedesktop.org/wiki/MissingFunctionality > + > +A graphical representation of this information can be found at > +http://mesamatrix.net/ > Yeah, we haven't used that page in years... Acked-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Select pipeline and emit state base address in Gen8+ HiZ ops.
If a HiZ op is the first thing in the batch, we should make sure to select the render pipeline and emit state base address before proceeding. I believe 3DSTATE_WM_HZ_OP creates 3DPRIMITIVEs internally, and dispatching those on the GPGPU pipeline seems a bit sketchy. I'm not actually sure that STATE_BASE_ADDRESS is necessary, as the depth related commands use graphics addresses, not ones relative to the base address...but we're likely to set it as part of the next operation anyway, so we should just do it right away. Cc: "17.0" Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/gen8_depth_state.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c index a7e61354fd5..620b32df8bb 100644 --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c @@ -404,6 +404,9 @@ gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt, if (op == BLORP_HIZ_OP_NONE) return; + brw_select_pipeline(brw, BRW_RENDER_PIPELINE); + brw_upload_state_base_address(brw); + /* Disable the PMA stall fix since we're about to do a HiZ operation. */ if (brw->gen == 8) gen8_write_pma_stall_bits(brw, 0); -- 2.11.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Time to merge threaded GL dispatch? (aka glthread)
On Monday, February 6, 2017 8:54:40 PM PST Marek Olšák wrote: > On Mon, Feb 6, 2017 at 8:20 PM, Ernst Sjöstrand wrote: > > FYI glmark2 segfaults with mesa_glthread=true. Expected that some programs > > will segfault? > > Yes, even segfaults are expected with mesa_glthread=true. > > Marek Would it make sense to be crash-free or even regression-free on at least Piglit, before merging? (Or are we there already?) --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: correct compute shader checks for memoryBarrier functions
On Monday, February 6, 2017 9:07:30 AM PST Marc Di Luzio wrote: > As per the spec - > "The functions memoryBarrierShared() and groupMemoryBarrier() are > available only in compute shaders; the other functions are available > in all shader types." > > Conform to this by adding another delegate to check for compute > shader support instead of only whether the current stage is compute > > This allows some fragment shaders in Dirt Rally to compile > > CC: "17.0" > > Reviewed-by: Anuj Phogat Reviewed-by: Kenneth Graunke and pushed: To ssh://git.freedesktop.org/git/mesa/mesa 83fb63d31de..21efe2528cd master -> master Thanks, Marc! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Time to merge threaded GL dispatch? (aka glthread)
On Thursday, February 9, 2017 5:23:47 PM PST Ian Romanick wrote: [snip] > Not achieving the desired performance is vastly different from > segfaults. Does the threaded mode in NVIDIA's driver crash? I'd wager > not. Sadly, yes it does. A while back, I tried running some Steam games on Linux using my NVIDIA machine, with an out of the box configuration, and they totally crashed. Workaround was to disable GL threaded optimizations. Of course, I also had crashes which I had to update the CPU microcode to workaround, and tearing right through the middle of every single frame (needed more workarounds), and all kinds of problems...so...not sure that's the standard of quality I want to aspire to... --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 0/4] New drirc options for Worms WMD, Tropico 5 and Crookz
On Friday, February 10, 2017 5:45:03 AM PST Ilia Mirkin wrote: > On Fri, Feb 10, 2017 at 8:41 AM, Samuel Pitoiset > wrote: > > Hi, > > > > Mesa currently doesn't allow to create 3.1+ compatibility profiles mainly > > because various features are unimplemented and bugs can happen. > > > > However, some buggy apps request a compat profile without using any old > > features but they fail to start because Mesa clamps the GLSL version to > > 130 for compat. > > And presumably you can't just hand them a core profile because they'll > do something like glGetString(GL_EXTENSIONS)? You can totally hand Tropico 5 a core profile and it'll work correctly. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Actually enforce ES SSBO unsized array check for SSBOs.
Commit b010fa85675b98962426fe8961466fbae2d25499 re-added this check after it was erroneously dropped while fixing another bug. However, the control flow was slightly off, and it ended up only applying to UBOs and not SSBOs. Fixes dEQP-GLES31.functional.debug.negative_coverage. {callbacks,get_error,log}.shader.compile_compute_shader. Signed-off-by: Kenneth Graunke --- src/compiler/glsl/ast_to_hir.cpp | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp index b31b61d1ed6..851a7d6ef4d 100644 --- a/src/compiler/glsl/ast_to_hir.cpp +++ b/src/compiler/glsl/ast_to_hir.cpp @@ -7903,10 +7903,9 @@ ast_interface_block::hir(exec_list *instructions, } if (var->type->is_unsized_array()) { -if (var->is_in_shader_storage_block()) { - if (is_unsized_array_last_element(var)) { - var->data.from_ssbo_unsized_array = true; - } +if (var->is_in_shader_storage_block() && +is_unsized_array_last_element(var)) { + var->data.from_ssbo_unsized_array = true; } else { /* From GLSL ES 3.10 spec, section 4.1.9 "Arrays": * -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: non-last member unsized array on SSBO must fail compilation on GLSL ES 3.1
On Friday, February 10, 2017 5:25:27 AM PST Jose Maria Casanova Crespo wrote: > From GLSL ES 3.10 spec, section 4.1.9 "Arrays": > > "If an array is declared as the last member of a shader storage block > and the size is not specified at compile-time, it is sized at run-time. > In all other cases, arrays are sized only at compile-time." > > In desktop GLSL it is allowed to have unsized-arrays that are > not last, as long as we can determine that they are implicitly > sized, which is detected at link-time. > > With this patch Mesa reports a compilation error as glslang does with > the following shader: > > buffer SSBO { vec4 data[]; vec4 moreData;}; > void main (void) > { > } > > Fixes: > dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader > dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader > dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader > > Cc: "17.0" > Signed-off-by: Jose Maria Casanova Crespo > --- > src/compiler/glsl/ast_to_hir.cpp | 11 +++ > 1 file changed, 7 insertions(+), 4 deletions(-) Thanks! I missed this patch on the mailing list and independently arrived at the same solution today, so we'll call that a: Reviewed-by: Kenneth Graunke and pushed: To ssh://git.freedesktop.org/git/mesa/mesa 0514b0bdc91..5bc222ebafd master -> master signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Actually enforce ES SSBO unsized array check for SSBOs.
On Friday, February 10, 2017 3:01:10 PM PST Ilia Mirkin wrote: > Seems pretty similar to https://patchwork.freedesktop.org/patch/138274/ So it does! I hadn't seen that patch yet. I just pushed his patch instead with my R-b. Thanks! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/5] mesa: Ignore per-vertex array size in SSO pipeline validation.
We were already unwrapping types when the producer was a non-array stage and the consumer was an arrayed-stage...but we ought to unwrap both ends for TCS -> TES matching too. This will allow us to drop the "resize to gl_MaxPatchVertices" check shortly, which breaks some things. Signed-off-by: Kenneth Graunke --- src/mesa/main/shader_query.cpp | 98 -- 1 file changed, 56 insertions(+), 42 deletions(-) diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp index 30e5b087abb..6e077623e15 100644 --- a/src/mesa/main/shader_query.cpp +++ b/src/mesa/main/shader_query.cpp @@ -1378,11 +1378,12 @@ validate_io(struct gl_program *producer, struct gl_program *consumer) if (producer->sh.data->linked_stages == consumer->sh.data->linked_stages) return true; - const bool nonarray_stage_to_array_stage = - producer->info.stage != MESA_SHADER_TESS_CTRL && - (consumer->info.stage == MESA_SHADER_GEOMETRY || - consumer->info.stage == MESA_SHADER_TESS_CTRL || - consumer->info.stage == MESA_SHADER_TESS_EVAL); + const bool producer_is_array_stage = + producer->info.stage == MESA_SHADER_TESS_CTRL; + const bool consumer_is_array_stage = + consumer->info.stage == MESA_SHADER_GEOMETRY || + consumer->info.stage == MESA_SHADER_TESS_CTRL || + consumer->info.stage == MESA_SHADER_TESS_EVAL; bool valid = true; @@ -1495,6 +1496,56 @@ validate_io(struct gl_program *producer, struct gl_program *consumer) if (match_index < num_outputs) outputs[match_index] = outputs[num_outputs]; + /* Section 7.4.1 (Shader Interface Matching) of the ES 3.2 spec says: + * + *"Tessellation control shader per-vertex output variables and + * blocks and tessellation control, tessellation evaluation, and + * geometry shader per-vertex input variables and blocks are + * required to be declared as arrays, with each element representing + * input or output values for a single vertex of a multi-vertex + * primitive. For the purposes of interface matching, such variables + * and blocks are treated as though they were not declared as + * arrays." + * + * So we unwrap those types before matching. + */ + const glsl_type *consumer_type = consumer_var->type; + const glsl_type *consumer_interface_type = consumer_var->interface_type; + const glsl_type *producer_type = producer_var->type; + const glsl_type *producer_interface_type = producer_var->interface_type; + + if (consumer_is_array_stage) { + if (consumer_interface_type) { +/* the interface is the array; the underlying types should match */ +if (consumer_interface_type->is_array() && !consumer_var->patch) + consumer_interface_type = consumer_interface_type->fields.array; + } else { +if (consumer_type->is_array() && !consumer_var->patch) + consumer_type = consumer_type->fields.array; + } + } + + if (producer_is_array_stage) { + if (producer_interface_type) { +/* the interface is the array; the underlying types should match */ +if (producer_interface_type->is_array() && !producer_var->patch) + producer_interface_type = producer_interface_type->fields.array; + } else { +if (producer_type->is_array() && !producer_var->patch) + producer_type = producer_type->fields.array; + } + } + + if (producer_type != consumer_type) { + valid = false; + goto out; + } + + if (producer_interface_type != consumer_interface_type) { + valid = false; + goto out; + } + /* Section 9.2.2 (Separable Programs) of the GLSL ES spec says: * *Qualifier Class| Qualifier |in/out @@ -1525,43 +1576,6 @@ validate_io(struct gl_program *producer, struct gl_program *consumer) * Note that location mismatches are detected by the loops above that * find the producer variable that goes with the consumer variable. */ - if (nonarray_stage_to_array_stage) { - if (consumer_var->interface_type != NULL) { -/* the interface is the array; underlying types should match */ -if (producer_var->type != consumer_var->type) { - valid = false; - goto out; -} - -if (!consumer_var->interface_type->is_array() || -consumer_var->interface_type->fields.array != producer_var->interface_type) { - valid = false; - goto out; -} - } else { -if (producer_var->interface_type != NULL)
[Mesa-dev] [PATCH 5/5] glsl: Drop resize-to-MaxPatchVertices hack.
TCS and TES inputs without an array size are implicitly sized to gl_MaxPatchVertices. But TCS outputs are apparently not: "If no size is specified, it will be taken from the output patch size (gl_VerticesOut) declared in the shader." Fixes dEQP-GLES31.functional.program_interface_query.program_output. array_size.separable_tess_ctrl.var. Signed-off-by: Kenneth Graunke --- src/compiler/glsl/ast_to_hir.cpp | 3 -- src/compiler/glsl/ir.h | 6 src/compiler/glsl/linker.cpp | 32 -- src/compiler/glsl/lower_named_interface_blocks.cpp | 2 -- 4 files changed, 43 deletions(-) diff --git a/src/compiler/glsl/ast_to_hir.cpp b/src/compiler/glsl/ast_to_hir.cpp index ee47b654bfb..b90ad97b1de 100644 --- a/src/compiler/glsl/ast_to_hir.cpp +++ b/src/compiler/glsl/ast_to_hir.cpp @@ -4404,8 +4404,6 @@ handle_tess_ctrl_shader_output_decl(struct _mesa_glsl_parse_state *state, if (var->data.patch) return; - var->data.tess_varying_implicit_sized_array = var->type->is_unsized_array(); - validate_layout_qualifier_vertex_count(state, loc, var, num_vertices, &state->tcs_output_size, "tessellation control shader output"); @@ -4442,7 +4440,6 @@ handle_tess_shader_input_decl(struct _mesa_glsl_parse_state *state, if (var->type->is_unsized_array()) { var->type = glsl_type::get_array_instance(var->type->fields.array, state->Const.MaxPatchVertices); - var->data.tess_varying_implicit_sized_array = true; } else if (var->type->length != state->Const.MaxPatchVertices) { _mesa_glsl_error(&loc, state, "per-vertex tessellation shader input arrays must be " diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h index 4317c54d498..3544161105e 100644 --- a/src/compiler/glsl/ir.h +++ b/src/compiler/glsl/ir.h @@ -844,12 +844,6 @@ public: unsigned implicit_sized_array:1; /** - * Is this a non-patch TCS output / TES input array that was implicitly - * sized to gl_MaxPatchVertices? - */ - unsigned tess_varying_implicit_sized_array:1; - - /** * Whether this is a fragment shader output implicitly initialized with * the previous contents of the specified render target at the * framebuffer location corresponding to this shader invocation. diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp index 4c7bf282ce1..b3c7d2c1456 100644 --- a/src/compiler/glsl/linker.cpp +++ b/src/compiler/glsl/linker.cpp @@ -3709,17 +3709,6 @@ create_shader_variable(struct gl_shader_program *shProg, return out; } -static const glsl_type * -resize_to_max_patch_vertices(const struct gl_context *ctx, - const glsl_type *type) -{ - if (!type) - return NULL; - - return glsl_type::get_array_instance(type->fields.array, -ctx->Const.MaxPatchVertices); -} - static bool add_shader_variable(const struct gl_context *ctx, struct gl_shader_program *shProg, @@ -3733,27 +3722,6 @@ add_shader_variable(const struct gl_context *ctx, const glsl_type *interface_type = var->get_interface_type(); if (outermost_struct_type == NULL) { - /* Unsized (non-patch) TCS output/TES input arrays are implicitly - * sized to gl_MaxPatchVertices. Internally, we shrink them to a - * smaller size. - * - * This can cause trouble with SSO programs. Since the TCS declares - * the number of output vertices, we can always shrink TCS output - * arrays. However, the TES might not be linked with a TCS, in - * which case it won't know the size of the patch. In other words, - * the TCS and TES may disagree on the (smaller) array sizes. This - * can result in the resource names differing across stages, causing - * SSO validation failures and other cascading issues. - * - * Expanding the array size to the full gl_MaxPatchVertices fixes - * these issues. It's also what program interface queries expect, - * as that is the official size of the array. - */ - if (var->data.tess_varying_implicit_sized_array) { - type = resize_to_max_patch_vertices(ctx, type); - interface_type = resize_to_max_patch_vertices(ctx, interface_type); - } - if (var->data.from_named_ifc_block) { const char *interface_name = interface_type->name; diff --git a/src/compiler/glsl/lower_named_interface_blocks.cpp b/src/compiler/glsl/lower_named_interface_blocks.cpp index 53ef638c9c4..a00e60dd771 100644 --- a/src/compiler/glsl/lower_named_interface_blocks.cpp +++ b/src/compiler/glsl/lower_named_interface_blocks.cpp @@ -193,8 +193,6
[Mesa-dev] [PATCH 3/5] glsl: Update a comment about link errors for TCS && !TES.
OpenGL ES actually has spec text to prohibit this. It's just OpenGL that's confusing. Signed-off-by: Kenneth Graunke --- src/compiler/glsl/linker.cpp | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp index 720c22baee0..4c7bf282ce1 100644 --- a/src/compiler/glsl/linker.cpp +++ b/src/compiler/glsl/linker.cpp @@ -4721,7 +4721,15 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog) goto done; } - /* The spec is self-contradictory here. It allows linking without a tess + /* Section 7.3 of the OpenGL ES 3.2 specification says: + * + *"Linking can fail for [...] any of the following reasons: + * + * * program contains an object to form a tessellation control + * shader [...] and [...] the program is not separable and + * contains no object to form a tessellation evaluation shader" + * + * The OpenGL spec is contradictory. It allows linking without a tess * eval shader, but that can only be used with transform feedback and * rasterization disabled. However, transform feedback isn't allowed * with GL_PATCHES, so it can't be used. -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/5] mesa: Do (TCS && !TES) draw time validation in ES as well.
Now that we have OES_tessellation_shader, the same situation can occur in ES too, not just GL core profile. Having a TCS but no TES may confuse drivers - i965 crashes, for example. This prevents regressions in ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage with some SSO pipeline validation changes I'm making. Cc: "17.0" Signed-off-by: Kenneth Graunke --- src/mesa/main/api_validate.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c index 6c95701ea0e..d64264fe1e9 100644 --- a/src/mesa/main/api_validate.c +++ b/src/mesa/main/api_validate.c @@ -236,6 +236,25 @@ check_valid_to_render(struct gl_context *ctx, const char *function) return false; } + /* The spec argues that this is allowed because a tess ctrl shader +* without a tess eval shader can be used with transform feedback. +* However, glBeginTransformFeedback doesn't allow GL_PATCHES and +* therefore doesn't allow tessellation. +* +* Further investigation showed that this is indeed a spec bug and +* a tess ctrl shader without a tess eval shader shouldn't have been +* allowed, because there is no API in GL 4.0 that can make use this +* to produce something useful. +* +* Also, all vendors except one don't support a tess ctrl shader without +* a tess eval shader anyway. +*/ + if (ctx->TessCtrlProgram._Current && !ctx->TessEvalProgram._Current) { + _mesa_error(ctx, GL_INVALID_OPERATION, + "%s(tess eval shader is missing)", function); + return false; + } + switch (ctx->API) { case API_OPENGLES2: /* For ES2, we can draw if we have a vertex program/shader). */ @@ -260,25 +279,6 @@ check_valid_to_render(struct gl_context *ctx, const char *function) return false; } - /* The spec argues that this is allowed because a tess ctrl shader - * without a tess eval shader can be used with transform feedback. - * However, glBeginTransformFeedback doesn't allow GL_PATCHES and - * therefore doesn't allow tessellation. - * - * Further investigation showed that this is indeed a spec bug and - * a tess ctrl shader without a tess eval shader shouldn't have been - * allowed, because there is no API in GL 4.0 that can make use this - * to produce something useful. - * - * Also, all vendors except one don't support a tess ctrl shader without - * a tess eval shader anyway. - */ - if (ctx->TessCtrlProgram._Current && !ctx->TessEvalProgram._Current) { - _mesa_error(ctx, GL_INVALID_OPERATION, - "%s(tess eval shader is missing)", function); - return false; - } - /* Section 7.3 (Program Objects) of the OpenGL 4.5 Core Profile spec * says: * -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/5] mesa: Do a draw time check for TES && !TCS in ES 3.x.
ES 3.x requires both TCS and TES to be present. We already checked the TCS && !TES case above, so we just have to check !TCS && TES here. Note that this is allowed in OpenGL, just not ES. This fixes a subcase of: dEQP-GLES31.functional.debug.negative_coverage.*.tessellation.single_tessellation_stage Signed-off-by: Kenneth Graunke --- src/mesa/main/api_validate.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c index d64264fe1e9..41d2611eec4 100644 --- a/src/mesa/main/api_validate.c +++ b/src/mesa/main/api_validate.c @@ -257,6 +257,20 @@ check_valid_to_render(struct gl_context *ctx, const char *function) switch (ctx->API) { case API_OPENGLES2: + /* Section 11.2 (Tessellation) of the ES 3.2 spec says: + * + * "An INVALID_OPERATION error is generated by any command that + * transfers vertices to the GL if the current program state has + * one but not both of a tessellation control shader and tessellation + * evaluation shader." + */ + if (_mesa_is_gles3(ctx) && + ctx->TessEvalProgram._Current && !ctx->TessCtrlProgram._Current) { + _mesa_error(ctx, GL_INVALID_OPERATION, + "%s(tess ctrl shader is missing)", function); + return false; + } + /* For ES2, we can draw if we have a vertex program/shader). */ return ctx->VertexProgram._Current != NULL; -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] mesa: Do (TCS && !TES) draw time validation in ES as well.
On Saturday, February 11, 2017 12:21:36 AM PST Alejandro Piñeiro wrote: > On 11/02/17 08:52, Kenneth Graunke wrote: > > Now that we have OES_tessellation_shader, the same situation can occur > > in ES too, not just GL core profile. > > > > Having a TCS but no TES may confuse drivers - i965 crashes, for example. > > > > This prevents regressions in > > ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage > > with some SSO pipeline validation changes I'm making. > > > > Cc: "17.0" > > Signed-off-by: Kenneth Graunke > > --- > > src/mesa/main/api_validate.c | 38 +++--- > > 1 file changed, 19 insertions(+), 19 deletions(-) > > > > diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c > > index 6c95701ea0e..d64264fe1e9 100644 > > --- a/src/mesa/main/api_validate.c > > +++ b/src/mesa/main/api_validate.c > > @@ -236,6 +236,25 @@ check_valid_to_render(struct gl_context *ctx, const > > char *function) > >return false; > > } > > > > + /* The spec argues that this is allowed because a tess ctrl shader > > +* without a tess eval shader can be used with transform feedback. > > +* However, glBeginTransformFeedback doesn't allow GL_PATCHES and > > +* therefore doesn't allow tessellation. > > +* > > +* Further investigation showed that this is indeed a spec bug and > > +* a tess ctrl shader without a tess eval shader shouldn't have been > > +* allowed, because there is no API in GL 4.0 that can make use this > > +* to produce something useful. > > As the check is done for both OpenGL and OpenGL ES, how about update the > comment to include both, as right now only mentions desktop? > > In any case, this is a nitpick, feel free to ignore if you think that > increases the comment without good reason. Good call, I've changed it to: + /* Section 11.2 (Tessellation) of the ES 3.2 spec says: +* +* "An INVALID_OPERATION error is generated by any command that +* transfers vertices to the GL if the current program state has +* one but not both of a tessellation control shader and tessellation +* evaluation shader." +* +* The OpenGL spec argues that this is allowed because a tess ctrl shader Thanks Alejandro! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] glsl: Handle packed_type == ivec4[] in lower_packed_varyings().
For GS input arrays, we may turn a packed_type of ivec4 into an array of ivec4s. We still want flat qualification. Found by inspection. Not known to help anything. Signed-off-by: Kenneth Graunke --- src/compiler/glsl/lower_packed_varyings.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/compiler/glsl/lower_packed_varyings.cpp b/src/compiler/glsl/lower_packed_varyings.cpp index 1e9bdda12b1..13f7e5b52da 100644 --- a/src/compiler/glsl/lower_packed_varyings.cpp +++ b/src/compiler/glsl/lower_packed_varyings.cpp @@ -704,7 +704,8 @@ lower_packed_varyings_visitor::get_packed_varying_deref( packed_var->data.centroid = unpacked_var->data.centroid; packed_var->data.sample = unpacked_var->data.sample; packed_var->data.patch = unpacked_var->data.patch; - packed_var->data.interpolation = packed_type == glsl_type::ivec4_type + packed_var->data.interpolation = + packed_type->without_array() == glsl_type::ivec4_type ? unsigned(INTERP_MODE_FLAT) : unpacked_var->data.interpolation; packed_var->data.location = location; packed_var->data.precision = unpacked_var->data.precision; -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] glsl: Nerf assert about qualifiers in lower_packed_varyings().
In ES 3.0, interpret_interpolation_qualifier() defaults unset interpolation qualifiers to "smooth"...which has the strange result of marking some integer variables "smooth". Interpolation qualifiers really only matter for fragment shader inputs, other than enforcing the linker rules that changed in various language versions. By the time we get to lower_packed_varyings, we probably don't care anymore. Just ignore the assert for non-FS stages. It's sketchy, but this whole pass is pretty sketchy and needs to die... Fixes dEQP-GLES31.functional.program_interface_query.program_input.type. separable_geometry.{int,ivec2,ivec3,ivec4,uint,uvec2,uvec3,uvec4}. Signed-off-by: Kenneth Graunke --- src/compiler/glsl/lower_packed_varyings.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/compiler/glsl/lower_packed_varyings.cpp b/src/compiler/glsl/lower_packed_varyings.cpp index 13f7e5b52da..989caf63c14 100644 --- a/src/compiler/glsl/lower_packed_varyings.cpp +++ b/src/compiler/glsl/lower_packed_varyings.cpp @@ -282,7 +282,8 @@ lower_packed_varyings_visitor::run(struct gl_linked_shader *shader) * together when their interpolation mode is "flat". Treat integers as * being flat when the interpolation mode is none. */ - assert(var->data.interpolation == INTERP_MODE_FLAT || + assert(shader->Stage != MESA_SHADER_FRAGMENT || + var->data.interpolation == INTERP_MODE_FLAT || var->data.interpolation == INTERP_MODE_NONE || !var->type->contains_integer()); -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] i965/sampler_state: Set the "Base Mip Level" field on Sandy Bridge
On Friday, February 10, 2017 3:52:44 PM PST Jason Ekstrand wrote: > Fixes two GL ES 3.0 CTS tests on Sandy Bridge: > > ES3-CTS.functional.texture.mipmap.cube.base_level.linear_linear > ES3-CTS.functional.texture.mipmap.cube.base_level.linear_nearest > > Cc: "17.0 13.0" > --- > src/mesa/drivers/dri/i965/brw_sampler_state.c | 20 +++- > src/mesa/drivers/dri/i965/brw_state.h | 1 + > 2 files changed, 20 insertions(+), 1 deletion(-) This seems like the most likely explanation, and there's approximately zero information on the subject, so let's go with it for now :) Series is: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] anv: Use build-id for pipeline cache UUID.
On Tuesday, February 14, 2017 12:38:45 PM PST Chad Versace wrote: > On Tue 14 Feb 2017, Matt Turner wrote: > > > > static bool > > -anv_get_function_timestamp(void *ptr, uint32_t* timestamp) > > +anv_device_get_cache_uuid(void *uuid) > > { > > - Dl_info info; > > - struct stat st; > > - if (!dladdr(ptr, &info) || !info.dli_fname) > > + const struct note *note = build_id_find_nhdr("libvulkan_intel.so"); > > + if (!note) > >return false; > > > > - if (stat(info.dli_fname, &st)) > > + unsigned len = build_id_length(note); > > + if (len < VK_UUID_SIZE) > >return false; > > > > - *timestamp = st.st_mtim.tv_sec; > > - return true; > > -} > > - > > -static bool > > -anv_device_get_cache_uuid(void *uuid) > > -{ > > - uint32_t timestamp; > > - > > - memset(uuid, 0, VK_UUID_SIZE); > > - if (!anv_get_function_timestamp(anv_device_get_cache_uuid, ×tamp)) > > + unsigned char *build_id = malloc(len); > > + if (!build_id) > >return false; > > > > - snprintf(uuid, VK_UUID_SIZE, "anv-%d", timestamp); > > + build_id_read(note, build_id); > > + > > + memcpy(uuid, build_id, VK_UUID_SIZE); > > + free(build_id); > > The Vulkan spec frowns on memory allocations when not needed. If you > must allocate memory here, then it should be through the VkInstance > allocation callbacks. However, it's best to avoid the allocation by > adding a size_t parameter, à la snprintf, to build_id_read(). > > Otherwise, the patch looks good to me. You're worried about the performance of anv_physical_device_init()? This doesn't happen on lookup...just driver start up... --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: Make a helper for emitting 3DSTATE_CONSTANT_XS packets.
This separates the logic from filling out a 3DSTATE_CONSTANT_XS packet from the decisions about what to put in the various buffers. It also should make it easier to use more than one buffer, should we decide to do so. It also provides a nice place to enforce the various restrictions via assertions. By marking the helper as inline, the code for unused buffers should be constant folded away. Signed-off-by: Kenneth Graunke Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/gen6_constant_state.c | 169 +--- 1 file changed, 118 insertions(+), 51 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_constant_state.c b/src/mesa/drivers/dri/i965/gen6_constant_state.c index 6c0c32b26f7..7e6fa92ecf2 100644 --- a/src/mesa/drivers/dri/i965/gen6_constant_state.c +++ b/src/mesa/drivers/dri/i965/gen6_constant_state.c @@ -27,6 +27,97 @@ #include "intel_batchbuffer.h" #include "program/prog_parameter.h" +#define F(RELOC, BATCH, buf, x) \ + if (buf) { \ + RELOC(buf, I915_GEM_DOMAIN_RENDER, 0, x); \ + } else { \ + BATCH(x); \ + } +#define OUT_PTR64(buf, x) F(OUT_RELOC64, OUT_BATCH64, buf, x) +#define OUT_PTR(buf, x) F(OUT_RELOC, OUT_BATCH, buf, x) + +static inline void +emit_3dstate_constant(struct brw_context *brw, + uint32_t opcode, + uint32_t mocs, + drm_intel_bo *bufs[4], + uint16_t read_lengths[4], + uint64_t offsets[4]) +{ + /* Buffer 0 is relative to Dynamic State Base Address, which we program +* to the start of the batch buffer. All others are graphics virtual +* addresses regardless of the INSTPM settings. +*/ + assert(bufs[0] == NULL || bufs[0] == brw->batch.bo); + + assert(read_lengths[0] == 0 || bufs[0] != NULL); + assert(read_lengths[1] == 0 || bufs[1] != NULL); + assert(read_lengths[2] == 0 || bufs[2] != NULL); + assert(read_lengths[3] == 0 || bufs[3] != NULL); + + if (brw->gen >= 8) { + BEGIN_BATCH(11); + OUT_BATCH(opcode << 16 | (11 - 2)); + OUT_BATCH(read_lengths[0] | read_lengths[1] << 16); + OUT_BATCH(read_lengths[2] | read_lengths[3] << 16); + OUT_BATCH64(offsets[0]); + OUT_PTR64(bufs[1], offsets[1]); + OUT_PTR64(bufs[2], offsets[2]); + OUT_PTR64(bufs[3], offsets[3]); + ADVANCE_BATCH(); + } else if (brw->gen == 7) { + /* From the Ivybridge PRM, Volume 2, Part 1, Page 112: + * "Constant buffers must be enabled in order from Constant Buffer 0 to + * Constant Buffer 3 within this command. For example, it is not + * allowed to enable Constant Buffer 1 by programming a non-zero value + * in the VS Constant Buffer 1 Read Length without a non-zero value in + * VS Constant Buffer 0 Read Length." + * + * Haswell removes this restriction. + */ + if (!brw->is_haswell) { + assert(read_lengths[3] == 0 || (read_lengths[2] > 0 && + read_lengths[1] > 0 && + read_lengths[0] > 0)); + assert(read_lengths[2] == 0 || (read_lengths[1] > 0 && + read_lengths[0] > 0)); + assert(read_lengths[1] == 0 || read_lengths[0] > 0); + } + + BEGIN_BATCH(7); + OUT_BATCH(opcode << 16 | (7 - 2)); + OUT_BATCH(read_lengths[0] | read_lengths[1] << 16); + OUT_BATCH(read_lengths[2] | read_lengths[3] << 16); + OUT_BATCH(offsets[0]); + OUT_PTR(bufs[1], offsets[1]); + OUT_PTR(bufs[2], offsets[2]); + OUT_PTR(bufs[3], offsets[3]); + ADVANCE_BATCH(); + } else if (brw->gen == 6) { + /* From the Sandybridge PRM, Volume 2, Part 1, Page 138: + * "The sum of all four read length fields (each incremented to + * represent the actual read length) must be less than or equal + * to 32." + */ + assert(read_lengths[0] + read_lengths[1] + + read_lengths[2] + read_lengths[3] < 32); + + BEGIN_BATCH(5); + OUT_BATCH(opcode << 16 | (5 - 2) | +(read_lengths[0] ? GEN6_CONSTANT_BUFFER_0_ENABLE : 0) | +(read_lengths[1] ? GEN6_CONSTANT_BUFFER_1_ENABLE : 0) | +(read_lengths[2] ? GEN6_CONSTANT_BUFFER_2_ENABLE : 0) | +(read_lengths[3] ? GEN6_CONSTANT_BUFFER_3_ENABLE : 0)); + OUT_BATCH(offsets[0] | (read_lengths[0] - 1)); + OUT_PTR(bufs[1], offsets[1] | (read_lengths[1] - 1)); + OUT_PTR(bufs[2], offsets[2] | (read_lengths[2] - 1)); + OUT_PTR(bufs[3], offsets[3] | (read_lengths[3] - 1)); + ADVANCE_BATCH(); + } else { + unreachable("unhandled gen in emit_3dstate_constant"); + } +} + void gen7_upload_constant_state(struct brw_context *brw, const stru
[Mesa-dev] [PATCH 1/2] i965: Add an OUT_BATCH64() macro.
This is more convenient than OUT_BATCH'ing both halves. Signed-off-by: Kenneth Graunke Cc: Ben Widawsky --- src/mesa/drivers/dri/i965/gen8_depth_state.c | 3 +-- src/mesa/drivers/dri/i965/gen8_ds_state.c | 3 +-- src/mesa/drivers/dri/i965/gen8_gs_state.c | 3 +-- src/mesa/drivers/dri/i965/gen8_hs_state.c | 3 +-- src/mesa/drivers/dri/i965/gen8_ps_state.c | 3 +-- src/mesa/drivers/dri/i965/gen8_vs_state.c | 3 +-- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 1 + 7 files changed, 7 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_depth_state.c b/src/mesa/drivers/dri/i965/gen8_depth_state.c index a7e61354fd5..c085246bc92 100644 --- a/src/mesa/drivers/dri/i965/gen8_depth_state.c +++ b/src/mesa/drivers/dri/i965/gen8_depth_state.c @@ -72,8 +72,7 @@ emit_depth_packets(struct brw_context *brw, OUT_RELOC64(depth_mt->bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, 0); } else { - OUT_BATCH(0); - OUT_BATCH(0); + OUT_BATCH64(0); } OUT_BATCH(((width - 1) << 4) | ((height - 1) << 18) | lod); OUT_BATCH(((depth - 1) << 21) | (min_array_element << 10) | mocs_wb); diff --git a/src/mesa/drivers/dri/i965/gen8_ds_state.c b/src/mesa/drivers/dri/i965/gen8_ds_state.c index ee2f82e1098..55738fd1ffc 100644 --- a/src/mesa/drivers/dri/i965/gen8_ds_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ds_state.c @@ -56,8 +56,7 @@ gen8_upload_ds_state(struct brw_context *brw) I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, ffs(stage_state->per_thread_scratch) - 11); } else { - OUT_BATCH(0); - OUT_BATCH(0); + OUT_BATCH64(0); } OUT_BATCH(SET_FIELD(prog_data->dispatch_grf_start_reg, GEN7_DS_DISPATCH_START_GRF) | diff --git a/src/mesa/drivers/dri/i965/gen8_gs_state.c b/src/mesa/drivers/dri/i965/gen8_gs_state.c index 2b74f1bd575..31c6f89bc13 100644 --- a/src/mesa/drivers/dri/i965/gen8_gs_state.c +++ b/src/mesa/drivers/dri/i965/gen8_gs_state.c @@ -63,8 +63,7 @@ gen8_upload_gs_state(struct brw_context *brw) I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, ffs(stage_state->per_thread_scratch) - 11); } else { - OUT_BATCH(0); - OUT_BATCH(0); + OUT_BATCH64(0); } /* DW6 */ diff --git a/src/mesa/drivers/dri/i965/gen8_hs_state.c b/src/mesa/drivers/dri/i965/gen8_hs_state.c index ee47e5e54a0..dbdd19b1f5c 100644 --- a/src/mesa/drivers/dri/i965/gen8_hs_state.c +++ b/src/mesa/drivers/dri/i965/gen8_hs_state.c @@ -57,8 +57,7 @@ gen8_upload_hs_state(struct brw_context *brw) I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, ffs(stage_state->per_thread_scratch) - 11); } else { - OUT_BATCH(0); - OUT_BATCH(0); + OUT_BATCH64(0); } OUT_BATCH(GEN7_HS_INCLUDE_VERTEX_HANDLES | SET_FIELD(prog_data->dispatch_grf_start_reg, diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c b/src/mesa/drivers/dri/i965/gen8_ps_state.c index 03468267ce6..9b1a78c6ee6 100644 --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c @@ -269,8 +269,7 @@ gen8_upload_ps_state(struct brw_context *brw, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, ffs(stage_state->per_thread_scratch) - 11); } else { - OUT_BATCH(0); - OUT_BATCH(0); + OUT_BATCH64(0); } OUT_BATCH(dw6); OUT_BATCH(dw7); diff --git a/src/mesa/drivers/dri/i965/gen8_vs_state.c b/src/mesa/drivers/dri/i965/gen8_vs_state.c index 7b66da4b17c..a2b08fe92a0 100644 --- a/src/mesa/drivers/dri/i965/gen8_vs_state.c +++ b/src/mesa/drivers/dri/i965/gen8_vs_state.c @@ -62,8 +62,7 @@ upload_vs_state(struct brw_context *brw) I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, ffs(stage_state->per_thread_scratch) - 11); } else { - OUT_BATCH(0); - OUT_BATCH(0); + OUT_BATCH64(0); } OUT_BATCH((prog_data->dispatch_grf_start_reg << diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.h b/src/mesa/drivers/dri/i965/intel_batchbuffer.h index bf7cadfc4d6..da8f7e561f4 100644 --- a/src/mesa/drivers/dri/i965/intel_batchbuffer.h +++ b/src/mesa/drivers/dri/i965/intel_batchbuffer.h @@ -161,6 +161,7 @@ intel_batchbuffer_advance(struct brw_context *brw) #define OUT_BATCH(d) *__map++ = (d) #define OUT_BATCH_F(f) OUT_BATCH(float_as_int((f))) +#define OUT_BATCH64(d) *((uint64_t *) __map) = (d); __map += 2 #define OUT_RELOC(buf, read_domains, write_domain, delta) do {\ uint32_t __offset = (__map - brw->batch.map) * 4; \ -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] i965/ps: Use ForceThreadDispatchEnable instead of AccessUAV.
On Monday, February 13, 2017 6:58:39 PM PST Jason Ekstrand wrote: > On Mon, Feb 13, 2017 at 6:35 PM, Francisco Jerez > wrote: > > > Jason Ekstrand writes: > > > > > The AccessUAV bit is not quite what we want because it's more about > > > coherency between storage operations than it is about dispatch. Also, > > > the 3DSTATE_PS_EXTRA::PixelShaderHasUAV bit seems to cause hangs on > > > Broadwell for unknown reasons so it's best to just leave it off. The > > > 3DSTATE_WM::ForceThreadDispatchEnable bit, on the other hand, does > > > exactly what we want and seems to work fine. > > > --- > > > src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ > > > src/mesa/drivers/dri/i965/gen8_ps_state.c | 52 > > ++- > > > 2 files changed, 19 insertions(+), 35 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > > b/src/mesa/drivers/dri/i965/brw_defines.h > > > index 3c5c6c4..9ad36ca 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_defines.h > > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > > > @@ -2733,6 +2733,8 @@ enum brw_barycentric_mode { > > > # define GEN7_WM_MSDISPMODE_PERSAMPLE(0 << 31) > > > # define GEN7_WM_MSDISPMODE_PERPIXEL (1 << 31) > > > # define HSW_WM_UAV_ONLY(1 << 30) > > > +# define GEN8_WM_FORCE_THREAD_DISPATCH_OFF (1 << 19) > > > +# define GEN8_WM_FORCE_THREAD_DISPATCH_ON (2 << 19) > > > > > > #define _3DSTATE_PS 0x7820 /* GEN7+ */ > > > /* DW1: kernel pointer */ > > > diff --git a/src/mesa/drivers/dri/i965/gen8_ps_state.c > > b/src/mesa/drivers/dri/i965/gen8_ps_state.c > > > index 0346826..cca57e6 100644 > > > --- a/src/mesa/drivers/dri/i965/gen8_ps_state.c > > > +++ b/src/mesa/drivers/dri/i965/gen8_ps_state.c > > > @@ -74,39 +74,6 @@ gen8_upload_ps_extra(struct brw_context *brw, > > > if (brw->gen >= 9 && prog_data->pulls_bary) > > >dw1 |= GEN9_PSX_SHADER_PULLS_BARY; > > > > > > - /* The stricter cross-primitive coherency guarantees that the > > hardware > > > -* gives us with the "Accesses UAV" bit set for at least one shader > > stage > > > -* and the "UAV coherency required" bit set on the 3DPRIMITIVE > > command are > > > -* redundant within the current image, atomic counter and SSBO GL > > APIs, > > > -* which all have very loose ordering and coherency requirements and > > > -* generally rely on the application to insert explicit barriers > > when a > > > -* shader invocation is expected to see the memory writes performed > > by the > > > -* invocations of some previous primitive. Regardless of the value > > of "UAV > > > -* coherency required", the "Accesses UAV" bits will implicitly > > cause an in > > > -* most cases useless DC flush when the lowermost stage with the bit > > set > > > -* finishes execution. > > > -* > > > -* It would be nice to disable it, but in some cases we can't > > because on > > > -* Gen8+ it also has an influence on rasterization via the PS > > UAV-only > > > -* signal (which could be set independently from the coherency > > mechanism in > > > -* the 3DSTATE_WM command on Gen7), and because in some cases it will > > > -* determine whether the hardware skips execution of the fragment > > shader or > > > -* not via the ThreadDispatchEnable signal. However if we know that > > > -* GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and > > > -* GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make > > any > > > -* difference so we may just disable it here. > > > -* > > > -* Gen8 hardware tries to compute ThreadDispatchEnable for us but > > doesn't > > > -* take into account KillPixels when no depth or stencil writes are > > enabled. > > > -* In order for occlusion queries to work correctly with no > > attachments, we > > > -* need to force-enable here. > > > -* > > > -* BRW_NEW_FS_PROG_DATA | BRW_NEW_FRAGMENT_PROGRAM | _NEW_BUFFERS | > > _NEW_COLOR > > > -*/ > > > - if ((prog_data->has_side_effects || prog_data->uses_kill) && > > > - !brw_color_buffer_write_enabled(brw)) > > > - dw1 |= GEN8_PSX_SHADER_HAS_UAV; > > > - > > > if (prog_data->computed_stencil) { > > >assert(brw->gen >= 9); > > >dw1 |= GEN9_PSX_SHADER_COMPUTES_STENCIL; > > > @@ -127,7 +94,6 @@ upload_ps_extra(struct brw_context *brw) > > > > > > const struct brw_tracked_state gen8_ps_extra = { > > > .dirty = { > > > - .mesa = _NEW_BUFFERS | _NEW_COLOR, > > >.brw = BRW_NEW_BLORP | > > > BRW_NEW_CONTEXT | > > > BRW_NEW_FRAGMENT_PROGRAM | > > > @@ -169,6 +135,20 @@ upload_wm_state(struct brw_context *brw) > > > else if (wm_prog_data->has_side_effects) > > >dw1 |= GEN7_WM_EARLY_DS_CONTROL_PSEXEC; > > > > > > + /* In most cases, the computation for the > > WM_INT::ThreadDispatchEnable > > > +
Re: [Mesa-dev] [PATCH 1/5] i965/fs: Fix the inline nir_op_pack_double optimization
On Tuesday, February 14, 2017 11:29:47 PM PST Jason Ekstrand wrote: > We can only do the optimization if the source *is* SSA. > > Cc: "13.0 17.0" > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 991c20f..94f2751 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -1219,7 +1219,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, > nir_alu_instr *instr) > * the unpack operation. > */ >for (int i = 0; i < 2; i++) { > - if (instr->src[i].src.is_ssa) > + if (!instr->src[i].src.is_ssa) > continue; > > const nir_instr *parent_instr = instr->src[i].src.ssa->parent_instr; > Series is: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Always expose GREMEDY_string_marker.
Equivalent marker functionality is already included in KHR_debug, which we already expose unconditionally in all drivers (dummy_true). Grim Fandango Remastered apparently calls glStringMarkerGREMEDY() without checking for the extension, spewing GL errors. Assuming the existence of the extension is definitely not valid, but it also seems kinda mean to spew GL errors when we could simply expose the feature and silently ignore the provided string markers. This patch enables GREMEDY_string_marker everywhere, and makes the calls no-ops if the driver doesn't provide the EmitStringMarker() hook, just like we did for the KHR_debug functionality. This may impact freedreno, which actually puts markers in its command buffers. Signed-off-by: Kenneth Graunke --- src/mesa/main/debug_output.c | 4 +--- src/mesa/main/extensions_table.h | 2 +- src/mesa/state_tracker/st_debug.c | 1 - src/mesa/state_tracker/st_debug.h | 3 +-- src/mesa/state_tracker/st_extensions.c | 4 5 files changed, 3 insertions(+), 11 deletions(-) diff --git a/src/mesa/main/debug_output.c b/src/mesa/main/debug_output.c index bc933db93d4..1d2dee128b4 100644 --- a/src/mesa/main/debug_output.c +++ b/src/mesa/main/debug_output.c @@ -1308,12 +1308,10 @@ void GLAPIENTRY _mesa_StringMarkerGREMEDY(GLsizei len, const GLvoid *string) { GET_CURRENT_CONTEXT(ctx); - if (ctx->Extensions.GREMEDY_string_marker) { + if (ctx->Driver.EmitStringMarker) { /* if length not specified, string will be null terminated: */ if (len <= 0) len = strlen(string); ctx->Driver.EmitStringMarker(ctx, string, len); - } else { - _mesa_error(ctx, GL_INVALID_OPERATION, "StringMarkerGREMEDY"); } } diff --git a/src/mesa/main/extensions_table.h b/src/mesa/main/extensions_table.h index 7ea56c8422d..ec48aadde3f 100644 --- a/src/mesa/main/extensions_table.h +++ b/src/mesa/main/extensions_table.h @@ -285,7 +285,7 @@ EXT(EXT_vertex_array, dummy_true EXT(EXT_vertex_array_bgra , EXT_vertex_array_bgra , GLL, GLC, x , x , 2008) EXT(EXT_window_rectangles , EXT_window_rectangles , GLL, GLC, x , 30, 2016) -EXT(GREMEDY_string_marker , GREMEDY_string_marker , GLL, GLC, x , x , 2007) +EXT(GREMEDY_string_marker , dummy_true , GLL, GLC, x , x , 2007) EXT(IBM_multimode_draw_arrays , dummy_true , GLL, GLC, x , x , 1998) EXT(IBM_rasterpos_clip , dummy_true , GLL, x , x , x , 1996) diff --git a/src/mesa/state_tracker/st_debug.c b/src/mesa/state_tracker/st_debug.c index d6cb5cd57d8..f2e982c8c7a 100644 --- a/src/mesa/state_tracker/st_debug.c +++ b/src/mesa/state_tracker/st_debug.c @@ -58,7 +58,6 @@ static const struct debug_named_value st_debug_flags[] = { { "buffer", DEBUG_BUFFER, NULL }, { "wf", DEBUG_WIREFRAME, NULL }, { "precompile", DEBUG_PRECOMPILE, NULL }, - { "gremedy", DEBUG_GREMEDY, "Enable GREMEDY debug extensions" }, { "noreadpixcache", DEBUG_NOREADPIXCACHE, NULL }, DEBUG_NAMED_VALUE_END }; diff --git a/src/mesa/state_tracker/st_debug.h b/src/mesa/state_tracker/st_debug.h index 6c1e915f68c..4b92a669a37 100644 --- a/src/mesa/state_tracker/st_debug.h +++ b/src/mesa/state_tracker/st_debug.h @@ -50,8 +50,7 @@ st_print_current(void); #define DEBUG_BUFFER0x200 #define DEBUG_WIREFRAME 0x400 #define DEBUG_PRECOMPILE 0x800 -#define DEBUG_GREMEDY 0x1000 -#define DEBUG_NOREADPIXCACHE 0x2000 +#define DEBUG_NOREADPIXCACHE 0x1000 #ifdef DEBUG extern int ST_DEBUG; diff --git a/src/mesa/state_tracker/st_extensions.c b/src/mesa/state_tracker/st_extensions.c index 37fe4469c37..d9057c77657 100644 --- a/src/mesa/state_tracker/st_extensions.c +++ b/src/mesa/state_tracker/st_extensions.c @@ -1167,10 +1167,6 @@ void st_init_extensions(struct pipe_screen *screen, extensions->ARB_vertex_attrib_64bit = GL_TRUE; } - if ((ST_DEBUG & DEBUG_GREMEDY) && - screen->get_param(screen, PIPE_CAP_STRING_MARKER)) - extensions->GREMEDY_string_marker = GL_TRUE; - if (screen->get_param(screen, PIPE_CAP_COMPUTE)) { int compute_supported_irs = screen->get_shader_param(screen, PIPE_SHADER_COMPUTE, -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] i965: Make a helper for emitting 3DSTATE_CONSTANT_XS packets.
On Wednesday, February 15, 2017 3:23:14 PM PST Jordan Justen wrote: > On 2017-02-14 13:45:49, Kenneth Graunke wrote: [snip] > > diff --git a/src/mesa/drivers/dri/i965/gen6_constant_state.c > > b/src/mesa/drivers/dri/i965/gen6_constant_state.c > > index 6c0c32b26f7..7e6fa92ecf2 100644 > > --- a/src/mesa/drivers/dri/i965/gen6_constant_state.c > > +++ b/src/mesa/drivers/dri/i965/gen6_constant_state.c > > @@ -27,6 +27,97 @@ > > #include "intel_batchbuffer.h" > > #include "program/prog_parameter.h" > > > > +#define F(RELOC, BATCH, buf, x) \ > > + if (buf) { \ > > + RELOC(buf, I915_GEM_DOMAIN_RENDER, 0, x); \ > > + } else { \ > > + BATCH(x); \ > > + } > > +#define OUT_PTR64(buf, x) F(OUT_RELOC64, OUT_BATCH64, buf, x) > > +#define OUT_PTR(buf, x) F(OUT_RELOC, OUT_BATCH, buf, x) > > Is there a better name than 'x'? Unfortunately, I couldn't think of a > suggestion. :) Probably...but I couldn't think of one. It's some bits that get OR'd in...and can be pretty arbitrary. OUT_RELOC calls it "delta", but I'm not sure that historical name makes much sense these days. OUT_BATCH uses "d" which is arguably not better. [snip] > I didn't see where the 'mocs' value usage moved to in the new code. > > -Jordan Good point. It's zero on Gen6 and Gen8, but I accidentally dropped GEN7_MOCS_L3 on Gen7-7.5. I'll fix that and send a v2. Thanks for reading carefully! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 2/2] i965: Make a helper for emitting 3DSTATE_CONSTANT_XS packets.
This separates the logic from filling out a 3DSTATE_CONSTANT_XS packet from the decisions about what to put in the various buffers. It also should make it easier to use more than one buffer, should we decide to do so. It also provides a nice place to enforce the various restrictions via assertions. By marking the helper as inline, the code for unused buffers should be constant folded away. v2: Don't lose MOCS setting on Gen7-7.5. Signed-off-by: Kenneth Graunke Signed-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/gen6_constant_state.c | 170 1 file changed, 117 insertions(+), 53 deletions(-) Patch 1/2 remains unchanged so I didn't bother to send it again diff --git a/src/mesa/drivers/dri/i965/gen6_constant_state.c b/src/mesa/drivers/dri/i965/gen6_constant_state.c index 6c0c32b26f7..83c1dbab97b 100644 --- a/src/mesa/drivers/dri/i965/gen6_constant_state.c +++ b/src/mesa/drivers/dri/i965/gen6_constant_state.c @@ -27,70 +27,134 @@ #include "intel_batchbuffer.h" #include "program/prog_parameter.h" +#define F(RELOC, BATCH, buf, x) \ + if (buf) { \ + RELOC(buf, I915_GEM_DOMAIN_RENDER, 0, x); \ + } else { \ + BATCH(x); \ + } +#define OUT_PTR64(buf, x) F(OUT_RELOC64, OUT_BATCH64, buf, x) +#define OUT_PTR(buf, x) F(OUT_RELOC, OUT_BATCH, buf, x) + +static inline void +emit_3dstate_constant(struct brw_context *brw, + uint32_t opcode, + drm_intel_bo *bufs[4], + uint16_t read_lengths[4], + uint64_t offsets[4]) +{ + /* Buffer 0 is relative to Dynamic State Base Address, which we program +* to the start of the batch buffer. All others are graphics virtual +* addresses regardless of the INSTPM settings. +*/ + assert(bufs[0] == NULL || bufs[0] == brw->batch.bo); + + assert(read_lengths[0] == 0 || bufs[0] != NULL); + assert(read_lengths[1] == 0 || bufs[1] != NULL); + assert(read_lengths[2] == 0 || bufs[2] != NULL); + assert(read_lengths[3] == 0 || bufs[3] != NULL); + + if (brw->gen >= 8) { + BEGIN_BATCH(11); + OUT_BATCH(opcode << 16 | (11 - 2)); + OUT_BATCH(read_lengths[0] | read_lengths[1] << 16); + OUT_BATCH(read_lengths[2] | read_lengths[3] << 16); + OUT_BATCH64(offsets[0]); + OUT_PTR64(bufs[1], offsets[1]); + OUT_PTR64(bufs[2], offsets[2]); + OUT_PTR64(bufs[3], offsets[3]); + ADVANCE_BATCH(); + } else if (brw->gen == 7) { + /* From the Ivybridge PRM, Volume 2, Part 1, Page 112: + * "Constant buffers must be enabled in order from Constant Buffer 0 to + * Constant Buffer 3 within this command. For example, it is not + * allowed to enable Constant Buffer 1 by programming a non-zero value + * in the VS Constant Buffer 1 Read Length without a non-zero value in + * VS Constant Buffer 0 Read Length." + * + * Haswell removes this restriction. + */ + if (!brw->is_haswell) { + assert(read_lengths[3] == 0 || (read_lengths[2] > 0 && + read_lengths[1] > 0 && + read_lengths[0] > 0)); + assert(read_lengths[2] == 0 || (read_lengths[1] > 0 && + read_lengths[0] > 0)); + assert(read_lengths[1] == 0 || read_lengths[0] > 0); + } + + BEGIN_BATCH(7); + OUT_BATCH(opcode << 16 | (7 - 2)); + OUT_BATCH(read_lengths[0] | read_lengths[1] << 16); + OUT_BATCH(read_lengths[2] | read_lengths[3] << 16); + OUT_BATCH(offsets[0] | GEN7_MOCS_L3); + OUT_PTR(bufs[1], offsets[1]); + OUT_PTR(bufs[2], offsets[2]); + OUT_PTR(bufs[3], offsets[3]); + ADVANCE_BATCH(); + } else if (brw->gen == 6) { + /* From the Sandybridge PRM, Volume 2, Part 1, Page 138: + * "The sum of all four read length fields (each incremented to + * represent the actual read length) must be less than or equal + * to 32." + */ + assert(read_lengths[0] + read_lengths[1] + + read_lengths[2] + read_lengths[3] < 32); + + BEGIN_BATCH(5); + OUT_BATCH(opcode << 16 | (5 - 2) | +(read_lengths[0] ? GEN6_CONSTANT_BUFFER_0_ENABLE : 0) | +(read_lengths[1] ? GEN6_CONSTANT_BUFFER_1_ENABLE : 0) | +(read_lengths[2] ? GEN6_CONSTANT_BUFFER_2_ENABLE : 0) | +(read_lengths[3] ? GEN6_CONSTANT_BUFFER_3_ENABLE : 0)); + OUT_BATCH(offsets[0] | (read_lengths[0] - 1)); + OUT_PTR(bufs[1], offsets[1] | (read_lengths[1] - 1)); + OUT_PTR(bufs[2], offsets[2] | (read_lengths[2] - 1)); + OUT_PTR(bufs[3], offsets[3] | (read_lengths[3] - 1)); + ADVANCE_BATCH(); + } else { + unreachable("unhandled gen in emit_3dstate_constant"); + } +} + voi
Re: [Mesa-dev] [PATCH shader-db 3/4] run: set INTEL_NO_HW together with INTEL_DEVID_OVERRIDE
On Thursday, February 16, 2017 4:29:50 AM PST Lionel Landwerlin wrote: > Since we're already asking the driver to generate code for a different > hardware than what we're running on, better not even bother with emitting > any batch. > > Signed-off-by: Lionel Landwerlin > --- > run.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/run.c b/run.c > index 62c19c8..7543b2a 100644 > --- a/run.c > +++ b/run.c > @@ -370,6 +370,7 @@ main(int argc, char **argv) > > printf("### Compiling for %s ###\n", platform->name); > setenv("INTEL_DEVID_OVERRIDE", platform->pci_id, 1); > +setenv("INTEL_NO_HW", "1", 1); > break; > } > case 'j': > I don't think you need this patch - libdrm will already not execute batches if INTEL_DEVID_OVERRIDE is used to force a PCI ID that doesn't match the one on the system. Unless the fake PCI ID happens to match the one you're compiling for... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] i965: Update brw_save_primitives_written_counters for pre-Gen7.
Sandybridge and earlier only have a single counter. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/gen6_sol.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c index 41158bd580c..8adac92d07d 100644 --- a/src/mesa/drivers/dri/i965/gen6_sol.c +++ b/src/mesa/drivers/dri/i965/gen6_sol.c @@ -297,11 +297,17 @@ brw_save_primitives_written_counters(struct brw_context *brw, brw_emit_mi_flush(brw); /* Emit MI_STORE_REGISTER_MEM commands to write the values. */ - for (int i = 0; i < streams; i++) { - int offset = (obj->prim_count_buffer_index + i) * sizeof(uint64_t); + if (brw->gen >= 7) { + for (int i = 0; i < streams; i++) { + int offset = (obj->prim_count_buffer_index + i) * sizeof(uint64_t); + brw_store_register_mem64(brw, obj->prim_count_bo, + GEN7_SO_NUM_PRIMS_WRITTEN(i), + offset); + } + } else { brw_store_register_mem64(brw, obj->prim_count_bo, - GEN7_SO_NUM_PRIMS_WRITTEN(i), - offset); + GEN6_SO_NUM_PRIMS_WRITTEN, + obj->prim_count_buffer_index * sizeof(uint64_t)); } /* Update where to write data to. */ -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/7] i965: Properly reset SVBI counters on ResumeTransformFeedback().
This fixes Piglit's ARB_transform_feedback2/change-objects-while-paused GLES 3.0 test. When resuming the transform feedback object, we need to reset the SVBI counters so we continue writing at the correct point in the buffer. Instead of SO_WRITE_OFFSET counters (with a DWord offset), we have the Streamed Vertex Buffer Index (SVBI) counters, which contain a count of vertices emitted. Unfortunately, there's no straightforward way to store the current SVBI counter values to a buffer. They're not available in a register. You can use a bit in the 3DSTATE_SVB_INDEX packet to copy them to another internal counter which 3DPRIMITIVE can use...but there's no good way to extract that either. So, once again, we use SO_NUM_PRIMS_WRITTEN to calculate the vertex numbers. Thankfully, we can reuse most of the existing Gen7+ code. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.c | 2 + src/mesa/drivers/dri/i965/brw_context.h | 6 ++ src/mesa/drivers/dri/i965/gen6_sol.c| 116 +++- 3 files changed, 107 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 7240b1f4455..977c59c1c3e 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -481,6 +481,8 @@ brw_init_driver_functions(struct brw_context *brw, } else { functions->BeginTransformFeedback = brw_begin_transform_feedback; functions->EndTransformFeedback = brw_end_transform_feedback; + functions->PauseTransformFeedback = brw_pause_transform_feedback; + functions->ResumeTransformFeedback = brw_resume_transform_feedback; } if (brw->gen >= 6) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 25c90645cea..df406c0d772 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1471,6 +1471,12 @@ void brw_end_transform_feedback(struct gl_context *ctx, struct gl_transform_feedback_object *obj); void +brw_pause_transform_feedback(struct gl_context *ctx, + struct gl_transform_feedback_object *obj); +void +brw_resume_transform_feedback(struct gl_context *ctx, + struct gl_transform_feedback_object *obj); +void brw_save_primitives_written_counters(struct brw_context *brw, struct brw_transform_feedback_object *obj); void diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c index f1cc2d59fd4..702cb2830f0 100644 --- a/src/mesa/drivers/dri/i965/gen6_sol.c +++ b/src/mesa/drivers/dri/i965/gen6_sol.c @@ -314,18 +314,12 @@ brw_save_primitives_written_counters(struct brw_context *brw, obj->prim_count_buffer_index += streams; } -/** - * Compute the number of vertices written by this transform feedback operation. - */ -void -brw_compute_xfb_vertices_written(struct brw_context *brw, - struct brw_transform_feedback_object *obj) +static void +compute_vertices_written_so_far(struct brw_context *brw, +struct brw_transform_feedback_object *obj, +uint64_t *vertices_written) { const struct gl_context *ctx = &brw->ctx; - - if (obj->vertices_written_valid || !obj->base.EndedAnytime) - return; - unsigned vertices_per_prim = 0; switch (obj->primitive_mode) { @@ -346,8 +340,22 @@ brw_compute_xfb_vertices_written(struct brw_context *brw, tally_prims_generated(brw, obj); for (int i = 0; i < ctx->Const.MaxVertexStreams; i++) { - obj->vertices_written[i] = vertices_per_prim * obj->prims_generated[i]; + vertices_written[i] = vertices_per_prim * obj->prims_generated[i]; } +} + +/** + * Compute the number of vertices written by this transform feedback operation. + */ +void +brw_compute_xfb_vertices_written(struct brw_context *brw, + struct brw_transform_feedback_object *obj) +{ + if (obj->vertices_written_valid || !obj->base.EndedAnytime) + return; + + compute_vertices_written_so_far(brw, obj, obj->vertices_written); + obj->vertices_written_valid = true; } @@ -423,18 +431,92 @@ brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode, OUT_BATCH(0x); ADVANCE_BATCH(); } + + /* We're about to lose the information needed to compute the number of +* vertices written during the last Begin/EndTransformFeedback section, +* so we can't delay it any further. +*/ + brw_compute_xfb_vertices_written(brw, brw_obj); + + /* No primitives have been generated yet. */ + for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) { + brw_obj->prims_generated[i] = 0; + } + + /* Store the starting
[Mesa-dev] [PATCH 2/7] i965: Move some code from gen7_sol_state.c to gen6_sol.c.
I plan to use these functions on Sandybridge soon. I changed the prefix on a couple of functions to "brw" instead of "gen7" as in theory they should be usable all the way back to G45. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.h| 6 ++ src/mesa/drivers/dri/i965/gen6_sol.c | 140 +++ src/mesa/drivers/dri/i965/gen7_sol_state.c | 148 + 3 files changed, 150 insertions(+), 144 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 01e651b09f0..8d9a75f884b 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1464,6 +1464,12 @@ brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode, void brw_end_transform_feedback(struct gl_context *ctx, struct gl_transform_feedback_object *obj); +void +brw_save_primitives_written_counters(struct brw_context *brw, + struct brw_transform_feedback_object *obj); +void +brw_compute_xfb_vertices_written(struct brw_context *brw, + struct brw_transform_feedback_object *obj); GLsizei brw_get_transform_feedback_vertex_count(struct gl_context *ctx, struct gl_transform_feedback_object *obj, diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c index 6f1d2c2fd04..ca06194ba15 100644 --- a/src/mesa/drivers/dri/i965/gen6_sol.c +++ b/src/mesa/drivers/dri/i965/gen6_sol.c @@ -220,6 +220,146 @@ brw_delete_transform_feedback(struct gl_context *ctx, free(brw_obj); } +/** + * Tally the number of primitives generated so far. + * + * The buffer contains a series of pairs: + * (, ) ; + * (, ) ; + * + * For each stream, we subtract the pair of values (end - start) to get the + * number of primitives generated during one section. We accumulate these + * values, adding them up to get the total number of primitives generated. + */ +static void +tally_prims_generated(struct brw_context *brw, + struct brw_transform_feedback_object *obj) +{ + /* If the current batch is still contributing to the number of primitives +* generated, flush it now so the results will be present when mapped. +*/ + if (drm_intel_bo_references(brw->batch.bo, obj->prim_count_bo)) + intel_batchbuffer_flush(brw); + + if (unlikely(brw->perf_debug && drm_intel_bo_busy(obj->prim_count_bo))) + perf_debug("Stalling for # of transform feedback primitives written.\n"); + + drm_intel_bo_map(obj->prim_count_bo, false); + uint64_t *prim_counts = obj->prim_count_bo->virtual; + + assert(obj->prim_count_buffer_index % (2 * BRW_MAX_XFB_STREAMS) == 0); + int pairs = obj->prim_count_buffer_index / (2 * BRW_MAX_XFB_STREAMS); + + for (int i = 0; i < pairs; i++) { + for (int s = 0; s < BRW_MAX_XFB_STREAMS; s++) { + obj->prims_generated[s] += +prim_counts[BRW_MAX_XFB_STREAMS + s] - prim_counts[s]; + } + prim_counts += 2 * BRW_MAX_XFB_STREAMS; /* move to the next pair */ + } + + drm_intel_bo_unmap(obj->prim_count_bo); + + /* We've already gathered up the old data; we can safely overwrite it now. */ + obj->prim_count_buffer_index = 0; +} + +/** + * Store the SO_NUM_PRIMS_WRITTEN counters for each stream (4 uint64_t values) + * to prim_count_bo. + * + * If prim_count_bo is out of space, gather up the results so far into + * prims_generated[] and allocate a new buffer with enough space. + * + * The number of primitives written is used to compute the number of vertices + * written to a transform feedback stream, which is required to implement + * DrawTransformFeedback(). + */ +void +brw_save_primitives_written_counters(struct brw_context *brw, + struct brw_transform_feedback_object *obj) +{ + const int streams = BRW_MAX_XFB_STREAMS; + + /* Check if there's enough space for a new pair of four values. */ + if (obj->prim_count_bo != NULL && + obj->prim_count_buffer_index + 2 * streams >= 4096 / sizeof(uint64_t)) { + /* Gather up the results so far and release the BO. */ + tally_prims_generated(brw, obj); + } + + /* Flush any drawing so that the counters have the right values. */ + brw_emit_mi_flush(brw); + + /* Emit MI_STORE_REGISTER_MEM commands to write the values. */ + for (int i = 0; i < streams; i++) { + int offset = (obj->prim_count_buffer_index + i) * sizeof(uint64_t); + brw_store_register_mem64(brw, obj->prim_count_bo, + GEN7_SO_NUM_PRIMS_WRITTEN(i), + offset); + } + + /* Update where to write data to. */ + obj->prim_count_buffer_index += streams; +} + +/** + * Compute the number of vert
[Mesa-dev] [PATCH 5/7] i965: Save max_index in brw_transform_feedback_object.
I'm going to need this in a new Resume hook shortly. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.h | 6 ++ src/mesa/drivers/dri/i965/gen6_sol.c| 6 -- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 8d9a75f884b..25c90645cea 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -567,6 +567,12 @@ struct brw_transform_feedback_object { GLenum primitive_mode; /** +* The maximum number of vertices that we can write without overflowing +* any of the buffers currently being used for transform feedback. +*/ + unsigned max_index; + + /** * Count of primitives generated during this transform feedback operation. * @{ */ diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c index 8adac92d07d..f1cc2d59fd4 100644 --- a/src/mesa/drivers/dri/i965/gen6_sol.c +++ b/src/mesa/drivers/dri/i965/gen6_sol.c @@ -382,6 +382,8 @@ brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode, const struct gl_transform_feedback_info *linked_xfb_info; struct gl_transform_feedback_object *xfb_obj = ctx->TransformFeedback.CurrentObject; + struct brw_transform_feedback_object *brw_obj = + (struct brw_transform_feedback_object *) xfb_obj; assert(brw->gen == 6); @@ -397,7 +399,7 @@ brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode, /* Compute the maximum number of vertices that we can write without * overflowing any of the buffers currently being used for feedback. */ - unsigned max_index + brw_obj->max_index = _mesa_compute_max_transform_feedback_vertices(ctx, xfb_obj, linked_xfb_info); @@ -406,7 +408,7 @@ brw_begin_transform_feedback(struct gl_context *ctx, GLenum mode, OUT_BATCH(_3DSTATE_GS_SVB_INDEX << 16 | (4 - 2)); OUT_BATCH(0); /* SVBI 0 */ OUT_BATCH(0); /* starting index */ - OUT_BATCH(max_index); + OUT_BATCH(brw_obj->max_index); ADVANCE_BATCH(); /* Initialize the rest of the unused streams to sane values. Otherwise, -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] i965: Drop dead Gen8+ code from Gen7/sometimes-HSW driver hooks.
These driver hooks are not used when MI_MATH and MI_LOAD_REGISTER_REG are supported, which Gen8+ can always do. So this code is dead. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/gen7_sol_state.c | 50 ++ 1 file changed, 24 insertions(+), 26 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen7_sol_state.c b/src/mesa/drivers/dri/i965/gen7_sol_state.c index e6b79ed2342..50631610e51 100644 --- a/src/mesa/drivers/dri/i965/gen7_sol_state.c +++ b/src/mesa/drivers/dri/i965/gen7_sol_state.c @@ -490,13 +490,11 @@ gen7_begin_transform_feedback(struct gl_context *ctx, GLenum mode, struct brw_transform_feedback_object *brw_obj = (struct brw_transform_feedback_object *) obj; + assert(brw->gen == 7); + /* Reset the SO buffer offsets to 0. */ - if (brw->gen >= 8) { - brw_obj->zero_offsets = true; - } else { - intel_batchbuffer_flush(brw); - brw->batch.needs_sol_reset = true; - } + intel_batchbuffer_flush(brw); + brw->batch.needs_sol_reset = true; /* We're about to lose the information needed to compute the number of * vertices written during the last Begin/EndTransformFeedback section, @@ -552,17 +550,17 @@ gen7_pause_transform_feedback(struct gl_context *ctx, /* Flush any drawing so that the counters have the right values. */ brw_emit_mi_flush(brw); + assert(brw->gen == 7); + /* Save the SOL buffer offset register values. */ - if (brw->gen < 8) { - for (int i = 0; i < 4; i++) { - BEGIN_BATCH(3); - OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2)); - OUT_BATCH(GEN7_SO_WRITE_OFFSET(i)); - OUT_RELOC(brw_obj->offset_bo, - I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, - i * sizeof(uint32_t)); - ADVANCE_BATCH(); - } + for (int i = 0; i < 4; i++) { + BEGIN_BATCH(3); + OUT_BATCH(MI_STORE_REGISTER_MEM | (3 - 2)); + OUT_BATCH(GEN7_SO_WRITE_OFFSET(i)); + OUT_RELOC(brw_obj->offset_bo, +I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, +i * sizeof(uint32_t)); + ADVANCE_BATCH(); } /* Store the temporary ending value of the SO_NUM_PRIMS_WRITTEN counters. @@ -581,17 +579,17 @@ gen7_resume_transform_feedback(struct gl_context *ctx, struct brw_transform_feedback_object *brw_obj = (struct brw_transform_feedback_object *) obj; + assert(brw->gen == 7); + /* Reload the SOL buffer offset registers. */ - if (brw->gen < 8) { - for (int i = 0; i < 4; i++) { - BEGIN_BATCH(3); - OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2)); - OUT_BATCH(GEN7_SO_WRITE_OFFSET(i)); - OUT_RELOC(brw_obj->offset_bo, - I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, - i * sizeof(uint32_t)); - ADVANCE_BATCH(); - } + for (int i = 0; i < 4; i++) { + BEGIN_BATCH(3); + OUT_BATCH(GEN7_MI_LOAD_REGISTER_MEM | (3 - 2)); + OUT_BATCH(GEN7_SO_WRITE_OFFSET(i)); + OUT_RELOC(brw_obj->offset_bo, +I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION, +i * sizeof(uint32_t)); + ADVANCE_BATCH(); } /* Store the new starting value of the SO_NUM_PRIMS_WRITTEN counters. */ -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] i965: Enable ARB_transform_feedback2 on Sandybridge.
The only feature over and above ES 3.0 is DrawTransformFeedback(). We already have to do the whole SOL_NUM_PRIMS_WRITTEN counter dance in order to compute the SVBI value for ResumeTransformFeedback(), at which point our existing GetTransformFeedbackVertexCount() implementation will do the trick (though with a stall to CPU map the buffer). Someday, we could probably implement DrawTransformFeedback() more efficiently, using the "Load Internal Vertex Count" feature of 3DSTATE_SVB_INDEX and the 3DPRIMITIVE indirect vertex count bit. Rumor has it this allows people to use WebGL 2.0 on Sandybridge. Note that we don't need pipelined register writes like Gen7+ because we use the 3DSTATE_SVB_INDEX command rather than MI_LOAD_REGISTER_MEM. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99842 Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_context.c | 2 ++ src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++ 2 files changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c index 977c59c1c3e..17800e3fd60 100644 --- a/src/mesa/drivers/dri/i965/brw_context.c +++ b/src/mesa/drivers/dri/i965/brw_context.c @@ -483,6 +483,8 @@ brw_init_driver_functions(struct brw_context *brw, functions->EndTransformFeedback = brw_end_transform_feedback; functions->PauseTransformFeedback = brw_pause_transform_feedback; functions->ResumeTransformFeedback = brw_resume_transform_feedback; + functions->GetTransformFeedbackVertexCount = + brw_get_transform_feedback_vertex_count; } if (brw->gen >= 6) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index f1290bf7b49..ec7ff02be84 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drivers/dri/i965/intel_extensions.c @@ -158,6 +158,9 @@ intelInitExtensions(struct gl_context *ctx) ctx->Extensions.EXT_timer_query = true; } + if (brw->gen == 6) + ctx->Extensions.ARB_transform_feedback2 = true; + if (brw->gen >= 6) { ctx->Extensions.ARB_blend_func_extended = !driQueryOptionb(&brw->optionCache, "disable_blend_func_extended"); -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] i965: Use ctx->Const.MaxVertexStreams rather than BRW_XFB_MAX_STREAMS.
This way on Sandybridge we'll only do 1 stream worth of math, since we only have one SO_NUM_PRIMS_WRITTEN counter. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/gen6_sol.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen6_sol.c b/src/mesa/drivers/dri/i965/gen6_sol.c index ca06194ba15..41158bd580c 100644 --- a/src/mesa/drivers/dri/i965/gen6_sol.c +++ b/src/mesa/drivers/dri/i965/gen6_sol.c @@ -230,11 +230,16 @@ brw_delete_transform_feedback(struct gl_context *ctx, * For each stream, we subtract the pair of values (end - start) to get the * number of primitives generated during one section. We accumulate these * values, adding them up to get the total number of primitives generated. + * + * Note that we expose one stream pre-Gen7, so the above is just (start, end). */ static void tally_prims_generated(struct brw_context *brw, struct brw_transform_feedback_object *obj) { + const struct gl_context *ctx = &brw->ctx; + const int streams = ctx->Const.MaxVertexStreams; + /* If the current batch is still contributing to the number of primitives * generated, flush it now so the results will be present when mapped. */ @@ -247,15 +252,14 @@ tally_prims_generated(struct brw_context *brw, drm_intel_bo_map(obj->prim_count_bo, false); uint64_t *prim_counts = obj->prim_count_bo->virtual; - assert(obj->prim_count_buffer_index % (2 * BRW_MAX_XFB_STREAMS) == 0); - int pairs = obj->prim_count_buffer_index / (2 * BRW_MAX_XFB_STREAMS); + assert(obj->prim_count_buffer_index % (2 * streams) == 0); + int pairs = obj->prim_count_buffer_index / (2 * streams); for (int i = 0; i < pairs; i++) { - for (int s = 0; s < BRW_MAX_XFB_STREAMS; s++) { - obj->prims_generated[s] += -prim_counts[BRW_MAX_XFB_STREAMS + s] - prim_counts[s]; + for (int s = 0; s < streams; s++) { + obj->prims_generated[s] += prim_counts[streams + s] - prim_counts[s]; } - prim_counts += 2 * BRW_MAX_XFB_STREAMS; /* move to the next pair */ + prim_counts += 2 * streams; /* move to the next pair */ } drm_intel_bo_unmap(obj->prim_count_bo); @@ -279,7 +283,8 @@ void brw_save_primitives_written_counters(struct brw_context *brw, struct brw_transform_feedback_object *obj) { - const int streams = BRW_MAX_XFB_STREAMS; + const struct gl_context *ctx = &brw->ctx; + const int streams = ctx->Const.MaxVertexStreams; /* Check if there's enough space for a new pair of four values. */ if (obj->prim_count_bo != NULL && @@ -310,6 +315,8 @@ void brw_compute_xfb_vertices_written(struct brw_context *brw, struct brw_transform_feedback_object *obj) { + const struct gl_context *ctx = &brw->ctx; + if (obj->vertices_written_valid || !obj->base.EndedAnytime) return; @@ -332,7 +339,7 @@ brw_compute_xfb_vertices_written(struct brw_context *brw, /* Get the number of primitives generated. */ tally_prims_generated(brw, obj); - for (int i = 0; i < BRW_MAX_XFB_STREAMS; i++) { + for (int i = 0; i < ctx->Const.MaxVertexStreams; i++) { obj->vertices_written[i] = vertices_per_prim * obj->prims_generated[i]; } obj->vertices_written_valid = true; @@ -354,7 +361,7 @@ brw_get_transform_feedback_vertex_count(struct gl_context *ctx, (struct brw_transform_feedback_object *) obj; assert(obj->EndedAnytime); - assert(stream < BRW_MAX_XFB_STREAMS); + assert(stream < ctx->Const.MaxVertexStreams); brw_compute_xfb_vertices_written(brw, brw_obj); return brw_obj->vertices_written[stream]; -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vbo: kill primitive restart lowering in glDrawArrays
On Monday, February 20, 2017 10:35:36 AM PST Marek Olšák wrote: > From: Marek Olšák > > --- > src/mesa/vbo/vbo_exec_array.c | 56 > ++- > 1 file changed, 7 insertions(+), 49 deletions(-) > > diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c > index 6a96167..30c52d5 100644 > --- a/src/mesa/vbo/vbo_exec_array.c > +++ b/src/mesa/vbo/vbo_exec_array.c > @@ -404,77 +404,35 @@ vbo_bind_arrays(struct gl_context *ctx) > */ > static void > vbo_draw_arrays(struct gl_context *ctx, GLenum mode, GLint start, > GLsizei count, GLuint numInstances, GLuint baseInstance) > { > struct vbo_context *vbo = vbo_context(ctx); > struct _mesa_prim prim[2]; > > vbo_bind_arrays(ctx); > > - /* init most fields to zero */ > + /* OpenGL 4.5 says that primitive restart is ignored with non-indexed > +* draws. > +*/ Cool! Yes it does. ES 3.2 also seems to agree. i965 hardware doesn't support primitive restart on non-indexed draws (yet). Apparently neither does AMD hardware. NVIDIA does. So both of us would need this lowering...if it were required. But it isn't, so goodbye code! :) Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 0/6] Add support for ARB_transform_feedback_overflow_query.
On Friday, January 20, 2017 9:53:21 AM PST Rafael Antognolli wrote: > This patch series implements the ARB_transform_feedback_overflow_query > extension for i965. > > Changes for v4: > - Reuse of MI_MATH calcs from hsw_queryobj.c in brw_conditional_render.c > - Renamed a couple functions as suggested by Kenneth > - Fallback to CPU-side conditional rendering if MI_MATH is not available. > > The series is available on github here: > > https://github.com/rantogno/mesa/tree/review/overflow_query-v04 > > There are also piglit tests available for it here: > > https://github.com/rantogno/piglit/tree/review/overflow_query-v05 > > Regards, > Rafael Looks great! Series is: Reviewed-by: Kenneth Graunke I'm guessing you don't have commit access, so I'll try and push it... signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] Model INTEL perf query backend after query object BE
ryData)(struct gl_context *ctx, > +struct gl_perf_query_object *obj, > +GLsizei dataSize, > +GLuint *data, > +GLuint *bytesWritten); > + /*@}*/ > + > + > + /** > * \name GREMEDY debug/marker functions > */ > /*@{*/ > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index f3a24df589..e6cf1f4af6 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -1860,6 +1860,23 @@ struct gl_perf_monitor_group > > > /** > + * A query object instance as described in INTEL_performance_query. > + * > + * NB: We want to keep this and the corresponding backend structure > + * relatively lean considering that applications may expect to > + * allocate enough objects to be able to query around all draw calls > + * in a frame. > + */ > +struct gl_perf_query_object > +{ > + GLuint Id;/**< hash table ID/name */ > + GLuint Used:1;/**< has been used for 1 or more queries */ > + GLuint Active:1; /**< inside Begin/EndPerfQuery */ > + GLuint Ready:1; /**< result is ready? */ Please use "unsigned Id" and "bool Used:1" - we're trying to get away from GL type aliases when not directly API-facing. [snip] > - /* The specification does not state that this produces an error. */ > + /* The specification does not state that this produces an error but > +* to be consistent with glGetFirstPerfQueryIdINTEL we generate an > +* INVALID_VALUE error */ */ goes on its own line. With the "mesa: ..." prefix added, patches 1-2 are (weakly): Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] i965: Implement INTEL_performance_query backend
On Thursday, February 16, 2017 5:20:37 AM PST Robert Bragg wrote: [snip] > + switch(obj->query->kind) { Space after "switch" please. Patch 3 is: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] Model INTEL perf query backend after query object BE
On Wednesday, February 22, 2017 10:35:24 AM PST Robert Bragg wrote: > On Wed, Feb 22, 2017 at 1:24 AM, Kenneth Graunke > wrote: > > On Wednesday, February 15, 2017 1:37:36 PM PST Robert Bragg wrote: [snip] > >> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > >> index f3a24df589..e6cf1f4af6 100644 > >> --- a/src/mesa/main/mtypes.h > >> +++ b/src/mesa/main/mtypes.h > >> @@ -1860,6 +1860,23 @@ struct gl_perf_monitor_group > >> > >> > >> /** > >> + * A query object instance as described in INTEL_performance_query. > >> + * > >> + * NB: We want to keep this and the corresponding backend structure > >> + * relatively lean considering that applications may expect to > >> + * allocate enough objects to be able to query around all draw calls > >> + * in a frame. > >> + */ > >> +struct gl_perf_query_object > >> +{ > >> + GLuint Id;/**< hash table ID/name */ > >> + GLuint Used:1;/**< has been used for 1 or more queries */ > >> + GLuint Active:1; /**< inside Begin/EndPerfQuery */ > >> + GLuint Ready:1; /**< result is ready? */ > > > > Please use "unsigned Id" and "bool Used:1" - we're trying to get away > > from GL type aliases when not directly API-facing. > > Ah right I was generally aware of that but doing a skimming everything > with this in mind I found a few other little bits to clean up though I > ended having some second thoughts about these particular members: > > This Id is a record of the GLuint ID given to the application, just > used for debugging currently. The value is returned by > _mesa_HashFindFreeKeyBlock() which is currently implemented in terms > of the GLuint type. One other place where we access the same ID for > debugging is via _mesa_HashWalk() which takes a callback expecting a > GLuint argument. I can still change, but when I thought about this it > felt like it was indeed a directly api facing value. > > For the bitfields I started over thinking what it means to have a bool > bitfield since I doubted whether it could be assumed to be unsigned > and then wondered about the potential for a bug with some code trying > to compare a bitfield value == 1, or indexing an array. Does Mesa > require a c99 compiler, otherwise I don't think it's unheard of for > bool to end up as a signed int typedef. Anyway, besides the > overly-pedantic thought, I guessed you wouldn't really mind me using > "unsigned Used:1;" for the sake of avoiding GLuint. I don't think it > would make a practical difference since the struct will be naturally > padded to 8 bytes in all likelyhood either way. I'll prod you on IRC > to check if you really have a strong opinion here before I push. We assume bool types become 0 or 1 when cast to integers, as guaranteed by C99. There are existing examples of bool:1 in NIR, i965, and vc4. I think it should be OK in Mesa core. But, feel free to use unsigned. Or GLuint when you have a rationale for doing so, as above. A lot of people just use GL types everywhere for no particular reason, but it makes some sense here. --Ken signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Raise a link error for non-SSO ES programs with a TES but no TCS.
OpenGL allows the TCS to be missing and supplies an implicit passthrough shader, but OpenGL ES does not. One open question is how to handle this for ARB_ES3_2_compatibility. This patch raises the link error for all ES shading language programs, but it might make sense to base it on the API. The approach taken in this patch is more restrictive, but should still allow any valid ES programs to work in GL. Signed-off-by: Kenneth Graunke --- src/compiler/glsl/linker.cpp | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp index b6f8bc4212e..9e68c6e2d71 100644 --- a/src/compiler/glsl/linker.cpp +++ b/src/compiler/glsl/linker.cpp @@ -4743,6 +4743,16 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog) "tessellation evaluation shader\n"); goto done; } + + if (prog->IsES) { + if (num_shaders[MESA_SHADER_TESS_EVAL] > 0 && + num_shaders[MESA_SHADER_TESS_CTRL] == 0) { +linker_error(prog, "GLSL ES requires non-separable programs " + "containing a tessellation evaluation shader to also " + "be linked with a tessellation control shader\n"); +goto done; + } + } } /* Compute shaders have additional restrictions. */ -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Use _mesa_has_OES_geometry_shader() when validating draws
On Thursday, February 23, 2017 12:05:18 AM PST Tomasz Figa wrote: > In validate_DrawElements_common() we need to check for OES_geometry_shader > extension to determine if we should fail if transform feedback is > unpaused. However current code reads ctx->Extensions.OES_geometry_shader > directly, which does not take context version into account. This means > that if the context is GLES 3.0, which makes the OES_geometry_shader > inapplicable, we would not validate the draw properly. To fix it, let's > replace the check with a call to _mesa_has_OES_geometry_shader(). > > Fixes following dEQP tests on i965 with a GLES 3.0 context: > > dEQP-GLES3.functional.negative_api.vertex_array#draw_elements > dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_incomplete_primitive > dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced > dEQP-GLES3.functional.negative_api.vertex_array#draw_elements_instanced_incomplete_primitive > dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements > dEQP-GLES3.functional.negative_api.vertex_array#draw_range_elements_incomplete_primitive > > Change-Id: Iebc960b479fcd5f6c2b1501cb3e7798b575e6c4d > Signed-off-by: Tomasz Figa > --- > src/mesa/main/api_validate.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c > index 1e8a714067..184bf143ed 100644 > --- a/src/mesa/main/api_validate.c > +++ b/src/mesa/main/api_validate.c > @@ -664,7 +664,8 @@ validate_DrawElements_common(struct gl_context *ctx, > * to have been overlooked. The body of the spec only explicitly allows > * the indirect versions. > */ > - if (_mesa_is_gles3(ctx) && !ctx->Extensions.OES_geometry_shader && > + if (_mesa_is_gles3(ctx) && > + !_mesa_has_OES_geometry_shader(ctx) && > _mesa_is_xfb_active_and_unpaused(ctx)) { >_mesa_error(ctx, GL_INVALID_OPERATION, >"%s(transform feedback active)", caller); > Reviewed-by: Kenneth Graunke I'll push this tomorrow if there are no objections. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Raise a link error for non-SSO ES programs with a TES but no TCS.
On Thursday, February 23, 2017 4:03:36 AM PST Andres Gomez wrote: > I would welcome a reference to the text in Secption 7.3 of the OpenGL > ES 3.2 specs as a code comment and commit text but, other than that, > this is: > > Reviewed-by: Andres Gomez I'll add it to the commit message. I wasn't sure about adding it to the code since the block above quotes the very same section, so it seemed like it was already cited (although the particular sentence isn't). Of course, that didn't show up in the patch...sorry :( Thanks for the review! signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Use _mesa_has_OES_geometry_shader() when validating draws
On Thursday, February 23, 2017 10:46:54 AM PST Ilia Mirkin wrote: > The assumption was that if you're setting that ext, you can do gles31. Is > there a driver/hw combo where that's not the case? If so, we should fix > that instead... ChromeOS/ARC++ currently disables GLES 3.1 on i965 because we haven't passed all the dEQP tests for it. So, we support the functionality, but they're using GLES 3.0 contexts. I'm guessing that's why they hit this and I haven't. They could probably disable the bit too, but checking whether the feature is actually exposed seems pretty reasonable. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] ralloc: Delete autofree handling.
There was exactly one user of this, and I just removed it. It also accessed an implicit global context, with no locking. This meant that it was only safe if all callers of ralloc_autofree_context() held the same lock...which is a pretty terrible thing for a utility library to impose. Signed-off-by: Kenneth Graunke --- src/util/ralloc.c | 18 -- src/util/ralloc.h | 9 - 2 files changed, 27 deletions(-) diff --git a/src/util/ralloc.c b/src/util/ralloc.c index 980e4e4f138..d5cc16766b1 100644 --- a/src/util/ralloc.c +++ b/src/util/ralloc.c @@ -323,24 +323,6 @@ ralloc_parent(const void *ptr) return info->parent ? PTR_FROM_HEADER(info->parent) : NULL; } -static void *autofree_context = NULL; - -static void -autofree(void) -{ - ralloc_free(autofree_context); -} - -void * -ralloc_autofree_context(void) -{ - if (unlikely(autofree_context == NULL)) { - autofree_context = ralloc_context(NULL); - atexit(autofree); - } - return autofree_context; -} - void ralloc_set_destructor(const void *ptr, void(*destructor)(void *)) { diff --git a/src/util/ralloc.h b/src/util/ralloc.h index 3e2d342b45e..7d906519661 100644 --- a/src/util/ralloc.h +++ b/src/util/ralloc.h @@ -247,15 +247,6 @@ void ralloc_adopt(const void *new_ctx, void *old_ctx); void *ralloc_parent(const void *ptr); /** - * Return a context whose memory will be automatically freed at program exit. - * - * The first call to this function creates a context and registers a handler - * to free it using \c atexit. This may cause trouble if used in a library - * loaded with \c dlopen. - */ -void *ralloc_autofree_context(void); - -/** * Set a callback to occur just before an object is freed. */ void ralloc_set_destructor(const void *ptr, void(*destructor)(void *)); -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] compiler: Free types in _mesa_glsl_release_types() rather than autofree.
Instead of using ralloc_autofree_context() to install an atexit() handler to ralloc_free(glsl_type::mem_ctx), we can simply free them from _mesa_glsl_release_types(). This is effectively the same, because _mesa_glsl_release_types() is called from _mesa_destroy_shader_compiler(), which is called from Mesa's one_time_fini() function, which Mesa installs as an atexit() handler. The one advantage here is that it ensures the built-in functions are destroyed before the types. Signed-off-by: Kenneth Graunke --- src/compiler/glsl_types.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/compiler/glsl_types.cpp b/src/compiler/glsl_types.cpp index a431edcdd3f..1da631dcb98 100644 --- a/src/compiler/glsl_types.cpp +++ b/src/compiler/glsl_types.cpp @@ -40,7 +40,7 @@ void glsl_type::init_ralloc_type_ctx(void) { if (glsl_type::mem_ctx == NULL) { - glsl_type::mem_ctx = ralloc_autofree_context(); + glsl_type::mem_ctx = ralloc_context(NULL); assert(glsl_type::mem_ctx != NULL); } } @@ -416,6 +416,9 @@ _mesa_glsl_release_types(void) _mesa_hash_table_destroy(glsl_type::interface_types, NULL); glsl_type::interface_types = NULL; } + + ralloc_free(glsl_type::mem_ctx); + glsl_type::mem_ctx = NULL; } -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Require mipmap completeness for glCopyImageSubData(), sometimes.
This patch makes glCopyImageSubData require mipmap completeness when the texture object's built-in sampler object has a mipmapping MinFilter. Fixes (on i965): dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data Signed-off-by: Kenneth Graunke --- src/mesa/main/copyimage.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/copyimage.c b/src/mesa/main/copyimage.c index cf25159e880..877c8ac246d 100644 --- a/src/mesa/main/copyimage.c +++ b/src/mesa/main/copyimage.c @@ -149,9 +149,30 @@ prepare_target(struct gl_context *ctx, GLuint name, GLenum target, return false; } + /* The ARB_copy_image specification says: + * + *"INVALID_OPERATION is generated if either object is a texture and + * the texture is not complete (as defined in section 3.9.14)" + * + * The cited section says: + * + *"Using the preceding definitions, a texture is complete unless any + * of the following conditions hold true: [...] + * + * * The minification filter requires a mipmap (is neither NEAREST + * nor LINEAR), and the texture is not mipmap complete." + * + * This imposes the bizarre restriction that glCopyImageSubData requires + * mipmap completion at times, which dEQP mandates, and other drivers + * appear to implement. We don't have any texture units here, so we + * can't look at any bound separate sampler objects...it appears that + * you're supposed to use the sampler object which is built-in to the + * texture object. + * + * See https://cvs.khronos.org/bugzilla/show_bug.cgi?id=16224. + */ _mesa_test_texobj_completeness(ctx, texObj); - if (!texObj->_BaseComplete || - (level != 0 && !texObj->_MipmapComplete)) { + if (!_mesa_is_texture_complete(texObj, &texObj->Sampler)) { _mesa_error(ctx, GL_INVALID_OPERATION, "glCopyImageSubData(%sName incomplete)", dbg_prefix); return false; -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Move intel_resolve_map.[ch] from i965_compiler_FILES to i965_FILES
I have no idea why these were part of the compiler files. They're miptree related code, and the compiler doesn't appear to use them. Signed-off-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/Makefile.sources | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 5278e86339a..633c3dc00a4 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -85,9 +85,7 @@ i965_compiler_FILES = \ intel_asm_annotation.c \ intel_asm_annotation.h \ intel_debug.c \ - intel_debug.h \ - intel_resolve_map.c \ - intel_resolve_map.h + intel_debug.h i965_compiler_GENERATED_FILES = \ brw_nir_trig_workarounds.c @@ -236,6 +234,8 @@ i965_FILES = \ intel_pixel_draw.c \ intel_pixel.h \ intel_pixel_read.c \ + intel_resolve_map.c \ + intel_resolve_map.h \ intel_screen.c \ intel_screen.h \ intel_state.c \ -- 2.11.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Require mipmap completeness for glCopyImageSubData(), sometimes.
On Monday, February 27, 2017 4:54:37 AM PST Kenneth Graunke wrote: > This patch makes glCopyImageSubData require mipmap completeness when the > texture object's built-in sampler object has a mipmapping MinFilter. > > Fixes (on i965): > dEQP-GLES31.functional.debug.negative_coverage.*.buffer.copy_image_sub_data > > Signed-off-by: Kenneth Graunke > --- > src/mesa/main/copyimage.c | 25 +++-- > 1 file changed, 23 insertions(+), 2 deletions(-) Oops...I accidentally mailed this out before regression testing it (waited for my test job to come back, saw it was green...but it was the results for a different patch). It looks like it regresses some Piglit tests (but I think the tests are probably what needs to change). I'll look into it. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev