Re: [Mesa-dev] [PATCH] i965: Actually check every primitive for cut index support.
On 09/01/2013 07:05 PM, Kenneth Graunke wrote:
> can_cut_index_handle_prims() was passed an array of _mesa_prim objects
> and a count, and ran a loop for that many iterations. However, it
> treated the array like a pointer, repeatedly checking the first element.
>
> Blarg.

How would an application be able to observe the old bug behavior? Can
we come up with a test case that might tickle it?

> This patch makes it actually check every primitive.
>
> Signed-off-by: Kenneth Graunke kenn...@whitecape.org

Either way, the patch is obviously correct.

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

> ---
>  src/mesa/drivers/dri/i965/brw_primitive_restart.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_primitive_restart.c b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
> index 0dbc48f..b305dca 100644
> --- a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
> +++ b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
> @@ -92,8 +92,8 @@ can_cut_index_handle_prims(struct gl_context *ctx,
>        return false;
>     }
>
> -   for ( ; nr_prims > 0; nr_prims--) {
> -      switch(prim->mode) {
> +   for (int i = 0; i < nr_prims; i++) {
> +      switch (prim[i].mode) {
>        case GL_POINTS:
>        case GL_LINES:
>        case GL_LINE_STRIP:

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
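To illustrate the bug described in the commit message, here is a minimal sketch. It assumes a stand-in struct prim with only a mode field (the real _mesa_prim has more members) and made-up mode constants and helper names; it is not the driver code itself. The old loop counts nr_prims down but never advances the pointer, so only the first element is ever examined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for _mesa_prim and the GL mode enums. */
enum { MODE_POINTS, MODE_LINES, MODE_QUADS };
struct prim { int mode; };

/* Assumption for this sketch: only points and lines are "supported". */
static bool mode_supported(int mode)
{
   return mode == MODE_POINTS || mode == MODE_LINES;
}

/* Old behavior: iterates nr_prims times but always reads prim[0]. */
static bool can_handle_old(const struct prim *prim, int nr_prims)
{
   for ( ; nr_prims > 0; nr_prims--) {
      if (!mode_supported(prim->mode))
         return false;
   }
   return true;
}

/* Fixed behavior: indexes every element of the array. */
static bool can_handle_fixed(const struct prim *prim, int nr_prims)
{
   for (int i = 0; i < nr_prims; i++) {
      if (!mode_supported(prim[i].mode))
         return false;
   }
   return true;
}
```

In this sketch the two functions only disagree when the first primitive is supported and a later one is not, which suggests that a test case would probably need a single draw containing several primitives of different modes.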
Re: [Mesa-dev] [PATCH] glx: Initialize OpenGL version to 1.0
Please send patches only using git-send-email. Sending patches as
attachments prevents people from being able to provide in-line review
comments.

On 09/01/2013 12:30 PM, Rico Schüller wrote:
> Some driver/card combinations (r200/RV280, i915/915G) don't support
> OpenGL 2.1. These create in some corner cases an indirect context

This was a typo on my part. The Linux ABI requires OpenGL 1.2, so every
driver will support that. I meant to type

    uint32_t major_ver = 1;
    uint32_t minor_ver = 2;

but instead typed

    uint32_t minor_ver = 1;
    uint32_t major_ver = 2;

Copy and paste did the rest. :(

All of your other changes are, I think, unnecessary code motion. Does
making that one change in dri2_glx.c and drisw_glx.c fix the problem?

> instead of a direct context when calling glXCreateContextAttribsARB().
> This happens because of a bad default value. To avoid this, choose
> a more sane default OpenGL 1.0 as mentioned in the ARB_create_context spec:
>
>   The default values for GLX_CONTEXT_MAJOR_VERSION_ARB and
>   GLX_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this
>   case, implementations will typically return the most recent version
>   of OpenGL they support which is backwards compatible with OpenGL 1.0
>   (e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility profile)
>
> This fixes: http://bugs.winehq.org/show_bug.cgi?id=34238
>
> Signed-off-by: Rico Schüller kgbric...@web.de
> ---
>  src/glx/dri2_glx.c   | 10 +-
>  src/glx/dri_common.c | 14 +++---
>  src/glx/drisw_glx.c  | 10 +-
>  3 files changed, 17 insertions(+), 17 deletions(-)
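A rough sketch of why the swapped default matters. It assumes the driver rejects any requested version above the highest one it supports; the helper name and the comparison are hypothetical and are not taken from the GLX code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the requested GL version does not
 * exceed the highest version the driver supports. */
static bool version_supported(uint32_t req_major, uint32_t req_minor,
                              uint32_t max_major, uint32_t max_minor)
{
   if (req_major != max_major)
      return req_major < max_major;
   return req_minor <= max_minor;
}
```

Under that assumption, a driver limited to OpenGL 1.3 would turn down the accidental default of 2.1 but accept either 1.2 or 1.0, which would be consistent with the fallback to an indirect context described in the report.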
[Mesa-dev] [Bug 54080] glXQueryDrawable fails with GLXBadDrawable for a Window in direct context
https://bugs.freedesktop.org/show_bug.cgi?id=54080

Alexander Monakov amona...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |a...@nwnk.net

--- Comment #2 from Alexander Monakov amona...@gmail.com ---
Adam, you seem to have looked at this a couple of times. Will it be fixed
with your pending GLX patches?
Re: [Mesa-dev] [PATCH] glx: Initialize OpenGL version to 1.0
On 03.09.2013 01:54, Ian Romanick wrote:
> Please send patches only using git-send-email. Sending patches as
> attachments prevents people from being able to provide in-line review
> comments.

Ok, sorry for the trouble.

> On 09/01/2013 12:30 PM, Rico Schüller wrote:
>> Some driver/card combinations (r200/RV280, i915/915G) don't support
>> OpenGL 2.1. These create in some corner cases an indirect context
>
> This was a typo on my part. The Linux ABI requires OpenGL 1.2, so every
> driver will support that. I meant to type
>
>     uint32_t major_ver = 1;
>     uint32_t minor_ver = 2;
>
> but instead typed
>
>     uint32_t minor_ver = 1;
>     uint32_t major_ver = 2;
>
> Copy and paste did the rest. :(

So yes, we agree here: the version number needs to be fixed. The simplest
fix is to just change the number. I'm fine with that; I have no strong
opinion about it. Though I think it should be consistent across all
initialization occurrences (in dri_common.c/dri2_glx.c/drisw_glx.c).

> All of your other changes are, I think, unnecessary code motion.

Yes, that's correct. It just removed the duplicated code and initialized
all values in one location. It seems unnecessary to initialize some
variables in e.g. src/glx/dri2_glx.c:dri2_create_context_attribs only for
the case when num_attribs is 0 in
src/glx/dri_common.c:dri2_convert_glx_attribs. Hell, in all other cases
they are overwritten (in dri_common.c) later anyway. So why bother to
initialize them in dri2_glx.c/drisw_glx.c at all? It probably should be
put into a separate patch (if at all)...

> Does making that one change in dri2_glx.c and drisw_glx.c fix the problem?

So imho, the only change really needed to fix this is setting the version
to the correct value (minor_ver = 2 (or 0), major_ver = 1). The other
changes are just cosmetic. I tested this with the environment variable,
e.g. MESA_GL_VERSION_OVERRIDE=1.3 (and the attachment
http://bugs.winehq.org/attachment.cgi?id=45801 from wine bug 34238).
Tests on real hardware haven't been run yet (as I don't have such
hardware).

Cheers
Rico

>> instead of a direct context when calling glXCreateContextAttribsARB().
>> This happens because of a bad default value. To avoid this, choose
>> a more sane default OpenGL 1.0 as mentioned in the ARB_create_context spec:
>>
>>   The default values for GLX_CONTEXT_MAJOR_VERSION_ARB and
>>   GLX_CONTEXT_MINOR_VERSION_ARB are 1 and 0 respectively. In this
>>   case, implementations will typically return the most recent version
>>   of OpenGL they support which is backwards compatible with OpenGL 1.0
>>   (e.g. 3.0, 3.1 + GL_ARB_compatibility, or 3.2 compatibility profile)
>>
>> This fixes: http://bugs.winehq.org/show_bug.cgi?id=34238
>>
>> Signed-off-by: Rico Schüller kgbric...@web.de
>> ---
>>  src/glx/dri2_glx.c   | 10 +-
>>  src/glx/dri_common.c | 14 +++---
>>  src/glx/drisw_glx.c  | 10 +-
>>  3 files changed, 17 insertions(+), 17 deletions(-)
[Mesa-dev] [PATCH 00/11] Transform Feedback for Radeon SI
This series implements transform feedback for Radeon SI, which also
enables OpenGL 3.0. It requires the LLVM patch I sent yesterday.

Transform feedback is disabled by default on CIK, because my card is very
unstable with current kernel DRM. It should work though.

Please review.

Marek
[Mesa-dev] [PATCH 01/11] radeon: don't emit VGT_STRMOUT_BUFFER_BASE on SI
The register doesn't exist on SI.
---
 src/gallium/drivers/radeon/r600_streamout.c | 85 ++---
 1 file changed, 54 insertions(+), 31 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c
index ab40630..313d737 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -74,23 +74,35 @@ static void r600_so_target_destroy(struct pipe_context *ctx,

 void r600_streamout_buffers_dirty(struct r600_common_context *rctx)
 {
+	struct r600_atom *begin = &rctx->streamout.begin_atom;
+	unsigned num_bufs = util_bitcount(rctx->streamout.enabled_mask);
+	unsigned num_bufs_appended = util_bitcount(rctx->streamout.enabled_mask &
+						   rctx->streamout.append_bitmask);
+
 	rctx->streamout.num_dw_for_end =
 		12 + /* flush_vgt_streamout */
-		util_bitcount(rctx->streamout.enabled_mask) * 8 + /* STRMOUT_BUFFER_UPDATE */
+		num_bufs * 8 + /* STRMOUT_BUFFER_UPDATE */
 		3 /* set_streamout_enable(0) */;

-	rctx->streamout.begin_atom.num_dw =
-		12 + /* flush_vgt_streamout */
-		6 + /* set_streamout_enable */
-		util_bitcount(rctx->streamout.enabled_mask) * 7 + /* SET_CONTEXT_REG */
-		(rctx->family >= CHIP_RS780 &&
-		 rctx->family <= CHIP_RV740 ? util_bitcount(rctx->streamout.enabled_mask) * 5 : 0) + /* STRMOUT_BASE_UPDATE */
-		util_bitcount(rctx->streamout.enabled_mask & rctx->streamout.append_bitmask) * 8 + /* STRMOUT_BUFFER_UPDATE */
-		util_bitcount(rctx->streamout.enabled_mask & ~rctx->streamout.append_bitmask) * 6 + /* STRMOUT_BUFFER_UPDATE */
+	begin->num_dw = 12 + /* flush_vgt_streamout */
+			6; /* set_streamout_enable */
+
+	if (rctx->chip_class >= SI) {
+		begin->num_dw += num_bufs * 4; /* SET_CONTEXT_REG */
+	} else {
+		begin->num_dw += num_bufs * 7; /* SET_CONTEXT_REG */
+
+		if (rctx->family >= CHIP_RS780 && rctx->family <= CHIP_RV740)
+			begin->num_dw += num_bufs * 5; /* STRMOUT_BASE_UPDATE */
+	}
+
+	begin->num_dw +=
+		num_bufs_appended * 8 + /* STRMOUT_BUFFER_UPDATE */
+		(num_bufs - num_bufs_appended) * 6 + /* STRMOUT_BUFFER_UPDATE */
 		(rctx->family > CHIP_R600 && rctx->family < CHIP_RS780 ? 2 : 0) + /* SURFACE_BASE_UPDATE */
 		rctx->streamout.num_dw_for_end;

-	rctx->streamout.begin_atom.dirty = true;
+	begin->dirty = true;
 }

 void r600_set_streamout_targets(struct pipe_context *ctx,
@@ -209,7 +221,6 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
 	struct r600_so_target **t = rctx->streamout.targets;
 	unsigned *stride_in_dw = rctx->streamout.stride_in_dw;
 	unsigned i, update_flags = 0;
-	uint64_t va;

 	if (rctx->chip_class >= EVERGREEN) {
 		evergreen_flush_vgt_streamout(rctx);
@@ -225,34 +236,46 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r

 		t[i]->stride_in_dw = stride_in_dw[i];

-		va = r600_resource_va(rctx->b.screen,
-				      (void*)t[i]->b.buffer);
-
-		update_flags |= SURFACE_BASE_UPDATE_STRMOUT(i);
-
-		r600_write_context_reg_seq(cs, R_028AD0_VGT_STRMOUT_BUFFER_SIZE_0 + 16*i, 3);
-		radeon_emit(cs, (t[i]->b.buffer_offset +
-				 t[i]->b.buffer_size) >> 2);	/* BUFFER_SIZE (in DW) */
-		radeon_emit(cs, stride_in_dw[i]);		/* VTX_STRIDE (in DW) */
-		radeon_emit(cs, va >> 8);			/* BUFFER_BASE */
+		if (rctx->chip_class >= SI) {
+			/* SI binds streamout buffers as shader resources.
+			 * VGT only counts primitives and tells the shader
+			 * through SGPRs what to do. */
+			r600_write_context_reg_seq(cs, R_028AD0_VGT_STRMOUT_BUFFER_SIZE_0 + 16*i, 2);
+			radeon_emit(cs, (t[i]->b.buffer_offset +
+					 t[i]->b.buffer_size) >> 2);	/* BUFFER_SIZE (in DW) */
+			radeon_emit(cs, stride_in_dw[i]);		/* VTX_STRIDE (in DW) */
+		} else {
+			uint64_t va = r600_resource_va(rctx->b.screen,
+						       (void*)t[i]->b.buffer);

-		r600_emit_reloc(rctx, &rctx->rings.gfx, r600_resource(t[i]->b.buffer),
-				RADEON_USAGE_WRITE);
+			update_flags |= SURFACE_BASE_UPDATE_STRMOUT(i);

-		/* R7xx requires this packet after updating BUFFER_BASE.
-		 * Without this, R7xx locks up. */
-		if
[Mesa-dev] [PATCH 02/11] radeon: don't emit streamout state if there are no streamout buffers
This could happen if set_stream_output_targets is called twice in a row
without a draw call in between.
---
 src/gallium/drivers/radeon/r600_streamout.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c
index 313d737..18f7d88 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -137,6 +137,8 @@ void r600_set_streamout_targets(struct pipe_context *ctx,

 	if (num_targets) {
 		r600_streamout_buffers_dirty(rctx);
+	} else {
+		rctx->streamout.begin_atom.dirty = false;
 	}
 }

--
1.8.1.2
[Mesa-dev] [PATCH 03/11] radeonsi: integrate shared streamout state
---
 src/gallium/drivers/radeonsi/r600_blit.c       |  4 ++--
 src/gallium/drivers/radeonsi/r600_hw_context.c | 26 +++---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  2 ++
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  9 +
 src/gallium/drivers/radeonsi/radeonsi_shader.h |  1 -
 src/gallium/drivers/radeonsi/si_state.c        |  6 --
 src/gallium/drivers/radeonsi/si_state_draw.c   | 10 --
 7 files changed, 20 insertions(+), 38 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_blit.c b/src/gallium/drivers/radeonsi/r600_blit.c
index 20c1767..9d7c738 100644
--- a/src/gallium/drivers/radeonsi/r600_blit.c
+++ b/src/gallium/drivers/radeonsi/r600_blit.c
@@ -64,8 +64,8 @@ static void r600_blitter_begin(struct pipe_context *ctx, enum r600_blitter_op op
 		util_blitter_save_viewport(rctx->blitter, &rctx->queued.named.viewport->viewport);
 	}
 	util_blitter_save_vertex_buffer_slot(rctx->blitter, rctx->vertex_buffer);
-	util_blitter_save_so_targets(rctx->blitter, rctx->num_so_targets,
-				     (struct pipe_stream_output_target**)rctx->so_targets);
+	util_blitter_save_so_targets(rctx->blitter, rctx->b.streamout.num_targets,
+				     (struct pipe_stream_output_target**)rctx->b.streamout.targets);

 	if (op & R600_SAVE_FRAMEBUFFER)
 		util_blitter_save_framebuffer(rctx->blitter, &rctx->framebuffer);
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c
index db622ba..1a2128e 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -149,7 +149,9 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
 	num_dw += ctx->num_cs_dw_nontimer_queries_suspend;

 	/* Count in streamout_end at the end of CS. */
-	num_dw += ctx->num_cs_dw_streamout_end;
+	if (ctx->b.streamout.begin_emitted) {
+		num_dw += ctx->b.streamout.num_dw_for_end;
+	}

 	/* Count in render_condition(NULL) at the end of CS. */
 	if (ctx->predicate_drawing) {
@@ -179,10 +181,6 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
 	struct radeon_winsys_cs *cs = ctx->b.rings.gfx.cs;
 	bool queries_suspended = false;

-#if 0
-	bool streamout_suspended = false;
-#endif
-
 	if (!cs->cdw)
 		return;

@@ -192,12 +190,12 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
 		queries_suspended = true;
 	}

-#if 0
-	if (ctx->num_cs_dw_streamout_end) {
-		r600_context_streamout_end(ctx);
-		streamout_suspended = true;
+	ctx->b.streamout.suspended = false;
+
+	if (ctx->b.streamout.begin_emitted) {
+		r600_emit_streamout_end(&ctx->b);
+		ctx->b.streamout.suspended = true;
 	}
-#endif

 	ctx->b.flags |= R600_CONTEXT_FLUSH_AND_INV_CB |
 			R600_CONTEXT_FLUSH_AND_INV_CB_META |
@@ -263,12 +261,10 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
 	si_pm4_emit(ctx, ctx->queued.named.init);
 	ctx->emitted.named.init = ctx->queued.named.init;

-#if 0
-	if (streamout_suspended) {
-		ctx->streamout_start = TRUE;
-		ctx->streamout_append_bitmask = ~0;
+	if (ctx->b.streamout.suspended) {
+		ctx->b.streamout.append_bitmask = ctx->b.streamout.enabled_mask;
+		r600_streamout_buffers_dirty(&ctx->b);
 	}
-#endif

 	/* resume queries */
 	if (queries_suspended) {
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 6ca138f..993f30a 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -248,6 +248,8 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
 	rctx->cache_flush = si_atom_cache_flush;
 	rctx->atoms.cache_flush = &rctx->cache_flush;

+	rctx->atoms.streamout_begin = &rctx->b.streamout.begin_atom;
+
 	switch (rctx->b.chip_class) {
 	case SI:
 	case CIK:
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index 61fdfe2..ed17f2c 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -137,6 +137,7 @@ struct r600_context {
 			/* Caches must be flushed after resource descriptors are
 			 * updated in memory. */
 			struct r600_atom *cache_flush;
+			struct r600_atom *streamout_begin;
 		};
 		struct r600_atom *array[0];
 	} atoms;
@@ -179,19 +180,11 @@ struct r600_context {
 	/* The list of active queries. Only one query of each type can be active. */
 	struct
[Mesa-dev] [PATCH 04/11] radeonsi: initialize the first CS like any other
So that the init state is always emitted first and not later in draw_vbo.
This fixes streamout where the init state, which disables streamout,
was emitted in draw_vbo after streamout was enabled.
---
 src/gallium/drivers/radeonsi/r600.h            |  1 +
 src/gallium/drivers/radeonsi/r600_hw_context.c | 11 ---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c   |  5 +++--
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  2 ++
 src/gallium/drivers/radeonsi/radeonsi_pm4.c    |  1 +
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600.h b/src/gallium/drivers/radeonsi/r600.h
index a914ce2..4b43169 100644
--- a/src/gallium/drivers/radeonsi/r600.h
+++ b/src/gallium/drivers/radeonsi/r600.h
@@ -74,6 +74,7 @@ struct r600_screen;

 void si_get_backend_mask(struct r600_context *ctx);
 void si_context_flush(struct r600_context *ctx, unsigned flags);
+void si_begin_new_cs(struct r600_context *ctx);

 struct r600_query *r600_context_query_create(struct r600_context *ctx, unsigned query_type);
 void r600_context_query_destroy(struct r600_context *ctx, struct r600_query *query);
diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c
index 1a2128e..c8fa66c 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -179,15 +179,15 @@ void si_need_cs_space(struct r600_context *ctx, unsigned num_dw,
 void si_context_flush(struct r600_context *ctx, unsigned flags)
 {
 	struct radeon_winsys_cs *cs = ctx->b.rings.gfx.cs;
-	bool queries_suspended = false;

 	if (!cs->cdw)
 		return;

 	/* suspend queries */
+	ctx->nontimer_queries_suspended = false;
 	if (ctx->num_cs_dw_nontimer_queries_suspend) {
 		r600_context_queries_suspend(ctx);
-		queries_suspended = true;
+		ctx->nontimer_queries_suspended = true;
 	}

 	ctx->b.streamout.suspended = false;
@@ -245,6 +245,11 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
 	}
 #endif

+	si_begin_new_cs(ctx);
+}
+
+void si_begin_new_cs(struct r600_context *ctx)
+{
 	ctx->pm4_dirty_cdwords = 0;

 	/* Flush read caches at the beginning of CS. */
@@ -267,7 +272,7 @@ void si_context_flush(struct r600_context *ctx, unsigned flags)
 	}

 	/* resume queries */
-	if (queries_suspended) {
+	if (ctx->nontimer_queries_suspended) {
 		r600_context_queries_resume(ctx);
 	}

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index 993f30a..e219e36 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -279,14 +279,15 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void
 	if (rctx->blitter == NULL)
 		goto fail;

-	si_get_backend_mask(rctx); /* this emits commands and must be last */
-
 	rctx->dummy_pixel_shader =
 		util_make_fragment_cloneinput_shader(&rctx->b.b, 0,
 						     TGSI_SEMANTIC_GENERIC,
 						     TGSI_INTERPOLATE_CONSTANT);
 	rctx->b.b.bind_fs_state(&rctx->b.b, rctx->dummy_pixel_shader);

+	/* these must be last */
+	si_begin_new_cs(rctx);
+	si_get_backend_mask(rctx);
 	return &rctx->b.b;
 fail:
 	r600_destroy_context(&rctx->b.b);
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index ed17f2c..c5059e8 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -180,6 +180,8 @@ struct r600_context {
 	/* The list of active queries. Only one query of each type can be active. */
 	struct list_head	active_nontimer_query_list;
 	unsigned		num_cs_dw_nontimer_queries_suspend;
+	/* If queries have been suspended. */
+	bool			nontimer_queries_suspended;

 	unsigned		backend_mask;
 	unsigned		max_db; /* for OQ */
diff --git a/src/gallium/drivers/radeonsi/radeonsi_pm4.c b/src/gallium/drivers/radeonsi/radeonsi_pm4.c
index 37a199d..eed0c47 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pm4.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pm4.c
@@ -242,6 +242,7 @@ void si_pm4_emit_dirty(struct r600_context *rctx)
 		if (!state || rctx->emitted.array[i] == state)
 			continue;

+		assert(state != rctx->queued.named.init);
 		si_pm4_emit(rctx, state);
 		rctx->emitted.array[i] = state;
 	}
--
1.8.1.2
[Mesa-dev] [PATCH 05/11] radeonsi: handle rasterizer_discard and set GS_OUT_PRIM_TYPE
---
 src/gallium/drivers/radeonsi/si_state.c      |  1 +
 src/gallium/drivers/radeonsi/si_state_draw.c | 28 +++-
 src/gallium/drivers/radeonsi/sid.h           |  3 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index f409af4..650db4f 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -552,6 +552,7 @@ static void *si_create_rs_state(struct pipe_context *ctx,
 		S_028810_PS_UCP_MODE(3) |
 		S_028810_ZCLIP_NEAR_DISABLE(!state->depth_clip) |
 		S_028810_ZCLIP_FAR_DISABLE(!state->depth_clip) |
+		S_028810_DX_RASTERIZATION_KILL(state->rasterizer_discard) |
 		S_028810_DX_LINEAR_ATTR_CLIP_ENA(1);

 	clip_rule = state->scissor ? 0x : 0x;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 581d289..3529660 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -273,12 +273,36 @@ static unsigned si_conv_pipe_prim(unsigned pprim)
 	return result;
 }

+static unsigned r600_conv_prim_to_gs_out(unsigned mode)
+{
+	static const int prim_conv[] = {
+		[PIPE_PRIM_POINTS]			= V_028A6C_OUTPRIM_TYPE_POINTLIST,
+		[PIPE_PRIM_LINES]			= V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_LINE_LOOP]			= V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_LINE_STRIP]			= V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_TRIANGLES]			= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_TRIANGLE_STRIP]		= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_TRIANGLE_FAN]		= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_QUADS]			= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_QUAD_STRIP]			= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_POLYGON]			= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_LINES_ADJACENCY]		= V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_LINE_STRIP_ADJACENCY]	= V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_TRIANGLES_ADJACENCY]		= V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY]	= V_028A6C_OUTPRIM_TYPE_TRISTRIP
+	};
+	assert(mode < Elements(prim_conv));
+
+	return prim_conv[mode];
+}
+
 static bool si_update_draw_info_state(struct r600_context *rctx,
 				       const struct pipe_draw_info *info)
 {
 	struct si_pm4_state *pm4 = si_pm4_alloc_state(rctx);
 	struct si_shader *vs = &rctx->vs_shader->current->shader;
 	unsigned prim = si_conv_pipe_prim(info->mode);
+	unsigned gs_out_prim = r600_conv_prim_to_gs_out(info->mode);
 	unsigned ls_mask = 0;

 	if (pm4 == NULL)
@@ -291,8 +315,10 @@ static bool si_update_draw_info_state(struct r600_context *rctx,

 	if (rctx->b.chip_class >= CIK)
 		si_pm4_set_reg(pm4, R_030908_VGT_PRIMITIVE_TYPE, prim);
-	else
+	else {
 		si_pm4_set_reg(pm4, R_008958_VGT_PRIMITIVE_TYPE, prim);
+		si_pm4_set_reg(pm4, R_028A6C_VGT_GS_OUT_PRIM_TYPE, gs_out_prim);
+	}
 	si_pm4_set_reg(pm4, R_028400_VGT_MAX_VTX_INDX, ~0);
 	si_pm4_set_reg(pm4, R_028404_VGT_MIN_VTX_INDX, 0);
 	si_pm4_set_reg(pm4, R_028408_VGT_INDX_OFFSET,
diff --git a/src/gallium/drivers/radeonsi/sid.h b/src/gallium/drivers/radeonsi/sid.h
index 7f3329c..c6688b3 100644
--- a/src/gallium/drivers/radeonsi/sid.h
+++ b/src/gallium/drivers/radeonsi/sid.h
@@ -7423,6 +7423,9 @@
 #define   S_028A6C_OUTPRIM_TYPE(x)		(((x) & 0x3F) << 0)
 #define   G_028A6C_OUTPRIM_TYPE(x)		(((x) >> 0) & 0x3F)
 #define   C_028A6C_OUTPRIM_TYPE			0xFFFFFFC0
+#define     V_028A6C_OUTPRIM_TYPE_POINTLIST	0
+#define     V_028A6C_OUTPRIM_TYPE_LINESTRIP	1
+#define     V_028A6C_OUTPRIM_TYPE_TRISTRIP	2
 #define   S_028A6C_OUTPRIM_TYPE_1(x)		(((x) & 0x3F) << 8)
 #define   G_028A6C_OUTPRIM_TYPE_1(x)		(((x) >> 8) & 0x3F)
 #define   C_028A6C_OUTPRIM_TYPE_1		0xFFFFC0FF
--
1.8.1.2
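The conversion function in this patch is a lookup table built with C99 designated initializers and guarded by a bounds assert. A reduced sketch of that pattern follows; the enum names are hypothetical stand-ins for PIPE_PRIM_* and V_028A6C_OUTPRIM_TYPE_*, and only the three output values (0, 1, 2) are taken from the sid.h hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few PIPE_PRIM_* values. */
enum prim { PRIM_POINTS, PRIM_LINES, PRIM_LINE_STRIP, PRIM_TRIANGLES };

/* Output primitive types, numbered as in the sid.h hunk. */
enum out_prim { OUT_POINTLIST = 0, OUT_LINESTRIP = 1, OUT_TRISTRIP = 2 };

#define ARRAY_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

static unsigned conv_prim_to_gs_out(unsigned mode)
{
	/* Each entry is keyed by its enum value, so the table stays
	 * correct even if the enum is reordered. */
	static const int prim_conv[] = {
		[PRIM_POINTS]     = OUT_POINTLIST,
		[PRIM_LINES]      = OUT_LINESTRIP,
		[PRIM_LINE_STRIP] = OUT_LINESTRIP,
		[PRIM_TRIANGLES]  = OUT_TRISTRIP,
	};

	/* Reject modes past the end of the table. */
	assert(mode < ARRAY_ELEMS(prim_conv));
	return prim_conv[mode];
}
```

One property worth keeping in mind with this pattern: any index that is skipped inside the table is zero-initialized, which here would silently map to the point-list value.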
[Mesa-dev] [PATCH 06/11] radeonsi: bind streamout buffers to VGT and the vertex shader
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.h   |  2 +
 src/gallium/drivers/radeonsi/radeonsi_shader.c |  1 +
 src/gallium/drivers/radeonsi/radeonsi_shader.h | 18 ---
 src/gallium/drivers/radeonsi/si_descriptors.c  | 68 ++
 4 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.h b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
index c5059e8..9306790 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.h
@@ -134,6 +134,7 @@ struct r600_context {
 			/* The order matters. */
 			struct r600_atom *const_buffers[SI_NUM_SHADERS];
 			struct r600_atom *sampler_views[SI_NUM_SHADERS];
+			struct r600_atom *streamout_buffers;
 			/* Caches must be flushed after resource descriptors are
 			 * updated in memory. */
 			struct r600_atom *cache_flush;
@@ -164,6 +165,7 @@ struct r600_context {
 	unsigned		sprite_coord_enable;
 	unsigned		export_16bpc;
 	struct si_buffer_resources const_buffers[SI_NUM_SHADERS];
+	struct si_buffer_resources streamout_buffers;
 	struct r600_textures_info samplers[SI_NUM_SHADERS];
 	struct r600_resource	*border_color_table;
 	unsigned		border_color_offset;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 77915ae..335cd79 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -1364,6 +1364,7 @@ static void create_function(struct si_shader_context *si_shader_ctx)
 	switch (si_shader_ctx->type) {
 	case TGSI_PROCESSOR_VERTEX:
 		params[SI_PARAM_VERTEX_BUFFER] = params[SI_PARAM_CONST];
+		params[SI_PARAM_SO_BUFFER] = params[SI_PARAM_CONST];
 		params[SI_PARAM_START_INSTANCE] = i32;
 		last_sgpr = SI_PARAM_START_INSTANCE;
 		params[SI_PARAM_VERTEX_ID] = i32;
diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.h b/src/gallium/drivers/radeonsi/radeonsi_shader.h
index ede8bde..64766c9 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.h
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.h
@@ -34,10 +34,11 @@
 #define SI_SGPR_CONST		0
 #define SI_SGPR_SAMPLER		2
 #define SI_SGPR_RESOURCE	4
-#define SI_SGPR_VERTEX_BUFFER	6
-#define SI_SGPR_START_INSTANCE	8
+#define SI_SGPR_VERTEX_BUFFER	6  /* VS only */
+#define SI_SGPR_SO_BUFFER	8  /* VS only, stream-out */
+#define SI_SGPR_START_INSTANCE	10 /* VS only */

-#define SI_VS_NUM_USER_SGPR	9
+#define SI_VS_NUM_USER_SGPR	11
 #define SI_PS_NUM_USER_SGPR	6

 /* LLVM function parameter indices */
@@ -47,11 +48,12 @@

 /* VS only parameters */
 #define SI_PARAM_VERTEX_BUFFER	3
-#define SI_PARAM_START_INSTANCE	4
-#define SI_PARAM_VERTEX_ID	5
-#define SI_PARAM_DUMMY_0	6
-#define SI_PARAM_DUMMY_1	7
-#define SI_PARAM_INSTANCE_ID	8
+#define SI_PARAM_SO_BUFFER	4
+#define SI_PARAM_START_INSTANCE	5
+#define SI_PARAM_VERTEX_ID	6
+#define SI_PARAM_DUMMY_0	7
+#define SI_PARAM_DUMMY_1	8
+#define SI_PARAM_INSTANCE_ID	9

 /* PS only parameters */
 #define SI_PARAM_PRIM_MASK	3
diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c
index 5d85448..a8f8781 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -456,6 +456,67 @@ static void si_set_constant_buffer(struct pipe_context *ctx, uint shader, uint s
 	si_update_descriptors(rctx, &buffers->desc);
 }

+/* STREAMOUT BUFFERS */
+
+static void si_set_streamout_targets(struct pipe_context *ctx,
+				     unsigned num_targets,
+				     struct pipe_stream_output_target **targets,
+				     unsigned append_bitmask)
+{
+	struct r600_context *rctx = (struct r600_context *)ctx;
+	struct si_buffer_resources *buffers = &rctx->streamout_buffers;
+	unsigned old_num_targets = rctx->b.streamout.num_targets;
+	unsigned i;
+
+	/* Streamout buffers must be bound in 2 places:
+	 * 1) in VGT by setting the VGT_STRMOUT registers
+	 * 2) as shader resources
+	 */
+
+	/* Set the VGT regs. */
+	r600_set_streamout_targets(ctx, num_targets, targets, append_bitmask);
+
+	/* Set the shader resources.*/
+	for (i = 0; i < num_targets; i++) {
+		if (targets[i]) {
+			struct pipe_resource *buffer = targets[i]->buffer;
+			uint64_t va = r600_resource_va(ctx->screen, buffer);
+
+			/* Set
[Mesa-dev] [PATCH 07/11] radeonsi: implement streamout flush properly
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 3529660..e65b0cf 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -649,10 +649,16 @@ void si_emit_cache_flush(struct r600_common_context *rctx, struct r600_atom *ato
 		radeon_emit(cs, EVENT_TYPE(V_028A90_FLUSH_AND_INV_CB_META) | EVENT_INDEX(0));
 	}

+	if (rctx->flags & R600_CONTEXT_STREAMOUT_FLUSH) {
+		/* Needed if streamout buffers are going to be used as a source. */
+		radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
+		radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) | EVENT_INDEX(4));
+	}
+
 	rctx->flags = 0;
 }

-const struct r600_atom si_atom_cache_flush = { si_emit_cache_flush, 9 }; /* number of CS dwords */
+const struct r600_atom si_atom_cache_flush = { si_emit_cache_flush, 11 }; /* number of CS dwords */

 void si_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info *info)
 {
--
1.8.1.2
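The atom size goes from 9 to 11 in the same patch that adds two radeon_emit() calls. A small sketch of that bookkeeping follows, assuming the atom's number is an upper bound on the dwords its emit function may write. The struct, flag bit, and function names are hypothetical, and the emitted values are placeholders rather than real packet encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified command stream: a buffer plus a dword count. */
struct cmd_stream {
	uint32_t buf[16];
	unsigned cdw;
};

/* Made-up flag bit standing in for R600_CONTEXT_STREAMOUT_FLUSH. */
#define FLAG_STREAMOUT_FLUSH (1u << 0)

static void emit_dword(struct cmd_stream *cs, uint32_t value)
{
	cs->buf[cs->cdw++] = value;
}

/* Emits the optional flush and returns how many dwords it used. */
static unsigned emit_streamout_flush(struct cmd_stream *cs, unsigned flags)
{
	unsigned start = cs->cdw;

	if (flags & FLAG_STREAMOUT_FLUSH) {
		emit_dword(cs, 0); /* placeholder for the packet header */
		emit_dword(cs, 0); /* placeholder for the event type/index */
	}
	return cs->cdw - start;
}
```

With the flag set the block costs two dwords, which matches the difference between the old and new atom sizes in the diff.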
[Mesa-dev] [PATCH 08/11] radeonsi: fix streamout queries
---
 src/gallium/drivers/radeonsi/r600_hw_context.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_hw_context.c b/src/gallium/drivers/radeonsi/r600_hw_context.c
index c8fa66c..b6e7a0f 100644
--- a/src/gallium/drivers/radeonsi/r600_hw_context.c
+++ b/src/gallium/drivers/radeonsi/r600_hw_context.c
@@ -474,8 +474,8 @@ void r600_query_begin(struct r600_context *ctx, struct r600_query *query)
 	case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
 		cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 2, 0);
 		cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3);
-		cs->buf[cs->cdw++] = query->results_end;
-		cs->buf[cs->cdw++] = 0;
+		cs->buf[cs->cdw++] = va;
+		cs->buf[cs->cdw++] = (va >> 32UL) & 0xFF;
 		break;
 	case PIPE_QUERY_TIME_ELAPSED:
 		cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE_EOP, 4, 0);
@@ -529,10 +529,11 @@ void r600_query_end(struct r600_context *ctx, struct r600_query *query)
 	case PIPE_QUERY_PRIMITIVES_GENERATED:
 	case PIPE_QUERY_SO_STATISTICS:
 	case PIPE_QUERY_SO_OVERFLOW_PREDICATE:
+		va += query->results_end + query->result_size/2;
 		cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 2, 0);
 		cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_SAMPLE_STREAMOUTSTATS) | EVENT_INDEX(3);
-		cs->buf[cs->cdw++] = query->results_end + query->result_size/2;
-		cs->buf[cs->cdw++] = 0;
+		cs->buf[cs->cdw++] = va;
+		cs->buf[cs->cdw++] = (va >> 32UL) & 0xFF;
 		break;
 	case PIPE_QUERY_TIME_ELAPSED:
 		va += query->results_end + query->result_size/2;
--
1.8.1.2
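The fix replaces a buffer-relative offset with a full 64-bit GPU virtual address written as two dwords. A minimal sketch of that split follows; the helper names are made up, and the 0xFF mask is copied from the patch, which suggests (as an assumption) that only 40 address bits are used here.

```c
#include <assert.h>
#include <stdint.h>

/* Low dword: the address truncated to 32 bits. */
static uint32_t va_lo(uint64_t va)
{
	return (uint32_t)va;
}

/* High dword: the patch keeps only bits 32..39 of the address. */
static uint32_t va_hi(uint64_t va)
{
	return (uint32_t)((va >> 32) & 0xFF);
}
```

For example, an address of 0x1234567890 would be written as 0x34567890 followed by 0x12, whereas the old code wrote the offset and a constant 0.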
[Mesa-dev] [PATCH 09/11] radeonsi: implement glDrawTransformFeedback functionality
---
 src/gallium/drivers/radeonsi/si_state.c      |  1 +
 src/gallium/drivers/radeonsi/si_state_draw.c | 23 +++++++++++++++++++++++
 src/gallium/drivers/radeonsi/sid.h           |  6 ++++++
 3 files changed, 30 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index 650db4f..e1b4e32 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3241,6 +3241,7 @@ void si_init_config(struct r600_context *rctx)
 	si_pm4_set_reg(pm4, R_028A40_VGT_GS_MODE, 0x0);
 	si_pm4_set_reg(pm4, R_028A84_VGT_PRIMITIVEID_EN, 0x0);
 	si_pm4_set_reg(pm4, R_028A8C_VGT_PRIMITIVEID_RESET, 0x0);
+	si_pm4_set_reg(pm4, R_028B28_VGT_STRMOUT_DRAW_OPAQUE_OFFSET, 0);
 	si_pm4_set_reg(pm4, R_028B94_VGT_STRMOUT_CONFIG, 0x0);
 	si_pm4_set_reg(pm4, R_028B98_VGT_STRMOUT_BUFFER_CONFIG, 0x0);
 	si_pm4_set_reg(pm4, R_028AA8_IA_MULTI_VGT_PARAM,
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index e65b0cf..687410c 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -557,6 +557,29 @@ static void si_state_draw(struct r600_context *rctx,
 			       S_02800C_NOOP_CULL_DISABLE(1));
 	}
 
+	if (info->count_from_stream_output) {
+		struct r600_so_target *t =
+			(struct r600_so_target*)info->count_from_stream_output;
+		uint64_t va = r600_resource_va(&rctx->screen->b.b,
+					       &t->buf_filled_size->b.b);
+		va += t->buf_filled_size_offset;
+
+		si_pm4_set_reg(pm4, R_028B30_VGT_STRMOUT_DRAW_OPAQUE_VERTEX_STRIDE,
+			       t->stride_in_dw);
+
+		si_pm4_cmd_begin(pm4, PKT3_COPY_DATA);
+		si_pm4_cmd_add(pm4,
+			       COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
+			       COPY_DATA_DST_SEL(COPY_DATA_REG) |
+			       COPY_DATA_WR_CONFIRM);
+		si_pm4_cmd_add(pm4, va); /* src address lo */
+		si_pm4_cmd_add(pm4, va >> 32UL); /* src address hi */
+		si_pm4_cmd_add(pm4, R_028B2C_VGT_STRMOUT_DRAW_OPAQUE_BUFFER_FILLED_SIZE >> 2);
+		si_pm4_cmd_add(pm4, 0); /* unused */
+		si_pm4_add_bo(pm4, t->buf_filled_size, RADEON_USAGE_READ);
+		si_pm4_cmd_end(pm4, true);
+	}
+
 	/* draw packet */
 	si_pm4_cmd_begin(pm4, PKT3_INDEX_TYPE);
 	if (ib->index_size == 4) {
diff --git a/src/gallium/drivers/radeonsi/sid.h b/src/gallium/drivers/radeonsi/sid.h
index c6688b3..021f4eb 100644
--- a/src/gallium/drivers/radeonsi/sid.h
+++ b/src/gallium/drivers/radeonsi/sid.h
@@ -103,6 +103,12 @@
 #define         WAIT_REG_MEM_EQUAL      3
 #define PKT3_MEM_WRITE                  0x3D /* not on CIK */
 #define PKT3_INDIRECT_BUFFER            0x32
+#define PKT3_COPY_DATA                  0x40
+#define         COPY_DATA_SRC_SEL(x)    ((x) & 0xf)
+#define                 COPY_DATA_REG   0
+#define                 COPY_DATA_MEM   1
+#define         COPY_DATA_DST_SEL(x)    (((x) & 0xf) << 8)
+#define         COPY_DATA_WR_CONFIRM    (1 << 20)
 #define PKT3_SURFACE_SYNC               0x43 /* deprecated on CIK, use ACQUIRE_MEM */
 #define PKT3_ME_INITIALIZE              0x44 /* not on CIK */
 #define PKT3_COND_WRITE                 0x45
-- 
1.8.1.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
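The five dwords that follow the COPY_DATA packet header in this patch can be checked in isolation. The following is a minimal sketch, not driver code: the macro values are copied from the sid.h hunk, `build_copy_data_body` is a hypothetical helper name, and the register byte offset 0x028B2C is an assumption read off the register's name.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the sid.h hunk of this patch. */
#define COPY_DATA_SRC_SEL(x)   ((x) & 0xf)
#define COPY_DATA_REG          0
#define COPY_DATA_MEM          1
#define COPY_DATA_DST_SEL(x)   (((x) & 0xf) << 8)
#define COPY_DATA_WR_CONFIRM   (1 << 20)

/* Assumption: the byte offset is the hex number embedded in the register name. */
#define R_028B2C_VGT_STRMOUT_DRAW_OPAQUE_BUFFER_FILLED_SIZE 0x028B2C

/* Hypothetical helper: fill the packet body the same way the patch emits it. */
static void build_copy_data_body(uint32_t body[5], uint64_t va,
				 uint32_t reg_byte_offset)
{
	/* Registers are addressed in dwords, so the byte offset must be aligned. */
	assert((reg_byte_offset & 3) == 0);

	body[0] = COPY_DATA_SRC_SEL(COPY_DATA_MEM) |
		  COPY_DATA_DST_SEL(COPY_DATA_REG) |
		  COPY_DATA_WR_CONFIRM;         /* memory -> register, confirm write */
	body[1] = (uint32_t)va;                 /* src address lo */
	body[2] = (uint32_t)(va >> 32);         /* src address hi */
	body[3] = reg_byte_offset >> 2;         /* dst register, in dwords */
	body[4] = 0;                            /* unused */
}
```

Calling it with a 64-bit address such as 0x123456789abcdef0 shows the lo/hi split and the dword-shifted register offset that the hardware is expected to consume.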
[Mesa-dev] [PATCH 11/11] radeonsi: enable streamout AKA transform feedback on SI
CIK is not enabled, because it's very unstable regardless of transform feedback.
---
 src/gallium/drivers/radeonsi/radeonsi_pipe.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_pipe.c b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
index e219e36..5220e41 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_pipe.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_pipe.c
@@ -342,6 +342,7 @@ static const char* r600_get_name(struct pipe_screen* pscreen)
 static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 {
 	struct r600_screen *rscreen = (struct r600_screen *)pscreen;
+	bool has_streamout = HAVE_LLVM >= 0x0304 && rscreen->b.chip_class == SI;
 
 	switch (param) {
 	/* Supported features (boolean caps). */
@@ -414,20 +415,13 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 		return 0;
 
 	/* Stream output. */
-#if 0
 	case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
-		return debug_get_bool_option("R600_STREAMOUT", FALSE) ? 4 : 0;
+		return has_streamout ? 4 : 0;
 	case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
-		return debug_get_bool_option("R600_STREAMOUT", FALSE) ? 1 : 0;
+		return has_streamout ? 1 : 0;
 	case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
 	case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-		return 16*4;
-#endif
-	case PIPE_CAP_MAX_STREAM_OUTPUT_BUFFERS:
-	case PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME:
-	case PIPE_CAP_MAX_STREAM_OUTPUT_SEPARATE_COMPONENTS:
-	case PIPE_CAP_MAX_STREAM_OUTPUT_INTERLEAVED_COMPONENTS:
-		return 0;
+		return has_streamout ? 32*4 : 0;
 
 	/* Texturing. */
 	case PIPE_CAP_MAX_TEXTURE_2D_LEVELS:
-- 
1.8.1.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] radeonsi: implement streamout shader support
The shader is responsible for writing to streamout buffers using the TBUFFER_STORE_FORMAT_* instructions.

The locations of some input SGPRs and VGPRs are assigned dynamically, because the input SGPRs controlling streamout are not declared if they are not needed, decreasing the indices of all following inputs.
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c | 279
 src/gallium/drivers/radeonsi/radeonsi_shader.h |   5 +-
 src/gallium/drivers/radeonsi/si_state_draw.c   |   7 +-
 3 files changed, 276 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 335cd79..92f7cf5 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -34,6 +34,7 @@
 #include "gallivm/lp_bld_logic.h"
 #include "gallivm/lp_bld_tgsi.h"
 #include "gallivm/lp_bld_arit.h"
+#include "gallivm/lp_bld_flow.h"
 #include "radeon_llvm.h"
 #include "radeon_llvm_emit.h"
 #include "util/u_memory.h"
@@ -59,6 +60,11 @@ struct si_shader_context
 	struct tgsi_token * tokens;
 	struct si_pipe_shader *shader;
 	unsigned type; /* TGSI_PROCESSOR_* specifies the type of shader. */
+	int param_streamout_config;
+	int param_streamout_write_index;
+	int param_streamout_offset[4];
+	int param_vertex_id;
+	int param_instance_id;
 	LLVMValueRef const_md;
 	LLVMValueRef const_resource;
 #if HAVE_LLVM >= 0x0304
@@ -67,6 +73,7 @@ struct si_shader_context
 	LLVMValueRef *constants;
 	LLVMValueRef *resources;
 	LLVMValueRef *samplers;
+	LLVMValueRef so_buffers[4];
 };
 
 static struct si_shader_context * si_shader_context(
@@ -119,9 +126,12 @@ static LLVMValueRef get_instance_index(
 	struct radeon_llvm_context * radeon_bld,
 	unsigned divisor)
 {
+	struct si_shader_context *si_shader_ctx =
+		si_shader_context(&radeon_bld->soa.bld_base);
 	struct gallivm_state * gallivm = radeon_bld->soa.bld_base.base.gallivm;
 
-	LLVMValueRef result = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_INSTANCE_ID);
+	LLVMValueRef result = LLVMGetParam(radeon_bld->main_fn,
+					   si_shader_ctx->param_instance_id);
 	result = LLVMBuildAdd(gallivm->builder, result, LLVMGetParam(
 			radeon_bld->main_fn, SI_PARAM_START_INSTANCE), "");
 
@@ -168,7 +178,8 @@ static void declare_input_vs(
 	} else {
 		/* Load the buffer index, which is always stored in VGPR0
 		 * for Vertex Shaders */
-		buffer_index = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn, SI_PARAM_VERTEX_ID);
+		buffer_index = LLVMGetParam(si_shader_ctx->radeon_bld.main_fn,
+					    si_shader_ctx->param_vertex_id);
 	}
 
 	vec4_type = LLVMVectorType(base->elem_type, 4);
@@ -397,7 +408,8 @@ static void declare_system_value(
 	unsigned index,
 	const struct tgsi_full_declaration *decl)
 {
-
+	struct si_shader_context *si_shader_ctx =
+		si_shader_context(&radeon_bld->soa.bld_base);
 	LLVMValueRef value = 0;
 
 	switch (decl->Semantic.Name) {
@@ -406,7 +418,8 @@ static void declare_system_value(
 		break;
 
 	case TGSI_SEMANTIC_VERTEXID:
-		value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_VERTEX_ID);
+		value = LLVMGetParam(radeon_bld->main_fn,
+				     si_shader_ctx->param_vertex_id);
 		break;
 
 	default:
@@ -651,6 +664,206 @@ static void si_llvm_emit_clipvertex(struct lp_build_tgsi_context * bld_base,
 	}
 }
 
+static void si_dump_streamout(struct pipe_stream_output_info *so)
+{
+	unsigned i;
+
+	if (so->num_outputs)
+		fprintf(stderr, "STREAMOUT\n");
+
+	for (i = 0; i < so->num_outputs; i++) {
+		unsigned mask = ((1 << so->output[i].num_components) - 1) <<
+				so->output[i].start_component;
+		fprintf(stderr, "%i: BUF%i[%i..%i] <- OUT[%i].%s%s%s%s\n",
+			i, so->output[i].output_buffer,
+			so->output[i].dst_offset, so->output[i].dst_offset + so->output[i].num_components - 1,
+			so->output[i].register_index,
+			mask & 1 ? "x" : "",
+			mask & 2 ? "y" : "",
+			mask & 4 ? "z" : "",
+			mask & 8 ? "w" : "");
+	}
+}
+
+/* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} - the suffix is selected by num_channels=1..4.
+ * The type of vdata must be one of i32 (num_channels=1), v2i32 (num_channels=2),
+ * or v4i32 (num_channels=3,4). */
+static void build_tbuffer_store(struct si_shader_context *shader,
+				LLVMValueRef rsrc,
+				LLVMValueRef vdata,
+				unsigned num_channels,
Re: [Mesa-dev] [PATCH] i965: Actually check every primitive for cut index support.
On 2 September 2013 16:47, Ian Romanick i...@freedesktop.org wrote:

On 09/01/2013 07:05 PM, Kenneth Graunke wrote:

can_cut_index_handle_prims() was passed an array of _mesa_prim objects and a count, and ran a loop for that many iterations. However, it treated the array like a pointer, repeatedly checking the first element. Blarg.

How would an application be able to observe the old bug behavior? Can we come up with a test case that might tickle it?

We already had some discussion about this (see http://lists.freedesktop.org/archives/mesa-dev/2013-August/044129.html). Briefly, I wasn't able to come up with a test case that produced incorrect rendering, but I was able to produce a test case where prim[i] != prim[j]. Considering how difficult the VBO module is to understand, I think it's wise to err on the safe side and do what Ken has done in this patch. Patch is:

Reviewed-by: Paul Berry stereotype...@gmail.com

This patch makes it actually check every primitive.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org

Either way, the patch is obviously correct.

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

---
 src/mesa/drivers/dri/i965/brw_primitive_restart.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_primitive_restart.c b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
index 0dbc48f..b305dca 100644
--- a/src/mesa/drivers/dri/i965/brw_primitive_restart.c
+++ b/src/mesa/drivers/dri/i965/brw_primitive_restart.c
@@ -92,8 +92,8 @@ can_cut_index_handle_prims(struct gl_context *ctx,
       return false;
    }
 
-   for ( ; nr_prims > 0; nr_prims--) {
-      switch(prim->mode) {
+   for (int i = 0; i < nr_prims; i++) {
+      switch (prim[i].mode) {
       case GL_POINTS:
       case GL_LINES:
       case GL_LINE_STRIP:

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
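The difference between the old and the new loop can be reproduced outside the driver. This is a minimal sketch under stated assumptions: `struct prim` is a hypothetical one-field stand-in for `_mesa_prim`, and the two mode constants are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct _mesa_prim; only the field used here. */
struct prim {
	int mode;
};

/* Invented mode values for illustration only. */
enum { MODE_SUPPORTED = 0, MODE_UNSUPPORTED = 1 };

/* Old behaviour: iterates nr_prims times but only ever reads prim[0]. */
static bool all_supported_buggy(const struct prim *prim, size_t nr_prims)
{
	assert(prim != NULL || nr_prims == 0);
	for (; nr_prims > 0; nr_prims--) {
		if (prim->mode != MODE_SUPPORTED)
			return false;
	}
	return true;
}

/* Fixed behaviour: indexes every element of the array. */
static bool all_supported_fixed(const struct prim *prim, size_t nr_prims)
{
	assert(prim != NULL || nr_prims == 0);
	for (size_t i = 0; i < nr_prims; i++) {
		if (prim[i].mode != MODE_SUPPORTED)
			return false;
	}
	return true;
}
```

With an array whose first element is supported and whose second is not, the buggy variant accepts the draw while the fixed one rejects it, which is the prim[i] != prim[j] situation mentioned in the reply.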
Re: [Mesa-dev] [PATCH 4/6] i965/fs: Optimize IF/MOV/ELSE/MOV/ENDIF to SEL when possible.
On Tue, Aug 6, 2013 at 3:24 PM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote:

On 06.08.2013 19:19, Matt Turner wrote:

On Tue, Aug 6, 2013 at 4:14 AM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote:

On 06.08.2013 03:28, Kenneth Graunke wrote:

Many GLSL shaders contain code of the form:

   x = condition ? foo : bar

The compiler emits an ir_if tree for this, since each subexpression might be a complex tree that could have side-effects and short-circuit logic operations. However, the common case is to simply pick one of two constants or variable's values---which is exactly what SEL is for. Replacing IF/ELSE with SEL also simplifies the control flow graph, making optimization passes which work on basic blocks more effective.

Don't you think something like that should be implemented in common code so that all drivers can profit?

We would love that. As part of a work in progress, I'm adding conditional-select to the GLSL IR. We planned a few months ago to do this as a step toward SSA at the IR level, but have only laid a little bit of groundwork in that direction (Ian's vector insert/extract series).

Looks like your backend already does SSA. Shouldn't that be implemented in common code? :)

Then the code would have to run on GLSL IR as well as my internal IR because the intermediate one, TGSI, shouldn't be in SSA form, and abstracting an IR doesn't sound particularly fun.

btw, I'd *love* an option to get TGSI in SSA form (or at least a form easier to turn back into SSA).. it is starting to look like doing anything vaguely clever w/ freedreno compiler will require essentially turning TGSI into SSA, and I guess other drivers will need the same. (Tegra will for sure, for if/else/endif in frag shader. But I guess it would be useful to others.)

I guess add TGSI_OPCODE_PHI plus maybe some hint or instruction to indicate when a register is no longer used (maybe not needed, but otherwise maybe for large programs tgsi_{src,dst}_register.Index might overflow?).

BR,
-R

Also I don't have to handle vectors so it's a bit simpler, actually pretty straightforward if you implement an existing algorithm. As for some other passes that could be shared, I still need them in the backend to be applied to device-specific code sequences, you probably have a similar situation.

It would be really nice to have more, useful device-independent optimizations or simplifications like this already done instead of requiring each driver to re-implement them (or use llvm).

Yes, it definitely would.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
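For readers unfamiliar with the transformation discussed in this thread, the equivalence between the branchy form and a select can be modelled per channel in plain C. This is only a sketch of the idea, not the i965 pass: `assign_with_branch` and `assign_with_select` are hypothetical names, and a C compiler is free to lower both to the same machine code.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: models IF / MOV x, foo / ELSE / MOV x, bar / ENDIF per channel. */
static void assign_with_branch(const int *flag, const float *foo,
			       const float *bar, float *x, size_t n)
{
	assert(flag && foo && bar && x);
	for (size_t i = 0; i < n; i++) {
		if (flag[i])
			x[i] = foo[i];
		else
			x[i] = bar[i];
	}
}

/* Select form: models a single SEL, with no control flow in the loop body. */
static void assign_with_select(const int *flag, const float *foo,
			       const float *bar, float *x, size_t n)
{
	assert(flag && foo && bar && x);
	for (size_t i = 0; i < n; i++)
		x[i] = flag[i] ? foo[i] : bar[i];
}
```

The rewrite is only safe when both arms are plain moves with no side effects, which is the "common case" the quoted commit message restricts itself to.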
Re: [Mesa-dev] [PATCH 4/4] glsl: Remove unused prog parameter from tfeedback_decl::init
On 30 August 2013 16:07, Ian Romanick i...@freedesktop.org wrote:

From: Ian Romanick ian.d.roman...@intel.com

It looks like commit 53febac removed the last user of that parameter.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
Cc: Paul Berry stereotype...@gmail.com

Patches 2-4 are:

Reviewed-by: Paul Berry stereotype...@gmail.com

---
 src/glsl/link_varyings.cpp | 6 +++---
 src/glsl/link_varyings.h   | 3 +--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 7a61b1a..c3b5855 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -236,8 +236,8 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
  * will fail to find any matching variable.
  */
 void
-tfeedback_decl::init(struct gl_context *ctx, struct gl_shader_program *prog,
-                     const void *mem_ctx, const char *input)
+tfeedback_decl::init(struct gl_context *ctx, const void *mem_ctx,
+                     const char *input)
 {
    /* We don't have to be pedantic about what is a valid GLSL variable name,
     * because any variable with an invalid name can't exist in the IR anyway.
@@ -507,7 +507,7 @@ parse_tfeedback_decls(struct gl_context *ctx, struct gl_shader_program *prog,
                       char **varying_names, tfeedback_decl *decls)
 {
    for (unsigned i = 0; i < num_names; ++i) {
-      decls[i].init(ctx, prog, mem_ctx, varying_names[i]);
+      decls[i].init(ctx, mem_ctx, varying_names[i]);
 
       if (!decls[i].is_varying())
          continue;
diff --git a/src/glsl/link_varyings.h b/src/glsl/link_varyings.h
index 302ab5c..6264ef0 100644
--- a/src/glsl/link_varyings.h
+++ b/src/glsl/link_varyings.h
@@ -91,8 +91,7 @@ struct tfeedback_candidate
 class tfeedback_decl
 {
 public:
-   void init(struct gl_context *ctx, struct gl_shader_program *prog,
-             const void *mem_ctx, const char *input);
+   void init(struct gl_context *ctx, const void *mem_ctx, const char *input);
    static bool is_same(const tfeedback_decl &x, const tfeedback_decl &y);
    bool assign_location(struct gl_context *ctx,
                         struct gl_shader_program *prog);
-- 
1.8.1.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/gen7.5: Fix lower bound on number of VS URB entries.
On 08/31/2013 10:34 PM, Paul Berry wrote: Haswell GT2 and GT3 require the number of vertex shader URB entries to be at least 64, not 32. At the moment, we always meet this requirement automatically, because in the absence of a geometry shader, we assign all available URB space to the vertex shader. But when we turn on support for geometry shaders, this lower limit will become important. --- src/mesa/drivers/dri/i965/brw_context.c | 7 +++ src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/gen6_urb.c| 2 +- src/mesa/drivers/dri/i965/gen7_urb.c| 7 --- 4 files changed, 13 insertions(+), 4 deletions(-) Reviewed-by: Chad Versace chad.vers...@linux.intel.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] glsl: Refactor a bunch of the code out of cross_validate_outputs_to_inputs
On 30 August 2013 16:07, Ian Romanick i...@freedesktop.org wrote:

From: Ian Romanick ian.d.roman...@intel.com

The new function, cross_validate_types_and_qualifiers, will have multiple callers from this file in future commits.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
---
 src/glsl/link_varyings.cpp | 171
 1 file changed, 94 insertions(+), 77 deletions(-)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 4ceb1d3..a1899f7 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -41,6 +41,97 @@
 
 
 /**
+ * Validate the types and qualifiers of an output from one stage against the
+ * matching input to another stage.
+ */
+static void
+cross_validate_types_and_qualifiers(struct gl_shader_program *prog,
+                                    const ir_variable *input,
+                                    const ir_variable *output,
+                                    GLenum consumer_type,
+                                    const char *consumer_stage,
+                                    const char *producer_stage)

It seems redundant to pass both consumer_type and consumer_stage as arguments, since the latter is just _mesa_glsl_shader_target_name(consumer_type). You might want to just pass consumer_type and producer_type, and use _mesa_glsl_shader_target_name() to convert them to strings in the event of an error. However, it's extra bookkeeping work to do that, so I'm ambivalent about it.

Either way,

Reviewed-by: Paul Berry stereotype...@gmail.com

+{
+   /* Check that the types match between stages.
+    */
+   const glsl_type *type_to_match = input->type;
+   if (consumer_type == GL_GEOMETRY_SHADER) {
+      assert(type_to_match->is_array()); /* Enforced by ast_to_hir */
+      type_to_match = type_to_match->element_type();
+   }
+   if (type_to_match != output->type) {
+      /* There is a bit of a special case for gl_TexCoord.  This
+       * built-in is unsized by default.  Applications that variable
+       * access it must redeclare it with a size.  There is some
+       * language in the GLSL spec that implies the fragment shader
+       * and vertex shader do not have to agree on this size.  Other
+       * driver behave this way, and one or two applications seem to
+       * rely on it.
+       *
+       * Neither declaration needs to be modified here because the array
+       * sizes are fixed later when update_array_sizes is called.
+       *
+       * From page 48 (page 54 of the PDF) of the GLSL 1.10 spec:
+       *
+       *     "Unlike user-defined varying variables, the built-in
+       *     varying variables don't have a strict one-to-one
+       *     correspondence between the vertex language and the
+       *     fragment language."
+       */
+      if (!output->type->is_array()
+          || (strncmp("gl_", output->name, 3) != 0)) {
+         linker_error(prog,
+                      "%s shader output `%s' declared as type `%s', "
+                      "but %s shader input declared as type `%s'\n",
+                      producer_stage, output->name,
+                      output->type->name,
+                      consumer_stage, input->type->name);
+         return;
+      }
+   }
+
+   /* Check that all of the qualifiers match between stages.
+    */
+   if (input->centroid != output->centroid) {
+      linker_error(prog,
+                   "%s shader output `%s' %s centroid qualifier, "
+                   "but %s shader input %s centroid qualifier\n",
+                   producer_stage,
+                   output->name,
+                   (output->centroid) ? "has" : "lacks",
+                   consumer_stage,
+                   (input->centroid) ? "has" : "lacks");
+      return;
+   }
+
+   if (input->invariant != output->invariant) {
+      linker_error(prog,
+                   "%s shader output `%s' %s invariant qualifier, "
+                   "but %s shader input %s invariant qualifier\n",
+                   producer_stage,
+                   output->name,
+                   (output->invariant) ? "has" : "lacks",
+                   consumer_stage,
+                   (input->invariant) ? "has" : "lacks");
+      return;
+   }
+
+   if (input->interpolation != output->interpolation) {
+      linker_error(prog,
+                   "%s shader output `%s' specifies %s "
+                   "interpolation qualifier, "
+                   "but %s shader input specifies %s "
+                   "interpolation qualifier\n",
+                   producer_stage,
+                   output->name,
+                   output->interpolation_string(),
+                   consumer_stage,
+                   input->interpolation_string());
+      return;
+   }
+}
+
+/**
  * Validate that outputs from one stage match inputs of another
  */
 void
@@ -81,83 +172,9 @@ cross_validate_outputs_to_inputs(struct gl_shader_program *prog,
[Mesa-dev] [PATCH] gallivm: support indirect registers on both dimensions
We support indirect addressing only on the vertex index, but some shaders also use indirect addressing on attributes. This patch adds support for indirect addressing on both dimensions inside gs arrays.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c          | 23 +++++++++++++++++------
 src/gallium/auxiliary/gallivm/lp_bld_tgsi.h     |  3 ++-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c |  4 +++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 820d6b0..03668d9 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -1360,8 +1360,9 @@ clipmask_booli32(struct gallivm_state *gallivm,
 static LLVMValueRef
 draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
                          struct lp_build_tgsi_context * bld_base,
-                         boolean is_indirect,
+                         boolean is_vindex_indirect,
                          LLVMValueRef vertex_index,
+                         boolean is_aindex_indirect,
                          LLVMValueRef attrib_index,
                          LLVMValueRef swizzle_index)
 {
@@ -1372,18 +1373,28 @@ draw_gs_llvm_fetch_input(const struct lp_build_tgsi_gs_iface *gs_iface,
    LLVMValueRef res;
    struct lp_type type = bld_base->base.type;
 
-   if (is_indirect) {
+   if (is_vindex_indirect || is_aindex_indirect) {
       int i;
       res = bld_base->base.zero;
       for (i = 0; i < type.length; ++i) {
          LLVMValueRef idx = lp_build_const_int32(gallivm, i);
-         LLVMValueRef vert_chan_index = LLVMBuildExtractElement(builder,
-                                                                vertex_index, idx, "");
+         LLVMValueRef vert_chan_index = vertex_index;
+         LLVMValueRef attr_chan_index = attrib_index;
          LLVMValueRef channel_vec, value;
+
+         if (is_vindex_indirect) {
+            vert_chan_index = LLVMBuildExtractElement(builder,
+                                                      vertex_index, idx, "");
+         }
+         if (is_aindex_indirect) {
+            attr_chan_index = LLVMBuildExtractElement(builder,
+                                                      attrib_index, idx, "");
+         }
+
          indices[0] = vert_chan_index;
-         indices[1] = attrib_index;
+         indices[1] = attr_chan_index;
          indices[2] = swizzle_index;
-
+
          channel_vec = LLVMBuildGEP(builder, gs->input, indices, 3, "");
          channel_vec = LLVMBuildLoad(builder, channel_vec, "");
          value = LLVMBuildExtractElement(builder, channel_vec, idx, "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
index 522302e..8bcdbc8 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.h
@@ -395,8 +395,9 @@ struct lp_build_tgsi_gs_iface
 {
    LLVMValueRef (*fetch_input)(const struct lp_build_tgsi_gs_iface *gs_iface,
                                struct lp_build_tgsi_context * bld_base,
-                               boolean is_indirect,
+                               boolean is_vindex_indirect,
                                LLVMValueRef vertex_index,
+                               boolean is_aindex_indirect,
                                LLVMValueRef attrib_index,
                                LLVMValueRef swizzle_index);
    void (*emit_vertex)(const struct lp_build_tgsi_gs_iface *gs_iface,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index 4c6b6ec..e50f1d1 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1135,7 +1135,9 @@ emit_fetch_gs_input(
 
    res = bld->gs_iface->fetch_input(bld->gs_iface, bld_base,
                                     reg->Dimension.Indirect,
-                                    vertex_index, attrib_index,
+                                    vertex_index,
+                                    reg->Register.Indirect,
+                                    attrib_index,
                                     swizzle_index);
 
    assert(res);
-- 
1.8.3.2

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
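The per-channel gather that this patch extends can be modelled with ordinary arrays. The sketch below is a scalar simplification, not gallivm code: the array sizes are invented, the swizzle dimension is dropped, and a non-indirect index is represented by element 0 of the index array rather than by a separate scalar value.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented sizes for illustration; NUM_CHANNELS plays the role of type.length. */
#define NUM_CHANNELS 4
#define MAX_VERTS    3
#define MAX_ATTRIBS  2

/* Scalar model of the loop in draw_gs_llvm_fetch_input: each channel i picks
 * its own vertex and/or attribute index when that dimension is indirect, and
 * shares index[0] otherwise. */
static void fetch_input(float inputs[MAX_VERTS][MAX_ATTRIBS][NUM_CHANNELS],
			bool is_vindex_indirect,
			const int vertex_index[NUM_CHANNELS],
			bool is_aindex_indirect,
			const int attrib_index[NUM_CHANNELS],
			float res[NUM_CHANNELS])
{
	for (int i = 0; i < NUM_CHANNELS; i++) {
		int v = is_vindex_indirect ? vertex_index[i] : vertex_index[0];
		int a = is_aindex_indirect ? attrib_index[i] : attrib_index[0];

		assert(v >= 0 && v < MAX_VERTS);
		assert(a >= 0 && a < MAX_ATTRIBS);
		res[i] = inputs[v][a][i];
	}
}
```

Before the patch only the vertex dimension could vary per channel; the second flag lets the attribute dimension vary as well, independently of the first.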
Re: [Mesa-dev] [PATCH] gallivm: support indirect registers on both dimensions
On 09/03/2013 11:50 AM, Zack Rusin wrote: We support indirect addressing only on the vertex index, but some shaders also use indirect addressing on attributes. This patch adds support for indirect addressing on both dimensions inside gs arrays. Signed-off-by: Zack Rusin za...@vmware.com --- src/gallium/auxiliary/draw/draw_llvm.c | 23 +-- src/gallium/auxiliary/gallivm/lp_bld_tgsi.h | 3 ++- src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 4 +++- 3 files changed, 22 insertions(+), 8 deletions(-) Reviewed-by: Brian Paul bri...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT
On Thu, Aug 15, 2013 at 10:39:31PM +0200, Vedran Rodic wrote:

We do have the set_caching ioctl. It's enough to flip the PTEs to UC and let MOCS manage things. I actually did a few experiments on my IVB. I made all Mesa's buffers UC via PTEs by patching libdrm to change the cache mode of each bo after allocation. Then I fiddled with the MOCS LLC bits in various ways. It definitely has an effect, sometimes making things slower, sometimes faster. xonotic again seemed to benefit. IIRC leaving everything LLC uncached was actually the fastest (w/ high quality at least) so we may be thrashing the LLC a bit there. But e.g. reaction quake regressed quite a lot if most things were left as UC.

Can you share the libdrm patch?

Sorry, forgot to reply. Here's the patch if you're still interested.

From 47f51b19137603dccaa4fcb2a703d56335c292fe Mon Sep 17 00:00:00 2001
From: Ville Syrjälä ville.syrj...@linux.intel.com
Date: Wed, 14 Aug 2013 15:12:29 +0300
Subject: [PATCH] make bos uncached in PTEs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 intel/intel_bufmgr_gem.c | 60
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index f98f7a7..32ff260 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -243,6 +243,10 @@ drm_intel_gem_bo_get_tiling(drm_intel_bo *bo, uint32_t * tiling_mode,
 			    uint32_t * swizzle_mode);
 
 static int
+drm_intel_gem_bo_set_caching_internal(drm_intel_bo *bo,
+				      uint32_t cache_mode);
+
+static int
 drm_intel_gem_bo_set_tiling_internal(drm_intel_bo *bo,
 				     uint32_t tiling_mode,
 				     uint32_t stride);
@@ -695,6 +699,7 @@ retry:
 				drm_intel_gem_bo_free(&bo_gem->bo);
 				goto retry;
 			}
+
 		}
 	}
 	pthread_mutex_unlock(&bufmgr_gem->lock);
@@ -761,9 +766,16 @@ drm_intel_gem_bo_alloc_for_render(drm_intel_bufmgr *bufmgr,
 				  unsigned long size,
 				  unsigned int alignment)
 {
-	return drm_intel_gem_bo_alloc_internal(bufmgr, name, size,
-					       BO_ALLOC_FOR_RENDER,
-					       I915_TILING_NONE, 0);
+	drm_intel_bo *bo;
+
+	bo = drm_intel_gem_bo_alloc_internal(bufmgr, name, size,
+					     BO_ALLOC_FOR_RENDER,
+					     I915_TILING_NONE, 0);
+
+	if (bo)
+		drm_intel_gem_bo_set_caching_internal(bo, I915_CACHEING_NONE);
+
+	return bo;
 }
 
 static drm_intel_bo *
@@ -772,8 +784,15 @@ drm_intel_gem_bo_alloc(drm_intel_bufmgr *bufmgr,
 		       unsigned long size,
 		       unsigned int alignment)
 {
-	return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, 0,
-					       I915_TILING_NONE, 0);
+	drm_intel_bo *bo;
+
+	bo = drm_intel_gem_bo_alloc_internal(bufmgr, name, size, 0,
+					     I915_TILING_NONE, 0);
+
+	if (bo)
+		drm_intel_gem_bo_set_caching_internal(bo, I915_CACHEING_CACHED);
+
+	return bo;
 }
 
 static drm_intel_bo *
@@ -784,6 +803,7 @@ drm_intel_gem_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, const char *name,
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *)bufmgr;
 	unsigned long size, stride;
 	uint32_t tiling;
+	drm_intel_bo *bo;
 
 	do {
 		unsigned long aligned_y, height_alignment;
@@ -824,8 +844,13 @@ drm_intel_gem_bo_alloc_tiled(drm_intel_bufmgr *bufmgr, const char *name,
 	if (tiling == I915_TILING_NONE)
 		stride = 0;
 
-	return drm_intel_gem_bo_alloc_internal(bufmgr, name, size, flags,
-					       tiling, stride);
+	bo = drm_intel_gem_bo_alloc_internal(bufmgr, name, size, flags,
+					     tiling, stride);
+
+	if (bo)
+		drm_intel_gem_bo_set_caching_internal(bo, I915_CACHEING_NONE);
+
+	return bo;
 }
 
 /**
@@ -2363,6 +2388,27 @@ drm_intel_gem_bo_unpin(drm_intel_bo *bo)
 }
 
 static int
+drm_intel_gem_bo_set_caching_internal(drm_intel_bo *bo,
+				      uint32_t cache_mode)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_cacheing set_caching;
+	int ret;
+
+	memset(&set_caching, 0, sizeof(set_caching));
+
+	set_caching.handle = bo_gem->gem_handle;
Re: [Mesa-dev] [PATCH] swrast: Fix crash in sPriv->swrast_loader->getImage().
On 09/01/2013 08:19 AM, Johannes Obermayr wrote:

From: Egbert Eich e...@freedesktop.org

When glXBindTexImageEXT is called and SWrast is used there will be a crash when sPriv->swrast_loader->getImage() is called from swrastSetTexBuffer2(). Reason: no memory has been allocated for the destination thus texImage->Data is NULL. Call ctx->Driver.TexImage2D() to initialize this. If memory has been allocated in a previous call free it first.

Fixes: https://bugzilla.novell.com/show_bug.cgi?id=641297

Signed-off-by: Egbert Eich e...@freedesktop.org
Adapted-by: Stefan Dirsch sndir...@suse.com
Adapted-by: Tobias Johannes Klausmann tobias.johannes.klausm...@mni.thm.de
---
I am not sure whether this patch from Q3/2010 is required these days. But openSUSE still applies it ...

If yes it should also land in 9.2, 9.1, 9.0 and 8.0 branches.
---
 src/mesa/drivers/dri/swrast/swrast.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/swrast/swrast.c b/src/mesa/drivers/dri/swrast/swrast.c
index 332c7b7..b1c67a9 100644
--- a/src/mesa/drivers/dri/swrast/swrast.c
+++ b/src/mesa/drivers/dri/swrast/swrast.c
@@ -67,6 +67,7 @@ static void swrastSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
                                 GLint texture_format, __DRIdrawable *dPriv)
 {
+    GET_CURRENT_CONTEXT(ctx);

The current gl_context can be found at dri_ctx->Base as seen below in several places.

     struct dri_context *dri_ctx;
     int x, y, w, h;
     __DRIscreen *sPriv = dPriv->driScreenPriv;
@@ -98,6 +99,13 @@ static void swrastSetTexBuffer2(__DRIcontext *pDRICtx, GLint target,
     _mesa_init_teximage_fields(&dri_ctx->Base, texImage,
                                w, h, 1, 0, internalFormat, texFormat);
 
+    if (texImage->Data)

This won't work on master/9.2/9.1 (at least) since gl_texture_image doesn't have a 'Data' field.

+        ctx->Driver.FreeTexImageData(ctx, texImage);

There's no such driver hook in recent Mesa.

+
+    ctx->Driver.TexImage2D(ctx, target, 0, internalFormat,
+                           w, h, 0, texture_format, GL_UNSIGNED_INT_8_8_8_8,
+                           NULL, &ctx->Unpack, texObj, texImage);

If you just need to allocate texture memory you should probably call ctx->Driver.AllocTextureImageBuffer().

+
     sPriv->swrast_loader->getImage(dPriv, x, y, w, h, (char *)swImage->Buffer,
                                    dPriv->loaderPrivate);

-Brian

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/gs: Don't assign gl_Layer its own slot in the VUE map.
---
 src/mesa/drivers/dri/i965/brw_vs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index b81a538..7c7493f 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -64,6 +64,11 @@ brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
    vue_map->slots_valid = slots_valid;
    int i;
 
+   /* gl_Layer doesn't get its own varying slot--it's stored in the first VUE
+    * slot (VARYING_SLOT_PSIZ).
+    */
+   slots_valid &= ~VARYING_BIT_LAYER;
+
    /* Make sure that the values we store in vue_map->varying_to_slot and
     * vue_map->slot_to_varying won't overflow the signed chars that are used
     * to store them.  Note that since vue_map->slot_to_varying sometimes holds
-- 
1.8.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
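The masking step in this patch is a plain clear-one-bit operation on a 64-bit slot bitfield. A minimal sketch follows, assuming invented slot numbers; the real values come from Mesa's varying-slot enum, and `slot_bit` and `drop_layer_slot` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Invented slot numbers for illustration only. */
enum { SLOT_POS = 0, SLOT_PSIZ = 1, SLOT_LAYER = 2 };

/* Bit for a given slot in a 64-bit "slots valid" mask. */
static uint64_t slot_bit(int slot)
{
	assert(slot >= 0 && slot < 64);
	return UINT64_C(1) << slot;
}

/* Remove the layer slot from the mask so that no separate VUE slot is
 * assigned to it; every other bit is left untouched. */
static uint64_t drop_layer_slot(uint64_t slots_valid)
{
	return slots_valid & ~slot_bit(SLOT_LAYER);
}
```

Clearing the bit is idempotent, so applying it to a mask that never contained the layer slot changes nothing.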
Re: [Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls
On Wed, Aug 28, 2013 at 1:10 PM, Dominik Behr db...@chromium.org wrote:

Fixes a bug where if a uniform array is passed to a function the accesses to the array are not propagated so later all but the first vector of the uniform array are removed in parcel_out_uniform_storage resulting in broken shaders and out of bounds access to arrays in brw::vec4_visitor::pack_uniform_registers.

Signed-off-by: Dominik Behr db...@chromium.org
---
 src/glsl/link_functions.cpp | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/src/glsl/link_functions.cpp b/src/glsl/link_functions.cpp
index 6b3e154..d935546 100644
--- a/src/glsl/link_functions.cpp
+++ b/src/glsl/link_functions.cpp
@@ -173,6 +173,35 @@ public:
       return visit_continue;
    }
 
+   virtual ir_visitor_status visit_leave(ir_call *ir)
+   {
+      /* Traverse list of function parameters, and for array parameters
+         propagate max_array_access, Otherwise arrays that are only referenced
+         from inside functions via function parameters will be incorrectly
+         optimized. This will lead to incorrect code being generated (or worse).
+         Do it when leaving the node so the children would propagate their
+         array accesses first */
+
+      const exec_node *formal_param_node = ir->callee->parameters.get_head();
+      const exec_node *actual_param_node = ir->actual_parameters.get_head();
+      while (!actual_param_node->is_tail_sentinel()) {
+         ir_variable *formal_param = (ir_variable *) formal_param_node;
+         ir_rvalue *actual_param = (ir_rvalue *) actual_param_node;
+
+         formal_param_node = formal_param_node->get_next();
+         actual_param_node = actual_param_node->get_next();
+
+         if (formal_param->type->is_array()) {
+            ir_dereference_variable *deref = actual_param->as_dereference_variable();
+            if (deref && deref->var && deref->var->type->is_array()) {
+               deref->var->max_array_access =
+                  MAX2(formal_param->max_array_access, deref->var->max_array_access);
+            }
+         }
+      }
+      return visit_continue;
+   }
+
    virtual ir_visitor_status visit(ir_dereference_variable *ir)
    {
       if (hash_table_find(locals, ir->var) == NULL) {
-- 
1.8.3.1

Reviewed-and-Tested-by: Matt Turner matts...@gmail.com

I've sent four tests to the piglit list and Cc'd you. Take a look at them and make sure they're exercising the thing you want to test.

I'll commit this patch tomorrow, assuming no other comments or problems with the tests. I'll also tag it for the stable branches, since it's definitely a bug fix.

Thanks a bunch, Dominik!

Matt

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
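The propagation rule in this patch can be stated without the GLSL IR classes. The following is a simplified sketch, not linker code: `struct var` is a hypothetical stand-in for ir_variable, parameters are held in parallel arrays instead of exec_lists, and a NULL actual models an argument that is not a plain variable dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ir_variable. */
struct var {
	int is_array;
	unsigned max_array_access;
};

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Walk formal and actual parameters in lockstep. For each array parameter,
 * raise the caller-side variable's max_array_access to at least what the
 * callee recorded for its formal parameter. */
static void propagate_max_array_access(const struct var *formals,
				       struct var *const *actuals,
				       size_t count)
{
	assert(count == 0 || (formals != NULL && actuals != NULL));
	for (size_t i = 0; i < count; i++) {
		struct var *actual = actuals[i];

		if (formals[i].is_array && actual != NULL && actual->is_array) {
			actual->max_array_access =
				MAX2(formals[i].max_array_access,
				     actual->max_array_access);
		}
	}
}
```

If the callee indexes its array parameter up to element 3, a uniform array passed in by the caller ends up with max_array_access of at least 3, so later passes no longer treat elements 1 through 3 as unused.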
Re: [Mesa-dev] [PATCH] glsl: propagate max_array_access through function calls
Thanks, I looked at piglit tests and they look OK if they are only supposed to test whether the shader compiles and links. It doesn't look like they test the results of rendering which could be more useful?

-- Dominik

On Tue, Sep 3, 2013 at 2:52 PM, Matt Turner matts...@gmail.com wrote:

I've sent four tests to the piglit list and Cc'd you. Take a look at them and make sure they're exercising the thing you want to test. I'll commit this patch tomorrow, assuming no other comments or problems with the tests. I'll also tag it for the stable branches, since it's definitely a bug fix.

Thanks a bunch, Dominik!
Matt
Re: [Mesa-dev] [PATCH] i965/gs: Don't assign gl_Layer its own slot in the VUE map.
On 09/03/2013 04:11 PM, Paul Berry wrote:

---
 src/mesa/drivers/dri/i965/brw_vs.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index b81a538..7c7493f 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -64,6 +64,11 @@ brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
    vue_map->slots_valid = slots_valid;
    int i;
 
+   /* gl_Layer doesn't get its own varying slot--it's stored in the first VUE
+    * slot (VARYING_SLOT_PSIZ).
+    */
+   slots_valid &= ~VARYING_BIT_LAYER;
+
    /* Make sure that the values we store in vue_map->varying_to_slot and
     * vue_map->slot_to_varying won't overflow the signed chars that are used
     * to store them.  Note that since vue_map->slot_to_varying sometimes holds

Reviewed-by: Chad Versace chad.vers...@linux.intel.com
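The one-line change in this patch clears a single bit from the slots_valid bitfield before the VUE map is laid out. A minimal C sketch of that idiom follows; the slot numbers in the enum are hypothetical values picked for illustration and are not Mesa's real gl_varying_slot values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot numbers, for illustration only. */
enum { SLOT_POS = 0, SLOT_PSIZ = 1, SLOT_LAYER = 2 };

/* Return slots_valid with the layer bit cleared, so that no separate
 * slot is assigned to it when the map is built. */
static uint64_t
drop_layer_slot(uint64_t slots_valid)
{
   return slots_valid & ~(UINT64_C(1) << SLOT_LAYER);
}
```

Clearing a bit that is already clear is a no-op, so the statement is safe to apply unconditionally.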
[Mesa-dev] [PATCH 00/15] i965/gen6+: Support 128 varying components.
GL 3.2 requires us to support 128 varying components for geometry shader outputs and fragment shader inputs, and 64 varying components otherwise. But there's no hardware limitation that restricts us to 64 varying components, and core Mesa doesn't currently allow different stages to have different maximum values, so I've gone ahead and enabled 128 varying components for all stages. This has the advantage of increased test coverage, since piglit already has a number of tests to validate that the maximum advertised number of varying components can be exchanged between VS and FS. I've also gone ahead and increased the limit for gen6 as well as gen7, since it required very little extra work.

Previously, on gen6+, we relied on the SF/SBE stage of the pipeline to reorder the outputs from the GS (or VS) to match the input ordering required by the FS. This allowed us to determine the order of FS inputs solely based on the FS, so we avoided recompiles when separate shader objects were in use. But there's a problem with that: the SF/SBE stage can't arbitrarily reorder more than 16 VUE slots (1 slot = 4 varying components). To avoid introducing additional recompiles with previously-supported shaders, I've taken a hybrid approach to choosing the FS input ordering: if the FS uses 16 or fewer input varying slots, then it orders them solely based on its own requirements. If it uses more than 16 input varying slots, then it orders them according to the GS (or VS) output VUE map, so that the SF/SBE stage doesn't have to do any reordering.

Patches 1-3 modify the FS so that it exposes the order of input varyings it needs via prog_data.

Patches 4-6 modify the SF/SBE setup so that it consults the FS prog_data when choosing how to re-order varyings (previously, it implicitly assumed an order that happened to match the order the FS was using).

Patch 7 is a minor optimization made possible by patches 1-6: now that the SF/SBE setup no longer makes implicit assumptions about the order of the FS inputs, the FS no longer has to have dummy input slots for gl_FragCoord and gl_FrontFacing.

Patch 8 tweaks the VUE map slightly so that it is uniquely determined by a single 64-bit bitfield. This will allow us to store the bitfield in the FS program key rather than the entire VUE map.

Patch 9 is a minor optimization made possible by patch 8: now that the VUE map is uniquely determined by a single 64-bit bitfield, we no longer have to store the entire VUE map in the GS program key.

Patches 10-11 modify the FS to order its inputs according to the GS (or VS) output VUE map when there are more than 16 input slots in use.

Patch 12 adjusts the VS and GS code so that it can output all 32 varyings to the VUE, even if it requires more than two URB writes to do so.

Patches 13-14 make some minor gen6-specific adjustments to allow for the larger URB entries needed for 32 varyings: the Gen6 transform feedback code sometimes needs to do 2 URB writes instead of 1, and an incorrect assertion in the gen6 URB setup needs to be fixed.

Patch 15 increases the value of MaxVarying from 16 to 32 for gen6+.

The series is available on branch increase-max-varyings of https://github.com/stereotype441/mesa.git. I've piglit tested it on gen5, gen6, and gen7.

[PATCH 01/15] i965/fs: Expose urb_setup as part of brw_wm_prog_data.
[PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.
[PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.
[PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.
[PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
[PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.
[PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.
[PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active.
[PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key.
[PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile.
[PATCH 11/15] i965/fs: When > 64 input components, order them to match prev pipeline stage.
[PATCH 12/15] i965/vec4: Generate URB writes using a loop.
[PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size.
[PATCH 14/15] i965/ff_gs: Generate URB writes using a loop.
[PATCH 15/15] i965/gen6+: Support 128 varying components.
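The hybrid ordering rule from the cover letter can be sketched in a few lines of C. This is only an illustration of the decision and of the slot-to-component arithmetic (1 slot = 4 components, so 32 slots = 128 components); the function names and the `MAX_REORDERABLE_SLOTS` constant are hypothetical and do not come from the series itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SF/SBE can arbitrarily reorder at most this many VUE slots,
 * per the cover letter.  The macro name is made up for this sketch. */
#define MAX_REORDERABLE_SLOTS 16

/* Count the set bits in a 64-bit input bitfield. */
static unsigned
count_bits64(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* True when the FS uses too many input slots for SF/SBE to reorder,
 * so the FS has to follow the previous stage's VUE layout instead of
 * choosing its own order. */
static bool
fs_must_follow_vue_map(uint64_t fs_input_slots)
{
   return count_bits64(fs_input_slots) > MAX_REORDERABLE_SLOTS;
}

/* Each varying slot holds one vec4, i.e. four components. */
static unsigned
slots_to_components(unsigned slots)
{
   return slots * 4;
}
```

Shaders that stay at or below 16 input slots take the first path, which is why previously-supported shaders should not see extra recompiles.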
[Mesa-dev] [PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.
On gen4-5, the FS stage reads varying inputs from URB entries that were output by the SF thread, where each register stores the interpolation setup for two components of a vec4, therefore the FS urb_read_length is twice the number of FS input varyings. On gen6+, varying inputs are directly deposited in the FS payload by the SF/SBE fixed function logic, so urb_read_length is irrelevant.

However, in future patches, it will be nice to be able to consult brw_wm_prog_data to determine how many varying inputs the FS expects (rather than inferring it from gl_program::InputsRead). So instead of storing urb_read_length, we simply store num_varying_inputs in brw_wm_prog_data. On gen4-5, we multiply this by 2 to recover the URB read length.
---
 src/mesa/drivers/dri/i965/brw_context.h  | 2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp     | 7 ---
 src/mesa/drivers/dri/i965/brw_wm_state.c | 3 ++-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 41001d8..4c6aebe 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -314,7 +314,7 @@ struct brw_shader {
  */
 struct brw_wm_prog_data {
    GLuint curb_read_length;
-   GLuint urb_read_length;
+   GLuint num_varying_inputs;
 
    GLuint first_curbe_grf;
    GLuint first_curbe_grf_16;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9e7d203..444c2b5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1272,8 +1272,7 @@ fs_visitor::calculate_urb_setup()
          c->prog_data.urb_setup[VARYING_SLOT_PNTC] = urb_next++;
    }
 
-   /* Each attribute is 4 setup channels, each of which is half a reg. */
-   c->prog_data.urb_read_length = urb_next * 2;
+   c->prog_data.num_varying_inputs = urb_next;
 }
 
 void
@@ -1298,7 +1297,9 @@ fs_visitor::assign_urb_setup()
       }
    }
 
-   this->first_non_payload_grf = urb_start + c->prog_data.urb_read_length;
+   /* Each attribute is 4 setup channels, each of which is half a reg. */
+   this->first_non_payload_grf =
+      urb_start + c->prog_data.num_varying_inputs * 2;
 }
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_wm_state.c b/src/mesa/drivers/dri/i965/brw_wm_state.c
index 404fdad..4b06f66 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_state.c
@@ -133,7 +133,8 @@ brw_upload_wm_unit(struct brw_context *brw)
    }
 
    wm->thread3.dispatch_grf_start_reg = brw->wm.prog_data->first_curbe_grf;
-   wm->thread3.urb_entry_read_length = brw->wm.prog_data->urb_read_length;
+   wm->thread3.urb_entry_read_length =
+      brw->wm.prog_data->num_varying_inputs * 2;
    wm->thread3.urb_entry_read_offset = 0;
    wm->thread3.const_urb_entry_read_length =
       brw->wm.prog_data->curb_read_length;
-- 
1.8.4
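The "multiply by 2" step described in this patch can be written out as a small C sketch. The helper names below are hypothetical; only the arithmetic (4 setup channels per attribute, half a register per channel) is taken from the patch comments.

```c
#include <assert.h>

/* Each attribute is 4 setup channels, each of which is half a register,
 * so one varying input occupies 2 registers of setup data. */
static unsigned
urb_regs_for_varyings(unsigned num_varying_inputs)
{
   return num_varying_inputs * 2;
}

/* First register after the setup payload, given where the URB data starts. */
static unsigned
first_non_payload_reg(unsigned urb_start, unsigned num_varying_inputs)
{
   return urb_start + urb_regs_for_varyings(num_varying_inputs);
}
```

Storing the varying count and deriving the register count on demand keeps a single source of truth that later patches in the series can consult.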
[Mesa-dev] [PATCH 01/15] i965/fs: Expose urb_setup as part of brw_wm_prog_data.
At the moment, for Gen6+, the FS assumes that all varying inputs are delivered to it in the order in which they appear in the gl_program::InputsRead bitfield, and the SF/SBE setup code ensures that they are delivered in this order.

When we add support for more than 64 varying components, this will no longer always be possible, because the Gen6+ SF/SBE stage is only capable of performing arbitrary reorderings of 16 varying slots.

To allow extra flexibility in the ordering of FS varyings, this patch causes the FS to advertise exactly what ordering it expects.
---
 src/mesa/drivers/dri/i965/brw_context.h      | 7 +++
 src/mesa/drivers/dri/i965/brw_fs.cpp         | 10 +-
 src/mesa/drivers/dri/i965/brw_fs.h           | 1 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 4 ++--
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 939083b..41001d8 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -336,6 +336,13 @@ struct brw_wm_prog_data {
     */
    uint32_t barycentric_interp_modes;
 
+   /**
+    * Map from gl_varying_slot to the position within the FS setup data
+    * payload where the varying's attribute vertex deltas should be delivered.
+    * For varying slots that are not used by the FS, the value is -1.
+    */
+   int urb_setup[VARYING_SLOT_MAX];
+
    /* Pointers to tracked values (only valid once
     * _mesa_load_state_parameters has been called at runtime).
     *
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 96cb2ee..9e7d203 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1004,7 +1004,7 @@ fs_visitor::emit_general_interpolation(ir_variable *ir)
    int location = ir->location;
    for (unsigned int i = 0; i < array_elements; i++) {
       for (unsigned int j = 0; j < type->matrix_columns; j++) {
-         if (urb_setup[location] == -1) {
+         if (c->prog_data.urb_setup[location] == -1) {
             /* If there's no incoming setup data for this slot, don't
              * emit interpolation for it.
              */
@@ -1231,7 +1231,7 @@ void
 fs_visitor::calculate_urb_setup()
 {
    for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
-      urb_setup[i] = -1;
+      c->prog_data.urb_setup[i] = -1;
    }
 
    int urb_next = 0;
@@ -1239,7 +1239,7 @@ fs_visitor::calculate_urb_setup()
    if (brw->gen >= 6) {
       for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
          if (fp->Base.InputsRead & BITFIELD64_BIT(i)) {
-            urb_setup[i] = urb_next++;
+            c->prog_data.urb_setup[i] = urb_next++;
          }
       }
    } else {
@@ -1257,7 +1257,7 @@ fs_visitor::calculate_urb_setup()
           * incremented, mapped or not.
           */
          if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
-            urb_setup[i] = urb_next;
+            c->prog_data.urb_setup[i] = urb_next;
          urb_next++;
       }
    }
@@ -1269,7 +1269,7 @@ fs_visitor::calculate_urb_setup()
        * See compile_sf_prog() for more info.
        */
       if (fp->Base.InputsRead & BITFIELD64_BIT(VARYING_SLOT_PNTC))
-         urb_setup[VARYING_SLOT_PNTC] = urb_next++;
+         c->prog_data.urb_setup[VARYING_SLOT_PNTC] = urb_next++;
    }
 
    /* Each attribute is 4 setup channels, each of which is half a reg. */
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index cb4ac3b..b77d4de 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -454,7 +454,6 @@ public:
    int first_non_payload_grf;
    /** Either BRW_MAX_GRF or GEN7_MRF_HACK_START */
    int max_grf;
-   int urb_setup[VARYING_SLOT_MAX];
 
    fs_reg *fp_temp_regs;
    fs_reg *fp_input_regs;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index b049436..911b53e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -2159,10 +2159,10 @@ fs_visitor::emit_dummy_fs()
 struct brw_reg
 fs_visitor::interp_reg(int location, int channel)
 {
-   int regnr = urb_setup[location] * 2 + channel / 2;
+   int regnr = c->prog_data.urb_setup[location] * 2 + channel / 2;
    int stride = (channel & 1) * 4;
 
-   assert(urb_setup[location] != -1);
+   assert(c->prog_data.urb_setup[location] != -1);
 
    return brw_vec1_grf(regnr, stride);
 }
-- 
1.8.4
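To make the urb_setup map concrete, here is a small C sketch of the gen6+ assignment loop and of the register arithmetic in interp_reg(). `NUM_SLOTS` is a hypothetical stand-in for VARYING_SLOT_MAX, and both function names are invented for the example; the formulas themselves are the ones visible in the diff.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 64   /* hypothetical stand-in for VARYING_SLOT_MAX */

/* Assign consecutive payload positions to the slots set in inputs_read,
 * in bit order, and mark every other slot as unused (-1).
 * Returns the number of positions handed out. */
static int
build_urb_setup(uint64_t inputs_read, int urb_setup[NUM_SLOTS])
{
   int urb_next = 0;

   for (int i = 0; i < NUM_SLOTS; i++) {
      if (inputs_read & (UINT64_C(1) << i))
         urb_setup[i] = urb_next++;
      else
         urb_setup[i] = -1;
   }
   return urb_next;
}

/* Mirror of the interp_reg() arithmetic: two registers per attribute,
 * two channels per register, four floats of setup data per channel. */
static void
interp_reg_position(const int urb_setup[NUM_SLOTS], int location,
                    int channel, int *regnr, int *stride)
{
   assert(urb_setup[location] != -1);
   *regnr = urb_setup[location] * 2 + channel / 2;
   *stride = (channel & 1) * 4;
}
```

Because consumers only look at the map, a later patch can fill it in a different order without touching the interpolation code.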
[Mesa-dev] [PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.
Previously, we assumed that the number of varying inputs consumed by the fragment shader was equal to the number of bits set in gl_program::InputsRead. However, we'll soon be making two changes that will cause that not to be true:

- We'll stop wasting varying input space for gl_FragCoord and gl_FrontFacing, which aren't varyings.

- For fragment shaders that have more than 16 varying inputs, we'll adjust the layout of the inputs to account for the fact that the SF/SBE pipeline stage can't reorder inputs beyond the first 16; if there are GS outputs that the FS doesn't use (or vice versa) this may cause the number of FS varying inputs to change.

So, instead of trying to guess the number of FS inputs from gl_program::InputsRead, simply read it from brw_wm_prog_data::num_varying_inputs, which is guaranteed to be correct since it's populated by fs_visitor::calculate_urb_setup().
---
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 3 ++-
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index 6725805..fff5cc4 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -187,7 +187,8 @@ upload_wm_state(struct brw_context *brw)
       dw5 |= GEN6_WM_DISPATCH_ENABLE;
    }
 
-   dw6 |= _mesa_bitcount_64(brw->fragment_program->Base.InputsRead) <<
+   /* CACHE_NEW_WM_PROG */
+   dw6 |= brw->wm.prog_data->num_varying_inputs <<
       GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
    if (multisampled_fbo) {
       /* _NEW_MULTISAMPLE */
diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index e5691fb..842f744 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -167,6 +167,7 @@ upload_ps_state(struct brw_context *brw)
     * rendering, CurrentFragmentProgram is used for this check to
     * differentiate between the GLSL and non-GLSL cases.
     */
+   /* BRW_NEW_FRAGMENT_PROGRAM */
    if (ctx->Shader.CurrentFragmentProgram == NULL)
       dw2 |= GEN7_PS_FLOATING_POINT_MODE_ALT;
 
@@ -190,8 +191,8 @@ upload_ps_state(struct brw_context *brw)
       dw4 |= GEN7_PS_DUAL_SOURCE_BLEND_ENABLE;
    }
 
-   /* BRW_NEW_FRAGMENT_PROGRAM */
-   if (brw->fragment_program->Base.InputsRead != 0)
+   /* CACHE_NEW_WM_PROG */
+   if (brw->wm.prog_data->num_varying_inputs != 0)
       dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
 
    dw4 |= GEN7_PS_8_DISPATCH_ENABLE;
-- 
1.8.4
[Mesa-dev] [PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.
We always program the SF unit to start reading the vertex URB entry at offset 1. In upcoming patches, we'll be adding FS code that relies on this. So consistently use the constant BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoding a 1.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 10 ++
 src/mesa/drivers/dri/i965/brw_sf.h        | 2 --
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 2 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 2 +-
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 4c6aebe..a0a8d4f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -478,6 +478,16 @@ struct brw_sf_prog_data {
    GLuint urb_entry_size;
 };
 
+
+/**
+ * We always program SF to start reading at an offset of 1 (2 varying slots)
+ * from the start of the vertex URB entry.  This causes it to skip:
+ * - VARYING_SLOT_PSIZ and BRW_VARYING_SLOT_NDC on gen4-5
+ * - VARYING_SLOT_PSIZ and VARYING_SLOT_POS on gen6+
+ */
+#define BRW_SF_URB_ENTRY_READ_OFFSET 1
+
+
 struct brw_clip_prog_data {
    GLuint curb_read_length;    /* user planes? */
    GLuint clip_mode;
diff --git a/src/mesa/drivers/dri/i965/brw_sf.h b/src/mesa/drivers/dri/i965/brw_sf.h
index 09880fe..0006239 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.h
+++ b/src/mesa/drivers/dri/i965/brw_sf.h
@@ -105,6 +105,4 @@ void brw_emit_point_setup( struct brw_sf_compile *c, bool allocate );
 void brw_emit_point_sprite_setup( struct brw_sf_compile *c, bool allocate );
 void brw_emit_anyprim_setup( struct brw_sf_compile *c );
 
-#define BRW_SF_URB_ENTRY_READ_OFFSET 1
-
 #endif
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index c76debe..dfe9a31 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -138,7 +138,7 @@ upload_sf_state(struct brw_context *brw)
    bool multisampled_fbo = ctx->DrawBuffer->Visual.samples > 1;
 
    int attr = 0, input_index = 0;
-   int urb_entry_read_offset = 1;
+   const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
    float point_size;
    uint16_t attr_overrides[VARYING_SLOT_MAX];
    uint32_t point_sprite_origin;
diff --git a/src/mesa/drivers/dri/i965/gen7_sf_state.c b/src/mesa/drivers/dri/i965/gen7_sf_state.c
index 0ff3388..715eb6c 100644
--- a/src/mesa/drivers/dri/i965/gen7_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_sf_state.c
@@ -40,7 +40,7 @@ upload_sbe_state(struct brw_context *brw)
    uint32_t dw1, dw10, dw11;
    int i;
    int attr = 0, input_index = 0;
-   int urb_entry_read_offset = 1;
+   const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
    uint16_t attr_overrides[VARYING_SLOT_MAX];
    /* _NEW_BUFFERS */
    bool render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer);
-- 
1.8.4
[Mesa-dev] [PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
---
 src/mesa/drivers/dri/i965/brw_state.h     | 9 +-
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 153 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 64 +
 3 files changed, 97 insertions(+), 129 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index 22e4a61..dd3e216 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -223,9 +223,12 @@ void gen4_init_vtable_sampler_functions(struct brw_context *brw);
 void gen7_init_vtable_sampler_functions(struct brw_context *brw);
 
 /* gen6_sf_state.c */
-uint32_t
-get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
-                  int fs_attr, bool two_side_color, uint32_t *max_source_attr);
+void
+calculate_attr_overrides(const struct brw_context *brw,
+                         uint16_t *attr_overrides,
+                         uint32_t *point_sprite_enables,
+                         uint32_t *flat_enables,
+                         uint32_t *urb_entry_read_length);
 
 /* brw_vs_surface_state.c */
 void
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index dfe9a31..7094994 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -52,7 +52,7 @@
  * the VUE that are not needed by the fragment shader.  It is measured in
  * 256-bit increments.
  */
-uint32_t
+static uint32_t
 get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
                   int fs_attr, bool two_side_color, uint32_t *max_source_attr)
 {
@@ -123,21 +123,98 @@ get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
    return source_attr;
 }
 
+
+/**
+ * Create the mapping from the FS inputs we produce to the VS outputs they
+ * source from.
+ */
+void
+calculate_attr_overrides(const struct brw_context *brw,
+                         uint16_t *attr_overrides,
+                         uint32_t *point_sprite_enables,
+                         uint32_t *flat_enables,
+                         uint32_t *urb_entry_read_length)
+{
+   const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
+   uint32_t max_source_attr = 0;
+   int input_index = 0;
+
+   /* _NEW_LIGHT */
+   bool shade_model_flat = brw->ctx.Light.ShadeModel == GL_FLAT;
+
+   for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+      enum glsl_interp_qualifier interp_qualifier =
+         brw->fragment_program->InterpQualifier[attr];
+      bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
+
+      if (!(brw->fragment_program->Base.InputsRead & BITFIELD64_BIT(attr)))
+         continue;
+
+      /* _NEW_POINT */
+      if (brw->ctx.Point.PointSprite &&
+          (attr >= VARYING_SLOT_TEX0 && attr <= VARYING_SLOT_TEX7) &&
+          brw->ctx.Point.CoordReplace[attr - VARYING_SLOT_TEX0]) {
+         *point_sprite_enables |= (1 << input_index);
+      }
+
+      if (attr == VARYING_SLOT_PNTC)
+         *point_sprite_enables |= (1 << input_index);
+
+      /* flat shading */
+      if (interp_qualifier == INTERP_QUALIFIER_FLAT ||
+          (shade_model_flat && is_gl_Color &&
+           interp_qualifier == INTERP_QUALIFIER_NONE))
+         *flat_enables |= (1 << input_index);
+
+      /* The hardware can only do the overrides on 16 overrides at a
+       * time, and the other up to 16 have to be lined up so that the
+       * input index = the output index.  We'll need to do some
+       * tweaking to make sure that's the case.
+       */
+      assert(input_index < 16 || attr == input_index);
+
+      /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_LIGHT | _NEW_PROGRAM */
+      attr_overrides[input_index++] =
+         get_attr_override(&brw->vue_map_geom_out,
+                           urb_entry_read_offset, attr,
+                           brw->ctx.VertexProgram._TwoSideEnabled,
+                           &max_source_attr);
+   }
+
+   for (; input_index < VARYING_SLOT_MAX; input_index++)
+      attr_overrides[input_index] = 0;
+
+   /* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for
+    * 3DSTATE_SF DWord 1 bits 15:11, Vertex URB Entry Read Length:
+    *
+    * This field should be set to the minimum length required to read the
+    * maximum source attribute.  The maximum source attribute is indicated
+    * by the maximum value of the enabled Attribute # Source Attribute if
+    * Attribute Swizzle Enable is set, Number of Output Attributes-1 if
+    * enable is not set.
+    * read_length = ceiling((max_source_attr + 1) / 2)
+    *
+    * [errata] Corruption/Hang possible if length programmed larger than
+    * recommended
+    *
+    * Similar text exists for Ivy Bridge.
+    */
+   *urb_entry_read_length = ALIGN(max_source_attr + 1, 2) / 2;
+}
+
+
 static void
 upload_sf_state(struct brw_context *brw)
 {
    struct gl_context *ctx = &brw->ctx;
    /* BRW_NEW_FRAGMENT_PROGRAM */
    uint32_t
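The read-length formula quoted from the PRM comment, read_length = ceiling((max_source_attr + 1) / 2), is computed in the patch as ALIGN(max_source_attr + 1, 2) / 2. The C sketch below spells that out; `align_up` is a hypothetical helper written for this example and is only assumed to behave like Mesa's ALIGN macro for power-of-two alignments.

```c
#include <assert.h>

/* Round value up to the next multiple of alignment.
 * alignment must be a power of two. */
static unsigned
align_up(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}

/* The URB entry is read in units of two attributes, so reading up to
 * attribute index max_source_attr needs ceil((max_source_attr + 1) / 2)
 * units. */
static unsigned
urb_entry_read_length(unsigned max_source_attr)
{
   return align_up(max_source_attr + 1, 2) / 2;
}
```

Rounding up before dividing gives the ceiling without floating point, which keeps the programmed length at the minimum the errata note asks for.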
[Mesa-dev] [PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.
Previously, the SF/SBE setup code delivered varying inputs to the FS in the order in which they appear in the gl_program::InputsRead bitfield, since that's what the FS expects.

When we add support for more than 64 varying components, this will no longer always be the case, because the Gen6+ SF/SBE stage is only capable of performing arbitrary reorderings of 16 varying slots. So, when there are more than 16 vec4's worth of varying inputs, the FS will have to adjust the order of its input varyings in order to partially match the order of outputs from the geometry or vertex shader.

To allow extra flexibility in the ordering of FS varyings, this patch causes the SF/SBE to deliver varying inputs to the FS in exactly the order that the FS requests, by consulting brw_wm_prog_data::urb_setup and brw_wm_prog_data::num_varying_inputs.
---
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 45 ++-
 src/mesa/drivers/dri/i965/gen7_sf_state.c | 13 +
 2 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index 7094994..bcad5a4 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -137,17 +137,23 @@ calculate_attr_overrides(const struct brw_context *brw,
 {
    const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
    uint32_t max_source_attr = 0;
-   int input_index = 0;
 
    /* _NEW_LIGHT */
    bool shade_model_flat = brw->ctx.Light.ShadeModel == GL_FLAT;
 
+   /* Initialize all the attr_overrides to 0.  In the loop below we'll modify
+    * just the ones that correspond to inputs used by the fs.
+    */
+   memset(attr_overrides, 0, 16*sizeof(*attr_overrides));
+
    for (int attr = 0; attr < VARYING_SLOT_MAX; attr++) {
       enum glsl_interp_qualifier interp_qualifier =
          brw->fragment_program->InterpQualifier[attr];
       bool is_gl_Color = attr == VARYING_SLOT_COL0 || attr == VARYING_SLOT_COL1;
+      /* CACHE_NEW_WM_PROG */
+      int input_index = brw->wm.prog_data->urb_setup[attr];
 
-      if (!(brw->fragment_program->Base.InputsRead & BITFIELD64_BIT(attr)))
+      if (input_index < 0)
          continue;
 
       /* _NEW_POINT */
@@ -166,23 +172,23 @@ calculate_attr_overrides(const struct brw_context *brw,
            interp_qualifier == INTERP_QUALIFIER_NONE))
          *flat_enables |= (1 << input_index);
 
-      /* The hardware can only do the overrides on 16 overrides at a
-       * time, and the other up to 16 have to be lined up so that the
-       * input index = the output index.  We'll need to do some
-       * tweaking to make sure that's the case.
-       */
-      assert(input_index < 16 || attr == input_index);
-
       /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_LIGHT | _NEW_PROGRAM */
-      attr_overrides[input_index++] =
+      uint16_t attr_override =
          get_attr_override(&brw->vue_map_geom_out,
                            urb_entry_read_offset, attr,
                            brw->ctx.VertexProgram._TwoSideEnabled,
                            &max_source_attr);
-   }
 
-   for (; input_index < VARYING_SLOT_MAX; input_index++)
-      attr_overrides[input_index] = 0;
+      /* The hardware can only do the overrides on 16 overrides at a
+       * time, and the other up to 16 have to be lined up so that the
+       * input index = the output index.  We'll need to do some
+       * tweaking to make sure that's the case.
+       */
+      if (input_index < 16)
+         attr_overrides[input_index] = attr_override;
+      else
+         assert(attr_override == input_index);
+   }
 
    /* From the Sandy Bridge PRM, Volume 2, Part 1, documentation for
     * 3DSTATE_SF DWord 1 bits 15:11, Vertex URB Entry Read Length:
@@ -207,8 +213,8 @@ static void
 upload_sf_state(struct brw_context *brw)
 {
    struct gl_context *ctx = &brw->ctx;
-   /* BRW_NEW_FRAGMENT_PROGRAM */
-   uint32_t num_outputs = _mesa_bitcount_64(brw->fragment_program->Base.InputsRead);
+   /* CACHE_NEW_WM_PROG */
+   uint32_t num_outputs = brw->wm.prog_data->num_varying_inputs;
    uint32_t dw1, dw2, dw3, dw4, dw16, dw17;
    int i;
    /* _NEW_BUFFER */
@@ -217,7 +223,7 @@ upload_sf_state(struct brw_context *brw)
    const int urb_entry_read_offset = BRW_SF_URB_ENTRY_READ_OFFSET;
    float point_size;
-   uint16_t attr_overrides[VARYING_SLOT_MAX];
+   uint16_t attr_overrides[16];
    uint32_t point_sprite_origin;
 
    dw1 = GEN6_SF_SWIZZLE_ENABLE | num_outputs << GEN6_SF_NUM_OUTPUTS_SHIFT;
@@ -353,7 +359,9 @@ upload_sf_state(struct brw_context *brw)
          (1 << GEN6_SF_TRIFAN_PROVOKE_SHIFT);
    }
 
-   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM */
+   /* BRW_NEW_VUE_MAP_GEOM_OUT | _NEW_POINT | _NEW_LIGHT | _NEW_PROGRAM |
+    * CACHE_NEW_WM_PROG
+    */
    uint32_t urb_entry_read_length;
    calculate_attr_overrides(brw, attr_overrides, &dw16, &dw17,
                             &urb_entry_read_length);
@@ -391,7 +399,8 @@ const
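The "first 16 are freely reorderable, the rest must already line up" rule from this patch can be sketched as a standalone C helper. The name `store_attr_override` and the boolean return value are assumptions made for this example; the patch itself uses an assert() for the second case rather than returning a status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_REORDERABLE 16   /* override entries available, per the patch */

/* Record the override for one FS input.
 * - input_index < 0: the FS does not read this slot, nothing to do.
 * - input_index < 16: the override table can redirect it freely.
 * - otherwise: no override entry exists, so the source attribute must
 *   already equal the input index.  Returns false if it does not. */
static bool
store_attr_override(uint16_t attr_overrides[NUM_REORDERABLE],
                    int input_index, uint16_t attr_override)
{
   if (input_index < 0)
      return true;

   if (input_index < NUM_REORDERABLE) {
      attr_overrides[input_index] = attr_override;
      return true;
   }

   return attr_override == input_index;
}
```

A caller would zero the table first, as the patch does with memset(), and then call the helper once per varying slot using the index taken from urb_setup.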
[Mesa-dev] [PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.
Previously, if a fragment shader accessed gl_FragCoord or gl_FrontFacing, we would assign them their own slots in the fragment shader input attribute array, using up space that could be made available to real varyings. This was not strictly necessary (since these values are not true varyings, and are instead computed from other data available in the FS payload). But we had to do it anyway because the SF/SBE setup code assumed that every 1 bit in the gl_program::InputsRead bitfield corresponded to a genuine varying variable.

Now that the SF/SBE code consults brw_wm_prog_data and only sets up the attributes that the fragment shader actually needs, we don't have to do this anymore.
---
 src/mesa/drivers/dri/i965/brw_context.h   | 9 +++++++++
 src/mesa/drivers/dri/i965/brw_fs.cpp      | 3 ++-
 src/mesa/drivers/dri/i965/gen6_sf_state.c | 8 --------
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index a0a8d4f..167ed4a 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -439,6 +439,15 @@ void brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
                          GLbitfield64 slots_valid, bool userclip_active);
 
 
+/**
+ * Bitmask indicating which fragment shader inputs represent varyings (and
+ * hence have to be delivered to the fragment shader by the SF/SBE stage).
+ */
+#define BRW_FS_VARYING_INPUT_MASK \
+   (BITFIELD64_RANGE(0, VARYING_SLOT_MAX) & \
+    ~VARYING_BIT_POS & ~VARYING_BIT_FACE)
+
+
 /*
  * Mapping of VUE map slots to interpolation modes.
  */
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 444c2b5..013dc29 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1238,7 +1238,8 @@ fs_visitor::calculate_urb_setup()
    /* Figure out where each of the incoming setup attributes lands. */
    if (brw->gen >= 6) {
       for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
-         if (fp->Base.InputsRead & BITFIELD64_BIT(i)) {
+         if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+             BITFIELD64_BIT(i)) {
             c->prog_data.urb_setup[i] = urb_next++;
          }
       }
diff --git a/src/mesa/drivers/dri/i965/gen6_sf_state.c b/src/mesa/drivers/dri/i965/gen6_sf_state.c
index bcad5a4..a093dc1 100644
--- a/src/mesa/drivers/dri/i965/gen6_sf_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_sf_state.c
@@ -56,14 +56,6 @@ static uint32_t
 get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
                   int fs_attr, bool two_side_color, uint32_t *max_source_attr)
 {
-   if (fs_attr == VARYING_SLOT_POS) {
-      /* This attribute will be overwritten by the fragment shader's
-       * interpolation code (see emit_interp() in brw_wm_fp.c), so just let it
-       * reference the first available attribute.
-       */
-      return 0;
-   }
-
    /* Find the VUE slot for this attribute. */
    int slot = vue_map->varying_to_slot[fs_attr];
 
--
1.8.4
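The effect of masking out the non-varying inputs before handing out setup slots can be sketched in standalone C. This is only an illustration, not the driver code: the slot numbers in the enum, `BIT64`, `FS_VARYING_INPUT_MASK` and `assign_urb_slots()` are made-up stand-ins, and the real values come from Mesa's gl_varying_slot enum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot numbers for illustration only; the real values come
 * from Mesa's gl_varying_slot enum and are not reproduced here. */
enum {
   SLOT_POS = 0,
   SLOT_COL0 = 1,
   SLOT_TEX0 = 4,
   SLOT_FACE = 12,
   SLOT_MAX = 56
};

#define BIT64(b) (UINT64_C(1) << (b))

/* Every slot below SLOT_MAX, minus the two that are not true varyings. */
#define FS_VARYING_INPUT_MASK \
   ((BIT64(SLOT_MAX) - 1) & ~BIT64(SLOT_POS) & ~BIT64(SLOT_FACE))

/* Assign consecutive setup slots to each genuine varying that is read.
 * Inputs that get no slot are marked with -1.
 * Returns the number of slots handed out. */
static int
assign_urb_slots(uint64_t inputs_read, int urb_setup[SLOT_MAX])
{
   int urb_next = 0;

   for (int i = 0; i < SLOT_MAX; i++) {
      if (inputs_read & FS_VARYING_INPUT_MASK & BIT64(i))
         urb_setup[i] = urb_next++;
      else
         urb_setup[i] = -1;
   }

   assert(urb_next <= SLOT_MAX);
   return urb_next;
}
```

With a shader that reads position, one color, one texcoord and the facing bit, only the color and texcoord consume setup slots in this sketch; position and facing are skipped.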
[Mesa-dev] [PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active.
Previously, on Gen6+, we laid out the vertex (or geometry) shader VUE map differently depending on whether user clipping was active. If it was active, we put the clip distances in slots 2 and 3 (where the clipper expects them); if it was inactive, we assigned them in the order of the gl_varying_slot enum.

This made for unnecessary recompiles, since turning clipping on/off for a shader that used gl_ClipDistance might rearrange the varyings. It also required extra bookkeeping, since it required the user clipping flag to be provided to brw_compute_vue_map() as a parameter.

With this patch, we always put clip distances in slots 2 and 3 if they are written to. do_vs_prog() and do_gs_prog() are responsible for ensuring that clip distances are written to when user clipping is enabled (as do_vs_prog() previously did for gen4-5).

This makes the only input to brw_compute_vue_map() a bitfield of which varyings the shader writes to, a fact that we'll take advantage of in forthcoming patches.
---
 src/mesa/drivers/dri/i965/brw_context.h |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c | 15 ++++++++++++---
 src/mesa/drivers/dri/i965/brw_vs.c      | 26 +++++++++++++-------------
 3 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 167ed4a..0c1fd9e 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -436,7 +436,7 @@ static inline GLuint brw_varying_to_offset(struct brw_vue_map *vue_map,
 }
 
 void brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
-                         GLbitfield64 slots_valid, bool userclip_active);
+                         GLbitfield64 slots_valid);
 
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 7ab03ac..94c4017 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -62,9 +62,18 @@ do_gs_prog(struct brw_context *brw,
    c.prog_data.base.param = rzalloc_array(NULL, const float *, param_count);
    c.prog_data.base.pull_param = rzalloc_array(NULL, const float *, param_count);
 
-   brw_compute_vue_map(brw, &c.prog_data.base.vue_map,
-                       gp->program.Base.OutputsWritten,
-                       c.key.base.userclip_active);
+   GLbitfield64 outputs_written = gp->program.Base.OutputsWritten;
+
+   /* In order for legacy clipping to work, we need to populate the clip
+    * distance varying slots whenever clipping is enabled, even if the vertex
+    * shader doesn't write to gl_ClipDistance.
+    */
+   if (c.key.base.userclip_active) {
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0);
+      outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1);
+   }
+
+   brw_compute_vue_map(brw, &c.prog_data.base.vue_map, outputs_written);
 
    /* Compute the output vertex size.
     *
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index b81a538..6b97f01 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -52,14 +52,10 @@ static inline void assign_vue_slot(struct brw_vue_map *vue_map,
 
 /**
  * Compute the VUE map for vertex shader program.
- *
- * Note that consumers of this map using cache keys must include
- * prog_data->userclip and prog_data->outputs_written in their key
- * (generated by CACHE_NEW_VS_PROG).
  */
 void
 brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
-                    GLbitfield64 slots_valid, bool userclip_active)
+                    GLbitfield64 slots_valid)
 {
    vue_map->slots_valid = slots_valid;
    int i;
@@ -107,10 +103,11 @@ brw_compute_vue_map(struct brw_context *brw, struct brw_vue_map *vue_map,
     */
    assign_vue_slot(vue_map, VARYING_SLOT_PSIZ);
    assign_vue_slot(vue_map, VARYING_SLOT_POS);
-   if (userclip_active) {
+   if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0))
       assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST0);
+   if (slots_valid & BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1))
       assign_vue_slot(vue_map, VARYING_SLOT_CLIP_DIST1);
-   }
+
    /* front and back colors need to be consecutive so that we can use
     * ATTRIBUTE_SWIZZLE_INPUTATTR_FACING to swizzle them when doing
     * two-sided color.
@@ -267,15 +264,18 @@ do_vs_prog(struct brw_context *brw,
          outputs_written |= BITFIELD64_BIT(VARYING_SLOT_COL0);
       if (outputs_written & BITFIELD64_BIT(VARYING_SLOT_BFC1))
          outputs_written |= BITFIELD64_BIT(VARYING_SLOT_COL1);
+   }
 
-      if (c.key.base.userclip_active) {
-         outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST0);
-         outputs_written |= BITFIELD64_BIT(VARYING_SLOT_CLIP_DIST1);
-      }
+   /* In order for legacy clipping to work, we need to populate the clip
+    * distance varying slots whenever clipping is enabled, even if the
[Mesa-dev] [PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key.
Now that the vertex shader output VUE map is determined solely by a 64-bit bitfield, we don't have to store it in its entirety in the geometry shader program key; instead, we can just store the bitfield, and let the geometry shader infer the VUE map at compile time.

This dramatically reduces the size of the geometry shader program key, which we want to keep small since it gets recomputed whenever the active program changes.
---
 src/mesa/drivers/dri/i965/brw_vec4_gs.c           | 6 ++++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   | 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs.c b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
index 94c4017..5e67d1a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs.c
@@ -167,10 +167,12 @@ do_gs_prog(struct brw_context *brw,
 
    c.prog_data.output_topology = prim_to_hw_prim[gp->program.OutputType];
 
+   brw_compute_vue_map(brw, &c.input_vue_map, c.key.input_varyings);
+
    /* GS inputs are read from the VUE 256 bits (2 vec4's) at a time, so we
     * need to program a URB read length of ceiling(num_slots / 2).
     */
-   c.prog_data.base.urb_read_length = (c.key.input_vue_map.num_slots + 1) / 2;
+   c.prog_data.base.urb_read_length = (c.input_vue_map.num_slots + 1) / 2;
 
    void *mem_ctx = ralloc_context(NULL);
    unsigned program_size;
@@ -239,7 +241,7 @@ brw_upload_gs_prog(struct brw_context *brw)
                                       &key.base.tex);
 
    /* BRW_NEW_VUE_MAP_VS */
-   key.input_vue_map = brw->vue_map_vs;
+   key.input_varyings = brw->vue_map_vs.slots_valid;
 
    if (!brw_search_cache(&brw->cache, BRW_GS_PROG,
                          &key, sizeof(key),
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index d82a26e..ae78855 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -70,8 +70,8 @@ vec4_gs_visitor::setup_varying_inputs(int payload_reg, int *attribute_map)
    assert(num_input_vertices <= MAX_GS_INPUT_VERTICES);
    unsigned input_array_stride = c->prog_data.base.urb_read_length * 2;
 
-   for (int slot = 0; slot < c->key.input_vue_map.num_slots; slot++) {
-      int varying = c->key.input_vue_map.slot_to_varying[slot];
+   for (int slot = 0; slot < c->input_vue_map.num_slots; slot++) {
+      int varying = c->input_vue_map.slot_to_varying[slot];
       for (unsigned vertex = 0; vertex < num_input_vertices; vertex++) {
          attribute_map[BRW_VARYING_SLOT_COUNT * vertex + varying] =
             payload_reg + input_array_stride * vertex + slot;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
index fba0ac6..48623d8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
@@ -37,7 +37,7 @@ struct brw_gs_prog_key
 {
    struct brw_vec4_prog_key base;
 
-   struct brw_vue_map input_vue_map;
+   GLbitfield64 input_varyings;
 };
 
 
@@ -49,6 +49,7 @@ struct brw_gs_compile
    struct brw_vec4_compile base;
    struct brw_gs_prog_key key;
    struct brw_gs_prog_data prog_data;
+   struct brw_vue_map input_vue_map;
 
    struct brw_geometry_program *gp;
 };
--
1.8.4
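The idea of keying on the bitfield and rebuilding the map at compile time can be sketched in a few lines of C. Everything below is illustrative: `vue_map_sketch`, `key_before`, `key_after` and `derive_map()` are hypothetical names, the struct layout is assumed rather than copied from brw_context.h, and slot assignment is simplified to plain bit order (the real brw_compute_vue_map() places some slots at fixed positions).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, simplified stand-in for a VUE map. */
struct vue_map_sketch {
   uint64_t slots_valid;
   signed char varying_to_slot[64];
   signed char slot_to_varying[64];
   int num_slots;
};

/* Program key carrying the whole map (before) versus only the bitfield
 * that determines it (after). */
struct key_before { struct vue_map_sketch input_vue_map; };
struct key_after  { uint64_t input_varyings; };

/* Rebuild the map from the bitfield alone. Because the result depends on
 * nothing else, equal bitfields always yield equal maps. */
static void
derive_map(uint64_t slots_valid, struct vue_map_sketch *map)
{
   assert(map);

   map->slots_valid = slots_valid;
   map->num_slots = 0;
   for (int i = 0; i < 64; i++) {
      map->varying_to_slot[i] = -1;
      map->slot_to_varying[i] = -1;
   }

   /* Simplification: hand out slots in bit order. */
   for (int i = 0; i < 64; i++) {
      if (slots_valid & (UINT64_C(1) << i)) {
         map->varying_to_slot[i] = (signed char) map->num_slots;
         map->slot_to_varying[map->num_slots] = (signed char) i;
         map->num_slots++;
      }
   }
}
```

In this sketch the "after" key is a single 64-bit word, so comparing or hashing it during a cache lookup touches far less memory than the "before" key, at the cost of one map derivation per compile.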
[Mesa-dev] [PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile.
The for loop was rather silly. In addition to checking brw->gen < 6 on each loop iteration, it took pains to exclude bits from fp->Base.InputsRead that don't correspond to fragment shader inputs. But those bits would never have been set in the first place, since the only bits that are ever set in fp->Base.InputsRead are fragment shader inputs.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 013dc29..7950d5f6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3150,17 +3150,7 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
    }
 
    if (brw->gen < 6)
-      key.input_slots_valid |= BITFIELD64_BIT(VARYING_SLOT_POS);
-
-   for (int i = 0; i < VARYING_SLOT_MAX; i++) {
-      if (!(fp->Base.InputsRead & BITFIELD64_BIT(i)))
-         continue;
-
-      if (brw->gen < 6) {
-         if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
-            key.input_slots_valid |= BITFIELD64_BIT(i);
-      }
-   }
+      key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
 
    key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
 
--
1.8.4
[Mesa-dev] [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.
Since the SF/SBE stage is only capable of performing arbitrary reorderings of 16 varying slots, we can't arrange the fragment shader inputs in an arbitrary order if there are more than 16 input varying slots in use. We need to make sure that slots 16-31 match the corresponding outputs of the previous pipeline stage.

The easiest way to accomplish this is to just make all varying slots match up with the previous pipeline stage.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++++++++++++++++++++++++++++++++++++------
 src/mesa/drivers/dri/i965/brw_wm.c   |  3 ++-
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 7950d5f6..8d73a0f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup()
    int urb_next = 0;
    /* Figure out where each of the incoming setup attributes lands. */
    if (brw->gen >= 6) {
-      for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
-         if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
-             BITFIELD64_BIT(i)) {
-            c->prog_data.urb_setup[i] = urb_next++;
-         }
+      if (_mesa_bitcount_64(fp->Base.InputsRead &
+                            BRW_FS_VARYING_INPUT_MASK) <= 16) {
+         /* The SF/SBE pipeline stage can do arbitrary rearrangement of the
+          * first 16 varying inputs, so we can put them wherever we want.
+          * Just put them in order.
+          */
+         for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
+            if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+                BITFIELD64_BIT(i)) {
+               c->prog_data.urb_setup[i] = urb_next++;
+            }
+         }
+      } else {
+         /* We have enough input varyings that the SF/SBE pipeline stage can't
+          * arbitrarily rearrange them to suit our whim; we have to put them
+          * in an order that matches the output of the previous pipeline stage
+          * (geometry or vertex shader).
+          */
+         struct brw_vue_map prev_stage_vue_map;
+         brw_compute_vue_map(brw, &prev_stage_vue_map,
+                             c->key.input_slots_valid);
+         int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
+         assert(prev_stage_vue_map.num_slots <= first_slot + 32);
+         for (int slot = first_slot; slot < prev_stage_vue_map.num_slots;
+              slot++) {
+            int varying = prev_stage_vue_map.slot_to_varying[slot];
+            if (varying != BRW_VARYING_SLOT_COUNT &&
+                (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
+                 BITFIELD64_BIT(varying))) {
+               c->prog_data.urb_setup[varying] = slot - first_slot;
+               urb_next = MAX2(urb_next, slot + 1);
+            }
+         }
+         urb_next = prev_stage_vue_map.num_slots - first_slot;
       }
    } else {
       /* FINISHME: The sf doesn't map VS->FS inputs for us very well. */
@@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
       key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
    }
 
-   if (brw->gen < 6)
+   if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead &
+                                         BRW_FS_VARYING_INPUT_MASK) > 16)
       key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
 
    key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index 3df2b7d..3e59880 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -466,7 +466,8 @@ static void brw_wm_populate_key( struct brw_context *brw,
       (ctx->Multisample.SampleAlphaToCoverage || ctx->Color.AlphaEnabled);
 
    /* BRW_NEW_VUE_MAP_GEOM_OUT */
-   if (brw->gen < 6 )
+   if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead &
+                                         BRW_FS_VARYING_INPUT_MASK) > 16)
       key->input_slots_valid = brw->vue_map_geom_out.slots_valid;
 
    /* The unique fragment program ID */
--
1.8.4
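The two layout strategies can be compared side by side in a reduced form. The sketch below is an assumption-laden model rather than the driver code: `layout_fs_inputs()` and `bitcount64()` are hypothetical helpers, the reorder limit is passed in as a parameter so both branches are easy to exercise, and the header-slot offset (`first_slot` in the patch) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count (clears the lowest set bit each iteration). */
static int
bitcount64(uint64_t v)
{
   int n = 0;
   for (; v; v &= v - 1)
      n++;
   return n;
}

/* Choose a layout for the fragment shader inputs.
 *
 * inputs_read     - bitfield of varyings the shader reads
 * max_reorderable - how many slots the fixed-function stage can reorder
 * prev_slots      - varying index held by each output slot of the
 *                   previous stage, num_prev_slots entries
 * urb_setup       - out: slot for each varying, or -1 if unused
 *
 * Returns the number of slots the shader's input layout spans. */
static int
layout_fs_inputs(uint64_t inputs_read, int max_reorderable,
                 const int *prev_slots, int num_prev_slots,
                 int urb_setup[64])
{
   for (int i = 0; i < 64; i++)
      urb_setup[i] = -1;

   if (bitcount64(inputs_read) <= max_reorderable) {
      /* Few enough inputs: pack them densely in bit order. */
      int urb_next = 0;
      for (int i = 0; i < 64; i++) {
         if (inputs_read & (UINT64_C(1) << i))
            urb_setup[i] = urb_next++;
      }
      return urb_next;
   }

   /* Too many inputs to reorder: mirror the previous stage's layout. */
   for (int slot = 0; slot < num_prev_slots; slot++) {
      int varying = prev_slots[slot];
      assert(varying >= 0 && varying < 64);
      if (inputs_read & (UINT64_C(1) << varying))
         urb_setup[varying] = slot;
   }
   return num_prev_slots;
}
```

Note the trade-off this makes visible: the mirrored layout spans every slot the previous stage wrote, including ones the fragment shader never reads, whereas the packed layout spans only what is read.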
[Mesa-dev] [PATCH 12/15] i965/vec4: Generate URB writes using a loop.
Previously we only ever did 1 or 2 URB writes, since the maximum number of varyings we support is small enough to fit in 2 URB writes. But GL 3.2 requires the geometry shader to support 128 output varying components, and this could require up to 3 URB writes.
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 52 +++++++++++++++++++++-------------------------------
 1 file changed, 21 insertions(+), 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 6771630..98b0a9b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2851,47 +2851,37 @@ vec4_visitor::emit_vertex()
       emit_clip_distances(output_reg[VARYING_SLOT_CLIP_DIST1], 4);
    }
 
-   /* Set up the VUE data for the first URB write */
-   int slot;
-   for (slot = 0; slot < prog_data->vue_map.num_slots; ++slot) {
-      emit_urb_slot(mrf++, prog_data->vue_map.slot_to_varying[slot]);
-
-      /* If this was max_usable_mrf, we can't fit anything more into this URB
-       * WRITE.
+   /* We may need to split this up into several URB writes, so do them in a
+    * loop.
+    */
+   int slot = 0;
+   bool complete = false;
+   do {
+      /* URB offset is in URB row increments, and each of our MRFs is half of
+       * one of those, since we're doing interleaved writes.
        */
-      if (mrf > max_usable_mrf) {
-         slot++;
-         break;
-      }
-   }
-
-   bool complete = slot >= prog_data->vue_map.num_slots;
-   current_annotation = "URB write";
-   vec4_instruction *inst = emit_urb_write_opcode(complete);
-   inst->base_mrf = base_mrf;
-   inst->mlen = align_interleaved_urb_mlen(brw, mrf - base_mrf);
+      int offset = slot / 2;
 
-   /* Optional second URB write */
-   if (!complete) {
       mrf = base_mrf + 1;
-
       for (; slot < prog_data->vue_map.num_slots; ++slot) {
-         assert(mrf < max_usable_mrf);
-
          emit_urb_slot(mrf++, prog_data->vue_map.slot_to_varying[slot]);
+
+         /* If this was max_usable_mrf, we can't fit anything more into this
+          * URB WRITE.
+          */
+         if (mrf > max_usable_mrf) {
+            slot++;
+            break;
+         }
       }
 
+      complete = slot >= prog_data->vue_map.num_slots;
       current_annotation = "URB write";
-      inst = emit_urb_write_opcode(true /* complete */);
+      vec4_instruction *inst = emit_urb_write_opcode(complete);
       inst->base_mrf = base_mrf;
       inst->mlen = align_interleaved_urb_mlen(brw, mrf - base_mrf);
-      /* URB destination offset.  In the previous write, we got MRFs
-       * 2-13 minus the one header MRF, so 12 regs.  URB offset is in
-       * URB row increments, and each of our MRFs is half of one of
-       * those, since we're doing interleaved writes.
-       */
-      inst->offset = (max_usable_mrf - base_mrf) / 2;
-   }
+      inst->offset = offset;
+   } while(!complete);
 }
 
 void
--
1.8.4
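How a vertex gets split across several writes is easier to see with the instruction emission stripped away. The following is a sketch under stated assumptions: `struct urb_write` and `plan_urb_writes()` are hypothetical, the per-write capacity is a parameter, and the only facts taken from the patch are that the offset is counted in URB rows of two slots each and that the final write is flagged complete.

```c
#include <assert.h>
#include <stdbool.h>

/* One planned URB write (hypothetical record, for illustration). */
struct urb_write {
   int offset;     /* destination offset in URB rows; one row = two slots */
   int num_slots;  /* slots carried by this write */
   bool complete;  /* true on the last write for the vertex */
};

/* Split total_slots into writes of at most slots_per_write each.
 * Returns the number of writes planned. */
static int
plan_urb_writes(int total_slots, int slots_per_write,
                struct urb_write *out, int max_writes)
{
   assert(total_slots > 0 && slots_per_write > 0);
   /* An even capacity keeps every write starting on a row boundary. */
   assert(slots_per_write % 2 == 0);

   int slot = 0;
   int n = 0;
   bool complete = false;

   do {
      assert(n < max_writes);

      int first = slot;
      slot += slots_per_write;
      if (slot > total_slots)
         slot = total_slots;

      complete = slot >= total_slots;
      out[n].offset = first / 2;
      out[n].num_slots = slot - first;
      out[n].complete = complete;
      n++;
   } while (!complete);

   return n;
}
```

With 12 slots per write (the figure mentioned in the removed comment) and a hypothetical vertex of 34 slots, this plans three writes at row offsets 0, 6 and 12, which matches the "up to 3 URB writes" in the commit message.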
[Mesa-dev] [PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size.
The {VS,GS} URB Entry Allocation Size fields of 3DSTATE_URB allow values in the range 0-4, but they are U8-1 fields, so the range of possible allocation sizes is 1-5. We were erroneously prohibiting a size of 5.
---
 src/mesa/drivers/dri/i965/gen6_urb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_urb.c b/src/mesa/drivers/dri/i965/gen6_urb.c
index e16d30a..86de9bd 100644
--- a/src/mesa/drivers/dri/i965/gen6_urb.c
+++ b/src/mesa/drivers/dri/i965/gen6_urb.c
@@ -86,8 +86,8 @@ gen6_upload_urb( struct brw_context *brw )
    assert(brw->urb.nr_vs_entries >= 24);
    assert(brw->urb.nr_vs_entries % 4 == 0);
    assert(brw->urb.nr_gs_entries % 4 == 0);
-   assert(vs_size < 5);
-   assert(gs_size < 5);
+   assert(vs_size <= 5);
+   assert(gs_size <= 5);
 
    BEGIN_BATCH(3);
    OUT_BATCH(_3DSTATE_URB << 16 | (3 - 2));
--
1.8.4
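The off-by-one comes from the "minus one" encoding: the stored field is the size less one, so the bound has to be checked on the size, not the field. A minimal sketch, with hypothetical helper names and the 1-5 range taken from the commit message:

```c
#include <assert.h>
#include <stdint.h>

#define URB_ENTRY_SIZE_MAX 5  /* largest allocation size, per the patch text */

/* Encode an allocation size (1..5) as a "minus one" field value (0..4). */
static uint32_t
encode_urb_entry_size(unsigned size)
{
   /* The check is on the size itself, so 5 is allowed. */
   assert(size >= 1 && size <= URB_ENTRY_SIZE_MAX);
   return size - 1;
}

/* Decode a field value back into an allocation size. */
static unsigned
decode_urb_entry_size(uint32_t field)
{
   return field + 1;
}
```

A size of 5 therefore encodes to the field value 4, the top of the field's 0-4 range.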
[Mesa-dev] [PATCH 14/15] i965/ff_gs: Generate URB writes using a loop.
Previously we only ever did 1 URB write, since the maximum number of varyings we support is small enough to fit in 1 URB write (when using BRW_URB_SWIZZLE_NONE, which is what the pre-Gen7 GS always uses). But we're about to increase the number of varying components we support from 64 to 128.

With 128 varyings, the most URB writes we'll have to do is 2, but it's just as easy to write a general-purpose loop.
---
 src/mesa/drivers/dri/i965/brw_gs_emit.c | 61 ++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_gs_emit.c b/src/mesa/drivers/dri/i965/brw_gs_emit.c
index 2c94eb0..9050b95 100644
--- a/src/mesa/drivers/dri/i965/brw_gs_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_gs_emit.c
@@ -169,31 +169,46 @@ static void brw_ff_gs_emit_vue(struct brw_ff_gs_compile *c,
                                bool last)
 {
    struct brw_compile *p = &c->func;
-   bool allocate = !last;
+   int write_offset = 0;
+   bool complete = false;
 
-   /* Copy the vertex from vertn into m1..mN+1:
-    */
-   brw_copy8(p, brw_message_reg(1), vert, c->nr_regs);
+   do {
+      /* We can't write more than 14 registers at a time to the URB */
+      int write_len = MIN2(c->nr_regs - write_offset, 14);
+      if (write_len == c->nr_regs - write_offset)
+         complete = true;
 
-   /* Send each vertex as a seperate write to the urb.  This is
-    * different to the concept in brw_sf_emit.c, where subsequent
-    * writes are used to build up a single urb entry.  Each of these
-    * writes instantiates a seperate urb entry, and a new one must be
-    * allocated each time.
-    */
-   brw_urb_WRITE(p,
-                 allocate ? c->reg.temp
-                          : retype(brw_null_reg(), BRW_REGISTER_TYPE_UD),
-                 0,
-                 c->reg.header,
-                 allocate ? BRW_URB_WRITE_ALLOCATE_COMPLETE
-                          : BRW_URB_WRITE_EOT_COMPLETE,
-                 c->nr_regs + 1, /* msg length */
-                 allocate ? 1 : 0, /* response length */
-                 0, /* urb offset */
-                 BRW_URB_SWIZZLE_NONE);
-
-   if (allocate) {
+      /* Copy the vertex from vertn into m1..mN+1:
+       */
+      brw_copy8(p, brw_message_reg(1), offset(vert, write_offset), write_len);
+
+      /* Send the vertex data to the URB.  If this is the last write for this
+       * vertex, then we mark it as complete, and either end the thread or
+       * allocate another vertex URB entry (depending whether this is the last
+       * vertex).
+       */
+      enum brw_urb_write_flags flags;
+      if (!complete)
+         flags = BRW_URB_WRITE_NO_FLAGS;
+      else if (last)
+         flags = BRW_URB_WRITE_EOT_COMPLETE;
+      else
+         flags = BRW_URB_WRITE_ALLOCATE_COMPLETE;
+      brw_urb_WRITE(p,
+                    (flags & BRW_URB_WRITE_ALLOCATE) ? c->reg.temp
+                    : retype(brw_null_reg(), BRW_REGISTER_TYPE_UD),
+                    0,
+                    c->reg.header,
+                    flags,
+                    write_len + 1, /* msg length */
+                    (flags & BRW_URB_WRITE_ALLOCATE) ? 1
+                    : 0, /* response length */
+                    write_offset, /* urb offset */
+                    BRW_URB_SWIZZLE_NONE);
+      write_offset += write_len;
+   } while (!complete);
+
+   if (!last) {
       brw_MOV(p, get_element_ud(c->reg.header, 0),
               get_element_ud(c->reg.temp, 0));
    }
--
1.8.4
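The part of this loop that is easiest to get wrong is which flags go on which write, so here is that decision isolated from the code generator. This is a sketch only: `enum write_kind`, `struct write_desc` and `plan_vertex_writes()` are hypothetical stand-ins for the driver's flags and emission calls; the 14-register limit is the one quoted in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_REGS_PER_WRITE 14  /* limit quoted in the patch comment */

/* Hypothetical stand-ins for the driver's URB write flags. */
enum write_kind {
   WRITE_PARTIAL,            /* more data for this vertex follows */
   WRITE_ALLOCATE_COMPLETE,  /* vertex done; allocate the next URB entry */
   WRITE_EOT_COMPLETE        /* vertex done; end the thread */
};

struct write_desc {
   int offset;            /* register offset into the vertex */
   int len;               /* registers carried by this write */
   enum write_kind kind;
};

/* Plan the writes for one vertex of nr_regs registers. `last` says whether
 * this is the final vertex the thread emits. Returns the write count. */
static int
plan_vertex_writes(int nr_regs, bool last,
                   struct write_desc *out, int max_writes)
{
   assert(nr_regs > 0);

   int write_offset = 0;
   int n = 0;
   bool complete = false;

   do {
      assert(n < max_writes);

      int remaining = nr_regs - write_offset;
      int write_len = remaining < MAX_REGS_PER_WRITE ? remaining
                                                     : MAX_REGS_PER_WRITE;
      complete = (write_len == remaining);

      out[n].offset = write_offset;
      out[n].len = write_len;
      if (!complete)
         out[n].kind = WRITE_PARTIAL;
      else if (last)
         out[n].kind = WRITE_EOT_COMPLETE;
      else
         out[n].kind = WRITE_ALLOCATE_COMPLETE;
      n++;

      write_offset += write_len;
   } while (!complete);

   return n;
}
```

Only the final write of a vertex carries a completion flag; every earlier write is partial, so the "allocate or end thread" choice is made exactly once per vertex.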
[Mesa-dev] [PATCH 15/15] i965/gen6+: Support 128 varying components.
GL 3.2 requires us to support 128 varying components for geometry shader outputs and fragment shader inputs, and 64 varying components otherwise. But there's no hardware limitation that restricts us to 64 varying components, and core Mesa doesn't currently allow different stages to have different maximum values, so just go ahead and enable 128 varying components for all stages. This gets us better test coverage anyway.

Even though we are only working on GL 3.2 support for gen7 right now, gen6 also supports 128 varying components, so go ahead and switch it on there too.
---
 src/mesa/drivers/dri/i965/brw_context.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index 2321076..3c1e409 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -247,6 +247,9 @@ brw_initialize_context_constants(struct brw_context *brw)
    ctx->Const.DisableGLSLLineContinuations =
       driQueryOptionb(&brw->optionCache, "disable_glsl_line_continuations");
 
+   if (brw->gen >= 6)
+      ctx->Const.MaxVarying = 32;
+
    /* We want the GLSL compiler to emit code that uses condition codes */
    for (int i = 0; i < MESA_SHADER_TYPES; i++) {
       ctx->ShaderCompilerOptions[i].MaxIfDepth = brw->gen < 6 ? 16 : UINT_MAX;
--
1.8.4
Re: [Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs
Hi Zack,

This change regresses a bunch of point sprite piglit tests on i915g. Should we revert to the old behaviour? As far as I can see, it was correct (it was keeping the attributes in case another stage is using them).

Stéphane

On Thu, Aug 8, 2013 at 12:46 PM, Zack Rusin za...@vmware.com wrote:
> Before inserting new front face and prim id outputs cleanup the old extra
> outputs, otherwise our cache will use previous output slots which will
> break as soon as outputs of the current shader don't match the last.
>
> Signed-off-by: Zack Rusin za...@vmware.com
> ---
>  src/gallium/auxiliary/draw/draw_context.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c
> index af9caee..2dc6772 100644
> --- a/src/gallium/auxiliary/draw/draw_context.c
> +++ b/src/gallium/auxiliary/draw/draw_context.c
> @@ -555,6 +555,7 @@ draw_get_shader_info(const struct draw_context *draw)
>  void
>  draw_prepare_shader_outputs(struct draw_context *draw)
>  {
> +   draw_remove_extra_vertex_attribs(draw);
>     draw_ia_prepare_outputs(draw, draw->pipeline.ia);
>     draw_unfilled_prepare_outputs(draw, draw->pipeline.unfilled);
>  }
> --
> 1.7.10.4
Re: [Mesa-dev] [PATCH 1/3] draw: cleanup the extra attribs
On Tue, Sep 3, 2013 at 8:20 PM, Stéphane Marchesin stephane.marche...@gmail.com wrote:
> Hi Zack,
>
> This change regresses a bunch of point sprite piglit tests on i915g.
> Should we revert back to the old behaviour? As far as I can see, it was
> correct (it was keeping the attributes in case another stage is using
> them).
>
> Stéphane

This commit has actually already led to three regression reports:

https://bugs.freedesktop.org/show_bug.cgi?id=67963
https://bugs.freedesktop.org/show_bug.cgi?id=67965
https://bugs.freedesktop.org/show_bug.cgi?id=67966

In fact, Zack has 11 regression reports filed (from Aug 3-10) against commits he made, including one commit titled "softpipe: fix the regressions" that, oddly enough, caused two regressions.

I have no idea why running piglit on Zack's system didn't catch these. ;)
[Mesa-dev] Mesa (git 20130828) fails to build on MIPS
Hello,

I get an error when I build Mesa from git. I am on a MIPS computer with an ATI RS780E. Here are the commands I use for the build:

./autogen.sh \
    --prefix=/usr \
    --enable-gles2 \
    --disable-gallium-egl \
    --with-egl-platforms=x11,wayland,drm \
    --enable-gbm \
    --enable-shared-glapi \
    --with-gallium-drivers=r300,r600,swrast \
    --with-dri-drivers=radeon,swrast
make

Then I get this error message:

make[2]: Entering directory `/usr/src/mesa/mesa-20130828/src/glsl'
  YACC   glsl_parser.cpp
make[2]: *** [glsl_parser.cpp] Error 141

How can I find out what error 141 corresponds to? Do you have any idea how to fix this?

Thanks,
Christophe