Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
On Tue, 2015-10-13 at 09:44 -0700, Jordan Justen wrote: > On 2015-10-13 05:17:37, Francisco Jerez wrote: > > Iago Toral Quirogawrites: > > > > > This fixes the following test: > > > > > > [require] > > > GL >= 3.3 > > > GLSL >= 3.30 > > > GL_ARB_shader_storage_buffer_object > > > > > > [fragment shader] > > > #version 330 > > > #extension GL_ARB_shader_storage_buffer_object: require > > > > > > buffer SSBO { > > > mat4 sm4; > > > }; > > > > > > > > > mat4 um4; > > > > > > void main() { > > > sm4 *= um4; > > > > This is using the value of "um4", which is never assigned, so liveness > > analysis will correctly extend its live interval up to the start of the > > block. > > This test was derived by simplifying a CTS test case. > > Anyway, I'm not sure what happened on the way to the commit message, > but um4 should be a uniform. > > http://sprunge.us/cEUe Oh yes, that was me playing around with the example. The patches also fix the uniform version. Jordan, can you verify if this fixes the CTS test case? In any case, since Curro is working on a more general fix for this stuff, I guess you'd rather wait for his patches... Iago > -Jordan > > > The other problem here seems to be that the liveness analysis pass > > doesn't consider scalar writes (like the ones emitted by the > > combine_constants optimization pass and by emit_uniformize()) to fully > > define the contents of a register, so it will incorrectly extend up the > > live interval of registers defined using scalar writes even if all > > components ever used in the shader have been defined individually. > > Incidentally the use-def-chains-based implementation of liveness > > analysis I'm working on will fix this issue easily. > > > > > sm4[0][0] = 0.0; > > > } > > > > > > [test] > > > link success > > > > > > where our liveness analysis works really bad because the control-flow part > > > will expand register liveness to the end of only block we have (so every > > > register will be marked alive until the end of the program). This > > > artificially > > > increases register pressure to a point in which we run out of registers. > > > Of course, this is only a simple optimization for a trivial case, the > > > underlying problem still exists and would manifest when we have more than > > > one block, but it helps simple shaders such as the one in the example > > > above > > > without any effort. I guess the real fix would involve re-thinking parts > > > of the > > > liveness analysis algorithm... > > > > > > Jordan, there is another thing that we can improve for this shader that is > > > specific to SSBOs: we emit the same ssbo load multiple times because we > > > are > > > playing it safe just in case there are writes in between. I think we can > > > try to > > > do better and not re-issue the same load if we don't have ssbo stores to > > > the same address in between. I'll try to come up with a patch for this > > > (hopefully we can do this inside lower_ubo_reference). > > > > > > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just > > > indentation fixes in the code around these. > > > > > > Iago Toral Quiroga (4): > > > i965/fs: Fix indentation in fs_live_variables::compute_start_end > > > i965/fs: skip control-flow aware liveness analysis if we only have 1 > > > block > > > i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals > > > i965/vec4: skip control-flow aware liveness analysis if we only have 1 > > > block > > > > > > .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 > > > ++ > > > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 23 > > > + > > > 2 files changed, 30 insertions(+), 17 deletions(-) > > > > > > -- > > > 1.9.1 > > > > > > ___ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/10] i965/gen9: Support fast clears for 32b float
SKL supports the ability to do fast clears and resolves of 32b RGBA as both integer and floats. This patch only enables float color clears because we haven't yet enabled integer color clears, (HW support for that was added in BDW). This is enabled separate because it is a new feature to SKL and so it might have some issues. NOTE: This patch has 2 regressions with 16F_LUMINANCE and 16F_INTENSITY which needs to be resolved before merging. The rest of the test suites are happy. ./bin/ext_framebuffer_multisample-formats [2468] ... Testing GL_LUMINANCE16F_ARB Probe at (0,0) Expected: 0.00 Observed: 0.50 Probe at (0,0) Expected: 0.00 Observed: 0.50 Not-Signed-off-by: Ben Widawsky--- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 ++-- src/mesa/drivers/dri/i965/gen8_surface_state.c | 8 2 files changed, 6 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 9c51ffb..aa36794 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -360,8 +360,12 @@ is_color_fast_clear_compatible(struct brw_context *brw, } for (int i = 0; i < 4; i++) { - if (color->f[i] != 0.0f && color->f[i] != 1.0f && - _mesa_format_has_color_component(format, i)) { + if (!_mesa_format_has_color_component(format, i)) { + continue; + } + + if (brw->gen < 9 && + color->f[i] != 0.0f && color->f[i] != 1.0f) { return false; } } diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index b19b492..ca0cedc 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -188,14 +188,6 @@ gen8_emit_fast_clear_color(struct brw_context *brw, uint32_t *surf) { if (brw->gen >= 9) { -#define check_fast_clear_val(x) \ - assert(mt->gen9_fast_clear_color.f[x] == 0.0 || \ - mt->gen9_fast_clear_color.f[x] == 1.0) - check_fast_clear_val(0); - check_fast_clear_val(1); - check_fast_clear_val(2); - check_fast_clear_val(3); -#undef check_fast_clear_val surf[12] = mt->gen9_fast_clear_color.ui[0]; surf[13] = mt->gen9_fast_clear_color.ui[1]; surf[14] = mt->gen9_fast_clear_color.ui[2]; -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/10] i965/gen8+: Remove redundant zeroing of surface state
The allocate_surface_state already zeroes out the surface state, and doing it later in the function is destructive for what we want to accomplish when we split out support for gen9 fast clears (next patch). NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent to remove the other instances as well. I can make an argument both ways (open coding it, vs. not). I can rework the next patch if requires. Signed-off-by: Ben Widawsky--- src/mesa/drivers/dri/i965/gen8_surface_state.c | 12 1 file changed, 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index 18b8665..eaaecd3 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -284,8 +284,6 @@ gen8_emit_texture_surface_state(struct brw_context *brw, SET_FIELD((aux_mt->pitch / tile_w) - 1, GEN8_SURFACE_AUX_PITCH) | aux_mode; - } else { - surf[6] = 0; } surf[7] = mt->fast_clear_color_value | @@ -302,11 +300,7 @@ gen8_emit_texture_surface_state(struct brw_context *brw, aux_mt->bo, 0, I915_GEM_DOMAIN_SAMPLER, (rw ? I915_GEM_DOMAIN_SAMPLER : 0)); - } else { - surf[10] = 0; - surf[11] = 0; } - surf[12] = 0; /* Emit relocation to surface contents */ drm_intel_bo_emit_reloc(brw->batch.bo, @@ -514,8 +508,6 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, SET_FIELD((aux_mt->pitch / tile_w) - 1, GEN8_SURFACE_AUX_PITCH) | aux_mode; - } else { - surf[6] = 0; } surf[7] = mt->fast_clear_color_value | @@ -533,11 +525,7 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, offset + 10 * 4, aux_mt->bo, 0, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER); - } else { - surf[10] = 0; - surf[11] = 0; } - surf[12] = 0; drm_intel_bo_emit_reloc(brw->batch.bo, offset + 8 * 4, -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments
The impetus for this patch comes from a seemingly benign statement within the spec (quoted within the patch). For me, this patch was at some point critical for getting stable piglit results (though this did not seem to be the case on a branch Chad was working on). It is very important for clearing multiple color buffer attachments and can be observed in the following piglit tests: spec/arb_framebuffer_object/fbo-drawbuffers-none glclear spec/ext_framebuffer_multisample/blit-multiple-render-targets 0 Signed-off-by: Ben Widawsky--- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 + 1 file changed, 84 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 7bf52f0..9e6711e 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable) brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM; } +/** + * Individually fast clear each color buffer attachment. On previous gens this + * isn't required. The motivation for this comes from one line (which seems to + * be specific to SKL+). The list item is in section titled _MCS Buffer for + * Render Target(s)_ + * + * "Since only one RT is bound with a clear pass, only one RT can be cleared + * at a time. To clear multiple RTs, multiple clear passes are required." + * + * The code follows the same idea as the resolve code which creates a fake FBO + * to avoid interfering with too much of the GL state. + */ +static void +fast_clear_attachments(struct brw_context *brw, + struct gl_framebuffer *fb, + uint32_t fast_clear_buffers, + struct rect fast_clear_rect) +{ + assert(brw->gen >= 9); + struct gl_context *ctx = >ctx; + + GLuint old_fb = ctx->DrawBuffer->Name; + + for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) { + struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf]; + struct intel_renderbuffer *irb = intel_renderbuffer(rb); + GLuint fbo, rbo; + int index = fb->_ColorDrawBufferIndexes[buf]; + + if (!((1 << index) & fast_clear_buffers)) + continue; + + _mesa_GenFramebuffers(1, ); + rbo = brw_get_rb_for_slice(brw, irb->mt, 0, 0, false); + + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo); + _mesa_FramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, +GL_COLOR_ATTACHMENT0, +GL_RENDERBUFFER, rbo); + _mesa_DrawBuffer(GL_COLOR_ATTACHMENT0); + + brw_fast_clear_init(brw); + + use_rectlist(brw, true); + + brw_bind_rep_write_shader(brw, (float *) fast_clear_color); + + /* SKL+ also has a resolve mode for compressed render targets and thus more + * bits to let us select the type of resolve. For fast clear resolves, it + * turns out we can use the same value as pre-SKL though. + */ + set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); + brw_draw_rectlist(ctx, _clear_rect, MAX2(1, fb->MaxNumLayers)); + set_fast_clear_op(brw, 0); + use_rectlist(brw, false); + + _mesa_DeleteRenderbuffers(1, ); + _mesa_DeleteFramebuffers(1, ); + + /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll + * resolve them eventually. + */ + irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; + } + + _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, old_fb); +} + bool brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, GLbitfield buffers, bool partial_clear) @@ -600,12 +668,27 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, use_rectlist(brw, true); layers = MAX2(1, fb->MaxNumLayers); - if (fast_clear_buffers) { + + if (brw->gen >= 9 && fast_clear_buffers) { + fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect); + } else if (fast_clear_buffers) { _mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers); brw_bind_rep_write_shader(brw, (float *) fast_clear_color); set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); brw_draw_rectlist(ctx, _clear_rect, layers); set_fast_clear_op(brw, 0); + + /* Now set the mcs we cleared to INTEL_FAST_CLEAR_STATE_CLEAR so we'll + * resolve them eventually. + */ + for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) { + struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf]; + struct intel_renderbuffer *irb = intel_renderbuffer(rb); + int index = fb->_ColorDrawBufferIndexes[buf]; + + if ((1 << index) & fast_clear_buffers) +irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR; + } } if (rep_clear_buffers) { @@ -614,18 +697,6 @@
[Mesa-dev] [PATCH 00/10] Support Skylake MCS buffers (fast clears)
This patch series adds support for fast color clears on SKL as it exists on previous generations of hardware minus the new hardware restriction on surface formats. Additionally, it adds support for utilizing clear values with up to 32b per color channel (see note at the bottom). It is based on work originally done by Kristian, so thanks to him for that initial work as well as helping me debug some of the issues. Additionally, thanks to Chad for helping track down the last bug in the rectangle scaling code which was (for me) being masked by another bug (#3 below). I imagine it would have been several more weeks at least before I uncovered it. We knew that SKL added the extra DWORDs to the RENDER_SURFACE_STATE in order to support the 32b per channel. As it turned out though, Skylake made other changes to support this which caused weird failures which seemed to interfere with each other. 1. Not all surface formats support lossless compression. 2. Clearing multiple color buffer attachments must happen in n passes 3. Change to the scaling factors for the MCS surface - SKL has 2x height (this was the bug which Chad helped uncover, I had it correct in my patch from March http://lists.freedesktop.org/archives/mesa-dev/2015-March/079084.html, but we had other problems which prevented merge, including #1 and #2 above). I have no piglit, dEQP or CTS regressions (except for the last patch). I haven't yet, but will collect perf data on this ASAP. Historically we've come to expect this to provide large gains in tests which are memory bandwidth limited and doing many clears. Ben Widawsky (10): i965/gen8+: Remove redundant zeroing of surface state i965/gen8+: Extract color clear surface state i965/skl: Enable fast color clears on SKL i965/skl: skip fast clears for certain surface formats i965/meta/gen9: Individually fast clear color attachments Revert "i965/gen9: Disable MCS for 1x color surfaces" Revert "i965/gen9: Enable rep clears on gen9" i965/meta: Assert fast clears and rep clears never overlap i965/meta: Remove fast_clear_color variable i965/gen9: Support fast clears for 32b float src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 172 ++-- src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 src/mesa/drivers/dri/i965/gen8_surface_state.c | 48 --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 20 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 7 +- 6 files changed, 205 insertions(+), 70 deletions(-) -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/10] Revert "i965/gen9: Disable MCS for 1x color surfaces"
This reverts commit dcd59a9e322edeea74187bcad65a8e56c0bfaaa2. --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 8 1 file changed, 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index f108b75..c723f79 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -207,14 +207,6 @@ intel_miptree_supports_non_msrt_fast_clear(struct brw_context *brw, if (brw->gen < 7) return false; - if (brw->gen >= 9) { - /* FINISHME: Enable singlesample fast MCS clears on SKL after all GPU - * FINISHME: hangs are resolved. - */ - perf_debug("singlesample fast MCS clears disabled on gen9"); - return false; - } - if (mt->disable_aux_buffers) return false; -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/10] Revert "i965/gen9: Enable rep clears on gen9"
This reverts commit 8a0c85b25853decb4a110b6d36d79c4f095d437b. It's not a strict revert because I don't want to bring back the gen < 9 check at this point in time. --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 5 - 1 file changed, 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 9e6711e..97094ae 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -537,11 +537,6 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS) clear_type = REP_CLEAR; - if (brw->gen >= 9 && clear_type == FAST_CLEAR) { - perf_debug("fast MCS clears are disabled on gen9"); - clear_type = REP_CLEAR; - } - /* We can't do scissored fast clears because of the restrictions on the * fast clear rectangle size. */ -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL
Based on a patch originally from Kristian. Skylake has extended capabilities with regard to fast clears, but that is saved for another patch. The same effect could be acheived with the following, however I think the way I've done it is more in line with how the docs explain it. --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw, /* In release builds, fall through */ case I915_TILING_Y: *width_px = 32 / mt->cpp; - *height = 4; + if (brw->gen >= 9) + *height = 2; + else + *height = 4; Signed-off-by: Ben Widawsky--- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 + src/mesa/drivers/dri/i965/gen8_surface_state.c | 34 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 + src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 7 +++- 4 files changed, 78 insertions(+), 26 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index fbde3f0..7bf52f0 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect *rect, int num_instances) } static void -get_fast_clear_rect(struct gl_framebuffer *fb, +get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb, struct intel_renderbuffer *irb, struct rect *rect) { unsigned int x_align, y_align; @@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb, */ intel_get_non_msrt_mcs_alignment(irb->mt, _align, _align); x_align *= 16; - y_align *= 32; + + /* SKL+ line alignment requirement for Y-tiled are half those of the prior + * generations. + */ + if (brw->gen >= 9) + y_align *= 16; + else + y_align *= 32; /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render * Target(s)", beneath the "Fast Color Clear" bullet (p327): @@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb, * terms of (width,height) of the RT. * * MSAA Width of Clear Rect Height of Clear Rect + * 2X Ceil(1/8*width) Ceil(1/2*height) * 4X Ceil(1/8*width) Ceil(1/2*height) * 8X Ceil(1/2*width) Ceil(1/2*height) + * 16X widthCeil(1/2*height) * * The text "with upper left co-ordinate to coincide with actual * rectangle being cleared" is a little confusing--it seems to imply @@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb, case 8: x_scaledown = 2; break; + case 16: + x_scaledown = 1; + break; default: unreachable("Unexpected sample count for fast clear"); } @@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw, /** * Convert the given color to a bitfield suitable for ORing into DWORD 7 of - * SURFACE_STATE. + * SURFACE_STATE (DWORD 12-15 on SKL+). */ -static uint32_t -compute_fast_clear_color_bits(const union gl_color_union *color) +static void +set_fast_clear_color(struct brw_context *brw, + struct intel_mipmap_tree *mt, + const union gl_color_union *color) { - uint32_t bits = 0; - for (int i = 0; i < 4; i++) { - /* Testing for non-0 works for integer and float colors */ - if (color->f[i] != 0.0f) - bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); + if (brw->gen >= 9) { + mt->gen9_fast_clear_color = *color; + } else { + mt->fast_clear_color_value = 0; + for (int i = 0; i < 4; i++) { + /* Testing for non-0 works for integer and float colors */ + if (color->f[i] != 0.0f) + mt->fast_clear_color_value |= +1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); + } } - return bits; } static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 }; @@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, switch (clear_type) { case FAST_CLEAR: - irb->mt->fast_clear_color_value = -compute_fast_clear_color_bits(>Color.ClearColor); + set_fast_clear_color(brw, irb->mt, >Color.ClearColor); irb->need_downsample = true; /* If the buffer is already in INTEL_FAST_CLEAR_STATE_CLEAR, the @@ -521,7 +538,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED; irb->need_downsample = true; fast_clear_buffers |= 1 << index; - get_fast_clear_rect(fb, irb, _clear_rect); + get_fast_clear_rect(brw,
[Mesa-dev] [PATCH 02/10] i965/gen8+: Extract color clear surface state
On future generation platforms the color clear value is stored elsewhere in the surface state. By extracting this logic, we can cleanly implement the difference in an upcoming patch. Should have no functional impact. Signed-off-by: Ben Widawsky--- src/mesa/drivers/dri/i965/gen8_surface_state.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index eaaecd3..e70c15b 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -384,6 +384,12 @@ gen8_emit_null_surface_state(struct brw_context *brw, SET_FIELD(height - 1, GEN7_SURFACE_HEIGHT); } +static void +gen8_emit_fast_clear_color(struct intel_mipmap_tree *mt, uint32_t *surf) +{ + surf[7] |= mt->fast_clear_color_value; +} + /** * Sets up a surface state structure to point at the given region. * While it is only used for the front/back buffer currently, it should be @@ -510,11 +516,11 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, aux_mode; } - surf[7] = mt->fast_clear_color_value | - SET_FIELD(HSW_SCS_RED, GEN7_SURFACE_SCS_R) | - SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) | - SET_FIELD(HSW_SCS_BLUE, GEN7_SURFACE_SCS_B) | - SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A); + gen8_emit_fast_clear_color(mt, surf); + surf[7] |= SET_FIELD(HSW_SCS_RED, GEN7_SURFACE_SCS_R) | + SET_FIELD(HSW_SCS_GREEN, GEN7_SURFACE_SCS_G) | + SET_FIELD(HSW_SCS_BLUE, GEN7_SURFACE_SCS_B) | + SET_FIELD(HSW_SCS_ALPHA, GEN7_SURFACE_SCS_A); assert(mt->offset % mt->cpp == 0); *((uint64_t *) [8]) = mt->bo->offset64 + mt->offset; /* reloc */ -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/10] i965/meta: Remove fast_clear_color variable
It doesn't actually serve a purpose AFAICT (in fact, I'm not certain what it's meant to do). Cc: Kristian HøgsbergSigned-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 41afc9a..9c51ffb 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -390,8 +390,6 @@ set_fast_clear_color(struct brw_context *brw, } } -static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 }; - static void set_fast_clear_op(struct brw_context *brw, uint32_t op) { @@ -472,7 +470,7 @@ fast_clear_attachments(struct brw_context *brw, use_rectlist(brw, true); - brw_bind_rep_write_shader(brw, (float *) fast_clear_color); + brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f); /* SKL+ also has a resolve mode for compressed render targets and thus more * bits to let us select the type of resolve. For fast clear resolves, it @@ -670,7 +668,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, fast_clear_attachments(brw, fb, fast_clear_buffers, fast_clear_rect); } else if (fast_clear_buffers) { _mesa_meta_drawbuffers_from_bitfield(fast_clear_buffers); - brw_bind_rep_write_shader(brw, (float *) fast_clear_color); + brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f); set_fast_clear_op(brw, GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE); brw_draw_rectlist(ctx, _clear_rect, layers); set_fast_clear_op(brw, 0); @@ -785,7 +783,7 @@ brw_meta_resolve_color(struct brw_context *brw, use_rectlist(brw, true); - brw_bind_rep_write_shader(brw, (float *) fast_clear_color); + brw_bind_rep_write_shader(brw, ctx->Color.ClearColor.f); /* SKL+ also has a resolve mode for compressed render targets and thus more * bits to let us select the type of resolve. For fast clear resolves, it -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/10] i965/meta: Assert fast clears and rep clears never overlap
There is nothing wrong with the code today, but as one modifies the code it turns out to be not too difficult to mess up the code, and this easy assertion should catch such driver implementation failures quickly. Cc: Kristian HøgsbergSigned-off-by: Ben Widawsky --- src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c index 97094ae..41afc9a 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c @@ -616,6 +616,8 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb, } } + assert((fast_clear_buffers & rep_clear_buffers) == 0); + if (!(fast_clear_buffers | rep_clear_buffers)) { if (plain_clear_buffers) /* If we only have plain clears, skip the meta save/restore. */ -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/10] i965/skl: skip fast clears for certain surface formats
Initially I had this planned as a patch to be squashed in to the enabling patch because there is no point enabling fast clears without this. However, Chad merged a patch which disables fast clears on gen9 explicitly, and so I can hide this behind the revert of that patch. This is a nice I really wanted this patch as a distinct patch for review. This is a new, weird, and poorly documented restriction for SKL. (In fact, I am still not 100% certain the restriction is entirely necessary, but there are around 30 piglit regressions without this). SKL adds compressible render targets and as a result mutates some of the programming for fast clears and resolves. There is a new internal surface type called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "The Auxiliary surface is a CCS (Color Control Surface) with compression disabled or an MCS with compression enabled, depending on number of multisamples. MCS (Multisample Control Surface) is a special type of CCS." Signed-off-by: Ben Widawsky--- src/mesa/drivers/dri/i965/brw_context.h | 1 + src/mesa/drivers/dri/i965/brw_surface_formats.c | 27 + src/mesa/drivers/dri/i965/gen8_surface_state.c | 8 ++-- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 3 +++ 4 files changed, 37 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index e59478a..32b8250 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -1546,6 +1546,7 @@ struct brw_context uint32_t render_target_format[MESA_FORMAT_COUNT]; bool format_supported_as_render_target[MESA_FORMAT_COUNT]; + bool losslessly_compressable[MESA_FORMAT_COUNT]; /* Interpolation modes, one byte per vue slot. * Used Gen4/5 by the clip|sf|wm stages. Ignored on Gen6+. diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c index 97fff60..d706ecc 100644 --- a/src/mesa/drivers/dri/i965/brw_surface_formats.c +++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c @@ -693,6 +693,33 @@ brw_init_surface_formats(struct brw_context *brw) } } + if (brw->gen >= 9) { + brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT32] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT32] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT32] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_UNORM16] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_SNORM16] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT16] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT16] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_FLOAT16] = true; + brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT32] = true; + brw->losslessly_compressable[MESA_FORMAT_RG_SINT32] = true; + brw->losslessly_compressable[MESA_FORMAT_RG_UINT32] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBX_FLOAT16] = true; + brw->losslessly_compressable[MESA_FORMAT_B8G8R8A8_UNORM] = true; + brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_UNORM] = true; + brw->losslessly_compressable[MESA_FORMAT_R8G8B8A8_SNORM] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_SINT8] = true; + brw->losslessly_compressable[MESA_FORMAT_RGBA_UINT8] = true; + brw->losslessly_compressable[MESA_FORMAT_RG_SINT16] = true; + brw->losslessly_compressable[MESA_FORMAT_RG_UINT16] = true; + brw->losslessly_compressable[MESA_FORMAT_RG_FLOAT16] = true; + brw->losslessly_compressable[MESA_FORMAT_R_UINT32] = true; + brw->losslessly_compressable[MESA_FORMAT_R_SINT32] = true; + brw->losslessly_compressable[MESA_FORMAT_R_FLOAT32] = true; + brw->losslessly_compressable[MESA_FORMAT_B8G8R8X8_UNORM] = true; + } + /* We will check this table for FBO completeness, but the surface format * table above only covered color rendering. */ diff --git a/src/mesa/drivers/dri/i965/gen8_surface_state.c b/src/mesa/drivers/dri/i965/gen8_surface_state.c index 995b4dd..b19b492 100644 --- a/src/mesa/drivers/dri/i965/gen8_surface_state.c +++ b/src/mesa/drivers/dri/i965/gen8_surface_state.c @@ -243,8 +243,10 @@ gen8_emit_texture_surface_state(struct brw_context *brw, * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN * 16 must be used." */ - if (brw->gen >= 9 || mt->num_samples == 1) + if (brw->gen >= 9 || mt->num_samples == 1) { assert(mt->halign == 16); + assert(mt->num_samples || brw->losslessly_compressable[mt->format] == true); + } } const uint32_t surf_type = translate_tex_target(target); @@ -488,8 +490,10 @@ gen8_update_renderbuffer_surface(struct brw_context *brw, * "When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN * 16 must be used." */ - if
Re: [Mesa-dev] [PATCH] mesa: remove unused functions in program.c
On Tue, Oct 13, 2015 at 7:16 PM, Brian Paulwrote: > replace_registers() and adjust_param_indexes() were unused. > --- Yep, noticed this too. Found it strange that someone would have managed to find and delete unused functions that gcc doesn't warn about while leaving in place functions that it does... Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Restore compute shader support in brw_nir_lower_inputs
On Tuesday, October 13, 2015 01:44:55 PM Jordan Justen wrote: > The commit shown below caused compute shaders to hit the unreachable > in the default of the switch block. Restore compute shaders to use the > fragment shader path. > > Also, simplify the fragment/compute path to only support scalar mode. > > commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4 > Author: Kenneth Graunke> Date: Fri Aug 14 15:15:11 2015 -0700 > > i965/vs: Map scalar VS input locations properly; avoid tons of MOVs. > > Signed-off-by: Jordan Justen > Cc: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_nir.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c > b/src/mesa/drivers/dri/i965/brw_nir.c > index 4f35d81..357ee4f 100644 > --- a/src/mesa/drivers/dri/i965/brw_nir.c > +++ b/src/mesa/drivers/dri/i965/brw_nir.c > @@ -96,8 +96,10 @@ brw_nir_lower_inputs(nir_shader *nir, bool is_scalar) >} >break; > case MESA_SHADER_FRAGMENT: > + case MESA_SHADER_COMPUTE: > + assert(is_scalar); >nir_assign_var_locations(>inputs, >num_inputs, > - is_scalar ? type_size_scalar : > type_size_vec4); > + type_size_scalar); >break; > default: >unreachable("unsupported shader stage"); > I didn't think compute shaders had inputs...so it might make more sense just to do: case MESA_SHADER_COMPUTE: /* Compute shaders have no inputs. */ break; Either way, sorry for breaking this, and... Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] glsl: Support uint index in do_vec_index_to_cond_assign
Series is Reviewed-by: Tapani Pälli(Note that due to recent changes, test won't pass but fails like: glcts: brw_nir.c:103: brw_nir_lower_inputs: Assertion `!"unsupported shader stage"' failed.) On 10/14/2015 12:27 AM, Jordan Justen wrote: The ES31-CTS.compute_shader.pipeline-compute-chain test case was generating an unsigned index by using gl_LocalInvocationID.x and gl_LocalInvocationID.y as array indices. Signed-off-by: Jordan Justen --- src/glsl/lower_vec_index_to_cond_assign.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp b/src/glsl/lower_vec_index_to_cond_assign.cpp index 0c3394a..b623882 100644 --- a/src/glsl/lower_vec_index_to_cond_assign.cpp +++ b/src/glsl/lower_vec_index_to_cond_assign.cpp @@ -88,7 +88,9 @@ ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_ exec_list list; /* Store the index to a temporary to avoid reusing its tree. */ - index = new(base_ir) ir_variable(glsl_type::int_type, + assert(orig_index->type == glsl_type::int_type || + orig_index->type == glsl_type::uint_type); + index = new(base_ir) ir_variable(orig_index->type, "vec_index_tmp_i", ir_var_temporary); list.push_tail(index); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/10] i965/skl: Enable fast color clears on SKL
On Tue, Oct 13, 2015 at 8:50 PM, Ben Widawskywrote: > Based on a patch originally from Kristian. Skylake has extended capabilities > with regard to fast clears, but that is saved for another patch. > > The same effect could be acheived with the following, however I think the way > I've done it is more in line with how the docs explain it. > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw, >/* In release builds, fall through */ > case I915_TILING_Y: >*width_px = 32 / mt->cpp; > - *height = 4; > + if (brw->gen >= 9) > + *height = 2; > + else > + *height = 4; > > Signed-off-by: Ben Widawsky > --- > src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 54 > + > src/mesa/drivers/dri/i965/gen8_surface_state.c | 34 > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 + > src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 7 +++- > 4 files changed, 78 insertions(+), 26 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > index fbde3f0..7bf52f0 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c > @@ -204,7 +204,7 @@ brw_draw_rectlist(struct gl_context *ctx, struct rect > *rect, int num_instances) > } > > static void > -get_fast_clear_rect(struct gl_framebuffer *fb, > +get_fast_clear_rect(struct brw_context *brw, struct gl_framebuffer *fb, > struct intel_renderbuffer *irb, struct rect *rect) > { > unsigned int x_align, y_align; > @@ -228,7 +228,14 @@ get_fast_clear_rect(struct gl_framebuffer *fb, > */ >intel_get_non_msrt_mcs_alignment(irb->mt, _align, _align); >x_align *= 16; > - y_align *= 32; > + > + /* SKL+ line alignment requirement for Y-tiled are half those of the > prior > + * generations. > + */ > + if (brw->gen >= 9) > + y_align *= 16; > + else > + y_align *= 32; > >/* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render > * Target(s)", beneath the "Fast Color Clear" bullet (p327): > @@ -265,8 +272,10 @@ get_fast_clear_rect(struct gl_framebuffer *fb, > * terms of (width,height) of the RT. > * > * MSAA Width of Clear Rect Height of Clear Rect > + * 2X Ceil(1/8*width) Ceil(1/2*height) > * 4X Ceil(1/8*width) Ceil(1/2*height) > * 8X Ceil(1/2*width) Ceil(1/2*height) > + * 16X widthCeil(1/2*height) > * > * The text "with upper left co-ordinate to coincide with actual > * rectangle being cleared" is a little confusing--it seems to imply > @@ -289,6 +298,9 @@ get_fast_clear_rect(struct gl_framebuffer *fb, >case 8: > x_scaledown = 2; > break; > + case 16: > + x_scaledown = 1; > + break; >default: > unreachable("Unexpected sample count for fast clear"); >} > @@ -358,18 +370,24 @@ is_color_fast_clear_compatible(struct brw_context *brw, > > /** > * Convert the given color to a bitfield suitable for ORing into DWORD 7 of > - * SURFACE_STATE. > + * SURFACE_STATE (DWORD 12-15 on SKL+). > */ > -static uint32_t > -compute_fast_clear_color_bits(const union gl_color_union *color) > +static void > +set_fast_clear_color(struct brw_context *brw, > + struct intel_mipmap_tree *mt, > + const union gl_color_union *color) > { > - uint32_t bits = 0; > - for (int i = 0; i < 4; i++) { > - /* Testing for non-0 works for integer and float colors */ > - if (color->f[i] != 0.0f) > - bits |= 1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); > + if (brw->gen >= 9) { > + mt->gen9_fast_clear_color = *color; > + } else { > + mt->fast_clear_color_value = 0; > + for (int i = 0; i < 4; i++) { > + /* Testing for non-0 works for integer and float colors */ > + if (color->f[i] != 0.0f) > + mt->fast_clear_color_value |= > +1 << (GEN7_SURFACE_CLEAR_COLOR_SHIFT + (3 - i)); Lets put braces around this since it spans multiple lines. > + } > } > - return bits; > } > > static const uint32_t fast_clear_color[4] = { ~0, ~0, ~0, ~0 }; > @@ -504,8 +522,7 @@ brw_meta_fast_clear(struct brw_context *brw, struct > gl_framebuffer *fb, > >switch (clear_type) { >case FAST_CLEAR: > - irb->mt->fast_clear_color_value = > -compute_fast_clear_color_bits(>Color.ClearColor); > + set_fast_clear_color(brw, irb->mt, >Color.ClearColor); > irb->need_downsample = true; > > /* If the buffer is
Re: [Mesa-dev] [PATCH 05/10] i965/meta/gen9: Individually fast clear color attachments
On Tue, Oct 13, 2015 at 9:12 PM, Matt Turnerwrote: > On Tue, Oct 13, 2015 at 8:50 PM, Ben Widawsky > wrote: >> The impetus for this patch comes from a seemingly benign statement within the >> spec (quoted within the patch). For me, this patch was at some point critical >> for getting stable piglit results (though this did not seem to be the case >> on a >> branch Chad was working on). >> >> It is very important for clearing multiple color buffer attachments and can >> be >> observed in the following piglit tests: >> spec/arb_framebuffer_object/fbo-drawbuffers-none glclear >> spec/ext_framebuffer_multisample/blit-multiple-render-targets 0 >> >> Signed-off-by: Ben Widawsky >> --- >> src/mesa/drivers/dri/i965/brw_meta_fast_clear.c | 97 >> + >> 1 file changed, 84 insertions(+), 13 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c >> b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c >> index 7bf52f0..9e6711e 100644 >> --- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c >> +++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c >> @@ -427,6 +427,74 @@ use_rectlist(struct brw_context *brw, bool enable) >> brw->ctx.NewDriverState |= BRW_NEW_FRAGMENT_PROGRAM; >> } >> >> +/** >> + * Individually fast clear each color buffer attachment. On previous gens >> this >> + * isn't required. The motivation for this comes from one line (which seems >> to >> + * be specific to SKL+). The list item is in section titled _MCS Buffer for >> + * Render Target(s)_ >> + * >> + * "Since only one RT is bound with a clear pass, only one RT can be >> cleared >> + * at a time. To clear multiple RTs, multiple clear passes are required." >> + * >> + * The code follows the same idea as the resolve code which creates a fake >> FBO >> + * to avoid interfering with too much of the GL state. >> + */ >> +static void >> +fast_clear_attachments(struct brw_context *brw, >> + struct gl_framebuffer *fb, >> + uint32_t fast_clear_buffers, >> + struct rect fast_clear_rect) >> +{ >> + assert(brw->gen >= 9); >> + struct gl_context *ctx = >ctx; >> + >> + GLuint old_fb = ctx->DrawBuffer->Name; >> + >> + for (unsigned buf = 0; buf < fb->_NumColorDrawBuffers; buf++) { >> + struct gl_renderbuffer *rb = fb->_ColorDrawBuffers[buf]; >> + struct intel_renderbuffer *irb = intel_renderbuffer(rb); >> + GLuint fbo, rbo; >> + int index = fb->_ColorDrawBufferIndexes[buf]; >> + >> + if (!((1 << index) & fast_clear_buffers)) > > Small suggestion: I think this would read better as > ((fast_clear_buffers & (1 << index)) != 0. Oops, sorry -- of course I meant ((fast_clear_buffers & (1 << index)) == 0. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: remove unused texUnit local in _mesa_BindTextureUnit()
The texture unit is error-checked before this and the texUnit var is unused, so remove it. --- src/mesa/main/texobj.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c index b571b1b..3182920 100644 --- a/src/mesa/main/texobj.c +++ b/src/mesa/main/texobj.c @@ -1759,19 +1759,12 @@ _mesa_BindTextureUnit(GLuint unit, GLuint texture) { GET_CURRENT_CONTEXT(ctx); struct gl_texture_object *texObj; - struct gl_texture_unit *texUnit; if (unit >= _mesa_max_tex_unit(ctx)) { _mesa_error(ctx, GL_INVALID_VALUE, "glBindTextureUnit(unit=%u)", unit); return; } - texUnit = _mesa_get_tex_unit(ctx, unit); - assert(texUnit); - if (!texUnit) { - return; - } - if (MESA_VERBOSE & (VERBOSE_API|VERBOSE_TEXTURE)) _mesa_debug(ctx, "glBindTextureUnit %s %d\n", _mesa_enum_to_string(GL_TEXTURE0+unit), (GLint) texture); -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: remove unused functions in program.c
replace_registers() and adjust_param_indexes() were unused. --- src/mesa/program/program.c | 51 -- 1 file changed, 51 deletions(-) diff --git a/src/mesa/program/program.c b/src/mesa/program/program.c index c35a89b..86de5e9 100644 --- a/src/mesa/program/program.c +++ b/src/mesa/program/program.c @@ -450,57 +450,6 @@ _mesa_delete_instructions(struct gl_program *prog, GLuint start, GLuint count) /** - * Search instructions for registers that match (oldFile, oldIndex), - * replacing them with (newFile, newIndex). - */ -static void -replace_registers(struct prog_instruction *inst, GLuint numInst, - GLuint oldFile, GLuint oldIndex, - GLuint newFile, GLuint newIndex) -{ - GLuint i, j; - for (i = 0; i < numInst; i++) { - /* src regs */ - for (j = 0; j < _mesa_num_inst_src_regs(inst[i].Opcode); j++) { - if (inst[i].SrcReg[j].File == oldFile && - inst[i].SrcReg[j].Index == oldIndex) { -inst[i].SrcReg[j].File = newFile; -inst[i].SrcReg[j].Index = newIndex; - } - } - /* dst reg */ - if (inst[i].DstReg.File == oldFile && inst[i].DstReg.Index == oldIndex) { - inst[i].DstReg.File = newFile; - inst[i].DstReg.Index = newIndex; - } - } -} - - -/** - * Search instructions for references to program parameters. When found, - * increment the parameter index by 'offset'. - * Used when combining programs. - */ -static void -adjust_param_indexes(struct prog_instruction *inst, GLuint numInst, - GLuint offset) -{ - GLuint i, j; - for (i = 0; i < numInst; i++) { - for (j = 0; j < _mesa_num_inst_src_regs(inst[i].Opcode); j++) { - GLuint f = inst[i].SrcReg[j].File; - if (f == PROGRAM_CONSTANT || - f == PROGRAM_UNIFORM || - f == PROGRAM_STATE_VAR) { -inst[i].SrcReg[j].Index += offset; - } - } - } -} - - -/** * Populate the 'used' array with flags indicating which registers (TEMPs, * INPUTs, OUTPUTs, etc, are used by the given program. * \param file type of register to scan for -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization
On 13/10/15 03:10, Matt Turner wrote: > On Mon, Oct 12, 2015 at 4:25 PM, Matt Turnerwrote: >> On Sat, Oct 10, 2015 at 4:24 AM, Alejandro Piñeiro >> wrote: >>> vec4 port of fs_cmod_propagation. >>> >>> Shader-db results: >>> total instructions in shared programs: 6241226 -> 6224469 (-0.27%) >>> instructions in affected programs: 498213 -> 481456 (-3.36%) >>> helped:3082 >>> HURT: 0 >>> --- >>> >>> The final outcome is really similar to fs_brw_cmod_propagation. In >>> fact the only difference is that on fs we have this: >>> if (scan_inst->overwrites_reg(inst->src[0])) { >>> if (scan_inst->is_partial_write() || >>> scan_inst->dst.reg_offset != inst->src[0].reg_offset) >>>break; >>> >>> And on vec4 (this commit) we have this: >>> if (inst->src[0].in_range(scan_inst->dst, >>>scan_inst->regs_written)) { >>> >>> if ((scan_inst->predicate && scan_inst->opcode != >>> BRW_OPCODE_SEL) || >>> scan_inst->dst.reg_offset != inst->src[0].reg_offset || >>> (scan_inst->dst.writemask != WRITEMASK_X && >>> scan_inst->dst.writemask != WRITEMASK_XYZW)) >>>break; >>> >>> if (scan_inst->dst.writemask == WRITEMASK_XYZW && >>> inst->src[0].swizzle != BRW_SWIZZLE_XYZW) { >>>break; >>> } >>> >>> So at some point I thought about refactoring it and having one common, >>> like with opt_predicated_break, but that one was possible with just >>> backend_instructions, while here we would need to deal with >>> vec4_instructions and fs_inst, that could be somewhat messy, so >>> I'm leaving this as it is. >>> >>> src/mesa/drivers/dri/i965/Makefile.sources | 1 + >>> src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + >>> src/mesa/drivers/dri/i965/brw_vec4.h | 1 + >>> .../drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 163 >>> + >>> 4 files changed, 166 insertions(+) >>> create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >>> >>> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources >>> b/src/mesa/drivers/dri/i965/Makefile.sources >>> index 81ef628..c1836d6 100644 >>> --- a/src/mesa/drivers/dri/i965/Makefile.sources >>> +++ b/src/mesa/drivers/dri/i965/Makefile.sources >>> @@ -56,6 +56,7 @@ i965_compiler_FILES = \ >>> brw_util.c \ >>> brw_util.h \ >>> brw_vec4_builder.h \ >>> + brw_vec4_cmod_propagation.cpp \ >>> brw_vec4_copy_propagation.cpp \ >>> brw_vec4.cpp \ >>> brw_vec4_cse.cpp \ >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> index e966b96..55e381b 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp >>> @@ -1867,6 +1867,7 @@ vec4_visitor::run() >>>OPT(dead_code_eliminate); >>>OPT(dead_control_flow_eliminate, this); >>>OPT(opt_copy_propagation); >>> + OPT(opt_cmod_propagation); >>>OPT(opt_cse); >>>OPT(opt_algebraic); >>>OPT(opt_register_coalesce); >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h >>> b/src/mesa/drivers/dri/i965/brw_vec4.h >>> index 5e3500c..3c1711d 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4.h >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h >>> @@ -149,6 +149,7 @@ public: >>> int var_range_start(unsigned v, unsigned n) const; >>> int var_range_end(unsigned v, unsigned n) const; >>> bool virtual_grf_interferes(int a, int b); >>> + bool opt_cmod_propagation(); >>> bool opt_copy_propagation(bool do_constant_prop = true); >>> bool opt_cse_local(bblock_t *block); >>> bool opt_cse(); >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >>> new file mode 100644 >>> index 000..7e39d2b >>> --- /dev/null >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp >>> @@ -0,0 +1,163 @@ >>> +/* >>> + * Copyright © 2015 Intel Corporation >>> + * >>> + * Permission is hereby granted, free of charge, to any person obtaining a >>> + * copy of this software and associated documentation files (the >>> "Software"), >>> + * to deal in the Software without restriction, including without >>> limitation >>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense, >>> + * and/or sell copies of the Software, and to permit persons to whom the >>> + * Software is furnished to do so, subject to the following conditions: >>> + * >>> + * The above copyright notice and this permission notice (including the >>> next >>> + * paragraph) shall be included in all copies or substantial portions of >>> the >>> + * Software. >>> + * >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT
Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier
On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote: > On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri < > t_arc...@yahoo.com.au> wrote: > > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote: > > > Hi Timothy, > > > > > > One of these 3 commits breaks compilation for Talos shaders with > > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a > > > minimal test case. I can't say which commit, because Mesa fails > > > to > > > build between them. > > > It has something to do with uniforms, structures, > > > and samplers. > > > > > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a > > > Author: Timothy Arceri> > > Date: Sun Aug 30 12:50:34 2015 +1000 > > > > > > glsl: store uniform slot id in var location field > > > > > > > Hi Marek, > > > > The piglit test passes on my intel based laptop I can't test on > > anything else until later this week (as I'm traveling) but my guess > > is > > its something to do with the above patch. > > > > The other two patches shouldn't change anything for gallium drivers > > "glsl: assign hidden uniforms their slot id earlier" just assigns > > hidden uniforms their slot id earlier and there shouldn't be any > > difference once the IR gets to glsl_to_tgsi. > > > > Also "glsl: order indices for samplers inside a struct array" > > shouldn't > > change things either as in your test the sampler are not inside the > > struct. > > > > There is some code in the glsl_to_tgsi pass that looks like the > > location field would have always been -1 for uniforms other the UBO > > and > > UBO members maybe this has something to do with the problem now > > that > > all uniforms now get a non -1 value. > > > > case ir_var_uniform: > > entry = new(mem_ctx) variable_storage(var, > > PROGRAM_UNIFORM, > >var->data.location); > > this->variables.push_tail(entry); > > break; > > > > I hope this helps get you started. If you haven't figured it out by > > later in the week than I'll take a look on my desktop once I get > > home. > > The problem is that _mesa_get_sampler_uniform_value returns a sampler > index which is greater than 16. If I allow large sampler indices, it > starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI codegen > later. I didn't investigate further. > > Marek So from my bisect run your test was passing until: a907b5dd162b7911b8c21f6d54837831bc078059 is the first bad commit commit a907b5dd162b7911b8c21f6d54837831bc078059 Author: Marek Olšák Date: Mon Oct 5 03:26:48 2015 +0200 st/mesa: translate fragment shaders into TGSI when we get them Reviewed-by: Dave Airlie Reviewed-by: Brian Paul Tested-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: fix matrix stride calculation for std430's row_major matrices with two columns
Thanks Samuel, This also fixes 10+ OpenGL ES 3.1 test cases. Reviewed-by: Marta Lofstedt> -Original Message- > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On > Behalf Of Samuel Iglesias Gonsalvez > Sent: Tuesday, October 13, 2015 8:39 AM > To: mesa-dev@lists.freedesktop.org > Subject: [Mesa-dev] [PATCH] glsl: fix matrix stride calculation for std430's > row_major matrices with two columns > > This is the result of applying several rules: > > From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout": > > "2. If the member is a two- or four-component vector with components > consuming N basic machine units, the base alignment is 2N or 4N, > respectively." > [...] > "4. If the member is an array of scalars or vectors, the base alignment and > array stride are set to match the base alignment of a single array element, > according to rules (1), (2), and (3), and rounded up to the base alignment of > a > vec4." > [...] > "7. If the member is a row-major matrix with C columns and R rows, the > matrix is stored identically to an array of R row vectors with C components > each, according to rule (4)." > [...] > "When using the std430 storage layout, shader storage blocks will be laid out > in buffer storage identically to uniform and shader storage blocks using the > std140 layout, except that the base alignment and stride of arrays of scalars > and vectors in rule 4 and of structures in rule 9 are not rounded up a > multiple > of the base alignment of a vec4." > > In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is stored > like an array of Y row vectors with 2 components each. Because of std430 > storage layout, the base alignment of the array of vectors is not rounded up > to vec4, so it is still 2*N. > > Fixes 15 dEQP tests: > > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_m > at2 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediu > mp_mat2 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_ > mat2 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_m > at2x3 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediu > mp_mat2x3 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_ > mat2x3 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_m > at2x4 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediu > mp_mat2x4 > dEQP- > GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_ > mat2x4 > dEQP- > GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2 > dEQP- > GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3 > dEQP- > GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4 > dEQP- > GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major > _mat2 > dEQP- > GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major > _mat2x3 > dEQP- > GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major > _mat2x4 > > v2: > - Add spec quote in both commit log and code (Timothy) > > Signed-off-by: Samuel Iglesias Gonsalvez > Cc: Timothy Arceri > --- > src/glsl/lower_ubo_reference.cpp | 26 +- > 1 file changed, 25 insertions(+), 1 deletion(-) > > diff --git a/src/glsl/lower_ubo_reference.cpp > b/src/glsl/lower_ubo_reference.cpp > index 247620e..c8ec5c1 100644 > --- a/src/glsl/lower_ubo_reference.cpp > +++ b/src/glsl/lower_ubo_reference.cpp > @@ -744,7 +744,31 @@ lower_ubo_reference_visitor::emit_access(bool > is_write, > * or 32 depending on the number of columns. > */ >assert(matrix_columns <= 4); > - unsigned matrix_stride = glsl_align(matrix_columns * N, 16); > + unsigned matrix_stride = 0; > + /* Matrix stride for std430 mat2xY matrices are not rounded up to > + * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform > + * Block Layout": > + * > + * "2. If the member is a two- or four-component vector with > components > + * consuming N basic machine units, the base alignment is 2N or 4N, > + * respectively." [...] > + * "4. If the member is an array of scalars or vectors, the base > alignment > + * and array stride are set to match the base alignment of a single > array > + * element, according to rules (1), (2), and (3), and rounded up to the > + * base alignment of a vec4." [...] > + * "7. If the member is a row-major matrix with C columns and R rows, > the > + * matrix is stored identically to an array of R row vectors with C > + * components each, according to rule (4)." [...] > + * "When using the std430 storage layout, shader storage blocks will be > + * laid out in
Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier
On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote: > On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri < > t_arc...@yahoo.com.au> wrote: > > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote: > > > Hi Timothy, > > > > > > One of these 3 commits breaks compilation for Talos shaders with > > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a > > > minimal test case. I can't say which commit, because Mesa fails > > > to > > > build between them. > > > It has something to do with uniforms, structures, > > > and samplers. > > > > > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a > > > Author: Timothy Arceri> > > Date: Sun Aug 30 12:50:34 2015 +1000 > > > > > > glsl: store uniform slot id in var location field > > > > > > > Hi Marek, > > > > The piglit test passes on my intel based laptop I can't test on > > anything else until later this week (as I'm traveling) but my guess > > is > > its something to do with the above patch. > > > > The other two patches shouldn't change anything for gallium drivers > > "glsl: assign hidden uniforms their slot id earlier" just assigns > > hidden uniforms their slot id earlier and there shouldn't be any > > difference once the IR gets to glsl_to_tgsi. > > > > Also "glsl: order indices for samplers inside a struct array" > > shouldn't > > change things either as in your test the sampler are not inside the > > struct. > > > > There is some code in the glsl_to_tgsi pass that looks like the > > location field would have always been -1 for uniforms other the UBO > > and > > UBO members maybe this has something to do with the problem now > > that > > all uniforms now get a non -1 value. > > > > case ir_var_uniform: > > entry = new(mem_ctx) variable_storage(var, > > PROGRAM_UNIFORM, > >var->data.location); > > this->variables.push_tail(entry); > > break; > > > > I hope this helps get you started. If you haven't figured it out by > > later in the week than I'll take a look on my desktop once I get > > home. > > The problem is that _mesa_get_sampler_uniform_value returns a sampler > index which is greater than 16. If I allow large sampler indices, it > starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI codegen > later. I didn't investigate further. Hi Marek, I just built the ilo driver to test this and I was able to get this error but only after updating master. I don't have time for a full bisect right now as I'm about to get on a flight but commit 64831832791139328a67b80387f062d39e304d24 is good and is well after this commit. Tim > > Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Set api prefix to version string when overriding version
2015-10-13 13:49 GMT+08:00 Tapani Pälli: > Otherwise there are problems when user overrides version and application > such as Piglit wants to detect used api with glGetString(GL_VERSION). > > Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1. > > Before: > "3.1 Mesa 11.1.0-devel (git-24a1a15)" > > After: > "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)" > > Signed-off-by: Tapani Pälli > --- > src/mesa/main/version.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c > index 498b2f8..a1b1db5 100644 > --- a/src/mesa/main/version.c > +++ b/src/mesa/main/version.c > @@ -24,6 +24,7 @@ > > > #include > +#include "context.h" > #include "imports.h" > #include "mtypes.h" > #include "version.h" > @@ -181,7 +182,7 @@ _mesa_override_gl_version(struct gl_context *ctx) > { > if (_mesa_override_gl_version_contextless(>Const, >API, > >Version)) { > - create_version_string(ctx, ""); > + create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : "OpenGL > "); Things are a little bit funny here, because the version strings of OpenGL and ES are different. From OpenGL 4.5 (Core Profile) spec, Page 541: """ The VERSION and SHADING_LANGUAGE_VERSION strings are laid out as follows: """ From OpenGL ES 3.2 spec, Page 436: """ The VERSION string is laid out as follows: "OpenGL ES N.M vendor-specific information" """ So I don't think the "OpenGL" prefix is necessary on desktop GL. Regards, Boyan Ding > } > } > > -- > 2.4.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: fix matrix stride calculation for std430's row_major matrices with two columns
This is the result of applying several rules: From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout": "2. If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively." [...] "4. If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4." [...] "7. If the member is a row-major matrix with C columns and R rows, the matrix is stored identically to an array of R row vectors with C components each, according to rule (4)." [...] "When using the std430 storage layout, shader storage blocks will be laid out in buffer storage identically to uniform and shader storage blocks using the std140 layout, except that the base alignment and stride of arrays of scalars and vectors in rule 4 and of structures in rule 9 are not rounded up a multiple of the base alignment of a vec4." In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is stored like an array of Y row vectors with 2 components each. Because of std430 storage layout, the base alignment of the array of vectors is not rounded up to vec4, so it is still 2*N. Fixes 15 dEQP tests: dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x3 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x3 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x3 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x4 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x4 dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x4 dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2 dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3 dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4 dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2 dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x3 dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x4 v2: - Add spec quote in both commit log and code (Timothy) Signed-off-by: Samuel Iglesias GonsalvezCc: Timothy Arceri --- src/glsl/lower_ubo_reference.cpp | 26 +- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp index 247620e..c8ec5c1 100644 --- a/src/glsl/lower_ubo_reference.cpp +++ b/src/glsl/lower_ubo_reference.cpp @@ -744,7 +744,31 @@ lower_ubo_reference_visitor::emit_access(bool is_write, * or 32 depending on the number of columns. */ assert(matrix_columns <= 4); - unsigned matrix_stride = glsl_align(matrix_columns * N, 16); + unsigned matrix_stride = 0; + /* Matrix stride for std430 mat2xY matrices are not rounded up to + * vec4 size. From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform + * Block Layout": + * + * "2. If the member is a two- or four-component vector with components + * consuming N basic machine units, the base alignment is 2N or 4N, + * respectively." [...] + * "4. If the member is an array of scalars or vectors, the base alignment + * and array stride are set to match the base alignment of a single array + * element, according to rules (1), (2), and (3), and rounded up to the + * base alignment of a vec4." [...] + * "7. If the member is a row-major matrix with C columns and R rows, the + * matrix is stored identically to an array of R row vectors with C + * components each, according to rule (4)." [...] + * "When using the std430 storage layout, shader storage blocks will be + * laid out in buffer storage identically to uniform and shader storage + * blocks using the std140 layout, except that the base alignment and + * stride of arrays of scalars and vectors in rule 4 and of structures in + * rule 9 are not rounded up a multiple of the base alignment of a vec4." + */ + if (packing == GLSL_INTERFACE_PACKING_STD430 && matrix_columns == 2) + matrix_stride = 2 * N; + else + matrix_stride = glsl_align(matrix_columns * N, 16); const glsl_type *deref_type = deref->type->base_type == GLSL_TYPE_FLOAT ? glsl_type::float_type : glsl_type::double_type; -- 2.1.4 ___ mesa-dev mailing list
Re: [Mesa-dev] [PATCH] mesa: Set api prefix to version string when overriding version
On 10/13/2015 09:35 AM, Boyan Ding wrote: 2015-10-13 13:49 GMT+08:00 Tapani Pälli: Otherwise there are problems when user overrides version and application such as Piglit wants to detect used api with glGetString(GL_VERSION). Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1. Before: "3.1 Mesa 11.1.0-devel (git-24a1a15)" After: "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)" Signed-off-by: Tapani Pälli --- src/mesa/main/version.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c index 498b2f8..a1b1db5 100644 --- a/src/mesa/main/version.c +++ b/src/mesa/main/version.c @@ -24,6 +24,7 @@ #include +#include "context.h" #include "imports.h" #include "mtypes.h" #include "version.h" @@ -181,7 +182,7 @@ _mesa_override_gl_version(struct gl_context *ctx) { if (_mesa_override_gl_version_contextless(>Const, >API, >Version)) { - create_version_string(ctx, ""); + create_version_string(ctx, _mesa_is_gles(ctx) ? "OpenGL ES " : "OpenGL "); Things are a little bit funny here, because the version strings of OpenGL and ES are different. From OpenGL 4.5 (Core Profile) spec, Page 541: """ The VERSION and SHADING_LANGUAGE_VERSION strings are laid out as follows: """ From OpenGL ES 3.2 spec, Page 436: """ The VERSION string is laid out as follows: "OpenGL ES N.M vendor-specific information" """ So I don't think the "OpenGL" prefix is necessary on desktop GL. ok, I was not a aware of such inconsistency. I guess it should be ok then to change "OpenGL " as "" in the patch. Thanks for pointing this out! Regards, Boyan Ding } } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] radeonsi: implement vertex color clamping
On 11.10.2015 10:17, Marek Olšák wrote: > From: Marek Olšák> > This is only supported in the compatibility profile (without GS and tess). [...] > diff --git a/src/gallium/drivers/radeonsi/si_state.c > b/src/gallium/drivers/radeonsi/si_state.c > index 3aafe8a..a81060a 100644 > --- a/src/gallium/drivers/radeonsi/si_state.c > +++ b/src/gallium/drivers/radeonsi/si_state.c > @@ -760,6 +760,8 @@ static void *si_create_rs_state(struct pipe_context *ctx, > state->fill_back != PIPE_POLYGON_MODE_FILL) | > > S_028814_POLYMODE_FRONT_PTYPE(si_translate_fill(state->fill_front)) | > > S_028814_POLYMODE_BACK_PTYPE(si_translate_fill(state->fill_back))); > + si_pm4_set_reg(pm4, R_00B130_SPI_SHADER_USER_DATA_VS_0 + > + SI_SGPR_VS_STATE_BITS * 4, > state->clamp_vertex_color); The second line added here seems to have a few too many spaces of indentation. With that fixed, this series is Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier
On 11.10.2015 16:20, Timothy Arceri wrote: > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote: >> Hi Timothy, >> >> One of these 3 commits breaks compilation for Talos shaders with >> gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a >> minimal test case. I can't say which commit, because Mesa fails to >> build between them. >> It has something to do with uniforms, structures, >> and samplers. >> >> commit dcd9cd03837545055ce2a315e7e8840cc3254d1a >> Author: Timothy Arceri>> Date: Sun Aug 30 12:50:34 2015 +1000 >> >> glsl: store uniform slot id in var location field >> > > Hi Marek, > > The piglit test passes on my intel based laptop I can't test on > anything else until later this week (as I'm traveling) [...] FWIW, surely you can test llvmpipe on that laptop? :) -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/10] RadeonSI: Better LLVM IR generation
On 11.10.2015 10:29, Marek Olšák wrote: > > This patch series improves IR generation for radeonsi. Most of it removes > uses of AMDGPU intrinsics. Patches 3-7 are Reviewed-by: Michel Dänzer-- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: couple shader_enums cleanups
On 10 October 2015 at 01:56, Rob Clarkwrote: > On Fri, Oct 9, 2015 at 8:07 PM, Emil Velikov wrote: >> On 9 October 2015 at 23:27, Rob Clark wrote: >>> On Fri, Oct 9, 2015 at 5:57 PM, Emil Velikov >>> wrote: On 9 October 2015 at 21:31, Rob Clark wrote: > From: Rob Clark > > Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX / > FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil. > > Reported-by: Emil Velikov > Signed-off-by: Rob Clark > --- > Note: punted on the STATIC_ASSERT() thing for now.. kinda wanted some- > think more like tgsi_strings_check() where we check everything once on > startup, so you don't have to trigger something that calls the various > xyz_name() to realize something is out of sync. > I'm confused - isn't static_assert a compile thing ? >>> >>> oh, heh, good point.. >>> > src/glsl/nir/shader_enums.c | 1 + > src/glsl/nir/shader_enums.h | 7 +++ > src/mesa/main/mtypes.h | 5 - > 3 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c > index 3722475..7fcbe81 100644 > --- a/src/glsl/nir/shader_enums.c > +++ b/src/glsl/nir/shader_enums.c > @@ -169,6 +169,7 @@ const char * gl_system_value_name(gl_system_value > sysval) > ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER), > ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID), > ENUM(SYSTEM_VALUE_WORK_GROUP_ID), > + ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS), > ENUM(SYSTEM_VALUE_VERTEX_CNT), > }; > return NAME(sysval); > diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h > index 2a5d2c5..77638ba 100644 > --- a/src/glsl/nir/shader_enums.h > +++ b/src/glsl/nir/shader_enums.h > @@ -233,6 +233,11 @@ typedef enum > VARYING_SLOT_VAR31, > } gl_varying_slot; > > + > +#define VARYING_SLOT_MAX (VARYING_SLOT_VAR0 + MAX_VARYING) I'd keep this and FRAG_RESULT_MAX defined as FOO + 1. Otherwise we'll end in funny spaghetti of header dependencies due to the extra two macros. >>> >>> not quite sure, but it sounds like you are asking to open-code >>> MAX_VARYINGS? I don't think that is needed since that seems to >>> basically be a global config.h type thing (I mean mesa/main/config.h, >>> not autoconf one) >>> >>> There isn't really *supposed* to be a VARYING_SLOT_VAR(n-1) (ignoring >>> the fact that I added them simply to get nice string values to print >>> out in nir_print) >>> >> I was thinking that #define VARYING_SLOT_MAX (VARYING_SLOT_VAR + 1) >> /*VARYING_SLOT_VAR(n) */ is the preferred way. >> Hmm can you please point me in the right direction, as to why having >> VARYING_SLOT_VAR(n-1) is a bad idea ? > > A few of the enums are meant to have their last element be interpreted > as xyz0 + n (where n < #define MAX_xyz from config.h) > > I guess technically the max isn't something that is likely to change > frequently but if we ignore the MAX_xyz from config.h that means we > have two places to update if the max was ever increased. > If we go for the 0 + MAX_foo approach, we'll get some lovely build warnings. On the other hand things will break nicely with +1. I'm slightly leaning towards the latter, as it seems that currently things are subtle/brittle and by breaking them (albeit unlikely in the near future) we'll get a victim^Wvolunteer to untangle/make things obvious. Just my 2c. I'm not proposing that you have to do anything on the topic, neither am I volunteering. > +#define VARYING_SLOT_PATCH0(VARYING_SLOT_MAX) > +#define VARYING_SLOT_TESS_MAX (VARYING_SLOT_PATCH0 + MAX_VARYING) > + As there are defined 'on top'/'after' of the existing enum perhaps we should keep them alongside their PRIM_MAX...UNKNOWN brethren ? >>> >>> PRIM_MAX is a different thing, not related to anything that is already >>> in shader_enums.. >>> >> I had in mind the relationship wrt the respective enums/macros, rather >> than their meaning. Although with the _MAX above in mind, I'm likely >> missing something. > > prim == gl draw primitive (ie. tris/tristrip/quads/points/etc).. not > something that is internal to mesa or in shader_enums.h.. > That's precisely what I meant. PRIM_MAX and VARYING_SLOT_MAX are well defined quantities with the former unlikey to ever change. VARYING_SLOT_PATCH0 and VARYING_SLOT_TESS_MAX on the other hand, are 'duck taped' on top of the existing gl_varying_slot enum. Similar to PRIM_OUTSIDE_BEGIN_END and PRIM_UNKNOWN. Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/4] tgsi: try and handle overflowing shaders.
On Tue, Oct 13, 2015 at 6:40 AM, Dave Airliewrote: > From: Dave Airlie > > This is used to detect error in virgl if we overflow the shader > dumping buffers. > > Signed-off-by: Dave Airlie > --- > src/gallium/auxiliary/tgsi/tgsi_dump.c | 10 -- > src/gallium/auxiliary/tgsi/tgsi_dump.h | 2 +- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c > b/src/gallium/auxiliary/tgsi/tgsi_dump.c > index 33f6a56..f341783 100644 > --- a/src/gallium/auxiliary/tgsi/tgsi_dump.c > +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c > @@ -708,6 +708,7 @@ struct str_dump_ctx > char *str; > char *ptr; > int left; > + bool nospace; > }; > > static void > @@ -730,10 +731,11 @@ str_dump_ctx_printf(struct dump_ctx *ctx, const char > *format, ...) > sctx->ptr += written; > sctx->left -= written; >} > - } > + } else > + sctx->nospace = true; > } > > -void > +int > tgsi_dump_str( > const struct tgsi_token *tokens, > uint flags, > @@ -760,6 +762,7 @@ tgsi_dump_str( > ctx.str[0] = 0; > ctx.ptr = str; > ctx.left = (int)size; > + ctx.nospace = false; > > if (flags & TGSI_DUMP_FLOAT_AS_HEX) >ctx.base.dump_float_as_hex = TRUE; > @@ -767,6 +770,8 @@ tgsi_dump_str( >ctx.base.dump_float_as_hex = FALSE; > > tgsi_iterate_shader( tokens, ); > + > + return (ctx.nospace == true) ? -1 : 0; Why not just return bool meaning success/failure? Anyway: Reviewed-by: Marek Olšák Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/10] RadeonSI cleanups
On 11.10.2015 10:11, Marek Olšák wrote: > Nothing special here other than cleanups. One patch disables NaNs for LS and > HS, and there's also one GS shader leak fix. > > Please review. Patches 1, 3-6 & 8 are Reviewed-by: Michel Dänzer-- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: ensure RM is set
On 10 October 2015 at 07:42, Jonathan Graywrote: > GNU make predefines RM to rm -f but this is not required by POSIX > so ensure that RM is set. This fixes "make clean" on OpenBSD. > > v2: use AC_CHECK_PROG > I'm ambivalent if we go the AC_CHECK_PROG vs ?= route. I'll let others pick their favourite, but both are Reviewed-by: Emil Velikov -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] vbo: fix incorrect switch statement in init_mat_currval()
Reviewed-by: Marek OlšákMarek On Tue, Oct 13, 2015 at 4:45 AM, Brian Paul wrote: > The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting > VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default > switch clause for all loop iterations. > > This doesn't fix any known issues but was clearly incorrect. > > Cc: mesa-sta...@lists.freedesktop.org > --- > src/mesa/vbo/vbo_context.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c > index e3eb286..802955d 100644 > --- a/src/mesa/vbo/vbo_context.c > +++ b/src/mesa/vbo/vbo_context.c > @@ -121,7 +121,7 @@ static void init_mat_currval(struct gl_context *ctx) >/* Size is fixed for the material attributes, for others will > * be determined at runtime: > */ > - switch (i - VERT_ATTRIB_GENERIC0) { > + switch (i) { >case MAT_ATTRIB_FRONT_SHININESS: >case MAT_ATTRIB_BACK_SHININESS: > cl->Size = 1; > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] i965/vec4: skip control-flow aware liveness analysis if we only have 1 block
Otherwise, what this is going to do is mark every variable alive in the single block we have, and thus, alive since its definition until the program ends, which is really bad in situations where register pressure is high. Of course, this does not address the real problem: that our liveness analysis is not good enough, but it helps some simple cases with no effort. --- src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index 6782379..afdecc2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -287,6 +287,13 @@ vec4_visitor::calculate_live_intervals() */ this->live_intervals = new(mem_ctx) vec4_live_variables(alloc, cfg); + /* If we only have one block this pass would only mark all used variables +* as alive in the block, and thus, alive in the entire program. We don't +* want that. +*/ + if (cfg->block_list.length() <= 1) + return; + foreach_block (block, cfg) { struct block_data *bd = _intervals->block_data[block->num]; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals
--- src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp index cc688ef..6782379 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_live_variables.cpp @@ -291,15 +291,15 @@ vec4_visitor::calculate_live_intervals() struct block_data *bd = _intervals->block_data[block->num]; for (int i = 0; i < live_intervals->num_vars; i++) { -if (BITSET_TEST(bd->livein, i)) { - start[i] = MIN2(start[i], block->start_ip); - end[i] = MAX2(end[i], block->start_ip); -} + if (BITSET_TEST(bd->livein, i)) { +start[i] = MIN2(start[i], block->start_ip); +end[i] = MAX2(end[i], block->start_ip); + } -if (BITSET_TEST(bd->liveout, i)) { - start[i] = MIN2(start[i], block->end_ip); - end[i] = MAX2(end[i], block->end_ip); -} + if (BITSET_TEST(bd->liveout, i)) { +start[i] = MIN2(start[i], block->end_ip); +end[i] = MAX2(end[i], block->end_ip); + } } } } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] i965/fs: Fix indentation in fs_live_variables::compute_start_end
--- src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index 19aec92..ce066a9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -259,16 +259,15 @@ fs_live_variables::compute_start_end() struct block_data *bd = _data[block->num]; for (int i = 0; i < num_vars; i++) { -if (BITSET_TEST(bd->livein, i)) { - start[i] = MIN2(start[i], block->start_ip); - end[i] = MAX2(end[i], block->start_ip); -} - -if (BITSET_TEST(bd->liveout, i)) { - start[i] = MIN2(start[i], block->end_ip); - end[i] = MAX2(end[i], block->end_ip); -} + if (BITSET_TEST(bd->livein, i)) { +start[i] = MIN2(start[i], block->start_ip); +end[i] = MAX2(end[i], block->start_ip); + } + if (BITSET_TEST(bd->liveout, i)) { +start[i] = MIN2(start[i], block->end_ip); +end[i] = MAX2(end[i], block->end_ip); + } } } } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
This fixes the following test: [require] GL >= 3.3 GLSL >= 3.30 GL_ARB_shader_storage_buffer_object [fragment shader] #version 330 #extension GL_ARB_shader_storage_buffer_object: require buffer SSBO { mat4 sm4; }; mat4 um4; void main() { sm4 *= um4; sm4[0][0] = 0.0; } [test] link success where our liveness analysis works really bad because the control-flow part will expand register liveness to the end of only block we have (so every register will be marked alive until the end of the program). This artificially increases register pressure to a point in which we run out of registers. Of course, this is only a simple optimization for a trivial case, the underlying problem still exists and would manifest when we have more than one block, but it helps simple shaders such as the one in the example above without any effort. I guess the real fix would involve re-thinking parts of the liveness analysis algorithm... Jordan, there is another thing that we can improve for this shader that is specific to SSBOs: we emit the same ssbo load multiple times because we are playing it safe just in case there are writes in between. I think we can try to do better and not re-issue the same load if we don't have ssbo stores to the same address in between. I'll try to come up with a patch for this (hopefully we can do this inside lower_ubo_reference). The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just indentation fixes in the code around these. Iago Toral Quiroga (4): i965/fs: Fix indentation in fs_live_variables::compute_start_end i965/fs: skip control-flow aware liveness analysis if we only have 1 block i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals i965/vec4: skip control-flow aware liveness analysis if we only have 1 block .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 ++ .../drivers/dri/i965/brw_vec4_live_variables.cpp | 23 + 2 files changed, 30 insertions(+), 17 deletions(-) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] i965/fs: skip control-flow aware liveness analysis if we only have 1 block
Otherwise, what this is going to do is mark every variable alive in the single block we have, and thus, alive since its definition until the program ends, which is really bad in situations where register pressure is high. Of course, this does not address the real problem: that our liveness analysis is not good enough, but it helps some simple cases with no effort. --- src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp index ce066a9..ce99bd3 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_live_variables.cpp @@ -255,6 +255,13 @@ fs_live_variables::compute_live_variables() void fs_live_variables::compute_start_end() { + /* If we only have one block this pass would only mark all used variables +* as alive in the block, and thus, alive in the entire program. We don't +* want that. +*/ + if (cfg->block_list.length() <= 1) + return; + foreach_block (block, cfg) { struct block_data *bd = _data[block->num]; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V4 2/6] glsl: assign hidden uniforms their slot id earlier
On Tue, Oct 13, 2015 at 10:13 AM, Timothy Arceriwrote: > On Mon, 2015-10-12 at 01:06 +0200, Marek Olšák wrote: >> On Sun, Oct 11, 2015 at 9:20 AM, Timothy Arceri < >> t_arc...@yahoo.com.au> wrote: >> > On Sat, 2015-10-10 at 18:06 +0200, Marek Olšák wrote: >> > > Hi Timothy, >> > > >> > > One of these 3 commits breaks compilation for Talos shaders with >> > > gallium. My piglit patch "glsl-1.30/sampler-bug: ..." contains a >> > > minimal test case. I can't say which commit, because Mesa fails >> > > to >> > > build between them. >> > > It has something to do with uniforms, structures, >> > > and samplers. >> > > >> > > commit dcd9cd03837545055ce2a315e7e8840cc3254d1a >> > > Author: Timothy Arceri >> > > Date: Sun Aug 30 12:50:34 2015 +1000 >> > > >> > > glsl: store uniform slot id in var location field >> > > >> > >> > Hi Marek, >> > >> > The piglit test passes on my intel based laptop I can't test on >> > anything else until later this week (as I'm traveling) but my guess >> > is >> > its something to do with the above patch. >> > >> > The other two patches shouldn't change anything for gallium drivers >> > "glsl: assign hidden uniforms their slot id earlier" just assigns >> > hidden uniforms their slot id earlier and there shouldn't be any >> > difference once the IR gets to glsl_to_tgsi. >> > >> > Also "glsl: order indices for samplers inside a struct array" >> > shouldn't >> > change things either as in your test the sampler are not inside the >> > struct. >> > >> > There is some code in the glsl_to_tgsi pass that looks like the >> > location field would have always been -1 for uniforms other the UBO >> > and >> > UBO members maybe this has something to do with the problem now >> > that >> > all uniforms now get a non -1 value. >> > >> > case ir_var_uniform: >> > entry = new(mem_ctx) variable_storage(var, >> > PROGRAM_UNIFORM, >> >var->data.location); >> > this->variables.push_tail(entry); >> > break; >> > >> > I hope this helps get you started. If you haven't figured it out by >> > later in the week than I'll take a look on my desktop once I get >> > home. >> >> The problem is that _mesa_get_sampler_uniform_value returns a sampler >> index which is greater than 16. If I allow large sampler indices, it >> starts to fail with sampler.file==TGSI_FILE_NULL in the TGSI codegen >> later. I didn't investigate further. >> >> Marek > > > So from my bisect run your test was passing until: > > a907b5dd162b7911b8c21f6d54837831bc078059 is the first bad commit > commit a907b5dd162b7911b8c21f6d54837831bc078059 > Author: Marek Olšák > Date: Mon Oct 5 03:26:48 2015 +0200 > > st/mesa: translate fragment shaders into TGSI when we get them > > Reviewed-by: Dave Airlie > Reviewed-by: Brian Paul > Tested-by: Brian Paul It's a linker test, so you really need to set ST_DEBUG=precompile to get that shader translated before this commit. Using --enable-debug should make it fail an assertion earlier without ST_DEBUG=precompile. You didn't need to bisect. I had bisected the bug already and it's one of your 3 commits. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] program: convert _mesa_init_gl_program() to take struct gl_program *
Rather than accepting a void pointer, only to down and up cast around it, convert the function to take the base (struct gl_program) pointer. Cc: Marek OlšákSigned-off-by: Emil Velikov --- src/mesa/drivers/dri/i915/i915_fragprog.c | 9 ++-- src/mesa/drivers/dri/i965/brw_program.c| 6 +-- .../drivers/dri/i965/test_fs_cmod_propagation.cpp | 2 +- .../dri/i965/test_fs_saturate_propagation.cpp | 2 +- .../dri/i965/test_vec4_copy_propagation.cpp| 2 +- .../dri/i965/test_vec4_register_coalesce.cpp | 2 +- src/mesa/drivers/dri/r200/r200_vertprog.c | 17 +++ src/mesa/program/program.c | 55 ++ src/mesa/program/program.h | 2 +- src/mesa/state_tracker/st_cb_program.c | 38 --- 10 files changed, 68 insertions(+), 67 deletions(-) diff --git a/src/mesa/drivers/dri/i915/i915_fragprog.c b/src/mesa/drivers/dri/i915/i915_fragprog.c index 237d219..59d7959 100644 --- a/src/mesa/drivers/dri/i915/i915_fragprog.c +++ b/src/mesa/drivers/dri/i915/i915_fragprog.c @@ -1315,9 +1315,10 @@ static struct gl_program * i915NewProgram(struct gl_context * ctx, GLenum target, GLuint id) { switch (target) { - case GL_VERTEX_PROGRAM_ARB: - return _mesa_init_gl_program(CALLOC_STRUCT(gl_vertex_program), - target, id); + case GL_VERTEX_PROGRAM_ARB: { + struct gl_vertex_program *prog = CALLOC_STRUCT(gl_vertex_program); + return _mesa_init_gl_program(>Base, target, id); + } case GL_FRAGMENT_PROGRAM_ARB:{ struct i915_fragment_program *prog = @@ -1325,7 +1326,7 @@ i915NewProgram(struct gl_context * ctx, GLenum target, GLuint id) if (prog) { i915_init_program(I915_CONTEXT(ctx), prog); -return _mesa_init_gl_program(prog, target, id); +return _mesa_init_gl_program(>FragProg.Base, target, id); } else return NULL; diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c index 164c3d7..b547d07 100644 --- a/src/mesa/drivers/dri/i965/brw_program.c +++ b/src/mesa/drivers/dri/i965/brw_program.c @@ -69,7 +69,7 @@ static struct gl_program *brwNewProgram( struct gl_context *ctx, if (prog) { prog->id = get_new_program_id(brw->intelScreen); -return _mesa_init_gl_program(>program, target, id); +return _mesa_init_gl_program(>program.Base, target, id); } else return NULL; @@ -80,7 +80,7 @@ static struct gl_program *brwNewProgram( struct gl_context *ctx, if (prog) { prog->id = get_new_program_id(brw->intelScreen); -return _mesa_init_gl_program(>program, target, id); +return _mesa_init_gl_program(>program.Base, target, id); } else return NULL; @@ -102,7 +102,7 @@ static struct gl_program *brwNewProgram( struct gl_context *ctx, if (prog) { prog->id = get_new_program_id(brw->intelScreen); - return _mesa_init_gl_program(>program, target, id); + return _mesa_init_gl_program(>program.Base, target, id); } else { return NULL; } diff --git a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp index 7eee426..5f80f90 100644 --- a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp +++ b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp @@ -66,7 +66,7 @@ void cmod_propagation_test::SetUp() v = new cmod_propagation_fs_visitor(compiler, prog_data, shader); - _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0); + _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0); devinfo->gen = 4; } diff --git a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp index fefde4b..32e8b8f 100644 --- a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp +++ b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp @@ -66,7 +66,7 @@ void saturate_propagation_test::SetUp() v = new saturate_propagation_fs_visitor(compiler, prog_data, shader); - _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0); + _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0); devinfo->gen = 4; } diff --git a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp index 4a87e6e..e80b71b 100644 --- a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp @@ -98,7 +98,7 @@ void copy_propagation_test::SetUp() v = new copy_propagation_vec4_visitor(compiler, shader); - _mesa_init_gl_program(>program, GL_VERTEX_SHADER, 0); + _mesa_init_gl_program(>program.Base, GL_VERTEX_SHADER, 0); devinfo->gen = 4; } diff --git
Re: [Mesa-dev] [PATCH] program: convert _mesa_init_gl_program() to take struct gl_program *
Reviewed-by: Marek OlšákMarek On Tue, Oct 13, 2015 at 12:26 PM, Emil Velikov wrote: > Rather than accepting a void pointer, only to down and up cast around > it, convert the function to take the base (struct gl_program) pointer. > > Cc: Marek Olšák > Signed-off-by: Emil Velikov > --- > src/mesa/drivers/dri/i915/i915_fragprog.c | 9 ++-- > src/mesa/drivers/dri/i965/brw_program.c| 6 +-- > .../drivers/dri/i965/test_fs_cmod_propagation.cpp | 2 +- > .../dri/i965/test_fs_saturate_propagation.cpp | 2 +- > .../dri/i965/test_vec4_copy_propagation.cpp| 2 +- > .../dri/i965/test_vec4_register_coalesce.cpp | 2 +- > src/mesa/drivers/dri/r200/r200_vertprog.c | 17 +++ > src/mesa/program/program.c | 55 > ++ > src/mesa/program/program.h | 2 +- > src/mesa/state_tracker/st_cb_program.c | 38 --- > 10 files changed, 68 insertions(+), 67 deletions(-) > > diff --git a/src/mesa/drivers/dri/i915/i915_fragprog.c > b/src/mesa/drivers/dri/i915/i915_fragprog.c > index 237d219..59d7959 100644 > --- a/src/mesa/drivers/dri/i915/i915_fragprog.c > +++ b/src/mesa/drivers/dri/i915/i915_fragprog.c > @@ -1315,9 +1315,10 @@ static struct gl_program * > i915NewProgram(struct gl_context * ctx, GLenum target, GLuint id) > { > switch (target) { > - case GL_VERTEX_PROGRAM_ARB: > - return _mesa_init_gl_program(CALLOC_STRUCT(gl_vertex_program), > - target, id); > + case GL_VERTEX_PROGRAM_ARB: { > + struct gl_vertex_program *prog = CALLOC_STRUCT(gl_vertex_program); > + return _mesa_init_gl_program(>Base, target, id); > + } > > case GL_FRAGMENT_PROGRAM_ARB:{ > struct i915_fragment_program *prog = > @@ -1325,7 +1326,7 @@ i915NewProgram(struct gl_context * ctx, GLenum target, > GLuint id) > if (prog) { > i915_init_program(I915_CONTEXT(ctx), prog); > > -return _mesa_init_gl_program(prog, target, id); > +return _mesa_init_gl_program(>FragProg.Base, target, id); > } > else > return NULL; > diff --git a/src/mesa/drivers/dri/i965/brw_program.c > b/src/mesa/drivers/dri/i965/brw_program.c > index 164c3d7..b547d07 100644 > --- a/src/mesa/drivers/dri/i965/brw_program.c > +++ b/src/mesa/drivers/dri/i965/brw_program.c > @@ -69,7 +69,7 @@ static struct gl_program *brwNewProgram( struct gl_context > *ctx, >if (prog) { > prog->id = get_new_program_id(brw->intelScreen); > > -return _mesa_init_gl_program(>program, target, id); > +return _mesa_init_gl_program(>program.Base, target, id); >} >else > return NULL; > @@ -80,7 +80,7 @@ static struct gl_program *brwNewProgram( struct gl_context > *ctx, >if (prog) { > prog->id = get_new_program_id(brw->intelScreen); > > -return _mesa_init_gl_program(>program, target, id); > +return _mesa_init_gl_program(>program.Base, target, id); >} >else > return NULL; > @@ -102,7 +102,7 @@ static struct gl_program *brwNewProgram( struct > gl_context *ctx, >if (prog) { > prog->id = get_new_program_id(brw->intelScreen); > > - return _mesa_init_gl_program(>program, target, id); > + return _mesa_init_gl_program(>program.Base, target, id); >} else { > return NULL; >} > diff --git a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp > b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp > index 7eee426..5f80f90 100644 > --- a/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/test_fs_cmod_propagation.cpp > @@ -66,7 +66,7 @@ void cmod_propagation_test::SetUp() > > v = new cmod_propagation_fs_visitor(compiler, prog_data, shader); > > - _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0); > + _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0); > > devinfo->gen = 4; > } > diff --git a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp > b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp > index fefde4b..32e8b8f 100644 > --- a/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/test_fs_saturate_propagation.cpp > @@ -66,7 +66,7 @@ void saturate_propagation_test::SetUp() > > v = new saturate_propagation_fs_visitor(compiler, prog_data, shader); > > - _mesa_init_gl_program(>program, GL_FRAGMENT_SHADER, 0); > + _mesa_init_gl_program(>program.Base, GL_FRAGMENT_SHADER, 0); > > devinfo->gen = 4; > } > diff --git a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp > b/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp > index 4a87e6e..e80b71b 100644 > --- a/src/mesa/drivers/dri/i965/test_vec4_copy_propagation.cpp >
Re: [Mesa-dev] [PATCH 1/4] tgsi: add option to dump floats as hex values
Reviewed-by: Marek OlšákMarek On Tue, Oct 13, 2015 at 6:40 AM, Dave Airlie wrote: > This adds support to the parser to accept hex values as floats, > and then adds support to the dumper to allow the user to select > to dump float as 32-bit hex numbers. > > This is required to get accurate values for virgl use of TGSI. > > Signed-off-by: Dave Airlie > --- > src/gallium/auxiliary/tgsi/tgsi_dump.c | 19 ++- > src/gallium/auxiliary/tgsi/tgsi_dump.h | 2 ++ > src/gallium/auxiliary/tgsi/tgsi_text.c | 11 ++- > 3 files changed, 30 insertions(+), 2 deletions(-) > > diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.c > b/src/gallium/auxiliary/tgsi/tgsi_dump.c > index 8ceb5b4..33f6a56 100644 > --- a/src/gallium/auxiliary/tgsi/tgsi_dump.c > +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.c > @@ -29,6 +29,7 @@ > #include "util/u_string.h" > #include "util/u_math.h" > #include "util/u_memory.h" > +#include "util/u_math.h" > #include "tgsi_dump.h" > #include "tgsi_info.h" > #include "tgsi_iterate.h" > @@ -43,6 +44,8 @@ struct dump_ctx > { > struct tgsi_iterate_context iter; > > + boolean dump_float_as_hex; > + > uint instno; > uint immno; > int indent; > @@ -88,6 +91,7 @@ dump_enum( > #define SID(I) ctx->dump_printf( ctx, "%d", I ) > #define FLT(F) ctx->dump_printf( ctx, "%10.4f", F ) > #define DBL(D) ctx->dump_printf( ctx, "%10.8f", D ) > +#define HFLT(F) ctx->dump_printf( ctx, "0x%08x", fui((F)) ) > #define ENM(E,ENUMS)dump_enum( ctx, E, ENUMS, sizeof( ENUMS ) / sizeof( > *ENUMS ) ) > > const char * > @@ -251,7 +255,10 @@ dump_imm_data(struct tgsi_iterate_context *iter, > break; >} >case TGSI_IMM_FLOAT32: > - FLT( data[i].Float ); > + if (ctx->dump_float_as_hex) > +HFLT( data[i].Float ); > + else > +FLT( data[i].Float ); > break; >case TGSI_IMM_UINT32: > UID(data[i].Uint); > @@ -681,6 +688,11 @@ tgsi_dump_to_file(const struct tgsi_token *tokens, uint > flags, FILE *file) > ctx.indentation = 0; > ctx.file = file; > > + if (flags & TGSI_DUMP_FLOAT_AS_HEX) > + ctx.dump_float_as_hex = TRUE; > + else > + ctx.dump_float_as_hex = FALSE; > + > tgsi_iterate_shader( tokens, ); > } > > @@ -749,6 +761,11 @@ tgsi_dump_str( > ctx.ptr = str; > ctx.left = (int)size; > > + if (flags & TGSI_DUMP_FLOAT_AS_HEX) > + ctx.base.dump_float_as_hex = TRUE; > + else > + ctx.base.dump_float_as_hex = FALSE; > + > tgsi_iterate_shader( tokens, ); > } > > diff --git a/src/gallium/auxiliary/tgsi/tgsi_dump.h > b/src/gallium/auxiliary/tgsi/tgsi_dump.h > index 7c8f92e..b98 100644 > --- a/src/gallium/auxiliary/tgsi/tgsi_dump.h > +++ b/src/gallium/auxiliary/tgsi/tgsi_dump.h > @@ -38,6 +38,8 @@ > extern "C" { > #endif > > +#define TGSI_DUMP_FLOAT_AS_HEX (1 << 0) > + > void > tgsi_dump_str( > const struct tgsi_token *tokens, > diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c > b/src/gallium/auxiliary/tgsi/tgsi_text.c > index 3e3ed5b..4a82c9b 100644 > --- a/src/gallium/auxiliary/tgsi/tgsi_text.c > +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c > @@ -195,8 +195,15 @@ static boolean parse_float( const char **pcur, float > *val ) > boolean integral_part = FALSE; > boolean fractional_part = FALSE; > > - *val = (float) atof( cur ); > + if (*cur == '0' && *(cur + 1) == 'x') { > + union fi fi; > + fi.ui = strtoul(cur, NULL, 16); > + *val = fi.f; > + cur += 10; > + goto out; > + } > > + *val = (float) atof( cur ); > if (*cur == '-' || *cur == '+') >cur++; > if (is_digit( cur )) { > @@ -228,6 +235,8 @@ static boolean parse_float( const char **pcur, float *val > ) >else > return FALSE; > } > + > +out: > *pcur = cur; > return TRUE; > } > -- > 2.4.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/10] radeonsi: unify shader delete functions
On 11.10.2015 10:11, Marek Olšák wrote: > From: Marek Olšák> > --- > src/gallium/drivers/radeonsi/si_state_shaders.c | 84 > + > 1 file changed, 17 insertions(+), 67 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c > b/src/gallium/drivers/radeonsi/si_state_shaders.c > index 9d05cb5..cc053bb 100644 > --- a/src/gallium/drivers/radeonsi/si_state_shaders.c > +++ b/src/gallium/drivers/radeonsi/si_state_shaders.c > @@ -907,11 +907,21 @@ static void si_bind_ps_shader(struct pipe_context *ctx, > void *state) > si_mark_atom_dirty(sctx, >cb_target_mask); > } > > -static void si_delete_shader_selector(struct pipe_context *ctx, > - struct si_shader_selector *sel) > +static void si_delete_shader_selector(struct pipe_context *ctx, void *state) > { > struct si_context *sctx = (struct si_context *)ctx; > + struct si_shader_selector *sel = (struct si_shader_selector *)state; > struct si_shader *p = sel->current, *c; > + struct si_shader_selector **current_shader[SI_NUM_SHADERS] = { > + [PIPE_SHADER_VERTEX] = >vs_shader, > + [PIPE_SHADER_TESS_CTRL] = >tcs_shader, > + [PIPE_SHADER_TESS_EVAL] = >tes_shader, > + [PIPE_SHADER_GEOMETRY] = >gs_shader, > + [PIPE_SHADER_FRAGMENT] = >ps_shader, > + }; A switch (sel->type) statement might allow the compiler to generate more efficient code than this. Either way though, this patch is Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 02/10] radeonsi: cleanup si_llvm_init_export_args
On 11.10.2015 10:11, Marek Olšák wrote: > From: Marek OlšákThe shortlog should say "clean up" (verb) instead of "cleanup" (noun). Same for patches 9 & 10. > diff --git a/src/gallium/drivers/radeonsi/si_shader.c > b/src/gallium/drivers/radeonsi/si_shader.c > index 32a702f..109a805 100644 > --- a/src/gallium/drivers/radeonsi/si_shader.c > +++ b/src/gallium/drivers/radeonsi/si_shader.c > @@ -1306,6 +1306,22 @@ static void si_llvm_init_export_args(struct > lp_build_tgsi_context *bld_base, > unsigned compressed = 0; > unsigned chan; > > + /* XXX: This controls which components of the output > + * registers actually get exported. (e.g bit 0 means export > + * X component, bit 1 means export Y component, etc.) I'm > + * hard coding this to 0xf for now. In the future, we might > + * want to do something else. */ The "*/" should go on its own line. With those fixed, this patch and patches 9 & 10 are Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Plan to work on opt_peephole_sel optimization for vec4
On lun, 2015-10-12 at 15:55 +0300, Francisco Jerez wrote: > Antía Puenteswrites: > > On dom, 2015-10-11 at 10:35 -0700, Matt Turner wrote: > >> I would expect big improvements in the vec4 backend from making its > >> copy propagation pass global. > > > > I will be working on global copy propagation then. > > > > Heh... I suggest you hold yourself from working on this for a while. > > While working on some fragment discard improvements last week I noticed > that a seemingly harmless change in the CFG could cause shitloads of > shader-db regressions (>8k). The reason is that a bunch of optimization > passes of the compiler back-end are fully local (including VEC4 copy > propagation) and become ineffective anytime your dataflow happens to > traverse edges of the CFG. > > Instead of reimplementing the intricate dataflow propagation logic (e.g. > what fs_copy_prop_dataflow does) in every pass (or rather every pass > that needs to consider two or more instructions at once), I've given a > shot at implementing use-def chains in the back-end (didn't have a > bigger hammer at hand). Use-def chains should simplify this sort of > optimization passes massively and allow them to work globally without > any additional effort: Instead of looping back the instruction list to > find out whether the last instruction that wrote a register had some > specific form (c.f. cmod_propagation, saturate_propagation, > compute_to_mrf, which are all local and have O(n^2) complexity), you > just look-up the use entry for the instruction you're interested in and > look at its immediate neighbours in the use-def graph, which represent > the instruction(s) that generated the variable. > > Use-def chains can be implemented trivially on an SSA IR so as an added > benefit it should become easier to port optimization passes to SSA once > they've been rewritten in terms of use-def chains. > > I've got an algorithm to construct the use-def graph basically working > on the FS back-end (I'm not sharing it publicly yet though because in > its current form it would likely offend some people's sensibilities -- I > could send you a branch in private though if you're interested). I > haven't started porting other optimization passes to use it yet, but > it's next on my to-do list. Hi Francisco, that sounds great. Do you intend to do the same for vec4? Otherwise, I can try to port your work to vec4 once you are done with FS. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: ensure RM is set
On Tue, Oct 13, 2015 at 11:25:08AM +0100, Emil Velikov wrote: > On 10 October 2015 at 07:42, Jonathan Graywrote: > > GNU make predefines RM to rm -f but this is not required by POSIX > > so ensure that RM is set. This fixes "make clean" on OpenBSD. > > > > v2: use AC_CHECK_PROG > > > I'm ambivalent if we go the AC_CHECK_PROG vs ?= route. > > I'll let others pick their favourite, but both are > Reviewed-by: Emil Velikov > > -Emil The ?= diff was a screwup as that syntax is valid in make but not shell. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
On Tue, 2015-10-13 at 15:17 +0300, Francisco Jerez wrote: > Iago Toral Quirogawrites: > > > This fixes the following test: > > > > [require] > > GL >= 3.3 > > GLSL >= 3.30 > > GL_ARB_shader_storage_buffer_object > > > > [fragment shader] > > #version 330 > > #extension GL_ARB_shader_storage_buffer_object: require > > > > buffer SSBO { > > mat4 sm4; > > }; > > > > > > mat4 um4; > > > > void main() { > > sm4 *= um4; > > This is using the value of "um4", which is never assigned, so liveness > analysis will correctly extend its live interval up to the start of the > block. > > The other problem here seems to be that the liveness analysis pass > doesn't consider scalar writes (like the ones emitted by the > combine_constants optimization pass and by emit_uniformize()) to fully > define the contents of a register, so it will incorrectly extend up the > live interval of registers defined using scalar writes even if all > components ever used in the shader have been defined individually. > Incidentally the use-def-chains-based implementation of liveness > analysis I'm working on will fix this issue easily. Great, thanks Curro! Once you make your change public I can verify that they fix this too. BTW, even if your changes fix this I wonder if my patch is still valid: control-flow analysis should not add anything relevant to liveness analysis if we only have one block, right? Iago > > sm4[0][0] = 0.0; > > } > > > > [test] > > link success > > > > where our liveness analysis works really bad because the control-flow part > > will expand register liveness to the end of only block we have (so every > > register will be marked alive until the end of the program). This > > artificially > > increases register pressure to a point in which we run out of registers. > > Of course, this is only a simple optimization for a trivial case, the > > underlying problem still exists and would manifest when we have more than > > one block, but it helps simple shaders such as the one in the example above > > without any effort. I guess the real fix would involve re-thinking parts of > > the > > liveness analysis algorithm... > > > > Jordan, there is another thing that we can improve for this shader that is > > specific to SSBOs: we emit the same ssbo load multiple times because we are > > playing it safe just in case there are writes in between. I think we can > > try to > > do better and not re-issue the same load if we don't have ssbo stores to > > the same address in between. I'll try to come up with a patch for this > > (hopefully we can do this inside lower_ubo_reference). > > > > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just > > indentation fixes in the code around these. > > > > Iago Toral Quiroga (4): > > i965/fs: Fix indentation in fs_live_variables::compute_start_end > > i965/fs: skip control-flow aware liveness analysis if we only have 1 > > block > > i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals > > i965/vec4: skip control-flow aware liveness analysis if we only have 1 > > block > > > > .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 > > ++ > > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 23 > > + > > 2 files changed, 30 insertions(+), 17 deletions(-) > > > > -- > > 1.9.1 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
Iago Toralwrites: > On Tue, 2015-10-13 at 15:17 +0300, Francisco Jerez wrote: >> Iago Toral Quiroga writes: >> >> > This fixes the following test: >> > >> > [require] >> > GL >= 3.3 >> > GLSL >= 3.30 >> > GL_ARB_shader_storage_buffer_object >> > >> > [fragment shader] >> > #version 330 >> > #extension GL_ARB_shader_storage_buffer_object: require >> > >> > buffer SSBO { >> > mat4 sm4; >> > }; >> > >> > >> > mat4 um4; >> > >> > void main() { >> > sm4 *= um4; >> >> This is using the value of "um4", which is never assigned, so liveness >> analysis will correctly extend its live interval up to the start of the >> block. >> >> The other problem here seems to be that the liveness analysis pass >> doesn't consider scalar writes (like the ones emitted by the >> combine_constants optimization pass and by emit_uniformize()) to fully >> define the contents of a register, so it will incorrectly extend up the >> live interval of registers defined using scalar writes even if all >> components ever used in the shader have been defined individually. >> Incidentally the use-def-chains-based implementation of liveness >> analysis I'm working on will fix this issue easily. > > Great, thanks Curro! Once you make your change public I can verify that > they fix this too. > > BTW, even if your changes fix this I wonder if my patch is still valid: > control-flow analysis should not add anything relevant to liveness > analysis if we only have one block, right? > Yeah, that should be okay. I'm not strongly opposed to this change as temporary hack-around for the meantime, but the comments seem somewhat misleading. Either way patches 1 and 3 (i.e. the indentation fixes) are: Reviewed-by: Francisco Jerez > Iago > >> > sm4[0][0] = 0.0; >> > } >> > >> > [test] >> > link success >> > >> > where our liveness analysis works really bad because the control-flow part >> > will expand register liveness to the end of only block we have (so every >> > register will be marked alive until the end of the program). This >> > artificially >> > increases register pressure to a point in which we run out of registers. >> > Of course, this is only a simple optimization for a trivial case, the >> > underlying problem still exists and would manifest when we have more than >> > one block, but it helps simple shaders such as the one in the example above >> > without any effort. I guess the real fix would involve re-thinking parts >> > of the >> > liveness analysis algorithm... >> > >> > Jordan, there is another thing that we can improve for this shader that is >> > specific to SSBOs: we emit the same ssbo load multiple times because we are >> > playing it safe just in case there are writes in between. I think we can >> > try to >> > do better and not re-issue the same load if we don't have ssbo stores to >> > the same address in between. I'll try to come up with a patch for this >> > (hopefully we can do this inside lower_ubo_reference). >> > >> > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just >> > indentation fixes in the code around these. >> > >> > Iago Toral Quiroga (4): >> > i965/fs: Fix indentation in fs_live_variables::compute_start_end >> > i965/fs: skip control-flow aware liveness analysis if we only have 1 >> > block >> > i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals >> > i965/vec4: skip control-flow aware liveness analysis if we only have 1 >> > block >> > >> > .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 >> > ++ >> > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 23 >> > + >> > 2 files changed, 30 insertions(+), 17 deletions(-) >> > >> > -- >> > 1.9.1 >> > >> > ___ >> > mesa-dev mailing list >> > mesa-dev@lists.freedesktop.org >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Plan to work on opt_peephole_sel optimization for vec4
Antía Puenteswrites: > On lun, 2015-10-12 at 15:55 +0300, Francisco Jerez wrote: >> Antía Puentes writes: >> > On dom, 2015-10-11 at 10:35 -0700, Matt Turner wrote: >> >> I would expect big improvements in the vec4 backend from making its >> >> copy propagation pass global. >> > >> > I will be working on global copy propagation then. >> > >> >> Heh... I suggest you hold yourself from working on this for a while. >> >> While working on some fragment discard improvements last week I noticed >> that a seemingly harmless change in the CFG could cause shitloads of >> shader-db regressions (>8k). The reason is that a bunch of optimization >> passes of the compiler back-end are fully local (including VEC4 copy >> propagation) and become ineffective anytime your dataflow happens to >> traverse edges of the CFG. >> >> Instead of reimplementing the intricate dataflow propagation logic (e.g. >> what fs_copy_prop_dataflow does) in every pass (or rather every pass >> that needs to consider two or more instructions at once), I've given a >> shot at implementing use-def chains in the back-end (didn't have a >> bigger hammer at hand). Use-def chains should simplify this sort of >> optimization passes massively and allow them to work globally without >> any additional effort: Instead of looping back the instruction list to >> find out whether the last instruction that wrote a register had some >> specific form (c.f. cmod_propagation, saturate_propagation, >> compute_to_mrf, which are all local and have O(n^2) complexity), you >> just look-up the use entry for the instruction you're interested in and >> look at its immediate neighbours in the use-def graph, which represent >> the instruction(s) that generated the variable. >> >> Use-def chains can be implemented trivially on an SSA IR so as an added >> benefit it should become easier to port optimization passes to SSA once >> they've been rewritten in terms of use-def chains. >> >> I've got an algorithm to construct the use-def graph basically working >> on the FS back-end (I'm not sharing it publicly yet though because in >> its current form it would likely offend some people's sensibilities -- I >> could send you a branch in private though if you're interested). I >> haven't started porting other optimization passes to use it yet, but >> it's next on my to-do list. > > Hi Francisco, > > that sounds great. Do you intend to do the same for vec4? Otherwise, I > can try to port your work to vec4 once you are done with FS. Pretty much the whole use-def analysis logic is independent from the back-end, so it should be straightforward to port to VEC4. However I might need some help at some point rewriting other optimization passes in terms of use-def chains instead of their current hand-rolled dataflow analysis -- I'll get back to you once I have something solid for you to use as starting point. ;) signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] glsl: calculate TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources
Series is: Reviewed-by: Samuel Iglesias GonsálvezThanks! Sam On 13/10/15 13:40, Tapani Pälli wrote: > Patch moves existing calculation code from shader_query.cpp to happen > during program resource list creation. > > No Piglit or CTS regressions were observed during testing. > > Signed-off-by: Tapani Pälli > --- > src/glsl/linker.cpp| 241 > src/mesa/main/shader_query.cpp | 244 > + > 2 files changed, 243 insertions(+), 242 deletions(-) > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > index a97b4ef..3422fba 100644 > --- a/src/glsl/linker.cpp > +++ b/src/glsl/linker.cpp > @@ -3389,6 +3389,242 @@ add_packed_varyings(struct gl_shader_program *shProg, > int stage) > return true; > } > > +static char* > +get_top_level_name(const char *name) > +{ > + const char *first_dot = strchr(name, '.'); > + const char *first_square_bracket = strchr(name, '['); > + int name_size = 0; > + /* From ARB_program_interface_query spec: > +* > +* "For the property TOP_LEVEL_ARRAY_SIZE, a single integer identifying > the > +* number of active array elements of the top-level shader storage block > +* member containing to the active variable is written to . If > the > +* top-level block member is not declared as an array, the value one is > +* written to . If the top-level block member is an array with > no > +* declared size, the value zero is written to . > +*/ > + > + /* The buffer variable is on top level.*/ > + if (!first_square_bracket && !first_dot) > + name_size = strlen(name); > + else if ((!first_square_bracket || > +(first_dot && first_dot < first_square_bracket))) > + name_size = first_dot - name; > + else > + name_size = first_square_bracket - name; > + > + return strndup(name, name_size); > +} > + > +static char* > +get_var_name(const char *name) > +{ > + const char *first_dot = strchr(name, '.'); > + > + if (!first_dot) > + return strdup(name); > + > + return strndup(first_dot+1, strlen(first_dot) - 1); > +} > + > +static bool > +is_top_level_shader_storage_block_member(const char* name, > + const char* interface_name, > + const char* field_name) > +{ > + bool result = false; > + > + /* If the given variable is already a top-level shader storage > +* block member, then return array_size = 1. > +* We could have two possibilities: if we have an instanced > +* shader storage block or not instanced. > +* > +* For the first, we check create a name as it was in top level and > +* compare it with the real name. If they are the same, then > +* the variable is already at top-level. > +* > +* Full instanced name is: interface name + '.' + var name + > +*NULL character > +*/ > + int name_length = strlen(interface_name) + 1 + strlen(field_name) + 1; > + char *full_instanced_name = (char *) calloc(name_length, sizeof(char)); > + if (!full_instanced_name) { > + fprintf(stderr, "%s: Cannot allocate space for name\n", __func__); > + return false; > + } > + > + snprintf(full_instanced_name, name_length, "%s.%s", > +interface_name, field_name); > + > + /* Check if its top-level shader storage block member of an > +* instanced interface block, or of a unnamed interface block. > +*/ > + if (strcmp(name, full_instanced_name) == 0 || > + strcmp(name, field_name) == 0) > + result = true; > + > + free(full_instanced_name); > + return result; > +} > + > +static void > +calculate_array_size(struct gl_shader_program *shProg, > + struct gl_uniform_storage *uni) > +{ > + int block_index = uni->block_index; > + int array_size = -1; > + char *var_name = get_top_level_name(uni->name); > + char *interface_name = > + get_top_level_name(shProg->UniformBlocks[block_index].Name); > + > + if (strcmp(var_name, interface_name) == 0) { > + /* Deal with instanced array of SSBOs */ > + char *temp_name = get_var_name(uni->name); > + free(var_name); > + var_name = get_top_level_name(temp_name); > + free(temp_name); > + } > + > + for (unsigned i = 0; i < shProg->NumShaders; i++) { > + if (shProg->Shaders[i] == NULL) > + continue; > + > + const gl_shader *stage = shProg->Shaders[i]; > + foreach_in_list(ir_instruction, node, stage->ir) { > + ir_variable *var = node->as_variable(); > + if (!var || !var->get_interface_type() || > + var->data.mode != ir_var_shader_storage) > +continue; > + > + const glsl_type *interface = var->get_interface_type(); > + > + if (strcmp(interface_name, interface->name) != 0) > +continue; > + > + for (unsigned i = 0; i <
Re: [Mesa-dev] [PATCH 01/11] vbo: get rid of needless NR_MAT_ATTRIBS constant
For the series: Reviewed-by: Marek OlšákMarek On Tue, Oct 13, 2015 at 4:45 AM, Brian Paul wrote: > --- > src/mesa/vbo/vbo_context.c | 9 +++-- > 1 file changed, 3 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c > index 802955d..b5dc45c 100644 > --- a/src/mesa/vbo/vbo_context.c > +++ b/src/mesa/vbo/vbo_context.c > @@ -33,7 +33,6 @@ > #include "vbo.h" > #include "vbo_context.h" > > -#define NR_MAT_ATTRIBS 12 > > static GLuint check_size( const GLfloat *attr ) > { > @@ -108,14 +107,12 @@ static void init_mat_currval(struct gl_context *ctx) >>currval[VBO_ATTRIB_MAT_FRONT_AMBIENT]; > GLuint i; > > - assert(NR_MAT_ATTRIBS == MAT_ATTRIB_MAX); > - > - memset(arrays, 0, sizeof(*arrays) * NR_MAT_ATTRIBS); > + memset(arrays, 0, sizeof(*arrays) * MAT_ATTRIB_MAX); > > /* Set up a constant (StrideB == 0) array for each current > * attribute: > */ > - for (i = 0; i < NR_MAT_ATTRIBS; i++) { > + for (i = 0; i < MAT_ATTRIB_MAX; i++) { >struct gl_client_array *cl = [i]; > >/* Size is fixed for the material attributes, for others will > @@ -175,7 +172,7 @@ GLboolean _vbo_CreateContext( struct gl_context *ctx ) >for (i = 0; i < ARRAY_SIZE(vbo->map_vp_none); i++) > vbo->map_vp_none[i] = i; >/* map material attribs to generic slots */ > - for (i = 0; i < NR_MAT_ATTRIBS; i++) > + for (i = 0; i < MAT_ATTRIB_MAX; i++) > vbo->map_vp_none[VERT_ATTRIB_GENERIC(i)] > = VBO_ATTRIB_MAT_FRONT_AMBIENT + i; > > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] glsl: calculate TOP_LEVEL_ARRAY_SIZE and STRIDE when adding resources
Patch moves existing calculation code from shader_query.cpp to happen during program resource list creation. No Piglit or CTS regressions were observed during testing. Signed-off-by: Tapani Pälli--- src/glsl/linker.cpp| 241 src/mesa/main/shader_query.cpp | 244 + 2 files changed, 243 insertions(+), 242 deletions(-) diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index a97b4ef..3422fba 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -3389,6 +3389,242 @@ add_packed_varyings(struct gl_shader_program *shProg, int stage) return true; } +static char* +get_top_level_name(const char *name) +{ + const char *first_dot = strchr(name, '.'); + const char *first_square_bracket = strchr(name, '['); + int name_size = 0; + /* From ARB_program_interface_query spec: +* +* "For the property TOP_LEVEL_ARRAY_SIZE, a single integer identifying the +* number of active array elements of the top-level shader storage block +* member containing to the active variable is written to . If the +* top-level block member is not declared as an array, the value one is +* written to . If the top-level block member is an array with no +* declared size, the value zero is written to . +*/ + + /* The buffer variable is on top level.*/ + if (!first_square_bracket && !first_dot) + name_size = strlen(name); + else if ((!first_square_bracket || +(first_dot && first_dot < first_square_bracket))) + name_size = first_dot - name; + else + name_size = first_square_bracket - name; + + return strndup(name, name_size); +} + +static char* +get_var_name(const char *name) +{ + const char *first_dot = strchr(name, '.'); + + if (!first_dot) + return strdup(name); + + return strndup(first_dot+1, strlen(first_dot) - 1); +} + +static bool +is_top_level_shader_storage_block_member(const char* name, + const char* interface_name, + const char* field_name) +{ + bool result = false; + + /* If the given variable is already a top-level shader storage +* block member, then return array_size = 1. +* We could have two possibilities: if we have an instanced +* shader storage block or not instanced. +* +* For the first, we check create a name as it was in top level and +* compare it with the real name. If they are the same, then +* the variable is already at top-level. +* +* Full instanced name is: interface name + '.' + var name + +*NULL character +*/ + int name_length = strlen(interface_name) + 1 + strlen(field_name) + 1; + char *full_instanced_name = (char *) calloc(name_length, sizeof(char)); + if (!full_instanced_name) { + fprintf(stderr, "%s: Cannot allocate space for name\n", __func__); + return false; + } + + snprintf(full_instanced_name, name_length, "%s.%s", +interface_name, field_name); + + /* Check if its top-level shader storage block member of an +* instanced interface block, or of a unnamed interface block. +*/ + if (strcmp(name, full_instanced_name) == 0 || + strcmp(name, field_name) == 0) + result = true; + + free(full_instanced_name); + return result; +} + +static void +calculate_array_size(struct gl_shader_program *shProg, + struct gl_uniform_storage *uni) +{ + int block_index = uni->block_index; + int array_size = -1; + char *var_name = get_top_level_name(uni->name); + char *interface_name = + get_top_level_name(shProg->UniformBlocks[block_index].Name); + + if (strcmp(var_name, interface_name) == 0) { + /* Deal with instanced array of SSBOs */ + char *temp_name = get_var_name(uni->name); + free(var_name); + var_name = get_top_level_name(temp_name); + free(temp_name); + } + + for (unsigned i = 0; i < shProg->NumShaders; i++) { + if (shProg->Shaders[i] == NULL) + continue; + + const gl_shader *stage = shProg->Shaders[i]; + foreach_in_list(ir_instruction, node, stage->ir) { + ir_variable *var = node->as_variable(); + if (!var || !var->get_interface_type() || + var->data.mode != ir_var_shader_storage) +continue; + + const glsl_type *interface = var->get_interface_type(); + + if (strcmp(interface_name, interface->name) != 0) +continue; + + for (unsigned i = 0; i < interface->length; i++) { +const glsl_struct_field *field = >fields.structure[i]; +if (strcmp(field->name, var_name) != 0) + continue; +/* From GL_ARB_program_interface_query spec: + * + * "For the property TOP_LEVEL_ARRAY_SIZE, a single integer + * identifying the number of active array elements of the top-level +
[Mesa-dev] [PATCH 1/2] glsl: add top level array size and stride to gl_uniform_storage
Patch adds 2 new fields to gl_uniform_storage so that we don't need to calculate these values during runtime shader queries. This is required by upcoming changes to free GLSL IR after linking. Patch moves 3 booleans inside structure so that structure size stays the same after this change. Signed-off-by: Tapani Pälli--- src/glsl/ir_uniform.h | 32 ++-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/src/glsl/ir_uniform.h b/src/glsl/ir_uniform.h index 50fe76b..1854279 100644 --- a/src/glsl/ir_uniform.h +++ b/src/glsl/ir_uniform.h @@ -162,6 +162,22 @@ struct gl_uniform_storage { /** @} */ /** +* This is a compiler-generated uniform that should not be advertised +* via the API. +*/ + bool hidden; + + /** +* This is a built-in uniform that should not be modified through any gl API. +*/ + bool builtin; + + /** +* This is a shader storage buffer variable, not an uniform. +*/ + bool is_shader_storage; + + /** * Index within gl_shader_program::AtomicBuffers[] of the atomic * counter buffer this uniform is stored in, or -1 if this is not * an atomic counter. @@ -181,20 +197,16 @@ struct gl_uniform_storage { unsigned num_compatible_subroutines; /** -* This is a compiler-generated uniform that should not be advertised -* via the API. +* A single integer identifying the number of active array elements of +* the top-level shader storage block member (GL_TOP_LEVEL_ARRAY_SIZE). */ - bool hidden; + unsigned top_level_array_size; /** -* This is a built-in uniform that should not be modified through any gl API. +* A single integer identifying the stride between array elements of the +* top-level shader storage block member. (GL_TOP_LEVEL_ARRAY_STRIDE). */ - bool builtin; - - /** -* This is a shader storage buffer variable, not an uniform. -*/ - bool is_shader_storage; + unsigned top_level_array_stride; }; #ifdef __cplusplus -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
Iago Toral Quirogawrites: > This fixes the following test: > > [require] > GL >= 3.3 > GLSL >= 3.30 > GL_ARB_shader_storage_buffer_object > > [fragment shader] > #version 330 > #extension GL_ARB_shader_storage_buffer_object: require > > buffer SSBO { > mat4 sm4; > }; > > > mat4 um4; > > void main() { > sm4 *= um4; This is using the value of "um4", which is never assigned, so liveness analysis will correctly extend its live interval up to the start of the block. The other problem here seems to be that the liveness analysis pass doesn't consider scalar writes (like the ones emitted by the combine_constants optimization pass and by emit_uniformize()) to fully define the contents of a register, so it will incorrectly extend up the live interval of registers defined using scalar writes even if all components ever used in the shader have been defined individually. Incidentally the use-def-chains-based implementation of liveness analysis I'm working on will fix this issue easily. > sm4[0][0] = 0.0; > } > > [test] > link success > > where our liveness analysis works really bad because the control-flow part > will expand register liveness to the end of only block we have (so every > register will be marked alive until the end of the program). This artificially > increases register pressure to a point in which we run out of registers. > Of course, this is only a simple optimization for a trivial case, the > underlying problem still exists and would manifest when we have more than > one block, but it helps simple shaders such as the one in the example above > without any effort. I guess the real fix would involve re-thinking parts of > the > liveness analysis algorithm... > > Jordan, there is another thing that we can improve for this shader that is > specific to SSBOs: we emit the same ssbo load multiple times because we are > playing it safe just in case there are writes in between. I think we can try > to > do better and not re-issue the same load if we don't have ssbo stores to > the same address in between. I'll try to come up with a patch for this > (hopefully we can do this inside lower_ubo_reference). > > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just > indentation fixes in the code around these. > > Iago Toral Quiroga (4): > i965/fs: Fix indentation in fs_live_variables::compute_start_end > i965/fs: skip control-flow aware liveness analysis if we only have 1 > block > i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals > i965/vec4: skip control-flow aware liveness analysis if we only have 1 > block > > .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 > ++ > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 23 + > 2 files changed, 30 insertions(+), 17 deletions(-) > > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC 2/2] gallium: add tegra support
On Mon, Oct 12, 2015 at 12:09 AM, Christian Gmeinerwrote: > This commit adds tegra support, which uses the renderonly driver > library. > > Signed-off-by: Christian Gmeiner > --- > configure.ac | 19 +++- > src/gallium/Makefile.am| 6 +++ > .../auxiliary/target-helpers/inline_drm_helper.h | 29 > src/gallium/drivers/tegra/Automake.inc | 10 + > src/gallium/drivers/tegra/Makefile.am | 9 > src/gallium/targets/dri/Makefile.am| 2 + > src/gallium/winsys/tegra/drm/Android.mk| 34 +++ > src/gallium/winsys/tegra/drm/Makefile.am | 33 ++ > src/gallium/winsys/tegra/drm/Makefile.sources | 3 ++ > src/gallium/winsys/tegra/drm/tegra_drm_public.h| 31 + > src/gallium/winsys/tegra/drm/tegra_drm_winsys.c| 51 > ++ > 11 files changed, 226 insertions(+), 1 deletion(-) > create mode 100644 src/gallium/drivers/tegra/Automake.inc > create mode 100644 src/gallium/drivers/tegra/Makefile.am > create mode 100644 src/gallium/winsys/tegra/drm/Android.mk > create mode 100644 src/gallium/winsys/tegra/drm/Makefile.am > create mode 100644 src/gallium/winsys/tegra/drm/Makefile.sources > create mode 100644 src/gallium/winsys/tegra/drm/tegra_drm_public.h > create mode 100644 src/gallium/winsys/tegra/drm/tegra_drm_winsys.c > > diff --git a/configure.ac b/configure.ac > index ea485b1..9fb8244 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -75,6 +75,7 @@ LIBDRM_INTEL_REQUIRED=2.4.61 > LIBDRM_NVVIEUX_REQUIRED=2.4.33 > LIBDRM_NOUVEAU_REQUIRED=2.4.62 > LIBDRM_FREEDRENO_REQUIRED=2.4.65 > +LIBDRM_TEGRA_REQUIRED=2.4.58 > DRI2PROTO_REQUIRED=2.6 > DRI3PROTO_REQUIRED=1.0 > PRESENTPROTO_REQUIRED=1.0 > @@ -864,7 +865,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast" > AC_ARG_WITH([gallium-drivers], > [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@], > [comma delimited Gallium drivers list, e.g. > -"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4" > +"i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,tegra,vc4" > @<:@default=r300,r600,svga,swrast@:>@])], > [with_gallium_drivers="$withval"], > [with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"]) > @@ -2166,6 +2167,12 @@ if test -n "$with_gallium_drivers"; then > HAVE_GALLIUM_LLVMPIPE=yes > fi > ;; > +xtegra) > +HAVE_GALLIUM_TEGRA=yes > +PKG_CHECK_MODULES([TEGRA], [libdrm_tegra >= > $LIBDRM_TEGRA_REQUIRED]) > +gallium_require_drm "tegra" > +gallium_require_drm_loader > +;; > xvc4) > HAVE_GALLIUM_VC4=yes > gallium_require_drm "vc4" > @@ -2181,6 +2188,13 @@ if test -n "$with_gallium_drivers"; then > done > fi > > +dnl We need to validate some needed dependencies for renderonly drivers. > + > +if test "x$HAVE_GALLIUM_NOUVEAU" != xyes -a "x$HAVE_GALLIUM_TEGRA" == xyes > ; then > +AC_ERROR([Building with tegra requires that nouveau]) > +fi > + > + > dnl Set LLVM_LIBS - This is done after the driver configuration so > dnl that drivers can add additional components to LLVM_COMPONENTS. > dnl Previously, gallium drivers were updating LLVM_LIBS directly > @@ -2245,6 +2259,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test > "x$HAVE_GALLIUM_FREEDRENO" = xyes) > AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes) > AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes) > AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes) > +AM_CONDITIONAL(HAVE_GALLIUM_TEGRA, test "x$HAVE_GALLIUM_TEGRA" = xyes) > > AM_CONDITIONAL(HAVE_GALLIUM_STATIC_TARGETS, test > "x$enable_shared_pipe_drivers" = xno) > > @@ -2364,6 +2379,7 @@ AC_CONFIG_FILES([Makefile > src/gallium/drivers/renderonly/Makefile > src/gallium/drivers/softpipe/Makefile > src/gallium/drivers/svga/Makefile > + src/gallium/drivers/tegra/Makefile > src/gallium/drivers/trace/Makefile > src/gallium/drivers/vc4/Makefile > src/gallium/state_trackers/clover/Makefile > @@ -2406,6 +2422,7 @@ AC_CONFIG_FILES([Makefile > src/gallium/winsys/sw/wrapper/Makefile > src/gallium/winsys/sw/xlib/Makefile > src/gallium/winsys/vc4/drm/Makefile > + src/gallium/winsys/tegra/drm/Makefile > src/gbm/Makefile > src/gbm/main/gbm.pc > src/glsl/Makefile > diff --git a/src/gallium/Makefile.am b/src/gallium/Makefile.am > index a7c3606..7278300 100644 > --- a/src/gallium/Makefile.am > +++ b/src/gallium/Makefile.am > @@ -82,6 +82,12 @@ if HAVE_GALLIUM_VC4 > SUBDIRS +=
Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)
On 10/6/2015 2:12 PM, Michel Thierry wrote: On 10/5/2015 7:06 PM, Kristian Høgsberg wrote: On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierrywrote: On 9/14/2015 2:54 PM, Michał Winiarski wrote: On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote: Gen8+ supports 48-bit virtual addresses, but some objects must always be allocated inside the 32-bit address range. In specific, any resource used with flat/heapless (0x-0xf000) General State Heap (GSH) or Instruction State Heap (ISH) must be in a 32-bit range, because the General State Offset and Instruction State Offset are limited to 32-bits. The i915 driver has been modified to provide a flag to set when the 4GB limit is not necessary in a given bo (EXEC_OBJECT_SUPPORTS_48B_ADDRESS). 48-bit range will only be used when explicitly requested. Callers to the existing drm_intel_bo_emit_reloc function should set the use_48b_address_range flag beforehand, in order to use full ppgtt range. v2: Make set/clear functions nops on pre-gen8 platforms, and use them internally in emit_reloc functions (Ben) s/48BADDRESS/48B_ADDRESS/ (Dave) v3: Keep set/clear functions internal, no-one needs to use them directly. v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type before enabling set/clear function, print full offsets in debug statements, using port of lower_32_bits and upper_32_bits from linux kernel (Michał) References: http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html Cc: Ben Widawsky Cc: Michał Winiarski +Kristian LGTM. Acked-by: Michał Winiarski Signed-off-by: Michel Thierry Hi Kristian, More comments on this? I've resent the mesa patch with the last changes (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html). Michał has agreed with the interface and will use it for his work. If mesa doesn't plan to use this for now, it's ok. The kernel changes have been done to prevent any regressions when the support 48-bit flag is not set. I didn't get any replies to my last comments on this: http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html Basically, I tried explicitly placing buffers above 8G and didn't see the HW problem described (ie it all worked fine). I still think there's some confusion as to what the W/A is about. Here's my take: the W/A is a SW-only workaround to handle the cases where a driver uses heapless and 48-bit ppgtt. The problem is that the heaps are limited to 4G but with 48bit ppgtt a buffer can be placed anywhere it the 48 bit address space. If that happens it's consideredd out-of-bounds for the heap and access fails. To prevent this we need to make sure the bos we address in a heapless fashion are placed below 4g. For mesa, we only configure general state base as heap-less, which affects scratch bos. What this boils down to is that we need the 4G restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS and 3DSTATE_PS (and tesselation stage eventually). Look for the OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it isn't exclusive to the scratch bos (the error state points to that instruction). Hi, Anymore inputs about this patch? AFAIK, if changes are required based on further comments from Kristian, these will be in the mesa patch[1], not libdrm. Also, Michał will use this flag in another project. Thanks, -Michel [1]http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)
On 10/13/2015 3:13 PM, Emil Velikov wrote: On 13 October 2015 at 13:16, Michel Thierrywrote: On 10/6/2015 2:12 PM, Michel Thierry wrote: On 10/5/2015 7:06 PM, Kristian Høgsberg wrote: On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry wrote: On 9/14/2015 2:54 PM, Michał Winiarski wrote: On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote: Gen8+ supports 48-bit virtual addresses, but some objects must always be allocated inside the 32-bit address range. In specific, any resource used with flat/heapless (0x-0xf000) General State Heap (GSH) or Instruction State Heap (ISH) must be in a 32-bit range, because the General State Offset and Instruction State Offset are limited to 32-bits. The i915 driver has been modified to provide a flag to set when the 4GB limit is not necessary in a given bo (EXEC_OBJECT_SUPPORTS_48B_ADDRESS). 48-bit range will only be used when explicitly requested. Callers to the existing drm_intel_bo_emit_reloc function should set the use_48b_address_range flag beforehand, in order to use full ppgtt range. v2: Make set/clear functions nops on pre-gen8 platforms, and use them internally in emit_reloc functions (Ben) s/48BADDRESS/48B_ADDRESS/ (Dave) v3: Keep set/clear functions internal, no-one needs to use them directly. v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type before enabling set/clear function, print full offsets in debug statements, using port of lower_32_bits and upper_32_bits from linux kernel (Michał) References: http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html Cc: Ben Widawsky Cc: Michał Winiarski +Kristian LGTM. Acked-by: Michał Winiarski Signed-off-by: Michel Thierry Hi Kristian, More comments on this? I've resent the mesa patch with the last changes (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html). Michał has agreed with the interface and will use it for his work. If mesa doesn't plan to use this for now, it's ok. The kernel changes have been done to prevent any regressions when the support 48-bit flag is not set. I didn't get any replies to my last comments on this: http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html Basically, I tried explicitly placing buffers above 8G and didn't see the HW problem described (ie it all worked fine). I still think there's some confusion as to what the W/A is about. Here's my take: the W/A is a SW-only workaround to handle the cases where a driver uses heapless and 48-bit ppgtt. The problem is that the heaps are limited to 4G but with 48bit ppgtt a buffer can be placed anywhere it the 48 bit address space. If that happens it's consideredd out-of-bounds for the heap and access fails. To prevent this we need to make sure the bos we address in a heapless fashion are placed below 4g. For mesa, we only configure general state base as heap-less, which affects scratch bos. What this boils down to is that we need the 4G restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS and 3DSTATE_PS (and tesselation stage eventually). Look for the OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it isn't exclusive to the scratch bos (the error state points to that instruction). Hi, Anymore inputs about this patch? AFAIK, if changes are required based on further comments from Kristian, these will be in the mesa patch[1], not libdrm. Also, Michał will use this flag in another project. The comment seems quite brief and I'm not sure it fully addresses Kristian's concern. I'd suspect that providing reference to the HW documentation (confirming your assumption) might be beneficial. Sure, attached is the hang I get if I don't set the restriction in gen8_misc_state.c and try to use the full 48-bit range (i915_error_state_nowa.txt). This is just running gfxbench t-rex. I see the same hang signature when it is only applied to the scratch bos (in gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c - i915_error_state_scratchbo.txt). 3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x7823) is defined here: https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf (page 256) Thanks, -Michel i915_error_states.7z Description: Binary data ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/6] nir: remove dependency on glsl
Hi Rob, On 10 October 2015 at 19:47, Rob Clarkwrote: > From: Rob Clark > > Move glsl_types into NIR, now that the dependency on glsl_symbol_table > has been split out. > > Possibly makes sense to rename things at this point, but if we do that > I'd like to keep it split out into a separate patch to make git history > easier to follow (IMHO). > > Signed-off-by: Rob Clark > --- > src/glsl/Makefile.am |3 - > src/glsl/Makefile.sources |4 +- > src/glsl/builtin_type_macros.h | 172 -- > src/glsl/glsl_types.cpp| 1729 > > src/glsl/glsl_types.h | 867 -- > src/glsl/nir/builtin_type_macros.h | 172 ++ > src/glsl/nir/glsl_types.cpp| 1729 > > src/glsl/nir/glsl_types.h | 867 ++ > src/glsl/nir/nir_types.h |2 +- > .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +- > src/mesa/drivers/dri/i965/brw_fs.cpp |2 +- > src/mesa/drivers/dri/i965/brw_fs.h |2 +- > .../dri/i965/brw_fs_channel_expressions.cpp|2 +- > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |2 +- > .../drivers/dri/i965/brw_fs_vector_splitting.cpp |2 +- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |2 +- > .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +- > .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +- > src/mesa/main/ff_fragment_shader.cpp |2 +- > src/mesa/main/uniforms.h |2 +- > src/mesa/program/ir_to_mesa.cpp|2 +- > src/mesa/program/sampler.cpp |2 +- > 22 files changed, 2784 insertions(+), 2787 deletions(-) > delete mode 100644 src/glsl/builtin_type_macros.h > delete mode 100644 src/glsl/glsl_types.cpp > delete mode 100644 src/glsl/glsl_types.h > create mode 100644 src/glsl/nir/builtin_type_macros.h > create mode 100644 src/glsl/nir/glsl_types.cpp > create mode 100644 src/glsl/nir/glsl_types.h > > diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am > index 347919b..437c6a5 100644 > --- a/src/glsl/Makefile.am > +++ b/src/glsl/Makefile.am > @@ -148,9 +148,6 @@ libglsl_la_SOURCES = > \ > > > libnir_la_SOURCES =\ > - glsl_types.cpp \ > - builtin_types.cpp \ > - glsl_symbol_table.cpp \ > $(NIR_FILES)\ > $(NIR_GENERATED_FILES) > > diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources > index 436949c..6e61f23 100644 > --- a/src/glsl/Makefile.sources > +++ b/src/glsl/Makefile.sources > @@ -20,6 +20,8 @@ NIR_GENERATED_FILES = \ > NIR_FILES = \ > nir/glsl_to_nir.cpp \ > nir/glsl_to_nir.h \ > + nir/glsl_types.cpp \ > + nir/glsl_types.h \ > nir/nir.c \ > nir/nir.h \ > nir/nir_array.h \ > @@ -103,8 +105,6 @@ LIBGLSL_FILES = \ > glsl_parser_extras.h \ > glsl_symbol_table.cpp \ > glsl_symbol_table.h \ > - glsl_types.cpp \ > - glsl_types.h \ > hir_field_selection.cpp \ > ir_basic_block.cpp \ > ir_basic_block.h \ Can we split this into two (or more) patches. - move the files from glsl to glsl/nir, updating scons/android. note scons is missing everything NIR related. - fold/nuke the additional glsl requirements, from NIR. From autotools side alone the patch looks great. Thank you Emil P.S. Please use -M for git to detect code movement when generating the patch. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] glsl: couple shader_enums cleanups
On 10 October 2015 at 19:47, Rob Clarkwrote: > From: Rob Clark > > Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX / > FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil. > > v2: add STATIC_ASSERT()'s > > Reported-by: Emil Velikov > Signed-off-by: Rob Clark > --- > src/glsl/nir/shader_enums.c | 8 > src/glsl/nir/shader_enums.h | 7 +++ > src/mesa/main/mtypes.h | 5 - > 3 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c > index 3722475..66a25e7 100644 > --- a/src/glsl/nir/shader_enums.c > +++ b/src/glsl/nir/shader_enums.c > @@ -28,6 +28,7 @@ > > #include "shader_enums.h" > #include "util/macros.h" > +#include "mesa/main/config.h" > > #define ENUM(x) [x] = #x > #define NAME(val) val) < ARRAY_SIZE(names)) && names[(val)]) ? > names[(val)] : "UNKNOWN") > @@ -42,6 +43,7 @@ const char * gl_shader_stage_name(gl_shader_stage stage) >ENUM(MESA_SHADER_FRAGMENT), >ENUM(MESA_SHADER_COMPUTE), > }; > + STATIC_ASSERT(ARRAY_SIZE(names) == MESA_SHADER_STAGES); > return NAME(stage); > } > > @@ -82,6 +84,7 @@ const char * gl_vert_attrib_name(gl_vert_attrib attrib) >ENUM(VERT_ATTRIB_GENERIC14), >ENUM(VERT_ATTRIB_GENERIC15), > }; > + STATIC_ASSERT(ARRAY_SIZE(names) == VERT_ATTRIB_MAX); > return NAME(attrib); > } > > @@ -147,6 +150,7 @@ const char * gl_varying_slot_name(gl_varying_slot slot) >ENUM(VARYING_SLOT_VAR30), >ENUM(VARYING_SLOT_VAR31), > }; > + STATIC_ASSERT(ARRAY_SIZE(names) == VARYING_SLOT_MAX); > return NAME(slot); > } > > @@ -169,8 +173,10 @@ const char * gl_system_value_name(gl_system_value sysval) > ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER), > ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID), > ENUM(SYSTEM_VALUE_WORK_GROUP_ID), > + ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS), > ENUM(SYSTEM_VALUE_VERTEX_CNT), > }; > + STATIC_ASSERT(ARRAY_SIZE(names) == SYSTEM_VALUE_MAX); > return NAME(sysval); > } > > @@ -182,6 +188,7 @@ const char * glsl_interp_qualifier_name(enum > glsl_interp_qualifier qual) >ENUM(INTERP_QUALIFIER_FLAT), >ENUM(INTERP_QUALIFIER_NOPERSPECTIVE), > }; > + STATIC_ASSERT(ARRAY_SIZE(names) == INTERP_QUALIFIER_COUNT); > return NAME(qual); > } > > @@ -201,5 +208,6 @@ const char * gl_frag_result_name(gl_frag_result result) >ENUM(FRAG_RESULT_DATA6), >ENUM(FRAG_RESULT_DATA7), > }; > + STATIC_ASSERT(ARRAY_SIZE(names) == FRAG_RESULT_MAX); > return NAME(result); > } > diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h > index 2a5d2c5..77638ba 100644 > --- a/src/glsl/nir/shader_enums.h > +++ b/src/glsl/nir/shader_enums.h > @@ -233,6 +233,11 @@ typedef enum > VARYING_SLOT_VAR31, > } gl_varying_slot; > > + > +#define VARYING_SLOT_MAX (VARYING_SLOT_VAR0 + MAX_VARYING) > +#define VARYING_SLOT_PATCH0(VARYING_SLOT_MAX) > +#define VARYING_SLOT_TESS_MAX (VARYING_SLOT_PATCH0 + MAX_VARYING) > + As mentioned before I'm not sure (and perhaps not the best person to comment) on the correct definition of the macros, so I'd call this Acked-by: Emil Velikov For patches 2,3,4 and 6 Reviewed-by: Emil Velikov Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)
On 13 October 2015 at 13:16, Michel Thierrywrote: > On 10/6/2015 2:12 PM, Michel Thierry wrote: >> >> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote: >>> >>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry >>> wrote: On 9/14/2015 2:54 PM, Michał Winiarski wrote: > > > On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote: >> >> >> Gen8+ supports 48-bit virtual addresses, but some objects must >> always be >> allocated inside the 32-bit address range. >> >> In specific, any resource used with flat/heapless >> (0x-0xf000) >> General State Heap (GSH) or Instruction State Heap (ISH) must be in a >> 32-bit range, because the General State Offset and Instruction State >> Offset >> are limited to 32-bits. >> >> The i915 driver has been modified to provide a flag to set when the >> 4GB >> limit is not necessary in a given bo >> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS). >> 48-bit range will only be used when explicitly requested. >> >> Callers to the existing drm_intel_bo_emit_reloc function should set >> the >> use_48b_address_range flag beforehand, in order to use full ppgtt >> range. >> >> v2: Make set/clear functions nops on pre-gen8 platforms, and use them >> internally in emit_reloc functions (Ben) >> s/48BADDRESS/48B_ADDRESS/ (Dave) >> v3: Keep set/clear functions internal, no-one needs to use them >> directly. >> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type >> before enabling set/clear function, print full offsets in debug >> statements, using port of lower_32_bits and upper_32_bits >> from linux >> kernel (Michał) >> >> References: >> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html >> Cc: Ben Widawsky >> Cc: Michał Winiarski > > > > +Kristian > > LGTM. > Acked-by: Michał Winiarski > >> Signed-off-by: Michel Thierry Hi Kristian, More comments on this? I've resent the mesa patch with the last changes (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html). Michał has agreed with the interface and will use it for his work. If mesa doesn't plan to use this for now, it's ok. The kernel changes have been done to prevent any regressions when the support 48-bit flag is not set. >>> >>> >>> I didn't get any replies to my last comments on this: >>> >>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html >>> >>> Basically, I tried explicitly placing buffers above 8G and didn't see >>> the HW problem described (ie it all worked fine). I still think >>> there's some confusion as to what the W/A is about. >>> >>> Here's my take: the W/A is a SW-only workaround to handle the cases >>> where a driver uses heapless and 48-bit ppgtt. The problem is that the >>> heaps are limited to 4G but with 48bit ppgtt a buffer can be placed >>> anywhere it the 48 bit address space. If that happens it's consideredd >>> out-of-bounds for the heap and access fails. To prevent this we need >>> to make sure the bos we address in a heapless fashion are placed below >>> 4g. >>> >>> For mesa, we only configure general state base as heap-less, which >>> affects scratch bos. What this boils down to is that we need the 4G >>> restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS >>> and 3DSTATE_PS (and tesselation stage eventually). Look for the >>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c >>> and gen8_ps_state.c >> >> >> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it >> isn't exclusive to the scratch bos (the error state points to that >> instruction). >> >> > > Hi, > > Anymore inputs about this patch? > AFAIK, if changes are required based on further comments from Kristian, > these will be in the mesa patch[1], not libdrm. Also, Michał will use this > flag in another project. > The comment seems quite brief and I'm not sure it fully addresses Kristian's concern. I'd suspect that providing reference to the HW documentation (confirming your assumption) might be beneficial. Regards, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90264] [Regression, bisected] Tooltip corruption in Chrome
https://bugs.freedesktop.org/show_bug.cgi?id=90264 Patryk Zawadzkichanged: What|Removed |Added CC||pat...@gmail.com -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/6] nir: remove dependency on glsl
On Tue, Oct 13, 2015 at 11:22 AM, Emil Velikovwrote: > Hi Rob, > > On 10 October 2015 at 19:47, Rob Clark wrote: >> From: Rob Clark >> >> Move glsl_types into NIR, now that the dependency on glsl_symbol_table >> has been split out. >> >> Possibly makes sense to rename things at this point, but if we do that >> I'd like to keep it split out into a separate patch to make git history >> easier to follow (IMHO). >> >> Signed-off-by: Rob Clark >> --- >> src/glsl/Makefile.am |3 - >> src/glsl/Makefile.sources |4 +- >> src/glsl/builtin_type_macros.h | 172 -- >> src/glsl/glsl_types.cpp| 1729 >> >> src/glsl/glsl_types.h | 867 -- >> src/glsl/nir/builtin_type_macros.h | 172 ++ >> src/glsl/nir/glsl_types.cpp| 1729 >> >> src/glsl/nir/glsl_types.h | 867 ++ >> src/glsl/nir/nir_types.h |2 +- >> .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +- >> src/mesa/drivers/dri/i965/brw_fs.cpp |2 +- >> src/mesa/drivers/dri/i965/brw_fs.h |2 +- >> .../dri/i965/brw_fs_channel_expressions.cpp|2 +- >> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |2 +- >> .../drivers/dri/i965/brw_fs_vector_splitting.cpp |2 +- >> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |2 +- >> .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +- >> .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +- >> src/mesa/main/ff_fragment_shader.cpp |2 +- >> src/mesa/main/uniforms.h |2 +- >> src/mesa/program/ir_to_mesa.cpp|2 +- >> src/mesa/program/sampler.cpp |2 +- >> 22 files changed, 2784 insertions(+), 2787 deletions(-) >> delete mode 100644 src/glsl/builtin_type_macros.h >> delete mode 100644 src/glsl/glsl_types.cpp >> delete mode 100644 src/glsl/glsl_types.h >> create mode 100644 src/glsl/nir/builtin_type_macros.h >> create mode 100644 src/glsl/nir/glsl_types.cpp >> create mode 100644 src/glsl/nir/glsl_types.h >> >> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am >> index 347919b..437c6a5 100644 >> --- a/src/glsl/Makefile.am >> +++ b/src/glsl/Makefile.am >> @@ -148,9 +148,6 @@ libglsl_la_SOURCES = >>\ >> >> >> libnir_la_SOURCES =\ >> - glsl_types.cpp \ >> - builtin_types.cpp \ >> - glsl_symbol_table.cpp \ >> $(NIR_FILES)\ >> $(NIR_GENERATED_FILES) >> >> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources >> index 436949c..6e61f23 100644 >> --- a/src/glsl/Makefile.sources >> +++ b/src/glsl/Makefile.sources >> @@ -20,6 +20,8 @@ NIR_GENERATED_FILES = \ >> NIR_FILES = \ >> nir/glsl_to_nir.cpp \ >> nir/glsl_to_nir.h \ >> + nir/glsl_types.cpp \ >> + nir/glsl_types.h \ >> nir/nir.c \ >> nir/nir.h \ >> nir/nir_array.h \ >> @@ -103,8 +105,6 @@ LIBGLSL_FILES = \ >> glsl_parser_extras.h \ >> glsl_symbol_table.cpp \ >> glsl_symbol_table.h \ >> - glsl_types.cpp \ >> - glsl_types.h \ >> hir_field_selection.cpp \ >> ir_basic_block.cpp \ >> ir_basic_block.h \ > Can we split this into two (or more) patches. > - move the files from glsl to glsl/nir, updating scons/android. note > scons is missing everything NIR related. > - fold/nuke the additional glsl requirements, from NIR. It is already split up this way.. this patch is primarily the move (plus header path tweaks, etc, to keep things compiling). I don't see how it could be split up any finer while keeping bisectability (ie. not breaking compile in the middle). That said, I did completely ignore scons/android. I don't know the first thing about scons or how to do a scons build, so I think I'll ignore that and let someone else fix it up. I suppose I could fix android build, although I can't build android on my laptop (and I guess I'd have to rebase some of the other android related stuff that isn't upstream yet), so maybe I'll just fix that in a follow-on patch this weekend. BR, -R > From autotools side alone the patch looks great. > > Thank you > Emil > > P.S. Please use -M for git to detect code movement when generating the patch. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] glsl: couple shader_enums cleanups
On Oct 13, 2015 8:26 AM, "Emil Velikov"wrote: > > On 10 October 2015 at 19:47, Rob Clark wrote: > > From: Rob Clark > > > > Add missing enum to gl_system_value_name() and move VARYING_SLOT_MAX / > > FRAG_RESULT_MAX / etc into shader_enums.h as suggested by Emil. > > > > v2: add STATIC_ASSERT()'s > > > > Reported-by: Emil Velikov > > Signed-off-by: Rob Clark > > --- > > src/glsl/nir/shader_enums.c | 8 > > src/glsl/nir/shader_enums.h | 7 +++ > > src/mesa/main/mtypes.h | 5 - > > 3 files changed, 15 insertions(+), 5 deletions(-) > > > > diff --git a/src/glsl/nir/shader_enums.c b/src/glsl/nir/shader_enums.c > > index 3722475..66a25e7 100644 > > --- a/src/glsl/nir/shader_enums.c > > +++ b/src/glsl/nir/shader_enums.c > > @@ -28,6 +28,7 @@ > > > > #include "shader_enums.h" > > #include "util/macros.h" > > +#include "mesa/main/config.h" > > > > #define ENUM(x) [x] = #x > > #define NAME(val) val) < ARRAY_SIZE(names)) && names[(val)]) ? names[(val)] : "UNKNOWN") > > @@ -42,6 +43,7 @@ const char * gl_shader_stage_name(gl_shader_stage stage) > >ENUM(MESA_SHADER_FRAGMENT), > >ENUM(MESA_SHADER_COMPUTE), > > }; > > + STATIC_ASSERT(ARRAY_SIZE(names) == MESA_SHADER_STAGES); > > return NAME(stage); > > } > > > > @@ -82,6 +84,7 @@ const char * gl_vert_attrib_name(gl_vert_attrib attrib) > >ENUM(VERT_ATTRIB_GENERIC14), > >ENUM(VERT_ATTRIB_GENERIC15), > > }; > > + STATIC_ASSERT(ARRAY_SIZE(names) == VERT_ATTRIB_MAX); > > return NAME(attrib); > > } > > > > @@ -147,6 +150,7 @@ const char * gl_varying_slot_name(gl_varying_slot slot) > >ENUM(VARYING_SLOT_VAR30), > >ENUM(VARYING_SLOT_VAR31), > > }; > > + STATIC_ASSERT(ARRAY_SIZE(names) == VARYING_SLOT_MAX); > > return NAME(slot); > > } > > > > @@ -169,8 +173,10 @@ const char * gl_system_value_name(gl_system_value sysval) > > ENUM(SYSTEM_VALUE_TESS_LEVEL_INNER), > > ENUM(SYSTEM_VALUE_LOCAL_INVOCATION_ID), > > ENUM(SYSTEM_VALUE_WORK_GROUP_ID), > > + ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS), > > ENUM(SYSTEM_VALUE_VERTEX_CNT), > > }; > > + STATIC_ASSERT(ARRAY_SIZE(names) == SYSTEM_VALUE_MAX); > > return NAME(sysval); > > } > > > > @@ -182,6 +188,7 @@ const char * glsl_interp_qualifier_name(enum glsl_interp_qualifier qual) > >ENUM(INTERP_QUALIFIER_FLAT), > >ENUM(INTERP_QUALIFIER_NOPERSPECTIVE), > > }; > > + STATIC_ASSERT(ARRAY_SIZE(names) == INTERP_QUALIFIER_COUNT); > > return NAME(qual); > > } > > > > @@ -201,5 +208,6 @@ const char * gl_frag_result_name(gl_frag_result result) > >ENUM(FRAG_RESULT_DATA6), > >ENUM(FRAG_RESULT_DATA7), > > }; > > + STATIC_ASSERT(ARRAY_SIZE(names) == FRAG_RESULT_MAX); > > return NAME(result); > > } > > diff --git a/src/glsl/nir/shader_enums.h b/src/glsl/nir/shader_enums.h > > index 2a5d2c5..77638ba 100644 > > --- a/src/glsl/nir/shader_enums.h > > +++ b/src/glsl/nir/shader_enums.h > > @@ -233,6 +233,11 @@ typedef enum > > VARYING_SLOT_VAR31, > > } gl_varying_slot; > > > > + > > +#define VARYING_SLOT_MAX (VARYING_SLOT_VAR0 + MAX_VARYING) > > +#define VARYING_SLOT_PATCH0(VARYING_SLOT_MAX) > > +#define VARYING_SLOT_TESS_MAX (VARYING_SLOT_PATCH0 + MAX_VARYING) > > + > As mentioned before I'm not sure (and perhaps not the best person to > comment) on the correct definition of the macros, so I'd call this > Acked-by: Emil Velikov > > For patches 2,3,4 and 6 > Reviewed-by: Emil Velikov You can have my R-B on the series as well. I think we're going to do even more compiler source shuffling soon but this is at least the right direction. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] i965: skip control-flow aware liveness analysis if we only have 1 block
On 2015-10-13 05:17:37, Francisco Jerez wrote: > Iago Toral Quirogawrites: > > > This fixes the following test: > > > > [require] > > GL >= 3.3 > > GLSL >= 3.30 > > GL_ARB_shader_storage_buffer_object > > > > [fragment shader] > > #version 330 > > #extension GL_ARB_shader_storage_buffer_object: require > > > > buffer SSBO { > > mat4 sm4; > > }; > > > > > > mat4 um4; > > > > void main() { > > sm4 *= um4; > > This is using the value of "um4", which is never assigned, so liveness > analysis will correctly extend its live interval up to the start of the > block. This test was derived by simplifying a CTS test case. Anyway, I'm not sure what happened on the way to the commit message, but um4 should be a uniform. http://sprunge.us/cEUe -Jordan > The other problem here seems to be that the liveness analysis pass > doesn't consider scalar writes (like the ones emitted by the > combine_constants optimization pass and by emit_uniformize()) to fully > define the contents of a register, so it will incorrectly extend up the > live interval of registers defined using scalar writes even if all > components ever used in the shader have been defined individually. > Incidentally the use-def-chains-based implementation of liveness > analysis I'm working on will fix this issue easily. > > > sm4[0][0] = 0.0; > > } > > > > [test] > > link success > > > > where our liveness analysis works really bad because the control-flow part > > will expand register liveness to the end of only block we have (so every > > register will be marked alive until the end of the program). This > > artificially > > increases register pressure to a point in which we run out of registers. > > Of course, this is only a simple optimization for a trivial case, the > > underlying problem still exists and would manifest when we have more than > > one block, but it helps simple shaders such as the one in the example above > > without any effort. I guess the real fix would involve re-thinking parts of > > the > > liveness analysis algorithm... > > > > Jordan, there is another thing that we can improve for this shader that is > > specific to SSBOs: we emit the same ssbo load multiple times because we are > > playing it safe just in case there are writes in between. I think we can > > try to > > do better and not re-issue the same load if we don't have ssbo stores to > > the same address in between. I'll try to come up with a patch for this > > (hopefully we can do this inside lower_ubo_reference). > > > > The actual fix goes into patches 2 (FS) and 4 (VS). Patches 1,3 are just > > indentation fixes in the code around these. > > > > Iago Toral Quiroga (4): > > i965/fs: Fix indentation in fs_live_variables::compute_start_end > > i965/fs: skip control-flow aware liveness analysis if we only have 1 > > block > > i965/vec4: fix indentation in vec4_visitor::calculate_live_intervals > > i965/vec4: skip control-flow aware liveness analysis if we only have 1 > > block > > > > .../drivers/dri/i965/brw_fs_live_variables.cpp | 24 > > ++ > > .../drivers/dri/i965/brw_vec4_live_variables.cpp | 23 > > + > > 2 files changed, 30 insertions(+), 17 deletions(-) > > > > -- > > 1.9.1 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/6] nir: remove dependency on glsl
On 13 October 2015 at 16:37, Rob Clarkwrote: > On Tue, Oct 13, 2015 at 11:22 AM, Emil Velikov > wrote: >> Hi Rob, >> >> On 10 October 2015 at 19:47, Rob Clark wrote: >>> From: Rob Clark >>> >>> Move glsl_types into NIR, now that the dependency on glsl_symbol_table >>> has been split out. >>> >>> Possibly makes sense to rename things at this point, but if we do that >>> I'd like to keep it split out into a separate patch to make git history >>> easier to follow (IMHO). >>> >>> Signed-off-by: Rob Clark >>> --- >>> src/glsl/Makefile.am |3 - >>> src/glsl/Makefile.sources |4 +- >>> src/glsl/builtin_type_macros.h | 172 -- >>> src/glsl/glsl_types.cpp| 1729 >>> >>> src/glsl/glsl_types.h | 867 -- >>> src/glsl/nir/builtin_type_macros.h | 172 ++ >>> src/glsl/nir/glsl_types.cpp| 1729 >>> >>> src/glsl/nir/glsl_types.h | 867 ++ >>> src/glsl/nir/nir_types.h |2 +- >>> .../drivers/dri/i965/brw_cubemap_normalize.cpp |2 +- >>> src/mesa/drivers/dri/i965/brw_fs.cpp |2 +- >>> src/mesa/drivers/dri/i965/brw_fs.h |2 +- >>> .../dri/i965/brw_fs_channel_expressions.cpp|2 +- >>> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |2 +- >>> .../drivers/dri/i965/brw_fs_vector_splitting.cpp |2 +- >>> src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |2 +- >>> .../dri/i965/brw_lower_unnormalized_offset.cpp |2 +- >>> .../drivers/dri/i965/brw_schedule_instructions.cpp |2 +- >>> src/mesa/main/ff_fragment_shader.cpp |2 +- >>> src/mesa/main/uniforms.h |2 +- >>> src/mesa/program/ir_to_mesa.cpp|2 +- >>> src/mesa/program/sampler.cpp |2 +- >>> 22 files changed, 2784 insertions(+), 2787 deletions(-) >>> delete mode 100644 src/glsl/builtin_type_macros.h >>> delete mode 100644 src/glsl/glsl_types.cpp >>> delete mode 100644 src/glsl/glsl_types.h >>> create mode 100644 src/glsl/nir/builtin_type_macros.h >>> create mode 100644 src/glsl/nir/glsl_types.cpp >>> create mode 100644 src/glsl/nir/glsl_types.h >>> >>> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am >>> index 347919b..437c6a5 100644 >>> --- a/src/glsl/Makefile.am >>> +++ b/src/glsl/Makefile.am >>> @@ -148,9 +148,6 @@ libglsl_la_SOURCES = >>> \ >>> >>> >>> libnir_la_SOURCES =\ >>> - glsl_types.cpp \ >>> - builtin_types.cpp \ >>> - glsl_symbol_table.cpp \ >>> $(NIR_FILES)\ >>> $(NIR_GENERATED_FILES) >>> >>> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources >>> index 436949c..6e61f23 100644 >>> --- a/src/glsl/Makefile.sources >>> +++ b/src/glsl/Makefile.sources >>> @@ -20,6 +20,8 @@ NIR_GENERATED_FILES = \ >>> NIR_FILES = \ >>> nir/glsl_to_nir.cpp \ >>> nir/glsl_to_nir.h \ >>> + nir/glsl_types.cpp \ >>> + nir/glsl_types.h \ >>> nir/nir.c \ >>> nir/nir.h \ >>> nir/nir_array.h \ >>> @@ -103,8 +105,6 @@ LIBGLSL_FILES = \ >>> glsl_parser_extras.h \ >>> glsl_symbol_table.cpp \ >>> glsl_symbol_table.h \ >>> - glsl_types.cpp \ >>> - glsl_types.h \ >>> hir_field_selection.cpp \ >>> ir_basic_block.cpp \ >>> ir_basic_block.h \ >> Can we split this into two (or more) patches. >> - move the files from glsl to glsl/nir, updating scons/android. note >> scons is missing everything NIR related. >> - fold/nuke the additional glsl requirements, from NIR. > > It is already split up this way.. this patch is primarily the move > (plus header path tweaks, etc, to keep things compiling). I don't see > how it could be split up any finer while keeping bisectability (ie. > not breaking compile in the middle). > > That said, I did completely ignore scons/android. I don't know the > first thing about scons or how to do a scons build, so I think I'll > ignore that and let someone else fix it up. I suppose I could fix > android build, although I can't build android on my laptop (and I > guess I'd have to rebase some of the other android related stuff that > isn't upstream yet), so maybe I'll just fix that in a follow-on patch > this weekend. > Essentially you'd want (I've expanded greatly, but feel free do you your own split) - Add libnir for scons. - Android split out libnir static lib. - Move the file - GLSL_FILES ->
Re: [Mesa-dev] [Mesa-stable] [PATCH v2] configure.ac: ensure RM is set
On Tue, Oct 13, 2015 at 3:25 AM, Emil Velikovwrote: > On 10 October 2015 at 07:42, Jonathan Gray wrote: >> GNU make predefines RM to rm -f but this is not required by POSIX >> so ensure that RM is set. This fixes "make clean" on OpenBSD. >> >> v2: use AC_CHECK_PROG >> > I'm ambivalent if we go the AC_CHECK_PROG vs ?= route. > > I'll let others pick their favourite, but both are > Reviewed-by: Emil Velikov I think AC_CHECK_PROG is the way to go. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: pass caller name to create_textures()
On Mon, Oct 12, 2015 at 7:44 PM, Brian Paulwrote: > Simpler than the dsa flag approach. > --- > src/mesa/main/texobj.c | 13 ++--- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/src/mesa/main/texobj.c b/src/mesa/main/texobj.c > index 60c55ae..b571b1b 100644 > --- a/src/mesa/main/texobj.c > +++ b/src/mesa/main/texobj.c > @@ -1205,17 +1205,16 @@ invalidate_tex_image_error_check(struct gl_context > *ctx, GLuint texture, > */ > static void > create_textures(struct gl_context *ctx, GLenum target, > -GLsizei n, GLuint *textures, bool dsa) > +GLsizei n, GLuint *textures, const char *caller) > { > GLuint first; > GLint i; > - const char *func = dsa ? "Create" : "Gen"; > > if (MESA_VERBOSE & (VERBOSE_API|VERBOSE_TEXTURE)) > - _mesa_debug(ctx, "gl%sTextures %d\n", func, n); > + _mesa_debug(ctx, "%s %d\n", caller, n); > > if (n < 0) { > - _mesa_error( ctx, GL_INVALID_VALUE, "gl%sTextures(n < 0)", func ); > + _mesa_error(ctx, GL_INVALID_VALUE, "%s(n < 0)", caller); >return; > } > > @@ -1236,7 +1235,7 @@ create_textures(struct gl_context *ctx, GLenum target, >texObj = ctx->Driver.NewTextureObject(ctx, name, target); >if (!texObj) { > mtx_unlock(>Shared->Mutex); > - _mesa_error(ctx, GL_OUT_OF_MEMORY, "gl%sTextures", func); > + _mesa_error(ctx, GL_OUT_OF_MEMORY, "gl%sTextures", caller); _mesa_error(ctx, GL_OUT_OF_MEMORY, "%s", caller); > return; >} > > @@ -1273,7 +1272,7 @@ void GLAPIENTRY > _mesa_GenTextures(GLsizei n, GLuint *textures) > { > GET_CURRENT_CONTEXT(ctx); > - create_textures(ctx, 0, n, textures, false); > + create_textures(ctx, 0, n, textures, "glGenTextures"); > } > > /** > @@ -1306,7 +1305,7 @@ _mesa_CreateTextures(GLenum target, GLsizei n, GLuint > *textures) >return; > } > > - create_textures(ctx, target, n, textures, true); > + create_textures(ctx, target, n, textures, "glCreateTextures"); > } > > /** > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev With the suggested change: Reviewed-by: Anuj Phogat ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] nir: split SSBO min/max atomic instrinsics into signed/unsigned versions
On Tue, Sep 29, 2015 at 09:25:27AM +0200, Iago Toral Quiroga wrote: > NIR is typeless so this is the only way to keep track of the > type to select the proper atomic to use. > > v2: > - Use imin,imax,umin,umax for the intrinsic names (Connor Abbott) > - Change message for unreachable paths (Michael Schellenberger) Looks good, Reviewed-by: Kristian Høgsberg> --- > src/glsl/nir/glsl_to_nir.cpp | 22 ++ > src/glsl/nir/nir_intrinsics.h | 6 -- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 20 ++-- > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 +++--- > 4 files changed, 43 insertions(+), 27 deletions(-) > > diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp > index f03a107..25f41a7 100644 > --- a/src/glsl/nir/glsl_to_nir.cpp > +++ b/src/glsl/nir/glsl_to_nir.cpp > @@ -662,9 +662,21 @@ nir_visitor::visit(ir_call *ir) >} else if (strcmp(ir->callee_name(), > "__intrinsic_ssbo_atomic_xor_internal") == 0) { > op = nir_intrinsic_ssbo_atomic_xor; >} else if (strcmp(ir->callee_name(), > "__intrinsic_ssbo_atomic_min_internal") == 0) { > - op = nir_intrinsic_ssbo_atomic_min; > + assert(ir->return_deref); > + if (ir->return_deref->type == glsl_type::int_type) > +op = nir_intrinsic_ssbo_atomic_imin; > + else if (ir->return_deref->type == glsl_type::uint_type) > +op = nir_intrinsic_ssbo_atomic_umin; > + else > +unreachable("Invalid type"); >} else if (strcmp(ir->callee_name(), > "__intrinsic_ssbo_atomic_max_internal") == 0) { > - op = nir_intrinsic_ssbo_atomic_max; > + assert(ir->return_deref); > + if (ir->return_deref->type == glsl_type::int_type) > +op = nir_intrinsic_ssbo_atomic_imax; > + else if (ir->return_deref->type == glsl_type::uint_type) > +op = nir_intrinsic_ssbo_atomic_umax; > + else > +unreachable("Invalid type"); >} else if (strcmp(ir->callee_name(), > "__intrinsic_ssbo_atomic_exchange_internal") == 0) { > op = nir_intrinsic_ssbo_atomic_exchange; >} else if (strcmp(ir->callee_name(), > "__intrinsic_ssbo_atomic_comp_swap_internal") == 0) { > @@ -874,8 +886,10 @@ nir_visitor::visit(ir_call *ir) > break; >} >case nir_intrinsic_ssbo_atomic_add: > - case nir_intrinsic_ssbo_atomic_min: > - case nir_intrinsic_ssbo_atomic_max: > + case nir_intrinsic_ssbo_atomic_imin: > + case nir_intrinsic_ssbo_atomic_umin: > + case nir_intrinsic_ssbo_atomic_imax: > + case nir_intrinsic_ssbo_atomic_umax: >case nir_intrinsic_ssbo_atomic_and: >case nir_intrinsic_ssbo_atomic_or: >case nir_intrinsic_ssbo_atomic_xor: > diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h > index 06f1b02..ff42bb2 100644 > --- a/src/glsl/nir/nir_intrinsics.h > +++ b/src/glsl/nir/nir_intrinsics.h > @@ -174,8 +174,10 @@ INTRINSIC(image_samples, 0, ARR(), true, 1, 1, 0, > * 3: For CompSwap only: the second data parameter. > */ > INTRINSIC(ssbo_atomic_add, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > -INTRINSIC(ssbo_atomic_min, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > -INTRINSIC(ssbo_atomic_max, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > +INTRINSIC(ssbo_atomic_imin, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > +INTRINSIC(ssbo_atomic_umin, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > +INTRINSIC(ssbo_atomic_imax, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > +INTRINSIC(ssbo_atomic_umax, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > INTRINSIC(ssbo_atomic_and, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > INTRINSIC(ssbo_atomic_or, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > INTRINSIC(ssbo_atomic_xor, 3, ARR(1, 1, 1), true, 1, 0, 0, 0) > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index a2bc5c6..0b9555c 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -1870,17 +1870,17 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , > nir_intrinsic_instr *instr > case nir_intrinsic_ssbo_atomic_add: >nir_emit_ssbo_atomic(bld, BRW_AOP_ADD, instr); >break; > - case nir_intrinsic_ssbo_atomic_min: > - if (dest.type == BRW_REGISTER_TYPE_D) > - nir_emit_ssbo_atomic(bld, BRW_AOP_IMIN, instr); > - else > - nir_emit_ssbo_atomic(bld, BRW_AOP_UMIN, instr); > + case nir_intrinsic_ssbo_atomic_imin: > + nir_emit_ssbo_atomic(bld, BRW_AOP_IMIN, instr); >break; > - case nir_intrinsic_ssbo_atomic_max: > - if (dest.type == BRW_REGISTER_TYPE_D) > - nir_emit_ssbo_atomic(bld, BRW_AOP_IMAX, instr); > - else > - nir_emit_ssbo_atomic(bld, BRW_AOP_UMAX, instr); > + case nir_intrinsic_ssbo_atomic_umin: > + nir_emit_ssbo_atomic(bld, BRW_AOP_UMIN, instr); > + break; > + case
[Mesa-dev] [RFC 1/4] glsl IR: add always_active_io attribute to ir_variable
The value will be set in separate-shader program when an input/output must remains active (i.e. deadcode removal isn't allowed because it will create interface location/name-matching mismatch) v3: * Rename the attribute * Use ir_variable directly instead of ir_variable_refcount_visitor * Move the foreach IR code in the linker file Signed-off-by: Gregory Hainaut--- src/glsl/ir.cpp | 1 + src/glsl/ir.h | 7 + src/glsl/linker.cpp | 73 + 3 files changed, 81 insertions(+) diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index 2c45b9e..f3f7355 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -1650,6 +1650,7 @@ ir_variable::ir_variable(const struct glsl_type *type, const char *name, this->data.pixel_center_integer = false; this->data.depth_layout = ir_depth_layout_none; this->data.used = false; + this->data.always_active_io = false; this->data.read_only = false; this->data.centroid = false; this->data.sample = false; diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 43a2bf0..a91add9 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -658,6 +658,13 @@ public: unsigned assigned:1; /** + * When separate shader programs are enabled, only interstage + * variables can be safely removed of the shader interface. Others + * input/output must remains active. + */ + unsigned always_active_io:1; + + /** * Enum indicating how the variable was declared. See * ir_var_declaration_type. * diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index a97b4ef..3a30d0c 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -3599,6 +3599,70 @@ link_assign_subroutine_types(struct gl_shader_program *prog) } } +static void +ir_set_always_active_io(exec_list *ir, ir_variable_mode io_mode) +{ + assert(mode == ir_var_shader_in || mode == ir_var_shader_out); + + foreach_in_list(ir_instruction, node, ir) { + ir_variable *const var = node->as_variable(); + + if (var == NULL || var->data.mode != io_mode) + continue; + + var->data.always_active_io = true; + } +} + +static void +set_always_active_io(struct gl_shader_program *prog) +{ + unsigned first, last; + assert(prog->SeparateShader); + + first = MESA_SHADER_STAGES; + last = 0; + + /* Determine first and last stage. Excluding the compute stage */ + for (unsigned i = 0; i < MESA_SHADER_COMPUTE; i++) { + if (!prog->_LinkedShaders[i]) + continue; + if (first == MESA_SHADER_STAGES) + first = i; + last = i; + } + + if (first == MESA_SHADER_STAGES) + return; + + for (unsigned stage = 0; stage < MESA_SHADER_STAGES; stage++) { + gl_shader *sh = prog->_LinkedShaders[stage]; + if (!sh) + continue; + + if (first == last) { + /* Single shader program: allowed inactive variable + * 1/ input of the VS + * 2/ output of the FS + */ + if (stage != MESA_SHADER_VERTEX) +ir_set_always_active_io(sh->ir, ir_var_shader_in); + if (stage != MESA_SHADER_FRAGMENT) +ir_set_always_active_io(sh->ir, ir_var_shader_out); + } else { + /* Multiple shaders program: allowed inactive variable + * 1/ input of the VS + * 2/ output of the FS + * 3/ interstage variables + */ + if (stage == first && stage != MESA_SHADER_VERTEX) +ir_set_always_active_io(sh->ir, ir_var_shader_in); + else if (stage == last && stage != MESA_SHADER_FRAGMENT) +ir_set_always_active_io(sh->ir, ir_var_shader_out); + } + } +} + void link_shaders(struct gl_context *ctx, struct gl_shader_program *prog) { @@ -3858,6 +3922,15 @@ link_shaders(struct gl_context *ctx, struct gl_shader_program *prog) } } + + /** +* When separate shader programs are enabled, only interstage +* variables can be safely removed of the shader interface. Others +* input/output must remains active. +*/ + if (prog->SeparateShader) + set_always_active_io(prog); + if (!interstage_cross_validate_uniform_blocks(prog)) goto done; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC 2/4] glsl IR: only allow optimization of interstage variable
GL_ARB_separate_shader_objects allow to match by name variable or block interface. Input varying can't be removed because it is will impact the location assignment. It fixes the bug 79783 and likely any application that uses GL_ARB_separate_shader_objects extension. piglit test: arb_separate_shader_object-rendezvous_by_name Signed-off-by: Gregory Hainaut--- src/glsl/opt_dead_code.cpp | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp index 071485a..37329f6 100644 --- a/src/glsl/opt_dead_code.cpp +++ b/src/glsl/opt_dead_code.cpp @@ -75,6 +75,24 @@ do_dead_code(exec_list *instructions, bool uniform_locations_assigned) || !entry->declaration) continue; + /* Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.5 + * (Core Profile) spec says: + * + *"With separable program objects, interfaces between shader + *stages may involve the outputs from one program object and the + *inputs from a second program object. For such interfaces, it is + *not possible to detect mismatches at link time, because the + *programs are linked separately. When each such program is + *linked, all inputs or outputs interfacing with another program + *stage are treated as active." + */ + if (entry->var->data.always_active_io && +(!entry->var->data.explicit_location || + entry->var->data.location >= VARYING_SLOT_VAR0) && +(entry->var->data.mode == ir_var_shader_in || + entry->var->data.mode == ir_var_shader_out)) + continue; + if (entry->assign) { /* Remove a single dead assignment to the variable we found. * Don't do so if it's a shader or function output or a shader -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC 0/4] V3: Improve GLSL support of GL_ARB_separate_shader_objects
New version that improves previous code quality and fixes several outstanding issues. Piglit wasn't run yet. v3: Squash commit 1&2 * Use a better name for the new attribute: always_active_io * Use ir_variable directly instead of ir_variable_refcount_visitor * Put related code in linker.cpp Add 2 new commits to fix wrong interface matching in more complex case. Commit 3: avoid collision between user and linker slot assignment Commit 4: avoid unpredictable sorting of varying Commit 1/2/3 fix the piglit test: arb_separate_shader_object-rendezvous_by_name posted on piglit ML Commit 4 was tested on the PCSX2 application. I need to implement a new test. Gregory Hainaut (4): glsl IR: add always_active_io attribute to ir_variable glsl IR: only allow optimization of interstage variable glsl: avoid linker and user varying location to overlap glsl: don't sort varying in separate shader mode src/glsl/ir.cpp| 1 + src/glsl/ir.h | 7 + src/glsl/link_varyings.cpp | 76 ++ src/glsl/linker.cpp| 73 src/glsl/opt_dead_code.cpp | 18 +++ 5 files changed, 169 insertions(+), 6 deletions(-) -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC 3/4] glsl: avoid linker and user varying location to overlap
Current behavior on the interface matching: layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user out1; // Assigned to VARYING_SLOT_VAR0 by the linker New behavior on the interface matching: layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user out1; // Assigned to VARYING_SLOT_VAR1 by the linker piglit: arb_separate_shader_object-rendezvous_by_name Signed-off-by: Gregory Hainaut--- src/glsl/link_varyings.cpp | 46 +++--- 1 file changed, 43 insertions(+), 3 deletions(-) diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp index 7e77a67..6d6e9b5 100644 --- a/src/glsl/link_varyings.cpp +++ b/src/glsl/link_varyings.cpp @@ -766,7 +766,7 @@ public: gl_shader_stage consumer_stage); ~varying_matches(); void record(ir_variable *producer_var, ir_variable *consumer_var); - unsigned assign_locations(); + unsigned assign_locations(uint64_t reserved_slots); void store_locations() const; private: @@ -986,7 +986,7 @@ varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var) * passed to varying_matches::record(). */ unsigned -varying_matches::assign_locations() +varying_matches::assign_locations(uint64_t reserved_slots) { /* Sort varying matches into an order that makes them easy to pack. */ qsort(this->matches, this->num_matches, sizeof(*this->matches), @@ -1013,6 +1013,10 @@ varying_matches::assign_locations() != this->matches[i].packing_class) { *location = ALIGN(*location, 4); } + while ((*location < MAX_VARYING * 4u) && +(reserved_slots & (1u << *location / 4u))) { + *location = ALIGN(*location + 1, 4); + } this->matches[i].generic_location = *location; @@ -1376,6 +1380,38 @@ canonicalize_shader_io(exec_list *ir, enum ir_variable_mode io_mode) } /** + * Generate a bitfield map of the already reserved slots for a shader. + * + * In theory a 32 bits value will be enough but a 64 bits value is future proof. + */ +uint64_t +reserved_varying_slot(struct gl_shader *stage, ir_variable_mode io_mode) +{ + assert(mode == ir_var_shader_in || mode == ir_var_shader_out); + assert(MAX_VARYING <= 64); /* avoid an overflow of the returned value */ + + uint64_t slots = 0; + int var_slot; + + if (!stage) + return slots; + + foreach_in_list(ir_instruction, node, stage->ir) { + ir_variable *const var = node->as_variable(); + + if (var == NULL || var->data.mode != io_mode || !var->data.explicit_location) + continue; + + var_slot = var->data.location - VARYING_SLOT_VAR0; + if (var_slot >= 0 && var_slot < MAX_VARYING) + slots |= 1u << var_slot; + } + + return slots; +} + + +/** * Assign locations for all variables that are produced in one pipeline stage * (the "producer") and consumed in the next stage (the "consumer"). * @@ -1550,7 +1586,11 @@ assign_varying_locations(struct gl_context *ctx, matches.record(matched_candidate->toplevel_var, NULL); } - const unsigned slots_used = matches.assign_locations(); + const uint64_t reserved_slots = + reserved_varying_slot(producer, ir_var_shader_out) | + reserved_varying_slot(consumer, ir_var_shader_in); + + const unsigned slots_used = matches.assign_locations(reserved_slots); matches.store_locations(); for (unsigned i = 0; i < num_tfeedback_decls; ++i) { -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC 4/4] glsl: don't sort varying in separate shader mode
Current issue is the addition of FLAT qualifier on varying_matches::record() which break the varying expected order Future issue is the removal of the interpolation qualifier matching constrain In my humble opinion, it is the responsability of the GL developer to optimize their slots assignment in SSO with the help of GL_ARB_enhanced_layouts Signed-off-by: Gregory Hainaut--- src/glsl/link_varyings.cpp | 36 ++-- 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp index 6d6e9b5..31efee7 100644 --- a/src/glsl/link_varyings.cpp +++ b/src/glsl/link_varyings.cpp @@ -766,7 +766,7 @@ public: gl_shader_stage consumer_stage); ~varying_matches(); void record(ir_variable *producer_var, ir_variable *consumer_var); - unsigned assign_locations(uint64_t reserved_slots); + unsigned assign_locations(uint64_t reserved_slots, bool separate_shader); void store_locations() const; private: @@ -986,11 +986,34 @@ varying_matches::record(ir_variable *producer_var, ir_variable *consumer_var) * passed to varying_matches::record(). */ unsigned -varying_matches::assign_locations(uint64_t reserved_slots) +varying_matches::assign_locations(uint64_t reserved_slots, bool separate_shader) { - /* Sort varying matches into an order that makes them easy to pack. */ - qsort(this->matches, this->num_matches, sizeof(*this->matches), - _matches::match_comparator); + /* Disable the varying sorting for separate shader program +* 1/ All programs must sort the code in the same order to guarantee the +*interface matching. However varying_matches::record() will change the +*interpolation qualifier of some stages. +* +* 2/ GLSL version 4.50 removes the matching constrain on the interpolation +*qualifier. +* +* Chapter 4.5 of GLSL 4.40: +*"The type and presence of interpolation qualifiers of variables with +*the same name declared in all linked shaders for the same cross-stage +*interface must match, otherwise the link command will fail. +* +*When comparing an output from one stage to an input of a subsequent +*stage, the input and output don't match if their interpolation +*qualifiers (or lack thereof) are not the same." +* +* Chapter 4.5 of GLSL 4.50: +*"It is a link-time error if, within the same stage, the interpolation +*qualifiers of variables of the same name do not match." +*/ + if (!separate_shader) { + /* Sort varying matches into an order that makes them easy to pack. */ + qsort(this->matches, this->num_matches, sizeof(*this->matches), +_matches::match_comparator); + } unsigned generic_location = 0; unsigned generic_patch_location = MAX_VARYING*4; @@ -1590,7 +1613,8 @@ assign_varying_locations(struct gl_context *ctx, reserved_varying_slot(producer, ir_var_shader_out) | reserved_varying_slot(consumer, ir_var_shader_in); - const unsigned slots_used = matches.assign_locations(reserved_slots); + const unsigned slots_used = matches.assign_locations(reserved_slots, +prog->SeparateShader); matches.store_locations(); for (unsigned i = 0; i < num_tfeedback_decls; ++i) { -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] llvmpipe broken on Skylake Pentium (LP_NATIVE_VECTOR_WIDTH=128)
On Mon, 2015-10-12 at 22:55 +0200, Roland Scheidegger wrote: > I don't know that looks like a generic string you're getting back. > x86-64 IIRC implies sse2 in llvm, but not the other sseX flags (and if > we detected sse41 we're going to use intrinsics for these, which then > may not be available in llvm _potentially_ (I'm really not sure here as > this code changed, and mesa generally is only adapted to such changes > once it breaks..., but the string here will be something less generic > like "ivybridge" usually on real hw). This was the clue I needed, SSE4.1 intrinsics die because llvm doesn't think they're legal for the target, because llvm 3.6 doesn't know about broadwell or skylake so they just appear as generic. Adding their model numbers to the Haswell path seems to do the trick (since apparently Haswell Pentium has the same property re AVX). - ajax ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] llvmpipe broken on Skylake Pentium (LP_NATIVE_VECTOR_WIDTH=128)
Am 13.10.2015 um 19:59 schrieb Adam Jackson: > On Mon, 2015-10-12 at 22:55 +0200, Roland Scheidegger wrote: > >> I don't know that looks like a generic string you're getting back. >> x86-64 IIRC implies sse2 in llvm, but not the other sseX flags (and if >> we detected sse41 we're going to use intrinsics for these, which then >> may not be available in llvm _potentially_ (I'm really not sure here as >> this code changed, and mesa generally is only adapted to such changes >> once it breaks..., but the string here will be something less generic >> like "ivybridge" usually on real hw). > > This was the clue I needed, SSE4.1 intrinsics die because llvm doesn't > think they're legal for the target, because llvm 3.6 doesn't know about > broadwell or skylake so they just appear as generic. Adding their > model numbers to the Haswell path seems to do the trick (since > apparently Haswell Pentium has the same property re AVX). > > - ajax > I think though it's still possible to tell llvm that these are legal by setting mattr's, so they'd work regardless the cpu detected. So we could/should probably fix this in mesa too. Albeit as mentioned the mechanism there changed somewhat over time in llvm. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] glsl: Support uint index in lower_vector_insert
The ES31-CTS.compute_shader.pipeline-compute-chain test case was generating an unsigned index by using gl_LocalInvocationID.x and gl_LocalInvocationID.y as array indices. Signed-off-by: Jordan Justen--- src/glsl/lower_vector_insert.cpp | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/glsl/lower_vector_insert.cpp b/src/glsl/lower_vector_insert.cpp index 6d7cfa9..26d31b0 100644 --- a/src/glsl/lower_vector_insert.cpp +++ b/src/glsl/lower_vector_insert.cpp @@ -108,9 +108,13 @@ vector_insert_visitor::handle_rvalue(ir_rvalue **rv) factory.emit(assign(temp, expr->operands[0])); factory.emit(assign(src_temp, expr->operands[1])); + assert(expr->operands[2]->type == glsl_type::int_type || + expr->operands[2]->type == glsl_type::uint_type); + for (unsigned i = 0; i < expr->type->vector_elements; i++) { ir_constant *const cmp_index = -new(factory.mem_ctx) ir_constant(int(i)); +ir_constant::zero(factory.mem_ctx, expr->operands[2]->type); + cmp_index->value.u[0] = i; ir_variable *const cmp_result = factory.make_temp(glsl_type::bool_type, "index_condition"); -- 2.5.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Restore compute shader support in brw_nir_lower_inputs
The commit shown below caused compute shaders to hit the unreachable in the default of the switch block. Restore compute shaders to use the fragment shader path. Also, simplify the fragment/compute path to only support scalar mode. commit 2953c3d76178d7589947e6ea1dbd902b7b02b3d4 Author: Kenneth GraunkeDate: Fri Aug 14 15:15:11 2015 -0700 i965/vs: Map scalar VS input locations properly; avoid tons of MOVs. Signed-off-by: Jordan Justen Cc: Kenneth Graunke --- src/mesa/drivers/dri/i965/brw_nir.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index 4f35d81..357ee4f 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -96,8 +96,10 @@ brw_nir_lower_inputs(nir_shader *nir, bool is_scalar) } break; case MESA_SHADER_FRAGMENT: + case MESA_SHADER_COMPUTE: + assert(is_scalar); nir_assign_var_locations(>inputs, >num_inputs, - is_scalar ? type_size_scalar : type_size_vec4); + type_size_scalar); break; default: unreachable("unsupported shader stage"); -- 2.5.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] glsl: Support uint index in do_vec_index_to_cond_assign
The ES31-CTS.compute_shader.pipeline-compute-chain test case was generating an unsigned index by using gl_LocalInvocationID.x and gl_LocalInvocationID.y as array indices. Signed-off-by: Jordan Justen--- src/glsl/lower_vec_index_to_cond_assign.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/glsl/lower_vec_index_to_cond_assign.cpp b/src/glsl/lower_vec_index_to_cond_assign.cpp index 0c3394a..b623882 100644 --- a/src/glsl/lower_vec_index_to_cond_assign.cpp +++ b/src/glsl/lower_vec_index_to_cond_assign.cpp @@ -88,7 +88,9 @@ ir_vec_index_to_cond_assign_visitor::convert_vec_index_to_cond_assign(void *mem_ exec_list list; /* Store the index to a temporary to avoid reusing its tree. */ - index = new(base_ir) ir_variable(glsl_type::int_type, + assert(orig_index->type == glsl_type::int_type || + orig_index->type == glsl_type::uint_type); + index = new(base_ir) ir_variable(orig_index->type, "vec_index_tmp_i", ir_var_temporary); list.push_tail(index); -- 2.5.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: Fix variable_referenced() for vector_{extract, insert} expressions
On Mon, Oct 05, 2015 at 11:42:43AM +0200, Iago Toral Quiroga wrote: > We get these when we operate on vector variables with array accessors > (i.e. things like a[0] where 'a' is a vec4). When we call > variable_referenced() > on these expressions we want to return a reference to 'a' instead of NULL. > > This fixes a problem where we pass a[0] as the first argument to an atomic > SSBO function that expects a buffer variable. In order to check this, we use > variable_referenced(), but that is currently returning NULL in this case, > since > the underlying rvalue is a vector_extract expression. Reviewed-by: Kristian Høgsberg> --- > src/glsl/ir.cpp | 16 > src/glsl/ir.h | 2 ++ > 2 files changed, 18 insertions(+) > > diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp > index 2c45b9e..4c22843 100644 > --- a/src/glsl/ir.cpp > +++ b/src/glsl/ir.cpp > @@ -662,6 +662,22 @@ ir_expression::get_operator(const char *str) > return (ir_expression_operation) -1; > } > > +ir_variable * > +ir_expression::variable_referenced() const > +{ > + switch (operation) { > + case ir_binop_vector_extract: > + case ir_triop_vector_insert: > + /* We get these for things like a[0] where a is a vector type. In > these > + * cases we want variable_referenced() to return the actual vector > + * variable this is wrapping. > + */ > + return operands[0]->variable_referenced(); > + default: > + return ir_rvalue::variable_referenced(); > + } > +} > + > ir_constant::ir_constant() > : ir_rvalue(ir_type_constant) > { > diff --git a/src/glsl/ir.h b/src/glsl/ir.h > index 43a2bf0..9c9f22d 100644 > --- a/src/glsl/ir.h > +++ b/src/glsl/ir.h > @@ -1731,6 +1731,8 @@ public: > > virtual ir_visitor_status accept(ir_hierarchical_visitor *); > > + virtual ir_variable *variable_referenced() const; > + > ir_expression_operation operation; > ir_rvalue *operands[4]; > }; > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] i965/vec4: adding vec4_cmod_propagation optimization
On Tue, Oct 13, 2015 at 1:49 AM, Alejandro Piñeirowrote: > On 13/10/15 03:10, Matt Turner wrote: >> Looks like this is causing an intermittent failure on HSW in our >> Jenkins system (but I'm not able to reproduce locally) and a >> consistent failure on G45. I'll investigate that. > > Ok, will hold on, and meanwhile I will try to reproduce the problem on > HSW. Unfortunately I don't have any G45 available, so I will not be able > to help on that. Thanks for taking a look there. Okay, I think I see what's going on: --- good2015-10-13 13:34:40.753357253 -0700 +++ bad 2015-10-13 13:36:06.114290094 -0700 -and(8) g6<1>.xDg6<4,4,1>.xD1D { align16 }; -cmp.nz.f0(8)null-g6<4,4,1>.xD 0D { align16 }; +and.nz.f0(8)g6<1>.xDg6<4,4,1>.xD1D { align16 }; We're propagating a .nz conditional mod from a CMP with a null dest (that has a writemask of .xyzw) to an AND that has a writemask of only .x, so only the .x channel of the flag is now being updated. I think for now the thing to do is add (inst->dst.writemask & ~scan_inst->dst.writemask) != 0) to the list of conditions under which we bail out. That is, if the instruction we want to propagate the cmod onto writes fewer channels, we can't do the optimization. With that, the block should look like: if ((scan_inst->predicate && scan_inst->opcode != BRW_OPCODE_SEL) || scan_inst->dst.reg_offset != inst->src[0].reg_offset || (scan_inst->dst.writemask != WRITEMASK_X && scan_inst->dst.writemask != WRITEMASK_XYZW) || (scan_inst->dst.writemask == WRITEMASK_XYZW && inst->src[0].swizzle != BRW_SWIZZLE_XYZW) || (inst->dst.writemask & ~scan_inst->dst.writemask) != 0) break; The good news is that, unless I've done something wrong, this doesn't affect any shaders in shader-db on ILK or HSW (I only tested those two, but I expect the results are the same everywhere). Apparently this is a pretty rare case. With that change, my R-b still stands, though we should have a unit test for this case as well in 3/5. Thanks! Matt ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)
On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierrywrote: > On 10/13/2015 3:13 PM, Emil Velikov wrote: >> >> On 13 October 2015 at 13:16, Michel Thierry >> wrote: >>> >>> On 10/6/2015 2:12 PM, Michel Thierry wrote: On 10/5/2015 7:06 PM, Kristian Høgsberg wrote: > > > On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry > wrote: >> >> >> On 9/14/2015 2:54 PM, Michał Winiarski wrote: >>> >>> >>> >>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote: Gen8+ supports 48-bit virtual addresses, but some objects must always be allocated inside the 32-bit address range. In specific, any resource used with flat/heapless (0x-0xf000) General State Heap (GSH) or Instruction State Heap (ISH) must be in a 32-bit range, because the General State Offset and Instruction State Offset are limited to 32-bits. The i915 driver has been modified to provide a flag to set when the 4GB limit is not necessary in a given bo (EXEC_OBJECT_SUPPORTS_48B_ADDRESS). 48-bit range will only be used when explicitly requested. Callers to the existing drm_intel_bo_emit_reloc function should set the use_48b_address_range flag beforehand, in order to use full ppgtt range. v2: Make set/clear functions nops on pre-gen8 platforms, and use them internally in emit_reloc functions (Ben) s/48BADDRESS/48B_ADDRESS/ (Dave) v3: Keep set/clear functions internal, no-one needs to use them directly. v4: Don't set 48bit-support flag in emit reloc, check for ppgtt type before enabling set/clear function, print full offsets in debug statements, using port of lower_32_bits and upper_32_bits from linux kernel (Michał) References: http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html Cc: Ben Widawsky Cc: Michał Winiarski >>> >>> >>> >>> >>> +Kristian >>> >>> LGTM. >>> Acked-by: Michał Winiarski >>> Signed-off-by: Michel Thierry >> >> >> >> >> >> Hi Kristian, >> >> More comments on this? >> I've resent the mesa patch with the last changes >> >> >> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html). >> >> >> Michał has agreed with the interface and will use it for his work. >> If mesa doesn't plan to use this for now, it's ok. The kernel changes >> have >> been done to prevent any regressions when the support 48-bit flag is >> not >> set. > > > > I didn't get any replies to my last comments on this: > > http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html > > Basically, I tried explicitly placing buffers above 8G and didn't see > the HW problem described (ie it all worked fine). I still think > there's some confusion as to what the W/A is about. > > Here's my take: the W/A is a SW-only workaround to handle the cases > where a driver uses heapless and 48-bit ppgtt. The problem is that the > heaps are limited to 4G but with 48bit ppgtt a buffer can be placed > anywhere it the 48 bit address space. If that happens it's consideredd > out-of-bounds for the heap and access fails. To prevent this we need > to make sure the bos we address in a heapless fashion are placed below > 4g. > > For mesa, we only configure general state base as heap-less, which > affects scratch bos. What this boils down to is that we need the 4G > restriction only for the scratch bos set up on 3DSTATE_VS, 3DSTATE_GS > and 3DSTATE_PS (and tesselation stage eventually). Look for the > OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c, gen8_gs_state.c > and gen8_ps_state.c I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it isn't exclusive to the scratch bos (the error state points to that instruction). >>> >>> Hi, >>> >>> Anymore inputs about this patch? >>> AFAIK, if changes are required based on further comments from Kristian, >>> these will be in the mesa patch[1], not libdrm. Also, Michał will use >>> this >>> flag in another project. >>> >> The comment seems quite brief and I'm not sure it fully addresses >> Kristian's concern. I'd suspect that providing reference to the HW >> documentation (confirming your assumption) might be beneficial. >> > > Sure, attached is the hang I get if
Re: [Mesa-dev] [PATCH v2] nir: split SSBO min/max atomic instrinsics into signed/unsigned versions
On Tuesday, September 29, 2015 09:25:27 AM Iago Toral Quiroga wrote: > NIR is typeless so this is the only way to keep track of the > type to select the proper atomic to use. > > v2: > - Use imin,imax,umin,umax for the intrinsic names (Connor Abbott) > - Change message for unreachable paths (Michael Schellenberger) > --- > src/glsl/nir/glsl_to_nir.cpp | 22 ++ > src/glsl/nir/nir_intrinsics.h | 6 -- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 20 ++-- > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 +++--- > 4 files changed, 43 insertions(+), 27 deletions(-) Thanks, Iago! This looks good to me. Reviewed-by: Kenneth Graunkesignature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 6/7] radeonsi: Add DCC decompress.
On Tue, Oct 13, 2015 at 2:19 AM, Marek Olšákwrote: > On Mon, Oct 12, 2015 at 5:05 PM, Bas Nieuwenhuizen > wrote: >> Hi Marek, >> >> Thanks for the quick review. >> >> I do not think I understand the sharing semantics. We currently have >> fast clear for scanout surfaces with the CMASK and eliminate it on >> flush resource. I would think we could do that similarly with DCC fast >> clear. Both require a flush_resource after modifying the resource >> before other applications can use it. > > I meant not using fast clear for shared textures, so flush_resource > would be a no-op. We can't use fast clear for shared textures, because > we can't communicate clear color changes between processes. > >> >> Furthermore, if we disable DCC for image stores, we also need to >> communicate that. We could leave DCC enabled for sampling as long as >> the DCC buffer stays in decompressed state. But we would need to >> communicate that DCC should not be used anymore for rendering. > > Image stores are the only big problem preventing DCC sharing. If we > share textures with DCC enabled, we can't disable it, which means we > have to decompress before image stores are used, which can result in > performance that is worse than without DCC. I've been thinking about this and I propose the following solution: 1) If a texture isn't shared, decompress and disable DCC permanently for that texture when we get a shader with an image store. 2) If a texture is shared, decompress before it's used by image stores, but don't disable DCC. This case shouldn't occur and we should avoid using image stores for 2D acceleration. A side note: DCC and SDMA are mutually exclusive, so we must choose carefully which one we're going to use for 2D acceleration. I'll leave that decision to Michel. If we decide to use DCC for 2D accel, SDMA image blit support will be pretty useless on VI. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965: Don't hardcode FS in "validation failed!" message.
Instead, print "Scalar VS" or "Scalar FS". Otherwise it's really confusing which stage is broken. Signed-off-by: Kenneth Graunke--- src/mesa/drivers/dri/i965/brw_fs_validate.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp index d0e04f3..814c551 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp @@ -32,7 +32,7 @@ #define fsv_assert(cond) \ if (!(cond)) { \ - fprintf(stderr, "ASSERT: FS validation failed!\n"); \ + fprintf(stderr, "ASSERT: Scalar %s validation failed!\n", stage_abbrev); \ dump_instruction(inst, stderr); \ fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, #cond); \ abort(); \ -- 2.6.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/4] Implement separate index spaces for UBOs and SSBOs
On Fri, Oct 09, 2015 at 03:23:33PM +0200, Iago Toral Quiroga wrote: > See the rationale for this in [1], no piglit regressions observed in my > IvyBridge laptop. > > Patch 1: Renames {Num}UniformBlocks to {Num}BufferInterfaceBlocks. This is > more consistent with the current implementation, since right now > UniformBlocks contains both UBOs and SSBOs. > > Patch 2: Adds separate UniformBlocks and ShaderStorageBlocks arrays. > > Patch 3: Changes lower_ubo_reference pass to use UniformBlocks and > ShaderStorageBlocks instead of BufferInterfaceBlocks so we get > block indices in separate index spaces. > > Patch 4: Changes i965 to work with separate index spaces. > > I think there are other places that use BufferInterfaceBlocks after > linking that can be rewritten to use UniformBlocks or ShaderStorageBlocks > more efficiently. I'd like to tackle that too, but I think it makes sense > to land this first I think. Sure, that sounds good. I wasn't sure about the name 'BufferInterfaceBlocks', but thinking about it, I think it's just right. They're interface blocks that are backed by buffers (as opposed to input or output interface blocks). Series Reviewed-by: Kristian Høgsberg> [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/095951.html > > Iago Toral Quiroga (4): > mesa: Rename {Num}UniformBlocks to {Num}BufferInterfaceBlocks > mesa: Add {Num}UniformBlocks and {Num}ShaderStorageBlocks to > gl_shader{_program} > glsl/lower_ubo_reference: lower UBOs and SSBOs to separate index > spaces > i965: Adapt SSBOs to work with their own separate index space > > src/glsl/link_uniform_initializers.cpp | 4 +- > src/glsl/link_uniforms.cpp | 22 ++--- > src/glsl/linker.cpp | 105 > ++- > src/glsl/lower_ubo_reference.cpp | 22 +++-- > src/glsl/nir/glsl_to_nir.cpp | 2 +- > src/glsl/standalone_scaffolding.cpp | 11 ++- > src/mesa/drivers/dri/i965/brw_context.h | 4 +- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 24 +++--- > src/mesa/drivers/dri/i965/brw_shader.cpp | 7 +- > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24 +++--- > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 79 + > src/mesa/main/mtypes.h | 53 +++- > src/mesa/main/shader_query.cpp | 4 +- > src/mesa/main/shaderapi.c| 4 +- > src/mesa/main/shaderobj.c| 4 +- > src/mesa/main/uniforms.c | 12 +-- > src/mesa/state_tracker/st_atom_constbuf.c| 4 +- > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 +- > 18 files changed, 257 insertions(+), 132 deletions(-) > > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Don't hardcode FS in "validation failed!" message.
On Tue, Oct 13, 2015 at 3:17 PM, Kenneth Graunkewrote: > Instead, print "Scalar VS" or "Scalar FS". Otherwise it's really > confusing which stage is broken. > > Signed-off-by: Kenneth Graunke Reviewed-by: Kristian Høgsberg > --- > src/mesa/drivers/dri/i965/brw_fs_validate.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp > b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp > index d0e04f3..814c551 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_validate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_validate.cpp > @@ -32,7 +32,7 @@ > > #define fsv_assert(cond) \ > if (!(cond)) { \ > - fprintf(stderr, "ASSERT: FS validation failed!\n"); \ > + fprintf(stderr, "ASSERT: Scalar %s validation failed!\n", > stage_abbrev); \ >dump_instruction(inst, stderr); \ >fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, #cond); \ >abort(); \ > -- > 2.6.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] nir: Add helpers for creating variables and adding them to lists
On Mon, 2015-10-12 at 07:47 -0700, Jason Ekstrand wrote: > > On Oct 12, 2015 1:26 AM, "Iago Toral"wrote: > > > > On Fri, 2015-10-09 at 07:09 -0700, Jason Ekstrand wrote: > > > --- > > > src/glsl/nir/glsl_to_nir.cpp | 40 - > > > src/glsl/nir/nir.c | 66 > ++ > > > src/glsl/nir/nir.h | 20 + > > > src/mesa/program/prog_to_nir.c | 19 +--- > > > 4 files changed, 99 insertions(+), 46 deletions(-) > > > > > > diff --git a/src/glsl/nir/glsl_to_nir.cpp > b/src/glsl/nir/glsl_to_nir.cpp > > > index 874e361..5250080 100644 > > > --- a/src/glsl/nir/glsl_to_nir.cpp > > > +++ b/src/glsl/nir/glsl_to_nir.cpp > > > @@ -389,35 +389,10 @@ nir_visitor::visit(ir_variable *ir) > > > > > > var->interface_type = ir->get_interface_type(); > > > > > > - switch (var->data.mode) { > > > - case nir_var_local: > > > - exec_list_push_tail(>locals, >node); > > > - break; > > > - > > > - case nir_var_global: > > > - exec_list_push_tail(>globals, >node); > > > - break; > > > - > > > - case nir_var_shader_in: > > > - exec_list_push_tail(>inputs, >node); > > > - break; > > > - > > > - case nir_var_shader_out: > > > - exec_list_push_tail(>outputs, >node); > > > - break; > > > - > > > - case nir_var_uniform: > > > - case nir_var_shader_storage: > > > - exec_list_push_tail(>uniforms, >node); > > > - break; > > > - > > > - case nir_var_system_value: > > > - exec_list_push_tail(>system_values, >node); > > > - break; > > > - > > > - default: > > > - unreachable("not reached"); > > > - } > > > + if (var->data.mode == nir_var_local) > > > + nir_function_impl_add_variable(impl, var); > > > + else > > > + nir_shader_add_variable(shader, var); > > > > > > _mesa_hash_table_insert(var_table, ir, var); > > > this->var = var; > > > @@ -2023,13 +1998,10 @@ nir_visitor::visit(ir_constant *ir) > > > * constant initializer and return a dereference. > > > */ > > > > > > - nir_variable *var = ralloc(this->shader, nir_variable); > > > - var->name = ralloc_strdup(var, "const_temp"); > > > - var->type = ir->type; > > > - var->data.mode = nir_var_local; > > > + nir_variable *var = > > > + nir_local_variable_create(this->impl, ir->type, > "const_temp"); > > > var->data.read_only = true; > > > var->constant_initializer = constant_copy(ir, var); > > > - exec_list_push_tail(>impl->locals, >node); > > > > > > this->deref_head = nir_deref_var_create(this->shader, var); > > > this->deref_tail = >deref_head->deref; > > > diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c > > > index e12da80..3645726 100644 > > > --- a/src/glsl/nir/nir.c > > > +++ b/src/glsl/nir/nir.c > > > @@ -103,6 +103,72 @@ nir_reg_remove(nir_register *reg) > > > exec_node_remove(>node); > > > } > > > > > > +void > > > +nir_shader_add_variable(nir_shader *shader, nir_variable *var) > > > +{ > > > + switch (var->data.mode) { > > > + case nir_var_local: > > > + assert(!"nir_shader_add_variable cannot be used for local > variables"); > > > + break; > > > + > > > + case nir_var_global: > > > + exec_list_push_tail(>globals, >node); > > > + break; > > > + > > > + case nir_var_shader_in: > > > + exec_list_push_tail(>inputs, >node); > > > + break; > > > + > > > + case nir_var_shader_out: > > > + exec_list_push_tail(>outputs, >node); > > > + break; > > > + > > > + case nir_var_uniform: > > > + case nir_var_shader_storage: > > > + exec_list_push_tail(>uniforms, >node); > > > + break; > > > + > > > + case nir_var_system_value: > > > + exec_list_push_tail(>system_values, >node); > > > + break; > > > + } > > > +} > > > + > > > +nir_variable * > > > +nir_variable_create(nir_shader *shader, nir_variable_mode mode, > > > +const struct glsl_type *type, const char > *name) > > > +{ > > > + nir_variable *var = rzalloc(shader, nir_variable); > > > + var->name = ralloc_strdup(var, name); > > > + var->type = type; > > > + var->data.mode = mode; > > > + > > > + if ((mode == nir_var_shader_in && shader->stage != > MESA_SHADER_VERTEX) || > > > + (mode == nir_var_shader_out && shader->stage != > MESA_SHADER_FRAGMENT)) > > > + var->data.interpolation = INTERP_QUALIFIER_SMOOTH; > > > + > > > + if (mode == nir_var_shader_in || mode == nir_var_uniform) > > > + var->data.read_only = true; > > > + > > > + nir_shader_add_variable(shader, var); > > > + > > > + return var; > > > +} > > > + > > > +nir_variable * > > > +nir_local_variable_create(nir_function_impl *impl, > > > + const struct glsl_type *type, const > char *name) > > > +{ > > > + nir_variable *var = rzalloc(impl->overload->function->shader, > nir_variable); > > > + var->name = ralloc_strdup(var, name); > > > + var->type = type;