[Mesa-dev] [PATCH] i965/clip: Fix brw_clip_unfilled.c/compute_offset's assembly.
Due to the destination register width of 1 or 2, these instructions get
ExecSize 1 or 2. But dir and offset (used as src0) are both registers of
width 4, violating the execsize >= width assertion.

I honestly don't think this could have ever worked.

Fixes Piglit's polygon-offset and polygon-mode-offset tests on Gen4-5.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70441
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_clip_unfilled.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Only tested on Ironlake. No Piglit regressions. Two fixes.

diff --git a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
index 5104276..82d7b64 100644
--- a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
+++ b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
@@ -198,7 +198,7 @@ static void compute_offset( struct brw_clip_compile *c )
    struct brw_reg dir = c->reg.dir;

    brw_math_invert(p, get_element(off, 2), get_element(dir, 2));
-   brw_MUL(p, vec2(off), dir, get_element(off, 2));
+   brw_MUL(p, vec2(off), vec2(dir), get_element(off, 2));

    brw_CMP(p,
            vec1(brw_null_reg()),
@@ -210,8 +210,8 @@ static void compute_offset( struct brw_clip_compile *c )
            brw_abs(get_element(off, 0)),
            brw_abs(get_element(off, 1)));
    brw_inst_set_pred_control(brw, brw_last_inst, BRW_PREDICATE_NORMAL);
-   brw_MUL(p, vec1(off), off, brw_imm_f(c->key.offset_factor));
-   brw_ADD(p, vec1(off), off, brw_imm_f(c->key.offset_units));
+   brw_MUL(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_factor));
+   brw_ADD(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_units));
 }
--
1.9.2

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/clip: Fix brw_clip_unfilled.c/compute_offset's assembly.
Reviewed-by: Chris Forbes chr...@ijw.co.nz

On Wed, Aug 6, 2014 at 6:57 PM, Kenneth Graunke kenn...@whitecape.org wrote:
> Due to the destination register width of 1 or 2, these instructions get
> ExecSize 1 or 2. But dir and offset (used as src0) are both registers of
> width 4, violating the execsize >= width assertion.
>
> I honestly don't think this could have ever worked.
>
> Fixes Piglit's polygon-offset and polygon-mode-offset tests on Gen4-5.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70441
> Signed-off-by: Kenneth Graunke kenn...@whitecape.org
> ---
>  src/mesa/drivers/dri/i965/brw_clip_unfilled.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> Only tested on Ironlake. No Piglit regressions. Two fixes.
>
> diff --git a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
> index 5104276..82d7b64 100644
> --- a/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
> +++ b/src/mesa/drivers/dri/i965/brw_clip_unfilled.c
> @@ -198,7 +198,7 @@ static void compute_offset( struct brw_clip_compile *c )
>     struct brw_reg dir = c->reg.dir;
>
>     brw_math_invert(p, get_element(off, 2), get_element(dir, 2));
> -   brw_MUL(p, vec2(off), dir, get_element(off, 2));
> +   brw_MUL(p, vec2(off), vec2(dir), get_element(off, 2));
>
>     brw_CMP(p,
>             vec1(brw_null_reg()),
> @@ -210,8 +210,8 @@ static void compute_offset( struct brw_clip_compile *c )
>             brw_abs(get_element(off, 0)),
>             brw_abs(get_element(off, 1)));
>     brw_inst_set_pred_control(brw, brw_last_inst, BRW_PREDICATE_NORMAL);
> -   brw_MUL(p, vec1(off), off, brw_imm_f(c->key.offset_factor));
> -   brw_ADD(p, vec1(off), off, brw_imm_f(c->key.offset_units));
> +   brw_MUL(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_factor));
> +   brw_ADD(p, vec1(off), vec1(off), brw_imm_f(c->key.offset_units));
>  }
> --
> 1.9.2
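For readers who do not have the i965 EU emitter in their head, the rule this patch restores can be modelled in a few lines of C. This is only a sketch: `toy_reg`, `toy_vec1`, `toy_vec2`, `toy_exec_size` and `toy_src_ok` are hypothetical stand-ins, not the driver's `struct brw_reg` or `vec1()`/`vec2()` helpers, and writing the check as "execsize >= source width" is an assumption about the exact form of the assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct brw_reg: only the region width matters here. */
struct toy_reg {
    unsigned width; /* number of channels the region spans, e.g. 1, 2 or 4 */
};

/* Hypothetical analogues of vec1()/vec2(): same register, narrower region. */
static struct toy_reg toy_vec1(struct toy_reg r) { r.width = 1; return r; }
static struct toy_reg toy_vec2(struct toy_reg r) { r.width = 2; return r; }

/* Per the commit message, the execution size follows the destination width. */
static unsigned toy_exec_size(struct toy_reg dst) { return dst.width; }

/* Assumed form of the rule: a source region must not be wider than the
 * execution size of the instruction that reads it. */
static bool toy_src_ok(struct toy_reg dst, struct toy_reg src)
{
    return toy_exec_size(dst) >= src.width;
}
```

With `off` and `dir` both of width 4, `toy_src_ok(toy_vec2(off), dir)` is false (the old `brw_MUL` call) and `toy_src_ok(toy_vec2(off), toy_vec2(dir))` is true (the fixed call), which is the whole of the change.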
Re: [Mesa-dev] [PATCH v3 04/19] i965/gen6: Adjust render height in errata case for MSAA
On Friday, August 01, 2014 12:53:34 AM Jordan Justen wrote:
> In the gen6 PRM Volume 1 Part 1: Graphics Core, Section 7.18.3.7.1
> (Surface Arrays For all surfaces other than separate stencil buffer):
>
>  [DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the value
>   calculated in the equation above , for every other odd Surface Height
>   starting from 1 i.e. 1,5,9,13
>
> Since this Qpitch errata only impacts the sampler, we have to adjust the
> input for the rendering surface to achieve the same qpitch. For the
> affected heights, we increment the height by 1 for the rendering
> surface.
>
> Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
> ---
>  src/mesa/drivers/dri/i965/gen6_surface_state.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_surface_state.c b/src/mesa/drivers/dri/i965/gen6_surface_state.c
> index db58de9..141ca6f 100644
> --- a/src/mesa/drivers/dri/i965/gen6_surface_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_surface_state.c
> @@ -96,8 +96,24 @@ gen6_update_renderbuffer_surface(struct brw_context *brw,
>     /* reloc */
>     surf[1] = mt->bo->offset64;
>
> +   /* In the gen6 PRM Volume 1 Part 1: Graphics Core, Section 7.18.3.7.1
> +    * (Surface Arrays For all surfaces other than separate stencil buffer):
> +    *
> +    * [DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the value
> +    *  calculated in the equation above , for every other odd Surface Height
> +    *  starting from 1 i.e. 1,5,9,13
> +    *
> +    * Since this Qpitch errata only impacts the sampler, we have to adjust the
> +    * input for the rendering surface to achieve the same qpitch. For the
> +    * affected heights, we increment the height by 1 for the rendering
> +    * surface.
> +    */
> +   int height0 = irb->mt->logical_height0;
> +   if (brw->gen == 6 && irb->mt->num_samples > 1 && (height0 % 4) == 1)
> +      height0++;
> +
>     surf[2] = SET_FIELD(mt->logical_width0 - 1, BRW_SURFACE_WIDTH) |
> -             SET_FIELD(mt->logical_height0 - 1, BRW_SURFACE_HEIGHT) |
> +             SET_FIELD(height0 - 1, BRW_SURFACE_HEIGHT) |
>               SET_FIELD(irb->mt_level - irb->mt->first_level, BRW_SURFACE_LOD);
>
>     surf[3] = brw_get_surface_tiling_bits(mt->tiling) |

FWIW, I believe this code is correct after all. I worked through a lot of
math to show the effects this has on QPitch, and it originally didn't work
out. It turns out our QPitch computation in brw_tex_layout.c is wrong; once
I did the correct QPitch computation, it worked out. I'm going to write
patches to fix that.

So this gets a:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org

But please wait on the series for a bit - I'd like to look over the rest,
and see how my qpitch fixes affect things. I'll also post my math
demonstrating why this does the right thing.

Sorry for the hold up... Really nice work tracking this down.
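The height adjustment in the quoted patch is small enough to pull out and check by hand. The helper below is a hypothetical, self-contained restatement of that condition, not code from gen6_surface_state.c; in particular the `num_samples > 1` comparison is an assumption about what the patch tests for "MSAA".

```c
#include <assert.h>

/* Hypothetical helper mirroring the check in the quoted patch: on gen6,
 * for multisampled surfaces whose height is 1, 5, 9, 13, ... the render
 * surface height is bumped by one so that the renderer ends up with the
 * same QPitch as the sampler. */
static int toy_render_height(int gen, int num_samples, int logical_height0)
{
    assert(logical_height0 > 0);

    int height0 = logical_height0;
    if (gen == 6 && num_samples > 1 && (height0 % 4) == 1)
        height0++;
    return height0;
}
```

For example, a 4x MSAA surface of height 5 on gen6 would be programmed as height 6, while heights 3 or 7, single-sampled surfaces, and other generations are left alone.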
Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information
On 06.08.2014 03:08, Jason Ekstrand wrote:
> Module: Mesa
> Branch: master
> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>
> Author: Jason Ekstrand jason.ekstr...@intel.com
> Date: Thu Jul 10 23:59:42 2014 -0700
>
> mesa/formats: Add layout and swizzle information
>
> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>
> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
> Reviewed-by: Brian Paul bri...@vmware.com

As of this commit, ~20 depth/stencil related piglit tests have regressed
with the radeonsi driver compared to before your changes. See below for
an example failure of the draw-pixels test. That test is already broken
with the previous commits, each of them with slightly different failure
symptoms.

Mesa 10.3.0-devel implementation error: Unexpected inFormat GL_STENCIL_INDEX
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Probe stencil at (0, 0)
  Expected: 50
  Observed: 100
Probe color at (0,0)
  Expected: 1.00 0.00 0.00 1.00
  Observed: 0.00 0.00 0.00 1.00
Probe color at (0,0)
  Expected: 0.00 0.396078 0.00 1.00
  Observed: 0.00 0.384314 0.00 1.00
Probe color at (0,0)
  Expected: 0.00 0.00 0.396078 1.00
  Observed: 0.00 0.00 0.384314 1.00
Probe color at (0,0)
  Expected: 0.396078 0.427451 0.00 1.00
  Observed: 0.384314 0.415686 0.00 1.00
Probe color at (0,0)
  Expected: 0.396078 0.427451 0.458824 1.00
  Observed: 0.384314 0.415686 0.447059 1.00
Probe color at (0,0)
  Expected: 0.458824 0.427451 0.396078 1.00
  Observed: 0.447059 0.415686 0.384314 1.00
Probe color at (0,0)
  Expected: 0.458824 0.427451 0.396078 0.490196
  Observed: 0.447059 0.415686 0.384314 0.478431
Probe color at (0,0)
  Expected: 0.396078 0.396078 0.396078 0.427451
  Observed: 0.384314 0.384314 0.384314 0.415686
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
Mesa 10.3.0-devel implementation error: Unexpected inFormat 0x
Please report at https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
draw-pixels: ../../../src/mesa/main/pack.c:4669: _mesa_unpack_color_span_uint: Assertion `srcFormat == 0x1903 || srcFormat == 0x1904 || srcFormat == 0x1905 || srcFormat == 0x1906 || srcFormat == 0x1909 || srcFormat == 0x190A || srcFormat == 0x8049 || srcFormat == 0x8227 || srcFormat == 0x1907 || srcFormat == 0x80E0 || srcFormat == 0x1908 || srcFormat == 0x80E1 || srcFormat == 0x8000 || srcFormat == 0x8D94 || srcFormat == 0x8D95 || srcFormat == 0x8D96 || srcFormat == 0x8D97 || srcFormat == 0x8228 || srcFormat == 0x8D98 || srcFormat == 0x8D99 || srcFormat == 0x8D9A || srcFormat == 0x8D9B || srcFormat == 0x8D9C || srcFormat == 0x8D9D' failed.

--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information
On 06.08.2014 18:28, Michel Dänzer wrote:
> On 06.08.2014 03:08, Jason Ekstrand wrote:
>> Module: Mesa
>> Branch: master
>> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
>> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>>
>> Author: Jason Ekstrand jason.ekstr...@intel.com
>> Date: Thu Jul 10 23:59:42 2014 -0700
>>
>> mesa/formats: Add layout and swizzle information
>>
>> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>>
>> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
>> Reviewed-by: Brian Paul bri...@vmware.com
>
> As of this commit, ~20 depth/stencil related piglit tests have regressed
> with the radeonsi driver compared to before your changes. See below for
> an example failure of the draw-pixels test. That test is already broken
> with the previous commits, each of them with slightly different failure
> symptoms.

I meant to write: 'That test is already broken with the three previous
commits, [...]'

--
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi
On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>> From a gallium perspective, indirect temp regs are already working - so
>> something like MOV TEMP[0], TEMP[TEMP[1].x] should work. Indirect
>> registers are supported for inputs, outputs, temps, constants, and
>> immediates even, but the indirect reg itself must come from a temp or
>> address reg (I am not 100% certain where that restriction comes from). I
>> have no idea which drivers support it, all I can tell is that it works
>> with llvmpipe.
>> I sort of doubt it is supported for samplers right now in gallium though
>> technically it might be possible to express this already.
>
> Well, with my limited patch + ChrisF's small patches to mesa core, the
> dynamic sampler stuff works for nvc0, except for the issues I
> outlined. Not sure what you mean by supported in gallium. Perhaps I
> have an incorrect view of things, but I see gallium as an amorphous
> thing that we can change to our heart's content.
>
>> A cap bit for the ability to support dynamic indexing of shaders (plus
>> whatever is needed for making it work like declaration of sampler
>> arrays) would certainly be needed in any case. For drivers supporting
>
> Right... so it's not like shaders will start magically containing
> these things, it'll only happen if ARB_gs5 is enabled (probably via
> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
> supports whatever we're throwing at it.
>
>> this I would certainly expect them to allow temp regs as the indirect
>> reg. I guess it would be nice if we'd just use temp regs instead of
>> address reg in glsl to tgsi conversion if a driver supports it. I think
>> for modern drivers this makes a lot more sense than trying to shove
>> everything into address regs.
>
> Agreed. With the exception that I guess we also need to support
> indexing with float values? (i.e. ARL) This would have to be treated
> with some care. Not sure when that comes up though... perhaps only if
> !native_integers, which won't be an issue with any of the hw that
> we're talking about.

If you really want to lower ARL into a temp, I recommend using F2I,
which is equivalent in behavior. For UARL, MOV will do.

Also, I don't think GLSL sampler arrays have to be declared as arrays
in TGSI. Array declarations are really only needed for TEMPs, because
they allow better register allocation. Every other shader resource has
a fixed location and would not benefit from it.

If GLSL is strict about out-of-bounds access, I recommend always
clamping the index in glsl_to_tgsi.

Marek
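To make the clamping suggestion concrete, here is what "always clamp the index" amounts to for a dynamically indexed sampler array, written as plain C rather than as TGSI or glsl_to_tgsi code. `toy_clamp_sampler_index` is a hypothetical name; how the clamp would actually be emitted (e.g. as min/max instructions ahead of the indirect access) is left open here.

```c
#include <assert.h>

/* Hypothetical helper: force a possibly out-of-range dynamic index into
 * [0, array_size - 1] so the indirect access always hits a declared sampler. */
static unsigned toy_clamp_sampler_index(int index, unsigned array_size)
{
    assert(array_size > 0);

    if (index < 0)
        return 0;
    if ((unsigned)index >= array_size)
        return array_size - 1;
    return (unsigned)index;
}
```

With a 4-element sampler array, indices -1 and 7 would resolve to samplers 0 and 3 respectively, and in-range indices pass through unchanged.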
Re: [Mesa-dev] [PATCH] glsl: support unsigned increment in ir_loop controls
On 08/05/2014 05:41 PM, Michel Dänzer wrote:
> On 31.07.2014 15:05, Michel Dänzer wrote:
>> On 30.07.2014 20:11, Tapani Pälli wrote:
>>> Current version can create ir_expression where operands have
>>> different base type, patch adds support for unsigned type.
>>>
>>> Signed-off-by: Tapani Pälli tapani.pa...@intel.com
>>> https://bugs.freedesktop.org/show_bug.cgi?id=80880
>>
>> Tested-by: Michel Dänzer michel.daen...@amd.com
>
> Can this go in? This is the only remaining issue preventing the UE4
> demos from working on Gallium drivers.

I've been waiting for r-b, it's quite simple so if anyone has time ...

// Tapani

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: implement switch flow control using a loop
Patch removes old variable based logic for handling a break inside switch. Switch is put inside a loop so that existing infrastructure for loop flow control can be used for the switch, now also dead code elimination works properly. Possible 'continue' call inside a switch needs now special handling which is taken care of by detecting continue, breaking out and calling continue for the outside loop. Fixes following Piglit tests: fs-exec-after-break.shader_test fs-conditional-break.shader_test No Piglit or es3conform regressions. Signed-off-by: Tapani Pälli tapani.pa...@intel.com --- src/glsl/ast_to_hir.cpp | 101 +++--- src/glsl/glsl_parser_extras.h | 4 +- 2 files changed, 68 insertions(+), 37 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 30b02d0..4e3c48c 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -4366,7 +4366,7 @@ ast_jump_statement::hir(exec_list *instructions, * loop. */ if (state-loop_nesting_ast != NULL - mode == ast_continue) { + mode == ast_continue !state-switch_state.is_switch_innermost) { if (state-loop_nesting_ast-rest_expression) { state-loop_nesting_ast-rest_expression-hir(instructions, state); @@ -4378,19 +4378,27 @@ ast_jump_statement::hir(exec_list *instructions, } if (state-switch_state.is_switch_innermost + mode == ast_continue) { +/* Set 'continue_inside' to true. */ +ir_rvalue *const true_val = new (ctx) ir_constant(true); +ir_dereference_variable *deref_continue_inside_var = + new(ctx) ir_dereference_variable(state-switch_state.continue_inside); +instructions-push_tail(new(ctx) ir_assignment(deref_continue_inside_var, + true_val)); + +/* Break out from the switch, continue for the loop will + * be called right after switch. */ +ir_loop_jump *const jump = + new(ctx) ir_loop_jump(ir_loop_jump::jump_break); +instructions-push_tail(jump); + + } else if (state-switch_state.is_switch_innermost mode == ast_break) { -/* Force break out of switch by setting is_break switch state. 
- */ -ir_variable *const is_break_var = state-switch_state.is_break_var; -ir_dereference_variable *const deref_is_break_var = - new(ctx) ir_dereference_variable(is_break_var); -ir_constant *const true_val = new(ctx) ir_constant(true); -ir_assignment *const set_break_var = - new(ctx) ir_assignment(deref_is_break_var, true_val); - -instructions-push_tail(set_break_var); - } - else { +/* Force break out of switch by inserting a break. */ +ir_loop_jump *const jump = + new(ctx) ir_loop_jump(ir_loop_jump::jump_break); +instructions-push_tail(jump); + } else { ir_loop_jump *const jump = new(ctx) ir_loop_jump((mode == ast_break) ? ir_loop_jump::jump_break @@ -4502,19 +4510,19 @@ ast_switch_statement::hir(exec_list *instructions, instructions-push_tail(new(ctx) ir_assignment(deref_is_fallthru_var, is_fallthru_val)); - /* Initalize is_break state to false. + /* Initialize continue_inside state to false. */ - ir_rvalue *const is_break_val = new (ctx) ir_constant(false); - state-switch_state.is_break_var = + state-switch_state.continue_inside = new(ctx) ir_variable(glsl_type::bool_type, - switch_is_break_tmp, + continue_inside_tmp, ir_var_temporary); - instructions-push_tail(state-switch_state.is_break_var); + instructions-push_tail(state-switch_state.continue_inside); - ir_dereference_variable *deref_is_break_var = - new(ctx) ir_dereference_variable(state-switch_state.is_break_var); - instructions-push_tail(new(ctx) ir_assignment(deref_is_break_var, - is_break_val)); + ir_rvalue *const false_val = new (ctx) ir_constant(false); + ir_dereference_variable *deref_continue_inside_var = + new(ctx) ir_dereference_variable(state-switch_state.continue_inside); + instructions-push_tail(new(ctx) ir_assignment(deref_continue_inside_var, + false_val)); state-switch_state.run_default = new(ctx) ir_variable(glsl_type::bool_type, @@ -4522,13 +4530,46 @@ ast_switch_statement::hir(exec_list *instructions, ir_var_temporary); instructions-push_tail(state-switch_state.run_default);
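The lowering described in the commit message can be illustrated at the C level, which may make the role of `continue_inside` easier to see than the IR-building code does. This is only an analogy under stated assumptions: the two functions below are hypothetical, and the `do { ... } while (0)` wrapper stands in for the single-iteration loop the patch places around the switch body. A `break` in the switch becomes a `break` out of that wrapper; a `continue` sets the flag, breaks out, and the real `continue` is issued right after the wrapper.

```c
#include <assert.h>
#include <stdbool.h>

/* Source form: a switch with both break and continue inside a loop. */
static int toy_sum_source_form(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        switch (i % 3) {
        case 0:
            continue;       /* skips the "sum += 100" below */
        case 1:
            sum += 1;
            break;
        default:
            sum += 10;
            break;
        }
        sum += 100;
    }
    return sum;
}

/* Lowered form: the switch body runs once inside its own loop, so break
 * and continue can both be expressed as ordinary loop jumps. */
static int toy_sum_lowered_form(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        bool continue_inside = false;   /* initialised to false per switch */

        do {
            if (i % 3 == 0) {
                continue_inside = true; /* remember the continue ...      */
                break;                  /* ... and leave the switch loop  */
            }
            if (i % 3 == 1) {
                sum += 1;
                break;                  /* switch break == loop break     */
            }
            sum += 10;
            break;
        } while (0);

        if (continue_inside)
            continue;                   /* continue for the outer loop    */

        sum += 100;
    }
    return sum;
}
```

Both forms return the same value for any `n`; for `n = 6` that value is 422.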
Re: [Mesa-dev] [PATCH 4/4] radeon: cache the last used userptr
What is this patch good for? Marek On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeon/r600_pipe_common.c | 9 ++ src/gallium/drivers/radeon/r600_pipe_common.h | 11 +++ src/gallium/drivers/radeon/r600_texture.c | 41 +-- 3 files changed, 59 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 69d344e..f745311 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -770,11 +770,20 @@ bool r600_common_screen_init(struct r600_common_screen *rscreen, } } + pipe_mutex_init(rscreen-userptr_lock); + return true; } void r600_destroy_common_screen(struct r600_common_screen *rscreen) { + unsigned i; + + for (i = 0; i R600_USERPTR_CACHE; ++i) + pipe_resource_reference((struct pipe_resource **)rscreen-userptr[i].tex, NULL); + + pipe_mutex_destroy(rscreen-userptr_lock); + pipe_mutex_destroy(rscreen-aux_context_lock); rscreen-aux_context-destroy(rscreen-aux_context); diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index dcec2bb..88dbaf8 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -97,6 +97,8 @@ #define R600_MAP_BUFFER_ALIGNMENT 64 +#define R600_USERPTR_CACHE 32 + struct r600_common_context; struct radeon_shader_binary { @@ -258,6 +260,15 @@ struct r600_common_screen { struct r600_resource*trace_bo; uint32_t*trace_ptr; unsignedcs_count; + + struct { + struct r600_texture *tex; + void*pointer; + unsignedoffset; + unsignedsize; + } userptr[R600_USERPTR_CACHE]; + unsigneduserptr_idx; + pipe_mutex userptr_lock; }; /* This encapsulates a state or an operation which can emitted into the GPU diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c index 89b3b55..c3ff96c 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -855,10 +855,11 @@ static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, { struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; struct radeon_surface surface = {}; + struct pipe_resource *res = NULL; struct r600_texture *tex; unsigned offset, size; struct pb_buffer *buf; - int r; + int r, i; /* Support only 2D textures without mipmaps */ if ((templ-target != PIPE_TEXTURE_2D templ-target != PIPE_TEXTURE_RECT) || @@ -877,16 +878,52 @@ static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, if (size 64*1024) return NULL; + pipe_mutex_lock(rscreen-userptr_lock); + for (i = 0; i R600_USERPTR_CACHE; ++i) { + + if (rscreen-userptr[i].pointer != pointer || + rscreen-userptr[i].offset != offset || + rscreen-userptr[i].size != size || + !rscreen-userptr[i].tex) + continue; + + tex = rscreen-userptr[i].tex; + if (tex-resource.b.b.width0 != templ-width0 + tex-resource.b.b.height0 != templ-height0 + tex-resource.b.b.target != templ-target + tex-resource.b.b.format != templ-format) + continue; + + pipe_resource_reference(res, (struct pipe_resource *)tex); + pipe_mutex_unlock(rscreen-userptr_lock); + return (struct r600_texture *)res; + } + pipe_mutex_unlock(rscreen-userptr_lock); + buf = rscreen-ws-buffer_from_ptr(rscreen-ws, pointer, size); if (!buf) return NULL; - r = r600_init_surface(rscreen, surface, templ, RADEON_SURF_MODE_LINEAR_ALIGNED, false); + r = r600_init_surface(rscreen, surface, templ, RADEON_SURF_MODE_LINEAR, false); if (r) return NULL; tex = r600_texture_create_object(screen, templ, stride, buf, surface); tex-surface.level[0].offset += offset; + + pipe_mutex_lock(rscreen-userptr_lock); + ++rscreen-userptr_idx; + rscreen-userptr_idx %= R600_USERPTR_CACHE; + + i = rscreen-userptr_idx; +
Re: [Mesa-dev] [PATCH 4/4] radeon: cache the last used userptr
What is this patch good for? Nothing in particular, I just wanted to test how much overhead creating a new BO each time we do transfer_inline_write actually makes. BTW: Implementing transfer_inline_write using userptrs was just a prove of concept. It turned out to actually be way slower than just copying with the CPU because we need to block for the copy to complete. For a real use case we need to support creating textures from application supplied pointers and implement the matching OpenGL extensions, but you probably know that better than I do. Christian. Am 06.08.2014 um 13:24 schrieb Marek Olšák: What is this patch good for? Marek On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeon/r600_pipe_common.c | 9 ++ src/gallium/drivers/radeon/r600_pipe_common.h | 11 +++ src/gallium/drivers/radeon/r600_texture.c | 41 +-- 3 files changed, 59 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 69d344e..f745311 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -770,11 +770,20 @@ bool r600_common_screen_init(struct r600_common_screen *rscreen, } } + pipe_mutex_init(rscreen-userptr_lock); + return true; } void r600_destroy_common_screen(struct r600_common_screen *rscreen) { + unsigned i; + + for (i = 0; i R600_USERPTR_CACHE; ++i) + pipe_resource_reference((struct pipe_resource **)rscreen-userptr[i].tex, NULL); + + pipe_mutex_destroy(rscreen-userptr_lock); + pipe_mutex_destroy(rscreen-aux_context_lock); rscreen-aux_context-destroy(rscreen-aux_context); diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index dcec2bb..88dbaf8 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ 
b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -97,6 +97,8 @@ #define R600_MAP_BUFFER_ALIGNMENT 64 +#define R600_USERPTR_CACHE 32 + struct r600_common_context; struct radeon_shader_binary { @@ -258,6 +260,15 @@ struct r600_common_screen { struct r600_resource*trace_bo; uint32_t*trace_ptr; unsignedcs_count; + + struct { + struct r600_texture *tex; + void*pointer; + unsignedoffset; + unsignedsize; + } userptr[R600_USERPTR_CACHE]; + unsigneduserptr_idx; + pipe_mutex userptr_lock; }; /* This encapsulates a state or an operation which can emitted into the GPU diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 89b3b55..c3ff96c 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -855,10 +855,11 @@ static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, { struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; struct radeon_surface surface = {}; + struct pipe_resource *res = NULL; struct r600_texture *tex; unsigned offset, size; struct pb_buffer *buf; - int r; + int r, i; /* Support only 2D textures without mipmaps */ if ((templ-target != PIPE_TEXTURE_2D templ-target != PIPE_TEXTURE_RECT) || @@ -877,16 +878,52 @@ static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, if (size 64*1024) return NULL; + pipe_mutex_lock(rscreen-userptr_lock); + for (i = 0; i R600_USERPTR_CACHE; ++i) { + + if (rscreen-userptr[i].pointer != pointer || + rscreen-userptr[i].offset != offset || + rscreen-userptr[i].size != size || + !rscreen-userptr[i].tex) + continue; + + tex = rscreen-userptr[i].tex; + if (tex-resource.b.b.width0 != templ-width0 + tex-resource.b.b.height0 != templ-height0 + tex-resource.b.b.target != templ-target + tex-resource.b.b.format != templ-format) + continue; + + pipe_resource_reference(res, (struct pipe_resource *)tex); + pipe_mutex_unlock(rscreen-userptr_lock); + return (struct r600_texture *)res; + 
} + pipe_mutex_unlock(rscreen-userptr_lock); + buf = rscreen-ws-buffer_from_ptr(rscreen-ws, pointer, size); if (!buf) return
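The cache in the quoted patch is a small fixed-size ring: look up by (pointer, offset, size), and on a miss advance an index modulo the cache size and reuse that slot. The sketch below shows just that data structure. Everything in it is hypothetical (`toy_cache`, `toy_lookup`, `toy_insert`, an `int` handle instead of a `struct r600_texture *`), it omits the mutex and the reference counting, and the "store into the new slot" step is an assumption, since the quoted diff is cut off right after the index is advanced.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 4   /* the patch uses 32 entries */

struct toy_entry {
    const void *pointer;   /* page-aligned user pointer */
    unsigned    offset;    /* offset of the data within the first page */
    unsigned    size;      /* size of the wrapped range in bytes */
    int         tex;       /* stand-in for the cached texture; 0 = empty */
};

struct toy_cache {
    struct toy_entry e[TOY_CACHE_SIZE];
    unsigned         idx;  /* slot used by the most recent insert */
};

/* Return the cached handle for an exact (pointer, offset, size) match, or 0. */
static int toy_lookup(const struct toy_cache *c, const void *pointer,
                      unsigned offset, unsigned size)
{
    for (unsigned i = 0; i < TOY_CACHE_SIZE; ++i) {
        const struct toy_entry *e = &c->e[i];
        if (e->tex != 0 && e->pointer == pointer &&
            e->offset == offset && e->size == size)
            return e->tex;
    }
    return 0;
}

/* Advance the ring index and overwrite whatever lived in that slot. */
static void toy_insert(struct toy_cache *c, const void *pointer,
                       unsigned offset, unsigned size, int tex)
{
    assert(tex != 0);

    c->idx = (c->idx + 1) % TOY_CACHE_SIZE;
    c->e[c->idx].pointer = pointer;
    c->e[c->idx].offset  = offset;
    c->e[c->idx].size    = size;
    c->e[c->idx].tex     = tex;
}
```

An entry survives until `TOY_CACHE_SIZE` further inserts have wrapped the index back onto its slot, which is the only eviction policy a ring like this has.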
Re: [Mesa-dev] [PATCH 1/4] radeonsi: implement partial DMA copies v2
For patch 1 2: Reviewed-by: Marek Olšák marek.ol...@amd.com How was the DMA code tested? I think the best thing would be to switch resource_copy_region to dma_copy just for testing and run piglit. (you also probably want to avoid recursion between dma_copy and resource_copy_region) Marek On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com v2: fix a couple of typos and bugs Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeonsi/si_dma.c | 85 +++ src/gallium/drivers/radeonsi/sid.h| 1 + 2 files changed, 68 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_dma.c b/src/gallium/drivers/radeonsi/si_dma.c index 26f1e1b..4d72f62 100644 --- a/src/gallium/drivers/radeonsi/si_dma.c +++ b/src/gallium/drivers/radeonsi/si_dma.c @@ -111,6 +111,48 @@ static void si_dma_copy_buffer(struct si_context *ctx, } } +static void si_dma_copy_partial(struct si_context *ctx, + struct pipe_resource *dst, + uint64_t dst_offset, + uint32_t dst_slice_size, + uint32_t dst_pitch, + struct pipe_resource *src, + uint64_t src_offset, + uint32_t src_slice_size, + uint32_t src_pitch, + uint32_t width, + uint32_t height, + uint32_t depth, + unsigned bpp) +{ + struct radeon_winsys_cs *cs = ctx-b.rings.dma.cs; + struct r600_resource *rdst = (struct r600_resource*)dst; + struct r600_resource *rsrc = (struct r600_resource*)src; + + dst_offset += r600_resource_va(ctx-screen-b.b, dst); + src_offset += r600_resource_va(ctx-screen-b.b, src); + + r600_need_dma_space(ctx-b, 9); + + r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rsrc, RADEON_USAGE_READ, + RADEON_PRIO_MIN); + r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rdst, RADEON_USAGE_WRITE, + RADEON_PRIO_MIN); + + radeon_emit(cs, SI_DMA_PACKET(SI_DMA_PACKET_COPY, SI_DMA_COPY_PARTIAL, 0x0)); + + radeon_emit(cs, src_offset 0x); + radeon_emit(cs, ((src_offset 32UL) 0xff) | (src_pitch 13)); + radeon_emit(cs, src_slice_size); + + 
radeon_emit(cs, dst_offset 0x); + radeon_emit(cs, ((dst_offset 32UL) 0xff) | (dst_pitch 13)); + radeon_emit(cs, dst_slice_size); + + radeon_emit(cs, width | (height 16)); + radeon_emit(cs, depth | (util_logbase2(bpp) 29)); +} + static void si_dma_copy_tile(struct si_context *ctx, struct pipe_resource *dst, unsigned dst_level, @@ -299,33 +341,40 @@ void si_dma_copy(struct pipe_context *ctx, src_mode = src_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? RADEON_SURF_MODE_LINEAR : src_mode; dst_mode = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? RADEON_SURF_MODE_LINEAR : dst_mode; - if (src_pitch != dst_pitch || src_box-x || dst_x || src_w != dst_w) { - /* FIXME si can do partial blit */ - goto fallback; - } - /* the x test here are currently useless (because we don't support partial blit) -* but keep them around so we don't forget about those -*/ - if ((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || (src_box-y % 8) || (dst_y % 8)) { + if (((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || (src_box-y % 8) || (dst_y % 8)) + ((src_mode != RADEON_SURF_MODE_LINEAR) || (dst_mode != RADEON_SURF_MODE_LINEAR))) { goto fallback; } if (src_mode == dst_mode) { uint64_t dst_offset, src_offset; - /* simple dma blit would do NOTE code here assume : -* src_box.x/y == 0 -* dst_x/y == 0 -* dst_pitch == src_pitch -*/ - src_offset= rsrc-surface.level[src_level].offset; - src_offset += rsrc-surface.level[src_level].slice_size * src_box-z; + uint32_t dst_slice_size, src_slice_size; + + src_slice_size = rsrc-surface.level[src_level].slice_size; + src_offset = rsrc-surface.level[src_level].offset; + src_offset += src_slice_size * src_box-z; src_offset += src_y * src_pitch + src_x * bpp; + + dst_slice_size = rdst-surface.level[dst_level].slice_size; dst_offset = rdst-surface.level[dst_level].offset; - dst_offset +=
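The offset arithmetic that feeds the partial-copy packet is the usual linear-surface addressing: start of the mip level, plus whole slices, plus whole rows, plus pixels within the row. A minimal sketch follows, assuming the pitch and slice size are in bytes and `bpp` means bytes per pixel; `toy_linear_offset` is a hypothetical helper, not part of si_dma.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of pixel (x, y) in slice z of a linear
 * surface level, following the "offset + slice_size * z + y * pitch +
 * x * bpp" pattern in the quoted diff. */
static uint64_t toy_linear_offset(uint64_t level_offset, uint32_t slice_size,
                                  uint32_t pitch, unsigned bytes_per_pixel,
                                  unsigned x, unsigned y, unsigned z)
{
    assert(bytes_per_pixel > 0);

    return level_offset +
           (uint64_t)slice_size * z +      /* skip whole slices */
           (uint64_t)pitch * y +           /* skip whole rows   */
           (uint64_t)bytes_per_pixel * x;  /* skip pixels in the row */
}
```

For a level at byte 4096 with 64 KiB slices, a 256-byte pitch and 4 bytes per pixel, pixel (3, 2) of slice 1 sits at 4096 + 65536 + 512 + 12 = 70156.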
Re: [Mesa-dev] [PATCH 3/4] radeon: accelerate transfer_inline_write
On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com Not completely implemented, cause we need DMA copy support for every hw generation. Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeon/r600_buffer_common.c | 2 +- src/gallium/drivers/radeon/r600_pipe_common.c | 2 +- src/gallium/drivers/radeon/r600_texture.c | 104 ++-- 3 files changed, 100 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c index d747cbc..28ab30c 100644 --- a/src/gallium/drivers/radeon/r600_buffer_common.c +++ b/src/gallium/drivers/radeon/r600_buffer_common.c @@ -372,7 +372,7 @@ static const struct u_resource_vtbl r600_buffer_vtbl = r600_buffer_transfer_map, /* transfer_map */ NULL, /* transfer_flush_region */ r600_buffer_transfer_unmap, /* transfer_unmap */ - NULL/* transfer_inline_write */ + u_default_transfer_inline_write /* transfer_inline_write */ }; struct pipe_resource *r600_buffer_create(struct pipe_screen *screen, diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 3476021..69d344e 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -153,7 +153,7 @@ bool r600_common_context_init(struct r600_common_context *rctx, rctx-b.transfer_map = u_transfer_map_vtbl; rctx-b.transfer_flush_region = u_default_transfer_flush_region; rctx-b.transfer_unmap = u_transfer_unmap_vtbl; - rctx-b.transfer_inline_write = u_default_transfer_inline_write; + rctx-b.transfer_inline_write = u_transfer_inline_write_vtbl; rctx-b.memory_barrier = r600_memory_barrier; rctx-b.flush = r600_flush_from_st; diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 482bbff..89b3b55 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ 
b/src/gallium/drivers/radeon/r600_texture.c @@ -849,6 +849,47 @@ static struct pipe_resource *r600_texture_from_handle(struct pipe_screen *screen stride, buf, surface); } +static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, + const struct pipe_resource *templ, + void *pointer, unsigned stride) +{ + struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; + struct radeon_surface surface = {}; + struct r600_texture *tex; + unsigned offset, size; + struct pb_buffer *buf; + int r; + + /* Support only 2D textures without mipmaps */ + if ((templ-target != PIPE_TEXTURE_2D templ-target != PIPE_TEXTURE_RECT) || + templ-depth0 != 1 || templ-last_level != 0) + return NULL; + + /* stride needs to be at least dw aligned */ + if (stride % 4) + return NULL; + + offset = ((uintptr_t)pointer) 0xfff; + pointer = (void *)(((uintptr_t)pointer) - offset); + size = align(stride * templ-height0 + offset, 0x1000); + + /* avoid the overhead for small copies */ + if (size 64*1024) + return NULL; + + buf = rscreen-ws-buffer_from_ptr(rscreen-ws, pointer, size); + if (!buf) + return NULL; + + r = r600_init_surface(rscreen, surface, templ, RADEON_SURF_MODE_LINEAR_ALIGNED, false); I know you change it the next patch, but I think the alignment for LINEAR (not ALIGNED) is 8 pixels, right? Of course, libdrm_radeon should be reviewed if it doesn't over-align the stride. The safest thing would be to check if stride == surface[0].pitch_in_bytes. 
+ if (r) + return NULL; + + tex = r600_texture_create_object(screen, templ, stride, buf, surface); + tex-surface.level[0].offset += offset; + return tex; +} + bool r600_init_flushed_depth_texture(struct pipe_context *ctx, struct pipe_resource *texture, struct r600_texture **staging) @@ -1112,14 +1153,65 @@ static void r600_texture_transfer_unmap(struct pipe_context *ctx, FREE(transfer); } +static void r600_texture_transfer_inline_write(struct pipe_context *ctx, + struct pipe_resource *dst, + unsigned level, unsigned usage, + const struct pipe_box *box, + const void *data, +
Re: [Mesa-dev] [PATCH 4/4] radeon: cache the last used userptr
I only know about AMD_pinned_memory, which is for buffers only. I don't know of an API for creating textures from user pointers. Yes, there are pixel buffer objects, but they are a lot more difficult to implement, and they are defined such that a zero-copy approach to get a texture is not possible.

Marek

On Wed, Aug 6, 2014 at 1:39 PM, Christian König deathsim...@vodafone.de wrote:
What is this patch good for?

Nothing in particular, I just wanted to test how much overhead creating a new BO each time we do transfer_inline_write actually adds.

BTW: Implementing transfer_inline_write using userptrs was just a proof of concept. It turned out to be way slower than just copying with the CPU, because we need to block until the copy completes.

For a real use case we need to support creating textures from application-supplied pointers and implement the matching OpenGL extensions, but you probably know that better than I do.

Christian.

On 06.08.2014 at 13:24, Marek Olšák wrote:
What is this patch good for?
Marek On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeon/r600_pipe_common.c | 9 ++ src/gallium/drivers/radeon/r600_pipe_common.h | 11 +++ src/gallium/drivers/radeon/r600_texture.c | 41 +-- 3 files changed, 59 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 69d344e..f745311 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -770,11 +770,20 @@ bool r600_common_screen_init(struct r600_common_screen *rscreen, } } + pipe_mutex_init(rscreen-userptr_lock); + return true; } void r600_destroy_common_screen(struct r600_common_screen *rscreen) { + unsigned i; + + for (i = 0; i R600_USERPTR_CACHE; ++i) + pipe_resource_reference((struct pipe_resource **)rscreen-userptr[i].tex, NULL); + + pipe_mutex_destroy(rscreen-userptr_lock); + pipe_mutex_destroy(rscreen-aux_context_lock); rscreen-aux_context-destroy(rscreen-aux_context); diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index dcec2bb..88dbaf8 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -97,6 +97,8 @@ #define R600_MAP_BUFFER_ALIGNMENT 64 +#define R600_USERPTR_CACHE 32 + struct r600_common_context; struct radeon_shader_binary { @@ -258,6 +260,15 @@ struct r600_common_screen { struct r600_resource*trace_bo; uint32_t*trace_ptr; unsignedcs_count; + + struct { + struct r600_texture *tex; + void*pointer; + unsignedoffset; + unsignedsize; + } userptr[R600_USERPTR_CACHE]; + unsigneduserptr_idx; + pipe_mutex userptr_lock; }; /* This encapsulates a state or an operation which can emitted into the GPU diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c index 89b3b55..c3ff96c 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -855,10 +855,11 @@ static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, { struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; struct radeon_surface surface = {}; + struct pipe_resource *res = NULL; struct r600_texture *tex; unsigned offset, size; struct pb_buffer *buf; - int r; + int r, i; /* Support only 2D textures without mipmaps */ if ((templ-target != PIPE_TEXTURE_2D templ-target != PIPE_TEXTURE_RECT) || @@ -877,16 +878,52 @@ static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, if (size 64*1024) return NULL; + pipe_mutex_lock(rscreen-userptr_lock); + for (i = 0; i R600_USERPTR_CACHE; ++i) { + + if (rscreen-userptr[i].pointer != pointer || + rscreen-userptr[i].offset != offset || + rscreen-userptr[i].size != size || + !rscreen-userptr[i].tex) + continue; + + tex = rscreen-userptr[i].tex; + if (tex-resource.b.b.width0 != templ-width0 + tex-resource.b.b.height0 != templ-height0 +
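The lookup added in the hunk above can be looked at in isolation. The following is only a sketch of a small fixed-size cache keyed on (pointer, offset, size), written in plain C. The names (struct ptr_cache, cache_lookup, cache_insert) are hypothetical, the round-robin replacement is an assumption suggested by the userptr_idx field (the quoted diff is cut off before the replacement code), and the locking done with userptr_lock in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 32

/* One cached mapping; object is NULL while the slot is unused. */
struct cache_entry {
   const void *pointer;
   uint32_t offset;
   uint32_t size;
   void *object;
};

struct ptr_cache {
   struct cache_entry slot[CACHE_SLOTS];
   unsigned next_victim;   /* round-robin replacement index (assumed policy) */
};

/* Return the cached object for an exact (pointer, offset, size) match. */
static void *cache_lookup(const struct ptr_cache *c, const void *pointer,
                          uint32_t offset, uint32_t size)
{
   for (unsigned i = 0; i < CACHE_SLOTS; ++i) {
      const struct cache_entry *e = &c->slot[i];

      if (e->object && e->pointer == pointer &&
          e->offset == offset && e->size == size)
         return e->object;
   }
   return NULL;
}

/* Overwrite the next slot in round-robin order. */
static void cache_insert(struct ptr_cache *c, const void *pointer,
                         uint32_t offset, uint32_t size, void *object)
{
   struct cache_entry *e = &c->slot[c->next_victim];

   assert(object != NULL);   /* NULL marks an empty slot */
   e->pointer = pointer;
   e->offset = offset;
   e->size = size;
   e->object = object;
   c->next_victim = (c->next_victim + 1) % CACHE_SLOTS;
}
```

A real implementation would also have to drop the reference to whatever object is evicted from the slot, which this sketch does not model.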
Re: [Mesa-dev] [PATCH 05/20] i965/cfg: Add a foreach_inst_in_block_safe macro.
On Tue, Aug 05, 2014 at 09:14:55PM +0300, Pohjolainen, Topi wrote:
On Thu, Jul 24, 2014 at 07:54:12PM -0700, Matt Turner wrote:
---
 src/mesa/drivers/dri/i965/brw_cfg.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h
index a5d2df5..913a1ed 100644
--- a/src/mesa/drivers/dri/i965/brw_cfg.h
+++ b/src/mesa/drivers/dri/i965/brw_cfg.h
@@ -120,6 +120,14 @@ struct cfg_t {
         __inst != __block->end->next; \
         __inst = (__type *)__inst->next)
 
+#define foreach_inst_in_block_safe(__type, __inst, __block) \
+   for (__type *__inst = (__type *)__block->start, \
+               *__next = (__type *)__inst->next, \
+               *__end = (__type *)__block->end->next->next; \

Patches 4 and 7 make sense but the double ->next->next here is not obvious to me. I tried handwriting instructions into blocks (this is purely arbitrary):

ip    opcode
--------------------
0  :  BRW_OPCODE_?
..
k  :  BRW_OPCODE_IF
k+1:  BRW_OPCODE_?
..
n  :  BRW_OPCODE_ELSE
n+1:  BRW_OPCODE_?
..
m  :  BRW_OPCODE_ENDIF
m+1:  BRW_OPCODE_?
..
t  :  BRW_OPCODE_?

Following the logic in the constructor of cfg_t, I would deduce this:

block 0: start_ip = 0    num = 0  start = inst_0          end = inst_k (if)
block 1: start_ip = k+1  num = 1  start = inst_k+1        end = inst_n (else)
block 2: start_ip = n+1  num = 2  start = inst_n+1        end = inst_m-1
block 3: start_ip = m    num = 3  start = inst_m (endif)  end = inst_t

And as instructions are inherited from exec_node, for block 3 end->next should be NULL, right?

+        __next != __end; \
+        __inst = __next, \
+        __next = (__type *)__next->next)
+
 #define foreach_inst_in_block_reverse(__type, __inst, __block) \
    for (__type *__inst = (__type *)__block->end; \
         __inst != __block->start->prev; \
--
1.8.5.5

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
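For readers following the ->next->next question: a "safe" iteration macro reads the successor before the loop body runs, so the body may unlink the current node, and the stop condition is checked on the cached successor rather than on the current node. The sketch below shows that pattern in plain C on a list with head and tail sentinels. All names in it (struct node, struct list, foreach_node_safe and the helpers) are hypothetical and only loosely modelled on exec_list. It is not the Mesa macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked list with head and tail sentinels. */
struct node {
   struct node *next;
   struct node *prev;
   int value;
};

struct list {
   struct node head;   /* head.next is the first real node */
   struct node tail;   /* tail.prev is the last real node, tail.next is NULL */
};

static void list_init(struct list *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
   n->prev = l->tail.prev;
   n->next = &l->tail;
   l->tail.prev->next = n;
   l->tail.prev = n;
}

static void node_remove(struct node *n)
{
   assert(n->prev && n->next);   /* sentinels must never be removed */
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->next = n->prev = NULL;
}

/* Cache the successor before the body runs, so the body may unlink `it`.
 * The loop ends when `it` is the tail sentinel, whose next is NULL. */
#define foreach_node_safe(it, l)                                   \
   for (struct node *it = (l)->head.next, *it##_next = it->next;   \
        it##_next != NULL;                                         \
        it = it##_next, it##_next = it##_next->next)

/* Unlink every node with an even value while iterating. */
static int remove_even(struct list *l)
{
   int removed = 0;

   foreach_node_safe(n, l) {
      if (n->value % 2 == 0) {
         node_remove(n);
         removed++;
      }
   }
   return removed;
}

static int list_sum(const struct list *l)
{
   int sum = 0;

   for (const struct node *n = l->head.next; n->next != NULL; n = n->next)
      sum += n->value;
   return sum;
}
```

In a list of this shape the node after the last real node is the tail sentinel, not NULL, and only the sentinel's own next pointer is NULL. If exec_list behaves the same way, that would explain why a macro that compares the cached successor needs an end pointer one step further than the unsafe variant.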
Re: [Mesa-dev] [PATCH 1/4] radeonsi: implement partial DMA copies v2
On 06.08.2014 at 13:42, Marek Olšák wrote:
For patches 1 & 2: Reviewed-by: Marek Olšák marek.ol...@amd.com

How was the DMA code tested? I think the best thing would be to switch resource_copy_region to dma_copy just for testing and run piglit. (You probably also want to avoid recursion between dma_copy and resource_copy_region.)

Yeah, good idea. I just won't have time for that right now, since I need to start on the next requirement asap. I tested it through VDPAU playback without UVD, e.g. the mplayer/VDPAU state tracker copying the frames to a video buffer using transfer_inline_write and compositing that with subtitles. That seems to be quite a good way of exercising all kinds of different texture and rectangle sizes (because of the different letters in the subtitles). It took me a while to get everything working.

Christian.

Marek

On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote:
From: Christian König christian.koe...@amd.com v2: fix a couple of typos and bugs Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeonsi/si_dma.c | 85 +++ src/gallium/drivers/radeonsi/sid.h| 1 + 2 files changed, 68 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_dma.c b/src/gallium/drivers/radeonsi/si_dma.c index 26f1e1b..4d72f62 100644 --- a/src/gallium/drivers/radeonsi/si_dma.c +++ b/src/gallium/drivers/radeonsi/si_dma.c @@ -111,6 +111,48 @@ static void si_dma_copy_buffer(struct si_context *ctx, } } +static void si_dma_copy_partial(struct si_context *ctx, + struct pipe_resource *dst, + uint64_t dst_offset, + uint32_t dst_slice_size, + uint32_t dst_pitch, + struct pipe_resource *src, + uint64_t src_offset, + uint32_t src_slice_size, + uint32_t src_pitch, + uint32_t width, + uint32_t height, + uint32_t depth, + unsigned bpp) +{ + struct radeon_winsys_cs *cs = ctx-b.rings.dma.cs; + struct r600_resource *rdst = (struct r600_resource*)dst; + struct r600_resource *rsrc = (struct r600_resource*)src; + + dst_offset 
+= r600_resource_va(ctx-screen-b.b, dst); + src_offset += r600_resource_va(ctx-screen-b.b, src); + + r600_need_dma_space(ctx-b, 9); + + r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rsrc, RADEON_USAGE_READ, + RADEON_PRIO_MIN); + r600_context_bo_reloc(ctx-b, ctx-b.rings.dma, rdst, RADEON_USAGE_WRITE, + RADEON_PRIO_MIN); + + radeon_emit(cs, SI_DMA_PACKET(SI_DMA_PACKET_COPY, SI_DMA_COPY_PARTIAL, 0x0)); + + radeon_emit(cs, src_offset 0x); + radeon_emit(cs, ((src_offset 32UL) 0xff) | (src_pitch 13)); + radeon_emit(cs, src_slice_size); + + radeon_emit(cs, dst_offset 0x); + radeon_emit(cs, ((dst_offset 32UL) 0xff) | (dst_pitch 13)); + radeon_emit(cs, dst_slice_size); + + radeon_emit(cs, width | (height 16)); + radeon_emit(cs, depth | (util_logbase2(bpp) 29)); +} + static void si_dma_copy_tile(struct si_context *ctx, struct pipe_resource *dst, unsigned dst_level, @@ -299,33 +341,40 @@ void si_dma_copy(struct pipe_context *ctx, src_mode = src_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? RADEON_SURF_MODE_LINEAR : src_mode; dst_mode = dst_mode == RADEON_SURF_MODE_LINEAR_ALIGNED ? 
RADEON_SURF_MODE_LINEAR : dst_mode; - if (src_pitch != dst_pitch || src_box-x || dst_x || src_w != dst_w) { - /* FIXME si can do partial blit */ - goto fallback; - } - /* the x test here are currently useless (because we don't support partial blit) -* but keep them around so we don't forget about those -*/ - if ((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || (src_box-y % 8) || (dst_y % 8)) { + if (((src_pitch % 8) || (src_box-x % 8) || (dst_x % 8) || (src_box-y % 8) || (dst_y % 8)) + ((src_mode != RADEON_SURF_MODE_LINEAR) || (dst_mode != RADEON_SURF_MODE_LINEAR))) { goto fallback; } if (src_mode == dst_mode) { uint64_t dst_offset, src_offset; - /* simple dma blit would do NOTE code here assume : -* src_box.x/y == 0 -* dst_x/y == 0 -* dst_pitch == src_pitch -*/ - src_offset= rsrc-surface.level[src_level].offset; - src_offset += rsrc-surface.level[src_level].slice_size * src_box-z; + uint32_t dst_slice_size,
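The radeon_emit() calls in si_dma_copy_partial pack several fields into each dword. As a rough sketch, and assuming the operators missing from the quoted diff are `& 0xffffffff`, `>> 32`, `& 0xff`, `<< 13` and `<< 16`, the packing could be written as below. The helper names are hypothetical, and the range asserts are properties of this example only, not documented hardware limits.

```c
#include <assert.h>
#include <stdint.h>

/* Low dword: address bits 31:0. */
static uint32_t dma_addr_lo(uint64_t va)
{
   return (uint32_t)(va & 0xffffffffu);
}

/* High dword: address bits 39:32 in the low byte, pitch from bit 13 up. */
static uint32_t dma_addr_hi_pitch(uint64_t va, uint32_t pitch)
{
   assert(pitch < (1u << 19));   /* keep the pitch inside bits 31:13 */
   return (uint32_t)((va >> 32) & 0xff) | (pitch << 13);
}

/* Width in the low half of the dword, height in the high half. */
static uint32_t dma_extent(uint32_t width, uint32_t height)
{
   assert(width < (1u << 16) && height < (1u << 16));
   return width | (height << 16);
}
```

For a 40-bit address such as 0x12abcdef01 this gives a low dword of 0xabcdef01 and, with a pitch value of 256, a high dword of 0x200012.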
Re: [Mesa-dev] [PATCH 3/4] radeon: accelerate transfer_inline_write
On 06.08.2014 at 13:45, Marek Olšák wrote: On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com Not completely implemented, cause we need DMA copy support for every hw generation. Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeon/r600_buffer_common.c | 2 +- src/gallium/drivers/radeon/r600_pipe_common.c | 2 +- src/gallium/drivers/radeon/r600_texture.c | 104 ++-- 3 files changed, 100 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c index d747cbc..28ab30c 100644 --- a/src/gallium/drivers/radeon/r600_buffer_common.c +++ b/src/gallium/drivers/radeon/r600_buffer_common.c @@ -372,7 +372,7 @@ static const struct u_resource_vtbl r600_buffer_vtbl = r600_buffer_transfer_map, /* transfer_map */ NULL, /* transfer_flush_region */ r600_buffer_transfer_unmap, /* transfer_unmap */ - NULL/* transfer_inline_write */ + u_default_transfer_inline_write /* transfer_inline_write */ }; struct pipe_resource *r600_buffer_create(struct pipe_screen *screen, diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 3476021..69d344e 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -153,7 +153,7 @@ bool r600_common_context_init(struct r600_common_context *rctx, rctx-b.transfer_map = u_transfer_map_vtbl; rctx-b.transfer_flush_region = u_default_transfer_flush_region; rctx-b.transfer_unmap = u_transfer_unmap_vtbl; - rctx-b.transfer_inline_write = u_default_transfer_inline_write; + rctx-b.transfer_inline_write = u_transfer_inline_write_vtbl; rctx-b.memory_barrier = r600_memory_barrier; rctx-b.flush = r600_flush_from_st; diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 482bbff..89b3b55 100644 --- 
a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -849,6 +849,47 @@ static struct pipe_resource *r600_texture_from_handle(struct pipe_screen *screen stride, buf, surface); } +static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, + const struct pipe_resource *templ, + void *pointer, unsigned stride) +{ + struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; + struct radeon_surface surface = {}; + struct r600_texture *tex; + unsigned offset, size; + struct pb_buffer *buf; + int r; + + /* Support only 2D textures without mipmaps */ + if ((templ-target != PIPE_TEXTURE_2D templ-target != PIPE_TEXTURE_RECT) || + templ-depth0 != 1 || templ-last_level != 0) + return NULL; + + /* stride needs to be at least dw aligned */ + if (stride % 4) + return NULL; + + offset = ((uintptr_t)pointer) 0xfff; + pointer = (void *)(((uintptr_t)pointer) - offset); + size = align(stride * templ-height0 + offset, 0x1000); + + /* avoid the overhead for small copies */ + if (size 64*1024) + return NULL; + + buf = rscreen-ws-buffer_from_ptr(rscreen-ws, pointer, size); + if (!buf) + return NULL; + + r = r600_init_surface(rscreen, surface, templ, RADEON_SURF_MODE_LINEAR_ALIGNED, false); I know you change it the next patch, but I think the alignment for LINEAR (not ALIGNED) is 8 pixels, right? Of course, libdrm_radeon should be reviewed if it doesn't over-align the stride. The safest thing would be to check if stride == surface[0].pitch_in_bytes. Yeah, correct. The problem here is that even RADEON_SURF_MODE_LINEAR couldn't even handle all different alignments the application could come up with for the base pointer and stride. The only thing that can handle dword aligned or even byte aligned subwindow copies is the async DMA partial copy command and that is only available on NI+. Apart from that testing if libdrm_radeon really comes up with the correct stride is indeed a good idea. 
+ if (r) + return NULL; + + tex = r600_texture_create_object(screen, templ, stride, buf, surface); + tex-surface.level[0].offset += offset; + return tex; +} + bool r600_init_flushed_depth_texture(struct pipe_context *ctx, struct pipe_resource *texture, struct r600_texture **staging) @@ -1112,14 +1153,65 @@ static void r600_texture_transfer_unmap(struct
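The pointer handling in r600_texture_from_ptr comes down to page alignment arithmetic: split the application pointer into a page-aligned base plus an offset into the first page, then round the covered size up to whole pages. A minimal sketch, assuming 4 KiB pages (the 0xfff and 0x1000 constants in the patch) and assuming the operator missing from the quoted `offset = ...` line is a bitwise AND. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000u

/* Round value up to a power-of-two alignment. */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

struct userptr_range {
   uintptr_t base;    /* page-aligned start address */
   uint32_t offset;   /* byte offset of the data inside the first page */
   uint64_t size;     /* page-aligned size covering offset + payload */
};

/* Describe the page range that covers `height` rows of `stride` bytes. */
static struct userptr_range
userptr_range_for(uintptr_t pointer, uint32_t stride, uint32_t height)
{
   struct userptr_range r;

   r.offset = (uint32_t)(pointer & (PAGE_SIZE_BYTES - 1));
   r.base = pointer - r.offset;
   r.size = align_up((uint64_t)stride * height + r.offset, PAGE_SIZE_BYTES);
   return r;
}
```

For a pointer of 0x12345678, a stride of 256 bytes and 300 rows, this yields base 0x12345000, offset 0x678 and size 0x14000. The offset is what the patch later adds to surface.level[0].offset.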
Re: [Mesa-dev] [PATCH 11/20] i965: Add basic-block aware backend_instruction::insert_* methods.
On Thu, Jul 24, 2014 at 07:54:18PM -0700, Matt Turner wrote:
---
 src/mesa/drivers/dri/i965/brw_shader.cpp | 80
 src/mesa/drivers/dri/i965/brw_shader.h   | 5 ++
 2 files changed, 85 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 47535a9..ba93cbc 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -740,6 +740,86 @@ backend_instruction::has_side_effects() const
 }
 
 void
+backend_instruction::insert_after(bblock_t *block, backend_instruction *inst)
+{
+   bool found = false; (void) found;
+   foreach_inst_in_block (backend_instruction, i, block) {
+      if (this == i) {
+         found = true;
+      }
+   }
+   assert(found || !"Instruction not in block");
+
+   block->end_ip++;
+
+   for (bblock_t *block_iter = (bblock_t *)block->link.next;
+        !block_iter->link.is_tail_sentinel();
+        block_iter = (bblock_t *)block_iter->link.next) {
+      block_iter->start_ip++;
+      block_iter->end_ip++;
+   }
+
+   if (block->end == this)
+      block->end = inst;
+
+   this->insert_after(inst);

If you used exec_node::insert_after(inst) instead, would you still need the using directive in the header?

+}
+
+void
+backend_instruction::insert_before(bblock_t *block, backend_instruction *inst)
+{
+   bool found = false; (void) found;
+   foreach_inst_in_block (backend_instruction, i, block) {
+      if (this == i) {
+         found = true;
+      }
+   }
+   assert(found || !"Instruction not in block");
+
+   block->end_ip++;
+
+   for (bblock_t *block_iter = (bblock_t *)block->link.next;
+        !block_iter->link.is_tail_sentinel();
+        block_iter = (bblock_t *)block_iter->link.next) {
+      block_iter->start_ip++;
+      block_iter->end_ip++;
+   }
+
+   if (block->start == this)
+      block->start = inst;
+
+   this->insert_before(inst);
+}
+
+void
+backend_instruction::insert_before(bblock_t *block, exec_list *list)
+{
+   bool found = false; (void) found;
+   foreach_inst_in_block (backend_instruction, i, block) {
+      if (this == i) {
+         found = true;
+      }
+   }
+   assert(found || !"Instruction not in block");

This is common to all three cases and could be refactored into its own function, say check_inst_in_block(). It would document the seven lines nicely.

+
+   unsigned num_inst = list->length();
+
+   block->end_ip += num_inst;
+
+   for (bblock_t *block_iter = (bblock_t *)block->link.next;
+        !block_iter->link.is_tail_sentinel();
+        block_iter = (bblock_t *)block_iter->link.next) {
+      block_iter->start_ip += num_inst;
+      block_iter->end_ip += num_inst;
+   }

Same here: this iteration is identical in all three and could be its own member, with an argument giving the adjustment size.

+
+   if (block->start == this)
+      block->start = (backend_instruction *)list->get_head();
+
+   this->insert_before(list);
+}
+
+void
 backend_instruction::remove(bblock_t *block)
 {
    bool found = false; (void) found;
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
index 4b80ea9..d174d5c 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -92,6 +92,11 @@ struct backend_instruction : public exec_node {
 
    using exec_node::remove;
    void remove(bblock_t *block);
+   using exec_node::insert_after;
+   void insert_after(bblock_t *block, backend_instruction *inst);
+   using exec_node::insert_before;
+   void insert_before(bblock_t *block, backend_instruction *inst);
+   void insert_before(bblock_t *block, exec_list *list);
 
    /**
     * True if the instruction has side effects other than writing to
--
1.8.5.5
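The two refactoring suggestions in this review (a shared membership check and a shared IP adjustment) could look roughly like the following. This is a sketch in plain C with simplified, hypothetical types (struct inst, struct block) and made-up function names. It does not use the bblock_t and exec_node classes from the patch, and it uses a NULL-terminated block chain instead of a sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct inst {
   struct inst *next;
};

struct block {
   struct block *next;   /* next basic block, NULL after the last one */
   struct inst *start;   /* first instruction of the block */
   struct inst *end;     /* last instruction of the block */
   int start_ip;
   int end_ip;
};

/* Walk the block once and report whether inst belongs to it. */
static bool inst_in_block(const struct block *b, const struct inst *inst)
{
   for (const struct inst *i = b->start; i != NULL; i = i->next) {
      if (i == inst)
         return true;
      if (i == b->end)
         break;
   }
   return false;
}

/* Grow (or shrink) a block by delta instructions and shift all later blocks. */
static void adjust_ips(struct block *b, int delta)
{
   assert(b != NULL);
   b->end_ip += delta;
   for (struct block *it = b->next; it != NULL; it = it->next) {
      it->start_ip += delta;
      it->end_ip += delta;
   }
}
```

With helpers of this kind, each insert_* variant would reduce to one assert on the membership check, one adjust_ips() call with 1 or the list length, the start/end fix-up, and the underlying list insertion.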
Re: [Mesa-dev] [PATCH 3/4] radeon: accelerate transfer_inline_write
On Wed, Aug 6, 2014 at 2:44 PM, Christian König deathsim...@vodafone.de wrote: On 06.08.2014 at 13:45, Marek Olšák wrote: On Tue, Aug 5, 2014 at 7:31 PM, Christian König deathsim...@vodafone.de wrote: From: Christian König christian.koe...@amd.com Not completely implemented, cause we need DMA copy support for every hw generation. Signed-off-by: Christian König christian.koe...@amd.com --- src/gallium/drivers/radeon/r600_buffer_common.c | 2 +- src/gallium/drivers/radeon/r600_pipe_common.c | 2 +- src/gallium/drivers/radeon/r600_texture.c | 104 ++-- 3 files changed, 100 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c index d747cbc..28ab30c 100644 --- a/src/gallium/drivers/radeon/r600_buffer_common.c +++ b/src/gallium/drivers/radeon/r600_buffer_common.c @@ -372,7 +372,7 @@ static const struct u_resource_vtbl r600_buffer_vtbl = r600_buffer_transfer_map, /* transfer_map */ NULL, /* transfer_flush_region */ r600_buffer_transfer_unmap, /* transfer_unmap */ - NULL/* transfer_inline_write */ + u_default_transfer_inline_write /* transfer_inline_write */ }; struct pipe_resource *r600_buffer_create(struct pipe_screen *screen, diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 3476021..69d344e 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -153,7 +153,7 @@ bool r600_common_context_init(struct r600_common_context *rctx, rctx-b.transfer_map = u_transfer_map_vtbl; rctx-b.transfer_flush_region = u_default_transfer_flush_region; rctx-b.transfer_unmap = u_transfer_unmap_vtbl; - rctx-b.transfer_inline_write = u_default_transfer_inline_write; + rctx-b.transfer_inline_write = u_transfer_inline_write_vtbl; rctx-b.memory_barrier = r600_memory_barrier; rctx-b.flush = r600_flush_from_st; diff --git a/src/gallium/drivers/radeon/r600_texture.c 
b/src/gallium/drivers/radeon/r600_texture.c index 482bbff..89b3b55 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -849,6 +849,47 @@ static struct pipe_resource *r600_texture_from_handle(struct pipe_screen *screen stride, buf, surface); } +static struct r600_texture *r600_texture_from_ptr(struct pipe_screen *screen, + const struct pipe_resource *templ, + void *pointer, unsigned stride) +{ + struct r600_common_screen *rscreen = (struct r600_common_screen*)screen; + struct radeon_surface surface = {}; + struct r600_texture *tex; + unsigned offset, size; + struct pb_buffer *buf; + int r; + + /* Support only 2D textures without mipmaps */ + if ((templ-target != PIPE_TEXTURE_2D templ-target != PIPE_TEXTURE_RECT) || + templ-depth0 != 1 || templ-last_level != 0) + return NULL; + + /* stride needs to be at least dw aligned */ + if (stride % 4) + return NULL; + + offset = ((uintptr_t)pointer) 0xfff; + pointer = (void *)(((uintptr_t)pointer) - offset); + size = align(stride * templ-height0 + offset, 0x1000); + + /* avoid the overhead for small copies */ + if (size 64*1024) + return NULL; + + buf = rscreen-ws-buffer_from_ptr(rscreen-ws, pointer, size); + if (!buf) + return NULL; + + r = r600_init_surface(rscreen, surface, templ, RADEON_SURF_MODE_LINEAR_ALIGNED, false); I know you change it the next patch, but I think the alignment for LINEAR (not ALIGNED) is 8 pixels, right? Of course, libdrm_radeon should be reviewed if it doesn't over-align the stride. The safest thing would be to check if stride == surface[0].pitch_in_bytes. Yeah, correct. The problem here is that even RADEON_SURF_MODE_LINEAR couldn't even handle all different alignments the application could come up with for the base pointer and stride. The only thing that can handle dword aligned or even byte aligned subwindow copies is the async DMA partial copy command and that is only available on NI+. 
Apart from that testing if libdrm_radeon really comes up with the correct stride is indeed a good idea. + if (r) + return NULL; + + tex = r600_texture_create_object(screen, templ, stride, buf, surface); + tex-surface.level[0].offset += offset; + return tex; +} + bool r600_init_flushed_depth_texture(struct pipe_context *ctx, struct pipe_resource *texture,
Re: [Mesa-dev] [PATCH 00/10] [RFC] Probably useless algebraic optimizations
2014-08-04 21:25 GMT+02:00 Eric Anholt e...@anholt.net:
thomashellan...@gmail.com writes:
From: Thomas Helland thomashellan...@gmail.com

When writing that A || (A && B) patch some days ago I also wrote some other patches that have no impact on my collection of shaders (shader-db plus some TF2 and Portal shaders). No reduction in instruction count, and no significant increase in compilation time. I decided to put them up here anyway, as with your collection of shaders YMMV.

I'm definitely interested in seeing our optimizer gain features like this, even if we don't have samples of code triggering them in our database yet. What we have in shader-db from real-world apps is a subset of what our compiler will encounter -- it doesn't tend to cover code by novice shader developers, nor does it cover the more complex, more code-generated shaders we expect to see in the future.

This was my initial thought too. Also, while some of these patterns are simple and would likely be spotted by a seasoned programmer, they may end up in our tree as a result of other optimizations simplifying the code.

If the patches are cleaned up to use spaces instead of tabs, and avoid trailing whitespace, patches 1-4, 7-8, and 10 are:

Reviewed-by: Eric Anholt e...@anholt.net

I'll get these cleaned up and posted to the list again soon. I don't have commit access, so I'll need someone to push them for me.

For the sub case, I'm going to want to disable lower_sub_to_add_neg on my hardware, since I've got SUB but no negate modifier on operands. This makes the (A - neg(B)) patch interesting to me. However, since neg(A) - B -> neg(A+B) was questioned, and it would be no change for me as well, I think we should probably drop that half.

I'll rewrite this to drop the neg(A)-B part, and post it along with the rest.

The min/max patches I'm not that interested in -- I think that class of optimization would be better handled in a pass that can track various bounds that values might have over time, rather than being a special case in algebraic.
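Two of the rewrites discussed in this thread are easy to check mechanically. The sketch below, in plain C with hypothetical function names, verifies the boolean absorption rule A || (A && B) == A over all inputs, and the A - neg(B) == A + B rule on wrapping unsigned integers. It deliberately says nothing about floating point, where the questioned neg(A) - B rewrite would need separate care.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Absorption: A || (A && B) reduces to A for every boolean input. */
static bool absorption_holds(void)
{
   for (int a = 0; a <= 1; a++)
      for (int b = 0; b <= 1; b++)
         if ((a || (a && b)) != a)
            return false;
   return true;
}

/* A - neg(B) == A + B, checked on wrapping 32-bit unsigned arithmetic. */
static bool sub_neg_holds(uint32_t a, uint32_t b)
{
   return a - (0u - b) == a + b;
}

/* Convenience wrapper that aborts if either identity fails. */
static void check_identities(void)
{
   assert(absorption_holds());
   assert(sub_neg_holds(7u, 5u));
   assert(sub_neg_holds(0xffffffffu, 2u));
}
```

An exhaustive check is possible for the boolean rule because there are only four input combinations. The integer rule is only sampled here.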
[Mesa-dev] [PATCH] radeon, r200: fix buffer validation after CS flush
From: Marek Olšák marek.ol...@amd.com

This validates all bound buffers (CB, ZB, textures, DMA) at the beginning of CS. This fixes bo->space_accounted assertion failures.

Tested by: Jochen Rollwagen joro-2...@t-online.de
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/dri/r200/r200_context.c            |  1 +
 src/mesa/drivers/dri/r200/r200_state.c              |  2 +-
 src/mesa/drivers/dri/r200/r200_state.h              |  1 +
 src/mesa/drivers/dri/radeon/radeon_common.c         | 14 +-------------
 src/mesa/drivers/dri/radeon/radeon_common_context.h |  1 +
 src/mesa/drivers/dri/radeon/radeon_context.c        |  1 +
 src/mesa/drivers/dri/radeon/radeon_state.c          |  2 +-
 src/mesa/drivers/dri/radeon/radeon_state.h          |  1 +
 8 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/r200/r200_context.c b/src/mesa/drivers/dri/r200/r200_context.c
index 71dfcf3..d5749f3 100644
--- a/src/mesa/drivers/dri/r200/r200_context.c
+++ b/src/mesa/drivers/dri/r200/r200_context.c
@@ -190,6 +190,7 @@ static void r200_init_vtbl(radeonContextPtr radeon)
    radeon->vtbl.check_blit = r200_check_blit;
    radeon->vtbl.blit = r200_blit;
    radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
+   radeon->vtbl.revalidate_all_buffers = r200ValidateBuffers;
 }
 
 
diff --git a/src/mesa/drivers/dri/r200/r200_state.c b/src/mesa/drivers/dri/r200/r200_state.c
index 2c7b652..983430f 100644
--- a/src/mesa/drivers/dri/r200/r200_state.c
+++ b/src/mesa/drivers/dri/r200/r200_state.c
@@ -2210,7 +2210,7 @@ static void update_texturematrix( struct gl_context *ctx )
    }
 }
 
-static GLboolean r200ValidateBuffers(struct gl_context *ctx)
+GLboolean r200ValidateBuffers(struct gl_context *ctx)
 {
    r200ContextPtr rmesa = R200_CONTEXT(ctx);
    struct radeon_renderbuffer *rrb;
diff --git a/src/mesa/drivers/dri/r200/r200_state.h b/src/mesa/drivers/dri/r200/r200_state.h
index db0f01f..a396b06 100644
--- a/src/mesa/drivers/dri/r200/r200_state.h
+++ b/src/mesa/drivers/dri/r200/r200_state.h
@@ -47,6 +47,7 @@ extern void r200UpdateViewportOffset( struct gl_context *ctx );
 extern void r200UpdateWindow( struct gl_context *ctx );
 extern void r200UpdateDrawBuffer(struct gl_context *ctx);
 
+extern GLboolean r200ValidateBuffers(struct gl_context *ctx);
 extern GLboolean r200ValidateState( struct gl_context *ctx );
 
 extern void r200_vtbl_update_scissor( struct gl_context *ctx );
diff --git a/src/mesa/drivers/dri/radeon/radeon_common.c b/src/mesa/drivers/dri/radeon/radeon_common.c
index 67c6dc7..515e55a 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common.c
+++ b/src/mesa/drivers/dri/radeon/radeon_common.c
@@ -532,17 +532,6 @@ static INLINE void radeonEmitAtoms(radeonContextPtr radeon, GLboolean emitAll)
    COMMIT_BATCH();
 }
 
-static GLboolean radeon_revalidate_bos(struct gl_context *ctx)
-{
-   radeonContextPtr radeon = RADEON_CONTEXT(ctx);
-   int ret;
-
-   ret = radeon_cs_space_check(radeon->cmdbuf.cs);
-   if (ret == RADEON_CS_SPACE_FLUSH)
-      return GL_FALSE;
-   return GL_TRUE;
-}
-
 void radeonEmitState(radeonContextPtr radeon)
 {
    radeon_print(RADEON_STATE, RADEON_NORMAL, "%s\n", __FUNCTION__);
@@ -661,9 +650,8 @@ int rcommonFlushCmdBufLocked(radeonContextPtr rmesa, const char *caller)
    radeon_cs_erase(rmesa->cmdbuf.cs);
    rmesa->cmdbuf.flushing = 0;
 
-   if (radeon_revalidate_bos(&rmesa->glCtx) == GL_FALSE) {
+   if (!rmesa->vtbl.revalidate_all_buffers(&rmesa->glCtx))
       fprintf(stderr,"failed to revalidate buffers\n");
-   }
 
    return ret;
 }
diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.h b/src/mesa/drivers/dri/radeon/radeon_common_context.h
index 6cd1535..ac3e7b5 100644
--- a/src/mesa/drivers/dri/radeon/radeon_common_context.h
+++ b/src/mesa/drivers/dri/radeon/radeon_common_context.h
@@ -496,6 +496,7 @@ struct radeon_context {
                        unsigned reg_height,
                        unsigned flip_y);
       unsigned (*is_format_renderable)(mesa_format mesa_format);
+      GLboolean (*revalidate_all_buffers)(struct gl_context *ctx);
    } vtbl;
 };
 
diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c b/src/mesa/drivers/dri/radeon/radeon_context.c
index 1ceb4ab..edd94e2 100644
--- a/src/mesa/drivers/dri/radeon/radeon_context.c
+++ b/src/mesa/drivers/dri/radeon/radeon_context.c
@@ -157,6 +157,7 @@ static void r100_init_vtbl(radeonContextPtr radeon)
    radeon->vtbl.check_blit = r100_check_blit;
    radeon->vtbl.blit = r100_blit;
    radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
+   radeon->vtbl.revalidate_all_buffers = r100ValidateBuffers;
 }
 
 /* Create the device specific context.
diff --git a/src/mesa/drivers/dri/radeon/radeon_state.c b/src/mesa/drivers/dri/radeon/radeon_state.c
index f6bc5df..843b041 100644
--- a/src/mesa/drivers/dri/radeon/radeon_state.c
+++ b/src/mesa/drivers/dri/radeon/radeon_state.c
@@ -1992,7 +1992,7 @@ static void
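The mechanism this patch relies on is a per-driver function pointer that the shared flush path calls once the command stream has been emptied. A stripped-down sketch in plain C follows. The struct layout, the counters and the function names are all hypothetical stand-ins, and nothing here touches the real radeon_cs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct context;

struct context_vtbl {
   /* Driver-specific hook: re-add every bound buffer to the fresh CS. */
   bool (*revalidate_all_buffers)(struct context *ctx);
};

struct context {
   struct context_vtbl vtbl;
   int validated;   /* how many times the hook has run */
   int flushes;     /* how many times the command buffer was flushed */
};

/* Stand-in for a driver's buffer validation function. */
static bool fake_validate_buffers(struct context *ctx)
{
   ctx->validated++;
   return true;
}

/* Shared code: after a flush the CS is empty, so call the driver hook. */
static bool flush_cmdbuf(struct context *ctx)
{
   assert(ctx->vtbl.revalidate_all_buffers != NULL);
   ctx->flushes++;
   return ctx->vtbl.revalidate_all_buffers(ctx);
}
```

The point of going through the vtbl is that the common flush code does not need to know which driver's validation routine it is calling.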
[Mesa-dev] [PATCH] st/mesa: dump TGSI before calling into the driver
From: Marek Olšák marek.ol...@amd.com

If the driver crashes in create_xx_shader, you want to see the shader.
---
 src/mesa/state_tracker/st_program.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
index 9d7b7c4..fbf8930 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -393,13 +393,12 @@ st_translate_vertex_program(struct st_context *st,
                                                 &vpv->tgsi.stream_output);
    }
 
-   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
-
    if (ST_DEBUG & DEBUG_TGSI) {
-      tgsi_dump( vpv->tgsi.tokens, 0 );
+      tgsi_dump(vpv->tgsi.tokens, 0);
       debug_printf("\n");
    }
 
+   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
    return vpv;
 
 fail:
@@ -804,15 +803,15 @@ st_translate_fragment_program(struct st_context *st,
    variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
    ureg_destroy( ureg );
 
-   /* fill in variant */
-   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
-   variant->key = *key;
-
    if (ST_DEBUG & DEBUG_TGSI) {
-      tgsi_dump( variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/ );
+      tgsi_dump(variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/);
       debug_printf("\n");
    }
 
+   /* fill in variant */
+   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
+   variant->key = *key;
+
    if (deleteFP) {
       /* Free the temporary program made above */
       struct gl_fragment_program *fp = &stfp->Base;
@@ -1173,10 +1172,6 @@ st_translate_geometry_program(struct st_context *st,
                                 &stgp->tgsi.stream_output);
    }
 
-   /* fill in new variant */
-   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
-   gpv->key = *key;
-
    if ((ST_DEBUG & DEBUG_TGSI) && (ST_DEBUG & DEBUG_MESA)) {
       _mesa_print_program(&stgp->Base.Base);
       debug_printf("\n");
@@ -1187,6 +1182,9 @@ st_translate_geometry_program(struct st_context *st,
       debug_printf("\n");
    }
 
+   /* fill in new variant */
+   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
+   gpv->key = *key;
    return gpv;
 }
 
--
1.9.1
[Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK
From: Marek Olšák marek.ol...@amd.com

The code is rewritten to take known constraints into account, while always
using 0 by default.

This should improve performance for multi-SE parts in theory.

A debug option is also added for easier debugging. (If there are hangs, use
the option. If the hangs go away, you have found the problem.)
---
 src/gallium/drivers/radeon/r600_pipe_common.c     |  2 +-
 src/gallium/drivers/radeon/r600_pipe_common.h     |  1 +
 src/gallium/drivers/radeonsi/si_state_draw.c      | 33
 src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17
 4 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
index 3476021..eb44d72 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.c
+++ b/src/gallium/drivers/radeon/r600_pipe_common.c
@@ -239,7 +239,6 @@ static const struct debug_named_value common_debug_options[] = {
         { "vm", DBG_VM, "Print virtual addresses when creating resources" },
         { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file with faulty cs" },

-
         /* shaders */
         { "fs", DBG_FS, "Print fetch shaders" },
         { "vs", DBG_VS, "Print vertex shaders" },
@@ -254,6 +253,7 @@ static const struct debug_named_value common_debug_options[] = {
         { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
         { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
         { "notiling", DBG_NO_TILING, "Disable tiling" },
+        { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on end-of-packet." },

         DEBUG_NAMED_VALUE_END /* must be last */
 };
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
index dcec2bb..ac69d5b 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -93,6 +93,7 @@
 #define DBG_NO_DISCARD_RANGE    (1 << 12)
 #define DBG_NO_2D_TILING        (1 << 13)
 #define DBG_NO_TILING           (1 << 14)
+#define DBG_SWITCH_ON_EOP       (1 << 15)
 /* The maximum allowed bit is 15. */

 #define R600_MAP_BUFFER_ALIGNMENT 64
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 4e808a3..ae839ba 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context *sctx,
         if (sctx->b.chip_class >= CIK) {
                 struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
-                bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
-                                        prim == V_008958_DI_PT_LINELOOP ||
-                                        prim == V_008958_DI_PT_TRIFAN ||
-                                        prim == V_008958_DI_PT_TRISTRIP_ADJ ||
-                                        info->primitive_restart ||
-                                        (rs ? rs->line_stipple_enable : false);
-                /* If the WD switch is false, the IA switch must be false too. */
-                bool ia_switch_on_eop = wd_switch_on_eop;
                 unsigned primgroup_size = 64;

+                /* SWITCH_ON_EOP(0) is always preferable. */
+                bool wd_switch_on_eop = false;
+                bool ia_switch_on_eop = false;
+
+                /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
+                 * 4 shader engines. Set 1 to pass the assertion below.
+                 * The other cases are hardware requirements. */
+                if (sctx->b.screen->info.max_se < 4 ||
+                    prim == V_008958_DI_PT_POLYGON ||
+                    prim == V_008958_DI_PT_LINELOOP ||
+                    prim == V_008958_DI_PT_TRIFAN ||
+                    prim == V_008958_DI_PT_TRISTRIP_ADJ ||
+                    info->primitive_restart)
+                        wd_switch_on_eop = true;
+
                 /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
                  * We don't know that for indirect drawing, so treat it as
                  * always problematic. */
                 if (sctx->b.family == CHIP_HAWAII &&
-                    (info->indirect || info->instance_count > 1)) {
+                    (info->indirect || info->instance_count > 1))
                         wd_switch_on_eop = true;
+
+                /* This is a hardware requirement. */
+                if ((rs && rs->line_stipple_enable) ||
+                    (sctx->b.screen->debug_flags & DBG_SWITCH_ON_EOP)) {
                         ia_switch_on_eop = true;
+                        wd_switch_on_eop = true;
                 }

+                /* If the WD switch is false, the IA switch must be false too. */
+                assert(wd_switch_on_eop || !ia_switch_on_eop);
+
                 si_pm4_set_reg(pm4, R_028B74_VGT_DISPATCH_DRAW_INDEX,
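For readers following the patch, here is a minimal standalone sketch of the decision order it describes. It is not the driver code: `struct draw_params`, `enum prim_type`, `struct eop_switches` and `choose_switch_on_eop` are hypothetical names invented for illustration, and the `< 4` and `> 1` comparisons are my reading of the patch comments ("less than 4 shader engines", "instancing is enabled").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the V_008958_DI_PT_* primitive types. */
enum prim_type {
   PRIM_TRIANGLES,
   PRIM_POLYGON,
   PRIM_LINELOOP,
   PRIM_TRIFAN,
   PRIM_TRISTRIP_ADJ
};

/* Hypothetical flattened view of the state the patch inspects. */
struct draw_params {
   enum prim_type prim;
   bool primitive_restart;
   bool indirect;
   unsigned instance_count;
   bool line_stipple;
   bool is_hawaii;
   unsigned max_se;           /* number of shader engines */
   bool debug_switch_on_eop;  /* stands in for the new debug option */
};

struct eop_switches {
   bool wd;
   bool ia;
};

static struct eop_switches
choose_switch_on_eop(const struct draw_params *d)
{
   /* Default to 0 for both switches, as the patch prefers. */
   struct eop_switches s = { false, false };

   /* The WD switch has no effect below 4 shader engines; the primitive
    * types and primitive restart are described as hardware requirements. */
   if (d->max_se < 4 ||
       d->prim == PRIM_POLYGON ||
       d->prim == PRIM_LINELOOP ||
       d->prim == PRIM_TRIFAN ||
       d->prim == PRIM_TRISTRIP_ADJ ||
       d->primitive_restart)
      s.wd = true;

   /* Hawaii workaround: instanced or indirect draws need the WD switch. */
   if (d->is_hawaii && (d->indirect || d->instance_count > 1))
      s.wd = true;

   /* Line stipple and the debug option force both switches on. */
   if (d->line_stipple || d->debug_switch_on_eop) {
      s.ia = true;
      s.wd = true;
   }

   /* If the WD switch is false, the IA switch must be false too. */
   assert(s.wd || !s.ia);
   return s;
}
```

Calling `choose_switch_on_eop()` with a plain triangle draw on a 4-SE part yields both switches off; setting `line_stipple` yields both on. Keeping the invariant as an assert at the end mirrors the patch, so any later rule that sets only the IA switch would be caught immediately.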
[Mesa-dev] [PATCH 1/2] radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii
From: Marek Olšák marek.ol...@amd.com

This isn't documented anywhere, but it's the only thing that works for this
case.
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index eb21ba1..4e808a3 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -411,14 +411,11 @@ static bool si_update_draw_info_state(struct si_context *sctx,
                 bool ia_switch_on_eop = wd_switch_on_eop;
                 unsigned primgroup_size = 64;

-                /* Hawaii hangs if instancing is enabled and each instance
-                 * is smaller than a prim group and WD_SWITCH_ON_EOP is 0.
+                /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
                  * We don't know that for indirect drawing, so treat it as
                  * always problematic. */
                 if (sctx->b.family == CHIP_HAWAII &&
-                    (info->indirect ||
-                     (info->instance_count > 1 &&
-                      u_prims_for_vertices(info->mode, info->count) < primgroup_size))) {
+                    (info->indirect || info->instance_count > 1)) {
                         wd_switch_on_eop = true;
                         ia_switch_on_eop = true;
                 }
--
1.9.1
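To make the behavioural change concrete, the old and new Hawaii conditions can be compared side by side. This is only a sketch under assumptions: `struct draw_info`, its `prims_per_instance` field (standing in for the result of `u_prims_for_vertices(info->mode, info->count)`) and both function names are hypothetical, and the `<` comparison is inferred from the removed comment ("each instance is smaller than a prim group").

```c
#include <stdbool.h>

#define PRIMGROUP_SIZE 64u

/* Hypothetical, reduced view of the draw call. */
struct draw_info {
   bool indirect;
   unsigned instance_count;
   unsigned prims_per_instance; /* assumed to be precomputed by the caller */
};

/* Condition before the patch: only small instances triggered the workaround. */
static bool
hawaii_needs_workaround_old(const struct draw_info *info)
{
   return info->indirect ||
          (info->instance_count > 1 &&
           info->prims_per_instance < PRIMGROUP_SIZE);
}

/* Condition after the patch: any instanced draw triggers it. */
static bool
hawaii_needs_workaround_new(const struct draw_info *info)
{
   return info->indirect || info->instance_count > 1;
}
```

The two predicates differ only for instanced draws whose instances are at least one prim group in size, which is the case the patch newly covers.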
Re: [Mesa-dev] [PATCH] mesa: make vertex array type error checking a little more efficient
On 08/05/2014 10:35 AM, Roland Scheidegger wrote:
> On 30.07.2014 19:08, Brian Paul wrote:
>> Compute the bitmask of supported array types once instead of every
>> time we call a GL vertex array function.
>> ---
>>  src/mesa/main/mtypes.h |  3 ++
>>  src/mesa/main/varray.c | 86
>>  2 files changed, 59 insertions(+), 30 deletions(-)
>>
>> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
>> index 3f60a55..f5ce360 100644
>> --- a/src/mesa/main/mtypes.h
>> +++ b/src/mesa/main/mtypes.h
>> @@ -1693,6 +1693,9 @@ struct gl_array_attrib
>>
>>     /** One of the DRAW_xxx flags, not consumed by drivers */
>>     gl_draw_method DrawMethod;
>> +
>> +   /** Legal array datatypes */
>> +   GLbitfield LegalTypesMask;
>>  };
>>
>>
>> diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c
>> index 46956ef..0356858 100644
>> --- a/src/mesa/main/varray.c
>> +++ b/src/mesa/main/varray.c
>> @@ -179,6 +179,53 @@ vertex_binding_divisor(struct gl_context *ctx, GLuint bindingIndex,
>>
>>
>>  /**
>> + * Examine the API profile and extensions to determine which types are legal
>> + * for vertex arrays.  This is called once from update_array_format().
>> + */
>> +static GLbitfield
>> +get_legal_types_mask(const struct gl_context *ctx)
>> +{
>> +   GLbitfield legalTypesMask = ~0u; /* all */
> I think it would be better to list all possible values explicitly here,
> otherwise you have lots of impossible bits in there in the end. It
> should not make any difference really, though for instance for debugging
> the legalTypesMask might look a little confusing otherwise.

OK, I can do that in a follow-up.

>> +
>> +   if (_mesa_is_gles(ctx)) {
>> +      legalTypesMask &= ~(FIXED_GL_BIT |
>> +                          DOUBLE_BIT |
>> +                          UNSIGNED_INT_10F_11F_11F_REV_BIT);
>> +
>> +      /* GL_INT and GL_UNSIGNED_INT data is not allowed in OpenGL ES until
>> +       * 3.0.  The 2_10_10_10 types are added in OpenGL ES 3.0 or
>> +       * GL_OES_vertex_type_10_10_10_2.  GL_HALF_FLOAT data is not allowed
>> +       * until 3.0 or with the GL_OES_vertex_half float extension, which isn't
>> +       * quite as trivial as we'd like because it uses a different enum value
>> +       * for GL_HALF_FLOAT_OES.
>> +       */
>> +      if (ctx->Version < 30) {
>> +         legalTypesMask &= ~(UNSIGNED_INT_BIT |
>> +                             INT_BIT |
>> +                             UNSIGNED_INT_2_10_10_10_REV_BIT |
>> +                             INT_2_10_10_10_REV_BIT |
>> +                             HALF_BIT);
>> +      }
>> +   }
>> +   else {
>> +      legalTypesMask &= ~FIXED_ES_BIT;
>> +
>> +      if (!ctx->Extensions.ARB_ES2_compatibility)
>> +         legalTypesMask &= ~FIXED_GL_BIT;
>> +
>> +      if (!ctx->Extensions.ARB_vertex_type_2_10_10_10_rev)
>> +         legalTypesMask &= ~(UNSIGNED_INT_2_10_10_10_REV_BIT |
>> +                             INT_2_10_10_10_REV_BIT);
>> +
>> +      if (!ctx->Extensions.ARB_vertex_type_10f_11f_11f_rev)
>> +         legalTypesMask &= ~UNSIGNED_INT_10F_11F_11F_REV_BIT;
>> +   }
>> +
>> +   return legalTypesMask;
>> +}
>> +
>> +
>> +/**
>>   * Does error checking and updates the format in an attrib array.
>>   *
>>   * Called by update_array() and VertexAttrib*Format().
>> @@ -208,40 +255,19 @@ update_array_format(struct gl_context *ctx,
>>     GLuint elementSize;
>>     GLenum format = GL_RGBA;
>>
>> -   if (_mesa_is_gles(ctx)) {
>> -      legalTypesMask &= ~(FIXED_GL_BIT | DOUBLE_BIT | UNSIGNED_INT_10F_11F_11F_REV_BIT);
>> -
>> -      /* GL_INT and GL_UNSIGNED_INT data is not allowed in OpenGL ES until
>> -       * 3.0.  The 2_10_10_10 types are added in OpenGL ES 3.0 or
>> -       * GL_OES_vertex_type_10_10_10_2.  GL_HALF_FLOAT data is not allowed
>> -       * until 3.0 or with the GL_OES_vertex_half float extension, which isn't
>> -       * quite as trivial as we'd like because it uses a different enum value
>> -       * for GL_HALF_FLOAT_OES.
>> +   if (ctx->Array.LegalTypesMask == 0) {
>> +      /* One-time initialization.  We can't do this in _mesa_init_varrays()
>> +       * below because extensions are not yet enabled at that point.
>>         */
>> -      if (ctx->Version < 30) {
>> -         legalTypesMask &= ~(UNSIGNED_INT_BIT
>> -                             | INT_BIT
>> -                             | UNSIGNED_INT_2_10_10_10_REV_BIT
>> -                             | INT_2_10_10_10_REV_BIT
>> -                             | HALF_BIT);
>> -      }
>> +      ctx->Array.LegalTypesMask = get_legal_types_mask(ctx);
>> +   }
>> +
>> +   legalTypesMask &= ctx->Array.LegalTypesMask;
>>
>> +   if (_mesa_is_gles(ctx) && sizeMax == BGRA_OR_4) {
>>        /* BGRA ordering is not supported in ES contexts.
>>         */
>> -      if (sizeMax == BGRA_OR_4)
>> -         sizeMax = 4;
>> -   } else {
>> -      legalTypesMask &= ~FIXED_ES_BIT;
>> -
>> -      if (!ctx->Extensions.ARB_ES2_compatibility)
>> -         legalTypesMask &= ~FIXED_GL_BIT;
>> -
>> -      if (!ctx->Extensions.ARB_vertex_type_2_10_10_10_rev)
>> -         legalTypesMask &= ~(UNSIGNED_INT_2_10_10_10_REV_BIT |
>> -                             INT_2_10_10_10_REV_BIT);
>> -
>> -      if (!ctx->Extensions.ARB_vertex_type_10f_11f_11f_rev)
>> -         legalTypesMask &= ~UNSIGNED_INT_10F_11F_11F_REV_BIT;
>> +      sizeMax = 4;
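The compute-once pattern discussed in this thread can be sketched outside of Mesa as follows. Everything here is an assumption made for illustration: `struct fake_context`, the reduced set of `*_BIT` values and both function names are hypothetical and do not match the real `gl_context` or the type bits in varray.c. The sketch starts from an explicit list of bits rather than `~0u`, which is the variant suggested in the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced set of type bits. */
#define BYTE_BIT      (1u << 0)
#define INT_BIT       (1u << 1)
#define UINT_BIT      (1u << 2)
#define HALF_BIT      (1u << 3)
#define FLOAT_BIT     (1u << 4)
#define DOUBLE_BIT    (1u << 5)
#define FIXED_ES_BIT  (1u << 6)
#define FIXED_GL_BIT  (1u << 7)

/* Every bit that can ever be legal, listed explicitly. */
#define ALL_TYPE_BITS (BYTE_BIT | INT_BIT | UINT_BIT | HALF_BIT | \
                       FLOAT_BIT | DOUBLE_BIT | FIXED_ES_BIT | FIXED_GL_BIT)

/* Hypothetical stand-in for the parts of the context that matter here. */
struct fake_context {
   bool is_gles;
   int version;               /* e.g. 20 for ES 2.0, 30 for ES 3.0 */
   bool has_es2_compat;
   uint32_t legal_types_mask; /* 0 means "not computed yet" */
};

/* Derive the mask from the API and extensions; this is the expensive part. */
static uint32_t
compute_legal_types_mask(const struct fake_context *ctx)
{
   uint32_t mask = ALL_TYPE_BITS;

   if (ctx->is_gles) {
      mask &= ~(FIXED_GL_BIT | DOUBLE_BIT);
      if (ctx->version < 30)
         mask &= ~(INT_BIT | UINT_BIT | HALF_BIT);
   } else {
      mask &= ~FIXED_ES_BIT;
      if (!ctx->has_es2_compat)
         mask &= ~FIXED_GL_BIT;
   }
   return mask;
}

/* Lazily compute on first use, then reuse the cached value. */
static uint32_t
get_legal_types_mask(struct fake_context *ctx)
{
   if (ctx->legal_types_mask == 0)
      ctx->legal_types_mask = compute_legal_types_mask(ctx);
   return ctx->legal_types_mask;
}
```

Using 0 as the "not computed yet" marker works only because at least one type is always legal; a context where the mask could legitimately be empty would need a separate flag.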
Re: [Mesa-dev] [PATCH] radeon, r200: fix buffer validation after CS flush
On Wed, Aug 6, 2014 at 9:28 AM, Marek Olšák mar...@gmail.com wrote:
> From: Marek Olšák marek.ol...@amd.com
>
> This validates all bound buffers (CB, ZB, textures, DMA) at the beginning
> of CS. This fixes bo->space_accounted assertion failures.
>
> Tested by: Jochen Rollwagen joro-2...@t-online.de
>
> Cc: mesa-sta...@lists.freedesktop.org

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

> ---
>  src/mesa/drivers/dri/r200/r200_context.c            |  1 +
>  src/mesa/drivers/dri/r200/r200_state.c              |  2 +-
>  src/mesa/drivers/dri/r200/r200_state.h              |  1 +
>  src/mesa/drivers/dri/radeon/radeon_common.c         | 14 +-------------
>  src/mesa/drivers/dri/radeon/radeon_common_context.h |  1 +
>  src/mesa/drivers/dri/radeon/radeon_context.c        |  1 +
>  src/mesa/drivers/dri/radeon/radeon_state.c          |  2 +-
>  src/mesa/drivers/dri/radeon/radeon_state.h          |  1 +
>  8 files changed, 8 insertions(+), 15 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/r200/r200_context.c b/src/mesa/drivers/dri/r200/r200_context.c
> index 71dfcf3..d5749f3 100644
> --- a/src/mesa/drivers/dri/r200/r200_context.c
> +++ b/src/mesa/drivers/dri/r200/r200_context.c
> @@ -190,6 +190,7 @@ static void r200_init_vtbl(radeonContextPtr radeon)
>     radeon->vtbl.check_blit = r200_check_blit;
>     radeon->vtbl.blit = r200_blit;
>     radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
> +   radeon->vtbl.revalidate_all_buffers = r200ValidateBuffers;
>  }
>
>
> diff --git a/src/mesa/drivers/dri/r200/r200_state.c b/src/mesa/drivers/dri/r200/r200_state.c
> index 2c7b652..983430f 100644
> --- a/src/mesa/drivers/dri/r200/r200_state.c
> +++ b/src/mesa/drivers/dri/r200/r200_state.c
> @@ -2210,7 +2210,7 @@ static void update_texturematrix( struct gl_context *ctx )
>     }
>  }
>
> -static GLboolean r200ValidateBuffers(struct gl_context *ctx)
> +GLboolean r200ValidateBuffers(struct gl_context *ctx)
>  {
>     r200ContextPtr rmesa = R200_CONTEXT(ctx);
>     struct radeon_renderbuffer *rrb;
> diff --git a/src/mesa/drivers/dri/r200/r200_state.h b/src/mesa/drivers/dri/r200/r200_state.h
> index db0f01f..a396b06 100644
> --- a/src/mesa/drivers/dri/r200/r200_state.h
> +++ b/src/mesa/drivers/dri/r200/r200_state.h
> @@ -47,6 +47,7 @@
>  extern void r200UpdateViewportOffset( struct gl_context *ctx );
>  extern void r200UpdateWindow( struct gl_context *ctx );
>  extern void r200UpdateDrawBuffer(struct gl_context *ctx);
> +extern GLboolean r200ValidateBuffers(struct gl_context *ctx);
>  extern GLboolean r200ValidateState( struct gl_context *ctx );
>
>  extern void r200_vtbl_update_scissor( struct gl_context *ctx );
> diff --git a/src/mesa/drivers/dri/radeon/radeon_common.c b/src/mesa/drivers/dri/radeon/radeon_common.c
> index 67c6dc7..515e55a 100644
> --- a/src/mesa/drivers/dri/radeon/radeon_common.c
> +++ b/src/mesa/drivers/dri/radeon/radeon_common.c
> @@ -532,17 +532,6 @@ static INLINE void radeonEmitAtoms(radeonContextPtr radeon, GLboolean emitAll)
>         COMMIT_BATCH();
>  }
>
> -static GLboolean radeon_revalidate_bos(struct gl_context *ctx)
> -{
> -       radeonContextPtr radeon = RADEON_CONTEXT(ctx);
> -       int ret;
> -
> -       ret = radeon_cs_space_check(radeon->cmdbuf.cs);
> -       if (ret == RADEON_CS_SPACE_FLUSH)
> -               return GL_FALSE;
> -       return GL_TRUE;
> -}
> -
>  void radeonEmitState(radeonContextPtr radeon)
>  {
>         radeon_print(RADEON_STATE, RADEON_NORMAL, "%s\n", __FUNCTION__);
> @@ -661,9 +650,8 @@ int rcommonFlushCmdBufLocked(radeonContextPtr rmesa, const char *caller)
>         radeon_cs_erase(rmesa->cmdbuf.cs);
>         rmesa->cmdbuf.flushing = 0;
>
> -       if (radeon_revalidate_bos(&rmesa->glCtx) == GL_FALSE) {
> +       if (!rmesa->vtbl.revalidate_all_buffers(&rmesa->glCtx))
>                 fprintf(stderr,"failed to revalidate buffers\n");
> -       }
>
>         return ret;
>  }
> diff --git a/src/mesa/drivers/dri/radeon/radeon_common_context.h b/src/mesa/drivers/dri/radeon/radeon_common_context.h
> index 6cd1535..ac3e7b5 100644
> --- a/src/mesa/drivers/dri/radeon/radeon_common_context.h
> +++ b/src/mesa/drivers/dri/radeon/radeon_common_context.h
> @@ -496,6 +496,7 @@ struct radeon_context {
>                            unsigned reg_height,
>                            unsigned flip_y);
>            unsigned (*is_format_renderable)(mesa_format mesa_format);
> +          GLboolean (*revalidate_all_buffers)(struct gl_context *ctx);
>     } vtbl;
>  };
>
> diff --git a/src/mesa/drivers/dri/radeon/radeon_context.c b/src/mesa/drivers/dri/radeon/radeon_context.c
> index 1ceb4ab..edd94e2 100644
> --- a/src/mesa/drivers/dri/radeon/radeon_context.c
> +++ b/src/mesa/drivers/dri/radeon/radeon_context.c
> @@ -157,6 +157,7 @@ static void r100_init_vtbl(radeonContextPtr radeon)
>     radeon->vtbl.check_blit = r100_check_blit;
>     radeon->vtbl.blit = r100_blit;
>     radeon->vtbl.is_format_renderable = radeonIsFormatRenderable;
> +   radeon->vtbl.revalidate_all_buffers = r100ValidateBuffers;
>  }
>
>  /* Create the device specific context.
> diff --git
Re: [Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK
On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
> From: Marek Olšák marek.ol...@amd.com
>
> The code is rewritten to take known constraints into account, while always
> using 0 by default.
>
> This should improve performance for multi-SE parts in theory.
>
> A debug option is also added for easier debugging. (If there are hangs, use
> the option. If the hangs go away, you have found the problem.)

Just one comment below. With that addressed:

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

> ---
>  src/gallium/drivers/radeon/r600_pipe_common.c     |  2 +-
>  src/gallium/drivers/radeon/r600_pipe_common.h     |  1 +
>  src/gallium/drivers/radeonsi/si_state_draw.c      | 33
>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17
>  4 files changed, 43 insertions(+), 10 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
> index 3476021..eb44d72 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
> @@ -239,7 +239,6 @@ static const struct debug_named_value common_debug_options[] = {
>          { "vm", DBG_VM, "Print virtual addresses when creating resources" },
>          { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file with faulty cs" },
>
> -
>          /* shaders */
>          { "fs", DBG_FS, "Print fetch shaders" },
>          { "vs", DBG_VS, "Print vertex shaders" },
> @@ -254,6 +253,7 @@ static const struct debug_named_value common_debug_options[] = {
>          { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
>          { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
>          { "notiling", DBG_NO_TILING, "Disable tiling" },
> +        { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on end-of-packet." },
>
>          DEBUG_NAMED_VALUE_END /* must be last */
>  };
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> index dcec2bb..ac69d5b 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -93,6 +93,7 @@
>  #define DBG_NO_DISCARD_RANGE    (1 << 12)
>  #define DBG_NO_2D_TILING        (1 << 13)
>  #define DBG_NO_TILING           (1 << 14)
> +#define DBG_SWITCH_ON_EOP       (1 << 15)
>  /* The maximum allowed bit is 15. */
>
>  #define R600_MAP_BUFFER_ALIGNMENT 64
> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
> index 4e808a3..ae839ba 100644
> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
> @@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context *sctx,
>          if (sctx->b.chip_class >= CIK) {
>                  struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
> -                bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
> -                                        prim == V_008958_DI_PT_LINELOOP ||
> -                                        prim == V_008958_DI_PT_TRIFAN ||
> -                                        prim == V_008958_DI_PT_TRISTRIP_ADJ ||
> -                                        info->primitive_restart ||
> -                                        (rs ? rs->line_stipple_enable : false);
> -                /* If the WD switch is false, the IA switch must be false too. */
> -                bool ia_switch_on_eop = wd_switch_on_eop;
>                  unsigned primgroup_size = 64;
>
> +                /* SWITCH_ON_EOP(0) is always preferable. */
> +                bool wd_switch_on_eop = false;
> +                bool ia_switch_on_eop = false;
> +
> +                /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
> +                 * 4 shader engines. Set 1 to pass the assertion below.
> +                 * The other cases are hardware requirements. */
> +                if (sctx->b.screen->info.max_se < 4 ||
> +                    prim == V_008958_DI_PT_POLYGON ||
> +                    prim == V_008958_DI_PT_LINELOOP ||
> +                    prim == V_008958_DI_PT_TRIFAN ||
> +                    prim == V_008958_DI_PT_TRISTRIP_ADJ ||
> +                    info->primitive_restart)
> +                        wd_switch_on_eop = true;
> +
>                  /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
>                   * We don't know that for indirect drawing, so treat it as
>                   * always problematic. */
>                  if (sctx->b.family == CHIP_HAWAII &&
> -                    (info->indirect || info->instance_count > 1)) {
> +                    (info->indirect || info->instance_count > 1))
>                          wd_switch_on_eop = true;
> +
> +                /* This is a hardware requirement. */
> +                if ((rs && rs->line_stipple_enable) ||
> +                    (sctx->b.screen->debug_flags & DBG_SWITCH_ON_EOP)) {
>                          ia_switch_on_eop = true;
> +                        wd_switch_on_eop = true;
Re: [Mesa-dev] [PATCH 1/2] radeonsi: fix a hang with instancing in Unigine Heaven/Valley on Hawaii
On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
> From: Marek Olšák marek.ol...@amd.com
>
> This isn't documented anywhere, but it's the only thing that works for this
> case.

Reviewed-by: Alex Deucher alexander.deuc...@amd.com

> ---
>  src/gallium/drivers/radeonsi/si_state_draw.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
> index eb21ba1..4e808a3 100644
> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
> @@ -411,14 +411,11 @@ static bool si_update_draw_info_state(struct si_context *sctx,
>                  bool ia_switch_on_eop = wd_switch_on_eop;
>                  unsigned primgroup_size = 64;
>
> -                /* Hawaii hangs if instancing is enabled and each instance
> -                 * is smaller than a prim group and WD_SWITCH_ON_EOP is 0.
> +                /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
>                   * We don't know that for indirect drawing, so treat it as
>                   * always problematic. */
>                  if (sctx->b.family == CHIP_HAWAII &&
> -                    (info->indirect ||
> -                     (info->instance_count > 1 &&
> -                      u_prims_for_vertices(info->mode, info->count) < primgroup_size))) {
> +                    (info->indirect || info->instance_count > 1)) {
>                          wd_switch_on_eop = true;
>                          ia_switch_on_eop = true;
>                  }
> --
> 1.9.1
Re: [Mesa-dev] [PATCH] draw: fix clipvertex trouble if position comes from gs
On Aug 5, 2014, at 9:40 PM, srol...@vmware.com wrote:
> From: Roland Scheidegger srol...@vmware.com
>
> If the vertex shader has no position but the gs has, the clipvertex output
> was -1 (because it's the same as vs position in this case if there's no
> explicit clipvertex output). This caused crashes (or assertion failures)
> in clipping since in the end position (which came from gs) was different
> from cv (-1) and we then tried to use the bogus cv input.
> Rather than just test for -1 cv value in clipping, make it explicitly
> return the position output of the gs instead, which seems cleaner (since
> we really don't want to use the clipvertex value from the vs; it could be
> a valid value in the (unsupported) case of vs writing clipvertex but still
> using a gs).
>
> This fixes piglit shader_runner clip-distance-out-values.shader_test.

Great. Well done! Both of those look good.

Reviewed-by: Zack Rusin za...@vmware.com
Re: [Mesa-dev] [PATCH] glsl: support unsigned increment in ir_loop controls
On 07/30/2014 04:11 AM, Tapani Pälli wrote:
> Current version can create ir_expression where operands have
> different base type, patch adds support for unsigned type.
>
> Signed-off-by: Tapani Pälli tapani.pa...@intel.com
> https://bugs.freedesktop.org/show_bug.cgi?id=80880
> ---
>  src/glsl/loop_controls.cpp | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/glsl/loop_controls.cpp b/src/glsl/loop_controls.cpp
> index 36b49eb..419f9c1 100644
> --- a/src/glsl/loop_controls.cpp
> +++ b/src/glsl/loop_controls.cpp
> @@ -123,9 +123,21 @@ calculate_iterations(ir_rvalue *from, ir_rvalue *to, ir_rvalue *increment,
>     bool valid_loop = false;
>
>     for (unsigned i = 0; i < Elements(bias); i++) {
> -      iter = (increment->type->is_integer())
> -         ? new(mem_ctx) ir_constant(iter_value + bias[i])
> -         : new(mem_ctx) ir_constant(float(iter_value + bias[i]));
> +
> +      /* Increment may be of type int, uint or float. */
> +      switch (increment->type->base_type) {
> +      case GLSL_TYPE_INT:
> +         iter = new(mem_ctx) ir_constant(iter_value + bias[i]);
> +         break;
> +      case GLSL_TYPE_UINT:
> +         iter = new(mem_ctx) ir_constant(unsigned(iter_value + bias[i]));
> +         break;
> +      case GLSL_TYPE_FLOAT:
> +         iter = new(mem_ctx) ir_constant(float(iter_value + bias[i]));
> +         break;
> +      default:
> +         assert(!"Unsupported type for loop iterator.");

Right... because this code was written when we only had int and float
types.

Two things:

 - Use spaces instead of tabs. It looks like the surrounding code uses
   tabs, and that's my fault. We're trying to fix that in new code.

 - Change the assert to unreachable("Unsupported type for loop iterator.")

With those fixed, this patch is

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

> +      }
>
>        ir_expression *const mul =
>           new(mem_ctx) ir_expression(ir_binop_mul, increment->type, iter,
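The idea in the patch, building the iteration constant with the same base type as the increment so both multiplication operands match, can be sketched in plain C. This is not the GLSL IR code (which is C++): `enum base_type`, `struct constant` and `make_iter_constant` are hypothetical names, and `abort()` is only a stand-in for the `unreachable()` suggested in the review.

```c
#include <stdlib.h>

/* Hypothetical reduced set of base types. */
enum base_type {
   TYPE_INT,
   TYPE_UINT,
   TYPE_FLOAT,
   TYPE_BOOL
};

/* Hypothetical tagged constant, loosely analogous to an IR constant. */
struct constant {
   enum base_type type;
   union {
      int i;
      unsigned u;
      float f;
   } value;
};

/* Build a constant holding iter_value in the requested base type. */
static struct constant
make_iter_constant(enum base_type type, int iter_value)
{
   struct constant c;

   c.type = type;
   switch (type) {
   case TYPE_INT:
      c.value.i = iter_value;
      break;
   case TYPE_UINT:
      c.value.u = (unsigned) iter_value;
      break;
   case TYPE_FLOAT:
      c.value.f = (float) iter_value;
      break;
   default:
      /* Loop iterators of any other type are not expected here. */
      abort();
   }
   return c;
}
```

A caller would pass the increment's base type, so an unsigned increment is paired with an unsigned constant instead of a signed one. An explicit default case also means a newly added base type fails loudly rather than silently producing a mismatched expression.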
Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi
On 06.08.2014 13:00, Marek Olšák wrote:
> On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>>> From a gallium perspective, indirect temp regs are already working - so
>>> something like MOV TEMP[0], TEMP[TEMP[1].x] should work. Indirect
>>> registers are supported for inputs, outputs, temps, constants, and
>>> immediates even, but the indirect reg itself must come from a temp or
>>> address reg (I am not 100% certain where that restriction comes from). I
>>> have no idea which drivers support it, all I can tell is that it works
>>> with llvmpipe.
>>> I sort of doubt it is supported for samplers right now in gallium though
>>> technically it might be possible to express this already.
>>
>> Well, with my limited patch + ChrisF's small patches to mesa core, the
>> dynamic sampler stuff works for nvc0, except for the issues I
>> outlined. Not sure what you mean by supported in gallium. Perhaps I
>> have an incorrect view of things, but I see gallium as an amorphous
>> thing that we can change to our heart's content.
>>
>>> A cap bit for the ability to support dynamic indexing of shaders (plus
>>> whatever is needed for making it work like declaration of sampler
>>> arrays) would certainly be needed in any case. For drivers supporting
>>
>> Right... so it's not like shaders will start magically containing
>> these things, it'll only happen if ARB_gs5 is enabled (probably via
>> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
>> supports whatever we're throwing at it.
>>
>>> this I would certainly expect them to allow temp regs as the indirect
>>> reg. I guess it would be nice if we'd just use temp regs instead of
>>> address reg in glsl to tgsi conversion if a driver supports it. I think
>>> for modern drivers this makes a lot more sense than trying to shove
>>> everything into address regs.
>>
>> Agreed. With the exception that I guess we also need to support
>> indexing with float values? (i.e. ARL) This would have to be treated
>> with some care. Not sure when that comes up though... perhaps only if
>> !native_integers, which won't be an issue with any of the hw that
>> we're talking about.
>
> If you really want to lower ARL into a temp, I recommend using F2I,
> which is equivalent in behavior. For UARL, MOV will do.
>
> Also, I don't think GLSL sampler arrays have to be declared as arrays
> in TGSI. Array declarations are really only needed for TEMPs, because
> they allow better register allocation. Every other shader resource has
> a fixed location and would not benefit from it.

I think not requiring them to be declared as an array is a bad idea. It
may well be true that hw drivers can't really benefit from it but in any
case it would be trivial to handle in the drivers. It gives you the
ability to easily see what values are legal in the end as a sampler
index, which might help with debugging some day. Besides, it's just bad
style imho to index into things which aren't arrays, that is applicable
to all languages, so I can't see why it should be different for tgsi.
But I guess it's not all _that_ important.

Roland

> If GLSL is strict about out-of-bounds access, I recommend always
> clamping the index in glsl_to_tgsi.
>
> Marek
Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi
On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com wrote:
> On 06.08.2014 13:00, Marek Olšák wrote:
>> On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>>>> From a gallium perspective, indirect temp regs are already working - so
>>>> something like MOV TEMP[0], TEMP[TEMP[1].x] should work. Indirect
>>>> registers are supported for inputs, outputs, temps, constants, and
>>>> immediates even, but the indirect reg itself must come from a temp or
>>>> address reg (I am not 100% certain where that restriction comes from). I
>>>> have no idea which drivers support it, all I can tell is that it works
>>>> with llvmpipe.
>>>> I sort of doubt it is supported for samplers right now in gallium though
>>>> technically it might be possible to express this already.
>>>
>>> Well, with my limited patch + ChrisF's small patches to mesa core, the
>>> dynamic sampler stuff works for nvc0, except for the issues I
>>> outlined. Not sure what you mean by supported in gallium. Perhaps I
>>> have an incorrect view of things, but I see gallium as an amorphous
>>> thing that we can change to our heart's content.
>>>
>>>> A cap bit for the ability to support dynamic indexing of shaders (plus
>>>> whatever is needed for making it work like declaration of sampler
>>>> arrays) would certainly be needed in any case. For drivers supporting
>>>
>>> Right... so it's not like shaders will start magically containing
>>> these things, it'll only happen if ARB_gs5 is enabled (probably via
>>> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
>>> supports whatever we're throwing at it.
>>>
>>>> this I would certainly expect them to allow temp regs as the indirect
>>>> reg. I guess it would be nice if we'd just use temp regs instead of
>>>> address reg in glsl to tgsi conversion if a driver supports it. I think
>>>> for modern drivers this makes a lot more sense than trying to shove
>>>> everything into address regs.
>>>
>>> Agreed. With the exception that I guess we also need to support
>>> indexing with float values? (i.e. ARL) This would have to be treated
>>> with some care. Not sure when that comes up though... perhaps only if
>>> !native_integers, which won't be an issue with any of the hw that
>>> we're talking about.
>>
>> If you really want to lower ARL into a temp, I recommend using F2I,
>> which is equivalent in behavior. For UARL, MOV will do.
>>
>> Also, I don't think GLSL sampler arrays have to be declared as arrays
>> in TGSI. Array declarations are really only needed for TEMPs, because
>> they allow better register allocation. Every other shader resource has
>> a fixed location and would not benefit from it.
>
> I think not requiring them to be declared as an array is a bad idea. It
> may well be true that hw drivers can't really benefit from it but in any
> case it would be trivial to handle in the drivers. It gives you the
> ability to easily see what values are legal in the end as a sampler
> index, which might help with debugging some day. Besides, it's just bad

You would see that based on the declarations anyways, no?

> style imho to index into things which aren't arrays, that is applicable
> to all languages, so I can't see why it should be different for tgsi.
> But I guess it's not all _that_ important.

Well, it might be important to put this in some context -- sampler
arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
glsl (although based on some comments glsl 110 might have allowed it)
are the dynamic indices.

  -ilia
Re: [Mesa-dev] [PATCH] st/mesa: dump TGSI before calling into the driver
On Wed, Aug 6, 2014 at 9:33 AM, Marek Olšák mar...@gmail.com wrote:
> From: Marek Olšák marek.ol...@amd.com
>
> If the driver crashes in create_xx_shader, you want to see the shader.

Reviewed-by: Ilia Mirkin imir...@alum.mit.edu

> ---
>  src/mesa/state_tracker/st_program.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
> index 9d7b7c4..fbf8930 100644
> --- a/src/mesa/state_tracker/st_program.c
> +++ b/src/mesa/state_tracker/st_program.c
> @@ -393,13 +393,12 @@ st_translate_vertex_program(struct st_context *st,
>                                                  &vpv->tgsi.stream_output);
>     }
>
> -   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
> -
>     if (ST_DEBUG & DEBUG_TGSI) {
> -      tgsi_dump( vpv->tgsi.tokens, 0 );
> +      tgsi_dump(vpv->tgsi.tokens, 0);
>        debug_printf("\n");
>     }
>
> +   vpv->driver_shader = pipe->create_vs_state(pipe, &vpv->tgsi);
>     return vpv;
>
>  fail:
> @@ -804,15 +803,15 @@ st_translate_fragment_program(struct st_context *st,
>     variant->tgsi.tokens = ureg_get_tokens( ureg, NULL );
>     ureg_destroy( ureg );
>
> -   /* fill in variant */
> -   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
> -   variant->key = *key;
> -
>     if (ST_DEBUG & DEBUG_TGSI) {
> -      tgsi_dump( variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/ );
> +      tgsi_dump(variant->tgsi.tokens, 0/*TGSI_DUMP_VERBOSE*/);
>        debug_printf("\n");
>     }
>
> +   /* fill in variant */
> +   variant->driver_shader = pipe->create_fs_state(pipe, &variant->tgsi);
> +   variant->key = *key;
> +
>     if (deleteFP) {
>        /* Free the temporary program made above */
>        struct gl_fragment_program *fp = &stfp->Base;
> @@ -1173,10 +1172,6 @@ st_translate_geometry_program(struct st_context *st,
>                                                  &stgp->tgsi.stream_output);
>     }
>
> -   /* fill in new variant */
> -   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
> -   gpv->key = *key;
> -
>     if ((ST_DEBUG & DEBUG_TGSI) && (ST_DEBUG & DEBUG_MESA)) {
>        _mesa_print_program(&stgp->Base.Base);
>        debug_printf("\n");
> @@ -1187,6 +1182,9 @@ st_translate_geometry_program(struct st_context *st,
>        debug_printf("\n");
>     }
>
> +   /* fill in new variant */
> +   gpv->driver_shader = pipe->create_gs_state(pipe, &stgp->tgsi);
> +   gpv->key = *key;
>     return gpv;
>  }
>
> --
> 1.9.1
Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi
On 06.08.2014 17:03, Ilia Mirkin wrote:
> On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com wrote:
>> On 06.08.2014 13:00, Marek Olšák wrote:
>>> On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>>> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>>>>> From a gallium perspective, indirect temp regs are already working -
>>>>> so something like MOV TEMP[0], TEMP[TEMP[1].x] should work. Indirect
>>>>> registers are supported for inputs, outputs, temps, constants, and
>>>>> immediates even, but the indirect reg itself must come from a temp or
>>>>> address reg (I am not 100% certain where that restriction comes
>>>>> from). I have no idea which drivers support it, all I can tell is
>>>>> that it works with llvmpipe. I sort of doubt it is supported for
>>>>> samplers right now in gallium though technically it might be possible
>>>>> to express this already.
>>>>
>>>> Well, with my limited patch + ChrisF's small patches to mesa core,
>>>> the dynamic sampler stuff works for nvc0, except for the issues I
>>>> outlined. Not sure what you mean by supported in gallium. Perhaps I
>>>> have an incorrect view of things, but I see gallium as an amorphous
>>>> thing that we can change to our heart's content.
>>
>> A cap bit for the ability to support dynamic indexing of shaders
>> (plus whatever is needed for making it work like declaration of
>> sampler arrays) would certainly be needed in any case. For drivers
>> supporting
>
> Right... so it's not like shaders will start magically containing
> these things, it'll only happen if ARB_gs5 is enabled (probably via
> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
> supports whatever we're throwing at it.
>
>> this I would certainly expect them to allow temp regs as the
>> indirect reg.
>>
>>>>> I guess it would be nice if we'd just use temp regs instead of
>>>>> address reg in glsl to tgsi conversion if a driver supports it. I
>>>>> think for modern drivers this makes a lot more sense than trying to
>>>>> shove everything into address regs.
>>>>
>>>> Agreed. With the exception that I guess we also need to support
>>>> indexing with float values? (i.e. ARL) This would have to be treated
>>>> with some care. Not sure when that comes up though... perhaps only if
>>>> !native_integers, which won't be an issue with any of the hw that
>>>> we're talking about.
>>>
>>> If you really want to lower ARL into a temp, I recommend using F2I,
>>> which is equivalent in behavior. For UARL, MOV will do.
>>>
>>> Also, I don't think GLSL sampler arrays have to be declared as arrays
>>> in TGSI. Array declarations are really only needed for TEMPs, because
>>> they allow better register allocation. Every other shader resource
>>> has a fixed location and would not benefit from it.
>>
>> I think not requiring them to be declared as an array is a bad idea.
>> It may well be true that hw drivers can't really benefit from it but
>> in any case it would be trivial to handle in the drivers. It gives
>> you the ability to easily see what values are legal in the end as a
>> sampler index, might help with debugging at some day. Besides, it's
>> just bad
>
> You would see that based on the declarations anyways, no?

How so? If you've got 15 samplers declared it is still not legal to
index into the 15th one if your sampler array is starting at 0 with 5
entries (or maybe it is legal but results undefined). That is at least
my understanding of the spec. (Of course if I'm wrong here then indeed
sampler arrays are worthless.)

>> style imho to index into things which aren't arrays, that is
>> applicable to all languages, so I can't see why it should be
>> different for tgsi. But I guess it's not all _that_ important.
>
> Well, it might be important to put this in some context -- sampler
> arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
> glsl (although based on some comments glsl 110 might have allowed it)
> are the dynamic indices.

Yes, but without dynamically indexing into it the sampler array can
easily be flattened since you always got the corresponding immediate
index. That is, it was never addressed as an array in tgsi.
FWIW this is the same story with d3d10 - resource dcls could be arrays
but the index had to be an immediate.

Roland
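The legality rule discussed in this message (15 samplers declared, but a 5-entry array starting at 0 only admits indices 0 through 4) can be written down as a small range check. This is only a sketch: `struct sampler_array_range` and `sampler_index_in_array` are hypothetical names for illustration, not TGSI or Mesa types, and it assumes the array extent would be available wherever the check runs.

```c
#include <stdbool.h>

/* Hypothetical description of one sampler array; not a TGSI or Mesa type. */
struct sampler_array_range {
    unsigned first; /* index of the first sampler in the array */
    unsigned size;  /* number of entries in the array */
};

/*
 * Returns true when base + offset lands inside the declared array.
 * base is the constant part of the index, offset the dynamic part
 * (what a relative address register would contribute at run time).
 */
bool sampler_index_in_array(const struct sampler_array_range *arr,
                            unsigned base, int offset)
{
    long idx = (long)base + (long)offset;
    long lo = (long)arr->first;
    long hi = lo + (long)arr->size; /* exclusive upper bound */

    return idx >= lo && idx < hi;
}
```

With an array of 5 entries starting at 0, index 4 passes the check, while index 14 fails even if 15 samplers are declared in total.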
Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi
On Wed, Aug 6, 2014 at 11:15 AM, Roland Scheidegger srol...@vmware.com wrote:
> On 06.08.2014 17:03, Ilia Mirkin wrote:
>> On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com wrote:
>>> On 06.08.2014 13:00, Marek Olšák wrote:
>>>> On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>>>> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>>>>>> From a gallium perspective, indirect temp regs are already working -
>>>>>> so something like MOV TEMP[0], TEMP[TEMP[1].x] should work. Indirect
>>>>>> registers are supported for inputs, outputs, temps, constants, and
>>>>>> immediates even, but the indirect reg itself must come from a temp or
>>>>>> address reg (I am not 100% certain where that restriction comes
>>>>>> from). I have no idea which drivers support it, all I can tell is
>>>>>> that it works with llvmpipe. I sort of doubt it is supported for
>>>>>> samplers right now in gallium though technically it might be possible
>>>>>> to express this already.
>>>>>
>>>>> Well, with my limited patch + ChrisF's small patches to mesa core,
>>>>> the dynamic sampler stuff works for nvc0, except for the issues I
>>>>> outlined. Not sure what you mean by supported in gallium. Perhaps I
>>>>> have an incorrect view of things, but I see gallium as an amorphous
>>>>> thing that we can change to our heart's content.
>>>
>>> A cap bit for the ability to support dynamic indexing of shaders
>>> (plus whatever is needed for making it work like declaration of
>>> sampler arrays) would certainly be needed in any case. For drivers
>>> supporting
>>
>> Right... so it's not like shaders will start magically containing
>> these things, it'll only happen if ARB_gs5 is enabled (probably via
>> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
>> supports whatever we're throwing at it.
>>
>>> this I would certainly expect them to allow temp regs as the
>>> indirect reg.
>>>
>>>>>> I guess it would be nice if we'd just use temp regs instead of
>>>>>> address reg in glsl to tgsi conversion if a driver supports it. I
>>>>>> think for modern drivers this makes a lot more sense than trying to
>>>>>> shove everything into address regs.
>>>>>
>>>>> Agreed. With the exception that I guess we also need to support
>>>>> indexing with float values? (i.e. ARL) This would have to be treated
>>>>> with some care. Not sure when that comes up though... perhaps only if
>>>>> !native_integers, which won't be an issue with any of the hw that
>>>>> we're talking about.
>>>>
>>>> If you really want to lower ARL into a temp, I recommend using F2I,
>>>> which is equivalent in behavior. For UARL, MOV will do.
>>>>
>>>> Also, I don't think GLSL sampler arrays have to be declared as arrays
>>>> in TGSI. Array declarations are really only needed for TEMPs, because
>>>> they allow better register allocation. Every other shader resource
>>>> has a fixed location and would not benefit from it.
>>>
>>> I think not requiring them to be declared as an array is a bad idea.
>>> It may well be true that hw drivers can't really benefit from it but
>>> in any case it would be trivial to handle in the drivers. It gives
>>> you the ability to easily see what values are legal in the end as a
>>> sampler index, might help with debugging at some day. Besides, it's
>>> just bad
>>
>> You would see that based on the declarations anyways, no?
>
> How so? If you've got 15 samplers declared it is still not legal to
> index into the 15th one if your sampler array is starting at 0 with 5
> entries (or maybe it is legal but results undefined). That is at least
> my understanding of the spec. (Of course if I'm wrong here then indeed
> sampler arrays are worthless.)

That is indeed not legal. So right -- you wouldn't see where the
arrays are. But is that really worth worrying about at the TGSI level?
Anyways, I'll send my patch once perl gets unbroken on my system, and
you can rip it apart then :)

Doing the array thing would be a giant complication for what I
perceive to be fairly little gain. The thing is that the information
of what's an array where is long lost by the time the declarations are
created -- there's just a bitmask of used samplers.

>>> style imho to index into things which aren't arrays, that is
>>> applicable to all languages, so I can't see why it should be
>>> different for tgsi. But I guess it's not all _that_ important.
>>
>> Well, it might be important to put this in some context -- sampler
>> arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
>> glsl (although based on some comments glsl 110 might have allowed it)
>> are the dynamic indices.
>
> Yes, but without dynamically indexing into it the sampler array can
> easily be flattened since you always got the corresponding immediate
> index. That is, it was never addressed as an array in tgsi.
> FWIW this is the same story with d3d10 - resource dcls could be arrays
> but the index had to be an immediate.
>
> Roland
[Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
Replace the plain sampler index with a register reference to a sampler.
We also need to keep track of the sampler array size when there is a
relative reference so that we can mark the whole array used.

To facilitate implementation, we add a separate ADDR register that
exclusively handles the sampler relative address. Other approaches
would be more invasive.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---

_mesa_get_sampler_array_nonconst_index is a function added by a patch
that ChrisF is working on... basically it returns NULL unless it's a
nonconst access.

I've done a very modest amount of piglit testing, but I definitely
need to do some more. The nvc0 bits aren't 100% ready -- I noticed
that in some odd situations the arguments to the tex instruction will
get all mangled. But for a simple case that mixes non-array and array
samplers, it looks something like this:

FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL SAMP[1]
DCL SAMP[2]
DCL SAMP[3]
DCL CONST[0..1]
DCL TEMP[0..1], LOCAL
DCL ADDR[0..2]
IMM[0] FLT32 {0., 0., 0., 0.}
  0: MOV TEMP[0].xy, IN[0].xyyy
  1: TEX TEMP[0], TEMP[0], SAMP[0], 2D
  2: MOV TEMP[1].xy, IN[0].xyyy
  3: UARL ADDR[2].x, CONST[1].
  4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
  5: MUL TEMP[1], TEMP[1], CONST[0].
  6: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1]
  7: MOV OUT[0], TEMP[0]
  8: END

 src/gallium/auxiliary/tgsi/tgsi_ureg.c     |  2 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 60 +-
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
index dcf0cb5..6d3ac91 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
@@ -78,7 +78,7 @@ struct ureg_tokens {
 #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
 #define UREG_MAX_CONSTANT_RANGE 32
 #define UREG_MAX_IMMEDIATE 4096
-#define UREG_MAX_ADDR 2
+#define UREG_MAX_ADDR 3
 #define UREG_MAX_PRED 1
 #define UREG_MAX_ARRAY_TEMPS 256

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index c5e2eb5..0d5c3ed 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -245,7 +245,8 @@ public:
    ir_instruction *ir;
    GLboolean cond_update;
    bool saturate;
-   int sampler; /**< sampler index */
+   st_src_reg sampler; /**< sampler register */
+   int sampler_array_size; /**< 1-based size of sampler array, 1 if not array */
    int tex_target; /**< One of TEXTURE_*_INDEX */
    GLboolean tex_shadow;

@@ -476,6 +477,7 @@ static st_dst_reg undef_dst = st_dst_reg(PROGRAM_UNDEFINED, SWIZZLE_NOOP, GLSL_T

 static st_dst_reg address_reg = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 0);
 static st_dst_reg address_reg2 = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 1);
+static st_dst_reg sampler_reladdr = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 2);

 static void
 fail_link(struct gl_shader_program *prog, const char *fmt, ...) PRINTFLIKE(2, 3);
@@ -2799,6 +2801,8 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
    glsl_to_tgsi_instruction *inst = NULL;
    unsigned opcode = TGSI_OPCODE_NOP;
    const glsl_type *sampler_type = ir->sampler->type;
+   ir_rvalue *sampler_index =
+      _mesa_get_sampler_array_nonconst_index(ir->sampler);
    bool is_cube_array = false;
    unsigned i;

@@ -3016,6 +3020,11 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
       coord_dst.writemask = WRITEMASK_XYZW;
    }

+   if (sampler_index) {
+      sampler_index->accept(this);
+      emit_arl(ir, sampler_reladdr, this->result);
+   }
+
    if (opcode == TGSI_OPCODE_TXD)
       inst = emit(ir, opcode, result_dst, coord, dx, dy);
    else if (opcode == TGSI_OPCODE_TXQ) {
@@ -3045,9 +3054,18 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
    if (ir->shadow_comparitor)
       inst->tex_shadow = GL_TRUE;

-   inst->sampler = _mesa_get_sampler_uniform_value(ir->sampler,
-                                                   this->shader_program,
-                                                   this->prog);
+   inst->sampler.index = _mesa_get_sampler_uniform_value(ir->sampler,
+                                                         this->shader_program,
+                                                         this->prog);
+   if (sampler_index) {
+      inst->sampler.reladdr = ralloc(mem_ctx, st_src_reg);
+      memcpy(inst->sampler.reladdr, &sampler_reladdr, sizeof(sampler_reladdr));
+      inst->sampler_array_size =
+         ir->sampler->as_dereference_array()
+            ->array->variable_referenced()->type->length;
+   } else {
+      inst->sampler_array_size = 1;
+   }

    if (ir->offset) {
       for (i = 0; i < MAX_GLSL_TEXTURE_OFFSET && offset[i].file != PROGRAM_UNDEFINED; i++)
@@ -3215,10 +3233,12 @@ count_resources(glsl_to_tgsi_visitor *v, gl_program
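The commit message says the array size is tracked "so that we can mark the whole array used". The count_resources() hunk that does this is cut off above, so the following is only a sketch of the idea, assuming the used samplers are kept in a 32-bit bitmask. `mark_samplers_used` is a hypothetical helper, not the function from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: set the bits for samplers
 * [index, index + array_size) in a 32-bit "samplers used" mask.
 * array_size is 1 for a plain (non-array) sampler reference, matching
 * the "1 if not array" convention of sampler_array_size in the patch.
 */
uint32_t mark_samplers_used(uint32_t used_mask, unsigned index,
                            unsigned array_size)
{
    for (unsigned i = 0; i < array_size; i++) {
        /* Ignore anything that would not fit in the mask. */
        if (index + i < 32)
            used_mask |= UINT32_C(1) << (index + i);
    }
    return used_mask;
}
```

For the TGSI dump above, and assuming for illustration that the indirectly indexed array has 3 entries based at index 1, marking SAMP[0] with size 1 and then index 1 with size 3 yields the mask 0xF, which corresponds to the four DCL SAMP lines.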
Re: [Mesa-dev] RFC: mesa/st dynamic sampler support in tgsi
On 06.08.2014 17:20, Ilia Mirkin wrote:
> On Wed, Aug 6, 2014 at 11:15 AM, Roland Scheidegger srol...@vmware.com wrote:
>> On 06.08.2014 17:03, Ilia Mirkin wrote:
>>> On Wed, Aug 6, 2014 at 10:52 AM, Roland Scheidegger srol...@vmware.com wrote:
>>>> On 06.08.2014 13:00, Marek Olšák wrote:
>>>>> On Wed, Aug 6, 2014 at 4:02 AM, Ilia Mirkin imir...@alum.mit.edu wrote:
>>>>>> On Tue, Aug 5, 2014 at 5:25 PM, Roland Scheidegger srol...@vmware.com wrote:
>>>>>>> From a gallium perspective, indirect temp regs are already working -
>>>>>>> so something like MOV TEMP[0], TEMP[TEMP[1].x] should work. Indirect
>>>>>>> registers are supported for inputs, outputs, temps, constants, and
>>>>>>> immediates even, but the indirect reg itself must come from a temp or
>>>>>>> address reg (I am not 100% certain where that restriction comes
>>>>>>> from). I have no idea which drivers support it, all I can tell is
>>>>>>> that it works with llvmpipe. I sort of doubt it is supported for
>>>>>>> samplers right now in gallium though technically it might be possible
>>>>>>> to express this already.
>>>>>>
>>>>>> Well, with my limited patch + ChrisF's small patches to mesa core,
>>>>>> the dynamic sampler stuff works for nvc0, except for the issues I
>>>>>> outlined. Not sure what you mean by supported in gallium. Perhaps I
>>>>>> have an incorrect view of things, but I see gallium as an amorphous
>>>>>> thing that we can change to our heart's content.
>>>>
>>>> A cap bit for the ability to support dynamic indexing of shaders
>>>> (plus whatever is needed for making it work like declaration of
>>>> sampler arrays) would certainly be needed in any case. For drivers
>>>> supporting
>>>
>>> Right... so it's not like shaders will start magically containing
>>> these things, it'll only happen if ARB_gs5 is enabled (probably via
>>> PIPE_CAP_GLSL >= 400). Which presumably means that the backend
>>> supports whatever we're throwing at it.
>>>
>>>> this I would certainly expect them to allow temp regs as the
>>>> indirect reg.
>>>>
>>>>>>> I guess it would be nice if we'd just use temp regs instead of
>>>>>>> address reg in glsl to tgsi conversion if a driver supports it. I
>>>>>>> think for modern drivers this makes a lot more sense than trying to
>>>>>>> shove everything into address regs.
>>>>>>
>>>>>> Agreed. With the exception that I guess we also need to support
>>>>>> indexing with float values? (i.e. ARL) This would have to be treated
>>>>>> with some care. Not sure when that comes up though... perhaps only if
>>>>>> !native_integers, which won't be an issue with any of the hw that
>>>>>> we're talking about.
>>>>>
>>>>> If you really want to lower ARL into a temp, I recommend using F2I,
>>>>> which is equivalent in behavior. For UARL, MOV will do.
>>>>>
>>>>> Also, I don't think GLSL sampler arrays have to be declared as arrays
>>>>> in TGSI. Array declarations are really only needed for TEMPs, because
>>>>> they allow better register allocation. Every other shader resource
>>>>> has a fixed location and would not benefit from it.
>>>>
>>>> I think not requiring them to be declared as an array is a bad idea.
>>>> It may well be true that hw drivers can't really benefit from it but
>>>> in any case it would be trivial to handle in the drivers. It gives
>>>> you the ability to easily see what values are legal in the end as a
>>>> sampler index, might help with debugging at some day. Besides, it's
>>>> just bad
>>>
>>> You would see that based on the declarations anyways, no?
>>
>> How so? If you've got 15 samplers declared it is still not legal to
>> index into the 15th one if your sampler array is starting at 0 with 5
>> entries (or maybe it is legal but results undefined). That is at least
>> my understanding of the spec. (Of course if I'm wrong here then indeed
>> sampler arrays are worthless.)
>
> That is indeed not legal. So right -- you wouldn't see where the
> arrays are. But is that really worth worrying about at the TGSI level?
> Anyways, I'll send my patch once perl gets unbroken on my system, and
> you can rip it apart then :)
>
> Doing the array thing would be a giant complication for what I
> perceive to be fairly little gain. The thing is that the information
> of what's an array where is long lost by the time the declarations are
> created -- there's just a bitmask of used samplers.

Oh, I wasn't aware of that; I thought you got that information pretty
easily. Yeah, in that case I guess it's not worth bothering. In any
case we could tighten that up later if necessary.

Roland

>>>> style imho to index into things which aren't arrays, that is
>>>> applicable to all languages, so I can't see why it should be
>>>> different for tgsi. But I guess it's not all _that_ important.
>>>
>>> Well, it might be important to put this in some context -- sampler
>>> arrays are perfectly legal in GLSL today. What's not legal in pre-gs5
>>> glsl (although based on some comments glsl 110 might have allowed it)
>>> are the dynamic indices.
>>
>> Yes, but without dynamically indexing into it the sampler array can
>> easily be flattened since you always got the corresponding immediate
>> index. That is, it was never addressed as an array in tgsi.
>> FWIW this is the same story with d3d10 - resource dcls could be arrays
>> but the index had to be an immediate.
>>
>> Roland
Re: [Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK
On Wed, Aug 6, 2014 at 4:01 PM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
>> From: Marek Olšák marek.ol...@amd.com
>>
>> The code is rewritten to take known constraints into account, while
>> always using 0 by default.
>>
>> This should improve performance for multi-SE parts in theory.
>>
>> A debug option is also added for easier debugging. (If there are hangs,
>> use the option. If the hangs go away, you have found the problem.)
>
> Just one comment below. With that addressed:
>
> Reviewed-by: Alex Deucher alexander.deuc...@amd.com
>
>> ---
>>  src/gallium/drivers/radeon/r600_pipe_common.c     |  2 +-
>>  src/gallium/drivers/radeon/r600_pipe_common.h     |  1 +
>>  src/gallium/drivers/radeonsi/si_state_draw.c      | 33 ---
>>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17
>>  4 files changed, 43 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
>> index 3476021..eb44d72 100644
>> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
>> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
>> @@ -239,7 +239,6 @@ static const struct debug_named_value common_debug_options[] = {
>>          { "vm", DBG_VM, "Print virtual addresses when creating resources" },
>>          { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file with faulty cs" },
>>
>> -
>>          /* shaders */
>>          { "fs", DBG_FS, "Print fetch shaders" },
>>          { "vs", DBG_VS, "Print vertex shaders" },
>> @@ -254,6 +253,7 @@ static const struct debug_named_value common_debug_options[] = {
>>          { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
>>          { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
>>          { "notiling", DBG_NO_TILING, "Disable tiling" },
>> +        { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on end-of-packet." },
>>
>>          DEBUG_NAMED_VALUE_END /* must be last */
>>  };
>> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
>> index dcec2bb..ac69d5b 100644
>> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
>> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
>> @@ -93,6 +93,7 @@
>>  #define DBG_NO_DISCARD_RANGE   (1 << 12)
>>  #define DBG_NO_2D_TILING       (1 << 13)
>>  #define DBG_NO_TILING          (1 << 14)
>> +#define DBG_SWITCH_ON_EOP      (1 << 15)
>>  /* The maximum allowed bit is 15. */
>>
>>  #define R600_MAP_BUFFER_ALIGNMENT 64
>> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
>> index 4e808a3..ae839ba 100644
>> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
>> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
>> @@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context *sctx,
>>
>>          if (sctx->b.chip_class >= CIK) {
>>                  struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
>> -                bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
>> -                                        prim == V_008958_DI_PT_LINELOOP ||
>> -                                        prim == V_008958_DI_PT_TRIFAN ||
>> -                                        prim == V_008958_DI_PT_TRISTRIP_ADJ ||
>> -                                        info->primitive_restart ||
>> -                                        (rs ? rs->line_stipple_enable : false);
>> -                /* If the WD switch is false, the IA switch must be false too. */
>> -                bool ia_switch_on_eop = wd_switch_on_eop;
>>                  unsigned primgroup_size = 64;
>>
>> +                /* SWITCH_ON_EOP(0) is always preferable. */
>> +                bool wd_switch_on_eop = false;
>> +                bool ia_switch_on_eop = false;
>> +
>> +                /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
>> +                 * 4 shader engines. Set 1 to pass the assertion below.
>> +                 * The other cases are hardware requirements. */
>> +                if (sctx->b.screen->info.max_se < 4 ||
>> +                    prim == V_008958_DI_PT_POLYGON ||
>> +                    prim == V_008958_DI_PT_LINELOOP ||
>> +                    prim == V_008958_DI_PT_TRIFAN ||
>> +                    prim == V_008958_DI_PT_TRISTRIP_ADJ ||
>> +                    info->primitive_restart)
>> +                        wd_switch_on_eop = true;
>> +
>>                  /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
>>                   * We don't know that for indirect drawing, so treat it as
>>                   * always problematic. */
>>                  if (sctx->b.family == CHIP_HAWAII &&
>> -                    (info->indirect || info->instance_count > 1)) {
>> +                    (info->indirect || info->instance_count > 1))
>>                          wd_switch_on_eop = true;
>> +
>> +                /* This is a hardware requirement. */
>> +                if ((rs && rs->line_stipple_enable) ||
>> +                    (sctx->b.screen->debug_flags & DBG_SWITCH_ON_EOP)) {
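The WD_SWITCH_ON_EOP part of the quoted hunk can be read as a pure decision function. The sketch below restates only the conditions visible in the quote (it is cut off at the line-stipple / debug-flag check, which is therefore left out). The enum, struct and function names are hypothetical stand-ins, not radeonsi definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the V_008958_DI_PT_* primitive types. */
enum prim_type {
    PRIM_TRIANGLES,
    PRIM_POLYGON,
    PRIM_LINELOOP,
    PRIM_TRIFAN,
    PRIM_TRISTRIP_ADJ
};

/* Hypothetical subset of the draw state the decision depends on. */
struct draw_params {
    enum prim_type prim;
    bool primitive_restart;
    bool indirect;
    unsigned instance_count;
};

/*
 * Returns whether WD_SWITCH_ON_EOP would be set, following the visible
 * part of the patch: default to false, then force it to true for parts
 * with fewer than 4 shader engines, for the listed primitive types,
 * for primitive restart, and for instanced or indirect draws on Hawaii.
 */
bool needs_wd_switch_on_eop(const struct draw_params *d,
                            unsigned max_se, bool is_hawaii)
{
    bool wd = false; /* SWITCH_ON_EOP(0) is the preferred default */

    if (max_se < 4 ||
        d->prim == PRIM_POLYGON ||
        d->prim == PRIM_LINELOOP ||
        d->prim == PRIM_TRIFAN ||
        d->prim == PRIM_TRISTRIP_ADJ ||
        d->primitive_restart)
        wd = true;

    if (is_hawaii && (d->indirect || d->instance_count > 1))
        wd = true;

    return wd;
}
```

A plain triangle list on a 4-SE part that is not Hawaii keeps the switch at 0, while the same draw on a 2-SE part, or an instanced draw on Hawaii, sets it to 1.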
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
I guess PIPE_SHADER_CAP_MAX_ADDRS is now useless, because it can be
derived from GLSL_FEATURE_LEVEL, right?

Marek

On Wed, Aug 6, 2014 at 5:25 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> Replace the plain sampler index with a register reference to a sampler.
> We also need to keep track of the sampler array size when there is a
> relative reference so that we can mark the whole array used.
>
> To facilitate implementation, we add a separate ADDR register that
> exclusively handles the sampler relative address. Other approaches
> would be more invasive.
>
> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
> ---
>
> _mesa_get_sampler_array_nonconst_index is a function added by a patch
> that ChrisF is working on... basically it returns NULL unless it's a
> nonconst access.
>
> I've done a very modest amount of piglit testing, but I definitely
> need to do some more. The nvc0 bits aren't 100% ready -- I noticed
> that in some odd situations the arguments to the tex instruction will
> get all mangled. But for a simple case that mixes non-array and array
> samplers, it looks something like this:
>
> FRAG
> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
> DCL IN[0], GENERIC[0], PERSPECTIVE
> DCL OUT[0], COLOR
> DCL SAMP[0]
> DCL SAMP[1]
> DCL SAMP[2]
> DCL SAMP[3]
> DCL CONST[0..1]
> DCL TEMP[0..1], LOCAL
> DCL ADDR[0..2]
> IMM[0] FLT32 {0., 0., 0., 0.}
>   0: MOV TEMP[0].xy, IN[0].xyyy
>   1: TEX TEMP[0], TEMP[0], SAMP[0], 2D
>   2: MOV TEMP[1].xy, IN[0].xyyy
>   3: UARL ADDR[2].x, CONST[1].
>   4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
>   5: MUL TEMP[1], TEMP[1], CONST[0].
>   6: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1]
>   7: MOV OUT[0], TEMP[0]
>   8: END
>
>  src/gallium/auxiliary/tgsi/tgsi_ureg.c     |  2 +-
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 60 +-
>  2 files changed, 44 insertions(+), 18 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> index dcf0cb5..6d3ac91 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
> @@ -78,7 +78,7 @@ struct ureg_tokens {
>  #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
>  #define UREG_MAX_CONSTANT_RANGE 32
>  #define UREG_MAX_IMMEDIATE 4096
> -#define UREG_MAX_ADDR 2
> +#define UREG_MAX_ADDR 3
>  #define UREG_MAX_PRED 1
>  #define UREG_MAX_ARRAY_TEMPS 256
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index c5e2eb5..0d5c3ed 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -245,7 +245,8 @@ public:
>     ir_instruction *ir;
>     GLboolean cond_update;
>     bool saturate;
> -   int sampler; /**< sampler index */
> +   st_src_reg sampler; /**< sampler register */
> +   int sampler_array_size; /**< 1-based size of sampler array, 1 if not array */
>     int tex_target; /**< One of TEXTURE_*_INDEX */
>     GLboolean tex_shadow;
>
> @@ -476,6 +477,7 @@ static st_dst_reg undef_dst = st_dst_reg(PROGRAM_UNDEFINED, SWIZZLE_NOOP, GLSL_T
>
>  static st_dst_reg address_reg = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 0);
>  static st_dst_reg address_reg2 = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 1);
> +static st_dst_reg sampler_reladdr = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 2);
>
>  static void
>  fail_link(struct gl_shader_program *prog, const char *fmt, ...) PRINTFLIKE(2, 3);
> @@ -2799,6 +2801,8 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>     glsl_to_tgsi_instruction *inst = NULL;
>     unsigned opcode = TGSI_OPCODE_NOP;
>     const glsl_type *sampler_type = ir->sampler->type;
> +   ir_rvalue *sampler_index =
> +      _mesa_get_sampler_array_nonconst_index(ir->sampler);
>     bool is_cube_array = false;
>     unsigned i;
>
> @@ -3016,6 +3020,11 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>        coord_dst.writemask = WRITEMASK_XYZW;
>     }
>
> +   if (sampler_index) {
> +      sampler_index->accept(this);
> +      emit_arl(ir, sampler_reladdr, this->result);
> +   }
> +
>     if (opcode == TGSI_OPCODE_TXD)
>        inst = emit(ir, opcode, result_dst, coord, dx, dy);
>     else if (opcode == TGSI_OPCODE_TXQ) {
> @@ -3045,9 +3054,18 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>     if (ir->shadow_comparitor)
>        inst->tex_shadow = GL_TRUE;
>
> -   inst->sampler = _mesa_get_sampler_uniform_value(ir->sampler,
> -                                                   this->shader_program,
> -                                                   this->prog);
> +   inst->sampler.index = _mesa_get_sampler_uniform_value(ir->sampler,
> +                                                         this->shader_program,
> +                                                         this->prog);
> +   if (sampler_index) {
> +      inst->sampler.reladdr = ralloc(mem_ctx, st_src_reg);
> +      memcpy(inst->sampler.reladdr, &sampler_reladdr, sizeof(sampler_reladdr));
> +      inst->sampler_array_size =
> +         ir->sampler->as_dereference_array()
Re: [Mesa-dev] [PATCH 2/2] radeonsi: always prefer SWITCH_ON_EOP(0) on CIK
On Wed, Aug 6, 2014 at 11:30 AM, Marek Olšák mar...@gmail.com wrote:
> On Wed, Aug 6, 2014 at 4:01 PM, Alex Deucher alexdeuc...@gmail.com wrote:
>> On Wed, Aug 6, 2014 at 9:32 AM, Marek Olšák mar...@gmail.com wrote:
>>> From: Marek Olšák marek.ol...@amd.com
>>>
>>> The code is rewritten to take known constraints into account, while
>>> always using 0 by default.
>>>
>>> This should improve performance for multi-SE parts in theory.
>>>
>>> A debug option is also added for easier debugging. (If there are hangs,
>>> use the option. If the hangs go away, you have found the problem.)
>>
>> Just one comment below. With that addressed:
>>
>> Reviewed-by: Alex Deucher alexander.deuc...@amd.com
>>
>>> ---
>>>  src/gallium/drivers/radeon/r600_pipe_common.c     |  2 +-
>>>  src/gallium/drivers/radeon/r600_pipe_common.h     |  1 +
>>>  src/gallium/drivers/radeonsi/si_state_draw.c      | 33 ---
>>>  src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 17
>>>  4 files changed, 43 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c
>>> index 3476021..eb44d72 100644
>>> --- a/src/gallium/drivers/radeon/r600_pipe_common.c
>>> +++ b/src/gallium/drivers/radeon/r600_pipe_common.c
>>> @@ -239,7 +239,6 @@ static const struct debug_named_value common_debug_options[] = {
>>>          { "vm", DBG_VM, "Print virtual addresses when creating resources" },
>>>          { "trace_cs", DBG_TRACE_CS, "Trace cs and write rlockup_<csid>.c file with faulty cs" },
>>>
>>> -
>>>          /* shaders */
>>>          { "fs", DBG_FS, "Print fetch shaders" },
>>>          { "vs", DBG_VS, "Print vertex shaders" },
>>> @@ -254,6 +253,7 @@ static const struct debug_named_value common_debug_options[] = {
>>>          { "noinvalrange", DBG_NO_DISCARD_RANGE, "Disable handling of INVALIDATE_RANGE map flags" },
>>>          { "no2d", DBG_NO_2D_TILING, "Disable 2D tiling" },
>>>          { "notiling", DBG_NO_TILING, "Disable tiling" },
>>> +        { "switch_on_eop", DBG_SWITCH_ON_EOP, "Program WD/IA to switch on end-of-packet." },
>>>
>>>          DEBUG_NAMED_VALUE_END /* must be last */
>>>  };
>>> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
>>> index dcec2bb..ac69d5b 100644
>>> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
>>> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
>>> @@ -93,6 +93,7 @@
>>>  #define DBG_NO_DISCARD_RANGE   (1 << 12)
>>>  #define DBG_NO_2D_TILING       (1 << 13)
>>>  #define DBG_NO_TILING          (1 << 14)
>>> +#define DBG_SWITCH_ON_EOP      (1 << 15)
>>>  /* The maximum allowed bit is 15. */
>>>
>>>  #define R600_MAP_BUFFER_ALIGNMENT 64
>>> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
>>> index 4e808a3..ae839ba 100644
>>> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
>>> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
>>> @@ -401,25 +401,40 @@ static bool si_update_draw_info_state(struct si_context *sctx,
>>>
>>>          if (sctx->b.chip_class >= CIK) {
>>>                  struct si_state_rasterizer *rs = sctx->queued.named.rasterizer;
>>> -                bool wd_switch_on_eop = prim == V_008958_DI_PT_POLYGON ||
>>> -                                        prim == V_008958_DI_PT_LINELOOP ||
>>> -                                        prim == V_008958_DI_PT_TRIFAN ||
>>> -                                        prim == V_008958_DI_PT_TRISTRIP_ADJ ||
>>> -                                        info->primitive_restart ||
>>> -                                        (rs ? rs->line_stipple_enable : false);
>>> -                /* If the WD switch is false, the IA switch must be false too. */
>>> -                bool ia_switch_on_eop = wd_switch_on_eop;
>>>                  unsigned primgroup_size = 64;
>>>
>>> +                /* SWITCH_ON_EOP(0) is always preferable. */
>>> +                bool wd_switch_on_eop = false;
>>> +                bool ia_switch_on_eop = false;
>>> +
>>> +                /* WD_SWITCH_ON_EOP has no effect on GPUs with less than
>>> +                 * 4 shader engines. Set 1 to pass the assertion below.
>>> +                 * The other cases are hardware requirements. */
>>> +                if (sctx->b.screen->info.max_se < 4 ||
>>> +                    prim == V_008958_DI_PT_POLYGON ||
>>> +                    prim == V_008958_DI_PT_LINELOOP ||
>>> +                    prim == V_008958_DI_PT_TRIFAN ||
>>> +                    prim == V_008958_DI_PT_TRISTRIP_ADJ ||
>>> +                    info->primitive_restart)
>>> +                        wd_switch_on_eop = true;
>>> +
>>>                  /* Hawaii hangs if instancing is enabled and WD_SWITCH_ON_EOP is 0.
>>>                   * We don't know that for indirect drawing, so treat it as
>>>                   * always problematic. */
>>>                  if (sctx->b.family == CHIP_HAWAII &&
>>> -                    (info->indirect || info->instance_count > 1)) {
>>> +                    (info->indirect || info->instance_count > 1))
>>>                          wd_switch_on_eop = true;
>>> +
>>> +                /* This is a hardware requirement. */
>>> +                if ((rs && rs->line_stipple_enable) ||
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
   pc->MaxAddressRegs = pc->MaxNativeAddressRegs =
      _min(screen->get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS),
           MAX_PROGRAM_ADDRESS_REGS);

Not really sure what that's referring to... ARB_vp/fp or something?

Anyways, this is definitely a bit of a violation of that. OTOH, so is
the indirect UBO indexing and indirect GS input access (assuming
that's allowed), since those would use ADDR[1] and every driver
(except nv30) returns 1, and sometimes 0 -- including
nv50/nvc0/r600/radeonsi.

So... dunno what the proper way to proceed is. Fix drivers to claim
higher numbers? Continue the tradition of ignoring it and relying on
the fact that GPUs that don't support it also won't support the
features that cause it to get used?

On Wed, Aug 6, 2014 at 11:45 AM, Marek Olšák mar...@gmail.com wrote:
> I guess PIPE_SHADER_CAP_MAX_ADDRS is now useless, because it can be
> derived from GLSL_FEATURE_LEVEL, right?
>
> Marek
>
> On Wed, Aug 6, 2014 at 5:25 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> Replace the plain sampler index with a register reference to a sampler.
>> We also need to keep track of the sampler array size when there is a
>> relative reference so that we can mark the whole array used.
>>
>> To facilitate implementation, we add a separate ADDR register that
>> exclusively handles the sampler relative address. Other approaches
>> would be more invasive.
>>
>> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
>> ---
>>
>> _mesa_get_sampler_array_nonconst_index is a function added by a patch
>> that ChrisF is working on... basically it returns NULL unless it's a
>> nonconst access.
>>
>> I've done a very modest amount of piglit testing, but I definitely
>> need to do some more. The nvc0 bits aren't 100% ready -- I noticed
>> that in some odd situations the arguments to the tex instruction will
>> get all mangled. But for a simple case that mixes non-array and array
>> samplers, it looks something like this:
>>
>> FRAG
>> PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
>> DCL IN[0], GENERIC[0], PERSPECTIVE
>> DCL OUT[0], COLOR
>> DCL SAMP[0]
>> DCL SAMP[1]
>> DCL SAMP[2]
>> DCL SAMP[3]
>> DCL CONST[0..1]
>> DCL TEMP[0..1], LOCAL
>> DCL ADDR[0..2]
>> IMM[0] FLT32 {0., 0., 0., 0.}
>>   0: MOV TEMP[0].xy, IN[0].xyyy
>>   1: TEX TEMP[0], TEMP[0], SAMP[0], 2D
>>   2: MOV TEMP[1].xy, IN[0].xyyy
>>   3: UARL ADDR[2].x, CONST[1].
>>   4: TEX TEMP[1], TEMP[1], SAMP[ADDR[2].x+1], 2D
>>   5: MUL TEMP[1], TEMP[1], CONST[0].
>>   6: MAD TEMP[0], TEMP[0], IMM[0]., TEMP[1]
>>   7: MOV OUT[0], TEMP[0]
>>   8: END
>>
>>  src/gallium/auxiliary/tgsi/tgsi_ureg.c     |  2 +-
>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 60 +-
>>  2 files changed, 44 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_ureg.c b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>> index dcf0cb5..6d3ac91 100644
>> --- a/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>> +++ b/src/gallium/auxiliary/tgsi/tgsi_ureg.c
>> @@ -78,7 +78,7 @@ struct ureg_tokens {
>>  #define UREG_MAX_OUTPUT PIPE_MAX_SHADER_OUTPUTS
>>  #define UREG_MAX_CONSTANT_RANGE 32
>>  #define UREG_MAX_IMMEDIATE 4096
>> -#define UREG_MAX_ADDR 2
>> +#define UREG_MAX_ADDR 3
>>  #define UREG_MAX_PRED 1
>>  #define UREG_MAX_ARRAY_TEMPS 256
>>
>> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> index c5e2eb5..0d5c3ed 100644
>> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
>> @@ -245,7 +245,8 @@ public:
>>     ir_instruction *ir;
>>     GLboolean cond_update;
>>     bool saturate;
>> -   int sampler; /**< sampler index */
>> +   st_src_reg sampler; /**< sampler register */
>> +   int sampler_array_size; /**< 1-based size of sampler array, 1 if not array */
>>     int tex_target; /**< One of TEXTURE_*_INDEX */
>>     GLboolean tex_shadow;
>>
>> @@ -476,6 +477,7 @@ static st_dst_reg undef_dst = st_dst_reg(PROGRAM_UNDEFINED, SWIZZLE_NOOP, GLSL_T
>>
>>  static st_dst_reg address_reg = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 0);
>>  static st_dst_reg address_reg2 = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 1);
>> +static st_dst_reg sampler_reladdr = st_dst_reg(PROGRAM_ADDRESS, WRITEMASK_X, GLSL_TYPE_FLOAT, 2);
>>
>>  static void
>>  fail_link(struct gl_shader_program *prog, const char *fmt, ...) PRINTFLIKE(2, 3);
>> @@ -2799,6 +2801,8 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>     glsl_to_tgsi_instruction *inst = NULL;
>>     unsigned opcode = TGSI_OPCODE_NOP;
>>     const glsl_type *sampler_type = ir->sampler->type;
>> +   ir_rvalue *sampler_index =
>> +      _mesa_get_sampler_array_nonconst_index(ir->sampler);
>>     bool is_cube_array = false;
>>     unsigned i;
>>
>> @@ -3016,6 +3020,11 @@ glsl_to_tgsi_visitor::visit(ir_texture *ir)
>>        coord_dst.writemask = WRITEMASK_XYZW;
>>     }
>>
>> +   if (sampler_index) {
>> +      sampler_index->accept(this);
>> +      emit_arl(ir, sampler_reladdr, this->result);
>> +   }
>> +
>>     if (opcode == TGSI_OPCODE_TXD)
>>        inst = emit(ir, opcode, result_dst, coord, dx, dy);
>>     else if
Re: [Mesa-dev] [PATCH] mesa/st: add support for dynamic sampler offsets
On Wed, Aug 6, 2014 at 5:53 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> pc->MaxAddressRegs = pc->MaxNativeAddressRegs =
>    _min(screen->get_shader_param(screen, sh, PIPE_SHADER_CAP_MAX_ADDRS),
>         MAX_PROGRAM_ADDRESS_REGS);
>
> Not really sure what that's referring to... ARB_vp/fp or something?

Yes, ARB_vp needs 1, ARB_fp doesn't support indirect addressing (expects 0).

> Anyways, this is definitely a bit of a violation of that. OTOH, so is
> the indirect UBO indexing and indirect GS input access (assuming
> that's allowed), since those would use ADDR[1] and every driver
> (except nv30) returns 1, and sometimes 0 -- including
> nv50/nvc0/r600/radeonsi. So... dunno what the proper way to proceed
> is. Fix drivers to claim higher numbers? Continue the tradition of
> ignoring it and relying on the fact that GPUs that don't support it
> also won't support the features that cause it to get used?

You don't have to worry about that for now. We can clean it up later.

Marek
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4] winsys/radeon: fix nop packet padding for hawaii
On Mon, Aug 4, 2014 at 6:48 AM, Andreas Boll andreas.boll@gmail.com wrote:

The initial firmware for hawaii does not support type3 nop packet. Detect the new hawaii firmware with query RADEON_INFO_ACCEL_WORKING2. If the returned value is >= 3, then the new firmware is used. This patch uses type2 for the old firmware and type3 for the new firmware. It fixes the cases when the old firmware is used and the user wants to manually enable acceleration. The two possible scenarios are:
- the kernel has no support for the new firmware.
- the kernel has support for the new firmware but only the old firmware is available.

Additionally this patch disables GPU acceleration on hawaii if the kernel returns a value < 2. In this case the kernel doesn't have the required fixes for proper acceleration.

v2:
- Fix indentation
- Use private struct radeon_drm_winsys instead of public struct radeon_info
- Rename r600_accel_working2 to accel_working2
v3:
- Use type2 nop packet for returned value < 3
v4:
- Fail to initialize winsys for returned value < 2

Cc: mesa-sta...@lists.freedesktop.org
Cc: Alex Deucher alexander.deuc...@amd.com
Cc: Jérôme Glisse jgli...@redhat.com
Cc: Marek Olšák marek.ol...@amd.com
Cc: Michel Dänzer michel.daen...@amd.com
Signed-off-by: Andreas Boll andreas.boll@gmail.com
Reviewed-by: Alex Deucher alexander.deuc...@amd.com
---
Unfortunately I can't test this patch myself since I don't own a hawaii card. So I'd need someone to test this patch on kernel = 3.16-rc7 + these patches [1-2]. This patch would bring us one step further for hawaii acceleration on kernel 3.16. Finally we can enable hawaii acceleration if the query returns >= 2 [3].

Andreas.
[1] http://lists.freedesktop.org/archives/dri-devel/2014-August/065305.html [2] http://lists.freedesktop.org/archives/dri-devel/2014-August/065306.html [3] http://lists.x.org/archives/xorg-driver-ati/2014-August/026534.html src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 6 +- src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 10 ++ src/gallium/winsys/radeon/drm/radeon_drm_winsys.h | 1 + 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c index a06ecb2..dd109af 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c @@ -446,8 +446,12 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs *rcs, case RING_GFX: /* pad DMA ring to 8 DWs to meet CP fetch alignment requirements * r6xx, requires at least 4 dw alignment to avoid a hw bug. + * hawaii with old firmware needs type2 nop packet. + * accel_working2 with value 2 indicates the new firmware. */ -if (cs-ws-info.chip_class = SI) { +if (cs-ws-info.chip_class = SI || +(cs-ws-info.family == CHIP_HAWAII + cs-ws-accel_working2 3)) { while (rcs-cdw 7) OUT_CS(cs-base, 0x8000); /* type2 nop packet */ } else { diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c index 910d06b..ecff0e7 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.c @@ -395,6 +395,16 @@ static boolean do_winsys_init(struct radeon_drm_winsys *ws) radeon_get_drm_value(ws-fd, RADEON_INFO_MAX_SH_PER_SE, NULL, ws-info.max_sh_per_se); +radeon_get_drm_value(ws-fd, RADEON_INFO_ACCEL_WORKING2, NULL, + ws-accel_working2); +if (ws-info.family == CHIP_HAWAII ws-accel_working2 2) { +fprintf(stderr, radeon: GPU acceleration for Hawaii disabled, +returned accel_working2 value %u is smaller than 2. 
+Please install a newer kernel.\n, +ws-accel_working2); +return FALSE; +} + if (radeon_get_drm_value(ws-fd, RADEON_INFO_SI_TILE_MODE_ARRAY, NULL, ws-info.si_tile_mode_array)) { ws-info.si_tile_mode_array_valid = TRUE; diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h index ea6f7f0..aebc391 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h +++ b/src/gallium/winsys/radeon/drm/radeon_drm_winsys.h @@ -55,6 +55,7 @@ struct radeon_drm_winsys { enum radeon_generation gen; struct radeon_info info; uint32_t va_start; +uint32_t accel_working2; struct pb_manager *kman; struct pb_manager *cman_vram; -- 2.0.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list
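The firmware-dependent decision described in this patch can be sketched in isolation as follows. This is only an illustration, not the winsys code: the enum names and the three helper functions are hypothetical, and the comparisons whose operators were lost in the archived diff are read here as `chip_class <= SI`, `accel_working2 < 3` and `accel_working2 < 2`, following the commit message and the "is smaller than 2" error text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's enums. Only the ordering
 * "generations up to SI compare <= CHIP_SI_CLASS" matters here. */
enum chip_class  { CHIP_R600_CLASS, CHIP_SI_CLASS, CHIP_CIK_CLASS };
enum chip_family { FAMILY_OTHER, FAMILY_HAWAII };
enum nop_type    { NOP_TYPE2, NOP_TYPE3 };

/* Winsys init is refused on Hawaii when the kernel reports a value < 2. */
static bool
hawaii_accel_allowed(enum chip_family family, uint32_t accel_working2)
{
   return family != FAMILY_HAWAII || accel_working2 >= 2;
}

/* Old Hawaii firmware (value < 3) cannot handle type3 nops, so it is
 * treated like the older generations that always pad with type2. */
static enum nop_type
pick_gfx_nop(enum chip_class cls, enum chip_family family,
             uint32_t accel_working2)
{
   if (cls <= CHIP_SI_CLASS)
      return NOP_TYPE2;
   if (family == FAMILY_HAWAII && accel_working2 < 3)
      return NOP_TYPE2;
   return NOP_TYPE3;
}

/* Number of padding dwords needed to reach the next multiple of 8,
 * mirroring the "while (cdw & 7)" padding loop. */
static unsigned
gfx_pad_dwords(unsigned cdw)
{
   return (8u - (cdw & 7u)) & 7u;
}
```

With these helpers, a Hawaii card reporting 2 is accepted but padded with type2 nops, while a report of 3 or more switches to type3.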
[Mesa-dev] [PATCH 01/12] mesa: Add the GL_ARB_texture_compression_bptc extension
This adds a boolean in the gl_extensions struct for
GL_ARB_texture_compression_bptc as well as an entry in extension_table.
---
 src/mesa/main/extensions.c | 1 +
 src/mesa/main/mtypes.h     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 9ac8377..f3197f9 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -156,6 +156,7 @@ static const struct extension extension_table[] = {
    { "GL_ARB_texture_buffer_object_rgb32", o(ARB_texture_buffer_object_rgb32), GLC, 2009 },
    { "GL_ARB_texture_buffer_range",        o(ARB_texture_buffer_range),        GLC, 2012 },
    { "GL_ARB_texture_compression",         o(dummy_true),                      GLL, 2000 },
+   { "GL_ARB_texture_compression_bptc",    o(ARB_texture_compression_bptc),    GL,  2010 },
    { "GL_ARB_texture_compression_rgtc",    o(ARB_texture_compression_rgtc),    GL,  2004 },
    { "GL_ARB_texture_cube_map",            o(ARB_texture_cube_map),            GLL, 1999 },
    { "GL_ARB_texture_cube_map_array",      o(ARB_texture_cube_map_array),      GL,  2009 },
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index f5ce360..312a336 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3574,6 +3574,7 @@ struct gl_extensions
    GLboolean ARB_texture_buffer_object;
    GLboolean ARB_texture_buffer_object_rgb32;
    GLboolean ARB_texture_buffer_range;
+   GLboolean ARB_texture_compression_bptc;
    GLboolean ARB_texture_compression_rgtc;
    GLboolean ARB_texture_cube_map;
    GLboolean ARB_texture_cube_map_array;
-- 
1.9.3
[Mesa-dev] [PATCH 04/12] mesa/format_info: Add support for the BPTC layout
Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC
depend on the mode but as it only has to be an approximation we can
set it to 4 like for S3TC.
---
 src/mesa/main/format_info.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
index a0eecd3..fc40dc4 100644
--- a/src/mesa/main/format_info.py
+++ b/src/mesa/main/format_info.py
@@ -110,7 +110,7 @@ def get_channel_bits(fmat, chan_name):
    if fmat.is_compressed():
       # These values are pretty-much bogus, but OpenGL requires that we
       # return an approximate number of bits.
-      if fmat.layout == 's3tc':
+      if fmat.layout in ('s3tc', 'bptc'):
          return 4 if fmat.has_channel(chan_name) else 0
       elif fmat.layout == 'fxt1':
          if chan_name in 'rgb':
-- 
1.9.3
[Mesa-dev] [PATCH 03/12] mesa/format_info: Add support for compressed floating-point formats
If the name of a compressed texture format has ‘FLOAT’ in it, it will
now set the data type of the format to GL_FLOAT. This will be needed
for the BPTC half-float formats.
---
 src/mesa/main/format_info.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
index 448bd00..a0eecd3 100644
--- a/src/mesa/main/format_info.py
+++ b/src/mesa/main/format_info.py
@@ -62,7 +62,9 @@ def get_gl_base_format(fmat):
 
 def get_gl_data_type(fmat):
    if fmat.is_compressed():
-      if 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
+      if 'FLOAT' in fmat.name:
+         return 'GL_FLOAT'
+      elif 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
          return 'GL_SIGNED_NORMALIZED'
       else:
          return 'GL_UNSIGNED_NORMALIZED'
-- 
1.9.3
[Mesa-dev] [PATCH 10/12] swrast: Enable GL_ARB_texture_compression_bptc
Enables BPTC texture compression on the software rasterizer.
---
 src/mesa/main/extensions.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index f3197f9..7732249 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -449,6 +449,7 @@ _mesa_enable_sw_extensions(struct gl_context *ctx)
    ctx->Extensions.ARB_point_sprite = GL_TRUE;
    ctx->Extensions.ARB_shadow = GL_TRUE;
    ctx->Extensions.ARB_texture_border_clamp = GL_TRUE;
+   ctx->Extensions.ARB_texture_compression_bptc = GL_TRUE;
    ctx->Extensions.ARB_texture_cube_map = GL_TRUE;
    ctx->Extensions.ARB_texture_env_combine = GL_TRUE;
    ctx->Extensions.ARB_texture_env_crossbar = GL_TRUE;
-- 
1.9.3
[Mesa-dev] [PATCH v2 0/12] Add support for BPTC texture compression
Here is a v2 of the BPTC texture compression series. The main difference is that instead of going via DXT3 for the UNORM formats it now always uses the custom naïve compressor for all formats. This doesn't give very good-looking results but it is fast and doesn't add any dependencies. There was some discussion about alternative approaches on the list here:

http://lists.freedesktop.org/archives/mesa-dev/2014-July/064436.html

I didn't manage to get any consensus on whether this approach is the right thing to do so I thought I would just post the patches and see what happens.

The other changes are:

• The patches are rebased on top of Jason Ekstrand's texstore changes. This required some modification to format_info.py.
• Added a patch to make glGenerateMipmap work with the BPTC formats.
• Added a patch to make the meta implementation of glGetTexImage work with the two floating-point formats.
• Added the formats to some format query functions that were missed. (There are a lot of switches for formats spread around Mesa!)
• Fixed setting the alpha component to 1.0 when fetching from the RGB half-float formats.
• Fixed fetching the alpha component from sRGB formats.
• Fixed the quantization step for the half-float compressor.
• Fixed a typo causing a bug in the compressor for textures with a width that isn't a multiple of four.

The patches are also available on Github here:

https://github.com/bpeel/mesa/commits/wip/bptc

There are piglit tests for BPTC in a branch here:

https://github.com/bpeel/piglit/commits/wip/bptc

- Neil
[Mesa-dev] [PATCH v2 06/12] mesa: Add texel fetch functions for BPTC-compressed textures
Adds functions to fetch from any of the four BPTC-compressed formats. v2: Set the alpha component to 1.0 when fetching from the half-float formats instead of leaving it uninitialised. Don't linearize the alpha component when fetching from sRGB. --- src/mesa/Makefile.sources| 1 + src/mesa/main/texcompress.c | 6 + src/mesa/main/texcompress_bptc.c | 960 +++ src/mesa/main/texcompress_bptc.h | 34 ++ 4 files changed, 1001 insertions(+) create mode 100644 src/mesa/main/texcompress_bptc.c create mode 100644 src/mesa/main/texcompress_bptc.h diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources index 45c53ca..d495bd1 100644 --- a/src/mesa/Makefile.sources +++ b/src/mesa/Makefile.sources @@ -96,6 +96,7 @@ MAIN_FILES = \ $(SRCDIR)main/stencil.c \ $(SRCDIR)main/syncobj.c \ $(SRCDIR)main/texcompress.c \ + $(SRCDIR)main/texcompress_bptc.c \ $(SRCDIR)main/texcompress_cpal.c \ $(SRCDIR)main/texcompress_rgtc.c \ $(SRCDIR)main/texcompress_s3tc.c \ diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c index 53c0ea0..b4efeee 100644 --- a/src/mesa/main/texcompress.c +++ b/src/mesa/main/texcompress.c @@ -42,6 +42,7 @@ #include texcompress_rgtc.h #include texcompress_s3tc.h #include texcompress_etc.h +#include texcompress_bptc.h /** @@ -610,6 +611,11 @@ _mesa_get_compressed_fetch_func(mesa_format format) return _mesa_get_compressed_rgtc_func(format); case MESA_FORMAT_ETC1_RGB8: return _mesa_get_etc_fetch_func(format); + case MESA_FORMAT_BPTC_RGBA_UNORM: + case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: + case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT: + case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: + return _mesa_get_bptc_fetch_func(format); default: return NULL; } diff --git a/src/mesa/main/texcompress_bptc.c b/src/mesa/main/texcompress_bptc.c new file mode 100644 index 000..7ec294b --- /dev/null +++ b/src/mesa/main/texcompress_bptc.c @@ -0,0 +1,960 @@ +/* + * Copyright (C) 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + 
* copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +/** + * \file texcompress_bptc.c + * GL_ARB_texture_compression_bptc support. 
+ */ + +#include stdbool.h +#include texcompress.h +#include texcompress_bptc.h +#include util/format_srgb.h +#include texstore.h +#include macros.h +#include image.h + +#define BLOCK_SIZE 4 +#define N_PARTITIONS 64 +#define BLOCK_BYTES 16 + +struct bptc_unorm_mode { + int n_subsets; + int n_partition_bits; + bool has_rotation_bits; + bool has_index_selection_bit; + int n_color_bits; + int n_alpha_bits; + bool has_endpoint_pbits; + bool has_shared_pbits; + int n_index_bits; + int n_secondary_index_bits; +}; + +struct bptc_float_bitfield { + int8_t endpoint; + uint8_t component; + uint8_t offset; + uint8_t n_bits; + bool reverse; +}; + +struct bptc_float_mode { + bool reserved; + bool transformed_endpoints; + int n_partition_bits; + int n_endpoint_bits; + int n_index_bits; + int n_delta_bits[3]; + struct bptc_float_bitfield bitfields[24]; +}; + +static const struct bptc_unorm_mode +bptc_unorm_modes[] = { + /* 0 */ { 3, 4, false, false, 4, 0, true, false, 3, 0 }, + /* 1 */ { 2, 6, false, false, 6, 0, false, true, 3, 0 }, + /* 2 */ { 3, 6, false, false, 5, 0, false, false, 2, 0 }, + /* 3 */ { 2, 6, false, false, 7, 0, true, false, 2, 0 }, + /* 4 */ { 1, 0, true, true, 5, 6, false, false, 2, 3 }, + /* 5 */ { 1, 0, true, false, 7, 8, false, false, 2, 2 }, + /* 6 */ { 1, 0, false, false, 7, 7, true, false, 4, 0 }, + /* 7 */ { 2, 6, false, false, 5, 5, true, false, 2, 0 } +}; + +static const struct bptc_float_mode +bptc_float_modes[] = { + /* 00 */ + { false, true, 5, 10, 3, { 5, 5, 5 }, + { { 2, 1, 4, 1, false }, { 2, 2, 4, 1, false }, { 3, 2, 4, 1, false }, + { 0, 0, 0, 10, false },
[Mesa-dev] [PATCH 08/12] mesa/main: Modify generate_mipmap_compressed to cope with float textures
Once we add BPTC texture support we will need to generate mipmaps for
compressed floating point textures too. Most of the code seems to
already be there but it just needs a few extra lines to get it to use
GL_FLOAT instead of GL_UNSIGNED_BYTE as the type for the temporary
buffers.
---
 src/mesa/main/mipmap.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/mesa/main/mipmap.c b/src/mesa/main/mipmap.c
index cc109cc..fdaa682 100644
--- a/src/mesa/main/mipmap.c
+++ b/src/mesa/main/mipmap.c
@@ -2038,12 +2038,15 @@ generate_mipmap_compressed(struct gl_context *ctx, GLenum target,
 
    components = _mesa_format_num_components(temp_format);
 
-   /* Revisit this if we get compressed formats with >8 bits per component */
-   if (_mesa_get_format_datatype(srcImage->TexFormat)
-       == GL_SIGNED_NORMALIZED) {
+   switch (_mesa_get_format_datatype(srcImage->TexFormat)) {
+   case GL_FLOAT:
+      temp_datatype = GL_FLOAT;
+      break;
+   case GL_SIGNED_NORMALIZED:
+      /* Revisit this if we get compressed formats with >8 bits per component */
       temp_datatype = GL_BYTE;
-   }
-   else {
+      break;
+   default:
       temp_datatype = GL_UNSIGNED_BYTE;
    }
 
-- 
1.9.3
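As a standalone illustration of the datatype selection this patch introduces, the mapping can be written as a small pure function. This is a sketch only: `temp_datatype_for` and `temp_row_bytes` are hypothetical names that do not exist in Mesa, and the GL enum values are repeated locally so the snippet compiles without GL headers.

```c
#include <stddef.h>

/* Standard OpenGL enum values, repeated so the sketch compiles alone. */
#define GL_BYTE                0x1400
#define GL_UNSIGNED_BYTE       0x1401
#define GL_FLOAT               0x1406
#define GL_SIGNED_NORMALIZED   0x8F9C
#define GL_UNSIGNED_NORMALIZED 0x8C17

/* Pick the component type of the temporary uncompressed buffer from the
 * datatype of the compressed source format (hypothetical helper). */
static unsigned
temp_datatype_for(unsigned format_datatype)
{
   switch (format_datatype) {
   case GL_FLOAT:
      return GL_FLOAT;
   case GL_SIGNED_NORMALIZED:
      return GL_BYTE;
   default:
      return GL_UNSIGNED_BYTE;
   }
}

/* Bytes needed for one row of the temporary image; float components
 * take sizeof(float) bytes, the byte types take one (hypothetical helper). */
static size_t
temp_row_bytes(unsigned temp_datatype, int width, int components)
{
   size_t component_size = (temp_datatype == GL_FLOAT) ? sizeof(float) : 1;

   return (size_t)width * (size_t)components * component_size;
}
```

The practical consequence is that a floating-point compressed format needs a temporary buffer several times larger than an 8-bit one for the same width and component count.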
[Mesa-dev] [PATCH v2 05/12] mesa: Add the format enums for BPTC-compressed images
This adds the following four Mesa image format enums which correspond to the four BPTC compressed texture formats: MESA_FORMAT_BPTC_RGBA_UNORM MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT It also updates the format information functions to handle these and the corresponding GL enums. v2: Also modify _mesa_get_format_color_encoding, _mesa_get_srgb_format_linear and _mesa_get_uncompressed_format --- src/mesa/main/formats.c | 20 src/mesa/main/formats.csv| 6 ++ src/mesa/main/formats.h | 6 ++ src/mesa/main/glformats.c| 10 ++ src/mesa/main/texcompress.c | 24 src/mesa/main/texformat.c| 8 src/mesa/main/teximage.c | 14 ++ src/mesa/swrast/s_texfetch.c | 24 8 files changed, 112 insertions(+) diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c index f03425e..a5e06ce 100644 --- a/src/mesa/main/formats.c +++ b/src/mesa/main/formats.c @@ -369,6 +369,7 @@ _mesa_get_format_color_encoding(mesa_format format) case MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC: case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1: case MESA_FORMAT_B8G8R8X8_SRGB: + case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: return GL_SRGB; default: return GL_LINEAR; @@ -426,6 +427,9 @@ _mesa_get_srgb_format_linear(mesa_format format) case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1: format = MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1; break; + case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: + format = MESA_FORMAT_BPTC_RGBA_UNORM; + break; case MESA_FORMAT_B8G8R8X8_SRGB: format = MESA_FORMAT_B8G8R8X8_UNORM; break; @@ -491,6 +495,12 @@ _mesa_get_uncompressed_format(mesa_format format) case MESA_FORMAT_ETC2_RG11_EAC: case MESA_FORMAT_ETC2_SIGNED_RG11_EAC: return MESA_FORMAT_R16G16_UNORM; + case MESA_FORMAT_BPTC_RGBA_UNORM: + case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: + return MESA_FORMAT_A8B8G8R8_UNORM; + case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: + case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT: + return MESA_FORMAT_RGB_FLOAT32; default: #ifdef DEBUG 
assert(!_mesa_is_format_compressed(format)); @@ -968,6 +978,10 @@ _mesa_format_to_type_and_comps(mesa_format format, case MESA_FORMAT_ETC2_SIGNED_RG11_EAC: case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1: case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1: + case MESA_FORMAT_BPTC_RGBA_UNORM: + case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: + case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT: + case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: /* XXX generate error instead? */ *datatype = GL_UNSIGNED_BYTE; *comps = 0; @@ -1524,6 +1538,12 @@ _mesa_format_matches_format_and_type(mesa_format mesa_format, case MESA_FORMAT_RGBA_DXT5: return GL_FALSE; + case MESA_FORMAT_BPTC_RGBA_UNORM: + case MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM: + case MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT: + case MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT: + return GL_FALSE; + case MESA_FORMAT_RGBA_FLOAT32: return format == GL_RGBA type == GL_FLOAT !swapBytes; case MESA_FORMAT_RGBA_FLOAT16: diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv index 5abb706..fdd4341 100644 --- a/src/mesa/main/formats.csv +++ b/src/mesa/main/formats.csv @@ -280,3 +280,9 @@ MESA_FORMAT_ETC2_SIGNED_R11_EAC , etc2 , 4, 4, x64 , , , MESA_FORMAT_ETC2_SIGNED_RG11_EAC , etc2 , 4, 4, x128, , , , xyzw, rgb MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2 , 4, 4, x64 , , , , xyzw, rgb MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2 , 4, 4, x128, , , , xyzw, srgb + +# BPTC compressed formats +MESA_FORMAT_BPTC_RGBA_UNORM , bptc , 4, 4, x128, , , , xyzw, rgb +MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM , bptc , 4, 4, x128, , , , xyzw, srgb +MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT , bptc , 4, 4, x128, , , , xyz1, rgb +MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT , bptc , 4, 4, x128, , , , xyz1, rgb diff --git a/src/mesa/main/formats.h b/src/mesa/main/formats.h index 457c8ab..83a7367 100644 --- a/src/mesa/main/formats.h +++ b/src/mesa/main/formats.h @@ -427,6 +427,12 @@ typedef enum MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1, MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, + /* BPTC compressed 
formats */ + MESA_FORMAT_BPTC_RGBA_UNORM, + MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM, + MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT, + MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT, + MESA_FORMAT_COUNT } mesa_format; diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c index 0fb25ba..00478f9 100644 --- a/src/mesa/main/glformats.c +++ b/src/mesa/main/glformats.c @@ -787,6 +787,10 @@ _mesa_is_color_format(GLenum format) case GL_COMPRESSED_SIGNED_RG11_EAC: case
[Mesa-dev] [PATCH 02/12] mesa: Fix the base format for GL_COMPRESSED_RGB_BPTC_*_FLOAT_ARB
The signed and unsigned half-float BPTC-compressed formats were being
reported as having a base format of GL_RGBA but they don't store an
alpha channel so it should be GL_RGB.
---
 src/mesa/main/texcompress.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/texcompress.c b/src/mesa/main/texcompress.c
index 9dbfe9f..fb3ea02 100644
--- a/src/mesa/main/texcompress.c
+++ b/src/mesa/main/texcompress.c
@@ -92,6 +92,8 @@ _mesa_gl_compressed_format_base_format(GLenum format)
 
    case GL_COMPRESSED_RGB:
    case GL_COMPRESSED_SRGB:
+   case GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT_ARB:
+   case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT_ARB:
    case GL_COMPRESSED_RGB_S3TC_DXT1_EXT:
    case GL_COMPRESSED_RGB_FXT1_3DFX:
    case GL_COMPRESSED_SRGB_S3TC_DXT1_EXT:
@@ -104,8 +106,6 @@ _mesa_gl_compressed_format_base_format(GLenum format)
    case GL_COMPRESSED_SRGB_ALPHA:
    case GL_COMPRESSED_RGBA_BPTC_UNORM_ARB:
    case GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM_ARB:
-   case GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT_ARB:
-   case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT_ARB:
    case GL_COMPRESSED_RGBA_S3TC_DXT1_EXT:
    case GL_COMPRESSED_RGBA_S3TC_DXT3_EXT:
    case GL_COMPRESSED_RGBA_S3TC_DXT5_EXT:
-- 
1.9.3
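For reference, the corrected classification of the four BPTC enums can be sketched as a tiny lookup. `bptc_base_format` is a hypothetical helper written for illustration, not the Mesa function touched by the patch, and the enum values are repeated locally so the snippet stands alone.

```c
/* OpenGL enum values, repeated so the sketch compiles without GL headers. */
#define GL_RGB                                     0x1907
#define GL_RGBA                                    0x1908
#define GL_COMPRESSED_RGBA_BPTC_UNORM_ARB          0x8E8C
#define GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM_ARB    0x8E8D
#define GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT_ARB    0x8E8E
#define GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT_ARB  0x8E8F

/* Returns the base format for a BPTC enum, or 0 for anything else.
 * The two float formats carry no alpha, hence GL_RGB (hypothetical helper). */
static unsigned
bptc_base_format(unsigned format)
{
   switch (format) {
   case GL_COMPRESSED_RGB_BPTC_SIGNED_FLOAT_ARB:
   case GL_COMPRESSED_RGB_BPTC_UNSIGNED_FLOAT_ARB:
      return GL_RGB;
   case GL_COMPRESSED_RGBA_BPTC_UNORM_ARB:
   case GL_COMPRESSED_SRGB_ALPHA_BPTC_UNORM_ARB:
      return GL_RGBA;
   default:
      return 0;
   }
}
```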
[Mesa-dev] [PATCH 12/12] docs: Update release notes and GL3.txt for GL_ARB_texture_compression_bptc
---
 docs/GL3.txt            | 2 +-
 docs/relnotes/10.3.html | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index e241257..973495c 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -138,7 +138,7 @@ GL 4.1:
 GL 4.2:
 
   GLSL 4.2                                             not started
-  GL_ARB_texture_compression_bptc                      not started
+  GL_ARB_texture_compression_bptc                      DONE (i965)
   GL_ARB_compressed_texture_pixel_storage              DONE (all drivers)
   GL_ARB_shader_atomic_counters                        DONE (i965)
   GL_ARB_texture_storage                               DONE (all drivers)
diff --git a/docs/relnotes/10.3.html b/docs/relnotes/10.3.html
index f023ca6..0bb417d 100644
--- a/docs/relnotes/10.3.html
+++ b/docs/relnotes/10.3.html
@@ -61,6 +61,7 @@ Note: some of the new features are only available with certain drivers.
 <li>GL_ARB_clear_texture on i965</li>
 <li>A new software rasterizer driver (kms_swrast_dri.so) that works with
 DRM drivers that don't have a full-fledged GEM (such as qxl or simpledrm)</li>
+<li>GL_ARB_texture_compression_bptc on i965/gen7+</li>
 </ul>
 
-- 
1.9.3
[Mesa-dev] [PATCH 11/12] mesa/meta: Support decompressing floating-point formats
Previously the Meta implementation of glGetTexImage would fall back to _mesa_get_teximage if the texturing is not using an unsigned normalised format. However in order to support the half-float formats of BPTC textures we can make it render to a floating-point renderbuffer instead. This patch makes decompression_state have two FBOs, one for the GL_RGBA format and one for GL_RGBA32F. If a floating-point texture is encountered it will try setting up a floating-point FBO. It will now also check the status of the FBO and fall back to _mesa_get_teximage if the FBO is not complete. --- src/mesa/drivers/common/meta.c | 97 -- src/mesa/drivers/common/meta.h | 14 +- 2 files changed, 78 insertions(+), 33 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index f8f0ee3..c3764ee 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -2940,14 +2940,22 @@ _mesa_meta_CopyTexSubImage(struct gl_context *ctx, GLuint dims, free(buf); } +static void +meta_decompress_fbo_cleanup(struct decompress_fbo_state *decompress_fbo) +{ + if (decompress_fbo-FBO != 0) { + _mesa_DeleteFramebuffers(1, decompress_fbo-FBO); + _mesa_DeleteRenderbuffers(1, decompress_fbo-RBO); + } + + memset(decompress_fbo, 0, sizeof(*decompress_fbo)); +} static void meta_decompress_cleanup(struct decompress_state *decompress) { - if (decompress-FBO != 0) { - _mesa_DeleteFramebuffers(1, decompress-FBO); - _mesa_DeleteRenderbuffers(1, decompress-RBO); - } + meta_decompress_fbo_cleanup(decompress-byteFBO); + meta_decompress_fbo_cleanup(decompress-floatFBO); if (decompress-VAO != 0) { _mesa_DeleteVertexArrays(1, decompress-VAO); @@ -2969,7 +2977,7 @@ meta_decompress_cleanup(struct decompress_state *decompress) * \param dest destination buffer * \param destRowLength dest image rowLength (ala GL_PACK_ROW_LENGTH) */ -static void +static bool decompress_texture_image(struct gl_context *ctx, struct gl_texture_image *texImage, GLuint slice, @@ -2977,17 +2985,33 
@@ decompress_texture_image(struct gl_context *ctx, GLvoid *dest) { struct decompress_state *decompress = ctx-Meta-Decompress; + struct decompress_fbo_state *decompress_fbo; struct gl_texture_object *texObj = texImage-TexObject; const GLint width = texImage-Width; const GLint height = texImage-Height; const GLint depth = texImage-Height; const GLenum target = texObj-Target; + GLenum rbFormat; GLenum faceTarget; struct vertex verts[4]; GLuint samplerSave; + GLenum status; const bool use_glsl_version = ctx-Extensions.ARB_vertex_shader ctx-Extensions.ARB_fragment_shader; + switch (_mesa_get_format_datatype(texImage-TexFormat)) { + case GL_FLOAT: + decompress_fbo = decompress-floatFBO; + rbFormat = GL_RGBA32F; + break; + case GL_UNSIGNED_NORMALIZED: + decompress_fbo = decompress-byteFBO; + rbFormat = GL_RGBA; + break; + default: + return false; + } + if (slice 0) { assert(target == GL_TEXTURE_3D || target == GL_TEXTURE_2D_ARRAY || @@ -2998,11 +3022,11 @@ decompress_texture_image(struct gl_context *ctx, case GL_TEXTURE_1D: case GL_TEXTURE_1D_ARRAY: assert(!No compressed 1D textures.); - return; + return false; case GL_TEXTURE_3D: assert(!No compressed 3D textures.); - return; + return false; case GL_TEXTURE_CUBE_MAP_ARRAY: faceTarget = GL_TEXTURE_CUBE_MAP_POSITIVE_X + (slice % 6); @@ -3024,27 +3048,35 @@ decompress_texture_image(struct gl_context *ctx, ctx-Texture.Unit[ctx-Texture.CurrentUnit].Sampler-Name : 0; /* Create/bind FBO/renderbuffer */ - if (decompress-FBO == 0) { - _mesa_GenFramebuffers(1, decompress-FBO); - _mesa_GenRenderbuffers(1, decompress-RBO); - _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress-FBO); - _mesa_BindRenderbuffer(GL_RENDERBUFFER_EXT, decompress-RBO); + if (decompress_fbo-FBO == 0) { + _mesa_GenFramebuffers(1, decompress_fbo-FBO); + _mesa_GenRenderbuffers(1, decompress_fbo-RBO); + _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress_fbo-FBO); + _mesa_BindRenderbuffer(GL_RENDERBUFFER_EXT, decompress_fbo-RBO); 
_mesa_FramebufferRenderbuffer(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT, GL_RENDERBUFFER_EXT, - decompress-RBO); + decompress_fbo-RBO); } else { - _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress-FBO); + _mesa_BindFramebuffer(GL_FRAMEBUFFER_EXT, decompress_fbo-FBO); } /* alloc dest surface */ - if (width decompress-Width || height decompress-Height)
[Mesa-dev] [PATCH v2 07/12] mesa: Add texstore functions for BPTC-compressed textures
This adds compressors for all four of the BPTC compressed-texture formats. The compressor is written from scratch and takes a very simple approach. It always uses a single mode of the BPTC format (4 for unorm and 3 for half-floats) and picks the two endpoints by dividing the texels into those which have more or less than the average luminance of the block and then calculating an average color of the texels within each division. It's probably not really sensible to try to use BPTC compression at runtime because for example with the Nvidia offline compression tool it can take in the order of an hour to compress a full-screen image. With that in mind I don't think it's worth having a proper compressor in Mesa and this approach gives reasonable results for a usage that is basically a corner case. v2: Always use the custom compressor, even for the unorm formats. Fix the quantization step for the half-float format compressor. Fixed a typo which was breaking the right-hand edge of half-float textures with a width that isn't a multiple of four. 
---
 src/mesa/main/texcompress_bptc.c | 689 +++
 src/mesa/main/texcompress_bptc.h |  10 +
 src/mesa/main/texstore.c         |  10 +
 3 files changed, 709 insertions(+)

diff --git a/src/mesa/main/texcompress_bptc.c b/src/mesa/main/texcompress_bptc.c
index 7ec294b..9204f12 100644
--- a/src/mesa/main/texcompress_bptc.c
+++ b/src/mesa/main/texcompress_bptc.c
@@ -69,6 +69,12 @@ struct bptc_float_mode {
    struct bptc_float_bitfield bitfields[24];
 };
 
+struct bit_writer {
+   uint8_t buf;
+   int pos;
+   uint8_t *dst;
+};
+
 static const struct bptc_unorm_mode bptc_unorm_modes[] = {
    /* 0 */ { 3, 4, false, false, 4, 0, true, false, 3, 0 },
@@ -958,3 +964,686 @@ _mesa_get_bptc_fetch_func(mesa_format format)
       return NULL;
    }
 }
+
+static void
+write_bits(struct bit_writer *writer, int n_bits, int value)
+{
+   do {
+      if (n_bits + writer->pos >= 8) {
+         *(writer->dst++) = writer->buf | (value << writer->pos);
+         writer->buf = 0;
+         value >>= (8 - writer->pos);
+         n_bits -= (8 - writer->pos);
+         writer->pos = 0;
+      } else {
+         writer->buf |= value << writer->pos;
+         writer->pos += n_bits;
+         break;
+      }
+   } while (n_bits > 0);
+}
+
+static void
+get_average_luminance_alpha_unorm(int width, int height,
+                                  const uint8_t *src, int src_rowstride,
+                                  int *average_luminance, int *average_alpha)
+{
+   int luminance_sum = 0, alpha_sum = 0;
+   int y, x;
+
+   for (y = 0; y < height; y++) {
+      for (x = 0; x < width; x++) {
+         luminance_sum += src[0] + src[1] + src[2];
+         alpha_sum += src[3];
+         src += 4;
+      }
+      src += src_rowstride - width * 4;
+   }
+
+   *average_luminance = luminance_sum / (width * height);
+   *average_alpha = alpha_sum / (width * height);
+}
+
+static void
+get_rgba_endpoints_unorm(int width, int height,
+                         const uint8_t *src, int src_rowstride,
+                         int average_luminance, int average_alpha,
+                         uint8_t endpoints[][4])
+{
+   int endpoint_luminances[2];
+   int midpoint;
+   int sums[2][4];
+   int endpoint;
+   int luminance;
+   uint8_t temp[3];
+   const uint8_t *p = src;
+   int rgb_left_endpoint_count = 0;
+   int alpha_left_endpoint_count = 0;
+   int y, x, i;
+
+   memset(sums, 0, sizeof sums);
+
+   for (y = 0; y < height; y++) {
+      for (x = 0; x < width; x++) {
+         luminance = p[0] + p[1] + p[2];
+         if (luminance < average_luminance) {
+            endpoint = 0;
+            rgb_left_endpoint_count++;
+         } else {
+            endpoint = 1;
+         }
+         for (i = 0; i < 3; i++)
+            sums[endpoint][i] += p[i];
+
+         if (p[2] < average_alpha) {
+            endpoint = 0;
+            alpha_left_endpoint_count++;
+         } else {
+            endpoint = 1;
+         }
+         sums[endpoint][3] += p[3];
+
+         p += 4;
+      }
+
+      p += src_rowstride - width * 4;
+   }
+
+   if (rgb_left_endpoint_count == 0 ||
+       rgb_left_endpoint_count == width * height) {
+      for (i = 0; i < 3; i++)
+         endpoints[0][i] = endpoints[1][i] =
+            (sums[0][i] + sums[1][i]) / (width * height);
+   } else {
+      for (i = 0; i < 3; i++) {
+         endpoints[0][i] = sums[0][i] / rgb_left_endpoint_count;
+         endpoints[1][i] = (sums[1][i] /
+                            (width * height - rgb_left_endpoint_count));
+      }
+   }
+
+   if (alpha_left_endpoint_count == 0 ||
+       alpha_left_endpoint_count == width * height) {
+      endpoints[0][3] = endpoints[1][3] =
+         (sums[0][3] + sums[1][3]) / (width * height);
+   } else {
+      endpoints[0][3] = sums[0][3] / alpha_left_endpoint_count;
+      endpoints[1][3] = (sums[1][3] /
+                         (width * height - alpha_left_endpoint_count));
+   }
+
+
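As an aside on the bit writer used by the compressor above: it emits fields least-significant bit first and flushes a byte whenever eight bits have accumulated. The following is only a minimal sketch of that idea, not the patch's code; the names bit_writer_sketch, write_bits_sketch and pack_example are hypothetical, and the sketch assumes fields of at most 24 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the patch's struct bit_writer. */
struct bit_writer_sketch {
   uint8_t buf;   /* bits collected for the byte in progress */
   int pos;       /* number of valid bits in buf (0..7) */
   uint8_t *dst;  /* next output byte */
};

/* Append the low n_bits of value, least significant bit first. */
static void
write_bits_sketch(struct bit_writer_sketch *w, int n_bits, unsigned value)
{
   assert(n_bits > 0 && n_bits <= 24);
   value &= (1u << n_bits) - 1;

   while (n_bits > 0) {
      int room = 8 - w->pos;
      int take = n_bits < room ? n_bits : room;

      /* Place the next 'take' bits above the bits already buffered. */
      w->buf |= (uint8_t)((value & ((1u << take) - 1)) << w->pos);
      w->pos += take;
      value >>= take;
      n_bits -= take;

      /* Flush a completed byte. */
      if (w->pos == 8) {
         *w->dst++ = w->buf;
         w->buf = 0;
         w->pos = 0;
      }
   }
}

/* Pack four fields (3 + 5 + 12 + 4 bits) and return the bytes written. */
static int
pack_example(uint8_t out[4])
{
   struct bit_writer_sketch w = { 0, 0, out };

   memset(out, 0, 4);
   write_bits_sketch(&w, 3, 0x5);
   write_bits_sketch(&w, 5, 0x1f);
   write_bits_sketch(&w, 12, 0xabc);
   write_bits_sketch(&w, 4, 0x5);
   return (int)(w.dst - out);
}
```

The four fields add up to 24 bits, so pack_example() fills exactly three bytes and leaves nothing pending in the writer.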
[Mesa-dev] [PATCH 09/12] i965: Enable the GL_ARB_texture_compression_bptc extension
Enables the BPTC extension on Gen >= 7 and adds the necessary format
mappings to get the right surface type value.
---
 src/mesa/drivers/dri/i965/brw_surface_formats.c | 5 +
 src/mesa/drivers/dri/i965/intel_extensions.c    | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_surface_formats.c b/src/mesa/drivers/dri/i965/brw_surface_formats.c
index 41f4221..974f2df 100644
--- a/src/mesa/drivers/dri/i965/brw_surface_formats.c
+++ b/src/mesa/drivers/dri/i965/brw_surface_formats.c
@@ -487,6 +487,11 @@ brw_format_for_mesa_format(mesa_format mesa_format)
       [MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1] = BRW_SURFACEFORMAT_ETC2_RGB8_PTA,
       [MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1] = BRW_SURFACEFORMAT_ETC2_SRGB8_PTA,
 
+      [MESA_FORMAT_BPTC_RGBA_UNORM] = BRW_SURFACEFORMAT_BC7_UNORM,
+      [MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM] = BRW_SURFACEFORMAT_BC7_UNORM_SRGB,
+      [MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT] = BRW_SURFACEFORMAT_BC6H_SF16,
+      [MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT] = BRW_SURFACEFORMAT_BC6H_UF16,
+
       [MESA_FORMAT_A_SNORM8] = 0,
       [MESA_FORMAT_L_SNORM8] = 0,
       [MESA_FORMAT_L8A8_SNORM] = 0,
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c
index 4ee8636..b14b9c7 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -302,6 +302,8 @@ intelInitExtensions(struct gl_context *ctx)
          ctx->Extensions.ARB_viewport_array = true;
          ctx->Extensions.AMD_vertex_shader_viewport_index = true;
       }
+
+      ctx->Extensions.ARB_texture_compression_bptc = true;
    }
 
    if (brw->gen >= 8) {
-- 
1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()
I'd just like to point out that I made a nearly identical patch before
this patch was posted, but I didn't get any review despite prodding
people a few times on #dri-devel. Maybe we should try to get into the
habit of searching patchwork for existing patches before posting to the
list. Does anyone have any suggestions for how I can get my patches more
noticed?

http://patchwork.freedesktop.org/patch/27168/

I also made a piglit test for the problem here:

http://cgit.freedesktop.org/piglit/commit/?id=108a17a4d78bcc7480754d2104b4

Regards,
- Neil

Jordan Justen jljus...@gmail.com writes:

> Reviewed-by: Jordan Justen jordan.l.jus...@intel.com
>
> On Wed, Jul 16, 2014 at 3:32 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
>> The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
>> as base internal format and non-zero x, y offsets. Currently x, y
>> offsets are ignored while updating the texture image.
>>
>> Fixes Khronos GLES3 CTS tests:
>> npot_tex_sub_image_2d
>> npot_tex_sub_image_3d
>> npot_pbo_tex_sub_image_2d
>> npot_pbo_tex_sub_image_2d
>>
>> Cc: mesa-sta...@lists.freedesktop.org
>> Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
>> ---
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> index 2ab0faa..b36ffc7 100644
>> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> @@ -2129,9 +2129,9 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
>>                                                   x + s_image_x + map->x,
>>                                                   y + s_image_y + map->y,
>>                                                   brw->has_swizzling);
>> -            ptrdiff_t z_offset = ((y + z_image_y) *
>> +            ptrdiff_t z_offset = ((y + z_image_y + map->y) *
>>                                    (z_mt->pitch / 4) +
>> -                                  (x + z_image_x));
>> +                                  (x + z_image_x + map->x));
>>
>>              if (map_z32f_x24s8) {
>>                 z_map[z_offset] = packed_map[(y * map->w + x) * 2 + 0];
>> --
>> 1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
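To make the z_offset fix quoted above concrete, here is a minimal sketch of the same arithmetic in isolation. It assumes 4 bytes per depth pixel (hence pitch / 4); map_rect_sketch and z_pixel_offset are hypothetical names, not driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the mapped sub-rectangle. */
struct map_rect_sketch {
   int x, y;   /* origin of the mapped region within the image */
   int w, h;   /* size of the mapped region */
};

/*
 * Offset, in 32-bit pixels, of mapped pixel (x, y) inside the depth
 * buffer. image_x/image_y locate the miplevel/slice inside the miptree;
 * map->x/map->y locate the mapped rectangle inside that image. Leaving
 * out the map origin makes every write land at the image origin.
 */
static ptrdiff_t
z_pixel_offset(const struct map_rect_sketch *map,
               int image_x, int image_y, int pitch_bytes, int x, int y)
{
   return (ptrdiff_t)(y + image_y + map->y) * (pitch_bytes / 4) +
          (x + image_x + map->x);
}
```

With a 64-byte pitch (16 pixels per row) and a map origin of (2, 3), the first mapped pixel lands at pixel offset 3 * 16 + 2 = 50 rather than 0.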
Re: [Mesa-dev] [PATCH 04/12] mesa/format_info: Add support for the BPTC layout
This looks fine to me.

Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

On Wed, Aug 6, 2014 at 9:27 AM, Neil Roberts n...@linux.intel.com wrote:
> Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC
> depend on the mode but as it only has to be an approximation we can set
> it to 4 like for S3TC.
> ---
>  src/mesa/main/format_info.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
> index a0eecd3..fc40dc4 100644
> --- a/src/mesa/main/format_info.py
> +++ b/src/mesa/main/format_info.py
> @@ -110,7 +110,7 @@ def get_channel_bits(fmat, chan_name):
>     if fmat.is_compressed():
>        # These values are pretty-much bogus, but OpenGL requires that we
>        # return an approximate number of bits.
> -      if fmat.layout == 's3tc':
> +      if fmat.layout in ('s3tc', 'bptc'):
>           return 4 if fmat.has_channel(chan_name) else 0
>        elif fmat.layout == 'fxt1':
>           if chan_name in 'rgb':
> --
> 1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/12] mesa/format_info: Add support for compressed floating-point formats
Looks fine

Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

On Wed, Aug 6, 2014 at 9:27 AM, Neil Roberts n...@linux.intel.com wrote:
> If the name of a compressed texture format has ‘FLOAT’ in it it will now
> set the data type of the format to GL_FLOAT. This will be needed for the
> BPTC half-float formats.
> ---
>  src/mesa/main/format_info.py | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
> index 448bd00..a0eecd3 100644
> --- a/src/mesa/main/format_info.py
> +++ b/src/mesa/main/format_info.py
> @@ -62,7 +62,9 @@ def get_gl_base_format(fmat):
>
>  def get_gl_data_type(fmat):
>     if fmat.is_compressed():
> -      if 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
> +      if 'FLOAT' in fmat.name:
> +         return 'GL_FLOAT'
> +      elif 'SIGNED' in fmat.name or 'SNORM' in fmat.name:
>           return 'GL_SIGNED_NORMALIZED'
>        else:
>           return 'GL_UNSIGNED_NORMALIZED'
> --
> 1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/12] mesa/format_info: Add support for the BPTC layout
Sorry, said that just a little early. Do we really want 4 bits for a
floating-point format? How many bits does nvidia report?

--Jason

On Wed, Aug 6, 2014 at 9:55 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
> This looks fine to me.
>
> Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com
>
> On Wed, Aug 6, 2014 at 9:27 AM, Neil Roberts n...@linux.intel.com wrote:
>> Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC
>> depend on the mode but as it only has to be an approximation we can set
>> it to 4 like for S3TC.
>> ---
>>  src/mesa/main/format_info.py | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
>> index a0eecd3..fc40dc4 100644
>> --- a/src/mesa/main/format_info.py
>> +++ b/src/mesa/main/format_info.py
>> @@ -110,7 +110,7 @@ def get_channel_bits(fmat, chan_name):
>>     if fmat.is_compressed():
>>        # These values are pretty-much bogus, but OpenGL requires that we
>>        # return an approximate number of bits.
>> -      if fmat.layout == 's3tc':
>> +      if fmat.layout in ('s3tc', 'bptc'):
>>           return 4 if fmat.has_channel(chan_name) else 0
>>        elif fmat.layout == 'fxt1':
>>           if chan_name in 'rgb':
>> --
>> 1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information
Michel,

Could you please point me at the failing tests? I don't have a radeon,
but I can run with llvmpipe or dri swrast and try to find the bug that
way.

--Jason Ekstrand

On Wed, Aug 6, 2014 at 2:36 AM, Michel Dänzer mic...@daenzer.net wrote:
> On 06.08.2014 18:28, Michel Dänzer wrote:
>> On 06.08.2014 03:08, Jason Ekstrand wrote:
>>> Module: Mesa
>>> Branch: master
>>> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
>>> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>>>
>>> Author: Jason Ekstrand jason.ekstr...@intel.com
>>> Date: Thu Jul 10 23:59:42 2014 -0700
>>>
>>> mesa/formats: Add layout and swizzle information
>>>
>>> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>>>
>>> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
>>> Reviewed-by: Brian Paul bri...@vmware.com
>>
>> As of this commit, ~20 depth/stencil related piglit tests have
>> regressed with the radeonsi driver compared to before your changes.
>> See below for an example failure of the draw-pixels test.
>>
>> That test is already broken with the previous commits, each of them
>> with slightly different failure symptoms.
>
> I meant to write: 'That test is already broken with the three previous
> commits, [...]'
>
> --
> Earthling Michel Dänzer | http://www.amd.com
> Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] mesa/formats: Use the correct swizzle parameter for the 11-bit EAC formats
Red-only formats should be x001 and RG formats should be xy01.

Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
 src/mesa/main/formats.csv | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
index 5abb706..f45e34b 100644
--- a/src/mesa/main/formats.csv
+++ b/src/mesa/main/formats.csv
@@ -274,9 +274,9 @@ MESA_FORMAT_ETC2_RGB8 , etc2 , 4, 4, x64 , , ,
 MESA_FORMAT_ETC2_SRGB8, etc2 , 4, 4, x64 , , , , xyz1, srgb
 MESA_FORMAT_ETC2_RGBA8_EAC, etc2 , 4, 4, x128, , , , xyzw, rgb
 MESA_FORMAT_ETC2_SRGB8_ALPHA8_EAC , etc2 , 4, 4, x128, , , , xyzw, srgb
-MESA_FORMAT_ETC2_R11_EAC , etc2 , 4, 4, x64 , , , , xyzw, rgb
-MESA_FORMAT_ETC2_RG11_EAC , etc2 , 4, 4, x128, , , , xyzw, rgb
-MESA_FORMAT_ETC2_SIGNED_R11_EAC , etc2 , 4, 4, x64 , , , , xyzw, rgb
-MESA_FORMAT_ETC2_SIGNED_RG11_EAC , etc2 , 4, 4, x128, , , , xyzw, rgb
+MESA_FORMAT_ETC2_R11_EAC , etc2 , 4, 4, x64 , , , , x001, rgb
+MESA_FORMAT_ETC2_RG11_EAC , etc2 , 4, 4, x128, , , , xy01, rgb
+MESA_FORMAT_ETC2_SIGNED_R11_EAC , etc2 , 4, 4, x64 , , , , x001, rgb
+MESA_FORMAT_ETC2_SIGNED_RG11_EAC , etc2 , 4, 4, x128, , , , xy01, rgb
 MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2 , 4, 4, x64 , , , , xyzw, rgb
 MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2 , 4, 4, x128, , , , xyzw, srgb
-- 
2.0.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
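For readers unfamiliar with the swizzle column in the patch above, the following is a minimal sketch of how a four-character swizzle such as x001 or xy01 maps stored channels to RGBA. The enum and function names are hypothetical and are not Mesa's; the sketch only assumes that x/y/z/w select a stored channel and that 0/1 are constants.

```c
#include <assert.h>

/* Hypothetical swizzle selectors: a stored channel or a constant. */
enum swz_sketch { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Build the RGBA value seen by the sampler from the stored channels. */
static void
apply_swizzle_sketch(const enum swz_sketch swz[4],
                     const float stored[4], float rgba[4])
{
   for (int i = 0; i < 4; i++) {
      switch (swz[i]) {
      case SWZ_ZERO:
         rgba[i] = 0.0f;
         break;
      case SWZ_ONE:
         rgba[i] = 1.0f;
         break;
      default:
         rgba[i] = stored[swz[i]];
         break;
      }
   }
}
```

Under this reading, a red-only format with x001 yields (R, 0, 0, 1), whereas xyzw would have exposed channels the format does not store.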
Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information
Michel,

With the two patches I just sent to the list, the generated format_info
structure is now binary-identical to the original structure committed to
git with the following two exceptions:

The string name parameter for MESA_FORMAT_R9G9B9E5_FLOAT was updated
from MESA_FORMAT_RGB9_E5 to MESA_FORMAT_R9G9B9E5_FLOAT.

The LATC formats now report 8 bits of precision instead of 4. This makes
the LATC formats match the RGTC formats, which use exactly the same
compression just interpreted differently.

I'm really confused about where the bug is coming from. That said, I'm
going to run some llvmpipe tests to see if I can reproduce.

--Jason Ekstrand

On Wed, Aug 6, 2014 at 10:02 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
> Michael,
>
> Could you please point me at the failing tests. I don't have a radeon,
> but I can run with llvmpipe or dri swrast and try to find the bug that
> way.
>
> --Jason Ekstrand
>
> On Wed, Aug 6, 2014 at 2:36 AM, Michel Dänzer mic...@daenzer.net wrote:
>> On 06.08.2014 18:28, Michel Dänzer wrote:
>>> On 06.08.2014 03:08, Jason Ekstrand wrote:
>>>> Module: Mesa
>>>> Branch: master
>>>> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
>>>> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>>>>
>>>> Author: Jason Ekstrand jason.ekstr...@intel.com
>>>> Date: Thu Jul 10 23:59:42 2014 -0700
>>>>
>>>> mesa/formats: Add layout and swizzle information
>>>>
>>>> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>>>>
>>>> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
>>>> Reviewed-by: Brian Paul bri...@vmware.com
>>>
>>> As of this commit, ~20 depth/stencil related piglit tests have
>>> regressed with the radeonsi driver compared to before your changes.
>>> See below for an example failure of the draw-pixels test.
>>>
>>> That test is already broken with the previous commits, each of them
>>> with slightly different failure symptoms.
>>
>> I meant to write: 'That test is already broken with the three previous
>> commits, [...]'
>>
>> --
>> Earthling Michel Dänzer | http://www.amd.com
>> Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] mesa/formats: Fix the size of ETC2_SRGB8_PUNCHTHROUGH_ALPHA1
Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
---
 src/mesa/main/formats.csv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
index f45e34b..eade6fa 100644
--- a/src/mesa/main/formats.csv
+++ b/src/mesa/main/formats.csv
@@ -279,4 +279,4 @@ MESA_FORMAT_ETC2_RG11_EAC , etc2 , 4, 4, x128, , ,
 MESA_FORMAT_ETC2_SIGNED_R11_EAC , etc2 , 4, 4, x64 , , , , x001, rgb
 MESA_FORMAT_ETC2_SIGNED_RG11_EAC , etc2 , 4, 4, x128, , , , xy01, rgb
 MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2 , 4, 4, x64 , , , , xyzw, rgb
-MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2 , 4, 4, x128, , , , xyzw, srgb
+MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2 , 4, 4, x64 , , , , xyzw, srgb
-- 
2.0.4
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
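Why the x64 versus x128 column matters can be seen from the storage size it implies. The sketch below assumes the CSV columns are block width, block height and bits per block; compressed_image_bytes is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes needed for a block-compressed image: round the image up to
 * whole blocks, then multiply by the size of one block.
 */
static size_t
compressed_image_bytes(unsigned width, unsigned height,
                       unsigned block_w, unsigned block_h,
                       unsigned bits_per_block)
{
   size_t blocks_x = (width + block_w - 1) / block_w;
   size_t blocks_y = (height + block_h - 1) / block_h;

   return blocks_x * blocks_y * (bits_per_block / 8);
}
```

A 16x16 image in 4x4 blocks needs 128 bytes at 64 bits per block but 256 bytes at 128, so a wrong entry doubles every size and stride derived from the table.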
Re: [Mesa-dev] Mesa (master): mesa/formats: Add layout and swizzle information
FYI, it seems to be DrawPixels(GL_STENCIL_INDEX) that is broken. We
actually use S8 texturing for DrawPixels and some of the functions you
changed probably don't support S8 anymore.

Marek

On Wed, Aug 6, 2014 at 7:37 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
> Michael,
>
> With the two patches I just sent to the list, the generated format_info
> structure is now binary-identical to the original structure committed
> to git with the following two exceptions:
>
> The string name parameter for MESA_FORMAT_R9G9B9E5_FLOAT was updated
> from MESA_FORMAT_RGB9_E5 to MESA_FORMAT_R9G9B9E5_FLOAT.
>
> The LATC formats now report 8 bits of precision instead of 4; This
> makes the LATC formats match the RGTC formats which use exactly the
> same compression just interpreted differently.
>
> I'm really confused about where the bug is coming from. That said, I'm
> going to run some llvmpipe tests to see if I can reproduce.
>
> --Jason Ekstrand
>
> On Wed, Aug 6, 2014 at 10:02 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
>> Michael,
>>
>> Could you please point me at the failing tests. I don't have a radeon,
>> but I can run with llvmpipe or dri swrast and try to find the bug that
>> way.
>>
>> --Jason Ekstrand
>>
>> On Wed, Aug 6, 2014 at 2:36 AM, Michel Dänzer mic...@daenzer.net wrote:
>>> On 06.08.2014 18:28, Michel Dänzer wrote:
>>>> On 06.08.2014 03:08, Jason Ekstrand wrote:
>>>>> Module: Mesa
>>>>> Branch: master
>>>>> Commit: 850fb0d1dca616179d3239a7b7bd94fe1979604c
>>>>> URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=850fb0d1dca616179d3239a7b7bd94fe1979604c
>>>>>
>>>>> Author: Jason Ekstrand jason.ekstr...@intel.com
>>>>> Date: Thu Jul 10 23:59:42 2014 -0700
>>>>>
>>>>> mesa/formats: Add layout and swizzle information
>>>>>
>>>>> v2: Move the MESA_FORMAT_SWIZZLE enum to the top of the file
>>>>>
>>>>> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
>>>>> Reviewed-by: Brian Paul bri...@vmware.com
>>>>
>>>> As of this commit, ~20 depth/stencil related piglit tests have
>>>> regressed with the radeonsi driver compared to before your changes.
>>>> See below for an example failure of the draw-pixels test.
>>>>
>>>> That test is already broken with the previous commits, each of them
>>>> with slightly different failure symptoms.
>>>
>>> I meant to write: 'That test is already broken with the three
>>> previous commits, [...]'
>>>
>>> --
>>> Earthling Michel Dänzer | http://www.amd.com
>>> Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/20] i965/cfg: Add functions to test if a block is a successor/predecessor.
On Tue, Aug 5, 2014 at 10:21 AM, Pohjolainen, Topi topi.pohjolai...@intel.com wrote:
> On Thu, Jul 24, 2014 at 07:54:15PM -0700, Matt Turner wrote:
>> ---
>>  src/mesa/drivers/dri/i965/brw_cfg.cpp | 24
>>  src/mesa/drivers/dri/i965/brw_cfg.h   | 2 ++
>>  2 files changed, 26 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp
>> index d806b83..9cd8b9f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
>> @@ -71,6 +71,30 @@ bblock_t::add_successor(void *mem_ctx, bblock_t *successor)
>>     children.push_tail(::link(mem_ctx, successor));
>>  }
>>
>> +bool
>> +bblock_t::is_predecessor_of(const bblock_t *block) const
>> +{
>> +   foreach_list_typed_safe (bblock_link, parent, link, &block->parents) {
>
> I read patch number three again, and noticed this small formatting
> change there as well. I haven't seen us leaving the space before (
> anywhere else.

I figure I should have a space between the macro and the ( since we put
a space there for regular for loops.

>> +      if (parent->block == this) {
>> +         return true;
>> +      }
>
> We have one line blocks with and without {}. I just thought I mention
> in case you didn't mean to.

Right, I'll drop the {}.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/20] i965/cfg: Add a foreach_inst_in_block_safe macro.
On Wed, Aug 6, 2014 at 5:22 AM, Pohjolainen, Topi topi.pohjolai...@intel.com wrote:
> On Tue, Aug 05, 2014 at 09:14:55PM +0300, Pohjolainen, Topi wrote:
>> On Thu, Jul 24, 2014 at 07:54:12PM -0700, Matt Turner wrote:
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_cfg.h | 8
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.h b/src/mesa/drivers/dri/i965/brw_cfg.h
>>> index a5d2df5..913a1ed 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_cfg.h
>>> +++ b/src/mesa/drivers/dri/i965/brw_cfg.h
>>> @@ -120,6 +120,14 @@ struct cfg_t {
>>>          __inst != __block->end->next;                       \
>>>          __inst = (__type *)__inst->next)
>>>
>>> +#define foreach_inst_in_block_safe(__type, __inst, __block)  \
>>> +   for (__type *__inst = (__type *)__block->start,           \
>>> +               *__next = (__type *)__inst->next,             \
>>> +               *__end = (__type *)__block->end->next->next;  \
>>
>> Patches 4 and 7 make sense but the double ->next->next here is not
>> obvious to me.

Right, yep. exec_list uses head and tail sentinels, so the double-next
handles that. Explained below:

> I tried handwriting instructions into blocks (this is purely arbitrary):
>
> ip   opcode
> ------------------
> 0  : BRW_OPCODE_?
> ..
> k  : BRW_OPCODE_IF
> k+1: BRW_OPCODE_?
> ..
> n  : BRW_OPCODE_ELSE
> n+1: BRW_OPCODE_?
> ..
> m  : BRW_OPCODE_ENDIF
> m+1: BRW_OPCODE_?
> ..
> t  : BRW_OPCODE_?
>
> Following the logic in the constructor of cfg_t, I would deduce this:
>
> block 0: start_ip = 0    num = 0  start = inst_0          end = inst_k (if)
> block 1: start_ip = k+1  num = 1  start = inst_k+1        end = inst_n (else)
> block 2: start_ip = n+1  num = 2  start = inst_n+1        end = inst_m-1
> block 3: start_ip = m    num = 3  start = inst_m (endif)  end = inst_t
>
> And as instructions are inherited from exec_node, for block 3 end->next
> should be NULL, right?

Since exec_list uses head and tail sentinels, block[3]->end->next will
actually be the tail sentinel (and block[2]->end->next will be the first
instruction of block[3]).

The __end variable prevents us from dereferencing NULL if we remove the
last instruction in a block (and therefore remove the block). Note that
the continuing condition is (__next != __end).

For each block, we want to iterate through the instructions until we hit
block->end->next->next because if the block

 - isn't the last block, end->next->next will be two nodes (I say node,
   rather than instruction because of the tail sentinel) after the end
 - is the last block, end->next->next will be NULL.

In both cases we want to compare with __next, which after the iteration
is one past the node after block->end.

Does that make sense? There are really two things to remember: (1) head
and tail sentinels, and (2) this macro is _safe, so we're comparing with
__next (i.e., one past the end).
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
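The two points in the explanation above (sentinel nodes, and comparing the look-ahead pointer against end->next->next) can be tried out on a toy list. This is a minimal sketch only: it uses two explicit sentinel nodes instead of exec_list's actual layout, and node_sketch, list_sketch and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct node_sketch {
   struct node_sketch *next, *prev;
   int value;   /* unused in the sentinels */
};

struct list_sketch {
   struct node_sketch head;  /* head sentinel, head.prev == NULL */
   struct node_sketch tail;  /* tail sentinel, tail.next == NULL */
};

static void
list_init(struct list_sketch *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void
list_push_tail(struct list_sketch *l, struct node_sketch *n)
{
   n->prev = l->tail.prev;
   n->next = &l->tail;
   l->tail.prev->next = n;
   l->tail.prev = n;
}

static void
node_remove(struct node_sketch *n)
{
   n->prev->next = n->next;
   n->next->prev = n->prev;
   n->next = n->prev = NULL;
}

/*
 * Visit every node in [start, end], unlinking those with odd values.
 * 'next' is saved before the body runs and 'stop' is captured up front,
 * so unlinking the current node (even 'end' itself) is safe. For the
 * last range in the list end->next is the tail sentinel and stop is NULL.
 */
static int
remove_odd_in_range(struct node_sketch *start, struct node_sketch *end)
{
   int visited = 0;
   struct node_sketch *inst = start;
   struct node_sketch *next = inst->next;
   struct node_sketch *stop = end->next->next;

   for (; next != stop; inst = next, next = next->next) {
      visited++;
      if (inst->value & 1)
         node_remove(inst);
   }
   return visited;
}

/* Build 1,2,3,4, filter it, and return visited * 100 + sum of survivors. */
static int
demo_safe_iteration(void)
{
   struct list_sketch l;
   struct node_sketch n[4];
   int sum = 0;

   list_init(&l);
   for (int i = 0; i < 4; i++) {
      n[i].value = i + 1;
      list_push_tail(&l, &n[i]);
   }

   int visited = remove_odd_in_range(&n[0], &n[3]);

   for (struct node_sketch *p = l.head.next; p->next != NULL; p = p->next)
      sum += p->value;

   return visited * 100 + sum;
}
```

Because the loop only ever dereferences the saved look-ahead pointer, the stop condition has to be phrased in terms of that pointer, which is where the extra ->next comes from.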
[Mesa-dev] [PATCH 0/9] i915: Gen2 texturing fixes and a few random patches
From: Ville Syrjälä ville.syrj...@linux.intel.com

I had a few rainy days during my summer vacation so I decided to fix a
chromium-bsu texturing problem that had been nagging me for a while now.
I ended up fixing a few other things too that I spotted mostly from
reading the code.

The aniso vs. mip filter thing probably comes down to personal
preference, but at least to me aniso+mip nearest looks better than
trilinear. At least when playing the old classic glaxium :)

I have no idea if the scissor patch makes any difference anywhere. I
just caught the note in the spec and noticed we're doing it in the
opposite order.

The rest should be pretty clear.

Ville Syrjälä (9):
  i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2
  i915: Fix GL_DOT3_RGBA a bit
  i915: Use L8A8 instead of I8 to simulate A8 on gen2
  i915: Override mip filter to nearest with aniso
  i915: Accept GL_DEPTH_STENCIL GL_DEPTH_COMPONENT formats for renderbuffers
  i915: Kill intel_context::hw_stencil
  i915: Protect macro argument for TEXTURE_SET()
  i915: Don't call _mesa_meta_glsl_Clear() on gen2
  i915: Emit 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE

 src/mesa/drivers/dri/i915/i830_context.h    |  8 +++---
 src/mesa/drivers/dri/i915/i830_reg.h        |  2 +-
 src/mesa/drivers/dri/i915/i830_state.c      |  4 +--
 src/mesa/drivers/dri/i915/i830_texblend.c   |  5 ++--
 src/mesa/drivers/dri/i915/i830_texstate.c   |  4 +--
 src/mesa/drivers/dri/i915/i830_vtbl.c       | 39 +++--
 src/mesa/drivers/dri/i915/i915_context.c    |  3 ++-
 src/mesa/drivers/dri/i915/i915_context.h    |  8 +++---
 src/mesa/drivers/dri/i915/i915_state.c      |  4 +--
 src/mesa/drivers/dri/i915/i915_vtbl.c       |  8 +++---
 src/mesa/drivers/dri/i915/intel_clear.c     |  2 +-
 src/mesa/drivers/dri/i915/intel_context.c   |  1 -
 src/mesa/drivers/dri/i915/intel_context.h   |  1 -
 src/mesa/drivers/dri/i915/intel_fbo.c       |  9 +++
 src/mesa/drivers/dri/i915/intel_tex_image.c | 22
 15 files changed, 76 insertions(+), 44 deletions(-)

-- 
1.8.5.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/9] i915: Use L8A8 instead of I8 to simulate A8 on gen2
From: Ville Syrjälä ville.syrj...@linux.intel.com

Gen2 doesn't support the A8 texture format. Currently the driver
substitutes it with I8, but that results in incorrect RGB values. Use
A8L8 instead. We end up wasting a bit of memory, but at least we should
get the correct results.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72819
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80050
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38873
Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_texstate.c   |  2 --
 src/mesa/drivers/dri/i915/i915_context.c    |  3 ++-
 src/mesa/drivers/dri/i915/intel_tex_image.c | 22 ++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_texstate.c b/src/mesa/drivers/dri/i915/i830_texstate.c
index 58d3356..b1414c7 100644
--- a/src/mesa/drivers/dri/i915/i830_texstate.c
+++ b/src/mesa/drivers/dri/i915/i830_texstate.c
@@ -47,8 +47,6 @@ translate_texture_format(GLuint mesa_format)
       return MAPSURF_8BIT | MT_8BIT_L8;
    case MESA_FORMAT_I_UNORM8:
       return MAPSURF_8BIT | MT_8BIT_I8;
-   case MESA_FORMAT_A_UNORM8:
-      return MAPSURF_8BIT | MT_8BIT_I8; /* Kludge! */
    case MESA_FORMAT_L8A8_UNORM:
       return MAPSURF_16BIT | MT_16BIT_AY88;
    case MESA_FORMAT_B5G6R5_UNORM:
diff --git a/src/mesa/drivers/dri/i915/i915_context.c b/src/mesa/drivers/dri/i915/i915_context.c
index 7f43896..3fd571d 100644
--- a/src/mesa/drivers/dri/i915/i915_context.c
+++ b/src/mesa/drivers/dri/i915/i915_context.c
@@ -109,7 +109,8 @@ intel_init_texture_formats(struct gl_context *ctx)
    ctx->TextureFormatSupported[MESA_FORMAT_B5G5R5A1_UNORM] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_B5G6R5_UNORM] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_L_UNORM8] = true;
-   ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
+   if (intel->gen == 3)
+      ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_I_UNORM8] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_L8A8_UNORM] = true;
 
diff --git a/src/mesa/drivers/dri/i915/intel_tex_image.c b/src/mesa/drivers/dri/i915/intel_tex_image.c
index 57674b9..be9a4ff 100644
--- a/src/mesa/drivers/dri/i915/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i915/intel_tex_image.c
@@ -14,6 +14,7 @@
 #include "main/texobj.h"
 #include "main/teximage.h"
 #include "main/texstore.h"
+#include "main/texformat.h"
 
 #include "intel_context.h"
 #include "intel_mipmap_tree.h"
@@ -362,9 +363,30 @@ intel_image_target_texture_2d(struct gl_context *ctx, GLenum target,
                                   image->tile_x, image->tile_y);
 }
 
+static mesa_format intel_choose_tex_format(struct gl_context *ctx,
+                                           GLenum target,
+                                           GLint internalFormat,
+                                           GLenum format, GLenum type)
+{
+   struct intel_context *intel = intel_context(ctx);
+
+   switch (internalFormat) {
+   case GL_ALPHA:
+   case GL_ALPHA4:
+   case GL_ALPHA8:
+      /* no A8 on gen2 :( */
+      if (intel->gen == 2)
+         return MESA_FORMAT_L8A8_UNORM;
+      /* fall through */
+   default:
+      return _mesa_choose_tex_format(ctx, target, internalFormat, format, type);
+   }
+}
+
 void
 intelInitTextureImageFuncs(struct dd_function_table *functions)
 {
    functions->TexImage = intelTexImage;
+   functions->ChooseTextureFormat = intel_choose_tex_format;
    functions->EGLImageTargetTexture2D = intel_image_target_texture_2d;
 }
-- 
1.8.5.5
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
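The reasoning behind the format substitution above follows from how GL expands each base format when sampling: alpha gives (0, 0, 0, A), intensity gives (I, I, I, I) and luminance-alpha gives (L, L, L, A). The sketch below illustrates that, plus a hypothetical expansion of A8 texels into two-byte L8A8 texels. It assumes the luminance byte is stored as zero and comes first in memory; neither detail is stated in the patch, and all names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rgba8_sketch { uint8_t r, g, b, a; };

/* GL sampling rules for the three base formats involved. */
static struct rgba8_sketch
sample_i8(uint8_t i)
{
   return (struct rgba8_sketch){ i, i, i, i };
}

static struct rgba8_sketch
sample_a8(uint8_t a)
{
   return (struct rgba8_sketch){ 0, 0, 0, a };
}

static struct rgba8_sketch
sample_l8a8(uint8_t l, uint8_t a)
{
   return (struct rgba8_sketch){ l, l, l, a };
}

/*
 * Expand n alpha-only texels to luminance-alpha texels with L = 0.
 * Assumed byte order: luminance first, then alpha.
 */
static void
expand_a8_to_l8a8(const uint8_t *src, uint8_t *dst, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      dst[2 * i + 0] = 0;
      dst[2 * i + 1] = src[i];
   }
}
```

With L = 0 the luminance-alpha texel samples exactly like the alpha-only one, whereas the intensity substitute leaks the alpha value into RGB; the cost is two bytes per texel instead of one.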
[Mesa-dev] [PATCH 1/9] i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2
From: Ville Syrjälä ville.syrj...@linux.intel.com

Check that the target is GL_TEXTURE_CUBE_MAP before emitting TEXCOORDTYPE_VECTOR texture coordinates. I'm not sure if the hardware would like CARTESIAN coordinates with cube maps, and as I'm too lazy to find out just emit the VECTOR coordinates for cube maps always. For other targets use CARTESIAN or HOMOGENEOUS depending on the number of texture coordinates provided.

Fixes rendering of the electric background texture in chromium-bsu main menu. We appear to be provided with three texture coordinates there (I'm guessing due to the funky texture matrix rotation it does). So the code would decide to use TEXCOORDTYPE_VECTOR instead of TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texture. The results weren't what one might expect.

demos/cubemap still works, which hopefully indicates that this doesn't break things.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_vtbl.c | 37 ++-
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_vtbl.c b/src/mesa/drivers/dri/i915/i830_vtbl.c
index 53d408b..0f22d86 100644
--- a/src/mesa/drivers/dri/i915/i830_vtbl.c
+++ b/src/mesa/drivers/dri/i915/i830_vtbl.c
@@ -134,27 +134,28 @@ i830_render_start(struct intel_context *intel)
          GLuint mcs = (i830->state.Tex[i][I830_TEXREG_MCS] & ~TEXCOORDTYPE_MASK);
-         switch (sz) {
-         case 1:
-         case 2:
-            emit = EMIT_2F;
-            sz = 2;
-            mcs |= TEXCOORDTYPE_CARTESIAN;
-            break;
-         case 3:
+         if (intel->ctx.Texture.Unit[i]._Current->Target == GL_TEXTURE_CUBE_MAP) {
             emit = EMIT_3F;
             sz = 3;
             mcs |= TEXCOORDTYPE_VECTOR;
-            break;
-         case 4:
-            emit = EMIT_3F_XYW;
-            sz = 3;
-            mcs |= TEXCOORDTYPE_HOMOGENEOUS;
-            break;
-         default:
-            continue;
-         };
-
+         } else {
+            switch (sz) {
+            case 1:
+            case 2:
+            case 3:
+               emit = EMIT_2F;
+               sz = 2;
+               mcs |= TEXCOORDTYPE_CARTESIAN;
+               break;
+            case 4:
+               emit = EMIT_3F_XYW;
+               sz = 3;
+               mcs |= TEXCOORDTYPE_HOMOGENEOUS;
+               break;
+            default:
+               continue;
+            }
+         }
          EMIT_ATTR(_TNL_ATTRIB_TEX0 + i, emit, 0);
          v2 |= VRTX_TEX_SET_FMT(count, SZ_TO_HW(sz));
--
1.8.5.5
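The selection logic introduced by this patch can be sketched as a small standalone C function. This is only an illustration, not the driver code: the enum values and the name choose_coord_type() are hypothetical stand-ins for the TEXCOORDTYPE_* bits and for the loop body in i830_render_start().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's TEXCOORDTYPE_* values. */
enum coord_type {
   COORD_SKIP,          /* models the "continue" in the driver loop */
   COORD_CARTESIAN,
   COORD_HOMOGENEOUS,
   COORD_VECTOR
};

/* Cube maps always get VECTOR coordinates; every other target is chosen
 * from the number of texture coordinates supplied (sz). */
enum coord_type
choose_coord_type(bool is_cube_map, int sz)
{
   assert(sz >= 0);

   if (is_cube_map)
      return COORD_VECTOR;

   switch (sz) {
   case 1:
   case 2:
   case 3:
      return COORD_CARTESIAN;     /* three coords on a 2D texture stay 2D */
   case 4:
      return COORD_HOMOGENEOUS;
   default:
      return COORD_SKIP;
   }
}
```

With this shape, the chromium-bsu case described above (a 2D texture supplied with three coordinates) would map to COORD_CARTESIAN rather than COORD_VECTOR.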
[Mesa-dev] [PATCH 6/9] i915: Kill intel_context::hw_stencil
From: Ville Syrjälä ville.syrj...@linux.intel.com

ctx.hw_stencil is not used anywhere so kill it.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/intel_context.c | 1 -
 src/mesa/drivers/dri/i915/intel_context.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/intel_context.c b/src/mesa/drivers/dri/i915/intel_context.c
index 3104776..12a1d2b 100644
--- a/src/mesa/drivers/dri/i915/intel_context.c
+++ b/src/mesa/drivers/dri/i915/intel_context.c
@@ -507,7 +507,6 @@ intelInitContext(struct intel_context *intel,
    _mesa_meta_init(ctx);
-   intel->hw_stencil = mesaVis && mesaVis->stencilBits && mesaVis->depthBits == 24;
    intel->hw_stipple = 1;
    intel->RenderIndex = ~0;
diff --git a/src/mesa/drivers/dri/i915/intel_context.h b/src/mesa/drivers/dri/i915/intel_context.h
index fccf821..c314594 100644
--- a/src/mesa/drivers/dri/i915/intel_context.h
+++ b/src/mesa/drivers/dri/i915/intel_context.h
@@ -226,7 +226,6 @@ struct intel_context
    GLfloat polygon_offset_scale;   /* dependent on depth_scale, bpp */
-   bool hw_stencil;
    bool hw_stipple;
    bool no_rast;
    bool always_flush_batch;
--
1.8.5.5
[Mesa-dev] [PATCH 5/9] i915: Accept GL_DEPTH_STENCIL GL_DEPTH_COMPONENT formats for renderbuffers
From: Ville Syrjälä ville.syrj...@linux.intel.com

Gen2 doesn't support depth/stencil textures, and since

    commit c1d4d4999303f9167b20f4e0674b9436e6295cf7
    Author: Ville Syrjälä ville.syrj...@linux.intel.com
    Date:   Thu Apr 24 14:11:43 2014 +0300

        i915: Don't advertise Z formats in TextureFormatSupported on gen2

depth/stencil formats are no longer accepted as texture formats. However we still want depth/stencil renderbuffers, so add explicit format checks to intel_alloc_renderbuffer_storage() to allow such things.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/intel_fbo.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/intel_fbo.c b/src/mesa/drivers/dri/i915/intel_fbo.c
index a77c3d6..b260d16 100644
--- a/src/mesa/drivers/dri/i915/intel_fbo.c
+++ b/src/mesa/drivers/dri/i915/intel_fbo.c
@@ -180,6 +180,15 @@ intel_renderbuffer_format(struct gl_context * ctx, GLenum internalFormat)
       return intel->ctx.Driver.ChooseTextureFormat(ctx, GL_TEXTURE_2D,
                                                    internalFormat,
                                                    GL_NONE, GL_NONE);
+
+   case GL_DEPTH_COMPONENT16:
+      return MESA_FORMAT_Z_UNORM16;
+   case GL_DEPTH_COMPONENT:
+   case GL_DEPTH_COMPONENT24:
+   case GL_DEPTH_COMPONENT32:
+      return MESA_FORMAT_Z24_UNORM_X8_UINT;
+   case GL_DEPTH_STENCIL_EXT:
+   case GL_DEPTH24_STENCIL8_EXT:
    case GL_STENCIL_INDEX:
    case GL_STENCIL_INDEX1_EXT:
    case GL_STENCIL_INDEX4_EXT:
--
1.8.5.5
[Mesa-dev] [PATCH 9/9] i915: Emit 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE
From: Ville Syrjälä ville.syrj...@linux.intel.com

According to gen2 BSpec the pipeline must be flushed at least up to the windower before changing the scissor rect enable field. Emitting the 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE is sufficient to do that.

gen3 BSpec no longer has that piece of text, but let's make the same change there too for symmetry. The spec does still say that the scissor rectangle must be defined before enabling it, so the new order does seem more in line with the spec.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_context.h | 8
 src/mesa/drivers/dri/i915/i830_state.c   | 4 ++--
 src/mesa/drivers/dri/i915/i830_vtbl.c    | 2 +-
 src/mesa/drivers/dri/i915/i915_context.h | 8
 src/mesa/drivers/dri/i915/i915_state.c   | 4 ++--
 src/mesa/drivers/dri/i915/i915_vtbl.c    | 8
 6 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_context.h b/src/mesa/drivers/dri/i915/i830_context.h
index 1a7222d..09076c3 100644
--- a/src/mesa/drivers/dri/i915/i830_context.h
+++ b/src/mesa/drivers/dri/i915/i830_context.h
@@ -55,10 +55,10 @@
 #define I830_DESTREG_DBUFADDR1 3
 #define I830_DESTREG_DV0 4
 #define I830_DESTREG_DV1 5
-#define I830_DESTREG_SENABLE 6
-#define I830_DESTREG_SR0 7
-#define I830_DESTREG_SR1 8
-#define I830_DESTREG_SR2 9
+#define I830_DESTREG_SR0 6
+#define I830_DESTREG_SR1 7
+#define I830_DESTREG_SR2 8
+#define I830_DESTREG_SENABLE 9
 #define I830_DESTREG_DRAWRECT0 10
 #define I830_DESTREG_DRAWRECT1 11
 #define I830_DESTREG_DRAWRECT2 12
diff --git a/src/mesa/drivers/dri/i915/i830_state.c b/src/mesa/drivers/dri/i915/i830_state.c
index bae9204..3e379f3 100644
--- a/src/mesa/drivers/dri/i915/i830_state.c
+++ b/src/mesa/drivers/dri/i915/i830_state.c
@@ -1069,11 +1069,11 @@ i830_init_packets(struct i830_context *i830)
    i830->state.Stipple[I830_STPREG_ST0] = _3DSTATE_STIPPLE;
    i830->state.Buffer[I830_DESTREG_DV0] = _3DSTATE_DST_BUF_VARS_CMD;
-   i830->state.Buffer[I830_DESTREG_SENABLE] = (_3DSTATE_SCISSOR_ENABLE_CMD |
-                                               DISABLE_SCISSOR_RECT);
    i830->state.Buffer[I830_DESTREG_SR0] = _3DSTATE_SCISSOR_RECT_0_CMD;
    i830->state.Buffer[I830_DESTREG_SR1] = 0;
    i830->state.Buffer[I830_DESTREG_SR2] = 0;
+   i830->state.Buffer[I830_DESTREG_SENABLE] = (_3DSTATE_SCISSOR_ENABLE_CMD |
+                                               DISABLE_SCISSOR_RECT);
 }
 void
diff --git a/src/mesa/drivers/dri/i915/i830_vtbl.c b/src/mesa/drivers/dri/i915/i830_vtbl.c
index 0f22d86..91da977 100644
--- a/src/mesa/drivers/dri/i915/i830_vtbl.c
+++ b/src/mesa/drivers/dri/i915/i830_vtbl.c
@@ -511,10 +511,10 @@ i830_emit_state(struct intel_context *intel)
       OUT_BATCH(state->Buffer[I830_DESTREG_DV0]);
       OUT_BATCH(state->Buffer[I830_DESTREG_DV1]);
-      OUT_BATCH(state->Buffer[I830_DESTREG_SENABLE]);
       OUT_BATCH(state->Buffer[I830_DESTREG_SR0]);
       OUT_BATCH(state->Buffer[I830_DESTREG_SR1]);
       OUT_BATCH(state->Buffer[I830_DESTREG_SR2]);
+      OUT_BATCH(state->Buffer[I830_DESTREG_SENABLE]);
       assert(state->Buffer[I830_DESTREG_DRAWRECT0] != MI_NOOP);
       OUT_BATCH(state->Buffer[I830_DESTREG_DRAWRECT0]);
diff --git a/src/mesa/drivers/dri/i915/i915_context.h b/src/mesa/drivers/dri/i915/i915_context.h
index 34af202..10f1f8b 100644
--- a/src/mesa/drivers/dri/i915/i915_context.h
+++ b/src/mesa/drivers/dri/i915/i915_context.h
@@ -64,10 +64,10 @@
 #define I915_DESTREG_DBUFADDR1 4
 #define I915_DESTREG_DV0 6
 #define I915_DESTREG_DV1 7
-#define I915_DESTREG_SENABLE 8
-#define I915_DESTREG_SR0 9
-#define I915_DESTREG_SR1 10
-#define I915_DESTREG_SR2 11
+#define I915_DESTREG_SR0 8
+#define I915_DESTREG_SR1 9
+#define I915_DESTREG_SR2 10
+#define I915_DESTREG_SENABLE 11
 #define I915_DESTREG_DRAWRECT0 12
 #define I915_DESTREG_DRAWRECT1 13
 #define I915_DESTREG_DRAWRECT2 14
diff --git a/src/mesa/drivers/dri/i915/i915_state.c b/src/mesa/drivers/dri/i915/i915_state.c
index f31b271..203e4a0 100644
--- a/src/mesa/drivers/dri/i915/i915_state.c
+++ b/src/mesa/drivers/dri/i915/i915_state.c
@@ -988,11 +988,11 @@ i915_init_packets(struct i915_context *i915)
       i915->state.Buffer[I915_DESTREG_DV0] = _3DSTATE_DST_BUF_VARS_CMD;
       /* scissor */
-      i915->state.Buffer[I915_DESTREG_SENABLE] =
-         (_3DSTATE_SCISSOR_ENABLE_CMD | DISABLE_SCISSOR_RECT);
       i915->state.Buffer[I915_DESTREG_SR0] = _3DSTATE_SCISSOR_RECT_0_CMD;
       i915->state.Buffer[I915_DESTREG_SR1] = 0;
       i915->state.Buffer[I915_DESTREG_SR2] = 0;
+      i915->state.Buffer[I915_DESTREG_SENABLE] =
+         (_3DSTATE_SCISSOR_ENABLE_CMD | DISABLE_SCISSOR_RECT);
    }
    i915->state.RasterRules[I915_RASTER_RULES] = _3DSTATE_RASTER_RULES_CMD |
diff --git a/src/mesa/drivers/dri/i915/i915_vtbl.c b/src/mesa/drivers/dri/i915/i915_vtbl.c
index 74173d4..706e0c3 100644
--- a/src/mesa/drivers/dri/i915/i915_vtbl.c
+++
[Mesa-dev] [PATCH 8/9] i915: Don't call _mesa_meta_glsl_Clear() on gen2
From: Ville Syrjälä ville.syrj...@linux.intel.com

Gen2 doesn't have fragment shaders so we shouldn't be calling _mesa_meta_glsl_Clear() on gen2. Restore the appropriate ARB_fragment_shader check to the clear path which was lost in:

    commit 94f22fbe787214580a1a13a774114d2650c166cb
    Author: Tapani Pälli tapani.pa...@intel.com
    Date:   Wed Aug 8 20:46:45 2012 +0300

        intel: use _mesa_meta_Clear with OpenGL ES 1.1 v2

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/intel_clear.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i915/intel_clear.c b/src/mesa/drivers/dri/i915/intel_clear.c
index 1762c1d..5374e19 100644
--- a/src/mesa/drivers/dri/i915/intel_clear.c
+++ b/src/mesa/drivers/dri/i915/intel_clear.c
@@ -179,7 +179,7 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
    if (tri_mask) {
       debug_mask("tri", tri_mask);
-      if (ctx->API == API_OPENGLES)
+      if (ctx->API == API_OPENGLES || !ctx->Extensions.ARB_fragment_shader)
          _mesa_meta_Clear(&intel->ctx, tri_mask);
       else
          _mesa_meta_glsl_Clear(&intel->ctx, tri_mask);
--
1.8.5.5
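The restored condition can be read as a two-input decision. The sketch below models it in standalone C; the names choose_clear_path(), is_gles1 and has_fragment_shader are hypothetical and only stand in for ctx->API == API_OPENGLES and ctx->Extensions.ARB_fragment_shader in the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum clear_path {
   CLEAR_FIXED_FUNCTION,   /* corresponds to _mesa_meta_Clear() */
   CLEAR_GLSL              /* corresponds to _mesa_meta_glsl_Clear() */
};

/* The GLSL clear is only usable when the context is not GLES 1.x and
 * the hardware exposes fragment shaders. */
enum clear_path
choose_clear_path(bool is_gles1, bool has_fragment_shader)
{
   if (is_gles1 || !has_fragment_shader)
      return CLEAR_FIXED_FUNCTION;
   return CLEAR_GLSL;
}

/* Self-check covering all four input combinations. */
void
check_clear_path(void)
{
   assert(choose_clear_path(true, true) == CLEAR_FIXED_FUNCTION);
   assert(choose_clear_path(true, false) == CLEAR_FIXED_FUNCTION);
   assert(choose_clear_path(false, false) == CLEAR_FIXED_FUNCTION);
   assert(choose_clear_path(false, true) == CLEAR_GLSL);
}
```

The third combination (desktop GL without fragment shaders) is the gen2 case that the pre-patch condition sent down the GLSL path.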
[Mesa-dev] [PATCH 7/9] i915: Protect macro argument for TEXTURE_SET()
From: Ville Syrjälä ville.syrj...@linux.intel.com

TEXTURE_SET() is the only register macro that forgets to wrap the argument evaluation in parens. Only simple integers are passed to this macro so there's no bug but still it seems prudent to add the parens.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i915/i830_reg.h b/src/mesa/drivers/dri/i915/i830_reg.h
index e08cbe5..d7ccc16 100644
--- a/src/mesa/drivers/dri/i915/i830_reg.h
+++ b/src/mesa/drivers/dri/i915/i830_reg.h
@@ -256,7 +256,7 @@
 #define _3DSTATE_MAP_COORD_TRANSFORM ((3<<29)|(0x1d<<24)|(0x8c<<16))
 #define DISABLE_TEX_TRANSFORM (1<<28)
-#define TEXTURE_SET(x) (x<<29)
+#define TEXTURE_SET(x) ((x)<<29)
 #define _3DSTATE_VERTEX_TRANSFORM ((3<<29)|(0x1d<<24)|(0x8b<<16))
 #define DISABLE_VIEWPORT_TRANSFORM (1<<31)
--
1.8.5.5
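To see why the parentheses matter even though no current caller is affected, here is a small standalone C sketch. TEXTURE_SET_OLD and TEXTURE_SET_NEW are hypothetical copies of the macro before and after the patch, and the wrapper functions exist only for illustration; passing a compound expression such as a | b is an assumed misuse, not something the driver does today.

```c
#include <assert.h>

#define TEXTURE_SET_OLD(x) (x << 29)     /* before the patch */
#define TEXTURE_SET_NEW(x) ((x) << 29)   /* after the patch */

unsigned
texture_set_old(unsigned a, unsigned b)
{
   /* Expands to (a | b << 29); since << binds tighter than |,
    * only b is shifted. */
   return TEXTURE_SET_OLD(a | b);
}

unsigned
texture_set_new(unsigned a, unsigned b)
{
   /* Only three bits (29..31) survive the shift in a 32-bit value. */
   assert((a | b) < 8u);

   /* Expands to ((a | b) << 29); the whole argument is shifted. */
   return TEXTURE_SET_NEW(a | b);
}
```

For a = 1 and b = 2 the old macro yields 1 | (2 << 29) while the new one yields 3 << 29, which is the value the caller would have meant.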
[Mesa-dev] [PATCH 4/9] i915: Override mip filter to nearest with aniso
From: Ville Syrjälä ville.syrj...@linux.intel.com

gen2 doesn't support linear mip filter with anisotropic min/mag filtering. The hardware would automagically downgrade the min/mag filters to linear in such cases, which IMO looks worse than forcing the mip filter to nearest.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_texstate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i915/i830_texstate.c b/src/mesa/drivers/dri/i915/i830_texstate.c
index b1414c7..00731e6 100644
--- a/src/mesa/drivers/dri/i915/i830_texstate.c
+++ b/src/mesa/drivers/dri/i915/i830_texstate.c
@@ -225,6 +225,8 @@ i830_update_tex_unit(struct intel_context *intel, GLuint unit, GLuint ss3)
       if (sampler->MaxAnisotropy > 1.0) {
          minFilt = FILTER_ANISOTROPIC;
          magFilt = FILTER_ANISOTROPIC;
+         /* no trilinear + anisotropic */
+         mipFilt = MIPFILTER_NEAREST;
       } else {
          switch (sampler->MagFilter) {
--
1.8.5.5
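A minimal standalone sketch of the override, mirroring the hunk literally. The struct, the enums and apply_aniso_override() are hypothetical names introduced for illustration; in the driver the three values are the locals minFilt, magFilt and mipFilt.

```c
#include <assert.h>

enum filter { FILTER_NEAREST, FILTER_LINEAR, FILTER_ANISOTROPIC };
enum mip_filter { MIPFILTER_NONE, MIPFILTER_NEAREST, MIPFILTER_LINEAR };

struct filters {
   enum filter min;
   enum filter mag;
   enum mip_filter mip;
};

/* When anisotropic filtering is requested, force min/mag to anisotropic
 * and the mip filter to nearest, since gen2 cannot combine a linear mip
 * filter with anisotropic min/mag. Otherwise leave the filters alone. */
struct filters
apply_aniso_override(struct filters f, float max_aniso)
{
   assert(max_aniso >= 1.0f);   /* GL clamps max anisotropy to >= 1.0 */

   if (max_aniso > 1.0f) {
      f.min = FILTER_ANISOTROPIC;
      f.mag = FILTER_ANISOTROPIC;
      f.mip = MIPFILTER_NEAREST;   /* no trilinear + anisotropic */
   }
   return f;
}
```

Doing the downgrade in software keeps the anisotropic min/mag filters, whereas per the commit message the hardware would instead drop those to linear.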
[Mesa-dev] [PATCH 2/9] i915: Fix GL_DOT3_RGBA a bit
From: Ville Syrjälä ville.syrj...@linux.intel.com

The spec says using DOT4 for alpha is undefined unless DOT4 is also used for color. It seems to do the right thing anyway, but better safe than sorry.

Also override numAlphaArgs to 2 for DOT4 since that's what it wants. This might fix something in case the specified alpha mode has only one argument. Also avoids emitting a needless 3DSTATE_MAP_BLEND_ARG if the specified alpha mode has three arguments.

Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
---
 src/mesa/drivers/dri/i915/i830_texblend.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i830_texblend.c b/src/mesa/drivers/dri/i915/i830_texblend.c
index 236be59..feea383 100644
--- a/src/mesa/drivers/dri/i915/i830_texblend.c
+++ b/src/mesa/drivers/dri/i915/i830_texblend.c
@@ -136,7 +136,7 @@ i830SetTexEnvCombine(struct i830_context * i830,
                      GLuint texel_op, GLuint * state, const GLfloat * factor)
 {
    const GLuint numColorArgs = combine->_NumArgsRGB;
-   const GLuint numAlphaArgs = combine->_NumArgsA;
+   GLuint numAlphaArgs = combine->_NumArgsA;
    GLuint blendop;
    GLuint ablendop;
@@ -209,7 +209,7 @@ i830SetTexEnvCombine(struct i830_context * i830,
       break;
    case GL_DOT3_RGBA_EXT:
    case GL_DOT3_RGBA:
-      blendop = TEXBLENDOP_DOT3;
+      blendop = TEXBLENDOP_DOT4;
       break;
    default:
       return pass_through(state, blendUnit);
@@ -273,6 +273,7 @@ i830SetTexEnvCombine(struct i830_context * i830,
    if (combine->ModeRGB == GL_DOT3_RGBA_EXT ||
        combine->ModeRGB == GL_DOT3_RGBA) {
       ablendop = TEXBLENDOP_DOT4;
+      numAlphaArgs = 2;
       args_A[0] = TEXBLENDARG_FACTOR;   /* the global factor */
       args_A[1] = TEXBLENDARG_FACTOR;
       args_A[2] = TEXBLENDARG_FACTOR;
--
1.8.5.5
Re: [Mesa-dev] [PATCH] i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()
On Wed, Aug 6, 2014 at 9:50 AM, Neil Roberts n...@linux.intel.com wrote:
> I'd just like to point out that I made a nearly identical patch before this patch was posted but I didn't get any review despite prodding people a few times on #dri-devel. Maybe we should try to get into the habit of searching patchwork for existing patches before posting to the list. Does anyone have any suggestions for how I can get my patches more noticed?

I'm sorry Neil. I should've looked first on the mailing list to avoid this duplication of efforts. I'll try to look more often in patchwork for the patches pending review. I would also request everyone to keep the status of their patches on patchwork updated. That'll make it easier to go through pending patches. I usually send out a reminder on the mailing list to attract wider attention for my patches.

> http://patchwork.freedesktop.org/patch/27168/
>
> I also made a piglit test for the problem here:
>
> http://cgit.freedesktop.org/piglit/commit/?id=108a17a4d78bcc7480754d2104b4
>
> Regards,
> - Neil
>
> Jordan Justen jljus...@gmail.com writes:
>
>> Reviewed-by: Jordan Justen jordan.l.jus...@intel.com
>>
>> On Wed, Jul 16, 2014 at 3:32 PM, Anuj Phogat anuj.pho...@gmail.com wrote:
>>> The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL as base internal format and non-zero x, y offsets. Currently x, y offsets are ignored while updating the texture image.
>>>
>>> Fixes Khronos GLES3 CTS tests:
>>> npot_tex_sub_image_2d
>>> npot_tex_sub_image_3d
>>> npot_pbo_tex_sub_image_2d
>>> npot_pbo_tex_sub_image_2d
>>>
>>> Cc: mesa-sta...@lists.freedesktop.org
>>> Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
>>> ---
>>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>>> index 2ab0faa..b36ffc7 100644
>>> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>>> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>>> @@ -2129,9 +2129,9 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
>>>                                           x + s_image_x + map->x,
>>>                                           y + s_image_y + map->y,
>>>                                           brw->has_swizzling);
>>> -           ptrdiff_t z_offset = ((y + z_image_y) *
>>> +           ptrdiff_t z_offset = ((y + z_image_y + map->y) *
>>>                                    (z_mt->pitch / 4) +
>>> -                                 (x + z_image_x));
>>> +                                 (x + z_image_x + map->x));
>>>             if (map_z32f_x24s8) {
>>>                z_map[z_offset] = packed_map[(y * map->w + x) * 2 + 0];
>>> --
>>> 1.9.3
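The corrected offset arithmetic from the quoted patch can be sketched as a standalone C helper. The function name z_offset_elems() and its flat parameter list are hypothetical; in the driver the inputs come from the miptree image offsets (z_image_x, z_image_y), the map rectangle (map->x, map->y) and z_mt->pitch.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in 32-bit elements, of pixel (x, y) of a mapped sub-rectangle
 * inside the depth miptree. The map origin (map_x, map_y) must be added
 * on top of the image origin; leaving it out is the bug the patch fixes. */
ptrdiff_t
z_offset_elems(int x, int y,
               int image_x, int image_y,
               int map_x, int map_y,
               int pitch_bytes)
{
   assert(pitch_bytes % 4 == 0);   /* pitch is divided into 4-byte texels */

   return (ptrdiff_t)(y + image_y + map_y) * (pitch_bytes / 4) +
          (x + image_x + map_x);
}
```

With a zero map origin the result matches the pre-patch formula, which is consistent with the bug only showing up for glTexSubImage2D() calls with non-zero x, y offsets.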
Re: [Mesa-dev] [PATCH 04/11] i965/blorp: Put sampler index in src1 of texture ops
Yes -- there's no interaction between patches 3 and 4. Blorp only shares the generator.

On Tue, Aug 5, 2014 at 1:32 PM, Ian Romanick i...@freedesktop.org wrote:
> Does it still build with patch 3 without patch 4?
>
> On 08/04/2014 01:58 AM, Chris Forbes wrote:
>> Signed-off-by: Chris Forbes chr...@ijw.co.nz
>> ---
>>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>> index c1676a9..7d4b327 100644
>> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>> @@ -78,7 +78,8 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct brw_reg &dst,
>>                                            unsigned base_mrf,
>>                                            unsigned msg_length)
>>  {
>> -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf));
>> +   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
>> +                                         fs_reg(0u));
>>     inst->base_mrf = base_mrf;
>>     inst->mlen = msg_length;
Re: [Mesa-dev] [PATCH v2 0/12] Add support for BPTC texture compression
Does this actually work on all Gen7? The IVB PRM Vol 4 Part 1 Page 83 says:

    Errata: BC6H_SF16, BC6H_UF16, and BC7_SRGB are not supported and may result in data corruption if used.

On Thu, Aug 7, 2014 at 4:27 AM, Neil Roberts n...@linux.intel.com wrote:
> Here is a v2 of the BPTC texture compression series. The main difference is that instead of going via DXT3 for the UNORM formats it now always uses the custom naïve compressor for all formats. This doesn't give very good-looking results but it is fast and doesn't add any dependencies. There was some discussion about alternative approaches on the list here:
>
> http://lists.freedesktop.org/archives/mesa-dev/2014-July/064436.html
>
> I didn't manage to get any consensus on whether this approach is the right thing to do so I thought I would just post the patches and see what happens.
>
> The other changes are:
>
> • The patches are rebased on top of Jason Ekstrand's texstore changes. This required some modification to format_info.py.
> • Added a patch to make glGenerateMipmap work with the BPTC formats.
> • Added a patch to make the meta implementation of glGetTexImage work with the two floating-point formats.
> • Added the formats to some format query functions that were missed. (There are a lot of switches for formats spread around Mesa!)
> • Fixed setting the alpha component to 1.0 when fetching from the RGB half-float formats.
> • Fixed fetching the alpha component from sRGB formats.
> • Fixed the quantization step for the half-float compressor.
> • Fixed a typo causing a bug in the compressor for textures with a width that isn't a multiple of four.
>
> The patches are also available on Github here:
>
> https://github.com/bpeel/mesa/commits/wip/bptc
>
> There are piglit tests for BPTC in a branch here:
>
> https://github.com/bpeel/piglit/commits/wip/bptc
>
> - Neil
[Mesa-dev] Merging VC4 driver
I'd like to start merging the VC4 driver. I've got a lot of things working under sim (piglit's at 5212/6726 for a slightly-trimmed quick.py), and once I find where I put my serial cable I hope to get the kernel fixed up and passing even more than that on HW.

I'm at 80 commits right now, with 3 initial huge commits then actual incremental development. I don't think other people are going to want to review all of this (45 files changed, 9277 insertions(+), 5 deletions(-)), so I'm feeling ready to go ahead on my own.

What I'll throw out for (maybe) review, though, is the stuff outside of the driver:

diff --git a/configure.ac b/configure.ac
index a3b3abd..9679c4c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -724,7 +724,7 @@ GALLIUM_DRIVERS_DEFAULT=r300,r600,svga,swrast
 AC_ARG_WITH([gallium-drivers],
     [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
         [comma delimited Gallium drivers list, e.g.
-        i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast
+        i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4
         @<:@default=r300,r600,svga,swrast@:>@])],
     [with_gallium_drivers=$withval],
     [with_gallium_drivers=$GALLIUM_DRIVERS_DEFAULT])
@@ -2003,6 +2003,19 @@ if test -n $with_gallium_drivers; then
                 GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri/kms-swrast"
             fi
             ;;
+        xvc4)
+            HAVE_GALLIUM_VC4=yes
+            gallium_require_drm_loader
+            GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS vc4"
+            gallium_check_st vc4/drm dri-vc4
+            DRICOMMON_NEED_LIBDRM=yes
+
+            case $host_cpu in
+                i?86 | x86_64 | amd64)
+                USE_VC4_SIMULATOR=yes
+                ;;
+            esac
+            ;;
         *)
             AC_MSG_ERROR([Unknown Gallium driver: $driver])
             ;;
@@ -2064,6 +2077,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_NOUVEAU, test x$HAVE_GALLIUM_NOUVEAU = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test x$HAVE_GALLIUM_FREEDRENO = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test x$HAVE_GALLIUM_SOFTPIPE = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test x$HAVE_GALLIUM_LLVMPIPE = xyes)
+AM_CONDITIONAL(HAVE_GALLIUM_VC4, test x$HAVE_GALLIUM_VC4 = xyes)
 AM_CONDITIONAL(NEED_GALLIUM_SOFTPIPE_DRIVER, test x$HAVE_GALLIUM_SVGA = xyes -o \
                                                   x$HAVE_GALLIUM_SOFTPIPE = xyes)
@@ -2129,6 +2143,7 @@ AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test x$enable_gallium_loader = xyes)
 AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = xyes)
 AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
 AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
+AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
 AC_SUBST([ELF_LIB])
@@ -2201,6 +2216,7 @@ AC_CONFIG_FILES([Makefile
 		src/gallium/drivers/softpipe/Makefile
 		src/gallium/drivers/svga/Makefile
 		src/gallium/drivers/trace/Makefile
+		src/gallium/drivers/vc4/Makefile
 		src/gallium/state_trackers/Makefile
 		src/gallium/state_trackers/clover/Makefile
 		src/gallium/state_trackers/dri/Makefile
@@ -2243,6 +2259,7 @@ AC_CONFIG_FILES([Makefile
 		src/gallium/winsys/sw/wayland/Makefile
 		src/gallium/winsys/sw/wrapper/Makefile
 		src/gallium/winsys/sw/xlib/Makefile
+		src/gallium/winsys/vc4/drm/Makefile
 		src/gbm/Makefile
 		src/gbm/main/gbm.pc
 		src/glsl/Makefile
diff --git a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
index 5d02da7..4ef94de 100644
--- a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
+++ b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
@@ -54,6 +54,10 @@
 #include "freedreno/drm/freedreno_drm_public.h"
 #endif
+#if GALLIUM_VC4
+#include "vc4/drm/vc4_drm_public.h"
+#endif
+
 static char* driver_name = NULL;
 /* XXX: We need to teardown the winsys if *screen_create() fails. */
@@ -286,6 +290,48 @@ pipe_freedreno_create_screen(int fd)
 }
 #endif
+#if defined(GALLIUM_VC4)
+#if defined(DRI_TARGET)
+
+const __DRIextension **__driDriverGetExtensions_vc4(void);
+
+PUBLIC const __DRIextension **__driDriverGetExtensions_vc4(void)
+{
+   globalDriverAPI = &galliumdrm_driver_api;
+   return galliumdrm_driver_extensions;
+}
+
+#if defined(USE_VC4_SIMULATOR)
+const __DRIextension **__driDriverGetExtensions_i965(void);
+
+/**
+ * When building using the simulator (on x86), we advertise ourselves as the
+ * i965 driver so that you can just make a directory with a link from
+ * i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
+ * on your i965-using host to run the driver under simulation.
+ *
+ * This is, of course, incompatible with building with the ilo driver, but you
+ * shouldn't be building that anyway.
Re: [Mesa-dev] [PATCH 3/9] i915: Use L8A8 instead of I8 to simulate A8 on gen2
ville.syrj...@linux.intel.com writes:

> From: Ville Syrjälä ville.syrj...@linux.intel.com
>
> Gen2 doesn't support the A8 texture format. Currently the driver substitutes it with I8, but that results in incorrect RGB values. Use A8L8 instead. We end up wasting a bit of memory, but at least we should get the correct results.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72819
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80050
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38873
> Signed-off-by: Ville Syrjälä ville.syrj...@linux.intel.com
> ---
>  src/mesa/drivers/dri/i915/i830_texstate.c   | 2 --
>  src/mesa/drivers/dri/i915/i915_context.c    | 3 ++-
>  src/mesa/drivers/dri/i915/intel_tex_image.c | 22 ++
>  3 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i915/i830_texstate.c b/src/mesa/drivers/dri/i915/i830_texstate.c
> index 58d3356..b1414c7 100644
> --- a/src/mesa/drivers/dri/i915/i830_texstate.c
> +++ b/src/mesa/drivers/dri/i915/i830_texstate.c
> @@ -47,8 +47,6 @@ translate_texture_format(GLuint mesa_format)
>        return MAPSURF_8BIT | MT_8BIT_L8;
>     case MESA_FORMAT_I_UNORM8:
>        return MAPSURF_8BIT | MT_8BIT_I8;
> -   case MESA_FORMAT_A_UNORM8:
> -      return MAPSURF_8BIT | MT_8BIT_I8; /* Kludge! */
>     case MESA_FORMAT_L8A8_UNORM:
>        return MAPSURF_16BIT | MT_16BIT_AY88;
>     case MESA_FORMAT_B5G6R5_UNORM:
> diff --git a/src/mesa/drivers/dri/i915/i915_context.c b/src/mesa/drivers/dri/i915/i915_context.c
> index 7f43896..3fd571d 100644
> --- a/src/mesa/drivers/dri/i915/i915_context.c
> +++ b/src/mesa/drivers/dri/i915/i915_context.c
> @@ -109,7 +109,8 @@ intel_init_texture_formats(struct gl_context *ctx)
>     ctx->TextureFormatSupported[MESA_FORMAT_B5G5R5A1_UNORM] = true;
>     ctx->TextureFormatSupported[MESA_FORMAT_B5G6R5_UNORM] = true;
>     ctx->TextureFormatSupported[MESA_FORMAT_L_UNORM8] = true;
> -   ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
> +   if (intel->gen == 3)
> +      ctx->TextureFormatSupported[MESA_FORMAT_A_UNORM8] = true;
>     ctx->TextureFormatSupported[MESA_FORMAT_I_UNORM8] = true;
>     ctx->TextureFormatSupported[MESA_FORMAT_L8A8_UNORM] = true;
> diff --git a/src/mesa/drivers/dri/i915/intel_tex_image.c b/src/mesa/drivers/dri/i915/intel_tex_image.c
> index 57674b9..be9a4ff 100644
> --- a/src/mesa/drivers/dri/i915/intel_tex_image.c
> +++ b/src/mesa/drivers/dri/i915/intel_tex_image.c
> @@ -14,6 +14,7 @@
>  #include "main/texobj.h"
>  #include "main/teximage.h"
>  #include "main/texstore.h"
> +#include "main/texformat.h"
>  #include "intel_context.h"
>  #include "intel_mipmap_tree.h"
> @@ -362,9 +363,30 @@ intel_image_target_texture_2d(struct gl_context *ctx, GLenum target,
>                                    image->tile_x, image->tile_y);
>  }
> +static mesa_format intel_choose_tex_format(struct gl_context *ctx,
> +                                           GLenum target,
> +                                           GLint internalFormat,
> +                                           GLenum format, GLenum type)
> +{
> +   struct intel_context *intel = intel_context(ctx);
> +
> +   switch (internalFormat) {
> +   case GL_ALPHA:
> +   case GL_ALPHA4:
> +   case GL_ALPHA8:
> +      /* no A8 on gen2 :( */
> +      if (intel->gen == 2)
> +         return MESA_FORMAT_L8A8_UNORM;
> +      /* fall through */
> +   default:
> +      return _mesa_choose_tex_format(ctx, target, internalFormat, format, type);
> +   }
> +}

Instead, I'd rather see _mesa_choose_tex_format just grow another case:

    RETURN_IF_SUPPORTED(MESA_FORMAT_L8A8_UNORM);
Re: [Mesa-dev] [PATCH 04/11] i965/blorp: Put sampler index in src1 of texture ops
On 08/06/2014 12:40 PM, Chris Forbes wrote:
> Yes -- there's no interaction between patches 3 and 4. Blorp only shares the generator.

Okay. Then these two are also

Reviewed-by: Ian Romanick ian.d.roman...@intel.com

> On Tue, Aug 5, 2014 at 1:32 PM, Ian Romanick i...@freedesktop.org wrote:
>> Does it still build with patch 3 without patch 4?
>>
>> On 08/04/2014 01:58 AM, Chris Forbes wrote:
>>> Signed-off-by: Chris Forbes chr...@ijw.co.nz
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>>> index c1676a9..7d4b327 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>>> @@ -78,7 +78,8 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct brw_reg &dst,
>>>                                            unsigned base_mrf,
>>>                                            unsigned msg_length)
>>>  {
>>> -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf));
>>> +   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
>>> +                                         fs_reg(0u));
>>>     inst->base_mrf = base_mrf;
>>>     inst->mlen = msg_length;
Re: [Mesa-dev] [PATCH 8/9] i915: Don't call _mesa_meta_glsl_Clear() on gen2
ville.syrj...@linux.intel.com writes:

> From: Ville Syrjälä ville.syrj...@linux.intel.com
>
> Gen2 doesn't have fragmnts shaders so we shouldn't be calling

                spelling^

Other than that, patches 4-9 are:

Reviewed-by: Eric Anholt e...@anholt.net
Re: [Mesa-dev] [PATCH 04/11] i965/blorp: Put sampler index in src1 of texture ops
OK, assuming no one complains, I'll push the series later today.

On Thu, Aug 7, 2014 at 8:44 AM, Ian Romanick i...@freedesktop.org wrote:
> On 08/06/2014 12:40 PM, Chris Forbes wrote:
>> Yes -- there's no interaction between patches 3 and 4. Blorp only shares the generator.
>
> Okay. Then these two are also
>
> Reviewed-by: Ian Romanick ian.d.roman...@intel.com
>
>> On Tue, Aug 5, 2014 at 1:32 PM, Ian Romanick i...@freedesktop.org wrote:
>>> Does it still build with patch 3 without patch 4?
>>>
>>> On 08/04/2014 01:58 AM, Chris Forbes wrote:
>>>> Signed-off-by: Chris Forbes chr...@ijw.co.nz
>>>> ---
>>>>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>>>> index c1676a9..7d4b327 100644
>>>> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>>>> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
>>>> @@ -78,7 +78,8 @@ brw_blorp_eu_emitter::emit_texture_lookup(const struct brw_reg &dst,
>>>>                                            unsigned base_mrf,
>>>>                                            unsigned msg_length)
>>>>  {
>>>> -   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf));
>>>> +   fs_inst *inst = new (mem_ctx) fs_inst(op, dst, brw_message_reg(base_mrf),
>>>> +                                         fs_reg(0u));
>>>>     inst->base_mrf = base_mrf;
>>>>     inst->mlen = msg_length;
Re: [Mesa-dev] Merging VC4 driver
On 06.08.2014 22:33, Eric Anholt wrote:
> I'd like to start merging the VC4 driver. I've got a lot of things working
> under sim (piglit's at 5212/6726 for a slightly-trimmed quick.py), and once
> I find where I put my serial cable I hope to get the kernel fixed up and
> passing even more than that on HW.
>
> I'm at 80 commits right now, with 3 initial huge commits then actual
> incremental development. I don't think other people are going to want to
> review all of this (45 files changed, 9277 insertions(+), 5 deletions(-)),
> so I'm feeling ready to go ahead on my own. What I'll throw out for (maybe)
> review, though, is the stuff outside of the driver:
>
> diff --git a/configure.ac b/configure.ac
> index a3b3abd..9679c4c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -724,7 +724,7 @@ GALLIUM_DRIVERS_DEFAULT="r300,r600,svga,swrast"
>  AC_ARG_WITH([gallium-drivers],
>      [AS_HELP_STRING([--with-gallium-drivers@<:@=DIRS...@:>@],
>          [comma delimited Gallium drivers list, e.g.
> -        "i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast"
> +        "i915,ilo,nouveau,r300,r600,radeonsi,freedreno,svga,swrast,vc4"
>          @<:@default=r300,r600,svga,swrast@:>@])],
>      [with_gallium_drivers="$withval"],
>      [with_gallium_drivers="$GALLIUM_DRIVERS_DEFAULT"])
> @@ -2003,6 +2003,19 @@ if test -n "$with_gallium_drivers"; then
>                  GALLIUM_TARGET_DIRS="$GALLIUM_TARGET_DIRS dri/kms-swrast"
>              fi
>              ;;
> +        xvc4)
> +            HAVE_GALLIUM_VC4=yes
> +            gallium_require_drm_loader
> +            GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS vc4"
> +            gallium_check_st "vc4/drm" "dri-vc4"
> +            DRICOMMON_NEED_LIBDRM=yes
> +
> +            case "$host_cpu" in
> +                i?86 | x86_64 | amd64)
> +                USE_VC4_SIMULATOR=yes
> +                ;;
> +            esac
> +            ;;
>          *)
>              AC_MSG_ERROR([Unknown Gallium driver: $driver])
>              ;;
> @@ -2064,6 +2077,7 @@ AM_CONDITIONAL(HAVE_GALLIUM_NOUVEAU, test "x$HAVE_GALLIUM_NOUVEAU" = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_FREEDRENO, test "x$HAVE_GALLIUM_FREEDRENO" = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_SOFTPIPE, test "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_LLVMPIPE, test "x$HAVE_GALLIUM_LLVMPIPE" = xyes)
> +AM_CONDITIONAL(HAVE_GALLIUM_VC4, test "x$HAVE_GALLIUM_VC4" = xyes)
>
>  AM_CONDITIONAL(NEED_GALLIUM_SOFTPIPE_DRIVER, test "x$HAVE_GALLIUM_SVGA" = xyes -o \
>                                                    "x$HAVE_GALLIUM_SOFTPIPE" = xyes)
> @@ -2129,6 +2143,7 @@ AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test x$enable_gallium_loader = xyes)
>  AM_CONDITIONAL(HAVE_DRM_LOADER_GALLIUM, test x$enable_gallium_drm_loader = xyes)
>  AM_CONDITIONAL(HAVE_GALLIUM_COMPUTE, test x$enable_opencl = xyes)
>  AM_CONDITIONAL(HAVE_MESA_LLVM, test x$MESA_LLVM = x1)
> +AM_CONDITIONAL(USE_VC4_SIMULATOR, test x$USE_VC4_SIMULATOR = xyes)
>
>  AC_SUBST([ELF_LIB])
>
> @@ -2201,6 +2216,7 @@ AC_CONFIG_FILES([Makefile
>  		src/gallium/drivers/softpipe/Makefile
>  		src/gallium/drivers/svga/Makefile
>  		src/gallium/drivers/trace/Makefile
> +		src/gallium/drivers/vc4/Makefile
>  		src/gallium/state_trackers/Makefile
>  		src/gallium/state_trackers/clover/Makefile
>  		src/gallium/state_trackers/dri/Makefile
> @@ -2243,6 +2259,7 @@ AC_CONFIG_FILES([Makefile
>  		src/gallium/winsys/sw/wayland/Makefile
>  		src/gallium/winsys/sw/wrapper/Makefile
>  		src/gallium/winsys/sw/xlib/Makefile
> +		src/gallium/winsys/vc4/drm/Makefile
>  		src/gbm/Makefile
>  		src/gbm/main/gbm.pc
>  		src/glsl/Makefile
> diff --git a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
> index 5d02da7..4ef94de 100644
> --- a/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
> +++ b/src/gallium/auxiliary/target-helpers/inline_drm_helper.h
> @@ -54,6 +54,10 @@
>  #include "freedreno/drm/freedreno_drm_public.h"
>  #endif
>
> +#if GALLIUM_VC4
> +#include "vc4/drm/vc4_drm_public.h"
> +#endif
> +
>  static char* driver_name = NULL;
>
>  /* XXX: We need to teardown the winsys if *screen_create() fails. */
> @@ -286,6 +290,48 @@ pipe_freedreno_create_screen(int fd)
>  }
>  #endif
>
> +#if defined(GALLIUM_VC4)
> +#if defined(DRI_TARGET)
> +
> +const __DRIextension **__driDriverGetExtensions_vc4(void);
> +
> +PUBLIC const __DRIextension **__driDriverGetExtensions_vc4(void)
> +{
> +   globalDriverAPI = &galliumdrm_driver_api;
> +   return galliumdrm_driver_extensions;
> +}
> +
> +#if defined(USE_VC4_SIMULATOR)
> +const __DRIextension **__driDriverGetExtensions_i965(void);
> +
> +/**
> + * When building using the simulator (on x86), we advertise ourselves as the
> + * i965 driver so that you can just make a directory with a link from
> + * i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
> + * on your i965-using host to run the driver
[Mesa-dev] [PATCH 3/3] r600, radeonsi: Copy implicit args provided by clover
Signed-off-by: Jan Vesely jan.ves...@rutgers.edu --- src/gallium/drivers/r600/evergreen_compute.c | 14 -- src/gallium/drivers/r600/evergreen_compute.h | 1 - src/gallium/drivers/radeonsi/si_compute.c| 6 +++--- 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index d50f343..37910fb 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -268,11 +268,12 @@ static void evergreen_bind_compute_state(struct pipe_context *ctx_, void *state) * (x,y,z) * DWORDS 9+ : Kernel parameters */ -void evergreen_compute_upload_input( +static void evergreen_compute_upload_input( struct pipe_context *ctx_, const uint *block_layout, const uint *grid_layout, - const void *input) + const void *input, + size_t kinput_size) { struct r600_context *ctx = (struct r600_context *)ctx_; struct r600_pipe_compute *shader = ctx-cs_shader_state.shader; @@ -280,7 +281,7 @@ void evergreen_compute_upload_input( /* We need to reserve 9 dwords (36 bytes) for implicit kernel * parameters. 
*/ - unsigned input_size = shader-input_size + 36; + unsigned input_size = kinput_size + 36; uint32_t * num_work_groups_start; uint32_t * global_size_start; uint32_t * local_size_start; @@ -320,7 +321,7 @@ void evergreen_compute_upload_input( memcpy(local_size_start, block_layout, 3 * sizeof(uint)); /* Copy the kernel inputs */ - memcpy(kernel_parameters_start, input, shader-input_size); + memcpy(kernel_parameters_start, input, kinput_size); for (i = 0; i (input_size / 4); i++) { COMPUTE_DBG(ctx-screen, input %i : %u\n, i, @@ -541,7 +542,7 @@ void evergreen_emit_cs_shader( static void evergreen_launch_grid( struct pipe_context *ctx_, const uint *block_layout, const uint *grid_layout, - uint32_t pc, const void *input, size_t size) + uint32_t pc, const void *input, size_t input_size) { struct r600_context *ctx = (struct r600_context *)ctx_; @@ -584,7 +585,8 @@ static void evergreen_launch_grid( #endif shader-active_kernel = kernel; ctx-cs_shader_state.kernel_index = pc; - evergreen_compute_upload_input(ctx_, block_layout, grid_layout, input); + evergreen_compute_upload_input(ctx_, block_layout, grid_layout, input, + input_size); compute_emit_cs(ctx, block_layout, grid_layout); } diff --git a/src/gallium/drivers/r600/evergreen_compute.h b/src/gallium/drivers/r600/evergreen_compute.h index 4fb53a1..570ab2a 100644 --- a/src/gallium/drivers/r600/evergreen_compute.h +++ b/src/gallium/drivers/r600/evergreen_compute.h @@ -40,7 +40,6 @@ struct r600_resource_global { void *evergreen_create_compute_state(struct pipe_context *ctx, const struct pipe_compute_state *cso); void evergreen_delete_compute_state(struct pipe_context *ctx, void *state); -void evergreen_compute_upload_input(struct pipe_context *context, const uint *block_layout, const uint *grid_layout, const void *input); void evergreen_init_atom_start_compute_cs(struct r600_context *rctx); void evergreen_init_compute_state_functions(struct r600_context *rctx); void evergreen_emit_cs_shader(struct r600_context *rctx, 
struct r600_atom * atom); diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 9a90470..66df65f 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -162,7 +162,7 @@ static unsigned compute_num_waves_for_scratch( static void si_launch_grid( struct pipe_context *ctx, const uint *block_layout, const uint *grid_layout, - uint32_t pc, const void *input, size_t size) + uint32_t pc, const void *input, size_t input_size) { struct si_context *sctx = (struct si_context*)ctx; struct si_pipe_compute *program = sctx-cs_shader_state.program; @@ -197,7 +197,7 @@ static void si_launch_grid( /* Upload the kernel arguments */ /* The extra num_work_size_bytes are for work group / work item size information */ - kernel_args_size = program-input_size + num_work_size_bytes + 8 /* For scratch va */; + kernel_args_size = input_size + num_work_size_bytes + 8 /* For scratch va */; kernel_args = MALLOC(kernel_args_size); for (i = 0; i 3; i++) { @@ -209,7 +209,7 @@ static void si_launch_grid( num_waves_for_scratch = compute_num_waves_for_scratch( sctx-screen-b.info, block_layout, grid_layout); - memcpy(kernel_args + (num_work_size_bytes / 4), input, program-input_size); + memcpy(kernel_args + (num_work_size_bytes / 4), input,
[Mesa-dev] [PATCH 2/3] clover: Add work dimension implicit param to input
Signed-off-by: Jan Vesely jan.ves...@rutgers.edu --- src/gallium/state_trackers/clover/core/kernel.cpp | 162 -- 1 file changed, 85 insertions(+), 77 deletions(-) diff --git a/src/gallium/state_trackers/clover/core/kernel.cpp b/src/gallium/state_trackers/clover/core/kernel.cpp index 68e91d5..7a88de1 100644 --- a/src/gallium/state_trackers/clover/core/kernel.cpp +++ b/src/gallium/state_trackers/clover/core/kernel.cpp @@ -28,6 +28,82 @@ using namespace clover; +namespace { + templatetypename T + std::vectoruint8_t + bytes(const T x) { + return { (uint8_t *)x, (uint8_t *)x + sizeof(x) }; + } + + /// + /// Transform buffer \a v from the native byte order into the byte + /// order specified by \a e. + /// + templatetypename T + void + byteswap(T v, pipe_endian e) { + if (PIPE_ENDIAN_NATIVE != e) + std::reverse(v.begin(), v.end()); + } + + /// + /// Pad buffer \a v to the next multiple of \a n. + /// + templatetypename T + void + align(T v, size_t n) { + v.resize(util_align_npot(v.size(), n)); + } + + bool + msb(const std::vectoruint8_t s) { + if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE) + return s.back() 0x80; + else + return s.front() 0x80; + } + + /// + /// Resize buffer \a v to size \a n using sign or zero extension + /// according to \a ext. + /// + templatetypename T + void + extend(T v, enum module::argument::ext_type ext, size_t n) { + const size_t m = std::min(v.size(), n); + const bool sign_ext = (ext == module::argument::sign_ext); + const uint8_t fill = (sign_ext msb(v) ? ~0 : 0); + T w(n, fill); + + if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE) + std::copy_n(v.begin(), m, w.begin()); + else + std::copy_n(v.end() - m, m, w.end() - m); + + std::swap(v, w); + } + + /// + /// Append buffer \a w to \a v. + /// + templatetypename T + void + insert(T v, const T w) { + v.insert(v.end(), w.begin(), w.end()); + } + + /// + /// Append \a n elements to the end of buffer \a v. 
+ /// + templatetypename T + size_t + allocate(T v, size_t n) { + size_t pos = v.size(); + v.resize(pos + n); + return pos; + } +} + kernel::kernel(clover::program prog, const std::string name, const std::vectormodule::argument margs) : program(prog), _name(name), exec(*this) { @@ -77,6 +153,10 @@ kernel::launch(command_queue q, return (uint32_t *)exec.input[h]; }, exec.g_handles); + // Implicit arguments + auto dims = bytes(cl_uint(block_size.size())); + byteswap(dims, q.device().endianness()); + q.pipe-bind_compute_state(q.pipe, st); q.pipe-bind_sampler_states(q.pipe, PIPE_SHADER_COMPUTE, 0, exec.samplers.size(), @@ -89,11 +169,15 @@ kernel::launch(command_queue q, q.pipe-set_global_binding(q.pipe, 0, exec.g_buffers.size(), exec.g_buffers.data(), g_handles.data()); + // Create local copy for implicit arguments + auto local_input = exec.input; + insert(local_input, dims); + q.pipe-launch_grid(q.pipe, pad_vector(q, block_size, 1).data(), pad_vector(q, reduced_grid_size, 1).data(), find(name_equals(_name), m.syms).offset, - exec.input.data(), exec.input.size()); + local_input.data(), local_input.size()); q.pipe-set_global_binding(q.pipe, 0, exec.g_buffers.size(), NULL, NULL); q.pipe-set_compute_resources(q.pipe, 0, exec.resources.size(), NULL); @@ -206,82 +290,6 @@ kernel::exec_context::unbind() { mem_local = 0; } -namespace { - templatetypename T - std::vectoruint8_t - bytes(const T x) { - return { (uint8_t *)x, (uint8_t *)x + sizeof(x) }; - } - - /// - /// Transform buffer \a v from the native byte order into the byte - /// order specified by \a e. - /// - templatetypename T - void - byteswap(T v, pipe_endian e) { - if (PIPE_ENDIAN_NATIVE != e) - std::reverse(v.begin(), v.end()); - } - - /// - /// Pad buffer \a v to the next multiple of \a n. 
- /// - templatetypename T - void - align(T v, size_t n) { - v.resize(util_align_npot(v.size(), n)); - } - - bool - msb(const std::vectoruint8_t s) { - if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE) - return s.back() 0x80; - else - return s.front() 0x80; - } - - /// - /// Resize buffer \a v to size \a n using sign or zero extension - /// according to \a ext. - /// - templatetypename T - void - extend(T v, enum module::argument::ext_type ext, size_t n) { - const size_t m = std::min(v.size(), n); - const bool sign_ext = (ext == module::argument::sign_ext); - const uint8_t fill = (sign_ext msb(v) ? ~0 : 0); - T w(n, fill); - - if (PIPE_ENDIAN_NATIVE == PIPE_ENDIAN_LITTLE) -
[Mesa-dev] [PATCH 1/3] gallium: Pass input data size to launch_grid
Future commits add implicit parameters, so we can no longer rely on the
shader param size.

Signed-off-by: Jan Vesely jan.ves...@rutgers.edu
---
 src/gallium/drivers/ilo/ilo_gpgpu.c | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 4 +--
 src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 2 +-
 src/gallium/drivers/r600/evergreen_compute.c | 2 +-
 src/gallium/drivers/radeonsi/si_compute.c | 2 +-
 src/gallium/include/pipe/p_context.h | 2 +-
 src/gallium/state_trackers/clover/core/kernel.cpp | 2 +-
 src/gallium/tests/trivial/compute.c | 40 +++
 9 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/src/gallium/drivers/ilo/ilo_gpgpu.c b/src/gallium/drivers/ilo/ilo_gpgpu.c
index b17a518..d995db2 100644
--- a/src/gallium/drivers/ilo/ilo_gpgpu.c
+++ b/src/gallium/drivers/ilo/ilo_gpgpu.c
@@ -35,7 +35,7 @@
 static void
 ilo_launch_grid(struct pipe_context *pipe,
                 const uint *block_layout, const uint *grid_layout,
-                uint32_t pc, const void *input)
+                uint32_t pc, const void *input, size_t size)
 {
 }
 
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index ad287a2..55b71e2 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -197,7 +197,7 @@ void
 nvc0_launch_grid(struct pipe_context *pipe,
                  const uint *block_layout, const uint *grid_layout,
                  uint32_t label,
-                 const void *input)
+                 const void *input, size_t size)
 {
    struct nvc0_context *nvc0 = nvc0_context(pipe);
    struct nouveau_pushbuf *push = nvc0->base.pushbuf;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
index ebeb8c4..2e901fa 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h
@@ -353,10 +353,10 @@ void nvc0_push_vbo(struct nvc0_context *, const struct pipe_draw_info *);
 
 /* nve4_compute.c */
 void nve4_launch_grid(struct pipe_context *,
-                      const uint *, const uint *, uint32_t, const void *);
+                      const uint *, const uint *, uint32_t, const void *, size_t);
 
 /* nvc0_compute.c */
 void nvc0_launch_grid(struct pipe_context *,
-                      const uint *, const uint *, uint32_t, const void *);
+                      const uint *, const uint *, uint32_t, const void *, size_t);
 
 #endif
diff --git a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
index f243316..e408ec8 100644
--- a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
@@ -432,7 +432,7 @@ void
 nve4_launch_grid(struct pipe_context *pipe,
                  const uint *block_layout, const uint *grid_layout,
                  uint32_t label,
-                 const void *input)
+                 const void *input, size_t size)
 {
    struct nvc0_context *nvc0 = nvc0_context(pipe);
    struct nouveau_pushbuf *push = nvc0->base.pushbuf;
diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 1970414..d50f343 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -541,7 +541,7 @@ void evergreen_emit_cs_shader(
 static void evergreen_launch_grid(
 		struct pipe_context *ctx_,
 		const uint *block_layout, const uint *grid_layout,
-		uint32_t pc, const void *input)
+		uint32_t pc, const void *input, size_t size)
 {
 	struct r600_context *ctx = (struct r600_context *)ctx_;
 
diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 42e4fec..9a90470 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -162,7 +162,7 @@ static unsigned compute_num_waves_for_scratch(
 static void si_launch_grid(
 		struct pipe_context *ctx,
 		const uint *block_layout, const uint *grid_layout,
-		uint32_t pc, const void *input)
+		uint32_t pc, const void *input, size_t size)
 {
 	struct si_context *sctx = (struct si_context*)ctx;
 	struct si_pipe_compute *program = sctx->cs_shader_state.program;
diff --git a/src/gallium/include/pipe/p_context.h b/src/gallium/include/pipe/p_context.h
index af5674f..e71be02 100644
--- a/src/gallium/include/pipe/p_context.h
+++ b/src/gallium/include/pipe/p_context.h
@@ -523,7 +523,7 @@ struct pipe_context {
     */
    void (*launch_grid)(struct pipe_context *context,
                        const uint *block_layout, const uint *grid_layout,
-                       uint32_t pc, const void *input);
+
[Mesa-dev] [PATCH 0/3] cl workdim v2
This respin includes Francisco's approach of providing the implicit arguments
in the arg vector passed from clover, and Tom's idea of appending the implicit
args after the kernel args. I assumed it is not safe to modify exec.input, so
the input vector is copied before the work dimension is appended.

Passes the get-work-dim piglit test on Turks without any regressions. I have
not tested SI as I don't have the hardware.

jan

Jan Vesely (3):
  gallium: Pass input data size to launch_grid
  clover: Add work dimension implicit param to input
  r600,radeonsi: Copy implicit args provided by clover

 src/gallium/drivers/ilo/ilo_gpgpu.c | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c | 2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 4 +-
 src/gallium/drivers/nouveau/nvc0/nve4_compute.c | 2 +-
 src/gallium/drivers/r600/evergreen_compute.c | 14 +-
 src/gallium/drivers/r600/evergreen_compute.h | 1 -
 src/gallium/drivers/radeonsi/si_compute.c | 6 +-
 src/gallium/include/pipe/p_context.h | 2 +-
 src/gallium/state_trackers/clover/core/kernel.cpp | 162 --
 src/gallium/tests/trivial/compute.c | 40 +++---
 10 files changed, 122 insertions(+), 113 deletions(-)

-- 
1.9.3

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
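The layout described in the cover letter (kernel arguments first, the implicit work dimension appended afterwards, and the caller's input buffer left unmodified) can be sketched in plain C. This is only an illustration under stated assumptions: `append_work_dim` is a hypothetical helper that does not exist in Mesa, the real series uses clover's `std::vector` helpers, and the device byte-swap done there is omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a Mesa function: build a fresh input buffer that
 * holds the explicit kernel arguments followed by the 32-bit work dimension.
 * The caller's buffer is only read, never modified. Returns NULL on
 * allocation failure; the caller frees the result. */
uint8_t *
append_work_dim(const uint8_t *kernel_args, size_t kernel_args_size,
                uint32_t work_dim, size_t *out_size)
{
   size_t total = kernel_args_size + sizeof(work_dim);
   uint8_t *buf = malloc(total);

   if (!buf)
      return NULL;

   /* Explicit kernel arguments come first ... */
   if (kernel_args_size)
      memcpy(buf, kernel_args, kernel_args_size);

   /* ... followed by the implicit argument, in host byte order
    * (assumption: no conversion to the device's endianness). */
   memcpy(buf + kernel_args_size, &work_dim, sizeof(work_dim));

   *out_size = total;
   return buf;
}
```

A driver receiving such a buffer needs the total size passed in explicitly, because it can no longer be derived from the shader's own parameter size; that is the reason for the extra launch_grid argument in patch 1/3.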
Re: [Mesa-dev] [PATCH 2/2] mesa/formats: Fix the size of ETC2_SRGB8_PUNCHTHROUGH_ALPHA1
On Wed, Aug 6, 2014 at 10:31 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
> Signed-off-by: Jason Ekstrand jason.ekstr...@intel.com
> ---
>  src/mesa/main/formats.csv | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/formats.csv b/src/mesa/main/formats.csv
> index f45e34b..eade6fa 100644
> --- a/src/mesa/main/formats.csv
> +++ b/src/mesa/main/formats.csv
> @@ -279,4 +279,4 @@ MESA_FORMAT_ETC2_RG11_EAC , etc2 , 4, 4, x128, , , 
>  MESA_FORMAT_ETC2_SIGNED_R11_EAC , etc2 , 4, 4, x64 , , , , x001, rgb
>  MESA_FORMAT_ETC2_SIGNED_RG11_EAC , etc2 , 4, 4, x128, , , , xy01, rgb
>  MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1 , etc2 , 4, 4, x64 , , , , xyzw, rgb
> -MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2 , 4, 4, x128, , , , xyzw, srgb
> +MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1, etc2 , 4, 4, x64 , , , , xyzw, srgb
> --
> 2.0.4

Both patches are:
Reviewed-by: Anuj Phogat anuj.pho...@gmail.com
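To see why the per-block size in the table matters, here is a rough sketch of how the storage for one mip level of a block-compressed texture follows from the block width, block height and bits per block. `compressed_image_size` is a hypothetical helper written for this note, not a Mesa function; it assumes the block size is a whole number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Mesa: bytes needed to store one mip level
 * of a block-compressed image. Partial blocks at the right and bottom edges
 * still occupy a full block. */
size_t
compressed_image_size(unsigned width, unsigned height,
                      unsigned block_w, unsigned block_h,
                      unsigned block_bits)
{
   size_t blocks_x = (width + block_w - 1) / block_w;   /* round up */
   size_t blocks_y = (height + block_h - 1) / block_h;  /* round up */

   return blocks_x * blocks_y * (block_bits / 8);
}
```

With 4x4 blocks, a 16x16 level has 16 blocks: 128 bytes at 64 bits per block, but 256 bytes if the entry wrongly says 128 bits. A wrong block size therefore doubles every size and stride computed for the format.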
[Mesa-dev] [PATCH 2/6] gallium/radeon: store VM address in r600_resource
From: Marek Olšák marek.ol...@amd.com

This will help to get rid of the buffer_get_virtual_address calls.
---
 src/gallium/drivers/radeon/r600_buffer_common.c | 7 +--
 src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
 src/gallium/drivers/radeon/r600_texture.c | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index d747cbc..a580685 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -168,14 +168,17 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
 	old_buf = res->buf;
 	res->cs_buf = rscreen->ws->buffer_get_cs_handle(new_buf); /* should be atomic */
 	res->buf = new_buf; /* should be atomic */
+
+	if (rscreen->info.r600_virtual_address)
+		res->gpu_address = rscreen->ws->buffer_get_virtual_address(res->cs_buf);
+
 	pb_reference(&old_buf, NULL);
 
 	util_range_set_empty(&res->valid_buffer_range);
 
 	if (rscreen->debug_flags & DBG_VM && res->b.b.target == PIPE_BUFFER) {
 		fprintf(stderr, "VM start=0x%"PRIX64" end=0x%"PRIX64" | Buffer %u bytes\n",
-			r600_resource_va(&rscreen->b, &res->b.b),
-			r600_resource_va(&rscreen->b, &res->b.b) + res->buf->size,
+			res->gpu_address, res->gpu_address + res->buf->size,
 			res->buf->size);
 	}
 	return true;
diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
index ac69d5b..59d0b3e 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -127,6 +127,7 @@ struct r600_resource {
 	/* Winsys objects. */
 	struct pb_buffer		*buf;
 	struct radeon_winsys_cs_handle	*cs_buf;
+	uint64_t			gpu_address;
 
 	/* Resource state. */
 	enum radeon_bo_domain		domains;
diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c
index 482bbff..326aca4 100644
--- a/src/gallium/drivers/radeon/r600_texture.c
+++ b/src/gallium/drivers/radeon/r600_texture.c
@@ -655,6 +655,7 @@ r600_texture_create_object(struct pipe_screen *screen,
 	} else {
 		resource->buf = buf;
 		resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
+		resource->gpu_address = rscreen->ws->buffer_get_virtual_address(resource->cs_buf);
 		resource->domains = rscreen->ws->buffer_get_initial_domain(resource->cs_buf);
 	}
 
-- 
1.9.1
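The idea of this patch, querying the virtual address once when the buffer is attached and reading the cached value afterwards, can be shown with a small stand-alone sketch. All `toy_*` names are made up for this note and only mimic the shape of the radeon structures; the call counter stands in for the cost of going through the winsys.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins, assumed for illustration only (not the radeon types). */
struct toy_buffer {
   uint64_t va;              /* address the "winsys" would report */
};

struct toy_resource {
   struct toy_buffer *buf;
   uint64_t gpu_address;     /* cached once at init time */
};

/* Counts how often the "winsys" is asked for an address. */
unsigned toy_va_queries = 0;

uint64_t
toy_get_virtual_address(const struct toy_buffer *buf)
{
   toy_va_queries++;
   return buf->va;
}

/* Attach a buffer and cache its address, as the patch does at init time. */
void
toy_resource_init(struct toy_resource *res, struct toy_buffer *buf)
{
   res->buf = buf;
   res->gpu_address = toy_get_virtual_address(buf);
}

/* Later users read the cached field instead of querying again. */
uint64_t
toy_resource_end(const struct toy_resource *res, uint64_t size)
{
   return res->gpu_address + size;
}
```

One consequence of caching is that the stored address has to be refreshed whenever the backing buffer is replaced, which is why the patch sets it in the same place where the buffer pointer itself is swapped.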
[Mesa-dev] [PATCH 3/6] radeonsi: use gpu_address from r600_resource
From: Marek Olšák marek.ol...@amd.com --- src/gallium/drivers/radeonsi/si_compute.c | 10 src/gallium/drivers/radeonsi/si_descriptors.c | 33 --- src/gallium/drivers/radeonsi/si_dma.c | 12 +- src/gallium/drivers/radeonsi/si_hw_context.c | 2 +- src/gallium/drivers/radeonsi/si_state.c | 17 ++ src/gallium/drivers/radeonsi/si_state_draw.c | 23 --- 6 files changed, 41 insertions(+), 56 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c index 42e4fec..12e4f56 100644 --- a/src/gallium/drivers/radeonsi/si_compute.c +++ b/src/gallium/drivers/radeonsi/si_compute.c @@ -114,7 +114,7 @@ static void si_set_global_binding( uint64_t va; uint32_t offset; program-global_buffers[i] = resources[i]; - va = r600_resource_va(ctx-screen, resources[i]); + va = r600_resource(resources[i])-gpu_address; offset = util_le32_to_cpu(*handles[i]); va += offset; va = util_cpu_to_le64(va); @@ -223,8 +223,7 @@ static void si_launch_grid( si_resource_create_custom(sctx-b.b.screen, PIPE_USAGE_DEFAULT, scratch_bytes); } - scratch_buffer_va = r600_resource_va(ctx-screen, - (struct pipe_resource*)shader-scratch_bo); + scratch_buffer_va = shader-scratch_bo-gpu_address; si_pm4_add_bo(pm4, shader-scratch_bo, RADEON_USAGE_READWRITE, RADEON_PRIO_SHADER_RESOURCE_RW); @@ -238,8 +237,7 @@ static void si_launch_grid( si_upload_const_buffer(sctx, kernel_args_buffer, (uint8_t*)kernel_args, kernel_args_size, kernel_args_offset); - kernel_args_va = r600_resource_va(ctx-screen, - (struct pipe_resource*)kernel_args_buffer); + kernel_args_va = kernel_args_buffer-gpu_address; kernel_args_va += kernel_args_offset; si_pm4_add_bo(pm4, kernel_args_buffer, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA); @@ -285,7 +283,7 @@ static void si_launch_grid( 0x190 /* Default value */); } - shader_va = r600_resource_va(ctx-screen, (void *)shader-bo); + shader_va = shader-bo-gpu_address; si_pm4_add_bo(pm4, shader-bo, RADEON_USAGE_READ, RADEON_PRIO_SHADER_DATA); si_pm4_set_reg(pm4, 
R_00B830_COMPUTE_PGM_LO, (shader_va 8) 0x); si_pm4_set_reg(pm4, R_00B834_COMPUTE_PGM_HI, shader_va 40); diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c b/src/gallium/drivers/radeonsi/si_descriptors.c index 171de45..81ad14b 100644 --- a/src/gallium/drivers/radeonsi/si_descriptors.c +++ b/src/gallium/drivers/radeonsi/si_descriptors.c @@ -113,8 +113,6 @@ static void si_init_descriptors(struct si_context *sctx, unsigned num_elements, void (*emit_func)(struct si_context *ctx, struct r600_atom *state)) { - uint64_t va; - assert(num_elements = sizeof(desc-enabled_mask)*8); assert(num_elements = sizeof(desc-dirty_mask)*8); @@ -131,11 +129,11 @@ static void si_init_descriptors(struct si_context *sctx, r600_context_bo_reloc(sctx-b, sctx-b.rings.gfx, desc-buffer, RADEON_USAGE_READWRITE, RADEON_PRIO_SHADER_DATA); - va = r600_resource_va(sctx-b.b.screen, desc-buffer-b.b); /* We don't check for CS space here, because this should be called * only once at context initialization. */ - si_emit_cp_dma_clear_buffer(sctx, va, desc-buffer-b.b.width0, 0, + si_emit_cp_dma_clear_buffer(sctx, desc-buffer-gpu_address, + desc-buffer-b.b.width0, 0, R600_CP_DMA_SYNC); } @@ -170,7 +168,7 @@ static void si_emit_shader_pointer(struct si_context *sctx, { struct si_descriptors *desc = (struct si_descriptors*)atom; struct radeon_winsys_cs *cs = sctx-b.rings.gfx.cs; - uint64_t va = r600_resource_va(sctx-b.b.screen, desc-buffer-b.b) + + uint64_t va = desc-buffer-gpu_address + desc-current_context_id * desc-context_size + desc-buffer_offset; @@ -205,7 +203,7 @@ static void si_emit_descriptors(struct si_context *sctx, assert(dirty_mask); - va_base = r600_resource_va(sctx-b.b.screen, desc-buffer-b.b); + va_base = desc-buffer-gpu_address; /* Copy the descriptors to a new context slot. */ /* XXX Consider using TC or L2 for this copy on CIK. */ @@ -567,7 +565,6 @@ static void si_vertex_buffers_begin_new_cs(struct si_context *sctx) void si_update_vertex_buffers(struct
[Mesa-dev] [PATCH 6/6] gallium/radeon: remove r600_resource_va
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/drivers/radeon/r600_cs.h | 9 -
 1 file changed, 9 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_cs.h b/src/gallium/drivers/radeon/r600_cs.h
index b30b465..3cee760 100644
--- a/src/gallium/drivers/radeon/r600_cs.h
+++ b/src/gallium/drivers/radeon/r600_cs.h
@@ -33,15 +33,6 @@
 #include "r600_pipe_common.h"
 #include "r600d_common.h"
 
-static INLINE uint64_t r600_resource_va(struct pipe_screen *screen,
-					struct pipe_resource *resource)
-{
-	struct r600_common_screen *rscreen = (struct r600_common_screen*)screen;
-	struct r600_resource *rresource = (struct r600_resource*)resource;
-
-	return rscreen->ws->buffer_get_virtual_address(rresource->cs_buf);
-}
-
 static INLINE unsigned r600_context_bo_reloc(struct r600_common_context *rctx,
 					     struct r600_ring *ring,
 					     struct r600_resource *rbo,
-- 
1.9.1
[Mesa-dev] [PATCH 4/6] r600g: use gpu_address from r600_resource
From: Marek Olšák marek.ol...@amd.com --- src/gallium/drivers/r600/evergreen_compute.c| 5 +-- src/gallium/drivers/r600/evergreen_hw_context.c | 6 ++-- src/gallium/drivers/r600/evergreen_state.c | 47 +++-- src/gallium/drivers/r600/r600_hw_context.c | 4 +-- src/gallium/drivers/r600/r600_state_common.c| 6 ++-- 5 files changed, 29 insertions(+), 39 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 1970414..402c871 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -521,12 +521,9 @@ void evergreen_emit_cs_shader( struct r600_pipe_compute *shader = state-shader; struct r600_kernel *kernel = shader-kernels[state-kernel_index]; struct radeon_winsys_cs *cs = rctx-b.rings.gfx.cs; - uint64_t va; - - va = r600_resource_va(rctx-screen-b.b, kernel-code_bo-b.b); r600_write_compute_context_reg_seq(cs, R_0288D0_SQ_PGM_START_LS, 3); - radeon_emit(cs, va 8); /* R_0288D0_SQ_PGM_START_LS */ + radeon_emit(cs, kernel-code_bo-gpu_address 8); /* R_0288D0_SQ_PGM_START_LS */ radeon_emit(cs, /* R_0288D4_SQ_PGM_RESOURCES_LS */ S_0288D4_NUM_GPRS(kernel-bc.ngpr) | S_0288D4_STACK_SIZE(kernel-bc.nstack)); diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c index f95a17e..63c2906 100644 --- a/src/gallium/drivers/r600/evergreen_hw_context.c +++ b/src/gallium/drivers/r600/evergreen_hw_context.c @@ -46,8 +46,8 @@ void evergreen_dma_copy_buffer(struct r600_context *rctx, util_range_add(rdst-valid_buffer_range, dst_offset, dst_offset + size); - dst_offset += r600_resource_va(rctx-screen-b.b, dst); - src_offset += r600_resource_va(rctx-screen-b.b, src); + dst_offset += rdst-gpu_address; + src_offset += rsrc-gpu_address; /* see if we use dword or byte copy */ if (!(dst_offset % 4) !(src_offset % 4) !(size % 4)) { @@ -97,7 +97,7 @@ void evergreen_cp_dma_clear_buffer(struct r600_context *rctx, 
util_range_add(r600_resource(dst)-valid_buffer_range, offset, offset + size); - offset += r600_resource_va(rctx-screen-b.b, dst); + offset += r600_resource(dst)-gpu_address; /* Flush the cache where the resource is bound. */ rctx-b.flags |= R600_CONTEXT_INV_CONST_CACHE | diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 63811e8..4598ccf 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -600,7 +600,6 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view *view, unsigned width0, unsigned height0) { - struct pipe_context *ctx = view-base.context; struct r600_texture *tmp = (struct r600_texture*)view-base.texture; uint64_t va; int stride = util_format_get_blocksize(view-base.format); @@ -624,7 +623,7 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view *view, swizzle_res = r600_get_swizzle_combined(desc-swizzle, swizzle, TRUE); - va = r600_resource_va(ctx-screen, view-base.texture) + offset; + va = tmp-resource.gpu_address + offset; view-tex_resource = tmp-resource; view-skip_mip_address_reloc = true; @@ -781,7 +780,7 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx, } else if (texture-target == PIPE_TEXTURE_CUBE_ARRAY) depth = texture-array_size / 6; - va = r600_resource_va(ctx-screen, texture); + va = tmp-resource.gpu_address; view-tex_resource = tmp-resource; view-tex_resource_words[0] = (S_03_DIM(r600_tex_dim(texture-target, texture-nr_samples)) | @@ -941,8 +940,7 @@ void evergreen_init_color_surface_rat(struct r600_context *rctx, endian = ENDIAN_NONE; } - surf-cb_color_base = - r600_resource_va(rctx-b.b.screen, pipe_buffer) 8; + surf-cb_color_base = r600_resource(pipe_buffer)-gpu_address 8; surf-cb_color_pitch = (pitch / 8) - 1; @@ -980,7 +978,6 @@ void evergreen_init_color_surface(struct r600_context *rctx, { struct r600_screen *rscreen = rctx-screen; struct r600_texture *rtex = (struct r600_texture*)surf-base.texture; - 
struct pipe_resource *pipe_tex = surf-base.texture; unsigned level = surf-base.u.tex.level; unsigned pitch, slice; unsigned color_info, color_attrib, color_dim = 0, color_view; @@ -1139,7 +1136,7 @@ void evergreen_init_color_surface(struct r600_context *rctx, color_info |= S_028C70_COMPRESSION(1); } - base_offset = r600_resource_va(rctx-b.b.screen, pipe_tex);
[Mesa-dev] [PATCH 5/6] gallium/radeon: use gpu_address from r600_resource
From: Marek Olšák marek.ol...@amd.com

---
 src/gallium/drivers/radeon/r600_query.c     | 14 ++
 src/gallium/drivers/radeon/r600_streamout.c |  9 +++--
 src/gallium/drivers/radeon/r600_texture.c   | 12 +---
 3 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/radeon/r600_query.c b/src/gallium/drivers/radeon/r600_query.c
index 92863cb..503737c 100644
--- a/src/gallium/drivers/radeon/r600_query.c
+++ b/src/gallium/drivers/radeon/r600_query.c
@@ -171,8 +171,7 @@ static void r600_emit_query_begin(struct r600_common_context *ctx, struct r600_q
 	}
 
 	/* emit begin query */
-	va = r600_resource_va(ctx->b.screen, (void*)query->buffer.buf);
-	va += query->buffer.results_end;
+	va = query->buffer.buf->gpu_address + query->buffer.results_end;
 
 	switch (query->type) {
 	case PIPE_QUERY_OCCLUSION_COUNTER:
@@ -233,7 +232,8 @@ static void r600_emit_query_end(struct r600_common_context *ctx, struct r600_que
 		ctx->need_gfx_cs_space(&ctx->b, query->num_cs_dw, FALSE);
 	}
 
-	va = r600_resource_va(ctx->b.screen, (void*)query->buffer.buf);
+	va = query->buffer.buf->gpu_address;
+
 	/* emit end query */
 	switch (query->type) {
 	case PIPE_QUERY_OCCLUSION_COUNTER:
@@ -329,7 +329,7 @@ static void r600_emit_query_predication(struct r600_common_context *ctx, struct
 	/* emit predicate packets for all data blocks */
 	for (qbuf = &query->buffer; qbuf; qbuf = qbuf->previous) {
 		unsigned results_base = 0;
-		uint64_t va = r600_resource_va(ctx->b.screen, &qbuf->buf->b.b);
+		uint64_t va = qbuf->buf->gpu_address;
 
 		while (results_base < qbuf->results_end) {
 			radeon_emit(cs, PKT3(PKT3_SET_PREDICATION, 1, 0));
@@ -826,7 +826,6 @@ void r600_query_init_backend_mask(struct r600_common_context *ctx)
 	uint32_t *results;
 	unsigned num_backends = ctx->screen->info.r600_num_backends;
 	unsigned i, mask = 0;
-	uint64_t va;
 
 	/* if backend_map query is supported by the kernel */
 	if (ctx->screen->info.r600_backend_map_valid) {
@@ -861,7 +860,6 @@ void r600_query_init_backend_mask(struct r600_common_context *ctx)
 				   PIPE_USAGE_STAGING, ctx->max_db*16);
 	if (!buffer)
 		goto err;
-	va = r600_resource_va(ctx->b.screen, (void*)buffer);
 
 	/* initialize buffer with zeroes */
 	results = r600_buffer_map_sync_with_rings(ctx, buffer, PIPE_TRANSFER_WRITE);
@@ -872,8 +870,8 @@ void r600_query_init_backend_mask(struct r600_common_context *ctx)
 	/* emit EVENT_WRITE for ZPASS_DONE */
 	radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 2, 0));
 	radeon_emit(cs, EVENT_TYPE(EVENT_TYPE_ZPASS_DONE) | EVENT_INDEX(1));
-	radeon_emit(cs, va);
-	radeon_emit(cs, va >> 32);
+	radeon_emit(cs, buffer->gpu_address);
+	radeon_emit(cs, buffer->gpu_address >> 32);
 
 	r600_emit_reloc(ctx, &ctx->rings.gfx, buffer, RADEON_USAGE_WRITE, RADEON_PRIO_MIN);
 
diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c
index cb72ada..e2413c2 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -212,8 +212,7 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
 							t[i]->b.buffer_size) >> 2);	/* BUFFER_SIZE (in DW) */
 			radeon_emit(cs, stride_in_dw[i]);		/* VTX_STRIDE (in DW) */
 		} else {
-			uint64_t va = r600_resource_va(rctx->b.screen,
-						       (void*)t[i]->b.buffer);
+			uint64_t va = r600_resource(t[i]->b.buffer)->gpu_address;
 
 			update_flags |= SURFACE_BASE_UPDATE_STRMOUT(i);
 
@@ -239,8 +238,7 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
 		}
 
 		if (rctx->streamout.append_bitmask & (1 << i)) {
-			uint64_t va = r600_resource_va(rctx->b.screen,
-						       (void*)t[i]->buf_filled_size) +
+			uint64_t va = t[i]->buf_filled_size->gpu_address +
 				      t[i]->buf_filled_size_offset;
 
 			/* Append. */
@@ -286,8 +284,7 @@ void r600_emit_streamout_end(struct r600_common_context *rctx)
 		if (!t[i])
 			continue;
 
-		va = r600_resource_va(rctx->b.screen,
-				      (void*)t[i]->buf_filled_size) + t[i]->buf_filled_size_offset;
+		va = t[i]->buf_filled_size->gpu_address + t[i]->buf_filled_size_offset;
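The query code in this patch passes a 64-bit gpu_address to radeon_emit twice, once as-is and once shifted right by 32, so that the address reaches the command stream as two 32-bit dwords. A minimal sketch of that split follows. The toy_cs type and the toy_emit helpers are hypothetical stand-ins written for this illustration, not the real radeon winsys API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a command stream; not the real radeon_winsys_cs. */
struct toy_cs {
    uint32_t buf[16];
    unsigned cdw;   /* number of dwords written so far */
};

/* Append one 32-bit dword to the stream. */
static void toy_emit(struct toy_cs *cs, uint32_t value)
{
    cs->buf[cs->cdw++] = value;
}

/* Emit a 64-bit GPU address as two dwords, low half first. */
static void toy_emit_address(struct toy_cs *cs, uint64_t gpu_address)
{
    toy_emit(cs, (uint32_t)gpu_address);          /* bits 31..0  */
    toy_emit(cs, (uint32_t)(gpu_address >> 32));  /* bits 63..32 */
}
```

Calling toy_emit_address on a zero-initialized toy_cs appends exactly two dwords. The explicit casts only make visible the truncation that already happens implicitly when a uint64_t is passed to a function taking a 32-bit value.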
[Mesa-dev] [PATCH 1/6] r600g: remove useless r600_resource_va calls
From: Marek Olšák marek.ol...@amd.com

R600-R700 don't support virtual memory.
---
 src/gallium/drivers/r600/r600_state.c | 27 +--
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
index 258ffd1..607b199 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -595,25 +595,22 @@ texture_buffer_sampler_view(struct r600_pipe_sampler_view *view,
 			    unsigned width0, unsigned height0)
 
 {
-	struct pipe_context *ctx = view->base.context;
 	struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
-	uint64_t va;
 	int stride = util_format_get_blocksize(view->base.format);
 	unsigned format, num_format, format_comp, endian;
-	unsigned offset = view->base.u.buf.first_element * stride;
+	uint64_t offset = view->base.u.buf.first_element * stride;
 	unsigned size = (view->base.u.buf.last_element - view->base.u.buf.first_element + 1) * stride;
 
 	r600_vertex_data_type(view->base.format,
 			      &format, &num_format, &format_comp,
 			      &endian);
 
-	va = r600_resource_va(ctx->screen, view->base.texture) + offset;
 	view->tex_resource = &tmp->resource;
-
 	view->skip_mip_address_reloc = true;
-	view->tex_resource_words[0] = va;
+
+	view->tex_resource_words[0] = offset;
 	view->tex_resource_words[1] = size - 1;
-	view->tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(va >> 32UL) |
+	view->tex_resource_words[2] = S_038008_BASE_ADDRESS_HI(offset >> 32UL) |
 		S_038008_STRIDE(stride) |
 		S_038008_DATA_FORMAT(format) |
 		S_038008_NUM_FORMAT_ALL(num_format) |
@@ -1105,8 +1102,7 @@ static void r600_init_depth_surface(struct r600_context *rctx,
 
 	/* use htile only for first level */
 	if (rtex->htile_buffer && !level) {
-		uint64_t va = r600_resource_va(&rctx->screen->b.b, &rtex->htile_buffer->b.b);
-		surf->db_htile_data_base = va >> 8;
+		surf->db_htile_data_base = 0;
 		surf->db_htile_surface = S_028D24_HTILE_WIDTH(1) |
 					 S_028D24_HTILE_HEIGHT(1) |
 					 S_028D24_FULL_CACHE(1) |
@@ -1944,7 +1940,6 @@ static void r600_emit_shader_stages(struct r600_context *rctx, struct r600_atom
 
 static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
 {
-	struct pipe_screen *screen = rctx->b.b.screen;
 	struct radeon_winsys_cs *cs = rctx->b.rings.gfx.cs;
 	struct r600_gs_rings_state *state = (struct r600_gs_rings_state*)a;
 	struct r600_resource *rbuffer;
@@ -1955,8 +1950,7 @@ static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
 
 	if (state->enable) {
 		rbuffer =(struct r600_resource*)state->esgs_ring.buffer;
-		r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE,
-				(r600_resource_va(screen, &rbuffer->b.b)) >> 8);
+		r600_write_config_reg(cs, R_008C40_SQ_ESGS_RING_BASE, 0);
 		radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
 		radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx,
 						      rbuffer, RADEON_USAGE_READWRITE,
@@ -1965,8 +1959,7 @@ static void r600_emit_gs_rings(struct r600_context *rctx, struct r600_atom *a)
 				state->esgs_ring.buffer_size >> 8);
 
 		rbuffer =(struct r600_resource*)state->gsvs_ring.buffer;
-		r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE,
-				(r600_resource_va(screen, &rbuffer->b.b)) >> 8);
+		r600_write_config_reg(cs, R_008C48_SQ_GSVS_RING_BASE, 0);
 		radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
 		radeon_emit(cs, r600_context_bo_reloc(&rctx->b, &rctx->b.rings.gfx,
 						      rbuffer, RADEON_USAGE_READWRITE,
@@ -2644,8 +2637,7 @@ void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
 	r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_GS,
 			       S_02887C_NUM_GPRS(rshader->bc.ngpr) |
 			       S_02887C_STACK_SIZE(rshader->bc.nstack));
-	r600_store_context_reg(cb, R_02886C_SQ_PGM_START_GS,
-			       r600_resource_va(ctx->screen, (void *)shader->bo) >> 8);
+	r600_store_context_reg(cb, R_02886C_SQ_PGM_START_GS, 0);
 	/* After that, the NOP relocation packet must be emitted (shader->bo, RADEON_USAGE_READ). */
 }
 
@@ -2659,8 +2651,7 @@ void r600_update_es_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
 	r600_store_context_reg(cb, R_028890_SQ_PGM_RESOURCES_ES,
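The patch can write 0 (or a bare offset) into base-address registers because, without virtual memory, the value userspace writes is only a placeholder that the relocation following it fixes up. The toy model below illustrates that idea under stated assumptions: the toy_reloc record, both helper functions, and the choice of adding the buffer address in 256-byte units (mirroring the `>> 8` seen in the removed lines) are illustrative, not the actual kernel or winsys interface.

```c
#include <stdint.h>

/* Hypothetical relocation record: which dword to patch, and with which buffer. */
struct toy_reloc {
    unsigned dword_index;      /* position of the register value in the stream */
    uint64_t buffer_address;   /* address assigned to the buffer at submit time */
};

/* Userspace side: write only the offset inside the buffer, in 256-byte units. */
static uint32_t toy_userspace_base(uint64_t offset_in_buffer)
{
    return (uint32_t)(offset_in_buffer >> 8);
}

/* Submit side (assumed): add the buffer's address, also in 256-byte units. */
static void toy_apply_reloc(uint32_t *stream, const struct toy_reloc *reloc)
{
    stream[reloc->dword_index] += (uint32_t)(reloc->buffer_address >> 8);
}
```

In this model a register written as toy_userspace_base(0) ends up holding the buffer's address once the relocation is applied, which is why computing an address up front with r600_resource_va contributes nothing on hardware without virtual memory.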
Re: [Mesa-dev] [PATCH 08/20] i965/cfg: Add functions to test if a block is a successor/predecessor.
On Wed, Aug 6, 2014 at 11:19 AM, Matt Turner matts...@gmail.com wrote:
> On Tue, Aug 5, 2014 at 10:21 AM, Pohjolainen, Topi
> topi.pohjolai...@intel.com wrote:
>> On Thu, Jul 24, 2014 at 07:54:15PM -0700, Matt Turner wrote:
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_cfg.cpp | 24
>>>  src/mesa/drivers/dri/i965/brw_cfg.h   |  2 ++
>>>  2 files changed, 26 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_cfg.cpp b/src/mesa/drivers/dri/i965/brw_cfg.cpp
>>> index d806b83..9cd8b9f 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_cfg.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_cfg.cpp
>>> @@ -71,6 +71,30 @@ bblock_t::add_successor(void *mem_ctx, bblock_t *successor)
>>>     children.push_tail(::link(mem_ctx, successor));
>>>  }
>>>
>>> +bool
>>> +bblock_t::is_predecessor_of(const bblock_t *block) const
>>> +{
>>> +   foreach_list_typed_safe (bblock_link, parent, link, &block->parents) {
>>
>> I read patch number three again, and noticed this small formatting change
>> there as well. I haven't seen us leaving the space before ( anywhere else.
>
> I figure I should have a space between the macro and the ( since we
> put a space there for regular for loops.
>
>>> +      if (parent->block == this) {
>>> +         return true;
>>> +      }
>>
>> We have one line blocks with and without {}. I just thought I mention in case
>> you didn't mean to.
>
> Right, I'll drop the {}.

Oh, this is inside another {} set. I'd rather include the braces when
they're nested.
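For readers without the full patch at hand, the check under review walks a block's list of parents and reports whether a given block appears in it. The sketch below restates that logic in plain C with the nested braces kept, as preferred in the reply above. The toy_block and toy_link types are hypothetical simplifications, not the i965 bblock_t or exec_list types, and the successor variant is an assumed mirror image based only on the patch title.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_block;

/* Hypothetical singly linked list node naming one neighbouring block. */
struct toy_link {
    struct toy_block *block;
    struct toy_link *next;
};

struct toy_block {
    struct toy_link *parents;    /* blocks that can flow into this one */
    struct toy_link *children;   /* blocks this one can flow into */
};

/* True if `self` appears in the parent list of `block`. */
static bool toy_is_predecessor_of(const struct toy_block *self,
                                  const struct toy_block *block)
{
    for (const struct toy_link *parent = block->parents; parent; parent = parent->next) {
        if (parent->block == self) {
            return true;
        }
    }
    return false;
}

/* True if `self` appears in the child list of `block` (assumed mirror image). */
static bool toy_is_successor_of(const struct toy_block *self,
                                const struct toy_block *block)
{
    for (const struct toy_link *child = block->children; child; child = child->next) {
        if (child->block == self) {
            return true;
        }
    }
    return false;
}
```

With two blocks linked a to b, toy_is_predecessor_of(&a, &b) and toy_is_successor_of(&b, &a) both hold, while the reversed queries do not.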
[Mesa-dev] [PATCH] gallium: remove PIPE_SHADER_CAP_MAX_ADDRS
From: Marek Olšák marek.ol...@amd.com

This limit is fixed in Mesa core and cannot be changed.
It only affects ARB_vertex_program and ARB_fragment_program.

The minimum value for ARB_vertex_program is 1 according to the spec.
The maximum value for ARB_vertex_program is limited to 1 by Mesa core.

The value should be zero for ARB_fragment_program, because it doesn't
support ARL.

Finally, drivers shouldn't mess with these values arbitrarily.
---

Sidenote: Does anybody use predicates in TGSI?

 src/gallium/auxiliary/gallivm/lp_bld_limits.h    | 2 --
 src/gallium/auxiliary/tgsi/tgsi_exec.h           | 3 ---
 src/gallium/docs/source/screen.rst               | 1 -
 src/gallium/drivers/freedreno/freedreno_screen.c | 2 --
 src/gallium/drivers/i915/i915_screen.c           | 2 --
 src/gallium/drivers/ilo/ilo_screen.c             | 2 --
 src/gallium/drivers/nouveau/nv30/nv30_screen.c   | 4
 src/gallium/drivers/nouveau/nv50/nv50_screen.c   | 2 --
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   | 2 --
 src/gallium/drivers/r300/r300_screen.c           | 3 ---
 src/gallium/drivers/r600/r600_pipe.c             | 3 ---
 src/gallium/drivers/radeonsi/si_pipe.c           | 3 ---
 src/gallium/drivers/svga/svga_screen.c           | 3 ---
 src/gallium/include/pipe/p_defines.h             | 1 -
 src/mesa/state_tracker/st_extensions.c           | 3 +--
 15 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
index eb83ea8..a96ab29 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
@@ -103,8 +103,6 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
       return PIPE_MAX_CONSTANT_BUFFERS;
    case PIPE_SHADER_CAP_MAX_TEMPS:
       return LP_MAX_TGSI_TEMPS;
-   case PIPE_SHADER_CAP_MAX_ADDRS:
-      return LP_MAX_TGSI_ADDRS;
    case PIPE_SHADER_CAP_MAX_PREDS:
       return LP_MAX_TGSI_PREDS;
    case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h b/src/gallium/auxiliary/tgsi/tgsi_exec.h
index c6fd3d7..4720ec6 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
@@ -193,7 +193,6 @@ struct tgsi_sampler
 #define TGSI_EXEC_NUM_TEMP_R 4
 
 #define TGSI_EXEC_TEMP_ADDR (TGSI_EXEC_NUM_TEMPS + 8)
-#define TGSI_EXEC_NUM_ADDRS 1
 
 /* predicate register */
 #define TGSI_EXEC_TEMP_P0 (TGSI_EXEC_NUM_TEMPS + 9)
@@ -433,8 +432,6 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
       return PIPE_MAX_CONSTANT_BUFFERS;
    case PIPE_SHADER_CAP_MAX_TEMPS:
       return TGSI_EXEC_NUM_TEMPS;
-   case PIPE_SHADER_CAP_MAX_ADDRS:
-      return TGSI_EXEC_NUM_ADDRS;
    case PIPE_SHADER_CAP_MAX_PREDS:
       return TGSI_EXEC_NUM_PREDS;
    case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
index 74cecc2..814e3ae 100644
--- a/src/gallium/docs/source/screen.rst
+++ b/src/gallium/docs/source/screen.rst
@@ -269,7 +269,6 @@ file is still supported. In that case, the constbuf index is assumed to be 0.
 * ``PIPE_SHADER_CAP_MAX_TEMPS``: The maximum number of temporary registers.
-* ``PIPE_SHADER_CAP_MAX_ADDRS``: The maximum number of address registers.
 * ``PIPE_SHADER_CAP_MAX_PREDS``: The maximum number of predicate registers.
 * ``PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED``: Whether the continue opcode is supported.
 * ``PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR``: Whether indirect addressing
diff --git a/src/gallium/drivers/freedreno/freedreno_screen.c b/src/gallium/drivers/freedreno/freedreno_screen.c
index 8fae5dd..5fb7352 100644
--- a/src/gallium/drivers/freedreno/freedreno_screen.c
+++ b/src/gallium/drivers/freedreno/freedreno_screen.c
@@ -327,8 +327,6 @@ fd_screen_get_shader_param(struct pipe_screen *pscreen, unsigned shader,
 		return 16;
 	case PIPE_SHADER_CAP_MAX_TEMPS:
 		return 64; /* Max native temporaries. */
-	case PIPE_SHADER_CAP_MAX_ADDRS:
-		return 1; /* Max native address registers */
 	case PIPE_SHADER_CAP_MAX_CONST_BUFFER_SIZE:
 		return ((screen->gpu_id >= 300) ? 1024 : 64) * sizeof(float[4]);
 	case PIPE_SHADER_CAP_MAX_CONST_BUFFERS:
diff --git a/src/gallium/drivers/i915/i915_screen.c b/src/gallium/drivers/i915/i915_screen.c
index 133c773..ca3dd4a 100644
--- a/src/gallium/drivers/i915/i915_screen.c
+++ b/src/gallium/drivers/i915/i915_screen.c
@@ -135,8 +135,6 @@ i915_get_shader_param(struct pipe_screen *screen, unsigned shader, enum pipe_sha
          return 1;
       case PIPE_SHADER_CAP_MAX_TEMPS:
          return 12; /* XXX: 12 -> 32 ? */
-      case PIPE_SHADER_CAP_MAX_ADDRS:
-         return 0;
       case PIPE_SHADER_CAP_MAX_PREDS:
          return 0;
       case PIPE_SHADER_CAP_TGSI_CONT_SUPPORTED:
diff --git a/src/gallium/drivers/ilo/ilo_screen.c b/src/gallium/drivers/ilo/ilo_screen.c
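The commit message argues that the address-register limit needs no driver query: it is 1 for ARB_vertex_program and 0 for ARB_fragment_program. The st_extensions.c hunk is not visible in this excerpt, so the sketch below only illustrates that rule. The enum and the function name are hypothetical and do not come from Mesa.

```c
/* Hypothetical shader stage tags for this sketch only. */
enum toy_shader_stage {
    TOY_SHADER_VERTEX,
    TOY_SHADER_FRAGMENT
};

/*
 * Address register limit as described in the commit message:
 * 1 for vertex programs, 0 for fragment programs (no ARL there).
 */
static unsigned toy_max_address_regs(enum toy_shader_stage stage)
{
    return stage == TOY_SHADER_VERTEX ? 1 : 0;
}
```

Deriving the value from the stage alone is what makes the per-driver PIPE_SHADER_CAP_MAX_ADDRS cases removable: no driver answer could change the result.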
Re: [Mesa-dev] Merging VC4 driver
Roland Scheidegger srol...@vmware.com writes:

> On 06.08.2014 22:33, Eric Anholt wrote:
>> + * When building using the simulator (on x86), we advertise ourselves as the
>> + * i965 driver so that you can just make a directory with a link from
>> + * i965_dri.so to the built vc4_dri.so, and point LIBGL_DRIVERS_PATH to that
>> + * on your i965-using host to run the driver under simulation.
>> + *
>> + * This is, of course, incompatible with building with the ilo driver, but you
>> + * shouldn't be building that anyway.
>> + */
>> +PUBLIC const __DRIextension **__driDriverGetExtensions_i965(void)
>> +{
>> +	globalDriverAPI = &galliumdrm_driver_api;
>> +	return galliumdrm_driver_extensions;
>> +}
>> +#endif
> I have no idea how that simulator works, but this looks like a fairly
> gross hack to me. Couldn't you use something similar to how the software
> based drivers are loaded or anything like that?

I need a DRI fd and normal DRI buffer management from a host x server
(which I interact with using the dumb ioctls). Are you thinking of
something that would provide that in a simpler way?
[Mesa-dev] [Bug 82268] New: Add support for the OpenRISC architecture (or1k)
https://bugs.freedesktop.org/show_bug.cgi?id=82268

          Priority: medium
            Bug ID: 82268
          Assignee: mesa-dev@lists.freedesktop.org
           Summary: Add support for the OpenRISC architecture (or1k)
          Severity: enhancement
    Classification: Unclassified
                OS: All
          Reporter: manuel.montez...@gmail.com
          Hardware: Other
            Status: NEW
           Version: 10.1
         Component: Mesa core
           Product: Mesa

Created attachment 104182
  --> https://bugs.freedesktop.org/attachment.cgi?id=104182&action=edit
mesa-or1k.patch

From Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=749172

From: Christian Svensson deb...@cmd.nu
Date: Sat, 24 May 2014 21:37:01 +0200

Package: mesa
Version: 10.1.2
Severity: wishlist
Tags: upstream patch

Dear Maintainer,

This trivial patch adds support for or1k.