Re: [Mesa-dev] [RFC PATCH 5/5] i965: Skip execution size adjustment for instructions of width 4
On Dec 9, 2015 11:47 PM, "Iago Toral" wrote: > > On Wed, 2015-12-09 at 08:10 -0800, Jason Ekstrand wrote: > > > > On Dec 9, 2015 4:16 AM, "Iago Toral Quiroga" > > wrote: > > > > > > This code in brw_set_dest adjusts the execution size of any > > instruction > > > with a dst.width < 8. However, we don't want to do this with > > instructions > > > operating on doubles, since these will have a width of 4, but still > > > need an execution size of 8 (for SIMD8). Unfortunately, we can't > > just check > > > the size of the operands involved to detect if we are doing an > > operation on > > > doubles, because we can have instructions that do operations on > > double > > > operands interpreted as UD, operating on any of its 2 32-bit > > components. > > > > > > Previous commits have made it so we never emit instructions with a > > horizontal > > > width of 4 that don't have the correct execution size set for > > gen7/gen8, so > > > we can skip it in this case, avoiding the conflicts with fp64 > > requirements. > > > > > > Expanding the same fix to other hardware generations requires many > > more > > > changes but since we are not targetting fp64 support on them > > > wer don't really care for now. > > > --- > > > src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +- > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > > index 78f2c8c..50a8771 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > > @@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst > > *inst, struct brw_reg dest) > > > /* Generators should set a default exec_size of either 8 > > (SIMD4x2 or SIMD8) > > > * or 16 (SIMD16), as that's normally correct. However, when > > dealing with > > > * small registers, we automatically reduce it to match the > > register size. > > > +* > > > +* In platforms that support fp64 we can emit instructions with > > a width of > > > +* 4 that need two SIMD8 registers and an exec_size of 8 or 16. > > In these > > > +* cases we need to make sure that these instructions have their > > exec sizes > > > +* set properly when they are emitted and we can't rely on this > > code to fix > > > +* it. > > > */ > > > - if (dest.width < BRW_EXECUTE_8) > > > + bool fix_exec_size; > > > + if (devinfo->gen == 7 || devinfo->gen == 8) > > > > If we're doing to take this approach, we definitely want to make it > > gen > 6 or something so we include future gens. Really gen > 4 is > > probably doable since the only real problem is the legacy clipping > > code. > > Strips and fans is also a problem, but it is certainly doable if we want > to do it. Yeah, my primary point is that we should make it as little of an edge-case as possible. We could go back to at least gen6 and we should go forward. That said, it'll take a little testing from the Intel side. > Iago > > > > > + fix_exec_size = dest.width < BRW_EXECUTE_4; > > > + else > > > + fix_exec_size = dest.width < BRW_EXECUTE_8; > > > + > > > + if (fix_exec_size) > > >brw_inst_set_exec_size(devinfo, inst, dest.width); > > > } > > > > > > -- > > > 2.1.4 > > > > > > ___ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 5/5] i965: Skip execution size adjustment for instructions of width 4
On Wed, 2015-12-09 at 08:10 -0800, Jason Ekstrand wrote: > > On Dec 9, 2015 4:16 AM, "Iago Toral Quiroga" > wrote: > > > > This code in brw_set_dest adjusts the execution size of any > instruction > > with a dst.width < 8. However, we don't want to do this with > instructions > > operating on doubles, since these will have a width of 4, but still > > need an execution size of 8 (for SIMD8). Unfortunately, we can't > just check > > the size of the operands involved to detect if we are doing an > operation on > > doubles, because we can have instructions that do operations on > double > > operands interpreted as UD, operating on any of its 2 32-bit > components. > > > > Previous commits have made it so we never emit instructions with a > horizontal > > width of 4 that don't have the correct execution size set for > gen7/gen8, so > > we can skip it in this case, avoiding the conflicts with fp64 > requirements. > > > > Expanding the same fix to other hardware generations requires many > more > > changes but since we are not targetting fp64 support on them > > wer don't really care for now. > > --- > > src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +- > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > index 78f2c8c..50a8771 100644 > > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > @@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst > *inst, struct brw_reg dest) > > /* Generators should set a default exec_size of either 8 > (SIMD4x2 or SIMD8) > > * or 16 (SIMD16), as that's normally correct. However, when > dealing with > > * small registers, we automatically reduce it to match the > register size. > > +* > > +* In platforms that support fp64 we can emit instructions with > a width of > > +* 4 that need two SIMD8 registers and an exec_size of 8 or 16. > In these > > +* cases we need to make sure that these instructions have their > exec sizes > > +* set properly when they are emitted and we can't rely on this > code to fix > > +* it. > > */ > > - if (dest.width < BRW_EXECUTE_8) > > + bool fix_exec_size; > > + if (devinfo->gen == 7 || devinfo->gen == 8) > > If we're doing to take this approach, we definitely want to make it > gen > 6 or something so we include future gens. Really gen > 4 is > probably doable since the only real problem is the legacy clipping > code. Strips and fans is also a problem, but it is certainly doable if we want to do it. Iago > > + fix_exec_size = dest.width < BRW_EXECUTE_4; > > + else > > + fix_exec_size = dest.width < BRW_EXECUTE_8; > > + > > + if (fix_exec_size) > >brw_inst_set_exec_size(devinfo, inst, dest.width); > > } > > > > -- > > 2.1.4 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/shader: return correct attribute location for double matrix arrays
On 12/10/2015 05:41 AM, Dave Airlie wrote: From: Dave Airlie If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19, dmat2[2] at 21 etc. The old code was returning 17,18,19. I think this code is also wrong for float matricies as well. This partly fixes: GL41-CTS.vertex_attrib_64bit.limits_test --- src/mesa/main/shader_query.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp index 5d15006..faaf08c 100644 --- a/src/mesa/main/shader_query.cpp +++ b/src/mesa/main/shader_query.cpp @@ -858,7 +858,7 @@ program_resource_location(struct gl_shader_program *shProg, && array_index >= RESOURCE_VAR(res)->type->length) { return -1; } - return RESOURCE_VAR(res)->data.location + array_index - VERT_ATTRIB_GENERIC0; + return RESOURCE_VAR(res)->data.location + (array_index * RESOURCE_VAR(res)->type->without_array()->matrix_columns) - VERT_ATTRIB_GENERIC0; There are some lines that exceed 80 chars in this file but this seems way too long (?) You could put this case in braces and have a temporary for glsl_type or maybe matrix cols to make it fit. Otherwise seems correct to me. Looks like this has been broken for quite a long time, even before program interface query was introduced. Reviewed-by: Tapani Pälli case GL_PROGRAM_OUTPUT: /* If the output is an array, fail if the index is out of bounds. */ if (array_index > 0 // Tapani ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 44/44] docs: Add ARB_compute_shader to 11.2.0 release notes
On Tue, Dec 01, 2015 at 12:20:02AM -0800, Jordan Justen wrote: > Signed-off-by: Jordan Justen > Cc: Iago Toral Quiroga For the series: Reviewed-by: Kristian Høgsberg Admittedly, light review on patches 17-25 but Iago covered that and the overall refactoring looks sounds. For the new shared load/store intrinsics, we'll want to avoid the _indirect versions, but that may be better done as an update to Jasons series. Kristian > --- > docs/relnotes/11.2.0.html | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/docs/relnotes/11.2.0.html b/docs/relnotes/11.2.0.html > index c9c0c90..eb53e81 100644 > --- a/docs/relnotes/11.2.0.html > +++ b/docs/relnotes/11.2.0.html > @@ -45,6 +45,7 @@ Note: some of the new features are only available with > certain drivers. > > > GL_ARB_base_instance on freedreno/a4xx > +GL_ARB_compute_shader on i965 > GL_ARB_texture_buffer_object_rgb32 on freedreno/a4xx > GL_ARB_texture_buffer_range on freedreno/a4xx > GL_ARB_texture_query_lod on freedreno/a4xx > -- > 2.6.2 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT
On Wed, Dec 9, 2015 at 9:36 PM, Jason Ekstrand wrote: > This is an initial implementation of the MOV_INDIRECT opcode in the vec4 > backend. Unfortunately, I haven't had a chance to test it in the wild yet, > but I think review would still be good. In particular, the approach I took > to handling swizzles. > > Unfortunately, the only indirect MOV instructions you can use in align16 > mode have a uniform indirect. This means that, in order to do an indirect > MOV, we need to either do two movs or use align1 mode. The problem with > two MOVs is that, in order to force first/second half, you have to disable > writemasking so you might as well be in align1. In align1 mode, we have > two options for indirects: We could use height of 2 and make the hardware > grab two sets of 4 consecutive dwords for us or we could use a UV > immediate to add 0, 4, 8, and 12 to the four channels. The second method > only works easily on SNB+ because we don't have UV immediates on ILK and > previous. However, the first method (use a height of 2) may have > interesting hardware implications if any of those sets of 4 dwords ever > crosses a register boundary. I didn't want to count software always giving > us vec4-aligned offsets, so I decided on using the UV immediate. > > Ok, so swizzles. One of the other advantages of adding a UV immediate is > that we can make the immediate be whatever we want. It doesn't have to be > (0, 4, 8, 12); it could be anything. So, I used it to implement swizzling. Correction: This patch doesn't support swizzling. But we could if we wanted. Do we want? > Do we want swizzling? Does it make sense to load a swizzled value starting > at an arbitrary offset? Does it make sense to support swizzling but not > writemasking? I don't know. Thoughts? > > --- > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 45 > > 1 file changed, 45 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > index c3426dd..71a7f63 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > @@ -1052,6 +1052,48 @@ generate_set_simd4x2_header_gen9(struct brw_codegen *p, > } > > static void > +generate_mov_indirect(struct brw_codegen *p, > + vec4_instruction *inst, > + struct brw_reg dst, struct brw_reg reg, > + struct brw_reg indirect, struct brw_reg length) > +{ > + assert(indirect.type == BRW_REGISTER_TYPE_UD); > + > + unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr * (REG_SIZE / 2); > + > + /* This instruction acts in align1 mode */ > + assert(inst->force_writemask_all || reg.writemask == 0xf); > + > + brw_push_insn_state(p); > + brw_set_default_access_mode(p, BRW_ALIGN_1); > + brw_set_default_mask_control(p, BRW_MASK_DISABLE); > + > + struct brw_reg addr = vec2(brw_address_reg(0)); > + > + /* We need to move the indirect value into the address register. In order > +* to make things make some sense, we want to respect at least the X > +* component of the swizzle. In order to do that, we need to convert the > +* subnr (probably 0) to an align1 subnr and add in the swizzle. We then > +* use a region of <8,4,0>:uw to pick off the first 2 bytes of the > indirect > +* and splat it out to all four channels of the given half of a0. > +*/ > + assert(brw_is_single_value_swizzle(indirect.swizzle)); > + indirect.subnr = (indirect.subnr * 4 + BRW_GET_SWZ(indirect.swizzle, 0)) > * 2; > + indirect = stride(retype(indirect, BRW_REGISTER_TYPE_UW), 8, 4, 0); > + > + brw_ADD(p, addr, indirect, brw_imm_uw(imm_byte_offset)); > + > + /* Use a <4,1> region Vx1 region*/ > + struct brw_reg src = brw_VxH_indirect(0, 0); > + src.width = BRW_WIDTH_4; > + src.hstride = BRW_HORIZONTAL_STRIDE_1; > + > + brw_MOV(p, dst, retype(src, reg.type)); > + > + brw_pop_insn_state(p); > +} > + > +static void > generate_code(struct brw_codegen *p, >const struct brw_compiler *compiler, >void *log_data, > @@ -1538,6 +1580,9 @@ generate_code(struct brw_codegen *p, > break; >} > > + case SHADER_OPCODE_MOV_INDIRECT: > + generate_mov_indirect(p, inst, dst, src[0], src[1], src[2]); > + >default: > unreachable("Unsupported opcode"); >} > -- > 2.5.0.400.gff86faf > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT
This is an initial implementation of the MOV_INDIRECT opcode in the vec4 backend. Unfortunately, I haven't had a chance to test it in the wild yet, but I think review would still be good. In particular, the approach I took to handling swizzles. Unfortunately, the only indirect MOV instructions you can use in align16 mode have a uniform indirect. This means that, in order to do an indirect MOV, we need to either do two movs or use align1 mode. The problem with two MOVs is that, in order to force first/second half, you have to disable writemasking so you might as well be in align1. In align1 mode, we have two options for indirects: We could use height of 2 and make the hardware grab two sets of 4 consecutive dwords for us or we could use a UV immediate to add 0, 4, 8, and 12 to the four channels. The second method only works easily on SNB+ because we don't have UV immediates on ILK and previous. However, the first method (use a height of 2) may have interesting hardware implications if any of those sets of 4 dwords ever crosses a register boundary. I didn't want to count software always giving us vec4-aligned offsets, so I decided on using the UV immediate. Ok, so swizzles. One of the other advantages of adding a UV immediate is that we can make the immediate be whatever we want. It doesn't have to be (0, 4, 8, 12); it could be anything. So, I used it to implement swizzling. Do we want swizzling? Does it make sense to load a swizzled value starting at an arbitrary offset? Does it make sense to support swizzling but not writemasking? I don't know. Thoughts? --- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 45 1 file changed, 45 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index c3426dd..71a7f63 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1052,6 +1052,48 @@ generate_set_simd4x2_header_gen9(struct brw_codegen *p, } static void +generate_mov_indirect(struct brw_codegen *p, + vec4_instruction *inst, + struct brw_reg dst, struct brw_reg reg, + struct brw_reg indirect, struct brw_reg length) +{ + assert(indirect.type == BRW_REGISTER_TYPE_UD); + + unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr * (REG_SIZE / 2); + + /* This instruction acts in align1 mode */ + assert(inst->force_writemask_all || reg.writemask == 0xf); + + brw_push_insn_state(p); + brw_set_default_access_mode(p, BRW_ALIGN_1); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + + struct brw_reg addr = vec2(brw_address_reg(0)); + + /* We need to move the indirect value into the address register. In order +* to make things make some sense, we want to respect at least the X +* component of the swizzle. In order to do that, we need to convert the +* subnr (probably 0) to an align1 subnr and add in the swizzle. We then +* use a region of <8,4,0>:uw to pick off the first 2 bytes of the indirect +* and splat it out to all four channels of the given half of a0. +*/ + assert(brw_is_single_value_swizzle(indirect.swizzle)); + indirect.subnr = (indirect.subnr * 4 + BRW_GET_SWZ(indirect.swizzle, 0)) * 2; + indirect = stride(retype(indirect, BRW_REGISTER_TYPE_UW), 8, 4, 0); + + brw_ADD(p, addr, indirect, brw_imm_uw(imm_byte_offset)); + + /* Use a <4,1> region Vx1 region*/ + struct brw_reg src = brw_VxH_indirect(0, 0); + src.width = BRW_WIDTH_4; + src.hstride = BRW_HORIZONTAL_STRIDE_1; + + brw_MOV(p, dst, retype(src, reg.type)); + + brw_pop_insn_state(p); +} + +static void generate_code(struct brw_codegen *p, const struct brw_compiler *compiler, void *log_data, @@ -1538,6 +1580,9 @@ generate_code(struct brw_codegen *p, break; } + case SHADER_OPCODE_MOV_INDIRECT: + generate_mov_indirect(p, inst, dst, src[0], src[1], src[2]); + default: unreachable("Unsupported opcode"); } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/shader: return correct attribute location for double matrix arrays
On Thu, 2015-12-10 at 13:41 +1000, Dave Airlie wrote: > From: Dave Airlie > > If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19, > dmat2[2] at 21 etc. The old code was returning 17,18,19. > > I think this code is also wrong for float matricies as well. Would be good to have a piglit for the float case. Reviewed-by: Timothy Arceri > > This partly fixes: > GL41-CTS.vertex_attrib_64bit.limits_test > --- > src/mesa/main/shader_query.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/main/shader_query.cpp > b/src/mesa/main/shader_query.cpp > index 5d15006..faaf08c 100644 > --- a/src/mesa/main/shader_query.cpp > +++ b/src/mesa/main/shader_query.cpp > @@ -858,7 +858,7 @@ program_resource_location(struct > gl_shader_program *shProg, >&& array_index >= RESOURCE_VAR(res)->type->length) { > return -1; >} > - return RESOURCE_VAR(res)->data.location + array_index - > VERT_ATTRIB_GENERIC0; > + return RESOURCE_VAR(res)->data.location + (array_index * > RESOURCE_VAR(res)->type->without_array()->matrix_columns) - > VERT_ATTRIB_GENERIC0; > case GL_PROGRAM_OUTPUT: >/* If the output is an array, fail if the index is out of > bounds. */ >if (array_index > 0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/15] i965/fs: Use MOV_INDIRECT for all indirect uniform loads
On Wed, Dec 9, 2015 at 8:23 PM, Jason Ekstrand wrote: > Instead of using reladdr, this commit changes the FS backend to emit a > MOV_INDIRECT whenever we need an indirect uniform load. We also have to > rework some of the other bits of the backend to handle this new form of > uniform load. The obvious change is that demote_pull_constants now acts > more like a lowering pass when it hits a MOV_INDIRECT. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 72 > +++- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++- > 2 files changed, 86 insertions(+), 39 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index bf446d2..7cc03c5 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1945,8 +1945,8 @@ fs_visitor::assign_constant_locations() > if (inst->src[i].file != UNIFORM) > continue; > > - if (inst->src[i].reladdr) { > -int uniform = inst->src[i].nr; > + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { > +int uniform = inst->src[0].nr; > > /* If this array isn't already present in the pull constant > buffer, > * add it. > @@ -2028,49 +2028,63 @@ fs_visitor::assign_constant_locations() > void > fs_visitor::demote_pull_constants() > { > - foreach_block_and_inst (block, fs_inst, inst, cfg) { > + const unsigned index = > stage_prog_data->binding_table.pull_constants_start; > + > + foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { > + /* Set up the annotation tracking for new generated instructions. */ > + const fs_builder ibld(this, block, inst); > + >for (int i = 0; i < inst->sources; i++) { > if (inst->src[i].file != UNIFORM) > continue; > > - int pull_index; > + /* We'll handle this case later */ > + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) > +continue; > + > unsigned location = inst->src[i].nr + inst->src[i].reg_offset; > - if (location >= uniforms) /* Out of bounds access */ > -pull_index = -1; > - else > -pull_index = pull_constant_loc[location]; > + if (location >= uniforms) > +continue; /* Out of bounds access */ > + > + int pull_index = pull_constant_loc[location]; > > if (pull_index == -1) > continue; > > - /* Set up the annotation tracking for new generated instructions. */ > - const fs_builder ibld(this, block, inst); > - const unsigned index = > stage_prog_data->binding_table.pull_constants_start; > - fs_reg dst = vgrf(glsl_type::float_type); > - > assert(inst->src[i].stride == 0); > > - /* Generate a pull load into dst. */ > - if (inst->src[i].reladdr) { > -VARYING_PULL_CONSTANT_LOAD(ibld, dst, > - brw_imm_ud(index), > - *inst->src[i].reladdr, > - pull_index * 4); > -inst->src[i].reladdr = NULL; > -inst->src[i].stride = 1; > - } else { > -const fs_builder ubld = ibld.exec_all().group(8, 0); > -struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & > ~15); > -ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, > - dst, brw_imm_ud(index), offset); > -inst->src[i].set_smear(pull_index & 3); > - } > - brw_mark_surface_used(prog_data, index); > + fs_reg dst = vgrf(glsl_type::float_type); > + const fs_builder ubld = ibld.exec_all().group(8, 0); > + struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & > ~15); > + ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, > + dst, brw_imm_ud(index), offset); > > /* Rewrite the instruction to use the temporary VGRF. */ > inst->src[i].file = VGRF; > inst->src[i].nr = dst.nr; > inst->src[i].reg_offset = 0; > + inst->src[i].set_smear(pull_index & 3); > + > + brw_mark_surface_used(prog_data, index); > + } > + > + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && > + inst->src[0].file == UNIFORM) { > + > + unsigned location = inst->src[0].nr + inst->src[0].reg_offset; > + if (location >= uniforms) > +continue; /* Out of bounds access */ > + > + int pull_index = pull_constant_loc[location]; > + assert(pull_index >= 0); /* This had better be pull */ > + > + VARYING_PULL_CONSTANT_LOAD(ibld, inst->dst, > +brw_imm_ud(index), > +inst->src[1], > +pull_index * 4); > + inst->remove(block); > + > + brw_mark
[Mesa-dev] [PATCH] draw: fix clipping with linear interpolated values and gl_ClipVertex
From: Roland Scheidegger Discovered this when working on other clip code, apparently didn't work correctly - the combination of linear interpolated values and using gl_ClipVertex produced wrong values (failing all such combinations in piglits glsl-1.30 interpolation tests). Use the pre-clip-pos values when determining the interpolation factor to fix this. Unfortunately I have no idea what I'm doing here really, but it fixes all these failures in piglit (all interpolation-noperspective-XXX-vertex, 10 tests in total). Albeit piglit coverage of clipping isn't great, so hopefully someone can confirm this actually makes sense, and wouldn't cause failures elsewhere... --- src/gallium/auxiliary/draw/draw_pipe_clip.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c b/src/gallium/auxiliary/draw/draw_pipe_clip.c index f2b56b0..7f22eef 100644 --- a/src/gallium/auxiliary/draw/draw_pipe_clip.c +++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c @@ -192,11 +192,11 @@ static void interp(const struct clip_stage *clip, t_nopersp = t; /* find either in.x != out.x or in.y != out.y */ for (k = 0; k < 2; k++) { - if (in->clip[k] != out->clip[k]) { + if (in->pre_clip_pos[k] != out->pre_clip_pos[k]) { /* do divide by W, then compute linear interpolation factor */ -float in_coord = in->clip[k] / in->clip[3]; -float out_coord = out->clip[k] / out->clip[3]; -float dst_coord = dst->clip[k] / dst->clip[3]; +float in_coord = in->pre_clip_pos[k] / in->pre_clip_pos[3]; +float out_coord = out->pre_clip_pos[k] / out->pre_clip_pos[3]; +float dst_coord = dst->pre_clip_pos[k] / dst->pre_clip_pos[3]; t_nopersp = (dst_coord - out_coord) / (in_coord - out_coord); break; } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 05/12] nir/lower_io: Get rid of load/store_foo_indirect
On Tue, Dec 8, 2015 at 6:25 PM, Kenneth Graunke wrote: > On Tuesday, December 08, 2015 01:46:22 PM Jason Ekstrand wrote: >> diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c >> index f64ac69..a2723d5 100644 >> --- a/src/glsl/nir/nir_lower_io.c >> +++ b/src/glsl/nir/nir_lower_io.c >> @@ -333,18 +311,18 @@ nir_lower_io(nir_shader *shader, nir_variable_mode >> mode, >> * Return the indirect source for a load/store indirect intrinsic. > > This comment could use updating. How about: > > /** > * Return the offset soruce for a load/store intrinsic. > */ Good call. Fixed locally. --Jason >> */ >> nir_src * >> -nir_get_io_indirect_src(nir_intrinsic_instr *instr) >> +nir_get_io_offset_src(nir_intrinsic_instr *instr) >> { >> switch (instr->intrinsic) { >> - case nir_intrinsic_load_input_indirect: >> - case nir_intrinsic_load_output_indirect: >> - case nir_intrinsic_load_uniform_indirect: >> + case nir_intrinsic_load_input: >> + case nir_intrinsic_load_output: >> + case nir_intrinsic_load_uniform: >>return &instr->src[0]; >> - case nir_intrinsic_load_per_vertex_input_indirect: >> - case nir_intrinsic_load_per_vertex_output_indirect: >> - case nir_intrinsic_store_output_indirect: >> + case nir_intrinsic_load_per_vertex_input: >> + case nir_intrinsic_load_per_vertex_output: >> + case nir_intrinsic_store_output: >>return &instr->src[1]; >> - case nir_intrinsic_store_per_vertex_output_indirect: >> + case nir_intrinsic_store_per_vertex_output: >>return &instr->src[2]; >> default: >>return NULL; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/15] i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD
This is to prevent shader-db regressions from D <-> UD conversions in deref add+mul chains caused by using MOV_INDIRECT. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 5e8acec..bf446d2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -174,7 +174,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld, * CSE can later notice that those loads are all the same and eliminate * the redundant ones. */ - fs_reg vec4_offset = vgrf(glsl_type::int_type); + fs_reg vec4_offset = vgrf(glsl_type::uint_type); bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~0xf)); int scale = 1; -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/15] i965/fs: Stop relying on param_size in assign_constant_locations
Now that we have MOV_INDIRECT opcodes, we have all of the size information we need directly in the opcode. With a little restructuring of the algorithm used in assign_constant_locations we don't need param_size anymore. The big thing to watch out for now, however, is that you can have two ranges overlap where neither contains the other. In order to deal with this, we make the first pass just flag what needs pulling and handle assigning pull constant locations until later. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 44 ++-- 1 file changed, 17 insertions(+), 27 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 786c5fb..1add656 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1920,14 +1920,12 @@ fs_visitor::assign_constant_locations() if (dispatch_width != 8) return; - unsigned int num_pull_constants = 0; - - pull_constant_loc = ralloc_array(mem_ctx, int, uniforms); - memset(pull_constant_loc, -1, sizeof(pull_constant_loc[0]) * uniforms); - bool is_live[uniforms]; memset(is_live, 0, sizeof(is_live)); + bool needs_pull[uniforms]; + memset(needs_pull, 0, sizeof(is_live)); + /* First, we walk through the instructions and do two things: * * 1) Figure out which uniforms are live. @@ -1943,20 +1941,15 @@ fs_visitor::assign_constant_locations() if (inst->src[i].file != UNIFORM) continue; - if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { -int uniform = inst->src[0].nr; + int constant_nr = inst->src[i].nr + inst->src[i].reg_offset; -/* If this array isn't already present in the pull constant buffer, - * add it. - */ -if (pull_constant_loc[uniform] == -1) { - assert(param_size[uniform]); - for (int j = 0; j < param_size[uniform]; j++) - pull_constant_loc[uniform + j] = num_pull_constants++; + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { +for (unsigned j = 0; j < inst->src[2].ud / 4; j++) { + is_live[constant_nr + j] = true; + needs_pull[constant_nr + j] = true; } } else { /* Mark the the one accessed uniform as live */ -int constant_nr = inst->src[i].nr + inst->src[i].reg_offset; if (constant_nr >= 0 && constant_nr < (int) uniforms) is_live[constant_nr] = true; } @@ -1973,26 +1966,23 @@ fs_visitor::assign_constant_locations() */ unsigned int max_push_components = 16 * 8; unsigned int num_push_constants = 0; + unsigned int num_pull_constants = 0; push_constant_loc = ralloc_array(mem_ctx, int, uniforms); + pull_constant_loc = ralloc_array(mem_ctx, int, uniforms); for (unsigned int i = 0; i < uniforms; i++) { - if (!is_live[i] || pull_constant_loc[i] != -1) { - /* This UNIFORM register is either dead, or has already been demoted - * to a pull const. Mark it as no longer living in the param[] array. - */ - push_constant_loc[i] = -1; + push_constant_loc[i] = -1; + pull_constant_loc[i] = -1; + + if (!is_live[i]) continue; - } - if (num_push_constants < max_push_components) { - /* Retain as a push constant. Record the location in the params[] - * array. - */ + if (!needs_pull[i] && num_push_constants < max_push_components) { + /* Retain as a push constant */ push_constant_loc[i] = num_push_constants++; } else { - /* Demote to a pull constant. */ - push_constant_loc[i] = -1; + /* We have to pull it */ pull_constant_loc[i] = num_pull_constants++; } } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/15] i965/fs: Add support for doing MOV_INDIRECT on uniforms
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 9b06ed2..de33c1d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -848,7 +848,10 @@ fs_inst::regs_read(int arg) const assert(src[2].file == IMM); unsigned region_length = src[2].ud; - if (src[0].file == FIXED_GRF) { + if (src[0].file == UNIFORM) { +assert(region_length % 4 == 0); +return region_length / 4; + } else if (src[0].file == FIXED_GRF) { /* If the start of the region is not register aligned, then * there's some portion of the register that's technically * unread at the beginning. -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/15] i965/fs: Get rid of reladdr
We aren't using it anymore. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 7 +-- src/mesa/drivers/dri/i965/brw_ir_fs.h | 5 + 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 7cc03c5..786c5fb 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -433,7 +433,6 @@ fs_reg::fs_reg(struct ::brw_reg reg) : { this->reg_offset = 0; this->subreg_offset = 0; - this->reladdr = NULL; this->stride = 1; if (this->file == IMM && (this->type != BRW_REGISTER_TYPE_V && @@ -448,7 +447,6 @@ fs_reg::equals(const fs_reg &r) const { return (this->backend_reg::equals(r) && subreg_offset == r.subreg_offset && - !reladdr && !r.reladdr && stride == r.stride); } @@ -4716,9 +4714,7 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) break; case UNIFORM: fprintf(file, "u%d", inst->src[i].nr + inst->src[i].reg_offset); - if (inst->src[i].reladdr) { -fprintf(file, "+reladdr"); - } else if (inst->src[i].subreg_offset) { + if (inst->src[i].subreg_offset) { fprintf(file, "+%d.%d", inst->src[i].reg_offset, inst->src[i].subreg_offset); } @@ -4829,7 +4825,6 @@ fs_visitor::get_instruction_generating_reg(fs_inst *start, { if (end == start || end->is_partial_write() || - reg.reladdr || !reg.equals(end->dst)) { return NULL; } else { diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h index c3eec2e..e4f20f4 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h @@ -58,8 +58,6 @@ public: */ int subreg_offset; - fs_reg *reladdr; - /** Register region horizontal stride */ uint8_t stride; }; @@ -136,8 +134,7 @@ component(fs_reg reg, unsigned idx) static inline bool is_uniform(const fs_reg ®) { - return (reg.stride == 0 || reg.is_null()) && - (!reg.reladdr || is_uniform(*reg.reladdr)); + return (reg.stride == 0 || reg.is_null()); } /** -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/15] i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants
This commit moves us to an instruction based model rather than a register-based model for indirects. This is more accurate anyway as we have to emit instructions to resolve the reladdr. It's also a lot simpler because it gets rid of the recursive reladdr problem by design. One side-effect of this is that we need a whole new algorithm in move_uniform_array_access_to_pull_constants. This new algorithm is much more straightforward than the old one and is fairly similar to what we're already doing in the FS backend. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4.h | 3 +- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 +-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 86 -- 4 files changed, 50 insertions(+), 51 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index a697bdf..e4a405b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -775,7 +775,7 @@ vec4_visitor::move_push_constants_to_pull_constants() dst_reg temp = dst_reg(this, glsl_type::vec4_type); emit_pull_constant_load(block, inst, temp, inst->src[i], -pull_constant_loc[uniform]); +pull_constant_loc[uniform], src_reg()); inst->src[i].file = temp.file; inst->src[i].nr = temp.nr; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index f2e5ce1..e6d6c82 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -293,7 +293,8 @@ public: void emit_pull_constant_load(bblock_t *block, vec4_instruction *inst, dst_reg dst, src_reg orig_src, - int base_offset); + int base_offset, +src_reg indirect); void emit_pull_constant_load_reg(dst_reg dst, src_reg surf_index, src_reg offset, diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index f965b39..58b6612 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -673,12 +673,14 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) /* Offsets are in bytes but they should always be multiples of 16 */ assert(const_offset->u[0] % 16 == 0); src.reg_offset = const_offset->u[0] / 16; + + emit(MOV(dest, src)); } else { - src_reg tmp = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_D, 1); - src.reladdr = new(mem_ctx) src_reg(tmp); - } + src_reg indirect = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_UD, 1); - emit(MOV(dest, src)); + emit(SHADER_OPCODE_MOV_INDIRECT, dest, src, + indirect, brw_imm_ud(instr->const_index[1])); + } break; } diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 7712d34..e7ab536 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1641,16 +1641,16 @@ vec4_visitor::move_grf_array_access_to_scratch() void vec4_visitor::emit_pull_constant_load(bblock_t *block, vec4_instruction *inst, dst_reg temp, src_reg orig_src, - int base_offset) + int base_offset, src_reg indirect) { int reg_offset = base_offset + orig_src.reg_offset; const unsigned index = prog_data->base.binding_table.pull_constants_start; src_reg offset; - if (orig_src.reladdr) { + if (indirect.file != BAD_FILE) { offset = src_reg(this, glsl_type::int_type); - emit_before(block, inst, ADD(dst_reg(offset), *orig_src.reladdr, + emit_before(block, inst, ADD(dst_reg(offset), indirect, brw_imm_d(reg_offset * 16))); } else if (devinfo->gen >= 8) { /* Store the offset in a GRF so we can send-from-GRF. */ @@ -1685,59 +1685,55 @@ vec4_visitor::move_uniform_array_access_to_pull_constants() { int pull_constant_loc[this->uniforms]; memset(pull_constant_loc, -1, sizeof(pull_constant_loc)); - bool nested_reladdr; - /* Walk through and find array access of uniforms. Put a copy of that -* uniform in the pull constant buffer. -* -* Note that we don't move constant-indexed accesses to arrays. No -* testing has been done of the performance impact of this choice. + /* First, walk through the instructions and determine which things need to +* be pulled. We mark something as needing to bye pulled by setting +* pull_constant_loc to 0. */ - do { - nested_r
[Mesa-dev] [PATCH 03/15] i965/fs: Use MOV_INDIRECT for all indirect uniform loads
Instead of using reladdr, this commit changes the FS backend to emit a MOV_INDIRECT whenever we need an indirect uniform load. We also have to rework some of the other bits of the backend to handle this new form of uniform load. The obvious change is that demote_pull_constants now acts more like a lowering pass when it hits a MOV_INDIRECT. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 72 +++- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++- 2 files changed, 86 insertions(+), 39 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index bf446d2..7cc03c5 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1945,8 +1945,8 @@ fs_visitor::assign_constant_locations() if (inst->src[i].file != UNIFORM) continue; - if (inst->src[i].reladdr) { -int uniform = inst->src[i].nr; + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { +int uniform = inst->src[0].nr; /* If this array isn't already present in the pull constant buffer, * add it. @@ -2028,49 +2028,63 @@ fs_visitor::assign_constant_locations() void fs_visitor::demote_pull_constants() { - foreach_block_and_inst (block, fs_inst, inst, cfg) { + const unsigned index = stage_prog_data->binding_table.pull_constants_start; + + foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { + /* Set up the annotation tracking for new generated instructions. */ + const fs_builder ibld(this, block, inst); + for (int i = 0; i < inst->sources; i++) { if (inst->src[i].file != UNIFORM) continue; - int pull_index; + /* We'll handle this case later */ + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) +continue; + unsigned location = inst->src[i].nr + inst->src[i].reg_offset; - if (location >= uniforms) /* Out of bounds access */ -pull_index = -1; - else -pull_index = pull_constant_loc[location]; + if (location >= uniforms) +continue; /* Out of bounds access */ + + int pull_index = pull_constant_loc[location]; if (pull_index == -1) continue; - /* Set up the annotation tracking for new generated instructions. */ - const fs_builder ibld(this, block, inst); - const unsigned index = stage_prog_data->binding_table.pull_constants_start; - fs_reg dst = vgrf(glsl_type::float_type); - assert(inst->src[i].stride == 0); - /* Generate a pull load into dst. */ - if (inst->src[i].reladdr) { -VARYING_PULL_CONSTANT_LOAD(ibld, dst, - brw_imm_ud(index), - *inst->src[i].reladdr, - pull_index * 4); -inst->src[i].reladdr = NULL; -inst->src[i].stride = 1; - } else { -const fs_builder ubld = ibld.exec_all().group(8, 0); -struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & ~15); -ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, - dst, brw_imm_ud(index), offset); -inst->src[i].set_smear(pull_index & 3); - } - brw_mark_surface_used(prog_data, index); + fs_reg dst = vgrf(glsl_type::float_type); + const fs_builder ubld = ibld.exec_all().group(8, 0); + struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & ~15); + ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, + dst, brw_imm_ud(index), offset); /* Rewrite the instruction to use the temporary VGRF. */ inst->src[i].file = VGRF; inst->src[i].nr = dst.nr; inst->src[i].reg_offset = 0; + inst->src[i].set_smear(pull_index & 3); + + brw_mark_surface_used(prog_data, index); + } + + if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && + inst->src[0].file == UNIFORM) { + + unsigned location = inst->src[0].nr + inst->src[0].reg_offset; + if (location >= uniforms) +continue; /* Out of bounds access */ + + int pull_index = pull_constant_loc[location]; + assert(pull_index >= 0); /* This had better be pull */ + + VARYING_PULL_CONSTANT_LOAD(ibld, inst->dst, +brw_imm_ud(index), +inst->src[1], +pull_index * 4); + inst->remove(block); + + brw_mark_surface_used(prog_data, index); } } invalidate_live_intervals(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 15d9b1c..bf239c3 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i
[Mesa-dev] [PATCH 01/15] nir: Add another index to load_uniform to specify the range read
--- src/glsl/nir/nir_intrinsics.h | 7 +-- src/glsl/nir/nir_lower_io.c | 5 + 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index 63df21e..c329a82 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -236,6 +236,9 @@ SYSTEM_VALUE(helper_invocation, 1, 0) * of the start of the variable being loaded and and the offset source is a * offset into that variable. * + * Uniform load operations have a second index that specifies the size of the + * variable being loaded. If const_index[1] == 0, then the size is unknown. + * * Some load operations such as UBO/SSBO load and per_vertex loads take an * additional source to specify which UBO/SSBO/vertex to load from. * @@ -248,8 +251,8 @@ SYSTEM_VALUE(helper_invocation, 1, 0) #define LOAD(name, srcs, indices, flags) \ INTRINSIC(load_##name, srcs, ARR(1, 1, 1, 1), true, 0, 0, indices, flags) -/* src[] = { offset }. const_index[] = { base } */ -LOAD(uniform, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) +/* src[] = { offset }. const_index[] = { base, size } */ +LOAD(uniform, 1, 2, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) /* src[] = { buffer_index, offset }. No const_index */ LOAD(ubo, 2, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) /* src[] = { offset }. const_index[0] = { base } */ diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c index a2723d5..b73ceec 100644 --- a/src/glsl/nir/nir_lower_io.c +++ b/src/glsl/nir/nir_lower_io.c @@ -216,6 +216,11 @@ nir_lower_io_block(nir_block *block, void *void_state) load->const_index[0] = intrin->variables[0]->var->data.driver_location; + if (load->intrinsic == nir_intrinsic_load_uniform) { +load->const_index[1] = + state->type_size(intrin->variables[0]->var->type); + } + if (per_vertex) load->src[0] = nir_src_for_ssa(vertex_index); -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/15] i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
It should work fine without it and the visitor can set it if it wants. --- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 1 - 1 file changed, 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index c25da07..d86eee1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -366,7 +366,6 @@ fs_generator::generate_mov_indirect(fs_inst *inst, assert(inst->exec_size == 8 || devinfo->gen >= 8); brw_MOV(p, addr, indirect_byte_offset); - brw_inst_set_mask_control(devinfo, brw_last_inst, BRW_MASK_DISABLE); brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type)); } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/15] i965/vec4: Inline get_pull_constant_offset
It's not really doing enough anymore to justify a helper function. --- src/mesa/drivers/dri/i965/brw_vec4.h | 2 -- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 37 ++ 2 files changed, 14 insertions(+), 25 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index ae5bf69..f2e5ce1 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -284,8 +284,6 @@ public: src_reg get_scratch_offset(bblock_t *block, vec4_instruction *inst, src_reg *reladdr, int reg_offset); - src_reg get_pull_constant_offset(bblock_t *block, vec4_instruction *inst, - src_reg *reladdr, int reg_offset); void emit_scratch_read(bblock_t *block, vec4_instruction *inst, dst_reg dst, src_reg orig_src, diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 443d0eb..7712d34 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1464,27 +1464,6 @@ vec4_visitor::get_scratch_offset(bblock_t *block, vec4_instruction *inst, } } -src_reg -vec4_visitor::get_pull_constant_offset(bblock_t * block, vec4_instruction *inst, - src_reg *reladdr, int reg_offset) -{ - if (reladdr) { - src_reg index = src_reg(this, glsl_type::int_type); - - emit_before(block, inst, ADD(dst_reg(index), *reladdr, - brw_imm_d(reg_offset * 16))); - - return index; - } else if (devinfo->gen >= 8) { - /* Store the offset in a GRF so we can send-from-GRF. */ - src_reg offset = src_reg(this, glsl_type::int_type); - emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset * 16))); - return offset; - } else { - return brw_imm_d(reg_offset * 16); - } -} - /** * Emits an instruction before @inst to load the value named by @orig_src * from scratch space at @base_offset to @temp. @@ -1666,8 +1645,20 @@ vec4_visitor::emit_pull_constant_load(bblock_t *block, vec4_instruction *inst, { int reg_offset = base_offset + orig_src.reg_offset; const unsigned index = prog_data->base.binding_table.pull_constants_start; - src_reg offset = get_pull_constant_offset(block, inst, orig_src.reladdr, - reg_offset); + + src_reg offset; + if (orig_src.reladdr) { + offset = src_reg(this, glsl_type::int_type); + + emit_before(block, inst, ADD(dst_reg(offset), *orig_src.reladdr, + brw_imm_d(reg_offset * 16))); + } else if (devinfo->gen >= 8) { + /* Store the offset in a GRF so we can send-from-GRF. */ + offset = src_reg(this, glsl_type::int_type); + emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset * 16))); + } else { + offset = brw_imm_d(reg_offset * 16); + } emit_pull_constant_load_reg(temp, brw_imm_ud(index), -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/15] i965/vec4: Get rid of the uniform_size array
--- src/mesa/drivers/dri/i965/brw_vec4.cpp| 8 src/mesa/drivers/dri/i965/brw_vec4.h | 2 -- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp| 9 - src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 11 --- src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 1 - 5 files changed, 31 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e4a405b..1304e23 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -466,11 +466,6 @@ vec4_visitor::split_uniform_registers() inst->src[i].reg_offset = 0; } } - - /* Update that everything is now vector-sized. */ - for (int i = 0; i < this->uniforms; i++) { - this->uniform_size[i] = 1; - } } void @@ -528,7 +523,6 @@ vec4_visitor::pack_uniform_registers() * push constants. */ for (int src = 0; src < uniforms; src++) { - assert(src < uniform_array_size); int size = chans_used[src]; if (size == 0) @@ -1588,8 +1582,6 @@ vec4_visitor::setup_uniforms(int reg) * matter what, or the GPU would hang. */ if (devinfo->gen < 6 && this->uniforms == 0) { - assert(this->uniforms < this->uniform_array_size); - stage_prog_data->param = reralloc(NULL, stage_prog_data->param, const gl_constant_value *, 4); for (unsigned int i = 0; i < 4; i++) { diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index e6d6c82..0dc04ea 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -115,8 +115,6 @@ public: */ dst_reg output_reg[BRW_VARYING_SLOT_COUNT]; const char *output_reg_annotation[BRW_VARYING_SLOT_COUNT]; - int *uniform_size; - int uniform_array_size; /*< Size of the uniform_size array */ int uniforms; src_reg shader_start_time; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 58b6612..bafc9a5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -118,15 +118,6 @@ void vec4_visitor::nir_setup_uniforms() { uniforms = nir->num_uniforms / 16; - - nir_foreach_variable(var, &nir->uniforms) { - /* UBO's and atomics don't take up space in the uniform file */ - if (var->interface_type != NULL || var->type->contains_atomic()) - continue; - - if (type_size_vec4(var->type) > 0) - uniform_size[var->data.driver_location / 16] = type_size_vec4(var->type); - } } void diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index e7ab536..138db7e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1786,17 +1786,6 @@ vec4_visitor::vec4_visitor(const struct brw_compiler *compiler, this->max_grf = devinfo->gen >= 7 ? GEN7_MRF_HACK_START : BRW_MAX_GRF; this->uniforms = 0; - - /* Initialize uniform_array_size to at least 1 because pre-gen6 VS requires -* at least one. See setup_uniforms() in brw_vec4.cpp. -*/ - this->uniform_array_size = 1; - if (prog_data) { - this->uniform_array_size = - MAX2(DIV_ROUND_UP(stage_prog_data->nr_params, 4), 1); - } - - this->uniform_size = rzalloc_array(mem_ctx, int, this->uniform_array_size); } vec4_visitor::~vec4_visitor() diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp index fd8be7d..205323c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp @@ -261,7 +261,6 @@ void vec4_vs_visitor::setup_uniform_clipplane_values() { for (int i = 0; i < key->nr_userclip_plane_consts; ++i) { - assert(this->uniforms < uniform_array_size); this->userplane[i] = dst_reg(UNIFORM, this->uniforms); this->userplane[i].type = BRW_REGISTER_TYPE_F; for (int j = 0; j < 4; ++j) { -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 15/15] i965/fs: Push small uniform arrays
Unfortunately, this also means that we need to use a slightly different algorithm for assign_constant_locations. The old algorithm worked based on the assumption that each read of a uniform value read exactly one float. If it encountered a MOV_INDIRECT, it would immediately bail and push the whole thing. Since we can now read ranges using MOV_INDIRECT, we need to be able to push a series of floats without breaking them up. To do this, we use an algorithm similar to the on in split_virtual_grfs. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 76 +--- 1 file changed, 53 insertions(+), 23 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index f3cf129..98f8336 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1911,9 +1911,7 @@ fs_visitor::compact_virtual_grfs() * maximum number of fragment shader uniform components (64). If * there are too many of these, they'd fill up all of register space. * So, this will push some of them out to the pull constant buffer and - * update the program to load them. We also use pull constants for all - * indirect constant loads because we don't support indirect accesses in - * registers yet. + * update the program to load them. */ void fs_visitor::assign_constant_locations() @@ -1925,15 +1923,18 @@ fs_visitor::assign_constant_locations() bool is_live[uniforms]; memset(is_live, 0, sizeof(is_live)); - bool needs_pull[uniforms]; - memset(needs_pull, 0, sizeof(is_live)); + /* For each uniform slot, a value of true indicates that the given slot and +* the next slot must remain contiguous. This is used to keep us from +* splitting arrays apart. +*/ + bool contiguous[uniforms]; + memset(contiguous, 0, sizeof(contiguous)); /* First, we walk through the instructions and do two things: * * 1) Figure out which uniforms are live. * -* 2) Find all indirect access of uniform arrays and flag them as needing -* to go into the pull constant buffer. +* 2) Mark any indirectly used ranges of registers as contiguous. * * Note that we don't move constant-indexed accesses to arrays. No * testing has been done of the performance impact of this choice. @@ -1946,12 +1947,16 @@ fs_visitor::assign_constant_locations() int constant_nr = inst->src[i].nr + inst->src[i].reg_offset; if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { -for (unsigned j = 0; j < inst->src[2].ud / 4; j++) { - is_live[constant_nr + j] = true; - needs_pull[constant_nr + j] = true; +assert(inst->src[2].ud % 4 == 0); +unsigned last = constant_nr + (inst->src[2].ud / 4) - 1; +assert(last < uniforms); + +for (unsigned j = constant_nr; j < last; j++) { + is_live[j] = true; + contiguous[j] = true; } +is_live[last] = true; } else { -/* Mark the the one accessed uniform as live */ if (constant_nr >= 0 && constant_nr < (int) uniforms) is_live[constant_nr] = true; } @@ -1966,26 +1971,49 @@ fs_visitor::assign_constant_locations() * If changing this value, note the limitation about total_regs in * brw_curbe.c. */ - unsigned int max_push_components = 16 * 8; + const unsigned int max_push_components = 16 * 8; + + /* We push small arrays, but no bigger than 16 floats. This is big enough +* for a vec4 but hopefully not large enough to push out other stuff. We +* should probably use a better heuristic at some point. +*/ + const unsigned int max_chunk_size = 16; + unsigned int num_push_constants = 0; unsigned int num_pull_constants = 0; push_constant_loc = ralloc_array(mem_ctx, int, uniforms); pull_constant_loc = ralloc_array(mem_ctx, int, uniforms); - for (unsigned int i = 0; i < uniforms; i++) { - push_constant_loc[i] = -1; - pull_constant_loc[i] = -1; + int chunk_start = -1; + for (unsigned u = 0; u < uniforms; u++) { + push_constant_loc[u] = -1; + pull_constant_loc[u] = -1; - if (!is_live[i]) + if (!is_live[u]) continue; - if (!needs_pull[i] && num_push_constants < max_push_components) { - /* Retain as a push constant */ - push_constant_loc[i] = num_push_constants++; - } else { - /* We have to pull it */ - pull_constant_loc[i] = num_pull_constants++; + /* This is the first live uniform in the chunk */ + if (chunk_start < 0) + chunk_start = u; + + /* If this element does not need to be contiguous with the next, we + * split at this point and everthing between chunk_start and u forms a + * single chunk. + */ + if (!contiguous[u]) { + unsigned chunk_size = u - chunk_start + 1; + + if (n
[Mesa-dev] [PATCH 12/15] i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
The subnr field is in bytes so we don't need to multiply by type_sz. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index de33c1d..9eaf8d0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -865,7 +865,7 @@ fs_inst::regs_read(int arg) const * unread portion at the beginning. */ if (src[0].subnr) - region_length += src[0].subnr * type_sz(src[0].type); + region_length += src[0].subnr; return DIV_ROUND_UP(region_length, REG_SIZE); } else { -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/15] i965/fs: Get rid of the param_size array
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 1 - src/mesa/drivers/dri/i965/brw_fs.h | 2 -- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 9 - src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 --- 4 files changed, 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 1add656..9b06ed2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1018,7 +1018,6 @@ fs_visitor::import_uniforms(fs_visitor *v) this->push_constant_loc = v->push_constant_loc; this->pull_constant_loc = v->pull_constant_loc; this->uniforms = v->uniforms; - this->param_size = v->param_size; } fs_reg * diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index b55589f..b42c49d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -311,8 +311,6 @@ public: struct brw_stage_prog_data *prog_data; struct gl_program *prog; - int *param_size; - int *virtual_grf_start; int *virtual_grf_end; brw::fs_live_variables *live_intervals; diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index bf239c3..22323b4 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -174,15 +174,6 @@ fs_visitor::nir_setup_uniforms() return; uniforms = nir->num_uniforms / 4; - - nir_foreach_variable(var, &nir->uniforms) { - /* UBO's and atomics don't take up space in the uniform file */ - if (var->interface_type != NULL || var->type->contains_atomic()) - continue; - - if (type_size_scalar(var->type) > 0) - param_size[var->data.driver_location / 4] = type_size_scalar(var->type); - } } static bool diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 68f2548..5616963 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1011,9 +1011,6 @@ fs_visitor::init() this->spilled_any_registers = false; this->do_dual_src = false; - - if (dispatch_width == 8) - this->param_size = rzalloc_array(mem_ctx, int, stage_prog_data->nr_params); } fs_visitor::~fs_visitor() -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 14/15] i965/fs: Rename demote_pull_constants to lower_constant_loads
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_fs.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index a2ec03e..f3cf129 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2016,7 +2016,7 @@ fs_visitor::assign_constant_locations() * or VARYING_PULL_CONSTANT_LOAD instructions which load values into VGRFs. */ void -fs_visitor::demote_pull_constants() +fs_visitor::lower_constant_loads() { const unsigned index = stage_prog_data->binding_table.pull_constants_start; @@ -5033,7 +5033,7 @@ fs_visitor::optimize() bld = fs_builder(this, 64); assign_constant_locations(); - demote_pull_constants(); + lower_constant_loads(); validate(); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index b42c49d..eb6ecb2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -136,7 +136,7 @@ public: void split_virtual_grfs(); bool compact_virtual_grfs(); void assign_constant_locations(); - void demote_pull_constants(); + void lower_constant_loads(); void invalidate_live_intervals(); void calculate_live_intervals(); void calculate_register_pressure(); -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 13/15] i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
While we're at it, we also add support for the possibility that the indirect is, in fact, a constant. This shouldn't happen in the common case (if it does, that means NIR failed to constant-fold something), but it's possible so we should handle it. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++ src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51 +++--- 2 files changed, 42 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 9eaf8d0..a2ec03e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -4424,6 +4424,10 @@ get_lowered_simd_width(const struct brw_device_info *devinfo, case SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL: return 8; + case SHADER_OPCODE_MOV_INDIRECT: + /* Prior to Broadwell, we only have 8 address subregisters */ + return devinfo->gen < 8 ? 8 : inst->exec_size; + default: return inst->exec_size; } diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index d86eee1..7fa6d84 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -351,22 +351,47 @@ fs_generator::generate_mov_indirect(fs_inst *inst, unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr; - /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */ - struct brw_reg addr = vec8(brw_address_reg(0)); + if (indirect_byte_offset.file == BRW_IMMEDIATE_VALUE) { + imm_byte_offset += indirect_byte_offset.ud; - /* The destination stride of an instruction (in bytes) must be greater -* than or equal to the size of the rest of the instruction. Since the -* address register is of type UW, we can't use a D-type instruction. -* In order to get around this, re re-type to UW and use a stride. -*/ - indirect_byte_offset = - retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW); + reg.nr = imm_byte_offset / REG_SIZE; + reg.subnr = imm_byte_offset % REG_SIZE; + brw_MOV(p, dst, reg); + } else { + /* Prior to Broadwell, there are only 8 address registers. */ + assert(inst->exec_size == 8 || devinfo->gen >= 8); - /* Prior to Broadwell, there are only 8 address registers. */ - assert(inst->exec_size == 8 || devinfo->gen >= 8); + /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */ + struct brw_reg addr = vec8(brw_address_reg(0)); - brw_MOV(p, addr, indirect_byte_offset); - brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type)); + /* The destination stride of an instruction (in bytes) must be greater + * than or equal to the size of the rest of the instruction. Since the + * address register is of type UW, we can't use a D-type instruction. + * In order to get around this, re re-type to UW and use a stride. + */ + indirect_byte_offset = + retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW); + + if (devinfo->gen < 8) { + /* Prior to broadwell, we have a restriction that the bottom 5 bits + * of the base offset and the bottom 5 bits of the indirect must add + * to less than 32. In other words, the hardware needs to be able to + * add the bottom five bits of the two to get the subnumber and add + * the next 7 bits of each to get the actual register number. Since + * the indirect may cause us to cross a register boundary, this makes + * it almost useless. We could try and do something clever where we + * use a actual base offset if base_offset % 32 == 0 but that would + * mean we were generating different code depending on the base + * offset. Instead, for the sake of consistency, we'll just do the + * add ourselves. + */ + brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset)); + brw_MOV(p, dst, retype(brw_VxH_indirect(0, 0), dst.type)); + } else { + brw_MOV(p, addr, indirect_byte_offset); + brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type)); + } + } } void -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/15] i965: Rework uniform handling in the back-end
This series is kind-of two series in one. First, we rework the way we do uniform handling all throughout both backends. In particular, we stop using reladdr and, instead, start using MOV_INDIRECT. Doing this allows us some nice simplifications: First, we no longer have the recursive reladdr problem; it's gone by design. Second, we can now get rid of the extra arrays of uniform sizes that we've been carying around everywhere. The fact that they exist has bothered me for a while, but no more! The second half of the series does a bunch of cleanups on the MOV_INDIRECT instruction in the FS and then starts using it to do indirect push constants. Jason Ekstrand (15): nir: Add another index to load_uniform to specify the range read i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD i965/fs: Use MOV_INDIRECT for all indirect uniform loads i965/fs: Get rid of reladdr i965/fs: Stop relying on param_size in assign_constant_locations i965/fs: Get rid of the param_size array i965/vec4: Inline get_pull_constant_offset i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants i965/vec4: Get rid of the uniform_size array i965/fs: Add support for doing MOV_INDIRECT on uniforms i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware i965/fs: Rename demote_pull_constants to lower_constant_loads i965/fs: Push small uniform arrays src/glsl/nir/nir_intrinsics.h | 7 +- src/glsl/nir/nir_lower_io.c | 5 + src/mesa/drivers/dri/i965/brw_fs.cpp | 189 +- src/mesa/drivers/dri/i965/brw_fs.h| 4 +- src/mesa/drivers/dri/i965/brw_fs_generator.cpp| 52 -- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 62 --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 - src/mesa/drivers/dri/i965/brw_ir_fs.h | 5 +- src/mesa/drivers/dri/i965/brw_vec4.cpp| 10 +- src/mesa/drivers/dri/i965/brw_vec4.h | 7 +- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp| 19 +-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 130 ++- src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp | 1 - 13 files changed, 267 insertions(+), 227 deletions(-) -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa/shader: return correct attribute location for double matrix arrays
From: Dave Airlie If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19, dmat2[2] at 21 etc. The old code was returning 17,18,19. I think this code is also wrong for float matricies as well. This partly fixes: GL41-CTS.vertex_attrib_64bit.limits_test --- src/mesa/main/shader_query.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp index 5d15006..faaf08c 100644 --- a/src/mesa/main/shader_query.cpp +++ b/src/mesa/main/shader_query.cpp @@ -858,7 +858,7 @@ program_resource_location(struct gl_shader_program *shProg, && array_index >= RESOURCE_VAR(res)->type->length) { return -1; } - return RESOURCE_VAR(res)->data.location + array_index - VERT_ATTRIB_GENERIC0; + return RESOURCE_VAR(res)->data.location + (array_index * RESOURCE_VAR(res)->type->without_array()->matrix_columns) - VERT_ATTRIB_GENERIC0; case GL_PROGRAM_OUTPUT: /* If the output is an array, fail if the index is out of bounds. */ if (array_index > 0 -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: don't call of u_prims_for_vertices for patches and rectangles
On 10.12.2015 06:58, Marek Olšák wrote: > From: Marek Olšák > > Both caused a crash due to a division by zero in that function. > This is an alternative fix. > > Cc: 11.0 11.1 > --- > src/gallium/drivers/radeonsi/si_state_draw.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c > b/src/gallium/drivers/radeonsi/si_state_draw.c > index ee84a1f..e550011 100644 > --- a/src/gallium/drivers/radeonsi/si_state_draw.c > +++ b/src/gallium/drivers/radeonsi/si_state_draw.c > @@ -216,6 +216,18 @@ static void si_emit_derived_tess_state(struct si_context > *sctx, > radeon_emit(cs, tcs_out_layout | (num_tcs_output_cp << 26)); > } > > +static unsigned si_num_prims_for_vertices(const struct pipe_draw_info *info) > +{ > + switch (info->mode) { > + case PIPE_PRIM_PATCHES: > + return info->count / info->vertices_per_patch; > + case R600_PRIM_RECTANGLE_LIST: > + return info->count / 3; > + default: > + return u_prims_for_vertices(info->mode, info->count); > + } > +} I don't suppose it makes sense to handle PIPE_PRIM_PATCHES in u_prims_for_vertices? Either way, Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] winsys/radeon: clear the buffer cache on allocation failure and try again
On 10.12.2015 06:58, Marek Olšák wrote: > From: Marek Olšák The series is Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radeonsi: also print hexadecimal values for register fields in the IB parser
On 10.12.2015 07:54, Marek Olšák wrote: > From: Marek Olšák > > --- > src/gallium/drivers/radeonsi/si_debug.c | 11 +++ > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/src/gallium/drivers/radeonsi/si_debug.c > b/src/gallium/drivers/radeonsi/si_debug.c > index cce665e..034acf5 100644 > --- a/src/gallium/drivers/radeonsi/si_debug.c > +++ b/src/gallium/drivers/radeonsi/si_debug.c > @@ -61,13 +61,16 @@ static void print_spaces(FILE *f, unsigned num) > static void print_value(FILE *file, uint32_t value, int bits) > { > /* Guess if it's int or float */ > - if (value <= (1 << 15)) > - fprintf(file, "%u\n", value); > - else { > + if (value <= (1 << 15)) { > + if (value <= 9) > + fprintf(file, "%u\n", value); > + else > + fprintf(file, "%u (0x%0*x)\n", value, bits / 4, value); > + } else { > float f = uif(value); > > if (fabs(f) < 10 && f*10 == floor(f*10)) > - fprintf(file, "%.1ff\n", f); > + fprintf(file, "%.1ff (0x%0*x)\n", f, bits / 4, value); > else > /* Don't print more leading zeros than there are bits. > */ > fprintf(file, "0x%0*x\n", bits / 4, value); > Reviewed-by: Michel Dänzer -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] mesa: docs: i965: Use correct doxygen groupings syntax
On 12/07/2015 12:18 PM, Sarah Sharp wrote: > When reading the source code, it's useful to indicate that a group of > fields in a struct are related in someway. The convention in Mesa seems > to be: > > struct foo { > /** > * Related fields description > * @{ > */ > int bar; > char baz; > */@} /*@}*/ > long qux; > } > > However, the doxygen syntax for grouping is: > > struct foo { > /** > * @defgroup group_name Related fields description There are 193 uses of \name, but only one use @defgroup. Assuming that \name is also correct, and https://www.stack.nl/~dimitri/doxygen/manual/grouping.html#memgroup suggests that it is, we should stick with that. If it's not correct... can a sed job convert the \name uses to \defgroup? > * @{ > */ > int bar; > char baz; > */@} /*@}*/ > long qux; > } > > https://www.stack.nl/~dimitri/doxygen/manual/grouping.html > > Without the group name definition, the fields don't get properly > grouped. Instead, the group description is applied to the first field. > You can see this in the current doxygen build, since there are no links > to groups (doxygen calls them modules). > > Fix this. Use the brw_ prefix for the group name, since I'm pretty sure > group names are global. > > Signed-off-by: Sarah Sharp > --- > src/mesa/drivers/dri/i965/brw_device_info.h | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_device_info.h > b/src/mesa/drivers/dri/i965/brw_device_info.h > index 4911c23..89c1501 100644 > --- a/src/mesa/drivers/dri/i965/brw_device_info.h > +++ b/src/mesa/drivers/dri/i965/brw_device_info.h > @@ -49,8 +49,8 @@ struct brw_device_info > bool has_resource_streamer; > > /** > -* Quirks: > -* @{ > +* @defgroup brw_quirks Hardware quirks > +* @{ > */ > bool has_negative_rhw_bug; > > @@ -62,11 +62,11 @@ struct brw_device_info > * fragment shader instructions. > */ > bool needs_unlit_centroid_workaround; > - /** @} */ > + /* @} */ > > /** > -* GPU Limits: > -* @{ > +* @defgroup brw_gpu_limits GPU hardware limitations > +* @{ > */ > unsigned max_vs_threads; > unsigned max_hs_threads; > @@ -83,7 +83,7 @@ struct brw_device_info >unsigned max_ds_entries; >unsigned max_gs_entries; > } urb; > - /** @} */ > + /* @} */ > }; > > const struct brw_device_info *brw_get_device_info(int devid); > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] mesa: docs: Add link to planet.freedesktop.org
This patch is Reviewed-by: Ian Romanick On 12/07/2015 12:18 PM, Sarah Sharp wrote: > The freedesktop.org blog feeds aren't mentioned on either mesa3d.org or > any of the graphics project wikis (including the DRI wiki) on > freedeskop.org. Fix that by linking to it from the sidebar. > > Signed-off-by: Sarah Sharp > --- > docs/contents.html | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/docs/contents.html b/docs/contents.html > index 6612cbe..a683b07 100644 > --- a/docs/contents.html > +++ b/docs/contents.html > @@ -90,6 +90,7 @@ > http://www.opengl.org"; target="_parent">OpenGL website > http://dri.freedesktop.org"; target="_parent">DRI website > http://www.freedesktop.org"; target="_parent">freedesktop.org > +http://planet.freedesktop.org"; target="_parent">Developer > blogs > > > Hosted by: > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] gallium/ddebug: add GALLIUM_DDEBUG_SKIP option
From: Nicolai Hähnle When we know that hangs occur only very late in a reproducible run (e.g. apitrace), we can save a lot of debugging time by skipping the flush and hang detection for earlier draw calls. --- src/gallium/drivers/ddebug/dd_draw.c | 39 +- src/gallium/drivers/ddebug/dd_pipe.h | 3 +++ src/gallium/drivers/ddebug/dd_screen.c | 9 3 files changed, 36 insertions(+), 15 deletions(-) diff --git a/src/gallium/drivers/ddebug/dd_draw.c b/src/gallium/drivers/ddebug/dd_draw.c index b443c5b..0778099 100644 --- a/src/gallium/drivers/ddebug/dd_draw.c +++ b/src/gallium/drivers/ddebug/dd_draw.c @@ -588,8 +588,11 @@ dd_context_flush(struct pipe_context *_pipe, static void dd_before_draw(struct dd_context *dctx) { - if (dd_screen(dctx->base.screen)->mode == DD_DETECT_HANGS && - !dd_screen(dctx->base.screen)->no_flush) + struct dd_screen *dscreen = dd_screen(dctx->base.screen); + + if (dscreen->mode == DD_DETECT_HANGS && + !dscreen->no_flush && + dctx->num_draw_calls >= dscreen->skip_count) dd_flush_and_handle_hang(dctx, NULL, 0, "GPU hang most likely caused by internal " "driver commands"); @@ -598,22 +601,28 @@ dd_before_draw(struct dd_context *dctx) static void dd_after_draw(struct dd_context *dctx, struct dd_call *call) { - switch (dd_screen(dctx->base.screen)->mode) { - case DD_DETECT_HANGS: - if (!dd_screen(dctx->base.screen)->no_flush && - dd_flush_and_check_hang(dctx, NULL, 0)) { - dd_dump_call(dctx, call, PIPE_DEBUG_DEVICE_IS_HUNG); + struct dd_screen *dscreen = dd_screen(dctx->base.screen); - /* Terminate the process to prevent future hangs. */ - dd_kill_process(); + if (dctx->num_draw_calls >= dscreen->skip_count) { + switch (dscreen->mode) { + case DD_DETECT_HANGS: + if (!dscreen->no_flush && +dd_flush_and_check_hang(dctx, NULL, 0)) { +dd_dump_call(dctx, call, PIPE_DEBUG_DEVICE_IS_HUNG); + +/* Terminate the process to prevent future hangs. */ +dd_kill_process(); + } + break; + case DD_DUMP_ALL_CALLS: + dd_dump_call(dctx, call, 0); + break; + default: + assert(0); } - break; - case DD_DUMP_ALL_CALLS: - dd_dump_call(dctx, call, 0); - break; - default: - assert(0); } + + ++dctx->num_draw_calls; } static void diff --git a/src/gallium/drivers/ddebug/dd_pipe.h b/src/gallium/drivers/ddebug/dd_pipe.h index 34f5920..a045518 100644 --- a/src/gallium/drivers/ddebug/dd_pipe.h +++ b/src/gallium/drivers/ddebug/dd_pipe.h @@ -45,6 +45,7 @@ struct dd_screen unsigned timeout_ms; enum dd_mode mode; bool no_flush; + unsigned skip_count; }; struct dd_query @@ -110,6 +111,8 @@ struct dd_context struct pipe_scissor_state scissors[PIPE_MAX_VIEWPORTS]; struct pipe_viewport_state viewports[PIPE_MAX_VIEWPORTS]; float tess_default_levels[6]; + + unsigned num_draw_calls; }; diff --git a/src/gallium/drivers/ddebug/dd_screen.c b/src/gallium/drivers/ddebug/dd_screen.c index a776580..2716845 100644 --- a/src/gallium/drivers/ddebug/dd_screen.c +++ b/src/gallium/drivers/ddebug/dd_screen.c @@ -290,6 +290,9 @@ ddebug_screen_create(struct pipe_screen *screen) puts("$HOME/"DD_DIR"/ when a hang is detected."); puts("If 'noflush' is specified, only detect hangs in pipe->flush."); puts(""); + puts(" GALLIUM_DDEBUG_SKIP=[count]"); + puts("Skip flush and hang detection for the given initial number of draw calls."); + puts(""); exit(0); } @@ -349,5 +352,11 @@ ddebug_screen_create(struct pipe_screen *screen) assert(0); } + dscreen->skip_count = debug_get_num_option("GALLIUM_DDEBUG_SKIP", 0); + if (dscreen->skip_count > 0) { + fprintf(stderr, "Gallium debugger skipping the first %u draw calls.\n", + dscreen->skip_count); + } + return &dscreen->base; } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] gallium/ddebug: regularly log the total number of draw calls
From: Nicolai Hähnle This helps in the use of GALLIUM_DDEBUG_SKIP: first run a target application with skip set to a very large number and note how many draw calls happen before the bug. Then re-run, skipping the corresponding number of calls. Despite the additional run, this can still be much faster than not skipping anything. --- src/gallium/drivers/ddebug/dd_draw.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/gallium/drivers/ddebug/dd_draw.c b/src/gallium/drivers/ddebug/dd_draw.c index 0778099..0d7ee9a 100644 --- a/src/gallium/drivers/ddebug/dd_draw.c +++ b/src/gallium/drivers/ddebug/dd_draw.c @@ -623,6 +623,9 @@ dd_after_draw(struct dd_context *dctx, struct dd_call *call) } ++dctx->num_draw_calls; + if (dscreen->skip_count && dctx->num_draw_calls % 1 == 0) + fprintf(stderr, "Gallium debugger reached %u draw calls.\n", + dctx->num_draw_calls); } static void -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa/varray: set double arrays to non-normalised.
...and it matches what we do for single precision. Reviewed-by: Ian Romanick Presumably this should also be a candidate for 11.0 and 11.1? On 12/09/2015 04:37 PM, Dave Airlie wrote: > From: Dave Airlie > > Doesn't have any effect in practice I don't think, but > CTS reads back using GetVertexAttrib. > > This fixes: GL41-CTS.vertex_attrib_64bit.get_vertex_attrib > > Signed-off-by: Dave Airlie > --- > src/mesa/main/varray.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c > index 58f376b..c71e16a 100644 > --- a/src/mesa/main/varray.c > +++ b/src/mesa/main/varray.c > @@ -776,7 +776,7 @@ _mesa_VertexAttribLPointer(GLuint index, GLint size, > GLenum type, > > update_array(ctx, "glVertexAttribLPointer", VERT_ATTRIB_GENERIC(index), > legalTypes, 1, 4, > -size, type, stride, GL_TRUE, GL_FALSE, GL_TRUE, ptr); > +size, type, stride, GL_FALSE, GL_FALSE, GL_TRUE, ptr); > } > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 8/8] glsl: only divide left components when it is a dual slot double.
On 12/09/2015 04:07 AM, Timothy Arceri wrote: > On Wed, 2015-12-09 at 16:06 +1000, Dave Airlie wrote: >> From: Dave Airlie >> >> Signed-off-by: Dave Airlie >> --- >> src/glsl/lower_packed_varyings.cpp | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/glsl/lower_packed_varyings.cpp >> b/src/glsl/lower_packed_varyings.cpp >> index 037c27d..ec9af62 100644 >> --- a/src/glsl/lower_packed_varyings.cpp >> +++ b/src/glsl/lower_packed_varyings.cpp >> @@ -472,7 +472,7 @@ >> lower_packed_varyings_visitor::lower_rvalue(ir_rvalue *rvalue, >>char right_swizzle_name[4] = { 0, 0, 0, 0 }; >> >>left_components = 4 - fine_location % 4; >> - if (rvalue->type->is_double()) { >> + if (rvalue->type->is_dual_slot_double()) { > > The subject line says what the change is but there is no explanation on > why it was made. Can you add more detail to the comment? > > To me the existing code *seems* correct as all doubles take up twice > the amount of components, why would we only divide by 2 when its a dual > slot double? Yeah... it seems like after this change it might try to pack a double in the W component of a vec3. >> /* We might actually end up with 0 left components! */ >> left_components /= 2; >>} > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa/varray: set double arrays to non-normalised.
From: Dave Airlie Doesn't have any effect in practice I don't think, but CTS reads back using GetVertexAttrib. This fixes: GL41-CTS.vertex_attrib_64bit.get_vertex_attrib Signed-off-by: Dave Airlie --- src/mesa/main/varray.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/varray.c b/src/mesa/main/varray.c index 58f376b..c71e16a 100644 --- a/src/mesa/main/varray.c +++ b/src/mesa/main/varray.c @@ -776,7 +776,7 @@ _mesa_VertexAttribLPointer(GLuint index, GLint size, GLenum type, update_array(ctx, "glVertexAttribLPointer", VERT_ATTRIB_GENERIC(index), legalTypes, 1, 4, -size, type, stride, GL_TRUE, GL_FALSE, GL_TRUE, ptr); +size, type, stride, GL_FALSE, GL_FALSE, GL_TRUE, ptr); } -- 2.5.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91888] EGL Wayland software rendering no longer work after regression
https://bugs.freedesktop.org/show_bug.cgi?id=91888 --- Comment #20 from nerdopol...@verizon.net --- And now all the examples from qtbase/examples/opengl are working as well even after exporting LIBGL_ALWAYS_SOFTWARE as well with mesa master. including contextinfo, qopenglwidget, qopenglwindow, and threadedqopenglwindow -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Separate base offset/constant offset combining from remapping.
On Wed, Dec 9, 2015 at 12:08 PM, Kenneth Graunke wrote: > On Wednesday, December 09, 2015 08:03:25 AM Jason Ekstrand wrote: >> On Dec 9, 2015 2:51 AM, "Kenneth Graunke" wrote: >> > >> > My tessellation branch has two additional remap functions. I don't want >> > to replicate this logic there. >> > >> > Signed-off-by: Kenneth Graunke >> > --- >> > src/mesa/drivers/dri/i965/brw_nir.c | 78 >> - >> > 1 file changed, 50 insertions(+), 28 deletions(-) >> > >> > Hey Jason, >> > >> > If you like this patch, and haven't yet merged your NIR input reworks, >> > feel free to just squash it into your changes. Or, we can land it >> > separately after your changes. It's up to you. >> > >> > Separating this out allows me to reuse this in my new tessellation input >> > and output remapping functions, and also means we don't need to add >> structs >> > for the remap functions...we can just pass the builder, or inputs_read, or >> > the VUE map...and not have to pack multiple things together. >> >> Sure. It does make things simpler. >> >> > --Ken >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c >> b/src/mesa/drivers/dri/i965/brw_nir.c >> > index 14ad172..105a175 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_nir.c >> > +++ b/src/mesa/drivers/dri/i965/brw_nir.c >> > @@ -27,15 +27,19 @@ >> > #include "glsl/nir/nir_builder.h" >> > #include "program/prog_to_nir.h" >> > >> > -struct remap_vs_attrs_state { >> > - nir_builder b; >> > - uint64_t inputs_read; >> > -}; >> > - >> > +/** >> > + * In many cases, we just add the base and offset together, so there's no >> > + * reason to keep them separate. Sometimes, combining them is essential: >> > + * if a shader only accesses part of a compound variable (such as a >> matrix >> > + * or array), the variable's base may not actually exist in the VUE map. >> > + * >> > + * This pass adds constant offsets to instr->const_index[0], and resets >> > + * the offset source to 0. Non-constant offsets remain unchanged. >> > + */ >> > static bool >> > -remap_vs_attrs(nir_block *block, void *void_state) >> > +add_const_offset_to_base(nir_block *block, void *closure) >> > { >> > - struct remap_vs_attrs_state *state = void_state; >> > + nir_builder *b = closure; >> > >> > nir_foreach_instr_safe(block, instr) { >> >if (instr->type != nir_instr_type_intrinsic) >> > @@ -43,30 +47,48 @@ remap_vs_attrs(nir_block *block, void *void_state) >> > >> >nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); >> > >> > + if (intrin->intrinsic == nir_intrinsic_load_input || >> > + intrin->intrinsic == nir_intrinsic_load_per_vertex_input || >> > + intrin->intrinsic == nir_intrinsic_load_output || >> > + intrin->intrinsic == nir_intrinsic_load_per_vertex_output || >> > + intrin->intrinsic == nir_intrinsic_store_output || >> > + intrin->intrinsic == nir_intrinsic_store_per_vertex_output) { >> >> This seems a bit scortched-earth. It would be nice if the caller had a bit >> more control. > > We could always add "do input"/"do output" options once we actually need > them. It means adding the structs back, but that's fine. > > I suppose for now we could ignore output intrinsics here. > >> > + nir_src *offset = nir_get_io_offset_src(intrin); >> > + nir_const_value *const_offset = nir_src_as_const_value(*offset); >> > + >> > + if (const_offset) { >> > +intrin->const_index[0] += const_offset->u[0]; >> > +b->cursor = nir_before_instr(&intrin->instr); >> > +nir_instr_rewrite_src(&intrin->instr, offset, >> > + nir_src_for_ssa(nir_imm_int(b, 0))); >> > + } >> >> Else??? It seems that you don't want to run this pass if you think you'll >> ever hit an indirect. I guess it's harmless to just do this for all direct >> things in our driver, but it doesn't sit well. > > Else do nothing. The problem I'm trying to avoid with this logic is > that inputs/outputs which take up multiple slots and are directly > accessed may only have slots assigned for the sub-components that are > actually used. So, I can't use the base offset, and need to move the > base to the slot that's actually used. > > If something is accessed indirectly, we assign all of its slots, so > using the base location is fine. Ok, that makes sense. >> > + } >> > + } >> > + return true; >> > + >> > +} >> > + >> > +static bool >> > +remap_vs_attrs(nir_block *block, void *closure) >> > +{ >> > + GLbitfield64 inputs_read = *((GLbitfield64 *) closure); >> > + >> > + nir_foreach_instr(block, instr) { >> > + if (instr->type != nir_instr_type_intrinsic) >> > + continue; >> > + >> > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); >> > + >> >if (intrin->intrinsic == nir_intrinsic_load_input) { >> > /* Attributes come in a contiguous block, ordered by their >> >
Re: [Mesa-dev] [PATCH] mesa: fix ID usage for buffer warnings
Reviewed-by: Ilia Mirkin On Wed, Dec 9, 2015 at 6:02 PM, Brian Paul wrote: > We need a different ID pointer for each call site. > --- > src/mesa/main/bufferobj.c | 18 -- > 1 file changed, 12 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c > index 6bc1b5e..e0639c8 100644 > --- a/src/mesa/main/bufferobj.c > +++ b/src/mesa/main/bufferobj.c > @@ -60,16 +60,16 @@ > > /** > * Helper to warn of possible performance issues, such as frequently > - * updating a buffer created with GL_STATIC_DRAW. > + * updating a buffer created with GL_STATIC_DRAW. Called via the macro > + * below. > */ > static void > -buffer_usage_warning(struct gl_context *ctx, const char *fmt, ...) > +buffer_usage_warning(struct gl_context *ctx, GLuint *id, const char *fmt, > ...) > { > va_list args; > - GLuint msg_id = 0; > > va_start(args, fmt); > - _mesa_gl_vdebug(ctx, &msg_id, > + _mesa_gl_vdebug(ctx, id, > MESA_DEBUG_SOURCE_API, > MESA_DEBUG_TYPE_PERFORMANCE, > MESA_DEBUG_SEVERITY_MEDIUM, > @@ -77,6 +77,12 @@ buffer_usage_warning(struct gl_context *ctx, const char > *fmt, ...) > va_end(args); > } > > +#define BUFFER_USAGE_WARNING(CTX, FMT, ...) \ > + do { \ > + static GLuint id = 0; \ > + buffer_usage_warning(CTX, &id, FMT, ##__VA_ARGS__); \ > + } while (0) > + > > /** > * Used as a placeholder for buffer objects between glGenBuffers() and > @@ -1713,7 +1719,7 @@ _mesa_buffer_sub_data(struct gl_context *ctx, struct > gl_buffer_object *bufObj, >/* If the application declared the buffer as static draw/copy or stream > * draw, it should not be frequently modified with glBufferSubData. > */ > - buffer_usage_warning(ctx, > + BUFFER_USAGE_WARNING(ctx, > "using %s(buffer %u, offset %u, size %u) to " > "update a %s buffer", > func, bufObj->Name, offset, size, > @@ -2432,7 +2438,7 @@ _mesa_map_buffer_range(struct gl_context *ctx, >if ((bufObj->Usage == GL_STATIC_DRAW || > bufObj->Usage == GL_STATIC_COPY) && >bufObj->NumMapBufferWriteCalls >= BUFFER_WARNING_CALL_COUNT) { > - buffer_usage_warning(ctx, > + BUFFER_USAGE_WARNING(ctx, >"using %s(buffer %u, offset %u, length %u) to " >"update a %s buffer", >func, bufObj->Name, offset, length, > -- > 1.9.1 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: fix ID usage for buffer warnings
We need a different ID pointer for each call site. --- src/mesa/main/bufferobj.c | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 6bc1b5e..e0639c8 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -60,16 +60,16 @@ /** * Helper to warn of possible performance issues, such as frequently - * updating a buffer created with GL_STATIC_DRAW. + * updating a buffer created with GL_STATIC_DRAW. Called via the macro + * below. */ static void -buffer_usage_warning(struct gl_context *ctx, const char *fmt, ...) +buffer_usage_warning(struct gl_context *ctx, GLuint *id, const char *fmt, ...) { va_list args; - GLuint msg_id = 0; va_start(args, fmt); - _mesa_gl_vdebug(ctx, &msg_id, + _mesa_gl_vdebug(ctx, id, MESA_DEBUG_SOURCE_API, MESA_DEBUG_TYPE_PERFORMANCE, MESA_DEBUG_SEVERITY_MEDIUM, @@ -77,6 +77,12 @@ buffer_usage_warning(struct gl_context *ctx, const char *fmt, ...) va_end(args); } +#define BUFFER_USAGE_WARNING(CTX, FMT, ...) \ + do { \ + static GLuint id = 0; \ + buffer_usage_warning(CTX, &id, FMT, ##__VA_ARGS__); \ + } while (0) + /** * Used as a placeholder for buffer objects between glGenBuffers() and @@ -1713,7 +1719,7 @@ _mesa_buffer_sub_data(struct gl_context *ctx, struct gl_buffer_object *bufObj, /* If the application declared the buffer as static draw/copy or stream * draw, it should not be frequently modified with glBufferSubData. */ - buffer_usage_warning(ctx, + BUFFER_USAGE_WARNING(ctx, "using %s(buffer %u, offset %u, size %u) to " "update a %s buffer", func, bufObj->Name, offset, size, @@ -2432,7 +2438,7 @@ _mesa_map_buffer_range(struct gl_context *ctx, if ((bufObj->Usage == GL_STATIC_DRAW || bufObj->Usage == GL_STATIC_COPY) && bufObj->NumMapBufferWriteCalls >= BUFFER_WARNING_CALL_COUNT) { - buffer_usage_warning(ctx, + BUFFER_USAGE_WARNING(ctx, "using %s(buffer %u, offset %u, length %u) to " "update a %s buffer", func, bufObj->Name, offset, length, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: also print hexadecimal values for register fields in the IB parser
From: Marek Olšák --- src/gallium/drivers/radeonsi/si_debug.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/radeonsi/si_debug.c b/src/gallium/drivers/radeonsi/si_debug.c index cce665e..034acf5 100644 --- a/src/gallium/drivers/radeonsi/si_debug.c +++ b/src/gallium/drivers/radeonsi/si_debug.c @@ -61,13 +61,16 @@ static void print_spaces(FILE *f, unsigned num) static void print_value(FILE *file, uint32_t value, int bits) { /* Guess if it's int or float */ - if (value <= (1 << 15)) - fprintf(file, "%u\n", value); - else { + if (value <= (1 << 15)) { + if (value <= 9) + fprintf(file, "%u\n", value); + else + fprintf(file, "%u (0x%0*x)\n", value, bits / 4, value); + } else { float f = uif(value); if (fabs(f) < 10 && f*10 == floor(f*10)) - fprintf(file, "%.1ff\n", f); + fprintf(file, "%.1ff (0x%0*x)\n", f, bits / 4, value); else /* Don't print more leading zeros than there are bits. */ fprintf(file, "0x%0*x\n", bits / 4, value); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] nv50, nvc0: optimize coherent buffer checking at draw time
On Wed, Dec 9, 2015 at 5:40 PM, Samuel Pitoiset wrote: > Instead of iterating over all the buffer resources looking for coherent > buffers, we keep track of a context-wide count. This will save some > iterations (and CPU cycles) in 99.99% case because usually coherent > buffers are not so used. > > Changes from v2: > - forgot to apply some changes for nv50 (texture/vertex bufs) > > Signed-off-by: Samuel Pitoiset > --- > src/gallium/drivers/nouveau/nv50/nv50_context.h | 3 ++ > src/gallium/drivers/nouveau/nv50/nv50_state.c | 19 +++ > src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 44 > +++-- > src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 3 ++ > src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 26 +++ > src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 41 +-- > 6 files changed, 64 insertions(+), 72 deletions(-) > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h > b/src/gallium/drivers/nouveau/nv50/nv50_context.h > index 2cebcd9..712d00e 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_context.h > +++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h > @@ -134,9 +134,11 @@ struct nv50_context { > struct nv50_constbuf constbuf[3][NV50_MAX_PIPE_CONSTBUFS]; > uint16_t constbuf_dirty[3]; > uint16_t constbuf_valid[3]; > + uint16_t constbuf_coherent[3]; > > struct pipe_vertex_buffer vtxbuf[PIPE_MAX_ATTRIBS]; > unsigned num_vtxbufs; > + uint32_t vtxbufs_coherent; > struct pipe_index_buffer idxbuf; > uint32_t vbo_fifo; /* bitmask of vertex elements to be pushed to FIFO */ > uint32_t vbo_user; /* bitmask of vertex buffers pointing to user memory */ > @@ -148,6 +150,7 @@ struct nv50_context { > > struct pipe_sampler_view *textures[3][PIPE_MAX_SAMPLERS]; > unsigned num_textures[3]; > + uint32_t textures_coherent[3]; > struct nv50_tsc_entry *samplers[3][PIPE_MAX_SAMPLERS]; > unsigned num_samplers[3]; > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c > b/src/gallium/drivers/nouveau/nv50/nv50_state.c > index fd7c7cd..b6e5c75 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c > @@ -661,9 +661,16 @@ nv50_stage_set_sampler_views(struct nv50_context *nv50, > int s, > assert(nr <= PIPE_MAX_SAMPLERS); > for (i = 0; i < nr; ++i) { >struct nv50_tic_entry *old = nv50_tic_entry(nv50->textures[s][i]); > + struct pipe_resource *res = views[i]->texture; I'm moderately sure either views[i] or texture can be null. [Same for nvc0.] >if (old) > nv50_screen_tic_unlock(nv50->screen, old); > > + if (res->target == PIPE_BUFFER && > + (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT)) > + nv50->textures_coherent[s] |= 1 << i; > + else > + nv50->textures_coherent[s] &= ~(1 << i); > + >pipe_sampler_view_reference(&nv50->textures[s][i], views[i]); > } > > @@ -852,8 +859,13 @@ nv50_set_constant_buffer(struct pipe_context *pipe, uint > shader, uint index, >nv50->constbuf[s][i].offset = cb->buffer_offset; >nv50->constbuf[s][i].size = MIN2(align(cb->buffer_size, 0x100), > 0x1); >nv50->constbuf_valid[s] |= 1 << i; > + if (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) > + nv50->constbuf_coherent[s] |= 1 << i; > + else > + nv50->constbuf_coherent[s] &= ~(1 << i); You also need to clear it out in the user case. > } else { >nv50->constbuf_valid[s] &= ~(1 << i); > + nv50->constbuf_coherent[s] &= ~(1 << i); > } > nv50->constbuf_dirty[s] |= 1 << i; > > @@ -1000,6 +1012,7 @@ nv50_set_vertex_buffers(struct pipe_context *pipe, > if (!vb) { >nv50->vbo_user &= ~(((1ull << count) - 1) << start_slot); >nv50->vbo_constant &= ~(((1ull << count) - 1) << start_slot); > + nv50->vtxbufs_coherent &= ~(((1ull << count) - 1) << start_slot); >return; > } > > @@ -1012,9 +1025,15 @@ nv50_set_vertex_buffers(struct pipe_context *pipe, > nv50->vbo_constant |= 1 << dst_index; > else > nv50->vbo_constant &= ~(1 << dst_index); > + nv50->vtxbufs_coherent &= ~(1 << dst_index); >} else { > nv50->vbo_user &= ~(1 << dst_index); > nv50->vbo_constant &= ~(1 << dst_index); > + > + if (vb[i].buffer->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) I wonder if vb[i].buffer might be null here... Not sure if that's allowed or not... > +nv50->vtxbufs_coherent |= (1 << dst_index); > + else > +nv50->vtxbufs_coherent &= ~(1 << dst_index); >} > } > > diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c > b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c > index 85878d5..b6ba803 100644 > --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c > +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c > @@ -761,8 +761,7 @@ nv50_draw_vbo(struct pipe_context *pipe, const
[Mesa-dev] [PATCH v2] nv50, nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent buffers, we keep track of a context-wide count. This will save some iterations (and CPU cycles) in 99.99% case because usually coherent buffers are not so used. Changes from v2: - forgot to apply some changes for nv50 (texture/vertex bufs) Signed-off-by: Samuel Pitoiset --- src/gallium/drivers/nouveau/nv50/nv50_context.h | 3 ++ src/gallium/drivers/nouveau/nv50/nv50_state.c | 19 +++ src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 44 +++-- src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 3 ++ src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 26 +++ src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 41 +-- 6 files changed, 64 insertions(+), 72 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h index 2cebcd9..712d00e 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_context.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h @@ -134,9 +134,11 @@ struct nv50_context { struct nv50_constbuf constbuf[3][NV50_MAX_PIPE_CONSTBUFS]; uint16_t constbuf_dirty[3]; uint16_t constbuf_valid[3]; + uint16_t constbuf_coherent[3]; struct pipe_vertex_buffer vtxbuf[PIPE_MAX_ATTRIBS]; unsigned num_vtxbufs; + uint32_t vtxbufs_coherent; struct pipe_index_buffer idxbuf; uint32_t vbo_fifo; /* bitmask of vertex elements to be pushed to FIFO */ uint32_t vbo_user; /* bitmask of vertex buffers pointing to user memory */ @@ -148,6 +150,7 @@ struct nv50_context { struct pipe_sampler_view *textures[3][PIPE_MAX_SAMPLERS]; unsigned num_textures[3]; + uint32_t textures_coherent[3]; struct nv50_tsc_entry *samplers[3][PIPE_MAX_SAMPLERS]; unsigned num_samplers[3]; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c index fd7c7cd..b6e5c75 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c @@ -661,9 +661,16 @@ nv50_stage_set_sampler_views(struct nv50_context *nv50, int s, assert(nr <= PIPE_MAX_SAMPLERS); for (i = 0; i < nr; ++i) { struct nv50_tic_entry *old = nv50_tic_entry(nv50->textures[s][i]); + struct pipe_resource *res = views[i]->texture; if (old) nv50_screen_tic_unlock(nv50->screen, old); + if (res->target == PIPE_BUFFER && + (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT)) + nv50->textures_coherent[s] |= 1 << i; + else + nv50->textures_coherent[s] &= ~(1 << i); + pipe_sampler_view_reference(&nv50->textures[s][i], views[i]); } @@ -852,8 +859,13 @@ nv50_set_constant_buffer(struct pipe_context *pipe, uint shader, uint index, nv50->constbuf[s][i].offset = cb->buffer_offset; nv50->constbuf[s][i].size = MIN2(align(cb->buffer_size, 0x100), 0x1); nv50->constbuf_valid[s] |= 1 << i; + if (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) + nv50->constbuf_coherent[s] |= 1 << i; + else + nv50->constbuf_coherent[s] &= ~(1 << i); } else { nv50->constbuf_valid[s] &= ~(1 << i); + nv50->constbuf_coherent[s] &= ~(1 << i); } nv50->constbuf_dirty[s] |= 1 << i; @@ -1000,6 +1012,7 @@ nv50_set_vertex_buffers(struct pipe_context *pipe, if (!vb) { nv50->vbo_user &= ~(((1ull << count) - 1) << start_slot); nv50->vbo_constant &= ~(((1ull << count) - 1) << start_slot); + nv50->vtxbufs_coherent &= ~(((1ull << count) - 1) << start_slot); return; } @@ -1012,9 +1025,15 @@ nv50_set_vertex_buffers(struct pipe_context *pipe, nv50->vbo_constant |= 1 << dst_index; else nv50->vbo_constant &= ~(1 << dst_index); + nv50->vtxbufs_coherent &= ~(1 << dst_index); } else { nv50->vbo_user &= ~(1 << dst_index); nv50->vbo_constant &= ~(1 << dst_index); + + if (vb[i].buffer->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) +nv50->vtxbufs_coherent |= (1 << dst_index); + else +nv50->vtxbufs_coherent &= ~(1 << dst_index); } } diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 85878d5..b6ba803 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -761,8 +761,7 @@ nv50_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info) { struct nv50_context *nv50 = nv50_context(pipe); struct nouveau_pushbuf *push = nv50->base.pushbuf; - bool tex_dirty = false; - int i, s; + int s; /* NOTE: caller must ensure that (min_index + index_bias) is >= 0 */ nv50->vb_elt_first = info->min_index + info->index_bias; @@ -791,27 +790,9 @@ nv50_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info) push->kick_notify = nv50_draw_vbo_kick_notify;
[Mesa-dev] [PATCH] radeonsi: implement RB+ for Stoney (v2)
From: Marek Olšák v2: fix dual source blending --- src/gallium/drivers/radeon/r600_pipe_common.c | 1 + src/gallium/drivers/radeon/r600_pipe_common.h | 3 + src/gallium/drivers/radeon/r600_texture.c | 6 + src/gallium/drivers/radeonsi/si_state.c | 159 +- src/gallium/drivers/radeonsi/sid.h| 3 + 5 files changed, 170 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/radeon/r600_pipe_common.c b/src/gallium/drivers/radeon/r600_pipe_common.c index 8899ba4..ba541ac 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.c +++ b/src/gallium/drivers/radeon/r600_pipe_common.c @@ -375,6 +375,7 @@ static const struct debug_named_value common_debug_options[] = { { "check_vm", DBG_CHECK_VM, "Check VM faults and dump debug info." }, { "nodcc", DBG_NO_DCC, "Disable DCC." }, { "nodccclear", DBG_NO_DCC_CLEAR, "Disable DCC fast clear." }, + { "norbplus", DBG_NO_RB_PLUS, "Disable RB+ on Stoney." }, DEBUG_NAMED_VALUE_END /* must be last */ }; diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h index 8c6c0c3..dd23ed5 100644 --- a/src/gallium/drivers/radeon/r600_pipe_common.h +++ b/src/gallium/drivers/radeon/r600_pipe_common.h @@ -86,6 +86,7 @@ #define DBG_CHECK_VM (1llu << 42) #define DBG_NO_DCC (1llu << 43) #define DBG_NO_DCC_CLEAR (1llu << 44) +#define DBG_NO_RB_PLUS (1llu << 45) #define R600_MAP_BUFFER_ALIGNMENT 64 @@ -250,6 +251,8 @@ struct r600_surface { unsigned cb_color_fmask_slice; /* EG and later */ unsigned cb_color_cmask;/* CB_COLORn_TILE (r600 only) */ unsigned cb_color_mask; /* R600 only */ + unsigned sx_ps_downconvert; /* Stoney only */ + unsigned sx_blend_opt_epsilon; /* Stoney only */ struct r600_resource *cb_buffer_fmask; /* Used for FMASK relocations. R600 only */ struct r600_resource *cb_buffer_cmask; /* Used for CMASK relocations. R600 only */ diff --git a/src/gallium/drivers/radeon/r600_texture.c b/src/gallium/drivers/radeon/r600_texture.c index 774722f..8c145e5 100644 --- a/src/gallium/drivers/radeon/r600_texture.c +++ b/src/gallium/drivers/radeon/r600_texture.c @@ -1393,6 +1393,7 @@ void evergreen_do_fast_color_clear(struct r600_common_context *rctx, return; for (i = 0; i < fb->nr_cbufs; i++) { + struct r600_surface *surf; struct r600_texture *tex; unsigned clear_bit = PIPE_CLEAR_COLOR0 << i; @@ -1403,6 +1404,7 @@ void evergreen_do_fast_color_clear(struct r600_common_context *rctx, if (!(*buffers & clear_bit)) continue; + surf = (struct r600_surface *)fb->cbufs[i]; tex = (struct r600_texture *)fb->cbufs[i]->texture; /* 128-bit formats are unusupported */ @@ -1449,6 +1451,10 @@ void evergreen_do_fast_color_clear(struct r600_common_context *rctx, if (clear_words_needed) tex->dirty_level_mask |= 1 << fb->cbufs[i]->u.tex.level; } else { + /* RB+ doesn't work with CMASK fast clear. */ + if (surf->sx_ps_downconvert) + continue; + /* ensure CMASK is enabled */ r600_texture_alloc_cmask_separate(rctx->screen, tex); if (tex->cmask.size == 0) { diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c index 2ebfa1c..dcf4a7b 100644 --- a/src/gallium/drivers/radeonsi/si_state.c +++ b/src/gallium/drivers/radeonsi/si_state.c @@ -347,10 +347,54 @@ static uint32_t si_translate_blend_factor(int blend_fact) return 0; } +static uint32_t si_translate_blend_opt_function(int blend_func) +{ + switch (blend_func) { + case PIPE_BLEND_ADD: + return V_028760_OPT_COMB_ADD; + case PIPE_BLEND_SUBTRACT: + return V_028760_OPT_COMB_SUBTRACT; + case PIPE_BLEND_REVERSE_SUBTRACT: + return V_028760_OPT_COMB_REVSUBTRACT; + case PIPE_BLEND_MIN: + return V_028760_OPT_COMB_MIN; + case PIPE_BLEND_MAX: + return V_028760_OPT_COMB_MAX; + default: + return V_028760_OPT_COMB_BLEND_DISABLED; + } +} + +static uint32_t si_translate_blend_opt_factor(int blend_fact, bool is_alpha) +{ + switch (blend_fact) { + case PIPE_BLENDFACTOR_ZERO: + return V_028760_BLEND_OPT_PRESERVE_NONE_IGNORE_ALL; + case PIPE_BLENDFACTOR_ONE: + return V_028760_BLEND_OPT_PRESERVE_ALL_IGNORE_NONE; + case PIPE_BLENDFACTOR_SRC_COLOR: + return is_alpha ? V_028760_BLEND_OPT_PRESERVE_A1_IGNORE_A0 + : V_028760_BLEND_OPT_PRESERVE_C1_IGNORE_C0; +
Re: [Mesa-dev] [PATCH] mesa: detect inefficient buffer use and report through debug output
On Wed, Dec 9, 2015 at 5:23 PM, Brian Paul wrote: > On 12/09/2015 11:43 AM, Ilia Mirkin wrote: >> >> On Mon, Dec 7, 2015 at 8:42 PM, Brian Paul wrote: >>> >>> When a buffer is created with GL_STATIC_DRAW, its contents should not >>> be changed frequently. But that's exactly what one application I'm >>> debugging does. This patch adds code to try to detect inefficient >>> buffer use in a couple places. The GL_ARB_debug_output mechanism is >>> used to report the issue. >>> >>> NVIDIA's driver detects these sort of things too. >>> >>> Other types of inefficient buffer use could also be detected in the >>> future. >>> --- >>> src/mesa/main/bufferobj.c | 55 >>> +++ >>> src/mesa/main/mtypes.h| 4 >>> 2 files changed, 59 insertions(+) >>> >>> diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c >>> index f985982..6bc1b5e 100644 >>> --- a/src/mesa/main/bufferobj.c >>> +++ b/src/mesa/main/bufferobj.c >>> @@ -51,6 +51,34 @@ >>> >>> >>> /** >>> + * We count the number of buffer modification calls to check for >>> + * inefficient buffer use. This is the number of such calls before we >>> + * issue a warning. >>> + */ >>> +#define BUFFER_WARNING_CALL_COUNT 4 >>> + >>> + >>> +/** >>> + * Helper to warn of possible performance issues, such as frequently >>> + * updating a buffer created with GL_STATIC_DRAW. >>> + */ >>> +static void >>> +buffer_usage_warning(struct gl_context *ctx, const char *fmt, ...) >>> +{ >>> + va_list args; >>> + GLuint msg_id = 0; >> >> >> This needs to be wrapped in a macro, with a 'static' id (at each macro >> invocation), otherwise a fresh id will get generated each time this is >> called, which is presumably not desirable. Same as what I did with >> pipe_debug_message/_pipe_debug_message. > > > Is the macro required, or can I just pass a pointer to a static GLuint from > each call site? I took a quick stab at the macro but I'm having trouble > with the varargs stuff. The macro is not required... but you can copy the one I used for pipe_debug_message. Passing in a different static variable from each callsite would work too -- that's precisely what the macro would be doing in the first place. -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: detect inefficient buffer use and report through debug output
On 12/09/2015 11:43 AM, Ilia Mirkin wrote: On Mon, Dec 7, 2015 at 8:42 PM, Brian Paul wrote: When a buffer is created with GL_STATIC_DRAW, its contents should not be changed frequently. But that's exactly what one application I'm debugging does. This patch adds code to try to detect inefficient buffer use in a couple places. The GL_ARB_debug_output mechanism is used to report the issue. NVIDIA's driver detects these sort of things too. Other types of inefficient buffer use could also be detected in the future. --- src/mesa/main/bufferobj.c | 55 +++ src/mesa/main/mtypes.h| 4 2 files changed, 59 insertions(+) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index f985982..6bc1b5e 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -51,6 +51,34 @@ /** + * We count the number of buffer modification calls to check for + * inefficient buffer use. This is the number of such calls before we + * issue a warning. + */ +#define BUFFER_WARNING_CALL_COUNT 4 + + +/** + * Helper to warn of possible performance issues, such as frequently + * updating a buffer created with GL_STATIC_DRAW. + */ +static void +buffer_usage_warning(struct gl_context *ctx, const char *fmt, ...) +{ + va_list args; + GLuint msg_id = 0; This needs to be wrapped in a macro, with a 'static' id (at each macro invocation), otherwise a fresh id will get generated each time this is called, which is presumably not desirable. Same as what I did with pipe_debug_message/_pipe_debug_message. Is the macro required, or can I just pass a pointer to a static GLuint from each call site? I took a quick stab at the macro but I'm having trouble with the varargs stuff. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] softpipe: V.2 implement some support for multiple viewports
Am 09.12.2015 um 19:59 schrieb eocallag...@alterapraxis.com: > Roland, > > I could not due to ml size limit or something, it just bounces hence the > pull request. Due to size limit? Didn't look that big to me... Otherwise, looks good to me, though the second commit should mention it actually enables the extension. Roland > Cheers, > Edward. > > On 2015-12-10 02:38, Roland Scheidegger wrote: >> Am 09.12.2015 um 05:16 schrieb Edward O'Callaghan: >>> This fixes my initial attempt so that piglit now passes 14/14. Thanks >>> to a couple of tips from Roland in the previous patch I was able to >>> fix the remaining issue. This should be golden now. >>> >> >> Great that you got it working! >> Please send the patches to the ml. >> >> Roland > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] st/mesa: fix GLSL uniform updates for glBitmap & glDrawPixels
On Mon, Dec 7, 2015 at 5:36 PM, Brian Paul wrote: > On 12/06/2015 04:34 PM, Marek Olšák wrote: >> >> From: Marek Olšák >> >> Spotted by luck. The GLSL uniform storage is only associated once >> in LinkShader and can't be reallocated afterwards, because that would >> break the association. >> --- >> src/mesa/state_tracker/st_cb_bitmap.c | 6 +- >> src/mesa/state_tracker/st_cb_drawpixels.c | 6 -- >> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 6 ++ >> src/mesa/state_tracker/st_program.c| 17 - >> src/mesa/state_tracker/st_program.h| 1 - >> 5 files changed, 15 insertions(+), 21 deletions(-) >> >> diff --git a/src/mesa/state_tracker/st_cb_bitmap.c >> b/src/mesa/state_tracker/st_cb_bitmap.c >> index cbc6845..a4a48a6 100644 >> --- a/src/mesa/state_tracker/st_cb_bitmap.c >> +++ b/src/mesa/state_tracker/st_cb_bitmap.c >> @@ -287,7 +287,8 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, >> GLint y, GLfloat z, >> GLfloat colorSave[4]; >> COPY_4V(colorSave, ctx->Current.Attrib[VERT_ATTRIB_COLOR0]); >> COPY_4V(ctx->Current.Attrib[VERT_ATTRIB_COLOR0], color); >> - st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); >> + st_upload_constants(st, st->fp->Base.Base.Parameters, >> + PIPE_SHADER_FRAGMENT); >> COPY_4V(ctx->Current.Attrib[VERT_ATTRIB_COLOR0], colorSave); >> } >> >> @@ -404,6 +405,9 @@ draw_bitmap_quad(struct gl_context *ctx, GLint x, >> GLint y, GLfloat z, >> cso_restore_stream_outputs(cso); >> >> pipe_resource_reference(&vbuf, NULL); >> + >> + /* We uploaded modified constants, need to invalidate them. */ >> + st->dirty.mesa |= _NEW_PROGRAM_CONSTANTS; >> } >> >> >> diff --git a/src/mesa/state_tracker/st_cb_drawpixels.c >> b/src/mesa/state_tracker/st_cb_drawpixels.c >> index 262ad80..e295f54 100644 >> --- a/src/mesa/state_tracker/st_cb_drawpixels.c >> +++ b/src/mesa/state_tracker/st_cb_drawpixels.c >> @@ -1109,9 +1109,6 @@ st_DrawPixels(struct gl_context *ctx, GLint x, GLint >> y, >> >> st->pixel_xfer.pixelmap_sampler_view); >>num_sampler_view++; >> } >> - >> - /* update fragment program constants */ >> - st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); >> } >> >> /* Put glDrawPixels image into a texture */ >> @@ -1462,9 +1459,6 @@ st_CopyPixels(struct gl_context *ctx, GLint srcx, >> GLint srcy, >> >> st->pixel_xfer.pixelmap_sampler_view); >>num_sampler_view++; >> } >> - >> - /* update fragment program constants */ >> - st_upload_constants(st, fpv->parameters, PIPE_SHADER_FRAGMENT); >> } >> else { >> assert(type == GL_DEPTH); >> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> index 40c7725..a32c4cf 100644 >> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> @@ -5640,6 +5640,12 @@ get_mesa_program(struct gl_context *ctx, >> >> _mesa_reference_program(ctx, &shader->Program, prog); >> >> + /* Avoid reallocation of the program parameter list, because the >> uniform >> +* storage is only associated with the original parameter list. >> +* This should be enough for Bitmap and DrawPixels constants. >> +*/ >> + _mesa_reserve_parameter_storage(prog->Parameters, 8); >> + >> /* This has to be done last. Any operation the can cause >> * prog->ParameterValues to get reallocated (e.g., anything that adds >> a >> * program constant) has to happen before creating this linkage. >> diff --git a/src/mesa/state_tracker/st_program.c >> b/src/mesa/state_tracker/st_program.c >> index 75ccaf2..39c54c2 100644 >> --- a/src/mesa/state_tracker/st_program.c >> +++ b/src/mesa/state_tracker/st_program.c >> @@ -112,8 +112,6 @@ delete_fp_variant(struct st_context *st, struct >> st_fp_variant *fpv) >> { >> if (fpv->driver_shader) >> cso_delete_fragment_shader(st->cso_context, fpv->driver_shader); >> - if (fpv->parameters) >> - _mesa_free_parameter_list(fpv->parameters); >> free(fpv); >> } >> >> @@ -914,8 +912,6 @@ st_create_fp_variant(struct st_context *st, >>if (tgsi.tokens != stfp->tgsi.tokens) >> tgsi_free_tokens(tgsi.tokens); >>tgsi.tokens = tokens; >> - variant->parameters = >> -_mesa_clone_parameter_list(stfp->Base.Base.Parameters); >> } else >>fprintf(stderr, "mesa: cannot create a shader for glBitmap\n"); >> } >> @@ -924,6 +920,7 @@ st_create_fp_variant(struct st_context *st, >> if (key->drawpixels) { >> const struct tgsi_token *tokens; >> unsigned scale_const = 0, bias_const = 0, texcoord_const = 0; >> + struct gl_program_parameter_list *params = >> stfp->Base.Base.Parameters; >> >> /* Find the first unused slot. */ >> variant->drawpix_sampler = ffs(~stfp->Base.Base.S
[Mesa-dev] [PATCH 2/4] winsys/amdgpu: clear the buffer cache on allocation failure and try again
From: Marek Olšák --- src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c index 41efbcb..674482e 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c @@ -494,8 +494,13 @@ amdgpu_bo_create(struct radeon_winsys *rws, /* Create a new one. */ bo = amdgpu_create_bo(ws, size, alignment, usage, domain, flags); - if (!bo) - return NULL; + if (!bo) { + /* Clear the cache and try again. */ + pb_cache_release_all_buffers(&ws->bo_cache); + bo = amdgpu_create_bo(ws, size, alignment, usage, domain, flags); + if (!bo) + return NULL; + } bo->use_reusable_pool = use_reusable_pool; return &bo->base; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] winsys/radeon: clear the buffer cache on mmap failure and try again
From: Marek Olšák --- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index a5f8aeb..5ba01b9 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -361,9 +361,16 @@ void *radeon_bo_do_map(struct radeon_bo *bo) ptr = os_mmap(0, args.size, PROT_READ|PROT_WRITE, MAP_SHARED, bo->rws->fd, args.addr_ptr); if (ptr == MAP_FAILED) { -pipe_mutex_unlock(bo->map_mutex); -fprintf(stderr, "radeon: mmap failed, errno: %i\n", errno); -return NULL; +/* Clear the cache and try again. */ +pb_cache_release_all_buffers(&bo->rws->bo_cache); + +ptr = os_mmap(0, args.size, PROT_READ|PROT_WRITE, MAP_SHARED, + bo->rws->fd, args.addr_ptr); +if (ptr == MAP_FAILED) { +pipe_mutex_unlock(bo->map_mutex); +fprintf(stderr, "radeon: mmap failed, errno: %i\n", errno); +return NULL; +} } bo->ptr = ptr; bo->map_count = 1; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] winsys/amdgpu: clear the buffer cache on mmap failure and try again
From: Marek Olšák --- src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c index 674482e..adea376 100644 --- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c +++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c @@ -229,6 +229,11 @@ static void *amdgpu_bo_map(struct pb_buffer *buf, return bo->user_ptr; r = amdgpu_bo_cpu_map(bo->bo, &cpu); + if (r) { + /* Clear the cache and try again. */ + pb_cache_release_all_buffers(&bo->ws->bo_cache); + r = amdgpu_bo_cpu_map(bo->bo, &cpu); + } return r ? NULL : cpu; } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] winsys/radeon: clear the buffer cache on allocation failure and try again
From: Marek Olšák --- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index 3fd233c..a5f8aeb 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -766,8 +766,13 @@ radeon_winsys_bo_create(struct radeon_winsys *rws, } bo = radeon_create_bo(ws, size, alignment, usage, domain, flags); -if (!bo) -return NULL; +if (!bo) { +/* Clear the cache and try again. */ +pb_cache_release_all_buffers(&ws->bo_cache); +bo = radeon_create_bo(ws, size, alignment, usage, domain, flags); +if (!bo) +return NULL; +} bo->use_reusable_pool = use_reusable_pool; -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: don't call of u_prims_for_vertices for patches and rectangles
From: Marek Olšák Both caused a crash due to a division by zero in that function. This is an alternative fix. Cc: 11.0 11.1 --- src/gallium/drivers/radeonsi/si_state_draw.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index ee84a1f..e550011 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -216,6 +216,18 @@ static void si_emit_derived_tess_state(struct si_context *sctx, radeon_emit(cs, tcs_out_layout | (num_tcs_output_cp << 26)); } +static unsigned si_num_prims_for_vertices(const struct pipe_draw_info *info) +{ + switch (info->mode) { + case PIPE_PRIM_PATCHES: + return info->count / info->vertices_per_patch; + case R600_PRIM_RECTANGLE_LIST: + return info->count / 3; + default: + return u_prims_for_vertices(info->mode, info->count); + } +} + static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx, const struct pipe_draw_info *info, unsigned num_patches) @@ -320,7 +332,7 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx, if (sctx->b.screen->info.max_se >= 2 && ia_switch_on_eoi && (info->indirect || (info->instance_count > 1 && - u_prims_for_vertices(info->mode, info->count) <= 1))) + si_num_prims_for_vertices(info) <= 1))) sctx->b.flags |= SI_CONTEXT_VGT_FLUSH; return S_028AA8_SWITCH_ON_EOP(ia_switch_on_eop) | -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/aux../util: Make u_prims_for_vertices() safe
Pushed, thanks. Marek On Wed, Dec 9, 2015 at 10:07 AM, Edward O'Callaghan wrote: > Let us avoid trapping in hardware from a SIGFPE and instead > assert on a zero divisor. > > Hint: This can occur if a PIPE_PRIM_? is not handled in > u_prim_vertex_count() that results in ' info ' not > being initialized in the expected manner. > > Further, we also fix a possibly NULL pointer dereference > from ' info ' being NULL from a u_prim_vertex_count() call. > > Signed-off-by: Edward O'Callaghan > --- > src/gallium/auxiliary/util/u_prim.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/gallium/auxiliary/util/u_prim.h > b/src/gallium/auxiliary/util/u_prim.h > index 3668015..a09c315 100644 > --- a/src/gallium/auxiliary/util/u_prim.h > +++ b/src/gallium/auxiliary/util/u_prim.h > @@ -145,6 +145,9 @@ u_prims_for_vertices(unsigned prim, unsigned num) > { > const struct u_prim_vertex_count *info = u_prim_vertex_count(prim); > > + assert(info); > + assert(info->incr != 0); > + > if (num < info->min) >return 0; > > -- > 2.5.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 15/26] i965: Add Gen7+ tessellation engine state (3DSTATE_TE).
On 2015-12-02 16:15:56, Kenneth Graunke wrote: > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/gen7_te_state.c | 36 > --- > 1 file changed, 28 insertions(+), 8 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/gen7_te_state.c > b/src/mesa/drivers/dri/i965/gen7_te_state.c > index 95a5e98..2650fa5 100644 > --- a/src/mesa/drivers/dri/i965/gen7_te_state.c > +++ b/src/mesa/drivers/dri/i965/gen7_te_state.c > @@ -29,19 +29,39 @@ > static void > upload_te_state(struct brw_context *brw) > { > - /* Disable the TE */ > - BEGIN_BATCH(4); > - OUT_BATCH(_3DSTATE_TE << 16 | (4 - 2)); > - OUT_BATCH(0); > - OUT_BATCH(0); > - OUT_BATCH(0); > - ADVANCE_BATCH(); > + /* BRW_NEW_TESS_EVAL_PROGRAM */ > + bool active = brw->tess_eval_program; > + if (active) > + assert(brw->tess_ctrl_program); > + > + const struct brw_tes_prog_data *tes_prog_data = brw->tes.prog_data; > + > + if (active) { > + BEGIN_BATCH(4); > + OUT_BATCH(_3DSTATE_TE << 16 | (4 - 2)); > + OUT_BATCH((tes_prog_data->partitioning << GEN7_TE_PARTITIONING_SHIFT) | > +(tes_prog_data->output_topology << > GEN7_TE_OUTPUT_TOPOLOGY_SHIFT) | > +(tes_prog_data->domain << GEN7_TE_DOMAIN_SHIFT) | Looks like we don't currently have the masks for SET_FIELD, but maybe we should add them? Reviewed-by: Jordan Justen > +GEN7_TE_ENABLE); > + OUT_BATCH_F(63.0); > + OUT_BATCH_F(64.0); > + ADVANCE_BATCH(); > + } else { > + BEGIN_BATCH(4); > + OUT_BATCH(_3DSTATE_TE << 16 | (4 - 2)); > + OUT_BATCH(0); > + OUT_BATCH_F(0); > + OUT_BATCH_F(0); > + ADVANCE_BATCH(); > + } > } > > const struct brw_tracked_state gen7_te_state = { > .dirty = { >.mesa = 0, > - .brw = BRW_NEW_CONTEXT, > + .brw = BRW_NEW_CONTEXT | > + BRW_NEW_TES_PROG_DATA | > + BRW_NEW_TESS_EVAL_PROGRAM, > }, > .emit = upload_te_state, > }; > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 14/26] i965: Add Gen8+ tessellation evaluation shader state (3DSTATE_DS).
On 2015-12-02 16:15:55, Kenneth Graunke wrote: > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/gen8_ds_state.c | 66 > +++ > 1 file changed, 59 insertions(+), 7 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/gen8_ds_state.c > b/src/mesa/drivers/dri/i965/gen8_ds_state.c > index 4ce4ab3..a79e8aa 100644 > --- a/src/mesa/drivers/dri/i965/gen8_ds_state.c > +++ b/src/mesa/drivers/dri/i965/gen8_ds_state.c > @@ -29,19 +29,71 @@ > static void > gen8_upload_ds_state(struct brw_context *brw) > { > - /* Disable the DS Unit */ > - int ds_pkt_len = brw->gen >= 9 ? 11 : 9; > - BEGIN_BATCH(ds_pkt_len); > - OUT_BATCH(_3DSTATE_DS << 16 | (ds_pkt_len - 2)); > - for (int i = 0; i < ds_pkt_len - 1; i++) > + struct gl_context *ctx = &brw->ctx; > + const struct brw_stage_state *stage_state = &brw->tes.base; > + /* BRW_NEW_TESS_EVAL_PROGRAM */ > + bool active = brw->tess_eval_program; > + assert(!active || brw->tess_ctrl_program); > + > + /* BRW_NEW_TES_PROG_DATA */ > + const struct brw_tes_prog_data *tes_prog_data = brw->tes.prog_data; > + const struct brw_vue_prog_data *vue_prog_data = &tes_prog_data->base; > + const struct brw_stage_prog_data *prog_data = &vue_prog_data->base; > + > + if (active) { > + BEGIN_BATCH(9); > + OUT_BATCH(_3DSTATE_DS << 16 | (9 - 2)); > + OUT_BATCH(stage_state->prog_offset); > + OUT_BATCH(0); > + OUT_BATCH(SET_FIELD(DIV_ROUND_UP(stage_state->sampler_count, 4), > + GEN7_DS_SAMPLER_COUNT) | > +SET_FIELD(prog_data->binding_table.size_bytes / 4, > + GEN7_DS_BINDING_TABLE_ENTRY_COUNT)); > + if (prog_data->total_scratch) { > + OUT_RELOC64(stage_state->scratch_bo, > + I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, > + ffs(prog_data->total_scratch) - 11); > + } else { > + OUT_BATCH(0); > + OUT_BATCH(0); > + } > + OUT_BATCH(SET_FIELD(prog_data->dispatch_grf_start_reg, > + GEN7_DS_DISPATCH_START_GRF) | > +SET_FIELD(vue_prog_data->urb_read_length, > + GEN7_DS_URB_READ_LENGTH)); > + > + OUT_BATCH(GEN7_DS_ENABLE | > +GEN7_DS_STATISTICS_ENABLE | > +(brw->max_ds_threads - 1) << HSW_DS_MAX_THREADS_SHIFT | SET_FIELD? Reviewed-by: Jordan Justen > +(vue_prog_data->dispatch_mode == DISPATCH_MODE_SIMD8 ? > + GEN7_DS_SIMD8_DISPATCH_ENABLE : 0) | > +(tes_prog_data->domain == BRW_TESS_DOMAIN_TRI ? > + GEN7_DS_COMPUTE_W_COORDINATE_ENABLE : 0)); > + OUT_BATCH(SET_FIELD(ctx->Transform.ClipPlanesEnabled, > + GEN8_DS_USER_CLIP_DISTANCE)); > + ADVANCE_BATCH(); > + } else { > + BEGIN_BATCH(9); > + OUT_BATCH(_3DSTATE_DS << 16 | (9 - 2)); > + OUT_BATCH(0); > + OUT_BATCH(0); > + OUT_BATCH(0); > + OUT_BATCH(0); > + OUT_BATCH(0); > + OUT_BATCH(0); > + OUT_BATCH(0); >OUT_BATCH(0); > - ADVANCE_BATCH(); > + ADVANCE_BATCH(); > + } > + brw->tes.enabled = active; > } > > const struct brw_tracked_state gen8_ds_state = { > .dirty = { >.mesa = 0, > - .brw = BRW_NEW_CONTEXT, > + .brw = BRW_NEW_BATCH | > + BRW_NEW_TESS_EVAL_PROGRAM | > + BRW_NEW_TES_PROG_DATA, > }, > .emit = gen8_upload_ds_state, > }; > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [ANNOUNCE] mesa-demos 8.3.0
This new mesa-demos release fixes the build issue against mesa 10.6 (Bug #91643) and picks up the latest glxinfo changes. For the misc changes see below. Andreas. Adam Jackson (1): glxinfo: Add support for GLX_MESA_query_renderer Andreas Boll (3): demos: add missing binaries to .gitignore util: Remove unused glstate.[ch] demos: Bump version to 8.3.0 for release Awais Belal (1): sharedtex_mt: fix rendering thread hang Brian Paul (21): wglinfo: query and report multisample information glxinfo/wglinfo: move argument parsing into common code wglinfo: Add support for reporting core profile info wglinfo: print swap method in print_visual_attribs_verbose() wglinfo: adjust column spacing for pixel format info glxinfo/wglinfo: query/print GL_ARB_texture_multisample limits glxinfo/wglinfo: reverse order of the gl_versions[] array geom-stipple-lines: use VBO, specify number of verts on cmd line geom-wide-lines: use VBO, specify number of verts on cmd line line-sample: a new test that draws a sample of wide/stipple/smooth lines line-sample: use GL_LINE_STRIP instead of GL_LINE_LOOP glxinfo/wglinfo: add brief (-B) output mode glxinfo: pass the options object to print_screen_info() wglinfo: pass the options object to print_screen_info() demos: flush stdout after printing frame rate quad-offset-unfilled: fix GL_POLYGON_OFFSET_FILL/LINE mistake util: add fflush() in ValidateShaderProgram() tri-tex-stipple: trivial test of texturing with stippling line-sample: add flat/smooth and blend toggles rubberband: select line width with 1..4 keys rubberband: add keyboard option to test non-white drawing color Dave Airlie (1): glxinfo: add 4.5 as a valid version Dongwon Kim (1): torus.c: Lighting effect is distorted when object is scaled up/down Jose Fonseca (3): wgl: Remove unnecessary #pragmas. line-sample: Ensure GL_ALIASED_LINE_WIDTH_RANGE is defined on Windows. xdemos/corender: Remove. José Fonseca (5): cmake: Define HAVE_FREEGLUT when glutInitContextProfile symbol is present. cmake: Don't use gcc specific warnings with g++. tests,trival,fp,vp: Rename errno with errnum. wgl: Ensure PIXELFORMATDESCRIPTOR members are zeroed. s/Tungsten Graphics/VMware/ Julien Isorce (1): glxinfo: fix segfault when core profile is unavailable Marc Dietrich (1): glxinfo: fix direct rendering context in glxinfo Marek Olšák (2): eglinfo: print client extensions glxinfo: fix printing core profile extensions Matt Turner (1): egl: Remove demos using EGL_MESA_screen_surface. Michael Olbrich (1): opengles2: fix building without X11 Nathan Kidd (1): glxgears_pixmap: destroy correct pixmap id Ross Burton (1): configure.ac: fix AC_WITH(glut) so that --without-glut works git tag: mesa-demos-8.3.0 ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/mesa-demos-8.3.0.tar.bz2 MD5: 628e75c23c17394f11a316c36f8e4164 mesa-demos-8.3.0.tar.bz2 SHA1: 468a8f24938ab07e2e31828cf961515371d45b56 mesa-demos-8.3.0.tar.bz2 SHA256: c173154bbd0d5fb53d732471984def42fb1b14ac85fcb834138fb9518b3e0bef mesa-demos-8.3.0.tar.bz2 PGP: ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/mesa-demos-8.3.0.tar.bz2.sig ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/mesa-demos-8.3.0.tar.gz MD5: f42510108d9401ad1b4ee67957cc73d7 mesa-demos-8.3.0.tar.gz SHA1: 2d46ef2d83030c978ba032eac14bf94007bd4a84 mesa-demos-8.3.0.tar.gz SHA256: 6127c5511e63447b28c2df735739de06c5d221f68e7671cc0ee446e605e92357 mesa-demos-8.3.0.tar.gz PGP: ftp://ftp.freedesktop.org/pub/mesa/demos/8.3.0/mesa-demos-8.3.0.tar.gz.sig signature.asc Description: Digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/26] i965: Add tessellation shader sampler support.
12 & 13 Reviewed-by: Jordan Justen On 2015-12-02 16:15:53, Kenneth Graunke wrote: > Based on code by Chris Forbes and Fabian Bieler. > > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_context.h | 2 +- > src/mesa/drivers/dri/i965/brw_sampler_state.c | 46 > +++ > src/mesa/drivers/dri/i965/brw_state.h | 2 ++ > src/mesa/drivers/dri/i965/brw_state_upload.c | 2 ++ > 4 files changed, 51 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index 8de0e6f..c440c7d 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -1228,7 +1228,7 @@ struct brw_context > } perfmon; > > int num_atoms[BRW_NUM_PIPELINES]; > - const struct brw_tracked_state render_atoms[71]; > + const struct brw_tracked_state render_atoms[73]; > const struct brw_tracked_state compute_atoms[9]; > > /* If (INTEL_DEBUG & DEBUG_BATCH) */ > diff --git a/src/mesa/drivers/dri/i965/brw_sampler_state.c > b/src/mesa/drivers/dri/i965/brw_sampler_state.c > index 6d73444..3f29e2f 100644 > --- a/src/mesa/drivers/dri/i965/brw_sampler_state.c > +++ b/src/mesa/drivers/dri/i965/brw_sampler_state.c > @@ -55,6 +55,8 @@ gen7_emit_sampler_state_pointers_xs(struct brw_context *brw, > { > static const uint16_t packet_headers[] = { >[MESA_SHADER_VERTEX] = _3DSTATE_SAMPLER_STATE_POINTERS_VS, > + [MESA_SHADER_TESS_CTRL] = _3DSTATE_SAMPLER_STATE_POINTERS_HS, > + [MESA_SHADER_TESS_EVAL] = _3DSTATE_SAMPLER_STATE_POINTERS_DS, >[MESA_SHADER_GEOMETRY] = _3DSTATE_SAMPLER_STATE_POINTERS_GS, >[MESA_SHADER_FRAGMENT] = _3DSTATE_SAMPLER_STATE_POINTERS_PS, > }; > @@ -647,3 +649,47 @@ const struct brw_tracked_state brw_gs_samplers = { > }, > .emit = brw_upload_gs_samplers, > }; > + > + > +static void > +brw_upload_tcs_samplers(struct brw_context *brw) > +{ > + /* BRW_NEW_TESS_CTRL_PROGRAM */ > + struct gl_program *tcs = (struct gl_program *) brw->tess_ctrl_program; > + if (!tcs) > + return; > + > + brw_upload_sampler_state_table(brw, tcs, &brw->tcs.base); > +} > + > + > +const struct brw_tracked_state brw_tcs_samplers = { > + .dirty = { > + .mesa = _NEW_TEXTURE, > + .brw = BRW_NEW_BATCH | > + BRW_NEW_TESS_CTRL_PROGRAM, > + }, > + .emit = brw_upload_tcs_samplers, > +}; > + > + > +static void > +brw_upload_tes_samplers(struct brw_context *brw) > +{ > + /* BRW_NEW_TESS_EVAL_PROGRAM */ > + struct gl_program *tes = (struct gl_program *) brw->tess_eval_program; > + if (!tes) > + return; > + > + brw_upload_sampler_state_table(brw, tes, &brw->tes.base); > +} > + > + > +const struct brw_tracked_state brw_tes_samplers = { > + .dirty = { > + .mesa = _NEW_TEXTURE, > + .brw = BRW_NEW_BATCH | > + BRW_NEW_TESS_EVAL_PROGRAM, > + }, > + .emit = brw_upload_tes_samplers, > +}; > diff --git a/src/mesa/drivers/dri/i965/brw_state.h > b/src/mesa/drivers/dri/i965/brw_state.h > index 129de30..a332b92 100644 > --- a/src/mesa/drivers/dri/i965/brw_state.h > +++ b/src/mesa/drivers/dri/i965/brw_state.h > @@ -72,6 +72,8 @@ extern const struct brw_tracked_state > brw_state_base_address; > extern const struct brw_tracked_state brw_urb_fence; > extern const struct brw_tracked_state brw_vs_prog; > extern const struct brw_tracked_state brw_vs_samplers; > +extern const struct brw_tracked_state brw_tcs_samplers; > +extern const struct brw_tracked_state brw_tes_samplers; > extern const struct brw_tracked_state brw_gs_samplers; > extern const struct brw_tracked_state brw_vs_ubo_surfaces; > extern const struct brw_tracked_state brw_vs_abo_surfaces; > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c > b/src/mesa/drivers/dri/i965/brw_state_upload.c > index ba66886..e61fa6a 100644 > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > @@ -322,6 +322,8 @@ static const struct brw_tracked_state > *gen8_render_atoms[] = > > &brw_fs_samplers, > &brw_vs_samplers, > + &brw_tcs_samplers, > + &brw_tes_samplers, > &brw_gs_samplers, > &gen8_multisample_state, > > -- > 2.6.2 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Mesa 11.0.7
Mesa 11.0.7 is now available. This release brings substantial amount of fixes in meta (affecting i965), some driver fixes for i965, nouveau, r600 and llvm. The video encoding for Stoney has been disabled, as it isn't working properly. There are also build fixes for DragonFly and other *BSD platforms, Chris Wilson (1): meta: Compute correct buffer size with SkipRows/SkipPixels Daniel Stone (1): egl/wayland: Ignore rects from SwapBuffersWithDamage Dave Airlie (4): texgetimage: consolidate 1D array handling code. r600: geometry shader gsvs itemsize workaround r600: rv670 use at least 16es/gs threads r600: workaround empty geom shader. Emil Velikov (5): docs: add sha256 checksums for 11.0.6 get-pick-list.sh: Require explicit "11.0" for nominating stable patches mesa; add get-extra-pick-list.sh script into bin/ Update version to 11.0.7 docs: add release notes for 11.0.7 François Tigeot (1): xmlconfig: Add support for DragonFly Ian Romanick (22): mesa: Make bind_vertex_buffer avilable outside varray.c mesa: Refactor update_array_format to make _mesa_update_array_format_public mesa: Refactor enable_vertex_array_attrib to make _mesa_enable_vertex_array_attrib i965: Pass brw_context instead of gl_context to brw_draw_rectlist i965: Use DSA functions for VBOs in brw_meta_fast_clear i965: Use internal functions for buffer object access i965: Don't pollute the buffer object namespace in brw_meta_fast_clear meta: Use DSA functions for PBO in create_texture_for_pbo meta: Use _mesa_NamedBufferData and _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects i965: Use _mesa_NamedBufferSubData for users of _mesa_meta_setup_vertex_objects meta: Don't leave the VBO bound after _mesa_meta_setup_vertex_objects meta: Track VBO using gl_buffer_object instead of GL API object handle meta: Use DSA functions for VBOs in _mesa_meta_setup_vertex_objects meta: Use internal functions for buffer object and VAO access meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects meta: Partially convert _mesa_meta_DrawTex to DSA meta: Track VBO using gl_buffer_object instead of GL API object handle in _mesa_meta_DrawTex meta: Use internal functions for buffer object and VAO access in _mesa_meta_DrawTex meta: Don't pollute the buffer object namespace in _mesa_meta_DrawTex meta/TexSubImage: Don't pollute the buffer object namespace meta/generate_mipmap: Don't leak the framebuffer object glsl: Fix off-by-one error in array size check assertion Ilia Mirkin (7): nvc0/ir: actually emit AFETCH on kepler nir: fix typo in idiv lowering, causing large-udiv-udiv failures nouveau: use the buffer usage to determine placement when no binding nv50,nvc0: properly handle buffer storage invalidation on dsa buffer nv50/ir: fix (un)spilling of 3-wide results mesa: support GL_RED/GL_RG in ES2 contexts when driver support exists nvc0/ir: start offset at texBindBase for txq, like regular texturing Jonathan Gray (1): automake: fix some occurrences of hardcoded -ldl and -lpthread Leo Liu (1): radeon/vce: disable Stoney VCE for 11.0 Marta Lofstedt (1): gles2: Update gl2ext.h to revision: 32120 Oded Gabbay (1): llvmpipe: disable VSX in ppc due to LLVM PPC bug git tag: mesa-11.0.7 ftp://ftp.freedesktop.org/pub/mesa/11.0.7/mesa-11.0.7.tar.gz MD5: 73e2e04d02ab2cdddc635fcb261bde13 mesa-11.0.7.tar.gz SHA1: ba017f72c5cfe08140b8b0c1a98035f76e19abae mesa-11.0.7.tar.gz SHA256: 07c27004ff68b288097d17b2faa7bdf15ec73c96b7e6c9835266e544adf0a62f mesa-11.0.7.tar.gz PGP: ftp://ftp.freedesktop.org/pub/mesa/11.0.7/mesa-11.0.7.tar.gz.sig ftp://ftp.freedesktop.org/pub/mesa/11.0.7/mesa-11.0.7.tar.xz MD5: 5bb515d4b0931b7a9e1bfec3da73f10f mesa-11.0.7.tar.xz SHA1: 7e96868bf104673509e30846ccb6f641231e8c5a mesa-11.0.7.tar.xz SHA256: e7e90a332ede6c8fd08eff90786a3fd1605a4e62ebf3a9b514047838194538cb mesa-11.0.7.tar.xz PGP: ftp://ftp.freedesktop.org/pub/mesa/11.0.7/mesa-11.0.7.tar.xz.sig -- Emil signature.asc Description: OpenPGP digital signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nvc0: optimize coherent buffer checking at draw time
Instead of iterating over all the buffer resources looking for coherent buffers, we keep track of a context-wide count. This will save some iterations (and CPU cycles) in 99.99% case because usually coherent buffers are not so used. Signed-off-by: Samuel Pitoiset --- I didn't test the patch, but I will run a full piglit tonight. src/gallium/drivers/nouveau/nv50/nv50_context.h | 1 + src/gallium/drivers/nouveau/nv50/nv50_state.c | 5 +++ src/gallium/drivers/nouveau/nv50/nv50_vbo.c | 22 ++--- src/gallium/drivers/nouveau/nvc0/nvc0_context.h | 3 ++ src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 26 src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 41 + 6 files changed, 45 insertions(+), 53 deletions(-) diff --git a/src/gallium/drivers/nouveau/nv50/nv50_context.h b/src/gallium/drivers/nouveau/nv50/nv50_context.h index 2cebcd9..dd5b5e3 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_context.h +++ b/src/gallium/drivers/nouveau/nv50/nv50_context.h @@ -134,6 +134,7 @@ struct nv50_context { struct nv50_constbuf constbuf[3][NV50_MAX_PIPE_CONSTBUFS]; uint16_t constbuf_dirty[3]; uint16_t constbuf_valid[3]; + uint16_t constbuf_coherent[3]; struct pipe_vertex_buffer vtxbuf[PIPE_MAX_ATTRIBS]; unsigned num_vtxbufs; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_state.c b/src/gallium/drivers/nouveau/nv50/nv50_state.c index fd7c7cd..c08a63e 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_state.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_state.c @@ -852,8 +852,13 @@ nv50_set_constant_buffer(struct pipe_context *pipe, uint shader, uint index, nv50->constbuf[s][i].offset = cb->buffer_offset; nv50->constbuf[s][i].size = MIN2(align(cb->buffer_size, 0x100), 0x1); nv50->constbuf_valid[s] |= 1 << i; + if (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) + nv50->constbuf_coherent[s] |= 1 << i; + else + nv50->constbuf_coherent[s] &= ~(1 << i); } else { nv50->constbuf_valid[s] &= ~(1 << i); + nv50->constbuf_coherent[s] &= ~(1 << i); } nv50->constbuf_dirty[s] |= 1 << i; diff --git a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c index 85878d5..4ca8f11 100644 --- a/src/gallium/drivers/nouveau/nv50/nv50_vbo.c +++ b/src/gallium/drivers/nouveau/nv50/nv50_vbo.c @@ -791,27 +791,9 @@ nv50_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info) push->kick_notify = nv50_draw_vbo_kick_notify; - /* TODO: Instead of iterating over all the buffer resources looking for -* coherent buffers, keep track of a context-wide count. -*/ for (s = 0; s < 3 && !nv50->cb_dirty; ++s) { - uint32_t valid = nv50->constbuf_valid[s]; - - while (valid && !nv50->cb_dirty) { - const unsigned i = ffs(valid) - 1; - struct pipe_resource *res; - - valid &= ~(1 << i); - if (nv50->constbuf[s][i].user) -continue; - - res = nv50->constbuf[s][i].u.buf; - if (!res) -continue; - - if (res->flags & PIPE_RESOURCE_FLAG_MAP_COHERENT) -nv50->cb_dirty = true; - } + if (nv50->constbuf_coherent[s]) + nv50->cb_dirty = true; } /* If there are any coherent constbufs, flush the cache */ diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h index 39b73ec..1219548 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_context.h +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_context.h @@ -134,10 +134,12 @@ struct nvc0_context { struct nvc0_constbuf constbuf[6][NVC0_MAX_PIPE_CONSTBUFS]; uint16_t constbuf_dirty[6]; uint16_t constbuf_valid[6]; + uint16_t constbuf_coherent[6]; bool cb_dirty; struct pipe_vertex_buffer vtxbuf[PIPE_MAX_ATTRIBS]; unsigned num_vtxbufs; + uint32_t vtxbufs_coherent; struct pipe_index_buffer idxbuf; uint32_t constant_vbos; uint32_t vbo_user; /* bitmask of vertex buffers pointing to user memory */ @@ -149,6 +151,7 @@ struct nvc0_context { struct pipe_sampler_view *textures[6][PIPE_MAX_SAMPLERS]; unsigned num_textures[6]; uint32_t textures_dirty[6]; + uint32_t textures_coherent[6]; struct nv50_tsc_entry *samplers[6][PIPE_MAX_SAMPLERS]; unsigned num_samplers[6]; uint16_t samplers_dirty[6]; diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c index 5da0ea8..b498b5d 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c @@ -549,11 +549,18 @@ nvc0_stage_set_sampler_views(struct nvc0_context *nvc0, int s, for (i = 0; i < nr; ++i) { struct nv50_tic_entry *old = nv50_tic_entry(nvc0->textures[s][i]); + struct pipe_resource *res = views[i]->texture; if (views[i] == nvc0->textures[s][i]) continue; nvc0->textures_dirty[s] |= 1 <<
Re: [Mesa-dev] [PATCH] i965: Separate base offset/constant offset combining from remapping.
On Wednesday, December 09, 2015 08:03:25 AM Jason Ekstrand wrote: > On Dec 9, 2015 2:51 AM, "Kenneth Graunke" wrote: > > > > My tessellation branch has two additional remap functions. I don't want > > to replicate this logic there. > > > > Signed-off-by: Kenneth Graunke > > --- > > src/mesa/drivers/dri/i965/brw_nir.c | 78 > - > > 1 file changed, 50 insertions(+), 28 deletions(-) > > > > Hey Jason, > > > > If you like this patch, and haven't yet merged your NIR input reworks, > > feel free to just squash it into your changes. Or, we can land it > > separately after your changes. It's up to you. > > > > Separating this out allows me to reuse this in my new tessellation input > > and output remapping functions, and also means we don't need to add > structs > > for the remap functions...we can just pass the builder, or inputs_read, or > > the VUE map...and not have to pack multiple things together. > > Sure. It does make things simpler. > > > --Ken > > > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c > b/src/mesa/drivers/dri/i965/brw_nir.c > > index 14ad172..105a175 100644 > > --- a/src/mesa/drivers/dri/i965/brw_nir.c > > +++ b/src/mesa/drivers/dri/i965/brw_nir.c > > @@ -27,15 +27,19 @@ > > #include "glsl/nir/nir_builder.h" > > #include "program/prog_to_nir.h" > > > > -struct remap_vs_attrs_state { > > - nir_builder b; > > - uint64_t inputs_read; > > -}; > > - > > +/** > > + * In many cases, we just add the base and offset together, so there's no > > + * reason to keep them separate. Sometimes, combining them is essential: > > + * if a shader only accesses part of a compound variable (such as a > matrix > > + * or array), the variable's base may not actually exist in the VUE map. > > + * > > + * This pass adds constant offsets to instr->const_index[0], and resets > > + * the offset source to 0. Non-constant offsets remain unchanged. > > + */ > > static bool > > -remap_vs_attrs(nir_block *block, void *void_state) > > +add_const_offset_to_base(nir_block *block, void *closure) > > { > > - struct remap_vs_attrs_state *state = void_state; > > + nir_builder *b = closure; > > > > nir_foreach_instr_safe(block, instr) { > >if (instr->type != nir_instr_type_intrinsic) > > @@ -43,30 +47,48 @@ remap_vs_attrs(nir_block *block, void *void_state) > > > >nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); > > > > + if (intrin->intrinsic == nir_intrinsic_load_input || > > + intrin->intrinsic == nir_intrinsic_load_per_vertex_input || > > + intrin->intrinsic == nir_intrinsic_load_output || > > + intrin->intrinsic == nir_intrinsic_load_per_vertex_output || > > + intrin->intrinsic == nir_intrinsic_store_output || > > + intrin->intrinsic == nir_intrinsic_store_per_vertex_output) { > > This seems a bit scortched-earth. It would be nice if the caller had a bit > more control. We could always add "do input"/"do output" options once we actually need them. It means adding the structs back, but that's fine. I suppose for now we could ignore output intrinsics here. > > + nir_src *offset = nir_get_io_offset_src(intrin); > > + nir_const_value *const_offset = nir_src_as_const_value(*offset); > > + > > + if (const_offset) { > > +intrin->const_index[0] += const_offset->u[0]; > > +b->cursor = nir_before_instr(&intrin->instr); > > +nir_instr_rewrite_src(&intrin->instr, offset, > > + nir_src_for_ssa(nir_imm_int(b, 0))); > > + } > > Else??? It seems that you don't want to run this pass if you think you'll > ever hit an indirect. I guess it's harmless to just do this for all direct > things in our driver, but it doesn't sit well. Else do nothing. The problem I'm trying to avoid with this logic is that inputs/outputs which take up multiple slots and are directly accessed may only have slots assigned for the sub-components that are actually used. So, I can't use the base offset, and need to move the base to the slot that's actually used. If something is accessed indirectly, we assign all of its slots, so using the base location is fine. > > + } > > + } > > + return true; > > + > > +} > > + > > +static bool > > +remap_vs_attrs(nir_block *block, void *closure) > > +{ > > + GLbitfield64 inputs_read = *((GLbitfield64 *) closure); > > + > > + nir_foreach_instr(block, instr) { > > + if (instr->type != nir_instr_type_intrinsic) > > + continue; > > + > > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); > > + > >if (intrin->intrinsic == nir_intrinsic_load_input) { > > /* Attributes come in a contiguous block, ordered by their > >* gl_vert_attrib value. That means we can compute the slot > >* number for an attribute by masking out the enabled attributes > >* before it and counting the bits. > >*/
Re: [Mesa-dev] [PATCH 3/8] glsl: use dual slot helper in the linker code.
On 9 December 2015 at 21:39, Timothy Arceri wrote: > On Wed, 2015-12-09 at 16:06 +1000, Dave Airlie wrote: >> From: Dave Airlie >> >> Signed-off-by: Dave Airlie > > Great timing :) I was going to have to look into fixing this stuff for > enhanced layouts. > > Patches 1 & 2 are: > Reviewed-by: Timothy Arceri > > I have a question about this patch. If these doubles only take up a > single attribute then why do we even bother with this test? The spec > says its optional and your fixing the counting up in later patches so > what does it do thats useful? It's complicated. ARB_gpu_shader_fp64 passes doubles between shaders, dual slots ones take two locations. So a dvec3[2] will consume locations 17/18, 19/20. The limits are in number of locations. ARB_vertex_attrib_64bit allows doubles as vertex inputs. Dual slot ones take up a single location, however they consume two slots with respect to the hw limits. So a dvec3[2] will consume 17, 18. However it will take 4 slots against the vertex attrib limits. Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/5] i965: Check base format to determine whether to use tiled memcpy
Reviewed-by: Jason Ekstrand On Thu, Nov 19, 2015 at 7:25 AM, Neil Roberts wrote: > The tiled memcpy doesn't work for copying from RGBX to RGBA because it > doesn't override the alpha component to 1.0. Commit 2cebaac479d4 added > a check to disable it for RGBX formats by looking at the TexFormat. > However a lot of the rest of the code base is written with the > assumption that an RGBA texture can be used internally to implement a > GL_RGB texture. If that is done then this check breaks. This patch > makes it instead check the base format of the texture which I think > more directly matches the intention. > > Cc: Jason Ekstrand > --- > src/mesa/drivers/dri/i965/intel_pixel_read.c | 7 --- > src/mesa/drivers/dri/i965/intel_tex_image.c | 7 --- > 2 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c > b/src/mesa/drivers/dri/i965/intel_pixel_read.c > index 9bcbbd1..c8aef65 100644 > --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c > +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c > @@ -135,10 +135,11 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx, >return false; > > /* We can't handle copying from RGBX or BGRX because the tiled_memcpy > -* function doesn't set the last channel to 1. > +* function doesn't set the last channel to 1. Note this checks BaseFormat > +* rather than TexFormat in case the RGBX format is being simulated with > an > +* RGBA format. > */ > - if (rb->Format == MESA_FORMAT_B8G8R8X8_UNORM || > - rb->Format == MESA_FORMAT_R8G8B8X8_UNORM) > + if (rb->_BaseFormat == GL_RGB) >return false; > > if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp, > diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c > b/src/mesa/drivers/dri/i965/intel_tex_image.c > index 34b91e8..e3710da7 100644 > --- a/src/mesa/drivers/dri/i965/intel_tex_image.c > +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c > @@ -399,10 +399,11 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context > *ctx, >return false; > > /* We can't handle copying from RGBX or BGRX because the tiled_memcpy > -* function doesn't set the last channel to 1. > +* function doesn't set the last channel to 1. Note this checks BaseFormat > +* rather than TexFormat in case the RGBX format is being simulated with > an > +* RGBA format. > */ > - if (texImage->TexFormat == MESA_FORMAT_B8G8R8X8_UNORM || > - texImage->TexFormat == MESA_FORMAT_R8G8B8X8_UNORM) > + if (texImage->_BaseFormat == GL_RGB) >return false; > > if (!intel_get_memcpy(texImage->TexFormat, format, type, &mem_copy, &cpp, > -- > 1.9.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] util/u_helpers: return correct number of bound buffers
On Wed, Dec 9, 2015 at 2:01 PM, Patrick Rudolph wrote: > Ok, first of all bind some buffers: > > pipe->set_vertex_buffers(pipe, 0, 1, &vtxbuf); > pipe->set_vertex_buffers(pipe, 1, 1, &vtxbuf); > pipe->set_vertex_buffers(pipe, 2, 1, &vtxbuf); > > num_vtxbufs is now 3 as it should be. > > Now you are unbinding buffers, one after another starting at slot 0: > pipe->set_vertex_buffers(pipe, 0, 1, NULL); > > enabled_buffers = (1ull << num_vtxbufs) - 1 = 0x7; > util_set_vertex_buffers_mask(...); > enabled_buffers = 0x6; > num_vtxbufs = util_last_bit(enabled_buffers) = 3 > > > pipe->set_vertex_buffers(pipe, 1, 1, NULL); > > enabled_buffers = (1ull << num_vtxbufs) - 1 = 0x7; > util_set_vertex_buffers_mask(...); > enabled_buffers = 0x5; > num_vtxbufs = util_last_bit(enabled_buffers) = 3 > > > pipe->set_vertex_buffers(pipe, 2, 1, NULL); > > enabled_buffers = (1ull << num_vtxbufs) - 1 = 0x7; > util_set_vertex_buffers_mask(...); > enabled_buffers = 0x3; > num_vtxbufs = util_last_bit(enabled_buffers) = 2 > > There are no buffers bound any more, but num_vtxbufs is now 2 instead of > 0. > > There would be no problem if you start at slot 2 going down to 0. > There would be no problem if you unbind all buffers at once. > > I hope this clarifies the problem. Right, thank you, yes it does. With a slightly fixed commit log including a bit more justification, this is also Reviewed-by: Ilia Mirkin Basically something mentioning that the current algorithm does not account for pre-existing holes in the buffer list. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 90821] Segfault when calling glViewport on surfaceless EGL context without bound FBO
https://bugs.freedesktop.org/show_bug.cgi?id=90821 Nanley Chery changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #5 from Nanley Chery --- *** This bug has been marked as a duplicate of bug 93257 *** -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nvc0: fix use after free of pipe_resource
I pushed a slightly modified version of this: http://cgit.freedesktop.org/mesa/mesa/commit/?id=432a798cf5c7fab18a3e32d4073840df7d0d37cb Thanks for the patch! I hope this will resolve some weird crashes people have seen with various buffers being null unexpectedly. On Sun, Dec 6, 2015 at 4:11 AM, Patrick Rudolph wrote: > Always reset the vertex bufctx to make sure there's no pointer to > an already freed pipe_resource left after unbinding buffers. > Fixes use after free crash in nvc0_bufctx_fence(). > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004 > > Signed-off-by: Patrick Rudolph > --- > src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c > b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c > index 5dce5f0..2aa90c9 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c > @@ -1000,12 +1000,16 @@ nvc0_set_vertex_buffers(struct pipe_context *pipe, > struct nvc0_context *nvc0 = nvc0_context(pipe); > unsigned i; > > +if (nvc0->num_vtxbufs) > +nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_VTX); > + > util_set_vertex_buffers_count(nvc0->vtxbuf, &nvc0->num_vtxbufs, vb, >start_slot, count); > > if (!vb) { > nvc0->vbo_user &= ~(((1ull << count) - 1) << start_slot); > nvc0->constant_vbos &= ~(((1ull << count) - 1) << start_slot); > + nvc0->dirty |= NVC0_NEW_ARRAYS; > return; > } > > @@ -1025,7 +1029,6 @@ nvc0_set_vertex_buffers(struct pipe_context *pipe, > } > > nvc0->dirty |= NVC0_NEW_ARRAYS; > -nouveau_bufctx_reset(nvc0->bufctx_3d, NVC0_BIND_VTX); > } > > static void > -- > 2.4.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: handle stencil_bits parameter for MESA_FORMAT_B8G8R8X8_UNORM format.
On Wed, Dec 9, 2015 at 11:23 AM, Ilia Mirkin wrote: > On Wed, Dec 9, 2015 at 11:18 AM, Deve wrote: >> This patch indeed seems to not have a sense. I just added it to the bug >> report as a suggestion that it works for me after this modification. Emil >> Velikov said that I should send it to the mailing list. >> >> Here is how it works in Supertuxkart: >> We create rtt with following parameters: >> >> DepthStencilTexture = generateRTT(res, GL_DEPTH24_STENCIL8, >> GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8); >> >> Then, during rendering scene, we do: >> >> glEnable(GL_FRAMEBUFFER_SRGB); >> glBindFramebuffer(GL_FRAMEBUFFER, 0); > > OK, so this is the "winsys" framebuffer (GL has some term for it, > sorry, I don't remember what it is... perhaps it's even winsys). This > is created based on parameters of your selected GLX visual. For > example, when I run glxinfo, I see (on nouveau; the list on intel will > be different but comparable): > > 480 GLX Visuals > visual x bf lv rg d st colorbuffer sr ax dp st accumbuffer ms cav > id dep cl sp sz l ci b ro r g b a F gb bf th cl r g b a ns b eat > > 0x021 24 tc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None > 0x022 24 dc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None > ... > 0x343 24 tc 0 32 0 r . . 8 8 8 8 . s 0 0 0 0 0 0 0 0 0 None > 0x344 24 tc 0 32 0 r . . 8 8 8 8 . s 0 0 0 16 16 16 16 0 0 Slow > 0x345 24 tc 0 32 0 r y . 8 8 8 8 . s 0 0 0 0 0 0 0 0 0 None > 0x346 24 tc 0 32 0 r y . 8 8 8 8 . s 0 0 0 16 16 16 16 0 0 Slow > > Note how some of them have srgb, others don't (and have various > differences in their various other properties). EGL has something > similar I believe, but tbh I don't remember the specifics. That's the > mesaVis->sRGBCapable bit below. If you need an sRGB-capable visual, > are you sure you're picking one? This would be somewhere well before > any actual rendering takes place. > > If you're not sure whether you need an sRGB-capable visual or not, try > making sure you pick a *non-srgb* visual in a working configuration > and see if it breaks. Right, so looking at your code with in more detail, you probably just require a stencil visual, which, in combination with this patch, makes you skip this one. However I noticed that i965/hsw and probably others don't expose *any* sRGB-capable GLX visuals/fb configs! Pretty sure that's not good... but GLX, sRGB, and visuals are not exactly my strong point, hopefully someone a bit more clued in can comment. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/util: handle patches in u_prims_for_vertices to fix a radeonsi crash
On 2015-12-10 01:47, Marek Olšák wrote: From: Marek Olšák I guess the crash was because of divison by zero. Cc: 11.0 11.1 --- src/gallium/auxiliary/util/u_prim.h | 17 + src/gallium/drivers/radeonsi/si_state_draw.c | 3 ++- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/src/gallium/auxiliary/util/u_prim.h b/src/gallium/auxiliary/util/u_prim.h index 3668015..4926af6 100644 --- a/src/gallium/auxiliary/util/u_prim.h +++ b/src/gallium/auxiliary/util/u_prim.h @@ -141,14 +141,23 @@ u_prim_vertex_count(unsigned prim) * For polygons, return the number of triangles. */ static inline unsigned -u_prims_for_vertices(unsigned prim, unsigned num) +u_prims_for_vertices(unsigned prim, unsigned num, unsigned vertices_per_patch) { - const struct u_prim_vertex_count *info = u_prim_vertex_count(prim); + struct u_prim_vertex_count info; - if (num < info->min) + if (prim == PIPE_PRIM_PATCHES) + info.min = info.incr = vertices_per_patch; + else if (prim < PIPE_PRIM_MAX) We already do this check in u_prim_vertex_count() and if out-of-bounds we returned a NULL. Perhaps it would be better avoid this extra else-if branch here and just in the else branch, make the call and then assert on the NULL. + info = *u_prim_vertex_count(prim); + else { + assert(!"invalid prim type"); + return 0; + } + + if (num < info.min) return 0; Well convolving this with my previous patch, http://lists.freedesktop.org/archives/mesa-dev/2015-December/102729.html I think we should still have an assert(info.incr != 0); here. - return 1 + ((num - info->min) / info->incr); + return 1 + ((num - info.min) / info.incr); } static inline boolean u_validate_pipe_prim( unsigned pipe_prim, unsigned nr ) diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index ee84a1f..4ac9d0a 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -320,7 +320,8 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx, if (sctx->b.screen->info.max_se >= 2 && ia_switch_on_eoi && (info->indirect || (info->instance_count > 1 && - u_prims_for_vertices(info->mode, info->count) <= 1))) + u_prims_for_vertices(info->mode, info->count, + info->vertices_per_patch) <= 1))) sctx->b.flags |= SI_CONTEXT_VGT_FLUSH; return S_028AA8_SWITCH_ON_EOP(ia_switch_on_eop) | ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] util/u_helpers: return correct number of bound buffers
Ok, first of all bind some buffers: pipe->set_vertex_buffers(pipe, 0, 1, &vtxbuf); pipe->set_vertex_buffers(pipe, 1, 1, &vtxbuf); pipe->set_vertex_buffers(pipe, 2, 1, &vtxbuf); num_vtxbufs is now 3 as it should be. Now you are unbinding buffers, one after another starting at slot 0: pipe->set_vertex_buffers(pipe, 0, 1, NULL); enabled_buffers = (1ull << num_vtxbufs) - 1 = 0x7; util_set_vertex_buffers_mask(...); enabled_buffers = 0x6; num_vtxbufs = util_last_bit(enabled_buffers) = 3 pipe->set_vertex_buffers(pipe, 1, 1, NULL); enabled_buffers = (1ull << num_vtxbufs) - 1 = 0x7; util_set_vertex_buffers_mask(...); enabled_buffers = 0x5; num_vtxbufs = util_last_bit(enabled_buffers) = 3 pipe->set_vertex_buffers(pipe, 2, 1, NULL); enabled_buffers = (1ull << num_vtxbufs) - 1 = 0x7; util_set_vertex_buffers_mask(...); enabled_buffers = 0x3; num_vtxbufs = util_last_bit(enabled_buffers) = 2 There are no buffers bound any more, but num_vtxbufs is now 2 instead of 0. There would be no problem if you start at slot 2 going down to 0. There would be no problem if you unbind all buffers at once. I hope this clarifies the problem. Kind Regards, Patrick On 2015-12-09 07:10 PM, Ilia Mirkin wrote: > I'm probably just being dense... can you provide an exact sequence of > calls that would cause this logic to fail? Seems like it should work > as-is... > > On Sun, Dec 6, 2015 at 4:12 AM, Patrick Rudolph wrote: >> In case a state tracker unbinds every slot by a seperate >> pipe->set_vertex_buffers() call, starting from slot zero, the number >> of bound buffers would not reach zero at all. Unbinding all buffers >> at once or starting at the top-most slot results in correct behaviour. >> >> Calculating the correct number of bound buffers fixes a NULL pointer >> dereference in nvc0_validate_vertex_buffers_shared(). >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004 >> >> Signed-off-by: Patrick Rudolph >> --- >> src/gallium/auxiliary/util/u_helpers.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/src/gallium/auxiliary/util/u_helpers.c >> b/src/gallium/auxiliary/util/u_helpers.c >> index 09619c1..09020b0 100644 >> --- a/src/gallium/auxiliary/util/u_helpers.c >> +++ b/src/gallium/auxiliary/util/u_helpers.c >> @@ -81,7 +81,13 @@ void util_set_vertex_buffers_count(struct >> pipe_vertex_buffer *dst, >> const struct pipe_vertex_buffer *src, >> unsigned start_slot, unsigned count) >> { >> - uint32_t enabled_buffers = (1ull << *dst_count) - 1; >> + unsigned i; >> + uint32_t enabled_buffers = 0; >> + >> + for (i = 0; i < *dst_count; i++) { >> + if (dst[i].buffer || dst[i].user_buffer) >> + enabled_buffers |= (1ull << i); >> + } >> >> util_set_vertex_buffers_mask(dst, &enabled_buffers, src, start_slot, >> count); >> -- >> 2.4.3 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] softpipe: V.2 implement some support for multiple viewports
Roland, I could not due to ml size limit or something, it just bounces hence the pull request. Cheers, Edward. On 2015-12-10 02:38, Roland Scheidegger wrote: Am 09.12.2015 um 05:16 schrieb Edward O'Callaghan: This fixes my initial attempt so that piglit now passes 14/14. Thanks to a couple of tips from Roland in the previous patch I was able to fix the remaining issue. This should be golden now. Great that you got it working! Please send the patches to the ml. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: detect inefficient buffer use and report through debug output
On Mon, Dec 7, 2015 at 8:42 PM, Brian Paul wrote: > When a buffer is created with GL_STATIC_DRAW, its contents should not > be changed frequently. But that's exactly what one application I'm > debugging does. This patch adds code to try to detect inefficient > buffer use in a couple places. The GL_ARB_debug_output mechanism is > used to report the issue. > > NVIDIA's driver detects these sort of things too. > > Other types of inefficient buffer use could also be detected in the > future. > --- > src/mesa/main/bufferobj.c | 55 > +++ > src/mesa/main/mtypes.h| 4 > 2 files changed, 59 insertions(+) > > diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c > index f985982..6bc1b5e 100644 > --- a/src/mesa/main/bufferobj.c > +++ b/src/mesa/main/bufferobj.c > @@ -51,6 +51,34 @@ > > > /** > + * We count the number of buffer modification calls to check for > + * inefficient buffer use. This is the number of such calls before we > + * issue a warning. > + */ > +#define BUFFER_WARNING_CALL_COUNT 4 > + > + > +/** > + * Helper to warn of possible performance issues, such as frequently > + * updating a buffer created with GL_STATIC_DRAW. > + */ > +static void > +buffer_usage_warning(struct gl_context *ctx, const char *fmt, ...) > +{ > + va_list args; > + GLuint msg_id = 0; This needs to be wrapped in a macro, with a 'static' id (at each macro invocation), otherwise a fresh id will get generated each time this is called, which is presumably not desirable. Same as what I did with pipe_debug_message/_pipe_debug_message. [I know you already pushed this... that's how I noticed... but should still get fixed.] -ilia > + > + va_start(args, fmt); > + _mesa_gl_vdebug(ctx, &msg_id, > + MESA_DEBUG_SOURCE_API, > + MESA_DEBUG_TYPE_PERFORMANCE, > + MESA_DEBUG_SEVERITY_MEDIUM, > + fmt, args); > + va_end(args); > +} > + > + > +/** > * Used as a placeholder for buffer objects between glGenBuffers() and > * glBindBuffer() so that glIsBuffer() can work correctly. > */ > @@ -1677,6 +1705,21 @@ _mesa_buffer_sub_data(struct gl_context *ctx, struct > gl_buffer_object *bufObj, > if (size == 0) >return; > > + bufObj->NumSubDataCalls++; > + > + if ((bufObj->Usage == GL_STATIC_DRAW || > +bufObj->Usage == GL_STATIC_COPY) && > + bufObj->NumSubDataCalls >= BUFFER_WARNING_CALL_COUNT) { > + /* If the application declared the buffer as static draw/copy or stream > + * draw, it should not be frequently modified with glBufferSubData. > + */ > + buffer_usage_warning(ctx, > + "using %s(buffer %u, offset %u, size %u) to " > + "update a %s buffer", > + func, bufObj->Name, offset, size, > + _mesa_enum_to_string(bufObj->Usage)); > + } > + > bufObj->Written = GL_TRUE; > > assert(ctx->Driver.BufferSubData); > @@ -2384,6 +2427,18 @@ _mesa_map_buffer_range(struct gl_context *ctx, >return NULL; > } > > + if (access & GL_MAP_WRITE_BIT) { > + bufObj->NumMapBufferWriteCalls++; > + if ((bufObj->Usage == GL_STATIC_DRAW || > + bufObj->Usage == GL_STATIC_COPY) && > + bufObj->NumMapBufferWriteCalls >= BUFFER_WARNING_CALL_COUNT) { > + buffer_usage_warning(ctx, > + "using %s(buffer %u, offset %u, length %u) to " > + "update a %s buffer", > + func, bufObj->Name, offset, length, > + _mesa_enum_to_string(bufObj->Usage)); > + } > + } > > assert(ctx->Driver.MapBufferRange); > map = ctx->Driver.MapBufferRange(ctx, offset, length, access, bufObj, > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index 1eb1e21..de54169 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -1275,6 +1275,10 @@ struct gl_buffer_object > GLboolean Immutable; /**< GL_ARB_buffer_storage */ > gl_buffer_usage UsageHistory; /**< How has this buffer been used so far? > */ > > + /** Counters used for buffer usage warnings */ > + GLuint NumSubDataCalls; > + GLuint NumMapBufferWriteCalls; > + > struct gl_buffer_mapping Mappings[MAP_COUNT]; > }; > > -- > 1.9.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 1/5] i965/eu: set correct execution size in brw_NOP
On Wed, Dec 9, 2015 at 4:15 AM, Iago Toral Quiroga wrote: > --- > src/mesa/drivers/dri/i965/brw_eu_emit.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > index f8c0f80..9543d5e 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > @@ -1256,6 +1256,7 @@ brw_F16TO32(struct brw_codegen *p, struct brw_reg dst, > struct brw_reg src) > void brw_NOP(struct brw_codegen *p) > { > brw_inst *insn = next_insn(p, BRW_OPCODE_NOP); > + brw_inst_set_exec_size(p->devinfo, insn, BRW_EXECUTE_4); I don't follow this change. Was this implicitly set before? At least in newer documentation, NOP is defined to have nearly all fields 0 which would mean execution size must be 1. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] configure.ac: fix test for SSE4.1 assembler support
On Tue, Dec 8, 2015 at 9:37 PM, Jonathan Gray wrote: > Change the __m128i variables to be volatile so gcc 4.9 won't optimise > all of them out with -O1 or greater. The _mm_set1_epi32/pinsrd calls > still get optimised out but now there is at least one SSE4.1 instruction > generated via _mm_max_epu32/pmaxud. When all of the sse4.1 instructions > got optimised out the configure test would incorrectly pass when the > compiler supported the intrinsics and the assembler didn't support the > instructions. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806 > Signed-off-by: Jonathan Gray > Cc: "11.0 11.1" > --- > configure.ac | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/configure.ac b/configure.ac > index 260934d..1d82e47 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -384,7 +384,7 @@ CFLAGS="$SSE41_CFLAGS $CFLAGS" > AC_COMPILE_IFELSE([AC_LANG_SOURCE([[ > #include > int main () { > -__m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c; > +volatile __m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c; > c = _mm_max_epu32(a, b); > return 0; I would have extracted an int from the result of _mm_max_epu32 and returned that instead of 0. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] util/u_helpers: return correct number of bound buffers
I'm probably just being dense... can you provide an exact sequence of calls that would cause this logic to fail? Seems like it should work as-is... On Sun, Dec 6, 2015 at 4:12 AM, Patrick Rudolph wrote: > In case a state tracker unbinds every slot by a seperate > pipe->set_vertex_buffers() call, starting from slot zero, the number > of bound buffers would not reach zero at all. Unbinding all buffers > at once or starting at the top-most slot results in correct behaviour. > > Calculating the correct number of bound buffers fixes a NULL pointer > dereference in nvc0_validate_vertex_buffers_shared(). > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93004 > > Signed-off-by: Patrick Rudolph > --- > src/gallium/auxiliary/util/u_helpers.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/auxiliary/util/u_helpers.c > b/src/gallium/auxiliary/util/u_helpers.c > index 09619c1..09020b0 100644 > --- a/src/gallium/auxiliary/util/u_helpers.c > +++ b/src/gallium/auxiliary/util/u_helpers.c > @@ -81,7 +81,13 @@ void util_set_vertex_buffers_count(struct > pipe_vertex_buffer *dst, > const struct pipe_vertex_buffer *src, > unsigned start_slot, unsigned count) > { > - uint32_t enabled_buffers = (1ull << *dst_count) - 1; > + unsigned i; > + uint32_t enabled_buffers = 0; > + > + for (i = 0; i < *dst_count; i++) { > + if (dst[i].buffer || dst[i].user_buffer) > + enabled_buffers |= (1ull << i); > + } > > util_set_vertex_buffers_mask(dst, &enabled_buffers, src, start_slot, > count); > -- > 2.4.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] glapi: Build gl_gentable.c only on Darwin
The general concept of this change seems fine to me. Given the desire to keep glapi as similar as possible across platforms, would it be better to just move this into glx/apple rather than leaving it in glapi? > On Dec 9, 2015, at 09:07, Emil Velikov wrote: > > On 9 December 2015 at 14:11, Andreas Boll wrote: >> Removes the public symbol _glapi_create_table_from_handle from >> libGL.so.1 on all plattforms except Darwin. >> > typo -> platforms > >> Since the symbol is not used on other plattforms it makes sense to > ditto > >> build gl_gentable.c only on Darwin. >> > Ideally we'll keep the dispatch as close to identical across all > platforms (i.e. we'll nuke this), although for now this will do. > > Out of curiosity is there any noticeable difference in the build times? > > ... >> XXX If we still want to distribute gl_gentable.c in the release tarball >> we could drop the changes in src/mapi/glapi/gen/Makefile.am >> > Yes please. We want to ship all the generated sources, regardless of > the platform they're used. > > ... >> index 2da8f7d..25ea44a 100644 >> --- a/src/mapi/glapi/gen/Makefile.am >> +++ b/src/mapi/glapi/gen/Makefile.am >> @@ -27,8 +27,11 @@ MESA_GLAPI_OUTPUTS = \ >>$(MESA_GLAPI_DIR)/glapi_mapi_tmp.h \ >>$(MESA_GLAPI_DIR)/glprocs.h \ >>$(MESA_GLAPI_DIR)/glapitemp.h \ >> - $(MESA_GLAPI_DIR)/glapitable.h \ >> - $(MESA_GLAPI_DIR)/glapi_gentable.c >> + $(MESA_GLAPI_DIR)/glapitable.h >> + >> +if HAVE_APPLEDRI >> +MESA_GLAPI_OUTPUTS += $(MESA_GLAPI_DIR)/glapi_gentable.c >> +endif >> >> MESA_GLAPI_ASM_OUTPUTS = >> if HAVE_X86_ASM >> @@ -88,8 +91,11 @@ XORG_GLAPI_DIR = $(XORG_BASE)/glx >> XORG_GLAPI_OUTPUTS = \ >>$(XORG_GLAPI_DIR)/glprocs.h \ >>$(XORG_GLAPI_DIR)/glapitable.h \ >> - $(XORG_GLAPI_DIR)/dispatch.h \ >> + $(XORG_GLAPI_DIR)/dispatch.h >> + >> +if HAVE_APPLEDRI >>$(XORG_GLAPI_DIR)/glapi_gentable.c > Erm missing XORG_GLAPI_OUTPUTS += ? > > Afaict even with the above makefile changes the file should still be > in the in the tarball. Am I missing something ? > > Would be great if Jeremy (or someone else) has the chance to test > this, in case we've missing something. > > -Emil smime.p7s Description: S/MIME cryptographic signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] glapi: Build gl_gentable.c only on Darwin
On 9 December 2015 at 14:11, Andreas Boll wrote: > Removes the public symbol _glapi_create_table_from_handle from > libGL.so.1 on all plattforms except Darwin. > typo -> platforms > Since the symbol is not used on other plattforms it makes sense to ditto > build gl_gentable.c only on Darwin. > Ideally we'll keep the dispatch as close to identical across all platforms (i.e. we'll nuke this), although for now this will do. Out of curiosity is there any noticeable difference in the build times? ... > XXX If we still want to distribute gl_gentable.c in the release tarball > we could drop the changes in src/mapi/glapi/gen/Makefile.am > Yes please. We want to ship all the generated sources, regardless of the platform they're used. ... > index 2da8f7d..25ea44a 100644 > --- a/src/mapi/glapi/gen/Makefile.am > +++ b/src/mapi/glapi/gen/Makefile.am > @@ -27,8 +27,11 @@ MESA_GLAPI_OUTPUTS = \ > $(MESA_GLAPI_DIR)/glapi_mapi_tmp.h \ > $(MESA_GLAPI_DIR)/glprocs.h \ > $(MESA_GLAPI_DIR)/glapitemp.h \ > - $(MESA_GLAPI_DIR)/glapitable.h \ > - $(MESA_GLAPI_DIR)/glapi_gentable.c > + $(MESA_GLAPI_DIR)/glapitable.h > + > +if HAVE_APPLEDRI > +MESA_GLAPI_OUTPUTS += $(MESA_GLAPI_DIR)/glapi_gentable.c > +endif > > MESA_GLAPI_ASM_OUTPUTS = > if HAVE_X86_ASM > @@ -88,8 +91,11 @@ XORG_GLAPI_DIR = $(XORG_BASE)/glx > XORG_GLAPI_OUTPUTS = \ > $(XORG_GLAPI_DIR)/glprocs.h \ > $(XORG_GLAPI_DIR)/glapitable.h \ > - $(XORG_GLAPI_DIR)/dispatch.h \ > + $(XORG_GLAPI_DIR)/dispatch.h > + > +if HAVE_APPLEDRI > $(XORG_GLAPI_DIR)/glapi_gentable.c Erm missing XORG_GLAPI_OUTPUTS += ? Afaict even with the above makefile changes the file should still be in the in the tarball. Am I missing something ? Would be great if Jeremy (or someone else) has the chance to test this, in case we've missing something. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] glsl: Fix a typo in a comment
On 12/09/2015 09:20 AM, Andreas Boll wrote: s/suports/supports/ Signed-off-by: Andreas Boll --- src/glsl/glsl_parser_extras.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h index 6bded3e..a4bda77 100644 --- a/src/glsl/glsl_parser_extras.h +++ b/src/glsl/glsl_parser_extras.h @@ -97,7 +97,7 @@ struct _mesa_glsl_parse_state { * supports the feature. * * \param required_glsl_es_version is the GLSL ES version that is required -* to support the feature, or 0 if no version of GLSL ES suports the +* to support the feature, or 0 if no version of GLSL ES supports the * feature. */ bool is_version(unsigned required_glsl_version, For all 4, Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] glx: Fix a typo in a comment
On 12/09/2015 09:29 AM, Andreas Boll wrote: s/suports/supports/ Signed-off-by: Andreas Boll --- Found two more "suports" typos. I could squash all patches together if that's preferred. src/glx/dri2_glx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 27ea952..651915a 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1289,7 +1289,7 @@ dri2CreateScreen(int screen, struct glx_display * priv) __glXEnableDirectExtension(&psc->base, "GLX_OML_sync_control"); } - /* DRI2 suports SubBuffer through DRI2CopyRegion, so it's always + /* DRI2 supports SubBuffer through DRI2CopyRegion, so it's always * available.*/ psp->copySubBuffer = dri2CopySubBuffer; __glXEnableDirectExtension(&psc->base, "GLX_MESA_copy_sub_buffer"); For both, Reviewed-by: Brian Paul I'd say you don't really have to get reviews for fixing comment typos though. -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] mesa: Fix a typo in a comment
s/suports/supports/ Signed-off-by: Andreas Boll --- src/mesa/main/extensions.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/extensions.h b/src/mesa/main/extensions.h index 1615e1c..b5e0350 100644 --- a/src/mesa/main/extensions.h +++ b/src/mesa/main/extensions.h @@ -88,7 +88,7 @@ enum { }; -/** Checks if the context suports a user-facing extension */ +/** Checks if the context supports a user-facing extension */ #define EXT(name_str, driver_cap, ...) \ static inline bool \ _mesa_has_##name_str(const struct gl_context *ctx) \ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] glx: Fix a typo in a comment
s/suports/supports/ Signed-off-by: Andreas Boll --- Found two more "suports" typos. I could squash all patches together if that's preferred. src/glx/dri2_glx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glx/dri2_glx.c b/src/glx/dri2_glx.c index 27ea952..651915a 100644 --- a/src/glx/dri2_glx.c +++ b/src/glx/dri2_glx.c @@ -1289,7 +1289,7 @@ dri2CreateScreen(int screen, struct glx_display * priv) __glXEnableDirectExtension(&psc->base, "GLX_OML_sync_control"); } - /* DRI2 suports SubBuffer through DRI2CopyRegion, so it's always + /* DRI2 supports SubBuffer through DRI2CopyRegion, so it's always * available.*/ psp->copySubBuffer = dri2CopySubBuffer; __glXEnableDirectExtension(&psc->base, "GLX_MESA_copy_sub_buffer"); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: handle stencil_bits parameter for MESA_FORMAT_B8G8R8X8_UNORM format.
On Wed, Dec 9, 2015 at 11:18 AM, Deve wrote: > This patch indeed seems to not have a sense. I just added it to the bug > report as a suggestion that it works for me after this modification. Emil > Velikov said that I should send it to the mailing list. > > Here is how it works in Supertuxkart: > We create rtt with following parameters: > > DepthStencilTexture = generateRTT(res, GL_DEPTH24_STENCIL8, > GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8); > > Then, during rendering scene, we do: > > glEnable(GL_FRAMEBUFFER_SRGB); > glBindFramebuffer(GL_FRAMEBUFFER, 0); OK, so this is the "winsys" framebuffer (GL has some term for it, sorry, I don't remember what it is... perhaps it's even winsys). This is created based on parameters of your selected GLX visual. For example, when I run glxinfo, I see (on nouveau; the list on intel will be different but comparable): 480 GLX Visuals visual x bf lv rg d st colorbuffer sr ax dp st accumbuffer ms cav id dep cl sp sz l ci b ro r g b a F gb bf th cl r g b a ns b eat 0x021 24 tc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None 0x022 24 dc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None ... 0x343 24 tc 0 32 0 r . . 8 8 8 8 . s 0 0 0 0 0 0 0 0 0 None 0x344 24 tc 0 32 0 r . . 8 8 8 8 . s 0 0 0 16 16 16 16 0 0 Slow 0x345 24 tc 0 32 0 r y . 8 8 8 8 . s 0 0 0 0 0 0 0 0 0 None 0x346 24 tc 0 32 0 r y . 8 8 8 8 . s 0 0 0 16 16 16 16 0 0 Slow Note how some of them have srgb, others don't (and have various differences in their various other properties). EGL has something similar I believe, but tbh I don't remember the specifics. That's the mesaVis->sRGBCapable bit below. If you need an sRGB-capable visual, are you sure you're picking one? This would be somewhere well before any actual rendering takes place. If you're not sure whether you need an sRGB-capable visual or not, try making sure you pick a *non-srgb* visual in a working configuration and see if it breaks. Cheers, -ilia > (...) > render(); > (...) > glDisable(GL_FRAMEBUFFER_SRGB); > > It looks that glEnable(GL_FRAMEBUFFER_SRGB) doesn't work anymore. It's > because of following lines in intel_screen.c in intelCreateBuffer() > function: > >if (mesaVis->redBits == 5) > rgbFormat = MESA_FORMAT_B5G6R5_UNORM; >else if (mesaVis->sRGBCapable) > rgbFormat = MESA_FORMAT_B8G8R8A8_SRGB; >else if (mesaVis->alphaBits == 0) > rgbFormat = MESA_FORMAT_B8G8R8X8_UNORM; >else { > rgbFormat = MESA_FORMAT_B8G8R8A8_SRGB; > fb->Visual.sRGBCapable = true; >} > > Previously MESA_FORMAT_B8G8R8X8_UNORM was not available, and thus > MESA_FORMAT_B8G8R8A8_UNORM was handled as last case (using > MESA_FORMAT_B8G8R8A8_SRGB format). Now it uses MESA_FORMAT_B8G8R8X8_UNORM > format. > > Any ideas how it should be handled? > > Regards, > Deve > > W dniu 09.12.2015 o 03:00, Ilia Mirkin pisze: > >> On Mon, Dec 7, 2015 at 5:32 PM, Dawid Gan wrote: >>> >>> This format has been added in commit: >>> 28090b30dd6b5977de085f48c620574214b6b4ba >>> But it was handled in the same way as MESA_FORMAT_B8G8R8A8_UNORM format. >>> It was causing the screen in Supertuxkart to be darker than expected, >>> see: >>> https://bugs.freedesktop.org/show_bug.cgi?id=92759 >>> >>> Cc: Boyan Ding >>> Cc: "11.0 11.1" >>> Fixes: 28090b30dd6 "i965: Add XRGB format to >>> intel_screen_make_configs" >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92759 >>> --- >>> src/mesa/drivers/dri/i965/intel_screen.c | 9 + >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c >>> b/src/mesa/drivers/dri/i965/intel_screen.c >>> index cc90efe..75d5a65 100644 >>> --- a/src/mesa/drivers/dri/i965/intel_screen.c >>> +++ b/src/mesa/drivers/dri/i965/intel_screen.c >>> @@ -1237,6 +1237,9 @@ intel_screen_make_configs(__DRIscreen *dri_screen) >>>stencil_bits[2] = 8; >>>num_depth_stencil_bits = 3; >>>} >>> + } else if (formats[i] == MESA_FORMAT_B8G8R8X8_UNORM) { >>> + depth_bits[1] = 24; >>> + stencil_bits[1] = 0; >> >> >> Why would you want depth without stencil when using BGRX? I don't see >> how the two are connected... Are you sure you're picking the right >> visual? >> >>-ilia >> > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91888] EGL Wayland software rendering no longer work after regression
https://bugs.freedesktop.org/show_bug.cgi?id=91888 --- Comment #19 from nerdopol...@verizon.net --- I didn't see any thing in the changelog for 'egl' that looked like it might be a fix... Not 100% sure though -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] mesa: Fix typos in print messages
s/inconsistant/inconsistent/ s/occurences/occurrences/ Signed-off-by: Andreas Boll --- src/mesa/main/teximage.c | 2 +- src/mesa/main/transformfeedback.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c index 60fc7cc..73b3318 100644 --- a/src/mesa/main/teximage.c +++ b/src/mesa/main/teximage.c @@ -2028,7 +2028,7 @@ compressed_texture_error_check(struct gl_context *ctx, GLint dimensions, * if is not consistent with the format, dimensions, and * contents of the specified image. */ - reason = "imageSize inconsistant with width/height/format"; + reason = "imageSize inconsistent with width/height/format"; error = GL_INVALID_VALUE; goto error; } diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index 103011c..976b268 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -861,7 +861,7 @@ _mesa_TransformFeedbackVaryings(GLuint program, GLsizei count, if (buffers > ctx->Const.MaxTransformFeedbackBuffers) { _mesa_error(ctx, GL_INVALID_OPERATION, "glTransformFeedbackVaryings(too many gl_NextBuffer " -"occurences)"); +"occurrences)"); return; } } else { -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] glsl: Fix a typo in a comment
s/suports/supports/ Signed-off-by: Andreas Boll --- src/glsl/glsl_parser_extras.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h index 6bded3e..a4bda77 100644 --- a/src/glsl/glsl_parser_extras.h +++ b/src/glsl/glsl_parser_extras.h @@ -97,7 +97,7 @@ struct _mesa_glsl_parse_state { * supports the feature. * * \param required_glsl_es_version is the GLSL ES version that is required -* to support the feature, or 0 if no version of GLSL ES suports the +* to support the feature, or 0 if no version of GLSL ES supports the * feature. */ bool is_version(unsigned required_glsl_version, -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] st/osmesa: Fix a typo in a comment
s/suport/support/ Signed-off-by: Andreas Boll --- src/gallium/state_trackers/osmesa/osmesa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/state_trackers/osmesa/osmesa.c b/src/gallium/state_trackers/osmesa/osmesa.c index 0285cb0..0f27ba8 100644 --- a/src/gallium/state_trackers/osmesa/osmesa.c +++ b/src/gallium/state_trackers/osmesa/osmesa.c @@ -32,7 +32,7 @@ * may be set to "softpipe" or "llvmpipe" to override. * * With softpipe we could render directly into the user's buffer by using a - * display target resource. However, softpipe doesn't suport "upside-down" + * display target resource. However, softpipe doesn't support "upside-down" * rendering which would be needed for the OSMESA_Y_UP=TRUE case. * * With llvmpipe we could only render directly into the user's buffer when its -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] meta: Fix a typo in a print message
s/Unkown/Unknown/ Signed-off-by: Andreas Boll --- src/mesa/drivers/common/meta_blit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index c5faf61..4dbf0a7 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -325,7 +325,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx, } break; default: - _mesa_problem(ctx, "Unkown texture target %s\n", + _mesa_problem(ctx, "Unknown texture target %s\n", _mesa_enum_to_string(target)); shader_index = BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE; } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: handle stencil_bits parameter for MESA_FORMAT_B8G8R8X8_UNORM format.
This patch indeed seems to not have a sense. I just added it to the bug report as a suggestion that it works for me after this modification. Emil Velikov said that I should send it to the mailing list. Here is how it works in Supertuxkart: We create rtt with following parameters: DepthStencilTexture = generateRTT(res, GL_DEPTH24_STENCIL8, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8); Then, during rendering scene, we do: glEnable(GL_FRAMEBUFFER_SRGB); glBindFramebuffer(GL_FRAMEBUFFER, 0); (...) render(); (...) glDisable(GL_FRAMEBUFFER_SRGB); It looks that glEnable(GL_FRAMEBUFFER_SRGB) doesn't work anymore. It's because of following lines in intel_screen.c in intelCreateBuffer() function: if (mesaVis->redBits == 5) rgbFormat = MESA_FORMAT_B5G6R5_UNORM; else if (mesaVis->sRGBCapable) rgbFormat = MESA_FORMAT_B8G8R8A8_SRGB; else if (mesaVis->alphaBits == 0) rgbFormat = MESA_FORMAT_B8G8R8X8_UNORM; else { rgbFormat = MESA_FORMAT_B8G8R8A8_SRGB; fb->Visual.sRGBCapable = true; } Previously MESA_FORMAT_B8G8R8X8_UNORM was not available, and thus MESA_FORMAT_B8G8R8A8_UNORM was handled as last case (using MESA_FORMAT_B8G8R8A8_SRGB format). Now it uses MESA_FORMAT_B8G8R8X8_UNORM format. Any ideas how it should be handled? Regards, Deve W dniu 09.12.2015 o 03:00, Ilia Mirkin pisze: On Mon, Dec 7, 2015 at 5:32 PM, Dawid Gan wrote: This format has been added in commit: 28090b30dd6b5977de085f48c620574214b6b4ba But it was handled in the same way as MESA_FORMAT_B8G8R8A8_UNORM format. It was causing the screen in Supertuxkart to be darker than expected, see: https://bugs.freedesktop.org/show_bug.cgi?id=92759 Cc: Boyan Ding Cc: "11.0 11.1" Fixes: 28090b30dd6 "i965: Add XRGB format to intel_screen_make_configs" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92759 --- src/mesa/drivers/dri/i965/intel_screen.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index cc90efe..75d5a65 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1237,6 +1237,9 @@ intel_screen_make_configs(__DRIscreen *dri_screen) stencil_bits[2] = 8; num_depth_stencil_bits = 3; } + } else if (formats[i] == MESA_FORMAT_B8G8R8X8_UNORM) { + depth_bits[1] = 24; + stencil_bits[1] = 0; Why would you want depth without stencil when using BGRX? I don't see how the two are connected... Are you sure you're picking the right visual? -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] Fix locking of GLsync objects
On Wed, Dec 09, 2015 at 10:35:25AM +, Emil Velikov wrote: >> I was told that it's easier for people to review my patch if it comes in via >> email than being stuck in the bug tracker; FWIW, this is for bug 120238. > Which bugtracker it this ? bugs.fd.o does not like the number > mentioned. Please add the full URL to the commit message with a > Bugzilla: tag. I mixed up some bug numbers, sorry. https://bugs.freedesktop.org/show_bug.cgi?id=92757 >> GLsync objects had a race condition when used from multiple threads >> (which is the main point of the extension, really); it could be >> validated as a sync object at the beginning of the function, and then >> deleted by another thread before use, causing crashes. Fix this by >> changing all casts from GLsync to struct gl_sync_object to a new >> function _mesa_get_sync() that validates and increases the refcount. > Might be worth keeping _mesa_ref_sync_object(), even if it's an inline > wrapper around the above. As things get a bit confusing - foo_get vs > foo_unref. What about _mesa_get_and_ref_sync()? > Alternatively one could even throw the locking (+extra checks) into > the validate, use it in _mesa_IsSync(), while using the ref/unref > combo elsewhere and drop the "amount" argument from unref. I agree the amount argument is a bit icky. :-) > Please mention if this commit fixes a certain game/program. The motivating program is unreleased for now, but it will be released (under GPLv3) in February. > Can you also add the following tag. This way it'll be harder to miss > the patch when picking things for the stable branch(es). > > Cc: "11.0 11.1" Sure. (I'll send an updated patch when we have agreement on the issues above.) /* Steinar */ -- Homepage: https://www.sesse.net/ ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH 5/5] i965: Skip execution size adjustment for instructions of width 4
On Dec 9, 2015 4:16 AM, "Iago Toral Quiroga" wrote: > > This code in brw_set_dest adjusts the execution size of any instruction > with a dst.width < 8. However, we don't want to do this with instructions > operating on doubles, since these will have a width of 4, but still > need an execution size of 8 (for SIMD8). Unfortunately, we can't just check > the size of the operands involved to detect if we are doing an operation on > doubles, because we can have instructions that do operations on double > operands interpreted as UD, operating on any of its 2 32-bit components. > > Previous commits have made it so we never emit instructions with a horizontal > width of 4 that don't have the correct execution size set for gen7/gen8, so > we can skip it in this case, avoiding the conflicts with fp64 requirements. > > Expanding the same fix to other hardware generations requires many more > changes but since we are not targetting fp64 support on them > wer don't really care for now. > --- > src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c > index 78f2c8c..50a8771 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > @@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst *inst, struct brw_reg dest) > /* Generators should set a default exec_size of either 8 (SIMD4x2 or SIMD8) > * or 16 (SIMD16), as that's normally correct. However, when dealing with > * small registers, we automatically reduce it to match the register size. > +* > +* In platforms that support fp64 we can emit instructions with a width of > +* 4 that need two SIMD8 registers and an exec_size of 8 or 16. In these > +* cases we need to make sure that these instructions have their exec sizes > +* set properly when they are emitted and we can't rely on this code to fix > +* it. > */ > - if (dest.width < BRW_EXECUTE_8) > + bool fix_exec_size; > + if (devinfo->gen == 7 || devinfo->gen == 8) If we're doing to take this approach, we definitely want to make it gen > 6 or something so we include future gens. Really gen > 4 is probably doable since the only real problem is the legacy clipping code. > + fix_exec_size = dest.width < BRW_EXECUTE_4; > + else > + fix_exec_size = dest.width < BRW_EXECUTE_8; > + > + if (fix_exec_size) >brw_inst_set_exec_size(devinfo, inst, dest.width); > } > > -- > 2.1.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Separate base offset/constant offset combining from remapping.
On Dec 9, 2015 2:51 AM, "Kenneth Graunke" wrote: > > My tessellation branch has two additional remap functions. I don't want > to replicate this logic there. > > Signed-off-by: Kenneth Graunke > --- > src/mesa/drivers/dri/i965/brw_nir.c | 78 - > 1 file changed, 50 insertions(+), 28 deletions(-) > > Hey Jason, > > If you like this patch, and haven't yet merged your NIR input reworks, > feel free to just squash it into your changes. Or, we can land it > separately after your changes. It's up to you. > > Separating this out allows me to reuse this in my new tessellation input > and output remapping functions, and also means we don't need to add structs > for the remap functions...we can just pass the builder, or inputs_read, or > the VUE map...and not have to pack multiple things together. Sure. It does make things simpler. > --Ken > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c > index 14ad172..105a175 100644 > --- a/src/mesa/drivers/dri/i965/brw_nir.c > +++ b/src/mesa/drivers/dri/i965/brw_nir.c > @@ -27,15 +27,19 @@ > #include "glsl/nir/nir_builder.h" > #include "program/prog_to_nir.h" > > -struct remap_vs_attrs_state { > - nir_builder b; > - uint64_t inputs_read; > -}; > - > +/** > + * In many cases, we just add the base and offset together, so there's no > + * reason to keep them separate. Sometimes, combining them is essential: > + * if a shader only accesses part of a compound variable (such as a matrix > + * or array), the variable's base may not actually exist in the VUE map. > + * > + * This pass adds constant offsets to instr->const_index[0], and resets > + * the offset source to 0. Non-constant offsets remain unchanged. > + */ > static bool > -remap_vs_attrs(nir_block *block, void *void_state) > +add_const_offset_to_base(nir_block *block, void *closure) > { > - struct remap_vs_attrs_state *state = void_state; > + nir_builder *b = closure; > > nir_foreach_instr_safe(block, instr) { >if (instr->type != nir_instr_type_intrinsic) > @@ -43,30 +47,48 @@ remap_vs_attrs(nir_block *block, void *void_state) > >nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); > > + if (intrin->intrinsic == nir_intrinsic_load_input || > + intrin->intrinsic == nir_intrinsic_load_per_vertex_input || > + intrin->intrinsic == nir_intrinsic_load_output || > + intrin->intrinsic == nir_intrinsic_load_per_vertex_output || > + intrin->intrinsic == nir_intrinsic_store_output || > + intrin->intrinsic == nir_intrinsic_store_per_vertex_output) { This seems a bit scortched-earth. It would be nice if the caller had a bit more control. > + nir_src *offset = nir_get_io_offset_src(intrin); > + nir_const_value *const_offset = nir_src_as_const_value(*offset); > + > + if (const_offset) { > +intrin->const_index[0] += const_offset->u[0]; > +b->cursor = nir_before_instr(&intrin->instr); > +nir_instr_rewrite_src(&intrin->instr, offset, > + nir_src_for_ssa(nir_imm_int(b, 0))); > + } Else??? It seems that you don't want to run this pass if you think you'll ever hit an indirect. I guess it's harmless to just do this for all direct things in our driver, but it doesn't sit well. > + } > + } > + return true; > + > +} > + > +static bool > +remap_vs_attrs(nir_block *block, void *closure) > +{ > + GLbitfield64 inputs_read = *((GLbitfield64 *) closure); > + > + nir_foreach_instr(block, instr) { > + if (instr->type != nir_instr_type_intrinsic) > + continue; > + > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); > + >if (intrin->intrinsic == nir_intrinsic_load_input) { > /* Attributes come in a contiguous block, ordered by their >* gl_vert_attrib value. That means we can compute the slot >* number for an attribute by masking out the enabled attributes >* before it and counting the bits. >*/ > - nir_const_value *const_offset = nir_src_as_const_value(intrin->src[0]); > - > - /* We set EmitNoIndirect for VS inputs, so there are no indirects. */ > - assert(const_offset); > - > - int attr = intrin->const_index[0] + const_offset->u[0]; > - int slot = _mesa_bitcount_64(state->inputs_read & > - BITFIELD64_MASK(attr)); > + int attr = intrin->const_index[0]; > + int slot = _mesa_bitcount_64(inputs_read & BITFIELD64_MASK(attr)); > > - /* The NIR -> FS pass will just add the base and offset together, so > - * there's no reason to keep them separate. Just put it all in > - * const_index[0] and set the offset src[0] to load_const(0). > - */ > intrin->const_index[0] = 4 * slot; > - > - state->b.cursor = nir_before_instr(&intrin->instr);
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix crash when calling glViewport with no surface bound
On 9 December 2015 at 14:57, Neil Roberts wrote: > Emil Velikov writes: > >> Worth throwing in 11.0 as well ? > > Yeah, that would probably be sensible. > >>> if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) { >>> - dri2InvalidateDrawable(driContext->driDrawablePriv); >>> - dri2InvalidateDrawable(driContext->driReadablePriv); >>> + if (driContext->driDrawablePriv) >>> + dri2InvalidateDrawable(driContext->driDrawablePriv); >>> + if (driContext->driReadablePriv) >>> + dri2InvalidateDrawable(driContext->driReadablePriv); >> >> Afaict i915 could use an identical fix ? > > Yes, I think you're right. However I don't have any way of testing it so > I feel a bit uncomfortable touching i915 driver. > Considering it's a null-deref fix one can just move the check in dri2InvalidateDrawable, which would spare going through i915, i965, radeon... ;-) Just throwing it out there, it's up-to whichever route you want to take. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] softpipe: V.2 implement some support for multiple viewports
Am 09.12.2015 um 05:16 schrieb Edward O'Callaghan: > This fixes my initial attempt so that piglit now passes 14/14. Thanks > to a couple of tips from Roland in the previous patch I was able to > fix the remaining issue. This should be golden now. > Great that you got it working! Please send the patches to the ml. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH shader-db 1/3] split-to-files: deal with minimum versions, other shader types
Hi, On 11/09/2015 08:47 PM, Matt Turner wrote: On Mon, Nov 9, 2015 at 10:46 AM, Ilia Mirkin wrote: I used this script in conjunction with ST_DUMP_SHADERS. What other way is there? Some local hack and we should probably finish and upstream. Did anything happen with this? I had to rewrite split-to-files because it didn't output all (ARB) shaders and it picks the wrong one [1] when application re-uses same program numbers. [1] It picked first one, although I think almost always the last one will be most interesting. Last one will also allow easily dumping different shader sets (that re-use same program numbers) from the same application. RFC patches for my changes are attached. - Eero >From c52f02ff664af269fa5268627624fe94c647ad37 Mon Sep 17 00:00:00 2001 From: Eero Tamminen Date: Wed, 9 Dec 2015 16:29:43 +0200 Subject: [PATCH 1/2] Rewrite split-to-files.py to fix it This rewrite improves on previous version in following ways: * Improve recognization of shader end. * Remove extra lines after shader ends (in normal shaders anything after last line with '}' that closes main(), and in ARB shaders, lines after END). * Optimize parsing by using compiled regexps. * Calculate (md5) hashes for normalized (single line comments and white space removed) shader contents and identify duplicate shaders with those * If program gets a new shader, output the latest one. It should be more relevant one. It also allows dumping different shader sets e.g. shaders for startup / game menu vs. actual game play, just by running application further before killing it. * When application replaces ARB shaders, continue instead of claiming to be done & exiting. Same program numbers can be used if application removes previous programs. * Tell user which shaders were duplicates and which were replaced by which shaders. * Remove duplicate programs based on shader stage hashes (of their normalized sources) and tell user about this. * Output shader stage sources in 3D pipeline order. * Give ARB shaders different file name from normal shaders. --- split-to-files.py | 409 +++-- 1 file changed, 306 insertions(+), 103 deletions(-) diff --git a/split-to-files.py b/split-to-files.py index 151681e..7150622 100755 --- a/split-to-files.py +++ b/split-to-files.py @@ -2,122 +2,300 @@ import re import os +import hashlib import argparse +class ShaderBase: +def __init__(self, prog, stage): +self.lines = [] +self.progid = prog # latest +self.programs = {self.progid: True} # all +self.stage = stage +self.hash = None +self.hashed_len = 0 +self.done = False +self.replaced = False +# filled by subclasses +self.shadernum = 0 +self.req_start = None +self.req_end = None +self.warn = None + +def append_line(self, line): +assert not self.done +self.lines.append(line) + +def is_finished(self, line): +return False + +def get_source(self): +assert self.done +return "\n".join(self.lines) + "\n" + +def add_program(self, dup): +self.programs[dup.progid] = True + +def del_program(self, dup): +del(self.programs[dup.progid]) + +def get_hash(self): +if self.hash: +return self.hash +assert self.done + +# source without single line comments & whitespace +normalized = [] +for line in self.lines: +offset = line.find("//") +if offset >= 0: +line = line[:offset] +# Python2: line = line.translate(None, " \t") +line = line.translate({' ': None, '\t': None}) +if line: +normalized.append(line) +normalized = "".join(normalized).encode() + +# create hash for normalized source +md5 = hashlib.md5() +self.hashed_len = len(normalized) +md5.update(normalized) +self.hash = md5.hexdigest() +return self.hash + +def check_conflict(self, dup): +assert self.done and dup.done +if self.hash == dup.hash and self.hashed_len != dup.hashed_len: +print("ERROR: hash collision with %s" % dup.get_info()) +exit(-1) +if dup.stage != dup.stage: +# same shader for different stage, this isn't handled correctly at the moment +# all code assumes that each hash/shader represents just one shader stage +print("ERROR: duplicate is for different shader stage (%s)" % dup.stage) +exit(-1) + +def get_info(self): +assert None # must be subclassed + +def show_info(self): +print(self.get_info()) + + +class ShaderARB(ShaderBase): +def __init__(self, prog, stage): +ShaderBase.__init__(self, prog, stage) +self.progid = "%s-ARB_%s" % (prog, stage) + +self.req_start = "GL_ARB
Re: [Mesa-dev] [Mesa-stable] [PATCH] i965: Fix crash when calling glViewport with no surface bound
Emil Velikov writes: > Worth throwing in 11.0 as well ? Yeah, that would probably be sensible. >> if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) { >> - dri2InvalidateDrawable(driContext->driDrawablePriv); >> - dri2InvalidateDrawable(driContext->driReadablePriv); >> + if (driContext->driDrawablePriv) >> + dri2InvalidateDrawable(driContext->driDrawablePriv); >> + if (driContext->driReadablePriv) >> + dri2InvalidateDrawable(driContext->driReadablePriv); > > Afaict i915 could use an identical fix ? Yes, I think you're right. However I don't have any way of testing it so I feel a bit uncomfortable touching i915 driver. Regards, - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 91888] EGL Wayland software rendering no longer work after regression
https://bugs.freedesktop.org/show_bug.cgi?id=91888 --- Comment #18 from Daniel Stone --- How about 'hooray, it's fixed'? :) -- You are receiving this mail because: You are the QA Contact for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium/util: handle patches in u_prims_for_vertices to fix a radeonsi crash
From: Marek Olšák I guess the crash was because of divison by zero. Cc: 11.0 11.1 --- src/gallium/auxiliary/util/u_prim.h | 17 + src/gallium/drivers/radeonsi/si_state_draw.c | 3 ++- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/src/gallium/auxiliary/util/u_prim.h b/src/gallium/auxiliary/util/u_prim.h index 3668015..4926af6 100644 --- a/src/gallium/auxiliary/util/u_prim.h +++ b/src/gallium/auxiliary/util/u_prim.h @@ -141,14 +141,23 @@ u_prim_vertex_count(unsigned prim) * For polygons, return the number of triangles. */ static inline unsigned -u_prims_for_vertices(unsigned prim, unsigned num) +u_prims_for_vertices(unsigned prim, unsigned num, unsigned vertices_per_patch) { - const struct u_prim_vertex_count *info = u_prim_vertex_count(prim); + struct u_prim_vertex_count info; - if (num < info->min) + if (prim == PIPE_PRIM_PATCHES) + info.min = info.incr = vertices_per_patch; + else if (prim < PIPE_PRIM_MAX) + info = *u_prim_vertex_count(prim); + else { + assert(!"invalid prim type"); + return 0; + } + + if (num < info.min) return 0; - return 1 + ((num - info->min) / info->incr); + return 1 + ((num - info.min) / info.incr); } static inline boolean u_validate_pipe_prim( unsigned pipe_prim, unsigned nr ) diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c index ee84a1f..4ac9d0a 100644 --- a/src/gallium/drivers/radeonsi/si_state_draw.c +++ b/src/gallium/drivers/radeonsi/si_state_draw.c @@ -320,7 +320,8 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx, if (sctx->b.screen->info.max_se >= 2 && ia_switch_on_eoi && (info->indirect || (info->instance_count > 1 && - u_prims_for_vertices(info->mode, info->count) <= 1))) + u_prims_for_vertices(info->mode, info->count, + info->vertices_per_patch) <= 1))) sctx->b.flags |= SI_CONTEXT_VGT_FLUSH; return S_028AA8_SWITCH_ON_EOP(ia_switch_on_eop) | -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC] glapi: Build gl_gentable.c only on Darwin
Removes the public symbol _glapi_create_table_from_handle from libGL.so.1 on all plattforms except Darwin. Since the symbol is not used on other plattforms it makes sense to build gl_gentable.c only on Darwin. A little bit of history: _glapi_create_table_from_handle was introduced in commit 85937f4c0d4a78d3a11e3c1fa6148640f2a9ad7b Author: Jeremy Huddleston Date: Thu Jun 9 16:59:49 2011 -0700 glapi: Add API that can create a _glapi_table from a dlfcn handle Example usage: void *handle = dlopen(opengl_library_path, RTLD_LOCAL); struct _glapi_table *disp = _glapi_create_table_from_handle(handle, "gl"); Signed-off-by: Jeremy Huddleston and the only user in mesa was added in commit f35913b96e743c5014e99220b1a1c5532a894d69 Author: Jeremy Huddleston Date: Thu Jun 9 17:29:51 2011 -0700 apple: Use _glapi_create_table_from_handle to initialize our dispatch table Signed-off-by: Jeremy Huddleston gl_gentable.py was also used for XQuartz in xserver 1.11 - 1.14. Cc: Jeremy Huddleston Signed-off-by: Andreas Boll --- XXX If we still want to distribute gl_gentable.c in the release tarball we could drop the changes in src/mapi/glapi/gen/Makefile.am src/mapi/Makefile.am | 6 +- src/mapi/glapi/gen/Makefile.am | 12 +--- src/mapi/glapi/glapi.h | 2 ++ 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/src/mapi/Makefile.am b/src/mapi/Makefile.am index 307e05d..ddd3daa 100644 --- a/src/mapi/Makefile.am +++ b/src/mapi/Makefile.am @@ -106,12 +106,16 @@ if HAVE_SPARC_ASM GLAPI_ASM_SOURCES = glapi/glapi_sparc.S endif -glapi_libglapi_la_SOURCES = glapi/glapi_gentable.c +glapi_libglapi_la_SOURCES = glapi_libglapi_la_CPPFLAGS = \ $(AM_CPPFLAGS) \ -I$(top_srcdir)/src/mapi/glapi \ -I$(top_srcdir)/src/mesa +if HAVE_APPLEDRI +glapi_libglapi_la_SOURCES += glapi/glapi_gentable.c +endif + if HAVE_SHARED_GLAPI glapi_libglapi_la_SOURCES += $(MAPI_BRIDGE_FILES) glapi/glapi_mapi_tmp.h glapi_libglapi_la_CPPFLAGS += \ diff --git a/src/mapi/glapi/gen/Makefile.am b/src/mapi/glapi/gen/Makefile.am index 2da8f7d..25ea44a 100644 --- a/src/mapi/glapi/gen/Makefile.am +++ b/src/mapi/glapi/gen/Makefile.am @@ -27,8 +27,11 @@ MESA_GLAPI_OUTPUTS = \ $(MESA_GLAPI_DIR)/glapi_mapi_tmp.h \ $(MESA_GLAPI_DIR)/glprocs.h \ $(MESA_GLAPI_DIR)/glapitemp.h \ - $(MESA_GLAPI_DIR)/glapitable.h \ - $(MESA_GLAPI_DIR)/glapi_gentable.c + $(MESA_GLAPI_DIR)/glapitable.h + +if HAVE_APPLEDRI +MESA_GLAPI_OUTPUTS += $(MESA_GLAPI_DIR)/glapi_gentable.c +endif MESA_GLAPI_ASM_OUTPUTS = if HAVE_X86_ASM @@ -88,8 +91,11 @@ XORG_GLAPI_DIR = $(XORG_BASE)/glx XORG_GLAPI_OUTPUTS = \ $(XORG_GLAPI_DIR)/glprocs.h \ $(XORG_GLAPI_DIR)/glapitable.h \ - $(XORG_GLAPI_DIR)/dispatch.h \ + $(XORG_GLAPI_DIR)/dispatch.h + +if HAVE_APPLEDRI $(XORG_GLAPI_DIR)/glapi_gentable.c +endif XORG_OUTPUTS = \ $(XORG_GLAPI_OUTPUTS) \ diff --git a/src/mapi/glapi/glapi.h b/src/mapi/glapi/glapi.h index f269b17..3593c88 100644 --- a/src/mapi/glapi/glapi.h +++ b/src/mapi/glapi/glapi.h @@ -158,8 +158,10 @@ _GLAPI_EXPORT const char * _glapi_get_proc_name(unsigned int offset); +#ifdef GLX_USE_APPLEGL _GLAPI_EXPORT struct _glapi_table * _glapi_create_table_from_handle(void *handle, const char *symbol_prefix); +#endif _GLAPI_EXPORT void -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [Mesa-stable] [PATCH] configure.ac: fix test for SSE4.1 assembler support
On Wed, Dec 9, 2015 at 2:34 PM, Jonathan Gray wrote: > On Wed, Dec 09, 2015 at 01:39:30PM +0200, Oded Gabbay wrote: >> On Wed, Dec 9, 2015 at 1:09 PM, Emil Velikov >> wrote: >> > On 9 December 2015 at 05:37, Jonathan Gray wrote: >> >> Change the __m128i variables to be volatile so gcc 4.9 won't optimise >> >> all of them out with -O1 or greater. The _mm_set1_epi32/pinsrd calls >> >> still get optimised out but now there is at least one SSE4.1 instruction >> >> generated via _mm_max_epu32/pmaxud. When all of the sse4.1 instructions >> >> got optimised out the configure test would incorrectly pass when the >> >> compiler supported the intrinsics and the assembler didn't support the >> >> instructions. >> >> >> > Must admit that I was not expecting that one. Looks like pixman (the >> > inspiration for this check) is missing volatile as well. Does that one >> > build/run fine on OpenBSD ? >> > >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91806 >> >> Signed-off-by: Jonathan Gray >> >> Cc: "11.0 11.1" >> > Reviewed-by: Emil Velikov >> > >> > I'll pick this in a couple of days (barring any objections). >> > >> > Thanks >> > Emil >> >> Adding pixman ML. >> I must admit ignorance on this one. >> I looked at configure.ac of pixman and I don't see any SSE4.1 >> reference, and AFAIK, we don't use those instructions (only SSE2 and >> SSSE3). >> Is the above patch relevant for those as well ? because the tests in >> configure.ac does *not* contain volatile. >> >> Oded > > It looks like this is indeed a problem in pixman as well with at least > gcc 4.2 and 4.9. Running the pixman sse2 test on amd64 I only > see xmm register use and movdqa in the generated assembly with -O0. > > The pixman configure tests passes on OpenBSD but the toolchain > recognises sse2 instructions just not sse 4.1. > > Introducing volatile to the pixman test stops the xmm/movdqa > use from being optimised out. I adapted the patch to pixman's configure.ac and sent it to the ML. Jonathan, thanks for the patch. Oded ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] i965: Resolve color for all active shader images in intel_update_state().
Kristian Høgsberg writes: > On Sat, Sep 5, 2015 at 11:30 AM, Jordan Justen > wrote: >> From: Francisco Jerez >> >> Fixes >> arb_shader_image_load_store/execution/load-from-cleared-image.shader_test >> >> Cc: Chris Wilson >> Cc: Jason Ekstrand >> Tested-by: Jordan Justen > > This patch is required for correct behavior and looks straightforward > and correct to me. Let's fix the bug and optimize the CPU performance > regression (if there is one) later. > > Reviewed-by: Kristian Høgsberg Thanks, pushed. Chris, feel free to open a bug report and add me to the CC list if you can still reproduce a regression on master. > >> --- >> RE: i965: Perform an explicit flush after doing _mesa_meta_pbo_TexSubImage >> >> curro has some concerns about potential perf impact by this and >> wanted it to be checked on small-core w/CPU bound apps. >> Unfortunately, he is on vacation now. >> >> src/mesa/drivers/dri/i965/brw_context.c | 18 ++ >> 1 file changed, 18 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_context.c >> b/src/mesa/drivers/dri/i965/brw_context.c >> index f0ed891..a274a43 100644 >> --- a/src/mesa/drivers/dri/i965/brw_context.c >> +++ b/src/mesa/drivers/dri/i965/brw_context.c >> @@ -189,6 +189,24 @@ intel_update_state(struct gl_context * ctx, GLuint >> new_state) >>brw_render_cache_set_check_flush(brw, tex_obj->mt->bo); >> } >> >> + /* Resolve color for each active shader image. */ >> + for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) { >> + const struct gl_shader *shader = ctx->_Shader->CurrentProgram[i] ? >> + ctx->_Shader->CurrentProgram[i]->_LinkedShaders[i] : NULL; >> + >> + if (unlikely(shader && shader->NumImages)) { >> + for (unsigned j = 0; j < shader->NumImages; j++) { >> +struct gl_image_unit *u = >> &ctx->ImageUnits[shader->ImageUnits[j]]; >> +tex_obj = intel_texture_object(u->TexObj); >> + >> +if (tex_obj && tex_obj->mt) { >> + intel_miptree_resolve_color(brw, tex_obj->mt); >> + brw_render_cache_set_check_flush(brw, tex_obj->mt->bo); >> +} >> + } >> + } >> + } >> + >> _mesa_lock_context_textures(ctx); >> } >> >> -- >> 2.5.0 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev