Re: [Mesa-dev] [PATCH 09/23] main: Cosmetic changes to BufferSubData infrastructure.
Be more precise in the commit message. I propose: report more explicit error messages in the BufferSubData infrastructure Other than that: Reviewed-by: Martin Peres martin.pe...@linux.intel.com On 12/02/2015 04:05, Laura Ekstrand wrote: --- src/mesa/main/bufferobj.c | 10 +++--- src/mesa/main/bufferobj.h | 4 ++-- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 4f89748..fc01d02 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -267,13 +267,17 @@ buffer_object_subdata_range_good(struct gl_context *ctx, if (mappedRange) { if (bufferobj_range_mapped(bufObj, offset, size)) { - _mesa_error(ctx, GL_INVALID_OPERATION, %s, caller); + _mesa_error(ctx, GL_INVALID_OPERATION, + %s(range is mapped without persistent bit), + caller); return false; } } else { if (_mesa_bufferobj_mapped(bufObj, MAP_USER)) { - _mesa_error(ctx, GL_INVALID_OPERATION, %s, caller); + _mesa_error(ctx, GL_INVALID_OPERATION, + %s(buffer is mapped without persistent bit), + caller); return false; } } @@ -1613,7 +1617,7 @@ _mesa_buffer_sub_data(struct gl_context *ctx, struct gl_buffer_object *bufObj, bufObj-Written = GL_TRUE; ASSERT(ctx-Driver.BufferSubData); - ctx-Driver.BufferSubData( ctx, offset, size, data, bufObj ); + ctx-Driver.BufferSubData(ctx, offset, size, data, bufObj); } void GLAPIENTRY diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index 889bbb1..d15ad00 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -190,8 +190,8 @@ _mesa_NamedBufferData(GLuint buffer, GLsizeiptr size, const GLvoid *data, GLenum usage); void GLAPIENTRY -_mesa_BufferSubData(GLenum target, GLintptrARB offset, -GLsizeiptrARB size, const GLvoid * data); +_mesa_BufferSubData(GLenum target, GLintptr offset, +GLsizeiptr size, const GLvoid *data); void GLAPIENTRY _mesa_NamedBufferSubData(GLuint buffer, GLintptr offset, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Set pixel/sample mask for compute shaders atomic ops
Jordan Justen jordan.l.jus...@intel.com writes: On 2015-02-19 21:40:37, Ben Widawsky wrote: On Thu, Feb 19, 2015 at 03:42:05PM -0800, Jordan Justen wrote: For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Ben Widawsky b...@bwidawsk.net Cc: Francisco Jerez curroje...@riseup.net Just add to the commit message that this is needed specifically because compute is invoked as SIMD16 (and perhaps reference the other commits?) and it's: Reviewed-by: Ben Widawsky b...@bwidawsk.net Good idea. I'll add those. Sorry it advance... we may as well just go for 0x in case we ever support SIMD32. I had been setting all 32-bits previously. I mentioned to you that I thought this was needed for SIMD32. I wanted to double check it before sending the patch out. I think I found the field for IVB in the PRM: IVB Vol 4 Part 1 3.9.9.9 Message Header Pixel/Sample Mask ...and it looks like it is only 16-bits. Maybe Francisco can confirm that I got it right. I couldn't find this same information in the HSW PRMs. I'm not sure what that means for SIMD32. Yeah, it's only 16 bits. For SIMD32 it means that, *sigh*, we'll have to split up the message in several SIMD16 chunks (my image load store branch has to do something similar to do typed surface operations in SIMD16 mode because there are only 8-wide variants of those messages). To initialize the header we would just copy the first or second 16-bit half of the sample mask, and set the quarter control bits appropriately for each half so the execution mask is taken into account correctly. With Ben's and Matt's suggestions this patch is: Reviewed-by: Francisco Jerez curroje...@riseup.net -Jordan --- While it's fresh in our minds. :) This seems to work for gen7 gen8 CS. For CS simd16, we need the 0x change, but it seems to work fine for simd8 as well. I also tested gen8 (simd8vs), and there were no piglit regressions. src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 24cc118..960a0aa 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2998,9 +2998,9 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } length++; @@ -3061,9 +3061,9 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, * mask sent in the header to compute the actual set of channels that execute * the atomic operation. */ - assert(stage == MESA_SHADER_VERTEX); + assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0xff)))-force_writemask_all = true; + brw_imm_ud(0x)))-force_writemask_all = true; } /* Set the surface read offset. */ -- 2.1.4 pgpppOi6R6AAi.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: add double support for packing varyings
On Fri, Feb 20, 2015 at 4:10 AM, Ilia Mirkin imir...@alum.mit.edu wrote: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- This works with a modified varying-packing test for everything except dvec4 array which crashes in st/mesa somewhere. Still working on that one. The IR generated here kinda stinks, but I couldn't get an assignment with a swizzle to work. Ugh, looks like this is a no-go for GS due to the temp vars. But this change gets it mostly working. I also had to switch doubles to *always* getting packed, even for dvec2 and dvec4... I think there's some variable storage/linking-related logic which doesn't take the double size into account properly. -ilia src/glsl/lower_packed_varyings.cpp | 88 +- 1 file changed, 67 insertions(+), 21 deletions(-) diff --git a/src/glsl/lower_packed_varyings.cpp b/src/glsl/lower_packed_varyings.cpp index 5e844c7..f9e3e85 100644 --- a/src/glsl/lower_packed_varyings.cpp +++ b/src/glsl/lower_packed_varyings.cpp @@ -146,7 +146,11 @@ #include glsl_symbol_table.h #include ir.h +#include ir_builder.h #include ir_optimization.h +#include program/prog_instruction.h + +using namespace ir_builder; namespace { @@ -168,8 +172,8 @@ public: void run(exec_list *instructions); private: - ir_assignment *bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs); - ir_assignment *bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs); + void bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs); + void bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs); unsigned lower_rvalue(ir_rvalue *rvalue, unsigned fine_location, ir_variable *unpacked_var, const char *name, bool gs_input_toplevel, unsigned vertex_index); @@ -274,6 +278,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) } } +#define SWIZZLE_ZWZW MAKE_SWIZZLE4(SWIZZLE_Z, SWIZZLE_W, SWIZZLE_Z, SWIZZLE_W) /** * Make an ir_assignment from \c rhs to \c lhs, performing appropriate @@ -281,7 +286,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) * * This function is called when packing varyings. */ -ir_assignment * +void lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs) { @@ -300,12 +305,28 @@ lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, rhs = new(this-mem_ctx) ir_expression(ir_unop_bitcast_f2i, lhs-type, rhs); break; + case GLSL_TYPE_DOUBLE: + assert(rhs-type-vector_elements = 2); + if (rhs-type-vector_elements == 2) { +ir_variable *t = new(mem_ctx) ir_variable(lhs-type, pack, ir_var_temporary); + +assert(lhs-type-vector_elements == 4); +this-out_instructions-push_tail(t); +this-out_instructions-push_tail( + assign(t, u2i(expr(ir_unop_unpack_double_2x32, swizzle_x(rhs-clone(mem_ctx, NULL, 0x3)); +this-out_instructions-push_tail( + assign(t, u2i(expr(ir_unop_unpack_double_2x32, swizzle_y(rhs))), 0xc)); +rhs = deref(t).val; + } else { +rhs = u2i(expr(ir_unop_unpack_double_2x32, rhs)); + } + break; default: assert(!Unexpected type conversion while lowering varyings); break; } } - return new(this-mem_ctx) ir_assignment(lhs, rhs); + this-out_instructions-push_tail(new (this-mem_ctx) ir_assignment(lhs, rhs)); } @@ -315,7 +336,7 @@ lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, * * This function is called when unpacking varyings. */ -ir_assignment * +void lower_packed_varyings_visitor::bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs) { @@ -334,12 +355,27 @@ lower_packed_varyings_visitor::bitwise_assign_unpack(ir_rvalue *lhs, rhs = new(this-mem_ctx) ir_expression(ir_unop_bitcast_i2f, lhs-type, rhs); break; + case GLSL_TYPE_DOUBLE: + assert(lhs-type-vector_elements = 2); + if (lhs-type-vector_elements == 2) { +ir_variable *t = new(mem_ctx) ir_variable(lhs-type, unpack, ir_var_temporary); +assert(rhs-type-vector_elements == 4); +this-out_instructions-push_tail(t); +this-out_instructions-push_tail( + assign(t, expr(ir_unop_pack_double_2x32, i2u(swizzle_xy(rhs-clone(mem_ctx, NULL, 0x1)); +this-out_instructions-push_tail( + assign(t, expr(ir_unop_pack_double_2x32, i2u(swizzle(rhs-clone(mem_ctx, NULL), SWIZZLE_ZWZW, 2))), 0x2)); +rhs = deref(t).val; + } else { +rhs = expr(ir_unop_pack_double_2x32, i2u(rhs)); + } + break;
Re: [Mesa-dev] [PATCH] i965/skl: Use 1 register for uniform pull constant payload
On Thursday, February 19, 2015 10:48:08 PM Ben Widawsky wrote: When under dispatch_width=16 the previous code would allocate 2 registers for the payload when only one is needed. This manifested itself through bugs on SKL which needs to mess with this instruction. Ken though this might impact shader-db, but apparently it doesn't Cc: Kenneth Graunke kenn...@whitecape.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999 Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c46e1d7..24125cc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3062,7 +3062,7 @@ fs_visitor::lower_uniform_pull_constant_loads() assert(const_offset_reg.file == IMM const_offset_reg.type == BRW_REGISTER_TYPE_UD); const_offset_reg.fixed_hw_reg.dw1.ud /= 4; - fs_reg payload = vgrf(glsl_type::uint_type); + fs_reg payload = fs_reg(GRF, alloc.allocate(1)); /* We have to use a message header on Skylake to get SIMD4x2 mode. * Reserve space for the register. Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/23] main: Add entry point for NamedBufferSubData.
On 12/02/2015 04:05, Laura Ekstrand wrote: v2: review by Ian Romanick - Remove _mesa from name of static software fallback buffer_sub_data. - Remove mappedRange from _mesa_buffer_sub_data. - Removed some cosmetic changes to a separate commit. --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 ++ src/mesa/main/bufferobj.c | 129 +++-- src/mesa/main/bufferobj.h | 9 ++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + 4 files changed, 97 insertions(+), 49 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 7779262..6d70b8e 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -28,6 +28,13 @@ param name=usage type=GLenum / /function + function name=NamedBufferSubData offset=assign + param name=buffer type=GLuint / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + param name=data type=const GLvoid * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index ac8eed1..4f89748 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -227,67 +227,58 @@ bufferobj_range_mapped(const struct gl_buffer_object *obj, * \c glClearBufferSubData. * * \param ctx GL context. - * \param target Buffer object target on which to operate. + * \param bufObj The buffer object. * \param offset Offset of the first byte of the subdata range. * \param sizeSize, in bytes, of the subdata range. * \param mappedRange If true, checks if an overlapping range is mapped. * If false, checks if buffer is mapped. - * \param errorNoBuffer Error code if no buffer is bound to target. * \param caller Name of calling function for recording errors. - * \return A pointer to the buffer object bound to \c target in the - * specified context or \c NULL if any of the parameter or state - * conditions are invalid. + * \return false if error, true otherwise * * \sa glBufferSubDataARB, glGetBufferSubDataARB, glClearBufferSubData */ -static struct gl_buffer_object * -buffer_object_subdata_range_good(struct gl_context * ctx, GLenum target, - GLintptrARB offset, GLsizeiptrARB size, - bool mappedRange, GLenum errorNoBuffer, - const char *caller) +static bool +buffer_object_subdata_range_good(struct gl_context *ctx, + struct gl_buffer_object *bufObj, + GLintptr offset, GLsizeiptr size, + bool mappedRange, const char *caller) { - struct gl_buffer_object *bufObj; - if (size 0) { _mesa_error(ctx, GL_INVALID_VALUE, %s(size 0), caller); - return NULL; + return false; } if (offset 0) { _mesa_error(ctx, GL_INVALID_VALUE, %s(offset 0), caller); - return NULL; + return false; } - bufObj = get_buffer(ctx, caller, target, errorNoBuffer); - if (!bufObj) - return NULL; - if (offset + size bufObj-Size) { _mesa_error(ctx, GL_INVALID_VALUE, %s(offset %lu + size %lu buffer size %lu), caller, (unsigned long) offset, (unsigned long) size, (unsigned long) bufObj-Size); - return NULL; + return false; } if (bufObj-Mappings[MAP_USER].AccessFlags GL_MAP_PERSISTENT_BIT) - return bufObj; + return true; if (mappedRange) { if (bufferobj_range_mapped(bufObj, offset, size)) { _mesa_error(ctx, GL_INVALID_OPERATION, %s, caller); - return NULL; + return false; } } else { if (_mesa_bufferobj_mapped(bufObj, MAP_USER)) { _mesa_error(ctx, GL_INVALID_OPERATION, %s, caller); - return NULL; + return false; } } - return bufObj; + return true; } @@ -602,9 +593,9 @@ buffer_data_fallback(struct gl_context *ctx, GLenum target, GLsizeiptr size, * \sa glBufferSubDataARB, dd_function_table::BufferSubData. */ static void -_mesa_buffer_subdata( struct gl_context *ctx, GLintptrARB offset, - GLsizeiptrARB size, const GLvoid * data, - struct gl_buffer_object * bufObj ) +buffer_sub_data_fallback(struct gl_context *ctx, GLintptr offset, + GLsizeiptr size, const GLvoid *data, + struct gl_buffer_object *bufObj) { (void) ctx; @@ -1113,7 +1104,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-NewBufferObject = _mesa_new_buffer_object;
[Mesa-dev] [PATCH] glsl: add double support for packing varyings
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- This works with a modified varying-packing test for everything except dvec4 array which crashes in st/mesa somewhere. Still working on that one. The IR generated here kinda stinks, but I couldn't get an assignment with a swizzle to work. src/glsl/lower_packed_varyings.cpp | 88 +- 1 file changed, 67 insertions(+), 21 deletions(-) diff --git a/src/glsl/lower_packed_varyings.cpp b/src/glsl/lower_packed_varyings.cpp index 5e844c7..f9e3e85 100644 --- a/src/glsl/lower_packed_varyings.cpp +++ b/src/glsl/lower_packed_varyings.cpp @@ -146,7 +146,11 @@ #include glsl_symbol_table.h #include ir.h +#include ir_builder.h #include ir_optimization.h +#include program/prog_instruction.h + +using namespace ir_builder; namespace { @@ -168,8 +172,8 @@ public: void run(exec_list *instructions); private: - ir_assignment *bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs); - ir_assignment *bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs); + void bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs); + void bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs); unsigned lower_rvalue(ir_rvalue *rvalue, unsigned fine_location, ir_variable *unpacked_var, const char *name, bool gs_input_toplevel, unsigned vertex_index); @@ -274,6 +278,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) } } +#define SWIZZLE_ZWZW MAKE_SWIZZLE4(SWIZZLE_Z, SWIZZLE_W, SWIZZLE_Z, SWIZZLE_W) /** * Make an ir_assignment from \c rhs to \c lhs, performing appropriate @@ -281,7 +286,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) * * This function is called when packing varyings. */ -ir_assignment * +void lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs) { @@ -300,12 +305,28 @@ lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, rhs = new(this-mem_ctx) ir_expression(ir_unop_bitcast_f2i, lhs-type, rhs); break; + case GLSL_TYPE_DOUBLE: + assert(rhs-type-vector_elements = 2); + if (rhs-type-vector_elements == 2) { +ir_variable *t = new(mem_ctx) ir_variable(lhs-type, pack, ir_var_temporary); + +assert(lhs-type-vector_elements == 4); +this-out_instructions-push_tail(t); +this-out_instructions-push_tail( + assign(t, u2i(expr(ir_unop_unpack_double_2x32, swizzle_x(rhs-clone(mem_ctx, NULL, 0x3)); +this-out_instructions-push_tail( + assign(t, u2i(expr(ir_unop_unpack_double_2x32, swizzle_y(rhs))), 0xc)); +rhs = deref(t).val; + } else { +rhs = u2i(expr(ir_unop_unpack_double_2x32, rhs)); + } + break; default: assert(!Unexpected type conversion while lowering varyings); break; } } - return new(this-mem_ctx) ir_assignment(lhs, rhs); + this-out_instructions-push_tail(new (this-mem_ctx) ir_assignment(lhs, rhs)); } @@ -315,7 +336,7 @@ lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, * * This function is called when unpacking varyings. */ -ir_assignment * +void lower_packed_varyings_visitor::bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs) { @@ -334,12 +355,27 @@ lower_packed_varyings_visitor::bitwise_assign_unpack(ir_rvalue *lhs, rhs = new(this-mem_ctx) ir_expression(ir_unop_bitcast_i2f, lhs-type, rhs); break; + case GLSL_TYPE_DOUBLE: + assert(lhs-type-vector_elements = 2); + if (lhs-type-vector_elements == 2) { +ir_variable *t = new(mem_ctx) ir_variable(lhs-type, unpack, ir_var_temporary); +assert(rhs-type-vector_elements == 4); +this-out_instructions-push_tail(t); +this-out_instructions-push_tail( + assign(t, expr(ir_unop_pack_double_2x32, i2u(swizzle_xy(rhs-clone(mem_ctx, NULL, 0x1)); +this-out_instructions-push_tail( + assign(t, expr(ir_unop_pack_double_2x32, i2u(swizzle(rhs-clone(mem_ctx, NULL), SWIZZLE_ZWZW, 2))), 0x2)); +rhs = deref(t).val; + } else { +rhs = expr(ir_unop_pack_double_2x32, i2u(rhs)); + } + break; default: assert(!Unexpected type conversion while lowering varyings); break; } } - return new(this-mem_ctx) ir_assignment(lhs, rhs); + this-out_instructions-push_tail(new(this-mem_ctx) ir_assignment(lhs, rhs)); } @@ -372,6 +408,7 @@ lower_packed_varyings_visitor::lower_rvalue(ir_rvalue *rvalue, bool gs_input_toplevel, unsigned vertex_index) { + unsigned dmul =
Re: [Mesa-dev] [PATCH 23/23] main: Cosmetic changes to GetBufferSubData.
Please squash this commit with the one introducing GetBufferSubData. It should be pretty easy to do with git rebase -i :) On 12/02/2015 04:06, Laura Ekstrand wrote: --- src/mesa/main/bufferobj.c | 2 +- src/mesa/main/bufferobj.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 0272704..38d8b5a 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -1668,7 +1668,7 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, } ASSERT(ctx-Driver.GetBufferSubData); - ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); + ctx-Driver.GetBufferSubData(ctx, offset, size, data, bufObj); } void GLAPIENTRY diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index feeaa8b..b5d73ae 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -230,8 +230,8 @@ _mesa_NamedBufferSubData(GLuint buffer, GLintptr offset, GLsizeiptr size, const GLvoid *data); void GLAPIENTRY -_mesa_GetBufferSubData(GLenum target, GLintptrARB offset, - GLsizeiptrARB size, void * data); +_mesa_GetBufferSubData(GLenum target, GLintptr offset, + GLsizeiptr size, GLvoid *data); void GLAPIENTRY _mesa_GetNamedBufferSubData(GLuint buffer, GLintptr offset, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/23] main: Add entry point for CopyNamedBufferSubData.
Looks good to me. Reviewed-by: Martin Peres martin.pe...@linux.intel.com On 12/02/2015 04:05, Laura Ekstrand wrote: v2: remove _mesa in front of static software fallback. --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 8 +++ src/mesa/main/bufferobj.c | 99 +- src/mesa/main/bufferobj.h | 12 src/mesa/main/tests/dispatch_sanity.cpp| 1 + 4 files changed, 87 insertions(+), 33 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 6d70b8e..042b2a8 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -35,6 +35,14 @@ param name=data type=const GLvoid * / /function + function name=CopyNamedBufferSubData offset=assign + param name=readBuffer type=GLuint / + param name=writeBuffer type=GLuint / + param name=readOffset type=GLintptr / + param name=writeOffset type=GLintptr / + param name=size type=GLsizeiptr / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index fc01d02..7225b64 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -761,11 +761,11 @@ _mesa_buffer_unmap(struct gl_context *ctx, struct gl_buffer_object *bufObj, * Called via glCopyBufferSubData(). */ static void -_mesa_copy_buffer_subdata(struct gl_context *ctx, - struct gl_buffer_object *src, - struct gl_buffer_object *dst, - GLintptr readOffset, GLintptr writeOffset, - GLsizeiptr size) +copy_buffer_sub_data_fallback(struct gl_context *ctx, + struct gl_buffer_object *src, + struct gl_buffer_object *dst, + GLintptr readOffset, GLintptr writeOffset, + GLsizeiptr size) { GLubyte *srcPtr, *dstPtr; @@ -1120,7 +1120,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-FlushMappedBufferRange = _mesa_buffer_flush_mapped_range; /* GL_ARB_copy_buffer */ - driver-CopyBufferSubData = _mesa_copy_buffer_subdata; + driver-CopyBufferSubData = copy_buffer_sub_data_fallback; } @@ -2101,65 +2101,54 @@ _mesa_GetBufferPointerv(GLenum target, GLenum pname, GLvoid **params) } -void GLAPIENTRY -_mesa_CopyBufferSubData(GLenum readTarget, GLenum writeTarget, -GLintptr readOffset, GLintptr writeOffset, -GLsizeiptr size) +void +_mesa_copy_buffer_sub_data(struct gl_context *ctx, + struct gl_buffer_object *src, + struct gl_buffer_object *dst, + GLintptr readOffset, GLintptr writeOffset, + GLsizeiptr size, const char *func) { - GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object *src, *dst; - - src = get_buffer(ctx, glCopyBufferSubData, readTarget, -GL_INVALID_OPERATION); - if (!src) - return; - - dst = get_buffer(ctx, glCopyBufferSubData, writeTarget, -GL_INVALID_OPERATION); - if (!dst) - return; - if (_mesa_check_disallowed_mapping(src)) { _mesa_error(ctx, GL_INVALID_OPERATION, - glCopyBufferSubData(readBuffer is mapped)); + %s(readBuffer is mapped), func); return; } if (_mesa_check_disallowed_mapping(dst)) { _mesa_error(ctx, GL_INVALID_OPERATION, - glCopyBufferSubData(writeBuffer is mapped)); + %s(writeBuffer is mapped), func); return; } if (readOffset 0) { _mesa_error(ctx, GL_INVALID_VALUE, - glCopyBufferSubData(readOffset = %d), (int) readOffset); + %s(readOffset %d 0), func, (int) readOffset); return; } if (writeOffset 0) { _mesa_error(ctx, GL_INVALID_VALUE, - glCopyBufferSubData(writeOffset = %d), (int) writeOffset); + %s(writeOffset %d 0), func, (int) writeOffset); return; } if (size 0) { _mesa_error(ctx, GL_INVALID_VALUE, - glCopyBufferSubData(writeOffset = %d), (int) size); + %s(size %d 0), func, (int) size); return; } if (readOffset + size src-Size) { _mesa_error(ctx, GL_INVALID_VALUE, - glCopyBufferSubData(readOffset + size = %d), - (int) (readOffset + size)); + %s(readOffset %d + size %d src_buffer_size %d), func, + (int) readOffset, (int) size, (int) src-Size); return; } if (writeOffset + size dst-Size) {
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. [1] http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html [2] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html [3] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html [4] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html --Jason On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something out of an MRF into a payload, which we shouldn't be doing in the first place. Aside from simplifying the function somewhat, that allows us to drop the 16 register gap reserved for MRFs at register offset zero, what will allow us to drop the vgrf_to_reg[] offset calculation completely (also in split_virtual_grfs()) in a future patch (not sent for review yet). No, we do read from MRF's sort-of... Send messages have an implicit read from an MRF. Heh, and that's pretty much the only way you read from it. This was written precicely so that we could use LOAD_PAYLOAD to build MRF payloads. We do on pre-GEN6. I'm aware, but you don't need any of this meta-data to LOAD_PAYLOAD *into* an MRF, and LOAD_PAYLOAD with an MRF as source should be illegal anyway. And no one is using it that way. All of the metadata checks you are deleting are checks on the *destination*. Didn't you write this code yourself? The only use for the collected metadata is initializing the instruction flags of the MOVs subsequent LOAD_PAYLOAD instructions are lowered to, based on the metadata already collected for its source registers, which can never be MRFs, so the metadata you collect from MRF writes is never actually used. Right... I misred something initially. Yes, we should never be tracking MRF's as a source of a LOAD_PAYLOAD. I'll give it a better look a bit later, but it looks better. pgpjBO09kHb1i.pgp Description: PGP signature
[Mesa-dev] [PATCH] mesa: Adds missing error condition in _mesa_check_sample_count()
This corrects a trivial error introduced in commit 19252fee46b835cb4f6b1cce18d7737d62b64a2e. That patch was merged recently and omits one condition (that 'samples' is greater than zero) in one of the error checks. That error will definitely cause regressions. --- src/mesa/main/multisample.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c index b696de9..50a8a11 100644 --- a/src/mesa/main/multisample.c +++ b/src/mesa/main/multisample.c @@ -155,7 +155,8 @@ _mesa_check_sample_count(struct gl_context *ctx, GLenum target, * If internalformat is a signed or unsigned integer format and samples * is greater than zero, then the error INVALID_OPERATION is generated. */ - if (_mesa_is_gles3(ctx) _mesa_is_enum_format_integer(internalFormat)) { + if (_mesa_is_gles3(ctx) _mesa_is_enum_format_integer(internalFormat) +samples 0) { return GL_INVALID_OPERATION; } -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/23] main: Refactor ClearBuffer[Sub]Data.
On 12/02/2015 04:05, Laura Ekstrand wrote: v2: review by Jason Ekstrand - Split refactor of clear buffer sub data from addition of DSA entry points. --- src/mesa/main/bufferobj.c| 125 --- src/mesa/main/bufferobj.h| 19 ++-- src/mesa/state_tracker/st_cb_bufferobjects.c | 4 +- 3 files changed, 69 insertions(+), 79 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 7225b64..b8fa917 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -660,11 +660,11 @@ _mesa_buffer_get_subdata( struct gl_context *ctx, GLintptrARB offset, * dd_function_table::ClearBufferSubData. */ void -_mesa_buffer_clear_subdata(struct gl_context *ctx, - GLintptr offset, GLsizeiptr size, - const GLvoid *clearValue, - GLsizeiptr clearValueSize, - struct gl_buffer_object *bufObj) +_mesa_ClearBufferSubData_sw(struct gl_context *ctx, Interesting choice of naming the function as a mix of camel case and underscores. Since it is an internal function, shouldn't it only have underscores? Other than that: Reviewed-by: Martin Peres martin.pe...@linux.intel.com +GLintptr offset, GLsizeiptr size, +const GLvoid *clearValue, +GLsizeiptr clearValueSize, +struct gl_buffer_object *bufObj) { GLsizeiptr i; GLubyte *dest; @@ -1113,7 +1113,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-UnmapBuffer = _mesa_buffer_unmap; /* GL_ARB_clear_buffer_object */ - driver-ClearBufferSubData = _mesa_buffer_clear_subdata; + driver-ClearBufferSubData = _mesa_ClearBufferSubData_sw; /* GL_ARB_map_buffer_range */ driver-MapBufferRange = _mesa_buffer_map_range; @@ -1671,57 +1671,77 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); } - -void GLAPIENTRY -_mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, - GLenum type, const GLvoid* data) +/** + * \param subdata true if caller is *SubData, false if *Data + */ +void +_mesa_clear_buffer_sub_data(struct gl_context *ctx, +struct gl_buffer_object *bufObj, +GLenum internalformat, +GLintptr offset, GLsizeiptr size, +GLenum format, GLenum type, +const GLvoid *data, +const char *func, bool subdata) { - GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object* bufObj; mesa_format mesaFormat; GLubyte clearValue[MAX_PIXEL_BYTES]; GLsizeiptr clearValueSize; - bufObj = get_buffer(ctx, glClearBufferData, target, GL_INVALID_VALUE); - if (!bufObj) { - return; - } - - if (_mesa_check_disallowed_mapping(bufObj)) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glClearBufferData(buffer currently mapped)); + /* This checks for disallowed mappings. */ + if (!buffer_object_subdata_range_good(ctx, bufObj, offset, size, + subdata, func)) { return; } mesaFormat = validate_clear_buffer_format(ctx, internalformat, - format, type, - glClearBufferData); + format, type, func); + if (mesaFormat == MESA_FORMAT_NONE) { return; } clearValueSize = _mesa_get_format_bytes(mesaFormat); - if (bufObj-Size % clearValueSize != 0) { + if (offset % clearValueSize != 0 || size % clearValueSize != 0) { _mesa_error(ctx, GL_INVALID_VALUE, - glClearBufferData(size is not a multiple of - internalformat size)); + %s(offset or size is not a multiple of + internalformat size), func); return; } if (data == NULL) { /* clear to zeros, per the spec */ - ctx-Driver.ClearBufferSubData(ctx, 0, bufObj-Size, - NULL, clearValueSize, bufObj); + if (size 0) { + ctx-Driver.ClearBufferSubData(ctx, offset, size, +NULL, clearValueSize, bufObj); + } return; } if (!convert_clear_buffer_data(ctx, mesaFormat, clearValue, - format, type, data, glClearBufferData)) { + format, type, data, func)) { return; } - ctx-Driver.ClearBufferSubData(ctx, 0, bufObj-Size, - clearValue, clearValueSize, bufObj); + if (size 0) { +
Re: [Mesa-dev] [PATCH] glsl: add lowering for double divide to rcp/mul
Am 19.02.2015 um 23:47 schrieb Dave Airlie: From: Dave Airlie airl...@redhat.com It looks like no hw does div anyways, so we should just lower at the GLSL level. Signed-off-by: Dave Airlie airl...@redhat.com --- src/glsl/lower_instructions.cpp | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/glsl/lower_instructions.cpp b/src/glsl/lower_instructions.cpp index e8a69e7..ac6715b 100644 --- a/src/glsl/lower_instructions.cpp +++ b/src/glsl/lower_instructions.cpp @@ -199,7 +199,7 @@ lower_instructions_visitor::sub_to_add_neg(ir_expression *ir) void lower_instructions_visitor::div_to_mul_rcp(ir_expression *ir) { - assert(ir-operands[1]-type-is_float()); + assert(ir-operands[1]-type-is_float() || ir-operands[1]-type-is_double()); /* New expression for the 1.0 / op1 */ ir_rvalue *expr; @@ -327,7 +327,7 @@ lower_instructions_visitor::mod_to_floor(ir_expression *ir) /* Don't generate new IR that would need to be lowered in an additional * pass. */ - if (lowering(DIV_TO_MUL_RCP) ir-type-is_float()) + if (lowering(DIV_TO_MUL_RCP) (ir-type-is_float() || ir-type-is_double())) div_to_mul_rcp(div_expr); ir_expression *const floor_expr = @@ -1014,7 +1014,7 @@ lower_instructions_visitor::visit_leave(ir_expression *ir) case ir_binop_div: if (ir-operands[1]-type-is_integer() lowering(INT_DIV_TO_MUL_RCP)) int_div_to_mul_rcp(ir); - else if (ir-operands[1]-type-is_float() lowering(DIV_TO_MUL_RCP)) + else if ((ir-operands[1]-type-is_float() || ir-operands[1]-type-is_double()) lowering(DIV_TO_MUL_RCP)) div_to_mul_rcp(ir); break; FWIW I suspect lowering ddiv doesn't really cut it always. Can be done later though. (d3d11 in fact has ddiv as an extended double feature not just the ordinary double feature, though along with dfma and drcp, which makes me wonder how you'd do a division with the unextended version if you don't even have rcp... In any case ddiv is required to be precise to 0.5 ULP if supported, drcp only 1.0 ULP.) Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/23] main: Minor whitespace fixes in ClearNamedBuffer[Sub]Data.
Please squash this in the previous commit (with git rebase -i). On 12/02/2015 04:05, Laura Ekstrand wrote: --- src/mesa/main/bufferobj.c | 4 ++-- src/mesa/main/bufferobj.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index bd21c8a..88230d6 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -1765,10 +1765,10 @@ void GLAPIENTRY _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, - const GLvoid* data) + const GLvoid *data) { GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object* bufObj; + struct gl_buffer_object *bufObj; bufObj = get_buffer(ctx, glClearBufferSubData, target, GL_INVALID_VALUE); if (!bufObj) diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index 5254727..2a66444 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -220,7 +220,7 @@ _mesa_GetBufferSubData(GLenum target, GLintptrARB offset, void GLAPIENTRY _mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, GLenum type, - const GLvoid * data); + const GLvoid *data); void GLAPIENTRY _mesa_ClearNamedBufferData(GLuint buffer, GLenum internalformat, @@ -231,7 +231,7 @@ void GLAPIENTRY _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, - const GLvoid * data); + const GLvoid *data); void GLAPIENTRY _mesa_ClearNamedBufferSubData(GLuint buffer, GLenum internalformat, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/23] main: Add entry points for ClearNamedBuffer[Sub]Data.
It could have been split in two, but as no changes affect any already-running code, I do not see any good reason to spend time on splitting this. Reviewed-by: Martin Peres martin.pe...@linux.intel.com On 12/02/2015 04:05, Laura Ekstrand wrote: --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 18 + src/mesa/main/bufferobj.c | 35 ++ src/mesa/main/bufferobj.h | 11 src/mesa/main/tests/dispatch_sanity.cpp| 2 ++ 4 files changed, 66 insertions(+) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 042b2a8..ce4017b 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -43,6 +43,24 @@ param name=size type=GLsizeiptr / /function + function name=ClearNamedBufferData offset=assign + param name=buffer type=GLuint / + param name=internalformat type=GLenum / + param name=format type=GLenum / + param name=type type=GLenum / + param name=data type=const GLvoid * / + /function + + function name=ClearNamedBufferSubData offset=assign + param name=buffer type=GLuint / + param name=internalformat type=GLenum / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + param name=format type=GLenum / + param name=type type=GLenum / + param name=data type=const GLvoid * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index b8fa917..bd21c8a 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -1744,6 +1744,22 @@ _mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, glClearBufferData, false); } +void GLAPIENTRY +_mesa_ClearNamedBufferData(GLuint buffer, GLenum internalformat, + GLenum format, GLenum type, const GLvoid *data) +{ + GET_CURRENT_CONTEXT(ctx); + struct gl_buffer_object *bufObj; + + bufObj = _mesa_lookup_bufferobj_err(ctx, buffer, glClearNamedBufferData); + if (!bufObj) + return; + + _mesa_clear_buffer_sub_data(ctx, bufObj, internalformat, 0, bufObj-Size, + format, type, data, + glClearNamedBufferData, false); +} + void GLAPIENTRY _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, @@ -1763,6 +1779,25 @@ _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, glClearBufferSubData, true); } +void GLAPIENTRY +_mesa_ClearNamedBufferSubData(GLuint buffer, GLenum internalformat, + GLintptr offset, GLsizeiptr size, + GLenum format, GLenum type, + const GLvoid *data) +{ + GET_CURRENT_CONTEXT(ctx); + struct gl_buffer_object *bufObj; + + bufObj = _mesa_lookup_bufferobj_err(ctx, buffer, + glClearNamedBufferSubData); + if (!bufObj) + return; + + _mesa_clear_buffer_sub_data(ctx, bufObj, internalformat, offset, size, + format, type, data, + glClearNamedBufferSubData, true); +} + void * GLAPIENTRY _mesa_MapBuffer(GLenum target, GLenum access) diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index 5911270..5254727 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -223,11 +223,22 @@ _mesa_ClearBufferData(GLenum target, GLenum internalformat, const GLvoid * data); void GLAPIENTRY +_mesa_ClearNamedBufferData(GLuint buffer, GLenum internalformat, + GLenum format, GLenum type, + const GLvoid *data); + +void GLAPIENTRY _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, const GLvoid * data); +void GLAPIENTRY +_mesa_ClearNamedBufferSubData(GLuint buffer, GLenum internalformat, + GLintptr offset, GLsizeiptr size, + GLenum format, GLenum type, + const GLvoid *data); + void * GLAPIENTRY _mesa_MapBuffer(GLenum target, GLenum access); diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index aa8e352..4bd3e10 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -960,6 +960,8 @@ const struct function gl_core_functions_possible[] = { { glNamedBufferData, 45, -1 }, { glNamedBufferSubData, 45, -1 }, { glCopyNamedBufferSubData, 45, -1 }, + { glClearNamedBufferData, 45,
Re: [Mesa-dev] [PATCH 1/6] gallium: add some more double opcodes to avoid unnecessary lowering
Just some minor nits. Am 20.02.2015 um 00:52 schrieb Ilia Mirkin: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/auxiliary/tgsi/tgsi_info.c | 5 src/gallium/docs/source/tgsi.rst | 39 ++ src/gallium/include/pipe/p_shader_tokens.h | 7 +- 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index d04f9da..4d838fd 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -257,6 +257,11 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 1, 0, 0, 0, 0, COMP, D2U, TGSI_OPCODE_D2U }, { 1, 1, 0, 0, 0, 0, COMP, U2D, TGSI_OPCODE_U2D }, { 1, 1, 0, 0 ,0, 0, COMP, DRSQ, TGSI_OPCODE_DRSQ }, + { 1, 1, 0, 0, 0, 0, COMP, DTRUNC, TGSI_OPCODE_DTRUNC }, + { 1, 1, 0, 0, 0, 0, COMP, DCEIL, TGSI_OPCODE_DCEIL }, + { 1, 1, 0, 0, 0, 0, COMP, DFLR, TGSI_OPCODE_DFLR }, + { 1, 1, 0, 0, 0, 0, COMP, DROUND, TGSI_OPCODE_DROUND }, + { 1, 1, 0, 0, 0, 0, COMP, DSSG, TGSI_OPCODE_DSSG }, }; const struct tgsi_opcode_info * diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index e20af79..15f1e9f 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -1861,6 +1861,45 @@ two-component vectors with doubled precision in each component. dst.zw = src.zw - \lfloor src.zw\rfloor +.. opcode:: DTRUNC - Truncate + +.. math:: + + dst.xy = trunc(src.xy) + + dst.zw = trunc(src.zw) + +.. opcode:: DCEIL - Ceiling + +.. math:: + + dst.xy = \lceil src.xy\rceil + + dst.zw = \lceil src.zw\rceil + +.. opcode:: DFLR - Floor + +.. math:: + + dst.xy = \lfloor src.xy\rfloor + + dst.zw = \lfloor src.zw\rfloor + +.. opcode:: DROUND - Fraction + +.. math:: + + dst.xy = round(src.xy) + + dst.zw = round(src.zw) + +.. opcode:: DSSG - Set Sign + +.. math:: + + dst.xy = (src.xy 0) ? 1 : (src.xy 0) ? -1 : 0 + + dst.zw = (src.zw 0) ? 1 : (src.zw 0) ? -1 : 0 I think this would be more obvious written as 1.0/-1.0/0.0 (same goes for the non-double version, of course). (As a side note, I'm wondering if this actually has defined NaN behavior, the glsl language sort of implies a number has to be exactly one of these 3 cases, and doesn't give an answer what should be returned for a NaN - maybe a NaN?) .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index fc41cc9..95ac590 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -519,7 +519,12 @@ struct tgsi_property_data { #define TGSI_OPCODE_D2U 215 #define TGSI_OPCODE_U2D 216 #define TGSI_OPCODE_DRSQ217 /* eg, cayman also has DRSQ */ -#define TGSI_OPCODE_LAST218 +#define TGSI_OPCODE_DTRUNC 218 /* nvc0 */ +#define TGSI_OPCODE_DCEIL 219 /* nvc0 */ +#define TGSI_OPCODE_DFLR220 /* nvc0 */ +#define TGSI_OPCODE_DROUND 221 /* nvc0 */ +#define TGSI_OPCODE_DSSG222 +#define TGSI_OPCODE_LAST223 I don't think these hw-specific references (nvc0) really fit in there - the same can be said about some existing ones (cayman...) #define TGSI_SAT_NONE0 /* do not saturate */ #define TGSI_SAT_ZERO_ONE1 /* clamp to [0,1] */ Otherwise looks good to me. Roland ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/23] main: Refactor ClearBuffer[Sub]Data.
On 20/02/2015 20:03, Laura Ekstrand wrote: This naming convention tries to match the corresponding DD table entry. There's a thread discussing the naming convention for external software fallback functions. Feel free to add your input to the discussion there :) Ack, no strong opinion here :) On Fri, Feb 20, 2015 at 6:18 AM, Martin Peres martin.pe...@free.fr mailto:martin.pe...@free.fr wrote: On 12/02/2015 04:05, Laura Ekstrand wrote: v2: review by Jason Ekstrand - Split refactor of clear buffer sub data from addition of DSA entry points. --- src/mesa/main/bufferobj.c| 125 --- src/mesa/main/bufferobj.h| 19 ++-- src/mesa/state_tracker/st_cb_bufferobjects.c | 4 +- 3 files changed, 69 insertions(+), 79 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 7225b64..b8fa917 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -660,11 +660,11 @@ _mesa_buffer_get_subdata( struct gl_context *ctx, GLintptrARB offset, * dd_function_table::ClearBufferSubData. */ void -_mesa_buffer_clear_subdata(struct gl_context *ctx, - GLintptr offset, GLsizeiptr size, - const GLvoid *clearValue, - GLsizeiptr clearValueSize, - struct gl_buffer_object *bufObj) +_mesa_ClearBufferSubData_sw(struct gl_context *ctx, Interesting choice of naming the function as a mix of camel case and underscores. Since it is an internal function, shouldn't it only have underscores? Other than that: Reviewed-by: Martin Peres martin.pe...@linux.intel.com mailto:martin.pe...@linux.intel.com +GLintptr offset, GLsizeiptr size, +const GLvoid *clearValue, +GLsizeiptr clearValueSize, +struct gl_buffer_object *bufObj) { GLsizeiptr i; GLubyte *dest; @@ -1113,7 +1113,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-UnmapBuffer = _mesa_buffer_unmap; /* GL_ARB_clear_buffer_object */ - driver-ClearBufferSubData = _mesa_buffer_clear_subdata; + driver-ClearBufferSubData = _mesa_ClearBufferSubData_sw; /* GL_ARB_map_buffer_range */ driver-MapBufferRange = _mesa_buffer_map_range; @@ -1671,57 +1671,77 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); } - -void GLAPIENTRY -_mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, - GLenum type, const GLvoid* data) +/** + * \param subdata true if caller is *SubData, false if *Data + */ +void +_mesa_clear_buffer_sub_data(struct gl_context *ctx, +struct gl_buffer_object *bufObj, +GLenum internalformat, +GLintptr offset, GLsizeiptr size, +GLenum format, GLenum type, +const GLvoid *data, +const char *func, bool subdata) { - GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object* bufObj; mesa_format mesaFormat; GLubyte clearValue[MAX_PIXEL_BYTES]; GLsizeiptr clearValueSize; - bufObj = get_buffer(ctx, glClearBufferData, target, GL_INVALID_VALUE); - if (!bufObj) { - return; - } - - if (_mesa_check_disallowed_mapping(bufObj)) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glClearBufferData(buffer currently mapped)); + /* This checks for disallowed mappings. */ + if (!buffer_object_subdata_range_good(ctx, bufObj, offset, size, + subdata, func)) { return; } mesaFormat = validate_clear_buffer_format(ctx, internalformat, - format, type, - glClearBufferData); + format, type, func); + if (mesaFormat == MESA_FORMAT_NONE) { return; } clearValueSize = _mesa_get_format_bytes(mesaFormat); - if (bufObj-Size %
Re: [Mesa-dev] [PATCH 08/23] main: Add entry point for NamedBufferSubData.
On 20/02/2015 22:17, Laura Ekstrand wrote: That would make the diff easier to read, but it doesn't make sense to lay out the functions in that order in the file. GetBufferSubData is a download function and shouldn't be part of the upload function group in the file. Fair enough! That was a minor nitpick anyway and I did not expect you to fix it, hence why I gave my R-b ;) On Fri, Feb 20, 2015 at 2:28 AM, Martin Peres martin.pe...@free.fr mailto:martin.pe...@free.fr wrote: On 12/02/2015 04:05, Laura Ekstrand wrote: v2: review by Ian Romanick - Remove _mesa from name of static software fallback buffer_sub_data. - Remove mappedRange from _mesa_buffer_sub_data. - Removed some cosmetic changes to a separate commit. --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 ++ src/mesa/main/bufferobj.c | 129 +++-- src/mesa/main/bufferobj.h | 9 ++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + 4 files changed, 97 insertions(+), 49 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 7779262..6d70b8e 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -28,6 +28,13 @@ param name=usage type=GLenum / /function + function name=NamedBufferSubData offset=assign + param name=buffer type=GLuint / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + param name=data type=const GLvoid * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index ac8eed1..4f89748 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -227,67 +227,58 @@ bufferobj_range_mapped(const struct gl_buffer_object *obj, * \c glClearBufferSubData. * * \param ctx GL context. - * \param target Buffer object target on which to operate. + * \param bufObj The buffer object. * \param offset Offset of the first byte of the subdata range. * \param sizeSize, in bytes, of the subdata range. * \param mappedRange If true, checks if an overlapping range is mapped. * If false, checks if buffer is mapped. - * \param errorNoBuffer Error code if no buffer is bound to target. * \param caller Name of calling function for recording errors. - * \return A pointer to the buffer object bound to \c target in the - * specified context or \c NULL if any of the parameter or state - * conditions are invalid. + * \return false if error, true otherwise * * \sa glBufferSubDataARB, glGetBufferSubDataARB, glClearBufferSubData */ -static struct gl_buffer_object * -buffer_object_subdata_range_good(struct gl_context * ctx, GLenum target, - GLintptrARB offset, GLsizeiptrARB size, - bool mappedRange, GLenum errorNoBuffer, - const char *caller) +static bool +buffer_object_subdata_range_good(struct gl_context *ctx, + struct gl_buffer_object *bufObj, + GLintptr offset, GLsizeiptr size, + bool mappedRange, const char *caller) { - struct gl_buffer_object *bufObj; - if (size 0) { _mesa_error(ctx, GL_INVALID_VALUE, %s(size 0), caller); - return NULL; + return false; } if (offset 0) { _mesa_error(ctx, GL_INVALID_VALUE, %s(offset 0), caller); - return NULL; + return false; } - bufObj = get_buffer(ctx, caller, target, errorNoBuffer); - if (!bufObj) - return NULL; - if (offset + size bufObj-Size) { _mesa_error(ctx, GL_INVALID_VALUE, %s(offset %lu + size %lu buffer size %lu), caller, (unsigned long) offset, (unsigned long) size, (unsigned long) bufObj-Size); - return NULL; + return false; }
[Mesa-dev] [PATCH] mesa: Have configure define NDEBUG, not mtypes.h.
mtypes.h had been defining NDEBUG (used by assert) if DEBUG was not defined. Confusing and bizarre that you don't get NDEBUG if you don't include mtypes.h. ... which is just what happened in commit bef38f62e. Let's let configure define this for us if not using --enable-debug. --- configure.ac | 2 ++ src/mesa/main/mtypes.h | 3 --- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index beb7a7d..5fbb7bc 100644 --- a/configure.ac +++ b/configure.ac @@ -370,6 +370,8 @@ if test x$enable_debug = xyes; then CXXFLAGS=$CXXFLAGS -O0 fi fi +else + DEFINES=$DEFINES -DNDEBUG fi dnl diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 05b5a81..6e99773 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -4512,9 +4512,6 @@ extern int MESA_DEBUG_FLAGS; # define MESA_VERBOSE 0 # define MESA_DEBUG_FLAGS 0 # define MESA_FUNCTION a function -# ifndef NDEBUG -# define NDEBUG -# endif #endif -- 2.0.5 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Fix the Mesa build without -DDEBUG.
Regardless of where DEBUG/NDEBUG is defined, it seems like we should do this anyway. Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate data
Reviewed-by: Matt Turner matts...@gmail.com It might be a good idea to make the fs_reg(struct brw_reg) explicit to prevent this kind of mistake from happening. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] common: Correct texture initialization in create_texture_for_pbo.
Solves bugs related to the driver not setting up the texture miptree correctly, leading to faulty PBO uploads and downloads. --- src/mesa/drivers/common/meta_tex_subimage.c | 13 + src/mesa/drivers/dri/i965/intel_tex.c | 3 ++- src/mesa/main/dd.h | 1 + 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 68c8273..f4f7716 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -51,7 +51,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, { uint32_t pbo_format; GLenum internal_format; - unsigned row_stride; + unsigned row_stride, image_stride; struct gl_buffer_object *buffer_obj; struct gl_texture_object *tex_obj; struct gl_texture_image *tex_image; @@ -74,6 +74,8 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, pixels = _mesa_image_address3d(packing, pixels, width, height, format, type, 0, 0, 0); row_stride = _mesa_image_row_stride(packing, width, format, type); + image_stride = _mesa_image_image_stride(packing, width, height, format, + type); if (_mesa_is_bufferobj(packing-BufferObj)) { *tmp_pbo = 0; @@ -89,7 +91,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, */ _mesa_BindBuffer(pbo_target, *tmp_pbo); - _mesa_BufferData(pbo_target, row_stride * height, pixels, GL_STREAM_DRAW); + _mesa_BufferData(pbo_target, image_stride * depth, pixels, GL_STREAM_DRAW); buffer_obj = ctx-Unpack.BufferObj; pixels = NULL; @@ -99,9 +101,11 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, _mesa_GenTextures(1, tmp_tex); tex_obj = _mesa_lookup_texture(ctx, *tmp_tex); - tex_obj-Target = depth 1 ? GL_TEXTURE_2D_ARRAY : GL_TEXTURE_2D; - tex_obj-Immutable = GL_TRUE; _mesa_initialize_texture_object(ctx, tex_obj, *tmp_tex, GL_TEXTURE_2D); + /* This must be set after _mesa_initialize_texture_object, not before. */ + tex_obj-Immutable = GL_TRUE; + /* This is required for interactions with ARB_texture_view. */ + tex_obj-NumLayers = 1; internal_format = _mesa_get_format_base_format(pbo_format); @@ -114,6 +118,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, buffer_obj, (intptr_t)pixels, row_stride, + image_stride, read_only)) { _mesa_DeleteTextures(1, tmp_tex); _mesa_DeleteBuffers(1, tmp_pbo); diff --git a/src/mesa/drivers/dri/i965/intel_tex.c b/src/mesa/drivers/dri/i965/intel_tex.c index 2d3009a..3a0c09a 100644 --- a/src/mesa/drivers/dri/i965/intel_tex.c +++ b/src/mesa/drivers/dri/i965/intel_tex.c @@ -305,6 +305,7 @@ intel_set_texture_storage_for_buffer_object(struct gl_context *ctx, struct gl_buffer_object *buffer_obj, uint32_t buffer_offset, uint32_t row_stride, +uint32_t image_stride, bool read_only) { struct brw_context *brw = brw_context(ctx); @@ -334,7 +335,7 @@ intel_set_texture_storage_for_buffer_object(struct gl_context *ctx, drm_intel_bo *bo = intel_bufferobj_buffer(brw, intel_buffer_obj, buffer_offset, - row_stride * image-Height); + image_stride * image-Depth); intel_texobj-mt = intel_miptree_create_for_bo(brw, bo, image-TexFormat, diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h index ec8662b..9de08e2 100644 --- a/src/mesa/main/dd.h +++ b/src/mesa/main/dd.h @@ -429,6 +429,7 @@ struct dd_function_table { struct gl_buffer_object *bufferObj, uint32_t buffer_offset, uint32_t row_stride, +uint32_t image_stride, bool read_only); /** -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] i965: Force miptrees for BOs to have all slices in each lod.
Textures made expressly for internal buffer objects shouldn't have extra padding around them, but should be densely packed. --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 0e3888f..b46532d 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -724,7 +724,7 @@ intel_miptree_create_for_bo(struct brw_context *brw, mt = intel_miptree_create_layout(brw, target, format, 0, 0, width, height, depth, -true, 0, false); +true, 0, true); if (!mt) { free(mt); return mt; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] common: Fix PBOs for 1D_ARRAY.
Corrects the way that _mesa_meta_pbo_TexSubImage and _mesa_meta_pbo_GetTexSubImage handle 1D_ARRAY textures. Fixes a failure in the Piglit arb_direct_state_access/gettextureimage-targets test. --- src/mesa/drivers/common/meta_tex_subimage.c | 62 + 1 file changed, 36 insertions(+), 26 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index ee3295b..24a72ba 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -141,7 +141,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; - int z; + int z, iters; /* XXX: This should probably be passed in from somewhere */ const char *where = _mesa_meta_pbo_TexSubImage; @@ -189,12 +189,6 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_BindFramebuffer(GL_READ_FRAMEBUFFER, fbos[0]); _mesa_BindFramebuffer(GL_DRAW_FRAMEBUFFER, fbos[1]); - if (tex_image-TexObject-Target == GL_TEXTURE_1D_ARRAY) { - assert(depth == 1); - depth = height; - height = 1; - } - _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, pbo_tex_image, 0); /* If this passes on the first layer it should pass on the others */ @@ -218,7 +212,10 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST)) goto fail; - for (z = 1; z depth; z++) { + iters = tex_image-TexObject-Target == GL_TEXTURE_1D_ARRAY ? + height : depth; + + for (z = 1; z iters; z++) { _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, pbo_tex_image, z); _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, @@ -226,11 +223,18 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_update_state(ctx); - _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, - 0, z * height, width, (z + 1) * height, - xoffset, yoffset, - xoffset + width, yoffset + height, - GL_COLOR_BUFFER_BIT, GL_NEAREST); + if (tex_image-TexObject-Target == GL_TEXTURE_1D_ARRAY) + _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, +0, z, width, z + 1, +xoffset, yoffset, +xoffset + width, yoffset + 1, +GL_COLOR_BUFFER_BIT, GL_NEAREST); + else + _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, +0, z * height, width, (z + 1) * height, +xoffset, yoffset, +xoffset + width, yoffset + height, +GL_COLOR_BUFFER_BIT, GL_NEAREST); } success = true; @@ -257,7 +261,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, struct gl_texture_image *pbo_tex_image; GLenum status; bool success = false; - int z; + int z, iters; /* XXX: This should probably be passed in from somewhere */ const char *where = _mesa_meta_pbo_GetTexSubImage; @@ -299,12 +303,6 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_GenFramebuffers(2, fbos); - if (tex_image tex_image-TexObject-Target == GL_TEXTURE_1D_ARRAY) { - assert(depth == 1); - depth = height; - height = 1; - } - /* If we were given a texture, bind it to the read framebuffer. If not, * we're doing a ReadPixels and we should just use whatever framebuffer * the client has bound. @@ -338,7 +336,12 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST)) goto fail; - for (z = 1; z depth; z++) { + if (tex_image tex_image-TexObject-Target == GL_TEXTURE_1D_ARRAY) + iters = height; + else + iters = depth; + + for (z = 1; z iters; z++) { _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); _mesa_meta_bind_fbo_image(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, @@ -346,11 +349,18 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_update_state(ctx); - _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, - xoffset, yoffset, - xoffset + width, yoffset + height, - 0, z * height, width, (z + 1) * height, -
[Mesa-dev] [PATCH 2/4] common: Correct PBO 2D_ARRAY handling.
Changes PBO uploads and downloads to use a tall (height * depth) 2D texture for blitting. This fixes the bug where 2D_ARRAY, 3D, and CUBE_MAP_ARRAY textures are not properly uploaded and downloaded. --- src/mesa/drivers/common/meta_tex_subimage.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index f4f7716..ee3295b 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -110,7 +110,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, internal_format = _mesa_get_format_base_format(pbo_format); tex_image = _mesa_get_tex_image(ctx, tex_obj, tex_obj-Target, 0); - _mesa_init_teximage_fields(ctx, tex_image, width, height, depth, + _mesa_init_teximage_fields(ctx, tex_image, width, height * depth, 1, 0, internal_format, pbo_format); read_only = pbo_target == GL_PIXEL_UNPACK_BUFFER; @@ -227,7 +227,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_update_state(ctx); _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, - 0, 0, width, height, + 0, z * height, width, (z + 1) * height, xoffset, yoffset, xoffset + width, yoffset + height, GL_COLOR_BUFFER_BIT, GL_NEAREST); @@ -349,7 +349,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, xoffset, yoffset, xoffset + width, yoffset + height, - 0, 0, width, height, + 0, z * height, width, (z + 1) * height, GL_COLOR_BUFFER_BIT, GL_NEAREST); } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/4] v2: Fix PBO uploads/downloads.
This is a more robust set of patches to fix Meta PBO texture uploads and downloads. This fixes bugs related to array PBOs that were found using the new set of getteximage-targets tests. Laura Ekstrand (4): common: Correct texture initialization in create_texture_for_pbo. common: Correct PBO 2D_ARRAY handling. common: Fix PBOs for 1D_ARRAY. i965: Force miptrees for BOs to have all slices in each lod. src/mesa/drivers/common/meta_tex_subimage.c | 77 --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +- src/mesa/drivers/dri/i965/intel_tex.c | 3 +- src/mesa/main/dd.h| 1 + 4 files changed, 50 insertions(+), 33 deletions(-) -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] i965/skl: Layout 3D textures the same as array textures
On Gen9+ the 3D textures use the same mipmap layout as 2D array textures. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 0e2841f..57922e9 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -224,6 +224,9 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) width = minify(width, 1); height = minify(height, 1); + + if (mt-target == GL_TEXTURE_3D) + depth = minify(depth, 1); } } @@ -263,7 +266,7 @@ brw_miptree_layout_texture_array(struct brw_context *brw, if (mt-compressed) img_height /= mt-align_h; - for (int q = 0; q mt-physical_depth0; q++) { + for (int q = 0; q mt-level[level].depth; q++) { if (mt-array_layout == ALL_SLICES_AT_EACH_LOD) { intel_miptree_set_image_offset(mt, level, q, 0, q * img_height); } else { @@ -368,7 +371,10 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) break; case GL_TEXTURE_3D: - brw_miptree_layout_texture_3d(brw, mt); + if (brw-gen = 9) + brw_miptree_layout_texture_array(brw, mt); + else + brw_miptree_layout_texture_3d(brw, mt); break; case GL_TEXTURE_1D_ARRAY: -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] i965/skl: Layout a 1D miptree horizontally
On Gen9+ the 1D miptree is laid out with all of the mipmap levels in a horizontal line. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 62 +- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 57922e9..851742b 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -158,6 +158,36 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, } static void +gen9_miptree_layout_1d(struct intel_mipmap_tree *mt) +{ + unsigned x = 0; + unsigned width = mt-physical_width0; + unsigned depth = mt-physical_depth0; /* number of array layers. */ + + /* When this layout is used the horizontal alignment is fixed at 64 and the +* hardware ignores the value given in the surface state +*/ + const unsigned int align_w = 64; + + mt-total_height = mt-physical_height0; + mt-total_width = 0; + + for (unsigned level = mt-first_level; level = mt-last_level; level++) { + unsigned img_width; + + intel_miptree_set_level_info(mt, level, x, 0, depth); + + img_width = ALIGN(width, align_w); + + mt-total_width = MAX2(mt-total_width, x + img_width); + + x += img_width; + + width = minify(width, 1); + } +} + +static void brw_miptree_layout_2d(struct intel_mipmap_tree *mt) { unsigned x = 0; @@ -242,12 +272,34 @@ align_cube(struct intel_mipmap_tree *mt) mt-total_height += 2; } +static bool +use_linear_1d_layout(struct brw_context *brw, + struct intel_mipmap_tree *mt) +{ + /* On Gen9+ the mipmap levels of a 1D surface are all laid out in a +* horizontal line. This isn't done for depth/stencil buffers however +* because those will be using a tiled layout +*/ + if (brw-gen = 9 + (mt-target == GL_TEXTURE_1D || +mt-target == GL_TEXTURE_1D_ARRAY)) { + GLenum base_format = _mesa_get_format_base_format(mt-format); + + if (base_format != GL_DEPTH_COMPONENT + base_format != GL_DEPTH_STENCIL) + return true; + } + + return false; +} + static void brw_miptree_layout_texture_array(struct brw_context *brw, struct intel_mipmap_tree *mt) { int h0, h1; unsigned height = mt-physical_height0; + bool layout_1d = use_linear_1d_layout(brw, mt); h0 = ALIGN(mt-physical_height0, mt-align_h); h1 = ALIGN(minify(mt-physical_height0, 1), mt-align_h); @@ -258,7 +310,10 @@ brw_miptree_layout_texture_array(struct brw_context *brw, int physical_qpitch = mt-compressed ? mt-qpitch / 4 : mt-qpitch; - brw_miptree_layout_2d(mt); + if (layout_1d) + gen9_miptree_layout_1d(mt); + else + brw_miptree_layout_2d(mt); for (unsigned level = mt-first_level; level = mt-last_level; level++) { unsigned img_height; @@ -392,7 +447,10 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) break; case INTEL_MSAA_LAYOUT_NONE: case INTEL_MSAA_LAYOUT_IMS: - brw_miptree_layout_2d(mt); + if (use_linear_1d_layout(brw, mt)) +gen9_miptree_layout_1d(mt); + else +brw_miptree_layout_2d(mt); break; } break; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] i965/skl: Align compressed textures to four times the block size
On Skylake it is possible to choose your own alignment values for compressed textures but they are expressed as a multiple of the block size. The minimum alignment value we can use is 4 so we effectively have to align to 4 times the block size. This patch makes it initially set mt-align_[wh] to the large alignment value and then later divides it by the block size so that it can be uploaded as part of the surface state. --- src/mesa/drivers/dri/i965/brw_tex_layout.c | 31 ++ 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index d89a04e..1fe2c26 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -73,7 +73,16 @@ intel_horizontal_texture_alignment_unit(struct brw_context *brw, */ unsigned int i, j; _mesa_get_format_block_size(format, i, j); - return i; + + /* On Gen9+ we can pick our own alignment for compressed textures but it + * has to be a multiple of the block size. The minimum alignment we can + * pick is 4 so we effectively have to align to 4 times the block + * size + */ + if (brw-gen = 9) + return i * 4; + else + return i; } if (format == MESA_FORMAT_S_UINT8) @@ -113,7 +122,8 @@ intel_vertical_texture_alignment_unit(struct brw_context *brw, * the SURFACE_STATE Surface Vertical Alignment field. */ if (_mesa_is_format_compressed(format)) - return 4; + /* See comment above for the horizontal alignment */ + return brw-gen = 9 ? 16 : 4; if (format == MESA_FORMAT_S_UINT8) return brw-gen = 7 ? 8 : 4; @@ -195,6 +205,9 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) unsigned width = mt-physical_width0; unsigned height = mt-physical_height0; unsigned depth = mt-physical_depth0; /* number of array layers. */ + unsigned int bw, bh; + + _mesa_get_format_block_size(mt-format, bw, bh); mt-total_width = mt-physical_width0; @@ -212,7 +225,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) if (mt-compressed) { mip1_width = ALIGN(minify(mt-physical_width0, 1), mt-align_w) + - ALIGN(minify(mt-physical_width0, 2), mt-align_w); + ALIGN(minify(mt-physical_width0, 2), bw); } else { mip1_width = ALIGN(minify(mt-physical_width0, 1), mt-align_w) + minify(mt-physical_width0, 2); @@ -232,7 +245,7 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt) img_height = ALIGN(height, mt-align_h); if (mt-compressed) -img_height /= mt-align_h; +img_height /= bh; if (mt-array_layout == ALL_SLICES_AT_EACH_LOD) { /* Compact arrays with separated miplevels */ @@ -481,5 +494,15 @@ brw_miptree_layout(struct brw_context *brw, struct intel_mipmap_tree *mt) } DBG(%s: %dx%dx%d\n, __FUNCTION__, mt-total_width, mt-total_height, mt-cpp); + + /* On Gen9+ the alignment values are expressed in multiples of the block +* size +*/ + if (brw-gen = 9) { + unsigned int i, j; + _mesa_get_format_block_size(mt-format, i, j); + mt-align_w /= i; + mt-align_h /= j; + } } -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/6] i965/skl: Texture layout fixes
Here is a v2 of my patch series to fix 1D textures on Skylake. It's now turned into just some general fixes and also helps with 3D textures. It fixes 110 piglit tests but sadly seems to cause 3 regressions. It might not be worth landing until I can work out what the regressions are so I guess I'm just posting it now in case anyone's interested to look at it anyway. I've uploaded the piglit html summary of the changes here: http://busydoingnothing.co.uk/skl-qpitch-patches/changes.html - Neil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] i965: Don't force x-tiling for 16-bpp formats on Gen7
Sandybridge doesn't support y-tiling for surface formats with 16 or more bpp. There was previously an override to explicitly allow this for Gen7. However, this restriction is also removed in Gen8+ so we should use y-tiling there too. This is important to do for Skylake which doesn't support x-tiling for 3D surfaces. --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 0e3888f..994670a 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -516,9 +516,9 @@ intel_miptree_choose_tiling(struct brw_context *brw, * NOTE: 128BPE Format Color Buffer ( render target ) MUST be either TileX * or Linear. * 128 bits per pixel translates to 16 bytes per pixel. This is necessary -* all the way back to 965, but is explicitly permitted on Gen7. +* all the way back to 965, but is explicitly permitted on Gen7+. */ - if (brw-gen != 7 mt-cpp = 16) + if (brw-gen 7 mt-cpp = 16) return I915_TILING_X; /* From the Ivy Bridge PRM, Vol4 Part1 2.12.2.1 (SURFACE_STATE for most -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] i965/skl: Don't use ALL_SLICES_AT_EACH_LOD
The render surface state command for Skylake doesn't have the surface array spacing bit so I don't think it's possible to select this layout. This avoids a kernel panic when running the piglit test below: ext_framebuffer_multisample-no-color 8 stencil single However the test still fails so there may be something else wrong as well. The test was not causing a kernel panic before the patch to fix the qpitch. I think it's also not possible to select this layout on Gen8 so it may need to be changed to only be used on Gen7. --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 26 -- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 994670a..018e16b 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -371,19 +371,25 @@ intel_miptree_create_layout(struct brw_context *brw, } } - /* Set array_layout to ALL_SLICES_AT_EACH_LOD when gen7+ array_spacing_lod0 -* can be used. array_spacing_lod0 is only used for non-IMS MSAA surfaces. + /* Set array_layout to ALL_SLICES_AT_EACH_LOD when array_spacing_lod0 +* can be used. array_spacing_lod0 is only used for non-IMS MSAA surfaces +* on Gen 7 and 8. * TODO: can we use it elsewhere? +* TODO: does this actually work on Gen 8? */ - switch (mt-msaa_layout) { - case INTEL_MSAA_LAYOUT_NONE: - case INTEL_MSAA_LAYOUT_IMS: + if (brw-gen = 9) { mt-array_layout = ALL_LOD_IN_EACH_SLICE; - break; - case INTEL_MSAA_LAYOUT_UMS: - case INTEL_MSAA_LAYOUT_CMS: - mt-array_layout = ALL_SLICES_AT_EACH_LOD; - break; + } else { + switch (mt-msaa_layout) { + case INTEL_MSAA_LAYOUT_NONE: + case INTEL_MSAA_LAYOUT_IMS: + mt-array_layout = ALL_LOD_IN_EACH_SLICE; + break; + case INTEL_MSAA_LAYOUT_UMS: + case INTEL_MSAA_LAYOUT_CMS: + mt-array_layout = ALL_SLICES_AT_EACH_LOD; + break; + } } if (target == GL_TEXTURE_CUBE_MAP) { -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] nir: Add a couple of simplifications of csel operations.
The series is Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] drivers/x11: add gallium include dirs to Makefile.am
Matt Turner matts...@gmail.com writes: On Fri, Feb 20, 2015 at 12:20 PM, Brian Paul bri...@vmware.com wrote: Fixes xlib driver build after e8c5cbfd921680c. --- src/mesa/drivers/x11/Makefile.am | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/x11/Makefile.am b/src/mesa/drivers/x11/Makefile.am index 76a7cd4..c0596f8 100644 --- a/src/mesa/drivers/x11/Makefile.am +++ b/src/mesa/drivers/x11/Makefile.am @@ -30,6 +30,8 @@ AM_CPPFLAGS = \ -I$(top_srcdir)/src/mapi \ -I$(top_srcdir)/src/mesa \ -I$(top_srcdir)/src \ + -I$(top_srcdir)/src/gallium/include \ + -I$(top_srcdir)/src/gallium/auxiliary \ I really dislike including these directories in a bunch of non-Gallium places. I should have said so more strongly when the patch went by. I'm not sure whether I want to spend my time to fix this the right way or what. I guess just go ahead and commit this until something better comes along. I'm sympathetic, it's just the feeling of if I can't get incremental changes towards sharing, we'll never get anywhere and we'll be stuck with this code duplication we've had for the last 5 years. p_compiler.h looks like something we should be able to get out into util/, at which point we could drop these include dirs. It's on my list still, but I do really want to land my NIR work. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Have configure define NDEBUG, not mtypes.h.
On Friday, February 20, 2015 12:44:53 PM Matt Turner wrote: mtypes.h had been defining NDEBUG (used by assert) if DEBUG was not defined. Confusing and bizarre that you don't get NDEBUG if you don't include mtypes.h. ... which is just what happened in commit bef38f62e. Let's let configure define this for us if not using --enable-debug. --- configure.ac | 2 ++ src/mesa/main/mtypes.h | 3 --- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index beb7a7d..5fbb7bc 100644 --- a/configure.ac +++ b/configure.ac @@ -370,6 +370,8 @@ if test x$enable_debug = xyes; then CXXFLAGS=$CXXFLAGS -O0 fi fi +else + DEFINES=$DEFINES -DNDEBUG fi dnl diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h index 05b5a81..6e99773 100644 --- a/src/mesa/main/mtypes.h +++ b/src/mesa/main/mtypes.h @@ -4512,9 +4512,6 @@ extern int MESA_DEBUG_FLAGS; # define MESA_VERBOSE 0 # define MESA_DEBUG_FLAGS 0 # define MESA_FUNCTION a function -# ifndef NDEBUG -# define NDEBUG -# endif #endif It looks like scons/gallium.py already defines DEBUG and NDEBUG: cppdefines = [] if env['build'] in ('debug', 'checked'): cppdefines += ['DEBUG'] else: cppdefines += ['NDEBUG'] So this may not even break the SCons build. If it does, there's a fairly clear solution there. Reviewed-by: Kenneth Graunke kenn...@whitecape.org signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir: Fix the Mesa build without -DDEBUG.
With -DDEBUG -UNDEBUG, this assert uses reg_state::stack_size, which doesn't exist, breaking the build: assert(state-states[index].index state-states[index].stack_size); Switch it to ifndef NDEBUG, so the field will exist if the assertion actually generates code. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/glsl/nir/nir_to_ssa.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/glsl/nir/nir_to_ssa.c b/src/glsl/nir/nir_to_ssa.c index dbe1699..47cf453 100644 --- a/src/glsl/nir/nir_to_ssa.c +++ b/src/glsl/nir/nir_to_ssa.c @@ -134,7 +134,7 @@ typedef struct { nir_ssa_def **stack; int index; unsigned num_defs; /** used to add indices to debug names */ -#ifdef DEBUG +#ifndef NDEBUG unsigned stack_size; #endif } reg_state; @@ -489,7 +489,7 @@ init_rewrite_state(nir_function_impl *impl, rewrite_state *state) state-states[reg-index].stack = ralloc_array(state-states, nir_ssa_def *, stack_size); -#ifdef DEBUG +#ifndef NDEBUG state-states[reg-index].stack_size = stack_size; #endif state-states[reg-index].index = -1; -- 2.2.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/fs/nir: Optimize (gl_FrontFacing ? x : y) where x and y are ±1.0.
Matt Turner matts...@gmail.com writes: On Fri, Feb 20, 2015 at 11:54 AM, Eric Anholt e...@anholt.net wrote: I wanted patch #1 to land, so I took a look at this one :) Thanks! :) Matt Turner matts...@gmail.com writes: + if (brw-gen = 6) { + /* Bit 15 of g0.0 is 0 if the polygon is front facing. */ + fs_reg g0 = fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_W)); + + /* For (gl_FrontFacing ? 1.0 : -1.0), emit: + * + *or(8) tmp.12W g0.00,1,0W 0x3f80W + *and(8) dst1Dtmp8,8,1D 0xbf80D + * + * and negate g0.00,1,0W for (gl_FrontFacing ? -1.0 : 1.0). + */ + + if (value1-f[0] == -1.0f) { + g0.negate = true; + } Does this do what you want? If g0.0 happened to be *all* zeroes, you're still going to get 0 after negation, right? That's a good question. I'm not sure. The bits below it are 13: Source Depth Present to Render Target. 12: oMask to Render Target 11: Source0 Alpha Present to RenderTarget. 8:6: Starting Sample Pair Index BDW adds some additional fields as well. The others are ignored. It's not clear to me that at least one of the defined bits is guaranteed to be zero. It's no guarantee or anything, but FWIW without realizing it we were depending on some bit being non-zero for the frontfacing optimizations that used ASR as well (commits d1c43ed4, 19c6617a) and haven't seen any problems from it. So if there's a problem... we're not making it worse in this commit... The simulator seems to set some bits in the ignored fields, but I don't have any explanation for that, nor is that evidence that we can rely on the hardware to do the same. Or maybe I'm just wrong and some bit is guaranteed to be set? A This negation looks like it's safe in practice, because bits 0:4 will surely be TRIANGLES comment seems fine with me. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate data
The brw_imm_ud will yield a HW_REG which then will introduce a barrier for certain optimization opportunities. No piglit regressions seen with gen8 (simd8vs). Suggested-by: Matt Turner matts...@gmail.com Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Matt Turner matts...@gmail.com --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index fa7d32c..b1b75821 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -3016,7 +3016,7 @@ fs_visitor::emit_untyped_atomic(unsigned atomic_op, unsigned surf_index, */ assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0x)))-force_writemask_all = true; + fs_reg(0x)))-force_writemask_all = true; } length++; @@ -3079,7 +3079,7 @@ fs_visitor::emit_untyped_surface_read(unsigned surf_index, fs_reg dst, */ assert(stage == MESA_SHADER_VERTEX || stage == MESA_SHADER_COMPUTE); emit(MOV(component(sources[0], 7), - brw_imm_ud(0x)))-force_writemask_all = true; + fs_reg(0x)))-force_writemask_all = true; } /* Set the surface read offset. */ -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] nir: Compilation error in nir/nir_to_ssa.c
On Fri, Feb 20, 2015 at 1:23 PM, Dieter Nützel die...@nuetzel-hh.de wrote: make[3]: Entering directory '/opt/mesa/src/glsl' CC nir/nir_to_ssa.lo In file included from nir/../glsl_types.h:30:0, from nir/nir_types.h:32, from nir/nir.h:36, from nir/nir_to_ssa.c:28: nir/nir_to_ssa.c: In function 'rewrite_def_forwards': nir/nir_to_ssa.c:226:60: error: 'reg_state' has no member named 'stack_size' assert(state-states[index].index state-states[index].stack_size); ^ Makefile:1602: recipe for target 'nir/nir_to_ssa.lo' failed make[3]: *** [nir/nir_to_ssa.lo] Error 1 It's only defined for DEBUG. -Dieter Yep. There are two patches on the list to fix this. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] drivers/x11: add gallium include dirs to Makefile.am
On Fri, Feb 20, 2015 at 12:20 PM, Brian Paul bri...@vmware.com wrote: Fixes xlib driver build after e8c5cbfd921680c. --- src/mesa/drivers/x11/Makefile.am | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/drivers/x11/Makefile.am b/src/mesa/drivers/x11/Makefile.am index 76a7cd4..c0596f8 100644 --- a/src/mesa/drivers/x11/Makefile.am +++ b/src/mesa/drivers/x11/Makefile.am @@ -30,6 +30,8 @@ AM_CPPFLAGS = \ -I$(top_srcdir)/src/mapi \ -I$(top_srcdir)/src/mesa \ -I$(top_srcdir)/src \ + -I$(top_srcdir)/src/gallium/include \ + -I$(top_srcdir)/src/gallium/auxiliary \ I really dislike including these directories in a bunch of non-Gallium places. I should have said so more strongly when the patch went by. I'm not sure whether I want to spend my time to fix this the right way or what. I guess just go ahead and commit this until something better comes along. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] i965/fs/nir: Optimize (gl_FrontFacing ? x : y) where x and y are ±1.0.
On Fri, Feb 20, 2015 at 1:41 PM, Eric Anholt e...@anholt.net wrote: Or maybe I'm just wrong and some bit is guaranteed to be set? A This negation looks like it's safe in practice, because bits 0:4 will surely be TRIANGLES comment seems fine with me. Thanks, will do. R-b? I realized I was looking at Vol4/Part1 which described the render target write message header, which much the same description but not the actual incoming thread dispatch payload. Vol2/Part1 has the info I want. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] i965/skl: Fix the qpitch value
On Skylake the qpitch value is uploaded as part of the surface state so we don't need to add the extra rows that are done for other generations. However for 3D textures it needs to be aligned to the tile height. Unlike previous generations the qpitch is measured as a multiple of the block size for compressed surfaces. When the horizontal mipmap layout is used for 1D textures then the qpitch is measured in pixels instead of rows. --- src/mesa/drivers/dri/i965/brw_tex_layout.c| 44 +-- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 10 -- 2 files changed, 41 insertions(+), 13 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c index 851742b..d89a04e 100644 --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c @@ -297,24 +297,48 @@ static void brw_miptree_layout_texture_array(struct brw_context *brw, struct intel_mipmap_tree *mt) { - int h0, h1; unsigned height = mt-physical_height0; bool layout_1d = use_linear_1d_layout(brw, mt); - - h0 = ALIGN(mt-physical_height0, mt-align_h); - h1 = ALIGN(minify(mt-physical_height0, 1), mt-align_h); - if (mt-array_layout == ALL_SLICES_AT_EACH_LOD) - mt-qpitch = h0; - else - mt-qpitch = (h0 + h1 + (brw-gen = 7 ? 12 : 11) * mt-align_h); - - int physical_qpitch = mt-compressed ? mt-qpitch / 4 : mt-qpitch; + int physical_qpitch; if (layout_1d) gen9_miptree_layout_1d(mt); else brw_miptree_layout_2d(mt); + if (layout_1d) { + physical_qpitch = mt-align_h; + /* When using the horizontal layout the qpitch is measured in pixels */ + mt-qpitch = physical_qpitch * mt-total_width; + } else if (brw-gen = 9) { + /* On Gen9 we can pick whatever qpitch we like as long as it's aligned + * to the vertical alignment so we don't need to add any extra rows. + */ + mt-qpitch = mt-total_height; + + /* However 3D textures need to be aligned to the tile height. At this + * point we don't know which tiling will be used so let's just align it + * to 32 + */ + if (mt-target == GL_TEXTURE_3D) + mt-qpitch = ALIGN(mt-qpitch, 32); + + /* Unlike previous generations the qpitch is now a multiple of the + * compressed block size so physical_qpitch matches mt-qpitch. + */ + physical_qpitch = mt-qpitch; + } else { + int h0 = ALIGN(mt-physical_height0, mt-align_h); + int h1 = ALIGN(minify(mt-physical_height0, 1), mt-align_h); + + if (mt-array_layout == ALL_SLICES_AT_EACH_LOD) + mt-qpitch = h0; + else + mt-qpitch = (h0 + h1 + (brw-gen = 7 ? 12 : 11) * mt-align_h); + + physical_qpitch = mt-compressed ? mt-qpitch / 4 : mt-qpitch; + } + for (unsigned level = mt-first_level; level = mt-last_level; level++) { unsigned img_height; img_height = ALIGN(height, mt-align_h); diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index ee9cf1e..247e5b8 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -380,10 +380,14 @@ struct intel_mipmap_tree enum miptree_array_layout array_layout; /** -* The distance in rows between array slices in an uncompressed surface. +* The distance in between array slices. * -* For compressed surfaces, slices are stored closer together physically; -* the real distance is (qpitch / block height). +* The value is the one that is sent in the surface state. The actual +* meaning depends on certain criteria. Usually it is simply the number of +* uncompressed rows between each slice. However on Gen9+ for compressed +* surfaces it is the number of blocks. For 1D surfaces that have the +* mipmap tree stored horizontally it is the number of pixels between each +* slice. */ uint32_t qpitch; -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] mesa: Have configure define NDEBUG, not mtypes.h.
On Fri, Feb 20, 2015 at 12:44 PM, Matt Turner matts...@gmail.com wrote: mtypes.h had been defining NDEBUG (used by assert) if DEBUG was not defined. Confusing and bizarre that you don't get NDEBUG if you don't include mtypes.h. ... which is just what happened in commit bef38f62e. Let's let configure define this for us if not using --enable-debug. --- Scons will probably need similar treatment. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 23/23] main: Cosmetic changes to GetBufferSubData.
On 20/02/2015 19:58, Laura Ekstrand wrote: Ian has asked that this not be squashed. In fact, they were the same commit last time around on the ML. Ok. Sorry for the noise then :s On Fri, Feb 20, 2015 at 12:29 AM, Martin Peres martin.pe...@free.fr mailto:martin.pe...@free.fr wrote: Please squash this commit with the one introducing GetBufferSubData. It should be pretty easy to do with git rebase -i :) On 12/02/2015 04:06, Laura Ekstrand wrote: --- src/mesa/main/bufferobj.c | 2 +- src/mesa/main/bufferobj.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 0272704..38d8b5a 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -1668,7 +1668,7 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, } ASSERT(ctx-Driver.GetBufferSubData); - ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); + ctx-Driver.GetBufferSubData(ctx, offset, size, data, bufObj); } void GLAPIENTRY diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index feeaa8b..b5d73ae 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -230,8 +230,8 @@ _mesa_NamedBufferSubData(GLuint buffer, GLintptr offset, GLsizeiptr size, const GLvoid *data); void GLAPIENTRY -_mesa_GetBufferSubData(GLenum target, GLintptrARB offset, - GLsizeiptrARB size, void * data); +_mesa_GetBufferSubData(GLenum target, GLintptr offset, + GLsizeiptr size, GLvoid *data); void GLAPIENTRY _mesa_GetNamedBufferSubData(GLuint buffer, GLintptr offset, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] nir: Compilation error in nir/nir_to_ssa.c
make[3]: Entering directory '/opt/mesa/src/glsl' CC nir/nir_to_ssa.lo In file included from nir/../glsl_types.h:30:0, from nir/nir_types.h:32, from nir/nir.h:36, from nir/nir_to_ssa.c:28: nir/nir_to_ssa.c: In function 'rewrite_def_forwards': nir/nir_to_ssa.c:226:60: error: 'reg_state' has no member named 'stack_size' assert(state-states[index].index state-states[index].stack_size); ^ Makefile:1602: recipe for target 'nir/nir_to_ssa.lo' failed make[3]: *** [nir/nir_to_ssa.lo] Error 1 It's only defined for DEBUG. -Dieter ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 89245] [bisected] Drop dependency on mtypes.h for core NIR patch broke Gallium builds
https://bugs.freedesktop.org/show_bug.cgi?id=89245 Bug ID: 89245 Summary: [bisected] Drop dependency on mtypes.h for core NIR patch broke Gallium builds Product: Mesa Version: git Hardware: x86-64 (AMD64) OS: All Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: b.bel...@gmail.com QA Contact: mesa-dev@lists.freedesktop.org CC: e...@anholt.net The patch nir: Drop dependency on mtypes.h for core NIR. (commit bef38f62e026) broke my Gallium (r600g) build. I build with: ./autogen.sh --with-gallium-drivers=r600 --with-dri-drivers= --enable-texture-float --disable-dri3 --disable-r600-llvm-compiler --disable-gallium-llvm CFLAGS=-O2 -march=native CXXFLAGS=-O2 -march=native --libdir=/usr/local/lib64 --prefix=/usr/local Here is the error: [...] CC nir/nir_split_var_copies.lo CC nir/nir_to_ssa.lo CC nir/nir_validate.lo CC nir/nir_worklist.lo In file included from nir/../glsl_types.h:30:0, from nir/nir_types.h:32, from nir/nir.h:36, from nir/nir_to_ssa.c:28: nir/nir_to_ssa.c: In function 'rewrite_def_forwards': nir/nir_to_ssa.c:226:60: error: 'reg_state' has no member named 'stack_size' assert(state-states[index].index state-states[index].stack_size); ^ Makefile:1602: recipe for target 'nir/nir_to_ssa.lo' failed make[3]: *** [nir/nir_to_ssa.lo] Error 1 make[3]: *** Attente des tâches non terminées make[3]: Leaving directory '/home/benjamin/MESA/mesa/src/glsl' Makefile:1209: recipe for target 'all' failed make[2]: *** [all] Error 2 make[2]: Leaving directory '/home/benjamin/MESA/mesa/src/glsl' Makefile:658: recipe for target 'all-recursive' failed make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory '/home/benjamin/MESA/mesa/src' Makefile:601: recipe for target 'all-recursive' failed make: *** [all-recursive] Error 1 -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate data
On Fri, Feb 20, 2015 at 1:14 PM, Jordan Justen jordan.l.jus...@intel.com wrote: The brw_imm_ud will yield a HW_REG which then will introduce a barrier for certain optimization opportunities. No piglit regressions seen with gen8 (simd8vs). Suggested-by: Matt Turner matts...@gmail.com Signed-off-by: Jordan Justen jordan.l.jus...@intel.com Cc: Matt Turner matts...@gmail.com By the way, there are some more in gen6_gs_visitor.cpp if you'd like to clean those up as well. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/5] nir: Allow nir_opt_algebraic to see booleanness through , ||, !.
On Fri, Feb 20, 2015 at 3:04 PM, Eric Anholt e...@anholt.net wrote: We have some useful optimizations to drop things like 'ine a, 0' on a boolean argument, but if 'a' came from logical operations on bools, it couldn't tell. These kinds of constructs appear as a result of TGSI-NIR quite frequently (at least with if flattening), so being a little more aggressive in detecting booleans can pay off. vc4 results: total instructions in shared programs: 40207 - 39881 (-0.81%) instructions in affected programs: 6677 - 6351 (-4.88%) --- src/glsl/nir/nir_search.c | 29 - 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 4671931..d12b6ab 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -39,6 +39,32 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, static const uint8_t identity_swizzle[] = { 0, 1, 2, 3 }; +static bool alu_instr_is_bool(nir_alu_instr *instr); + +static bool +src_is_bool(nir_src src) +{ + if (!src.is_ssa) + return false; + if (src.ssa-parent_instr-type != nir_instr_type_alu) + return false; + return alu_instr_is_bool((nir_alu_instr *)src.ssa-parent_instr); +} + +static bool +alu_instr_is_bool(nir_alu_instr *instr) +{ + switch (instr-op) { + case nir_op_iand: + case nir_op_ior: Add nir_op_ixor here? + return src_is_bool(instr-src[0].src) src_is_bool(instr-src[1].src); + case nir_op_inot: + return src_is_bool(instr-src[0].src); + default: + return nir_op_infos[instr-op].output_type == nir_type_bool; + } +} + static bool match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src, unsigned num_components, const uint8_t *swizzle, @@ -89,7 +115,8 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src, nir_alu_instr *src_alu = nir_instr_as_alu(instr-src[src].src.ssa-parent_instr); -if (nir_op_infos[src_alu-op].output_type != var-type) +if (nir_op_infos[src_alu-op].output_type != var-type +!(var-type == nir_type_bool alu_instr_is_bool(src_alu))) return false; } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 89245] [bisected] Drop dependency on mtypes.h for core NIR patch broke Gallium builds
https://bugs.freedesktop.org/show_bug.cgi?id=89245 Benjamin Bellec b.bel...@gmail.com changed: What|Removed |Added Status|NEW |RESOLVED CC||b.bel...@gmail.com Resolution|--- |FIXED --- Comment #1 from Benjamin Bellec b.bel...@gmail.com --- Fixed by commit b6393d70402a60c124c1884d8d0cc1dc6a9b4ca5 and/or b6393d70402a60c124c1884d8d0cc1dc6a9b4ca5 -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/7] i965: Introduce the BROADCAST pseudo-opcode.
I want to be sure I understand what this series is doing and why it's necessary. Sampler arrays and UBO arrays can, with some restrictions, be indexed dynamically. The primary restriction is that any active SIMD channels that access the array must access the same index. When you get to a particular access, not all SIMD channels may be active. This is the essence of what the GLSL spec calls dynamically uniform. The implication is that the compiler is allowed to pick the index from any active SIMD channel. If they're not all the same, the shader author should have followed the rules. Currently i965 picks the index from a SIMD channel that may or may not be active (due to other, non-uniform flow control). Correct? This patch series attempts to fix that by first introducing some opcodes to get a value from one of the active SIMD channels, then using those opcodes to pick an array index from one of the active SIMD channels. Correct? On 02/20/2015 11:48 AM, Francisco Jerez wrote: The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. --- src/mesa/drivers/dri/i965/brw_defines.h | 6 ++ src/mesa/drivers/dri/i965/brw_eu.h | 6 ++ src/mesa/drivers/dri/i965/brw_eu_emit.c | 77 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 4 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 3 + src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 ++ 6 files changed, 100 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 17c27dd..d4930e3 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -911,6 +911,12 @@ enum opcode { SHADER_OPCODE_URB_WRITE_SIMD8, + /** +* Pick the channel from its first source register given by the index +* specified as second source. Useful for variable indexing of surfaces. +*/ + SHADER_OPCODE_BROADCAST, + VEC4_OPCODE_MOV_BYTES, VEC4_OPCODE_PACK_BYTES, VEC4_OPCODE_UNPACK_UNIFORM, diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index a94ea42..2505480 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -413,6 +413,12 @@ brw_pixel_interpolator_query(struct brw_compile *p, unsigned msg_length, unsigned response_length); +void +brw_broadcast(struct brw_compile *p, + struct brw_reg dst, + struct brw_reg src, + struct brw_reg idx); + /*** * brw_eu_util.c: */ diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 1d6fd67..d7e3995 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2854,6 +2854,83 @@ brw_pixel_interpolator_query(struct brw_compile *p, brw_inst_set_pi_message_data(brw, insn, data); } +void +brw_broadcast(struct brw_compile *p, + struct brw_reg dst, + struct brw_reg src, + struct brw_reg idx) +{ + const struct brw_context *brw = p-brw; + const bool align1 = (brw_inst_access_mode(brw, p-current) == BRW_ALIGN_1); + brw_inst *inst; + + assert(src.file == BRW_GENERAL_REGISTER_FILE + src.address_mode == BRW_ADDRESS_DIRECT); + + if ((src.vstride == 0 (src.hstride == 0 || !align1)) || + idx.file == BRW_IMMEDIATE_VALUE) { + /* Trivial, the source is already uniform or the index is a constant. + * We will typically not get here if the optimizer is doing its job, but + * asserting would be mean. + */ + const unsigned i = (idx.file == BRW_IMMEDIATE_VALUE ? idx.dw1.ud : 0); + brw_MOV(p, dst, + (align1 ? stride(suboffset(src, i), 0, 1, 0) : + stride(suboffset(src, 4 * i), 0, 4, 1))); + + } else { + if (align1) { + const struct brw_reg addr = +retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD); + const unsigned offset = src.nr * REG_SIZE + src.subnr; + /* Limit in bytes of the signed indirect addressing immediate. */ + const unsigned limit = 512; + + brw_push_insn_state(p); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + brw_set_default_predicate_control(p, BRW_PREDICATE_NONE); + + /* Take into account the component size and horizontal stride. */ + assert(src.vstride == src.hstride + src.width); + brw_SHL(p, addr, vec1(idx), + brw_imm_ud(_mesa_logbase2(type_sz(src.type)) + +
Re: [Mesa-dev] [PATCH 06/16] main: Added entry point for glTransformFeedbackBufferRange
On Mon, Feb 16, 2015 at 6:13 AM, Martin Peres martin.pe...@linux.intel.com wrote: v2: review from Laura Ekstrand - use the refactored code to lookup the objects - improve some error messages - factor out the gl method name computation - better handle the spec differences between the DSA and non-DSA cases - quote the spec a little more v3: review from Laura Ekstrand - use the new name of _mesa_lookup_bufferobj_err - swap the comments around the offset and size checks v4: review from Laura Ekstrand - add more spec quotes - properly fix the comments around the offset and size checks Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 8 +++ src/mesa/main/bufferobj.c | 3 +- src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 98 +- src/mesa/main/transformfeedback.h | 6 +- 5 files changed, 97 insertions(+), 19 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 35d6906..b3c090f 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -20,6 +20,14 @@ param name=buffer type=GLuint / /function + function name=TransformFeedbackBufferRange offset=assign + param name=xfb type=GLuint / + param name=index type=GLuint / + param name=buffer type=GLuint / + param name=offset type=GLintptr / + param name=size type=GLsizeiptr / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 86532ea..7558e17 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -3548,7 +3548,8 @@ _mesa_BindBufferRange(GLenum target, GLuint index, case GL_TRANSFORM_FEEDBACK_BUFFER: _mesa_bind_buffer_range_transform_feedback(ctx, ctx-TransformFeedback.CurrentObject, - index, bufObj, offset, size); + index, bufObj, offset, size, + false); return; case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 183755f..87f7d6f 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -957,6 +957,7 @@ const struct function gl_core_functions_possible[] = { /* GL_ARB_direct_state_access */ { glCreateTransformFeedbacks, 45, -1 }, { glTransformFeedbackBufferBase, 45, -1 }, + { glTransformFeedbackBufferRange, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index d932943..2dded21 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -541,7 +541,8 @@ bind_buffer_range(struct gl_context *ctx, /** * Specify a buffer object to receive transform feedback results. Plus, * specify the starting offset to place the results, and max size. - * Called from the glBindBufferRange() function. + * Called from the glBindBufferRange() and glTransformFeedbackBufferRange + * functions. */ void _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, @@ -549,35 +550,74 @@ _mesa_bind_buffer_range_transform_feedback(struct gl_context *ctx, GLuint index, struct gl_buffer_object *bufObj, GLintptr offset, - GLsizeiptr size) + GLsizeiptr size, + bool dsa) { + const char *gl_methd_name; + if (dsa) + gl_methd_name = glTransformFeedbackBufferRange; + else + gl_methd_name = glBindBufferRange; + + if (obj-Active) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glBindBufferRange(transform feedback active)); + _mesa_error(ctx, GL_INVALID_OPERATION, %s(transform feedback active), + gl_methd_name); return; } if (index = ctx-Const.MaxTransformFeedbackBuffers) { - _mesa_error(ctx, GL_INVALID_VALUE, glBindBufferRange(index=%d - out of bounds), index); + /* OpenGL 4.5 core profile, 6.1, pdf page 82: An INVALID_VALUE error is + * generated if index is greater than or equal to the number of binding + * points for transform feedback, as described in section 6.7.1. Again, would be
Re: [Mesa-dev] [PATCH 2/4] common: Correct PBO 2D_ARRAY handling.
This is mostly correct and it's a good solution. The only problem is that it doesn't handle the skipRows packing property properly. This property tells the driver the stride (in rows) between image planes in the data. Most of the places where depth is used below, we should be using the stride (in rows) between images. Look at the _mesa_get_image_stride familiy of functions to see exactly how to handle this. On Fri, Feb 20, 2015 at 4:30 PM, Laura Ekstrand la...@jlekstrand.net wrote: Changes PBO uploads and downloads to use a tall (height * depth) 2D texture for blitting. This fixes the bug where 2D_ARRAY, 3D, and CUBE_MAP_ARRAY textures are not properly uploaded and downloaded. --- src/mesa/drivers/common/meta_tex_subimage.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index f4f7716..ee3295b 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -110,7 +110,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, internal_format = _mesa_get_format_base_format(pbo_format); tex_image = _mesa_get_tex_image(ctx, tex_obj, tex_obj-Target, 0); - _mesa_init_teximage_fields(ctx, tex_image, width, height, depth, + _mesa_init_teximage_fields(ctx, tex_image, width, height * depth, 1, 0, internal_format, pbo_format); read_only = pbo_target == GL_PIXEL_UNPACK_BUFFER; @@ -227,7 +227,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_update_state(ctx); _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, - 0, 0, width, height, + 0, z * height, width, (z + 1) * height, xoffset, yoffset, xoffset + width, yoffset + height, GL_COLOR_BUFFER_BIT, GL_NEAREST); @@ -349,7 +349,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, xoffset, yoffset, xoffset + width, yoffset + height, - 0, 0, width, height, + 0, z * height, width, (z + 1) * height, GL_COLOR_BUFFER_BIT, GL_NEAREST); } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/7] i965: Introduce the BROADCAST pseudo-opcode.
Ian Romanick i...@freedesktop.org writes: I want to be sure I understand what this series is doing and why it's necessary. Sampler arrays and UBO arrays can, with some restrictions, be indexed dynamically. The primary restriction is that any active SIMD channels that access the array must access the same index. When you get to a particular access, not all SIMD channels may be active. This is the essence of what the GLSL spec calls dynamically uniform. The implication is that the compiler is allowed to pick the index from any active SIMD channel. If they're not all the same, the shader author should have followed the rules. Currently i965 picks the index from a SIMD channel that may or may not be active (due to other, non-uniform flow control). Correct? Yeah, the problem is that right now we incorrectly assume that the first channel of the indexing expression is always live, if it's not we end up using garbage for the binding table index. This patch series attempts to fix that by first introducing some opcodes to get a value from one of the active SIMD channels, then using those opcodes to pick an array index from one of the active SIMD channels. Correct? Yeah, that's right. FIND_LIVE_CHANNEL returns the index of an arbitrary live channel and BROADCAST uses indirect register addressing to fetch the value from that channel and write it to the destination. The same opcodes will be used for dynamically uniform indexing of image arrays too. On 02/20/2015 11:48 AM, Francisco Jerez wrote: The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. --- src/mesa/drivers/dri/i965/brw_defines.h | 6 ++ src/mesa/drivers/dri/i965/brw_eu.h | 6 ++ src/mesa/drivers/dri/i965/brw_eu_emit.c | 77 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 4 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 3 + src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 ++ 6 files changed, 100 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 17c27dd..d4930e3 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -911,6 +911,12 @@ enum opcode { SHADER_OPCODE_URB_WRITE_SIMD8, + /** +* Pick the channel from its first source register given by the index +* specified as second source. Useful for variable indexing of surfaces. +*/ + SHADER_OPCODE_BROADCAST, + VEC4_OPCODE_MOV_BYTES, VEC4_OPCODE_PACK_BYTES, VEC4_OPCODE_UNPACK_UNIFORM, diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index a94ea42..2505480 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -413,6 +413,12 @@ brw_pixel_interpolator_query(struct brw_compile *p, unsigned msg_length, unsigned response_length); +void +brw_broadcast(struct brw_compile *p, + struct brw_reg dst, + struct brw_reg src, + struct brw_reg idx); + /*** * brw_eu_util.c: */ diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 1d6fd67..d7e3995 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2854,6 +2854,83 @@ brw_pixel_interpolator_query(struct brw_compile *p, brw_inst_set_pi_message_data(brw, insn, data); } +void +brw_broadcast(struct brw_compile *p, + struct brw_reg dst, + struct brw_reg src, + struct brw_reg idx) +{ + const struct brw_context *brw = p-brw; + const bool align1 = (brw_inst_access_mode(brw, p-current) == BRW_ALIGN_1); + brw_inst *inst; + + assert(src.file == BRW_GENERAL_REGISTER_FILE + src.address_mode == BRW_ADDRESS_DIRECT); + + if ((src.vstride == 0 (src.hstride == 0 || !align1)) || + idx.file == BRW_IMMEDIATE_VALUE) { + /* Trivial, the source is already uniform or the index is a constant. + * We will typically not get here if the optimizer is doing its job, but + * asserting would be mean. + */ + const unsigned i = (idx.file == BRW_IMMEDIATE_VALUE ? idx.dw1.ud : 0); + brw_MOV(p, dst, + (align1 ? stride(suboffset(src, i), 0, 1, 0) : + stride(suboffset(src, 4 * i), 0, 4, 1))); + + } else { + if (align1) { + const struct brw_reg addr = +retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD); + const unsigned offset = src.nr * REG_SIZE + src.subnr;
Re: [Mesa-dev] [PATCH 05/16] main: Added entry point for glTransformFeedbackBufferBase
You still seem to be confused about the names of internal functions. If a function exists outside of its class, (ie. is extern-ed from a *.h file), it should have the _mesa in front. If a function exists purely inside of a *.c file, it should be static and not have _mesa. Since _mesa_lookup_transform_feedback_object in transformfeedback.h is used outside of transformfeedback.[c|h], it should have the _mesa in front and not be static. In this patch, you renamed it to lookup_transform_feedback_object, which is incorrect. On the other hand, the names that you gave to lookup_transform_feedback_object_err and lookup_transform_feedback_bufferobj_err are just fine because these functions are static and never called outside transformfeedback.c. On Mon, Feb 16, 2015 at 6:13 AM, Martin Peres martin.pe...@linux.intel.com wrote: v2: Review from Laura Ekstrand - give more helpful error messages - factor the lookup code for the xfb and objBuf - replace some already-existing tabs with spaces - add comments to explain the cases where xfb == 0 or buffer == 0 - fix the condition for binding the transform buffer or not v3: Review from Laura Ekstrand - rename _mesa_lookup_bufferobj_err to _mesa_lookup_transform_feedback_bufferobj_err and make it static to avoid a future conflict - make _mesa_lookup_transform_feedback_object_err static v4: Review from Laura Ekstrand - add the pdf page number when quoting the spec - rename some of the symbols to follow the public/private conventions Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 6 + src/mesa/main/bufferobj.c | 9 +- src/mesa/main/objectlabel.c| 2 +- src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 146 +++-- src/mesa/main/transformfeedback.h | 13 ++- src/mesa/vbo/vbo_exec_array.c | 8 +- 7 files changed, 139 insertions(+), 46 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 15b00c2..35d6906 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -14,6 +14,12 @@ param name=ids type=GLuint * / /function + function name=TransformFeedbackBufferBase offset=assign + param name=xfb type=GLuint / + param name=index type=GLuint / + param name=buffer type=GLuint / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 0c1ce98..86532ea 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -3546,8 +3546,9 @@ _mesa_BindBufferRange(GLenum target, GLuint index, switch (target) { case GL_TRANSFORM_FEEDBACK_BUFFER: - _mesa_bind_buffer_range_transform_feedback(ctx, index, bufObj, -offset, size); + _mesa_bind_buffer_range_transform_feedback(ctx, + ctx-TransformFeedback.CurrentObject, + index, bufObj, offset, size); return; case GL_UNIFORM_BUFFER: bind_buffer_range_uniform_buffer(ctx, index, bufObj, offset, size); @@ -3611,7 +3612,9 @@ _mesa_BindBufferBase(GLenum target, GLuint index, GLuint buffer) switch (target) { case GL_TRANSFORM_FEEDBACK_BUFFER: - _mesa_bind_buffer_base_transform_feedback(ctx, index, bufObj); + _mesa_bind_buffer_base_transform_feedback(ctx, + ctx-TransformFeedback.CurrentObject, +index, bufObj, false); return; case GL_UNIFORM_BUFFER: bind_buffer_base_uniform_buffer(ctx, index, bufObj); diff --git a/src/mesa/main/objectlabel.c b/src/mesa/main/objectlabel.c index 78df96b..19a7e59 100644 --- a/src/mesa/main/objectlabel.c +++ b/src/mesa/main/objectlabel.c @@ -170,7 +170,7 @@ get_label_pointer(struct gl_context *ctx, GLenum identifier, GLuint name, case GL_TRANSFORM_FEEDBACK: { struct gl_transform_feedback_object *tfo = -_mesa_lookup_transform_feedback_object(ctx, name); +lookup_transform_feedback_object(ctx, name); if (tfo) labelPtr = tfo-Label; } diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index bf320bf..183755f 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -956,6 +956,7 @@ const struct function gl_core_functions_possible[] = { /* GL_ARB_direct_state_access */ { glCreateTransformFeedbacks, 45, -1 }, + { glTransformFeedbackBufferBase, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1
Re: [Mesa-dev] [PATCH v5] mesa: use fi_type in vertex attribute code
On 02/19/2015 11:42 AM, marius.pre...@intel.com wrote: From: Marius Predut marius.pre...@intel.com For 32-bit builds, floating point operations use x86 FPU registers, not SSE registers. If we're actually storing an integer in a float variable, the value might get modified when written to memory. This patch changes the VBO code to use the fi_type (float/int union) to store/copy vertex attributes. Also, this can improve performance on x86 because moving floats with integer registers instead of FP registers is faster. Neil Roberts review: - include changes on all places that are storing attribute values. - check with and without -O3 compiler flag. Brian Paul review: - use fi_type type instead gl_constant_value type - fix a bunch of nit-picks. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82668 Signed-off-by: Marius Predut marius.pre...@intel.com Looks good. Thanks. Unfortunately, there's a few new compiler warnings after applying the patch. I'll try to fix those up before I push the patch. I'll also do some piglit testing... -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] glCompressedTex(Sub)Image(2,3) on mapped PBOs
Hello, While working on the following dEQP failing tests I ran into an issue that I think deserves clarification from more experienced people: dEQP-GLES3.functional.negative_api.texture.compressedteximage2d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage2d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d_invalid_buffer_target All above tests fail because a CompressedTex(Sub)Image(2,3)D operation is performed while there is a pixel buffer object bound to GL_PIXEL_UNPACK_BUFFER and that pixel buffer object is currently mapped. The tests expect this situation to throw an GL_INVALID_OPERATION but no error is returned on i965. In the spec GLES 3.1 spec, page 57, there is this: (and similar wording in GL 4.5 spec, page 76) 6.3.2 Effects of Mapping Buffers on Other GL Commands Any GL command which attempts to read from, write to, or change the state of a buffer object may generate an INVALID_OPERATION error if all or part of the buffer object is mapped. However, only commands which explicitly describe this error are required to do so. If an error is not generated, using such commands to perform invalid reads, writes, or state changes will have undefined results and may result in GL interruption or termination. There is no further restriction in gl(Compressed)Tex(Sub)Image(2,3)D that makes throwing an error mandatory, hence I suspect these tests might be wrong. However, I found that glTex(Sub)Image(2,3)D do emits an GL_INVALID_OPERATION error when executed while a mapped PBO is bound. But the check is performed in driver code (as opposed to API entry points). (Concretely, in i965/intel_tex_image.c::intelTexImage() and i965/intel_tex_subimage.c::intelTexSubImage() which both call common/meta_tex_subimage.c::_mesa_meta_pbo_TexSubImage()). So, the question is: Shouldn't these operations on compressed texture images have the same restriction as their non-compressed counter-parts? Or is there a reason why this restriction is enforced only on non-compressed images? Also, shouldn't this restriction be moved to common API entry points, or are there drivers that do write to PBOs while they are mapped? cheers, Eduardo ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mesa (master): tgsi: add support for flt64 constants
On 02/19/2015 03:53 PM, Dave Airlie wrote: Module: Mesa Branch: master Commit: fa43e0443e206740e219d45abefee65bdb2c3ecb URL: https://urldefense.proofpoint.com/v2/url?u=http-3A__cgit.freedesktop.org_mesa_mesa_commit_-3Fid-3Dfa43e0443e206740e219d45abefee65bdb2c3ecbd=AwIGaQc=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEsr=T0t4QG7chq2ZwJo6wilkFznRSFy-8uDKartPGbomVj8m=wliriAj2Oy285HDmqyh5ciHQpkhmeNqaYMAXn6TDVgos=QfqfqTGO4IS5Ib7ukIrfSIR3EGdcXECY38l_K3GlYloe= Author: Dave Airlie airl...@redhat.com Date: Wed Aug 27 09:56:14 2014 +1000 tgsi: add support for flt64 constants These act like flt32 except they take up two slots, and you can only add 2 x flt64 constants in one slot. The main reason they are different is we don't want to match half a flt64 constants against a flt32 constant in the matching code, we need to make sure we treat both parts of the flt64 as an single structure. Cleaned up printing/parsing by Ilia Mirkin imir...@alum.mit.edu Reviewed-by: Ilia Mirkin imir...@alum.mit.edu Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/auxiliary/tgsi/tgsi_dump.c |8 +++ src/gallium/auxiliary/tgsi/tgsi_parse.c|1 + src/gallium/auxiliary/tgsi/tgsi_strings.c |5 +- src/gallium/auxiliary/tgsi/tgsi_strings.h |2 +- src/gallium/auxiliary/tgsi/tgsi_text.c | 22 src/gallium/auxiliary/tgsi/tgsi_ureg.c | 75 ++-- src/gallium/auxiliary/tgsi/tgsi_ureg.h |5 ++ src/gallium/include/pipe/p_shader_tokens.h |1 + 8 files changed, 113 insertions(+), 6 deletions(-) [...] diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c index f965b01..5069d13 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_text.c +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c @@ -232,6 +232,24 @@ static boolean parse_float( const char **pcur, float *val ) return TRUE; } +static boolean parse_double( const char **pcur, uint32_t *val0, uint32_t *val1) +{ + const char *cur = *pcur; + union { + double dval; + uint32_t uval[2]; + } v; + + v.dval = strtod(cur, pcur); + if (*pcur == cur) + return FALSE; + + *val0 = v.uval[0]; + *val1 = v.uval[1]; + + return TRUE; +} Hi Dave, I'm seeing a new compiler warning in that code: tgsi/tgsi_text.c: In function 'parse_double': tgsi/tgsi_text.c:243:4: warning: passing argument 2 of 'strtod' from incompatible pointer type [enabled by default] v.dval = strtod(cur, pcur); ^ In file included from ../../../src/gallium/include/pipe/p_compiler.h:36:0, from ./os/os_misc.h:38, from ./util/u_debug.h:42, from tgsi/tgsi_text.c:28: /usr/include/stdlib.h:164:15: note: expected 'char ** restrict' but argument is of type 'const char **' extern double strtod (const char *__restrict __nptr, ^ Can you clean that up? gcc 4.8.2 -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Copy-propagate vecN operations that are actually moves
On Thu, Feb 19, 2015 at 11:03 PM, Matt Turner matts...@gmail.com wrote: On Thu, Feb 19, 2015 at 11:01 PM, Connor Abbott cwabbo...@gmail.com wrote: I agree with Ken that the regressions are small enough, and it seems they're mostly stuff we can prevent by being smarter when doing the sel peephole, so it seems like the cleanup that will probably help other passes is worth it. So, usually we do that as a preparatory patch. Why aren't we doing that here? Do what in a preparatory patch? Fix up the sel peephole to be able to handle if (foo) bar = baz;? Sure, I can put that patch together. NIR instruction counts is not the metric we care about. No, but cleaning things up means that we can do other optimizations better. Also, in each of those cases, the non-ssa NIR code was better it was just less cleanable by the backend. We need to work on that, but I don't think it's an indicator of a problem. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/7] mesa: Add gallium include dirs to more parts of the tree.
On 13/02/15 00:49, Eric Anholt wrote: Jose Fonseca jfons...@vmware.com writes: Thanks for doing this. I appreciate it. I have no objection with the series. I'm happy to see more reuse. We can always move things around later, and it will be much easier when then are less entangled/duplicated. We'll need to update SCons include paths too. If you have a git repos with your series that I can pull from I'll give it a go tomorrow and provide a patch for it. gallium-includes of my tree. Sorry it took this long to come back to on this. I was sick for a couple days then had to rush to catch up. I've tried to build your branch, and it looks like with your v2: Try to patch up the scons bits. update everything just works. Apart of one minor comment (in a seperate email), the series looks good: Reviewed-by: Jose Fonseca jfons...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/7] mesa: Make bitset.h not rely on Mesa-specific types and functions.
On 12/02/15 00:48, Eric Anholt wrote: Note that we can't use u_math.h's align() because it's a function instead of a macro, while BITSET_DECLARE needs a constant expression for nouveau's usage in global declarations. --- src/mesa/main/bitset.h | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/src/mesa/main/bitset.h b/src/mesa/main/bitset.h index 2558da4..6f9bed9 100644 --- a/src/mesa/main/bitset.h +++ b/src/mesa/main/bitset.h @@ -31,19 +31,18 @@ #ifndef BITSET_H #define BITSET_H -#include imports.h -#include macros.h +#include util/u_math.h / * generic bitset implementation */ -#define BITSET_WORD GLuint +#define BITSET_WORD unsigned int #define BITSET_WORDBITS (sizeof (BITSET_WORD) * 8) /* bitset declarations */ -#define BITSET_WORDS(bits) (ALIGN(bits, BITSET_WORDBITS) / BITSET_WORDBITS) +#define BITSET_WORDS(bits) ((bits + BITSET_WORDBITS - 1) / BITSET_WORDBITS) bits - (bits) just in case... #define BITSET_DECLARE(name, bits) BITSET_WORD name[BITSET_WORDS(bits)] /* bitset operations With it Reviewed-by: Jose Fonseca jfons...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] glCompressedTex(Sub)Image(2,3) on mapped PBOs
On Fri, Feb 20, 2015 at 8:13 AM, Eduardo Lima Mitev el...@igalia.com wrote: Hello, While working on the following dEQP failing tests I ran into an issue that I think deserves clarification from more experienced people: dEQP-GLES3.functional.negative_api.texture.compressedteximage2d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage2d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d_invalid_buffer_target All above tests fail because a CompressedTex(Sub)Image(2,3)D operation is performed while there is a pixel buffer object bound to GL_PIXEL_UNPACK_BUFFER and that pixel buffer object is currently mapped. The tests expect this situation to throw an GL_INVALID_OPERATION but no error is returned on i965. In the spec GLES 3.1 spec, page 57, there is this: (and similar wording in GL 4.5 spec, page 76) 6.3.2 Effects of Mapping Buffers on Other GL Commands Any GL command which attempts to read from, write to, or change the state of a buffer object may generate an INVALID_OPERATION error if all or part of the buffer object is mapped. However, only commands which explicitly describe this error are required to do so. If an error is not generated, using such commands to perform invalid reads, writes, or state changes will have undefined results and may result in GL interruption or termination. There is no further restriction in gl(Compressed)Tex(Sub)Image(2,3)D that makes throwing an error mandatory, hence I suspect these tests might be wrong. However, I found that glTex(Sub)Image(2,3)D do emits an GL_INVALID_OPERATION error when executed while a mapped PBO is bound. But the check is performed in driver code (as opposed to API entry points). (Concretely, in i965/intel_tex_image.c::intelTexImage() and i965/intel_tex_subimage.c::intelTexSubImage() which both call common/meta_tex_subimage.c::_mesa_meta_pbo_TexSubImage()). So, the question is: Shouldn't these operations on compressed texture images have the same restriction as their non-compressed counter-parts? Or is there a reason why this restriction is enforced only on non-compressed images? Also, shouldn't this restriction be moved to common API entry points, or are there drivers that do write to PBOs while they are mapped? IMHO, that's exactly what we need to do. I've been toying with writing a patch to do just that. Right now, we're leaving checking if the buffer is mapped and checking if it has enough size up to the device-specific driver backend. This is, in general, not how the rest of mesa does error-checking so it seems very odd. Also, it means that if a driver misses the check it causes a GL API failure not just a texture upload error. To make it worse, they all check it by calling the same function inside mesa/main so there's really no excuse. I haven't seen any good reason why we aren't just calling that in the error-checking section of the entry-point. --Jason cheers, Eduardo ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. Regarding the other suggestion of just requiring width == exec_size for immediates and uniforms, that seems pretty reasonable to me. I'd like to know what it will do to optimizations, but it seems ok initially. I'm still a bigger fan of just subclassing and stashing everything we need to know up-front. If we do it right, the only things that will actually need to know about the subclass are the function for creating a LOAD_PAYLOAD and lower_load_payloads. --Jason [1] http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html [2] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html [3] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html [4] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076098.html --Jason On Thu, Feb 19, 2015 at 1:53 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 1:25 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Thu, Feb 19, 2015 at 12:13 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 6, 2015 at 4:01 PM, Francisco Jerez curroje...@riseup.net wrote: Hey Matt, Matt Turner matts...@gmail.com writes: On Fri, Feb 6, 2015 at 6:42 AM, Francisco Jerez curroje...@riseup.net wrote: MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. --- The function only seems to test inst-dst.file == MRF. I don't see any code for handling MRF sources. What am I missing? That test is for handling MRF sources -- More precisely, it's collecting the writemask and half flags for MRF writes, which can only possibly be useful if we're going to use them later on to read something
Re: [Mesa-dev] [PATCH] nir: Copy-propagate vecN operations that are actually moves
On Fri, Feb 20, 2015 at 9:23 AM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 11:03 PM, Matt Turner matts...@gmail.com wrote: On Thu, Feb 19, 2015 at 11:01 PM, Connor Abbott cwabbo...@gmail.com wrote: I agree with Ken that the regressions are small enough, and it seems they're mostly stuff we can prevent by being smarter when doing the sel peephole, so it seems like the cleanup that will probably help other passes is worth it. So, usually we do that as a preparatory patch. Why aren't we doing that here? Do what in a preparatory patch? Fix up the sel peephole to be able to handle if (foo) bar = baz;? Sure, I can put that patch together. I thought that this would get turned into a conditional select as a side-effect of going out of SSA anyway (assuming baz was unconditionally set before the if)? Are these shaders not setting baz unconditionally? NIR instruction counts is not the metric we care about. No, but cleaning things up means that we can do other optimizations better. Also, in each of those cases, the non-ssa NIR code was better it was just less cleanable by the backend. We need to work on that, but I don't think it's an indicator of a problem. --Jason Okay, that seems fine. I wouldn't bother updating the SEL peephole. It sounds like a number of the regressions we have are related to it not handling something so we might need to do a larger evaluation. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] nir: Add a couple of simplifications of csel operations.
Aside from one minor suggestion for patch 2, this series is Reviewed-by: Connor Abbott cwabbo...@gmail.com On Fri, Feb 20, 2015 at 3:28 PM, Matt Turner matts...@gmail.com wrote: The series is Reviewed-by: Matt Turner matts...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 89260] macros.h:34:25: fatal error: util/u_math.h: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=89260 Bug ID: 89260 Summary: macros.h:34:25: fatal error: util/u_math.h: No such file or directory Product: Mesa Version: git Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: normal Priority: medium Component: Mesa core Assignee: mesa-dev@lists.freedesktop.org Reporter: fabio@libero.it QA Contact: mesa-dev@lists.freedesktop.org CC: e...@anholt.net After changes in the latest 12h osmesa fails to build: CC osmesa.lo In file included from ../../../../../../src/mesa/drivers/osmesa/osmesa.c:44:0: ../../../../../../src/mesa/main/macros.h:34:25: fatal error: util/u_math.h: No such file or directory #include util/u_math.h ^ compilation terminated. Full log including configure options here: https://launchpadlibrarian.net/198345895/buildlog_ubuntu-vivid-i386.mesa_10.6~git1502210730.9fe818~gd~v_BUILDING.txt.gz -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/skl: Use 1 register for uniform pull constant payload
On Fri, Feb 20, 2015 at 03:34:21AM -0800, Kenneth Graunke wrote: On Thursday, February 19, 2015 10:48:08 PM Ben Widawsky wrote: When under dispatch_width=16 the previous code would allocate 2 registers for the payload when only one is needed. This manifested itself through bugs on SKL which needs to mess with this instruction. Ken though this might impact shader-db, but apparently it doesn't Cc: Kenneth Graunke kenn...@whitecape.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999 Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c46e1d7..24125cc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3062,7 +3062,7 @@ fs_visitor::lower_uniform_pull_constant_loads() assert(const_offset_reg.file == IMM const_offset_reg.type == BRW_REGISTER_TYPE_UD); const_offset_reg.fixed_hw_reg.dw1.ud /= 4; - fs_reg payload = vgrf(glsl_type::uint_type); + fs_reg payload = fs_reg(GRF, alloc.allocate(1)); /* We have to use a message header on Skylake to get SIMD4x2 mode. * Reserve space for the register. Reviewed-by: Kenneth Graunke kenn...@whitecape.org Given that this seems to be required to get SKL stable, do I want to Cc 1.5 too? -- Ben Widawsky, Intel Open Source Technology Center ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/16] main: fix the validation of the number of samples
Please provide a page number and a section title in your spec comment. Thanks. On Mon, Feb 16, 2015 at 6:13 AM, Martin Peres martin.pe...@linux.intel.com wrote: Maybe this should be the job of the dispatch layer. Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mesa/main/multisample.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c index 1f3fa0c..a0a659b 100644 --- a/src/mesa/main/multisample.c +++ b/src/mesa/main/multisample.c @@ -150,6 +150,15 @@ GLenum _mesa_check_sample_count(struct gl_context *ctx, GLenum target, GLenum internalFormat, GLsizei samples) { + /* From the OpenGL core 3.0 spec, section 2.5: +* +* If a negative number is provided where an argument of type sizei or +* sizeiptr is specified, the error INVALID VALUE is generated. +*/ + if (samples 0) { + return GL_INVALID_VALUE; + } + /* If ARB_internalformat_query is supported, then treat its highest * returned sample count as the absolute maximum for this format; it is * allowed to exceed MAX_SAMPLES. -- 2.3.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Copy-propagate vecN operations that are actually moves
On Feb 20, 2015 9:27 AM, Matt Turner matts...@gmail.com wrote: On Fri, Feb 20, 2015 at 9:23 AM, Jason Ekstrand ja...@jlekstrand.net wrote: On Thu, Feb 19, 2015 at 11:03 PM, Matt Turner matts...@gmail.com wrote: On Thu, Feb 19, 2015 at 11:01 PM, Connor Abbott cwabbo...@gmail.com wrote: I agree with Ken that the regressions are small enough, and it seems they're mostly stuff we can prevent by being smarter when doing the sel peephole, so it seems like the cleanup that will probably help other passes is worth it. So, usually we do that as a preparatory patch. Why aren't we doing that here? Do what in a preparatory patch? Fix up the sel peephole to be able to handle if (foo) bar = baz;? Sure, I can put that patch together. I thought that this would get turned into a conditional select as a side-effect of going out of SSA anyway (assuming baz was unconditionally set before the if)? Are these shaders not setting baz unconditionally? The NIR peephole only triggers if the only instructions in the if or else are moves. In this particular case, it was a uniform load. Before the change, it was lowered to moves in both sides of the if and, since it ended up being a push constant, got peepholed. After the change, it was lowered to moves on only one side of the if (which is theoretically better) and the select didn't trigger. NIR instruction counts is not the metric we care about. No, but cleaning things up means that we can do other optimizations better. Also, in each of those cases, the non-ssa NIR code was better it was just less cleanable by the backend. We need to work on that, but I don't think it's an indicator of a problem. --Jason Okay, that seems fine. I wouldn't bother updating the SEL peephole. It sounds like a number of the regressions we have are related to it not handling something so we might need to do a larger evaluation. Yeah, it just needs to handle one-sided it's better. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/23] main: Refactor ClearBuffer[Sub]Data.
This naming convention tries to match the corresponding DD table entry. There's a thread discussing the naming convention for external software fallback functions. Feel free to add your input to the discussion there :) On Fri, Feb 20, 2015 at 6:18 AM, Martin Peres martin.pe...@free.fr wrote: On 12/02/2015 04:05, Laura Ekstrand wrote: v2: review by Jason Ekstrand - Split refactor of clear buffer sub data from addition of DSA entry points. --- src/mesa/main/bufferobj.c| 125 --- src/mesa/main/bufferobj.h| 19 ++-- src/mesa/state_tracker/st_cb_bufferobjects.c | 4 +- 3 files changed, 69 insertions(+), 79 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 7225b64..b8fa917 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -660,11 +660,11 @@ _mesa_buffer_get_subdata( struct gl_context *ctx, GLintptrARB offset, * dd_function_table::ClearBufferSubData. */ void -_mesa_buffer_clear_subdata(struct gl_context *ctx, - GLintptr offset, GLsizeiptr size, - const GLvoid *clearValue, - GLsizeiptr clearValueSize, - struct gl_buffer_object *bufObj) +_mesa_ClearBufferSubData_sw(struct gl_context *ctx, Interesting choice of naming the function as a mix of camel case and underscores. Since it is an internal function, shouldn't it only have underscores? Other than that: Reviewed-by: Martin Peres martin.pe...@linux.intel.com +GLintptr offset, GLsizeiptr size, +const GLvoid *clearValue, +GLsizeiptr clearValueSize, +struct gl_buffer_object *bufObj) { GLsizeiptr i; GLubyte *dest; @@ -1113,7 +1113,7 @@ _mesa_init_buffer_object_functions(struct dd_function_table *driver) driver-UnmapBuffer = _mesa_buffer_unmap; /* GL_ARB_clear_buffer_object */ - driver-ClearBufferSubData = _mesa_buffer_clear_subdata; + driver-ClearBufferSubData = _mesa_ClearBufferSubData_sw; /* GL_ARB_map_buffer_range */ driver-MapBufferRange = _mesa_buffer_map_range; @@ -1671,57 +1671,77 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); } - -void GLAPIENTRY -_mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, - GLenum type, const GLvoid* data) +/** + * \param subdata true if caller is *SubData, false if *Data + */ +void +_mesa_clear_buffer_sub_data(struct gl_context *ctx, +struct gl_buffer_object *bufObj, +GLenum internalformat, +GLintptr offset, GLsizeiptr size, +GLenum format, GLenum type, +const GLvoid *data, +const char *func, bool subdata) { - GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object* bufObj; mesa_format mesaFormat; GLubyte clearValue[MAX_PIXEL_BYTES]; GLsizeiptr clearValueSize; - bufObj = get_buffer(ctx, glClearBufferData, target, GL_INVALID_VALUE); - if (!bufObj) { - return; - } - - if (_mesa_check_disallowed_mapping(bufObj)) { - _mesa_error(ctx, GL_INVALID_OPERATION, - glClearBufferData(buffer currently mapped)); + /* This checks for disallowed mappings. */ + if (!buffer_object_subdata_range_good(ctx, bufObj, offset, size, + subdata, func)) { return; } mesaFormat = validate_clear_buffer_format(ctx, internalformat, - format, type, - glClearBufferData); + format, type, func); + if (mesaFormat == MESA_FORMAT_NONE) { return; } clearValueSize = _mesa_get_format_bytes(mesaFormat); - if (bufObj-Size % clearValueSize != 0) { + if (offset % clearValueSize != 0 || size % clearValueSize != 0) { _mesa_error(ctx, GL_INVALID_VALUE, - glClearBufferData(size is not a multiple of - internalformat size)); + %s(offset or size is not a multiple of + internalformat size), func); return; } if (data == NULL) { /* clear to zeros, per the spec */ - ctx-Driver.ClearBufferSubData(ctx, 0, bufObj-Size, - NULL, clearValueSize, bufObj); + if (size 0) { + ctx-Driver.ClearBufferSubData(ctx, offset, size, +NULL, clearValueSize, bufObj); + } return; } if
Re: [Mesa-dev] [PATCH] svga: add missing padding to SVGA3dSize
On Fri, Feb 20, 2015 at 09:22:20AM +, Van Der Wath, DanielX J wrote: From: Daniel van der Wath danielx.j.van.der.w...@intel.com The kernel side equivalent of struct SVGA3dSize (struct drm_vmw_size) has an extra padding word that SVGA3dSize lacks. This was causing data to be written past the end of size in vmw_drm_surface_from_handle(), corrupting other data The drm_vmw_* types are used to exchange data with the VMW DRM so none of the DRM IOCTL functions, e.g. drmCommandWriteRead(), should be using the SVGA3d* types. In vmw_drm_surface_from_handle(), size is of type struct drm_vmw_size, and is being used here: rep-size_addr = (unsigned long)size, to call drmCommandWriteRead(). So there shouldn't be a user/kernel mismatch here. At which point during this function do you see handle being over written? Also, which version of MESA are you seeing this on? I'll see if I can see this on my end. and in this case leading to Weston being unable to render anything on screen. --- src/gallium/drivers/svga/include/svga3d_types.h | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/svga/include/svga3d_types.h b/src/gallium/drivers/svga/include/svga3d_types.h index fc4a6b9..3ce6814 100644 --- a/src/gallium/drivers/svga/include/svga3d_types.h +++ b/src/gallium/drivers/svga/include/svga3d_types.h @@ -1280,6 +1280,7 @@ struct { uint32 width; uint32 height; uint32 depth; + uint32 pad64; } #include vmware_pack_end.h SVGA3dSize; -- 1.7.11.7 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 23/23] main: Cosmetic changes to GetBufferSubData.
Ian has asked that this not be squashed. In fact, they were the same commit last time around on the ML. On Fri, Feb 20, 2015 at 12:29 AM, Martin Peres martin.pe...@free.fr wrote: Please squash this commit with the one introducing GetBufferSubData. It should be pretty easy to do with git rebase -i :) On 12/02/2015 04:06, Laura Ekstrand wrote: --- src/mesa/main/bufferobj.c | 2 +- src/mesa/main/bufferobj.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 0272704..38d8b5a 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -1668,7 +1668,7 @@ _mesa_GetBufferSubData(GLenum target, GLintptr offset, } ASSERT(ctx-Driver.GetBufferSubData); - ctx-Driver.GetBufferSubData( ctx, offset, size, data, bufObj ); + ctx-Driver.GetBufferSubData(ctx, offset, size, data, bufObj); } void GLAPIENTRY diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index feeaa8b..b5d73ae 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -230,8 +230,8 @@ _mesa_NamedBufferSubData(GLuint buffer, GLintptr offset, GLsizeiptr size, const GLvoid *data); void GLAPIENTRY -_mesa_GetBufferSubData(GLenum target, GLintptrARB offset, - GLsizeiptrARB size, void * data); +_mesa_GetBufferSubData(GLenum target, GLintptr offset, + GLsizeiptr size, GLvoid *data); void GLAPIENTRY _mesa_GetNamedBufferSubData(GLuint buffer, GLintptr offset, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. A reasonable compromise might be to leave it up to the caller whether to set the force_writemask_all and force_sechalf flags or not. I.e. just copy the same flags set on the LOAD_PAYLOAD instruction to the MOV instructions. That would still allow reducing the liveness intervals in cases where we know that the payload respects channel boundaries. Tracking the flag information per-register in cases where the same payload has well- and ill-behaved values seems rather premature and rather useless to me because the optimizer is likely to be able to get rid of redundant copies with force_writemask_all -- register coalesce is doing this already AFAIK, maybe by accident. Regarding the other suggestion of just requiring width == exec_size for immediates and uniforms, that seems pretty reasonable to me. I'd like to know what it will do to optimizations, but it seems ok initially. I'm still a bigger fan of just subclassing and stashing everything we need to know up-front. If we do it right, the only things that will actually need to know about the subclass are the function for creating a LOAD_PAYLOAD and lower_load_payloads. --Jason [1] http://lists.freedesktop.org/archives/mesa-dev/2015-January/074614.html [2] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076094.html [3] http://lists.freedesktop.org/archives/mesa-dev/2015-February/076097.html [4]
Re: [Mesa-dev] [PATCH 04/16] main: Added entry point for glCreateTransformFeedbacks
Looks good to me Reviewed-by: Laura Ekstrand la...@jlekstrand.net On Mon, Feb 16, 2015 at 6:13 AM, Martin Peres martin.pe...@linux.intel.com wrote: v2: Review from Laura Ekstrand - generate the name of the gl method once - shorten some lines to stay in the 78 chars limit v3: Review from Fredrik Höglund fred...@kde.org - rename gl_mthd_name to func - set EverBound in _mesa_create_transform_feedbacks in the dsa case v4: - rename _mesa_create_transform_feedbacks to create_transform_feedbacks and make it static Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mapi/glapi/gen/ARB_direct_state_access.xml | 7 +++ src/mesa/main/tests/dispatch_sanity.cpp| 1 + src/mesa/main/transformfeedback.c | 67 -- src/mesa/main/transformfeedback.h | 3 ++ 4 files changed, 63 insertions(+), 15 deletions(-) diff --git a/src/mapi/glapi/gen/ARB_direct_state_access.xml b/src/mapi/glapi/gen/ARB_direct_state_access.xml index 2fe1638..15b00c2 100644 --- a/src/mapi/glapi/gen/ARB_direct_state_access.xml +++ b/src/mapi/glapi/gen/ARB_direct_state_access.xml @@ -7,6 +7,13 @@ enum name=QUERY_TARGETvalue=0x82EA/ enum name=TEXTURE_BINDING value=0x82EB/ + !-- Transform Feedback object functions -- + + function name=CreateTransformFeedbacks offset=assign + param name=n type=GLsizei / + param name=ids type=GLuint * / + /function + !-- Texture object functions -- function name=CreateTextures offset=assign diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp index 1f1a3a8..bf320bf 100644 --- a/src/mesa/main/tests/dispatch_sanity.cpp +++ b/src/mesa/main/tests/dispatch_sanity.cpp @@ -955,6 +955,7 @@ const struct function gl_core_functions_possible[] = { { glClipControl, 45, -1 }, /* GL_ARB_direct_state_access */ + { glCreateTransformFeedbacks, 45, -1 }, { glCreateTextures, 45, -1 }, { glTextureStorage1D, 45, -1 }, { glTextureStorage2D, 45, -1 }, diff --git a/src/mesa/main/transformfeedback.c b/src/mesa/main/transformfeedback.c index a737463..10bb6a1 100644 --- a/src/mesa/main/transformfeedback.c +++ b/src/mesa/main/transformfeedback.c @@ -825,25 +825,24 @@ _mesa_lookup_transform_feedback_object(struct gl_context *ctx, GLuint name) _mesa_HashLookup(ctx-TransformFeedback.Objects, name); } - -/** - * Create new transform feedback objects. Transform feedback objects - * encapsulate the state related to transform feedback to allow quickly - * switching state (and drawing the results, below). - * Part of GL_ARB_transform_feedback2. - */ -void GLAPIENTRY -_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +static void +create_transform_feedbacks(struct gl_context *ctx, GLsizei n, GLuint *ids, + bool dsa) { GLuint first; - GET_CURRENT_CONTEXT(ctx); + const char* func; + + if (dsa) + func = glCreateTransformFeedbacks; + else + func = glGenTransformFeedbacks; if (n 0) { - _mesa_error(ctx, GL_INVALID_VALUE, glGenTransformFeedbacks(n 0)); + _mesa_error(ctx, GL_INVALID_VALUE, %s(n 0), func); return; } - if (!names) + if (!ids) return; /* we don't need contiguous IDs, but this might be faster */ @@ -854,18 +853,56 @@ _mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) struct gl_transform_feedback_object *obj = ctx-Driver.NewTransformFeedback(ctx, first + i); if (!obj) { -_mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); +_mesa_error(ctx, GL_OUT_OF_MEMORY, %s, func); return; } - names[i] = first + i; + ids[i] = first + i; _mesa_HashInsert(ctx-TransformFeedback.Objects, first + i, obj); + if (dsa) { +/* this is normally done at bind time in the non-dsa case */ +obj-EverBound = GL_TRUE; + } } } else { - _mesa_error(ctx, GL_OUT_OF_MEMORY, glGenTransformFeedbacks); + _mesa_error(ctx, GL_OUT_OF_MEMORY, %s, func); } } +/** + * Create new transform feedback objects. Transform feedback objects + * encapsulate the state related to transform feedback to allow quickly + * switching state (and drawing the results, below). + * Part of GL_ARB_transform_feedback2. + */ +void GLAPIENTRY +_mesa_GenTransformFeedbacks(GLsizei n, GLuint *names) +{ + GET_CURRENT_CONTEXT(ctx); + + /* GenTransformFeedbacks should just reserve the object names that a +* subsequent call to BindTransformFeedback should actively create. For +* the sake of simplicity, we reserve the names and create the objects +* straight away. +*/ + + create_transform_feedbacks(ctx, n, names, false); +} + +/** + * Create new transform feedback objects. Transform feedback objects + *
Re: [Mesa-dev] [PATCH] i965/skl: Use 1 register for uniform pull constant payload
On Friday, February 20, 2015 10:33:18 AM Ben Widawsky wrote: On Fri, Feb 20, 2015 at 03:34:21AM -0800, Kenneth Graunke wrote: On Thursday, February 19, 2015 10:48:08 PM Ben Widawsky wrote: When under dispatch_width=16 the previous code would allocate 2 registers for the payload when only one is needed. This manifested itself through bugs on SKL which needs to mess with this instruction. Ken though this might impact shader-db, but apparently it doesn't Cc: Kenneth Graunke kenn...@whitecape.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999 Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index c46e1d7..24125cc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3062,7 +3062,7 @@ fs_visitor::lower_uniform_pull_constant_loads() assert(const_offset_reg.file == IMM const_offset_reg.type == BRW_REGISTER_TYPE_UD); const_offset_reg.fixed_hw_reg.dw1.ud /= 4; - fs_reg payload = vgrf(glsl_type::uint_type); + fs_reg payload = fs_reg(GRF, alloc.allocate(1)); /* We have to use a message header on Skylake to get SIMD4x2 mode. * Reserve space for the register. Reviewed-by: Kenneth Graunke kenn...@whitecape.org Given that this seems to be required to get SKL stable, do I want to Cc 1.5 too? That seems reasonable to me. It likely won't apply (or build), since I don't think 10.5 has alloc.allocate(). To be nice to Emil, I'm guessing the right thing to do is probably to cherry-pick it to 10.5, change alloc.allocate(1) to virtual_grf_alloc(1), and send that patch to mesa-sta...@lists.freedesktop.org marking it as a backport. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/16] main: fix the validation of the number of samples
On Mon, Feb 16, 2015 at 9:13 AM, Martin Peres martin.pe...@linux.intel.com wrote: Maybe this should be the job of the dispatch layer. Signed-off-by: Martin Peres martin.pe...@linux.intel.com --- src/mesa/main/multisample.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/mesa/main/multisample.c b/src/mesa/main/multisample.c index 1f3fa0c..a0a659b 100644 --- a/src/mesa/main/multisample.c +++ b/src/mesa/main/multisample.c @@ -150,6 +150,15 @@ GLenum _mesa_check_sample_count(struct gl_context *ctx, GLenum target, GLenum internalFormat, GLsizei samples) { + /* From the OpenGL core 3.0 spec, section 2.5: I think this is a bit of a contradiction... there is no OpenGL 3.0 core spec. Core didn't become a thing until 3.1 afaik. +* +* If a negative number is provided where an argument of type sizei or +* sizeiptr is specified, the error INVALID VALUE is generated. +*/ + if (samples 0) { + return GL_INVALID_VALUE; + } + /* If ARB_internalformat_query is supported, then treat its highest * returned sample count as the absolute maximum for this format; it is * allowed to exceed MAX_SAMPLES. -- 2.3.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber correctly). We'd rather not have to squash everything if we can. All that being said, our liveness/register allocation can't handle this and making register allocation handle parts of registers interfering but not other bits is going to be a real pain. Maybe not even worth it. If you'd rather force_writemask_all everything, I'll sign off on that for now. I just wanted to point out that it may not be a good permanent solution. A reasonable compromise might be to leave it up to the caller whether to set the force_writemask_all and force_sechalf flags or not. I.e. just copy the same flags set on the LOAD_PAYLOAD instruction to the MOV instructions. That would still allow reducing the liveness intervals in cases where we know that the payload respects channel boundaries. Tracking the flag information per-register in cases where the same payload has well- and ill-behaved values seems rather premature and rather useless to me because the optimizer is likely to be able to get rid of redundant copies with force_writemask_all -- register coalesce is
Re: [Mesa-dev] glCompressedTex(Sub)Image(2,3) on mapped PBOs
On 02/20/2015 08:13 AM, Eduardo Lima Mitev wrote: Hello, While working on the following dEQP failing tests I ran into an issue that I think deserves clarification from more experienced people: dEQP-GLES3.functional.negative_api.texture.compressedteximage2d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage2d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_buffer_target dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d_invalid_buffer_target All above tests fail because a CompressedTex(Sub)Image(2,3)D operation is performed while there is a pixel buffer object bound to GL_PIXEL_UNPACK_BUFFER and that pixel buffer object is currently mapped. The tests expect this situation to throw an GL_INVALID_OPERATION but no error is returned on i965. In the spec GLES 3.1 spec, page 57, there is this: (and similar wording in GL 4.5 spec, page 76) 6.3.2 Effects of Mapping Buffers on Other GL Commands Any GL command which attempts to read from, write to, or change the state of a buffer object may generate an INVALID_OPERATION error if all or part of the buffer object is mapped. However, only commands which explicitly describe this error are required to do so. If an error is not generated, using such commands to perform invalid reads, writes, or state changes will have undefined results and may result in GL interruption or termination. This language is intentionally loose. It is often perfectly safe to have a portion of a buffer mapped while a GL command operates on a different part of the buffer. Determining whether a particular operation is safe can be expensive. Imagine trying to determine whether a glDrawElements command will source data from a mapped region of a vertex buffer object. Many vendors objected to having to put expensive (or cheap) checks in performance critical paths. I don't think an of the TexSubImage commands are performance critical enough to warrant excluding the test. I believe some benchmarks use some of the TexSubImage commands, so it's probably worth measuring them on, say, Baytrail with and without the checks. My expectation is that there's enough CPU wasted in those paths that the checks will be lost in the noise. We won't know until we measure. There is no further restriction in gl(Compressed)Tex(Sub)Image(2,3)D that makes throwing an error mandatory, hence I suspect these tests might be wrong. I'm also inclined to think they are wrong... or at least overly strict. Do you know of any commands that require this error be generated? However, I found that glTex(Sub)Image(2,3)D do emits an GL_INVALID_OPERATION error when executed while a mapped PBO is bound. But the check is performed in driver code (as opposed to API entry points). (Concretely, in i965/intel_tex_image.c::intelTexImage() and i965/intel_tex_subimage.c::intelTexSubImage() which both call common/meta_tex_subimage.c::_mesa_meta_pbo_TexSubImage()). So, the question is: Shouldn't these operations on compressed texture images have the same restriction as their non-compressed counter-parts? Or is there a reason why this restriction is enforced only on non-compressed images? Also, shouldn't this restriction be moved to common API entry points, or are there drivers that do write to PBOs while they are mapped? Looking at that function, I find it odd that either of the errors it generates are generated there. We should either do these checks in a single, common place, or we should not do them at all. cheers, Eduardo ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/7] i965/fs: Less broken handling of force_writemask_all in lower_load_payload().
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Sat, Jan 17, 2015 at 6:04 PM, Francisco Jerez curroje...@riseup.net wrote: It's perfectly fine to read the second half of a register written with force_writemask_all from a first half MOV instruction or vice versa, and lower_load_payload shouldn't mark the whole MOV as belonging to the second half in that case. Replicate the same metadata to both halves of the destination when writemasking is disabled. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 4a61943..d585a67 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3059,9 +3059,11 @@ fs_visitor::lower_load_payload() } if (inst-dst.file == MRF || inst-dst.file == GRF) { - bool force_sechalf = inst-force_sechalf; + bool force_sechalf = inst-force_sechalf + !inst-force_writemask_all; bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4; + type_sz(inst-dst.type) == 4 + !inst-force_writemask_all; for (int i = 0; i inst-regs_written; ++i) { metadata[dst_reg + i].written = true; metadata[dst_reg + i].force_sechalf = force_sechalf; @@ -3104,11 +3106,15 @@ fs_visitor::lower_load_payload() mov-force_writemask_all = metadata[src_reg].force_writemask_all; metadata[dst_reg] = metadata[src_reg]; if (dst.width * type_sz(dst.type) 32) { - assert((!metadata[src_reg].written || - !metadata[src_reg].force_sechalf) -(!metadata[src_reg + 1].written || - metadata[src_reg + 1].force_sechalf)); - metadata[dst_reg + 1] = metadata[src_reg + 1]; + if (metadata[src_reg].force_writemask_all) { +metadata[dst_reg + 1] = metadata[src_reg]; + } else { +assert((!metadata[src_reg].written || +!metadata[src_reg].force_sechalf) + (!metadata[src_reg + 1].written || +metadata[src_reg + 1].force_sechalf)); +metadata[dst_reg + 1] = metadata[src_reg + 1]; + } } } else { metadata[dst_reg].force_writemask_all = false; -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/32] i965/fs: Fix lower_load_payload() not to use an incorrect half for immediates and uniforms.
Yeah... More proof that the lower_load_payload code is plenty bogus... Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Fri, Feb 6, 2015 at 9:42 AM, Francisco Jerez curroje...@riseup.net wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 8 1 file changed, 8 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 8da1f47..e2ebf7e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3116,6 +3116,14 @@ fs_visitor::lower_load_payload() inst-src[i].reg_offset; mov-force_sechalf = metadata[src_reg].force_sechalf; mov-force_writemask_all = metadata[src_reg].force_writemask_all; + } else { + /* We don't have any useful metadata for immediates or + * uniforms. Assume that any of the channels of the + * destination may be used. + */ + assert(inst-src[i].file == IMM || + inst-src[i].file == UNIFORM); + mov-force_writemask_all = true; } if (dst.file == GRF) { -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] vbo: fix an unitialized-variable warning
From: Marek Olšák marek.ol...@amd.com It looks like a bug to me. Cc: 10.5 10.4 10.3 mesa-sta...@lists.freedesktop.org --- src/mesa/vbo/vbo_attrib_tmp.h | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/vbo/vbo_attrib_tmp.h b/src/mesa/vbo/vbo_attrib_tmp.h index ec66934..0c44540 100644 --- a/src/mesa/vbo/vbo_attrib_tmp.h +++ b/src/mesa/vbo/vbo_attrib_tmp.h @@ -210,6 +210,7 @@ static inline float conv_i2_to_norm_float(const struct gl_context *ctx, int i2) }\ } else if ((type) == GL_UNSIGNED_INT_10F_11F_11F_REV) { \ float res[4];\ + res[3] = 1; \ r11g11b10f_to_float3((arg), res);\ ATTR##val##FV((attr), res); \ } else \ -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
One more comment here... This particularly regards your plan of separating it into things that match the destination and other things and not copy prop uniforms or immediates into the other things. There is another case we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0, r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16 load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On those platforms, there is a special type of MOV-to-MRF that can move from (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right now. However, it doesn't follow the standard rules. --Jason On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber correctly). We'd rather not have to squash everything if we can. All that being said, our liveness/register allocation can't handle this and making register allocation handle parts of registers interfering but not other bits is going to be a real pain.
[Mesa-dev] [Bug 67676] Transparent windows no longer work
https://bugs.freedesktop.org/show_bug.cgi?id=67676 Chad Versace chad.vers...@intel.com changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|mesa-dev@lists.freedesktop. |jonny.l...@collabora.co.uk |org | --- Comment #14 from Chad Versace chad.vers...@intel.com --- n reply to Jonny Lamb from comment #13) Created attachment 113563 [details] version three Comments: - Good call on specifying pre-multiplied alpha. - I think you should remove the paragraph To determine if the EGL implementation supports this extension, clients should query the EGL_EXTENSIONS string of the current active display. because that's the normal way of querying extensions and needs no explanation. I found that paragraph confusing because it made me ask Why is the extension telling me this? Is there some magic in this extension that I've failed to see that warrants this explanation?. The platform extension specification that you used as a template explicitly describes how to perform the extension query because its query mechanism differs from 95% of existing EGL extensions. - You should remove the New Behavior section. Normally, the fine details of an extension specification are defined in the Additions to the EGL x.y specification section as a diff against the EGL x.y spec. This extension is no different. A few, special extensions, though, define behavior that should *not* be added to the EGL spec, such as the platform extensions. That's what the New Behavior section should be used for, and this alpha extension defines no such outside-of-the-spec behavior. - The paragraph To obtain an EGL window surface with a meaningful alpha channel, use an EGLConfig with EGL_TRANSPARENT_TYPE set to EGL_TRANSPARENT_ALPHA_MESA. is still valuable, despite belonging in the New Behavior section. The proper location for it is the Overview section. - Regarding issue #1: RESOLUTION: Yes. Non-window surfaces created with EGL_TRANSPARENT_TYPE set to EGL_TRANSPARENT_ALPHA_MESA will generate an error. What error does it generate? If we refer to this paragraph of Section 3.5.1 as precedent, If config does not support rendering to windows (the EGL_SURFACE_TYPE attribute does not contain EGL_WINDOW_BIT ), an EGL_BAD_MATCH error is generated. then the resolution text can be clarified as: RESOLUTION: Attempting to create a non-window surface with a config in which EGL_TRANSPARENT_TYPE is EGL_TRANSPARENT_ALPHA_MESA, then an EGL_BAD_MATCH error is generated. - Other than the above (mostly sylistic) issues, the extension looks good to me. Thanks for writing it! When you submit to mesa-dev, please CC me. -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] glsl: add double support for packing varyings
Doubles are always packed, but a single double will never cross a slot boundary -- single slots can still be wasted in some situations. Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- Packing *all* doubles is a bit of a hack... we shouldn't need to do it for dvec2/dvec4. And yet, here we are. This means that setting explicit locations on doubles is also unlikely to go well. But this should be enough to get it mostly going. v1 - v2: - split out variables into a separate list so that it can work for GS src/glsl/lower_packed_varyings.cpp | 117 - 1 file changed, 90 insertions(+), 27 deletions(-) diff --git a/src/glsl/lower_packed_varyings.cpp b/src/glsl/lower_packed_varyings.cpp index 5e844c7..2c9a1c4 100644 --- a/src/glsl/lower_packed_varyings.cpp +++ b/src/glsl/lower_packed_varyings.cpp @@ -146,7 +146,11 @@ #include glsl_symbol_table.h #include ir.h +#include ir_builder.h #include ir_optimization.h +#include program/prog_instruction.h + +using namespace ir_builder; namespace { @@ -163,13 +167,14 @@ public: lower_packed_varyings_visitor(void *mem_ctx, unsigned locations_used, ir_variable_mode mode, unsigned gs_input_vertices, - exec_list *out_instructions); + exec_list *out_instructions, + exec_list *out_variables); void run(exec_list *instructions); private: - ir_assignment *bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs); - ir_assignment *bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs); + void bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs); + void bitwise_assign_unpack(ir_rvalue *lhs, ir_rvalue *rhs); unsigned lower_rvalue(ir_rvalue *rvalue, unsigned fine_location, ir_variable *unpacked_var, const char *name, bool gs_input_toplevel, unsigned vertex_index); @@ -221,13 +226,19 @@ private: * appropriate place in the shader once the visitor has finished running. */ exec_list *out_instructions; + + /** +* Exec list into which the visitor should insert any new variables. +*/ + exec_list *out_variables; }; } /* anonymous namespace */ lower_packed_varyings_visitor::lower_packed_varyings_visitor( void *mem_ctx, unsigned locations_used, ir_variable_mode mode, - unsigned gs_input_vertices, exec_list *out_instructions) + unsigned gs_input_vertices, exec_list *out_instructions, + exec_list *out_variables) : mem_ctx(mem_ctx), locations_used(locations_used), packed_varyings((ir_variable **) @@ -235,7 +246,8 @@ lower_packed_varyings_visitor::lower_packed_varyings_visitor( locations_used)), mode(mode), gs_input_vertices(gs_input_vertices), - out_instructions(out_instructions) + out_instructions(out_instructions), + out_variables(out_variables) { } @@ -274,6 +286,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) } } +#define SWIZZLE_ZWZW MAKE_SWIZZLE4(SWIZZLE_Z, SWIZZLE_W, SWIZZLE_Z, SWIZZLE_W) /** * Make an ir_assignment from \c rhs to \c lhs, performing appropriate @@ -281,7 +294,7 @@ lower_packed_varyings_visitor::run(exec_list *instructions) * * This function is called when packing varyings. */ -ir_assignment * +void lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, ir_rvalue *rhs) { @@ -300,12 +313,28 @@ lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, rhs = new(this-mem_ctx) ir_expression(ir_unop_bitcast_f2i, lhs-type, rhs); break; + case GLSL_TYPE_DOUBLE: + assert(rhs-type-vector_elements = 2); + if (rhs-type-vector_elements == 2) { +ir_variable *t = new(mem_ctx) ir_variable(lhs-type, pack, ir_var_temporary); + +assert(lhs-type-vector_elements == 4); +this-out_variables-push_tail(t); +this-out_instructions-push_tail( + assign(t, u2i(expr(ir_unop_unpack_double_2x32, swizzle_x(rhs-clone(mem_ctx, NULL, 0x3)); +this-out_instructions-push_tail( + assign(t, u2i(expr(ir_unop_unpack_double_2x32, swizzle_y(rhs))), 0xc)); +rhs = deref(t).val; + } else { +rhs = u2i(expr(ir_unop_unpack_double_2x32, rhs)); + } + break; default: assert(!Unexpected type conversion while lowering varyings); break; } } - return new(this-mem_ctx) ir_assignment(lhs, rhs); + this-out_instructions-push_tail(new (this-mem_ctx) ir_assignment(lhs, rhs)); } @@ -315,7 +344,7 @@ lower_packed_varyings_visitor::bitwise_assign_pack(ir_rvalue *lhs, * * This function is called when unpacking varyings. */ -ir_assignment * +void
Re: [Mesa-dev] [PATCH 1/6] gallium: add some more double opcodes to avoid unnecessary lowering
On Fri, Feb 20, 2015 at 9:37 AM, Roland Scheidegger srol...@vmware.com wrote: Just some minor nits. Am 20.02.2015 um 00:52 schrieb Ilia Mirkin: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/auxiliary/tgsi/tgsi_info.c | 5 src/gallium/docs/source/tgsi.rst | 39 ++ src/gallium/include/pipe/p_shader_tokens.h | 7 +- 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index d04f9da..4d838fd 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -257,6 +257,11 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 1, 0, 0, 0, 0, COMP, D2U, TGSI_OPCODE_D2U }, { 1, 1, 0, 0, 0, 0, COMP, U2D, TGSI_OPCODE_U2D }, { 1, 1, 0, 0 ,0, 0, COMP, DRSQ, TGSI_OPCODE_DRSQ }, + { 1, 1, 0, 0, 0, 0, COMP, DTRUNC, TGSI_OPCODE_DTRUNC }, + { 1, 1, 0, 0, 0, 0, COMP, DCEIL, TGSI_OPCODE_DCEIL }, + { 1, 1, 0, 0, 0, 0, COMP, DFLR, TGSI_OPCODE_DFLR }, + { 1, 1, 0, 0, 0, 0, COMP, DROUND, TGSI_OPCODE_DROUND }, + { 1, 1, 0, 0, 0, 0, COMP, DSSG, TGSI_OPCODE_DSSG }, }; const struct tgsi_opcode_info * diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index e20af79..15f1e9f 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -1861,6 +1861,45 @@ two-component vectors with doubled precision in each component. dst.zw = src.zw - \lfloor src.zw\rfloor +.. opcode:: DTRUNC - Truncate + +.. math:: + + dst.xy = trunc(src.xy) + + dst.zw = trunc(src.zw) + +.. opcode:: DCEIL - Ceiling + +.. math:: + + dst.xy = \lceil src.xy\rceil + + dst.zw = \lceil src.zw\rceil + +.. opcode:: DFLR - Floor + +.. math:: + + dst.xy = \lfloor src.xy\rfloor + + dst.zw = \lfloor src.zw\rfloor + +.. opcode:: DROUND - Fraction + +.. math:: + + dst.xy = round(src.xy) + + dst.zw = round(src.zw) + +.. opcode:: DSSG - Set Sign + +.. math:: + + dst.xy = (src.xy 0) ? 1 : (src.xy 0) ? -1 : 0 + + dst.zw = (src.zw 0) ? 1 : (src.zw 0) ? -1 : 0 I think this would be more obvious written as 1.0/-1.0/0.0 (same goes for the non-double version, of course). Yep, that was Dave's comment too. I've fixed it here, left SSG alone. (As a side note, I'm wondering if this actually has defined NaN behavior, the glsl language sort of implies a number has to be exactly one of these 3 cases, and doesn't give an answer what should be returned for a NaN - maybe a NaN?) I'll leave that for another day :) .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index fc41cc9..95ac590 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -519,7 +519,12 @@ struct tgsi_property_data { #define TGSI_OPCODE_D2U 215 #define TGSI_OPCODE_U2D 216 #define TGSI_OPCODE_DRSQ217 /* eg, cayman also has DRSQ */ -#define TGSI_OPCODE_LAST218 +#define TGSI_OPCODE_DTRUNC 218 /* nvc0 */ +#define TGSI_OPCODE_DCEIL 219 /* nvc0 */ +#define TGSI_OPCODE_DFLR220 /* nvc0 */ +#define TGSI_OPCODE_DROUND 221 /* nvc0 */ +#define TGSI_OPCODE_DSSG222 +#define TGSI_OPCODE_LAST223 I don't think these hw-specific references (nvc0) really fit in there - the same can be said about some existing ones (cayman...) I went with the flow... started by Dave, I believe. Happy to remove them all too. Dave -- thoughts? -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/32] i965/fs: Fix lower_load_payload() to take into account non-zero reg_offset.
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com On Fri, Feb 6, 2015 at 9:42 AM, Francisco Jerez curroje...@riseup.net wrote: Fixes metadata guess when instructions in the program specify a destination register with non-zero reg_offset and when the payload of a LOAD_PAYLOAD spans several registers. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index eccae06..163aa41 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3070,7 +3070,7 @@ fs_visitor::lower_load_payload() foreach_block_and_inst_safe (block, fs_inst, inst, cfg) { if (inst-dst.file == GRF) { - const int dst_reg = vgrf_to_reg[inst-dst.reg]; + const int dst_reg = vgrf_to_reg[inst-dst.reg] + inst-dst.reg_offset; bool force_sechalf = inst-force_sechalf !inst-force_writemask_all; bool toggle_sechalf = inst-dst.width == 16 @@ -3119,7 +3119,7 @@ fs_visitor::lower_load_payload() } if (dst.file == GRF) { - const int dst_reg = vgrf_to_reg[dst.reg]; + const int dst_reg = vgrf_to_reg[dst.reg] + dst.reg_offset; const bool force_writemask = mov-force_writemask_all; metadata[dst_reg].force_writemask_all = force_writemask; metadata[dst_reg].force_sechalf = mov-force_sechalf; -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] gallium/sw/kms: don't redefine DEBUG
From: Marek Olšák marek.ol...@amd.com --- src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c index ed34dfa..781818a 100644 --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c @@ -53,9 +53,9 @@ #include state_tracker/drm_driver.h #if 0 -#define DEBUG(msg, ...) fprintf(stderr, msg, __VA_ARGS__) +#define DEBUG_PRINT(msg, ...) fprintf(stderr, msg, __VA_ARGS__) #else -#define DEBUG(msg, ...) +#define DEBUG_PRINT(msg, ...) #endif struct sw_winsys; @@ -145,7 +145,7 @@ kms_sw_displaytarget_create(struct sw_winsys *ws, list_add(kms_sw_dt-link, kms_sw-bo_list); - DEBUG(KMS-DEBUG: created buffer %u (size %u)\n, kms_sw_dt-handle, kms_sw_dt-size); + DEBUG_PRINT(KMS-DEBUG: created buffer %u (size %u)\n, kms_sw_dt-handle, kms_sw_dt-size); *stride = kms_sw_dt-stride; return (struct sw_displaytarget *)kms_sw_dt; @@ -177,7 +177,7 @@ kms_sw_displaytarget_destroy(struct sw_winsys *ws, list_del(kms_sw_dt-link); - DEBUG(KMS-DEBUG: destroyed buffer %u\n, kms_sw_dt-handle); + DEBUG_PRINT(KMS-DEBUG: destroyed buffer %u\n, kms_sw_dt-handle); FREE(kms_sw_dt); } @@ -205,7 +205,7 @@ kms_sw_displaytarget_map(struct sw_winsys *ws, if (kms_sw_dt-mapped == MAP_FAILED) return NULL; - DEBUG(KMS-DEBUG: mapped buffer %u (size %u) at %p\n, + DEBUG_PRINT(KMS-DEBUG: mapped buffer %u (size %u) at %p\n, kms_sw_dt-handle, kms_sw_dt-size, kms_sw_dt-mapped); return kms_sw_dt-mapped; @@ -249,7 +249,7 @@ kms_sw_displaytarget_unmap(struct sw_winsys *ws, { struct kms_sw_displaytarget *kms_sw_dt = kms_sw_displaytarget(dt); - DEBUG(KMS-DEBUG: unmapped buffer %u (was %p)\n, kms_sw_dt-handle, kms_sw_dt-mapped); + DEBUG_PRINT(KMS-DEBUG: unmapped buffer %u (was %p)\n, kms_sw_dt-handle, kms_sw_dt-mapped); munmap(kms_sw_dt-mapped, kms_sw_dt-size); kms_sw_dt-mapped = NULL; @@ -283,7 +283,7 @@ kms_sw_displaytarget_from_handle(struct sw_winsys *ws, if (kms_sw_dt-handle == whandle-handle) { kms_sw_dt-ref_count++; -DEBUG(KMS-DEBUG: imported buffer %u (size %u)\n, kms_sw_dt-handle, kms_sw_dt-size); +DEBUG_PRINT(KMS-DEBUG: imported buffer %u (size %u)\n, kms_sw_dt-handle, kms_sw_dt-size); *stride = kms_sw_dt-stride; return (struct sw_displaytarget *)kms_sw_dt; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] gallivm: fix uninitialized-variable warnings
From: Marek Olšák marek.ol...@amd.com --- src/gallium/auxiliary/gallivm/lp_bld_init.c | 2 +- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c index b9593de..6133883 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c @@ -509,7 +509,7 @@ void gallivm_compile_module(struct gallivm_state *gallivm) { LLVMValueRef func; - int64_t time_begin; + int64_t time_begin = 0; assert(!gallivm-compiled); diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index 093c8fc..7d18ee5 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -221,7 +221,7 @@ lp_build_rho(struct lp_build_sample_context *bld, struct lp_build_context *coord_bld = bld-coord_bld; struct lp_build_context *rho_bld = bld-lodf_bld; const unsigned dims = bld-dims; - LLVMValueRef ddx_ddy[2]; + LLVMValueRef ddx_ddy[2] = {NULL}; LLVMBuilderRef builder = bld-gallivm-builder; LLVMTypeRef i32t = LLVMInt32TypeInContext(bld-gallivm-context); LLVMValueRef index0 = LLVMConstInt(i32t, 0, 0); -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] gallium/sw/kms: fix a type-mismatch warning
From: Marek Olšák marek.ol...@amd.com --- src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c index 781818a..ce3de78 100644 --- a/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c +++ b/src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c @@ -313,7 +313,7 @@ kms_sw_displaytarget_get_handle(struct sw_winsys *winsys, return TRUE; case DRM_API_HANDLE_TYPE_FD: if (!drmPrimeHandleToFD(kms_sw-fd, kms_sw_dt-handle, - DRM_CLOEXEC, whandle-handle)) { + DRM_CLOEXEC, (int*)whandle-handle)) { whandle-stride = kms_sw_dt-stride; return TRUE; } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/6] tgsi: fix type-mismatch warning
From: Marek Olšák marek.ol...@amd.com --- src/gallium/auxiliary/tgsi/tgsi_text.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_text.c b/src/gallium/auxiliary/tgsi/tgsi_text.c index 5069d13..b6b3585 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_text.c +++ b/src/gallium/auxiliary/tgsi/tgsi_text.c @@ -240,7 +240,7 @@ static boolean parse_double( const char **pcur, uint32_t *val0, uint32_t *val1) uint32_t uval[2]; } v; - v.dval = strtod(cur, pcur); + v.dval = strtod(cur, (char**)pcur); if (*pcur == cur) return FALSE; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/6] gallivm: fix uninitialized-variable warnings
On 02/20/2015 12:22 PM, Marek Olšák wrote: From: Marek Olšák marek.ol...@amd.com --- src/gallium/auxiliary/gallivm/lp_bld_init.c | 2 +- src/gallium/auxiliary/gallivm/lp_bld_sample.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c index b9593de..6133883 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c @@ -509,7 +509,7 @@ void gallivm_compile_module(struct gallivm_state *gallivm) { LLVMValueRef func; - int64_t time_begin; + int64_t time_begin = 0; assert(!gallivm-compiled); diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c b/src/gallium/auxiliary/gallivm/lp_bld_sample.c index 093c8fc..7d18ee5 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c @@ -221,7 +221,7 @@ lp_build_rho(struct lp_build_sample_context *bld, struct lp_build_context *coord_bld = bld-coord_bld; struct lp_build_context *rho_bld = bld-lodf_bld; const unsigned dims = bld-dims; - LLVMValueRef ddx_ddy[2]; + LLVMValueRef ddx_ddy[2] = {NULL}; LLVMBuilderRef builder = bld-gallivm-builder; LLVMTypeRef i32t = LLVMInt32TypeInContext(bld-gallivm-context); LLVMValueRef index0 = LLVMConstInt(i32t, 0, 0); For the series, Reviewed-by: Brian Paul bri...@vmware.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/23] main: Minor whitespace fixes in ClearNamedBuffer[Sub]Data.
Again, Ian requested that this be a separate commit. On Fri, Feb 20, 2015 at 6:22 AM, Martin Peres martin.pe...@free.fr wrote: Please squash this in the previous commit (with git rebase -i). On 12/02/2015 04:05, Laura Ekstrand wrote: --- src/mesa/main/bufferobj.c | 4 ++-- src/mesa/main/bufferobj.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index bd21c8a..88230d6 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -1765,10 +1765,10 @@ void GLAPIENTRY _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, - const GLvoid* data) + const GLvoid *data) { GET_CURRENT_CONTEXT(ctx); - struct gl_buffer_object* bufObj; + struct gl_buffer_object *bufObj; bufObj = get_buffer(ctx, glClearBufferSubData, target, GL_INVALID_VALUE); if (!bufObj) diff --git a/src/mesa/main/bufferobj.h b/src/mesa/main/bufferobj.h index 5254727..2a66444 100644 --- a/src/mesa/main/bufferobj.h +++ b/src/mesa/main/bufferobj.h @@ -220,7 +220,7 @@ _mesa_GetBufferSubData(GLenum target, GLintptrARB offset, void GLAPIENTRY _mesa_ClearBufferData(GLenum target, GLenum internalformat, GLenum format, GLenum type, - const GLvoid * data); + const GLvoid *data); void GLAPIENTRY _mesa_ClearNamedBufferData(GLuint buffer, GLenum internalformat, @@ -231,7 +231,7 @@ void GLAPIENTRY _mesa_ClearBufferSubData(GLenum target, GLenum internalformat, GLintptr offset, GLsizeiptr size, GLenum format, GLenum type, - const GLvoid * data); + const GLvoid *data); void GLAPIENTRY _mesa_ClearNamedBufferSubData(GLuint buffer, GLenum internalformat, ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/32] i965/fs: Fix lower_load_payload() to take into account stride in the metadata guess.
Jason Ekstrand ja...@jlekstrand.net writes: Where are you getting a destination stride in a load_payload? The whole point of load_payload was to remove partial writes so this seems to go against everything it's intended for. --Jason Yeah, maybe... This case was triggered by some of my image_load_store code that ends up using byte types with stride=4 for some image formats. On Fri, Feb 6, 2015 at 9:42 AM, Francisco Jerez curroje...@riseup.net wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 163aa41..8da1f47 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3074,7 +3074,7 @@ fs_visitor::lower_load_payload() bool force_sechalf = inst-force_sechalf !inst-force_writemask_all; bool toggle_sechalf = inst-dst.width == 16 - type_sz(inst-dst.type) == 4 + type_sz(inst-dst.type) * inst-dst.stride == 4 !inst-force_writemask_all; for (int i = 0; i inst-regs_written; ++i) { metadata[dst_reg + i].written = true; -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev pgpsLmQeSeN85.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] common: Correct texture initialization in create_texture_for_pbo.
On Fri, Feb 20, 2015 at 4:30 PM, Laura Ekstrand la...@jlekstrand.net wrote: Solves bugs related to the driver not setting up the texture miptree correctly, leading to faulty PBO uploads and downloads. --- src/mesa/drivers/common/meta_tex_subimage.c | 13 + src/mesa/drivers/dri/i965/intel_tex.c | 3 ++- src/mesa/main/dd.h | 1 + 3 files changed, 12 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 68c8273..f4f7716 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -51,7 +51,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, { uint32_t pbo_format; GLenum internal_format; - unsigned row_stride; + unsigned row_stride, image_stride; struct gl_buffer_object *buffer_obj; struct gl_texture_object *tex_obj; struct gl_texture_image *tex_image; @@ -74,6 +74,8 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, pixels = _mesa_image_address3d(packing, pixels, width, height, format, type, 0, 0, 0); row_stride = _mesa_image_row_stride(packing, width, format, type); + image_stride = _mesa_image_image_stride(packing, width, height, format, + type); if (_mesa_is_bufferobj(packing-BufferObj)) { *tmp_pbo = 0; @@ -89,7 +91,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, */ _mesa_BindBuffer(pbo_target, *tmp_pbo); - _mesa_BufferData(pbo_target, row_stride * height, pixels, GL_STREAM_DRAW); + _mesa_BufferData(pbo_target, image_stride * depth, pixels, GL_STREAM_DRAW); buffer_obj = ctx-Unpack.BufferObj; pixels = NULL; @@ -99,9 +101,11 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, _mesa_GenTextures(1, tmp_tex); tex_obj = _mesa_lookup_texture(ctx, *tmp_tex); - tex_obj-Target = depth 1 ? GL_TEXTURE_2D_ARRAY : GL_TEXTURE_2D; - tex_obj-Immutable = GL_TRUE; _mesa_initialize_texture_object(ctx, tex_obj, *tmp_tex, GL_TEXTURE_2D); + /* This must be set after _mesa_initialize_texture_object, not before. */ + tex_obj-Immutable = GL_TRUE; + /* This is required for interactions with ARB_texture_view. */ + tex_obj-NumLayers = 1; internal_format = _mesa_get_format_base_format(pbo_format); @@ -114,6 +118,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, buffer_obj, (intptr_t)pixels, row_stride, + image_stride, read_only)) { _mesa_DeleteTextures(1, tmp_tex); _mesa_DeleteBuffers(1, tmp_pbo); diff --git a/src/mesa/drivers/dri/i965/intel_tex.c b/src/mesa/drivers/dri/i965/intel_tex.c index 2d3009a..3a0c09a 100644 --- a/src/mesa/drivers/dri/i965/intel_tex.c +++ b/src/mesa/drivers/dri/i965/intel_tex.c @@ -305,6 +305,7 @@ intel_set_texture_storage_for_buffer_object(struct gl_context *ctx, struct gl_buffer_object *buffer_obj, uint32_t buffer_offset, uint32_t row_stride, +uint32_t image_stride, bool read_only) { struct brw_context *brw = brw_context(ctx); @@ -334,7 +335,7 @@ intel_set_texture_storage_for_buffer_object(struct gl_context *ctx, drm_intel_bo *bo = intel_bufferobj_buffer(brw, intel_buffer_obj, buffer_offset, - row_stride * image-Height); + image_stride * image-Depth); intel_texobj-mt = intel_miptree_create_for_bo(brw, bo, image-TexFormat, diff --git a/src/mesa/main/dd.h b/src/mesa/main/dd.h index ec8662b..9de08e2 100644 --- a/src/mesa/main/dd.h +++ b/src/mesa/main/dd.h @@ -429,6 +429,7 @@ struct dd_function_table { struct gl_buffer_object *bufferObj, uint32_t buffer_offset, uint32_t row_stride, +uint32_t image_stride, bool read_only); Given that the next patch fixes it to only use 2D textures, we shouldn't need the changes to the DD table. In fact, we should take all the stuff about array textures out of the DD table entry, comment, and implementation. However,
Re: [Mesa-dev] [PATCH 2/4] common: Correct PBO 2D_ARRAY handling.
There needs to be some corresponding Piglit test to guarantee that skip pixels is really a problem here. Otherwise, I'm flying blind trying to fix it. On Fri, Feb 20, 2015 at 5:04 PM, Jason Ekstrand ja...@jlekstrand.net wrote: This is mostly correct and it's a good solution. The only problem is that it doesn't handle the skipRows packing property properly. This property tells the driver the stride (in rows) between image planes in the data. Most of the places where depth is used below, we should be using the stride (in rows) between images. Look at the _mesa_get_image_stride familiy of functions to see exactly how to handle this. On Fri, Feb 20, 2015 at 4:30 PM, Laura Ekstrand la...@jlekstrand.net wrote: Changes PBO uploads and downloads to use a tall (height * depth) 2D texture for blitting. This fixes the bug where 2D_ARRAY, 3D, and CUBE_MAP_ARRAY textures are not properly uploaded and downloaded. --- src/mesa/drivers/common/meta_tex_subimage.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index f4f7716..ee3295b 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -110,7 +110,7 @@ create_texture_for_pbo(struct gl_context *ctx, bool create_pbo, internal_format = _mesa_get_format_base_format(pbo_format); tex_image = _mesa_get_tex_image(ctx, tex_obj, tex_obj-Target, 0); - _mesa_init_teximage_fields(ctx, tex_image, width, height, depth, + _mesa_init_teximage_fields(ctx, tex_image, width, height * depth, 1, 0, internal_format, pbo_format); read_only = pbo_target == GL_PIXEL_UNPACK_BUFFER; @@ -227,7 +227,7 @@ _mesa_meta_pbo_TexSubImage(struct gl_context *ctx, GLuint dims, _mesa_update_state(ctx); _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, - 0, 0, width, height, + 0, z * height, width, (z + 1) * height, xoffset, yoffset, xoffset + width, yoffset + height, GL_COLOR_BUFFER_BIT, GL_NEAREST); @@ -349,7 +349,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_meta_BlitFramebuffer(ctx, ctx-ReadBuffer, ctx-DrawBuffer, xoffset, yoffset, xoffset + width, yoffset + height, - 0, 0, width, height, + 0, z * height, width, (z + 1) * height, GL_COLOR_BUFFER_BIT, GL_NEAREST); } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber correctly). We'd rather not have to squash everything if we can. All that being said, our liveness/register allocation can't handle this and making register allocation handle parts of registers interfering but not other bits is going to be a real pain. Maybe not even worth it. If you'd rather force_writemask_all everything, I'll sign off on that for now. I just wanted to point out that it may not be a good permanent solution. A reasonable compromise might be to leave it up to the caller whether to set the force_writemask_all and force_sechalf flags or not. I.e. just copy the same flags set on the LOAD_PAYLOAD instruction to the MOV instructions. That would still allow reducing the liveness intervals in cases where we know that the payload respects channel boundaries. Tracking the flag information per-register in cases where the same payload has well- and ill-behaved values seems rather premature and rather useless to me because the optimizer is likely to be able to get rid of redundant
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
Jason Ekstrand ja...@jlekstrand.net writes: One more comment here... This particularly regards your plan of separating it into things that match the destination and other things and not copy prop uniforms or immediates into the other things. There is another case we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0, r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16 load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On those platforms, there is a special type of MOV-to-MRF that can move from (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right now. However, it doesn't follow the standard rules. --Jason I don't see why it couldn't be handled in roughly the same way you do it now? Recognize when src[i + 4] is the same 8-wide register as src[i] shifted by 8 and emit a COMPR4 copy in that case? On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd say it's a bug in the liveness analysis pass if it wouldn't consider the liveness interval of a and b to overlap in your example if the assignment of b had force_writemask_all set. Yes, I'm aware of that. However, the part of that register that has to squash everything is usually only a couple of registers while the entire payload may be up to 13 (if I remmeber
[Mesa-dev] [PATCH 7/7] i965: Fix variable indexing of sampler arrays under non-uniform control flow.
ARB_gpu_shader5 requires sampler array indexing expressions to be dynamically uniform, this however doesn't have any implications on the control flow that leads to the evaluation of that expression being uniform. Use emit_uniformize() to obtain an arbitrary live value from the binding table index calculation instead of assuming that the first channel is always live. Fixes the following Piglit test cases: arb_gpu_shader5/execution/sampler_array_indexing/fs-nonuniform-control-flow.shader_test arb_gpu_shader5/execution/sampler_array_indexing/vs-nonuniform-control-flow.shader_test part of the series: http://lists.freedesktop.org/archives/piglit/2015-February/014615.html --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 ++-- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 5 +++-- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 5 +++-- 3 files changed, 8 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index ad77752..030ce24 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1563,8 +1563,8 @@ fs_visitor::nir_emit_texture(nir_tex_instr *instr) /* Emit code to evaluate the actual indexing expression */ sampler_reg = vgrf(glsl_type::uint_type); - emit(ADD(sampler_reg, src, fs_reg(sampler))) - -force_writemask_all = true; + emit(ADD(sampler_reg, src, fs_reg(sampler))); + emit_uniformize(sampler_reg, sampler_reg); break; } diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index dee6a6d..d7cad9f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -2166,8 +2166,9 @@ fs_visitor::visit(ir_texture *ir) /* Emit code to evaluate the actual indexing expression */ nonconst_sampler_index-accept(this); fs_reg temp = vgrf(glsl_type::uint_type); - emit(ADD(temp, this-result, fs_reg(sampler))) --force_writemask_all = true; + emit(ADD(temp, this-result, fs_reg(sampler))); + emit_uniformize(temp, temp); + sampler_reg = temp; } else { /* Single sampler, or constant array index; the indexing expression diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index d5848d2..c260381 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -2497,8 +2497,9 @@ vec4_visitor::visit(ir_texture *ir) /* Emit code to evaluate the actual indexing expression */ nonconst_sampler_index-accept(this); dst_reg temp(this, glsl_type::uint_type); - emit(ADD(temp, this-result, src_reg(sampler))) - -force_writemask_all = true; + emit(ADD(temp, this-result, src_reg(sampler))); + emit_uniformize(temp, src_reg(temp)); + sampler_reg = src_reg(temp); } else { /* Single sampler, or constant array index; the indexing expression -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/7] i965: Define helper function to copy an arbitrary live component from some register.
--- src/mesa/drivers/dri/i965/brw_fs.h | 2 ++ src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 12 src/mesa/drivers/dri/i965/brw_vec4.h | 3 +++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 +++ 4 files changed, 28 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 6e5fa1d..df6b825 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -306,6 +306,8 @@ public: const fs_reg a); void emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg dst, const fs_reg src0, const fs_reg src1); + /** Copy any live channel from \p src to the first channel of \p dst. */ + void emit_uniformize(const fs_reg dst, const fs_reg src); bool try_emit_saturate(ir_expression *ir); bool try_emit_line(ir_expression *ir); bool try_emit_mad(ir_expression *ir); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 4b48f2d..6f1ff20 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -337,6 +337,18 @@ fs_visitor::emit_minmax(enum brw_conditional_mod conditionalmod, const fs_reg d } } +void +fs_visitor::emit_uniformize(const fs_reg dst, const fs_reg src) +{ + const fs_reg chan_index = vgrf(glsl_type::uint_type); + + emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, component(chan_index, 0)) + -force_writemask_all = true; + emit(SHADER_OPCODE_BROADCAST, component(dst, 0), +src, component(chan_index, 0)) + -force_writemask_all = true; +} + bool fs_visitor::try_emit_saturate(ir_expression *ir) { diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index 7a92d59..adb5fe4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -298,6 +298,9 @@ public: void emit_lrp(const dst_reg dst, const src_reg x, const src_reg y, const src_reg a); + /** Copy any live channel from \p src to the first channel of \p dst. */ + void emit_uniformize(const dst_reg dst, const src_reg src); + void emit_block_move(dst_reg *dst, src_reg *src, const struct glsl_type *type, brw_predicate predicate); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index f6f589d..4515fc7 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -1304,6 +1304,17 @@ vec4_visitor::emit_lrp(const dst_reg dst, } void +vec4_visitor::emit_uniformize(const dst_reg dst, const src_reg src) +{ + const src_reg chan_index(this, glsl_type::uint_type); + + emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, dst_reg(chan_index)) + -force_writemask_all = true; + emit(SHADER_OPCODE_BROADCAST, dst, src, chan_index) + -force_writemask_all = true; +} + +void vec4_visitor::visit(ir_expression *ir) { unsigned int operand; -- 2.1.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/7] i965: Introduce the BROADCAST pseudo-opcode.
The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. --- src/mesa/drivers/dri/i965/brw_defines.h | 6 ++ src/mesa/drivers/dri/i965/brw_eu.h | 6 ++ src/mesa/drivers/dri/i965/brw_eu_emit.c | 77 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 4 ++ src/mesa/drivers/dri/i965/brw_shader.cpp | 3 + src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 4 ++ 6 files changed, 100 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index 17c27dd..d4930e3 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -911,6 +911,12 @@ enum opcode { SHADER_OPCODE_URB_WRITE_SIMD8, + /** +* Pick the channel from its first source register given by the index +* specified as second source. Useful for variable indexing of surfaces. +*/ + SHADER_OPCODE_BROADCAST, + VEC4_OPCODE_MOV_BYTES, VEC4_OPCODE_PACK_BYTES, VEC4_OPCODE_UNPACK_UNIFORM, diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index a94ea42..2505480 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -413,6 +413,12 @@ brw_pixel_interpolator_query(struct brw_compile *p, unsigned msg_length, unsigned response_length); +void +brw_broadcast(struct brw_compile *p, + struct brw_reg dst, + struct brw_reg src, + struct brw_reg idx); + /*** * brw_eu_util.c: */ diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 1d6fd67..d7e3995 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -2854,6 +2854,83 @@ brw_pixel_interpolator_query(struct brw_compile *p, brw_inst_set_pi_message_data(brw, insn, data); } +void +brw_broadcast(struct brw_compile *p, + struct brw_reg dst, + struct brw_reg src, + struct brw_reg idx) +{ + const struct brw_context *brw = p-brw; + const bool align1 = (brw_inst_access_mode(brw, p-current) == BRW_ALIGN_1); + brw_inst *inst; + + assert(src.file == BRW_GENERAL_REGISTER_FILE + src.address_mode == BRW_ADDRESS_DIRECT); + + if ((src.vstride == 0 (src.hstride == 0 || !align1)) || + idx.file == BRW_IMMEDIATE_VALUE) { + /* Trivial, the source is already uniform or the index is a constant. + * We will typically not get here if the optimizer is doing its job, but + * asserting would be mean. + */ + const unsigned i = (idx.file == BRW_IMMEDIATE_VALUE ? idx.dw1.ud : 0); + brw_MOV(p, dst, + (align1 ? stride(suboffset(src, i), 0, 1, 0) : + stride(suboffset(src, 4 * i), 0, 4, 1))); + + } else { + if (align1) { + const struct brw_reg addr = +retype(brw_address_reg(0), BRW_REGISTER_TYPE_UD); + const unsigned offset = src.nr * REG_SIZE + src.subnr; + /* Limit in bytes of the signed indirect addressing immediate. */ + const unsigned limit = 512; + + brw_push_insn_state(p); + brw_set_default_mask_control(p, BRW_MASK_DISABLE); + brw_set_default_predicate_control(p, BRW_PREDICATE_NONE); + + /* Take into account the component size and horizontal stride. */ + assert(src.vstride == src.hstride + src.width); + brw_SHL(p, addr, vec1(idx), + brw_imm_ud(_mesa_logbase2(type_sz(src.type)) + +src.hstride - 1)); + + /* We can only address up to limit bytes using the indirect + * addressing immediate, account for the difference if the source + * register is above this limit. + */ + if (offset = limit) +brw_ADD(p, addr, addr, brw_imm_ud(offset - offset % limit)); + + brw_pop_insn_state(p); + + /* Use indirect addressing to fetch the specified component. */ + brw_MOV(p, dst, + retype(brw_vec1_indirect(addr.subnr, offset % limit), +src.type)); + + } else { + /* In SIMD4x2 mode the index can be either zero or one, replicate it + * to all bits of a flag register, + */ + inst = brw_MOV(p, +brw_null_reg(), +stride(brw_swizzle1(idx, 0), 0, 4, 1)); + brw_inst_set_pred_control(brw, inst, BRW_PREDICATE_NONE); + brw_inst_set_cond_modifier(brw, inst, BRW_CONDITIONAL_NZ); + brw_inst_set_flag_reg_nr(brw, inst, 1); + + /* and use predicated
Re: [Mesa-dev] [PATCH 10/32] i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().
On Fri, Feb 20, 2015 at 2:43 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: One more comment here... This particularly regards your plan of separating it into things that match the destination and other things and not copy prop uniforms or immediates into the other things. There is another case we need to handle. On older gens (SNB maybe?) the SIMD16 FB write messages are in the order (r0, g0, b0, a0, r1, g1, b1, a1) instead of the usual (r0, r1, g0, g1, b0, b1, a0, a1). This isn't going to work if we expect SIMD16 load_payload instructions to be lowerable to a bunch of SIMD16 MOVs. On those platforms, there is a special type of MOV-to-MRF that can move from (gN, gN+1) to (mK, mK+4) and we handle that in lower_load_payload right now. However, it doesn't follow the standard rules. --Jason I don't see why it couldn't be handled in roughly the same way you do it now? Recognize when src[i + 4] is the same 8-wide register as src[i] shifted by 8 and emit a COMPR4 copy in that case? Sure. I haven't thought about it hard enough to prove it can't be done. Just wanted to make sure you were aware of that particular monkey-wrench. --Jason On Fri, Feb 20, 2015 at 2:10 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Fri, Feb 20, 2015 at 1:09 PM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: On Fri, Feb 20, 2015 at 4:11 AM, Francisco Jerez curroje...@riseup.net wrote: Jason Ekstrand ja...@jlekstrand.net writes: I'm still a little pensive. But Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com Thanks. Now for a little aside. I have come to the conclusion that I made a grave mistake when I did the LOAD_PAYLOAD stuff. In retrospect, I should have just subclassed fs_inst for load_payload. The problem is that we need to snag a bunch of information for the sources when we create the load_payload. In particular, we need to know the width of the source so that we know how much space it consumes in the payload and we need to know the information required to properly re-create the mov such as force_sechalf and force_writemask_all. Really, in order to do things properly, we need to gather this information *before* we do any optimizations. The nasty pile of code that you're editing together with the effective_width parameter is a lame attempt to capture/reconstruct this information. Really, we should just subclass, capture the information up-front, and do it properly. Yes, absolutely, this lowering pass is a real mess. There are four more unreviewed patches in the mailing list fixing bugs of the metadata guessing of lower_load_payload() [1][2][3][4], you may want to take a look at them. There are more bugs I'm aware of but it didn't seem practical to fix them. Yeah, Matt pointed me at those. I'll give them a look later today. That said, I don't think it would be worth subclassing fs_inst. My suggestion would have been to keep it simple and lower LOAD_PAYLOAD into a series of MOVs with force_writemask_all enabled in all cases, simply rely on the optimizer to eliminate those where possible. Then get rid of the metadata and effective_width guessing. Instead agree on immediates and uniforms being exec_size-wide by convention (i.e. LOAD_PAYLOAD's exec_size rather than the original instruction's), then prevent constant propagation from propagating an immediate load into a LOAD_PAYLOAD if it would lead to a change in the semantics of the program, and maybe just run constant propagation after lowering so we can be sure those cases are cleaned up properly where register coalesce isn't enough. First off, simply setting force_writemask_all isn't an option. Consider the following scenario: a = foo; if (bar) { b = 7; } else { use(a); } use(b); If use(a) is the last use of the variable a, then the live ranges of a and b don't actually over-lap and we can assign a and b to the same register. However, if force_writemask_all is set on b, it will destroy the value in a before its last use. Right now, we don't actually do this because our liveness analysis pass flattens everything so it will think that b and a over-lap even though they don't. However, if we ever decide to make the liveness information more accurate, having a pile of force_writemask_all instructions will be a major problem. So, while it is *technically* safe for now, it's a really bad idea in the long-term. The thing is we *will* have to deal with that scenario. Building a message payload inherently involves crossing channel boundaries (because of headers, unsupported SIMD modes by some shared functions, etc.). I'd
[Mesa-dev] [PATCH 4/7] i965: Perform basic optimizations on the FIND_LIVE_CHANNEL opcode.
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + src/mesa/drivers/dri/i965/brw_vec4.cpp | 41 + src/mesa/drivers/dri/i965/brw_vec4.h | 1 + src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 1 + 6 files changed, 94 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d567c2b..4537900 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2734,6 +2734,54 @@ fs_visitor::compute_to_mrf() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +fs_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, fs_inst, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case FS_OPCODE_DISCARD_JUMP: + /* This can potentially make control flow non-uniform until the end + * of the program. + */ + depth++; + break; + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = fs_reg(0); +inst-sources = 1; +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} + +/** * Once we've generated code, try to convert normal FS_OPCODE_FB_WRITE * instructions to FS_OPCODE_REP_FB_WRITE. */ @@ -3678,6 +3726,7 @@ fs_visitor::optimize() OPT(opt_saturate_propagation); OPT(register_coalesce); OPT(compute_to_mrf); + OPT(eliminate_find_live_channel); OPT(compact_virtual_grfs); } while (progress); diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 9375412..6e5fa1d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -228,6 +228,7 @@ public: bool opt_register_renaming(); bool register_coalesce(); bool compute_to_mrf(); + bool eliminate_find_live_channel(); bool dead_code_eliminate(); bool remove_duplicate_mrf_writes(); bool virtual_grf_interferes(int a, int b); diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index bbe0747..04a073e 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -89,6 +89,7 @@ is_expression(const fs_inst *const inst) case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD: case FS_OPCODE_CINTERP: case FS_OPCODE_LINTERP: + case SHADER_OPCODE_FIND_LIVE_CHANNEL: case SHADER_OPCODE_BROADCAST: return true; case SHADER_OPCODE_RCP: diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 601b8c1..219c05e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -1250,6 +1250,46 @@ vec4_visitor::opt_register_coalesce() } /** + * Eliminate FIND_LIVE_CHANNEL instructions occurring outside any control + * flow. We could probably do better here with some form of divergence + * analysis. + */ +bool +vec4_visitor::eliminate_find_live_channel() +{ + bool progress = false; + unsigned depth = 0; + + foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) { + switch (inst-opcode) { + case BRW_OPCODE_IF: + case BRW_OPCODE_DO: + depth++; + break; + + case BRW_OPCODE_ENDIF: + case BRW_OPCODE_WHILE: + depth--; + break; + + case SHADER_OPCODE_FIND_LIVE_CHANNEL: + if (depth == 0) { +inst-opcode = BRW_OPCODE_MOV; +inst-src[0] = src_reg(0); +inst-force_writemask_all = true; +progress = true; + } + break; + + default: + break; + } + } + + return progress; +} + +/** * Splits virtual GRFs requesting more than one contiguous physical register. * * We initially create large virtual GRFs for temporary structures, arrays, @@ -1880,6 +1920,7 @@ vec4_visitor::run() OPT(opt_cse); OPT(opt_algebraic); OPT(opt_register_coalesce); + OPT(eliminate_find_live_channel); } while (progress); pass_num = 0; diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h index a24f843..7a92d59 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_vec4.h @@ -207,6 +207,7 @@ public: bool opt_cse(); bool opt_algebraic(); bool