Re: [Mesa-dev] [PATCH v2 0/6] Improvements to the vec4 spilling code
On Tue, 2015-07-28 at 18:17 +0300, Francisco Jerez wrote:
Iago Toral Quiroga ito...@igalia.com writes:

Link to v1:
http://lists.freedesktop.org/archives/mesa-dev/2015-July/089766.html

Changes after review (Curro):
- Drop the patch that asserted that the reg size should always be 1
- Expand this so that we do not unspill a register if we have just unspilled it as well
- Use brw_mask_for_swizzle
- Update spilling costs accordingly

New changes:
- Expand the optimizations that are based on caching the spilled/unspilled register so we keep using the cached register for as long as consecutive instructions keep reading the register (the previous version would only do this for one instruction). This is because we only see benefits for register allocation when there are gaps in the life span of a register where it is not used (because these are the only instances in which we can use that reg for a different purpose), so as long as consecutive instructions keep reading a register we have just spilled or unspilled, we don't have to unspill it again.

I think this may be a good idea (assuming you've managed to measure an improvement in practice), but I don't think that the explanation is strictly speaking correct. It *may* be beneficial to, say, unspill a variable for instruction i and then do it again for instruction i+1, because the set of variables live at instruction i may not be exactly the same as in instruction i+1, and by caching the value between both instructions you cause the temporary to interfere with the union of both sets simultaneously, which may increase the total number of registers required to register-allocate the program.

This is true, although you also need to allocate a register for the new vgrf used to unspill, so I think the chances of this being beneficial in practice are very low. I'll make sure to update the comment to be more precise though.
That said I think that this may still be a good idea, because the register-pressure benefit from separating the live ranges of temporaries used in consecutive instructions is typically likely to be tiny. The program is likely to have other spilling candidates which may simplify the interference graph drastically for the same amount of fill/spill bandwidth invested, so I think you're right that in most cases it's going to be silly to re-spill/fill the same variable in consecutive instructions.

Right. The way I would expect this to work in practice is that we start by spilling registers with the best benefit / cost ratio. That should be registers that have a long life-span and usage gaps, where the main benefit for allocation comes from being able to allocate the register for a different purpose during these gaps, so we should lose very little for register allocation by doing this (if anything at all).

In the future it may also be worth checking whether the heuristic can be refined to use some sort of register pressure-sensitive distance between uses of the same spilled variable as a metric to decide whether the variable is worth re-spilling or if it makes sense for it to be cached between a pair of potentially non-consecutive uses.

Anyway I'll have a closer look at the rest of your series soon-ish.

Thanks Curro!
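To make the caching idea discussed above concrete, here is a minimal sketch. It is not the code in brw_vec4_reg_allocate.cpp: the Inst struct and count_scratch_reads() are hypothetical names invented for illustration, and writemasks, swizzles and register sizes are ignored. It only counts how many scratch reads (fills) a spilled register would need with and without keeping the temporary alive across consecutive uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction (not Mesa's vec4_instruction).
struct Inst {
   int dst;                // register written, or -1 for none
   std::vector<int> srcs;  // registers read
};

static bool reads(const Inst &inst, int reg)
{
   for (int src : inst.srcs)
      if (src == reg)
         return true;
   return false;
}

// Count the scratch reads needed for `spilled_reg`.
// cache == false: one fill for every instruction that reads the register.
// cache == true:  skip the fill when the previous instruction read or wrote
//                 the register, since its value is still in a temporary.
int count_scratch_reads(const std::vector<Inst> &prog, int spilled_reg,
                        bool cache)
{
   int fills = 0;
   bool cached = false;
   for (const Inst &inst : prog) {
      const bool r = reads(inst, spilled_reg);
      const bool w = inst.dst == spilled_reg;
      if (r && !(cache && cached))
         fills++;
      // The temporary only survives across consecutive uses; any gap drops
      // it so the allocator can reuse that register for something else.
      cached = r || w;
   }
   return fills;
}
```

For a program that writes r5, reads it in the next two instructions, leaves a one-instruction gap and then reads it again, the uncached count is 3 fills and the cached count is 1. As Curro notes, fewer fills is not automatically a win for register pressure, which this sketch does not model.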
Iago Toral Quiroga (6):
  i965/vec4: Only emit one scratch read per instruction for spilled registers
  i965/vec4: Remove checks for reladdr when checking for spillable registers
  i965/vec4: Don't emit scratch reads for a spilled register we have just written
  i965/vec4: Don't emit scratch reads for a register we have just unspilled
  i965/vec4: Adjust spilling cost for consecutive instructions
  i965: Add a debug option for spilling everything in vec4 code

 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 145 +++--
 src/mesa/drivers/dri/i965/intel_debug.c | 3 +-
 src/mesa/drivers/dri/i965/intel_debug.h | 5 +-
 5 files changed, 139 insertions(+), 18 deletions(-)

--
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] ssbo, vec4, nir - Re: [PATCH v3 (part2) 00/56] ARB_shader_storage_buffer_object (mesa, i965)
On Tue, 2015-08-04 at 17:54 -0700, Jason Ekstrand wrote:
On Tue, Aug 4, 2015 at 5:09 PM, Jordan Justen jordan.l.jus...@intel.com wrote:

I checked the extension spec, and it appears that like image load/store, ssbo is only required in fragment and compute stages. I don't think the ssbo support in vec4 is quite as problematic as image load/store, so I'm not sure this information will prove useful.

I did wonder what path we should take with ssbo, given that the vec4 nir support landed. It seems like vec4 nir is not enabled by default at this point, and I'm not sure what the plan is there.

Should we only enable ssbo for geometry stages:

1. Only when nir is enabled
2. Only when nir is disabled. (since that is the default currently)
3. Always
4. Not at all. (At least for now, since the spec doesn't require it)

I think my vote would be #1, especially if we plan to enable nir for vec4 by default before the next release.

I would agree, especially if we think that using NIR by default in mesa 11.0 is going to happen. Given how smoothly the vec4 NIR stuff has worked, I think that's probably possible. Current shader-db results on HSW are as follows:

Aha, in that case we will start working on adding the NIR bits for geometry stages to our series as well. We had been postponing this because I thought it would probably take more time to enable NIR/vec4 by default, but with this plan it makes sense to start working on that now.

total instructions in shared programs: 1816713 -> 1798252 (-1.02%)
instructions in affected programs: 1671486 -> 1653025 (-1.10%)
helped: 5248
HURT: 12868
GAINED: 0
LOST: 0

We can probably whittle that down pretty quick.

--Jason

-Jordan

On 2015-07-14 00:46:02, Iago Toral Quiroga wrote:

This is the second part of the v3 series including remaining frontend bits like the optional unsized array at the bottom of SSBO definitions, layout mode std430 and implementation for GL queries/getters. It also includes all of the i965 backend implementation.
Notice that this series depends on the following patches from Curro that have not been landed in master yet:

  i965/vec4: Import surface message builder functions.
  i965/vec4: Import helpers to convert vectors into arrays and back.
  i965/vec4: Import array utils for the surface message builder.
  i965/fs: Import surface message builder functions.
  i965/fs: Import helpers to convert vectors into arrays and back.
  i965/fs: Import array utils for the surface message builder.
  i965: Lift the constness restriction on surface indices passed to untyped ops.
  i965/vec4: Introduce VEC4 IR builder.
  i965/vec4: Make src_reg conversion constructor from dst_reg implicit.
  i965: Add register constructors taking a backend_reg as argument.
  i965/fs: Fix register coalesce not to lose track of the second half of 16-wide moves.

This v3-part2 series, together with these patches from Curro and part1 (which has already been landed in master), can be obtained from this development repository:

git clone -b itoral-ARB_shader_storage_buffer_object-v3-part2 https://github.com/Igalia/mesa.git

Also notice that:

* Patch 55 is a separate bugfix that affects UBOs as well.
* Patch 56 is not required by the series and is completely optional.

SSBO piglit tests have already been merged in piglit's master branch.
Antia Puentes (1):
  glsl: Consider active all elements of a shared/std140 block array

Iago Toral Quiroga (29):
  i965: Use 16-byte offset alignment for shader storage buffers
  i965: Implement DriverFlags.NewShaderStorageBuffer
  i965: Set MaxShaderStorageBuffers for compute shaders
  i965: Upload Shader Storage Buffer Object surfaces
  i965: handle visiting of ir_var_shader_storage variables
  i965/fs: Do not split buffer variables
  i965/fs: Do not include the header with a pixel mask in untyped read messages
  i965/vec4: Implement SSBO writes
  nir: Implement __intrinsic_store_ssbo
  i965/nir/fs: Implement SSBO writes
  i965/vec4: Implement SSBO reads
  nir: Implement __intrinsic_store_load
  i965/nir/fs: Implement SSBO reads
  nir: ignore an instruction's dest if it hasn't any
  glsl: Rename atomic counter functions
  glsl: Add atomic functions from ARB_shader_storage_buffer_object
  glsl: lower SSBO atomic intrinsics
  nir: Implement lowered SSBO atomic intrinsics
  i965/nir/fs: Implement SSBO atomics
  i965/vec4: Implement lowered SSBO atomic intrinsics
  glsl: First argument to atomic functions must be a buffer variable
  mesa: Add queries for GL_SHADER_STORAGE_BUFFER
  glsl: Allow use of memory qualifiers with ARB_shader_storage_buffer_object.
  glsl: Apply memory qualifiers to buffer variables
  glsl: Allow memory layout qualifiers on shader
Re: [Mesa-dev] [PATCH v3 (part2) 08/56] glsl: add support for unsized arrays in shader storage blocks
On Tue, 2015-08-04 at 14:08 -0700, Jordan Justen wrote:
On 2015-07-14 00:46:10, Iago Toral Quiroga wrote:
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

They only can be defined in the last position of the shader storage blocks.

When an unsized array is used in different shaders, it might be converted in different sized arrays, avoid get a linker error in that case.

v2:
- Rework error condition and error messages (Timothy Arteri)

Arceri

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/glsl/ast_array_index.cpp | 5 +-
 src/glsl/ast_to_hir.cpp | 66 ++
 src/glsl/ir.cpp | 1 +
 src/glsl/ir.h | 14 ++
 src/glsl/linker.cpp | 107 ---
 5 files changed, 155 insertions(+), 38 deletions(-)

diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
index 2c79002..8a1ae67 100644
--- a/src/glsl/ast_array_index.cpp
+++ b/src/glsl/ast_array_index.cpp
@@ -182,8 +182,9 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
       if (array->type->is_array())
          update_max_array_access(array, idx, &loc, state);
    } else if (const_index == NULL && array->type->is_array()) {
-      if (array->type->is_unsized_array()) {
-         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
+      if (array->type->is_unsized_array() &&
+          array->variable_referenced()->data.mode != ir_var_shader_storage) {
+         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
       } else if (array->type->fields.array->is_interface()
                  && array->variable_referenced()->data.mode == ir_var_uniform
                  && !state->is_version(400, 0) && !state->ARB_gpu_shader5_enable) {
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index ca30dbc..1b4ee22 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -5599,6 +5599,19 @@ private:
    bool found;
 };
 
+static bool
+is_unsized_array_last_element(ir_variable *v)
+{
+   const glsl_type *interface_type = v->get_interface_type();
+   int length = interface_type->length;
+
+   assert(v->type->is_unsized_array());
+
+   /* Check if it is the last element of the interface */
+   if (strcmp(interface_type->fields.structure[length-1].name, v->name) == 0)
+      return true;
+   return false;
+}
 
 ir_rvalue *
 ast_interface_block::hir(exec_list *instructions,
@@ -5913,6 +5926,33 @@ ast_interface_block::hir(exec_list *instructions,
       if (state->stage == MESA_SHADER_GEOMETRY && var_mode == ir_var_shader_in)
          handle_geometry_shader_input_decl(state, loc, var);
 
+      for (unsigned i = 0; i < num_variables; i++) {
+         if (fields[i].type->is_unsized_array()) {
+            if (var_mode == ir_var_shader_storage) {
+               if (i != (num_variables - 1)) {
+                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
+                                   "only last member of a shader storage block "
+                                   "can be defined as unsized array",
+                                   fields[i].name);
+               }
+            } else {
+               /* From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
+                *
+                * "If an array is declared as the last member of a shader storage
+                * block and the size is not specified at compile-time, it is
+                * sized at run-time. In all other cases, arrays are sized only
+                * at compile-time."
+                */

Where is the check for 'last member' in the es path?

Is this es check new? If so, should the es check be added in a patch before this one, and then extended for SSBO support?

+               if (state->es_shader) {
+                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
+                                   "only last member of a shader storage block "
+                                   "can be defined as unsized array",
+                                   fields[i].name);
+               }
+            }
+         }
+      }
+
       if (ir_variable *earlier = state->symbols->get_variable(this->instance_name)) {
          if (!redeclaring_per_vertex) {
@@ -6003,6 +6043,32 @@ ast_interface_block::hir(exec_list *instructions,
          var->data.explicit_binding = this->layout.flags.q.explicit_binding;
          var->data.binding = this->layout.binding;
 
+         if (var->type->is_unsized_array()) {
+            if (var->is_in_shader_storage_block()) {
+               if (!is_unsized_array_last_element(var)) {
+                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
+                                   "only last member of a shader storage
Re: [Mesa-dev] [PATCH v3 (part2) 09/56] glsl: Add parser/compiler support for unsized array's length()
On Tue, 2015-08-04 at 16:04 -0700, Jordan Justen wrote:
On 2015-08-04 15:12:06, Jordan Justen wrote:
On 2015-07-14 00:46:11, Iago Toral Quiroga wrote:
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

It also creates unop and triop expressions to tell the driver to calculate the unsized array length.

It is needed two expressions to do the calculation:

Two expressions are needed to do the calculation:

* The unop expression saves the ir_rvalue* whose length should be calculated.
* Afterwards, this unop is going to be processed by a lowering pass that will convert it to a triop that includes the block index,

line 'that' up with 'Afterwards'.

offset of the variable inside the shader storage block and the array stride.

All of them are needed for length() calculation following GL_ARB_shader_storage_buffer spec:

array.length() =
   max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/glsl/hir_field_selection.cpp | 15 +++
 src/glsl/ir.cpp | 9 +
 src/glsl/ir.h | 19 ++-

Hmm, I added an r-b below, but I had a question. Rather than IR nodes, should this be handled with intrinsics like the SSBO references?

I think that in this case it is not needed. The reason why we wanted to avoid using expressions for ssbo loads is that they were not constant (i.e. the same ssbo load could return different values at different places in the shader program). However, my understanding is that calling length() on an unsized array always returns the same value. By looking at the formula referenced above, the size of an unsized array is given by taking the available space at the tail of the buffer (which is a fixed constant value that won't change during shader execution), subtracting the offset of the unsized array into the buffer (which is also fixed) and dividing by the stride (also fixed). So in this case it looks like using expressions is exactly what we want.
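As a worked illustration of the length() formula quoted above, here is a small standalone sketch. The function name unsized_array_length() is made up for this example and does not exist in Mesa; signed 32-bit integers are assumed so that a buffer smaller than the array offset clamps to zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// array.length() =
//    max((buffer_object_size - offset_of_array) / stride_of_array, 0)
//
// All three inputs are fixed for the duration of a draw, which is why the
// result can be treated as a constant expression in the IR.
int32_t unsized_array_length(int32_t buffer_object_size,
                             int32_t offset_of_array,
                             int32_t stride_of_array)
{
   assert(stride_of_array > 0);
   return std::max<int32_t>(
      (buffer_object_size - offset_of_array) / stride_of_array, 0);
}
```

For example, a 256-byte buffer whose unsized array starts at byte 16 with a 16-byte stride gives (256 - 16) / 16 = 15 elements, while an 8-byte buffer with the same layout gives 0.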
Iago

-Jordan

 src/glsl/ir_validate.cpp | 13 +
 src/glsl/link_uniforms.cpp | 8 +++-
 .../drivers/dri/i965/brw_fs_channel_expressions.cpp | 2 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 8
 src/mesa/program/ir_to_mesa.cpp | 2 ++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 5 +
 9 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/src/glsl/hir_field_selection.cpp b/src/glsl/hir_field_selection.cpp
index 0fa9768..fedbef0 100644
--- a/src/glsl/hir_field_selection.cpp
+++ b/src/glsl/hir_field_selection.cpp
@@ -71,10 +71,17 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
          _mesa_glsl_error(&loc, state, "length method takes no arguments");
 
       if (op->type->is_array()) {
-         if (op->type->is_unsized_array())
-            _mesa_glsl_error(&loc, state, "length called on unsized array");
-
-         result = new(ctx) ir_constant(op->type->array_size());
+         if (op->type->is_unsized_array()) {
+            if (!state->ARB_shader_storage_buffer_object_enable) {
+               _mesa_glsl_error(&loc, state, "length called on unsized array"
+                                " only available with "
+                                "ARB_shader_storage_buffer_object");
+            }
+            /* Calculate length of an unsized array in run-time */
+            result = new(ctx) ir_expression(ir_unop_ssbo_unsized_array_length, op);
+         } else {
+            result = new(ctx) ir_constant(op->type->array_size());
+         }
       } else if (op->type->is_vector()) {
          if (state->ARB_shading_language_420pack_enable) {
             /* .length() returns int. */
diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index 390e8f3..48c91ee 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -340,6 +340,9 @@ ir_expression::ir_expression(int op, ir_rvalue *op0)
       this->type = glsl_type::get_instance(GLSL_TYPE_INT,
                                            op0->type->vector_elements, 1);
       break;
+   case ir_unop_ssbo_unsized_array_length:
+      this->type = glsl_type::int_type;
+      break;
    default:
       assert(!"not reached: missing automatic type setup for ir_expression");
@@ -471,6 +474,10 @@ ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1,
       this->type = op1->type;
       break;
+   case ir_triop_ssbo_unsized_array_length:
+      this->type = glsl_type::int_type;
+      break
Re: [Mesa-dev] [PATCH v3 (part2) 08/56] glsl: add support for unsized arrays in shader storage blocks
On Tue, 2015-08-04 at 14:08 -0700, Jordan Justen wrote:
On 2015-07-14 00:46:10, Iago Toral Quiroga wrote:
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

They only can be defined in the last position of the shader storage blocks.

When an unsized array is used in different shaders, it might be converted in different sized arrays, avoid get a linker error in that case.

v2:
- Rework error condition and error messages (Timothy Arteri)

Arceri

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/glsl/ast_array_index.cpp | 5 +-
 src/glsl/ast_to_hir.cpp | 66 ++
 src/glsl/ir.cpp | 1 +
 src/glsl/ir.h | 14 ++
 src/glsl/linker.cpp | 107 ---
 5 files changed, 155 insertions(+), 38 deletions(-)

diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
index 2c79002..8a1ae67 100644
--- a/src/glsl/ast_array_index.cpp
+++ b/src/glsl/ast_array_index.cpp
@@ -182,8 +182,9 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
       if (array->type->is_array())
          update_max_array_access(array, idx, &loc, state);
    } else if (const_index == NULL && array->type->is_array()) {
-      if (array->type->is_unsized_array()) {
-         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
+      if (array->type->is_unsized_array() &&
+          array->variable_referenced()->data.mode != ir_var_shader_storage) {
+         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
       } else if (array->type->fields.array->is_interface()
                  && array->variable_referenced()->data.mode == ir_var_uniform
                  && !state->is_version(400, 0) && !state->ARB_gpu_shader5_enable) {
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index ca30dbc..1b4ee22 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -5599,6 +5599,19 @@ private:
    bool found;
 };
 
+static bool
+is_unsized_array_last_element(ir_variable *v)
+{
+   const glsl_type *interface_type = v->get_interface_type();
+   int length = interface_type->length;
+
+   assert(v->type->is_unsized_array());
+
+   /* Check if it is the last element of the interface */
+   if (strcmp(interface_type->fields.structure[length-1].name, v->name) == 0)
+      return true;
+   return false;
+}
 
 ir_rvalue *
 ast_interface_block::hir(exec_list *instructions,
@@ -5913,6 +5926,33 @@ ast_interface_block::hir(exec_list *instructions,
       if (state->stage == MESA_SHADER_GEOMETRY && var_mode == ir_var_shader_in)
          handle_geometry_shader_input_decl(state, loc, var);
 
+      for (unsigned i = 0; i < num_variables; i++) {
+         if (fields[i].type->is_unsized_array()) {
+            if (var_mode == ir_var_shader_storage) {
+               if (i != (num_variables - 1)) {
+                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
+                                   "only last member of a shader storage block "
+                                   "can be defined as unsized array",
+                                   fields[i].name);
+               }
+            } else {
+               /* From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
+                *
+                * "If an array is declared as the last member of a shader storage
+                * block and the size is not specified at compile-time, it is
+                * sized at run-time. In all other cases, arrays are sized only
+                * at compile-time."
+                */

Where is the check for 'last member' in the es path?

Is this es check new? If so, should the es check be added in a patch before this one, and then extended for SSBO support?

The else branch here, where the if (state->es_shader) is included, is for things that are not SSBOs. What I think Samuel was trying to do here is to produce an error for any unsized array declaration in the ES path that is not inside an SSBO (if it is inside the SSBO, then it goes through the if branch and that checks that it is the last member in the definition).

If this is what he intended to do, then I agree that it would probably make sense to have the check included in a separate patch before this one since it is unrelated to SSBOs, then modify that code with this patch to add the ssbo path included in the if branch.

I'll let Samuel have a look at this when he is back from holidays, since he might have other reasons for doing it like this.
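The two branches being discussed can be summarised with a small standalone sketch. This is not the ast_to_hir.cpp code: BlockKind, Field and invalid_unsized_arrays() are hypothetical names used only to show which declarations each branch rejects, under my reading of the hunk above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (not Mesa's glsl_struct_field / ir_variable_mode).
enum class BlockKind { Uniform, ShaderStorage };

struct Field {
   std::string name;
   bool unsized_array;
};

// Return the names of the fields whose unsized array declaration would be
// rejected:
//  - inside a shader storage block, only the last member may be unsized;
//  - in any other block, an ES shader may not declare unsized arrays at all.
std::vector<std::string>
invalid_unsized_arrays(const std::vector<Field> &fields, BlockKind kind,
                       bool es_shader)
{
   std::vector<std::string> errors;
   for (std::size_t i = 0; i < fields.size(); i++) {
      if (!fields[i].unsized_array)
         continue;
      if (kind == BlockKind::ShaderStorage) {
         if (i != fields.size() - 1)
            errors.push_back(fields[i].name);
      } else if (es_shader) {
         errors.push_back(fields[i].name);
      }
   }
   return errors;
}
```

Laid out this way, the ES branch is visibly independent of SSBOs, which is the argument for splitting it into its own earlier patch.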
+               if (state->es_shader) {
+                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
+                                   "only last member of a shader storage block "
+                                   "can be defined as unsized array",
+                                   fields
Re: [Mesa-dev] [PATCH 16/17] glsl: Add link time checks for GLSL precision qualifiers
On Thu, 2015-07-30 at 12:33 +0200, Iago Toral wrote:
On Wed, 2015-07-29 at 15:21 -0700, Ian Romanick wrote:
On 07/29/2015 07:01 AM, Samuel Iglesias Gonsalvez wrote:
From: Iago Toral Quiroga ito...@igalia.com

Currently, we only consider precision qualifiers at compile-time. This patch adds precision information to ir_variable so we can also do link time checks. Specifically, from the GLSL ES3 spec, 4.5.3 Precision Qualifiers:

   "The same uniform declared in different shaders that are linked together
    must have the same precision qualification."

Notice that this patch will check the above also for GLSL ES globals that are not uniforms. This is not explicitly stated in the spec, but seems to be the only consistent choice since we can only have one definition of a global all its declarations should be identical, including precision qualifiers.

That's not right. Global variables from different stages that are not inputs/outputs or uniforms are distinct... they don't even have to be the same type. ES shaders only allow a single compilation unit per stage, so we don't have to worry about inter-stage globals.

Ugh, sorry, the commit log does not do a good job of explaining the situation. This patch does not produce a linker error for globals that are not uniforms. I only meant to say that for globals *in interface blocks*, for which we are producing a linker error in the case of type mismatches, precision will also be considered to decide if the types mismatch.

Sorry for being so imprecise in the description, I'll fix the commit log. I guess with this clarification there are no issues with this, right?

Timothy pointed out to me recently that the GLSL ES spec has this mention:

   "Precision qualifiers for outputs in one shader matched to inputs in
    another shader need not match when both shaders are linked into the
    same program. When both shaders are in separate programs, mismatched
    precision qualifiers will result in a program interface mismatch that
    will result in program pipeline validation failures, as described in
    section 7.4.1 (“Shader Interface Matching”) of the OpenGL ES 3.1
    Specification."

This makes things a bit more complicated, I guess. In any case it makes clear that this is not exactly what the ES spec expects, so I need to rethink how to approach this.

These checks don't affect desktop GLSL shaders because we ignore precision information in this case (all variables have precision GLSL_PRECISION_NONE).

Fixes the following 5 dEQP tests:
dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_1
dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_2
dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_3
dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_4
dEQP-GLES3.functional.shaders.linkage.uniform.block.precision_mismatch
---
 src/glsl/linker.cpp | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index 12b7780..fd68f43 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -958,13 +958,22 @@ cross_validate_globals(struct gl_shader_program *prog,
                if (var->type->is_record() && existing->type->is_record()
                    && existing->type->record_compare(var->type)) {
                   existing->type = var->type;
-               } else {
+               } else if (strcmp(var->type->name, existing->type->name)) {
                   linker_error(prog, "%s `%s' declared as type "
                                "`%s' and type `%s'\n",
                                mode_string(var),
                                var->name, var->type->name,
                                existing->type->name);
                   return;
+               } else {
+                  /* The global is declared with the same type name but the type
+                   * declarations mismatch (e.g. the same struct type name, but
+                   * the actual struct declarations mismatch).
+                   */
+                  linker_error(prog, "%s `%s' declared with mismatching definitions "
+                               "of type `%s'\n",
+                               mode_string(var), var->name, var->type->name);
+                  return;
                }
             }
          }
@@ -1121,6 +1130,29 @@ cross_validate_globals(struct gl_shader_program *prog,
                             mode_string(var), var->name);
                return;
             }
+            /* From the GLSL ES3 spec, 4.5.3 Precision qualifiers:
+             *
+             * The same uniform declared in different shaders that are linked
Re: [Mesa-dev] [PATCH] nir: add missing type to type_size_vec4()
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Wed, 2015-08-05 at 20:31 +1000, Timothy Arceri wrote:
Cc: Iago Toral Quiroga ito...@igalia.com
Cc: Jason Ekstrand jason.ekstr...@intel.com
---
 src/glsl/nir/nir_lower_io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c
index 3c17929..71bfd34 100644
--- a/src/glsl/nir/nir_lower_io.c
+++ b/src/glsl/nir/nir_lower_io.c
@@ -62,6 +62,8 @@ type_size_vec4(const struct glsl_type *type)
          size += type_size_vec4(glsl_get_struct_field(type, i));
       }
       return size;
+   case GLSL_TYPE_SUBROUTINE:
+      return 1;
    case GLSL_TYPE_SAMPLER:
       return 0;
    case GLSL_TYPE_ATOMIC_UINT:
Re: [Mesa-dev] [PATCH v4 (part2) 49/59] glsl: Do not allow assignments to read-only variables
On Wed, 2015-08-05 at 20:04 +1000, Timothy Arceri wrote:
On Wed, 2015-08-05 at 10:30 +0200, Iago Toral Quiroga wrote:
---
 src/glsl/ast_to_hir.cpp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index e834a46..518612d 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -811,8 +811,15 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state,
    }
 
    ir_variable *lhs_var = lhs->variable_referenced();
-   if (lhs_var)
+   if (lhs_var) {
+      if (lhs_var->data.image_read_only) {

It looks like data.read_only is always set to true for images, so wouldn't this already be caught by the existing read-only check?

   else if (lhs_var != NULL && lhs_var->data.read_only) {
      _mesa_glsl_error(&lhs_loc, state,
                       "assignment to read-only variable '%s'",
                       lhs_var->name);

Not as it is now, because with SSBOs we only set image_read_only and not read_only when the readonly qualifier is used. I suppose this is what we are expected to do since the SSBO spec says that behavior for these qualifiers on SSBOs is the same as for images:

https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt

   Modify Section 4.10, Memory Qualifiers (p. 71)
   (...)
   (insert after third paragraph, p. 73)
   The memory qualifiers "coherent", "volatile", "restrict", "readonly",
   and "writeonly" may be used in the declaration of buffer variables
   (i.e., members of shader storage blocks). When a buffer variable is
   declared with a memory qualifier, the behavior specified for memory
   accesses involving image variables described above applies identically
   to memory accesses involving that buffer variable. It is an error to
   assign to a buffer variable qualified with "readonly" or to read from a
   buffer variable qualified with "writeonly".

What is a bit confusing for me is that images seem to set image_read_only depending on whether we used the readonly qualifier or not (like ssbos), but then they also set read_only to true unconditionally. So I guess there is a difference between both fields, but I don't know what it is exactly, especially since you can also use writeonly on images, for example.

In any case, since we have both read_only and image_read_only in ir_variable at present, I think it makes sense to have checks for both of them. If one of them ends up being redundant, the right thing to do would be to kill it completely I guess, otherwise it only gets (even) more confusing.

Iago

+         _mesa_glsl_error(&lhs_loc, state,
+                          "assignment to read-only variable `%s'",
+                          lhs_var->name);
+         error_emitted = true;
+      }
       lhs_var->data.assigned = true;
+   }
 
    if (!error_emitted) {
       if (non_lvalue_description != NULL) {
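To spell out the combined check argued for above, here is a tiny sketch. VarData and assignment_is_error() are hypothetical names for illustration; the two booleans merely mirror the read_only and image_read_only fields of ir_variable::data as described in this thread, and the comments on what sets each flag are my reading of the discussion rather than a statement about Mesa's code.

```cpp
#include <cassert>

// Hypothetical mirror of the two flags discussed above.
struct VarData {
   bool read_only = false;        // set for read-only variables in general
   bool image_read_only = false;  // set by the "readonly" memory qualifier
};

// An assignment is rejected if either flag marks the variable read-only.
// For a readonly buffer variable only image_read_only is set, so checking
// read_only alone would miss it.
bool assignment_is_error(const VarData &data)
{
   return data.read_only || data.image_read_only;
}
```

If one of the two fields were later removed as redundant, this helper would collapse to a single test.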
Re: [Mesa-dev] [PATCH v3 (part2) 09/56] glsl: Add parser/compiler support for unsized array's length()
On Wed, 2015-08-05 at 13:38 +0300, Francisco Jerez wrote:
Iago Toral ito...@igalia.com writes:
On Tue, 2015-08-04 at 16:04 -0700, Jordan Justen wrote:
On 2015-08-04 15:12:06, Jordan Justen wrote:
On 2015-07-14 00:46:11, Iago Toral Quiroga wrote:
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

It also creates unop and triop expressions to tell the driver to calculate the unsized array length.

It is needed two expressions to do the calculation:

Two expressions are needed to do the calculation:

* The unop expression saves the ir_rvalue* whose length should be calculated.
* Afterwards, this unop is going to be processed by a lowering pass that will convert it to a triop that includes the block index,

line 'that' up with 'Afterwards'.

offset of the variable inside the shader storage block and the array stride.

All of them are needed for length() calculation following GL_ARB_shader_storage_buffer spec:

array.length() =
   max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/glsl/hir_field_selection.cpp | 15 +++
 src/glsl/ir.cpp | 9 +
 src/glsl/ir.h | 19 ++-

Hmm, I added an r-b below, but I had a question. Rather than IR nodes, should this be handled with intrinsics like the SSBO references?

I think that in this case it is not needed. The reason why we wanted to avoid using expressions for ssbo loads is that they were not constant (i.e. the same ssbo load could return different values at different places in the shader program). However, my understanding is that calling length() on an unsized array always returns the same value. By looking at the formula referenced above, the size of an unsized array is given by taking the available space at the tail of the buffer (which is a fixed constant value that won't change during shader execution), subtracting the offset of the unsized array into the buffer (which is also fixed) and dividing by the stride (also fixed). So in this case it looks like using expressions is exactly what we want.

I guess it would be nice to implement it as an intrinsic for consistency with the other SSBO operations, but I guess an expression node will work in practice too for now for the reasons you mention.

There's one thing that seems rather non-orthogonal to me. Is there any reason you are hardcoding the array length formula into the instruction? It seems weird to have an instruction that calculates the length of an unsized array while the back-end basically knows nothing about the individual variables part of the SSBO, because they've all been lowered to an offset+size range within a single SSBO buffer already. The same goes for the VS_OPCODE_UNSIZED_ARRAY_LENGTH back-end instruction that knows nothing about unsized arrays, it just returns the size of the whole buffer. AFAICT it would simplify both the IR and the back-end implementations if you had a simple get_ssbo_size intrinsic that would take a single SSBO index argument and return the total size of the buffer. That would help because otherwise all back-ends are going to need to re-implement basically the same formula.

Yep, that makes more sense to me as well. For that formula, the IR already knows the offset and the stride; it only needs the total size of the ssbo, and that's the only thing we should have to implement in the backends. That said, I see no reason for that to be an intrinsic. It can perfectly well be a unop expression since it returns a constant value, pretty much like UBO loads, which should be even easier than an intrinsic (mostly because lowering to an intrinsic involves creating function signatures, etc).
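A rough sketch of the split suggested above, where a back-end only reports the total buffer size and shared code applies the formula. Everything here is hypothetical: Backend, ssbo_size(), FixedSizeBackend and lower_unsized_array_length() are illustrative names, not existing Mesa interfaces, and the real lowering would of course emit IR rather than compute a value on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical back-end interface: the only SSBO-specific query each
// back-end has to implement is the total size of the bound buffer.
struct Backend {
   virtual ~Backend() = default;
   virtual int32_t ssbo_size(unsigned ssbo_index) const = 0;
};

// Shared code: offset and stride are already known once the block layout
// is fixed, so the formula lives in one place instead of every back-end.
int32_t lower_unsized_array_length(const Backend &backend,
                                   unsigned ssbo_index,
                                   int32_t offset, int32_t stride)
{
   const int32_t size = backend.ssbo_size(ssbo_index);
   return std::max<int32_t>((size - offset) / stride, 0);
}

// Toy back-end used only to exercise the sketch.
struct FixedSizeBackend : Backend {
   int32_t size;
   explicit FixedSizeBackend(int32_t s) : size(s) {}
   int32_t ssbo_size(unsigned) const override { return size; }
};
```

With a 1024-byte buffer, an array at offset 64 and a stride of 32, the shared helper yields 30 elements regardless of which back-end supplied the size.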
Iago Iago -Jordan

 src/glsl/ir_validate.cpp | 13 +
 src/glsl/link_uniforms.cpp | 8 +++-
 .../drivers/dri/i965/brw_fs_channel_expressions.cpp | 2 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 8
 src/mesa/program/ir_to_mesa.cpp | 2 ++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 5 +
 9 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/src/glsl/hir_field_selection.cpp b/src/glsl/hir_field_selection.cpp
index 0fa9768..fedbef0 100644
--- a/src/glsl/hir_field_selection.cpp
+++ b/src/glsl/hir_field_selection.cpp
@@ -71,10 +71,17 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
          _mesa_glsl_error(loc, state, "length method takes no arguments");

       if (op->type->is_array()) {
-         if (op->type->is_unsized_array
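As a rough illustration of the length() formula discussed in this thread, the calculation can be sketched as below. This is only a sketch: `unsized_array_length` and its parameter names are hypothetical and do not come from the patch. It just shows that, once the total buffer size is known, the offset and stride that the IR already has are enough to finish the computation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Mesa code. Computes array.length() for an
// unsized array at the tail of a shader storage block, following
//   max((buffer_object_size - offset_of_array) / stride_of_array, 0)
// All three inputs are fixed while the shader runs, which is why the
// result behaves like a constant expression.
static int32_t
unsized_array_length(int32_t buffer_size, int32_t array_offset,
                     int32_t array_stride)
{
   assert(array_stride > 0);
   int32_t length = (buffer_size - array_offset) / array_stride;
   return length > 0 ? length : 0;
}
```

With this split, a back-end would only have to provide the buffer size; the subtraction, division and clamp could be emitted once in common code.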
Re: [Mesa-dev] [PATCH v4 (part2) 49/59] glsl: Do not allow assignments to read-only variables
On Wed, 2015-08-05 at 22:22 +1000, Timothy Arceri wrote: On Wed, 2015-08-05 at 13:45 +0200, Iago Toral wrote: On Wed, 2015-08-05 at 20:04 +1000, Timothy Arceri wrote: On Wed, 2015-08-05 at 10:30 +0200, Iago Toral Quiroga wrote: --- src/glsl/ast_to_hir.cpp | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index e834a46..518612d 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -811,8 +811,15 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state, } ir_variable *lhs_var = lhs-variable_referenced(); - if (lhs_var) + if (lhs_var) { + if (lhs_var-data.image_read_only) { It looks like data.read_only is always set to true for images so wouldn't this already be caught already by the existing read-only check? else if (lhs_var != NULL lhs_var-data.read_only) { _mesa_glsl_error(lhs_loc, state, assignment to read-only variable '%s', lhs_var-name); Not as it is now, because with SSBOs we only set image_read_only and not read_only when the readonly qualifier is used. I suppose this is what we are expected to do since the SSBO spec says that behavior for these qualifiers on SSBOs is the same as for images: https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt Modify Section 4.10, Memory Qualifiers (p. 71) (...) (insert after third paragraph, p. 73) The memory qualifiers coherent, volatile, restrict, readonly, and writeonly may be used in the declaration of buffer variables (i.e., members of shader storage blocks). When a buffer variable is declared with a memory qualifier, the behavior specified for memory accesses involving image variables described above applies identically to memory accesses involving that buffer variable. It is an error to assign to a buffer variable qualified with readonly or to read from a buffer variable qualified with writeonly. 
What is a bit confusing for me is that images seem to set image_read_only depending on whether we used the readonly qualifier or not (like SSBOs) but then they also set read_only to true unconditionally, so I guess there is a difference between both fields,

Asking what the difference is was originally going to be my first question to you :)

but I don't know what it is exactly, especially since you can also use writeonly on images, for example.

So I really don't know much about images, but after some reading the conclusion I've come to is that the qualifiers (image_read_only) are meant to limit how you can use imageStore(), imageLoad(), imageAtomic*() etc.

Looking at ARB_shader_image_load_store that seems consistent... In that case I imagine that we could just set read_only for buffer variables with the readonly qualifier instead of image_read_only and drop this patch. We will need to add, at least, write_only to ir_variable as well I guess... I imagine that the 3 other fields (image_coherent, image_restrict, image_volatile) do not have image-specific semantics like image_read_only and image_write_only and can be shared with SSBOs, so we do not have to replicate them in ir_variable as well (in that case we might want to rename them so it is clear that image_read_only and image_write_only really are special and specific to images). Curro, what do you think?

On the other hand, read_only is the usual uniform restriction stopping you from assigning to the variable directly, e.g. myImage = 1;, which is why it's always set to true. If I'm correct I don't think this patch is needed.

In any case, since we have both read_only and image_read_only in ir_variable at present, I think it makes sense to have checks for both of them. If one of them ends up being redundant, the right thing to do would be to kill it completely I guess, otherwise it only gets (even) more confusing.

Iago

+         _mesa_glsl_error(lhs_loc, state,
+                          "assignment to read-only variable `%s'",
+                          lhs_var->name);
+         error_emitted = true;
+      }
       lhs_var->data.assigned = true;
+   }

    if (!error_emitted) {
       if (non_lvalue_description != NULL) {
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 05/23] glsl: Fail to link if inter-stage input/outputs are not assigned to stream 0
On Wed, 2015-07-29 at 21:58 +0200, Marek Olšák wrote:

Hi, Where does the spec say we should fail to link? I don't see such a statement there.

I have reviewed ARB_gpu_shader5 and I don't see any specific mention of what should be done in this particular case. That said, isn't this the logical thing to do? It is a programming error to link an FS input to a GS output bound to a non-zero stream, and at best the program would have undefined behavior if the FS input is used. Hiding this from the developer silently does not seem to be a good idea in any case: whatever the developer was trying to accomplish, he is doing it wrong.

It looks like varyings with a non-zero stream should not be linked with the fragment shader.

How is this better? FWIW, the proprietary nVidia driver also fails to link in this case with this error: output 'var_name' is associated with an input with a non-zero stream, which is not allowed

Iago

Marek

On Wed, Jun 18, 2014 at 11:51 AM, Iago Toral Quiroga ito...@igalia.com wrote:

Outputs that are linked to inputs in the next stage must be output to stream 0, otherwise we should fail to link.
---
 src/glsl/link_varyings.cpp | 8
 1 file changed, 8 insertions(+)

diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
index 9725a43..3b20594 100644
--- a/src/glsl/link_varyings.cpp
+++ b/src/glsl/link_varyings.cpp
@@ -1345,6 +1345,14 @@ assign_varying_locations(struct gl_context *ctx,
             if (input_var || (prog->SeparateShader && consumer == NULL)) {
                matches.record(output_var, input_var);
             }
+
+            /* Only stream 0 outputs can be consumed in the next stage */
+            if (input_var && output_var->data.stream != 0) {
+               linker_error(prog, "output %s is assigned to stream=%d but "
+                            "is linked to an input, which requires stream=0",
+                            output_var->name, output_var->data.stream);
+               return false;
+            }
          }
       } else {
          /* If there's no producer stage, then this must be a separable program.
Re: [Mesa-dev] [PATCH v2 05/23] glsl: Fail to link if inter-stage input/outputs are not assigned to stream 0
On Thu, 2015-07-30 at 09:43 +0200, Marek Olšák wrote: On Thu, Jul 30, 2015 at 8:49 AM, Iago Toral ito...@igalia.com wrote: On Wed, 2015-07-29 at 21:58 +0200, Marek Olšák wrote:

Hi, Where does the spec say we should fail to link? I don't see such a statement there.

I have reviewed ARB_gpu_shader5 and I don't see any specific mention of what should be done in this particular case. That said, isn't this the logical thing to do? It is a programming error to link an FS input to a GS output bound to a non-zero stream, and at best the program would have undefined behavior if the FS input is used. Hiding this from the developer silently does not seem to be a good idea in any case: whatever the developer was trying to accomplish, he is doing it wrong.

It looks like varyings with a non-zero stream should not be linked with the fragment shader.

How is this better?

The spec simply says that varyings with a non-zero stream are not passed to the rasterizer, which means the matching fragment shader inputs should be uninitialized, but it should still be possible to capture the varyings with transform feedback. That's how I understand the spec.

FWIW, I have tested this in the proprietary nVidia driver and the result is the same: it fails to link even if that GS output is captured by TF. My interpretation of the spec is that since GS outputs to non-zero streams are not passed down the pipeline, they simply do not exist in the eyes of the FS. That is, I see this situation as the same as the one in which we declare an input in the FS that is not declared as an output in the GS. But it is true that the spec does not address this situation explicitly, so I think both interpretations could be valid. I still think that failing to link is better though. If we report a link failure the developer knows what is going on and the fix is trivial; otherwise they will run into incorrect rendering, they will have to figure out what is going on and eventually fix the code anyway...
Since at least nVidia proprietary is failing to link as well in these scenarios I guess our chances of running into shaders that we fail to link for this reason and were expected to link properly are pretty small too. Iago
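For readers following the argument, the proposed link-time rule can be sketched outside of Mesa as below. The `varying` struct and the `check_linked_output_streams` function are simplified, hypothetical stand-ins (the real check operates on ir_variable inside assign_varying_locations); only the error wording is taken from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a producer-stage output; not the real ir_variable.
struct varying {
   std::string name;
   unsigned stream;   // geometry shader stream the output is assigned to
};

// Returns false and fills `error` if any output that is matched by name to
// an input of the next stage is assigned to a stream other than 0. Outputs
// on non-zero streams that nothing consumes are left alone, so they can
// still be captured by transform feedback.
static bool
check_linked_output_streams(const std::vector<varying> &outputs,
                            const std::vector<std::string> &consumer_inputs,
                            std::string &error)
{
   for (const varying &out : outputs) {
      bool consumed = false;
      for (const std::string &in : consumer_inputs) {
         if (in == out.name)
            consumed = true;
      }

      if (consumed && out.stream != 0) {
         error = "output " + out.name + " is assigned to stream=" +
                 std::to_string(out.stream) +
                 " but is linked to an input, which requires stream=0";
         return false;
      }
   }
   return true;
}
```

Under the alternative interpretation discussed above, the same situation would link successfully and the fragment shader input would simply be undefined.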
Re: [Mesa-dev] [PATCH 15/17] glsl: Add precision information to ir_variable
On Wed, 2015-07-29 at 15:16 -0700, Ian Romanick wrote: On 07/29/2015 07:01 AM, Samuel Iglesias Gonsalvez wrote: From: Iago Toral Quiroga ito...@igalia.com We will need this later on when we implement proper support for precision qualifiers in the drivers and also to do link time checks for uniforms as indicated by the spec. This patch also adds compile-time checks for variables without precision information (currently, Mesa only checks that a default precision is set for floats in fragment shaders). As indicated by Ian, the addition of the precision information to ir_variable has been done using a bitfield and pahole to identify an available hole so that memory requirements for ir_variable stay the same. --- src/glsl/ast_to_hir.cpp | 316 +++- src/glsl/glsl_types.cpp | 4 + src/glsl/glsl_types.h | 12 ++ src/glsl/ir.h | 13 ++ 4 files changed, 288 insertions(+), 57 deletions(-) diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 789b2bc..8b170c2 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -1993,6 +1993,41 @@ process_array_type(YYLTYPE *loc, const glsl_type *base, return array_type; } +static bool +precision_qualifier_allowed(const glsl_type *type) This function is just moved up from below? I would have been tempted to put that in a separate patch to make it more obvious that there no changes. *shrug* +{ + /* Precision qualifiers apply to floating point, integer and sampler +* types. +* +* Section 4.5.2 (Precision Qualifiers) of the GLSL 1.30 spec says: +*Any floating point or any integer declaration can have the type +*preceded by one of these precision qualifiers [...] Literal +*constants do not have precision qualifiers. Neither do Boolean +*variables. +* +* Section 4.5 (Precision and Precision Qualifiers) of the GLSL 1.30 +* spec also says: +* +* Precision qualifiers are added for code portability with OpenGL +* ES, not for functionality. They have the same syntax as in OpenGL +* ES. 
+* +* Section 8 (Built-In Functions) of the GLSL ES 1.00 spec says: +* +* uniform lowp sampler2D sampler; +* highp vec2 coord; +* ... +* lowp vec4 col = texture2D (sampler, coord); +*// texture2D returns lowp +* +* From this, we infer that GLSL 1.30 (and later) should allow precision +* qualifiers on sampler types just like float and integer types. +*/ + return type-is_float() + || type-is_integer() + || type-is_record() + || type-is_sampler(); +} const glsl_type * ast_type_specifier::glsl_type(const char **name, @@ -2009,31 +2044,172 @@ ast_type_specifier::glsl_type(const char **name, return type; } +/** + * From the OpenGL ES 3.0 spec, 4.5.4 Default Precision Qualifiers: + * + * The precision statement + * + *precision precision-qualifier type; + * + * can be used to establish a default precision qualifier. The type field can + * be either int or float or any of the sampler types, (...) If type is float, + * the directive applies to non-precision-qualified floating point type + * (scalar, vector, and matrix) declarations. If type is int, the directive + * applies to all non-precision-qualified integer type (scalar, vector, signed, + * and unsigned) declarations. + * + * We use the symbol table to keep the values of the default precisions for + * each 'type' in each scope and we use the 'type' string from the precision + * statement as key in the symbol table. When we want to retrieve the default + * precision associated with a given glsl_type we need to know the type string + * associated with it. This is what this function returns. 
+ */ +static const char * +get_type_name_for_precision_qualifier(const glsl_type *type) +{ + switch (type-base_type) { + case GLSL_TYPE_FLOAT: + return float; + case GLSL_TYPE_UINT: + case GLSL_TYPE_INT: + return int; + case GLSL_TYPE_SAMPLER: { + bool array = type-sampler_array; + bool shadow = type-sampler_shadow; + switch (type-sampler_type) { + case GLSL_TYPE_FLOAT: + switch (type-sampler_dimensionality) { + case GLSL_SAMPLER_DIM_1D: +if (!array !shadow) + return sampler1D; +if (array !shadow) + return sampler1DArray; +if (!array shadow) + return sampler1DShadow; +return sampler1DArrayShadow; + case GLSL_SAMPLER_DIM_2D: +if (!array !shadow) + return sampler2D; +if (array !shadow
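The comment in the patch describes keeping default precisions in the symbol table, keyed by the type string from the precision statement and resolved per scope. A minimal sketch of that lookup idea follows. It is not the Mesa symbol table: `default_precision_table` and the `precision` enum are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Illustrative precision values; the real enum lives in Mesa's GLSL headers.
enum precision { PRECISION_NONE, PRECISION_HIGH, PRECISION_MEDIUM, PRECISION_LOW };

// Hypothetical scoped table: each scope maps a type string such as "float",
// "int" or "sampler2D" to the default precision declared in that scope.
class default_precision_table {
public:
   default_precision_table() { scopes.emplace_back(); }

   void push_scope() { scopes.emplace_back(); }
   void pop_scope() { scopes.pop_back(); }

   // Records the effect of "precision <qualifier> <type>;" in the current scope.
   void set(const std::string &type_name, precision p)
   {
      scopes.back()[type_name] = p;
   }

   // Searches from the innermost scope outwards; PRECISION_NONE means no
   // default precision is in effect for this type.
   precision get(const std::string &type_name) const
   {
      for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
         auto found = it->find(type_name);
         if (found != it->end())
            return found->second;
      }
      return PRECISION_NONE;
   }

private:
   std::vector<std::map<std::string, precision>> scopes;
};
```

A function like get_type_name_for_precision_qualifier() would supply the key, so that uint and int both resolve to the "int" entry.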
Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written
On Fri, 2015-07-31 at 13:12 +0300, Francisco Jerez wrote: Iago Toral ito...@igalia.com writes: On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote: Iago Toral Quiroga ito...@igalia.com writes: When we have code such as this: mov vgrf1.0.x:F, vgrf2.:F mov vgrf3.0.x:F, vgrf1.:F ... mov vgrf3.0.x:F, vgrf1.:F And vgrf1 is chosen for spilling, we can emit this: mov vgrf1.0.x:F, vgrf2.:F gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D mov vgrf3.0.x:F, vgrf1.:F ... gen4_scratch_read vgrf4.0.x:F, 22D mov vgrf3.0.x:F, vgrf4.:F Instead of this: mov vgrf1.0.x:F, vgrf2.:F gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D gen4_scratch_read vgrf4.0.x:F, 22D mov vgrf3.0.x:F, vgrf4.:F ... gen4_scratch_read vgrf5.0.x:F, 22D mov vgrf3.0.x:F, vgrf5.:F And save one scratch read while still preserving the benefits of spilling the register. In general, we avoid emitting scratch reads for as long as the next instruction keeps reading the spilled register. This should not harm the benefit of spilling the register because gains for register allocation only come when we have chunks of program code where the register is alive but not really used (because these are the points where we could effectively use that register for another purpose if we spilled it), so as long as consecutive instructions use that register we can avoid the scratch reads without losing anything. --- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 +- 1 file changed, 36 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp index cff5406..fd56dae 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr) unsigned int spill_offset = last_scratch++; /* Generate spill/unspill instructions for the objects being spilled. 
*/ + vec4_instruction *spill_write_inst = NULL; foreach_block_and_inst(block, vec4_instruction, inst, cfg) { + /* We don't spill registers used for scratch */ + if (inst-opcode == SHADER_OPCODE_GEN4_SCRATCH_READ || + inst-opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) + continue; + int scratch_reg = -1; + bool spill_reg_was_read = false; for (unsigned int i = 0; i 3; i++) { if (inst-src[i].file == GRF inst-src[i].reg == spill_reg_nr) { -if (scratch_reg == -1) { +if (!spill_reg_was_read) { + spill_reg_was_read = (!inst-predicate || + inst-opcode == BRW_OPCODE_SEL); +} + +/* If we are reading the spilled register right after writing + * to it we can skip the scratch read and use directly the + * register we used as source for the scratch write. For this + * to work we must check that: + * + * 1) The write is inconditional, that is, it is not predicated or + *it is a SEL. + * 2) All the channels that we read have been written in that + *last write instruction. + * + * We keep doing this for as long as the next instruction + * keeps reading the spilled register and break as soon as we + * find an instruction that doesn't. + */ +if (spill_write_inst +(!spill_write_inst-predicate || + spill_write_inst-opcode == BRW_OPCODE_SEL) +((brw_mask_for_swizzle(inst-src[i].swizzle) + ~spill_write_inst-dst.writemask) == 0)) { + scratch_reg = spill_write_inst-dst.reg; +} else if (scratch_reg == -1) { One suggestion: You could factor out the rather complex caching logic into a separate function (e.g. 'bool can_reuse_scratch_for_source(const vec4_instruction *, unsigned i, unsigned scratch_reg)'). The function would simply compare scratch_reg with the sources of the current instruction (up to src) and the sources and destination of the previous non-scratch_read/write instruction. If there's a match it would check that the regioning is compatible with the i-th source and return true in that case. This would have several benefits: I think this might need to be a bit more complex. 
The previous inst's src[i] might read only a subset of the channels that were loaded into scratch_reg so comparing only against that can lead us to think that we can't reuse scratch_reg when in fact we can. I think the process
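The two conditions listed in the patch comment (the write is unconditional, and every channel being read was covered by the write) boil down to a mask test. The sketch below shows that test in isolation. `mask_for_swizzle` is a simplified stand-in for brw_mask_for_swizzle() that takes four channel indices instead of a packed swizzle, and `can_reuse_written_reg` is a hypothetical name.

```cpp
#include <cassert>

// Writemask bits, one per vec4 channel (x = bit 0 ... w = bit 3).
enum { WRITEMASK_X = 1, WRITEMASK_Y = 2, WRITEMASK_Z = 4, WRITEMASK_W = 8 };

// Simplified stand-in for brw_mask_for_swizzle(): the swizzle is given as
// four channel indices (0..3) and the result is the set of channels read.
static unsigned
mask_for_swizzle(const unsigned swz[4])
{
   unsigned mask = 0;
   for (int c = 0; c < 4; c++)
      mask |= 1u << swz[c];
   return mask;
}

// A source can reuse the register that was just written (skipping the
// scratch read) only if the write was unconditional and no channel that the
// source reads falls outside the write's writemask.
static bool
can_reuse_written_reg(const unsigned read_swz[4], unsigned writemask,
                      bool write_is_unconditional)
{
   return write_is_unconditional &&
          (mask_for_swizzle(read_swz) & ~writemask) == 0;
}
```

For example, a .xxxx read after a write to .x passes the test, while a .xyzw read after the same write does not and still needs the scratch read.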
Re: [Mesa-dev] [PATCH 16/17] glsl: Add link time checks for GLSL precision qualifiers
On Wed, 2015-07-29 at 15:21 -0700, Ian Romanick wrote: On 07/29/2015 07:01 AM, Samuel Iglesias Gonsalvez wrote: From: Iago Toral Quiroga ito...@igalia.com

Currently, we only consider precision qualifiers at compile-time. This patch adds precision information to ir_variable so we can also do link time checks. Specifically, from the GLSL ES3 spec, 4.5.3 Precision Qualifiers: "The same uniform declared in different shaders that are linked together must have the same precision qualification."

Notice that this patch will check the above also for GLSL ES globals that are not uniforms. This is not explicitly stated in the spec, but seems to be the only consistent choice: since we can only have one definition of a global, all its declarations should be identical, including precision qualifiers.

That's not right. Global variables from different stages that are not inputs/outputs or uniforms are distinct... they don't even have to be the same type. ES shaders only allow a single compilation unit per stage, so we don't have to worry about inter-stage globals.

Ugh, sorry, the commit log does not do a good job of explaining the situation. This patch does not produce a linker error for globals that are not uniforms. I only meant to say that for globals *in interface blocks*, for which we are producing a linker error in the case of type mismatches, precision will also be considered to decide if the types mismatch. Sorry for being so imprecise in the description, I'll fix the commit log. I guess with this clarification there are no issues with this, right?

Iago

These checks don't affect desktop GLSL shaders because we ignore precision information in this case (all variables have precision GLSL_PRECISION_NONE).
Fixes the following 5 dEQP tests: dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_1 dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_2 dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_3 dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_4 dEQP-GLES3.functional.shaders.linkage.uniform.block.precision_mismatch --- src/glsl/linker.cpp | 34 +- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp index 12b7780..fd68f43 100644 --- a/src/glsl/linker.cpp +++ b/src/glsl/linker.cpp @@ -958,13 +958,22 @@ cross_validate_globals(struct gl_shader_program *prog, if (var-type-is_record() existing-type-is_record() existing-type-record_compare(var-type)) { existing-type = var-type; - } else { + } else if (strcmp(var-type-name, existing-type-name)) { linker_error(prog, %s `%s' declared as type `%s' and type `%s'\n, mode_string(var), var-name, var-type-name, existing-type-name); return; + } else { + /* The global is declared with the same type name but the type + * declarations mismatch (e.g. the same struct type name, but + * the actual struct declarations mismatch). + */ + linker_error(prog, %s `%s' declared with mismatching definitions + of type `%s'\n, + mode_string(var), var-name, var-type-name); + return; } } } @@ -1121,6 +1130,29 @@ cross_validate_globals(struct gl_shader_program *prog, mode_string(var), var-name); return; } +/* From the GLSL ES3 spec, 4.5.3 Precision qualifiers: + * + * The same uniform declared in different shaders that are linked + * together must have the same precision qualification. + * + * In the GLSL ES2 spec this was resolved in the issue amendments + * (10.3 Precision Qualifiers). The GLSL ES1 spec overlooked this, + * but seems like an obvious error since we can only have one + * consistent definition of a global. 
+ * + * The desktop GLSL spec does not include this reference + * because precision qualifiers are ignored. We will never + * hit this scenario in desktop GLSL though because we always set + * the precision of variables to GLSL_PRECISION_NONE. + */ +if (var-data.mode == ir_var_uniform) { + if (existing-data.precision != var-data.precision
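The uniform rule quoted from the GLSL ES3 spec can be illustrated with a small stand-alone check. This is a sketch under stated assumptions: `uniform_decl`, `cross_validate_uniform_precision` and the error text are hypothetical and do not reproduce the linker's actual data structures or messages.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative precision values; desktop GLSL would use PRECISION_NONE for
// every variable, so the check below can never fail there.
enum precision { PRECISION_NONE, PRECISION_HIGH, PRECISION_MEDIUM, PRECISION_LOW };

// Simplified stand-in for a uniform declaration seen in one shader stage.
struct uniform_decl {
   std::string name;
   precision prec;
};

// Remembers the first declaration of each uniform in `seen` and rejects any
// later declaration of the same name with a different precision.
static bool
cross_validate_uniform_precision(std::map<std::string, precision> &seen,
                                 const uniform_decl &var, std::string &error)
{
   auto it = seen.find(var.name);
   if (it == seen.end()) {
      seen[var.name] = var.prec;
      return true;
   }

   if (it->second != var.prec) {
      error = "uniform `" + var.name +
              "' declared with mismatching precision qualifiers";
      return false;
   }
   return true;
}
```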
Re: [Mesa-dev] [PATCH] mesa/formats: Fix swizzle flipping for big-endian targets
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On 2015-08-11 14:25, Oded Gabbay wrote: On Mon, Aug 10, 2015 at 9:50 AM, Jason Ekstrand ja...@jlekstrand.net wrote:

The swizzle defines where in the format you should look for any given channel. When we flip the format around for BE targets, we need to change the destinations of the swizzles, not the sources. For example, say the format is an RGBX format with a swizzle of xyz1 on LE. Then it should be wzy1 on BE; however, the code as it was before would have made it 1zyx on BE, which is clearly wrong.

Cc: Iago Toral ito...@igalia.com
Cc: Oded Gabbay oded.gab...@gmail.com
---
 src/mesa/main/formats.c | 16
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index d927073..27590ed 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -354,14 +354,22 @@ mi_mesa_array_format_flip_channels(mesa_array_format format)
       return format;

    if (num_channels == 2) {
-      _mesa_array_format_set_swizzle(format, swizzle[1], swizzle[0],
-                                     swizzle[2], swizzle[3]);
+      /* Assert that the swizzle makes sense for 2 channels */
+      for (unsigned i = 0; i < 4; i++)
+         assert(swizzle[i] != 2 && swizzle[i] != 3);
+
+      static const uint8_t flip_xy[6] = { 1, 0, 2, 3, 4, 5 };
+      _mesa_array_format_set_swizzle(format,
+                                     flip_xy[swizzle[0]], flip_xy[swizzle[1]],
+                                     flip_xy[swizzle[2]], flip_xy[swizzle[3]]);
       return format;
    }

    if (num_channels == 4) {
-      _mesa_array_format_set_swizzle(format, swizzle[3], swizzle[2],
-                                     swizzle[1], swizzle[0]);
+      static const uint8_t flip[6] = { 3, 2, 1, 0, 4, 5 };
+      _mesa_array_format_set_swizzle(format,
+                                     flip[swizzle[0]], flip[swizzle[1]],
+                                     flip[swizzle[2]], flip[swizzle[3]]);
       return format;
    }
--:-) 2.4.3

Reviewed-by: Oded Gabbay oded.gab...@gmail.com
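The xyz1 example from the commit message can be worked through with a small stand-alone sketch. It assumes the swizzle encoding implied by the patch's lookup table (0..3 for x, y, z, w; 4 for constant zero; 5 for constant one). The function names are hypothetical, and the second function only mimics the old behaviour for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Swizzle encoding assumed here: 0..3 select x, y, z, w; 4 is constant zero
// and 5 is constant one.
enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

// New behaviour: keep each swizzle entry in place and remap the channel it
// points at (x<->w, y<->z). The constants 0 and 1 are left untouched.
static void
flip_swizzle_4ch(const uint8_t in[4], uint8_t out[4])
{
   static const uint8_t flip[6] = { 3, 2, 1, 0, 4, 5 };
   for (int i = 0; i < 4; i++)
      out[i] = flip[in[i]];
}

// Old behaviour, shown for comparison: reverse the order of the entries,
// which also moves the constants around.
static void
reverse_swizzle_4ch(const uint8_t in[4], uint8_t out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = in[3 - i];
}
```

Feeding xyz1 through the first function gives wzy1, while the second gives 1zyx, matching the two results described in the commit message.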
Re: [Mesa-dev] [PATCH] mesa/formats: Only do byteswapping for packed formats
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On 2015-08-08 18:04, Jason Ekstrand wrote:

Cc: Iago Toral ito...@igalia.com
Cc: Oded Gabbay oded.gab...@gmail.com
---
 src/mesa/main/formats.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index baeb1bf..d927073 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -372,10 +372,10 @@ uint32_t
 _mesa_format_to_array_format(mesa_format format)
 {
    const struct gl_format_info *info = _mesa_get_format_info(format);
-   if (_mesa_little_endian())
-      return info->ArrayFormat;
-   else
+   if (!_mesa_little_endian() && info->Layout == MESA_FORMAT_LAYOUT_PACKED)
       return _mesa_array_format_flip_channels(info->ArrayFormat);
+   else
+      return info->ArrayFormat;
 }

 static struct hash_table *format_array_format_table;
Re: [Mesa-dev] [PATCH v4 (part2) 29/59] nir: Implement __intrinsic_store_ssbo
On Thu, 2015-08-06 at 11:06 -0700, Connor Abbott wrote: On Thu, Aug 6, 2015 at 12:30 AM, Iago Toral ito...@igalia.com wrote: On Wed, 2015-08-05 at 12:17 -0700, Connor Abbott wrote: On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote: --- src/glsl/nir/glsl_to_nir.cpp | 36 src/glsl/nir/nir_intrinsics.h | 12 ++-- 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index 642affd..cbec2df 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -641,6 +641,8 @@ nir_visitor::visit(ir_call *ir) op = nir_intrinsic_image_atomic_comp_swap; } else if (strcmp(ir-callee_name(), __intrinsic_memory_barrier) == 0) { op = nir_intrinsic_memory_barrier; + } else if (strcmp(ir-callee_name(), __intrinsic_store_ssbo) == 0) { + op = nir_intrinsic_store_ssbo; } else { unreachable(not reached); } @@ -730,6 +732,40 @@ nir_visitor::visit(ir_call *ir) } case nir_intrinsic_memory_barrier: break; + case nir_intrinsic_store_ssbo: { + exec_node *param = ir-actual_parameters.get_head(); + ir_rvalue *block = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_rvalue *offset = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_rvalue *val = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_constant *write_mask = ((ir_instruction *)param)-as_constant(); + assert(write_mask); + + /* Check if we need the indirect version */ + ir_constant *const_offset = offset-as_constant(); + if (!const_offset) { +op = nir_intrinsic_store_ssbo_indirect; +ralloc_free(instr); +instr = nir_intrinsic_instr_create(shader, op); +instr-src[2] = evaluate_rvalue(offset); +instr-const_index[0] = 0; + } else { +instr-const_index[0] = const_offset-value.u[0]; + } + + instr-const_index[1] = write_mask-value.u[0]; + + instr-src[0] = evaluate_rvalue(val); + instr-num_components = val-type-vector_elements; + + instr-src[1] = evaluate_rvalue(block); + 
break; + } default: unreachable(not reached); } diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index f264f55..83eeecd 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -176,12 +176,12 @@ LOAD(input, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) * offset. */ -#define STORE(name, num_indices, flags) \ - INTRINSIC(store_##name, 1, ARR(0), false, 0, 0, num_indices, flags) \ - INTRINSIC(store_##name##_indirect, 2, ARR(0, 1), false, 0, 0, \ +#define STORE(name, extra_srcs, num_indices, flags) \ + INTRINSIC(store_##name, extra_srcs, ARR(0, 1), false, 0, 0, num_indices, flags) \ + INTRINSIC(store_##name##_indirect, extra_srcs + 1, ARR(0, 1, 1), false, 0, 0, \ num_indices, flags) \ -STORE(output, 1, 0) -/* STORE(ssbo, 2, 0) */ +STORE(output, 1, 2, 0) +STORE(ssbo, 2, 2, 0) I don't think outputs should have any extra sources, since they only take a constant index, plus possibly an indirect source that's already covered by the STORE macro. SSBO stores should only have one extra source for the block index. Also, we should update the comment above to explain this similarly to the paragraph above the loads. SSBO stores need an extra source for the block index and an extra index for a writemask. I'll leave the STORE() macro as it was and just define SSBO stores using INTRINSIC() directly then. Ok, I see. I don't think you need a separate INTRINSIC(), but right now calling the parameter you added extra_srcs is confusing, since you're counting the value to be stored, which isn't really extra at all -- every store should have one! 
How about instead, we change the STORE macro to have:

- An extra_srcs parameter that contains only sources that are actually extra, not counting the value to be stored -- direct stores have extra_srcs + 1 sources, and indirect stores have extra_srcs + 2 sources
- An extra_indices parameter that contains the extra indices, and replace num_indices with extra_indices + 1

Then normal stores have both set to 0, and SSBO stores have both set to 1 to indicate the extra block index and writemask.

Sure, sounds good to me.

Iago

-LAST_INTRINSIC(store_output_indirect)
+LAST_INTRINSIC(store_ssbo_indirect)
--
1.9.1
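The counting rule proposed for the STORE macro can be written down as a tiny stand-alone function to check the arithmetic. This is only an illustration of the proposal, not the macro itself; `store_shape` and `store_intrinsic_shape` are hypothetical names.

```cpp
#include <cassert>

// Number of sources and constant indices an intrinsic would be declared with.
struct store_shape {
   unsigned num_srcs;
   unsigned num_indices;
};

// Every store has one source for the value being stored and one constant
// index for the base offset. Indirect variants add one source for the
// dynamic offset. extra_srcs and extra_indices count only what is truly
// extra (for SSBOs: the block index source and the writemask index).
static store_shape
store_intrinsic_shape(unsigned extra_srcs, unsigned extra_indices,
                      bool indirect)
{
   store_shape s;
   s.num_srcs = extra_srcs + 1 + (indirect ? 1 : 0);
   s.num_indices = extra_indices + 1;
   return s;
}
```

With both parameters set to 0 this gives 1 source for a direct output store and 2 for the indirect one. With both set to 1 it gives 2 and 3 sources plus 2 indices, which lines up with src[0..2] and const_index[0..1] as used for store_ssbo_indirect in the glsl_to_nir.cpp hunk.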
Re: [Mesa-dev] [PATCH v3 4/4] i965/vec4: Don't unspill the same register in consecutive instructions
On Thu, 2015-08-06 at 18:27 +0300, Francisco Jerez wrote:
> Iago Toral Quiroga ito...@igalia.com writes:
>
> > If we have spilled/unspilled a register in the current instruction, avoid
> > emitting unspills for the same register in the same instruction or
> > consecutive instructions following the current one as long as they keep
> > reading the spilled register. This should allow us to avoid emitting
> > costly unspills that come with little benefit to register allocation.
> >
> > Also, update evaluate_spill_costs so that we account for the saved unspills.
> > ---
> >  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 129 +++--
> >  1 file changed, 121 insertions(+), 8 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > index 617c988..fed5f4d 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> > @@ -264,6 +264,95 @@ vec4_visitor::reg_allocate()
> >     return true;
> >  }
> >
> > +/**
> > + * When we decide to spill a register, instead of blindly spilling every use,
> > + * save unspills when the spill register is used (read) in consecutive
> > + * instructions. This can potentially save a bunch of unspills that would
> > + * have very little impact in register allocation anyway.
> > + *
> > + * Notice that we need to account for this behavior when spilling a register
> > + * and when evaluating spilling costs. This function is designed so it can
> > + * be called from both places and avoid repeating the logic.
> > + *
> > + *  - When we call this function from spill_reg, we pass in scratch_reg the
> > + *    actual unspill/spill register that we want to reuse in the current
> > + *    instruction.
> > + *
> > + *  - When we call this from evaluate_spill_costs, we pass the register for
> > + *    which we are evaluating spilling costs.
> > + *
> > + * In either case, we check if the previous instructions read scratch_reg until
> > + * we find an instruction that writes to it (in which case we can reuse
> > + * scratch_reg as long as the writemask is compatible with the channels we need
> > + * to read in the current instruction) or we hit an instruction that does not
> > + * read scratch_reg at all. The latter can only happen when we call this from
> > + * evaluate_spill_costs,
>
> Strictly speaking it can also happen when called from spill_reg() for
> the first time in a given sequence of consecutive instructions (in which
> case you correctly return false).

True, I'll fix the comment.

> > and means that this is the point at which we first
> > + * need to unspill this register for our current instruction. Since all our
> > + * unspills read a full vec4, we know that in this case we will have all
> > + * the channels available in scratch_reg and we can reuse it.
> > + *
> > + * In any other case, we can't reuse scratch_reg in the current instruction,
> > + * meaning that we will need to unspill it.
> > + */
> > +static bool
> > +can_use_scratch_for_source(const vec4_instruction *inst, unsigned i,
> > +                           unsigned scratch_reg)
> > +{
> > +   assert(inst->src[i].file == GRF);
> > +
> > +   /* If the current instruction is already using scratch_reg in src[n] with
> > +    * n < i, then we know we can reuse it for src[i] too.
> > +    */
> > +   for (unsigned n = 0; n < i; n++) {
> > +      if (inst->src[n].file == GRF && inst->src[n].reg == scratch_reg)
> > +         return true;
> > +   }
>
> I don't think this is correct in cases where the previous source reused
> the temporary of a previously spilled register with incompatible
> writemask. You probably want to handle the current instruction
> consistently with the previous ones, i.e. as part of the loop below.

Yes, you're right.

> I suggest you define a variable (e.g. n as you've called it) initially
> equal to i that would determine the number of sources to check for the
> next instruction. At the end of the loop body it would be re-set to 3,
> which would also cause the destination registers to be checked in
> subsequent iterations.

Sounds good to me.

> > +
> > +   bool prev_inst_read_scratch_reg = false;
> > +   vec4_instruction *prev_inst = (vec4_instruction *) inst->prev;
>
> You can move this declaration into the init statement of the for loop to
> limit its scope.

Ok.

> > +   for (; !prev_inst->is_head_sentinel();
> > +        prev_inst = (vec4_instruction *) prev_inst->prev) {
> > +      /* If any previous instruction does not read from or write to scratch_reg
> > +       * unconditionally we cannot reuse scratch_reg
> > +       */
> > +      if (prev_inst->predicate && prev_inst->opcode != BRW_OPCODE_SEL)
> > +         return false;
>
> I think this is somewhat pessimistic, register fills for a predicated
> instruction won't be predicated AFAIK, so it should be possible to reuse
> them, only the destination of a predicated write cannot be reused.

Yeah, makes sense.

> > +
> > +      /* If the previous instruction writes to scratch_reg then we can reuse
Re: [Mesa-dev] [PATCH v4 (part2) 45/59] glsl: fix UNIFORM_BUFFER_START or UNIFORM_BUFFER_SIZE query when no buffer object is bound
On Thu, 2015-08-06 at 08:53 +0300, Tapani Pälli wrote:
> Reviewed-by: Tapani Pälli tapani.pa...@intel.com
>
> On 08/05/2015 11:30 AM, Iago Toral Quiroga wrote:
> > From: Samuel Iglesias Gonsalvez sigles...@igalia.com
> >
> > According to ARB_uniform_buffer_object spec:
> >
> > "If the parameter (starting offset or size) was not specified when the
> > buffer object was bound (e.g. if bound with BindBufferBase), or if no
> > buffer object is bound to index, zero is returned."
>
> BTW it seems that this was changed from OpenGL 4.1 to 4.2 (?) for
> UNIFORM_BUFFER_SIZE. The 4.1 core spec still says:
>
> "If the parameter (starting offset or size) was not specified when the
> buffer object was bound, zero is returned. If no buffer object is bound
> to index, -1 is returned"
>
> while 4.2 changed the "If no buffer object is bound to index" value to 0.
> GLES 3.1 also says 0.

Ha, good catch... It seems that the text in the OpenGL 4.2 spec is the one that persisted; at least I see it in the OpenGL 4.5 spec as well. I guess it is not worth special-casing this only for OpenGL 4.1... Ian, what do you think?

> > Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
> > ---
> >  src/mesa/main/get.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> > index e2bfb5f..b89c49f 100644
> > --- a/src/mesa/main/get.c
> > +++ b/src/mesa/main/get.c
> > @@ -1925,7 +1925,8 @@ find_value_indexed(const char *func, GLenum pname, GLuint index, union value *v)
> >           goto invalid_value;
> >        if (!ctx->Extensions.ARB_uniform_buffer_object)
> >           goto invalid_enum;
> > -      v->value_int = ctx->UniformBufferBindings[index].Offset;
> > +      v->value_int = ctx->UniformBufferBindings[index].Offset < 0 ? 0 :
> > +                     ctx->UniformBufferBindings[index].Offset;
> >        return TYPE_INT;
> >
> >     case GL_UNIFORM_BUFFER_SIZE:
> > @@ -1933,7 +1934,8 @@ find_value_indexed(const char *func, GLenum pname, GLuint index, union value *v)
> >           goto invalid_value;
> >        if (!ctx->Extensions.ARB_uniform_buffer_object)
> >           goto invalid_enum;
> > -      v->value_int = ctx->UniformBufferBindings[index].Size;
> > +      v->value_int = ctx->UniformBufferBindings[index].Size < 0 ? 0 :
> > +                     ctx->UniformBufferBindings[index].Size;
> >        return TYPE_INT;
> >
> >     /* ARB_shader_storage_buffer_object */

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
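For readers skimming the patch above, the clamping it introduces can be sketched in isolation. This is only an illustration, not Mesa code: `query_or_zero` is a made-up name, and the sketch assumes (as the patch implies) that a binding without an explicit range stores a negative offset or size internally.

```cpp
// Hypothetical helper, not a Mesa function: map the internal "unspecified"
// sentinel (assumed here to be any negative value) to the 0 that the
// ARB_uniform_buffer_object text quoted above asks for.
constexpr int query_or_zero(long value)
{
   return value < 0 ? 0 : static_cast<int>(value);
}

// Compile-time checks of the three interesting cases.
static_assert(query_or_zero(-1) == 0, "unspecified range reads back as 0");
static_assert(query_or_zero(0) == 0, "an explicit zero is preserved");
static_assert(query_or_zero(256) == 256, "an explicit range is preserved");
```

The same expression would serve both the START and the SIZE query, which is why the patch repeats it for each pname.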
Re: [Mesa-dev] [PATCH v4 (part2) 44/59] mesa: Add queries for GL_SHADER_STORAGE_BUFFER
On Wed, 2015-08-05 at 12:23 -0400, Ilia Mirkin wrote:
> On Wed, Aug 5, 2015 at 4:30 AM, Iago Toral Quiroga ito...@igalia.com wrote:
> > These handle querying the buffer name attached to a given binding point
> > as well as the start offset and size of that buffer.
> > ---
> >  src/mesa/main/get.c | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >
> > diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
> > index 307a5ff..e2bfb5f 100644
> > --- a/src/mesa/main/get.c
> > +++ b/src/mesa/main/get.c
> > @@ -1044,6 +1044,10 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu
> >     case GL_UNIFORM_BUFFER_BINDING:
> >        v->value_int = ctx->UniformBuffer->Name;
> >        break;
> > +   /* GL_ARB_shader_storage_buffer_object */
> > +   case GL_SHADER_STORAGE_BUFFER_BINDING:
> > +      v->value_int = ctx->ShaderStorageBuffer->Name;
> > +      break;
> >     /* GL_ARB_timer_query */
> >     case GL_TIMESTAMP:
> >        if (ctx->Driver.GetTimestamp) {
> > @@ -1932,6 +1936,33 @@ find_value_indexed(const char *func, GLenum pname, GLuint index, union value *v)
> >        v->value_int = ctx->UniformBufferBindings[index].Size;
> >        return TYPE_INT;
> >
> > +   /* ARB_shader_storage_buffer_object */
> > +   case GL_SHADER_STORAGE_BUFFER_BINDING:
> > +      if (index >= ctx->Const.MaxShaderStorageBufferBindings)
> > +         goto invalid_value;
> > +      if (!ctx->Extensions.ARB_shader_storage_buffer_object)
> > +         goto invalid_enum;
>
> It definitely strikes me as odd to check the ext second, after
> other things. Is there a reason for doing it this way? (Same below.)

It is what we do for everything else in this file, so we just followed the example. I suppose it does not really matter much in the end, but if we think it does we should probably fix all cases together in a separate patch.

Iago

> > +      v->value_int = ctx->ShaderStorageBufferBindings[index].BufferObject->Name;
> > +      return TYPE_INT;
> > +
> > +   case GL_SHADER_STORAGE_BUFFER_START:
> > +      if (index >= ctx->Const.MaxShaderStorageBufferBindings)
> > +         goto invalid_value;
> > +      if (!ctx->Extensions.ARB_shader_storage_buffer_object)
> > +         goto invalid_enum;
> > +      v->value_int = ctx->ShaderStorageBufferBindings[index].Offset < 0 ? 0 :
> > +                     ctx->ShaderStorageBufferBindings[index].Offset;
> > +      return TYPE_INT;
> > +
> > +   case GL_SHADER_STORAGE_BUFFER_SIZE:
> > +      if (index >= ctx->Const.MaxShaderStorageBufferBindings)
> > +         goto invalid_value;
> > +      if (!ctx->Extensions.ARB_shader_storage_buffer_object)
> > +         goto invalid_enum;
> > +      v->value_int = ctx->ShaderStorageBufferBindings[index].Size < 0 ? 0 :
> > +                     ctx->ShaderStorageBufferBindings[index].Size;
> > +      return TYPE_INT;
> > +
> >     /* ARB_texture_multisample / GL3.2 */
> >     case GL_SAMPLE_MASK_VALUE:
> >        if (index != 0)
> > --
> > 1.9.1
Re: [Mesa-dev] [PATCH v4 (part2) 14/59] nir: add shader storage buffer's unsized array length calculation
On Wed, 2015-08-05 at 11:59 -0700, Connor Abbott wrote:
> On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote:
> > From: Samuel Iglesias Gonsalvez sigles...@igalia.com
> >
> > Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
> > ---
> >  src/glsl/nir/glsl_to_nir.cpp  | 10 ++++++++++
> >  src/glsl/nir/nir_intrinsics.h |  9 +++++++++
> >  2 files changed, 19 insertions(+)
> >
> > diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> > index 77327b6..642affd 100644
> > --- a/src/glsl/nir/glsl_to_nir.cpp
> > +++ b/src/glsl/nir/glsl_to_nir.cpp
> > @@ -1618,6 +1618,16 @@ nir_visitor::visit(ir_expression *ir)
> >           instr->src[2].swizzle[i] = 0;
> >        }
> >        break;
> > +   case ir_triop_ssbo_unsized_array_length: {
> > +      nir_intrinsic_instr *load = nir_intrinsic_instr_create(this->shader,
> > +         nir_intrinsic_ssbo_unsized_array_length);
> > +      load->num_components = ir->type->vector_elements;
> > +      load->src[0] = evaluate_rvalue(ir->operands[0]);
> > +      load->src[1] = evaluate_rvalue(ir->operands[1]);
> > +      load->src[2] = evaluate_rvalue(ir->operands[2]);
> > +      add_instr(&load->instr, ir->type->vector_elements);
> > +      return;
> > +   }
> >     case ir_quadop_bitfield_insert:
> >        instr = emit(nir_op_bitfield_insert, dest_size, srcs);
> >        for (unsigned i = 0; i < ir->operands[0]->type->vector_elements; i++) {
> > diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> > index bc6e6b8..f264f55 100644
> > --- a/src/glsl/nir/nir_intrinsics.h
> > +++ b/src/glsl/nir/nir_intrinsics.h
> > @@ -62,6 +62,15 @@ INTRINSIC(interp_var_at_offset, 1, ARR(2), true, 0, 1, 0,
> >            NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
> >
> >  /*
> > + * Shared storage buffer's unsized array length calculation. It takes the
> > + * SSBO index, unsized array offset inside the shader storage block interface
> > + * and stride of unsized array.
> > + */
> > +
> > +INTRINSIC(ssbo_unsized_array_length, 3, ARR(1), true, 1, 0, 0,
> > +          NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
>
> It looks like there's 3 sources, but you've only specified the size of
> one. Did you mean to do ARR(1, 1, 1)?
>
> With that fixed,
>
> Reviewed-by: Connor Abbott connor.w.abb...@intel.com

Yeah, I think that's just a mistake. Thanks.

> > +
> > +/*
> >   * a barrier is an intrinsic with no inputs/outputs but which can't be moved
> >   * around/optimized in general
> >   */
> > --
> > 1.9.1
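To make the review remark about ARR(1) versus ARR(1, 1, 1) concrete, here is a small standalone sketch. It is not the NIR table itself: `intrinsic_info`, `as_posted` and `as_fixed` are illustrative names, and the only thing it demonstrates is what the C++ initializer rules do when fewer sizes are listed than there are sources.

```cpp
// Simplified stand-in for the table entry an INTRINSIC() line expands to.
// The struct and both variables are illustrative names, not NIR's own.
struct intrinsic_info {
   unsigned num_srcs;
   unsigned src_components[4];
};

// As posted: three sources, but ARR(1) lists a size for only the first.
constexpr intrinsic_info as_posted = { 3, { 1 } };

// As suggested in review: ARR(1, 1, 1), one size per source.
constexpr intrinsic_info as_fixed = { 3, { 1, 1, 1 } };

// Entries left out of a braced initializer are zero-initialized, so the
// second and third sources silently get size 0 rather than 1.
static_assert(as_posted.src_components[0] == 1, "first size was listed");
static_assert(as_posted.src_components[1] == 0, "second size defaulted to 0");
static_assert(as_posted.src_components[2] == 0, "third size defaulted to 0");
static_assert(as_fixed.src_components[2] == 1, "all three sizes are scalar");
```

As far as I know, NIR treats a source size of 0 as "use the instruction's num_components", so the short list would not fail to build; it would just describe the last two sources differently from what the patch intends.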
Re: [Mesa-dev] [PATCH v4 (part2) 29/59] nir: Implement __intrinsic_store_ssbo
On Fri, 2015-08-07 at 07:43 +0200, Iago Toral wrote: On Thu, 2015-08-06 at 11:06 -0700, Connor Abbott wrote: On Thu, Aug 6, 2015 at 12:30 AM, Iago Toral ito...@igalia.com wrote: On Wed, 2015-08-05 at 12:17 -0700, Connor Abbott wrote: On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote: --- src/glsl/nir/glsl_to_nir.cpp | 36 src/glsl/nir/nir_intrinsics.h | 12 ++-- 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index 642affd..cbec2df 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -641,6 +641,8 @@ nir_visitor::visit(ir_call *ir) op = nir_intrinsic_image_atomic_comp_swap; } else if (strcmp(ir-callee_name(), __intrinsic_memory_barrier) == 0) { op = nir_intrinsic_memory_barrier; + } else if (strcmp(ir-callee_name(), __intrinsic_store_ssbo) == 0) { + op = nir_intrinsic_store_ssbo; } else { unreachable(not reached); } @@ -730,6 +732,40 @@ nir_visitor::visit(ir_call *ir) } case nir_intrinsic_memory_barrier: break; + case nir_intrinsic_store_ssbo: { + exec_node *param = ir-actual_parameters.get_head(); + ir_rvalue *block = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_rvalue *offset = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_rvalue *val = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_constant *write_mask = ((ir_instruction *)param)-as_constant(); + assert(write_mask); + + /* Check if we need the indirect version */ + ir_constant *const_offset = offset-as_constant(); + if (!const_offset) { +op = nir_intrinsic_store_ssbo_indirect; +ralloc_free(instr); +instr = nir_intrinsic_instr_create(shader, op); +instr-src[2] = evaluate_rvalue(offset); +instr-const_index[0] = 0; + } else { +instr-const_index[0] = const_offset-value.u[0]; + } + + instr-const_index[1] = write_mask-value.u[0]; + + instr-src[0] = evaluate_rvalue(val); + instr-num_components = 
val-type-vector_elements; + + instr-src[1] = evaluate_rvalue(block); + break; + } default: unreachable(not reached); } diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index f264f55..83eeecd 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -176,12 +176,12 @@ LOAD(input, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) * offset. */ -#define STORE(name, num_indices, flags) \ - INTRINSIC(store_##name, 1, ARR(0), false, 0, 0, num_indices, flags) \ - INTRINSIC(store_##name##_indirect, 2, ARR(0, 1), false, 0, 0, \ +#define STORE(name, extra_srcs, num_indices, flags) \ + INTRINSIC(store_##name, extra_srcs, ARR(0, 1), false, 0, 0, num_indices, flags) \ + INTRINSIC(store_##name##_indirect, extra_srcs + 1, ARR(0, 1, 1), false, 0, 0, \ num_indices, flags) \ -STORE(output, 1, 0) -/* STORE(ssbo, 2, 0) */ +STORE(output, 1, 2, 0) +STORE(ssbo, 2, 2, 0) I don't think outputs should have any extra sources, since they only take a constant index, plus possibly an indirect source that's already covered by the STORE macro. SSBO stores should only have one extra source for the block index. Also, we should update the comment above to explain this similarly to the paragraph above the loads. SSBO stores need an extra source for the block index and an extra index for a writemask. I'll leave the STORE() macro as it was and just define SSBO stores using INTRINSIC() directly then. Ok, I see. I don't think you need a separate INTRINSIC(), but right now calling the parameter you added extra_srcs is confusing, since you're counting the value to be stored, which isn't really extra at all -- every store should have one! 
How about instead, we change the STORE macro to have: - An extra_srcs parameter that contains only sources that are actually extra, not counting the value to be stored -- direct stores have extra_srcs + 1 sources, and indirect sources have extra_srcs + 2 sources - An extra_indices parameter that contains the extra indices, and replace num_indices with extra_indices + 1 Then normal stores have both set to 0, and SSBO stores have both set to 1 to indicate
Re: [Mesa-dev] [PATCH v4 (part2) 02/59] i965/vec4: Import helpers to convert vectors into arrays and back.
On Wed, 2015-08-05 at 10:29 +0200, Iago Toral Quiroga wrote: From: Francisco Jerez curroje...@riseup.net These functions handle the conversion of a vec4 into the form expected by the dataport unit in message and message return payloads. The conversion is not always trivial because some messages don't support SIMD4x2 for some generations, in which case a strided copy may be necessary. v2: Split from the FS implementation. v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip. --- src/mesa/drivers/dri/i965/Makefile.sources | 2 + .../drivers/dri/i965/brw_vec4_surface_builder.cpp | 98 ++ .../drivers/dri/i965/brw_vec4_surface_builder.h| 30 +++ 3 files changed, 130 insertions(+) create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_surface_builder.h diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 88e64fb..049e293 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -129,6 +129,8 @@ i965_FILES = \ brw_vec4_nir.cpp \ brw_vec4_gs_nir.cpp \ brw_vec4_reg_allocate.cpp \ + brw_vec4_surface_builder.cpp \ + brw_vec4_surface_builder.h \ brw_vec4_visitor.cpp \ brw_vec4_vp.cpp \ brw_vec4_vs_visitor.cpp \ diff --git a/src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp b/src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp new file mode 100644 index 000..5ba1c6d --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp @@ -0,0 +1,98 @@ +/* + * Copyright © 2013-2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * 
Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include brw_vec4_surface_builder.h + +using namespace brw; + +namespace { + namespace array_utils { + /** + * Copy one every \p src_stride logical components of the argument into + * one every \p dst_stride logical components of the result. + */ + src_reg + emit_stride(const vec4_builder bld, const src_reg src, unsigned size, + unsigned dst_stride, unsigned src_stride) + { + if (src_stride == 1 dst_stride == 1) { +return src; + } else { +const dst_reg dst = bld.vgrf(src.type, + DIV_ROUND_UP(size * dst_stride, 4)); + +for (unsigned i = 0; i size; ++i) + bld.MOV(writemask(offset(dst, i * dst_stride / 4), + 1 (i * dst_stride % 4)), + swizzle(offset(src, i * src_stride / 4), + brw_swizzle_for_mask(1 (i * src_stride % 4; + +return src_reg(dst); + } + } + + /** + * Convert a VEC4 into an array of registers with the layout expected by + * the recipient shared unit. If \p has_simd4x2 is true the argument is + * left unmodified in SIMD4x2 form, otherwise it will be rearranged into + * a SIMD8 vector. + */ + src_reg + emit_insert(const vec4_builder bld, const src_reg src, + unsigned n, bool has_simd4x2) + { + if (src.file == BAD_FILE || n == 0) { +return src_reg(); + + } else { +/* Pad unused components with zeroes. 
*/ +const unsigned mask = (1 n) - 1; +const dst_reg tmp = bld.vgrf(src.type); + +bld.MOV(writemask(tmp, mask), src); +if (n 4) + bld.MOV(writemask(tmp, ~mask), 0); + +return emit_stride(bld, src, n, has_simd4x2 ? 1 : 4, 1); Curro: do we really need the padding here? In any case, I guess you meant to pass tmp and not src to emit_stride here
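The index arithmetic in emit_stride() and the padding mask in emit_insert() are easier to follow with concrete numbers. The sketch below is standalone and only illustrative: `component_mask`, `slot` and `strided_slot` are made-up names, and it reads the garbled `(1 n) - 1` and `1 (i * dst_stride % 4)` in the quoted patch as left shifts, which is an assumption on my part.

```cpp
// Write mask covering the first n components of a vec4 (n <= 4), matching
// the "(1 << n) - 1" reading of the mask computed in emit_insert().
constexpr unsigned component_mask(unsigned n)
{
   return (1u << n) - 1;
}

// Where logical component i lands for a given stride: which vec4 register
// (offset from the base) and which single channel inside it.
struct slot {
   unsigned reg_offset;
   unsigned writemask;
};

constexpr slot strided_slot(unsigned i, unsigned stride)
{
   return slot{ i * stride / 4, 1u << (i * stride % 4) };
}

static_assert(component_mask(3) == 0x7, "x, y and z enabled");
static_assert(component_mask(4) == 0xf, "full vec4");

// Stride 1 (the SIMD4x2 case): component 2 stays in register 0, channel z.
static_assert(strided_slot(2, 1).reg_offset == 0, "same register");
static_assert(strided_slot(2, 1).writemask == 0x4, "channel z");

// Stride 4 (no SIMD4x2): component 2 moves to register 2, channel x.
static_assert(strided_slot(2, 4).reg_offset == 2, "one register per component");
static_assert(strided_slot(2, 4).writemask == 0x1, "channel x");
```

With a stride of 4 every component ends up in the x channel of its own register, which is the "strided copy" the commit message refers to.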
Re: [Mesa-dev] [PATCH v3 4/4] i965/vec4: Don't unspill the same register in consecutive instructions
On Fri, 2015-08-07 at 14:14 +0300, Francisco Jerez wrote: Iago Toral ito...@igalia.com writes: On Thu, 2015-08-06 at 18:27 +0300, Francisco Jerez wrote: Iago Toral Quiroga ito...@igalia.com writes: If we have spilled/unspilled a register in the current instruction, avoid emitting unspills for the same register in the same instruction or consecutive instructions following the current one as long as they keep reading the spilled register. This should allow us to avoid emitting costy unspills that come with little benefit to register allocation. Also, update evaluate_spill_costs so that we account for the saved unspills. --- .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 129 +++-- 1 file changed, 121 insertions(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp index 617c988..fed5f4d 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp @@ -264,6 +264,95 @@ vec4_visitor::reg_allocate() return true; } +/** + * When we decide to spill a register, instead of blindly spilling every use, + * save unspills when the spill register is used (read) in consecutive + * instructions. This can potentially save a bunch of unspills that would + * have very little impact in register allocation anyway. + * + * Notice that we need to account for this behavior when spilling a register + * and when evaluating spilling costs. This function is designed so it can + * be called from both places and avoid repeating the logic. + * + * - When we call this function from spill_reg, we pass in scratch_reg the + *actual unspill/spill register that we want to reuse in the current + *instruction. + * + * - When we call this from evaluate_spill_costs, we pass the register for + *which we are evaluating spilling costs. 
+ * + * In either case, we check if the previous instructions read scratch_reg until + * we find an instruction that writes to it (in which case we can reuse + * scratch_reg as long as the writemask is compatible with the channels we need + * to read in the current instruction) or we hit an instruction that does not + * read scratch_reg at all. The latter can only happen when we call this from + * evaluate_spill_costs, Strictly speaking it can also happen when called from spill_reg() for the first time in a given sequence of consecutive instructions (in which case you correctly return false). True, I'll fix the comment. and means that this is the point at which we first + * need the unspill this register for our current instruction. Since all our + * unspills read a full vec4, we know that in this case we will have all + * the channels available in scratch_reg and we can reuse it. + * + * In any other case, we can't reuse scratch_reg in the current instruction, + * meaning that we will need to unspill it. + */ +static bool +can_use_scratch_for_source(const vec4_instruction *inst, unsigned i, + unsigned scratch_reg) +{ + assert(inst-src[i].file == GRF); + + /* If the current instruction is already using scratch_reg in src[n] with +* n i, then we know we can reuse it for src[i] too. +*/ + for (unsigned n = 0; n i; n++) { + if (inst-src[n].file == GRF inst-src[n].reg == scratch_reg) + return true; + } I don't think this is correct in cases where the previous source reused the temporary of a previously spilled register with incompatible writemask. You probably want to handle the current instruction consistently with the previous ones, i.e. as part of the loop below. Yes, you're right. I suggest you define a variable (e.g. n as you've called it) initially equal to i that would determine the number of sources to check for the next instruction. 
At the end of the loop body it would be re-set to 3, what would also cause the destination registers to be checked in subsequent iterations. Sounds good to me. + + bool prev_inst_read_scratch_reg = false; + vec4_instruction *prev_inst = (vec4_instruction *) inst-prev; You can move this declaration into the init statement of the for loop to limit its scope. Ok. + for (; !prev_inst-is_head_sentinel(); +prev_inst = (vec4_instruction *) prev_inst-prev) { + /* If any previous instruction does not read from or write to scratch_reg + * inconditonally we cannot reuse scratch_reg + */ + if (prev_inst-predicate prev_inst-opcode != BRW_OPCODE_SEL) + return false; I think this is somewhat pessimistic, register fills for a predicated
Re: [Mesa-dev] [PATCH v4 (part2) 46/59] glsl: Allow use of memory qualifiers with ARB_shader_storage_buffer_object.
Thanks Tapani, I'll look into this as soon as I get back from holidays. The implementation has changed slightly since I sent these patches too, btw.

Iago

On 2015-08-13 10:05, Tapani Pälli wrote:
> Hi;
>
> I believe something more needs to be done here; it may be generic
> changes to the memory qualifier parsing. I wrote a Piglit test that
> fails with these changes (I tested against the
> itoral-ARB_shader_storage_buffer_object-v4.1 branch), here:
>
> http://lists.freedesktop.org/archives/piglit/2015-August/016777.html
>
> On 08/05/2015 11:30 AM, Iago Toral Quiroga wrote:
> > ---
> >  src/glsl/glsl_lexer.ll | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/glsl/glsl_lexer.ll b/src/glsl/glsl_lexer.ll
> > index efa0bb6..b9d700a 100644
> > --- a/src/glsl/glsl_lexer.ll
> > +++ b/src/glsl/glsl_lexer.ll
> > @@ -405,11 +405,11 @@
> >  image2DShadow        KEYWORD(130, 300, 0, 0, IMAGE2DSHADOW);
> >  image1DArrayShadow   KEYWORD(130, 300, 0, 0, IMAGE1DARRAYSHADOW);
> >  image2DArrayShadow   KEYWORD(130, 300, 0, 0, IMAGE2DARRAYSHADOW);
> >
> > -coherent   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, COHERENT);
> > -volatile   KEYWORD_WITH_ALT(110, 100, 420, 0, yyextra->ARB_shader_image_load_store_enable, VOLATILE);
> > -restrict   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, RESTRICT);
> > -readonly   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, READONLY);
> > -writeonly  KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, WRITEONLY);
> > +coherent   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, COHERENT);
> > +volatile   KEYWORD_WITH_ALT(110, 100, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, VOLATILE);
> > +restrict   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, RESTRICT);
> > +readonly   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, READONLY);
> > +writeonly  KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, WRITEONLY);
> >  atomic_uint     KEYWORD_WITH_ALT(420, 300, 420, 310, yyextra->ARB_shader_atomic_counters_enable, ATOMIC_UINT);
Re: [Mesa-dev] [PATCH v4 (part2) 29/59] nir: Implement __intrinsic_store_ssbo
On Wed, 2015-08-05 at 12:17 -0700, Connor Abbott wrote: On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote: --- src/glsl/nir/glsl_to_nir.cpp | 36 src/glsl/nir/nir_intrinsics.h | 12 ++-- 2 files changed, 42 insertions(+), 6 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index 642affd..cbec2df 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -641,6 +641,8 @@ nir_visitor::visit(ir_call *ir) op = nir_intrinsic_image_atomic_comp_swap; } else if (strcmp(ir-callee_name(), __intrinsic_memory_barrier) == 0) { op = nir_intrinsic_memory_barrier; + } else if (strcmp(ir-callee_name(), __intrinsic_store_ssbo) == 0) { + op = nir_intrinsic_store_ssbo; } else { unreachable(not reached); } @@ -730,6 +732,40 @@ nir_visitor::visit(ir_call *ir) } case nir_intrinsic_memory_barrier: break; + case nir_intrinsic_store_ssbo: { + exec_node *param = ir-actual_parameters.get_head(); + ir_rvalue *block = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_rvalue *offset = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_rvalue *val = ((ir_instruction *)param)-as_rvalue(); + + param = param-get_next(); + ir_constant *write_mask = ((ir_instruction *)param)-as_constant(); + assert(write_mask); + + /* Check if we need the indirect version */ + ir_constant *const_offset = offset-as_constant(); + if (!const_offset) { +op = nir_intrinsic_store_ssbo_indirect; +ralloc_free(instr); +instr = nir_intrinsic_instr_create(shader, op); +instr-src[2] = evaluate_rvalue(offset); +instr-const_index[0] = 0; + } else { +instr-const_index[0] = const_offset-value.u[0]; + } + + instr-const_index[1] = write_mask-value.u[0]; + + instr-src[0] = evaluate_rvalue(val); + instr-num_components = val-type-vector_elements; + + instr-src[1] = evaluate_rvalue(block); + break; + } default: unreachable(not reached); } diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h 
index f264f55..83eeecd 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -176,12 +176,12 @@ LOAD(input, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) * offset. */ -#define STORE(name, num_indices, flags) \ - INTRINSIC(store_##name, 1, ARR(0), false, 0, 0, num_indices, flags) \ - INTRINSIC(store_##name##_indirect, 2, ARR(0, 1), false, 0, 0, \ +#define STORE(name, extra_srcs, num_indices, flags) \ + INTRINSIC(store_##name, extra_srcs, ARR(0, 1), false, 0, 0, num_indices, flags) \ + INTRINSIC(store_##name##_indirect, extra_srcs + 1, ARR(0, 1, 1), false, 0, 0, \ num_indices, flags) \ -STORE(output, 1, 0) -/* STORE(ssbo, 2, 0) */ +STORE(output, 1, 2, 0) +STORE(ssbo, 2, 2, 0) I don't think outputs should have any extra sources, since they only take a constant index, plus possibly an indirect source that's already covered by the STORE macro. SSBO stores should only have one extra source for the block index. Also, we should update the comment above to explain this similarly to the paragraph above the loads. SSBO stores need an extra source for the block index and an extra index for a writemask. I'll leave the STORE() macro as it was and just define SSBO stores using INTRINSIC() directly then. -LAST_INTRINSIC(store_output_indirect) +LAST_INTRINSIC(store_ssbo_indirect) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 (part2) 39/59] nir: Implement lowered SSBO atomic intrinsics
On Wed, 2015-08-05 at 12:24 -0700, Connor Abbott wrote: On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote: The original GLSL IR intrinsics have been lowered to an internal version that accepts a block index and an offset instead of a SSBO reference. --- src/glsl/nir/glsl_to_nir.cpp | 55 +++ src/glsl/nir/nir_intrinsics.h | 12 ++ 2 files changed, 67 insertions(+) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index aa96e40..4450e7a 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -645,6 +645,22 @@ nir_visitor::visit(ir_call *ir) op = nir_intrinsic_store_ssbo; } else if (strcmp(ir-callee_name(), __intrinsic_load_ssbo) == 0) { op = nir_intrinsic_load_ssbo; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_add_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_add; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_and_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_and; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_or_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_or; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_xor_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_xor; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_min_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_min; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_max_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_max; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_exchange_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_exchange; + } else if (strcmp(ir-callee_name(), __intrinsic_ssbo_atomic_comp_swap_internal) == 0) { + op = nir_intrinsic_ssbo_atomic_comp_swap; } else { unreachable(not reached); } @@ -826,7 +842,46 @@ nir_visitor::visit(ir_call *ir) nir_instr_insert_after_cf_list(this-cf_node_list, compare-instr); } + break; + } + case nir_intrinsic_ssbo_atomic_add: + case nir_intrinsic_ssbo_atomic_min: + case 
nir_intrinsic_ssbo_atomic_max: + case nir_intrinsic_ssbo_atomic_and: + case nir_intrinsic_ssbo_atomic_or: + case nir_intrinsic_ssbo_atomic_xor: + case nir_intrinsic_ssbo_atomic_exchange: + case nir_intrinsic_ssbo_atomic_comp_swap: { + int param_count = ir-actual_parameters.length(); + assert(param_count == 3 || param_count == 4); + + /* Block index */ + exec_node *param = ir-actual_parameters.get_head(); + ir_instruction *inst = (ir_instruction *) param; + instr-src[0] = evaluate_rvalue(inst-as_rvalue()); + /* Offset */ + param = param-get_next(); + inst = (ir_instruction *) param; + instr-src[1] = evaluate_rvalue(inst-as_rvalue()); + + /* data1 parameter (this is always present) */ + param = param-get_next(); + inst = (ir_instruction *) param; + instr-src[2] = evaluate_rvalue(inst-as_rvalue()); + + /* data2 parameter (only with atomic_comp_swap) */ + if (param_count == 4) { +assert(op == nir_intrinsic_ssbo_atomic_comp_swap); +param = param-get_next(); +inst = (ir_instruction *) param; +instr-src[3] = evaluate_rvalue(inst-as_rvalue()); + } + + /* Atomic result */ + assert(ir-return_deref); + nir_ssa_dest_init(instr-instr, instr-dest, + ir-return_deref-type-vector_elements, NULL); break; } default: diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index 9877ea4..a1351d1 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -133,6 +133,18 @@ INTRINSIC(image_atomic_xor, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) INTRINSIC(image_atomic_exchange, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) INTRINSIC(image_atomic_comp_swap, 4, ARR(4, 1, 1, 1), true, 1, 1, 0, 0) +/* + * SSBO atomic intrinsics + */ Could you add a bit here explaining what the sources are? How about this?: /* * SSBO atomic intrinsics * * All of the SSBO atomic memory operations read a value from memory, * compute a new value using one of the operations below, write the new * value to memory, and return the original value read. 
* * All operations take 3 sources except CompSwap that takes 4. These * sources represent: * * 0: The SSBO buffer index. * 1: The offset into the SSBO buffer of the variable that the atomic *operation will operate on. * 2: The data parameter to the atomic function
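The read/compute/write/return-original behaviour described in the proposed comment has the same shape as the C++ `std::atomic` read-modify-write operations. A minimal host-side sketch, not NIR or Mesa code: `SsboWord`, `model_atomic_add` and `model_atomic_comp_swap` are hypothetical names, and the sketch assumes that for CompSwap `data1` is the comparison value and `data2` the replacement, as in GLSL `atomicCompSwap`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one SSBO block: an array of 32-bit words.
using SsboWord = std::atomic<uint32_t>;

// Models an atomic add with sources (block, offset, data).
// Returns the value that was in memory before the addition.
inline uint32_t model_atomic_add(SsboWord *block, unsigned offset,
                                 uint32_t data)
{
    return block[offset].fetch_add(data);
}

// Models the compare-and-swap variant, the only one with a fourth source.
// The word is replaced by `data2` only if it currently equals `data1`;
// the value found in memory is returned either way.
inline uint32_t model_atomic_comp_swap(SsboWord *block, unsigned offset,
                                       uint32_t data1, uint32_t data2)
{
    uint32_t expected = data1;
    block[offset].compare_exchange_strong(expected, data2);
    // On success `expected` still equals the original value; on failure it
    // has been overwritten with the value actually found in memory.
    return expected;
}
```

In both cases the caller sees the pre-operation value, which is why the patch initializes an SSA destination for every one of these intrinsics.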
Re: [Mesa-dev] [PATCH V2 12/14] meta: Fix reading luminance texture as rgba in _mesa_meta_pbo_GetTexSubImage()
On Thu, 2015-07-23 at 11:40 -0700, Anuj Phogat wrote:
On Wed, Jul 22, 2015 at 7:10 AM, Iago Toral ito...@igalia.com wrote:

The problem here is that _mesa_meta_BlitFramebuffer is not setting the G/B channels to 0.0 when doing Luminance/Intensity to RGBA conversions, so why not implement the fix in _mesa_meta_BlitFramebuffer directly? The GL spec expects framebuffer blits to handle these conversions properly, so it looks like a win for all uses of that function.

I couldn't find an OpenGL spec reference suggesting this conversion in case of glBlitFrameBuffer.

It is not explicitly stated, however the OpenGL 4.5 spec, section 18.3.1 Blitting Pixel Rectangle says:

An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions:
• The read buffer contains fixed-point or floating-point values and any draw buffer contains neither fixed-point nor floating-point values.
• The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values.
• The read buffer contains signed integer values and any draw buffer does not contain signed integer values.

However, I realize now that luminance/intensity are not color-renderable formats, so that text is probably not considering these formats anyway.

What I found supports the current behavior of glBlitFrameBuffer: See table 3.23 on page 220 (of pdf) of glspec30.20080811. Some relevant text from https://www.opengl.org/wiki/Image_Format:

When a GL_RED format is sampled in a shader, the resulting vec4 is (Red, 0, 0, 1). When a GL_INTENSITY format is sampled, the resulting vec4 is (I, I, I, I). The single intensity value is read into all four components. For GL_LUMINANCE, the result is (L, L, L, 1). There is also a two-channel GL_LUMINANCE_ALPHA format, which gives (L, L, L, A).

I think glBlitFrameBuffer should also follow this, being a drawing operation. What do you think?

Yes, it makes sense.
You can add: Reviewed-by: Iago Toral Quiroga ito...@igalia.com to the patch. On Fri, 2015-07-17 at 10:28 -0700, Anuj Phogat wrote: After recent addition of pbo testing in piglit test getteximage-luminance, it fails on i965. This patch makes a sub test pass. This patch adds a clear color operation to meta pbo path, which I think is better than falling back to software path. V2: Fix color mask for GL_LUMINANCE_ALPHA Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 36 +++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 13f8292..f4d5ac3 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -28,6 +28,7 @@ #include blend.h #include bufferobj.h #include buffers.h +#include clear.h #include fbobject.h #include glformats.h #include glheader.h @@ -278,8 +279,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, int full_height, image_height; struct gl_texture_image *pbo_tex_image; struct gl_renderbuffer *rb = NULL; - GLenum status; - bool success = false; + GLenum status, src_base_format; + bool success = false, clear_channels_to_zero = false; + float save_clear_color[4]; int z; if (!_mesa_is_bufferobj(packing-BufferObj)) @@ -380,6 +382,27 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, GL_COLOR_BUFFER_BIT, GL_NEAREST)) goto fail; + src_base_format = tex_image ? + tex_image-_BaseFormat : + ctx-ReadBuffer-_ColorReadBuffer-_BaseFormat; + + /* Depending on the base formats involved we might need to rebase some +* values. For example if we download from a Luminance format to RGBA +* format, we want G=0 and B=0. 
+*/ + clear_channels_to_zero = + _mesa_need_luminance_to_rgb_conversion(src_base_format, + pbo_tex_image-_BaseFormat); + + if (clear_channels_to_zero) { + memcpy(save_clear_color, ctx-Color.ClearColor.f, 4 * sizeof(float)); + /* Clear the Green, Blue channels. */ + _mesa_ColorMask(GL_FALSE, GL_TRUE, GL_TRUE, + src_base_format != GL_LUMINANCE_ALPHA); + _mesa_ClearColor(0.0, 0.0, 0.0, 1.0); + _mesa_Clear(GL_COLOR_BUFFER_BIT); + } + for (z = 1; z depth; z++) { _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, tex_image, zoffset + z); @@ -392,6 +415,15 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, 0, z * image_height
Re: [Mesa-dev] [PATCH 3/5] i965/vec4: Register spilling should never see registers with size != 1
On Fri, 2015-07-24 at 16:20 +0300, Francisco Jerez wrote:
Iago Toral Quiroga ito...@igalia.com writes:

Larger registers should have been moved to scratch (like GRF array access) or split to size 1 by the split_virtual_grfs pass.

Not necessarily. split_virtual_grfs() won't be able to split registers that are read or written at once by the same instruction, e.g. by send-from-GRF instructions as used for surface messages in your SSBO implementation. :) We should probably eventually migrate other messages too, like the ones used for texturing and framebuffer writes, to use proper sends from GRF...

Okay, in that case I'll include patches to add support for spilling registers with size > 1 as well.

Thanks,
Iago

---
 src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index cff5406..80ab813 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -271,7 +271,8 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
    for (unsigned i = 0; i < this->alloc.count; i++) {
       spill_costs[i] = 0.0;
-      no_spill[i] = alloc.sizes[i] != 1;
+      no_spill[i] = false;
+      assert(this->alloc.sizes[i] == 1);
    }

    /* Calculate costs for spilling nodes. Call it a cost of 1 per
--
1.9.1
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/5] i965/vec4: Don't emit scratch reads for a spilled register we have just written
On Fri, 2015-07-24 at 16:18 +0300, Francisco Jerez wrote:
Iago Toral Quiroga ito...@igalia.com writes:

When we have code such as this:

  mov vgrf1.0.x:F, vgrf2.:F
  mov vgrf3.0.x:F, vgrf1.:F
  ...
  mov vgrf3.0.x:F, vgrf1.:F

And vgrf1 is chosen for spilling, we can emit this:

  mov vgrf1.0.x:F, vgrf2.:F
  gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
  mov vgrf3.0.x:F, vgrf1.:F
  ...
  gen4_scratch_read vgrf4.0.x:F, 22D
  mov vgrf3.0.x:F, vgrf4.:F

Instead of this:

  mov vgrf1.0.x:F, vgrf2.:F
  gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
  gen4_scratch_read vgrf4.0.x:F, 22D
  mov vgrf3.0.x:F, vgrf4.:F
  ...
  gen4_scratch_read vgrf5.0.x:F, 22D
  mov vgrf3.0.x:F, vgrf5.:F

And save one scratch read while still preserving the benefits of spilling the register.

This sounds reasonable to me in principle. I guess that there is in general a trade-off between the number of spills/fills you omit and the number of interference edges you eliminate. It may also be worth checking whether you can extend the same principle to cache the value of the variable in a GRF until the next instruction regardless of whether it was written or read (e.g. so you don't unspill the same register in two adjacent instructions).

That makes sense, I'll send a v2 with that change.

In either case it seems like the overall cost of spilling a register would be decreased in cases where this heuristic can be applied. Would it make sense to update the cost metric accordingly?

Yeah, I guess so. I'll do that too.

One more comment inline.
---
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 39 +-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index 80ab813..5fed2f9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -334,6 +334,18 @@ vec4_visitor::choose_spill_reg(struct ra_graph *g)
    return ra_get_best_spill_node(g);
 }

+static bool
+writemask_matches_swizzle(unsigned writemask, unsigned swizzle)
+{
+   for (int i = 0; i < 4; i++) {
+      unsigned channel = 1 << BRW_GET_SWZ(swizzle, i);
+      if (!(writemask & channel))
+         return false;
+   }
+
+   return true;
+}
+
 void
 vec4_visitor::spill_reg(int spill_reg_nr)
 {
@@ -341,11 +353,33 @@ vec4_visitor::spill_reg(int spill_reg_nr)
    unsigned int spill_offset = last_scratch++;

    /* Generate spill/unspill instructions for the objects being spilled. */
+   vec4_instruction *spill_write_inst = NULL;
    foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
+      /* We don't spill registers used for scratch */
+      if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
+          inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
+         continue;
+
       int scratch_reg = -1;
       for (unsigned int i = 0; i < 3; i++) {
          if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) {
-            if (scratch_reg == -1) {
+            /* If we are reading the spilled register right after writing
+             * to it we can skip the scratch read and use directly the
+             * register we used as source for the scratch write. For this
+             * to work we must check that:
+             *
+             * 1) The write is inconditional, that is, it is not predicated or
+                  it is a SEL.
+             * 2) All the channels that we read have been written in that
+             *    last write instruction.
+             */
+            if (spill_write_inst &&
+                (!spill_write_inst->predicate ||
+                 spill_write_inst->opcode == BRW_OPCODE_SEL) &&
+                writemask_matches_swizzle(spill_write_inst->dst.writemask,
+                                          inst->src[i].swizzle)) {

brw_mask_for_swizzle() returns the mask of components accessed by a swizzle, you could just AND it with ~spill_write_inst->dst.writemask to find out whether it's contained in the destination of the previous instruction.

Ah nice, thanks for the tip!

Iago

+               scratch_reg = spill_write_inst->dst.reg;
+            } else if (scratch_reg == -1) {
                scratch_reg = alloc.allocate(1);
                src_reg temp = inst->src[i];
                temp.reg = scratch_reg;
@@ -358,6 +392,9 @@ vec4_visitor::spill_reg(int spill_reg_nr)
       if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) {
          emit_scratch_write(block, inst, spill_offset);
+         spill_write_inst = inst;
+      } else {
+         spill_write_inst = NULL;
       }
    }

--
1.9.1
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
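The review suggestion above (compute the mask of channels a swizzle reads, then AND it with the complement of the writemask) can be sketched in isolation. This is a minimal illustration, not the i965 code: `make_swizzle`, `get_swz`, `mask_for_swizzle` and `swizzle_covered_by_writemask` are hypothetical stand-ins, and the sketch assumes a swizzle packs four 2-bit channel selectors (x=0, y=1, z=2, w=3) while a writemask uses bit 0 for x through bit 3 for w.

```cpp
#include <cassert>

// Pack four 2-bit channel selectors into one swizzle value (assumed layout).
inline unsigned make_swizzle(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a | (b << 2) | (c << 4) | (d << 6);
}

// Extract the selector for component `idx` (0..3) of a swizzle.
inline unsigned get_swz(unsigned swizzle, unsigned idx)
{
    return (swizzle >> (idx * 2)) & 0x3;
}

// Mask of the channels a swizzle reads (bit 0 = x ... bit 3 = w).
inline unsigned mask_for_swizzle(unsigned swizzle)
{
    unsigned mask = 0;
    for (unsigned i = 0; i < 4; i++)
        mask |= 1u << get_swz(swizzle, i);
    return mask;
}

// True when every channel read through `swizzle` is present in `writemask`:
// no read channel may survive the AND with the complement of the writemask.
inline bool swizzle_covered_by_writemask(unsigned writemask, unsigned swizzle)
{
    return (mask_for_swizzle(swizzle) & ~writemask) == 0;
}
```

The single AND gives the same answer as the per-channel loop in the patch, because both reject exactly the case where some read channel is missing from the writemask.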
Re: [Mesa-dev] [PATCH 07/14] mesa: Set green, blue channels to zero only for formats with these components
On Tue, 2015-07-21 at 11:13 -0700, Anuj Phogat wrote:
On Tue, Jul 21, 2015 at 12:50 AM, Iago Toral ito...@igalia.com wrote:
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/common/meta.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 214a68a..fceb25d 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -3132,9 +3132,16 @@ decompress_texture_image(struct gl_context *ctx,
        * returned as red and two-channel texture values are returned as
        * red/alpha.
        */
-      if ((baseTexFormat == GL_LUMINANCE ||
-           baseTexFormat == GL_LUMINANCE_ALPHA ||
-           baseTexFormat == GL_INTENSITY) ||
+      if (((baseTexFormat == GL_LUMINANCE ||
+            baseTexFormat == GL_LUMINANCE_ALPHA ||
+            baseTexFormat == GL_INTENSITY) &&
+           (destBaseFormat == GL_RGBA ||
+            destBaseFormat == GL_RGB ||
+            destBaseFormat == GL_RG ||
+            destBaseFormat == GL_GREEN ||
+            destBaseFormat == GL_BLUE ||
+            destBaseFormat == GL_BGRA ||
+            destBaseFormat == GL_BGR)) ||

Is this needed to achieve correct behavior or just an optimization? I would expect that if the dest format does not have G/B channels, setting pixel transfer options for these channels would not have any functional effect anyway.

This is just an optimization. We set pixel transfer operations based on these conditions and then call _mesa_ReadPixels, which falls back to a slower path if transfer operations are set. I'll bump up the commit message of this patch.

Yes, a note in the commit log is worth it, thanks,

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

           /* If we're reading back an RGB(A) texture (using glGetTexImage) as
            * luminance then we need to return L=tex(R).
            */
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
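The condition added by this patch is a two-sided predicate: the source must be luminance-like and the destination must actually carry a green or blue channel. A minimal sketch of that predicate, assuming a hypothetical `BaseFormat` enum in place of the GLenum values; the function names are illustrative only and the destination list simply mirrors the formats named in the diff.

```cpp
#include <cassert>

// Hypothetical base-format enum; the real code compares GLenum values.
enum class BaseFormat {
    Red, RG, RGB, RGBA, BGR, BGRA, Green, Blue, Alpha,
    Luminance, LuminanceAlpha, Intensity
};

// Source side of the check: formats whose value is replicated into R, G, B.
inline bool is_luminance_like(BaseFormat f)
{
    return f == BaseFormat::Luminance ||
           f == BaseFormat::LuminanceAlpha ||
           f == BaseFormat::Intensity;
}

// Destination side of the check: formats listed in the patch as having
// a green and/or blue component.
inline bool has_green_or_blue(BaseFormat f)
{
    switch (f) {
    case BaseFormat::RGBA:
    case BaseFormat::RGB:
    case BaseFormat::RG:
    case BaseFormat::Green:
    case BaseFormat::Blue:
    case BaseFormat::BGRA:
    case BaseFormat::BGR:
        return true;
    default:
        return false;
    }
}

// G/B zeroing is only worth setting up when both sides hold.
inline bool need_luminance_to_rgb(BaseFormat src, BaseFormat dst)
{
    return is_luminance_like(src) && has_green_or_blue(dst);
}
```

With the destination test in place, a luminance-to-red readback no longer sets transfer operations, which is the case the thread describes as avoiding the slower ReadPixels path.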
Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion
On Tue, 2015-07-21 at 17:05 -0700, Anuj Phogat wrote: On Tue, Jul 21, 2015 at 1:36 AM, Iago Toral ito...@igalia.com wrote: On Tue, 2015-07-21 at 08:13 +0200, Iago Toral wrote: On Mon, 2015-07-20 at 10:56 -0700, Anuj Phogat wrote: On Mon, Jul 20, 2015 at 5:10 AM, Iago Toral ito...@igalia.com wrote: On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote: On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote: Without this patch, piglit test fbo_integer_readpixels_sint_uint fails, when forced to use the meta pbo path. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 00364f8..84cbc50 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, if (_mesa_need_rgb_to_luminance_conversion(rb-Format, format)) return false; + + if (_mesa_need_signed_unsigned_int_conversion(rb-Format, format, type)) + return false; Hrm... This seems fishy. Isn't glBlitFramebuffers supposed to handle format conversion with integers? If so we should probably fix it rather than just skip it for the meta pbo path. As discussed offline, here is relevant text for glBlitFrameBuffer() from OpenGL 4.5 spec, section 18.3.1: An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: -The read buffer contains fixed-point or floating-point values and any draw buffer contains neither fixed-point nor floating-point values. -The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. 
- The read buffer contains signed integer values and any draw buffer does not contain signed integer values.

I'll add a comment here explaining the reason to avoid meta path.

Is this code going to run only for glBlitFramebuffer? I see this function being called from code paths that implement glReadPixels and glGetTexImage too.

_mesa_meta_pbo_GetTexSubImage() is used only for glReadPixels and glGetTexImage. I quoted the glBlitFramebuffer restriction above because the function later uses _mesa_meta_BlitFramebuffer(), which doesn't support some format conversions.

If this function can be used to resolve ReadPixels and GetTexImage but the checks you add are *specific* to BlitFramebuffer, it does not look like this is the right place for them. Shouldn't you put them inside _mesa_meta_BlitFramebuffer instead? Otherwise they would affect ReadPixels and GetTexImage too, and I don't see the same restrictions applying to ReadPixels, for example.

We already have error checks in place for glBlitFrameBuffer(). Take a look at compatible_color_datatypes() in _mesa_blit_framebuffer().

Specifically for ReadPixels I only see this in the spec:

An INVALID_OPERATION error is generated if format is an integer format and the color buffer is not an integer format, or if the color buffer is an integer format and format is not an integer format.

So, unlike BlitFramebuffer, it seems that ReadPixels is fine as long as both formats are integer, no matter if the types have the same sign or not.

Right. That's the reason this patch doesn't generate any GL error for a signed-unsigned int mismatch. It just decides not to use the meta pbo path because of unsupported format conversions in _mesa_meta_BlitFrameBuffer(), and falls back to using other paths.

Ah, I see now, thanks for clarifying this.
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

Iago

 }

 /* For arrays, use a tall (height * depth) 2D texture but taking into
--
1.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
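The outcome of this thread is that two different rules are in play: ReadPixels only requires both sides to agree on being integer or not, while a blit additionally requires the same signedness, so a blit-based fast path has to be skipped (without raising an error) for signed/unsigned mismatches. A minimal sketch of that decision under the spec text quoted in the thread; `DataKind` and all function names are hypothetical and not Mesa API.

```cpp
#include <cassert>

// Hypothetical classification of a color buffer or requested format.
enum class DataKind { FixedOrFloat, SignedInt, UnsignedInt };

inline bool is_integer(DataKind k)
{
    return k != DataKind::FixedOrFloat;
}

// Blit rule quoted in the thread: conversion is supported only when read
// and draw buffers fall into the same category.
inline bool blit_conversion_supported(DataKind read, DataKind draw)
{
    return read == draw;
}

// ReadPixels rule quoted in the thread: an error is raised only when one
// side is an integer format and the other is not.
inline bool readpixels_format_valid(DataKind buffer, DataKind requested)
{
    return is_integer(buffer) == is_integer(requested);
}

// A blit-based fast path is usable only when the request is valid for
// ReadPixels *and* the blit can perform the conversion; otherwise the
// caller falls back to another path instead of reporting an error.
inline bool can_use_blit_based_path(DataKind buffer, DataKind requested)
{
    return readpixels_format_valid(buffer, requested) &&
           blit_conversion_supported(buffer, requested);
}
```

The signed-to-unsigned case is the interesting one: it is a valid ReadPixels request, yet the blit-based path must decline it.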
Re: [Mesa-dev] [PATCH v2 06/14] meta: Don't do fragment color clamping in _mesa_meta_pbo_GetTexSubImage
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Wed, 2015-07-22 at 11:54 -0700, Anuj Phogat wrote:

_mesa_meta_pbo_GetTexSubImage() uses _mesa_meta_BlitFrameBuffer(), which will do fragment clamping if enabled. But fragment clamping doesn't affect ReadPixels and GetTexImage.

Without this patch, piglit test arb_color_buffer_float-clear fails, when forced to use the meta pbo path.

v2: Apply this fix to both glReadPixels and glGetTexImage.

Cc: mesa-sta...@lists.freedesktop.org
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/common/meta_tex_subimage.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c
index 6d7c1eb..038d525 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -25,6 +25,7 @@
  *    Jason Ekstrand jason.ekstr...@intel.com
  */

+#include "blend.h"
 #include "bufferobj.h"
 #include "buffers.h"
 #include "fbobject.h"
@@ -331,6 +332,10 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
    _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER |
                            MESA_META_PIXEL_STORE));

+   /* GL_CLAMP_FRAGMENT_COLOR doesn't affect ReadPixels and GettexImage */
+   if (ctx->Extensions.ARB_color_buffer_float)
+      _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
+
    _mesa_GenFramebuffers(2, fbos);

    if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/14] mesa: Change the signature of _mesa_need_rgb_to_luminance_conversion()
Looks good to me, if this did not introduce any regressions: Reviewed-by: Iago Toral Quiroga ito...@igalia.com On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: This allows us to handle cases when texImage-_BaseFormat doesn't match _mesa_format_get_base_format(texImage-Format). _BaseFormat is what we care about in this function. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/common/meta_tex_subimage.c | 4 +++- src/mesa/main/readpix.c | 28 +++- src/mesa/main/readpix.h | 3 ++- 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 6d52014..43e1210 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -262,6 +262,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, int full_height, image_height; struct gl_texture_image *pbo_tex_image; struct gl_renderbuffer *rb = NULL; + GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format); GLenum status, base_format; bool success = false, clear_channels_to_zero = false; float save_clear_color[4]; @@ -284,7 +285,8 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, type, GL_FALSE)) return false; - if (_mesa_need_rgb_to_luminance_conversion(rb-Format, format)) + if (_mesa_need_rgb_to_luminance_conversion(rb-_BaseFormat, + dstBaseFormat)) return false; if (_mesa_need_signed_unsigned_int_conversion(rb-Format, format, type)) diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c index c98975f..3a9b766 100644 --- a/src/mesa/main/readpix.c +++ b/src/mesa/main/readpix.c @@ -47,17 +47,14 @@ * Return true if the conversion L=R+G+B is needed. 
*/ GLboolean -_mesa_need_rgb_to_luminance_conversion(mesa_format texFormat, GLenum format) +_mesa_need_rgb_to_luminance_conversion(GLenum srcBaseFormat, + GLenum dstBaseFormat) { - GLenum baseTexFormat = _mesa_get_format_base_format(texFormat); - - return (baseTexFormat == GL_RG || - baseTexFormat == GL_RGB || - baseTexFormat == GL_RGBA) - (format == GL_LUMINANCE || - format == GL_LUMINANCE_ALPHA || - format == GL_LUMINANCE_INTEGER_EXT || - format == GL_LUMINANCE_ALPHA_INTEGER_EXT); + return (srcBaseFormat == GL_RG || + srcBaseFormat == GL_RGB || + srcBaseFormat == GL_RGBA) + (dstBaseFormat == GL_LUMINANCE || + dstBaseFormat == GL_LUMINANCE_ALPHA); } /** @@ -89,6 +86,8 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context *ctx, GLboolean uses_blit) { GLbitfield transferOps = ctx-_ImageTransferState; + GLenum srcBaseFormat = _mesa_get_format_base_format(texFormat); + GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format); if (format == GL_DEPTH_COMPONENT || format == GL_DEPTH_STENCIL || @@ -125,7 +124,7 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context *ctx, * have any effect anyway. */ if (_mesa_get_format_datatype(texFormat) == GL_UNSIGNED_NORMALIZED - !_mesa_need_rgb_to_luminance_conversion(texFormat, format)) { + !_mesa_need_rgb_to_luminance_conversion(srcBaseFormat, dstBaseFormat)) { transferOps = ~IMAGE_CLAMP_BIT; } @@ -164,6 +163,7 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, { struct gl_renderbuffer *rb = _mesa_get_read_renderbuffer_for_format(ctx, format); + GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format); assert(rb); @@ -184,7 +184,8 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, default: /* Color formats. 
*/ - if (_mesa_need_rgb_to_luminance_conversion(rb-Format, format)) { + if (_mesa_need_rgb_to_luminance_conversion(rb-_BaseFormat, + dstBaseFormat)) { return GL_TRUE; } @@ -458,6 +459,7 @@ read_rgba_pixels( struct gl_context *ctx, uint8_t rebase_swizzle[4]; struct gl_framebuffer *fb = ctx-ReadBuffer; struct gl_renderbuffer *rb = fb-_ColorReadBuffer; + GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format); if (!rb) return; @@ -469,7 +471,7 @@ read_rgba_pixels( struct gl_context *ctx, dst_stride = _mesa_image_row_stride(packing, width, format, type); dst_format = _mesa_format_from_format_and_type(format, type); convert_rgb_to_lum = - _mesa_need_rgb_to_luminance_conversion(rb-Format, format
Re: [Mesa-dev] [PATCH 14/14] meta: Use _mesa_need_rgb_to_luminance_conversion() in decompress_texture_image()
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/common/meta.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 6108d98..e123500 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -3137,11 +3137,8 @@ decompress_texture_image(struct gl_context *ctx,
           /* If we're reading back an RGB(A) texture (using glGetTexImage) as
            * luminance then we need to return L=tex(R).
            */
-          ((baseTexFormat == GL_RGBA ||
-            baseTexFormat == GL_RGB ||
-            baseTexFormat == GL_RG) &&
-           (destBaseFormat == GL_LUMINANCE ||
-            destBaseFormat == GL_LUMINANCE_ALPHA))) {
+          _mesa_need_rgb_to_luminance_conversion(baseTexFormat,
+                                                 destBaseFormat)) {
          /* Green and blue must be zero */
          _mesa_PixelTransferf(GL_GREEN_SCALE, 0.0f);
          _mesa_PixelTransferf(GL_BLUE_SCALE, 0.0f);
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
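Why zeroing the green and blue scales yields L=tex(R) can be seen with a little arithmetic: if luminance is formed as the sum of the scaled R, G and B components, setting two of the scales to zero leaves only red. A minimal numeric sketch under that assumption; `Rgba`, `TransferScale` and `luminance_from_rgb` are hypothetical names, and clamping and bias terms are deliberately left out.

```cpp
#include <cassert>

// Hypothetical texel and per-channel scale factors (bias and clamping
// are omitted from this sketch).
struct Rgba {
    float r, g, b, a;
};

struct TransferScale {
    float red = 1.0f;
    float green = 1.0f;
    float blue = 1.0f;
};

// Luminance formed as the sum of the scaled colour components.
inline float luminance_from_rgb(const Rgba &texel, const TransferScale &s)
{
    return texel.r * s.red + texel.g * s.green + texel.b * s.blue;
}
```

For a texel of (0.25, 0.5, 0.125, 1.0), the default scales give 0.875, while green and blue scales of zero give 0.25, which is the red value alone.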
Re: [Mesa-dev] [PATCH V2 12/14] meta: Fix reading luminance texture as rgba in _mesa_meta_pbo_GetTexSubImage()
The problem here is that _mesa_meta_BlitFramebuffer is not setting the G/B channels to 0.0 when doing Luminance/Intensity to RGBA conversions, so why not implement the fix in _mesa_meta_BlitFramebuffer directly? The GL spec expects framebuffer blits to handle these conversions properly, so it looks like a win for all uses of that function.

Iago

On Fri, 2015-07-17 at 10:28 -0700, Anuj Phogat wrote:

After recent addition of pbo testing in piglit test getteximage-luminance, it fails on i965. This patch makes a sub test pass.

This patch adds a clear color operation to meta pbo path, which I think is better than falling back to software path.

V2: Fix color mask for GL_LUMINANCE_ALPHA

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/common/meta_tex_subimage.c | 36 +++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c
index 13f8292..f4d5ac3 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -28,6 +28,7 @@
 #include "blend.h"
 #include "bufferobj.h"
 #include "buffers.h"
+#include "clear.h"
 #include "fbobject.h"
 #include "glformats.h"
 #include "glheader.h"
@@ -278,8 +279,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
    int full_height, image_height;
    struct gl_texture_image *pbo_tex_image;
    struct gl_renderbuffer *rb = NULL;
-   GLenum status;
-   bool success = false;
+   GLenum status, src_base_format;
+   bool success = false, clear_channels_to_zero = false;
+   float save_clear_color[4];
    int z;

    if (!_mesa_is_bufferobj(packing->BufferObj))
@@ -380,6 +382,27 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
                                   GL_COLOR_BUFFER_BIT, GL_NEAREST))
       goto fail;

+   src_base_format = tex_image ?
+                     tex_image->_BaseFormat :
+                     ctx->ReadBuffer->_ColorReadBuffer->_BaseFormat;
+
+   /* Depending on the base formats involved we might need to rebase some
+    * values. For example if we download from a Luminance format to RGBA
+    * format, we want G=0 and B=0.
+    */
+   clear_channels_to_zero =
+      _mesa_need_luminance_to_rgb_conversion(src_base_format,
+                                             pbo_tex_image->_BaseFormat);
+
+   if (clear_channels_to_zero) {
+      memcpy(save_clear_color, ctx->Color.ClearColor.f, 4 * sizeof(float));
+      /* Clear the Green, Blue channels. */
+      _mesa_ColorMask(GL_FALSE, GL_TRUE, GL_TRUE,
+                      src_base_format != GL_LUMINANCE_ALPHA);
+      _mesa_ClearColor(0.0, 0.0, 0.0, 1.0);
+      _mesa_Clear(GL_COLOR_BUFFER_BIT);
+   }
+
    for (z = 1; z < depth; z++) {
       _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                 tex_image, zoffset + z);
@@ -392,6 +415,15 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
                                  0, z * image_height,
                                  width, z * image_height + height,
                                  GL_COLOR_BUFFER_BIT, GL_NEAREST);
+      if (clear_channels_to_zero)
+         _mesa_Clear(GL_COLOR_BUFFER_BIT);
+   }
+
+   /* Unmask the color channels and restore the saved clear color values. */
+   if (clear_channels_to_zero) {
+      _mesa_ColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
+      _mesa_ClearColor(save_clear_color[0], save_clear_color[1],
+                       save_clear_color[2], save_clear_color[3]);
    }

    success = true;
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
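The effect of the masked clear in this patch can be modelled per pixel: after the blit a luminance source shows up as (L, L, L, A), and a clear to (0, 0, 0, 1) that is masked off for red (and for alpha when the source is GL_LUMINANCE_ALPHA) rewrites only the channels that need rebasing. A minimal sketch of that idea, not the Mesa implementation; `Pixel`, `Mask`, `masked_clear` and `rebase_luminance_readback` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Pixel = std::array<float, 4>;  // r, g, b, a
using Mask = std::array<bool, 4>;    // true = channel may be written

// Overwrite only the channels enabled in `mask` with the clear value.
inline Pixel masked_clear(Pixel px, const Mask &mask, const Pixel &clear)
{
    for (std::size_t i = 0; i < 4; i++) {
        if (mask[i])
            px[i] = clear[i];
    }
    return px;
}

// Mirrors the mask and clear colour used in the patch: red is never
// touched, green and blue always are, alpha only when the source has no
// alpha channel of its own.
inline Pixel rebase_luminance_readback(const Pixel &blitted,
                                       bool src_is_luminance_alpha)
{
    const Mask mask = {false, true, true, !src_is_luminance_alpha};
    const Pixel clear = {0.0f, 0.0f, 0.0f, 1.0f};
    return masked_clear(blitted, mask, clear);
}
```

This also shows why the V2 note matters: without the alpha exception, a GL_LUMINANCE_ALPHA source would have its alpha overwritten with 1.0.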
Re: [Mesa-dev] [PATCH 11/14] meta: Use _mesa_need_luminance_to_rgb_conversion() in decompress_texture_image()
Patches 10-11 are
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:

Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
---
 src/mesa/drivers/common/meta.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index c9e58d8..6108d98 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -3132,16 +3132,8 @@ decompress_texture_image(struct gl_context *ctx,
        * returned as red and two-channel texture values are returned as
        * red/alpha.
        */
-      if (((baseTexFormat == GL_LUMINANCE ||
-            baseTexFormat == GL_LUMINANCE_ALPHA ||
-            baseTexFormat == GL_INTENSITY) &&
-           (destBaseFormat == GL_RGBA ||
-            destBaseFormat == GL_RGB ||
-            destBaseFormat == GL_RG ||
-            destBaseFormat == GL_GREEN ||
-            destBaseFormat == GL_BLUE ||
-            destBaseFormat == GL_BGRA ||
-            destBaseFormat == GL_BGR)) ||
+      if (_mesa_need_luminance_to_rgb_conversion(baseTexFormat,
+                                                 destBaseFormat) ||
           /* If we're reading back an RGB(A) texture (using glGetTexImage) as
            * luminance then we need to return L=tex(R).
            */
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses
On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote:
> The util/hash_table was intended to be a fast hash table replacement
> for the program/hash_table, see 35fd61bd99c1 and 72e55bb6888ff.
>
> This replaces some more uses of the old hash table and also destroys
> the interface_types hash table when _mesa_glsl_release_types() is
> called, which wasn't previously being done.
> ---
> Was looking at the remaining program/hash_table uses and noticed that
> interface_types wasn't being freed, so thought I'd fix that and replace
> the hash while I was there.
>
> No measurable compile time changes to the public shader-db.
>
>  src/glsl/glsl_types.cpp | 85 ++---
>  src/glsl/glsl_types.h   |  2 +-
>  2 files changed, 46 insertions(+), 41 deletions(-)
>
> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
> index 281ff51..255bd69 100644
> --- a/src/glsl/glsl_types.cpp
> +++ b/src/glsl/glsl_types.cpp
> @@ -25,7 +25,7 @@
>  #include "main/core.h" /* for Elements, MAX2 */
>  #include "glsl_parser_extras.h"
>  #include "glsl_types.h"
> -#include "program/hash_table.h"
> +#include "util/hash_table.h"
>
>  mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP;
> @@ -329,14 +329,19 @@ _mesa_glsl_release_types(void)
>      * necessary.
>      */
>     if (glsl_type::array_types != NULL) {
> -      hash_table_dtor(glsl_type::array_types);
> +      _mesa_hash_table_destroy(glsl_type::array_types, NULL);
>        glsl_type::array_types = NULL;
>     }
>
>     if (glsl_type::record_types != NULL) {
> -      hash_table_dtor(glsl_type::record_types);
> +      _mesa_hash_table_destroy(glsl_type::record_types, NULL);
>        glsl_type::record_types = NULL;
>     }
> +
> +   if (glsl_type::interface_types != NULL) {
> +      _mesa_hash_table_destroy(glsl_type::interface_types, NULL);
> +      glsl_type::interface_types = NULL;
> +   }

I think it is probably best to put the destruction of interface_types
in a separate patch; it is a different issue after all. You can add my
Reviewed-by on that patch.

With that and a couple of other minor nitpicks I mention below fixed,
this is:

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

>  }
>
> @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type *base, unsigned array_size)
>     mtx_lock(&glsl_type::mutex);
>
>     if (array_types == NULL) {
> -      array_types = hash_table_ctor(64, hash_table_string_hash,
> -                                    hash_table_string_compare);
> +      array_types = _mesa_hash_table_create(NULL, _mesa_key_hash_string,
> +                                            _mesa_key_string_equal);
>     }
>
> -   const glsl_type *t = (glsl_type *) hash_table_find(array_types, key);
> -
> -   if (t == NULL) {
> +   const struct hash_entry *entry = _mesa_hash_table_search(array_types, key);
> +   if (entry == NULL) {
>        mtx_unlock(&glsl_type::mutex);
> -      t = new glsl_type(base, array_size);
> +      const glsl_type *t = new glsl_type(base, array_size);
>        mtx_lock(&glsl_type::mutex);
>
> -      hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, key));
> +      entry = _mesa_hash_table_insert(array_types,
> +                                      ralloc_strdup(mem_ctx, key),
> +                                      (void *) t);
>     }
>
> -   assert(t->base_type == GLSL_TYPE_ARRAY);
> -   assert(t->length == array_size);
> -   assert(t->fields.array == base);
> +   assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY);
> +   assert(((glsl_type *)entry->data)->length == array_size);
> +   assert(((glsl_type *)entry->data)->fields.array == base);

Other parts of this file put a blank between the type cast and the
variable, so I would add that here (and in all other places where you
cast entry to glsl_type* in this patch).

>     mtx_unlock(&glsl_type::mutex);
>
> -   return t;
> +   return (glsl_type *)entry->data;
>  }
>
> @@ -722,19 +728,13 @@ glsl_type::record_compare(const glsl_type *b) const
>  }
>
> -int
> +bool
>  glsl_type::record_key_compare(const void *a, const void *b)
>  {
>     const glsl_type *const key1 = (glsl_type *) a;
>     const glsl_type *const key2 = (glsl_type *) b;
>
> -   /* Return zero is the types match (there is zero difference) or non-zero
> -    * otherwise.
> -    */
> -   if (strcmp(key1->name, key2->name) != 0)
> -      return 1;
> -
> -   return !key1->record_compare(key2);
> +   return strcmp(key1->name, key2->name) == 0 && key1->record_compare(key2);
>  }
>
> @@ -772,25 +772,27 @@ glsl_type::get_record_instance(const glsl_struct_field *fields,
>     mtx_lock(&glsl_type::mutex);
>
>     if (record_types == NULL) {
> -      record_types = hash_table_ctor(64, record_key_hash, record_key_compare);
> +      record_types = _mesa_hash_table_create(NULL, record_key_hash,
> +                                             record_key_compare);
>     }
>
> -   const glsl_type *t = (glsl_type *) hash_table_find(record_types, key);
> -   if (t == NULL) {
> +   const struct hash_entry *entry = _mesa_hash_table_search(record_types
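Editorial illustration, not part of the patch: the hunk above changes record_key_compare from an int that is zero on a match to a bool that is true on a match. The sketch below shows the two conventions side by side. The struct record_key type and both function names are hypothetical stand-ins, not Mesa code, and the field-count check merely stands in for glsl_type::record_compare().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a record type; this is not Mesa's glsl_type. */
struct record_key {
   const char *name;
   unsigned num_fields;
};

/* Old convention (strcmp-like): zero means "the keys match". */
int
record_key_compare_old(const void *a, const void *b)
{
   const struct record_key *key1 = a;
   const struct record_key *key2 = b;

   assert(key1 != NULL && key2 != NULL);
   if (strcmp(key1->name, key2->name) != 0)
      return 1;

   return key1->num_fields != key2->num_fields;
}

/* New convention (equality predicate): non-zero means "the keys match". */
int
record_key_equal(const void *a, const void *b)
{
   const struct record_key *key1 = a;
   const struct record_key *key2 = b;

   assert(key1 != NULL && key2 != NULL);
   return strcmp(key1->name, key2->name) == 0 &&
          key1->num_fields == key2->num_fields;
}
```

Passing a comparator of the old kind where an equality predicate is expected would invert every lookup, which is presumably why the patch rewrites the function body rather than only changing its return type.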
Re: [Mesa-dev] [PATCH v3 (part2) 00/56] ARB_shader_storage_buffer_object (mesa, i965)
BTW, notice that patches 1-3, 5-7 and 51 have already been reviewed by
Jordan.

Iago

On Tue, 2015-07-14 at 09:46 +0200, Iago Toral Quiroga wrote:
> This is the second part of the v3 series including remaining frontend
> bits like the optional unsized array at the bottom of SSBO definitions,
> layout mode std430 and implementation for GL queries/getters. It also
> includes all of the i965 backend implementation.
>
> Notice that this series depends on the following patches from Curro
> that have not been landed in master yet:
>
>   i965/vec4: Import surface message builder functions.
>   i965/vec4: Import helpers to convert vectors into arrays and back.
>   i965/vec4: Import array utils for the surface message builder.
>   i965/fs: Import surface message builder functions.
>   i965/fs: Import helpers to convert vectors into arrays and back.
>   i965/fs: Import array utils for the surface message builder.
>   i965: Lift the constness restriction on surface indices passed to untyped ops.
>   i965/vec4: Introduce VEC4 IR builder.
>   i965/vec4: Make src_reg conversion constructor from dst_reg implicit.
>   i965: Add register constructors taking a backend_reg as argument.
>   i965/fs: Fix register coalesce not to lose track of the second half of 16-wide moves.
>
> This v3-part2 series, together with these patches from Curro and part1
> (which has already been landed in master), can be obtained from this
> development repository:
>
>   git clone -b itoral-ARB_shader_storage_buffer_object-v3-part2 https://github.com/Igalia/mesa.git
>
> Also notice that:
> * Patch 55 is a separate bugfix that affects UBOs as well.
> * Patch 56 is not required by the series and is completely optional.
>
> SSBO piglit tests have already been merged in piglit's master branch.
>
> Antia Puentes (1):
>   glsl: Consider active all elements of a shared/std140 block array
>
> Iago Toral Quiroga (29):
>   i965: Use 16-byte offset alignment for shader storage buffers
>   i965: Implement DriverFlags.NewShaderStorageBuffer
>   i965: Set MaxShaderStorageBuffers for compute shaders
>   i965: Upload Shader Storage Buffer Object surfaces
>   i965: handle visiting of ir_var_shader_storage variables
>   i965/fs: Do not split buffer variables
>   i965/fs: Do not include the header with a pixel mask in untyped read messages
>   i965/vec4: Implement SSBO writes
>   nir: Implement __intrinsic_store_ssbo
>   i965/nir/fs: Implement SSBO writes
>   i965/vec4: Implement SSBO reads
>   nir: Implement __intrinsic_store_load
>   i965/nir/fs: Implement SSBO reads
>   nir: ignore an instruction's dest if it hasn't any
>   glsl: Rename atomic counter functions
>   glsl: Add atomic functions from ARB_shader_storage_buffer_object
>   glsl: lower SSBO atomic intrinsics
>   nir: Implement lowered SSBO atomic intrinsics
>   i965/nir/fs: Implement SSBO atomics
>   i965/vec4: Implement lowered SSBO atomic intrinsics
>   glsl: First argument to atomic functions must be a buffer variable
>   mesa: Add queries for GL_SHADER_STORAGE_BUFFER
>   glsl: Allow use of memory qualifiers with ARB_shader_storage_buffer_object.
>   glsl: Apply memory qualifiers to buffer variables
>   glsl: Allow memory layout qualifiers on shader storage buffer objects
>   glsl: Do not allow assignments to read-only variables
>   glsl: Do not allow reads from write-only variables
>   docs: Mark ARB_shader_storage_buffer_object as done for i965.
>   i965/vec4: Skip dependency control for opcodes emitting multiple instructions
>
> Kristian Høgsberg (1):
>   glsl: atomic counters can be declared as buffer-qualified variables
>
> Samuel Iglesias Gonsalvez (25):
>   i965: set ARB_shader_storage_buffer_object related constant values
>   glsl: add support for unsized arrays in shader storage blocks
>   glsl: Add parser/compiler support for unsized array's length()
>   glsl: Lower unsized array's length calculation expression
>   nir: add shader storage buffer's unsized array length calculation
>   i965/vec4: Implement unsized array's length calculation
>   i965/fs: Implement generator code for unsized array's length calculation
>   i965/fs/nir: implement unsized array's length calculation
>   i965/wm: emit null buffer surfaces when null buffers are attached
>   i965/wm: surfaces should have the API buffer size, not the drm buffer size
>   glsl: Add parser/compiler support for std430 interface packing qualifier
>   glsl: propagate interface packing information to arrays of scalars, vectors.
>   glsl: propagate std430 packing qualifier to struct's members and array of structs
>   glsl: add std430 interface packing support to ssbo store and unsized array length
>   glsl: a shader storage buffer must be smaller than the maximum size allowed
>   glsl: number of active shader storage blocks must be within allowed limits
>   glsl: ignore buffer variables when counting uniform components
>   glsl: shader storage blocks use different max block size values than uniforms
>   mesa: add glShaderStorageBlockBinding()
>   glsl: fix UNIFORM_BUFFER_START
Re: [Mesa-dev] [PATCH v3 (part2) 54/56] docs: Mark ARB_shader_storage_buffer_object as done for i965.
Hi Mike,

On Tue, 2015-07-14 at 09:30 +, Mike Lothian wrote:
> Hi Iago
>
> Nice work, was it an oversight this wasn't enabled for GLES 3.1? Or is
> the implementation slightly different?

Good question. To be honest, I don't remember right now. I think I
checked this at some point and did not see any issues with enabling it
for GLES 3.1, but then I forgot to enable it. I need to double check
this, though.

Iago

> Cheers
>
> Mike
>
> On Tue, 14 Jul 2015 at 08:48 Iago Toral Quiroga ito...@igalia.com wrote:
> > ---
> >  docs/GL3.txt | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/docs/GL3.txt b/docs/GL3.txt
> > index 33a282e..6427616 100644
> > --- a/docs/GL3.txt
> > +++ b/docs/GL3.txt
> > @@ -164,7 +164,7 @@ GL 4.3, GLSL 4.30:
> >    GL_ARB_program_interface_query           DONE (all drivers)
> >    GL_ARB_robust_buffer_access_behavior     not started
> >    GL_ARB_shader_image_size                 in progress (Martin Peres)
> > -  GL_ARB_shader_storage_buffer_object      in progress (Iago Toral, Samuel Iglesias)
> > +  GL_ARB_shader_storage_buffer_object      DONE (i965)
> >    GL_ARB_stencil_texturing                 DONE (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
> >    GL_ARB_texture_buffer_range              DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
> >    GL_ARB_texture_query_levels              DONE (all drivers that support GLSL 1.30)
> > --
> > 1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/13] swrast: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga ito...@igalia.com On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote: --- src/mesa/swrast/s_aaline.c| 28 ++-- src/mesa/swrast/s_aalinetemp.h| 4 ++-- src/mesa/swrast/s_atifragshader.c | 4 ++-- src/mesa/swrast/s_copypix.c | 6 +++--- src/mesa/swrast/s_drawpix.c | 12 ++-- src/mesa/swrast/s_fragprog.c | 4 ++-- src/mesa/swrast/s_lines.c | 4 ++-- src/mesa/swrast/s_points.c| 10 +- src/mesa/swrast/s_span.c | 10 +- src/mesa/swrast/s_texcombine.c| 6 +++--- src/mesa/swrast/s_texfilter.c | 8 src/mesa/swrast/s_tritemp.h | 2 +- src/mesa/swrast/s_zoom.c | 2 +- 13 files changed, 50 insertions(+), 50 deletions(-) diff --git a/src/mesa/swrast/s_aaline.c b/src/mesa/swrast/s_aaline.c index f3258e8..de5b42b 100644 --- a/src/mesa/swrast/s_aaline.c +++ b/src/mesa/swrast/s_aaline.c @@ -116,11 +116,11 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat y1, const GLfloat b = pz * py; const GLfloat c = px * px + py * py; const GLfloat d = -(a * x0 + b * y0 + c * z0); - if (a == 0.0 b == 0.0 c == 0.0 d == 0.0) { - plane[0] = 0.0; - plane[1] = 0.0; - plane[2] = 1.0; - plane[3] = 0.0; + if (a == 0.0F b == 0.0F c == 0.0F d == 0.0F) { + plane[0] = 0.0F; + plane[1] = 0.0F; + plane[2] = 1.0F; + plane[3] = 0.0F; } else { plane[0] = a; @@ -135,9 +135,9 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat y1, static inline void constant_plane(GLfloat value, GLfloat plane[4]) { - plane[0] = 0.0; - plane[1] = 0.0; - plane[2] = -1.0; + plane[0] = 0.0F; + plane[1] = 0.0F; + plane[2] = -1.0F; plane[3] = value; } @@ -160,8 +160,8 @@ static inline GLfloat solve_plane_recip(GLfloat x, GLfloat y, const GLfloat plane[4]) { const GLfloat denom = plane[3] + plane[0] * x + plane[1] * y; - if (denom == 0.0) - return 0.0; + if (denom == 0.0F) + return 0.0F; else return -plane[2] / denom; } @@ -374,7 +374,7 @@ segment(struct gl_context *ctx, if (x0 x1) { xLeft = x0 - line-halfWidth; xRight = x1 + line-halfWidth; - if (line-dy = 0.0) { + if (line-dy = 0.0F) { yBot = y0 
- 3.0F * line-halfWidth; yTop = y0 + line-halfWidth; } @@ -386,7 +386,7 @@ segment(struct gl_context *ctx, else { xLeft = x1 - line-halfWidth; xRight = x0 + line-halfWidth; - if (line-dy = 0.0) { + if (line-dy = 0.0F) { yBot = y1 - 3.0F * line-halfWidth; yTop = y1 + line-halfWidth; } @@ -420,7 +420,7 @@ segment(struct gl_context *ctx, if (y0 y1) { yBot = y0 - line-halfWidth; yTop = y1 + line-halfWidth; - if (line-dx = 0.0) { + if (line-dx = 0.0F) { xLeft = x0 - 3.0F * line-halfWidth; xRight = x0 + line-halfWidth; } @@ -432,7 +432,7 @@ segment(struct gl_context *ctx, else { yBot = y1 - line-halfWidth; yTop = y0 + line-halfWidth; - if (line-dx = 0.0) { + if (line-dx = 0.0F) { xLeft = x1 - 3.0F * line-halfWidth; xRight = x1 + line-halfWidth; } diff --git a/src/mesa/swrast/s_aalinetemp.h b/src/mesa/swrast/s_aalinetemp.h index f1d078f..bebb131 100644 --- a/src/mesa/swrast/s_aalinetemp.h +++ b/src/mesa/swrast/s_aalinetemp.h @@ -44,7 +44,7 @@ NAME(plot)(struct gl_context *ctx, struct LineInfo *line, int ix, int iy) (void) swrast; - if (coverage == 0.0) + if (coverage == 0.0F) return; line-span.end++; @@ -123,7 +123,7 @@ NAME(line)(struct gl_context *ctx, const SWvertex *v0, const SWvertex *v1) ctx-Const.MinLineWidthAA, ctx-Const.MaxLineWidthAA); - if (line.len == 0.0 || IS_INF_OR_NAN(line.len)) + if (line.len == 0.0F || IS_INF_OR_NAN(line.len)) return; INIT_SPAN(line.span, GL_LINE); diff --git a/src/mesa/swrast/s_atifragshader.c b/src/mesa/swrast/s_atifragshader.c index 9e029db..2974dee 100644 --- a/src/mesa/swrast/s_atifragshader.c +++ b/src/mesa/swrast/s_atifragshader.c @@ -436,13 +436,13 @@ execute_shader(struct gl_context *ctx, const struct ati_fragment_shader *shader, for (i = 0; i 3; i++) { dst[optype][i] = (src[optype][2][i] - 0.5) ? src[optype][0][i] : src[optype][1][i]; + 0.5F) ? src[optype][0][i] : src[optype][1][i]; } } else { dst[optype][3] = (src[optype][2][3
Re: [Mesa-dev] [PATCH 05/13] nir: Avoid double promotion.
On Tue, 2015-07-14 at 11:30 +0200, Iago Toral wrote:
> Reviewed-by: Iago Toral Quiroga ito...@igalia.com

BTW, ldexpf and copysignf are C99 too, so I guess you'll need to wrap
these too.

Iago

> On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> > ---
> >  src/glsl/nir/nir_opcodes.py | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
> > index 56e96d9..df5b7e2 100644
> > --- a/src/glsl/nir/nir_opcodes.py
> > +++ b/src/glsl/nir/nir_opcodes.py
> > @@ -474,10 +474,10 @@ else
> >  """)
> >
> >  opcode("ldexp", 0, tfloat, [0, 0], [tfloat, tint], "", """
> > -dst = ldexp(src0, src1);
> > +dst = ldexpf(src0, src1);
> >  /* flush denormals to zero. */
> >  if (!isnormal(dst))
> > -   dst = copysign(0.0f, src0);
> > +   dst = copysignf(0.0f, src0);
> >  """)
> >
> >  # Combines the first component of each input to make a 2-component vector.
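Editorial illustration, not part of the thread: a minimal standalone sketch of what the patched ldexp opcode body computes, using only the C99 float functions named in the diff. The function name ldexp_flush_denorm is made up for this example.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper mirroring the patched opcode body. */
float
ldexp_flush_denorm(float x, int exponent)
{
   /* ldexpf() takes and returns float, so no double promotion occurs. */
   float dst = ldexpf(x, exponent);

   /* isnormal() is false for subnormals, and also for zero, infinity and
    * NaN, so all of those results are replaced by a zero carrying the sign
    * of the input.
    */
   if (!isnormal(dst))
      dst = copysignf(0.0f, x);

   /* Postcondition: the result is either a normal number or a zero. */
   assert(dst == 0.0f || isnormal(dst));
   return dst;
}
```

For example, ldexp_flush_denorm(1.0f, 3) yields 8.0f, while an exponent of -140 would produce a subnormal float and is therefore flushed to zero.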
Re: [Mesa-dev] [PATCH 11/13] program: Avoid double promotion.
Needs wrappers for sinf, cosf, powf, logf. Reviewed-by: Iago Toral Quiroga ito...@igalia.com On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote: --- src/mesa/program/prog_execute.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c index 77274e2..2c52d0d 100644 --- a/src/mesa/program/prog_execute.c +++ b/src/mesa/program/prog_execute.c @@ -623,7 +623,7 @@ _mesa_execute_program(struct gl_context * ctx, GLfloat a[4], result[4]; fetch_vector1(inst-SrcReg[0], machine, a); result[0] = result[1] = result[2] = result[3] - = (GLfloat) cos(a[0]); + = cosf(a[0]); store_vector4(inst, machine, result); } break; @@ -776,7 +776,7 @@ _mesa_execute_program(struct gl_context * ctx, if (inst-SrcReg[0].File != PROGRAM_UNDEFINED) { GLfloat a[4]; fetch_vector1(inst-SrcReg[0], machine, a); - cond = (a[0] != 0.0); + cond = (a[0] != 0.0F); } else { cond = eval_condition(machine, inst); @@ -834,7 +834,7 @@ _mesa_execute_program(struct gl_context * ctx, val = -FLT_MAX; } else { - val = (float)(log(a[0]) * 1.442695F); + val = logf(a[0]) * 1.442695F; } result[0] = result[1] = result[2] = result[3] = val; store_vector4(inst, machine, result); @@ -853,10 +853,10 @@ _mesa_execute_program(struct gl_context * ctx, result[1] = a[0]; /* XXX we could probably just use pow() here */ if (a[0] 0.0F) { - if (a[1] == 0.0 a[3] == 0.0) + if (a[1] == 0.0F a[3] == 0.0F) result[2] = 1.0F; else - result[2] = (GLfloat) pow(a[1], a[3]); + result[2] = powf(a[1], a[3]); } else { result[2] = 0.0F; @@ -886,12 +886,12 @@ _mesa_execute_program(struct gl_context * ctx, int exponent; GLfloat mantissa = frexpf(t[0], exponent); q[0] = (GLfloat) (exponent - 1); - q[1] = (GLfloat) (2.0 * mantissa); /* map [.5, 1) - [1, 2) */ + q[1] = 2.0F * mantissa; /* map [.5, 1) - [1, 2) */ /* The fast LOG2 macro doesn't meet the precision * requirements. 
*/ - q[2] = (float)(log(t[0]) * 1.442695F); + q[2] = logf(t[0]) * 1.442695F; } } else { @@ -1051,7 +1051,7 @@ _mesa_execute_program(struct gl_context * ctx, fetch_vector1(inst-SrcReg[0], machine, a); fetch_vector1(inst-SrcReg[1], machine, b); result[0] = result[1] = result[2] = result[3] - = (GLfloat) pow(a[0], b[0]); + = powf(a[0], b[0]); store_vector4(inst, machine, result); } break; @@ -1095,10 +1095,10 @@ _mesa_execute_program(struct gl_context * ctx, { GLfloat a[4], result[4]; fetch_vector1(inst-SrcReg[0], machine, a); -result[0] = (GLfloat) cos(a[0]); -result[1] = (GLfloat) sin(a[0]); -result[2] = 0.0;/* undefined! */ -result[3] = 0.0;/* undefined! */ +result[0] = cosf(a[0]); +result[1] = sinf(a[0]); +result[2] = 0.0F;/* undefined! */ +result[3] = 0.0F;/* undefined! */ store_vector4(inst, machine, result); } break; @@ -1161,7 +1161,7 @@ _mesa_execute_program(struct gl_context * ctx, GLfloat a[4], result[4]; fetch_vector1(inst-SrcReg[0], machine, a); result[0] = result[1] = result[2] = result[3] - = (GLfloat) sin(a[0]); + = sinf(a[0]); store_vector4(inst, machine, result); } break; @@ -1360,7 +1360,7 @@ _mesa_execute_program(struct gl_context * ctx, * zero, we'd probably be fine except for an assert in * IROUND_POS() which gets triggered by the inf values created. */ -if (texcoord[3] != 0.0) { +if (texcoord[3] != 0.0F) { texcoord[0] /= texcoord[3]; texcoord[1] /= texcoord[3]; texcoord[2] /= texcoord[3]; @@ -1380,7 +1380,7 @@ _mesa_execute_program(struct gl_context * ctx, fetch_vector4(inst-SrcReg[0], machine, texcoord); if (inst-TexSrcTarget != TEXTURE_CUBE_INDEX -texcoord[3] != 0.0) { +texcoord[3] != 0.0F) { texcoord[0
Re: [Mesa-dev] [PATCH 07/13] util: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> ---
>  src/util/register_allocate.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
> index 2ad8c3c..95be20f 100644
> --- a/src/util/register_allocate.c
> +++ b/src/util/register_allocate.c
> @@ -648,7 +648,7 @@ ra_get_best_spill_node(struct ra_graph *g)
>        float cost = g->nodes[n].spill_cost;
>        float benefit;
>
> -      if (cost <= 0.0)
> +      if (cost <= 0.0f)
>           continue;
>
>        if (g->nodes[n].in_stack)
Re: [Mesa-dev] [PATCH 09/13] tnl: Avoid double promotion.
On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> There are a couple of unrelated changes in t_vb_lighttmp.h that I hope
> you'll excuse -- there's a block of code that's duplicated modulo a few
> trivial differences that I took the liberty of fixing.
> ---
>  src/mesa/tnl/t_draw.c           |  2 +-
>  src/mesa/tnl/t_rasterpos.c      |  6 +++---
>  src/mesa/tnl/t_vb_fog.c         |  6 +++---
>  src/mesa/tnl/t_vb_light.c       | 16
>  src/mesa/tnl/t_vb_lighttmp.h    | 16 +++-
>  src/mesa/tnl/t_vb_normals.c     |  4 ++--
>  src/mesa/tnl/t_vertex_generic.c |  2 +-
>  7 files changed, 25 insertions(+), 27 deletions(-)
>
> diff --git a/src/mesa/tnl/t_draw.c b/src/mesa/tnl/t_draw.c
> index 6adf1dc..713c6a9 100644
> --- a/src/mesa/tnl/t_draw.c
> +++ b/src/mesa/tnl/t_draw.c
> @@ -257,7 +257,7 @@ static GLboolean *_tnl_import_edgeflag( struct gl_context *ctx,
>     GLuint i;
>
>     for (i = 0; i < count; i++) {
> -      *bptr++ = ((GLfloat *)ptr)[0] == 1.0;
> +      *bptr++ = ((GLfloat *)ptr)[0] == 1.0F;
>        ptr += stride;
>     }
>
> diff --git a/src/mesa/tnl/t_rasterpos.c b/src/mesa/tnl/t_rasterpos.c
> index 7ef50ea..4bd9ac8 100644
> --- a/src/mesa/tnl/t_rasterpos.c
> +++ b/src/mesa/tnl/t_rasterpos.c
> @@ -148,7 +148,7 @@ shade_rastpos(struct gl_context *ctx,
>           SUB_3V(VP, light->_Position, vertex);
>           /* d = length(VP) */
>           d = (GLfloat) LEN_3FV( VP );
> -         if (d > 1.0e-6) {
> +         if (d > 1.0e-6F) {
>              /* normalize VP */
>              GLfloat invd = 1.0F / d;
>              SELF_SCALE_SCALAR_3V(VP, invd);
> @@ -172,7 +172,7 @@ shade_rastpos(struct gl_context *ctx,
>           }
>        }
>
> -      if (attenuation < 1e-3)
> +      if (attenuation < 1e-3F)
>           continue;
>
>        n_dot_VP = DOT3( normal, VP );
> @@ -219,7 +219,7 @@ shade_rastpos(struct gl_context *ctx,
>           shine = ctx->Light.Material.Attrib[MAT_ATTRIB_FRONT_SHININESS][0];
>           spec_coef = powf(n_dot_h, shine);
>
> -         if (spec_coef > 1.0e-10) {
> +         if (spec_coef > 1.0e-10F) {
>              if (ctx->Light.Model.ColorControl==GL_SEPARATE_SPECULAR_COLOR) {
>                 ACC_SCALE_SCALAR_3V( specularContrib, spec_coef,
>                                      light->_MatSpecular[0]);
> diff --git a/src/mesa/tnl/t_vb_fog.c b/src/mesa/tnl/t_vb_fog.c
> index 1ca72f8..5489ed6 100644
> --- a/src/mesa/tnl/t_vb_fog.c
> +++ b/src/mesa/tnl/t_vb_fog.c
> @@ -45,8 +45,8 @@ struct fog_stage_data {
>  #define FOG_STAGE_DATA(stage) ((struct fog_stage_data *)stage->privatePtr)
>
>  #define FOG_EXP_TABLE_SIZE 256
> -#define FOG_MAX (10.0)
> -#define EXP_FOG_MAX .0006595
> +#define FOG_MAX (10.0F)
> +#define EXP_FOG_MAX .0006595F
>  #define FOG_INCR (FOG_MAX/FOG_EXP_TABLE_SIZE)
>  static GLfloat exp_table[FOG_EXP_TABLE_SIZE];
>  static GLfloat inited = 0;
> @@ -54,7 +54,7 @@ static GLfloat inited = 0;
>  #if 1
>  #define NEG_EXP( result, narg )                         \
>  do {                                                    \
> -   GLfloat f = (GLfloat) (narg * (1.0/FOG_INCR));       \
> +   GLfloat f = (GLfloat) (narg * (1.0F / FOG_INCR));    \
>     GLint k = (GLint) f;                                 \
>     if (k > FOG_EXP_TABLE_SIZE-2)                        \
>        result = (GLfloat) EXP_FOG_MAX;                   \
> diff --git a/src/mesa/tnl/t_vb_light.c b/src/mesa/tnl/t_vb_light.c
> index dbd57fa..df9073e 100644
> --- a/src/mesa/tnl/t_vb_light.c
> +++ b/src/mesa/tnl/t_vb_light.c
> @@ -137,23 +137,23 @@ validate_shine_table( struct gl_context *ctx, GLuint side, GLfloat shininess )
>           break;
>
>        m = s->tab;
> -      m[0] = 0.0;
> -      if (shininess == 0.0) {
> +      m[0] = 0.0F;
> +      if (shininess == 0.0F) {
>           for (j = 1 ; j <= SHINE_TABLE_SIZE ; j++)
> -            m[j] = 1.0;
> +            m[j] = 1.0F;
>        }
>        else {
>           for (j = 1 ; j < SHINE_TABLE_SIZE ; j++) {
>              GLdouble t, x = j / (GLfloat) (SHINE_TABLE_SIZE - 1);

I think you want to declare x (and probably t as well) with type
GLfloat here.

> -            if (x < 0.005) /* underflow check */
> -               x = 0.005;
> +            if (x < 0.005F) /* underflow check */
> +               x = 0.005F;
>              t = pow(x, shininess);

Since the code below casts t to float anyway, is there a reason why you
did not use powf and declare t as float? Also, there is no point in
making the change below if you don't do that, right?

> -            if (t > 1e-20)
> +            if (t > 1e-20F)
>                 m[j] = (GLfloat) t;
>              else
> -               m[j] = 0.0;
> +               m[j] = 0.0F;
>           }
> -         m[SHINE_TABLE_SIZE] = 1.0;
> +         m[SHINE_TABLE_SIZE] = 1.0F;
>        }
>
>        s->shininess = shininess;
> diff --git a/src/mesa/tnl/t_vb_lighttmp.h b/src/mesa/tnl/t_vb_lighttmp.h
> index f8786ac..3aebcd4 100644
> --- a/src/mesa/tnl/t_vb_lighttmp.h
> +++ b/src/mesa/tnl/t_vb_lighttmp.h
> @@ -112,7 +112,7 @@
Re: [Mesa-dev] [PATCH 05/13] nir: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> ---
>  src/glsl/nir/nir_opcodes.py | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
> index 56e96d9..df5b7e2 100644
> --- a/src/glsl/nir/nir_opcodes.py
> +++ b/src/glsl/nir/nir_opcodes.py
> @@ -474,10 +474,10 @@ else
>  """)
>
>  opcode("ldexp", 0, tfloat, [0, 0], [tfloat, tint], "", """
> -dst = ldexp(src0, src1);
> +dst = ldexpf(src0, src1);
>  /* flush denormals to zero. */
>  if (!isnormal(dst))
> -   dst = copysign(0.0f, src0);
> +   dst = copysignf(0.0f, src0);
>  """)
>
>  # Combines the first component of each input to make a 2-component vector.
Re: [Mesa-dev] [PATCH 06/13] gallium/auxiliary: Avoid double promotion.
This one will need wrapping for fabsf.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> ---
>  src/gallium/auxiliary/util/u_format_rgb9e5.h | 2 +-
>  src/gallium/auxiliary/util/u_math.h          | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h b/src/gallium/auxiliary/util/u_format_rgb9e5.h
> index 7a01f7f..d1ace3f 100644
> --- a/src/gallium/auxiliary/util/u_format_rgb9e5.h
> +++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h
> @@ -75,7 +75,7 @@ typedef union {
>
>  static INLINE float rgb9e5_ClampRange(float x)
>  {
> -   if (x > 0.0) {
> +   if (x > 0.0f) {
>        if (x >= MAX_RGB9E5) {
>           return MAX_RGB9E5;
>        } else {
> diff --git a/src/gallium/auxiliary/util/u_math.h b/src/gallium/auxiliary/util/u_math.h
> index 3b4040f..9c3cb6a 100644
> --- a/src/gallium/auxiliary/util/u_math.h
> +++ b/src/gallium/auxiliary/util/u_math.h
> @@ -240,7 +240,7 @@ util_iround(float f)
>  static INLINE boolean
>  util_is_approx(float a, float b, float tol)
>  {
> -   return fabs(b - a) <= tol;
> +   return fabsf(b - a) <= tol;
>  }
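Editorial illustration, not part of the thread: one way the "wrapping for fabsf" mentioned above could look on a toolchain without the C99 float math functions. HAVE_FABSF, util_fabsf and is_approx are hypothetical names chosen for this sketch; the thread does not show how Mesa's own wrappers are written.

```c
#include <assert.h>
#include <math.h>

/* HAVE_FABSF is a hypothetical configure-time macro for this sketch. */
#if defined(HAVE_FABSF) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#define util_fabsf(x) fabsf(x)
#else
/* Pre-C99 fallback: compute in double and narrow the result to float. */
#define util_fabsf(x) ((float) fabs((double) (x)))
#endif

/* Same test as the patched util_is_approx(), written against the wrapper. */
int
is_approx(float a, float b, float tol)
{
   assert(tol >= 0.0f);
   return util_fabsf(b - a) <= tol;
}
```

The fallback still promotes to double, but it confines the promotion to one macro instead of spreading it across call sites.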
Re: [Mesa-dev] [PATCH 04/13] i965: Use float calculations when double is unnecessary.
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> Literals without an f/F suffix are of type double, and implicit
> conversion rules specify that the float in (float op double) be
> converted to a double before the operation is performed. I believe float
> execution was intended (in nearly all cases) or is sufficient (in the
> case of gen7_urb.c).
>
> Removes a lot of float -> double conversion instructions and replaces
> many double instructions with float instructions which are cheaper.
>
>    text     data      bss      dec      hex  filename
> 4928659   195160    26192  5150011   4e953b  i965_dri.so before
> 4928315   195152    26192  5149659   4e93db  i965_dri.so after
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp       | 22 +++---
>  src/mesa/drivers/dri/i965/brw_fs.cpp               |  4 ++--
>  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c    |  4 ++--
>  src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c  |  4 ++--
>  src/mesa/drivers/dri/i965/brw_misc_state.c         |  4 ++--
>  src/mesa/drivers/dri/i965/brw_sampler_state.c      |  4 ++--
>  src/mesa/drivers/dri/i965/brw_sf_state.c           |  9 +
>  src/mesa/drivers/dri/i965/brw_state_cache.c        |  2 +-
>  src/mesa/drivers/dri/i965/brw_util.h               |  4 ++--
>  src/mesa/drivers/dri/i965/gen6_multisample_state.c |  4 ++--
>  src/mesa/drivers/dri/i965/gen6_sf_state.c          |  2 +-
>  src/mesa/drivers/dri/i965/gen7_sf_state.c          |  2 +-
>  src/mesa/drivers/dri/i965/gen7_urb.c               |  2 +-
>  src/mesa/drivers/dri/i965/gen8_sf_state.c          |  2 +-
>  14 files changed, 35 insertions(+), 34 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 1561b59..205c905 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -1285,8 +1285,8 @@ brw_blorp_blit_program::translate_dst_to_src()
>        /* Round the float coordinates down to nearest integer */
>        emit_rndd(Xp_f, X_f);
>        emit_rndd(Yp_f, Y_f);
> -      emit_mul(X_f, Xp_f, brw_imm_f(1 / key->x_scale));
> -      emit_mul(Y_f, Yp_f, brw_imm_f(1 / key->y_scale));
> +      emit_mul(X_f, Xp_f, brw_imm_f(1.0f / key->x_scale));
> +      emit_mul(Y_f, Yp_f, brw_imm_f(1.0f / key->y_scale));
>        SWAP_XY_AND_XPYP();
>     } else if (!key->bilinear_filter) {
>        /* Round the float coordinates down to nearest integer by moving to
> @@ -1442,7 +1442,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned num_samples)
>        for (int j = 0; j < 4; ++j) {
>           emit_mul(offset(texture_data[0], 2*j),
>                    offset(vec8(texture_data[0]), 2*j),
> -                  brw_imm_f(1.0/num_samples));
> +                  brw_imm_f(1.0f / num_samples));
>        }
>     }
>
> @@ -1475,9 +1475,9 @@ brw_blorp_blit_program::manual_blend_bilinear(unsigned num_samples)
>
>        /* Compute pixel coordinates */
>        emit_add(vec16(x_sample_coords), Xp_f,
> -               brw_imm_f((float)(i & 0x1) * (1.0 / key->x_scale)));
> +               brw_imm_f((float)(i & 0x1) * (1.0f / key->x_scale)));
>        emit_add(vec16(y_sample_coords), Yp_f,
> -               brw_imm_f((float)((i >> 1) & 0x1) * (1.0 / key->y_scale)));
> +               brw_imm_f((float)((i >> 1) & 0x1) * (1.0f / key->y_scale)));
>        emit_mov(vec16(X), x_sample_coords);
>        emit_mov(vec16(Y), y_sample_coords);
>
> @@ -1789,7 +1789,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, GLfloat src1,
>         * so 0.5 provides the necessary correction.
>         */
>        multiplier = scale;
> -      offset = src0 + (-dst0 + 0.5) * scale;
> +      offset = src0 + (-dst0 + 0.5f) * scale;
>     } else {
>        /* When mirroring X we need:
>         *   src_x - src_x0 = dst_x1 - dst_x - 0.5
> @@ -1797,7 +1797,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, GLfloat src1,
>         *   src_x = src_x0 + (dst_x1 -dst_x - 0.5) * scale
>         */
>        multiplier = -scale;
> -      offset = src0 + (dst1 - 0.5) * scale;
> +      offset = src0 + (dst1 - 0.5f) * scale;
>     }
>  }
>
> @@ -1952,8 +1952,8 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct brw_context *brw,
>     /* Scaling factors used for bilinear filtering in multisample scaled
>      * blits.
>      */
> -   wm_prog_key.x_scale = 2.0;
> -   wm_prog_key.y_scale = src_mt->num_samples / 2.0;
> +   wm_prog_key.x_scale = 2.0f;
> +   wm_prog_key.y_scale = src_mt->num_samples / 2.0f;
>
>     if (filter == GL_LINEAR && src.num_samples <= 1 && dst.num_samples <= 1)
>        wm_prog_key.bilinear_filter = true;
> @@ -2000,9 +2000,9 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct brw_context *brw,
>     x1 = wm_push_consts.dst_x1 = roundf(dst_x1);
>     y1 = wm_push_consts.dst_y1 = roundf(dst_y1);
>     wm_push_consts.rect_grid_x1 = (minify(src_mt->logical_width0, src_level) *
> -                                  wm_prog_key.x_scale - 1.0
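Editorial illustration, not part of the patch: the commit message for this patch states that an unsuffixed literal is a double and that the float operand of a mixed expression is converted to double first. The sketch below makes that visible through the type of the result, using sizeof, which does not evaluate its operand. All function names are made up for this example, and it assumes a platform where float and double differ in size.

```c
#include <assert.h>
#include <stddef.h>

/* x * 0.5 mixes float and double, so the whole expression is a double. */
size_t
size_of_scaled_double(float x)
{
   return sizeof(x * 0.5);
}

/* x * 0.5f is float times float, so the expression stays a float. */
size_t
size_of_scaled_float(float x)
{
   return sizeof(x * 0.5f);
}

/* The suffixed form that the patch series converts code to. */
float
half_of(float x)
{
   return x * 0.5f;
}

/* Sanity check of the two claims above. */
void
check_literal_promotion(float x)
{
   assert(size_of_scaled_double(x) == sizeof(double));
   assert(size_of_scaled_float(x) == sizeof(float));
}
```

GCC's -Wdouble-promotion option warns about the first form, which is one way to find such expressions in a large code base.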
Re: [Mesa-dev] [PATCH 08/13] vbo: Avoid double promotion.
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
> ---
>  src/mesa/vbo/vbo_context.c    | 6 +++---
>  src/mesa/vbo/vbo_exec_array.c | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
> index fd1ffe2..e3eb286 100644
> --- a/src/mesa/vbo/vbo_context.c
> +++ b/src/mesa/vbo/vbo_context.c
> @@ -37,9 +37,9 @@
>
>  static GLuint check_size( const GLfloat *attr )
>  {
> -   if (attr[3] != 1.0) return 4;
> -   if (attr[2] != 0.0) return 3;
> -   if (attr[1] != 0.0) return 2;
> +   if (attr[3] != 1.0F) return 4;
> +   if (attr[2] != 0.0F) return 3;
> +   if (attr[1] != 0.0F) return 2;
>     return 1;
>  }
>
> diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
> index 72b8206..b73aa97 100644
> --- a/src/mesa/vbo/vbo_exec_array.c
> +++ b/src/mesa/vbo/vbo_exec_array.c
> @@ -255,7 +255,7 @@ check_array_data(struct gl_context *ctx, struct gl_client_array *array,
>              GLint k;
>              for (k = 0; k < array->Size; k++) {
>                 if (IS_INF_OR_NAN(f[k]) ||
> -                   f[k] >= 1.0e20 || f[k] <= -1.0e10) {
> +                   f[k] >= 1.0e20F || f[k] <= -1.0e10F) {
>                    printf("Bad array data:\n");
>                    printf("  Element[%u].%u = %f\n", j, k, f[k]);
>                    printf("  Array %u at %p\n", attrib, (void* ) array);
> @@ -263,7 +263,7 @@ check_array_data(struct gl_context *ctx, struct gl_client_array *array,
>                           array->Type, array->Size, array->Stride);
>                    printf("  Address/offset %p in Buffer Object %u\n",
>                           array->Ptr, array->BufferObj->Name);
> -                  f[k] = 1.0; /* XXX replace the bad value! */
> +                  f[k] = 1.0F; /* XXX replace the bad value! */
>                 }
>                 /*assert(!IS_INF_OR_NAN(f[k]));*/
>              }
Re: [Mesa-dev] [PATCH 12/13] mesa/math: Avoid double promotion.
; r3[6] *= s; r3[7] *= s; @@ -490,26 +490,26 @@ static GLboolean invert_matrix_3d_general( GLmatrix *mat ) */ pos = neg = 0.0; t = MAT(in,0,0) * MAT(in,1,1) * MAT(in,2,2); - if (t = 0.0) pos += t; else neg += t; + if (t = 0.0F) pos += t; else neg += t; t = MAT(in,1,0) * MAT(in,2,1) * MAT(in,0,2); - if (t = 0.0) pos += t; else neg += t; + if (t = 0.0F) pos += t; else neg += t; t = MAT(in,2,0) * MAT(in,0,1) * MAT(in,1,2); - if (t = 0.0) pos += t; else neg += t; + if (t = 0.0F) pos += t; else neg += t; t = -MAT(in,2,0) * MAT(in,1,1) * MAT(in,0,2); - if (t = 0.0) pos += t; else neg += t; + if (t = 0.0F) pos += t; else neg += t; t = -MAT(in,1,0) * MAT(in,0,1) * MAT(in,2,2); - if (t = 0.0) pos += t; else neg += t; + if (t = 0.0F) pos += t; else neg += t; t = -MAT(in,0,0) * MAT(in,2,1) * MAT(in,1,2); - if (t = 0.0) pos += t; else neg += t; + if (t = 0.0F) pos += t; else neg += t; det = pos + neg; - if (fabsf(det) 1e-25) + if (fabsf(det) 1e-25F) return GL_FALSE; det = 1.0F / det; @@ -564,7 +564,7 @@ static GLboolean invert_matrix_3d( GLmatrix *mat ) MAT(in,0,1) * MAT(in,0,1) + MAT(in,0,2) * MAT(in,0,2)); - if (scale == 0.0) + if (scale == 0.0F) return GL_FALSE; scale = 1.0F / scale; @@ -799,8 +799,8 @@ _math_matrix_rotate( GLmatrix *mat, GLfloat m[16]; GLboolean optimized; - s = (GLfloat) sin( angle * M_PI / 180.0 ); - c = (GLfloat) cos( angle * M_PI / 180.0 ); + s = sinf( angle * M_PI / 180.0 ); 180.0F + c = cosf( angle * M_PI / 180.0 ); 180.0F I guess we will need wrappers for sinf and cosf. 
Reviewed-by: Iago Toral Quiroga ito...@igalia.com memcpy(m, Identity, sizeof(GLfloat)*16); optimized = GL_FALSE; @@ -859,7 +859,7 @@ _math_matrix_rotate( GLmatrix *mat, if (!optimized) { const GLfloat mag = sqrtf(x * x + y * y + z * z); - if (mag = 1.0e-4) { + if (mag = 1.0e-4F) { /* no rotation, leave mat as-is */ return; } @@ -1070,7 +1070,7 @@ _math_matrix_scale( GLmatrix *mat, GLfloat x, GLfloat y, GLfloat z ) m[2] *= x; m[6] *= y; m[10] *= z; m[3] *= x; m[7] *= y; m[11] *= z; - if (fabsf(x - y) 1e-8 fabsf(x - z) 1e-8) + if (fabsf(x - y) 1e-8F fabsf(x - z) 1e-8F) mat-flags |= MAT_FLAG_UNIFORM_SCALE; else mat-flags |= MAT_FLAG_GENERAL_SCALE; @@ -1206,7 +1206,7 @@ static void analyse_from_scratch( GLmatrix *mat ) GLuint i; for (i = 0 ; i 16 ; i++) { - if (m[i] == 0.0) mask |= (1i); + if (m[i] == 0.0F) mask |= (1i); } if (m[0] == 1.0F) mask |= (116); @@ -1240,12 +1240,12 @@ static void analyse_from_scratch( GLmatrix *mat ) mat-type = MATRIX_2D; /* Check for scale */ - if (SQ(mm-1) SQ(1e-6) || - SQ(m4m4-1) SQ(1e-6)) + if (SQ(mm-1) SQ(1e-6F) || + SQ(m4m4-1) SQ(1e-6F)) mat-flags |= MAT_FLAG_GENERAL_SCALE; /* Check for rotation */ - if (SQ(mm4) SQ(1e-6)) + if (SQ(mm4) SQ(1e-6F)) mat-flags |= MAT_FLAG_GENERAL_3D; else mat-flags |= MAT_FLAG_ROTATION; @@ -1255,9 +1255,9 @@ static void analyse_from_scratch( GLmatrix *mat ) mat-type = MATRIX_3D_NO_ROT; /* Check for scale */ - if (SQ(m[0]-m[5]) SQ(1e-6) - SQ(m[0]-m[10]) SQ(1e-6)) { - if (SQ(m[0]-1.0) SQ(1e-6)) { + if (SQ(m[0]-m[5]) SQ(1e-6F) + SQ(m[0]-m[10]) SQ(1e-6F)) { + if (SQ(m[0]-1.0F) SQ(1e-6F)) { mat-flags |= MAT_FLAG_UNIFORM_SCALE; } } @@ -1275,8 +1275,8 @@ static void analyse_from_scratch( GLmatrix *mat ) mat-type = MATRIX_3D; /* Check for scale */ - if (SQ(c1-c2) SQ(1e-6) SQ(c1-c3) SQ(1e-6)) { - if (SQ(c1-1.0) SQ(1e-6)) + if (SQ(c1-c2) SQ(1e-6F) SQ(c1-c3) SQ(1e-6F)) { + if (SQ(c1-1.0F) SQ(1e-6F)) mat-flags |= MAT_FLAG_UNIFORM_SCALE; /* else no scale at all */ } @@ -1285,10 +1285,10 @@ static void 
analyse_from_scratch( GLmatrix *mat ) } /* Check for rotation */ - if (SQ(d1) < SQ(1e-6)) { + if (SQ(d1) < SQ(1e-6F)) { CROSS3( cp, m, m+4 ); SUB_3V( cp, cp, (m+8) ); - if (LEN_SQUARED_3FV(cp) < SQ(1e-6)) + if (LEN_SQUARED_3FV(cp) < SQ(1e-6F)) mat->flags |= MAT_FLAG_ROTATION; else mat->flags |= MAT_FLAG_GENERAL_3D; diff --git a/src/mesa/math/m_norm_tmp.h b/src/mesa/math/m_norm_tmp.h index d3ec1c2..6f1db8d 100644 --- a/src/mesa/math/m_norm_tmp.h +++ b/src/mesa/math/m_norm_tmp.h @@ -80,7 +80,7 @@ TAG(transform_normalize_normals)( const GLmatrix *mat, } } else { - if (scale != 1.0) { + if (scale != 1.0f) { m0 *= scale, m4 *= scale, m8 *= scale; m1 *= scale, m5
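For readers following the double-promotion patches, here is a minimal sketch (not Mesa code) of why the literal suffix matters. In C an unsuffixed constant such as 1.0 has type double, so an expression that mixes it with a float is evaluated as a double; the F suffix keeps the whole expression in float. TYPE_TAG and the two helper functions are hypothetical names made up for this illustration, and the sketch assumes a C11 compiler for _Generic.

```c
/* Report the type of an expression at compile time (C11 _Generic).
 * The operand is only inspected for its type, it is not evaluated. */
#define TYPE_TAG(expr) _Generic((expr), float: 'f', double: 'd', default: '?')

/* float * double literal: the float operand is converted, result is double. */
static char tag_unsuffixed(float x)
{
   return TYPE_TAG(x * 1.0);
}

/* float * float literal: the expression stays in single precision. */
static char tag_suffixed(float x)
{
   return TYPE_TAG(x * 1.0F);
}
```

The same conversion happens in comparisons such as `scale == 0.0`, where the float operand is converted to double before the comparison is made.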
Re: [Mesa-dev] [PATCH 13/13] mesa: Avoid double promotion.
On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote: --- src/mesa/main/ffvertex_prog.c | 10 +- src/mesa/main/fog.c | 2 +- src/mesa/main/get.c | 2 +- src/mesa/main/light.c | 30 +++--- src/mesa/main/lines.c | 4 ++-- src/mesa/main/multisample.c | 4 ++-- src/mesa/main/pack.c | 14 +++--- src/mesa/main/pixel.c | 4 ++-- src/mesa/main/pixeltransfer.c | 8 src/mesa/main/points.c | 8 src/mesa/main/readpix.c | 4 ++-- src/mesa/main/samplerobj.c | 2 +- src/mesa/main/texparam.c | 2 +- src/mesa/swrast_setup/ss_tritmp.h | 4 ++-- 14 files changed, 49 insertions(+), 49 deletions(-) diff --git a/src/mesa/main/ffvertex_prog.c b/src/mesa/main/ffvertex_prog.c index 70adaf8..95b428d 100644 --- a/src/mesa/main/ffvertex_prog.c +++ b/src/mesa/main/ffvertex_prog.c @@ -189,15 +189,15 @@ static void make_state_key( struct gl_context *ctx, struct state_key *key ) if (light->Enabled) { key->unit[i].light_enabled = 1; - if (light->EyePosition[3] == 0.0) + if (light->EyePosition[3] == 0.0F) key->unit[i].light_eyepos3_is_zero = 1; - if (light->SpotCutoff == 180.0) + if (light->SpotCutoff == 180.0F) key->unit[i].light_spotcutoff_is_180 = 1; - if (light->ConstantAttenuation != 1.0 || - light->LinearAttenuation != 0.0 || - light->QuadraticAttenuation != 0.0) + if (light->ConstantAttenuation != 1.0F || + light->LinearAttenuation != 0.0F || + light->QuadraticAttenuation != 0.0F) key->unit[i].light_attenuated = 1; } } diff --git a/src/mesa/main/fog.c b/src/mesa/main/fog.c index 3bce289..45f343d 100644 --- a/src/mesa/main/fog.c +++ b/src/mesa/main/fog.c @@ -115,7 +115,7 @@ _mesa_Fogfv( GLenum pname, const GLfloat *params ) ctx->Fog.Mode = m; break; case GL_FOG_DENSITY: - if (*params<0.0) { + if (*params<0.0F) { _mesa_error( ctx, GL_INVALID_VALUE, "glFog" ); return; } diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 3d6d639..785a9b5 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -626,7 +626,7 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu break; case GL_EDGE_FLAG: 
- v->value_bool = ctx->Current.Attrib[VERT_ATTRIB_EDGEFLAG][0] == 1.0; + v->value_bool = ctx->Current.Attrib[VERT_ATTRIB_EDGEFLAG][0] == 1.0F; break; case GL_READ_BUFFER: diff --git a/src/mesa/main/light.c b/src/mesa/main/light.c index 4021dbe..fe2ce8c 100644 --- a/src/mesa/main/light.c +++ b/src/mesa/main/light.c @@ -143,7 +143,7 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum pname, const GLfloat *pa COPY_3V(light->SpotDirection, params); break; case GL_SPOT_EXPONENT: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); assert(params[0] <= ctx->Const.MaxSpotExponent); if (light->SpotExponent == params[0]) return; @@ -151,12 +151,12 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum pname, const GLfloat *pa light->SpotExponent = params[0]; break; case GL_SPOT_CUTOFF: - assert(params[0] == 180.0 || (params[0] >= 0.0 && params[0] <= 90.0)); + assert(params[0] == 180.0F || (params[0] >= 0.0F && params[0] <= 90.0F)); if (light->SpotCutoff == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); light->SpotCutoff = params[0]; - light->_CosCutoff = (GLfloat) (cos(light->SpotCutoff * M_PI / 180.0)); + light->_CosCutoff = (cosf(light->SpotCutoff * M_PI / 180.0)); Same comment as in the previous patch: is there any gain here? 
Other than this: Reviewed-by: Iago Toral Quiroga ito...@igalia.com if (light->_CosCutoff < 0) light->_CosCutoff = 0; if (light->SpotCutoff != 180.0F) @@ -165,21 +165,21 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum pname, const GLfloat *pa light->_Flags &= ~LIGHT_SPOT; break; case GL_CONSTANT_ATTENUATION: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); if (light->ConstantAttenuation == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); light->ConstantAttenuation = params[0]; break; case GL_LINEAR_ATTENUATION: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); if (light->LinearAttenuation == params[0]) return; FLUSH_VERTICES(ctx, _NEW_LIGHT); light->LinearAttenuation = params[0]; break; case GL_QUADRATIC_ATTENUATION: - assert(params[0] >= 0.0); + assert(params[0] >= 0.0F); if (light->QuadraticAttenuation == params[0]) return
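On the reviewer's question about whether switching to cosf() gains anything by itself, a small sketch may help. It is an illustration only, not Mesa code: PI_D, PI_F, TYPE_TAG and the helper names are hypothetical, and PI_D merely stands in for the usual M_PI definition, which is an unsuffixed (double) constant. The point is that the argument expression is still computed in double unless every constant in it is a float.

```c
/* Report the type of an expression at compile time (C11 _Generic). */
#define TYPE_TAG(expr) _Generic((expr), float: 'f', double: 'd', default: '?')

#define PI_D 3.14159265358979323846   /* double, like the usual M_PI */
#define PI_F 3.14159265358979323846F  /* float variant (hypothetical) */

/* Argument as written in the patch: float * double / double is a double,
 * so it would be narrowed only when it is passed to cosf(). */
static char tag_arg_double(float deg)
{
   return TYPE_TAG(deg * PI_D / 180.0);
}

/* Argument built from float constants only: no promotion to double. */
static char tag_arg_float(float deg)
{
   return TYPE_TAG(deg * (PI_F / 180.0F));
}

/* Degrees to radians entirely in single precision. */
static float deg_to_rad_f(float deg)
{
   return deg * (PI_F / 180.0F);
}
```

A wrapper along the lines of deg_to_rad_f() is one way to keep the whole computation in float before calling sinf() or cosf(); whether that matters in practice is for the patch author to measure.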
Re: [Mesa-dev] [PATCH 12/13] mesa/math: Avoid double promotion.
On Tue, 2015-07-14 at 13:19 +0200, Iago Toral wrote: On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote: --- src/mesa/math/m_clip_tmp.h | 20 ++--- src/mesa/math/m_matrix.c | 70 +++--- src/mesa/math/m_norm_tmp.h | 2 +- 3 files changed, 46 insertions(+), 46 deletions(-) diff --git a/src/mesa/math/m_clip_tmp.h b/src/mesa/math/m_clip_tmp.h index e289be7..60c0004 100644 --- a/src/mesa/math/m_clip_tmp.h +++ b/src/mesa/math/m_clip_tmp.h @@ -194,13 +194,13 @@ static GLvector4f * TAG(cliptest_points3)( GLvector4f *clip_vec, STRIDE_LOOP { const GLfloat cx = from[0], cy = from[1], cz = from[2]; GLubyte mask = 0; - if (cx > 1.0) mask |= CLIP_RIGHT_BIT; - else if (cx < -1.0) mask |= CLIP_LEFT_BIT; - if (cy > 1.0) mask |= CLIP_TOP_BIT; - else if (cy < -1.0) mask |= CLIP_BOTTOM_BIT; + if (cx > 1.0F) mask |= CLIP_RIGHT_BIT; + else if (cx < -1.0F) mask |= CLIP_LEFT_BIT; + if (cy > 1.0F) mask |= CLIP_TOP_BIT; + else if (cy < -1.0F) mask |= CLIP_BOTTOM_BIT; if (viewport_z_clip) { -if (cz > 1.0) mask |= CLIP_FAR_BIT; -else if (cz < -1.0) mask |= CLIP_NEAR_BIT; +if (cz > 1.0F) mask |= CLIP_FAR_BIT; +else if (cz < -1.0F) mask |= CLIP_NEAR_BIT; } clipMask[i] = mask; tmpOrMask |= mask; @@ -230,10 +230,10 @@ static GLvector4f * TAG(cliptest_points2)( GLvector4f *clip_vec, STRIDE_LOOP { const GLfloat cx = from[0], cy = from[1]; GLubyte mask = 0; - if (cx > 1.0) mask |= CLIP_RIGHT_BIT; - else if (cx < -1.0) mask |= CLIP_LEFT_BIT; - if (cy > 1.0) mask |= CLIP_TOP_BIT; - else if (cy < -1.0) mask |= CLIP_BOTTOM_BIT; + if (cx > 1.0F) mask |= CLIP_RIGHT_BIT; + else if (cx < -1.0F) mask |= CLIP_LEFT_BIT; + if (cy > 1.0F) mask |= CLIP_TOP_BIT; + else if (cy < -1.0F) mask |= CLIP_BOTTOM_BIT; clipMask[i] = mask; tmpOrMask |= mask; tmpAndMask &= mask; diff --git a/src/mesa/math/m_matrix.c b/src/mesa/math/m_matrix.c index 6a42c6c..6522200 100644 --- a/src/mesa/math/m_matrix.c +++ b/src/mesa/math/m_matrix.c @@ -380,7 +380,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat ) if (fabsf(r3[0])>fabsf(r2[0])) SWAP_ROWS(r3, r2); if 
(fabsf(r2[0])>fabsf(r1[0])) SWAP_ROWS(r2, r1); if (fabsf(r1[0])>fabsf(r0[0])) SWAP_ROWS(r1, r0); - if (0.0 == r0[0]) return GL_FALSE; + if (0.0F == r0[0]) return GL_FALSE; /* eliminate first variable */ m1 = r1[0]/r0[0]; m2 = r2[0]/r0[0]; m3 = r3[0]/r0[0]; @@ -388,31 +388,31 @@ static GLboolean invert_matrix_general( GLmatrix *mat ) s = r0[2]; r1[2] -= m1 * s; r2[2] -= m2 * s; r3[2] -= m3 * s; s = r0[3]; r1[3] -= m1 * s; r2[3] -= m2 * s; r3[3] -= m3 * s; s = r0[4]; - if (s != 0.0) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; } + if (s != 0.0F) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; } s = r0[5]; - if (s != 0.0) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; } + if (s != 0.0F) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; } s = r0[6]; - if (s != 0.0) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; } + if (s != 0.0F) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; } s = r0[7]; - if (s != 0.0) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; } + if (s != 0.0F) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; } /* choose pivot - or die */ if (fabsf(r3[1])>fabsf(r2[1])) SWAP_ROWS(r3, r2); if (fabsf(r2[1])>fabsf(r1[1])) SWAP_ROWS(r2, r1); - if (0.0 == r1[1]) return GL_FALSE; + if (0.0F == r1[1]) return GL_FALSE; /* eliminate second variable */ m2 = r2[1]/r1[1]; m3 = r3[1]/r1[1]; r2[2] -= m2 * r1[2]; r3[2] -= m3 * r1[2]; r2[3] -= m2 * r1[3]; r3[3] -= m3 * r1[3]; - s = r1[4]; if (0.0 != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; } - s = r1[5]; if (0.0 != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; } - s = r1[6]; if (0.0 != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; } - s = r1[7]; if (0.0 != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; } + s = r1[4]; if (0.0F != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; } + s = r1[5]; if (0.0F != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; } + s = r1[6]; if (0.0F != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; } + s = r1[7]; if (0.0F != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; } /* choose pivot - or die */ if 
(fabsf(r3[2])>fabsf(r2[2])) SWAP_ROWS(r3, r2); - if (0.0 == r2[2]) return GL_FALSE; + if (0.0F == r2[2]) return GL_FALSE; /* eliminate third variable */ m3 = r3[2]/r2[2]; @@ -421,7 +421,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat ) r3[7] -= m3 * r2[7]; /* last check */ - if (0.0 == r3[3
Re: [Mesa-dev] Register spilling issues in the NIR-vec4 backend
On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote: On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral ito...@igalia.com wrote: Hi, when we sent the patches for the new nir-vec4 backend we mentioned that we had a few dEQP tests that would fail to link because of register spilling. Now that we have added GS support we see a few instances of this problem popping up in a few GS piglit tests too, for example this one: tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test I have been looking into what is going on with these tests and I came to the conclusion that the problem is a consequence of various factors, but probably the main thing contributing to it is the way our SSA pass works. That said, I am not that experienced with NIR, so it could also be that my analysis is missing something and I am just arriving at wrong conclusions, so I'll explain my thoughts below and hopefully someone else with more NIR experience can jump in and confirm or reject my analysis. The GS code in that test looks like this: for (int p = 0; p < 3; p++) { color = ((index >= ins[p].m1.length() ? ins[p].m2[index-ins[p].m1.length()] : ins[p].m1[index]) == expect) ? vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0); gl_Position = gl_in[p].gl_Position; EmitVertex(); } One thing that is immediately contributing to the register pressure is some really awful code generated because of the indirect array indexing on the inputs inside the loop. This is because of the lower_variable_index_to_cond_assign lowering pass called from brw_shader.cpp. This pass will convert that color assignment into a bunch of nested if/else statements which makes the generated GLSL IR code rather large, involving plenty of temporaries too. This is only made worse by the fact that loop unrolling will replicate that 3 times. The result is a huge pile of GLSL IR with a few dozen nested if/else statements and temporaries that looks like [1] (that is only a fragment of the GLSL IR). 
One thing that is particularly relevant in that code is that it has multiple conditional assignments to the same variable (dereference_array_value) as a consequence of this lowering pass. That much, however, is common to the NIR and non-NIR paths. The problem in the NIR case is that all these assignments generate new SSA values, which then become new registers in the final NIR form. This leads to NIR code like [2]. In contrast, the old vec4 visitor path is able to have writes to the same variable write to the same register. As a result, if I print the code right before register allocation in the NIR path [3] and I compare that to what we get with the old vec4 visitor path at that same point [4], it is clearly visible that this difference is allowing the vec4 visitor path to reduce register pressure (see how in [4] we have multiple writes to vgrf5, while in [3] we always write to a new vgrf every time). So, am I missing something or is this kind of result expected with NIR programs? Is there anything in the nir-vec4 pass that we can do to fix this or does this need to be fixed when going out of SSA mode inside NIR? Iago [1] http://pastebin.com/5uA8ex2S [2] http://pastebin.com/pqLfvAVN [3] http://pastebin.com/64nSuUH8 [4] http://pastebin.com/WCrdYxzt ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Hi Iago, Indeed, NIR does convert conditional writes to conditional selects -- it's a required part of the conversion to SSA, and since our HW has a conditional select instruction that's just as fast as doing a conditional move, we haven't bothered much to try and change it back during out-of-SSA. However, doing this shouldn't make things worse. 
In your example, vgrf9, vgrf15, and vgrf17 all have very short live intervals and don't interfere with vgrf11 (unless there's another use of them somewhere after the snippet you pasted), which means that the register allocator is free to allocate the destinations of all the selects to the same register. What's happening, though, is that you're running into our terrible liveness analysis. After doing the proper liveness analysis, we figure out the place each register first becomes live and last becomes dead, and then we consider registers that have overlapping ranges to interfere. So we consider vgrf11 to interfere with vgrf15 and vgrf17, even though it really doesn't. The trouble with making it do the right thing is that we may actually need to extend the live ranges of registers when the exec masks don't match up, either because one uses writemask_all or because they have incompatible exec masks due to containing different datatypes (half-float vs. float, etc.). For example, in your snippet
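The interval-based interference test described above can be sketched as follows. This is an illustration only, not the i965 register allocator: the struct, the function name and the instruction numbers are made up, and it assumes each register is summarized by a single inclusive [start, end] range running from where it first becomes live to where it last becomes dead.

```c
#include <stdbool.h>

/* One interval per virtual register (hypothetical layout): the first
 * instruction where it becomes live and the last where it is still live. */
struct live_range {
   int start;
   int end;
};

/* Two registers are treated as interfering if their intervals overlap
 * at all, even when one of them is actually dead (has a gap in its
 * liveness) for the whole lifetime of the other. */
static bool
ranges_interfere(struct live_range a, struct live_range b)
{
   return a.start <= b.end && b.start <= a.end;
}
```

With made-up numbers, a long-lived register summarized as [0, 20] interferes with short-lived ones at [5, 7] and [10, 12] under this test, while the two short-lived registers do not interfere with each other and could share a physical register.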
Re: [Mesa-dev] [PATCH v2 02/14] meta: Fix transfer operations check in meta pbo path for readpixels
On Fri, 2015-07-17 at 10:36 -0700, Anuj Phogat wrote: Iago, Jason: Patches 2 and 5 in this series depend on patches 1 and 4 respectively. Since you guys reviewed 2 and 5, would you also like to review 1, 4 and/or other patches in this series? Thanks -Anuj Sure, I'll have a look at the patches. Iago On Sun, Jun 28, 2015 at 11:29 PM, Iago Toral ito...@igalia.com wrote: Reviewed-by: Iago Toral Quiroga ito...@igalia.com On Fri, 2015-06-26 at 13:15 -0700, Anuj Phogat wrote: Currently used ctx->_ImageTransferState check is not sufficient because it doesn't include the read color clamping enabled with GL_CLAMP_READ_COLOR. So, use the helper function _mesa_get_readpixels_transfer_ops(). Also, transfer operations don't affect glGetTexImage(). So, do the check only for glReadPixels. Without this patch, arb_color_buffer_float-readpixels test fails, when forced to use meta pbo path. V2: Add a comment and bump up the commit message. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org Cc: Iago Toral ito...@igalia.com Cc: Jason Ekstrand jason.ekstr...@intel.com --- src/mesa/drivers/common/meta_tex_subimage.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index d2474f5..90d78e5 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -273,12 +273,17 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, format == GL_COLOR_INDEX) return false; - if (ctx->_ImageTransferState) - return false; - - + /* Don't use meta path for readpixels in below conditions. */ if (!tex_image) { rb = ctx->ReadBuffer->_ColorReadBuffer; + + /* _mesa_get_readpixels_transfer_ops() includes the cases of read + * color clamping along with the ctx->_ImageTransferState. 
+ */ + if (_mesa_get_readpixels_transfer_ops(ctx, rb->Format, format, +type, GL_FALSE)) + return false; + if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format)) return false; }
Re: [Mesa-dev] [PATCH 04/14] mesa: Add a mesa utility function _mesa_need_signed_unsigned_int_conversion()
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: This utility function is used in a later patch. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/main/readpix.c | 32 ++-- src/mesa/main/readpix.h | 4 2 files changed, 22 insertions(+), 14 deletions(-) diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c index a9416ef..1038983 100644 --- a/src/mesa/main/readpix.c +++ b/src/mesa/main/readpix.c @@ -114,6 +114,22 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context *ctx, return transferOps; } +bool +_mesa_need_signed_unsigned_int_conversion(mesa_format rbFormat, + GLenum format, GLenum type) +{ + const GLenum srcType = _mesa_get_format_datatype(rbFormat); + return (srcType == GL_INT && + _mesa_is_enum_format_integer(format) && + (type == GL_UNSIGNED_INT || + type == GL_UNSIGNED_SHORT || + type == GL_UNSIGNED_BYTE)) || + (srcType == GL_UNSIGNED_INT && + _mesa_is_enum_format_integer(format) && + (type == GL_INT || + type == GL_SHORT || + type == GL_BYTE)); +} I think it is better if you assign the result of _mesa_is_enum_format_integer(format) to a temporary instead of calling it twice in the condition, just like you do with srcType. /** * Return true if memcpy cannot be used for ReadPixels. @@ -130,7 +146,6 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, { struct gl_renderbuffer *rb = _mesa_get_read_renderbuffer_for_format(ctx, format); - GLenum srcType; assert(rb); @@ -157,20 +172,9 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, /* Conversion between signed and unsigned integers needs masking * (it isn't just memcpy). 
*/ - srcType = _mesa_get_format_datatype(rb->Format); - - if ((srcType == GL_INT && - _mesa_is_enum_format_integer(format) && - (type == GL_UNSIGNED_INT || -type == GL_UNSIGNED_SHORT || -type == GL_UNSIGNED_BYTE)) || - (srcType == GL_UNSIGNED_INT && - _mesa_is_enum_format_integer(format) && - (type == GL_INT || -type == GL_SHORT || -type == GL_BYTE))) { + if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format, + type)) return GL_TRUE; - } You need to rebase your patch, this code does not exist any more. I moved it to Gallium where I am not sure that your change is what they want. You should probably just skip this part. With these changes, Reviewed-by: Iago Toral Quiroga ito...@igalia.com /* And finally, see if there are any transfer ops. */ return _mesa_get_readpixels_transfer_ops(ctx, rb->Format, format, type, diff --git a/src/mesa/main/readpix.h b/src/mesa/main/readpix.h index f894036..a93e263 100644 --- a/src/mesa/main/readpix.h +++ b/src/mesa/main/readpix.h @@ -46,6 +46,10 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context *ctx, GLenum format, GLenum type, GLboolean uses_blit); +extern bool +_mesa_need_signed_unsigned_int_conversion(mesa_format rbFormat, + GLenum format, GLenum type); + extern void _mesa_readpixels(struct gl_context *ctx, GLint x, GLint y, GLsizei width, GLsizei height,
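The review suggestion about using a temporary could look roughly like the sketch below. It is not the Mesa function: the enums and is_integer_format() are hypothetical stand-ins for GLenum values and _mesa_is_enum_format_integer() so that the example compiles on its own, and only the shape of the condition is meant to carry over.

```c
#include <stdbool.h>

/* Stand-ins for the GL data types and formats used by the real code. */
enum dtype { DT_INT, DT_UINT, DT_SHORT, DT_USHORT, DT_BYTE, DT_UBYTE, DT_FLOAT };
enum fmt { FMT_RGBA, FMT_RGBA_INTEGER };

/* Hypothetical replacement for _mesa_is_enum_format_integer(). */
static bool
is_integer_format(enum fmt format)
{
   return format == FMT_RGBA_INTEGER;
}

static bool
need_signed_unsigned_int_conversion(enum dtype src_type, enum fmt format,
                                    enum dtype type)
{
   /* Evaluate the format check once and reuse the result. */
   const bool is_integer = is_integer_format(format);
   const bool dst_unsigned =
      type == DT_UINT || type == DT_USHORT || type == DT_UBYTE;
   const bool dst_signed =
      type == DT_INT || type == DT_SHORT || type == DT_BYTE;

   return is_integer &&
          ((src_type == DT_INT && dst_unsigned) ||
           (src_type == DT_UINT && dst_signed));
}
```

Factoring out the common is_integer term gives the same truth table as the two-branch condition in the patch while calling the helper only once.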
Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion
On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote: On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote: Without this patch, piglit test fbo_integer_readpixels_sint_uint fails when forced to use the meta pbo path. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 00364f8..84cbc50 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format)) return false; + + if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format, type)) + return false; Hrm... This seems fishy. Isn't glBlitFramebuffer supposed to handle format conversion with integers? If so we should probably fix it rather than just skip it for the meta pbo path. As discussed offline, here is the relevant text for glBlitFramebuffer() from OpenGL 4.5 spec, section 18.3.1: An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: - The read buffer contains fixed-point or floating-point values and any draw buffer contains neither fixed-point nor floating-point values. - The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. - The read buffer contains signed integer values and any draw buffer does not contain signed integer values. I'll add a comment here explaining the reason to avoid meta path. Is this code going to run only for glBlitFramebuffer? I see this function being called from code paths that implement glReadPixels and glGetTexImage too. 
Iago } /* For arrays, use a tall (height * depth) 2D texture but taking into -- 1.9.3
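The three conditions quoted from the spec in this thread can be condensed into a small check. The sketch below is only an illustration of that quoted text, not Mesa's implementation: buf_class and blit_conversion_supported() are hypothetical names, and each buffer is assumed to fall into exactly one of the three classes.

```c
#include <stdbool.h>
#include <stddef.h>

/* The three kinds of buffer contents the quoted spec text distinguishes. */
enum buf_class {
   BUF_FIXED_OR_FLOAT,
   BUF_UNSIGNED_INT,
   BUF_SIGNED_INT
};

/* Per the quoted conditions, the conversion is unsupported as soon as any
 * draw buffer is of a different class than the read buffer. */
static bool
blit_conversion_supported(enum buf_class read,
                          const enum buf_class *draw, size_t num_draw)
{
   for (size_t i = 0; i < num_draw; i++) {
      if (draw[i] != read)
         return false;
   }
   return true;
}
```

A signed-integer read buffer with an unsigned-integer draw buffer is rejected by this check, which is the case the patch avoids by not taking the meta pbo path.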
Re: [Mesa-dev] [PATCH 01/14] mesa: Turn get_readpixels_transfer_ops() in to a global function
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: This utility function is utilized in a later patch. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- Jenkins showed no piglit regressions with this series. src/mesa/main/readpix.c | 14 -- src/mesa/main/readpix.h | 6 ++ 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c index a3357cd..caa2648 100644 --- a/src/mesa/main/readpix.c +++ b/src/mesa/main/readpix.c @@ -64,9 +64,11 @@ _mesa_need_rgb_to_luminance_conversion(mesa_format texFormat, GLenum format) /** * Return transfer op flags for this ReadPixels operation. */ -static GLbitfield -get_readpixels_transfer_ops(const struct gl_context *ctx, mesa_format texFormat, -GLenum format, GLenum type, GLboolean uses_blit) +GLbitfield +_mesa_get_readpixels_transfer_ops(const struct gl_context *ctx, + mesa_format texFormat, + GLenum format, GLenum type, + GLboolean uses_blit) { GLbitfield transferOps = ctx->_ImageTransferState; @@ -169,7 +171,7 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, } /* And finally, see if there are any transfer ops. */ - return get_readpixels_transfer_ops(ctx, rb->Format, format, type, + return _mesa_get_readpixels_transfer_ops(ctx, rb->Format, format, type, uses_blit) != 0; With the change, indentation for the second line needs to be fixed too. 
Other than that, Reviewed-by: Iago Toral Quiroga ito...@igalia.com } return GL_FALSE; @@ -436,8 +438,8 @@ read_rgba_pixels( struct gl_context *ctx, if (!rb) return; - transferOps = get_readpixels_transfer_ops(ctx, rb->Format, format, type, - GL_FALSE); + transferOps = _mesa_get_readpixels_transfer_ops(ctx, rb->Format, format, + type, GL_FALSE); /* Describe the dst format */ dst_is_integer = _mesa_is_enum_format_integer(format); dst_stride = _mesa_image_row_stride(packing, width, format, type); diff --git a/src/mesa/main/readpix.h b/src/mesa/main/readpix.h index 1636dd9..f894036 100644 --- a/src/mesa/main/readpix.h +++ b/src/mesa/main/readpix.h @@ -40,6 +40,12 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, extern GLboolean _mesa_need_rgb_to_luminance_conversion(mesa_format texFormat, GLenum format); +extern GLbitfield +_mesa_get_readpixels_transfer_ops(const struct gl_context *ctx, + mesa_format texFormat, + GLenum format, GLenum type, + GLboolean uses_blit); + extern void _mesa_readpixels(struct gl_context *ctx, GLint x, GLint y, GLsizei width, GLsizei height,
Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/main/readpix.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c index caa2648..a9416ef 100644 --- a/src/mesa/main/readpix.c +++ b/src/mesa/main/readpix.c @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct gl_context *ctx, GLenum format, srcType = _mesa_get_format_datatype(rb->Format); if ((srcType == GL_INT && + _mesa_is_enum_format_integer(format) && (type == GL_UNSIGNED_INT || type == GL_UNSIGNED_SHORT || type == GL_UNSIGNED_BYTE)) || (srcType == GL_UNSIGNED_INT && + _mesa_is_enum_format_integer(format) && (type == GL_INT || type == GL_SHORT || type == GL_BYTE))) { As I mentioned in patch 4, this code does not exist any more, so this patch should probably be dropped. Iago
Re: [Mesa-dev] Register spilling issues in the NIR-vec4 backend
Hi, On Thu, 2015-07-16 at 08:15 -0700, Jason Ekstrand wrote: On Jul 15, 2015 11:20 PM, Iago Toral ito...@igalia.com wrote: On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote: On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral ito...@igalia.com wrote:
Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion
On Mon, 2015-07-20 at 10:56 -0700, Anuj Phogat wrote: On Mon, Jul 20, 2015 at 5:10 AM, Iago Toral ito...@igalia.com wrote: On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote: On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote: Without this patch, piglit test fbo_integer_readpixels_sint_uint fails when forced to use the meta pbo path. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 00364f8..84cbc50 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format)) return false; + + if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format, type)) + return false; Hrm... This seems fishy. Isn't glBlitFramebuffer supposed to handle format conversion with integers? If so we should probably fix it rather than just skip it for the meta pbo path. As discussed offline, here is the relevant text for glBlitFramebuffer() from OpenGL 4.5 spec, section 18.3.1: An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: - The read buffer contains fixed-point or floating-point values and any draw buffer contains neither fixed-point nor floating-point values. - The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. - The read buffer contains signed integer values and any draw buffer does not contain signed integer values. I'll add a comment here explaining the reason to avoid meta path. Is this code going to run only for glBlitFramebuffer? 
I see this function being called from code paths that implement glReadPixels and glGetTexImage too. _mesa_meta_pbo_GetTexSubImage() is used only for glReadPixels and glGetTexImage. I quoted the glBlitFramebuffer restriction above because the function later uses _mesa_meta_BlitFramebuffer(), which doesn't support some format conversions. If this function can be used to resolve ReadPixels and GetTexImage but the checks you add are *specific* to BlitFramebuffer, it does not look like this is the right place for them. Shouldn't you put them inside _mesa_meta_BlitFramebuffer instead? Otherwise they would affect ReadPixels and GetTexImage too, and I don't see the same restrictions applying to ReadPixels, for example. Iago } /* For arrays, use a tall (height * depth) 2D texture but taking into -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/14] mesa: Add a helper function _mesa_unpack_format_to_base_format()
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/main/glformats.c | 44 src/mesa/main/glformats.h | 3 +++ 2 files changed, 47 insertions(+) diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c index ac69fab..cac243c 100644 --- a/src/mesa/main/glformats.c +++ b/src/mesa/main/glformats.c @@ -1278,6 +1278,50 @@ _mesa_is_compressed_format(const struct gl_context *ctx, GLenum format) } } +/** + * Convert various unpack formats to the corresponding base format. + */ +GLenum +_mesa_unpack_format_to_base_format(GLenum format) +{ + switch(format) { + case GL_RED_INTEGER: + return GL_RED; + case GL_GREEN_INTEGER: + return GL_GREEN; + case GL_BLUE_INTEGER: + return GL_BLUE; + case GL_RG_INTEGER: + return GL_RG; + case GL_RGB_INTEGER: + return GL_RGB; + case GL_RGBA_INTEGER: + return GL_RGBA; + case GL_BGR_INTEGER: + return GL_BGR; + case GL_BGRA_INTEGER: + return GL_BGRA; + case GL_ALPHA_INTEGER: + return GL_ALPHA; I would probably move the GL_ALPHA_INTEGER case together with the other single component formats, at first I thought you had missed it. Either way: Reviewed-by: Iago Toral Quiroga ito...@igalia.com + case GL_LUMINANCE_INTEGER_EXT: + return GL_LUMINANCE; + case GL_LUMINANCE_ALPHA_INTEGER_EXT: + return GL_LUMINANCE_ALPHA; + case GL_RED: + case GL_GREEN: + case GL_BLUE: + case GL_RG: + case GL_RGB: + case GL_RGBA: + case GL_BGR: + case GL_BGRA: + case GL_ALPHA: + case GL_LUMINANCE: + case GL_LUMINANCE_ALPHA: + default: + return format; + } +} Reviewed-by: Iago Toral Quiroga ito...@igalia.com /** * Convert various base formats to the cooresponding integer format. 
diff --git a/src/mesa/main/glformats.h b/src/mesa/main/glformats.h index 8881cb7..419955a 100644 --- a/src/mesa/main/glformats.h +++ b/src/mesa/main/glformats.h @@ -101,6 +101,9 @@ _mesa_is_compressed_format(const struct gl_context *ctx, GLenum format); extern GLenum _mesa_base_format_to_integer_format(GLenum format); +extern GLenum +_mesa_unpack_format_to_base_format(GLenum format); + extern GLboolean _mesa_base_format_has_channel(GLenum base_format, GLenum pname); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
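A minimal sketch of the helper with the single-component integer formats grouped together, as suggested in the review above. The `FMT_*` names are hypothetical stand-ins for the GL enums (their values are arbitrary, not the real GL values), and `unpack_format_to_base_format` here is an illustration, not the Mesa function itself.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GL format enums; values are arbitrary. */
enum fmt {
   FMT_RED, FMT_GREEN, FMT_BLUE, FMT_ALPHA,
   FMT_RG, FMT_RGB, FMT_RGBA, FMT_BGR, FMT_BGRA,
   FMT_LUMINANCE, FMT_LUMINANCE_ALPHA,
   FMT_RED_INTEGER, FMT_GREEN_INTEGER, FMT_BLUE_INTEGER, FMT_ALPHA_INTEGER,
   FMT_RG_INTEGER, FMT_RGB_INTEGER, FMT_RGBA_INTEGER,
   FMT_BGR_INTEGER, FMT_BGRA_INTEGER,
   FMT_LUMINANCE_INTEGER, FMT_LUMINANCE_ALPHA_INTEGER
};

/* Map an integer unpack format to its base format; anything else
 * (including formats that are already base formats) passes through. */
static enum fmt
unpack_format_to_base_format(enum fmt format)
{
   switch (format) {
   /* Single-component formats, with ALPHA kept next to R/G/B. */
   case FMT_RED_INTEGER:             return FMT_RED;
   case FMT_GREEN_INTEGER:           return FMT_GREEN;
   case FMT_BLUE_INTEGER:            return FMT_BLUE;
   case FMT_ALPHA_INTEGER:           return FMT_ALPHA;
   /* Multi-component formats. */
   case FMT_RG_INTEGER:              return FMT_RG;
   case FMT_RGB_INTEGER:             return FMT_RGB;
   case FMT_RGBA_INTEGER:            return FMT_RGBA;
   case FMT_BGR_INTEGER:             return FMT_BGR;
   case FMT_BGRA_INTEGER:            return FMT_BGRA;
   case FMT_LUMINANCE_INTEGER:       return FMT_LUMINANCE;
   case FMT_LUMINANCE_ALPHA_INTEGER: return FMT_LUMINANCE_ALPHA;
   default:                          return format;
   }
}
```

The explicit list of non-integer cases from the patch is folded into `default` here only to keep the sketch short; the behaviour is the same.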
Re: [Mesa-dev] [PATCH 07/14] mesa: Set green, blue channels to zero only for formats with these components
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/common/meta.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index 214a68a..fceb25d 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -3132,9 +3132,16 @@ decompress_texture_image(struct gl_context *ctx, * returned as red and two-channel texture values are returned as * red/alpha. */ - if ((baseTexFormat == GL_LUMINANCE || - baseTexFormat == GL_LUMINANCE_ALPHA || - baseTexFormat == GL_INTENSITY) || + if (((baseTexFormat == GL_LUMINANCE || +baseTexFormat == GL_LUMINANCE_ALPHA || +baseTexFormat == GL_INTENSITY) && + (destBaseFormat == GL_RGBA || +destBaseFormat == GL_RGB || +destBaseFormat == GL_RG || +destBaseFormat == GL_GREEN || +destBaseFormat == GL_BLUE || +destBaseFormat == GL_BGRA || +destBaseFormat == GL_BGR)) || Is this needed to achieve correct behavior or just an optimization? I would expect that if the dest format does not have G/B channels, setting pixel transfer options for these channels would not have any functional effect anyway. /* If we're reading back an RGB(A) texture (using glGetTexImage) as * luminance then we need to return L=tex(R). */
Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion
On Tue, 2015-07-21 at 08:13 +0200, Iago Toral wrote: On Mon, 2015-07-20 at 10:56 -0700, Anuj Phogat wrote: On Mon, Jul 20, 2015 at 5:10 AM, Iago Toral ito...@igalia.com wrote: On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote: On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net wrote: On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote: Without this patch, piglit test fbo_integer_readpixels_sint_uint fails, when forced to use the meta pbo path. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 00364f8..84cbc50 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format)) return false; + + if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format, type)) + return false; Hrm... This seems fishy. Isn't glBlitFramebuffers supposed to handle format conversion with integers? If so we should probably fix it rather than just skip it for the meta pbo path. As discussed offline, here is relevant text for glBlitFrameBuffer() from OpenGL 4.5 spec, section 18.3.1: An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: -The read buffer contains fixed-point or floating-point values and any draw buffer contains neither fixed-point nor floating-point values. -The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. - The read buffer contains signed integer values and any draw buffer does not contain signed integer values. I'll add a comment here explaining the reason to avoid meta path. 
Is this code going to run only for glBlitFramebuffer? I see this function being called from code paths that implement glReadPixels and glGetTexImage too. _mesa_meta_pbo_GetTexSubImage() is used only for glReadPixels and glGetTexImage. I quoted the glBlitFramebuffer restriction above because the function later uses _mesa_meta_BlitFramebuffer(), which doesn't support some format conversions. If this function can be used to resolve ReadPixels and GetTexImage but the checks you add are *specific* to BlitFramebuffer, it does not look like this is the right place for them. Shouldn't you put them inside _mesa_meta_BlitFramebuffer instead? Otherwise they would affect ReadPixels and GetTexImage too, and I don't see the same restrictions applying to ReadPixels, for example. Specifically for ReadPixels I only see this in the spec: An INVALID_OPERATION error is generated if format is an integer format and the color buffer is not an integer format, or if the color buffer is an integer format and format is not an integer format. So, unlike BlitFramebuffer, it seems that ReadPixels is fine as long as both formats are integer, regardless of whether the types have the same sign. Iago } /* For arrays, use a tall (height * depth) 2D texture but taking into -- 1.9.3
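The difference between the two rules quoted in this thread can be sketched as a pair of predicates. This is only an illustration of the quoted spec text, assuming a single draw buffer; `value_class`, `blit_conversion_supported` and `readpixels_format_allowed` are hypothetical names, not Mesa or GL API, and the real spec lists further error conditions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of what a buffer or format contains. */
enum value_class {
   CLASS_FIXED_OR_FLOAT,
   CLASS_SIGNED_INT,
   CLASS_UNSIGNED_INT
};

/* BlitFramebuffer, per the quoted conditions: fixed/float may only go to
 * fixed/float, unsigned only to unsigned, signed only to signed. With
 * these three classes that reduces to "the classes must match". */
static bool
blit_conversion_supported(enum value_class read_buf, enum value_class draw_buf)
{
   return read_buf == draw_buf;
}

/* ReadPixels, per the quoted condition: only integer-ness has to agree;
 * the signedness of the two sides is not part of the rule. */
static bool
readpixels_format_allowed(bool buffer_is_integer, bool format_is_integer)
{
   return buffer_is_integer == format_is_integer;
}
```

Under this reading, a signed-integer buffer read back with an unsigned-integer format is rejected by the blit rule but accepted by the ReadPixels rule, which is the case the patch is working around.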
Re: [Mesa-dev] [PATCH 09/14] meta: Use _mesa_unpack_format_to_base_format() to handle integer formats
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: _mesa_base_tex_format() doesn't handle GL_*_INTEGER formats. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com --- src/mesa/drivers/common/meta.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c index fceb25d..c9e58d8 100644 --- a/src/mesa/drivers/common/meta.c +++ b/src/mesa/drivers/common/meta.c @@ -3123,7 +3123,7 @@ decompress_texture_image(struct gl_context *ctx, /* read pixels from renderbuffer */ { GLenum baseTexFormat = texImage->_BaseFormat; - GLenum destBaseFormat = _mesa_base_tex_format(ctx, destFormat); + GLenum destBaseFormat = _mesa_unpack_format_to_base_format(destFormat); I think this commit is doing a bit more than what the shortlog suggests: it is replacing a call to _mesa_base_tex_format, which handles internal formats (so things like GL_COMPRESSED_ALPHA, GL_RGB_S3TC or GL_RG16_SNORM for example), with a call to the new _mesa_unpack_format_to_base_format, which does not care about internal formats at all. That change seems to be fine because decompress_texture_image claims to receive a destFormat that is ala glReadPixels, so there is no need to care about all those internal formats anyway, but I think you should explain this in the commit log. 
With that change, Reviewed-by: Iago Toral Quiroga ito...@igalia.com /* The pixel transfer state will be set to default values at this point * (see MESA_META_PIXEL_TRANSFER) so pixel transfer ops are effectively @@ -3149,9 +3149,7 @@ decompress_texture_image(struct gl_context *ctx, baseTexFormat == GL_RGB || baseTexFormat == GL_RG) && (destBaseFormat == GL_LUMINANCE || - destBaseFormat == GL_LUMINANCE_ALPHA || - destBaseFormat == GL_LUMINANCE_INTEGER_EXT || - destBaseFormat == GL_LUMINANCE_ALPHA_INTEGER_EXT))) { + destBaseFormat == GL_LUMINANCE_ALPHA))) { /* Green and blue must be zero */ _mesa_PixelTransferf(GL_GREEN_SCALE, 0.0f); _mesa_PixelTransferf(GL_BLUE_SCALE, 0.0f);
Re: [Mesa-dev] [PATCH 06/14] meta: Don't do fragment color clamping in case of ReadPixels
On Tue, 2015-07-21 at 09:24 +0200, Iago Toral wrote: On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: Without this patch, piglit test arb_color_buffer_float-readpixels fails, when forced to use the meta pbo path. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 84cbc50..ccb7dfb 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -25,6 +25,7 @@ *Jason Ekstrand jason.ekstr...@intel.com */ +#include "blend.h" #include "bufferobj.h" #include "buffers.h" #include "fbobject.h" @@ -305,6 +306,10 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER | MESA_META_PIXEL_STORE)); + /* GL_CLAMP_FRAGMENT_COLOR doesn't affect ReadPixels. */ + if (!tex_image && ctx->Extensions.ARB_color_buffer_float) + _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE); + I think you need to restore the original value of GL_CLAMP_FRAGMENT_COLOR after you are done with ReadPixels, otherwise it looks like you would be changing the setting for future API calls too. Oh, I just noticed that this setting is already being considered in the state flags passed to _mesa_meta_begin, so ignore that comment. Also, does this only apply to ReadPixels and not to GetTexImage? That looks odd to me. Iago _mesa_GenFramebuffers(2, fbos); if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
Re: [Mesa-dev] [PATCH 06/14] meta: Don't do fragment color clamping in case of ReadPixels
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote: Without this patch, piglit test arb_color_buffer_float-readpixels fails, when forced to use the meta pbo path. Signed-off-by: Anuj Phogat anuj.pho...@gmail.com Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/drivers/common/meta_tex_subimage.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/drivers/common/meta_tex_subimage.c b/src/mesa/drivers/common/meta_tex_subimage.c index 84cbc50..ccb7dfb 100644 --- a/src/mesa/drivers/common/meta_tex_subimage.c +++ b/src/mesa/drivers/common/meta_tex_subimage.c @@ -25,6 +25,7 @@ *Jason Ekstrand jason.ekstr...@intel.com */ +#include "blend.h" #include "bufferobj.h" #include "buffers.h" #include "fbobject.h" @@ -305,6 +306,10 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims, _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER | MESA_META_PIXEL_STORE)); + /* GL_CLAMP_FRAGMENT_COLOR doesn't affect ReadPixels. */ + if (!tex_image && ctx->Extensions.ARB_color_buffer_float) + _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE); + I think you need to restore the original value of GL_CLAMP_FRAGMENT_COLOR after you are done with ReadPixels, otherwise it looks like you would be changing the setting for future API calls too. Also, does this only apply to ReadPixels and not to GetTexImage? That looks odd to me. Iago _mesa_GenFramebuffers(2, fbos); if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
[Mesa-dev] Register spilling issues in the NIR-vec4 backend
Hi, when we sent the patches for the new nir-vec4 backend we mentioned that we had a few dEQP tests that would fail to link because of register spilling. Now that we have added GS support we see a few instances of this problem popping up in a few GS piglit tests too, for example this one: tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test I have been looking into what is going on with these tests and I came to the conclusion that the problem is a consequence of various factors, but probably the main thing contributing to it is the way our SSA pass works. That said, I am not that experienced with NIR, so it could also be that my analysis is missing something and I am just arriving at the wrong conclusions, so I'll explain my thoughts below and hopefully someone else with more NIR experience can jump in and confirm or reject my analysis. The GS code in that test looks like this: for (int p = 0; p < 3; p++) { color = ((index >= ins[p].m1.length() ? ins[p].m2[index-ins[p].m1.length()] : ins[p].m1[index]) == expect) ? vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0); gl_Position = gl_in[p].gl_Position; EmitVertex(); } One thing that is immediately contributing to the register pressure is some really awful code generated because of the indirect array indexing on the inputs inside the loop. This is because of the lower_variable_index_to_cond_assign lowering pass called from brw_shader.cpp. This pass will convert that color assignment into a bunch of nested if/else statements, which makes the generated GLSL IR code rather large, involving plenty of temporaries too. This is only made worse by the fact that loop unrolling will replicate that 3 times. The result is a huge pile of GLSL IR with a few dozen nested if/else statements and temporaries that looks like [1] (that is only a fragment of the GLSL IR). 
One thing that is particularly relevant in that code is that it has multiple conditional assignments to the same variable (dereference_array_value) as a consequence of this lowering pass. That much, however, is common to the NIR and non-NIR paths. The problem in the NIR case is that all these assignments generate new SSA values, which then become new registers in the final NIR form. This leads to NIR code like [2]. In contrast, the old vec4 visitor path is able to have writes to the same variable write to the same register. As a result, if I print the code right before register allocation in the NIR path [3] and I compare that to what we get with the old vec4 visitor path at that same point [4], it is clearly visible that this difference is allowing the vec4 visitor path to reduce register pressure (see how in [4] we have multiple writes to vgrf5, while in [3] we always write to a new vgrf every time). So, am I missing something or is this kind of result expected with NIR programs? Is there anything in the nir-vec4 pass that we can do to fix this or does this need to be fixed when going out of SSA mode inside NIR? Iago [1] http://pastebin.com/5uA8ex2S [2] http://pastebin.com/pqLfvAVN [3] http://pastebin.com/64nSuUH8 [4] http://pastebin.com/WCrdYxzt
Re: [Mesa-dev] [PATCH 1/5] i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().
I think Chris did not review this one: Reviewed-by: Iago Toral Quiroga ito...@igalia.com Iago On Fri, 2015-07-10 at 11:44 -0700, Matt Turner wrote: BEGIN_BATCH() and ADVANCE_BATCH() will contain do { and } while (0) respectively to allow declaring local variables used by intervening OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be in the same control flow. --- src/mesa/drivers/dri/i965/brw_draw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c index 69ad4d4..ec13473 100644 --- a/src/mesa/drivers/dri/i965/brw_draw.c +++ b/src/mesa/drivers/dri/i965/brw_draw.c @@ -261,17 +261,17 @@ static void brw_emit_prim(struct brw_context *brw, indirect_flag = 0; } + BEGIN_BATCH(brw->gen >= 7 ? 7 : 6); + if (brw->gen >= 7) { if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT) predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE; else predicate_enable = 0; - BEGIN_BATCH(7); OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | predicate_enable); OUT_BATCH(hw_prim | vertex_access_type); } else { - BEGIN_BATCH(6); OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) | hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT | vertex_access_type);
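A minimal sketch of why the two macros must sit in the same control flow once they carry `do {` and `} while (0)`. The `*_SKETCH` macros and `struct batch` are hypothetical and much simpler than the real i965 batch macros; they only illustrate the brace pairing and the local variables shared with the intervening `OUT_BATCH` calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified batch buffer. */
struct batch {
   uint32_t dwords[16];
   int used;
};

/* Opens a block and declares locals that OUT_BATCH_SKETCH relies on.
 * The block is closed only by ADVANCE_BATCH_SKETCH. */
#define BEGIN_BATCH_SKETCH(b, n) do {                              \
   uint32_t *batch_ptr_ = &(b)->dwords[(b)->used];                 \
   const int batch_len_ = (n)

#define OUT_BATCH_SKETCH(d) (*batch_ptr_++ = (d))

/* Checks that exactly the announced number of dwords was emitted,
 * then closes the block opened by BEGIN_BATCH_SKETCH. */
#define ADVANCE_BATCH_SKETCH(b)                                    \
   assert(batch_ptr_ - &(b)->dwords[(b)->used] == batch_len_);     \
   (b)->used += batch_len_;                                        \
} while (0)

/* Mirrors the shape of the patched code: one BEGIN before the branch,
 * one ADVANCE after it, with the branch in between. */
static int
emit_prim_sketch(struct batch *b, int gen)
{
   BEGIN_BATCH_SKETCH(b, gen >= 7 ? 7 : 6);
   if (gen >= 7) {
      OUT_BATCH_SKETCH(7u - 2u);   /* header dword */
      OUT_BATCH_SKETCH(0u);        /* extra dword on the gen7+ path */
   } else {
      OUT_BATCH_SKETCH(6u - 2u);   /* header dword */
   }
   for (int i = 0; i < 5; i++)
      OUT_BATCH_SKETCH(0u);        /* remaining payload dwords */
   ADVANCE_BATCH_SKETCH(b);
   return b->used;
}
```

With these definitions, putting `BEGIN_BATCH_SKETCH` inside the `if` and `else` arms while leaving `ADVANCE_BATCH_SKETCH` after them would not compile, because the `do {` would be opened in one block and closed in another.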
Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass
On Wed, 2015-10-21 at 14:58 +0300, Francisco Jerez wrote: > Iago Toral <ito...@igalia.com> writes: > > > On Wed, 2015-10-21 at 13:00 +0300, Francisco Jerez wrote: > >> Iago Toral <ito...@igalia.com> writes: > >> > >> > Hi Curro, > >> > > >> > On Tue, 2015-10-20 at 14:18 +0300, Francisco Jerez wrote: > >> >> Iago Toral <ito...@igalia.com> writes: > >> >> > >> >> > On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote: > >> >> >> Iago Toral Quiroga <ito...@igalia.com> writes: > >> >> >> > >> >> >> > This allows us to re-use the results of previous ssbo loads in > >> >> >> > situations > >> >> >> > that are safe (i.e. when there are no stores, atomic operations or > >> >> >> > memory barriers in between). > >> >> >> > > >> >> >> > This is particularly useful for things like matrix > >> >> >> > multiplications, where > >> >> >> > for a mat4 buffer variable we cut the number of loads from 16 (4 > >> >> >> > reads of > >> >> >> > each column) down to 4 (1 read of each column). > >> >> >> > > >> >> >> > The pass can only cache ssbo loads that involve constant blocks and > >> >> >> > offsets, but could be extended to compare sub-expressions for these > >> >> >> > as well, similar to a CSE pass. > >> >> >> > > >> >> >> > The way the cache works is simple: ssbo loads with constant > >> >> >> > block/offset > >> >> >> > are included in a cache as they are seen. Stores invalidate cache > >> >> >> > entries. > >> >> >> > Stores with non-constant offset invalidate all cached loads for > >> >> >> > the block > >> >> >> > and stores with non-constant block invalidate all cache entries. > >> >> >> > There is > >> >> >> > room to improve this by using the actual variable name we are > >> >> >> > accessing to > >> >> >> > limit the entries that should be invalidated. We also need to > >> >> >> > invalidate > >> >> >> > cache entries when we exit the block in which they have been > >> >> >> > defined > >> >> >> > (i.e. inside if/else blocks or loops). 
> >> >> >> > > >> >> >> > The cache optimization is built as a separate pass, instead of > >> >> >> > merging it > >> >> >> > inside the lower_ubo_reference pass for a number of reasons: > >> >> >> > > >> >> >> > 1) The way we process assignments in visitors is that the LHS is > >> >> >> > processed before the RHS. This creates a problem for an > >> >> >> > optimization > >> >> >> > such as this when we do things like a = a + 1, since we would see > >> >> >> > the > >> >> >> > store before the read when the actual execution order is reversed. > >> >> >> > This could be fixed by re-implementing the logic in the visit_enter > >> >> >> > method for ir_assignment in lower_ubo_reference and then returning > >> >> >> > visit_continue_with_parent. > >> >> >> > > >> >> >> > 2) Some writes/reads need to be split into multiple smaller > >> >> >> > writes/reads, and we need to handle caching for each one. This > >> >> >> > happens > >> >> >> > deep inside the code that handles the lowering and some > >> >> >> > of the information we need to do this is not available. This could > >> >> >> > also > >> >> >> > be fixed by passing more data into the corresponding functions or > >> >> >> > by > >> >> >> > making this data available as class members, but the current > >> >> >> > implementation > >> >> >> > is already complex enough and this would only contribute to the > >> >> >> > complexity. > >> >> >> > > >&
Re: [Mesa-dev] [PATCH] i965/fs: Disable CSE optimization for untyped & typed surface reads
On Wed, 2015-10-21 at 23:24 -0700, Jordan Justen wrote: > On 2015-10-20 00:43:13, Iago Toral wrote: > > On Tue, 2015-10-20 at 00:12 -0700, Jordan Justen wrote: > > > An untyped surface read is volatile because it might be affected by a > > > write. > > > > > > In the ES31-CTS.compute_shader.resources-max test, two back to back > > > read/modify/writes of an SSBO variable looked something like this: > > > > > > r1 = untyped_surface_read(ssbo_float) > > > r2 = r1 + 1 > > > untyped_surface_write(ssbo_float, r2) > > > r3 = untyped_surface_read(ssbo_float) > > > r4 = r3 + 1 > > > untyped_surface_write(ssbo_float, r4) > > > > > > And after CSE, we had: > > > > > > r1 = untyped_surface_read(ssbo_float) > > > r2 = r1 + 1 > > > untyped_surface_write(ssbo_float, r2) > > > r4 = r1 + 1 > > > untyped_surface_write(ssbo_float, r4) > > > > Yeah, we cannot do CSE with SSBO loads. Patch looks good to me, but we > > should do the same in the vec4 CSE pass. > > Yeah, I checked vec4 CSE. It looks like is_expression will > unconditionally return false for those opcodes. Oh right. > r-b? Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> FWIW, my ssbo load optimization pass is trying to "undo" this since it is all about doing CSE for ssbo loads that are safe to CSE, that is, when we know that we don't have stores/atomics that write to the same offset or memory barriers in between. I am trying to implement that in NIR though, so we still need this, to prevent i965 from trying to CSE the remaining loads it sees, since those would not be safe to CSE. Also, as I mentioned in another e-mail, the reason we did not notice this issue earlier is that there are a couple of problems in i965 that make it quite difficult for the CSE pass to identify identical SSBO loads at the moment, but that is bound to change as soon as those things eventually get fixed. 
Iago > -Jordan > > > > > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > > > --- > > > src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 3 ++- > > > src/mesa/drivers/dri/i965/brw_shader.cpp | 14 ++ > > > src/mesa/drivers/dri/i965/brw_shader.h | 6 ++ > > > 3 files changed, 22 insertions(+), 1 deletion(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > > > b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > > > index c7628dc..3a28c8d 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp > > > @@ -93,7 +93,8 @@ is_expression(const fs_visitor *v, const fs_inst *const > > > inst) > > > case SHADER_OPCODE_LOAD_PAYLOAD: > > >return !inst->is_copy_payload(v->alloc); > > > default: > > > - return inst->is_send_from_grf() && !inst->has_side_effects(); > > > + return inst->is_send_from_grf() && !inst->has_side_effects() && > > > + !inst->is_volatile(); > > > } > > > } > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp > > > b/src/mesa/drivers/dri/i965/brw_shader.cpp > > > index 2324b56..be911ed 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp > > > @@ -969,6 +969,20 @@ backend_instruction::has_side_effects() const > > > } > > > } > > > > > > +bool > > > +backend_instruction::is_volatile() const > > > +{ > > > + switch (opcode) { > > > + case SHADER_OPCODE_UNTYPED_SURFACE_READ: > > > + case SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL: > > > + case SHADER_OPCODE_TYPED_SURFACE_READ: > > > + case SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL: > > > + return true; > > > + default: > > > + return false; > > > + } > > > +} > > > + > > > #ifndef NDEBUG > > > static bool > > > inst_is_in_block(const bblock_t *block, const backend_instruction *inst) > > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.h > > > b/src/mesa/drivers/dri/i965/brw_shader.h > > > index b33b08f..35ee210 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_shader.h 
> > > +++ b/src/mesa/drivers/dri/i965/brw_shader.h > > > @@ -115,6 +115,12 @@ struct backend_instruction : public exec_node { > > > * optimize these out unless you know what you are doing. > > > */ > > > bool has_side_effects() const; > > > + > > > + /** > > > +* True if the instruction might be affected by side effects of other > > > +* instructions. > > > +*/ > > > + bool is_volatile() const; > > > #else > > > struct backend_instruction { > > > struct exec_node link; > > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
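The miscompile described in the commit message can be reproduced with plain C, modelling the SSBO variable as ordinary memory. This is only a sketch of the two instruction sequences quoted above; `run_without_cse` and `run_with_bad_cse` are hypothetical names, and the sketch has nothing to do with how the i965 backend represents instructions.

```c
#include <assert.h>

/* The original sequence: each read-modify-write re-reads memory. */
static float
run_without_cse(float *ssbo_float)
{
   float r1 = *ssbo_float;   /* r1 = untyped_surface_read  */
   float r2 = r1 + 1;
   *ssbo_float = r2;         /* untyped_surface_write(r2)  */
   float r3 = *ssbo_float;   /* r3 = untyped_surface_read  */
   float r4 = r3 + 1;
   *ssbo_float = r4;         /* untyped_surface_write(r4)  */
   return *ssbo_float;
}

/* The sequence after the unsafe CSE: the second read was replaced by
 * r1, so the intervening write is ignored. */
static float
run_with_bad_cse(float *ssbo_float)
{
   float r1 = *ssbo_float;
   float r2 = r1 + 1;
   *ssbo_float = r2;
   float r4 = r1 + 1;        /* stale value reused here */
   *ssbo_float = r4;
   return *ssbo_float;
}
```

Starting from 0, the first version leaves 2 in memory and the second leaves 1, which is why the read opcodes are reported as volatile and excluded from `is_expression()`.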
Re: [Mesa-dev] [PATCH] i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture().
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> On Wed, 2015-10-21 at 12:30 -0700, Matt Turner wrote: > We implement textureQueryLevels (which takes no arguments, save the > sampler) using the resinfo message (which takes an argument of LOD). > Without initializing it, we'd generate a MOV from the null register to > load the LOD argument. > > Essentially the same logic applies to texture. A vertex shader cannot > compute derivatives and so cannot produce an LOD, so TXL with an LOD of > 0.0 is used. > --- > src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > index c39f97e..b8f90f2 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > @@ -882,6 +882,18 @@ vec4_visitor::emit_texture(ir_texture_opcode op, > uint32_t sampler, > src_reg sampler_reg) > { > + /* The sampler can only meaningfully compute LOD for fragment shader > +* messages. For all other stages, we change the opcode to TXL and > hardcode > +* the LOD to 0. > +* > +* textureQueryLevels() is implemented in terms of TXS so we need to pass > a > +* valid LOD argument. > +*/ > + if (op == ir_tex || op == ir_query_levels) { > + assert(lod.file == BAD_FILE); > + lod = src_reg(0.0f); > + } > + > enum opcode opcode; > switch (op) { > case ir_tex: opcode = SHADER_OPCODE_TXL; break; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads
On Thu, 2015-10-22 at 09:39 -0400, Connor Abbott wrote: > On Thu, Oct 22, 2015 at 7:21 AM, Iago Toral Quiroga <ito...@igalia.com> wrote: > > I implemented this first as a separate optimization pass in GLSL IR [1], but > > Curro pointed out that this being pretty much a restricted form of a CSE > > pass > > it would probably make more sense to do it inside CSE (and we no longer have > > a CSE pass in GLSL IR). > > > > Unlike other things we CSE in NIR, in the case of SSBO loads we need to make > > sure that we invalidate previous entries in the set in the presence of > > conflicting instructions (i.e. SSBO writes to the same block and offset) or > > in the presence of memory barriers. > > > > If this is accepted I intend to extend this to also cover image reads, which > > follow similar behavior. > > > > No regressions observed in piglit or dEQP's SSBO functional tests. > > > > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html > > > > Iago Toral Quiroga (2): > > nir/cse: invalidate SSBO loads in presence of ssbo writes or memory > > barriers > > nir/instr_set: allow rewrite of SSBO loads > > > > src/glsl/nir/nir_instr_set.c | 24 ++-- > > src/glsl/nir/nir_opt_cse.c | 142 > > +++ > > 2 files changed, 162 insertions(+), 4 deletions(-) > > > > -- > > 1.9.1 > > > > ___ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > NAK, this isn't going to work. NIR CSE is designed for operations > which can be moved around freely as long they're still dominated by > the SSA values they use. It makes heavy advantage of this to avoid > looking at the entire CFG and instead only at the current block and > its parents in the dominance tree. 
For example, imagine you have > something like: > > A = load_ssbo 0 > if (cond) { >store_ssbo 0 > } > B = load_ssbo 0 > > Then A and B can't be combined, but CSE will combine them anyways when > it reaches B because it keeps a hash table of values dominating B and > finds A as a match. It doesn't look at the if conditional at all > because it doesn't dominate the load to B. This is great when you want > to CSE pure things that don't depend on other side effects -- after > all, this is the sort of efficiency that SSA is supposed to give us -- > but it means that as-is, it can't be used for e.g. SSBO's and images > without completely changing how the pass works and making it less > efficient. Ugh! One would think that at least one of the 2000+ SSBO tests in dEQP would catch something like this... I guess not :(. > Now, that being said, I still think that we should definitely be doing > this sort of thing in NIR now that we've finally added support for > SSBO's and images. We've been trying to avoid adding new optimizations > to GLSL, since we've been trying to move away from it. In addition, > with SPIR-V on the way, anything added to GLSL IR now is something > that we won't be able to use with SPIR-V shaders. Only doing it in FS > doesn't sound so great either; we should be doing as much as possible > at the midlevel, and combining SSBO loads is something that isn't > FS-specific at all. Yeah, agreed. > There are two ways I can see support for this being added to NIR: > > 1. Add an extra fake source/destination to intrinsics with side > effects, and add a pass to do essentially a conversion to SSA that > wires up these "token" sources/destinations, or perhaps extend the > existing to-SSA pass. > > 2. Add a special "load-combining" pass that does some dataflow > analysis or similar (or, for now, only looks at things within a single > block). 
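Option 2, restricted to a single basic block, could look roughly like the sketch below. It assumes a straight-line array of instructions with no control flow, which is what makes the simple invalidation rules safe; the `if`/`store_ssbo` case from the example above is exactly what it does not handle. All types and names here (`struct instr`, `combine_loads_in_block`, `MAX_CACHE`) are hypothetical and are not NIR API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_LOAD, OP_STORE, OP_BARRIER };

struct instr {
   enum op op;
   int block;     /* SSBO block index, or -1 if not a constant */
   int offset;    /* offset within the block, or -1 if not a constant */
   int reuse_of;  /* output: index of an earlier equivalent load, or -1 */
};

#define MAX_CACHE 32

/* Walk one basic block and mark loads that can reuse an earlier load.
 * Returns the number of loads that were combined. */
static int
combine_loads_in_block(struct instr *instrs, size_t count)
{
   struct { int block, offset, instr; } cache[MAX_CACHE];
   size_t n = 0;
   int combined = 0;

   for (size_t i = 0; i < count; i++) {
      struct instr *in = &instrs[i];
      in->reuse_of = -1;

      switch (in->op) {
      case OP_BARRIER:
         /* A memory barrier invalidates everything. */
         n = 0;
         break;

      case OP_STORE: {
         if (in->block < 0) {
            /* Unknown block: any cached load may be affected. */
            n = 0;
            break;
         }
         /* Drop entries in the same block that the store may alias. */
         size_t kept = 0;
         for (size_t j = 0; j < n; j++) {
            bool aliases = cache[j].block == in->block &&
                           (in->offset < 0 || cache[j].offset == in->offset);
            if (!aliases)
               cache[kept++] = cache[j];
         }
         n = kept;
         break;
      }

      case OP_LOAD: {
         if (in->block < 0 || in->offset < 0)
            break; /* only constant block/offset loads are cached */
         bool found = false;
         for (size_t j = 0; j < n && !found; j++) {
            if (cache[j].block == in->block &&
                cache[j].offset == in->offset) {
               in->reuse_of = cache[j].instr;
               combined++;
               found = true;
            }
         }
         if (!found && n < MAX_CACHE) {
            cache[n].block = in->block;
            cache[n].offset = in->offset;
            cache[n].instr = (int)i;
            n++;
         }
         break;
      }
      }
   }
   return combined;
}
```

Extending this beyond a single block is where the dataflow analysis mentioned above would come in, since the cache state at the start of a block would have to be the meet of the states at the end of all its predecessors.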
> > The advantage of #1 is that we get to use existing NIR passes, like > CSE, DCE, and GCM "for free" on SSBO loads and stores, without having > to do the equivalent thing using dataflow analysis. Also, doing store > forwarding (i.e. replacing the result of an SSBO load with the value > corresponding to a store, if we can figure out which store affects it) > is going to be much easier. However, #1 is going to be much more of a > research project. I've thought about how we could do it, but I'm still > not sure how it could be done feasibly and still be correct. Thanks for sharing these ideas. #1 looks like the best way to go in terms of benefits (although it looks rather artificial!); however, I am not sure that my understanding of NIR at this moment is good enough to pursue something like that. Also, I would really like to see some sort of support
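To make Connor's A/B example concrete, here is a minimal sketch in plain C. It is a toy model, not NIR: the SSBO is a single local variable, and the `combine_loads` flag (a hypothetical name) stands in for what a dominance-only CSE would do when it replaces B with A without looking at the conditional store.

```c
/* Toy model of the snippet from the review; none of these names are NIR API.
 *
 *    A = load_ssbo 0
 *    if (cond) { store_ssbo 0 }
 *    B = load_ssbo 0
 */
static unsigned
toy_load_b(unsigned initial, int cond, int combine_loads)
{
   unsigned ssbo = initial;   /* contents of the SSBO at offset 0 */
   unsigned a = ssbo;         /* A = load_ssbo 0 */

   if (cond)
      ssbo = a + 1;           /* store_ssbo 0 (any value different from A) */

   /* With combine_loads set, B is replaced by A, which is what a CSE that
    * only consults dominating instructions would produce.
    */
   return combine_loads ? a : ssbo;   /* B = load_ssbo 0 */
}
```

When `cond` is false both variants agree, which is presumably why functional tests can miss the problem; when `cond` is true the combined version returns the stale value.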
Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads
On Fri, 2015-10-23 at 09:26 -0700, Jason Ekstrand wrote: > On Thu, Oct 22, 2015 at 11:13 PM, Iago Toral <ito...@igalia.com> wrote: > > On Thu, 2015-10-22 at 09:09 -0700, Jason Ekstrand wrote: > >> On Thu, Oct 22, 2015 at 4:21 AM, Iago Toral Quiroga <ito...@igalia.com> > >> wrote: > >> > I implemented this first as a separate optimization pass in GLSL IR [1], > >> > but > >> > Curro pointed out that this being pretty much a restricted form of a CSE > >> > pass > >> > it would probably make more sense to do it inside CSE (and we no longer > >> > have > >> > a CSE pass in GLSL IR). > >> > > >> > Unlike other things we CSE in NIR, in the case of SSBO loads we need to > >> > make > >> > sure that we invalidate previous entries in the set in the presence of > >> > conflicting instructions (i.e. SSBO writes to the same block and offset) > >> > or > >> > in the presence of memory barriers. > >> > > >> > If this is accepted I intend to extend this to also cover image reads, > >> > which > >> > follow similar behavior. > >> > > >> > No regressions observed in piglit or dEQP's SSBO functional tests. > >> > > >> > [1] > >> > http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html > >> > >> I think you've gotten enough NAK's that I don't need to chime in > >> there. Unfortunately, solving this in general is something of a > >> research project that both Connor and I have been thinking about for > >> quite some time. I've been thinking off-and-on about how to add a > >> proper memory model to lower_vars_to_ssa for almost a year now and > >> still haven't come up with a good way to do it. I don't know whether > >> SSBO's would be simpler or not. We need a proper memory model for > >> both lower_vars_to_ssa and SSBO load/stores (and shared local > >> variables) but it's a substantial research project. > >> > >> This isn't to say that you couldn't do it. Just know what you're taking > >> on. 
;-) > > > > Yeah, it does not make sense that I try to do this, you guys have > > clearly given this much more thought than me and know much better how a > > solution for this would fit in NIR than me. > > > >> That said, here's a suggestion for something that we *could* write > >> today, wouldn't be very hard, and would solve a decent number of cases. > >> > >> For each block: > >> > >> 1) Create a new instruction set (don't use anything from any previous > >> blocks) > >> 2) call add_or_rewrite on all ssbo load operations > >> 3) If you ever see a barrier or ssbo store, destroy the entire > >> instruction set and start again. > > > > Yep, this is what I was thinking for the load-combine pass that Connor > > suggested. However, I think that in this case we do not need to destroy > > the entire set when we find a store, only for memory barriers, right? I > > mean, there should be nothing preventing us from checking the > > offset/block of the store and comparing it with the offset/block of the > > loads in the set to decide which ones we need to remove (like I was > > doing in my last patch) > > That's where you get into the "special casing" I mentioned below. If > you have a direct store, you would have to throw away any indirect > loads Yes. > and then insert a fake direct load for the given offset. Actually, what I am doing is a bit different: when I see stores, I also insert them in the hash table (but I never rewrite stores). Then, when I see a load, I check for a match. If I have one, I use it; if not, I check whether I have a store to the same offset, and if I do, I just use that, no need to fake anything. Of course, if I do this, I have to traverse the hash table looking for a compatible store, but I think that should be okay in this case, since the table only holds load/store operations, and only the ones in the current block. Does this seem like a reasonable alternative? 
> If you > have an indirect store, you would have to throw away everything and > then insert a fake indirect load for the given offset. So, yes, you > can do it, but it'll take a little more work. Yeah, mostly because there are also atomics to consider and then you also have to check that the stores write to all the components we read before we reuse them, etc. > You'll also probabl
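The three-step per-block scheme Jason describes above can be sketched roughly as follows. This is a toy model under stated assumptions, not the NIR implementation: `toy_instr`, `toy_combine_loads` and the fixed-size set are hypothetical stand-ins for NIR instructions and the instruction set's add_or_rewrite, and it takes the conservative route of dropping the whole set on any store or barrier rather than comparing block/offset.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction model for one basic block. */
enum toy_op { TOY_LOAD, TOY_STORE, TOY_BARRIER };

struct toy_instr {
   enum toy_op op;
   unsigned block;   /* SSBO block index */
   unsigned offset;  /* offset within the block */
   int reuse;        /* loads: index of an earlier identical load, or -1 */
};

#define TOY_MAX_SET 64

/* Processes the instructions of a single block in order. */
static void
toy_combine_loads(struct toy_instr *instrs, size_t count)
{
   size_t set[TOY_MAX_SET];   /* step 1: a fresh set for this block only */
   size_t set_size = 0;

   for (size_t i = 0; i < count; i++) {
      struct toy_instr *instr = &instrs[i];
      instr->reuse = -1;

      if (instr->op != TOY_LOAD) {
         /* Step 3: a store or barrier discards everything seen so far. */
         set_size = 0;
         continue;
      }

      /* Step 2: look for an identical earlier load and reuse it. */
      for (size_t j = 0; j < set_size; j++) {
         const struct toy_instr *prev = &instrs[set[j]];
         if (prev->block == instr->block && prev->offset == instr->offset) {
            instr->reuse = (int)set[j];
            break;
         }
      }

      /* No match: remember this load for later ones (if there is room). */
      if (instr->reuse < 0 && set_size < TOY_MAX_SET)
         set[set_size++] = i;
   }
}
```

The refinement discussed in the thread (keeping loads that a store provably cannot alias, or forwarding the stored value) would replace the unconditional `set_size = 0` with a per-entry check, and would also have to account for indirect accesses, atomics and partial writes.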
Re: [Mesa-dev] [PATCH 1/2] i965/vec4: Remove unnecessary #includes from the generator.
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> On Sat, 2015-10-24 at 13:20 -0700, Matt Turner wrote: > --- > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 8 > 1 file changed, 8 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > index 6569d81..d50184a 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > @@ -20,18 +20,10 @@ > * IN THE SOFTWARE. > */ > > -#include > #include "glsl/glsl_parser_extras.h" > #include "brw_vec4.h" > #include "brw_cfg.h" > > -extern "C" { > -#include "brw_eu.h" > -#include "main/macros.h" > -#include "program/prog_print.h" > -#include "program/prog_parameter.h" > -}; > - > using namespace brw; > > static void ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.
On Wed, 2015-10-28 at 03:37 -0700, Kenneth Graunke wrote: > These tessellation shader related fields need plumbing through NIR. > > Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> > --- > src/glsl/nir/glsl_to_nir.cpp | 2 ++ > src/glsl/nir/nir.h | 5 + > 2 files changed, 7 insertions(+) > > diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp > index 7275b48..3722ae0 100644 > --- a/src/glsl/nir/glsl_to_nir.cpp > +++ b/src/glsl/nir/glsl_to_nir.cpp > @@ -160,6 +160,8 @@ glsl_to_nir(const struct gl_shader_program *shader_prog, > shader->info.num_images = sh->NumImages; > shader->info.inputs_read = sh->Program->InputsRead; > shader->info.outputs_written = sh->Program->OutputsWritten; > + shader->info.patch_inputs_read = sh->Program->PatchInputsRead; > + shader->info.patch_outputs_written = sh->Program->PatchOutputsWritten; > shader->info.system_values_read = sh->Program->SystemValuesRead; > shader->info.uses_texture_gather = sh->Program->UsesGather; > shader->info.uses_clip_distance_out = > diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h > index ac42251..d053273 100644 > --- a/src/glsl/nir/nir.h > +++ b/src/glsl/nir/nir.h > @@ -1481,6 +1481,11 @@ typedef struct nir_shader_info { > /* Which system values are actually read */ > uint64_t system_values_read; > > + /* Which patch inputs are actually read */ > + uint64_t patch_inputs_read; > + /* Which patch outputs are actually written */ > + uint64_t patch_outputs_written; These two should be fine as uint32_t since we set them from a GLBitfield which is 32-bit after all... or maybe you were thinking that we might need to grow these bitfields in Mesa to 64-bits eventually. All 3 patches are: Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> BTW, system_values_read should also be fine as 32-bit for the same reason. I can send a patch for that if we care. 
Iago > /* Whether or not this shader ever uses textureGather() */ > bool uses_texture_gather; > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex
On Wed, 2015-10-28 at 09:11 +0100, Iago Toral wrote: > Yeah, this makes things more consistent: > Reviewed-by: Iago Toral Quiroga <ito...@igali.com> I meant: Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > > On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote: > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > > Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > > Cc: Iago Toral Quiroga <ito...@igalia.com> > > --- > > src/glsl/link_uniform_initializers.cpp | 2 +- > > src/glsl/linker.cpp| 16 > > src/glsl/standalone_scaffolding.cpp| 4 ++-- > > src/mesa/main/mtypes.h | 11 ++- > > src/mesa/main/shader_query.cpp | 2 +- > > src/mesa/main/shaderobj.c | 4 ++-- > > src/mesa/main/uniforms.c | 4 ++-- > > 7 files changed, 22 insertions(+), 21 deletions(-) > > > > diff --git a/src/glsl/link_uniform_initializers.cpp > > b/src/glsl/link_uniform_initializers.cpp > > index 682a4ee..58d21e5 100644 > > --- a/src/glsl/link_uniform_initializers.cpp > > +++ b/src/glsl/link_uniform_initializers.cpp > > @@ -178,7 +178,7 @@ set_block_binding(gl_shader_program *prog, const char > > *block_name, int binding) > > > >/* This is a field of a UBO. val is the binding index. 
*/ > >for (int i = 0; i < MESA_SHADER_STAGES; i++) { > > - int stage_index = prog->UniformBlockStageIndex[i][block_index]; > > + int stage_index = prog->InterfaceBlockStageIndex[i][block_index]; > > > > if (stage_index != -1) { > > struct gl_shader *sh = prog->_LinkedShaders[i]; > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > > index cfd8f81..3db2fd3 100644 > > --- a/src/glsl/linker.cpp > > +++ b/src/glsl/linker.cpp > > @@ -1174,10 +1174,10 @@ interstage_cross_validate_uniform_blocks(struct > > gl_shader_program *prog) > > for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) { > >struct gl_shader *sh = prog->_LinkedShaders[i]; > > > > - prog->UniformBlockStageIndex[i] = ralloc_array(prog, int, > > -max_num_uniform_blocks); > > + prog->InterfaceBlockStageIndex[i] = ralloc_array(prog, int, > > + > > max_num_uniform_blocks); > >for (unsigned int j = 0; j < max_num_uniform_blocks; j++) > > -prog->UniformBlockStageIndex[i][j] = -1; > > +prog->InterfaceBlockStageIndex[i][j] = -1; > > > >if (sh == NULL) > > continue; > > @@ -1194,7 +1194,7 @@ interstage_cross_validate_uniform_blocks(struct > > gl_shader_program *prog) > > return false; > > } > > > > -prog->UniformBlockStageIndex[i][index] = j; > > +prog->InterfaceBlockStageIndex[i][index] = j; > >} > > } > > > > @@ -2836,9 +2836,9 @@ check_resources(struct gl_context *ctx, struct > > gl_shader_program *prog) > >} > > > >for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) { > > -if (prog->UniformBlockStageIndex[j][i] != -1) { > > +if (prog->InterfaceBlockStageIndex[j][i] != -1) { > > struct gl_shader *sh = prog->_LinkedShaders[j]; > > -int stage_index = prog->UniformBlockStageIndex[j][i]; > > +int stage_index = prog->InterfaceBlockStageIndex[j][i]; > > if (sh && > > sh->BufferInterfaceBlocks[stage_index].IsShaderStorage) { > > shader_blocks[j]++; > > total_shader_storage_blocks++; > > @@ -2955,7 +2955,7 @@ check_image_resources(struct gl_context *ctx, struct > > gl_shader_program *prog) > > total_image_units += 
sh->NumImages; > > > > for (unsigned j = 0; j < prog->NumBufferInterfaceBlocks; j++) { > > -int stage_index = prog->UniformBlockStageIndex[i][j]; > > +int stage_index = prog->InterfaceBlockStageIndex[i][j]; > > if (stage_index != -1 && > > sh->BufferInterfaceBlocks[stage_index].IsShaderStorage) > > total_shader_storage_blocks++; > > } > > @@ -3734,7 +3734,7 @@ build_program_resource_list(struct gl_shader_program > > *shProg) > >int block_index = shProg->UniformStorage[i].block_index; > >if (block_index != -1) { > > for (unsigned j = 0; j
Re: [Mesa-dev] [PATCH 1/3] mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex
Yeah, this makes things more consistent: Reviewed-by: Iago Toral Quiroga <ito...@igali.com> On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > Cc: Iago Toral Quiroga <ito...@igalia.com> > --- > src/glsl/link_uniform_initializers.cpp | 2 +- > src/glsl/linker.cpp| 16 > src/glsl/standalone_scaffolding.cpp| 4 ++-- > src/mesa/main/mtypes.h | 11 ++- > src/mesa/main/shader_query.cpp | 2 +- > src/mesa/main/shaderobj.c | 4 ++-- > src/mesa/main/uniforms.c | 4 ++-- > 7 files changed, 22 insertions(+), 21 deletions(-) > > diff --git a/src/glsl/link_uniform_initializers.cpp > b/src/glsl/link_uniform_initializers.cpp > index 682a4ee..58d21e5 100644 > --- a/src/glsl/link_uniform_initializers.cpp > +++ b/src/glsl/link_uniform_initializers.cpp > @@ -178,7 +178,7 @@ set_block_binding(gl_shader_program *prog, const char > *block_name, int binding) > >/* This is a field of a UBO. val is the binding index. 
*/ >for (int i = 0; i < MESA_SHADER_STAGES; i++) { > - int stage_index = prog->UniformBlockStageIndex[i][block_index]; > + int stage_index = prog->InterfaceBlockStageIndex[i][block_index]; > > if (stage_index != -1) { > struct gl_shader *sh = prog->_LinkedShaders[i]; > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > index cfd8f81..3db2fd3 100644 > --- a/src/glsl/linker.cpp > +++ b/src/glsl/linker.cpp > @@ -1174,10 +1174,10 @@ interstage_cross_validate_uniform_blocks(struct > gl_shader_program *prog) > for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) { >struct gl_shader *sh = prog->_LinkedShaders[i]; > > - prog->UniformBlockStageIndex[i] = ralloc_array(prog, int, > - max_num_uniform_blocks); > + prog->InterfaceBlockStageIndex[i] = ralloc_array(prog, int, > + > max_num_uniform_blocks); >for (unsigned int j = 0; j < max_num_uniform_blocks; j++) > - prog->UniformBlockStageIndex[i][j] = -1; > + prog->InterfaceBlockStageIndex[i][j] = -1; > >if (sh == NULL) >continue; > @@ -1194,7 +1194,7 @@ interstage_cross_validate_uniform_blocks(struct > gl_shader_program *prog) > return false; >} > > - prog->UniformBlockStageIndex[i][index] = j; > + prog->InterfaceBlockStageIndex[i][index] = j; >} > } > > @@ -2836,9 +2836,9 @@ check_resources(struct gl_context *ctx, struct > gl_shader_program *prog) >} > >for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) { > - if (prog->UniformBlockStageIndex[j][i] != -1) { > + if (prog->InterfaceBlockStageIndex[j][i] != -1) { > struct gl_shader *sh = prog->_LinkedShaders[j]; > -int stage_index = prog->UniformBlockStageIndex[j][i]; > +int stage_index = prog->InterfaceBlockStageIndex[j][i]; > if (sh && > sh->BufferInterfaceBlocks[stage_index].IsShaderStorage) { > shader_blocks[j]++; > total_shader_storage_blocks++; > @@ -2955,7 +2955,7 @@ check_image_resources(struct gl_context *ctx, struct > gl_shader_program *prog) > total_image_units += sh->NumImages; > > for (unsigned j = 0; j < prog->NumBufferInterfaceBlocks; j++) { > -int stage_index = 
prog->UniformBlockStageIndex[i][j]; > +int stage_index = prog->InterfaceBlockStageIndex[i][j]; > if (stage_index != -1 && > sh->BufferInterfaceBlocks[stage_index].IsShaderStorage) > total_shader_storage_blocks++; > } > @@ -3734,7 +3734,7 @@ build_program_resource_list(struct gl_shader_program > *shProg) >int block_index = shProg->UniformStorage[i].block_index; >if (block_index != -1) { > for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) { > - if (shProg->UniformBlockStageIndex[j][block_index] != -1) > + if (shProg->InterfaceBlockStageIndex[j][block_index] != -1) > stageref |= (1 << j); > } >} > diff --git a/src/glsl/standalone_scaffolding.cpp > b/src/glsl/standalone_scaffolding.cpp > index eccf094..fe1d820 100644 > --- a/src/glsl/standalone_scaffolding.cpp > +++ b/src/glsl/standalone_scaffolding.cpp > @@ -120,8 +120,8 @@ _mesa_clear_shader_program_data(s
Re: [Mesa-dev] [PATCH 3/3] mesa: Use UBO/SSBO indices during binding
On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote: > Previously we were treating the binding index for Uniform Buffer > Objects and Shader Storage Buffer Objects as being part of the > combined BufferInterfaceBlocks array. > > Fixes ES31-CTS.compute_shader.resource-ubo on i965. > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > Cc: Iago Toral Quiroga <ito...@igalia.com> > --- > src/mesa/main/uniforms.c | 30 -- > 1 file changed, 20 insertions(+), 10 deletions(-) > > diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c > index 758ca24..47f80ce 100644 > --- a/src/mesa/main/uniforms.c > +++ b/src/mesa/main/uniforms.c > @@ -1002,10 +1002,10 @@ _mesa_UniformBlockBinding(GLuint program, > if (!shProg) >return; > > - if (uniformBlockIndex >= shProg->NumBufferInterfaceBlocks) { > + if (uniformBlockIndex >= shProg->NumUniformBlocks) { >_mesa_error(ctx, GL_INVALID_VALUE, > "glUniformBlockBinding(block index %u >= %u)", > - uniformBlockIndex, shProg->NumBufferInterfaceBlocks); > + uniformBlockIndex, shProg->NumUniformBlocks); >return; > } > > @@ -1016,17 +1016,22 @@ _mesa_UniformBlockBinding(GLuint program, >return; > } > > - if (shProg->BufferInterfaceBlocks[uniformBlockIndex].Binding != > + if (shProg->UniformBlocks[uniformBlockIndex]->Binding != > uniformBlockBinding) { >int i; > >FLUSH_VERTICES(ctx, 0); >ctx->NewDriverState |= ctx->DriverFlags.NewUniformBuffer; > > - shProg->BufferInterfaceBlocks[uniformBlockIndex].Binding = > uniformBlockBinding; > + const int interface_block_index = > + shProg->UboInterfaceBlockIndex[uniformBlockIndex]; > + > + shProg->BufferInterfaceBlocks[interface_block_index].Binding = > + uniformBlockBinding; FWIW, here you could just do: shProg->UniformBlocks[uniformBlockIndex]->Binding = uniformBlockBinding; since shProg->UniformBlocks holds pointers into the appropriate places in shProg->BufferInterfaceBlocks. 
>for (i = 0; i < MESA_SHADER_STAGES; i++) { > - int stage_index = > shProg->InterfaceBlockStageIndex[i][uniformBlockIndex]; > + int stage_index = > +shProg->InterfaceBlockStageIndex[i][interface_block_index]; However, we still need this and so we still need to have the index in the interface block space, so either way: Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> >if (stage_index != -1) { > struct gl_shader *sh = shProg->_LinkedShaders[i]; > @@ -1054,10 +1059,10 @@ _mesa_ShaderStorageBlockBinding(GLuint program, > if (!shProg) >return; > > - if (shaderStorageBlockIndex >= shProg->NumBufferInterfaceBlocks) { > + if (shaderStorageBlockIndex >= shProg->NumShaderStorageBlocks) { >_mesa_error(ctx, GL_INVALID_VALUE, > "glShaderStorageBlockBinding(block index %u >= %u)", > - shaderStorageBlockIndex, shProg->NumBufferInterfaceBlocks); > + shaderStorageBlockIndex, shProg->NumShaderStorageBlocks); >return; > } > > @@ -1069,17 +1074,22 @@ _mesa_ShaderStorageBlockBinding(GLuint program, >return; > } > > - if (shProg->BufferInterfaceBlocks[shaderStorageBlockIndex].Binding != > + if (shProg->ShaderStorageBlocks[shaderStorageBlockIndex]->Binding != > shaderStorageBlockBinding) { >int i; > >FLUSH_VERTICES(ctx, 0); >ctx->NewDriverState |= ctx->DriverFlags.NewShaderStorageBuffer; > > - shProg->BufferInterfaceBlocks[shaderStorageBlockIndex].Binding = > shaderStorageBlockBinding; > + const int interface_block_index = > + shProg->SsboInterfaceBlockIndex[shaderStorageBlockIndex]; > + > + shProg->BufferInterfaceBlocks[interface_block_index].Binding = > + shaderStorageBlockBinding; > >for (i = 0; i < MESA_SHADER_STAGES; i++) { > - int stage_index = > shProg->InterfaceBlockStageIndex[i][shaderStorageBlockIndex]; > + int stage_index = > +shProg->InterfaceBlockStageIndex[i][interface_block_index]; > >if (stage_index != -1) { > struct gl_shader *sh = shProg->_LinkedShaders[i]; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/4] nir/instr_set: Add an allow_loads field
On Tue, 2015-10-27 at 14:33 +0200, Pohjolainen, Topi wrote: > On Tue, Oct 27, 2015 at 10:28:58AM +0100, Iago Toral Quiroga wrote: > > We need this so we can configure different behaviors for passes that > > cannot deal with side-effectful instructions (CSE) and passes that can > > (we will add a load-combine pass shortly). > > > > For now, when allow_loads is true, we let the instruction set rewrite > > SSBO loads. > > --- > > src/glsl/nir/nir_instr_set.c | 51 > > > > src/glsl/nir/nir_instr_set.h | 20 - > > src/glsl/nir/nir_opt_cse.c | 4 ++-- > > 3 files changed, 50 insertions(+), 25 deletions(-) > > > > diff --git a/src/glsl/nir/nir_instr_set.c b/src/glsl/nir/nir_instr_set.c > > index d3f939f..583618f 100644 > > --- a/src/glsl/nir/nir_instr_set.c > > +++ b/src/glsl/nir/nir_instr_set.c > > @@ -398,6 +398,13 @@ dest_is_ssa(nir_dest *dest, void *data) > > return dest->is_ssa; > > } > > > > +static bool > > +is_load(nir_intrinsic_instr *instr) > > +{ > > + return instr->intrinsic == nir_intrinsic_load_ssbo || > > + instr->intrinsic == nir_intrinsic_load_ssbo_indirect; > > +} > > + > > /* This function determines if uses of an instruction can safely be > > rewritten > > * to use another identical instruction instead. Note that this function > > must > > * be kept in sync with hash_instr() and nir_instrs_equal() -- only > > @@ -406,7 +413,7 @@ dest_is_ssa(nir_dest *dest, void *data) > > */ > > > > static bool > > -instr_can_rewrite(nir_instr *instr) > > +instr_can_rewrite(nir_instr *instr, bool allow_loads) > > { > > /* We only handle SSA. 
*/ > > if (!nir_foreach_dest(instr, dest_is_ssa, NULL) || > > @@ -428,11 +435,15 @@ instr_can_rewrite(nir_instr *instr) > >return true; > > } > > case nir_instr_type_intrinsic: { > > + nir_intrinsic_instr *intrinsic = nir_instr_as_intrinsic(instr); > >const nir_intrinsic_info *info = > > - &nir_intrinsic_infos[nir_instr_as_intrinsic(instr)->intrinsic]; > > - return (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) && > > - (info->flags & NIR_INTRINSIC_CAN_REORDER) && > > - info->num_variables == 0; /* not implemented yet */ > > + &nir_intrinsic_infos[intrinsic->intrinsic]; > > + bool can_eliminate_and_reorder = > > + (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) && > > + (info->flags & NIR_INTRINSIC_CAN_REORDER) && > > + info->num_variables == 0; /* not implemented yet */ > > + return can_eliminate_and_reorder ? > > + true: allow_loads && is_load(intrinsic); > > Isn't this just? > > return can_eliminate_and_reorder || > (allow_loads && is_load(intrinsic));
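Topi's suggested simplification relies on `a ? true : b` being the same as `a || b`. A small sketch that checks the two forms against each other for every input combination; the function names are made up for illustration and the three flags are plain booleans rather than the real NIR queries.

```c
#include <stdbool.h>

/* The form used in the patch. */
static bool
toy_rewrite_ternary(bool can_eliminate_and_reorder, bool allow_loads,
                    bool is_load)
{
   return can_eliminate_and_reorder ? true : allow_loads && is_load;
}

/* The form suggested in the review. */
static bool
toy_rewrite_or(bool can_eliminate_and_reorder, bool allow_loads, bool is_load)
{
   return can_eliminate_and_reorder || (allow_loads && is_load);
}

/* Exhaustively compares both forms over all 8 input combinations. */
static bool
toy_forms_agree(void)
{
   for (int bits = 0; bits < 8; bits++) {
      bool a = (bits & 1) != 0;
      bool b = (bits & 2) != 0;
      bool c = (bits & 4) != 0;

      if (toy_rewrite_ternary(a, b, c) != toy_rewrite_or(a, b, c))
         return false;
   }
   return true;
}
```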
Re: [Mesa-dev] [PATCH 2/3] mesa: Map program UBOs and SSBOs to Interface Blocks
On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > Cc: Iago Toral Quiroga <ito...@igalia.com> > --- > src/glsl/linker.cpp | 14 ++ > src/glsl/standalone_scaffolding.cpp | 5 + > src/mesa/main/mtypes.h | 7 +++ > 3 files changed, 26 insertions(+) > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > index 3db2fd3..d925393 100644 > --- a/src/glsl/linker.cpp > +++ b/src/glsl/linker.cpp > @@ -4448,6 +4448,20 @@ link_shaders(struct gl_context *ctx, struct > gl_shader_program *prog) > >ShaderStorageBlocks, > >NumShaderStorageBlocks); > > + prog->UboInterfaceBlockIndex = > + ralloc_array(prog, int, prog->NumUniformBlocks); > + prog->SsboInterfaceBlockIndex = > + ralloc_array(prog, int, prog->NumShaderStorageBlocks); > + for (unsigned i = 0, u = 0, s = 0; > +i < prog->NumBufferInterfaceBlocks; > +i++) { > + if (prog->BufferInterfaceBlocks[i].IsShaderStorage) { > + prog->SsboInterfaceBlockIndex[s++] = i; > + } else { > + prog->UboInterfaceBlockIndex[u++] = i; > + } > + } This loop is pretty much the same as the one we have in split_ubos_and_ssbos, and the functionality it implements seems to belong in that function too, so maybe it would be better if we merged this loop into that function. I am thinking that we could pass two more arguments to split_ubos_and_ssbos that would be NULL in the first call (the one we do in the loop for all shader stages), and in the last call, the one we do outside the loop for the shader program, we would pass &prog->UboInterfaceBlockIndex and &prog->SsboInterfaceBlockIndex to get these set up. What do you think? > /* FINISHME: Assign fragment shader output locations. 
*/ > > done: > diff --git a/src/glsl/standalone_scaffolding.cpp > b/src/glsl/standalone_scaffolding.cpp > index fe1d820..5c76295 100644 > --- a/src/glsl/standalone_scaffolding.cpp > +++ b/src/glsl/standalone_scaffolding.cpp > @@ -124,6 +124,11 @@ _mesa_clear_shader_program_data(struct gl_shader_program > *shProg) >shProg->InterfaceBlockStageIndex[i] = NULL; > } > > + ralloc_free(shProg->UboInterfaceBlockIndex); > + shProg->UboInterfaceBlockIndex = NULL; > + ralloc_free(shProg->SsboInterfaceBlockIndex); > + shProg->SsboInterfaceBlockIndex = NULL; > + > ralloc_free(shProg->AtomicBuffers); > shProg->AtomicBuffers = NULL; > shProg->NumAtomicBuffers = 0; > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index cce959e..c1cbe96 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -2725,6 +2725,13 @@ struct gl_shader_program > int *InterfaceBlockStageIndex[MESA_SHADER_STAGES]; > > /** > +* Indices into the BufferInterfaceBlocks[] array for Uniform Buffer > +* Objects and Shader Storage Buffer Objects. > +*/ > + int *UboInterfaceBlockIndex; > + int *SsboInterfaceBlockIndex; > + > + /** > * Map of active uniform names to locations > * > * Maps any active uniform that is not an array element to a location. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
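For readers following the index bookkeeping in this patch, here is a standalone sketch of the mapping it builds: one combined array of interface blocks, plus two arrays translating a UBO-only or SSBO-only index into the combined index. The struct and function names are hypothetical, and plain calloc stands in for Mesa's ralloc; only the loop shape is taken from the diff above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry of the combined interface block array. */
struct toy_block {
   bool is_shader_storage;
};

struct toy_index_maps {
   int *ubo_to_block;     /* UBO index  -> index into the combined array */
   unsigned num_ubos;
   int *ssbo_to_block;    /* SSBO index -> index into the combined array */
   unsigned num_ssbos;
};

static struct toy_index_maps
toy_build_index_maps(const struct toy_block *blocks, unsigned num_blocks)
{
   struct toy_index_maps maps = { NULL, 0, NULL, 0 };
   size_t capacity = num_blocks ? num_blocks : 1;

   /* Sized for the worst case so no second counting pass is needed. */
   maps.ubo_to_block = calloc(capacity, sizeof(int));
   maps.ssbo_to_block = calloc(capacity, sizeof(int));
   if (!maps.ubo_to_block || !maps.ssbo_to_block)
      abort();

   /* Same shape as the loop in the patch: walk the combined array once and
    * append each block's position to the map for its kind.
    */
   for (unsigned i = 0; i < num_blocks; i++) {
      if (blocks[i].is_shader_storage)
         maps.ssbo_to_block[maps.num_ssbos++] = (int)i;
      else
         maps.ubo_to_block[maps.num_ubos++] = (int)i;
   }

   return maps;
}
```

With a map like this, a binding call that receives a UBO-space index can bounds-check against the UBO count and then translate to the combined index before touching per-stage tables, which appears to be what patch 3/3 in this series does.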
Re: [Mesa-dev] [PATCH] glsl: Add compute shader builtin variables for OpenGLES 3.1
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> On Thu, 2015-10-29 at 00:46 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/glsl/builtin_variables.cpp | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp > index a6ad105..00113d5 100644 > --- a/src/glsl/builtin_variables.cpp > +++ b/src/glsl/builtin_variables.cpp > @@ -710,7 +710,7 @@ builtin_variable_generator::generate_constants() >} > } > > - if (state->is_version(430, 0) || state->ARB_compute_shader_enable) { > + if (state->is_version(430, 310) || state->ARB_compute_shader_enable) { >add_const("gl_MaxComputeAtomicCounterBuffers", > MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS); >add_const("gl_MaxComputeAtomicCounters", MAX_COMPUTE_ATOMIC_COUNTERS); >add_const("gl_MaxComputeImageUniforms", MAX_COMPUTE_IMAGE_UNIFORMS); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules
On Thu, 2015-10-29 at 00:47 -0700, Jordan Justen wrote: > The OpenGLES GLSL 3.1 specification uses the precision qualifier > ordering rules from ARB_shading_language_420pack. Maybe expand the commit log to make explicit that this is for GLES 3.1 and desktop GL since 4.2 Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/glsl/glsl_parser.yy | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy > index 2f2e10d..4636435 100644 > --- a/src/glsl/glsl_parser.yy > +++ b/src/glsl/glsl_parser.yy > @@ -948,7 +948,8 @@ parameter_qualifier: >if ($2.precision != ast_precision_none) > _mesa_glsl_error(&@1, state, "duplicate precision qualifier"); > > - if (!state->has_420pack() && $2.flags.i != 0) > + if (!(state->has_420pack() || state->is_version(420, 310)) && > + $2.flags.i != 0) > _mesa_glsl_error(&@1, state, "precision qualifiers must come last"); > >$$ = $2; > @@ -1847,7 +1848,8 @@ type_qualifier: >if ($2.precision != ast_precision_none) > _mesa_glsl_error(&@1, state, "duplicate precision qualifier"); > > - if (!state->has_420pack() && $2.flags.i != 0) > + if (!(state->has_420pack() || state->is_version(420, 310)) && > + $2.flags.i != 0) > _mesa_glsl_error(&@1, state, "precision qualifiers must come last"); > >$$ = $2; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] i965: Invalid memory accesses after resizing brw_codegen's store table
On Wed, 2015-10-28 at 10:58 -0700, Kristian Høgsberg wrote: > On Wed, Oct 28, 2015 at 10:01:40AM +0100, Samuel Iglesias Gonsálvez wrote: > > There is no opinions about this issue or reviews of the proposed patch > > after one week. > > > > This is just a reminder in case you have missed it :-) > > Thanks for the reminder! How about something like this instead? Yeah, that works too. I was a bit concerned that this same problem may be affecting other places and this would only address it for brw_send_indirect_message, but after a quick review we don't generally need to hold pointers to previous instructions and the places where we do, like in brw_ENDIF or brw_WHILE we are careful to create the instructions we need before we look for pointers to others (which we do using indices into the store anyway). Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> I'll push this patch tomorrow if nobody else objects. Thanks Kristian! > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > index ebd811f..cd5c726 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > @@ -2511,12 +2511,20 @@ brw_send_indirect_message(struct brw_codegen *p, >struct brw_reg desc) > { > const struct brw_device_info *devinfo = p->devinfo; > - struct brw_inst *send, *setup; > + struct brw_inst *send; > + int setup; > > assert(desc.type == BRW_REGISTER_TYPE_UD); > > + /* We hold on to the setup instruction (the SEND in the direct case, the > OR > +* in the indirect case) by its index in the instruction store. The > +* pointer returned by next_insn() may become invalid if emitting the SEND > +* in the indirect case reallocs the store. 
> +*/ > + > if (desc.file == BRW_IMMEDIATE_VALUE) { > - setup = send = next_insn(p, BRW_OPCODE_SEND); > + setup = p->nr_insn; > + send = next_insn(p, BRW_OPCODE_SEND); >brw_set_src1(p, send, desc); > > } else { > @@ -2531,7 +2539,8 @@ brw_send_indirect_message(struct brw_codegen *p, > * caller can specify additional descriptor bits with the usual > * brw_set_*_message() helper functions. > */ > - setup = brw_OR(p, addr, desc, brw_imm_ud(0)); > + setup = p->nr_insn; > + brw_OR(p, addr, desc, brw_imm_ud(0)); > >brw_pop_insn_state(p); > > @@ -2543,7 +2552,7 @@ brw_send_indirect_message(struct brw_codegen *p, > brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD)); > brw_inst_set_sfid(devinfo, send, sfid); > > - return setup; > + return &p->store[setup]; > } > > static struct brw_inst * > > > > Sam > > > > On 21/10/15 12:23, Iago Toral wrote: > > > Hi, > > > > > > The problem is with code like this (see brw_send_indirect_message): > > > > > > setup = brw_OR(p, addr, desc, brw_imm_ud(0)); > > > send = next_insn(p, BRW_OPCODE_SEND); > > > ... > > > return setup; > > > > > > If next_insn triggers a realloc of the instruction store, then the setup > > > instruction pointer is no longer valid. Notice that this can happen > > > anywhere where we keep pointers to previous instructions before creating > > > new ones (!) > > > > > > The patch from Samuel fixes this by special-casing this for SEND > > > instructions only (since we know that the indirect versions can hit > > > this, maybe there are more situations though). It does so by trying to > > > make sure that we never realloc the store with a SEND instruction. For > > > this, we realloc before we reach the end of the current store (32 > > > instructions before the limit) as long as the instruction is not a SEND > > > (so that if it is a SEND we still have up to 32 opportunities to do the > > > realloc without a different instruction before running out of space in > > > the store). 
> > > > > > Iago > > > > > > On Wed, 2015-10-21 at 09:02 +0200, Samuel Iglesias Gonsalvez wrote: > > >> Hello, > > >> > > >> I have found several invalid memory accesses when running > > >> dEQP-GLES31.functional.ssbo.* tests on i965 driver (and gen7+). That > > >> invalid memory accesses were unluckily happening when generating the > > >> assembly instructions for SSBO stores for different compute shaders. > > >> > > >> However it looks like this problem could happen to other shaders and > > >> situations. Because of that, I am going to explain the probl
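For readers following this thread, the hazard and the index-based fix can be shown in isolation. This is only a minimal sketch, not Mesa code: struct insn, struct store, emit() and emit_pair() are hypothetical stand-ins for brw_inst, the brw_codegen instruction store, next_insn() and brw_send_indirect_message(), and the doubling growth policy is an assumption chosen just to force a realloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real Mesa types. */
struct insn { int opcode; };

struct store {
   struct insn *insns;   /* growable array, may move when it is realloc'ed */
   size_t nr_insn;       /* number of instructions emitted so far */
   size_t capacity;
};

/* Appends one instruction, growing the array if needed. Any pointer
 * returned by an earlier call may be stale after this returns. */
static struct insn *
emit(struct store *s, int opcode)
{
   if (s->nr_insn == s->capacity) {
      size_t new_cap = s->capacity ? s->capacity * 2 : 1;
      struct insn *grown = realloc(s->insns, new_cap * sizeof(*grown));
      if (!grown)
         abort();
      s->insns = grown;
      s->capacity = new_cap;
   }
   s->insns[s->nr_insn].opcode = opcode;
   return &s->insns[s->nr_insn++];
}

/* Emits a setup instruction followed by a second instruction and returns
 * the setup one. The index is recorded before emitting and only turned
 * into a pointer once no further emit() can move the array. */
static struct insn *
emit_pair(struct store *s, int setup_opcode, int send_opcode)
{
   size_t setup = s->nr_insn;

   emit(s, setup_opcode);
   emit(s, send_opcode);   /* may realloc and move the setup instruction */

   return &s->insns[setup];
}
```

Keeping the pointer returned by the first emit() across the second call would be the bug discussed here; keeping the index and resolving it last is the approach taken in the proposed patch.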
Re: [Mesa-dev] [PATCH] i965/nir: Mark const index UBO surfaces as used
On Thu, 2015-10-29 at 00:50 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 6 -- > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 -- > 2 files changed, 8 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 4950ba4..6d69e96 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -1438,8 +1438,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, > nir_intrinsic_instr *instr >fs_reg surf_index; > >if (const_index) { > - surf_index = fs_reg(stage_prog_data->binding_table.ubo_start + > - const_index->u[0]); > + unsigned index = stage_prog_data->binding_table.ubo_start + > + const_index->u[0]; > + surf_index = fs_reg(index); > + brw_mark_surface_used(prog_data, index); >} else { > /* The block index is not a constant. Evaluate the index expression >* per-channel and add the base UBO index; we have to select a value > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > index 0f04f65..efbdaa9 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > @@ -749,8 +749,10 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr > *instr) > /* The block index is a constant, so just emit the binding table > entry >* as an immediate. >*/ > - surf_index = src_reg(prog_data->base.binding_table.ubo_start + > - const_block_index->u[0]); > + unsigned index = prog_data->base.binding_table.ubo_start + > + const_block_index->u[0]; > + surf_index = src_reg(index); > + brw_mark_surface_used(&prog_data->base, index); >} else { > /* The block index is not a constant. Evaluate the index expression >* per-channel and add the base UBO index; we have to select a value Is this necessary? 
As far as I can see, the generator opcodes will mark surfaces for which we have a constant index as used, so this should not be necessary. Iago ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> On Thu, 2015-10-29 at 00:47 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/mesa/main/get_hash_params.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/main/get_hash_params.py > b/src/mesa/main/get_hash_params.py > index c295615..fbc7b8f 100644 > --- a/src/mesa/main/get_hash_params.py > +++ b/src/mesa/main/get_hash_params.py > @@ -460,6 +460,7 @@ descriptor=[ >[ "MAX_COMPUTE_UNIFORM_COMPONENTS", > "CONST(MAX_COMPUTE_UNIFORM_COMPONENTS), extra_ARB_compute_shader_es31" ], >[ "MAX_COMPUTE_IMAGE_UNIFORMS", "CONST(MAX_COMPUTE_IMAGE_UNIFORMS), > extra_ARB_compute_shader_es31" ], >[ "DISPATCH_INDIRECT_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0, > extra_ARB_compute_shader_es31" ], > + [ "MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS", > "CONTEXT_INT(Const.Program[MESA_SHADER_COMPUTE].MaxCombinedUniformComponents), > extra_ARB_compute_shader_es31" ], > > # GL_ARB_framebuffer_no_attachments / GLES 3.1 >["MAX_FRAMEBUFFER_WIDTH", "CONTEXT_INT(Const.MaxFramebufferWidth), > extra_ARB_framebuffer_no_attachments"], ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Setup pull constant state for compute programs
On Thu, 2015-10-29 at 00:49 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/mesa/drivers/dri/i965/brw_context.h | 2 +- > src/mesa/drivers/dri/i965/brw_state.h| 1 + > src/mesa/drivers/dri/i965/brw_state_upload.c | 2 ++ > src/mesa/drivers/dri/i965/gen7_cs_state.c| 32 > > 4 files changed, 36 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.h > b/src/mesa/drivers/dri/i965/brw_context.h > index 18c361e..887b57b 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.h > +++ b/src/mesa/drivers/dri/i965/brw_context.h > @@ -1177,7 +1177,7 @@ struct brw_context > > int num_atoms[BRW_NUM_PIPELINES]; > const struct brw_tracked_state render_atoms[60]; > - const struct brw_tracked_state compute_atoms[8]; > + const struct brw_tracked_state compute_atoms[9]; > > /* If (INTEL_DEBUG & DEBUG_BATCH) */ > struct { > diff --git a/src/mesa/drivers/dri/i965/brw_state.h > b/src/mesa/drivers/dri/i965/brw_state.h > index dc2b941..2c7c2f3 100644 > --- a/src/mesa/drivers/dri/i965/brw_state.h > +++ b/src/mesa/drivers/dri/i965/brw_state.h > @@ -49,6 +49,7 @@ extern const struct brw_tracked_state brw_clip_unit; > extern const struct brw_tracked_state brw_vs_pull_constants; > extern const struct brw_tracked_state brw_gs_pull_constants; > extern const struct brw_tracked_state brw_wm_pull_constants; > +extern const struct brw_tracked_state brw_cs_pull_constants; > extern const struct brw_tracked_state brw_constant_buffer; > extern const struct brw_tracked_state brw_curbe_offsets; > extern const struct brw_tracked_state brw_invariant_state; > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c > b/src/mesa/drivers/dri/i965/brw_state_upload.c > index 79b8301..0344b8a 100644 > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > @@ -259,6 +259,7 @@ static const struct brw_tracked_state > *gen7_compute_atoms[] = > _state_base_address, > _cs_image_surfaces, > 
_cs_push_constants, > + _cs_pull_constants, > _cs_ubo_surfaces, > _cs_abo_surfaces, > _texture_surfaces, > @@ -353,6 +354,7 @@ static const struct brw_tracked_state > *gen8_compute_atoms[] = > _state_base_address, > _cs_image_surfaces, > _cs_push_constants, > + _cs_pull_constants, > _cs_ubo_surfaces, > _cs_abo_surfaces, > _texture_surfaces, > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c > b/src/mesa/drivers/dri/i965/gen7_cs_state.c > index 6aeb0cb..da1d05f 100644 > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c > @@ -29,6 +29,7 @@ > #include "brw_shader.h" > #include "intel_mipmap_tree.h" > #include "intel_batchbuffer.h" > +#include "intel_buffer_objects.h" I think you don't need this include. Other than that: Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > #include "brw_state.h" > > static unsigned > @@ -285,3 +286,34 @@ const struct brw_tracked_state gen7_cs_push_constants = { > }, > .emit = gen7_upload_cs_push_constants, > }; > + > +/** > + * Creates a new CS constant buffer reflecting the current CS program's > + * constants, if needed by the CS program. 
> + */ > +static void > +brw_upload_cs_pull_constants(struct brw_context *brw) > +{ > + struct brw_stage_state *stage_state = &brw->cs.base; > + > + /* BRW_NEW_COMPUTE_PROGRAM */ > + struct brw_compute_program *cp = > + (struct brw_compute_program *) brw->compute_program; > + > + /* BRW_NEW_CS_PROG_DATA */ > + const struct brw_stage_prog_data *prog_data = &brw->cs.prog_data->base; > + > + /* _NEW_PROGRAM_CONSTANTS */ > + brw_upload_pull_constants(brw, BRW_NEW_SURFACES, &cp->program.Base, > + stage_state, prog_data, true); > +} > + > +const struct brw_tracked_state brw_cs_pull_constants = { > + .dirty = { > + .mesa = _NEW_PROGRAM_CONSTANTS, > + .brw = BRW_NEW_BATCH | > + BRW_NEW_COMPUTE_PROGRAM | > + BRW_NEW_CS_PROG_DATA, > + }, > + .emit = brw_upload_cs_pull_constants, > +}; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] mesa/sso: Add MESA_VERBOSE=api trace support
On Thu, 2015-10-29 at 00:52 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/mesa/main/pipelineobj.c | 33 + > 1 file changed, 33 insertions(+) > > diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c > index 51ee10f..c8c50fa 100644 > --- a/src/mesa/main/pipelineobj.c > +++ b/src/mesa/main/pipelineobj.c > @@ -230,6 +230,10 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield > stages, GLuint program) > struct gl_shader_program *shProg = NULL; > GLbitfield any_valid_stages; > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glUseProgramStages(%d, 0x%x, %d)\n", > + pipeline, stages, program); > + It probably does not matter much, but since pipeline and program are unsigned we might want to use %u instead of %d here. The same applies to a bunch of the other hunks in this patch. Either way, Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > if (!pipe) { >_mesa_error(ctx, GL_INVALID_OPERATION, "glUseProgramStages(pipeline)"); >return; > @@ -345,6 +349,9 @@ _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program) > struct gl_shader_program *shProg = NULL; > struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, > pipeline); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glActiveShaderProgram(%d, %d)\n", pipeline, program); > + > if (program != 0) { >shProg = _mesa_lookup_shader_program_err(ctx, program, > > "glActiveShaderProgram(program)"); > @@ -380,6 +387,9 @@ _mesa_BindProgramPipeline(GLuint pipeline) > GET_CURRENT_CONTEXT(ctx); > struct gl_pipeline_object *newObj = NULL; > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glBindProgramPipeline(%d)\n", pipeline); > + > /* Rebinding the same pipeline object: no change. 
> */ > if (ctx->_Shader->Name == pipeline) > @@ -467,6 +477,9 @@ _mesa_DeleteProgramPipelines(GLsizei n, const GLuint > *pipelines) > GET_CURRENT_CONTEXT(ctx); > GLsizei i; > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glDeleteProgramPipelines(%d, %p)\n", n, pipelines); > + > if (n < 0) { >_mesa_error(ctx, GL_INVALID_VALUE, "glDeleteProgramPipelines(n<0)"); >return; > @@ -551,6 +564,9 @@ _mesa_GenProgramPipelines(GLsizei n, GLuint *pipelines) > { > GET_CURRENT_CONTEXT(ctx); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glGenProgramPipelines(%d, %p)\n", n, pipelines); > + > create_program_pipelines(ctx, n, pipelines, false); > } > > @@ -559,6 +575,9 @@ _mesa_CreateProgramPipelines(GLsizei n, GLuint *pipelines) > { > GET_CURRENT_CONTEXT(ctx); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glCreateProgramPipelines(%d, %p)\n", n, pipelines); > + > create_program_pipelines(ctx, n, pipelines, true); > } > > @@ -574,6 +593,9 @@ _mesa_IsProgramPipeline(GLuint pipeline) > { > GET_CURRENT_CONTEXT(ctx); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glIsProgramPipeline(%d)\n", pipeline); > + > struct gl_pipeline_object *obj = _mesa_lookup_pipeline_object(ctx, > pipeline); > if (obj == NULL) >return GL_FALSE; > @@ -590,6 +612,10 @@ _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum > pname, GLint *params) > GET_CURRENT_CONTEXT(ctx); > struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, > pipeline); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glGetProgramPipelineiv(%d, %d, %p)\n", > + pipeline, pname, params); > + > /* Are geometry shaders available in this context? 
> */ > const bool has_gs = _mesa_has_geometry_shaders(ctx); > @@ -857,6 +883,9 @@ _mesa_ValidateProgramPipeline(GLuint pipeline) > { > GET_CURRENT_CONTEXT(ctx); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glValidateProgramPipeline(%d)\n", pipeline); > + > struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, > pipeline); > > if (!pipe) { > @@ -875,6 +904,10 @@ _mesa_GetProgramPipelineInfoLog(GLuint pipeline, GLsizei > bufSize, > { > GET_CURRENT_CONTEXT(ctx); > > + if (MESA_VERBOSE & VERBOSE_API) > + _mesa_debug(ctx, "glGetProgramPipelineInfoLog(%d, %d, %p, %p)\n", > + pipeline, bufSize, length, infoLog); > + > struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, > pipeline); > > if (!pipe) { ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
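To illustrate the format-specifier remark made in this review: object names such as pipeline and program are GLuint, which the GL headers define as unsigned int, so %u is the matching conversion. A rough sketch follows; format_name() is a hypothetical helper written for this note, not a Mesa function, and it assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats an object name the way the trace lines in
 * the patch do, but with %u so that names above INT_MAX print as-is. */
static int
format_name(char *buf, size_t size, unsigned int name)
{
   return snprintf(buf, size, "glBindProgramPipeline(%u)", name);
}
```

For small names the two specifiers typically print the same text, which is why this is a minor point; passing an unsigned value above INT_MAX to %d is a type mismatch and usually shows up as a negative number in the trace.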
Re: [Mesa-dev] [PATCH 2/2] mesa/sso: Add compute shader support
On Thu, 2015-10-29 at 00:52 -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/mesa/main/api_validate.c | 2 +- > src/mesa/main/pipelineobj.c | 11 +++ > 2 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c > index c59b6f3..46f39e7 100644 > --- a/src/mesa/main/api_validate.c > +++ b/src/mesa/main/api_validate.c > @@ -923,7 +923,7 @@ check_valid_to_compute(struct gl_context *ctx, const char > *function) > * "An INVALID_OPERATION error is generated if there is no active program > * for the compute shader stage." > */ > - prog = ctx->Shader.CurrentProgram[MESA_SHADER_COMPUTE]; > + prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE]; > if (prog == NULL || prog->_LinkedShaders[MESA_SHADER_COMPUTE] == NULL) { >_mesa_error(ctx, GL_INVALID_OPERATION, >"%s(no active compute shader)", This hunk won't apply on current master, there is no such comment before this line. Maybe this is part of another series of patches? 
For the rest of the patch: Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c > index c8c50fa..58730f4 100644 > --- a/src/mesa/main/pipelineobj.c > +++ b/src/mesa/main/pipelineobj.c > @@ -255,6 +255,8 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield > stages, GLuint program) > if (_mesa_has_tessellation(ctx)) >any_valid_stages |= GL_TESS_CONTROL_SHADER_BIT | >GL_TESS_EVALUATION_SHADER_BIT; > + if (_mesa_has_compute_shaders(ctx)) > + any_valid_stages |= GL_COMPUTE_SHADER_BIT; > > if (stages != GL_ALL_SHADER_BITS && (stages & ~any_valid_stages) != 0) { >_mesa_error(ctx, GL_INVALID_VALUE, "glUseProgramStages(Stages)"); > @@ -336,6 +338,9 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield > stages, GLuint program) > > if ((stages & GL_TESS_EVALUATION_SHADER_BIT) != 0) >_mesa_use_shader_program(ctx, GL_TESS_EVALUATION_SHADER, shProg, pipe); > + > + if ((stages & GL_COMPUTE_SHADER_BIT) != 0) > + _mesa_use_shader_program(ctx, GL_COMPUTE_SHADER, shProg, pipe); > } > > /** > @@ -669,6 +674,12 @@ _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum > pname, GLint *params) >*params = pipe->CurrentProgram[MESA_SHADER_FRAGMENT] > ? pipe->CurrentProgram[MESA_SHADER_FRAGMENT]->Name : 0; >return; > + case GL_COMPUTE_SHADER: > + if (!_mesa_has_compute_shaders(ctx)) > + break; > + *params = pipe->CurrentProgram[MESA_SHADER_COMPUTE] > + ? pipe->CurrentProgram[MESA_SHADER_COMPUTE]->Name : 0; > + return; > default: >break; > } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
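The stage validation touched by the first pipelineobj.c hunk boils down to a bitmask test, sketched here outside of Mesa. The macro names are local stand-ins, their values are believed to match the corresponding GL_*_SHADER_BIT and GL_ALL_SHADER_BITS enums, and stages_valid() is a hypothetical function that only models the vertex, fragment and compute stages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the GL stage bits, so no GL headers are needed. */
#define VERTEX_BIT    0x00000001u
#define FRAGMENT_BIT  0x00000002u
#define COMPUTE_BIT   0x00000020u
#define ALL_BITS      0xFFFFFFFFu

/* Hypothetical model of the check in _mesa_UseProgramStages(): a stage
 * mask is accepted if it is the special "all stages" value or if it sets
 * no bit outside the stages the context supports. */
static bool
stages_valid(uint32_t stages, bool has_compute)
{
   uint32_t any_valid_stages = VERTEX_BIT | FRAGMENT_BIT;

   if (has_compute)
      any_valid_stages |= COMPUTE_BIT;

   return stages == ALL_BITS || (stages & ~any_valid_stages) == 0;
}
```

Without the added any_valid_stages bit, a call that names only the compute stage would be rejected with GL_INVALID_VALUE even on a context that supports compute shaders.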
Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads
On Thu, 2015-10-22 at 09:09 -0700, Jason Ekstrand wrote: > On Thu, Oct 22, 2015 at 4:21 AM, Iago Toral Quiroga <ito...@igalia.com> wrote: > > I implemented this first as a separate optimization pass in GLSL IR [1], but > > Curro pointed out that this being pretty much a restricted form of a CSE > > pass > > it would probably make more sense to do it inside CSE (and we no longer have > > a CSE pass in GLSL IR). > > > > Unlike other things we CSE in NIR, in the case of SSBO loads we need to make > > sure that we invalidate previous entries in the set in the presence of > > conflicting instructions (i.e. SSBO writes to the same block and offset) or > > in the presence of memory barriers. > > > > If this is accepted I intend to extend this to also cover image reads, which > > follow similar behavior. > > > > No regressions observed in piglit or dEQP's SSBO functional tests. > > > > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html > > I think you've gotten enough NAK's that I don't need to chime in > there. Unfortunately, solving this in general is something of a > research project that both Connor and I have been thinking about for > quite some time. I've been thinking off-and-on about how to add a > proper memory model to lower_vars_to_ssa for almost a year now and > still haven't come up with a good way to do it. I don't know whether > SSBO's would be simpler or not. We need a proper memory model for > both lower_vars_to_ssa and SSBO load/stores (and shared local > variables) but it's a substantial research project. > > This isn't to say that you couldn't do it. Just know what you're taking on. > ;-) Yeah, it does not make sense that I try to do this, you guys have clearly given this much more thought than me and know much better how a solution for this would fit in NIR than me. > That said, here's a suggestion for something that we *could* write > today, wouldn't be very hard, and wold solve a decent number of cases. 
> > For each block: > > 1) Create a new instruction set (don't use anything from any previous blocks) > 2) call add_or_rewrite on all ssbo load operations > 3) If you ever see a barrier or ssbo store, destroy the entire > instruction set and start again. Yep, this is what I was thinking for the load-combine pass that Connor suggested. However, I think that in this case we do not need to destroy the entire set when we find a store, only for memory barriers, right? I mean, there should be nothing preventing us from checking the offset/block of the store and compare it with the offset/block of the loads in the set to decide which ones we need to remove (like I was doing in my last patch) > This is something you could put together fairly quickly and would > handle a fair number of cases. With a little special casing, you may > also be able to handle store and then an immediate load of the same > value or duplicate stores. Anything much more complex than that is > going to take a lot more thought. Yes, I'll give this a try next. Thanks for all the comments and suggestions! Iago ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
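The block-local scheme discussed in this message, with the finer-grained store handling suggested in the reply, could look roughly like the sketch below. It is not NIR code: struct instr, combine_loads() and the fixed-size cache are hypothetical, and it assumes every access uses a constant block index and offset with the same access size. With indirect offsets or differently sized accesses a store would have to clear the whole set, as in the original suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op { OP_LOAD, OP_STORE, OP_BARRIER };

/* Hypothetical, simplified instruction: one basic block is an array of these. */
struct instr {
   enum op op;
   int block;        /* constant SSBO block index */
   int offset;       /* constant byte offset */
   int replaced_by;  /* index of an earlier equivalent load, or -1 */
};

#define MAX_CACHED 64

/* Combines redundant loads within a single basic block. The set of
 * reusable loads starts empty, is cleared by a barrier, and loses the
 * entries that a store to the same block and offset makes stale. */
static void
combine_loads(struct instr *instrs, size_t count)
{
   size_t cached[MAX_CACHED];
   size_t n = 0;

   for (size_t i = 0; i < count; i++) {
      struct instr *in = &instrs[i];
      in->replaced_by = -1;

      switch (in->op) {
      case OP_BARRIER:
         n = 0;   /* nothing seen so far may be reused */
         break;
      case OP_STORE: {
         size_t kept = 0;
         for (size_t j = 0; j < n; j++) {
            const struct instr *c = &instrs[cached[j]];
            if (c->block != in->block || c->offset != in->offset)
               cached[kept++] = cached[j];
         }
         n = kept;
         break;
      }
      case OP_LOAD: {
         bool found = false;
         for (size_t j = 0; j < n && !found; j++) {
            const struct instr *c = &instrs[cached[j]];
            if (c->block == in->block && c->offset == in->offset) {
               in->replaced_by = (int)cached[j];
               found = true;
            }
         }
         if (!found && n < MAX_CACHED)
            cached[n++] = i;
         break;
      }
      }
   }
}
```

Because the set never outlives the block, this avoids the dominance problem raised elsewhere in the thread, at the cost of missing redundant loads that sit in different blocks.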
Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads
On Thu, 2015-10-22 at 16:38 +0200, Iago Toral wrote: > On Thu, 2015-10-22 at 09:39 -0400, Connor Abbott wrote: > > On Thu, Oct 22, 2015 at 7:21 AM, Iago Toral Quiroga <ito...@igalia.com> > > wrote: > > > I implemented this first as a separate optimization pass in GLSL IR [1], > > > but > > > Curro pointed out that this being pretty much a restricted form of a CSE > > > pass > > > it would probably make more sense to do it inside CSE (and we no longer > > > have > > > a CSE pass in GLSL IR). > > > > > > Unlike other things we CSE in NIR, in the case of SSBO loads we need to > > > make > > > sure that we invalidate previous entries in the set in the presence of > > > conflicting instructions (i.e. SSBO writes to the same block and offset) > > > or > > > in the presence of memory barriers. > > > > > > If this is accepted I intend to extend this to also cover image reads, > > > which > > > follow similar behavior. > > > > > > No regressions observed in piglit or dEQP's SSBO functional tests. > > > > > > [1] > > > http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html > > > > > > Iago Toral Quiroga (2): > > > nir/cse: invalidate SSBO loads in presence of ssbo writes or memory > > > barriers > > > nir/instr_set: allow rewrite of SSBO loads > > > > > > src/glsl/nir/nir_instr_set.c | 24 ++-- > > > src/glsl/nir/nir_opt_cse.c | 142 > > > +++ > > > 2 files changed, 162 insertions(+), 4 deletions(-) > > > > > > -- > > > 1.9.1 > > > > > > ___ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > NAK, this isn't going to work. NIR CSE is designed for operations > > which can be moved around freely as long they're still dominated by > > the SSA values they use. It makes heavy advantage of this to avoid > > looking at the entire CFG and instead only at the current block and > > its parents in the dominance tree. 
For example, imagine you have > > something like: > > > > A = load_ssbo 0 > > if (cond) { > >store_ssbo 0 > > } > > B = load_ssbo 0 > > > > Then A and B can't be combined, but CSE will combine them anyways when > > it reaches B because it keeps a hash table of values dominating B and > > finds A as a match. It doesn't look at the if conditional at all > > because it doesn't dominate the load to B. This is great when you want > > to CSE pure things that don't depend on other side effects -- after > > all, this is the sort of efficiency that SSA is supposed to give us -- > > but it means that as-is, it can't be used for e.g. SSBO's and images > > without completely changing how the pass works and making it less > > efficient. > > Ugh! One would think that at least one of the 2000+ SSBO tests in dEQP > would catch something like this... I guess not :(. However, I have just tested this and it works just fine. See: buffer Fragments { vec4 v; }; out vec4 color; void main() { vec4 tmp = v; if (tmp.x > 0) { v = vec4(0, 1, 0, 1); } color = v; } And the final NIR SSA form for this is: impl main { block block_0: /* preds: */ vec1 ssa_0 = load_const (0x /* 0.00 */) vec4 ssa_1 = load_const (0x /* 0.00 */, 0x3f80 /* 1.00 */, 0x /* 0.00 */, 0x3f80 /* 1.00 */) vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (0) vec1 ssa_3 = flt ssa_0, ssa_2 /* succs: block_1 block_2 */ if ssa_3 { block block_1: /* preds: block_0 */ intrinsic store_ssbo (ssa_1, ssa_0) () (0, 15) /* succs: block_3 */ } else { block block_2: /* preds: block_0 */ /* succs: block_3 */ } block block_3: /* preds: block_1 block_2 */ vec4 ssa_4 = intrinsic load_ssbo (ssa_0) () (0) intrinsic store_output (ssa_4) () (0) /* color */ /* succs: block_4 */ block block_4: } What is going on here is that block 1 is in block0->dom_children, so the CSE pass looks into that, sees the store and invalidates the first SSBO load as I was initially hoping that it would. I guess this behavior is not expected then? 
Iago > > Now, that being said, I still think that we should definitely be doing > > this sort of thing
Re: [Mesa-dev] [PATCH 2/2] glsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsize arrays
On Thu, 2015-10-22 at 11:01 +0200, Samuel Iglesias Gonsalvez wrote: > From ARB_program_interface_query: > > "For the property of BUFFER_DATA_SIZE, then the implementation-dependent > minimum total buffer object size, in basic machine units, required to hold > all active variables associated with an active uniform block, shader > storage block, or atomic counter buffer is written to . If the > final member of an active shader storage block is array with no declared > size, the minimum buffer size is computed assuming the array was declared > as an array with one element." > > Fixes the following dEQP-GLES31 tests: > > dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block > dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block > dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array > > Signed-off-by: Samuel Iglesias Gonsalvez <sigles...@igalia.com> > --- > src/glsl/link_uniform_blocks.cpp | 15 --- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/src/glsl/link_uniform_blocks.cpp > b/src/glsl/link_uniform_blocks.cpp > index 5285d8d..a10b44b 100644 > --- a/src/glsl/link_uniform_blocks.cpp > +++ b/src/glsl/link_uniform_blocks.cpp > @@ -130,13 +130,22 @@ private: > >unsigned alignment = 0; >unsigned size = 0; > - > + /* From ARB_program_interface_query: Add a blank line here > + * "If the final member of an active shader storage block is array with > + * no declared size, the minimum buffer size is computed assuming the > + * array was declared as an array with one element." Align the two lines above to the If in the first line and indent the block in quotes like it is done for other similar comments in the same function just below this. > + * > + * For that reason, we use the base type of the unsized array to > calculate > + * its size. 
> + */ I was wondering if we should also check that this is the last member explicitly (there is an unused bool parameter in this function that informs us about that). My understanding is that only SSBOs can have that, and the parser should ensure that they are last in the SSBO definition, so maybe it is redundant... if we don't want to add that check, then maybe it is worth amending the comment to explain why though (and even in that case maybe we want to add an assert). With these changes: Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > + const glsl_type *type_for_size = > + type->is_unsized_array() ? type->without_array() : type; >if (packing == GLSL_INTERFACE_PACKING_STD430) { > alignment = type->std430_base_alignment(v->RowMajor); > - size = type->std430_size(v->RowMajor); > + size = type_for_size->std430_size(v->RowMajor); >} else { > alignment = type->std140_base_alignment(v->RowMajor); > - size = type->std140_size(v->RowMajor); > + size = type_for_size->std140_size(v->RowMajor); >} > >this->offset = glsl_align(this->offset, alignment); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
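As a worked illustration of the rule quoted from ARB_program_interface_query, the sketch below sizes a block whose last member is an unsized array by counting that array as one element. It is deliberately simplified and is not the Mesa implementation: struct member, align_to() and buffer_data_size() are hypothetical, the caller supplies alignments and strides instead of deriving them from std140/std430, and any final rounding of the block size is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one block member. */
struct member {
   unsigned base_alignment;  /* alignment of the member, in bytes */
   unsigned element_size;    /* size of one element (array stride), in bytes */
   unsigned array_len;       /* 1 for non-arrays, 0 for an unsized array */
};

static unsigned
align_to(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) / alignment * alignment;
}

/* Minimum buffer size needed to hold all members, treating an unsized
 * array as if it had been declared with exactly one element. */
static unsigned
buffer_data_size(const struct member *members, size_t count)
{
   unsigned offset = 0;

   for (size_t i = 0; i < count; i++) {
      unsigned len = members[i].array_len ? members[i].array_len : 1;

      offset = align_to(offset, members[i].base_alignment);
      offset += members[i].element_size * len;
   }
   return offset;
}
```

With a vec4 followed by an unsized float array (4-byte stride), the sketch reports 16 + 4 = 20 bytes, which is the "one element" minimum the extension text asks for.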
Re: [Mesa-dev] [PATCH 3/9] i965: Don't consider control flow instructions to have sources.
On Wed, 2015-10-21 at 15:58 -0700, Matt Turner wrote: > And why did IFF have a destination? > > I suspect that once upon a time the disassembler used this information > to know which fields to find the jump targets in. The jump targets have > moved, so the disassembler has to know how to handle these > per-generation anyway. Control-flow instructions are explicitly handled by the disassembler and don't use .nsrc or .ndst at all, so I think this should be fine. The only other place where we use .ndst is the check for compr4 in gen < 6, but IFF would not fall in that case anyway. Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > --- > src/mesa/drivers/dri/i965/brw_disasm.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c > b/src/mesa/drivers/dri/i965/brw_disasm.c > index c2dac7c..29056ed 100644 > --- a/src/mesa/drivers/dri/i965/brw_disasm.c > +++ b/src/mesa/drivers/dri/i965/brw_disasm.c > @@ -90,20 +90,20 @@ const struct opcode_desc opcode_descs[128] = { > [BRW_OPCODE_NOP] = { .name = "nop", .nsrc = 0, .ndst = 0 }, > [BRW_OPCODE_NENOP]= { .name = "nenop", .nsrc = 0, .ndst = 0 }, > [BRW_OPCODE_JMPI] = { .name = "jmpi",.nsrc = 0, .ndst = 0 }, > - [BRW_OPCODE_IF] = { .name = "if", .nsrc = 2, .ndst = 0 }, > - [BRW_OPCODE_IFF] = { .name = "iff", .nsrc = 2, .ndst = 1 }, > - [BRW_OPCODE_WHILE]= { .name = "while", .nsrc = 2, .ndst = 0 }, > - [BRW_OPCODE_ELSE] = { .name = "else",.nsrc = 2, .ndst = 0 }, > - [BRW_OPCODE_BREAK]= { .name = "break", .nsrc = 2, .ndst = 0 }, > - [BRW_OPCODE_CONTINUE] = { .name = "cont",.nsrc = 1, .ndst = 0 }, > - [BRW_OPCODE_HALT] = { .name = "halt",.nsrc = 1, .ndst = 0 }, > + [BRW_OPCODE_IF] = { .name = "if", .nsrc = 0, .ndst = 0 }, > + [BRW_OPCODE_IFF] = { .name = "iff", .nsrc = 0, .ndst = 0 }, > + [BRW_OPCODE_WHILE]= { .name = "while", .nsrc = 0, .ndst = 0 }, > + [BRW_OPCODE_ELSE] = { .name = "else",.nsrc = 0, .ndst = 0 }, > + [BRW_OPCODE_BREAK]= { .name = "break", .nsrc = 0, 
.ndst = 0 }, > + [BRW_OPCODE_CONTINUE] = { .name = "cont",.nsrc = 0, .ndst = 0 }, > + [BRW_OPCODE_HALT] = { .name = "halt",.nsrc = 0, .ndst = 0 }, > // [BRW_OPCODE_MSAVE]= { .name = "msave", .nsrc = 1, .ndst = 1 }, > // [BRW_OPCODE_PUSH] = { .name = "push",.nsrc = 1, .ndst = 1 }, > // [BRW_OPCODE_MREST]= { .name = "mrest", .nsrc = 1, .ndst = 1 }, > // [BRW_OPCODE_POP] = { .name = "pop", .nsrc = 2, .ndst = 0 }, > [BRW_OPCODE_WAIT] = { .name = "wait",.nsrc = 1, .ndst = 0 }, > [BRW_OPCODE_DO] = { .name = "do", .nsrc = 0, .ndst = 0 }, > - [BRW_OPCODE_ENDIF]= { .name = "endif", .nsrc = 2, .ndst = 0 }, > + [BRW_OPCODE_ENDIF]= { .name = "endif", .nsrc = 0, .ndst = 0 }, > }; > > static bool ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/3] glsl: Drop exec_list argument to lower_ubo_reference
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> On Wed, 2015-11-04 at 15:33 -0800, Kristian Høgsberg Kristensen wrote: > We always pass in shader->ir and we already pass in the shader, so just > drop the exec_list. Most passes either take just a exec_list or a > shader, so this seems more consistent. > > Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net> > --- > src/glsl/ir_optimization.h | 2 +- > src/glsl/lower_ubo_reference.cpp | 4 ++-- > src/mesa/drivers/dri/i965/brw_link.cpp | 2 +- > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +- > 4 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h > index ce5c492..6d19a6c 100644 > --- a/src/glsl/ir_optimization.h > +++ b/src/glsl/ir_optimization.h > @@ -124,7 +124,7 @@ bool lower_const_arrays_to_uniforms(exec_list > *instructions); > bool lower_clip_distance(gl_shader *shader); > void lower_output_reads(unsigned stage, exec_list *instructions); > bool lower_packing_builtins(exec_list *instructions, int op_mask); > -void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions); > +void lower_ubo_reference(struct gl_shader *shader); > void lower_packed_varyings(void *mem_ctx, > unsigned locations_used, ir_variable_mode mode, > unsigned gs_input_vertices, gl_shader *shader); > diff --git a/src/glsl/lower_ubo_reference.cpp > b/src/glsl/lower_ubo_reference.cpp > index 57a242b..24806ac 100644 > --- a/src/glsl/lower_ubo_reference.cpp > +++ b/src/glsl/lower_ubo_reference.cpp > @@ -1270,7 +1270,7 @@ lower_ubo_reference_visitor::visit_enter(ir_call *ir) > } /* unnamed namespace */ > > void > -lower_ubo_reference(struct gl_shader *shader, exec_list *instructions) > +lower_ubo_reference(struct gl_shader *shader) > { > lower_ubo_reference_visitor v(shader); > > @@ -1281,6 +1281,6 @@ lower_ubo_reference(struct gl_shader *shader, exec_list > *instructions) > */ > do { >v.progress = false; > - visit_list_elements(&v, instructions); > + 
visit_list_elements(&v, shader->ir); > } while (v.progress); > } > diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp > b/src/mesa/drivers/dri/i965/brw_link.cpp > index fc9bee4..f1e3860 100644 > --- a/src/mesa/drivers/dri/i965/brw_link.cpp > +++ b/src/mesa/drivers/dri/i965/brw_link.cpp > @@ -157,7 +157,7 @@ process_glsl_ir(gl_shader_stage stage, > _mesa_shader_stage_to_abbrev(shader->Stage)); > } > > - lower_ubo_reference(shader, shader->ir); > + lower_ubo_reference(shader); > > bool progress; > do { > diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp > b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp > index f481e89..ca00930 100644 > --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp > +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp > @@ -5822,7 +5822,7 @@ st_link_shader(struct gl_context *ctx, struct > gl_shader_program *prog) > (!ctx->Const.NativeIntegers ? INT_DIV_TO_MUL_RCP : > 0) | > (options->EmitNoSat ? SAT_TO_CLAMP : 0)); > > - lower_ubo_reference(prog->_LinkedShaders[i], ir); > + lower_ubo_reference(prog->_LinkedShaders[i]); >do_vec_index_to_cond_assign(ir); >lower_vector_insert(ir, true); >lower_quadop_vector(ir, false); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/3] glsl: Lower UBO and SSBO access in glsl linker
On Wed, 2015-11-04 at 15:33 -0800, Kristian Høgsberg Kristensen wrote:
> All GLSL IR consumers run this lowering pass so we can move it to the
> linker. This moves the pass up quite a bit, but that's the point: it
> needs to run before we throw away information about per-component vector
> access.
>
> Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> ---
> src/glsl/linker.cpp| 8
> src/mesa/drivers/dri/i965/brw_link.cpp | 2 --
> src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++
> src/mesa/main/mtypes.h | 2 ++
> src/mesa/state_tracker/st_extensions.c | 1 +
> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 1 -
> 6 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index c35d87a..ea6a3f3 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4449,6 +4449,14 @@ link_shaders(struct gl_context *ctx, struct
> gl_shader_program *prog)
>
>     /* FINISHME: Assign fragment shader output locations. */
>
> +   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> +      if (prog->_LinkedShaders[i] == NULL)
> +         continue;
> +
> +      if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks)
> +         lower_ubo_reference(prog->_LinkedShaders[i]);
> +   }
> +

It probably makes more sense to rewrite this loop as:

if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks) {
   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
      if (prog->_LinkedShaders[i] != NULL)
         lower_ubo_reference(prog->_LinkedShaders[i]);
   }
}

With that change, and assuming that this change is not responsible for
the shader-db regressions posted by Jason:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

> done:
>     for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>        free(shader_list[i]);
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp
> b/src/mesa/drivers/dri/i965/brw_link.cpp
> index f1e3860..2991173 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -157,8 +157,6 @@ process_glsl_ir(gl_shader_stage stage,
>               _mesa_shader_stage_to_abbrev(shader->Stage));
>     }
>
> -   lower_ubo_reference(shader);
> -
>     bool progress;
>     do {
>        progress = false;
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 4ea297a..5adc986 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -148,6 +148,8 @@ brw_compiler_create(void *mem_ctx, const struct
> brw_device_info *devinfo)
>        compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true;
>
>        compiler->glsl_compiler_options[i].NirOptions = nir_options;
> +
> +      compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true;
>     }
>
>     return compiler;
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index d6c1eb8..800ad81 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2874,6 +2874,8 @@ struct gl_shader_compiler_options
>      */
>     GLboolean OptimizeForAOS;
>
> +   GLboolean LowerBufferInterfaceBlocks; /**< Lower UBO and SSBO access to
> intrinsics. */
> +
>     const struct nir_shader_compiler_options *NirOptions;
>  };
>
> diff --git a/src/mesa/state_tracker/st_extensions.c
> b/src/mesa/state_tracker/st_extensions.c
> index bd7cbcc..bbb9027 100644
> --- a/src/mesa/state_tracker/st_extensions.c
> +++ b/src/mesa/state_tracker/st_extensions.c
> @@ -254,6 +254,7 @@ void st_init_limits(struct pipe_screen *screen,
>
> PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT);
>
>        options->LowerClipDistance = true;
> +      options->LowerBufferInterfaceBlocks = true;
>     }
>
>     c->LowerTessLevel = true;
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index ca00930..9ee6f8f 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5822,7 +5822,6 @@ st_link_shader(struct gl_context *ctx, struct
> gl_shader_program *prog)
>                                 (!ctx->Const.NativeIntegers ? INT_DIV_TO_MUL_RCP :
> 0) |
>                                 (options->EmitNoSat ? SAT_TO_CLAMP : 0));
>
> -      lower_ubo_reference(prog->_LinkedShaders[i]);
>        do_vec_index_to_cond_assign(ir);
>        lower_vector_insert(ir, true);
>        lower_quadop_vector(ir, false);
Re: [Mesa-dev] [PATCH 2/3] glsl: Lower UBO and SSBO access in glsl linker
On Mon, 2015-11-09 at 16:52 +0100, Iago Toral wrote: > On Wed, 2015-11-04 at 15:33 -0800, Kristian Høgsberg Kristensen wrote: > > All GLSL IR consumers run this lowering pass so we can move it to the > > linker. This moves the pass up quite a bit, but that's the point: it > > needs to run before we throw away information about per-component vector > > access. > > > > Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net> > > --- > > src/glsl/linker.cpp| 8 > > src/mesa/drivers/dri/i965/brw_link.cpp | 2 -- > > src/mesa/drivers/dri/i965/brw_shader.cpp | 2 ++ > > src/mesa/main/mtypes.h | 2 ++ > > src/mesa/state_tracker/st_extensions.c | 1 + > > src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 1 - > > 6 files changed, 13 insertions(+), 3 deletions(-) > > > > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp > > index c35d87a..ea6a3f3 100644 > > --- a/src/glsl/linker.cpp > > +++ b/src/glsl/linker.cpp > > @@ -4449,6 +4449,14 @@ link_shaders(struct gl_context *ctx, struct > > gl_shader_program *prog) > > > > /* FINISHME: Assign fragment shader output locations. */ > > > > + for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) { > > + if (prog->_LinkedShaders[i] == NULL) > > +continue; > > + > > + if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks) > > + lower_ubo_reference(prog->_LinkedShaders[i]); > > + } > > + > > It probably makes more sense to rewrite this loop as: > > if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks) { >for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) { > if (prog->_LinkedShaders[i] != NULL) > lower_ubo_reference(prog->_LinkedShaders[i]); >} > } > > With that change, and assuming that this change is not responsible for > the shader-db regressions posted by Jason: Forget about that, I did not notice that LowerBufferInterfaceBlocks is set by stage. You can keep the Rb for the original version. 
Iago > Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > > > done: > > for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) { > >free(shader_list[i]); > > diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp > > b/src/mesa/drivers/dri/i965/brw_link.cpp > > index f1e3860..2991173 100644 > > --- a/src/mesa/drivers/dri/i965/brw_link.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_link.cpp > > @@ -157,8 +157,6 @@ process_glsl_ir(gl_shader_stage stage, > > _mesa_shader_stage_to_abbrev(shader->Stage)); > > } > > > > - lower_ubo_reference(shader); > > - > > bool progress; > > do { > >progress = false; > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp > > b/src/mesa/drivers/dri/i965/brw_shader.cpp > > index 4ea297a..5adc986 100644 > > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp > > @@ -148,6 +148,8 @@ brw_compiler_create(void *mem_ctx, const struct > > brw_device_info *devinfo) > > compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true; > > > >compiler->glsl_compiler_options[i].NirOptions = nir_options; > > + > > + compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true; > > } > > > > return compiler; > > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > > index d6c1eb8..800ad81 100644 > > --- a/src/mesa/main/mtypes.h > > +++ b/src/mesa/main/mtypes.h > > @@ -2874,6 +2874,8 @@ struct gl_shader_compiler_options > > */ > > GLboolean OptimizeForAOS; > > > > + GLboolean LowerBufferInterfaceBlocks; /**< Lower UBO and SSBO access to > > intrinsics. 
*/ > > + > > const struct nir_shader_compiler_options *NirOptions; > > }; > > > > diff --git a/src/mesa/state_tracker/st_extensions.c > > b/src/mesa/state_tracker/st_extensions.c > > index bd7cbcc..bbb9027 100644 > > --- a/src/mesa/state_tracker/st_extensions.c > > +++ b/src/mesa/state_tracker/st_extensions.c > > @@ -254,6 +254,7 @@ void st_init_limits(struct pipe_screen *screen, > > > > PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT); > > > >options->LowerClipDistance = true; > > + options->LowerBufferInterfaceBlocks = true; > > } > > > > c->LowerTessLevel = true;
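
Editorial note on the retraction in this reply: the check cannot be hoisted out of the loop because LowerBufferInterfaceBlocks is read from ShaderCompilerOptions[i], one entry per stage, so `i` has no meaning outside the loop. A minimal sketch of the per-stage form, assuming hypothetical stand-in types (StageOptions, LinkedShader, NUM_STAGES and lower_enabled_stages are not Mesa names, and setting `lowered` merely stands in for the call to lower_ubo_reference()):

```cpp
#include <cassert>

// Hypothetical stand-ins for the Mesa structures named in the patch.
constexpr unsigned NUM_STAGES = 3;

struct StageOptions {
   bool LowerBufferInterfaceBlocks;   // set independently for each stage
};

struct LinkedShader {
   bool lowered = false;              // records that the "pass" ran
};

// Same shape as the loop in the patch: skip missing stages, then consult
// the option that belongs to stage i. Returns how many stages were lowered.
unsigned lower_enabled_stages(const StageOptions *opts,
                              LinkedShader *const *shaders)
{
   unsigned count = 0;
   for (unsigned i = 0; i < NUM_STAGES; i++) {
      if (shaders[i] == nullptr)
         continue;

      if (opts[i].LowerBufferInterfaceBlocks) {
         shaders[i]->lowered = true;  // stands in for lower_ubo_reference()
         count++;
      }
   }
   return count;
}
```

With one stage enabled, one disabled and one absent, only the enabled stage is touched, which a single check outside the loop could not express.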
Re: [Mesa-dev] [PATCH 6/7] glsl: do not loose precision information when packing varyings
s/loose/lose

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Thu, 2015-11-05 at 13:33 +0200, Tapani Pälli wrote:
> This information will be used by cross stage validation of varyings
> for pipeline objects.
>
> Signed-off-by: Tapani Pälli <tapani.pa...@intel.com>
> ---
> src/glsl/lower_packed_varyings.cpp | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/src/glsl/lower_packed_varyings.cpp
> b/src/glsl/lower_packed_varyings.cpp
> index 5d66ca9..037c27d 100644
> --- a/src/glsl/lower_packed_varyings.cpp
> +++ b/src/glsl/lower_packed_varyings.cpp
> @@ -621,6 +621,7 @@ lower_packed_varyings_visitor::get_packed_varying_deref(
>        packed_var->data.patch = unpacked_var->data.patch;
>        packed_var->data.interpolation = unpacked_var->data.interpolation;
>        packed_var->data.location = location;
> +      packed_var->data.precision = unpacked_var->data.precision;
>        unpacked_var->insert_before(packed_var);
>        this->packed_varyings[slot] = packed_var;
>     } else {
Re: [Mesa-dev] [PATCH v5 5/7] glsl: Add precision information to ir_variable
On Fri, 2015-11-06 at 14:03 +0200, Tapani Pälli wrote: > From: Iago Toral Quiroga <ito...@igalia.com> > > We will need this later on when we implement proper support for > precision qualifiers in the drivers and also to do link time checks for > uniforms as indicated by the spec. > > This patch also adds compile-time checks for variables without precision > information (currently, Mesa only checks that a default precision is set > for floats in fragment shaders). > > As indicated by Ian, the addition of the precision information to > ir_variable has been done using a bitfield and pahole to identify an > available hole so that memory requirements for ir_variable stay the > same. > > v2 (Ian): > - Avoid if-ladders by defining arrays of supported sampler names and > indexing > into them with type->sampler_array + 2 * type->sampler_shadow > - Make the code that selects the precision qualifier to use an utility > function > - Fix a typo > > v3 (Tapani): > - rebased > - squashed in "Precision qualifiers are not allowed on structs" > - fixed select_gles_precision for sampler arrays > - fixed precision_qualifier_allowed for arrays of structs > > v4 (Tapani): > - add atomic_uint handling > - do not allow precision qualifier on images > (issues reported by Marta) > > v5 (Tapani): > - support precision qualifier on image types > --- > src/glsl/ast_to_hir.cpp | 296 > > src/glsl/ir.h | 13 ++ > src/glsl/nir/glsl_types.cpp | 4 + > src/glsl/nir/glsl_types.h | 11 ++ > 4 files changed, 301 insertions(+), 23 deletions(-) > > diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp > index b6d662b..1240615 100644 > --- a/src/glsl/ast_to_hir.cpp > +++ b/src/glsl/ast_to_hir.cpp > @@ -2189,10 +2189,10 @@ precision_qualifier_allowed(const glsl_type *type) > * From this, we infer that GLSL 1.30 (and later) should allow precision > * qualifiers on sampler types just like float and integer types. 
> */ > - return type->is_float() > + return (type->is_float() > || type->is_integer() > - || type->is_record() > - || type->contains_opaque(); > + || type->contains_opaque()) > + && !type->without_array()->is_record(); > } > > const glsl_type * > @@ -2210,31 +2210,268 @@ ast_type_specifier::glsl_type(const char **name, > return type; > } > > -const glsl_type * > -ast_fully_specified_type::glsl_type(const char **name, > -struct _mesa_glsl_parse_state *state) > const > +/** > + * From the OpenGL ES 3.0 spec, 4.5.4 Default Precision Qualifiers: > + * > + * "The precision statement > + * > + *precision precision-qualifier type; > + * > + * can be used to establish a default precision qualifier. The type field > can > + * be either int or float or any of the sampler types, (...) If type is > float, > + * the directive applies to non-precision-qualified floating point type > + * (scalar, vector, and matrix) declarations. If type is int, the directive > + * applies to all non-precision-qualified integer type (scalar, vector, > signed, > + * and unsigned) declarations." > + * > + * We use the symbol table to keep the values of the default precisions for > + * each 'type' in each scope and we use the 'type' string from the precision > + * statement as key in the symbol table. When we want to retrieve the default > + * precision associated with a given glsl_type we need to know the type > string > + * associated with it. This is what this function returns. > + */ > +static const char * > +get_type_name_for_precision_qualifier(const glsl_type *type) > { > - const struct glsl_type *type = this->specifier->glsl_type(name, state); > - > - if (type == NULL) > - return NULL; > + switch (type->base_type) { > + case GLSL_TYPE_FLOAT: > + return "float"; > + case GLSL_TYPE_UINT: > + case GLSL_TYPE_INT: > + return "int"; > + case GLSL_TYPE_ATOMIC_UINT: > + return "atomic_uint"; > + case GLSL_TYPE_IMAGE: > + /* fallthrough */ I think this is not correct. 
As far as I understand the spec, we can set a default precision for any of the image types:

image2D image3D imageCube image2DArray
iimage2D iimage3D iimageCube iimage2DArray
uimage2D uimage3D uimageCube uimage2DArray

but here you are re-using the precisions from samplers, so if we do this:

precision lowp sampler2D;
precision highp image2D;

the latter statement is ignored, and the former affects
Re: [Mesa-dev] nir/i965: Source modifiers on vecN opcodes
On Mon, 2015-11-09 at 12:27 -0500, Connor Abbott wrote: > On Mon, Nov 9, 2015 at 10:41 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote: > > > > On Nov 9, 2015 7:24 AM, "Connor Abbott" <cwabbo...@gmail.com> wrote: > >> > >> On Mon, Nov 9, 2015 at 6:55 AM, Iago Toral <ito...@igalia.com> wrote: > >> > Hi, > >> > > >> > Currently, NIR defines vecN operations as unsigned (integer). The fp64 > >> > patches from Connor change this to float (I guess because we need to > >> > know the case where we are packing vectors of 64-bit floats). However, > >> > this makes it so that nir_lower_source_to_mods turns this: > >> > > >> > vec1 ssa_2 = fmov -ssa_1.y > >> > vec3 ssa_3 = vec3 ssa_1, ssa_2, ssa_0 > >> > > >> > into: > >> > > >> > vec3 ssa_2 = vec3 ssa_1, -ssa_1.y, ssa_0 > >> > > >> > This only happens because the vec3 operation is defined as a float > >> > operation now, otherwise it would not try to do this. It is not clear to > >> > me if this is by design, I mean, have this kind of things only kick-in > >> > for float/int and define vecN operations as unsigned to avoid this for > >> > them. > >> > > >> > The problem comes later when we call nir_lower_vec_to_movs in the i965 > >> > vec4 backend. That pass generates a separate MOV for each component in > >> > the vector, but to do that properly when a negate is involved it needs > >> > to know if this is a float or an integer operand, which it does not > >> > know at this point. The current code always emits an imov, which won't > >> > work if the operand is a float. > >> > > >> > I can think of two solutions for this: > >> > > >> > 1) Change nir_lower_source_to_mods so it does not try to rewrite alu > >> > operations where a source comes from a fmov with a negate, or at least > >> > if the instruction we are trying to rewrite is a vecN operation (or > >> > maybe allow this in scalar mode only?) 
> >> > > >> > 2) In nir_lower_vec_to_movs, if a source is negated, check its > >> > parent_instr and try to guess its type from that (in this example, we > >> > would see it came from fmov and we can say it is a float and emit fmov > >> > instead of imov). Not sure if this would work in all possible scenarios > >> > though. > >> > > >> > Opinions? > >> > > >> > Iago > >> > > >> > ___ > >> > mesa-dev mailing list > >> > mesa-dev@lists.freedesktop.org > >> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > >> > >> The only reason I changed vecN to produce floats is to avoid producing > >> 64-bit integer instructions, which at one point the constant folding > >> infrastructure couldn't support (but now it can), so you can just > >> revert the change. Ofc the i965 backend won't be able to express this > >> directly, but for now you can silently change 64-bit integers to > >> floats and assert that they only happen in things that copy data > >> around. > > > > I would tend to agree. We could also make it unsigned so no source > > modifiers ever make sense. Meh. > > Oh yeah, I meant assert that we don't get e.g. a 64-bit iadd, so we > remember to fix that later. When we get support for real 64-bit > integers, we'll have to only map nir_type_int64/uint64 to DF on gen7. Ok, sounds reasonable to me. I'll make vecN opcodes unsigned again and add asserts in the driver to catch 64-bit integer ALU operations. Thank you both for the suggestions! Iago > > > > If we did want to keep vecN float, the thing to do would be to make > > vec_to_movs lower it to fmovs rather than imovs. But, like Connor said, > > just asserting no source modifiers for the 64-bit version in the backend is > > probably best. > > > > --Jason >
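
Editorial note on why an imov with a negate modifier is wrong for a float operand: float negation flips the sign bit, while integer negation takes the two's complement of the raw bits, so the two give different bit patterns for the same input. A minimal sketch on the CPU, assuming 32-bit IEEE-754 floats; the helper names are hypothetical and this models only the arithmetic, not the i965 instructions themselves:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Reinterpret a float as its raw bits and back (hypothetical helpers).
std::uint32_t float_bits(float f)
{
   std::uint32_t u;
   std::memcpy(&u, &f, sizeof u);
   return u;
}

float bits_float(std::uint32_t u)
{
   float f;
   std::memcpy(&f, &u, sizeof f);
   return f;
}

// Models "fmov -x": a float negate, which flips the sign bit.
float negate_as_float(float x)
{
   return -x;
}

// Models "imov -x" applied to float data: two's complement negate
// of the raw bit pattern, reinterpreted as a float afterwards.
float negate_as_int(float x)
{
   return bits_float(0u - float_bits(x));
}
```

For an input such as 1.5f the float path yields -1.5f, whereas the integer path yields some other value, which is the mismatch described in the thread.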
Re: [Mesa-dev] [PATCH] glsl: Correctly handle vector extract on function parameter
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Thu, 2015-11-12 at 00:44 -0800, Jordan Justen wrote:
> This commit accidentally used a '==' when '=' was intended.
>
> commit 96b22fb080894ba1840af2372f28a46cc0f40c76
> Author: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> Date: Wed Nov 4 14:58:54 2015 -0800
>
> glsl: Use array deref for access to vector components
>
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> Cc: Matt Turner <matts...@gmail.com>
> ---
> Ilia's suggestion seems like what we want.
>
> Passed jenkins. Then again, so did the original patch.
>
> src/glsl/ast_function.cpp | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
> index 5584470..466ece6 100644
> --- a/src/glsl/ast_function.cpp
> +++ b/src/glsl/ast_function.cpp
> @@ -368,8 +368,8 @@ fix_parameter(void *mem_ctx, ir_rvalue *actual, const
> glsl_type *formal_type,
>
>     ir_rvalue *lhs = actual;
>     if (expr != NULL && expr->operation == ir_binop_vector_extract) {
> -      lhs == new(mem_ctx)
> ir_dereference_array(expr->operands[0]->clone(mem_ctx, NULL),
> -
> expr->operands[1]->clone(mem_ctx, NULL));
> +      lhs = new(mem_ctx)
> ir_dereference_array(expr->operands[0]->clone(mem_ctx, NULL),
> +
> expr->operands[1]->clone(mem_ctx, NULL));
>     }
>
>     ir_assignment *const assignment_2 = new(mem_ctx) ir_assignment(lhs, rhs);
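
Editorial note on the bug being fixed: `lhs == expr;` is a valid statement that compares two pointers and throws the result away, so `lhs` silently keeps its old value. A minimal sketch of the two forms, assuming a hypothetical Rvalue struct in place of ir_rvalue and a boolean in place of the ir_binop_vector_extract test:

```cpp
#include <cassert>

// Hypothetical stand-in for ir_rvalue; not the Mesa class.
struct Rvalue {
   int id;
};

// Buggy form: '==' only compares, so lhs still points at `actual`.
const Rvalue *pick_lhs_buggy(const Rvalue *actual, const Rvalue *deref,
                             bool is_vector_extract)
{
   const Rvalue *lhs = actual;
   if (is_vector_extract) {
      // The void cast only silences the unused-result warning.
      (void)(lhs == deref);
   }
   return lhs;
}

// Fixed form: '=' rebinds lhs to the array dereference.
const Rvalue *pick_lhs_fixed(const Rvalue *actual, const Rvalue *deref,
                             bool is_vector_extract)
{
   const Rvalue *lhs = actual;
   if (is_vector_extract) {
      lhs = deref;
   }
   return lhs;
}
```

Both versions compile cleanly apart from an optional unused-value warning, which is consistent with the remark above that the original patch also passed CI.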