Re: [Mesa-dev] [PATCH v2 0/6] Improvements to the vec4 spilling code

2015-07-29 Thread Iago Toral
On Tue, 2015-07-28 at 18:17 +0300, Francisco Jerez wrote:
 Iago Toral Quiroga ito...@igalia.com writes:
 
  Link to v1:
  http://lists.freedesktop.org/archives/mesa-dev/2015-July/089766.html
 
  Changes after review (Curro)
- Drop the patch that asserted that the reg size should always be 1
- Expand this so that we do not unspill a register if we have just
  unspilled it as well
- Use brw_mask_for_swizzle
- Update spilling costs accordingly
 
  New changes:
 
- Expand the optimizations that are based on caching the spilled/unspilled
  so we keep using the cached register for as long as consecutive
  instructions keep reading the register (the previous version would only
  do this for one instruction). This is because we only see benefits for
  register allocation when there are gaps in the life span of a register
  where it is not used (because these are the only instances in which we
  can use that reg for a different purpose), so as long as consecutive
  instructions keep reading a register we have just spilled or unspilled,
  we don't have to unspill it again.
 
 I think this may be a good idea (assuming you've managed to measure an
 improvement in practice), but I don't think that the explanation is
 strictly speaking correct.  It *may* be beneficial to, say, unspill a
 variable for instruction i and then do it again for instruction i+1,
 because the set of variables live at instruction i may not be exactly
 the same as in instruction i+1, and by caching the value between both
 instructions you cause the temporary to interfere with the union of both
 sets simultaneously, which may increase the total number of registers
 required to register-allocate the program.

This is true, although you also need to allocate a register for the new
vgrf used to unspill, so I think the chances of this being beneficial in
practice are very low. I'll make sure to update the comment to be more
precise though.

 That said I think that this may still be a good idea because the
 register-pressure benefit from separating the live ranges of temporaries
 used in consecutive instructions is likely to be tiny typically, the
 program is likely to have other spilling candidates which may simplify
 the interference graph drastically for the same amount of fill/spill
 bandwidth invested, so I think you're right that in most cases it's
 going to be silly to re-spill/fill the same variable in consecutive
 instructions.

Right. The way I would expect this to work in practice is that we start
by spilling registers with the best benefit / cost ratio. That should be
registers that have a long life-span and usage gaps where the main
benefit for allocation comes from being able to allocate the register
for a different purpose during these gaps, so we should lose very
little for register allocation by doing this (if anything at all).
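
To make the trade-off concrete, here is a minimal sketch in standalone C++.
It is a toy model and not the vec4 backend code: the Instruction type and
count_scratch_reads() are names made up for this illustration. It counts the
scratch reads needed for one spilled register, with and without reusing the
unspilled temporary across consecutive instructions that read it:

```cpp
#include <cassert>
#include <vector>

// Toy model (assumption): an instruction is just the list of virtual
// register numbers it reads.
using Instruction = std::vector<int>;

static bool reads(const Instruction &inst, int reg)
{
   for (int r : inst)
      if (r == reg)
         return true;
   return false;
}

// Count the scratch reads needed for one spilled register. At most one
// scratch read is counted per instruction. With caching enabled, a run of
// consecutive instructions that all read the register shares a single
// scratch read; any instruction that does not read it ends the run, so the
// temporary is free for other uses during the gap.
static unsigned count_scratch_reads(const std::vector<Instruction> &program,
                                    int spilled_reg, bool cache_consecutive)
{
   unsigned scratch_reads = 0;
   bool cached = false;

   for (const Instruction &inst : program) {
      if (!reads(inst, spilled_reg)) {
         cached = false;
         continue;
      }
      if (!(cache_consecutive && cached))
         scratch_reads++;
      cached = true;
   }
   return scratch_reads;
}
```

In this model the temporary only stays live across back-to-back uses, which
is the case where splitting the live range is expected to buy the least.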

 In the future it may also be worth checking whether the heuristic can be
 refined to use some sort of register pressure-sensitive distance between
 uses of the same spilled variable as metric to decide whether the
 variable is worth re-spilling or if it makes sense for it to be cached
 between a pair of potentially non-consecutive uses.
 
 Anyway I'll have a closer look at the rest of your series soon-ish.

Thanks Curro!

  Other  
  Iago Toral Quiroga (6):
i965/vec4: Only emit one scratch read per instruction for spilled
  registers
i965/vec4: Remove checks for reladdr when checking for spillable
  registers
i965/vec4: Don't emit scratch reads for a spilled register we have
  just written
i965/vec4: Don't emit scratch reads for a register we have just
  unspilled
i965/vec4: Adjust spilling cost for consecutive instructions
i965: Add a debug option for spilling everything in vec4 code
 
   src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |   2 +-
   src/mesa/drivers/dri/i965/brw_vec4.cpp |   2 +-
   .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 145 +++--
   src/mesa/drivers/dri/i965/intel_debug.c|   3 +-
   src/mesa/drivers/dri/i965/intel_debug.h|   5 +-
   5 files changed, 139 insertions(+), 18 deletions(-)
 
  -- 
  1.9.1


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] ssbo, vec4, nir - Re: [PATCH v3 (part2) 00/56] ARB_shader_storage_buffer_object (mesa, i965)

2015-08-05 Thread Iago Toral
On Tue, 2015-08-04 at 17:54 -0700, Jason Ekstrand wrote:
 On Tue, Aug 4, 2015 at 5:09 PM, Jordan Justen jordan.l.jus...@intel.com wrote:
  I checked the extension spec, and it appears that like image
  load/store, ssbo is only required in fragment and compute stages. I
  don't think the ssbo support in vec4 is quite as problematic as image
  load/store, so I'm not sure this information will prove useful.
 
  I did wonder what path we should take with ssbo, given that the vec4
  nir support landed. It seems like vec4 nir is not enabled by default
  at this point, and I'm not sure what the plan is there.
 
  Should we only enable ssbo for geometry stages:
 
   1. Only when nir is enabled
 
   2. Only when nir is disabled. (since that is the default currently)
 
   3. Always
 
   4. Not at all. (At least for now, since the spec doesn't require it)
 
  I think my vote would be #1, especially if we plan to enable nir for
  vec4 by default before the next release.
 
 I would agree especially if we think that using NIR by default in mesa
 11.0 is going to happen.  Given how smoothly the vec4 NIR stuff has
 worked, I think that's probably possible.  Current shader-db results
 on HSW are as follows:

Aha, in that case we will start working on adding the NIR bits for
geometry stages to our series as well. We had been postponing this
because I thought it would probably take more time to enable NIR/vec4 by
default, but with this plan it makes sense to start working on that now.

 total instructions in shared programs: 1816713 -> 1798252 (-1.02%)
 instructions in affected programs:     1671486 -> 1653025 (-1.10%)
 helped:                                5248
 HURT:                                  12868
 GAINED:                                0
 LOST:                                  0
 
 We can probably whittle that down pretty quick.
 --Jason
 
  -Jordan
 
  On 2015-07-14 00:46:02, Iago Toral Quiroga wrote:
  This is the second part of the v3 series including remaining frontend bits
  like the optional unsized array at the bottom of SSBO definitions, layout
  mode std430 and implementation for GL queries/getters. It also includes
  all of the i965 backend implementation.
 
  Notice that this series depends on the following patches from
  Curro that have not been landed in master yet:
 
  i965/vec4: Import surface message builder functions.
  i965/vec4: Import helpers to convert vectors into arrays and back.
  i965/vec4: Import array utils for the surface message builder.
  i965/fs: Import surface message builder functions.
  i965/fs: Import helpers to convert vectors into arrays and back.
  i965/fs: Import array utils for the surface message builder.
  i965: Lift the constness restriction on surface indices passed to untyped ops.
  i965/vec4: Introduce VEC4 IR builder.
  i965/vec4: Make src_reg conversion constructor from dst_reg implicit.
  i965: Add register constructors taking a backend_reg as argument.
  i965/fs: Fix register coalesce not to lose track of the second half of 16-wide moves.
 
  This v3-part2 series, together with these patches from Curro and part1
  (which has already been landed in master), can be obtained from this
  development repository:
 
  git clone -b itoral-ARB_shader_storage_buffer_object-v3-part2 https://github.com/Igalia/mesa.git
 
  Also notice that:
  * Patch 55 is a separate bugfix that affects UBOs as well.
  * Patch 56 is not required by the series and is completely optional.
 
  SSBO piglit tests have already been merged in piglit's master branch.
 
  Antia Puentes (1):
glsl: Consider active all elements of a shared/std140 block array
 
  Iago Toral Quiroga (29):
i965: Use 16-byte offset alignment for shader storage buffers
i965: Implement DriverFlags.NewShaderStorageBuffer
i965: Set MaxShaderStorageBuffers for compute shaders
i965: Upload Shader Storage Buffer Object surfaces
i965: handle visiting of ir_var_shader_storage variables
i965/fs: Do not split buffer variables
i965/fs: Do not include the header with a pixel mask in untyped read
  messages
i965/vec4: Implement SSBO writes
nir: Implement __intrinsic_store_ssbo
i965/nir/fs: Implement SSBO writes
i965/vec4: Implement SSBO reads
nir: Implement __intrinsic_store_load
i965/nir/fs: Implement SSBO reads
nir: ignore an instruction's dest if it hasn't any
glsl: Rename atomic counter functions
glsl: Add atomic functions from ARB_shader_storage_buffer_object
glsl: lower SSBO atomic intrinsics
nir: Implement lowered SSBO atomic intrinsics
i965/nir/fs: Implement SSBO atomics
i965/vec4: Implement lowered SSBO atomic intrinsics
glsl: First argument to atomic functions must be a buffer variable
mesa: Add queries for GL_SHADER_STORAGE_BUFFER
glsl: Allow use of memory qualifiers with
  ARB_shader_storage_buffer_object.
glsl: Apply memory qualifiers to buffer variables
glsl: Allow memory layout qualifiers on shader

Re: [Mesa-dev] [PATCH v3 (part2) 08/56] glsl: add support for unsized arrays in shader storage blocks

2015-08-05 Thread Iago Toral
On Tue, 2015-08-04 at 14:08 -0700, Jordan Justen wrote:
 On 2015-07-14 00:46:10, Iago Toral Quiroga wrote:
  From: Samuel Iglesias Gonsalvez sigles...@igalia.com
  
  They can only be defined in the last position of a shader
  storage block.
  
  When an unsized array is used in different shaders, it might be
  converted to arrays of different sizes; avoid getting a linker error
  in that case.
  
  v2:
  - Rework error condition and error messages (Timothy Arteri)
 
 Arceri
 
  Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
  ---
   src/glsl/ast_array_index.cpp |   5 +-
   src/glsl/ast_to_hir.cpp  |  66 ++
   src/glsl/ir.cpp  |   1 +
   src/glsl/ir.h|  14 ++
   src/glsl/linker.cpp  | 107 
  ---
   5 files changed, 155 insertions(+), 38 deletions(-)
  
  diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
  index 2c79002..8a1ae67 100644
  --- a/src/glsl/ast_array_index.cpp
  +++ b/src/glsl/ast_array_index.cpp
  @@ -182,8 +182,9 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
           if (array->type->is_array())
              update_max_array_access(array, idx, &loc, state);
        } else if (const_index == NULL && array->type->is_array()) {
  -      if (array->type->is_unsized_array()) {
  -         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
  +      if (array->type->is_unsized_array() &&
  +          array->variable_referenced()->data.mode != ir_var_shader_storage) {
  +         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
           } else if (array->type->fields.array->is_interface()
                      && array->variable_referenced()->data.mode == ir_var_uniform
                      && !state->is_version(400, 0) && !state->ARB_gpu_shader5_enable) {
  diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
  index ca30dbc..1b4ee22 100644
  --- a/src/glsl/ast_to_hir.cpp
  +++ b/src/glsl/ast_to_hir.cpp
  @@ -5599,6 +5599,19 @@ private:
  bool found;
   };
   
  +static bool
  +is_unsized_array_last_element(ir_variable *v)
  +{
  +   const glsl_type *interface_type = v->get_interface_type();
  +   int length = interface_type->length;
  +
  +   assert(v->type->is_unsized_array());
  +
  +   /* Check if it is the last element of the interface */
  +   if (strcmp(interface_type->fields.structure[length-1].name, v->name) == 0)
  +      return true;
  +   return false;
  +}
   
   ir_rvalue *
   ast_interface_block::hir(exec_list *instructions,
  @@ -5913,6 +5926,33 @@ ast_interface_block::hir(exec_list *instructions,
           if (state->stage == MESA_SHADER_GEOMETRY && var_mode == ir_var_shader_in)
              handle_geometry_shader_input_decl(state, loc, var);
   
  +      for (unsigned i = 0; i < num_variables; i++) {
  +         if (fields[i].type->is_unsized_array()) {
  +            if (var_mode == ir_var_shader_storage) {
  +               if (i != (num_variables - 1)) {
  +                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
  +                                   "only last member of a shader storage block "
  +                                   "can be defined as unsized array",
  +                                   fields[i].name);
  +               }
  +            } else {
  +               /* From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
  +                *
  +                * "If an array is declared as the last member of a shader storage
  +                * block and the size is not specified at compile-time, it is
  +                * sized at run-time. In all other cases, arrays are sized only
  +                * at compile-time."
  +                */
 
 Where is the check for 'last member' in the es path?
 
 Is this es check new? If so, should the es check be added in a patch
 before this one, and then extended for SSBO support?
 
  +               if (state->es_shader) {
  +                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
  +                                   "only last member of a shader storage block "
  +                                   "can be defined as unsized array",
  +                                   fields[i].name);
  +               }
  +            }
  +         }
  +      }
  +
  
        if (ir_variable *earlier =
            state->symbols->get_variable(this->instance_name)) {
           if (!redeclaring_per_vertex) {
  @@ -6003,6 +6043,32 @@ ast_interface_block::hir(exec_list *instructions,
              var->data.explicit_binding = this->layout.flags.q.explicit_binding;
              var->data.binding = this->layout.binding;
   
  +         if (var->type->is_unsized_array()) {
  +            if (var->is_in_shader_storage_block()) {
  +               if (!is_unsized_array_last_element(var)) {
  +                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
  +                                   "only last member of a shader storage
Re: [Mesa-dev] [PATCH v3 (part2) 09/56] glsl: Add parser/compiler support for unsized array's length()

2015-08-05 Thread Iago Toral
On Tue, 2015-08-04 at 16:04 -0700, Jordan Justen wrote:
 On 2015-08-04 15:12:06, Jordan Justen wrote:
  On 2015-07-14 00:46:11, Iago Toral Quiroga wrote:
   From: Samuel Iglesias Gonsalvez sigles...@igalia.com
   
   It also creates unop and triop expressions to tell the driver to
   calculate the unsized array length.
   
   It is needed two expressions to do the calculation:
  
  Two expressions are needed to do the calculation:
  
   * The unop expression saves the ir_rvalue* whose length should be
 calculated.
   * Afterwards, this unop is going to be processed by a lowering pass
   that will convert it to a triop that includes the block index,
  
  line 'that' up with 'Afterwards'.
  
   offset of the variable inside the shader storage block and the array
   stride. All of them are needed for length() calculation following
   GL_ARB_shader_storage_buffer spec:
   
  array.length() =
 max((buffer_object_size - offset_of_array) / stride_of_array, 0)
   
   Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
   ---
src/glsl/hir_field_selection.cpp  | 15 
   +++
src/glsl/ir.cpp   |  9 +
src/glsl/ir.h | 19 
   ++-
 
 Hmm, I added an r-b below, but I had a question.
 
 Rather than IR nodes, should this be handled with intrinsics like the
 SSBO references?

I think that in this case it is not needed. The reason why we wanted to
avoid using expressions for ssbo loads is that they were not constant
(i.e. the same ssbo load could return different values at different
places in the shader program). However, my understanding is that calling
length() on an unsized array always returns the same value. Looking at
the formula referenced above, the size of an unsized array is given by
taking the available space at the tail of the buffer (which is a fixed
constant value that won't change during shader execution), subtracting
the offset of the unsized array into the buffer (which is also fixed) and
dividing by the stride (also fixed). So in this case it looks like using
expressions is exactly what we want.
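
As a quick illustration of why the result is fixed for a given buffer
binding, here is the formula quoted above written as a standalone C++
function. This is only a sketch: unsized_array_length() and its parameter
names are invented for this example and are not part of the patch.

```cpp
#include <algorithm>
#include <cassert>

// array.length() =
//    max((buffer_object_size - offset_of_array) / stride_of_array, 0)
//
// All three inputs are fixed while the shader executes, so the result is
// the same at every call site. Sizes and offsets are in bytes.
static int unsized_array_length(int buffer_object_size,
                                int offset_of_array,
                                int stride_of_array)
{
   assert(stride_of_array > 0);
   return std::max((buffer_object_size - offset_of_array) / stride_of_array,
                   0);
}
```

For example, a 256-byte buffer whose unsized array starts at byte 64 with a
16-byte stride yields a length of 12, and a buffer smaller than the array
offset clamps to 0.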

Iago

 -Jordan
 
src/glsl/ir_validate.cpp  | 13 +
src/glsl/link_uniforms.cpp|  8 +++-
.../drivers/dri/i965/brw_fs_channel_expressions.cpp   |  2 ++
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  8 
src/mesa/program/ir_to_mesa.cpp   |  2 ++
src/mesa/state_tracker/st_glsl_to_tgsi.cpp|  5 +
9 files changed, 75 insertions(+), 6 deletions(-)
   
   diff --git a/src/glsl/hir_field_selection.cpp 
   b/src/glsl/hir_field_selection.cpp
   index 0fa9768..fedbef0 100644
   --- a/src/glsl/hir_field_selection.cpp
   +++ b/src/glsl/hir_field_selection.cpp
   @@ -71,10 +71,17 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
                _mesa_glsl_error(&loc, state, "length method takes no arguments");
    
             if (op->type->is_array()) {
   -            if (op->type->is_unsized_array())
   -               _mesa_glsl_error(&loc, state, "length called on unsized array");
   -
   -            result = new(ctx) ir_constant(op->type->array_size());
   +            if (op->type->is_unsized_array()) {
   +               if (!state->ARB_shader_storage_buffer_object_enable) {
   +                  _mesa_glsl_error(&loc, state, "length called on unsized array"
   +                                   " only available with "
   +                                   "ARB_shader_storage_buffer_object");
   +               }
   +               /* Calculate length of an unsized array in run-time */
   +               result = new(ctx) ir_expression(ir_unop_ssbo_unsized_array_length, op);
   +            } else {
   +               result = new(ctx) ir_constant(op->type->array_size());
   +            }
             } else if (op->type->is_vector()) {
                if (state->ARB_shading_language_420pack_enable) {
                   /* .length() returns int. */
   diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
   index 390e8f3..48c91ee 100644
   --- a/src/glsl/ir.cpp
   +++ b/src/glsl/ir.cpp
   @@ -340,6 +340,9 @@ ir_expression::ir_expression(int op, ir_rvalue *op0)
         this->type = glsl_type::get_instance(GLSL_TYPE_INT,
                                              op0->type->vector_elements, 1);
         break;
   +   case ir_unop_ssbo_unsized_array_length:
   +      this->type = glsl_type::int_type;
   +      break;
    
      default:
         assert(!"not reached: missing automatic type setup for ir_expression");
   @@ -471,6 +474,10 @@ ir_expression::ir_expression(int op, ir_rvalue *op0, ir_rvalue *op1,
         this->type = op1->type;
         break;
    
   +   case ir_triop_ssbo_unsized_array_length:
   +      this->type = glsl_type::int_type;
   +  break

Re: [Mesa-dev] [PATCH v3 (part2) 08/56] glsl: add support for unsized arrays in shader storage blocks

2015-08-05 Thread Iago Toral
On Tue, 2015-08-04 at 14:08 -0700, Jordan Justen wrote:
 On 2015-07-14 00:46:10, Iago Toral Quiroga wrote:
  From: Samuel Iglesias Gonsalvez sigles...@igalia.com
  
  They can only be defined in the last position of a shader
  storage block.
  
  When an unsized array is used in different shaders, it might be
  converted to arrays of different sizes; avoid getting a linker error
  in that case.
  
  v2:
  - Rework error condition and error messages (Timothy Arteri)
 
 Arceri
 
  Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
  ---
   src/glsl/ast_array_index.cpp |   5 +-
   src/glsl/ast_to_hir.cpp  |  66 ++
   src/glsl/ir.cpp  |   1 +
   src/glsl/ir.h|  14 ++
   src/glsl/linker.cpp  | 107 
  ---
   5 files changed, 155 insertions(+), 38 deletions(-)
  
  diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
  index 2c79002..8a1ae67 100644
  --- a/src/glsl/ast_array_index.cpp
  +++ b/src/glsl/ast_array_index.cpp
  @@ -182,8 +182,9 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
           if (array->type->is_array())
              update_max_array_access(array, idx, &loc, state);
        } else if (const_index == NULL && array->type->is_array()) {
  -      if (array->type->is_unsized_array()) {
  -         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
  +      if (array->type->is_unsized_array() &&
  +          array->variable_referenced()->data.mode != ir_var_shader_storage) {
  +         _mesa_glsl_error(&loc, state, "unsized array index must be constant");
           } else if (array->type->fields.array->is_interface()
                      && array->variable_referenced()->data.mode == ir_var_uniform
                      && !state->is_version(400, 0) && !state->ARB_gpu_shader5_enable) {
  diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
  index ca30dbc..1b4ee22 100644
  --- a/src/glsl/ast_to_hir.cpp
  +++ b/src/glsl/ast_to_hir.cpp
  @@ -5599,6 +5599,19 @@ private:
  bool found;
   };
   
  +static bool
  +is_unsized_array_last_element(ir_variable *v)
  +{
  +   const glsl_type *interface_type = v->get_interface_type();
  +   int length = interface_type->length;
  +
  +   assert(v->type->is_unsized_array());
  +
  +   /* Check if it is the last element of the interface */
  +   if (strcmp(interface_type->fields.structure[length-1].name, v->name) == 0)
  +      return true;
  +   return false;
  +}
   
   ir_rvalue *
   ast_interface_block::hir(exec_list *instructions,
  @@ -5913,6 +5926,33 @@ ast_interface_block::hir(exec_list *instructions,
           if (state->stage == MESA_SHADER_GEOMETRY && var_mode == ir_var_shader_in)
              handle_geometry_shader_input_decl(state, loc, var);
   
  +      for (unsigned i = 0; i < num_variables; i++) {
  +         if (fields[i].type->is_unsized_array()) {
  +            if (var_mode == ir_var_shader_storage) {
  +               if (i != (num_variables - 1)) {
  +                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
  +                                   "only last member of a shader storage block "
  +                                   "can be defined as unsized array",
  +                                   fields[i].name);
  +               }
  +            } else {
  +               /* From GLSL ES 3.10 spec, section 4.1.9 "Arrays":
  +                *
  +                * "If an array is declared as the last member of a shader storage
  +                * block and the size is not specified at compile-time, it is
  +                * sized at run-time. In all other cases, arrays are sized only
  +                * at compile-time."
  +                */
 
 Where is the check for 'last member' in the es path?
 
 Is this es check new? If so, should the es check be added in a patch
 before this one, and then extended for SSBO support?

The else branch here, where the if (state->es_shader) check is included, is
for things that are not SSBOs. What I think Samuel was trying to do here
is to produce an error for any unsized array declaration in the ES path
that is not inside an SSBO (if it is inside the SSBO, then it goes
through the if branch, which checks that it is the last member in the
definition).

If this is what he intended to do, then I agree that it would probably
make sense to have the check included in a separate patch before this
one, since it is unrelated to SSBOs, and then modify that code with this
patch to add the SSBO path included in the if branch.

I'll let Samuel have a look at this when he is back from holidays,
since he might have other reasons for doing it like this.
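
To spell out the two paths as I read them, here is a minimal standalone C++
sketch. It is only a model of the check, not the Mesa code: the Field struct
and rejected_unsized_arrays() are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one member of an interface block.
struct Field {
   std::string name;
   bool is_unsized_array;
};

// Return the names of the members that would be rejected.
//  - Inside a shader storage block: an unsized array is only accepted as
//    the last member.
//  - Anywhere else: an unsized array is rejected only for ES shaders, where
//    arrays outside SSBOs must be sized at compile time.
static std::vector<std::string>
rejected_unsized_arrays(const std::vector<Field> &fields,
                        bool is_shader_storage, bool es_shader)
{
   std::vector<std::string> errors;

   for (std::size_t i = 0; i < fields.size(); i++) {
      if (!fields[i].is_unsized_array)
         continue;

      if (is_shader_storage) {
         if (i != fields.size() - 1)
            errors.push_back(fields[i].name);
      } else if (es_shader) {
         errors.push_back(fields[i].name);
      }
   }
   return errors;
}
```

Written this way, the else branch is visibly independent of SSBOs, which is
the argument for moving it into its own patch.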

  +               if (state->es_shader) {
  +                  _mesa_glsl_error(&loc, state, "unsized array `%s' definition: "
  +                                   "only last member of a shader storage block "
  +                                   "can be defined as unsized array",
  +                                   fields

Re: [Mesa-dev] [PATCH 16/17] glsl: Add link time checks for GLSL precision qualifiers

2015-08-05 Thread Iago Toral
On Thu, 2015-07-30 at 12:33 +0200, Iago Toral wrote:
 On Wed, 2015-07-29 at 15:21 -0700, Ian Romanick wrote:
  On 07/29/2015 07:01 AM, Samuel Iglesias Gonsalvez wrote:
   From: Iago Toral Quiroga ito...@igalia.com
   
   Currently, we only consider precision qualifiers at compile-time. This
   patch adds precision information to ir_variable so we can also do link
   time checks. Specifically, from the GLSL ES3 spec, 4.5.3 Precision
   Qualifiers:
   
   "The same uniform declared in different shaders that are linked together
    must have the same precision qualification."
   
   Notice that this patch will check the above also for GLSL ES globals
   that are not uniforms. This is not explicitly stated in the spec, but
   seems to be the only consistent choice: since we can only have one
   definition of a global, all its declarations should be identical,
   including precision qualifiers.
  
  That's not right.  Global variables from different stages that are not
  inputs/outputs or uniforms are distinct... they don't even have to be
  the same type.  ES shaders only allow a single compilation unit per
  stage, so we don't have to worry about inter-stage globals.
 
 Ugh, sorry, the commit log does not do a good job of explaining the
 situation. This patch does not produce a linker error for globals that
 are not uniforms, I only meant to say that for globals *in interface
 blocks*, for which we are producing a linker error in the case of type
 mismatches, precision will also be considered to decide if the types
 mismatch.
 
 Sorry for being so imprecise in the description, I'll fix the commit
 log. I guess with this clarification there are no issues with this,
 right?

Timothy pointed out to me recently that the GLSL ES spec has this
mention:

Precision qualifiers for outputs in one shader matched to inputs in
another shader need not match when both shaders are linked into the same
program. When both shaders are in separate programs, mismatched 
precision qualifiers will result in a program interface mismatch that
will result in program pipeline validation failures, as described in
section 7.4.1 (“Shader Interface Matching”) of the OpenGL ES 3.1
Specification.

This makes things a bit more complicated I guess, in any case it makes
clear that this is not exactly what the ES spec expects, so I need to
rethink how to approach this.
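
As a starting point for that rethink, here is a small standalone C++ sketch
of the rule for the single-program case only. The enum names and
precision_link_error() are made up for this illustration, and the
separate-programs case (which the spec treats as a pipeline validation
failure) is deliberately not modeled.

```cpp
#include <cassert>

enum class Precision { None, Low, Medium, High };

// Hypothetical classification of what is being cross-validated.
enum class Kind {
   Uniform,         // same uniform declared in several linked shaders
   StageInterface   // output of one shader matched to input of the next
};

// Returns true if linking both shaders into the SAME program should fail
// because of the precision qualifiers alone.
static bool precision_link_error(Kind kind, Precision a, Precision b)
{
   if (kind == Kind::Uniform)
      return a != b;   // uniforms must have the same precision

   // Outputs matched to inputs need not match within one program.
   return false;
}
```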

   These checks don't affect desktop GLSL shaders because we ignore precision
   information in this case (all variables have precision GLSL_PRECISION_NONE).
   
   Fixes the following 5 dEQP tests:
   dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_1
   dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_2
   dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_3
   dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_4
   dEQP-GLES3.functional.shaders.linkage.uniform.block.precision_mismatch
   ---
src/glsl/linker.cpp | 34 +-
1 file changed, 33 insertions(+), 1 deletion(-)
   
   diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
   index 12b7780..fd68f43 100644
   --- a/src/glsl/linker.cpp
   +++ b/src/glsl/linker.cpp
   @@ -958,13 +958,22 @@ cross_validate_globals(struct gl_shader_program *prog,
                     if (var->type->is_record() && existing->type->is_record()
                         && existing->type->record_compare(var->type)) {
                        existing->type = var->type;
   -                  } else {
   +                  } else if (strcmp(var->type->name, existing->type->name)) {
                        linker_error(prog, "%s `%s' declared as type "
                                     "`%s' and type `%s'\n",
                                     mode_string(var),
                                     var->name, var->type->name,
                                     existing->type->name);
                        return;
   +                  } else {
   +                     /* The global is declared with the same type name but the type
   +                      * declarations mismatch (e.g. the same struct type name, but
   +                      * the actual struct declarations mismatch).
   +                      */
   +                     linker_error(prog, "%s `%s' declared with mismatching definitions "
   +                                  "of type `%s'\n",
   +                                  mode_string(var), var->name, var->type->name);
   +                     return;
                     }
                  }
               }
   @@ -1121,6 +1130,29 @@ cross_validate_globals(struct gl_shader_program *prog,
                               mode_string(var), var->name);
                  return;
               }
   +         /* From the GLSL ES3 spec, 4.5.3 Precision qualifiers:
   +          *
   +          * The same uniform declared in different shaders that are linked

Re: [Mesa-dev] [PATCH] nir: add missing type to type_size_vec4()

2015-08-05 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Wed, 2015-08-05 at 20:31 +1000, Timothy Arceri wrote:
 Cc: Iago Toral Quiroga ito...@igalia.com
 Cc: Jason Ekstrand jason.ekstr...@intel.com
 ---
  src/glsl/nir/nir_lower_io.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c
 index 3c17929..71bfd34 100644
 --- a/src/glsl/nir/nir_lower_io.c
 +++ b/src/glsl/nir/nir_lower_io.c
 @@ -62,6 +62,8 @@ type_size_vec4(const struct glsl_type *type)
   size += type_size_vec4(glsl_get_struct_field(type, i));
}
return size;
 +   case GLSL_TYPE_SUBROUTINE:
 +  return 1;
 case GLSL_TYPE_SAMPLER:
return 0;
 case GLSL_TYPE_ATOMIC_UINT:




Re: [Mesa-dev] [PATCH v4 (part2) 49/59] glsl: Do not allow assignments to read-only variables

2015-08-05 Thread Iago Toral
On Wed, 2015-08-05 at 20:04 +1000, Timothy Arceri wrote:
 On Wed, 2015-08-05 at 10:30 +0200, Iago Toral Quiroga wrote:
  ---
   src/glsl/ast_to_hir.cpp | 9 -
   1 file changed, 8 insertions(+), 1 deletion(-)
  
  diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
  index e834a46..518612d 100644
  --- a/src/glsl/ast_to_hir.cpp
  +++ b/src/glsl/ast_to_hir.cpp
  @@ -811,8 +811,15 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state,
      }
   
      ir_variable *lhs_var = lhs->variable_referenced();
  -   if (lhs_var)
  +   if (lhs_var) {
  +      if (lhs_var->data.image_read_only) {
 
 It looks like data.read_only is always set to true for images, so wouldn't this
 already be caught by the existing read-only check?
 
   else if (lhs_var != NULL && lhs_var->data.read_only) {
      _mesa_glsl_error(&lhs_loc, state,
                       "assignment to read-only variable '%s'",
                       lhs_var->name);

Not as it is now, because with SSBOs we only set image_read_only and not
read_only when the readonly qualifier is used. I suppose this is what we
are expected to do since the SSBO spec says that behavior for these
qualifiers on SSBOs is the same as for images:
https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt

Modify Section 4.10, Memory Qualifiers (p. 71)
(...)
(insert after third paragraph, p. 73) The memory qualifiers coherent,
volatile, restrict, readonly, and writeonly may be used in the
declaration of buffer variables (i.e., members of shader storage blocks).
When a buffer variable is declared with a memory qualifier, the behavior
specified for memory accesses involving image variables described above
applies identically to memory accesses involving that buffer variable.  It
is an error to assign to a buffer variable qualified with readonly or to
read from a buffer variable qualified with writeonly.

What is a bit confusing for me is that images seem to set
image_read_only depending on whether we used the readonly qualifier or
not (like ssbos) but then they also set read_only to true
unconditionally, so I guess there is a difference between both fields,
but I don't know what it is exactly, especially since you can also use
writeonly on images, for example.

In any case, since we have both read_only and image_read_only in
ir_variable at present, I think it makes sense to have checks for both
of them, if one of them ends up being redundant the right thing to do
would be to kill it completely I guess, otherwise it only gets (even)
more confusing.
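
Roughly, the combined check I have in mind looks like the standalone C++
sketch below. VariableData and assignment_is_error() are hypothetical names
used only for illustration; the struct models just the two ir_variable flags
discussed here.

```cpp
#include <cassert>

// Hypothetical model of the two flags discussed above.
struct VariableData {
   bool read_only = false;        // set unconditionally for images today
   bool image_read_only = false;  // set when 'readonly' is used (images, SSBOs)
};

// An assignment is rejected if either flag marks the variable read-only.
// For SSBO buffer variables only image_read_only is set, so checking
// read_only alone would miss them.
static bool assignment_is_error(const VariableData &data)
{
   return data.read_only || data.image_read_only;
}
```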

Iago

 
  +         _mesa_glsl_error(&lhs_loc, state,
  +                          "assignment to read-only variable `%s'",
  +                          lhs_var->name);
  +         error_emitted = true;
  +      }
         lhs_var->data.assigned = true;
  +   }
   
      if (!error_emitted) {
         if (non_lvalue_description != NULL) {
 




Re: [Mesa-dev] [PATCH v3 (part2) 09/56] glsl: Add parser/compiler support for unsized array's length()

2015-08-05 Thread Iago Toral
On Wed, 2015-08-05 at 13:38 +0300, Francisco Jerez wrote:
 Iago Toral ito...@igalia.com writes:
 
  On Tue, 2015-08-04 at 16:04 -0700, Jordan Justen wrote:
  On 2015-08-04 15:12:06, Jordan Justen wrote:
   On 2015-07-14 00:46:11, Iago Toral Quiroga wrote:
From: Samuel Iglesias Gonsalvez sigles...@igalia.com

It also creates unop and triop expressions to tell the driver to
calculate the unsized array length.

It is needed two expressions to do the calculation:
   
   Two expressions are needed to do the calculation:
   
* The unop expression saves the ir_rvalue* whose length should be
  calculated.
* Afterwards, this unop is going to be processed by a lowering pass
that will convert it to a triop that includes the block index,
   
   line 'that' up with 'Afterwards'.
   
offset of the variable inside the shader storage block and the array
stride. All of them are needed for length() calculation following
GL_ARB_shader_storage_buffer spec:

   array.length() =
  max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
---
 src/glsl/hir_field_selection.cpp  | 15 
+++
 src/glsl/ir.cpp   |  9 +
 src/glsl/ir.h | 19 
++-
  
  Hmm, I added an r-b below, but I had a question.
  
  Rather than IR nodes, should this be handled with intrinsics like the
  SSBO references?
 
  I think that in this case it is not needed. The reason why we wanted to
  avoid using expressions for ssbo loads is that they were not constant
  (i.e. the same ssbo load could return different values at different
  places in the shader program). However, my understanding is that calling
  length() on an unsized array always returns the same value. Going by
  the formula referenced above, the size of an unsized array is given by
  taking the available space at the tail of the buffer (which is a fixed
  constant value that won't change during shader execution), subtracting
  the offset of the unsized array into the buffer (which is also fixed) and
  dividing by the stride (also fixed). So in this case it looks like using
  expressions is exactly what we want.
 
 
 I guess it would be nice to implement it as an intrinsic for consistency
 with the other SSBO operations, but I guess an expression node will work
 in practice too for now for the reasons you mention.
 
 There's one thing that seems rather non-orthogonal to me.  Is there any
 reason you are hardcoding the array length formula into the instruction?
 It seems weird to have an instruction that calculates the length of an
 unsized array while the back-end basically knows nothing about the
 individual variables part of the SSBO, because they've all been lowered
 to an offset+size range within a single SSBO buffer already.  The same
 goes for the VS_OPCODE_UNSIZED_ARRAY_LENGTH back-end instruction that
 knows nothing about unsized arrays, it just returns the size of the
 whole buffer.
 
 AFAICT it would simplify both the IR and the back-end implementations if
 you had a simple get_ssbo_size intrinsic that would take a single SSBO
 index argument and return the total size of the buffer.  That would help
 because otherwise all back-ends are going to need to re-implement
 basically the same formula.

Yep, that makes more sense to me as well. For that formula, the IR
already knows the offset and the stride, it only needs the total size of
the ssbo, and that's the only thing we should have to implement in the
backends.
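
As a rough sketch of that split (standalone C++, not Mesa code; unsized_array_length is a made-up name), the only input a backend would have to supply is buffer_size, while offset and stride come from the IR:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper implementing the formula quoted from the spec:
//   max((buffer_object_size - offset_of_array) / stride_of_array, 0)
// `buffer_size` is what a backend would provide; `offset` and `stride`
// are assumed to be known already when the IR is lowered.
int64_t unsized_array_length(int64_t buffer_size, int64_t offset, int64_t stride)
{
   assert(stride > 0);
   return std::max<int64_t>((buffer_size - offset) / stride, 0);
}
```

Signed arithmetic is used on purpose so that a buffer smaller than the array offset clamps to 0 instead of wrapping around.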

That said, I see no reason for that to be an intrinsic; it can perfectly
well be a unop expression since it returns a constant value, pretty much
like UBO loads, which should be even easier than an intrinsic (mostly
because lowering to an intrinsic involves creating function signatures, etc).

Iago

  Iago
 
  -Jordan
  
 src/glsl/ir_validate.cpp  | 13 
+
 src/glsl/link_uniforms.cpp|  8 +++-
 .../drivers/dri/i965/brw_fs_channel_expressions.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  8 
 src/mesa/program/ir_to_mesa.cpp   |  2 ++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp|  5 +
 9 files changed, 75 insertions(+), 6 deletions(-)

diff --git a/src/glsl/hir_field_selection.cpp 
b/src/glsl/hir_field_selection.cpp
index 0fa9768..fedbef0 100644
--- a/src/glsl/hir_field_selection.cpp
+++ b/src/glsl/hir_field_selection.cpp
@@ -71,10 +71,17 @@ _mesa_ast_field_selection_to_hir(const ast_expression *expr,
 _mesa_glsl_error(loc, state, "length method takes no arguments");
 
  if (op->type->is_array()) {
-if (op->type->is_unsized_array

Re: [Mesa-dev] [PATCH v4 (part2) 49/59] glsl: Do not allow assignments to read-only variables

2015-08-05 Thread Iago Toral
On Wed, 2015-08-05 at 22:22 +1000, Timothy Arceri wrote:
 On Wed, 2015-08-05 at 13:45 +0200, Iago Toral wrote:
  On Wed, 2015-08-05 at 20:04 +1000, Timothy Arceri wrote:
   On Wed, 2015-08-05 at 10:30 +0200, Iago Toral Quiroga wrote:
---
 src/glsl/ast_to_hir.cpp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index e834a46..518612d 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -811,8 +811,15 @@ do_assignment(exec_list *instructions, struct _mesa_glsl_parse_state *state,
}
 
ir_variable *lhs_var = lhs->variable_referenced();
-   if (lhs_var)
+   if (lhs_var) {
+  if (lhs_var->data.image_read_only) {
   
   It looks like data.read_only is always set to true for images, so wouldn't
   this already be caught by the existing read-only check?
   
 else if (lhs_var != NULL && lhs_var->data.read_only) {
_mesa_glsl_error(lhs_loc, state,
 "assignment to read-only variable '%s'",
 lhs_var->name);
  
  Not as it is now, because with SSBOs we only set image_read_only and not
  read_only when the readonly qualifier is used. I suppose this is what we
  are expected to do since the SSBO spec says that behavior for these
  qualifiers on SSBOs is the same as for images:
  https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt
  
  Modify Section 4.10, Memory Qualifiers (p. 71)
  (...)
  (insert after third paragraph, p. 73) The memory qualifiers coherent,
  volatile, restrict, readonly, and writeonly may be used in the
  declaration of buffer variables (i.e., members of shader storage blocks).
  When a buffer variable is declared with a memory qualifier, the behavior
  specified for memory accesses involving image variables described above
  applies identically to memory accesses involving that buffer variable.  It
  is an error to assign to a buffer variable qualified with readonly or to
  read from a buffer variable qualified with writeonly.
  
  What is a bit confusing for me is that images seem to set
  image_read_only depending on whether we used the readonly qualifier or
  not (like ssbos) but then they also set read_only to true
  unconditionally, so I guess there is a difference between both fields,
 
 Asking what the difference is was originally going to be my first question to
 you :)
 
  but I don't know what it is exactly, especially since you can also use
  writeonly on images, for example.
 
 So I really don't know much about images, but after some reading the
 conclusion I've come to is that the qualifiers (image_read_only) are meant
 to limit how you can use imageStore(), imageLoad() and imageAtomic*() etc.

Looking at ARB_shader_image_load_store that seems consistent...  In that
case I imagine that we could just set read_only for buffer variables
with the readonly qualifier instead of image_read_only and drop this
patch. We will need to add, at least, write_only to ir_variable as well
I guess... I imagine that the 3 other fields (image_coherent,
image_restrict, image_volatile) do not have image-specific semantics
like image_read_only and image_write_only and can be shared with ssbos, so
we do not have to replicate them in ir_variable as well (in that case we
might want to rename them so it is clear that image_read_only and
image_write_only really are special and specific to images).

Curro, what do you think?

 On the other hand, read_only is the usual uniform restriction stopping you
 from assigning to the variable directly, e.g. myImage = 1;, which is why
 it's always set to true.
 
 If I'm correct I don't think this patch is needed.
 
  
  In any case, since we have both read_only and image_read_only in
  ir_variable at present, I think it makes sense to have checks for both
  of them, if one of them ends up being redundant the right thing to do
  would be to kill it completely I guess, otherwise it only gets (even)
  more confusing.
  
  Iago
  
   
+ _mesa_glsl_error(lhs_loc, state,
+  "assignment to read-only variable `%s'",
+  lhs_var->name);
+ error_emitted = true;
+  }
   lhs_var->data.assigned = true;
+   }
 
if (!error_emitted) {
   if (non_lvalue_description != NULL) {
   
  
  
 




Re: [Mesa-dev] [PATCH v2 05/23] glsl: Fail to link if inter-stage input/outputs are not assigned to stream 0

2015-07-30 Thread Iago Toral
On Wed, 2015-07-29 at 21:58 +0200, Marek Olšák wrote:
 Hi,
 
 Where does the spec say we should fail to link? I don't see such a
 statement there.

I have reviewed ARB_gpu_shader5 and I don't see any specific mention of
what should be done in this particular case. That said, isn't this the
logical thing to do? It is a programming error to link an FS input to a
GS output bound to a non-zero stream and at best the program would have
undefined behavior if the FS input is used. Hiding this from the
developer silently does not seem to be a good idea in any case: whatever
the developer was trying to accomplish, he is doing it wrong.

 It looks like varyings with stream > 0 should not be linked with the
 fragment shader.

How is this better?

FWIW, the proprietary nVidia driver also fails to link in this case with
this error:

output 'var_name' is associated with an input with a non-zero stream,
which is not allowed

Iago

 Marek
 
 On Wed, Jun 18, 2014 at 11:51 AM, Iago Toral Quiroga ito...@igalia.com 
 wrote:
  Outputs that are linked to inputs in the next stage must be output to 
  stream 0,
  otherwise we should fail to link.
  ---
   src/glsl/link_varyings.cpp | 8 
   1 file changed, 8 insertions(+)
 
  diff --git a/src/glsl/link_varyings.cpp b/src/glsl/link_varyings.cpp
  index 9725a43..3b20594 100644
  --- a/src/glsl/link_varyings.cpp
  +++ b/src/glsl/link_varyings.cpp
  @@ -1345,6 +1345,14 @@ assign_varying_locations(struct gl_context *ctx,
if (input_var || (prog->SeparateShader && consumer == NULL)) {
   matches.record(output_var, input_var);
}
  +
  + /* Only stream 0 outputs can be consumed in the next stage */
  + if (input_var && output_var->data.stream != 0) {
  +linker_error(prog, "output %s is assigned to stream=%d but "
  + "is linked to an input, which requires stream=0",
  + output_var->name, output_var->data.stream);
  +return false;
  + }
 }
  } else {
 /* If there's no producer stage, then this must be a separable 
  program.
  --
  1.9.1
 


Re: [Mesa-dev] [PATCH v2 05/23] glsl: Fail to link if inter-stage input/outputs are not assigned to stream 0

2015-07-30 Thread Iago Toral
On Thu, 2015-07-30 at 09:43 +0200, Marek Olšák wrote:
 On Thu, Jul 30, 2015 at 8:49 AM, Iago Toral ito...@igalia.com wrote:
  On Wed, 2015-07-29 at 21:58 +0200, Marek Olšák wrote:
  Hi,
 
  Where does the spec say we should fail to link? I don't see such a
  statement there.
 
  I have reviewed ARB_gpu_shader5 and I don't see any specific mentions to
  what should be done in this particular case. That said, isn't this the
  logical thing to do? It is a programming error to link an FS input to a
  GS output bound to a non-zero stream and at best the program would have
  undefined behavior if the FS input is used. Hiding this from the
  developer silently does not seem to be a good idea in any case, whatever
  the developer was trying to accomplish he is doing it wrong.
 
  It looks like varyings with stream > 0 should not be linked with the
  fragment shader.
 
  How is this better?
 
 The spec simply says that varyings with stream > 0 are not passed to
 the rasterizer, which means the matching fragment shader inputs should
 be uninitialized, but it should still be possible to capture the
 varyings with transform feedback. That's how I understand the spec.

FWIW, I have tested this in the proprietary nVidia driver and the result
is the same, it fails to link even if that GS output is captured by TF.

My interpretation of the spec is that since GS outputs to stream > 0 are
not passed down the pipeline they simply do not exist in the eyes of the
FS, that is, I see this situation as the same as one in which we declare an
input in the FS that is not declared as output in the GS. But it is true
that the spec does not address this situation explicitly, so I think
both interpretations could be valid.

I still think that failing to link is better though. If we report a link
failure the developer knows what is going on and the fix is trivial,
otherwise they will run into incorrect rendering, they will have to
figure out what is going on and eventually fix the code anyway...
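
For reference, the check being argued for boils down to something like the sketch below. This is a simplified standalone model, not the link_varyings.cpp code: varying_pair and first_stream_violation are made-up names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a producer output and whether the
// next stage declares a matching input for it.
struct varying_pair {
   std::string output_name;
   unsigned stream;   // stream the producer output is assigned to
   bool has_input;    // true if the consumer declares a matching input
};

// Returns the name of the first output that is consumed by the next stage
// but assigned to a non-zero stream, or an empty string if there is none.
std::string first_stream_violation(const std::vector<varying_pair> &pairs)
{
   for (const varying_pair &p : pairs) {
      if (p.has_input && p.stream != 0)
         return p.output_name;
   }
   return std::string();
}
```

An output on a non-zero stream with no matching input (for example one that is only captured by transform feedback) passes this check; only the linked case is reported.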

Since at least the nVidia proprietary driver fails to link as well in
these scenarios, I guess our chances of running into shaders that we fail
to link for this reason and were expected to link properly are pretty small
too.

Iago



Re: [Mesa-dev] [PATCH 15/17] glsl: Add precision information to ir_variable

2015-07-30 Thread Iago Toral
On Wed, 2015-07-29 at 15:16 -0700, Ian Romanick wrote:
 On 07/29/2015 07:01 AM, Samuel Iglesias Gonsalvez wrote:
  From: Iago Toral Quiroga ito...@igalia.com
  
  We will need this later on when we implement proper support for precision
  qualifiers in the drivers and also to do link time checks for uniforms as
  indicated by the spec.
  
  This patch also adds compile-time checks for variables without precision
  information (currently, Mesa only checks that a default precision is set
  for floats in fragment shaders).
  
  As indicated by Ian, the addition of the precision information to
  ir_variable has been done using a bitfield and pahole to identify an
  available hole so that memory requirements for ir_variable stay the same.
  ---
   src/glsl/ast_to_hir.cpp | 316 
  +++-
   src/glsl/glsl_types.cpp |   4 +
   src/glsl/glsl_types.h   |  12 ++
   src/glsl/ir.h   |  13 ++
   4 files changed, 288 insertions(+), 57 deletions(-)
  
  diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
  index 789b2bc..8b170c2 100644
  --- a/src/glsl/ast_to_hir.cpp
  +++ b/src/glsl/ast_to_hir.cpp
  @@ -1993,6 +1993,41 @@ process_array_type(YYLTYPE *loc, const glsl_type 
  *base,
  return array_type;
   }
   
  +static bool
  +precision_qualifier_allowed(const glsl_type *type)
 
 This function is just moved up from below?  I would have been tempted to
 put that in a separate patch to make it more obvious that there are no
 changes. *shrug*
 
  +{
  +   /* Precision qualifiers apply to floating point, integer and sampler
  +* types.
  +*
  +* Section 4.5.2 (Precision Qualifiers) of the GLSL 1.30 spec says:
  +*Any floating point or any integer declaration can have the type
  +*preceded by one of these precision qualifiers [...] Literal
  +*constants do not have precision qualifiers. Neither do Boolean
  +*variables.
  +*
  +* Section 4.5 (Precision and Precision Qualifiers) of the GLSL 1.30
  +* spec also says:
  +*
  +* Precision qualifiers are added for code portability with OpenGL
  +* ES, not for functionality. They have the same syntax as in OpenGL
  +* ES.
  +*
  +* Section 8 (Built-In Functions) of the GLSL ES 1.00 spec says:
  +*
  +* uniform lowp sampler2D sampler;
  +* highp vec2 coord;
  +* ...
  +* lowp vec4 col = texture2D (sampler, coord);
  +*// texture2D returns lowp
  +*
  +* From this, we infer that GLSL 1.30 (and later) should allow precision
  +* qualifiers on sampler types just like float and integer types.
  +*/
  +   return type->is_float()
  +   || type->is_integer()
  +   || type->is_record()
  +   || type->is_sampler();
  +}
   
   const glsl_type *
   ast_type_specifier::glsl_type(const char **name,
  @@ -2009,31 +2044,172 @@ ast_type_specifier::glsl_type(const char **name,
  return type;
   }
   
  +/**
  + * From the OpenGL ES 3.0 spec, 4.5.4 Default Precision Qualifiers:
  + *
  + * The precision statement
  + *
  + *precision precision-qualifier type;
  + *
  + *  can be used to establish a default precision qualifier. The type field 
  can
  + *  be either int or float or any of the sampler types, (...) If type is 
  float,
  + *  the directive applies to non-precision-qualified floating point type
  + *  (scalar, vector, and matrix) declarations. If type is int, the 
  directive
  + *  applies to all non-precision-qualified integer type (scalar, vector, 
  signed,
  + *  and unsigned) declarations.
  + *
  + * We use the symbol table to keep the values of the default precisions for
  + * each 'type' in each scope and we use the 'type' string from the 
  precision
  + * statement as key in the symbol table. When we want to retrieve the 
  default
  + * precision associated with a given glsl_type we need to know the type 
  string
  + * associated with it. This is what this function returns.
  + */
  +static const char *
  +get_type_name_for_precision_qualifier(const glsl_type *type)
  +{
  +   switch (type->base_type) {
  +   case GLSL_TYPE_FLOAT:
  +  return "float";
  +   case GLSL_TYPE_UINT:
  +   case GLSL_TYPE_INT:
  +  return "int";
  +   case GLSL_TYPE_SAMPLER: {
  +  bool array = type->sampler_array;
  +  bool shadow = type->sampler_shadow;
  +  switch (type->sampler_type) {
  +  case GLSL_TYPE_FLOAT:
  + switch (type->sampler_dimensionality) {
  + case GLSL_SAMPLER_DIM_1D:
  +if (!array && !shadow)
  +   return "sampler1D";
  +if (array && !shadow)
  +   return "sampler1DArray";
  +if (!array && shadow)
  +   return "sampler1DShadow";
  +return "sampler1DArrayShadow";
  + case GLSL_SAMPLER_DIM_2D:
  +if (!array && !shadow)
  +   return "sampler2D";
  +if (array && !shadow

Re: [Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-31 Thread Iago Toral
On Fri, 2015-07-31 at 13:12 +0300, Francisco Jerez wrote:
 Iago Toral ito...@igalia.com writes:
 
  On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote:
  Iago Toral Quiroga ito...@igalia.com writes:
  
   When we have code such as this:
  
   mov vgrf1.0.x:F, vgrf2.:F
   mov vgrf3.0.x:F, vgrf1.:F
   ...
   mov vgrf3.0.x:F, vgrf1.:F
  
   And vgrf1 is chosen for spilling, we can emit this:
  
   mov vgrf1.0.x:F, vgrf2.:F
   gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
   mov vgrf3.0.x:F, vgrf1.:F
   ...
   gen4_scratch_read vgrf4.0.x:F, 22D
   mov vgrf3.0.x:F, vgrf4.:F
  
   Instead of this:
  
   mov vgrf1.0.x:F, vgrf2.:F
   gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
   gen4_scratch_read vgrf4.0.x:F, 22D
   mov vgrf3.0.x:F, vgrf4.:F
   ...
   gen4_scratch_read vgrf5.0.x:F, 22D
   mov vgrf3.0.x:F, vgrf5.:F
  
   And save one scratch read while still preserving the benefits of
   spilling the register.
  
   In general, we avoid emitting scratch reads for as long as the next
   instruction keeps reading the spilled register. This should not harm the
   benefit of spilling the register because gains for register allocation
   only come when we have chunks of program code where the register is alive
   but not really used (because these are the points where we could
   effectively use that register for another purpose if we spilled it), so
   as long as consecutive instructions use that register we can avoid the
   scratch reads without losing anything.
   ---
.../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 37 
   +-
1 file changed, 36 insertions(+), 1 deletion(-)
  
   diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
   b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
   index cff5406..fd56dae 100644
   --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
   +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
   @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
   unsigned int spill_offset = last_scratch++;

   /* Generate spill/unspill instructions for the objects being spilled. */
   +   vec4_instruction *spill_write_inst = NULL;
   foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
   +  /* We don't spill registers used for scratch */
   +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
   +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
   + continue;
   +
  int scratch_reg = -1;
   +  bool spill_reg_was_read = false;
  for (unsigned int i = 0; i < 3; i++) {
 if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) {
   -if (scratch_reg == -1) {
   +if (!spill_reg_was_read) {
   +   spill_reg_was_read = (!inst->predicate ||
   + inst->opcode == BRW_OPCODE_SEL);
   +}
   +
   +/* If we are reading the spilled register right after writing
   + * to it we can skip the scratch read and use directly the
   + * register we used as source for the scratch write. For this
   + * to work we must check that:
   + *
   + * 1) The write is unconditional, that is, it is not predicated or
   + *it is a SEL.
   + * 2) All the channels that we read have been written in that
   + *last write instruction.
   + *
   + * We keep doing this for as long as the next instruction
   + * keeps reading the spilled register and break as soon as we
   + * find an instruction that doesn't.
   + */
   +if (spill_write_inst &&
   +(!spill_write_inst->predicate ||
   + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
   +((brw_mask_for_swizzle(inst->src[i].swizzle) &
   + ~spill_write_inst->dst.writemask) == 0)) {
   +   scratch_reg = spill_write_inst->dst.reg;
   +} else if (scratch_reg == -1) {
  
  One suggestion: You could factor out the rather complex caching logic
  into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
  vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
  would simply compare scratch_reg with the sources of the current
  instruction (up to src) and the sources and destination of the previous
  non-scratch_read/write instruction.  If there's a match it would check
  that the regioning is compatible with the i-th source and return true in
  that case.  This would have several benefits:
 
  I think this might need to be a bit more complex. The previous inst's
  src[i] might read only a subset of the channels that were loaded into
  scratch_reg so comparing only against that can lead us to think that we
  can't reuse scratch_reg when in fact we can.
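
  To illustrate the point, a standalone sketch of the channel test, comparing
  against the channels available in scratch_reg rather than against what the
  previous source happened to read. This is not the i965 code: the swizzle
  encoding below (four 2-bit selectors, x=0 .. w=3) and the names
  make_swizzle, channels_read and read_is_covered are assumptions made for
  the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding for this sketch: four 2-bit channel selectors packed into
// one byte, component i in bits [2i, 2i+1], with x=0, y=1, z=2, w=3.
constexpr uint8_t make_swizzle(unsigned a, unsigned b, unsigned c, unsigned d)
{
   return static_cast<uint8_t>(a | (b << 2) | (c << 4) | (d << 6));
}

// Bitmask (bit 0 = x ... bit 3 = w) of the channels a swizzle reads.
unsigned channels_read(uint8_t swizzle)
{
   unsigned mask = 0;
   for (unsigned i = 0; i < 4; i++)
      mask |= 1u << ((swizzle >> (2 * i)) & 3);
   return mask;
}

// True if every channel read through `swizzle` is present in `available`,
// where `available` is the mask of channels known to be valid in the
// cached register.
bool read_is_covered(uint8_t swizzle, unsigned available)
{
   assert(available <= 0xf);
   return (channels_read(swizzle) & ~available) == 0;
}
```

  With this, a .xyxy read is covered by a register holding x and y, even if
  the previous instruction only read .xxxx from it.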
 
  I think the process

Re: [Mesa-dev] [PATCH 16/17] glsl: Add link time checks for GLSL precision qualifiers

2015-07-30 Thread Iago Toral
On Wed, 2015-07-29 at 15:21 -0700, Ian Romanick wrote:
 On 07/29/2015 07:01 AM, Samuel Iglesias Gonsalvez wrote:
  From: Iago Toral Quiroga ito...@igalia.com
  
  Currently, we only consider precision qualifiers at compile-time. This patch
  adds precision information to ir_variable so we can also do link time 
  checks.
  Specifically, from the GLSL ES3 spec, 4.5.3 Precision Qualifiers:
  
  The same uniform declared in different shaders that are linked together
   must have the same precision qualification.
  
  Notice that this patch will check the above also for GLSL ES globals that 
  are
  not uniforms. This is not explicitly stated in the spec, but seems to be
  the only consistent choice since we can only have one definition of a global
  all its declarations should be identical, including precision qualifiers.
 
 That's not right.  Global variables from different stages that are not
 inputs/outputs or uniforms are distinct... they don't even have to be
 the same type.  ES shaders only allow a single compilation unit per
 stage, so we don't have to worry about inter-stage globals.

Ugh, sorry, the commit log does not do a good job of explaining the
situation. This patch does not produce a linker error for globals that
are not uniforms, I only meant to say that for globals *in interface
blocks*, for which we are producing a linker error in the case of type
mismatches, precision will also be considered to decide if the types
mismatch.
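
For the plain uniform rule quoted from the spec, the check amounts to roughly the sketch below. It is a standalone model, not the linker.cpp code: the precision enum, uniform_decl and uniform_precisions_match are made-up names, and it only covers uniforms, not the interface block case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical precision values; PRECISION_NONE models desktop GLSL, where
// precision is ignored and every variable ends up with the same value.
enum precision { PRECISION_NONE, PRECISION_LOW, PRECISION_MEDIUM, PRECISION_HIGH };

struct uniform_decl {
   std::string name;
   precision prec;
};

// Returns false as soon as the same uniform name is seen with two different
// precisions across the shaders being linked together.
bool uniform_precisions_match(const std::vector<std::vector<uniform_decl>> &shaders)
{
   std::map<std::string, precision> seen;
   for (const auto &shader : shaders) {
      for (const uniform_decl &u : shader) {
         auto it = seen.find(u.name);
         if (it == seen.end())
            seen[u.name] = u.prec;
         else if (it->second != u.prec)
            return false;
      }
   }
   return true;
}
```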

Sorry for being so imprecise in the description, I'll fix the commit
log. I guess with this clarification there are no issues with this,
right?

Iago

  These checks don't affect desktop GLSL shaders because we ignore precision
  information in this case (all variables have precision GLSL_PRECISION_NONE).
  
  Fixes the following 5 dEQP tests:
  dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_1
  dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_2
  dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_3
  dEQP-GLES3.functional.shaders.linkage.uniform.struct.precision_conflict_4
  dEQP-GLES3.functional.shaders.linkage.uniform.block.precision_mismatch
  ---
   src/glsl/linker.cpp | 34 +-
   1 file changed, 33 insertions(+), 1 deletion(-)
  
  diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
  index 12b7780..fd68f43 100644
  --- a/src/glsl/linker.cpp
  +++ b/src/glsl/linker.cpp
  @@ -958,13 +958,22 @@ cross_validate_globals(struct gl_shader_program *prog,
 if (var->type->is_record() && existing->type->is_record()
  && existing->type->record_compare(var->type)) {
existing->type = var->type;
  -  } else {
  +  } else if (strcmp(var->type->name, existing->type->name)) {
linker_error(prog, "%s `%s' declared as type "
 "`%s' and type `%s'\n",
 mode_string(var),
 var->name, var->type->name,
 existing->type->name);
return;
  +  } else {
  + /* The global is declared with the same type name but the type
  +  * declarations mismatch (e.g. the same struct type name, but
  +  * the actual struct declarations mismatch).
  +  */
  + linker_error(prog, "%s `%s' declared with mismatching definitions "
  +  "of type `%s'\n",
  +  mode_string(var), var->name, var->type->name);
  + return;
 }
 }
  }
  @@ -1121,6 +1130,29 @@ cross_validate_globals(struct gl_shader_program *prog,
   mode_string(var), var->name);
  return;
   }
  +/* From the GLSL ES3 spec, 4.5.3 Precision qualifiers:
  + *
  + * "The same uniform declared in different shaders that are linked
  + *  together must have the same precision qualification."
  + *
  + * In the GLSL ES2 spec this was resolved in the issue amendments
  + * (10.3 Precision Qualifiers). The GLSL ES1 spec overlooked this,
  + * but seems like an obvious error since we can only have one
  + * consistent definition of a global.
  + *
  + * The desktop GLSL spec does not include this reference
  + * because precision qualifiers are ignored. We will never
  + * hit this scenario in desktop GLSL though because we always set
  + * the precision of variables to GLSL_PRECISION_NONE.
  + */
  +if (var->data.mode == ir_var_uniform) {
  +   if (existing->data.precision != var->data.precision

Re: [Mesa-dev] [PATCH] mesa/formats: Fix swizzle flipping for big-endian targets

2015-08-11 Thread Iago Toral

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On 2015-08-11 14:25, Oded Gabbay wrote:
On Mon, Aug 10, 2015 at 9:50 AM, Jason Ekstrand ja...@jlekstrand.net wrote:

The swizzle defines where in the format you should look for any given
channel.  When we flip the format around for BE targets, we need to change
the destinations of the swizzles, not the sources.  For example, say the
format is an RGBX format with a swizzle of xyz1 on LE.  Then it should be
wzy1 on BE;  however, the code as it was before, would have made it 1zyx on
BE which is clearly wrong.
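
The xyz1 example can be reproduced with a small standalone sketch. This is not the formats.c code: swizzle4, flip_sources and flip_destinations are made-up names, and the swizzle values are assumed to be 0=x, 1=y, 2=z, 3=w, 4=zero, 5=one, matching the 6-entry tables in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Swizzle values assumed here: 0=x, 1=y, 2=z, 3=w, 4=zero, 5=one.
using swizzle4 = std::array<uint8_t, 4>;

// Previous behavior: reverse the order of the swizzle entries (the sources).
swizzle4 flip_sources(const swizzle4 &s)
{
   return {{ s[3], s[2], s[1], s[0] }};
}

// Fixed behavior: keep the entry order and remap what each entry points at
// (the destinations). The constants zero and one map to themselves.
swizzle4 flip_destinations(const swizzle4 &s)
{
   static const uint8_t flip[6] = { 3, 2, 1, 0, 4, 5 };
   swizzle4 out;
   for (unsigned i = 0; i < 4; i++) {
      assert(s[i] < 6);
      out[i] = flip[s[i]];
   }
   return out;
}
```

For xyz1, i.e. {0, 1, 2, 5}, flip_destinations gives {3, 2, 1, 5} (wzy1), while flip_sources gives {5, 2, 1, 0} (1zyx).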

Cc: Iago Toral ito...@igalia.com
Cc: Oded Gabbay oded.gab...@gmail.com
---
 src/mesa/main/formats.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index d927073..27590ed 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -354,14 +354,22 @@ _mesa_array_format_flip_channels(mesa_array_format format)
   return format;

if (num_channels == 2) {
-  _mesa_array_format_set_swizzle(format, swizzle[1], swizzle[0],
- swizzle[2], swizzle[3]);
+  /* Assert that the swizzle makes sense for 2 channels */
+  for (unsigned i = 0; i < 4; i++)
+ assert(swizzle[i] != 2 && swizzle[i] != 3);
+
+  static const uint8_t flip_xy[6] = { 1, 0, 2, 3, 4, 5 };
+  _mesa_array_format_set_swizzle(format,
+ flip_xy[swizzle[0]], flip_xy[swizzle[1]],
+ flip_xy[swizzle[2]], flip_xy[swizzle[3]]);
   return format;
}

if (num_channels == 4) {
-  _mesa_array_format_set_swizzle(format, swizzle[3], swizzle[2],
- swizzle[1], swizzle[0]);
+  static const uint8_t flip[6] = { 3, 2, 1, 0, 4, 5 };
+  _mesa_array_format_set_swizzle(format,
+ flip[swizzle[0]], flip[swizzle[1]],
+ flip[swizzle[2]], flip[swizzle[3]]);
   return format;
}

--
2.4.3



Reviewed-by: Oded Gabbay oded.gab...@gmail.com



Re: [Mesa-dev] [PATCH] mesa/formats: Only do byteswapping for packed formats

2015-08-11 Thread Iago Toral

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On 2015-08-08 18:04, Jason Ekstrand wrote:

Cc: Iago Toral ito...@igalia.com
Cc: Oded Gabbay oded.gab...@gmail.com
---
 src/mesa/main/formats.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c
index baeb1bf..d927073 100644
--- a/src/mesa/main/formats.c
+++ b/src/mesa/main/formats.c
@@ -372,10 +372,10 @@ uint32_t
 _mesa_format_to_array_format(mesa_format format)
 {
const struct gl_format_info *info = _mesa_get_format_info(format);
-   if (_mesa_little_endian())
-  return info->ArrayFormat;
-   else
+   if (!_mesa_little_endian() && info->Layout == MESA_FORMAT_LAYOUT_PACKED)
   return _mesa_array_format_flip_channels(info->ArrayFormat);
+   else
+  return info->ArrayFormat;
 }

 static struct hash_table *format_array_format_table;



Re: [Mesa-dev] [PATCH v4 (part2) 29/59] nir: Implement __intrinsic_store_ssbo

2015-08-06 Thread Iago Toral
On Thu, 2015-08-06 at 11:06 -0700, Connor Abbott wrote:
 On Thu, Aug 6, 2015 at 12:30 AM, Iago Toral ito...@igalia.com wrote:
  On Wed, 2015-08-05 at 12:17 -0700, Connor Abbott wrote:
  On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com 
  wrote:
   ---
src/glsl/nir/glsl_to_nir.cpp  | 36 
src/glsl/nir/nir_intrinsics.h | 12 ++--
2 files changed, 42 insertions(+), 6 deletions(-)
  
   diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
   index 642affd..cbec2df 100644
   --- a/src/glsl/nir/glsl_to_nir.cpp
   +++ b/src/glsl/nir/glsl_to_nir.cpp
   @@ -641,6 +641,8 @@ nir_visitor::visit(ir_call *ir)
 op = nir_intrinsic_image_atomic_comp_swap;
  } else if (strcmp(ir->callee_name(), "__intrinsic_memory_barrier") == 0) {
 op = nir_intrinsic_memory_barrier;
   +  } else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == 0) {
   + op = nir_intrinsic_store_ssbo;
  } else {
 unreachable("not reached");
  }
   @@ -730,6 +732,40 @@ nir_visitor::visit(ir_call *ir)
  }
  case nir_intrinsic_memory_barrier:
 break;
   +  case nir_intrinsic_store_ssbo: {
   + exec_node *param = ir->actual_parameters.get_head();
   + ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
   +
   + param = param->get_next();
   + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
   +
   + param = param->get_next();
   + ir_rvalue *val = ((ir_instruction *)param)->as_rvalue();
   +
   + param = param->get_next();
   + ir_constant *write_mask = ((ir_instruction *)param)->as_constant();
   + assert(write_mask);
   +
   + /* Check if we need the indirect version */
   + ir_constant *const_offset = offset->as_constant();
   + if (!const_offset) {
   +op = nir_intrinsic_store_ssbo_indirect;
   +ralloc_free(instr);
   +instr = nir_intrinsic_instr_create(shader, op);
   +instr->src[2] = evaluate_rvalue(offset);
   +instr->const_index[0] = 0;
   + } else {
   +instr->const_index[0] = const_offset->value.u[0];
   + }
   +
   + instr->const_index[1] = write_mask->value.u[0];
   +
   + instr->src[0] = evaluate_rvalue(val);
   + instr->num_components = val->type->vector_elements;
   +
   + instr->src[1] = evaluate_rvalue(block);
   + break;
   +  }
  default:
 unreachable("not reached");
  }
   diff --git a/src/glsl/nir/nir_intrinsics.h 
   b/src/glsl/nir/nir_intrinsics.h
   index f264f55..83eeecd 100644
   --- a/src/glsl/nir/nir_intrinsics.h
   +++ b/src/glsl/nir/nir_intrinsics.h
   @@ -176,12 +176,12 @@ LOAD(input, 0, NIR_INTRINSIC_CAN_ELIMINATE | 
   NIR_INTRINSIC_CAN_REORDER)
 * offset.
 */
  
   -#define STORE(name, num_indices, flags) \
   -   INTRINSIC(store_##name, 1, ARR(0), false, 0, 0, num_indices, flags) \
   -   INTRINSIC(store_##name##_indirect, 2, ARR(0, 1), false, 0, 0, \
   +#define STORE(name, extra_srcs, num_indices, flags) \
   +   INTRINSIC(store_##name, extra_srcs, ARR(0, 1), false, 0, 0, num_indices, flags) \
   +   INTRINSIC(store_##name##_indirect, extra_srcs + 1, ARR(0, 1, 1), false, 0, 0, \
 num_indices, flags) \
  
   -STORE(output, 1, 0)
   -/* STORE(ssbo, 2, 0) */
   +STORE(output, 1, 2, 0)
   +STORE(ssbo, 2, 2, 0)
 
  I don't think outputs should have any extra sources, since they only
  take a constant index, plus possibly an indirect source that's already
  covered by the STORE macro. SSBO stores should only have one extra
  source for the block index. Also, we should update the comment above
  to explain this similarly to the paragraph above the loads.
 
  SSBO stores need an extra source for the block index and an extra index
  for a writemask.
 
  I'll leave the STORE() macro as it was and just define SSBO stores using
  INTRINSIC() directly then.
 
 Ok, I see. I don't think you need a separate INTRINSIC(), but right
 now calling the parameter you added extra_srcs is confusing, since
 you're counting the value to be stored, which isn't really extra at
 all -- every store should have one! How about instead, we change the
 STORE macro to have:
 
 - An extra_srcs parameter that contains only sources that are
 actually extra, not counting the value to be stored -- direct stores
 have extra_srcs + 1 sources, and indirect stores have extra_srcs +
 2 sources
 - An extra_indices parameter that contains the extra indices, and
 replace num_indices with extra_indices + 1
 
 Then normal stores have both set to 0, and SSBO stores have both set
 to 1 to indicate the extra block index and writemask.

Sure, sounds good to me.

Iago
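
The macro shape agreed on above can be sketched as follows. This is only an illustration, not the code that landed: the INTRINSIC() here is a hypothetical stand-in that records just a name, a source count and an index count, and the flags, ARR() sizes and destination fields of the real macro are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced table entry: only the fields under discussion.
struct intrinsic_info {
   std::string name;
   unsigned num_srcs;
   unsigned num_indices;
};

static std::vector<intrinsic_info> intrinsics;

// Simplified stand-in for INTRINSIC(): appends one entry to the table.
#define INTRINSIC(name, num_srcs, num_indices) \
   intrinsics.push_back({#name, (num_srcs), (num_indices)});

// STORE() as proposed in the review: extra_srcs and extra_indices count
// only what goes beyond the stored value and the base constant index.
#define STORE(name, extra_srcs, extra_indices) \
   INTRINSIC(store_##name, (extra_srcs) + 1, (extra_indices) + 1) \
   INTRINSIC(store_##name##_indirect, (extra_srcs) + 2, (extra_indices) + 1)

static void build_table()
{
   intrinsics.clear();
   STORE(output, 0, 0) // value (+ indirect offset), one constant index
   STORE(ssbo, 1, 1)   // extra block-index source, extra writemask index
}

// Look up an entry by name; returns nullptr if it is not in the table.
static const intrinsic_info *find(const char *name)
{
   for (const intrinsic_info &info : intrinsics)
      if (info.name == name)
         return &info;
   return nullptr;
}
```

With this shape a plain output store keeps one source and one index, while the SSBO variants pick up the block index and the writemask without the value to be stored being counted as "extra".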

 
  
   -LAST_INTRINSIC(store_output_indirect)
   +LAST_INTRINSIC(store_ssbo_indirect)
   --
   1.9.1

Re: [Mesa-dev] [PATCH v3 4/4] i965/vec4: Don't unspill the same register in consecutive instructions

2015-08-07 Thread Iago Toral
On Thu, 2015-08-06 at 18:27 +0300, Francisco Jerez wrote:
 Iago Toral Quiroga ito...@igalia.com writes:
 
  If we have spilled/unspilled a register in the current instruction, avoid
  emitting unspills for the same register in the same instruction or
  consecutive instructions following the current one as long as they keep
  reading the spilled register. This should allow us to avoid emitting costly
  unspills that come with little benefit to register allocation.
 
  Also, update evaluate_spill_costs so that we account for the saved unspills.
  ---
   .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 129 
  +++--
   1 file changed, 121 insertions(+), 8 deletions(-)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
  b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  index 617c988..fed5f4d 100644
  --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  @@ -264,6 +264,95 @@ vec4_visitor::reg_allocate()
  return true;
   }
   
  +/**
  + * When we decide to spill a register, instead of blindly spilling every use,
  + * save unspills when the spill register is used (read) in consecutive
  + * instructions. This can potentially save a bunch of unspills that would
  + * have very little impact in register allocation anyway.
  + *
  + * Notice that we need to account for this behavior when spilling a register
  + * and when evaluating spilling costs. This function is designed so it can
  + * be called from both places and avoid repeating the logic.
  + *
  + *  - When we call this function from spill_reg, we pass in scratch_reg the
  + *    actual unspill/spill register that we want to reuse in the current
  + *    instruction.
  + *
  + *  - When we call this from evaluate_spill_costs, we pass the register for
  + *    which we are evaluating spilling costs.
  + *
  + * In either case, we check if the previous instructions read scratch_reg until
  + * we find an instruction that writes to it (in which case we can reuse
  + * scratch_reg as long as the writemask is compatible with the channels we need
  + * to read in the current instruction) or we hit an instruction that does not
  + * read scratch_reg at all. The latter can only happen when we call this from
  + * evaluate_spill_costs,
 
 Strictly speaking it can also happen when called from spill_reg() for
 the first time in a given sequence of consecutive instructions (in which
 case you correctly return false).

True, I'll fix the comment.

   and means that this is the point at which we first
  + * need to unspill this register for our current instruction. Since all our
  + * unspills read a full vec4, we know that in this case we will have all
  + * the channels available in scratch_reg and we can reuse it.
  + *
  + * In any other case, we can't reuse scratch_reg in the current instruction,
  + * meaning that we will need to unspill it.
  + */
  +static bool
  +can_use_scratch_for_source(const vec4_instruction *inst, unsigned i,
  +   unsigned scratch_reg)
  +{
  +   assert(inst->src[i].file == GRF);
  +
  +   /* If the current instruction is already using scratch_reg in src[n] with
  +    * n < i, then we know we can reuse it for src[i] too.
  +    */
  +   for (unsigned n = 0; n < i; n++) {
  +      if (inst->src[n].file == GRF && inst->src[n].reg == scratch_reg)
  +         return true;
  +   }
 
 I don't think this is correct in cases where the previous source reused
 the temporary of a previously spilled register with incompatible
 writemask.  You probably want to handle the current instruction
 consistently with the previous ones, i.e. as part of the loop below.

Yes, you're right.

 I suggest you define a variable (e.g. n as you've called it) initially
 equal to i that would determine the number of sources to check for the
 next instruction.  At the end of the loop body it would be reset to 3,
 which would also cause the destination registers to be checked in
 subsequent iterations.

Sounds good to me.
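
A rough model of the loop shape suggested above, for illustration only. The Operand and Inst types, the grf() helper and can_reuse_scratch() are hypothetical simplifications and not the driver's classes; predication, register sizes and swizzles are ignored. The current instruction is scanned with n = i, so only its earlier sources are looked at, and every older instruction is scanned with n = 3, which is also when its destination is checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced operand: register file flag, number and writemask.
struct Operand {
   bool is_grf = false;
   unsigned reg = 0;
   unsigned writemask = 0xf; // only meaningful for destinations
};

struct Inst {
   Operand dst;
   Operand src[3];
};

static Operand grf(unsigned reg, unsigned writemask = 0xf)
{
   Operand op;
   op.is_grf = true;
   op.reg = reg;
   op.writemask = writemask;
   return op;
}

static bool uses(const Operand &op, unsigned reg)
{
   return op.is_grf && op.reg == reg;
}

// Can src[i] of insts[cur] reuse scratch_reg without a new unspill?
static bool can_reuse_scratch(const std::vector<Inst> &insts, std::size_t cur,
                              unsigned i, unsigned scratch_reg,
                              unsigned needed_mask)
{
   assert(cur < insts.size() && i < 3);

   bool seen_read = false; // a later position already read scratch_reg
   unsigned n = i;         // sources to inspect in the current instruction

   for (std::size_t ip = cur + 1; ip-- > 0; n = 3) {
      const Inst &inst = insts[ip];
      const bool is_prev = (ip != cur);

      // A previous write decides the question: its writemask must cover
      // every channel we need to read.
      if (is_prev && uses(inst.dst, scratch_reg))
         return (inst.dst.writemask & needed_mask) == needed_mask;

      bool reads = false;
      for (unsigned s = 0; s < n; s++)
         reads = reads || uses(inst.src[s], scratch_reg);

      if (reads)
         seen_read = true;
      else if (is_prev)
         return seen_read; // chain of consecutive readers ends here
   }

   return seen_read;
}
```

In this model an earlier source of the same instruction no longer short-circuits to true: if the chain of readers leads back to a write with an incompatible writemask, the answer is false, which is the case raised in the review.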

  +
  +   bool prev_inst_read_scratch_reg = false;
  +   vec4_instruction *prev_inst = (vec4_instruction *) inst->prev;
 
 You can move this declaration into the init statement of the for loop to
 limit its scope.

Ok.

  +   for (; !prev_inst->is_head_sentinel();
  +        prev_inst = (vec4_instruction *) prev_inst->prev) {
  +      /* If any previous instruction does not read from or write to scratch_reg
  +       * unconditionally we cannot reuse scratch_reg
  +       */
  +      if (prev_inst->predicate && prev_inst->opcode != BRW_OPCODE_SEL)
  +         return false;
 
 I think this is somewhat pessimistic, register fills for a predicated
 instruction won't be predicated AFAIK, so it should be possible to reuse
 them, only the destination of a predicated write cannot be reused.

Yeah, makes sense.

  +
  +  /* If the previous instruction writes to scratch_reg then we can 
  reuse

Re: [Mesa-dev] [PATCH v4 (part2) 45/59] glsl: fix UNIFORM_BUFFER_START or UNIFORM_BUFFER_SIZE query when no buffer object is bound

2015-08-06 Thread Iago Toral
On Thu, 2015-08-06 at 08:53 +0300, Tapani Pälli wrote:
 Reviewed-by: Tapani Pälli tapani.pa...@intel.com
 
 On 08/05/2015 11:30 AM, Iago Toral Quiroga wrote:
  From: Samuel Iglesias Gonsalvez sigles...@igalia.com
 
  According to ARB_uniform_buffer_object spec:
 
  If the parameter (starting offset or size) was not specified when the
buffer object was bound (e.g. if bound with BindBufferBase), or if no
buffer object is bound to index, zero is returned.
 
 BTW it seems that this was changed from OpenGL 4.1 to 4.2 (?) for
 UNIFORM_BUFFER_SIZE. The 4.1 core spec still says

 "If the parameter (starting offset or size) was not specified when the
 buffer object was bound, zero is returned. If no buffer object is bound
 to index, -1 is returned"

 while 4.2 changed the "If no buffer object is bound to index" value to 0;
 GLES 3.1 also says 0.

Ha, good catch... It seems that the text in the OpenGL 4.2 spec is the
one that persisted, at least I see that in the OpenGL 4.5 spec as well.
I guess it is not worth special-casing this only for OpenGL 4.1...

Ian, what do you think?
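
For reference, the behaviour the patch below implements (the OpenGL 4.2+ wording) can be modelled with a small sketch. The buffer_binding struct, the binding_query enum and query_binding() are hypothetical names used only here, and the use of a negative value to mark "no buffer bound or range not specified" is an assumption inferred from the `< 0` test in the patch.

```cpp
#include <cassert>

// Hypothetical, simplified binding record. A negative value stands for
// "no buffer bound" or "range not specified" (an assumption of this sketch).
struct buffer_binding {
   long offset;
   long size;
};

enum class binding_query { start, size };

// Value reported to the application: negative sentinels are reported as 0,
// anything else is returned unchanged.
static int query_binding(const buffer_binding &b, binding_query q)
{
   const long v = (q == binding_query::start) ? b.offset : b.size;
   return v < 0 ? 0 : static_cast<int>(v);
}
```

Special-casing OpenGL 4.1 would mean returning -1 instead of 0 for the "no buffer bound" case on that version only, which is the extra branch being questioned above.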

  Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
  ---
src/mesa/main/get.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
 
  diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
  index e2bfb5f..b89c49f 100644
  --- a/src/mesa/main/get.c
  +++ b/src/mesa/main/get.c
  @@ -1925,7 +1925,8 @@ find_value_indexed(const char *func, GLenum pname, 
  GLuint index, union value *v)
   goto invalid_value;
  if (!ctx->Extensions.ARB_uniform_buffer_object)
   goto invalid_enum;
  -  v->value_int = ctx->UniformBufferBindings[index].Offset;
  +  v->value_int = ctx->UniformBufferBindings[index].Offset < 0 ? 0 :
  + ctx->UniformBufferBindings[index].Offset;
  return TYPE_INT;

   case GL_UNIFORM_BUFFER_SIZE:
  @@ -1933,7 +1934,8 @@ find_value_indexed(const char *func, GLenum pname, 
  GLuint index, union value *v)
   goto invalid_value;
  if (!ctx->Extensions.ARB_uniform_buffer_object)
   goto invalid_enum;
  -  v->value_int = ctx->UniformBufferBindings[index].Size;
  +  v->value_int = ctx->UniformBufferBindings[index].Size < 0 ? 0 :
  + ctx->UniformBufferBindings[index].Size;
  return TYPE_INT;

   /* ARB_shader_storage_buffer_object */
 
 


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4 (part2) 44/59] mesa: Add queries for GL_SHADER_STORAGE_BUFFER

2015-08-06 Thread Iago Toral
On Wed, 2015-08-05 at 12:23 -0400, Ilia Mirkin wrote:
 On Wed, Aug 5, 2015 at 4:30 AM, Iago Toral Quiroga ito...@igalia.com wrote:
  These handle querying the buffer name attached to a given binding point
  as well as the start offset and size of that buffer.
  ---
   src/mesa/main/get.c | 31 +++
   1 file changed, 31 insertions(+)
 
  diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
  index 307a5ff..e2bfb5f 100644
  --- a/src/mesa/main/get.c
  +++ b/src/mesa/main/get.c
  @@ -1044,6 +1044,10 @@ find_custom_value(struct gl_context *ctx, const 
  struct value_desc *d, union valu
  case GL_UNIFORM_BUFFER_BINDING:
     v->value_int = ctx->UniformBuffer->Name;
     break;
  +   /* GL_ARB_shader_storage_buffer_object */
  +   case GL_SHADER_STORAGE_BUFFER_BINDING:
  +      v->value_int = ctx->ShaderStorageBuffer->Name;
  +      break;
  /* GL_ARB_timer_query */
  case GL_TIMESTAMP:
     if (ctx->Driver.GetTimestamp) {
  @@ -1932,6 +1936,33 @@ find_value_indexed(const char *func, GLenum pname, GLuint index, union value *v)
     v->value_int = ctx->UniformBufferBindings[index].Size;
     return TYPE_INT;
 
  +   /* ARB_shader_storage_buffer_object */
  +   case GL_SHADER_STORAGE_BUFFER_BINDING:
  +      if (index >= ctx->Const.MaxShaderStorageBufferBindings)
  +         goto invalid_value;
  +      if (!ctx->Extensions.ARB_shader_storage_buffer_object)
  +         goto invalid_enum;
 
 It definitely strikes me as odd to check the ext second, after
 other things. Is there a reason for doing it this way? (Same below.)

It is what we do for everything else in this file, so we just followed
the example. I suppose it does not really matter much in the end, but if
we think it does we should probably fix all cases together in a separate
patch.

Iago

  +      v->value_int = ctx->ShaderStorageBufferBindings[index].BufferObject->Name;
  +      return TYPE_INT;
  +
  +   case GL_SHADER_STORAGE_BUFFER_START:
  +      if (index >= ctx->Const.MaxShaderStorageBufferBindings)
  +         goto invalid_value;
  +      if (!ctx->Extensions.ARB_shader_storage_buffer_object)
  +         goto invalid_enum;
  +      v->value_int = ctx->ShaderStorageBufferBindings[index].Offset < 0 ? 0 :
  +         ctx->ShaderStorageBufferBindings[index].Offset;
  +      return TYPE_INT;
  +
  +   case GL_SHADER_STORAGE_BUFFER_SIZE:
  +      if (index >= ctx->Const.MaxShaderStorageBufferBindings)
  +         goto invalid_value;
  +      if (!ctx->Extensions.ARB_shader_storage_buffer_object)
  +         goto invalid_enum;
  +      v->value_int = ctx->ShaderStorageBufferBindings[index].Size < 0 ? 0 :
  +         ctx->ShaderStorageBufferBindings[index].Size;
  +      return TYPE_INT;
  +
  /* ARB_texture_multisample / GL3.2 */
  case GL_SAMPLE_MASK_VALUE:
 if (index != 0)
  --
  1.9.1
 


Re: [Mesa-dev] [PATCH v4 (part2) 14/59] nir: add shader storage buffer's unsized array length calculation

2015-08-06 Thread Iago Toral
On Wed, 2015-08-05 at 11:59 -0700, Connor Abbott wrote:
 On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote:
  From: Samuel Iglesias Gonsalvez sigles...@igalia.com
 
  Signed-off-by: Samuel Iglesias Gonsalvez sigles...@igalia.com
  ---
   src/glsl/nir/glsl_to_nir.cpp  | 10 ++
   src/glsl/nir/nir_intrinsics.h |  9 +
   2 files changed, 19 insertions(+)
 
  diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
  index 77327b6..642affd 100644
  --- a/src/glsl/nir/glsl_to_nir.cpp
  +++ b/src/glsl/nir/glsl_to_nir.cpp
  @@ -1618,6 +1618,16 @@ nir_visitor::visit(ir_expression *ir)
instr->src[2].swizzle[i] = 0;
 }
 break;
  +   case ir_triop_ssbo_unsized_array_length: {
  +      nir_intrinsic_instr *load = nir_intrinsic_instr_create(this->shader,
  +            nir_intrinsic_ssbo_unsized_array_length);
  +      load->num_components = ir->type->vector_elements;
  +      load->src[0] = evaluate_rvalue(ir->operands[0]);
  +      load->src[1] = evaluate_rvalue(ir->operands[1]);
  +      load->src[2] = evaluate_rvalue(ir->operands[2]);
  +      add_instr(&load->instr, ir->type->vector_elements);
  +      return;
  +   }
  case ir_quadop_bitfield_insert:
 instr = emit(nir_op_bitfield_insert, dest_size, srcs);
 for (unsigned i = 0; i < ir->operands[0]->type->vector_elements; i++) {
  diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
  index bc6e6b8..f264f55 100644
  --- a/src/glsl/nir/nir_intrinsics.h
  +++ b/src/glsl/nir/nir_intrinsics.h
  @@ -62,6 +62,15 @@ INTRINSIC(interp_var_at_offset, 1, ARR(2), true, 0, 1, 0,
 NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
 
   /*
  + * Shader storage buffer's unsized array length calculation. It takes the
  + * SSBO index, the unsized array offset inside the shader storage block
  + * interface, and the stride of the unsized array.
  + */
  +
  +INTRINSIC(ssbo_unsized_array_length, 3, ARR(1), true, 1, 0, 0,
  +  NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
 
 It looks like there are 3 sources, but you've only specified the size of
 one. Did you mean to do ARR(1, 1, 1)? With that fixed,
 
 Reviewed-by: Connor Abbott connor.w.abb...@intel.com

Yeah, I think that's just a mistake. Thanks.
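
The effect of the shorter ARR() can be seen with a small sketch. It assumes, as in the intrinsics table, that ARR(...) simply expands to a braced initializer for a fixed-size per-source array; the intrinsic_info struct, MAX_INPUTS and explicitly_sized_srcs() are hypothetical reductions for illustration. Entries left out of the initializer are zero-initialized, and as far as I understand a 0 there is read as "this source has as many components as the instruction's num_components", which is not what three scalar sources want.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cap on sources per intrinsic, for this sketch only.
constexpr std::size_t MAX_INPUTS = 4;

// Reduced table entry: source count and per-source component counts.
struct intrinsic_info {
   unsigned num_srcs;
   unsigned src_components[MAX_INPUTS];
};

#define ARR(...) { __VA_ARGS__ }

// As posted: three sources, but only the first one gets an explicit size.
static const intrinsic_info as_posted = { 3, ARR(1) };
// As suggested in the review: every source is explicitly one component.
static const intrinsic_info as_fixed = { 3, ARR(1, 1, 1) };

// Number of sources (out of num_srcs) with an explicit component count.
static unsigned explicitly_sized_srcs(const intrinsic_info &info)
{
   unsigned count = 0;
   for (unsigned i = 0; i < info.num_srcs; i++)
      if (info.src_components[i] != 0)
         count++;
   return count;
}
```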

  +
  +/*
* a barrier is an intrinsic with no inputs/outputs but which can't be 
  moved
* around/optimized in general
*/
  --
  1.9.1
 


Re: [Mesa-dev] [PATCH v4 (part2) 29/59] nir: Implement __intrinsic_store_ssbo

2015-08-07 Thread Iago Toral
On Fri, 2015-08-07 at 07:43 +0200, Iago Toral wrote:
 On Thu, 2015-08-06 at 11:06 -0700, Connor Abbott wrote:
  On Thu, Aug 6, 2015 at 12:30 AM, Iago Toral ito...@igalia.com wrote:
   On Wed, 2015-08-05 at 12:17 -0700, Connor Abbott wrote:
   On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com 
   wrote:
---
 src/glsl/nir/glsl_to_nir.cpp  | 36 

 src/glsl/nir/nir_intrinsics.h | 12 ++--
 2 files changed, 42 insertions(+), 6 deletions(-)
   
diff --git a/src/glsl/nir/glsl_to_nir.cpp 
b/src/glsl/nir/glsl_to_nir.cpp
index 642affd..cbec2df 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -641,6 +641,8 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_image_atomic_comp_swap;
   } else if (strcmp(ir->callee_name(), "__intrinsic_memory_barrier") == 0) {
  op = nir_intrinsic_memory_barrier;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == 0) {
+ op = nir_intrinsic_store_ssbo;
   } else {
  unreachable("not reached");
   }
@@ -730,6 +732,40 @@ nir_visitor::visit(ir_call *ir)
   }
   case nir_intrinsic_memory_barrier:
  break;
+  case nir_intrinsic_store_ssbo: {
+     exec_node *param = ir->actual_parameters.get_head();
+     ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
+
+     param = param->get_next();
+     ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
+
+     param = param->get_next();
+     ir_rvalue *val = ((ir_instruction *)param)->as_rvalue();
+
+     param = param->get_next();
+     ir_constant *write_mask = ((ir_instruction *)param)->as_constant();
+     assert(write_mask);
+
+     /* Check if we need the indirect version */
+     ir_constant *const_offset = offset->as_constant();
+     if (!const_offset) {
+        op = nir_intrinsic_store_ssbo_indirect;
+        ralloc_free(instr);
+        instr = nir_intrinsic_instr_create(shader, op);
+        instr->src[2] = evaluate_rvalue(offset);
+        instr->const_index[0] = 0;
+     } else {
+        instr->const_index[0] = const_offset->value.u[0];
+     }
+
+     instr->const_index[1] = write_mask->value.u[0];
+
+     instr->src[0] = evaluate_rvalue(val);
+     instr->num_components = val->type->vector_elements;
+
+     instr->src[1] = evaluate_rvalue(block);
+     break;
+  }
   default:
  unreachable("not reached");
   }
diff --git a/src/glsl/nir/nir_intrinsics.h 
b/src/glsl/nir/nir_intrinsics.h
index f264f55..83eeecd 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -176,12 +176,12 @@ LOAD(input, 0, NIR_INTRINSIC_CAN_ELIMINATE | 
NIR_INTRINSIC_CAN_REORDER)
  * offset.
  */
   
-#define STORE(name, num_indices, flags) \
-   INTRINSIC(store_##name, 1, ARR(0), false, 0, 0, num_indices, 
flags) \
-   INTRINSIC(store_##name##_indirect, 2, ARR(0, 1), false, 0, 0, \
+#define STORE(name, extra_srcs, num_indices, flags) \
+   INTRINSIC(store_##name, extra_srcs, ARR(0, 1), false, 0, 0, num_indices, flags) \
+   INTRINSIC(store_##name##_indirect, extra_srcs + 1, ARR(0, 1, 1), false, 0, 0, \
  num_indices, flags) \
   
-STORE(output, 1, 0)
-/* STORE(ssbo, 2, 0) */
+STORE(output, 1, 2, 0)
+STORE(ssbo, 2, 2, 0)
  
   I don't think outputs should have any extra sources, since they only
   take a constant index, plus possibly an indirect source that's already
   covered by the STORE macro. SSBO stores should only have one extra
   source for the block index. Also, we should update the comment above
   to explain this similarly to the paragraph above the loads.
  
   SSBO stores need an extra source for the block index and an extra index
   for a writemask.
  
   I'll leave the STORE() macro as it was and just define SSBO stores using
   INTRINSIC() directly then.
  
  Ok, I see. I don't think you need a separate INTRINSIC(), but right
  now calling the parameter you added extra_srcs is confusing, since
  you're counting the value to be stored, which isn't really extra at
  all -- every store should have one! How about instead, we change the
  STORE macro to have:
  
  - An extra_srcs parameter that contains only sources that are
  actually extra, not counting the value to be stored -- direct stores
  have extra_srcs + 1 sources, and indirect stores have extra_srcs +
  2 sources
  - An extra_indices parameter that contains the extra indices, and
  replace num_indices with extra_indices + 1
  
  Then normal stores have both set to 0, and SSBO stores have both set
  to 1 to indicate

Re: [Mesa-dev] [PATCH v4 (part2) 02/59] i965/vec4: Import helpers to convert vectors into arrays and back.

2015-08-07 Thread Iago Toral
On Wed, 2015-08-05 at 10:29 +0200, Iago Toral Quiroga wrote:
 From: Francisco Jerez curroje...@riseup.net
 
 These functions handle the conversion of a vec4 into the form expected
 by the dataport unit in message and message return payloads.  The
 conversion is not always trivial because some messages don't support
 SIMD4x2 for some generations, in which case a strided copy may be
 necessary.
 
 v2: Split from the FS implementation.
 v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.
 ---
  src/mesa/drivers/dri/i965/Makefile.sources |  2 +
  .../drivers/dri/i965/brw_vec4_surface_builder.cpp  | 98 
 ++
  .../drivers/dri/i965/brw_vec4_surface_builder.h| 30 +++
  3 files changed, 130 insertions(+)
  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp
  create mode 100644 src/mesa/drivers/dri/i965/brw_vec4_surface_builder.h
 
 diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
 b/src/mesa/drivers/dri/i965/Makefile.sources
 index 88e64fb..049e293 100644
 --- a/src/mesa/drivers/dri/i965/Makefile.sources
 +++ b/src/mesa/drivers/dri/i965/Makefile.sources
 @@ -129,6 +129,8 @@ i965_FILES = \
   brw_vec4_nir.cpp \
   brw_vec4_gs_nir.cpp \
   brw_vec4_reg_allocate.cpp \
 + brw_vec4_surface_builder.cpp \
 + brw_vec4_surface_builder.h \
   brw_vec4_visitor.cpp \
   brw_vec4_vp.cpp \
   brw_vec4_vs_visitor.cpp \
 diff --git a/src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp 
 b/src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp
 new file mode 100644
 index 000..5ba1c6d
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/brw_vec4_surface_builder.cpp
 @@ -0,0 +1,98 @@
 +/*
 + * Copyright © 2013-2015 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
 DEALINGS
 + * IN THE SOFTWARE.
 + */
 +
 +#include "brw_vec4_surface_builder.h"
 +
 +using namespace brw;
 +
 +namespace {
 +   namespace array_utils {
 +  /**
 +   * Copy one every \p src_stride logical components of the argument into
 +   * one every \p dst_stride logical components of the result.
 +   */
 +      src_reg
 +      emit_stride(const vec4_builder &bld, const src_reg &src, unsigned size,
 +                  unsigned dst_stride, unsigned src_stride)
 +      {
 +         if (src_stride == 1 && dst_stride == 1) {
 +            return src;
 +         } else {
 +            const dst_reg dst = bld.vgrf(src.type,
 +                                         DIV_ROUND_UP(size * dst_stride, 4));
 +
 +            for (unsigned i = 0; i < size; ++i)
 +               bld.MOV(writemask(offset(dst, i * dst_stride / 4),
 +                                 1 << (i * dst_stride % 4)),
 +                       swizzle(offset(src, i * src_stride / 4),
 +                               brw_swizzle_for_mask(1 << (i * src_stride % 4))));
 +
 +            return src_reg(dst);
 +         }
 +      }
 +
 +      /**
 +       * Convert a VEC4 into an array of registers with the layout expected by
 +       * the recipient shared unit.  If \p has_simd4x2 is true the argument is
 +       * left unmodified in SIMD4x2 form, otherwise it will be rearranged into
 +       * a SIMD8 vector.
 +       */
 +      src_reg
 +      emit_insert(const vec4_builder &bld, const src_reg &src,
 +                  unsigned n, bool has_simd4x2)
 +      {
 +         if (src.file == BAD_FILE || n == 0) {
 +            return src_reg();
 +
 +         } else {
 +            /* Pad unused components with zeroes. */
 +            const unsigned mask = (1 << n) - 1;
 +            const dst_reg tmp = bld.vgrf(src.type);
 +
 +            bld.MOV(writemask(tmp, mask), src);
 +            if (n < 4)
 +               bld.MOV(writemask(tmp, ~mask), 0);
 +
 +            return emit_stride(bld, src, n, has_simd4x2 ? 1 : 4, 1);

Curro: do we really need the padding here? In any case, I guess you
meant to pass tmp and not src to emit_stride here
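
To make the question concrete, here is a toy model of the two helpers on plain arrays. It is a sketch under stated assumptions, not the driver code: stride_copy() and pad_to_vec4() are hypothetical names, registers are flat std::vector<int> values, and channels the strided copy never writes are left at 0 here although they would be undefined in a freshly allocated register.

```cpp
#include <cassert>
#include <vector>

// Model of emit_stride(): copy one every src_stride components of src into
// one every dst_stride components of the result. As in the patch, the
// input is returned untouched when both strides are 1.
static std::vector<int> stride_copy(const std::vector<int> &src, unsigned size,
                                    unsigned dst_stride, unsigned src_stride)
{
   if (src_stride == 1 && dst_stride == 1)
      return src;

   assert(size == 0 || src.size() > (size - 1) * src_stride);

   // Round the destination up to whole vec4 registers.
   std::vector<int> dst(4 * ((size * dst_stride + 3) / 4), 0);
   for (unsigned i = 0; i < size; ++i)
      dst[i * dst_stride] = src[i * src_stride];
   return dst;
}

// Model of the padding step in emit_insert(): keep the first n channels
// of a vec4 and zero the rest.
static std::vector<int> pad_to_vec4(const std::vector<int> &src, unsigned n)
{
   assert(src.size() == 4 && n <= 4);

   std::vector<int> tmp(4, 0);
   for (unsigned c = 0; c < n; ++c)
      tmp[c] = src[c];
   return tmp;
}
```

In this model the padding only matters on the SIMD4x2 path, where the argument is returned as is: passing src there returns the unpadded register, while passing tmp returns the zero-padded one. On the strided path only the first n components are ever read, so the padded channels are never looked at.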

Re: [Mesa-dev] [PATCH v3 4/4] i965/vec4: Don't unspill the same register in consecutive instructions

2015-08-07 Thread Iago Toral
On Fri, 2015-08-07 at 14:14 +0300, Francisco Jerez wrote:
 Iago Toral ito...@igalia.com writes:
 
  On Thu, 2015-08-06 at 18:27 +0300, Francisco Jerez wrote:
  Iago Toral Quiroga ito...@igalia.com writes:
  
   If we have spilled/unspilled a register in the current instruction, avoid
   emitting unspills for the same register in the same instruction or
   consecutive instructions following the current one as long as they keep
   reading the spilled register. This should allow us to avoid emitting costly
   unspills that come with little benefit to register allocation.
  
   Also, update evaluate_spill_costs so that we account for the saved
   unspills.
   ---
.../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 129 
   +++--
1 file changed, 121 insertions(+), 8 deletions(-)
  
   diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
   b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
   index 617c988..fed5f4d 100644
   --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
   +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
   @@ -264,6 +264,95 @@ vec4_visitor::reg_allocate()
   return true;
}

   +/**
   + * When we decide to spill a register, instead of blindly spilling every use,
   + * save unspills when the spill register is used (read) in consecutive
   + * instructions. This can potentially save a bunch of unspills that would
   + * have very little impact in register allocation anyway.
   + *
   + * Notice that we need to account for this behavior when spilling a register
   + * and when evaluating spilling costs. This function is designed so it can
   + * be called from both places and avoid repeating the logic.
   + *
   + *  - When we call this function from spill_reg, we pass in scratch_reg the
   + *    actual unspill/spill register that we want to reuse in the current
   + *    instruction.
   + *
   + *  - When we call this from evaluate_spill_costs, we pass the register for
   + *    which we are evaluating spilling costs.
   + *
   + * In either case, we check if the previous instructions read scratch_reg until
   + * we find an instruction that writes to it (in which case we can reuse
   + * scratch_reg as long as the writemask is compatible with the channels we need
   + * to read in the current instruction) or we hit an instruction that does not
   + * read scratch_reg at all. The latter can only happen when we call this from
   + * evaluate_spill_costs,
  
  Strictly speaking it can also happen when called from spill_reg() for
  the first time in a given sequence of consecutive instructions (in which
  case you correctly return false).
 
  True, I'll fix the comment.
 
and means that this is the point at which we first
   + * need to unspill this register for our current instruction. Since all our
   + * unspills read a full vec4, we know that in this case we will have all
   + * the channels available in scratch_reg and we can reuse it.
   + *
   + * In any other case, we can't reuse scratch_reg in the current instruction,
   + * meaning that we will need to unspill it.
   + */
   +static bool
   +can_use_scratch_for_source(const vec4_instruction *inst, unsigned i,
   +   unsigned scratch_reg)
   +{
   +   assert(inst->src[i].file == GRF);
   +
   +   /* If the current instruction is already using scratch_reg in src[n] with
   +    * n < i, then we know we can reuse it for src[i] too.
   +    */
   +   for (unsigned n = 0; n < i; n++) {
   +      if (inst->src[n].file == GRF && inst->src[n].reg == scratch_reg)
   +         return true;
   +   }
  
  I don't think this is correct in cases where the previous source reused
  the temporary of a previously spilled register with incompatible
  writemask.  You probably want to handle the current instruction
  consistently with the previous ones, i.e. as part of the loop below.
 
  Yes, you're right.
 
  I suggest you define a variable (e.g. n as you've called it) initially
  equal to i that would determine the number of sources to check for the
  next instruction.  At the end of the loop body it would be reset to 3,
  which would also cause the destination registers to be checked in
  subsequent iterations.
 
  Sounds good to me.
 
   +
   +   bool prev_inst_read_scratch_reg = false;
   +   vec4_instruction *prev_inst = (vec4_instruction *) inst->prev;
  
  You can move this declaration into the init statement of the for loop to
  limit its scope.
 
  Ok.
 
   +   for (; !prev_inst->is_head_sentinel();
   +        prev_inst = (vec4_instruction *) prev_inst->prev) {
   +      /* If any previous instruction does not read from or write to scratch_reg
   +       * unconditionally we cannot reuse scratch_reg
   +       */
   +      if (prev_inst->predicate && prev_inst->opcode != BRW_OPCODE_SEL)
   +         return false;
  
  I think this is somewhat pessimistic, register fills for a predicated

Re: [Mesa-dev] [PATCH v4 (part2) 46/59] glsl: Allow use of memory qualifiers with ARB_shader_storage_buffer_object.

2015-08-13 Thread Iago Toral
Thanks Tapani, I'll look into this as soon as I get back from holidays. 
The implementation has changed slightly since I sent these patches too 
btw.


Iago

On 2015-08-13 10:05, Tapani Pälli wrote:

Hi;

I believe something more needs to be done here, it may be generic
changes to the memory qualifier parsing. I wrote a Piglit test that
fails with these changes (I tested against
itoral-ARB_shader_storage_buffer_object-v4.1 branch), here:

http://lists.freedesktop.org/archives/piglit/2015-August/016777.html


On 08/05/2015 11:30 AM, Iago Toral Quiroga wrote:

---
  src/glsl/glsl_lexer.ll | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/glsl/glsl_lexer.ll b/src/glsl/glsl_lexer.ll
index efa0bb6..b9d700a 100644
--- a/src/glsl/glsl_lexer.ll
+++ b/src/glsl/glsl_lexer.ll
@@ -405,11 +405,11 @@ image2DShadow   KEYWORD(130, 300, 0, 0, IMAGE2DSHADOW);

  image1DArrayShadow  KEYWORD(130, 300, 0, 0, IMAGE1DARRAYSHADOW);
  image2DArrayShadow  KEYWORD(130, 300, 0, 0, IMAGE2DARRAYSHADOW);

-coherent   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, COHERENT);
-volatile   KEYWORD_WITH_ALT(110, 100, 420, 0, yyextra->ARB_shader_image_load_store_enable, VOLATILE);
-restrict   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, RESTRICT);
-readonly   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, READONLY);
-writeonly  KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable, WRITEONLY);
+coherent   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, COHERENT);
+volatile   KEYWORD_WITH_ALT(110, 100, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, VOLATILE);
+restrict   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, RESTRICT);
+readonly   KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, READONLY);
+writeonly  KEYWORD_WITH_ALT(420, 300, 420, 0, yyextra->ARB_shader_image_load_store_enable || yyextra->ARB_shader_storage_buffer_object_enable, WRITEONLY);

  atomic_uint KEYWORD_WITH_ALT(420, 300, 420, 310, yyextra->ARB_shader_atomic_counters_enable, ATOMIC_UINT);






Re: [Mesa-dev] [PATCH v4 (part2) 29/59] nir: Implement __intrinsic_store_ssbo

2015-08-06 Thread Iago Toral
On Wed, 2015-08-05 at 12:17 -0700, Connor Abbott wrote:
 On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote:
  ---
   src/glsl/nir/glsl_to_nir.cpp  | 36 
   src/glsl/nir/nir_intrinsics.h | 12 ++--
   2 files changed, 42 insertions(+), 6 deletions(-)
 
  diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
  index 642affd..cbec2df 100644
  --- a/src/glsl/nir/glsl_to_nir.cpp
  +++ b/src/glsl/nir/glsl_to_nir.cpp
  @@ -641,6 +641,8 @@ nir_visitor::visit(ir_call *ir)
op = nir_intrinsic_image_atomic_comp_swap;
 } else if (strcmp(ir->callee_name(), "__intrinsic_memory_barrier") == 0) {
op = nir_intrinsic_memory_barrier;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == 0) {
  + op = nir_intrinsic_store_ssbo;
 } else {
unreachable("not reached");
 }
  @@ -730,6 +732,40 @@ nir_visitor::visit(ir_call *ir)
 }
 case nir_intrinsic_memory_barrier:
break;
  +  case nir_intrinsic_store_ssbo: {
  + exec_node *param = ir->actual_parameters.get_head();
  + ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
  +
  + param = param->get_next();
  + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
  +
  + param = param->get_next();
  + ir_rvalue *val = ((ir_instruction *)param)->as_rvalue();
  +
  + param = param->get_next();
  + ir_constant *write_mask = ((ir_instruction *)param)->as_constant();
  + assert(write_mask);
  +
  + /* Check if we need the indirect version */
  + ir_constant *const_offset = offset->as_constant();
  + if (!const_offset) {
  +op = nir_intrinsic_store_ssbo_indirect;
  +ralloc_free(instr);
  +instr = nir_intrinsic_instr_create(shader, op);
  +instr->src[2] = evaluate_rvalue(offset);
  +instr->const_index[0] = 0;
  + } else {
  +instr->const_index[0] = const_offset->value.u[0];
  + }
  +
  + instr->const_index[1] = write_mask->value.u[0];
  +
  + instr->src[0] = evaluate_rvalue(val);
  + instr->num_components = val->type->vector_elements;
  +
  + instr->src[1] = evaluate_rvalue(block);
  + break;
  +  }
 default:
unreachable("not reached");
 }
  diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
  index f264f55..83eeecd 100644
  --- a/src/glsl/nir/nir_intrinsics.h
  +++ b/src/glsl/nir/nir_intrinsics.h
  @@ -176,12 +176,12 @@ LOAD(input, 0, NIR_INTRINSIC_CAN_ELIMINATE | 
  NIR_INTRINSIC_CAN_REORDER)
* offset.
*/
 
  -#define STORE(name, num_indices, flags) \
  -   INTRINSIC(store_##name, 1, ARR(0), false, 0, 0, num_indices, flags) \
  -   INTRINSIC(store_##name##_indirect, 2, ARR(0, 1), false, 0, 0, \
  +#define STORE(name, extra_srcs, num_indices, flags) \
  +   INTRINSIC(store_##name, extra_srcs, ARR(0, 1), false, 0, 0, 
  num_indices, flags) \
  +   INTRINSIC(store_##name##_indirect, extra_srcs + 1, ARR(0, 1, 1), false, 
  0, 0, \
num_indices, flags) \
 
  -STORE(output, 1, 0)
  -/* STORE(ssbo, 2, 0) */
  +STORE(output, 1, 2, 0)
  +STORE(ssbo, 2, 2, 0)
 
 I don't think outputs should have any extra sources, since they only
 take a constant index, plus possibly an indirect source that's already
 covered by the STORE macro. SSBO stores should only have one extra
 source for the block index. Also, we should update the comment above
 to explain this similarly to the paragraph above the loads.

SSBO stores need an extra source for the block index and an extra index
for a writemask.

I'll leave the STORE() macro as it was and just define SSBO stores using
INTRINSIC() directly then.

 
  -LAST_INTRINSIC(store_output_indirect)
  +LAST_INTRINSIC(store_ssbo_indirect)
  --
  1.9.1
 
 




Re: [Mesa-dev] [PATCH v4 (part2) 39/59] nir: Implement lowered SSBO atomic intrinsics

2015-08-06 Thread Iago Toral
On Wed, 2015-08-05 at 12:24 -0700, Connor Abbott wrote:
 On Wed, Aug 5, 2015 at 1:30 AM, Iago Toral Quiroga ito...@igalia.com wrote:
  The original GLSL IR intrinsics have been lowered to an internal
  version that accepts a block index and an offset instead of a
  SSBO reference.
  ---
   src/glsl/nir/glsl_to_nir.cpp  | 55 
  +++
   src/glsl/nir/nir_intrinsics.h | 12 ++
   2 files changed, 67 insertions(+)
 
  diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
  index aa96e40..4450e7a 100644
  --- a/src/glsl/nir/glsl_to_nir.cpp
  +++ b/src/glsl/nir/glsl_to_nir.cpp
  @@ -645,6 +645,22 @@ nir_visitor::visit(ir_call *ir)
op = nir_intrinsic_store_ssbo;
 } else if (strcmp(ir->callee_name(), "__intrinsic_load_ssbo") == 0) {
op = nir_intrinsic_load_ssbo;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_add_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_add;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_and_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_and;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_or_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_or;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_xor_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_xor;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_min_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_min;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_max_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_max;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_exchange_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_exchange;
  +  } else if (strcmp(ir->callee_name(), "__intrinsic_ssbo_atomic_comp_swap_internal") == 0) {
  + op = nir_intrinsic_ssbo_atomic_comp_swap;
 } else {
unreachable("not reached");
 }
  @@ -826,7 +842,46 @@ nir_visitor::visit(ir_call *ir)
 
   nir_instr_insert_after_cf_list(this->cf_node_list, &compare->instr);
}
  + break;
  +  }
  +  case nir_intrinsic_ssbo_atomic_add:
  +  case nir_intrinsic_ssbo_atomic_min:
  +  case nir_intrinsic_ssbo_atomic_max:
  +  case nir_intrinsic_ssbo_atomic_and:
  +  case nir_intrinsic_ssbo_atomic_or:
  +  case nir_intrinsic_ssbo_atomic_xor:
  +  case nir_intrinsic_ssbo_atomic_exchange:
  +  case nir_intrinsic_ssbo_atomic_comp_swap: {
  + int param_count = ir->actual_parameters.length();
  + assert(param_count == 3 || param_count == 4);
  +
  + /* Block index */
  + exec_node *param = ir->actual_parameters.get_head();
  + ir_instruction *inst = (ir_instruction *) param;
  + instr->src[0] = evaluate_rvalue(inst->as_rvalue());
 
  + /* Offset */
  + param = param->get_next();
  + inst = (ir_instruction *) param;
  + instr->src[1] = evaluate_rvalue(inst->as_rvalue());
  +
  + /* data1 parameter (this is always present) */
  + param = param->get_next();
  + inst = (ir_instruction *) param;
  + instr->src[2] = evaluate_rvalue(inst->as_rvalue());
  +
  + /* data2 parameter (only with atomic_comp_swap) */
  + if (param_count == 4) {
  +assert(op == nir_intrinsic_ssbo_atomic_comp_swap);
  +param = param->get_next();
  +inst = (ir_instruction *) param;
  +instr->src[3] = evaluate_rvalue(inst->as_rvalue());
  + }
  +
  + /* Atomic result */
  + assert(ir->return_deref);
  + nir_ssa_dest_init(&instr->instr, &instr->dest,
  +   ir->return_deref->type->vector_elements, NULL);
break;
 }
 default:
  diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
  index 9877ea4..a1351d1 100644
  --- a/src/glsl/nir/nir_intrinsics.h
  +++ b/src/glsl/nir/nir_intrinsics.h
  @@ -133,6 +133,18 @@ INTRINSIC(image_atomic_xor, 3, ARR(4, 1, 1), true, 1, 
  1, 0, 0)
   INTRINSIC(image_atomic_exchange, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
   INTRINSIC(image_atomic_comp_swap, 4, ARR(4, 1, 1, 1), true, 1, 1, 0, 0)
 
  +/*
  + * SSBO atomic intrinsics
  + */
 
 Could you add a bit here explaining what the sources are?

How about this?:

/*
 * SSBO atomic intrinsics
 *
 * All of the SSBO atomic memory operations read a value from memory,
 * compute a new value using one of the operations below, write the new
 * value to memory, and return the original value read.
 *
 * All operations take 3 sources, except CompSwap, which takes 4. These
 * sources represent:
 *
 * 0: The SSBO buffer index.
 * 1: The offset into the SSBO buffer of the variable that the atomic
 *    operation will operate on.
 * 2: The data parameter to the atomic function
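
The read-modify-write semantics described in the proposed comment can be sketched outside of NIR with C11 atomics. This is only an illustration, not the Mesa implementation: the sketch_* names are hypothetical, "buffer" stands in for SSBO memory, and the offset is counted in 32-bit words for simplicity. The (compare, data) argument order follows GLSL's atomicCompSwap().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an SSBO atomic add: read the value at "offset",
 * store value + data, and return the value that was originally read.
 */
static uint32_t
sketch_atomic_add(_Atomic uint32_t *buffer, unsigned offset, uint32_t data)
{
   assert(buffer != NULL);
   return atomic_fetch_add(&buffer[offset], data);
}

/* Hypothetical model of CompSwap, the only operation with a fourth
 * source: store "data" only if memory currently holds "compare", and
 * return the original value either way.
 */
static uint32_t
sketch_atomic_comp_swap(_Atomic uint32_t *buffer, unsigned offset,
                        uint32_t compare, uint32_t data)
{
   uint32_t expected = compare;

   assert(buffer != NULL);
   /* On failure "expected" is overwritten with the value found in memory;
    * on success it already equals that value.
    */
   atomic_compare_exchange_strong(&buffer[offset], &expected, data);
   return expected;
}
```

In both cases the caller gets back the pre-operation contents of memory, which is why the NIR intrinsics need a destination in addition to their sources.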

Re: [Mesa-dev] [PATCH V2 12/14] meta: Fix reading luminance texture as rgba in _mesa_meta_pbo_GetTexSubImage()

2015-07-24 Thread Iago Toral
On Thu, 2015-07-23 at 11:40 -0700, Anuj Phogat wrote:
 On Wed, Jul 22, 2015 at 7:10 AM, Iago Toral ito...@igalia.com wrote:
  The problem here is that the _mesa_meta_BlitFramebuffer is not setting
  G/B channels to 0.0 when doing Luminance/Intensity to RGBA conversions,
  so why not implement the fix in _mesa_meta_BlitFramebuffer directly? The
  GL spec expects framebuffer blits to handle these conversions properly,
  so it looks like a win for all uses of that function.
 
 I couldn't find an OpenGL spec reference suggesting this conversion in case
 of glBlitFrameBuffer.

It is not explicitly stated, however the OpenGL 4.5 spec, section 18.3.1
Blitting Pixel Rectangle says:

An INVALID_OPERATION error is generated if format conversions are not
supported, which occurs under any of the following conditions:
• The read buffer contains fixed-point or floating-point values and any
draw buffer contains neither fixed-point nor floating-point values.
• The read buffer contains unsigned integer values and any draw buffer
does not contain unsigned integer values.
• The read buffer contains signed integer values and any draw buffer
does not contain signed integer values.

However, I realize now that luminance/intensity are not color-renderable
formats, so that text is probably not considering these formats anyway.

 What I found supports the current behavior of glBlitFrameBuffer:
 See table 3.23 on page 220 (of pdf) of glspec30.20080811.
 
 Some relevant text from https://www.opengl.org/wiki/Image_Format:
 When a GL_RED format is sampled in a shader, the resulting vec4 is
  (Red, 0, 0, 1). When a GL_INTENSITY format is sampled, the resulting
   vec4 is (I, I, I, I). The single intensity value is read into all four
   components. For GL_LUMINANCE, the result is (L, L, L, 1). There is
   also a two-channel GL_LUMINANCE_ALPHA format, which gives
   (L, L, L, A).

 I think glBlitFrameBuffer should also follow this being a drawing operation.
 What do you think?

Yes, it makes sense. You can add:

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

to the patch.
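
To make the difference discussed above concrete, here is a small C sketch. It assumes only what the thread states: sampling a luminance texel yields (L, L, L, 1) or (L, L, L, A), while the download path in the patch wants G=0 and B=0, with alpha forced to 1 unless the source is GL_LUMINANCE_ALPHA. The enum and function names are hypothetical and are not Mesa API.

```c
#include <assert.h>

enum sketch_base_format {
   SKETCH_LUMINANCE,
   SKETCH_LUMINANCE_ALPHA
};

/* What a shader sees when sampling, and therefore what a blit that
 * draws with the texture produces: luminance replicated into RGB.
 */
static void
sketch_sample_as_rgba(enum sketch_base_format fmt, float l, float a,
                      float out[4])
{
   out[0] = l;
   out[1] = l;
   out[2] = l;
   out[3] = (fmt == SKETCH_LUMINANCE_ALPHA) ? a : 1.0f;
}

/* What the download path wants: start from the sampled result, then
 * clear the green and blue channels, mirroring the color mask and
 * clear that the patch issues after the blit.
 */
static void
sketch_download_as_rgba(enum sketch_base_format fmt, float l, float a,
                        float out[4])
{
   sketch_sample_as_rgba(fmt, l, a, out);
   out[1] = 0.0f;
   out[2] = 0.0f;
}
```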

 
  On Fri, 2015-07-17 at 10:28 -0700, Anuj Phogat wrote:
  After recent addition of pbo testing in piglit test getteximage-luminance,
  it fails on i965. This patch makes a sub test pass.
 
  This patch adds a clear color operation to meta pbo path, which I think is
  better than falling back to software path.
 
  V2: Fix color mask for GL_LUMINANCE_ALPHA
 
  Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
  Cc: mesa-sta...@lists.freedesktop.org
  ---
   src/mesa/drivers/common/meta_tex_subimage.c | 36 
  +++--
   1 file changed, 34 insertions(+), 2 deletions(-)
 
  diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
  b/src/mesa/drivers/common/meta_tex_subimage.c
  index 13f8292..f4d5ac3 100644
  --- a/src/mesa/drivers/common/meta_tex_subimage.c
  +++ b/src/mesa/drivers/common/meta_tex_subimage.c
  @@ -28,6 +28,7 @@
   #include "blend.h"
   #include "bufferobj.h"
   #include "buffers.h"
  +#include "clear.h"
   #include "fbobject.h"
   #include "glformats.h"
   #include "glheader.h"
  @@ -278,8 +279,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
  GLuint dims,
  int full_height, image_height;
  struct gl_texture_image *pbo_tex_image;
  struct gl_renderbuffer *rb = NULL;
  -   GLenum status;
  -   bool success = false;
  +   GLenum status, src_base_format;
  +   bool success = false, clear_channels_to_zero = false;
  +   float save_clear_color[4];
  int z;
 
  if (!_mesa_is_bufferobj(packing-BufferObj))
  @@ -380,6 +382,27 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
  GLuint dims,
 GL_COLOR_BUFFER_BIT, GL_NEAREST))
 goto fail;
 
  +   src_base_format = tex_image ?
  + tex_image->_BaseFormat :
  + ctx->ReadBuffer->_ColorReadBuffer->_BaseFormat;
  +
  +   /* Depending on the base formats involved we might need to rebase some
  +* values. For example if we download from a Luminance format to RGBA
  +* format, we want G=0 and B=0.
  +*/
  +   clear_channels_to_zero =
  +  _mesa_need_luminance_to_rgb_conversion(src_base_format,
  + pbo_tex_image->_BaseFormat);
  +
  +   if (clear_channels_to_zero) {
  +  memcpy(save_clear_color, ctx->Color.ClearColor.f, 4 * sizeof(float));
  +  /* Clear the Green, Blue channels. */
  +  _mesa_ColorMask(GL_FALSE, GL_TRUE, GL_TRUE,
  +  src_base_format != GL_LUMINANCE_ALPHA);
  +  _mesa_ClearColor(0.0, 0.0, 0.0, 1.0);
  +  _mesa_Clear(GL_COLOR_BUFFER_BIT);
  +   }
  +
  for (z = 1; z < depth; z++) {
 _mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
   tex_image, zoffset + z);
  @@ -392,6 +415,15 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
  GLuint dims,
0, z * image_height

Re: [Mesa-dev] [PATCH 3/5] i965/vec4: Register spilling should never see registers with size != 1

2015-07-27 Thread Iago Toral
On Fri, 2015-07-24 at 16:20 +0300, Francisco Jerez wrote:
 Iago Toral Quiroga ito...@igalia.com writes:
 
  Larger registers should have been moved to scratch (like GRF array access)
  or split to size 1 by the split_virtual_grfs pass.
 
 Not necessarily.  split_virtual_grfs() won't be able to split stuff
 which is read or written at once by the same instruction -- E.g. by
 send-from-GRF instructions as used for surface messages on e.g.  your
 SSBO implementation.  :)
 
 We should probably eventually migrate other messages too like the ones
 used for texturing and framebuffer writes to use proper sends from
 GRF...

Okay, in that case I'll include patches to add support for spilling
registers with size > 1 as well.

Thanks,
Iago

  ---
   src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
  b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  index cff5406..80ab813 100644
  --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  @@ -271,7 +271,8 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, 
  bool *no_spill)
   
  for (unsigned i = 0; i < this->alloc.count; i++) {
 spill_costs[i] = 0.0;
  -  no_spill[i] = alloc.sizes[i] != 1;
  +  no_spill[i] = false;
  +  assert(this->alloc.sizes[i] == 1);
  }
   
  /* Calculate costs for spilling nodes.  Call it a cost of 1 per
  -- 
  1.9.1
 




Re: [Mesa-dev] [PATCH 4/5] i965/vec4: Don't emit scratch reads for a spilled register we have just written

2015-07-27 Thread Iago Toral
On Fri, 2015-07-24 at 16:18 +0300, Francisco Jerez wrote:
 Iago Toral Quiroga ito...@igalia.com writes:
 
  When we have code such as this:
 
  mov vgrf1.0.x:F, vgrf2.:F
  mov vgrf3.0.x:F, vgrf1.:F
  ...
  mov vgrf3.0.x:F, vgrf1.:F
 
  And vgrf1 is chosen for spilling, we can emit this:
 
  mov vgrf1.0.x:F, vgrf2.:F
  gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
  mov vgrf3.0.x:F, vgrf1.:F
  ...
  gen4_scratch_read vgrf4.0.x:F, 22D
  mov vgrf3.0.x:F, vgrf4.:F
 
  Instead of this:
 
  mov vgrf1.0.x:F, vgrf2.:F
  gen4_scratch_write hw_reg0:F, vgrf1.:D, 22D
  gen4_scratch_read vgrf4.0.x:F, 22D
  mov vgrf3.0.x:F, vgrf4.:F
  ...
  gen4_scratch_read vgrf5.0.x:F, 22D
  mov vgrf3.0.x:F, vgrf5.:F
 
  And save one scratch read while still preserving the benefits of
  spilling the register.
 
 This sounds reasonable to me in principle.  I guess that there is in
 general a trade-off between the number of spills/fills you omit and the
 number of interference edges you eliminate.  It may also be worth
 checking whether you can extend the same principle to cache the value of
 the variable in a GRF until the next instruction regardless of whether
 it was written or read (e.g. so you don't unspill the same register in
 two adjacent instructions).

That makes sense, I'll send a v2 with that change.

 In either case it seems like the overall cost of spilling a register
 would be decreased in cases where this heuristic can be applied, would
 it make sense to update the cost metric accordingly?

Yeah, I guess so. I'll do that too.
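
As a rough illustration of why the heuristic lowers the spill cost, the following C sketch counts scratch reads for an instruction stream with and without reusing a value that was written by the immediately preceding instruction. It is a simplified model under stated assumptions: it ignores the predication and writemask/swizzle checks that the real patch performs, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction: it only records whether it
 * reads and/or writes the register chosen for spilling.
 */
struct sketch_inst {
   bool reads_spilled;
   bool writes_spilled;
};

/* Count the scratch reads (unspills) needed for the stream. When
 * reuse_after_write is set, a read that immediately follows a write of
 * the spilled register reuses the just-written value instead of
 * loading it back from scratch.
 */
static unsigned
sketch_count_scratch_reads(const struct sketch_inst *insts, size_t count,
                           bool reuse_after_write)
{
   unsigned reads = 0;
   bool prev_wrote = false;

   for (size_t i = 0; i < count; i++) {
      if (insts[i].reads_spilled && !(reuse_after_write && prev_wrote))
         reads++;
      prev_wrote = insts[i].writes_spilled;
   }

   return reads;
}
```

For the three-mov example in the commit message (a write, a read right after it, and a later read), this model gives two scratch reads without the heuristic and one with it, which is the saving the cost metric would need to reflect.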

 One more comment inline.
 
  ---
   .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 39 
  +-
   1 file changed, 38 insertions(+), 1 deletion(-)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
  b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  index 80ab813..5fed2f9 100644
  --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
  @@ -334,6 +334,18 @@ vec4_visitor::choose_spill_reg(struct ra_graph *g)
  return ra_get_best_spill_node(g);
   }
   
  +static bool
  +writemask_matches_swizzle(unsigned writemask, unsigned swizzle)
  +{
  +   for (int i = 0; i < 4; i++) {
  +  unsigned channel = 1 << BRW_GET_SWZ(swizzle, i);
  +  if (!(writemask & channel))
  + return false;
  +   }
  +
  +   return true;
  +}
  +
   void
   vec4_visitor::spill_reg(int spill_reg_nr)
   {
  @@ -341,11 +353,33 @@ vec4_visitor::spill_reg(int spill_reg_nr)
  unsigned int spill_offset = last_scratch++;
   
  /* Generate spill/unspill instructions for the objects being spilled. */
  +   vec4_instruction *spill_write_inst = NULL;
  foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
  +  /* We don't spill registers used for scratch */
  +  if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
  +  inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
  + continue;
  +
 int scratch_reg = -1;
 for (unsigned int i = 0; i < 3; i++) {
if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) {
  -if (scratch_reg == -1) {
  +/* If we are reading the spilled register right after writing
  + * to it we can skip the scratch read and use directly the
  + * register we used as source for the scratch write. For this
  + * to work we must check that:
  + *
  + * 1) The write is unconditional, that is, it is not predicated or
  + *    it is a SEL.
  + * 2) All the channels that we read have been written in that
  + *last write instruction.
  + */
  +if (spill_write_inst &&
  +(!spill_write_inst->predicate ||
  + spill_write_inst->opcode == BRW_OPCODE_SEL) &&
  +writemask_matches_swizzle(spill_write_inst->dst.writemask,
  +  inst->src[i].swizzle)) {
 
 brw_mask_for_swizzle() returns the mask of components accessed by a
 swizzle, you could just AND it with ~spill_write_inst->dst.writemask to
 find out whether it's contained in the destination of the previous
 instruction.

Ah nice, thanks for the tip!

Iago
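
The suggestion above can be sketched in standalone C as follows. This is not the Mesa helper itself: the SKETCH_* macros and sketch_* functions are hypothetical stand-ins, and the swizzle encoding (two bits per channel selector, channel 0 in the low bits) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: four 2-bit selectors (0=x, 1=y, 2=z, 3=w). */
#define SKETCH_SWIZZLE4(a, b, c, d) \
   (((a) << 0) | ((b) << 2) | ((c) << 4) | ((d) << 6))
#define SKETCH_GET_SWZ(swz, idx) (((swz) >> ((idx) * 2)) & 0x3)

/* Mask of the components that a swizzle reads. */
static unsigned
sketch_mask_for_swizzle(unsigned swizzle)
{
   unsigned mask = 0;

   for (int i = 0; i < 4; i++)
      mask |= 1u << SKETCH_GET_SWZ(swizzle, i);

   return mask;
}

/* True if every component read through "swizzle" was written by an
 * instruction whose destination writemask is "writemask": the read
 * mask ANDed with the complement of the writemask must be empty.
 */
static bool
sketch_read_covered_by_write(unsigned writemask, unsigned swizzle)
{
   return (sketch_mask_for_swizzle(swizzle) & ~writemask) == 0;
}
```

Compared with the per-channel loop in the patch, the containment test becomes a single AND once the read mask is available.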

  +   scratch_reg = spill_write_inst->dst.reg;
  +} else if (scratch_reg == -1) {
  scratch_reg = alloc.allocate(1);
  src_reg temp = inst-src[i];
  temp.reg = scratch_reg;
  @@ -358,6 +392,9 @@ vec4_visitor::spill_reg(int spill_reg_nr)
   
 if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) {
emit_scratch_write(block, inst, spill_offset);
  + spill_write_inst = inst;
  +  } else {
  + spill_write_inst = NULL;
 }
  }
   
  -- 
  1.9.1
 

Re: [Mesa-dev] [PATCH 07/14] mesa: Set green, blue channels to zero only for formats with these components

2015-07-22 Thread Iago Toral
On Tue, 2015-07-21 at 11:13 -0700, Anuj Phogat wrote:
 On Tue, Jul 21, 2015 at 12:50 AM, Iago Toral ito...@igalia.com wrote:
  On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
  Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
  ---
   src/mesa/drivers/common/meta.c | 13 ++---
   1 file changed, 10 insertions(+), 3 deletions(-)
 
  diff --git a/src/mesa/drivers/common/meta.c 
  b/src/mesa/drivers/common/meta.c
  index 214a68a..fceb25d 100644
  --- a/src/mesa/drivers/common/meta.c
  +++ b/src/mesa/drivers/common/meta.c
  @@ -3132,9 +3132,16 @@ decompress_texture_image(struct gl_context *ctx,
  * returned as red and two-channel texture values are returned as
  * red/alpha.
  */
  -  if ((baseTexFormat == GL_LUMINANCE ||
  -   baseTexFormat == GL_LUMINANCE_ALPHA ||
  -   baseTexFormat == GL_INTENSITY) ||
  +  if (((baseTexFormat == GL_LUMINANCE ||
  +baseTexFormat == GL_LUMINANCE_ALPHA ||
  +baseTexFormat == GL_INTENSITY) &&
  +   (destBaseFormat == GL_RGBA ||
  +destBaseFormat == GL_RGB ||
  +destBaseFormat == GL_RG ||
  +destBaseFormat == GL_GREEN ||
  +destBaseFormat == GL_BLUE ||
  +destBaseFormat == GL_BGRA ||
  +destBaseFormat == GL_BGR)) ||
 
  Is this needed to achieve correct behavior or just an optimization? I
  would expect that if the dest format does not have G/B channels, setting
  pixel transfer options for these channels would not have any functional
  effect anyway.
 
 This is just an optimization. We set pixel transfer operations based on these
 conditions and then call _mesa_ReadPixels, which falls back to slower path
 if transfer operations are set. I'll bump up the commit message of this patch.

Yes, a note in the commit log is worth it, thanks,

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

 /* If we're reading back an RGB(A) texture (using 
  glGetTexImage) as
   * luminance then we need to return L=tex(R).
   */
 
 
 




Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion

2015-07-22 Thread Iago Toral
On Tue, 2015-07-21 at 17:05 -0700, Anuj Phogat wrote:
 On Tue, Jul 21, 2015 at 1:36 AM, Iago Toral ito...@igalia.com wrote:
  On Tue, 2015-07-21 at 08:13 +0200, Iago Toral wrote:
  On Mon, 2015-07-20 at 10:56 -0700, Anuj Phogat wrote:
   On Mon, Jul 20, 2015 at 5:10 AM, Iago Toral ito...@igalia.com wrote:
On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote:
On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand 
ja...@jlekstrand.net wrote:

 On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote:

 Without this patch, piglit test fbo_integer_readpixels_sint_uint 
 fails,
 when
 forced to use the meta pbo path.

 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/mesa/drivers/common/meta_tex_subimage.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/src/mesa/drivers/common/meta_tex_subimage.c
 b/src/mesa/drivers/common/meta_tex_subimage.c
 index 00364f8..84cbc50 100644
 --- a/src/mesa/drivers/common/meta_tex_subimage.c
 +++ b/src/mesa/drivers/common/meta_tex_subimage.c
 @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct 
 gl_context *ctx,
 GLuint dims,

if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
   return false;
 +
 +  if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format, type))
 + return false;

 Hrm... This seems fishy.  Isn't glBlitFramebuffers supposed to 
 handle format
 conversion with integers?  If so we should probably fix it rather 
 than just
 skip it for the meta pbo path.

As discussed offline, here is relevant text for glBlitFrameBuffer() 
from
OpenGL 4.5 spec, section 18.3.1:
An INVALID_OPERATION error is generated if format conversions are not
supported, which occurs under any of the following conditions:
-The read buffer contains fixed-point or floating-point values and 
any draw
  buffer contains neither fixed-point nor floating-point values.
-The read buffer contains unsigned integer values and any draw buffer 
does
  not contain unsigned integer values.
- The read buffer contains signed integer values and any draw buffer 
does
  not contain signed integer values.
   
I'll add a comment here explaining the reason to avoid meta path.
   
Is this code going to run only for glBlitFramebuffer? I see this
function being called from code paths that implement glReadPixels and
glGetTexImage too.
   
   _mesa_meta_pbo_GetTexSubImage() is used only for glReadPixels and
   glGetTexImage. I quoted the glBlitFrameBuffer restriction above because
   the function is later using _mesa_meta_BlitFramebuffer(), which doesn't
   support some format conversions.
 
  If this function can be used to resolve ReadPixels and GetTexImage but
  the checks you add are *specific* to BlitFramebuffer, it does not look
  like this is the right place for them. Shouldn't you put them inside
  _mesa_meta_BlitFramebuffer instead? Otherwise they would affect
  ReadPixels and GetTexImage too and I don't see the same restrictions
  applying to ReadPixels for example.
 We already have error checks in place for glBlitFrameBuffer(). Take a
 look at compatible_color_datatypes() in _mesa_blit_framebuffer().
 
  Specifically for ReadPixels I only see this in the spec:
 
  An INVALID_OPERATION error is generated if format is an integer format
  and the color buffer is not an integer format, or if the color buffer is
  an integer format and format is not an integer format.
 
  So, unlike BlitFramebuffer, it seems that ReadPixels is fine as long as
  both formats are integer, no matter if the types have the same sign or
  not.
 Right. That's the reason this patch doesn't generate any GL error for
 signed-unsigned int mismatch. It just decides not to use meta pbo path
 because of  unsupported format conversions in _mesa_meta_BlitFrameBuffer(),
 and fallback to using other paths.

Ah, I see now, thanks for clarifying this.
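
The distinction that resolved this thread can be written down as a small C sketch. It encodes only the two spec rules quoted above; the enum and function names are hypothetical and are not the checks Mesa actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* The three value classes the quoted BlitFramebuffer text distinguishes. */
enum sketch_datatype {
   SKETCH_FIXED_OR_FLOAT,
   SKETCH_UNSIGNED_INT,
   SKETCH_SIGNED_INT
};

static bool
sketch_is_integer(enum sketch_datatype type)
{
   return type != SKETCH_FIXED_OR_FLOAT;
}

/* BlitFramebuffer: read and draw buffers must be in the same class. */
static bool
sketch_blit_conversion_supported(enum sketch_datatype src,
                                 enum sketch_datatype dst)
{
   return src == dst;
}

/* ReadPixels: only integer versus non-integer has to match, so a
 * signed to unsigned readback is legal.
 */
static bool
sketch_readpixels_allowed(enum sketch_datatype src,
                          enum sketch_datatype dst)
{
   return sketch_is_integer(src) == sketch_is_integer(dst);
}

/* A ReadPixels fast path built on top of a blit can only be taken when
 * the blit can do the conversion; otherwise it must decline without
 * raising a GL error so that another path handles the request.
 */
static bool
sketch_can_use_blit_based_path(enum sketch_datatype src,
                               enum sketch_datatype dst)
{
   return sketch_readpixels_allowed(src, dst) &&
          sketch_blit_conversion_supported(src, dst);
}
```

A signed to unsigned integer readback is therefore valid for ReadPixels but has to skip the blit-based path, which matches what the patch does by returning false.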

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

 
  Iago
 
 
 }

 /* For arrays, use a tall (height * depth) 2D texture but 
 taking into
 --
 1.9.3

   
   
  
 
 
 
 
 



Re: [Mesa-dev] [PATCH v2 06/14] meta: Don't do fragment color clamping in _mesa_meta_pbo_GetTexSubImage

2015-07-22 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Wed, 2015-07-22 at 11:54 -0700, Anuj Phogat wrote:
 _mesa_meta_pbo_GetTexSubImage() uses _mesa_meta_BlitFrameBuffer(),
 which will do fragment clamping if enabled. But fragment clamping
 doesn't affect ReadPixels and GetTexImage.
 
 Without this patch, piglit test arb_color_buffer_float-clear fails,
 when forced to use the meta pbo path.
 
 v2: Apply this fix to both glReadPixels and glGetTexImage.
 
 Cc: mesa-sta...@lists.freedesktop.org
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/common/meta_tex_subimage.c | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
 b/src/mesa/drivers/common/meta_tex_subimage.c
 index 6d7c1eb..038d525 100644
 --- a/src/mesa/drivers/common/meta_tex_subimage.c
 +++ b/src/mesa/drivers/common/meta_tex_subimage.c
 @@ -25,6 +25,7 @@
   *Jason Ekstrand jason.ekstr...@intel.com
   */
  
 +#include "blend.h"
  #include "bufferobj.h"
  #include "buffers.h"
  #include "fbobject.h"
 @@ -331,6 +332,10 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, GLuint dims,
 _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER |
 MESA_META_PIXEL_STORE));
  
 +   /* GL_CLAMP_FRAGMENT_COLOR doesn't affect ReadPixels and GetTexImage */
 +   if (ctx->Extensions.ARB_color_buffer_float)
 +  _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
 +
 _mesa_GenFramebuffers(2, fbos);
  
 if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {




Re: [Mesa-dev] [PATCH 13/14] mesa: Change the signature of _mesa_need_rgb_to_luminance_conversion()

2015-07-22 Thread Iago Toral
Looks good to me, if this did not introduce any regressions:

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 This allows us to handle cases when texImage-_BaseFormat doesn't match
 _mesa_format_get_base_format(texImage-Format). _BaseFormat is what we
 care about in this function.
 
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/common/meta_tex_subimage.c |  4 +++-
  src/mesa/main/readpix.c | 28 +++-
  src/mesa/main/readpix.h |  3 ++-
  3 files changed, 20 insertions(+), 15 deletions(-)
 
 diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
 b/src/mesa/drivers/common/meta_tex_subimage.c
 index 6d52014..43e1210 100644
 --- a/src/mesa/drivers/common/meta_tex_subimage.c
 +++ b/src/mesa/drivers/common/meta_tex_subimage.c
 @@ -262,6 +262,7 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
 GLuint dims,
 int full_height, image_height;
 struct gl_texture_image *pbo_tex_image;
 struct gl_renderbuffer *rb = NULL;
 +   GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format);
 GLenum status, base_format;
 bool success = false, clear_channels_to_zero = false;
 float save_clear_color[4];
 @@ -284,7 +285,8 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
 GLuint dims,
  type, GL_FALSE))
   return false;
  
 -  if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
 +  if (_mesa_need_rgb_to_luminance_conversion(rb->_BaseFormat,
 + dstBaseFormat))
   return false;
  
if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format, type))
 diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
 index c98975f..3a9b766 100644
 --- a/src/mesa/main/readpix.c
 +++ b/src/mesa/main/readpix.c
 @@ -47,17 +47,14 @@
   * Return true if the conversion L=R+G+B is needed.
   */
  GLboolean
 -_mesa_need_rgb_to_luminance_conversion(mesa_format texFormat, GLenum format)
 +_mesa_need_rgb_to_luminance_conversion(GLenum srcBaseFormat,
 +   GLenum dstBaseFormat)
  {
 -   GLenum baseTexFormat = _mesa_get_format_base_format(texFormat);
 -
 -   return (baseTexFormat == GL_RG ||
 -   baseTexFormat == GL_RGB ||
 -   baseTexFormat == GL_RGBA) &&
 -  (format == GL_LUMINANCE ||
 -   format == GL_LUMINANCE_ALPHA ||
 -   format == GL_LUMINANCE_INTEGER_EXT ||
 -   format == GL_LUMINANCE_ALPHA_INTEGER_EXT);
 +   return (srcBaseFormat == GL_RG ||
 +   srcBaseFormat == GL_RGB ||
 +   srcBaseFormat == GL_RGBA) &&
 +  (dstBaseFormat == GL_LUMINANCE ||
 +   dstBaseFormat == GL_LUMINANCE_ALPHA);
  }
  
  /**
 @@ -89,6 +86,8 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context 
 *ctx,
GLboolean uses_blit)
  {
 GLbitfield transferOps = ctx->_ImageTransferState;
 +   GLenum srcBaseFormat = _mesa_get_format_base_format(texFormat);
 +   GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format);
  
 if (format == GL_DEPTH_COMPONENT ||
 format == GL_DEPTH_STENCIL ||
 @@ -125,7 +124,7 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context 
 *ctx,
  * have any effect anyway.
  */
 if (_mesa_get_format_datatype(texFormat) == GL_UNSIGNED_NORMALIZED &&
 -   !_mesa_need_rgb_to_luminance_conversion(texFormat, format)) {
 +   !_mesa_need_rgb_to_luminance_conversion(srcBaseFormat, dstBaseFormat)) {
transferOps &= ~IMAGE_CLAMP_BIT;
 }
  
 @@ -164,6 +163,7 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
  {
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 +   GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format);
  
 assert(rb);
  
 @@ -184,7 +184,8 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
  
 default:
/* Color formats. */
 -  if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format)) {
 +  if (_mesa_need_rgb_to_luminance_conversion(rb->_BaseFormat,
 + dstBaseFormat)) {
   return GL_TRUE;
}
  
 @@ -458,6 +459,7 @@ read_rgba_pixels( struct gl_context *ctx,
 uint8_t rebase_swizzle[4];
 struct gl_framebuffer *fb = ctx->ReadBuffer;
 struct gl_renderbuffer *rb = fb->_ColorReadBuffer;
 +   GLenum dstBaseFormat = _mesa_unpack_format_to_base_format(format);
  
 if (!rb)
return;
 @@ -469,7 +471,7 @@ read_rgba_pixels( struct gl_context *ctx,
 dst_stride = _mesa_image_row_stride(packing, width, format, type);
 dst_format = _mesa_format_from_format_and_type(format, type);
 convert_rgb_to_lum =
 -  _mesa_need_rgb_to_luminance_conversion(rb->Format, format

Re: [Mesa-dev] [PATCH 14/14] meta: Use _mesa_need_rgb_to_luminance_conversion() in decompress_texture_image()

2015-07-22 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/common/meta.c | 7 ++-
  1 file changed, 2 insertions(+), 5 deletions(-)
 
 diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
 index 6108d98..e123500 100644
 --- a/src/mesa/drivers/common/meta.c
 +++ b/src/mesa/drivers/common/meta.c
 @@ -3137,11 +3137,8 @@ decompress_texture_image(struct gl_context *ctx,
/* If we're reading back an RGB(A) texture (using glGetTexImage) as
  * luminance then we need to return L=tex(R).
  */
 -  ((baseTexFormat == GL_RGBA ||
 -baseTexFormat == GL_RGB  ||
 -baseTexFormat == GL_RG) &&
 -  (destBaseFormat == GL_LUMINANCE ||
 -   destBaseFormat == GL_LUMINANCE_ALPHA))) {
 +  _mesa_need_rgb_to_luminance_conversion(baseTexFormat,
 + destBaseFormat)) {
   /* Green and blue must be zero */
   _mesa_PixelTransferf(GL_GREEN_SCALE, 0.0f);
   _mesa_PixelTransferf(GL_BLUE_SCALE, 0.0f);


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V2 12/14] meta: Fix reading luminance texture as rgba in _mesa_meta_pbo_GetTexSubImage()

2015-07-22 Thread Iago Toral
The problem here is that _mesa_meta_BlitFramebuffer is not setting the
G/B channels to 0.0 when doing Luminance/Intensity to RGBA conversions,
so why not implement the fix in _mesa_meta_BlitFramebuffer directly? The
GL spec expects framebuffer blits to handle these conversions properly,
so it looks like a win for all users of that function.
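
For reference, a minimal sketch of the rebase being discussed, assuming
the rule stated in the patch comment below (Luminance to RGBA wants G=0
and B=0) and the color mask the patch uses (alpha is left alone only for
LUMINANCE_ALPHA sources). The enum and function names are hypothetical
stand-ins for illustration, not Mesa API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GL base formats discussed above. */
enum base_format { BASE_LUMINANCE, BASE_LUMINANCE_ALPHA, BASE_RGBA };

/* Rebase one RGBA texel that was produced from a luminance source:
 * keep R (which carries L), force G and B to zero, and force A to 1
 * unless the source actually had an alpha channel.
 */
static void
rebase_luminance_texel(enum base_format src, float rgba[4])
{
   assert(rgba != NULL);

   if (src != BASE_LUMINANCE && src != BASE_LUMINANCE_ALPHA)
      return; /* RGBA sources need no rebasing */

   rgba[1] = 0.0f;
   rgba[2] = 0.0f;
   if (src == BASE_LUMINANCE)
      rgba[3] = 1.0f;
}
```

Doing this once inside the blit helper, rather than in each caller, is
the design choice suggested above.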

Iago

On Fri, 2015-07-17 at 10:28 -0700, Anuj Phogat wrote:
 After recent addition of pbo testing in piglit test getteximage-luminance,
 it fails on i965. This patch makes a sub test pass.
 
 This patch adds a clear color operation to meta pbo path, which I think is
 better than falling back to software path.
 
 V2: Fix color mask for GL_LUMINANCE_ALPHA
 
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/mesa/drivers/common/meta_tex_subimage.c | 36 
 +++--
  1 file changed, 34 insertions(+), 2 deletions(-)
 
 diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
 b/src/mesa/drivers/common/meta_tex_subimage.c
 index 13f8292..f4d5ac3 100644
 --- a/src/mesa/drivers/common/meta_tex_subimage.c
 +++ b/src/mesa/drivers/common/meta_tex_subimage.c
 @@ -28,6 +28,7 @@
  #include "blend.h"
  #include "bufferobj.h"
  #include "buffers.h"
 +#include "clear.h"
  #include "fbobject.h"
  #include "glformats.h"
  #include "glheader.h"
 @@ -278,8 +279,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
 GLuint dims,
 int full_height, image_height;
 struct gl_texture_image *pbo_tex_image;
 struct gl_renderbuffer *rb = NULL;
 -   GLenum status;
 -   bool success = false;
 +   GLenum status, src_base_format;
 +   bool success = false, clear_channels_to_zero = false;
 +   float save_clear_color[4];
 int z;
  
 if (!_mesa_is_bufferobj(packing->BufferObj))
 @@ -380,6 +382,27 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
 GLuint dims,
GL_COLOR_BUFFER_BIT, GL_NEAREST))
goto fail;
  
 +   src_base_format = tex_image ?
 + tex_image->_BaseFormat :
 + ctx->ReadBuffer->_ColorReadBuffer->_BaseFormat;
 +
 +   /* Depending on the base formats involved we might need to rebase some
 +* values. For example if we download from a Luminance format to RGBA
 +* format, we want G=0 and B=0.
 +*/
 +   clear_channels_to_zero =
 +  _mesa_need_luminance_to_rgb_conversion(src_base_format,
 + pbo_tex_image->_BaseFormat);
 +
 +   if (clear_channels_to_zero) {
 +  memcpy(save_clear_color, ctx->Color.ClearColor.f, 4 * sizeof(float));
 +  /* Clear the Green, Blue channels. */
 +  _mesa_ColorMask(GL_FALSE, GL_TRUE, GL_TRUE,
 +  src_base_format != GL_LUMINANCE_ALPHA);
 +  _mesa_ClearColor(0.0, 0.0, 0.0, 1.0);
 +  _mesa_Clear(GL_COLOR_BUFFER_BIT);
 +   }
 +
 for (z = 1; z < depth; z++) {
_mesa_meta_bind_fbo_image(GL_READ_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
  tex_image, zoffset + z);
 @@ -392,6 +415,15 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
 GLuint dims,
   0, z * image_height,
   width, z * image_height + height,
   GL_COLOR_BUFFER_BIT, GL_NEAREST);
 +  if (clear_channels_to_zero)
 + _mesa_Clear(GL_COLOR_BUFFER_BIT);
 +   }
 +
 +   /* Unmask the color channels and restore the saved clear color values. */
 +   if (clear_channels_to_zero) {
 +  _mesa_ColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
 +  _mesa_ClearColor(save_clear_color[0], save_clear_color[1],
 +   save_clear_color[2], save_clear_color[3]);
 }
  
 success = true;




Re: [Mesa-dev] [PATCH 11/14] meta: Use _mesa_need_luminance_to_rgb_conversion() in decompress_texture_image()

2015-07-22 Thread Iago Toral
Patches 10-11 are

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/common/meta.c | 12 ++--
  1 file changed, 2 insertions(+), 10 deletions(-)
 
 diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
 index c9e58d8..6108d98 100644
 --- a/src/mesa/drivers/common/meta.c
 +++ b/src/mesa/drivers/common/meta.c
 @@ -3132,16 +3132,8 @@ decompress_texture_image(struct gl_context *ctx,
 * returned as red and two-channel texture values are returned as
 * red/alpha.
 */
 -  if (((baseTexFormat == GL_LUMINANCE ||
 -baseTexFormat == GL_LUMINANCE_ALPHA ||
 -baseTexFormat == GL_INTENSITY) &&
 -   (destBaseFormat == GL_RGBA ||
 -destBaseFormat == GL_RGB ||
 -destBaseFormat == GL_RG ||
 -destBaseFormat == GL_GREEN ||
 -destBaseFormat == GL_BLUE ||
 -destBaseFormat == GL_BGRA ||
 -destBaseFormat == GL_BGR)) ||
 +  if (_mesa_need_luminance_to_rgb_conversion(baseTexFormat,
 + destBaseFormat) ||
/* If we're reading back an RGB(A) texture (using glGetTexImage) as
  * luminance then we need to return L=tex(R).
  */




Re: [Mesa-dev] [PATCH] glsl: free interface_types and replace old hash_table uses

2015-07-13 Thread Iago Toral
On Sat, 2015-07-11 at 10:13 +1000, Timothy Arceri wrote:
 The util/hash_table was intended to be a fast hash table
 replacement for the program/hash_table; see 35fd61bd99c1 and 72e55bb6888ff.
 
 This replaces some more uses of the old hash table and also
 destroys the interface_types hash table when _mesa_glsl_release_types()
 is called which wasn't previously being done.
 ---
  I was looking at the remaining program/hash_table uses and noticed that
  interface_types wasn't being freed, so I thought I'd fix that and replace
  the hash table while I was there.
 
  No measurable compile time changes to the public shader-db
 
  src/glsl/glsl_types.cpp | 85 
 ++---
  src/glsl/glsl_types.h   |  2 +-
  2 files changed, 46 insertions(+), 41 deletions(-)
 
 diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
 index 281ff51..255bd69 100644
 --- a/src/glsl/glsl_types.cpp
 +++ b/src/glsl/glsl_types.cpp
 @@ -25,7 +25,7 @@
  #include "main/core.h" /* for Elements, MAX2 */
  #include "glsl_parser_extras.h"
  #include "glsl_types.h"
 -#include "program/hash_table.h"
 +#include "util/hash_table.h"
  
 
  mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP;
 @@ -329,14 +329,19 @@ _mesa_glsl_release_types(void)
  * necessary.
  */
 if (glsl_type::array_types != NULL) {
 -  hash_table_dtor(glsl_type::array_types);
 +  _mesa_hash_table_destroy(glsl_type::array_types, NULL);
glsl_type::array_types = NULL;
 }
  
 if (glsl_type::record_types != NULL) {
 -  hash_table_dtor(glsl_type::record_types);
 +  _mesa_hash_table_destroy(glsl_type::record_types, NULL);
glsl_type::record_types = NULL;
 }
 +
 +   if (glsl_type::interface_types != NULL) {
 +  _mesa_hash_table_destroy(glsl_type::interface_types, NULL);
 +  glsl_type::interface_types = NULL;
 +   }

I think it is probably best to put the destruction of interface_types in
a separate patch, since it is a different issue after all. You can add my
Reviewed-by on that patch.

With that and a couple of other minor nitpicks I mention below fixed,
this is:
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

  }
  
 
 @@ -648,27 +653,28 @@ glsl_type::get_array_instance(const glsl_type *base, 
 unsigned array_size)
 mtx_lock(&glsl_type::mutex);
  
 if (array_types == NULL) {
 -  array_types = hash_table_ctor(64, hash_table_string_hash,
 - hash_table_string_compare);
 +  array_types = _mesa_hash_table_create(NULL, _mesa_key_hash_string,
 +_mesa_key_string_equal);
 }
  
 -   const glsl_type *t = (glsl_type *) hash_table_find(array_types, key);
 -
 -   if (t == NULL) {
 +   const struct hash_entry *entry = _mesa_hash_table_search(array_types, 
 key);
 +   if (entry == NULL) {
mtx_unlock(&glsl_type::mutex);
 -  t = new glsl_type(base, array_size);
 +  const glsl_type *t = new glsl_type(base, array_size);
mtx_lock(&glsl_type::mutex);
  
 -  hash_table_insert(array_types, (void *) t, ralloc_strdup(mem_ctx, 
 key));
 +  entry = _mesa_hash_table_insert(array_types,
 +  ralloc_strdup(mem_ctx, key),
 +  (void *) t);
 }
  
 -   assert(t->base_type == GLSL_TYPE_ARRAY);
 -   assert(t->length == array_size);
 -   assert(t->fields.array == base);
 +   assert(((glsl_type *)entry->data)->base_type == GLSL_TYPE_ARRAY);
 +   assert(((glsl_type *)entry->data)->length == array_size);
 +   assert(((glsl_type *)entry->data)->fields.array == base);

Other parts of this file put a blank between the type cast and the
variable, so I would add that here (and in all other places where you
cast entry to glsl_type* in this patch).
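
To illustrate the spacing I mean, here is a small C sketch. The struct
definitions are hypothetical, simplified stand-ins (the real code is C++
and uses glsl_type and the util/hash_table entry type). Binding the cast
result to a local once is an extra idea of this sketch, not something the
patch has to do:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real types. */
struct hash_entry { const void *key; void *data; };
struct type_info { unsigned length; };

static unsigned
entry_length(const struct hash_entry *entry)
{
   assert(entry != NULL);

   /* Blank between the cast and the expression, as elsewhere in the file. */
   const struct type_info *t = (const struct type_info *) entry->data;
   return t->length;
}
```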

 mtx_unlock(&glsl_type::mutex);
  
 -   return t;
 +   return (glsl_type *)entry->data;
  }
  
 
 @@ -722,19 +728,13 @@ glsl_type::record_compare(const glsl_type *b) const
  }
  
 
 -int
 +bool
  glsl_type::record_key_compare(const void *a, const void *b)
  {
 const glsl_type *const key1 = (glsl_type *) a;
 const glsl_type *const key2 = (glsl_type *) b;
  
 -   /* Return zero is the types match (there is zero difference) or non-zero
 -* otherwise.
 -*/
 -   if (strcmp(key1->name, key2->name) != 0)
 -  return 1;
 -
 -   return !key1->record_compare(key2);
 +   return strcmp(key1->name, key2->name) == 0 && key1->record_compare(key2);
  }
  
 
 @@ -772,25 +772,27 @@ glsl_type::get_record_instance(const glsl_struct_field 
 *fields,
 mtx_lock(&glsl_type::mutex);
  
 if (record_types == NULL) {
 -  record_types = hash_table_ctor(64, record_key_hash, 
 record_key_compare);
 +  record_types = _mesa_hash_table_create(NULL, record_key_hash,
 + record_key_compare);
 }
  
 -   const glsl_type *t = (glsl_type *) hash_table_find(record_types,  key);
 -   if (t == NULL) {
 +   const struct hash_entry *entry = _mesa_hash_table_search(record_types

Re: [Mesa-dev] [PATCH v3 (part2) 00/56] ARB_shader_storage_buffer_object (mesa, i965)

2015-07-14 Thread Iago Toral
BTW, notice that patches 1-3, 5-7 and 51 have already been reviewed by
Jordan.

Iago

On Tue, 2015-07-14 at 09:46 +0200, Iago Toral Quiroga wrote:
 This is the second part of the v3 series including remaining frontend bits
 like the optional unsized array at the bottom of SSBO definitions, layout
 mode std430 and implementation for GL queries/getters. It also includes
 all of the i965 backend implementation.
 
 Notice that this series depends on the following patches from
 Curro that have not landed in master yet:
 
 i965/vec4: Import surface message builder functions.
 i965/vec4: Import helpers to convert vectors into arrays and back.
 i965/vec4: Import array utils for the surface message builder.
 i965/fs: Import surface message builder functions.
 i965/fs: Import helpers to convert vectors into arrays and back.
 i965/fs: Import array utils for the surface message builder.
 i965: Lift the constness restriction on surface indices passed to untyped ops.
 i965/vec4: Introduce VEC4 IR builder.
 i965/vec4: Make src_reg conversion constructor from dst_reg implicit.
 i965: Add register constructors taking a backend_reg as argument.
 i965/fs: Fix register coalesce not to lose track of the second half of 
 16-wide moves.
 
 This v3-part2 series, together with these patches from Curro and part1 (which
 has already been landed in master), can be obtained from this development
 repository:
 
 git clone -b itoral-ARB_shader_storage_buffer_object-v3-part2 
 https://github.com/Igalia/mesa.git
 
 Also notice that:
 * Patch 55 is a separate bugfix that affects UBOs as well.
 * Patch 56 is not required by the series and is completely optional.
 
 SSBO piglit tests have already been merged in piglit's master branch.
 
 Antia Puentes (1):
   glsl: Consider active all elements of a shared/std140 block array
 
 Iago Toral Quiroga (29):
   i965: Use 16-byte offset alignment for shader storage buffers
   i965: Implement DriverFlags.NewShaderStorageBuffer
   i965: Set MaxShaderStorageBuffers for compute shaders
   i965: Upload Shader Storage Buffer Object surfaces
   i965: handle visiting of ir_var_shader_storage variables
   i965/fs: Do not split buffer variables
   i965/fs: Do not include the header with a pixel mask in untyped read
 messages
   i965/vec4: Implement SSBO writes
   nir: Implement __intrinsic_store_ssbo
   i965/nir/fs: Implement SSBO writes
   i965/vec4: Implement SSBO reads
   nir: Implement __intrinsic_store_load
   i965/nir/fs: Implement SSBO reads
   nir: ignore an instruction's dest if it hasn't any
   glsl: Rename atomic counter functions
   glsl: Add atomic functions from ARB_shader_storage_buffer_object
   glsl: lower SSBO atomic intrinsics
   nir: Implement lowered SSBO atomic intrinsics
   i965/nir/fs: Implement SSBO atomics
   i965/vec4: Implement lowered SSBO atomic intrinsics
   glsl: First argument to atomic functions must be a buffer variable
   mesa: Add queries for GL_SHADER_STORAGE_BUFFER
   glsl: Allow use of memory qualifiers with
 ARB_shader_storage_buffer_object.
   glsl: Apply memory qualifiers to buffer variables
   glsl: Allow memory layout qualifiers on shader storage buffer objects
   glsl: Do not allow assignments to read-only variables
   glsl: Do not allow reads from write-only variables
   docs: Mark ARB_shader_storage_buffer_object as done for i965.
   i965/vec4: Skip dependency control for opcodes emitting multiple
 instructions
 
 Kristian Høgsberg (1):
   glsl: atomic counters can be declared as buffer-qualified variables
 
 Samuel Iglesias Gonsalvez (25):
   i965: set ARB_shader_storage_buffer_object related constant values
   glsl: add support for unsized arrays in shader storage blocks
   glsl: Add parser/compiler support for unsized array's length()
   glsl: Lower unsized array's length calculation expression
   nir: add shader storage buffer's unsized array length calculation
   i965/vec4: Implement unsized array's length calculation
   i965/fs: Implement generator code for unsized array's length
 calculation
   i965/fs/nir: implement unsized array's length calculation
   i965/wm: emit null buffer surfaces when null buffers are attached
   i965/wm: surfaces should have the API buffer size, not the drm buffer
 size
   glsl: Add parser/compiler support for std430 interface packing
 qualifier
   glsl: propagate interface packing information to arrays of scalars,
 vectors.
   glsl: propagate std430 packing qualifier to struct's members and array
 of structs
   glsl: add std430 interface packing support to ssbo store and unsized
 array length
   glsl: a shader storage buffer must be smaller than the maximum size
 allowed
   glsl: number of active shader storage blocks must be within allowed
 limits
   glsl: ignore buffer variables when counting uniform components
   glsl: shader storage blocks use different max block size values than
 uniforms
   mesa: add glShaderStorageBlockBinding()
   glsl: fix UNIFORM_BUFFER_START

Re: [Mesa-dev] [PATCH v3 (part2) 54/56] docs: Mark ARB_shader_storage_buffer_object as done for i965.

2015-07-14 Thread Iago Toral
Hi Mike,

On Tue, 2015-07-14 at 09:30 +, Mike Lothian wrote:
 Hi Iago
 
 
 Nice work, was it an oversight this wasn't enabled for GLES 3.1? Or is
 the implementation slightly different?

Good question. To be honest, I don't remember right now. I think I
checked this at some point and did not see any issues with enabling it
for GLES 3.1, but then I forgot to enable it. I need to double-check
this.

Iago

 
 Cheers
 
 
 Mike
 
 On Tue, 14 Jul 2015 at 08:48 Iago Toral Quiroga ito...@igalia.com
 wrote:
 
 ---
  docs/GL3.txt | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/docs/GL3.txt b/docs/GL3.txt
 index 33a282e..6427616 100644
 --- a/docs/GL3.txt
 +++ b/docs/GL3.txt
 @@ -164,7 +164,7 @@ GL 4.3, GLSL 4.30:
GL_ARB_program_interface_query   DONE
 (all drivers)
GL_ARB_robust_buffer_access_behavior not
 started
GL_ARB_shader_image_size in
 progress (Martin Peres)
 -  GL_ARB_shader_storage_buffer_object  in
 progress (Iago Toral, Samuel Iglesias)
 +  GL_ARB_shader_storage_buffer_object  DONE
 (i965)
GL_ARB_stencil_texturing DONE
 (i965/gen8+, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
GL_ARB_texture_buffer_range  DONE
 (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
GL_ARB_texture_query_levels  DONE
 (all drivers that support GLSL 1.30)
 --
 1.9.1
 


Re: [Mesa-dev] [PATCH 10/13] swrast: Avoid double promotion.

2015-07-14 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/mesa/swrast/s_aaline.c| 28 ++--
  src/mesa/swrast/s_aalinetemp.h|  4 ++--
  src/mesa/swrast/s_atifragshader.c |  4 ++--
  src/mesa/swrast/s_copypix.c   |  6 +++---
  src/mesa/swrast/s_drawpix.c   | 12 ++--
  src/mesa/swrast/s_fragprog.c  |  4 ++--
  src/mesa/swrast/s_lines.c |  4 ++--
  src/mesa/swrast/s_points.c| 10 +-
  src/mesa/swrast/s_span.c  | 10 +-
  src/mesa/swrast/s_texcombine.c|  6 +++---
  src/mesa/swrast/s_texfilter.c |  8 
  src/mesa/swrast/s_tritemp.h   |  2 +-
  src/mesa/swrast/s_zoom.c  |  2 +-
  13 files changed, 50 insertions(+), 50 deletions(-)
 
 diff --git a/src/mesa/swrast/s_aaline.c b/src/mesa/swrast/s_aaline.c
 index f3258e8..de5b42b 100644
 --- a/src/mesa/swrast/s_aaline.c
 +++ b/src/mesa/swrast/s_aaline.c
 @@ -116,11 +116,11 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, 
 GLfloat y1,
 const GLfloat b = pz * py;
 const GLfloat c = px * px + py * py;
 const GLfloat d = -(a * x0 + b * y0 + c * z0);
 -   if (a == 0.0 && b == 0.0 && c == 0.0 && d == 0.0) {
 -  plane[0] = 0.0;
 -  plane[1] = 0.0;
 -  plane[2] = 1.0;
 -  plane[3] = 0.0;
 +   if (a == 0.0F && b == 0.0F && c == 0.0F && d == 0.0F) {
 +  plane[0] = 0.0F;
 +  plane[1] = 0.0F;
 +  plane[2] = 1.0F;
 +  plane[3] = 0.0F;
 }
 else {
plane[0] = a;
 @@ -135,9 +135,9 @@ compute_plane(GLfloat x0, GLfloat y0, GLfloat x1, GLfloat 
 y1,
  static inline void
  constant_plane(GLfloat value, GLfloat plane[4])
  {
 -   plane[0] = 0.0;
 -   plane[1] = 0.0;
 -   plane[2] = -1.0;
 +   plane[0] = 0.0F;
 +   plane[1] = 0.0F;
 +   plane[2] = -1.0F;
 plane[3] = value;
  }
  
 @@ -160,8 +160,8 @@ static inline GLfloat
  solve_plane_recip(GLfloat x, GLfloat y, const GLfloat plane[4])
  {
 const GLfloat denom = plane[3] + plane[0] * x + plane[1] * y;
 -   if (denom == 0.0)
 -  return 0.0;
 +   if (denom == 0.0F)
 +  return 0.0F;
 else
return -plane[2] / denom;
  }
 @@ -374,7 +374,7 @@ segment(struct gl_context *ctx,
if (x0 < x1) {
   xLeft = x0 - line->halfWidth;
   xRight = x1 + line->halfWidth;
 - if (line->dy >= 0.0) {
 + if (line->dy >= 0.0F) {
  yBot = y0 - 3.0F * line->halfWidth;
  yTop = y0 + line->halfWidth;
   }
 @@ -386,7 +386,7 @@ segment(struct gl_context *ctx,
else {
   xLeft = x1 - line->halfWidth;
   xRight = x0 + line->halfWidth;
 - if (line->dy >= 0.0) {
 + if (line->dy >= 0.0F) {
  yBot = y1 - 3.0F * line->halfWidth;
  yTop = y1 + line->halfWidth;
   }
 @@ -420,7 +420,7 @@ segment(struct gl_context *ctx,
if (y0 < y1) {
   yBot = y0 - line->halfWidth;
   yTop = y1 + line->halfWidth;
 - if (line->dx >= 0.0) {
 + if (line->dx >= 0.0F) {
  xLeft = x0 - 3.0F * line->halfWidth;
  xRight = x0 + line->halfWidth;
   }
 @@ -432,7 +432,7 @@ segment(struct gl_context *ctx,
else {
   yBot = y1 - line->halfWidth;
   yTop = y0 + line->halfWidth;
 - if (line->dx >= 0.0) {
 + if (line->dx >= 0.0F) {
  xLeft = x1 - 3.0F * line->halfWidth;
  xRight = x1 + line->halfWidth;
   }
 diff --git a/src/mesa/swrast/s_aalinetemp.h b/src/mesa/swrast/s_aalinetemp.h
 index f1d078f..bebb131 100644
 --- a/src/mesa/swrast/s_aalinetemp.h
 +++ b/src/mesa/swrast/s_aalinetemp.h
 @@ -44,7 +44,7 @@ NAME(plot)(struct gl_context *ctx, struct LineInfo *line, 
 int ix, int iy)
  
 (void) swrast;
  
 -   if (coverage == 0.0)
 +   if (coverage == 0.0F)
return;
  
 line->span.end++;
 @@ -123,7 +123,7 @@ NAME(line)(struct gl_context *ctx, const SWvertex *v0, 
 const SWvertex *v1)
   ctx->Const.MinLineWidthAA,
   ctx->Const.MaxLineWidthAA);
  
 -   if (line.len == 0.0 || IS_INF_OR_NAN(line.len))
 +   if (line.len == 0.0F || IS_INF_OR_NAN(line.len))
return;
  
 INIT_SPAN(line.span, GL_LINE);
 diff --git a/src/mesa/swrast/s_atifragshader.c 
 b/src/mesa/swrast/s_atifragshader.c
 index 9e029db..2974dee 100644
 --- a/src/mesa/swrast/s_atifragshader.c
 +++ b/src/mesa/swrast/s_atifragshader.c
 @@ -436,13 +436,13 @@ execute_shader(struct gl_context *ctx, const struct 
 ati_fragment_shader *shader,
for (i = 0; i < 3; i++) {
   dst[optype][i] =
  (src[optype][2][i] >
 - 0.5) ? src[optype][0][i] : src[optype][1][i];
 + 0.5F) ? src[optype][0][i] : src[optype][1][i];
}
 }
 else {
dst[optype][3] =
   (src[optype][2][3

Re: [Mesa-dev] [PATCH 05/13] nir: Avoid double promotion.

2015-07-14 Thread Iago Toral
On Tue, 2015-07-14 at 11:30 +0200, Iago Toral wrote:
 Reviewed-by: Iago Toral Quiroga ito...@igalia.com

BTW, ldexpf and copysignf are C99 too, so I guess you'll need to wrap
these too.
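
A rough sketch of what such wrappers could look like, assuming a
toolchain that has the double-precision ldexp() but lacks the float
variants. The fallback_* names are hypothetical (not existing Mesa
helpers), and the copysign fallback assumes IEEE-754 binary32 floats:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fallback for ldexpf(): goes through double, so it is only
 * meant for toolchains that lack the C99 float variant.
 */
static inline float
fallback_ldexpf(float x, int e)
{
   return (float) ldexp((double) x, e);
}

/* Hypothetical fallback for copysignf(); assumes IEEE-754 binary32. */
static inline float
fallback_copysignf(float magnitude, float sign)
{
   uint32_t m, s;

   assert(sizeof(float) == sizeof(uint32_t));
   memcpy(&m, &magnitude, sizeof(m));
   memcpy(&s, &sign, sizeof(s));
   m = (m & 0x7fffffffu) | (s & 0x80000000u); /* keep magnitude, take sign */
   memcpy(&magnitude, &m, sizeof(magnitude));
   return magnitude;
}

/* Same shape as the ldexp opcode body in the patch, using the fallbacks. */
static inline float
ldexp_flush_denorm(float src0, int src1)
{
   float dst = fallback_ldexpf(src0, src1);
   /* flush denormals to zero */
   if (!isnormal(dst))
      dst = fallback_copysignf(0.0f, src0);
   return dst;
}
```

Note that isnormal() is itself a C99 macro, so a toolchain missing the
float functions may need a replacement for that as well.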

Iago

 On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
  ---
   src/glsl/nir/nir_opcodes.py | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)
  
  diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
  index 56e96d9..df5b7e2 100644
  --- a/src/glsl/nir/nir_opcodes.py
  +++ b/src/glsl/nir/nir_opcodes.py
  @@ -474,10 +474,10 @@ else
   )
   
   opcode(ldexp, 0, tfloat, [0, 0], [tfloat, tint], , 
  -dst = ldexp(src0, src1);
  +dst = ldexpf(src0, src1);
   /* flush denormals to zero. */
   if (!isnormal(dst))
  -   dst = copysign(0.0f, src0);
  +   dst = copysignf(0.0f, src0);
   )
   
   # Combines the first component of each input to make a 2-component vector.
 
 


Re: [Mesa-dev] [PATCH 11/13] program: Avoid double promotion.

2015-07-14 Thread Iago Toral
Needs wrappers for sinf, cosf, powf, logf.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/mesa/program/prog_execute.c | 30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)
 
 diff --git a/src/mesa/program/prog_execute.c b/src/mesa/program/prog_execute.c
 index 77274e2..2c52d0d 100644
 --- a/src/mesa/program/prog_execute.c
 +++ b/src/mesa/program/prog_execute.c
 @@ -623,7 +623,7 @@ _mesa_execute_program(struct gl_context * ctx,
  GLfloat a[4], result[4];
 fetch_vector1(&inst->SrcReg[0], machine, a);
  result[0] = result[1] = result[2] = result[3]
 -   = (GLfloat) cos(a[0]);
 +   = cosf(a[0]);
  store_vector4(inst, machine, result);
   }
   break;
 @@ -776,7 +776,7 @@ _mesa_execute_program(struct gl_context * ctx,
 if (inst->SrcReg[0].File != PROGRAM_UNDEFINED) {
GLfloat a[4];
fetch_vector1(&inst->SrcReg[0], machine, a);
 -   cond = (a[0] != 0.0);
 +   cond = (a[0] != 0.0F);
  }
  else {
 cond = eval_condition(machine, inst);
 @@ -834,7 +834,7 @@ _mesa_execute_program(struct gl_context * ctx,
 val = -FLT_MAX;
  }
  else {
 -   val = (float)(log(a[0]) * 1.442695F);
 +   val = logf(a[0]) * 1.442695F;
  }
  result[0] = result[1] = result[2] = result[3] = val;
  store_vector4(inst, machine, result);
 @@ -853,10 +853,10 @@ _mesa_execute_program(struct gl_context * ctx,
  result[1] = a[0];
  /* XXX we could probably just use pow() here */
 if (a[0] > 0.0F) {
 -   if (a[1] == 0.0 && a[3] == 0.0)
 +   if (a[1] == 0.0F && a[3] == 0.0F)
result[2] = 1.0F;
 else
 -  result[2] = (GLfloat) pow(a[1], a[3]);
 +  result[2] = powf(a[1], a[3]);
  }
  else {
 result[2] = 0.0F;
 @@ -886,12 +886,12 @@ _mesa_execute_program(struct gl_context * ctx,
int exponent;
GLfloat mantissa = frexpf(t[0], &exponent);
q[0] = (GLfloat) (exponent - 1);
 -  q[1] = (GLfloat) (2.0 * mantissa); /* map [.5, 1) -> [1, 2) */
 +  q[1] = 2.0F * mantissa; /* map [.5, 1) -> [1, 2) */
  
 /* The fast LOG2 macro doesn't meet the precision
  * requirements.
  */
 -  q[2] = (float)(log(t[0]) * 1.442695F);
 +  q[2] = logf(t[0]) * 1.442695F;
 }
  }
  else {
 @@ -1051,7 +1051,7 @@ _mesa_execute_program(struct gl_context * ctx,
 fetch_vector1(&inst->SrcReg[0], machine, a);
 fetch_vector1(&inst->SrcReg[1], machine, b);
  result[0] = result[1] = result[2] = result[3]
 -   = (GLfloat) pow(a[0], b[0]);
 +   = powf(a[0], b[0]);
  store_vector4(inst, machine, result);
   }
   break;
 @@ -1095,10 +1095,10 @@ _mesa_execute_program(struct gl_context * ctx,
   {
  GLfloat a[4], result[4];
 fetch_vector1(&inst->SrcReg[0], machine, a);
 -result[0] = (GLfloat) cos(a[0]);
 -result[1] = (GLfloat) sin(a[0]);
 -result[2] = 0.0;/* undefined! */
 -result[3] = 0.0;/* undefined! */
 +result[0] = cosf(a[0]);
 +result[1] = sinf(a[0]);
 +result[2] = 0.0F;/* undefined! */
 +result[3] = 0.0F;/* undefined! */
  store_vector4(inst, machine, result);
   }
   break;
 @@ -1161,7 +1161,7 @@ _mesa_execute_program(struct gl_context * ctx,
  GLfloat a[4], result[4];
 fetch_vector1(&inst->SrcReg[0], machine, a);
  result[0] = result[1] = result[2] = result[3]
 -   = (GLfloat) sin(a[0]);
 +   = sinf(a[0]);
  store_vector4(inst, machine, result);
   }
   break;
 @@ -1360,7 +1360,7 @@ _mesa_execute_program(struct gl_context * ctx,
   * zero, we'd probably be fine except for an assert in
   * IROUND_POS() which gets triggered by the inf values created.
   */
 -if (texcoord[3] != 0.0) {
 +if (texcoord[3] != 0.0F) {
 texcoord[0] /= texcoord[3];
 texcoord[1] /= texcoord[3];
 texcoord[2] /= texcoord[3];
 @@ -1380,7 +1380,7 @@ _mesa_execute_program(struct gl_context * ctx,
  
 fetch_vector4(&inst->SrcReg[0], machine, texcoord);
 if (inst->TexSrcTarget != TEXTURE_CUBE_INDEX &&
 -texcoord[3] != 0.0) {
 +texcoord[3] != 0.0F) {
 texcoord[0

Re: [Mesa-dev] [PATCH 07/13] util: Avoid double promotion.

2015-07-14 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/util/register_allocate.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/src/util/register_allocate.c b/src/util/register_allocate.c
 index 2ad8c3c..95be20f 100644
 --- a/src/util/register_allocate.c
 +++ b/src/util/register_allocate.c
 @@ -648,7 +648,7 @@ ra_get_best_spill_node(struct ra_graph *g)
float cost = g->nodes[n].spill_cost;
float benefit;
  
 -  if (cost <= 0.0)
 +  if (cost <= 0.0f)
continue;
  
if (g->nodes[n].in_stack)




Re: [Mesa-dev] [PATCH 09/13] tnl: Avoid double promotion.

2015-07-14 Thread Iago Toral
On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 There are a couple of unrelated changes in t_vb_lighttmp.h that I hope
 you'll excuse -- there's a block of code that's duplicated modulo a few
 trivial differences that I took the liberty of fixing.
 ---
  src/mesa/tnl/t_draw.c   |  2 +-
  src/mesa/tnl/t_rasterpos.c  |  6 +++---
  src/mesa/tnl/t_vb_fog.c |  6 +++---
  src/mesa/tnl/t_vb_light.c   | 16 
  src/mesa/tnl/t_vb_lighttmp.h| 16 +++-
  src/mesa/tnl/t_vb_normals.c |  4 ++--
  src/mesa/tnl/t_vertex_generic.c |  2 +-
  7 files changed, 25 insertions(+), 27 deletions(-)
 
 diff --git a/src/mesa/tnl/t_draw.c b/src/mesa/tnl/t_draw.c
 index 6adf1dc..713c6a9 100644
 --- a/src/mesa/tnl/t_draw.c
 +++ b/src/mesa/tnl/t_draw.c
 @@ -257,7 +257,7 @@ static GLboolean *_tnl_import_edgeflag( struct gl_context 
 *ctx,
 GLuint i;
  
 for (i = 0; i < count; i++) {
 -  *bptr++ = ((GLfloat *)ptr)[0] == 1.0;
 +  *bptr++ = ((GLfloat *)ptr)[0] == 1.0F;
ptr += stride;
 }
  
 diff --git a/src/mesa/tnl/t_rasterpos.c b/src/mesa/tnl/t_rasterpos.c
 index 7ef50ea..4bd9ac8 100644
 --- a/src/mesa/tnl/t_rasterpos.c
 +++ b/src/mesa/tnl/t_rasterpos.c
 @@ -148,7 +148,7 @@ shade_rastpos(struct gl_context *ctx,
SUB_3V(VP, light->_Position, vertex);
   /* d = length(VP) */
d = (GLfloat) LEN_3FV( VP );
 -  if (d > 1.0e-6) {
 +  if (d > 1.0e-6F) {
  /* normalize VP */
   GLfloat invd = 1.0F / d;
   SELF_SCALE_SCALAR_3V(VP, invd);
 @@ -172,7 +172,7 @@ shade_rastpos(struct gl_context *ctx,
}
}
  
 -  if (attenuation < 1e-3)
 +  if (attenuation < 1e-3F)
continue;
  
n_dot_VP = DOT3( normal, VP );
 @@ -219,7 +219,7 @@ shade_rastpos(struct gl_context *ctx,
   shine = ctx->Light.Material.Attrib[MAT_ATTRIB_FRONT_SHININESS][0];
   spec_coef = powf(n_dot_h, shine);
  
 - if (spec_coef > 1.0e-10) {
 + if (spec_coef > 1.0e-10F) {
 if (ctx->Light.Model.ColorControl==GL_SEPARATE_SPECULAR_COLOR) {
ACC_SCALE_SCALAR_3V( specularContrib, spec_coef,
 light->_MatSpecular[0]);
 diff --git a/src/mesa/tnl/t_vb_fog.c b/src/mesa/tnl/t_vb_fog.c
 index 1ca72f8..5489ed6 100644
 --- a/src/mesa/tnl/t_vb_fog.c
 +++ b/src/mesa/tnl/t_vb_fog.c
 @@ -45,8 +45,8 @@ struct fog_stage_data {
  #define FOG_STAGE_DATA(stage) ((struct fog_stage_data *)stage->privatePtr)
  
  #define FOG_EXP_TABLE_SIZE 256
 -#define FOG_MAX (10.0)
 -#define EXP_FOG_MAX .0006595
 +#define FOG_MAX (10.0F)
 +#define EXP_FOG_MAX .0006595F
  #define FOG_INCR (FOG_MAX/FOG_EXP_TABLE_SIZE)
  static GLfloat exp_table[FOG_EXP_TABLE_SIZE];
  static GLfloat inited = 0;
 @@ -54,7 +54,7 @@ static GLfloat inited = 0;
  #if 1
  #define NEG_EXP( result, narg )  
 \
  do { \
 -   GLfloat f = (GLfloat) (narg * (1.0/FOG_INCR));\
 +   GLfloat f = (GLfloat) (narg * (1.0F / FOG_INCR)); \
 GLint k = (GLint) f;  
 \
 if (k > FOG_EXP_TABLE_SIZE-2) \
result = (GLfloat) EXP_FOG_MAX;
 \
 diff --git a/src/mesa/tnl/t_vb_light.c b/src/mesa/tnl/t_vb_light.c
 index dbd57fa..df9073e 100644
 --- a/src/mesa/tnl/t_vb_light.c
 +++ b/src/mesa/tnl/t_vb_light.c
 @@ -137,23 +137,23 @@ validate_shine_table( struct gl_context *ctx, GLuint 
 side, GLfloat shininess )
   break;
  
m = s-tab;
 -  m[0] = 0.0;
 -  if (shininess == 0.0) {
 +  m[0] = 0.0F;
 +  if (shininess == 0.0F) {
for (j = 1 ; j <= SHINE_TABLE_SIZE ; j++)
 - m[j] = 1.0;
 + m[j] = 1.0F;
}
else {
for (j = 1 ; j < SHINE_TABLE_SIZE ; j++) {
  GLdouble t, x = j / (GLfloat) (SHINE_TABLE_SIZE - 1);

I think you want to declare x (and probably t as well) with type GLfloat
here.

 -if (x < 0.005) /* underflow check */
 -   x = 0.005;
 +if (x < 0.005F) /* underflow check */
 +   x = 0.005F;
  t = pow(x, shininess);

Since the code below casts t to float anyway, is there a reason why you
did not use powf and declared t as float? Also, there is no point in
making the change below if you don't do that, right?

 - if (t > 1e-20)
 + if (t > 1e-20F)
  m[j] = (GLfloat) t;
   else
 -m[j] = 0.0;
 +m[j] = 0.0F;
}
 -  m[SHINE_TABLE_SIZE] = 1.0;
 +  m[SHINE_TABLE_SIZE] = 1.0F;
}
  
s->shininess = shininess;
 diff --git a/src/mesa/tnl/t_vb_lighttmp.h b/src/mesa/tnl/t_vb_lighttmp.h
 index f8786ac..3aebcd4 100644
 --- a/src/mesa/tnl/t_vb_lighttmp.h
 +++ b/src/mesa/tnl/t_vb_lighttmp.h
 @@ -112,7 +112,7 @@ 

Re: [Mesa-dev] [PATCH 05/13] nir: Avoid double promotion.

2015-07-14 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/glsl/nir/nir_opcodes.py | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
 index 56e96d9..df5b7e2 100644
 --- a/src/glsl/nir/nir_opcodes.py
 +++ b/src/glsl/nir/nir_opcodes.py
 @@ -474,10 +474,10 @@ else
  )
  
  opcode(ldexp, 0, tfloat, [0, 0], [tfloat, tint], , 
 -dst = ldexp(src0, src1);
 +dst = ldexpf(src0, src1);
  /* flush denormals to zero. */
  if (!isnormal(dst))
 -   dst = copysign(0.0f, src0);
 +   dst = copysignf(0.0f, src0);
  )
  
  # Combines the first component of each input to make a 2-component vector.


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 06/13] gallium/auxiliary: Avoid double promotion.

2015-07-14 Thread Iago Toral
This one will need wrapping for fabsf.
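
If "wrapping" is read as providing a fallback for toolchains that lack fabsf (that reading is an assumption), a minimal sketch could look like the following. HAVE_FABSF, sk_fabsf and sk_is_approx are hypothetical names for illustration only, not existing Mesa macros or functions:

```c
#include <math.h>

/* HAVE_FABSF is a hypothetical configure-style macro used only in this
 * sketch. When it is not defined we fall back to the double version. */
static float
sk_fabsf(float x)
{
#ifdef HAVE_FABSF
   return fabsf(x);
#else
   return (float) fabs((double) x);
#endif
}

/* Same shape as util_is_approx(), but going through the wrapper. */
static int
sk_is_approx(float a, float b, float tol)
{
   return sk_fabsf(b - a) <= tol;
}
```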

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/gallium/auxiliary/util/u_format_rgb9e5.h | 2 +-
  src/gallium/auxiliary/util/u_math.h  | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/src/gallium/auxiliary/util/u_format_rgb9e5.h 
 b/src/gallium/auxiliary/util/u_format_rgb9e5.h
 index 7a01f7f..d1ace3f 100644
 --- a/src/gallium/auxiliary/util/u_format_rgb9e5.h
 +++ b/src/gallium/auxiliary/util/u_format_rgb9e5.h
 @@ -75,7 +75,7 @@ typedef union {
  
  static INLINE float rgb9e5_ClampRange(float x)
  {
 -   if (x > 0.0) {
 +   if (x > 0.0f) {
if (x >= MAX_RGB9E5) {
   return MAX_RGB9E5;
} else {
 diff --git a/src/gallium/auxiliary/util/u_math.h 
 b/src/gallium/auxiliary/util/u_math.h
 index 3b4040f..9c3cb6a 100644
 --- a/src/gallium/auxiliary/util/u_math.h
 +++ b/src/gallium/auxiliary/util/u_math.h
 @@ -240,7 +240,7 @@ util_iround(float f)
  static INLINE boolean
  util_is_approx(float a, float b, float tol)
  {
 -   return fabs(b - a) <= tol;
 +   return fabsf(b - a) <= tol;
  }
  
 




Re: [Mesa-dev] [PATCH 04/13] i965: Use float calculations when double is unnecessary.

2015-07-14 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 Literals without an f/F suffix are of type double, and implicit
 conversion rules specify that the float in (float op double) be
 converted to a double before the operation is performed. I believe float
 execution was intended (in nearly all cases) or is sufficient (in the
 case of gen7_urb.c).
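
The conversion rule described above can be checked with a small sketch. The function names are hypothetical and exist only for this illustration; the type of each expression is taken from sizeof, which does not evaluate its operand. GCC's -Wdouble-promotion warning reports the first form:

```c
#include <stddef.h>

/* 0.5 has type double, so x is converted to double, the multiply is a
 * double multiply, and the result is converted back to float on return. */
static float
sk_scale_double_literal(float x)
{
   return x * 0.5;
}

/* 0.5f has type float, so the whole expression has type float. */
static float
sk_scale_float_literal(float x)
{
   return x * 0.5f;
}

/* Size of the expression type: double in the first case, float in the
 * second. */
static size_t
sk_size_with_double_literal(void)
{
   float x = 1.0f;
   return sizeof(x * 0.5);
}

static size_t
sk_size_with_float_literal(void)
{
   float x = 1.0f;
   return sizeof(x * 0.5f);
}
```

For values like these the results are identical; the difference is only in the instructions the compiler has to emit.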
 
 Removes a lot of float - double conversion instructions and replaces
 many double instructions with float instructions which are cheaper.
 
    text    data    bss      dec     hex  filename
 4928659  195160  26192  5150011  4e953b  i965_dri.so before
 4928315  195152  26192  5149659  4e93db  i965_dri.so after
 ---
  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp   | 22 
 +++---
  src/mesa/drivers/dri/i965/brw_fs.cpp   |  4 ++--
  src/mesa/drivers/dri/i965/brw_meta_fast_clear.c|  4 ++--
  src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c  |  4 ++--
  src/mesa/drivers/dri/i965/brw_misc_state.c |  4 ++--
  src/mesa/drivers/dri/i965/brw_sampler_state.c  |  4 ++--
  src/mesa/drivers/dri/i965/brw_sf_state.c   |  9 +
  src/mesa/drivers/dri/i965/brw_state_cache.c|  2 +-
  src/mesa/drivers/dri/i965/brw_util.h   |  4 ++--
  src/mesa/drivers/dri/i965/gen6_multisample_state.c |  4 ++--
  src/mesa/drivers/dri/i965/gen6_sf_state.c  |  2 +-
  src/mesa/drivers/dri/i965/gen7_sf_state.c  |  2 +-
  src/mesa/drivers/dri/i965/gen7_urb.c   |  2 +-
  src/mesa/drivers/dri/i965/gen8_sf_state.c  |  2 +-
  14 files changed, 35 insertions(+), 34 deletions(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp 
 b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
 index 1561b59..205c905 100644
 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
 @@ -1285,8 +1285,8 @@ brw_blorp_blit_program::translate_dst_to_src()
/* Round the float coordinates down to nearest integer */
emit_rndd(Xp_f, X_f);
emit_rndd(Yp_f, Y_f);
 -  emit_mul(X_f, Xp_f, brw_imm_f(1 / key->x_scale));
 -  emit_mul(Y_f, Yp_f, brw_imm_f(1 / key->y_scale));
 +  emit_mul(X_f, Xp_f, brw_imm_f(1.0f / key->x_scale));
 +  emit_mul(Y_f, Yp_f, brw_imm_f(1.0f / key->y_scale));
SWAP_XY_AND_XPYP();
 } else if (!key->bilinear_filter) {
/* Round the float coordinates down to nearest integer by moving to
 @@ -1442,7 +1442,7 @@ brw_blorp_blit_program::manual_blend_average(unsigned 
 num_samples)
for (int j = 0; j  4; ++j) {
   emit_mul(offset(texture_data[0], 2*j),
   offset(vec8(texture_data[0]), 2*j),
 - brw_imm_f(1.0/num_samples));
 + brw_imm_f(1.0f / num_samples));
}
 }
  
 @@ -1475,9 +1475,9 @@ brw_blorp_blit_program::manual_blend_bilinear(unsigned 
 num_samples)
  
/* Compute pixel coordinates */
emit_add(vec16(x_sample_coords), Xp_f,
 -  brw_imm_f((float)(i  0x1) * (1.0 / key-x_scale)));
 +  brw_imm_f((float)(i  0x1) * (1.0f / key-x_scale)));
emit_add(vec16(y_sample_coords), Yp_f,
 -  brw_imm_f((float)((i  1)  0x1) * (1.0 / key-y_scale)));
 +  brw_imm_f((float)((i  1)  0x1) * (1.0f / key-y_scale)));
emit_mov(vec16(X), x_sample_coords);
emit_mov(vec16(Y), y_sample_coords);
  
 @@ -1789,7 +1789,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, 
 GLfloat src1,
 * so 0.5 provides the necessary correction.
 */
multiplier = scale;
 -  offset = src0 + (-dst0 + 0.5) * scale;
 +  offset = src0 + (-dst0 + 0.5f) * scale;
 } else {
/* When mirroring X we need:
 *   src_x - src_x0 = dst_x1 - dst_x - 0.5
 @@ -1797,7 +1797,7 @@ brw_blorp_coord_transform_params::setup(GLfloat src0, 
 GLfloat src1,
 *   src_x = src_x0 + (dst_x1 -dst_x - 0.5) * scale
 */
multiplier = -scale;
 -  offset = src0 + (dst1 - 0.5) * scale;
 +  offset = src0 + (dst1 - 0.5f) * scale;
 }
  }
  
 @@ -1952,8 +1952,8 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct 
 brw_context *brw,
 /* Scaling factors used for bilinear filtering in multisample scaled
  * blits.
  */
 -   wm_prog_key.x_scale = 2.0;
 -   wm_prog_key.y_scale = src_mt-num_samples / 2.0;
 +   wm_prog_key.x_scale = 2.0f;
 +   wm_prog_key.y_scale = src_mt-num_samples / 2.0f;
  
 if (filter == GL_LINEAR  src.num_samples = 1  dst.num_samples = 1)
wm_prog_key.bilinear_filter = true;
 @@ -2000,9 +2000,9 @@ brw_blorp_blit_params::brw_blorp_blit_params(struct 
 brw_context *brw,
 x1 = wm_push_consts.dst_x1 = roundf(dst_x1);
 y1 = wm_push_consts.dst_y1 = roundf(dst_y1);
 wm_push_consts.rect_grid_x1 = (minify(src_mt-logical_width0, src_level) *
 -  wm_prog_key.x_scale - 1.0

Re: [Mesa-dev] [PATCH 08/13] vbo: Avoid double promotion.

2015-07-14 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/mesa/vbo/vbo_context.c| 6 +++---
  src/mesa/vbo/vbo_exec_array.c | 4 ++--
  2 files changed, 5 insertions(+), 5 deletions(-)
 
 diff --git a/src/mesa/vbo/vbo_context.c b/src/mesa/vbo/vbo_context.c
 index fd1ffe2..e3eb286 100644
 --- a/src/mesa/vbo/vbo_context.c
 +++ b/src/mesa/vbo/vbo_context.c
 @@ -37,9 +37,9 @@
  
  static GLuint check_size( const GLfloat *attr )
  {
 -   if (attr[3] != 1.0) return 4;
 -   if (attr[2] != 0.0) return 3;
 -   if (attr[1] != 0.0) return 2;
 +   if (attr[3] != 1.0F) return 4;
 +   if (attr[2] != 0.0F) return 3;
 +   if (attr[1] != 0.0F) return 2;
 return 1; 
  }
  
 diff --git a/src/mesa/vbo/vbo_exec_array.c b/src/mesa/vbo/vbo_exec_array.c
 index 72b8206..b73aa97 100644
 --- a/src/mesa/vbo/vbo_exec_array.c
 +++ b/src/mesa/vbo/vbo_exec_array.c
 @@ -255,7 +255,7 @@ check_array_data(struct gl_context *ctx, struct 
 gl_client_array *array,
  GLint k;
  for (k = 0; k < array->Size; k++) {
 if (IS_INF_OR_NAN(f[k]) ||
 -   f[k] >= 1.0e20 || f[k] <= -1.0e10) {
 +   f[k] >= 1.0e20F || f[k] <= -1.0e10F) {
printf("Bad array data:\n");
printf("  Element[%u].%u = %f\n", j, k, f[k]);
printf("  Array %u at %p\n", attrib, (void* ) array);
 @@ -263,7 +263,7 @@ check_array_data(struct gl_context *ctx, struct 
 gl_client_array *array,
array->Type, array->Size, array->Stride);
printf("  Address/offset %p in Buffer Object %u\n",
array->Ptr, array->BufferObj->Name);
 -  f[k] = 1.0; /* XXX replace the bad value! */
 +  f[k] = 1.0F; /* XXX replace the bad value! */
 }
 /*assert(!IS_INF_OR_NAN(f[k]));*/
  }




Re: [Mesa-dev] [PATCH 12/13] mesa/math: Avoid double promotion.

2015-07-14 Thread Iago Toral
; r3[6] *= s; r3[7] *= s;
 @@ -490,26 +490,26 @@ static GLboolean invert_matrix_3d_general( GLmatrix 
 *mat )
  */
 pos = neg = 0.0;
 t =  MAT(in,0,0) * MAT(in,1,1) * MAT(in,2,2);
 -   if (t >= 0.0) pos += t; else neg += t;
 +   if (t >= 0.0F) pos += t; else neg += t;
  
 t =  MAT(in,1,0) * MAT(in,2,1) * MAT(in,0,2);
 -   if (t >= 0.0) pos += t; else neg += t;
 +   if (t >= 0.0F) pos += t; else neg += t;
  
 t =  MAT(in,2,0) * MAT(in,0,1) * MAT(in,1,2);
 -   if (t >= 0.0) pos += t; else neg += t;
 +   if (t >= 0.0F) pos += t; else neg += t;
  
 t = -MAT(in,2,0) * MAT(in,1,1) * MAT(in,0,2);
 -   if (t >= 0.0) pos += t; else neg += t;
 +   if (t >= 0.0F) pos += t; else neg += t;
  
 t = -MAT(in,1,0) * MAT(in,0,1) * MAT(in,2,2);
 -   if (t >= 0.0) pos += t; else neg += t;
 +   if (t >= 0.0F) pos += t; else neg += t;
  
 t = -MAT(in,0,0) * MAT(in,2,1) * MAT(in,1,2);
 -   if (t >= 0.0) pos += t; else neg += t;
 +   if (t >= 0.0F) pos += t; else neg += t;
  
 det = pos + neg;
  
 -   if (fabsf(det) < 1e-25)
 +   if (fabsf(det) < 1e-25F)
return GL_FALSE;
  
 det = 1.0F / det;
 @@ -564,7 +564,7 @@ static GLboolean invert_matrix_3d( GLmatrix *mat )
 MAT(in,0,1) * MAT(in,0,1) +
 MAT(in,0,2) * MAT(in,0,2));
  
 -  if (scale == 0.0)
 +  if (scale == 0.0F)
   return GL_FALSE;
  
scale = 1.0F / scale;
 @@ -799,8 +799,8 @@ _math_matrix_rotate( GLmatrix *mat,
 GLfloat m[16];
 GLboolean optimized;
  
 -   s = (GLfloat) sin( angle * M_PI / 180.0 );
 -   c = (GLfloat) cos( angle * M_PI / 180.0 );
 +   s = sinf( angle * M_PI / 180.0 );

180.0F

 +   c = cosf( angle * M_PI / 180.0 );

180.0F

I guess we will need wrappers for sinf and cosf.

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

  
 memcpy(m, Identity, sizeof(GLfloat)*16);
 optimized = GL_FALSE;
 @@ -859,7 +859,7 @@ _math_matrix_rotate( GLmatrix *mat,
 if (!optimized) {
const GLfloat mag = sqrtf(x * x + y * y + z * z);
  
 -  if (mag = 1.0e-4) {
 +  if (mag = 1.0e-4F) {
   /* no rotation, leave mat as-is */
   return;
}
 @@ -1070,7 +1070,7 @@ _math_matrix_scale( GLmatrix *mat, GLfloat x, GLfloat 
 y, GLfloat z )
 m[2] *= x;   m[6] *= y;   m[10] *= z;
 m[3] *= x;   m[7] *= y;   m[11] *= z;
  
 -   if (fabsf(x - y)  1e-8  fabsf(x - z)  1e-8)
 +   if (fabsf(x - y)  1e-8F  fabsf(x - z)  1e-8F)
mat-flags |= MAT_FLAG_UNIFORM_SCALE;
 else
mat-flags |= MAT_FLAG_GENERAL_SCALE;
 @@ -1206,7 +1206,7 @@ static void analyse_from_scratch( GLmatrix *mat )
 GLuint i;
  
 for (i = 0 ; i  16 ; i++) {
 -  if (m[i] == 0.0) mask |= (1i);
 +  if (m[i] == 0.0F) mask |= (1i);
 }
  
 if (m[0] == 1.0F) mask |= (116);
 @@ -1240,12 +1240,12 @@ static void analyse_from_scratch( GLmatrix *mat )
mat-type = MATRIX_2D;
  
/* Check for scale */
 -  if (SQ(mm-1)  SQ(1e-6) ||
 -   SQ(m4m4-1)  SQ(1e-6))
 +  if (SQ(mm-1)  SQ(1e-6F) ||
 +   SQ(m4m4-1)  SQ(1e-6F))
mat-flags |= MAT_FLAG_GENERAL_SCALE;
  
/* Check for rotation */
 -  if (SQ(mm4)  SQ(1e-6))
 +  if (SQ(mm4)  SQ(1e-6F))
mat-flags |= MAT_FLAG_GENERAL_3D;
else
mat-flags |= MAT_FLAG_ROTATION;
 @@ -1255,9 +1255,9 @@ static void analyse_from_scratch( GLmatrix *mat )
mat-type = MATRIX_3D_NO_ROT;
  
/* Check for scale */
 -  if (SQ(m[0]-m[5])  SQ(1e-6) 
 -   SQ(m[0]-m[10])  SQ(1e-6)) {
 -  if (SQ(m[0]-1.0)  SQ(1e-6)) {
 +  if (SQ(m[0]-m[5])  SQ(1e-6F) 
 +   SQ(m[0]-m[10])  SQ(1e-6F)) {
 +  if (SQ(m[0]-1.0F)  SQ(1e-6F)) {
   mat-flags |= MAT_FLAG_UNIFORM_SCALE;
   }
}
 @@ -1275,8 +1275,8 @@ static void analyse_from_scratch( GLmatrix *mat )
mat-type = MATRIX_3D;
  
/* Check for scale */
 -  if (SQ(c1-c2)  SQ(1e-6)  SQ(c1-c3)  SQ(1e-6)) {
 -  if (SQ(c1-1.0)  SQ(1e-6))
 +  if (SQ(c1-c2)  SQ(1e-6F)  SQ(c1-c3)  SQ(1e-6F)) {
 +  if (SQ(c1-1.0F)  SQ(1e-6F))
   mat-flags |= MAT_FLAG_UNIFORM_SCALE;
/* else no scale at all */
}
 @@ -1285,10 +1285,10 @@ static void analyse_from_scratch( GLmatrix *mat )
}
  
/* Check for rotation */
 -  if (SQ(d1)  SQ(1e-6)) {
 +  if (SQ(d1)  SQ(1e-6F)) {
CROSS3( cp, m, m+4 );
SUB_3V( cp, cp, (m+8) );
 -  if (LEN_SQUARED_3FV(cp)  SQ(1e-6))
 +  if (LEN_SQUARED_3FV(cp)  SQ(1e-6F))
   mat-flags |= MAT_FLAG_ROTATION;
else
   mat-flags |= MAT_FLAG_GENERAL_3D;
 diff --git a/src/mesa/math/m_norm_tmp.h b/src/mesa/math/m_norm_tmp.h
 index d3ec1c2..6f1db8d 100644
 --- a/src/mesa/math/m_norm_tmp.h
 +++ b/src/mesa/math/m_norm_tmp.h
 @@ -80,7 +80,7 @@ TAG(transform_normalize_normals)( const GLmatrix *mat,
}
 }
 else {
 -  if (scale != 1.0) {
 +  if (scale != 1.0f) {
m0 *= scale,  m4 *= scale,  m8 *= scale;
m1 *= scale,  m5

Re: [Mesa-dev] [PATCH 13/13] mesa: Avoid double promotion.

2015-07-14 Thread Iago Toral
On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
 ---
  src/mesa/main/ffvertex_prog.c | 10 +-
  src/mesa/main/fog.c   |  2 +-
  src/mesa/main/get.c   |  2 +-
  src/mesa/main/light.c | 30 +++---
  src/mesa/main/lines.c |  4 ++--
  src/mesa/main/multisample.c   |  4 ++--
  src/mesa/main/pack.c  | 14 +++---
  src/mesa/main/pixel.c |  4 ++--
  src/mesa/main/pixeltransfer.c |  8 
  src/mesa/main/points.c|  8 
  src/mesa/main/readpix.c   |  4 ++--
  src/mesa/main/samplerobj.c|  2 +-
  src/mesa/main/texparam.c  |  2 +-
  src/mesa/swrast_setup/ss_tritmp.h |  4 ++--
  14 files changed, 49 insertions(+), 49 deletions(-)
 
 diff --git a/src/mesa/main/ffvertex_prog.c b/src/mesa/main/ffvertex_prog.c
 index 70adaf8..95b428d 100644
 --- a/src/mesa/main/ffvertex_prog.c
 +++ b/src/mesa/main/ffvertex_prog.c
 @@ -189,15 +189,15 @@ static void make_state_key( struct gl_context *ctx, 
 struct state_key *key )
if (light-Enabled) {
   key-unit[i].light_enabled = 1;
  
 - if (light-EyePosition[3] == 0.0)
 + if (light-EyePosition[3] == 0.0F)
  key-unit[i].light_eyepos3_is_zero = 1;
  
 - if (light-SpotCutoff == 180.0)
 + if (light-SpotCutoff == 180.0F)
  key-unit[i].light_spotcutoff_is_180 = 1;
  
 - if (light-ConstantAttenuation != 1.0 ||
 - light-LinearAttenuation != 0.0 ||
 - light-QuadraticAttenuation != 0.0)
 + if (light-ConstantAttenuation != 1.0F ||
 + light-LinearAttenuation != 0.0F ||
 + light-QuadraticAttenuation != 0.0F)
  key-unit[i].light_attenuated = 1;
}
}
 diff --git a/src/mesa/main/fog.c b/src/mesa/main/fog.c
 index 3bce289..45f343d 100644
 --- a/src/mesa/main/fog.c
 +++ b/src/mesa/main/fog.c
 @@ -115,7 +115,7 @@ _mesa_Fogfv( GLenum pname, const GLfloat *params )
ctx-Fog.Mode = m;
break;
case GL_FOG_DENSITY:
 -  if (*params0.0) {
 +  if (*params0.0F) {
   _mesa_error( ctx, GL_INVALID_VALUE, glFog );
  return;
}
 diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
 index 3d6d639..785a9b5 100644
 --- a/src/mesa/main/get.c
 +++ b/src/mesa/main/get.c
 @@ -626,7 +626,7 @@ find_custom_value(struct gl_context *ctx, const struct 
 value_desc *d, union valu
break;
  
 case GL_EDGE_FLAG:
 -  v-value_bool = ctx-Current.Attrib[VERT_ATTRIB_EDGEFLAG][0] == 1.0;
 +  v-value_bool = ctx-Current.Attrib[VERT_ATTRIB_EDGEFLAG][0] == 1.0F;
break;
  
 case GL_READ_BUFFER:
 diff --git a/src/mesa/main/light.c b/src/mesa/main/light.c
 index 4021dbe..fe2ce8c 100644
 --- a/src/mesa/main/light.c
 +++ b/src/mesa/main/light.c
 @@ -143,7 +143,7 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum 
 pname, const GLfloat *pa
COPY_3V(light-SpotDirection, params);
break;
 case GL_SPOT_EXPONENT:
 -  assert(params[0] = 0.0);
 +  assert(params[0] = 0.0F);
assert(params[0] = ctx-Const.MaxSpotExponent);
if (light-SpotExponent == params[0])
return;
 @@ -151,12 +151,12 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum 
 pname, const GLfloat *pa
light-SpotExponent = params[0];
break;
 case GL_SPOT_CUTOFF:
 -  assert(params[0] == 180.0 || (params[0] >= 0.0 && params[0] <= 90.0));
 +  assert(params[0] == 180.0F || (params[0] >= 0.0F && params[0] <= 90.0F));
if (light->SpotCutoff == params[0])
   return;
FLUSH_VERTICES(ctx, _NEW_LIGHT);
light->SpotCutoff = params[0];
 -  light->_CosCutoff = (GLfloat) (cos(light->SpotCutoff * M_PI / 180.0));
 +  light->_CosCutoff = (cosf(light->SpotCutoff * M_PI / 180.0));

Same comment as in the previous patch: is there any gain here?

Other than this:
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

if (light-_CosCutoff  0)
   light-_CosCutoff = 0;
if (light-SpotCutoff != 180.0F)
 @@ -165,21 +165,21 @@ _mesa_light(struct gl_context *ctx, GLuint lnum, GLenum 
 pname, const GLfloat *pa
   light-_Flags = ~LIGHT_SPOT;
break;
 case GL_CONSTANT_ATTENUATION:
 -  assert(params[0] = 0.0);
 +  assert(params[0] = 0.0F);
if (light-ConstantAttenuation == params[0])
return;
FLUSH_VERTICES(ctx, _NEW_LIGHT);
light-ConstantAttenuation = params[0];
break;
 case GL_LINEAR_ATTENUATION:
 -  assert(params[0] = 0.0);
 +  assert(params[0] = 0.0F);
if (light-LinearAttenuation == params[0])
return;
FLUSH_VERTICES(ctx, _NEW_LIGHT);
light-LinearAttenuation = params[0];
break;
 case GL_QUADRATIC_ATTENUATION:
 -  assert(params[0] = 0.0);
 +  assert(params[0] = 0.0F);
if (light-QuadraticAttenuation == params[0])
return

Re: [Mesa-dev] [PATCH 12/13] mesa/math: Avoid double promotion.

2015-07-14 Thread Iago Toral
On Tue, 2015-07-14 at 13:19 +0200, Iago Toral wrote:
 On Mon, 2015-07-13 at 16:22 -0700, Matt Turner wrote:
  ---
   src/mesa/math/m_clip_tmp.h | 20 ++---
   src/mesa/math/m_matrix.c   | 70 
  +++---
   src/mesa/math/m_norm_tmp.h |  2 +-
   3 files changed, 46 insertions(+), 46 deletions(-)
  
  diff --git a/src/mesa/math/m_clip_tmp.h b/src/mesa/math/m_clip_tmp.h
  index e289be7..60c0004 100644
  --- a/src/mesa/math/m_clip_tmp.h
  +++ b/src/mesa/math/m_clip_tmp.h
  @@ -194,13 +194,13 @@ static GLvector4f * TAG(cliptest_points3)( GLvector4f 
  *clip_vec,
  STRIDE_LOOP {
 const GLfloat cx = from[0], cy = from[1], cz = from[2];
 GLubyte mask = 0;
  -  if (cx >  1.0)   mask |= CLIP_RIGHT_BIT;
  -  else if (cx < -1.0)  mask |= CLIP_LEFT_BIT;
  -  if (cy >  1.0)   mask |= CLIP_TOP_BIT;
  -  else if (cy < -1.0)  mask |= CLIP_BOTTOM_BIT;
  +  if (cx >  1.0F)   mask |= CLIP_RIGHT_BIT;
  +  else if (cx < -1.0F)  mask |= CLIP_LEFT_BIT;
  +  if (cy >  1.0F)   mask |= CLIP_TOP_BIT;
  +  else if (cy < -1.0F)  mask |= CLIP_BOTTOM_BIT;
 if (viewport_z_clip) {
  -if (cz >  1.0)   mask |= CLIP_FAR_BIT;
  -else if (cz < -1.0)  mask |= CLIP_NEAR_BIT;
  +if (cz >  1.0F)   mask |= CLIP_FAR_BIT;
  +else if (cz < -1.0F)  mask |= CLIP_NEAR_BIT;
 }
 clipMask[i] = mask;
 tmpOrMask |= mask;
  @@ -230,10 +230,10 @@ static GLvector4f * TAG(cliptest_points2)( GLvector4f 
  *clip_vec,
  STRIDE_LOOP {
 const GLfloat cx = from[0], cy = from[1];
 GLubyte mask = 0;
  -  if (cx >  1.0)   mask |= CLIP_RIGHT_BIT;
  -  else if (cx < -1.0)  mask |= CLIP_LEFT_BIT;
  -  if (cy >  1.0)   mask |= CLIP_TOP_BIT;
  -  else if (cy < -1.0)  mask |= CLIP_BOTTOM_BIT;
  +  if (cx >  1.0F)   mask |= CLIP_RIGHT_BIT;
  +  else if (cx < -1.0F)  mask |= CLIP_LEFT_BIT;
  +  if (cy >  1.0F)   mask |= CLIP_TOP_BIT;
  +  else if (cy < -1.0F)  mask |= CLIP_BOTTOM_BIT;
 clipMask[i] = mask;
 tmpOrMask |= mask;
 tmpAndMask &= mask;
  diff --git a/src/mesa/math/m_matrix.c b/src/mesa/math/m_matrix.c
  index 6a42c6c..6522200 100644
  --- a/src/mesa/math/m_matrix.c
  +++ b/src/mesa/math/m_matrix.c
  @@ -380,7 +380,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat )
  if (fabsf(r3[0])>fabsf(r2[0])) SWAP_ROWS(r3, r2);
  if (fabsf(r2[0])>fabsf(r1[0])) SWAP_ROWS(r2, r1);
  if (fabsf(r1[0])>fabsf(r0[0])) SWAP_ROWS(r1, r0);
  -   if (0.0 == r0[0])  return GL_FALSE;
  +   if (0.0F == r0[0])  return GL_FALSE;
   
  /* eliminate first variable */
  m1 = r1[0]/r0[0]; m2 = r2[0]/r0[0]; m3 = r3[0]/r0[0];
  @@ -388,31 +388,31 @@ static GLboolean invert_matrix_general( GLmatrix *mat 
  )
  s = r0[2]; r1[2] -= m1 * s; r2[2] -= m2 * s; r3[2] -= m3 * s;
  s = r0[3]; r1[3] -= m1 * s; r2[3] -= m2 * s; r3[3] -= m3 * s;
  s = r0[4];
  -   if (s != 0.0) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; }
  +   if (s != 0.0F) { r1[4] -= m1 * s; r2[4] -= m2 * s; r3[4] -= m3 * s; }
  s = r0[5];
  -   if (s != 0.0) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; }
  +   if (s != 0.0F) { r1[5] -= m1 * s; r2[5] -= m2 * s; r3[5] -= m3 * s; }
  s = r0[6];
  -   if (s != 0.0) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; }
  +   if (s != 0.0F) { r1[6] -= m1 * s; r2[6] -= m2 * s; r3[6] -= m3 * s; }
  s = r0[7];
  -   if (s != 0.0) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; }
  +   if (s != 0.0F) { r1[7] -= m1 * s; r2[7] -= m2 * s; r3[7] -= m3 * s; }
   
  /* choose pivot - or die */
  if (fabsf(r3[1])>fabsf(r2[1])) SWAP_ROWS(r3, r2);
  if (fabsf(r2[1])>fabsf(r1[1])) SWAP_ROWS(r2, r1);
  -   if (0.0 == r1[1])  return GL_FALSE;
  +   if (0.0F == r1[1])  return GL_FALSE;
   
  /* eliminate second variable */
  m2 = r2[1]/r1[1]; m3 = r3[1]/r1[1];
  r2[2] -= m2 * r1[2]; r3[2] -= m3 * r1[2];
  r2[3] -= m2 * r1[3]; r3[3] -= m3 * r1[3];
  -   s = r1[4]; if (0.0 != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; }
  -   s = r1[5]; if (0.0 != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; }
  -   s = r1[6]; if (0.0 != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; }
  -   s = r1[7]; if (0.0 != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; }
  +   s = r1[4]; if (0.0F != s) { r2[4] -= m2 * s; r3[4] -= m3 * s; }
  +   s = r1[5]; if (0.0F != s) { r2[5] -= m2 * s; r3[5] -= m3 * s; }
  +   s = r1[6]; if (0.0F != s) { r2[6] -= m2 * s; r3[6] -= m3 * s; }
  +   s = r1[7]; if (0.0F != s) { r2[7] -= m2 * s; r3[7] -= m3 * s; }
   
  /* choose pivot - or die */
  if (fabsf(r3[2])>fabsf(r2[2])) SWAP_ROWS(r3, r2);
  -   if (0.0 == r2[2])  return GL_FALSE;
  +   if (0.0F == r2[2])  return GL_FALSE;
   
  /* eliminate third variable */
  m3 = r3[2]/r2[2];
  @@ -421,7 +421,7 @@ static GLboolean invert_matrix_general( GLmatrix *mat )
  r3[7] -= m3 * r2[7];
   
  /* last check */
  -   if (0.0 == r3[3

Re: [Mesa-dev] Register spilling issues in the NIR-vec4 backend

2015-07-16 Thread Iago Toral
On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote:
 On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral ito...@igalia.com wrote:
  Hi,
 
  when we sent the patches for the new nir-vec4 backend we mentioned that
  we had a few dEQP tests that would fail to link because of register
  spilling. Now that we have added GS support we see a few instances of
  this problem popping up in a few GS piglit tests too, for example this
  one:
 
  tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test
 
  I have been looking into what is going on with these tests and I came to
  the conclusion that the problem is a consequence of various factors, but
  probably the main thing contributing to it is the way our SSA pass
  works. That said, I am not that experienced with NIR, so it could also
  be that my analysis is missing something and I am just arriving at wrong
  conclusions, so I'll explain my thoughts below and hopefully someone
  else with more NIR experience can jump in and confirm or reject my
  analysis.
 
  The GS code in that test looks like this:
 
  for (int p = 0; p < 3; p++) {
 color = ((index >= ins[p].m1.length() ?
  ins[p].m2[index-ins[p].m1.length()] :
  ins[p].m1[index]) == expect) ?
 vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
 gl_Position = gl_in[p].gl_Position;
 EmitVertex();
  }
 
  One thing that is immediately contributing to the register pressure is
  some really awful code generated because of the indirect array indexing
  on the inputs inside the loop. This is because of the
  lower_variable_index_to_cond_assign lowering pass called from
  brw_shader.cpp. This pass will convert that color assignment into a
  bunch of nested if/else statements which makes the generated GLSL IR
  code rather large, involving plenty of temporaries too. This is only
  made worse by the fact that loop unrolling will replicate that 3 times.
  The result is a huge pile of GLSL IR with a few dozens of nested if/else
  statements and temporaries that looks like [1] (that is only a fragment
  of the GLSL IR).
 
  One thing that is particularly relevant in that code is that it has
  multiple conditional assignments to the same variable
  (dereference_array_value) as a consequence of this lowering pass.
 
  That much, however, is common to the NIR and non-NIR paths. The problem
  in the NIR case is that all these assignments generate new SSA values,
  which then become new registers in the final NIR form. This leads to NIR
  code like [2].  In contrast, the old vec4 visitor path is able to have
  writes to the same variable write to the same register.
 
  As a result, if I print the code right before register allocation in the
  NIR path [3] and I compare that to what we get with the old vec4 visitor
  path at that same point [4], it is clearly visible that this difference
  is allowing the vec4 visitor path to reduce register pressure (see how
  in [4] we have multiple writes to vgrf5, while in [3] we always write to
  a new vgrf every time).
 
  So, am I missing something or is this kind of result expected with NIR
  programs? Is there anything in the nir-vec4 pass that we can do to fix
  this or does this need to be fixed when going out of SSA mode inside NIR?
 
  Iago
 
  [1] http://pastebin.com/5uA8ex2S
  [2] http://pastebin.com/pqLfvAVN
  [3] http://pastebin.com/64nSuUH8
  [4] http://pastebin.com/WCrdYxzt
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 
 Hi Iago,
 
 Indeed, NIR does convert conditional writes to conditional selects --
 it's a required part of the conversion to SSA, and since our HW has a
 conditional select instruction that's just as fast as doing a
 conditional move, we haven't bothered much to try and change it back
 during out-of-SSA. However, doing this shouldn't make things worse. In
 your example, vgrf9, vgrf15, and vgrf17 all have very short live
 intervals and don't interfere with vgrf11 (unless there's another use
 of them somewhere after the snippet you pasted), which means that the
 register allocator is free to allocate the destinations of all the
 selects to the same register.
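
 The argument about short live intervals can be sketched with a simple interval model. This is only an illustration under stated assumptions: sk_live_range and sk_ranges_interfere are hypothetical names, a range runs from the instruction that first writes a value to the one that last reads it, and a value that dies at the instruction where another is born is treated as not overlapping. It is not the backend's actual liveness code:

```c
#include <stdbool.h>

/* Hypothetical representation: instruction indices where a value is
 * first written (start) and last read (end). */
struct sk_live_range {
   int start;
   int end;
};

/* Two values interfere if their ranges overlap. A value whose last read
 * is at the instruction where the other is first written does not
 * overlap in this sketch. */
static bool
sk_ranges_interfere(struct sk_live_range a, struct sk_live_range b)
{
   return a.start < b.end && b.start < a.end;
}
```

 With ranges such as {0, 2} and {2, 4}, the two short-lived values do not interfere and could share a register, while a value live over {1, 10} interferes with both and needs its own.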
 
 What's happening, though, is that you're running into our terrible
 liveness analysis. After doing the proper liveness analysis, we figure
 out the place each register first becomes live and last becomes dead,
 and then we consider registers that have overlapping ranges to
 interfere. So we consider vgrf11 to interfere with vgrf15 and vgrf17,
 even though it really doesn't. The trouble with making it do the right
 thing is that we may actually need to extend the live ranges of
 registers when the exec masks don't match up, either because one uses
 writemask_all or because they have incompatible exec masks due to
 containing different datatypes (half-float vs. float, etc.). For
 example, in your snippet

Re: [Mesa-dev] [PATCH v2 02/14] meta: Fix transfer operations check in meta pbo path for readpixels

2015-07-20 Thread Iago Toral
On Fri, 2015-07-17 at 10:36 -0700, Anuj Phogat wrote:
 Iago, Jason: Patches 2 and 5 in this series depend on patches 1 and 4
 respectively, Since you guys reviewed 2 and 5, would you also like to
 review 1, 4 and/or other patches in this series?
 
 Thanks
 -Anuj

Sure, I'll have a look at the patches.
Iago

 On Sun, Jun 28, 2015 at 11:29 PM, Iago Toral ito...@igalia.com wrote:
  Reviewed-by: Iago Toral Quiroga ito...@igalia.com
 
  On Fri, 2015-06-26 at 13:15 -0700, Anuj Phogat wrote:
  Currently used ctx->_ImageTransferState check is not sufficient
  because it doesn't include the read color clamping enabled with
  GL_CLAMP_READ_COLOR. So, use the helper function
  _mesa_get_readpixels_transfer_ops().
 
  Also, transfer operations don't affect glGetTexImage(). So, do
  the check only for glReadPixels.
 
  Without this patch, arb_color_buffer_float-readpixels test fails, when
  forced to use meta pbo path.
 
  V2: Add a comment and bump up the commit message.
 
  Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
  Cc: mesa-sta...@lists.freedesktop.org
  Cc: Iago Toral ito...@igalia.com
  Cc: Jason Ekstrand jason.ekstr...@intel.com
  ---
   src/mesa/drivers/common/meta_tex_subimage.c | 13 +
   1 file changed, 9 insertions(+), 4 deletions(-)
 
  diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
  b/src/mesa/drivers/common/meta_tex_subimage.c
  index d2474f5..90d78e5 100644
  --- a/src/mesa/drivers/common/meta_tex_subimage.c
  +++ b/src/mesa/drivers/common/meta_tex_subimage.c
  @@ -273,12 +273,17 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context 
  *ctx, GLuint dims,
  format == GL_COLOR_INDEX)
 return false;
 
  -   if (ctx->_ImageTransferState)
  -  return false;
  -
  -
  +   /* Don't use meta path for readpixels in below conditions. */
  if (!tex_image) {
 rb = ctx->ReadBuffer->_ColorReadBuffer;
  +
  +  /* _mesa_get_readpixels_transfer_ops() includes the cases of read
  +   * color clamping along with the ctx->_ImageTransferState.
  +   */
  +  if (_mesa_get_readpixels_transfer_ops(ctx, rb->Format, format,
  +type, GL_FALSE))
  + return false;
  +
 if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
return false;
  }
 
 
 




Re: [Mesa-dev] [PATCH 04/14] mesa: Add a mesa utility function _mesa_need_signed_unsigned_int_conversion()

2015-07-20 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 This utility function is used in a later patch.
 
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/mesa/main/readpix.c | 32 ++--
  src/mesa/main/readpix.h |  4 
  2 files changed, 22 insertions(+), 14 deletions(-)
 
 diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
 index a9416ef..1038983 100644
 --- a/src/mesa/main/readpix.c
 +++ b/src/mesa/main/readpix.c
 @@ -114,6 +114,22 @@ _mesa_get_readpixels_transfer_ops(const struct 
 gl_context *ctx,
 return transferOps;
  }
  
 +bool
 +_mesa_need_signed_unsigned_int_conversion(mesa_format rbFormat,
 +  GLenum format, GLenum type)
 +{
 +  const GLenum srcType = _mesa_get_format_datatype(rbFormat);
 +  return (srcType == GL_INT &&
 +  _mesa_is_enum_format_integer(format) &&
 +  (type == GL_UNSIGNED_INT ||
 +   type == GL_UNSIGNED_SHORT ||
 +   type == GL_UNSIGNED_BYTE)) ||
 + (srcType == GL_UNSIGNED_INT &&
 +  _mesa_is_enum_format_integer(format) &&
 +  (type == GL_INT ||
 +   type == GL_SHORT ||
 +   type == GL_BYTE));
 +}

I think it is better if you assign the result of
_mesa_is_enum_format_integer(format) to a temporary instead of calling
it twice in the condition, just like you do with srcType.
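
Something along these lines, as a minimal self-contained sketch. The sk_ enum, the sk_is_enum_format_integer stub and its call counter are hypothetical stand-ins for the GLenum values and for _mesa_is_enum_format_integer(), added only so the example compiles on its own; the source type is passed in directly instead of being derived from the renderbuffer format:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GL data types used in the condition. */
enum sk_type {
   SK_BYTE, SK_UNSIGNED_BYTE, SK_SHORT, SK_UNSIGNED_SHORT,
   SK_INT, SK_UNSIGNED_INT, SK_FLOAT
};

static int sk_is_integer_calls; /* counts calls to the stub below */

/* Stub for the format query; in this sketch format 1 means "integer". */
static bool
sk_is_enum_format_integer(int format)
{
   sk_is_integer_calls++;
   return format == 1;
}

static bool
sk_need_signed_unsigned_int_conversion(enum sk_type src_type,
                                       int format, enum sk_type type)
{
   /* Query the format once and reuse the result in both branches. */
   const bool is_integer = sk_is_enum_format_integer(format);

   return (src_type == SK_INT && is_integer &&
           (type == SK_UNSIGNED_INT ||
            type == SK_UNSIGNED_SHORT ||
            type == SK_UNSIGNED_BYTE)) ||
          (src_type == SK_UNSIGNED_INT && is_integer &&
           (type == SK_INT ||
            type == SK_SHORT ||
            type == SK_BYTE));
}
```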

  /**
   * Return true if memcpy cannot be used for ReadPixels.
 @@ -130,7 +146,6 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
  {
 struct gl_renderbuffer *rb =
   _mesa_get_read_renderbuffer_for_format(ctx, format);
 -   GLenum srcType;
  
 assert(rb);
  
 @@ -157,20 +172,9 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
  
/* Conversion between signed and unsigned integers needs masking
 * (it isn't just memcpy). */
 -  srcType = _mesa_get_format_datatype(rb->Format);
 -
 -  if ((srcType == GL_INT &&
 -   _mesa_is_enum_format_integer(format) &&
 -   (type == GL_UNSIGNED_INT ||
 -type == GL_UNSIGNED_SHORT ||
 -type == GL_UNSIGNED_BYTE)) ||
 -  (srcType == GL_UNSIGNED_INT &&
 -   _mesa_is_enum_format_integer(format) &&
 -   (type == GL_INT ||
 -type == GL_SHORT ||
 -type == GL_BYTE))) {
 +  if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format,
 + type))
   return GL_TRUE;
 -  }

You need to rebase your patch, this code does not exist any more. I
moved it to Gallium where I am not sure that your change is what they
want. You should probably just skip this part.

With these changes,
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

/* And finally, see if there are any transfer ops. */
return _mesa_get_readpixels_transfer_ops(ctx, rb->Format, format, type,
 diff --git a/src/mesa/main/readpix.h b/src/mesa/main/readpix.h
 index f894036..a93e263 100644
 --- a/src/mesa/main/readpix.h
 +++ b/src/mesa/main/readpix.h
 @@ -46,6 +46,10 @@ _mesa_get_readpixels_transfer_ops(const struct gl_context 
 *ctx,
GLenum format, GLenum type,
GLboolean uses_blit);
  
 +extern bool
 +_mesa_need_signed_unsigned_int_conversion(mesa_format rbFormat,
 +  GLenum format, GLenum type);
 +
  extern void
  _mesa_readpixels(struct gl_context *ctx,
   GLint x, GLint y, GLsizei width, GLsizei height,




Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion

2015-07-20 Thread Iago Toral
On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote:
 On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 
  On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote:
 
  Without this patch, piglit test fbo_integer_readpixels_sint_uint fails,
  when
  forced to use the meta pbo path.
 
  Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
  Cc: mesa-sta...@lists.freedesktop.org
  ---
   src/mesa/drivers/common/meta_tex_subimage.c | 3 +++
   1 file changed, 3 insertions(+)
 
  diff --git a/src/mesa/drivers/common/meta_tex_subimage.c
  b/src/mesa/drivers/common/meta_tex_subimage.c
  index 00364f8..84cbc50 100644
  --- a/src/mesa/drivers/common/meta_tex_subimage.c
  +++ b/src/mesa/drivers/common/meta_tex_subimage.c
  @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx,
  GLuint dims,
 
 if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
return false;
  +
  +  if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format,
  type))
  + return false;
 
  Hrm... This seems fishy.  Isn't glBlitFramebuffer supposed to handle format
  conversion with integers?  If so we should probably fix it rather than just
  skip it for the meta pbo path.
 
 As discussed offline, here is the relevant text for glBlitFramebuffer() from
 the OpenGL 4.5 spec, section 18.3.1:
 An INVALID_OPERATION error is generated if format conversions are not
 supported, which occurs under any of the following conditions:
 - The read buffer contains fixed-point or floating-point values and any draw
   buffer contains neither fixed-point nor floating-point values.
 - The read buffer contains unsigned integer values and any draw buffer does
   not contain unsigned integer values.
 - The read buffer contains signed integer values and any draw buffer does
   not contain signed integer values.
 
 I'll add a comment here explaining the reason to avoid meta path.

Is this code going to run only for glBlitFramebuffer? I see this
function being called from code paths that implement glReadPixels and
glGetTexImage too.

Iago

  }
 
  /* For arrays, use a tall (height * depth) 2D texture but taking into
  --
  1.9.3
 




Re: [Mesa-dev] [PATCH 01/14] mesa: Turn get_readpixels_transfer_ops() in to a global function

2015-07-20 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 This utility function is utilized in a later patch.
 
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 Cc: mesa-sta...@lists.freedesktop.org
 ---
 Jenkins showed no piglit regressions with this series.
 
  src/mesa/main/readpix.c | 14 --
  src/mesa/main/readpix.h |  6 ++
  2 files changed, 14 insertions(+), 6 deletions(-)
 
 diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
 index a3357cd..caa2648 100644
 --- a/src/mesa/main/readpix.c
 +++ b/src/mesa/main/readpix.c
 @@ -64,9 +64,11 @@ _mesa_need_rgb_to_luminance_conversion(mesa_format 
 texFormat, GLenum format)
  /**
   * Return transfer op flags for this ReadPixels operation.
   */
 -static GLbitfield
 -get_readpixels_transfer_ops(const struct gl_context *ctx, mesa_format 
 texFormat,
 -GLenum format, GLenum type, GLboolean uses_blit)
 +GLbitfield
 +_mesa_get_readpixels_transfer_ops(const struct gl_context *ctx,
 +  mesa_format texFormat,
 +  GLenum format, GLenum type,
 +  GLboolean uses_blit)
  {
 GLbitfield transferOps = ctx->_ImageTransferState;
  
 @@ -169,7 +171,7 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
}
  
/* And finally, see if there are any transfer ops. */
 -  return get_readpixels_transfer_ops(ctx, rb->Format, format, type,
 +  return _mesa_get_readpixels_transfer_ops(ctx, rb->Format, format, type,
   uses_blit) != 0;

With the change, indentation for the second line needs to be fixed too.

Other than that,
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

 }
 return GL_FALSE;
 @@ -436,8 +438,8 @@ read_rgba_pixels( struct gl_context *ctx,
 if (!rb)
return;
  
 -   transferOps = get_readpixels_transfer_ops(ctx, rb->Format, format, type,
 - GL_FALSE);
 +   transferOps = _mesa_get_readpixels_transfer_ops(ctx, rb->Format, format,
 +   type, GL_FALSE);
 /* Describe the dst format */
 dst_is_integer = _mesa_is_enum_format_integer(format);
 dst_stride = _mesa_image_row_stride(packing, width, format, type);
 diff --git a/src/mesa/main/readpix.h b/src/mesa/main/readpix.h
 index 1636dd9..f894036 100644
 --- a/src/mesa/main/readpix.h
 +++ b/src/mesa/main/readpix.h
 @@ -40,6 +40,12 @@ _mesa_readpixels_needs_slow_path(const struct gl_context 
 *ctx, GLenum format,
  extern GLboolean
  _mesa_need_rgb_to_luminance_conversion(mesa_format texFormat, GLenum format);
  
 +extern GLbitfield
 +_mesa_get_readpixels_transfer_ops(const struct gl_context *ctx,
 +  mesa_format texFormat,
 +  GLenum format, GLenum type,
 +  GLboolean uses_blit);
 +
  extern void
  _mesa_readpixels(struct gl_context *ctx,
   GLint x, GLint y, GLsizei width, GLsizei height,




Re: [Mesa-dev] [PATCH 03/14] mesa: Fix conditions to test signed, unsigned integer format

2015-07-20 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/mesa/main/readpix.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
 index caa2648..a9416ef 100644
 --- a/src/mesa/main/readpix.c
 +++ b/src/mesa/main/readpix.c
 @@ -160,10 +160,12 @@ _mesa_readpixels_needs_slow_path(const struct 
 gl_context *ctx, GLenum format,
srcType = _mesa_get_format_datatype(rb->Format);
 
if ((srcType == GL_INT &&
 +   _mesa_is_enum_format_integer(format) &&
 (type == GL_UNSIGNED_INT ||
  type == GL_UNSIGNED_SHORT ||
  type == GL_UNSIGNED_BYTE)) ||
(srcType == GL_UNSIGNED_INT &&
 +   _mesa_is_enum_format_integer(format) &&
 (type == GL_INT ||
  type == GL_SHORT ||
  type == GL_BYTE))) {

As I mentioned in patch 4, this code does not exist any more, so this
patch should probably be dropped.

Iago



Re: [Mesa-dev] Register spilling issues in the NIR-vec4 backend

2015-07-20 Thread Iago Toral
Hi,
On Thu, 2015-07-16 at 08:15 -0700, Jason Ekstrand wrote:
 
 On Jul 15, 2015 11:20 PM, Iago Toral ito...@igalia.com wrote:
 
  On Wed, 2015-07-15 at 11:02 -0700, Connor Abbott wrote:
   On Wed, Jul 15, 2015 at 7:49 AM, Iago Toral ito...@igalia.com
 wrote:
Hi,
   
when we sent the patches for the new nir-vec4 backend we
 mentioned that
we had a few dEQP tests that would fail to link because of
 register
spilling. Now that we have added GS support we see a few
 instances of
this problem popping up in a few GS piglit tests too, for
 example this
one:
   
   
 tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test
   
I have been looking into what is going on with these tests and I
 came to
the conclusion that the problem is a consequence of various
 factors, but
probably the main thing contributing to it is the way our SSA
 pass
works. That said, I am not that experienced with NIR, so it
 could also
be that my analysis is missing something and I am just arriving
 to wrong
conclusions, so I'll explain my thoughts below and hopefully
 someone
else with more NIR experience can jump in and confirm or reject
 my
analysis.
   
The GS code in that test looks like this:
   
for (int p = 0; p < 3; p++) {
   color = ((index >= ins[p].m1.length() ?
ins[p].m2[index-ins[p].m1.length()] :
ins[p].m1[index]) == expect) ?
   vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
   gl_Position = gl_in[p].gl_Position;
   EmitVertex();
}
   
One thing that is immediately contributing to the register
 pressure is
some really awful code generated because of the indirect array
 indexing
on the inputs inside the loop. This is because of the
lower_variable_index_to_cond_assign lowering pass called from
brw_shader.cpp. This pass will convert that color assignment
 into a
bunch of nested if/else statements which makes the generated
 GLSL IR
code rather large, involving plenty of temporaries too. This is
 only
made worse by the fact that loop unrolling will replicate that 3
 times.
The result is a huge pile of GLSL IR with a few dozens of nested
 if/else
statements and temporaries that looks like [1] (that is only a
 fragment
of the GLSL IR).
   
One thing that is particularly relevant in that code is that it
 has
multiple conditional assignments to the same variable
(dereference_array_value) as a consequence of this lowering
 pass.
   
That much, however, is common to the NIR and non-NIR paths. The
 problem
in the NIR case is that all these assignments generate new SSA
 values,
which then become new registers in the final NIR form. This
 leads to NIR
code like [2].  In contrast, the old vec4 visitor path, is able
 to have
writes to the same variable write to the same register.
   
As a result, if I print the code right before register
 allocation in the
NIR path [3] and I compare that to what we get with the old vec4
 visitor
path at that same point [4], it is clearly visible that this
 difference
is allowing the vec4 visitor path to reduce register pressure
 (see how
in [4] we have multiple writes to vgrf5, while in [3] we always
 write to
a new vgrf every time).
   
So, am I missing something or is this kind of result expected
 with NIR
programs? Is there anything in the nir-vec4 pass that we can do
 to fix
this or does this need to be fixed when going out of SSA mode
 inside NIR?
   
Iago
   
[1] http://pastebin.com/5uA8ex2S
[2] http://pastebin.com/pqLfvAVN
[3] http://pastebin.com/64nSuUH8
[4] http://pastebin.com/WCrdYxzt
   
  
   Hi Iago,
  
   Indeed, NIR does convert conditional writes to conditional
 selects --
   it's a required part of the conversion to SSA, and since our HW
 has a
   conditional select instruction that's just as fast as doing a
   conditional move, we haven't bothered much to try and change it
 back
   during out-of-SSA. However, doing this shouldn't make things
 worse. In
   your example, vgrf9, vgrf15, and vgrf17 all have very short live
   intervals and don't interfere with vgrf11 (unless there's another
 use
   of them somewhere after the snippet you pasted), which means that
 the
   register allocator is free to allocate the destinations of all the
   selects to the same register.
  
   What's happening, though, is that you're running into our terrible
   liveness analysis. After doing the proper liveness analysis, we
 figure
   out the place each register first becomes live and last becomes
 dead,
   and then we consider registers that have overlapping ranges to
   interfere. So we consider vgrf11 to interfere with vgrf15 and
 vgrf17,
   even

Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion

2015-07-21 Thread Iago Toral
On Mon, 2015-07-20 at 10:56 -0700, Anuj Phogat wrote:
 On Mon, Jul 20, 2015 at 5:10 AM, Iago Toral ito...@igalia.com wrote:
  On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote:
  On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net 
  wrote:
  
   On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote:
  
   Without this patch, piglit test fbo_integer_readpixels_sint_uint fails,
   when
   forced to use the meta pbo path.
  
   Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
   Cc: mesa-sta...@lists.freedesktop.org
   ---
src/mesa/drivers/common/meta_tex_subimage.c | 3 +++
1 file changed, 3 insertions(+)
  
   diff --git a/src/mesa/drivers/common/meta_tex_subimage.c
   b/src/mesa/drivers/common/meta_tex_subimage.c
   index 00364f8..84cbc50 100644
   --- a/src/mesa/drivers/common/meta_tex_subimage.c
   +++ b/src/mesa/drivers/common/meta_tex_subimage.c
   @@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context 
   *ctx,
   GLuint dims,
  
  if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
 return false;
   +
   +  if (_mesa_need_signed_unsigned_int_conversion(rb->Format, format,
   type))
   + return false;
  
   Hrm... This seems fishy.  Isn't glBlitFramebuffers supposed to handle 
   format
   conversion with integers?  If so we should probably fix it rather than 
   just
   skip it for the meta pbo path.
  
  As discussed offline, here is relevant text for glBlitFrameBuffer() from
  OpenGL 4.5 spec, section 18.3.1:
  An INVALID_OPERATION error is generated if format conversions are not
  supported, which occurs under any of the following conditions:
  -The read buffer contains fixed-point or floating-point values and any draw
buffer contains neither fixed-point nor floating-point values.
  -The read buffer contains unsigned integer values and any draw buffer does
not contain unsigned integer values.
  - The read buffer contains signed integer values and any draw buffer does
not contain signed integer values.
 
  I'll add a comment here explaining the reason to avoid meta path.
 
  Is this code going to run only for glBlitFramebuffer? I see this
  function being called from code paths that implement glReadPixels and
  glGetTexImage too.
 
 _mesa_meta_pbo_GetTexSubImage() is used only for glReadPixels and
 glGetTexImage. I quoted the glBlitFramebuffer restriction above because
 the function later uses _mesa_meta_BlitFramebuffer(), which doesn't
 support some format conversions.

If this function can be used to resolve ReadPixels and GetTexImage but
the checks you add are *specific* to BlitFramebuffer, this does not look
like the right place for them. Shouldn't you put them inside
_mesa_meta_BlitFramebuffer instead? Otherwise they would affect
ReadPixels and GetTexImage too, and I don't see the same restrictions
applying to ReadPixels, for example.

Iago

   }
  
   /* For arrays, use a tall (height * depth) 2D texture but taking 
   into
   --
   1.9.3
  
 
 
 




Re: [Mesa-dev] [PATCH 08/14] mesa: Add a helper function _mesa_unpack_format_to_base_format()

2015-07-21 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/main/glformats.c | 44 
  src/mesa/main/glformats.h |  3 +++
  2 files changed, 47 insertions(+)
 
 diff --git a/src/mesa/main/glformats.c b/src/mesa/main/glformats.c
 index ac69fab..cac243c 100644
 --- a/src/mesa/main/glformats.c
 +++ b/src/mesa/main/glformats.c
 @@ -1278,6 +1278,50 @@ _mesa_is_compressed_format(const struct gl_context 
 *ctx, GLenum format)
 }
  }
  
 +/**
 + * Convert various unpack formats to the corresponding base format.
 + */
 +GLenum
 +_mesa_unpack_format_to_base_format(GLenum format)
 +{
 +   switch(format) {
 +   case GL_RED_INTEGER:
 +  return GL_RED;
 +   case GL_GREEN_INTEGER:
 +  return GL_GREEN;
 +   case GL_BLUE_INTEGER:
 +  return GL_BLUE;
 +   case GL_RG_INTEGER:
 +  return GL_RG;
 +   case GL_RGB_INTEGER:
 +  return GL_RGB;
 +   case GL_RGBA_INTEGER:
 +  return GL_RGBA;
 +   case GL_BGR_INTEGER:
 +  return GL_BGR;
 +   case GL_BGRA_INTEGER:
 +  return GL_BGRA;
 +   case GL_ALPHA_INTEGER:
 +  return GL_ALPHA;

I would probably move the GL_ALPHA_INTEGER case together with the other
single component formats, at first I thought you had missed it. Either
way:

Reviewed-by: Iago Toral Quiroga ito...@igalia.com

 +   case GL_LUMINANCE_INTEGER_EXT:
 +  return GL_LUMINANCE;
 +   case GL_LUMINANCE_ALPHA_INTEGER_EXT:
 +  return GL_LUMINANCE_ALPHA;
 +   case GL_RED:
 +   case GL_GREEN:
 +   case GL_BLUE:
 +   case GL_RG:
 +   case GL_RGB:
 +   case GL_RGBA:
 +   case GL_BGR:
 +   case GL_BGRA:
 +   case GL_ALPHA:
 +   case GL_LUMINANCE:
 +   case GL_LUMINANCE_ALPHA:
 +   default:
 +  return format;
 +   }
 +}

Reviewed-by: Iago Toral Quiroga ito...@igalia.com
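
For reference, the GL_ALPHA_INTEGER reordering mentioned above could look
roughly like the sketch below. The sk_* enum is a hypothetical stand-in with
arbitrary values, and only a subset of the formats from the patch is listed;
the point is just that the single-component integer cases sit together.

```c
/* Illustrative stand-ins for the GL format enums; values are arbitrary. */
enum sk_format {
   SK_RED, SK_GREEN, SK_BLUE, SK_ALPHA, SK_RG, SK_RGB, SK_RGBA,
   SK_RED_INTEGER, SK_GREEN_INTEGER, SK_BLUE_INTEGER, SK_ALPHA_INTEGER,
   SK_RG_INTEGER, SK_RGB_INTEGER, SK_RGBA_INTEGER
};

static enum sk_format
sk_unpack_format_to_base_format(enum sk_format format)
{
   switch (format) {
   /* Single-component integer formats, grouped together. */
   case SK_RED_INTEGER:
      return SK_RED;
   case SK_GREEN_INTEGER:
      return SK_GREEN;
   case SK_BLUE_INTEGER:
      return SK_BLUE;
   case SK_ALPHA_INTEGER:
      return SK_ALPHA;
   /* Multi-component integer formats. */
   case SK_RG_INTEGER:
      return SK_RG;
   case SK_RGB_INTEGER:
      return SK_RGB;
   case SK_RGBA_INTEGER:
      return SK_RGBA;
   /* Anything else is already a base format and is returned as-is. */
   default:
      return format;
   }
}
```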

  /**
   * Convert various base formats to the cooresponding integer format.
 diff --git a/src/mesa/main/glformats.h b/src/mesa/main/glformats.h
 index 8881cb7..419955a 100644
 --- a/src/mesa/main/glformats.h
 +++ b/src/mesa/main/glformats.h
 @@ -101,6 +101,9 @@ _mesa_is_compressed_format(const struct gl_context *ctx, 
 GLenum format);
  extern GLenum
  _mesa_base_format_to_integer_format(GLenum format);
  
 +extern GLenum
 +_mesa_unpack_format_to_base_format(GLenum format);
 +
  extern GLboolean
  _mesa_base_format_has_channel(GLenum base_format, GLenum pname);
  




Re: [Mesa-dev] [PATCH 07/14] mesa: Set green, blue channels to zero only for formats with these components

2015-07-21 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/common/meta.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)
 
 diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
 index 214a68a..fceb25d 100644
 --- a/src/mesa/drivers/common/meta.c
 +++ b/src/mesa/drivers/common/meta.c
 @@ -3132,9 +3132,16 @@ decompress_texture_image(struct gl_context *ctx,
 * returned as red and two-channel texture values are returned as
 * red/alpha.
 */
 -  if ((baseTexFormat == GL_LUMINANCE ||
 -   baseTexFormat == GL_LUMINANCE_ALPHA ||
 -   baseTexFormat == GL_INTENSITY) ||
 +  if (((baseTexFormat == GL_LUMINANCE ||
 +baseTexFormat == GL_LUMINANCE_ALPHA ||
 +baseTexFormat == GL_INTENSITY) &&
 +   (destBaseFormat == GL_RGBA ||
 +destBaseFormat == GL_RGB ||
 +destBaseFormat == GL_RG ||
 +destBaseFormat == GL_GREEN ||
 +destBaseFormat == GL_BLUE ||
 +destBaseFormat == GL_BGRA ||
 +destBaseFormat == GL_BGR)) ||

Is this needed to achieve correct behavior or just an optimization? I
would expect that if the dest format does not have G/B channels, setting
pixel transfer options for these channels would not have any functional
effect anyway.

/* If we're reading back an RGB(A) texture (using glGetTexImage) as
  * luminance then we need to return L=tex(R).
  */




Re: [Mesa-dev] [PATCH 05/14] meta: Abort meta pbo path if readpixels need signed-unsigned conversion

2015-07-21 Thread Iago Toral
On Tue, 2015-07-21 at 08:13 +0200, Iago Toral wrote:
 On Mon, 2015-07-20 at 10:56 -0700, Anuj Phogat wrote:
  On Mon, Jul 20, 2015 at 5:10 AM, Iago Toral ito...@igalia.com wrote:
   On Fri, 2015-06-19 at 13:40 -0700, Anuj Phogat wrote:
   On Tue, Jun 16, 2015 at 9:21 PM, Jason Ekstrand ja...@jlekstrand.net 
   wrote:
   
On Jun 16, 2015 11:15, Anuj Phogat anuj.pho...@gmail.com wrote:
   
Without this patch, piglit test fbo_integer_readpixels_sint_uint 
fails,
when
forced to use the meta pbo path.
   
Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/drivers/common/meta_tex_subimage.c | 3 +++
 1 file changed, 3 insertions(+)
   
diff --git a/src/mesa/drivers/common/meta_tex_subimage.c
b/src/mesa/drivers/common/meta_tex_subimage.c
index 00364f8..84cbc50 100644
--- a/src/mesa/drivers/common/meta_tex_subimage.c
+++ b/src/mesa/drivers/common/meta_tex_subimage.c
@@ -283,6 +283,9 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context 
*ctx,
GLuint dims,
   
   if (_mesa_need_rgb_to_luminance_conversion(rb->Format, format))
  return false;
+
+  if (_mesa_need_signed_unsigned_int_conversion(rb->Format, 
format,
type))
+ return false;
   
Hrm... This seems fishy.  Isn't glBlitFramebuffers supposed to handle 
format
conversion with integers?  If so we should probably fix it rather than 
just
skip it for the meta pbo path.
   
   As discussed offline, here is relevant text for glBlitFrameBuffer() from
   OpenGL 4.5 spec, section 18.3.1:
   An INVALID_OPERATION error is generated if format conversions are not
   supported, which occurs under any of the following conditions:
   -The read buffer contains fixed-point or floating-point values and any 
   draw
 buffer contains neither fixed-point nor floating-point values.
   -The read buffer contains unsigned integer values and any draw buffer 
   does
 not contain unsigned integer values.
   - The read buffer contains signed integer values and any draw buffer does
 not contain signed integer values.
  
   I'll add a comment here explaining the reason to avoid meta path.
  
   Is this code going to run only for glBlitFramebuffer? I see this
   function being called from code paths that implement glReadPixels and
   glGetTexImage too.
  
  _mesa_meta_pbo_GetTexSubImage() is used only for glReadPixels and
  glGetTexImage. I quoted the glBlitFramebuffer restriction above because
  the function later uses _mesa_meta_BlitFramebuffer(), which doesn't
  support some format conversions.
 
 If this function can be used to resolve ReadPixels and GetTexImage but
 the checks you add are *specific* to BlitFramebuffer, this does not look
 like the right place for them. Shouldn't you put them inside
 _mesa_meta_BlitFramebuffer instead? Otherwise they would affect
 ReadPixels and GetTexImage too, and I don't see the same restrictions
 applying to ReadPixels, for example.

Specifically for ReadPixels I only see this in the spec:

An INVALID_OPERATION error is generated if format is an integer format
and the color buffer is not an integer format, or if the color buffer is
an integer format and format is not an integer format.

So, unlike BlitFramebuffer, it seems that ReadPixels is fine as long as
both formats are integer, no matter if the types have the same sign or
not.
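
To make the difference concrete, here is a small sketch of the two rules as
predicates. The sk_class enum and both function names are made up for
illustration and do not exist in Mesa; the logic is only my reading of the
two spec excerpts quoted in this thread.

```c
#include <stdbool.h>

/* Illustrative classification of what a color buffer contains. */
enum sk_class { SK_FIXED_OR_FLOAT, SK_SIGNED_INT, SK_UNSIGNED_INT };

static bool
sk_is_integer_class(enum sk_class c)
{
   return c == SK_SIGNED_INT || c == SK_UNSIGNED_INT;
}

/* BlitFramebuffer (section 18.3.1 excerpt): the read and draw buffers
 * must fall in the same class, so signed -> unsigned is an error. */
static bool
sk_blit_conversion_supported(enum sk_class read_buf, enum sk_class draw_buf)
{
   return read_buf == draw_buf;
}

/* ReadPixels excerpt: only integer-ness has to match between the color
 * buffer and the format argument; signedness is not mentioned. */
static bool
sk_readpixels_format_allowed(enum sk_class color_buf, bool format_is_integer)
{
   return sk_is_integer_class(color_buf) == format_is_integer;
}
```

With these, a signed integer buffer read back with an integer format passes
the ReadPixels check, while a signed-to-unsigned blit does not pass the
BlitFramebuffer one.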

Iago

 
}
   
/* For arrays, use a tall (height * depth) 2D texture but taking 
into
--
1.9.3
   
  
  
  
 
 




Re: [Mesa-dev] [PATCH 09/14] meta: Use _mesa_unpack_format_to_base_format() to handle integer formats

2015-07-21 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 _mesa_base_tex_format() doesn't handle GL_*_INTEGER formats.
 
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 ---
  src/mesa/drivers/common/meta.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)
 
 diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
 index fceb25d..c9e58d8 100644
 --- a/src/mesa/drivers/common/meta.c
 +++ b/src/mesa/drivers/common/meta.c
 @@ -3123,7 +3123,7 @@ decompress_texture_image(struct gl_context *ctx,
 /* read pixels from renderbuffer */
 {
GLenum baseTexFormat = texImage->_BaseFormat;
 -  GLenum destBaseFormat = _mesa_base_tex_format(ctx, destFormat);
 +  GLenum destBaseFormat = _mesa_unpack_format_to_base_format(destFormat);

I think this commit is doing a bit more than what the shortlog
suggests: it replaces a call to _mesa_base_tex_format, which handles
internal formats (so things like GL_COMPRESSED_ALPHA, GL_RGB_S3TC or
GL_RG16_SNORM, for example), with a call to the new
_mesa_unpack_format_to_base_format, which does not handle internal
formats at all.

That change seems to be fine because decompress_texture_image claims to
receive a destFormat that is a la glReadPixels, so there is no need to
handle all those internal formats anyway, but I think you should
explain this in the commit log.

With that change,
Reviewed-by: Iago Toral Quiroga ito...@igalia.com

/* The pixel transfer state will be set to default values at this point
 * (see MESA_META_PIXEL_TRANSFER) so pixel transfer ops are effectively
 @@ -3149,9 +3149,7 @@ decompress_texture_image(struct gl_context *ctx,
  baseTexFormat == GL_RGB  ||
  baseTexFormat == GL_RG) &&
(destBaseFormat == GL_LUMINANCE ||
 -   destBaseFormat == GL_LUMINANCE_ALPHA ||
 -   destBaseFormat == GL_LUMINANCE_INTEGER_EXT ||
 -   destBaseFormat == GL_LUMINANCE_ALPHA_INTEGER_EXT))) {
 +   destBaseFormat == GL_LUMINANCE_ALPHA))) {
   /* Green and blue must be zero */
   _mesa_PixelTransferf(GL_GREEN_SCALE, 0.0f);
   _mesa_PixelTransferf(GL_BLUE_SCALE, 0.0f);




Re: [Mesa-dev] [PATCH 06/14] meta: Don't do fragment color clamping in case of ReadPixels

2015-07-21 Thread Iago Toral
On Tue, 2015-07-21 at 09:24 +0200, Iago Toral wrote:
 On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
  Without this patch, piglit test arb_color_buffer_float-readpixels
  fails, when forced to use the meta pbo path.
  
  Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
  Cc: mesa-sta...@lists.freedesktop.org
  ---
   src/mesa/drivers/common/meta_tex_subimage.c | 5 +
   1 file changed, 5 insertions(+)
  
  diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
  b/src/mesa/drivers/common/meta_tex_subimage.c
  index 84cbc50..ccb7dfb 100644
  --- a/src/mesa/drivers/common/meta_tex_subimage.c
  +++ b/src/mesa/drivers/common/meta_tex_subimage.c
  @@ -25,6 +25,7 @@
*Jason Ekstrand jason.ekstr...@intel.com
*/
   
  +#include "blend.h"
   #include "bufferobj.h"
   #include "buffers.h"
   #include "fbobject.h"
  @@ -305,6 +306,10 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
  GLuint dims,
  _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER |
  MESA_META_PIXEL_STORE));
   
  +   /* GL_CLAMP_FRAGMENT_COLOR doesn't affect ReadPixels. */
  +   if (!tex_image && ctx->Extensions.ARB_color_buffer_float)
  +  _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
  +
 
 I think you need to restore the original value of
 GL_CLAMP_FRAGMENT_COLOR after you are done with ReadPixels,  otherwise
 it looks like you would be changing the setting for future API calls
 too.

Oh, I just noticed that this setting is already being considered in the
state flags passed to _mesa_meta_begin, so ignore that comment.

 Also, does this only apply to ReadPixels and not to GetTexImage? That
 looks odd to me.
 
 Iago
 
  _mesa_GenFramebuffers(2, fbos);
   
  if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {
 
 




Re: [Mesa-dev] [PATCH 06/14] meta: Don't do fragment color clamping in case of ReadPixels

2015-07-21 Thread Iago Toral
On Tue, 2015-06-16 at 11:15 -0700, Anuj Phogat wrote:
 Without this patch, piglit test arb_color_buffer_float-readpixels
 fails, when forced to use the meta pbo path.
 
 Signed-off-by: Anuj Phogat anuj.pho...@gmail.com
 Cc: mesa-sta...@lists.freedesktop.org
 ---
  src/mesa/drivers/common/meta_tex_subimage.c | 5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/src/mesa/drivers/common/meta_tex_subimage.c 
 b/src/mesa/drivers/common/meta_tex_subimage.c
 index 84cbc50..ccb7dfb 100644
 --- a/src/mesa/drivers/common/meta_tex_subimage.c
 +++ b/src/mesa/drivers/common/meta_tex_subimage.c
 @@ -25,6 +25,7 @@
   *Jason Ekstrand jason.ekstr...@intel.com
   */
  
 +#include "blend.h"
  #include "bufferobj.h"
  #include "buffers.h"
  #include "fbobject.h"
 @@ -305,6 +306,10 @@ _mesa_meta_pbo_GetTexSubImage(struct gl_context *ctx, 
 GLuint dims,
 _mesa_meta_begin(ctx, ~(MESA_META_PIXEL_TRANSFER |
 MESA_META_PIXEL_STORE));
  
 +   /* GL_CLAMP_FRAGMENT_COLOR doesn't affect ReadPixels. */
 +   if (!tex_image && ctx->Extensions.ARB_color_buffer_float)
 +  _mesa_ClampColor(GL_CLAMP_FRAGMENT_COLOR, GL_FALSE);
 +

I think you need to restore the original value of
GL_CLAMP_FRAGMENT_COLOR after you are done with ReadPixels,  otherwise
it looks like you would be changing the setting for future API calls
too.

Also, does this only apply to ReadPixels and not to GetTexImage? That
looks odd to me.

Iago

 _mesa_GenFramebuffers(2, fbos);
  
 if (tex_image && tex_image->TexObject->Target == GL_TEXTURE_1D_ARRAY) {




[Mesa-dev] Register spilling issues in the NIR-vec4 backend

2015-07-15 Thread Iago Toral
Hi,

when we sent the patches for the new nir-vec4 backend we mentioned that
we had a few dEQP tests that would fail to link because of register
spilling. Now that we have added GS support we see a few instances of
this problem popping up in a few GS piglit tests too, for example this
one:

tests/spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd.shader_test

I have been looking into what is going on with these tests and I came to
the conclusion that the problem is a consequence of various factors, but
probably the main thing contributing to it is the way our SSA pass
works. That said, I am not that experienced with NIR, so it could also
be that my analysis is missing something and I am just arriving at the
wrong conclusions, so I'll explain my thoughts below and hopefully someone
else with more NIR experience can jump in and confirm or reject my
analysis.
 
The GS code in that test looks like this:

for (int p = 0; p < 3; p++) {
   color = ((index >= ins[p].m1.length() ?
             ins[p].m2[index-ins[p].m1.length()] :
             ins[p].m1[index]) == expect) ?
      vec4(0.0, 1.0, 0.0, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
   gl_Position = gl_in[p].gl_Position;
   EmitVertex();
}

One thing that is immediately contributing to the register pressure is
some really awful code generated because of the indirect array indexing
on the inputs inside the loop. This is because of the
lower_variable_index_to_cond_assign lowering pass called from
brw_shader.cpp. This pass will convert that color assignment into a
bunch of nested if/else statements which makes the generated GLSL IR
code rather large, involving plenty of temporaries too. This is only
made worse by the fact that loop unrolling will replicate that 3 times.
The result is a huge pile of GLSL IR with a few dozen nested if/else
statements and temporaries that looks like [1] (that is only a fragment
of the GLSL IR).

One thing that is particularly relevant in that code is that it has
multiple conditional assignments to the same variable
(dereference_array_value) as a consequence of this lowering pass.

That much, however, is common to the NIR and non-NIR paths. The problem
in the NIR case is that all these assignments generate new SSA values,
which then become new registers in the final NIR form. This leads to NIR
code like [2]. In contrast, the old vec4 visitor path is able to have
writes to the same variable write to the same register.

As a result, if I print the code right before register allocation in the
NIR path [3] and I compare that to what we get with the old vec4 visitor
path at that same point [4], it is clearly visible that this difference
is allowing the vec4 visitor path to reduce register pressure (see how
in [4] we have multiple writes to vgrf5, while in [3] we always write to
a new vgrf every time).
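
As a rough illustration of the difference, the toy model below counts virtual registers for a sequence of writes under the two policies. It is only a sketch under simplifying assumptions: it counts vgrfs and does not model liveness or the real allocator, and all names (toy_regs_reusing_variables, toy_regs_per_ssa_value, TOY_MAX_VARS) are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VARS 16   /* arbitrary limit for this sketch */

/* Old visitor style: one register per source variable, so repeated
 * writes to the same variable reuse the same register.
 */
static unsigned
toy_regs_reusing_variables(const unsigned *written_var, size_t num_writes)
{
   bool seen[TOY_MAX_VARS] = { false };
   unsigned regs = 0;

   for (size_t i = 0; i < num_writes; i++) {
      assert(written_var[i] < TOY_MAX_VARS);
      if (!seen[written_var[i]]) {
         seen[written_var[i]] = true;
         regs++;
      }
   }
   return regs;
}

/* SSA style: every write defines a new value, and in this model each
 * value becomes its own register.
 */
static unsigned
toy_regs_per_ssa_value(const unsigned *written_var, size_t num_writes)
{
   (void)written_var;   /* the variable identity no longer matters */
   return (unsigned)num_writes;
}
```

For four conditional writes to one variable plus one write to another, the first policy needs 2 registers and the second needs 5, which is the kind of gap visible between [4] and [3].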

So, am I missing something or is this kind of result expected with NIR
programs? Is there anything in the nir-vec4 pass that we can do to fix
this or does this need to be fixed when going out of SSA mode inside NIR?

Iago

[1] http://pastebin.com/5uA8ex2S
[2] http://pastebin.com/pqLfvAVN
[3] http://pastebin.com/64nSuUH8
[4] http://pastebin.com/WCrdYxzt



Re: [Mesa-dev] [PATCH 1/5] i965: Move BEGIN_BATCH() into same control flow as ADVANCE_BATCH().

2015-07-13 Thread Iago Toral
I think Chris did not review this one:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

Iago

On Fri, 2015-07-10 at 11:44 -0700, Matt Turner wrote:
> BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)"
> respectively to allow declaring local variables used by intervening
> OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be
> in the same control flow.
> ---
>  src/mesa/drivers/dri/i965/brw_draw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
> index 69ad4d4..ec13473 100644
> --- a/src/mesa/drivers/dri/i965/brw_draw.c
> +++ b/src/mesa/drivers/dri/i965/brw_draw.c
> @@ -261,17 +261,17 @@ static void brw_emit_prim(struct brw_context *brw,
>        indirect_flag = 0;
>     }
>  
> +   BEGIN_BATCH(brw->gen >= 7 ? 7 : 6);
> +
>     if (brw->gen >= 7) {
>        if (brw->predicate.state == BRW_PREDICATE_STATE_USE_BIT)
>           predicate_enable = GEN7_3DPRIM_PREDICATE_ENABLE;
>        else
>           predicate_enable = 0;
>  
> -      BEGIN_BATCH(7);
>        OUT_BATCH(CMD_3D_PRIM << 16 | (7 - 2) | indirect_flag | predicate_enable);
>        OUT_BATCH(hw_prim | vertex_access_type);
>     } else {
> -      BEGIN_BATCH(6);
>        OUT_BATCH(CMD_3D_PRIM << 16 | (6 - 2) |
>                  hw_prim << GEN4_3DPRIM_TOPOLOGY_TYPE_SHIFT |
>                  vertex_access_type);
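
For readers following along, the constraint from the commit message can be sketched with toy macros. This is only an illustration under assumptions: the TOY_* macros, struct toy_batch and toy_emit_prim are invented here and are not the real i965 batch macros, which this patch only describes in prose.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BATCH_DWORDS 16

/* Toy batch buffer; not the real intel_batchbuffer. */
struct toy_batch {
   uint32_t dwords[TOY_BATCH_DWORDS];
   unsigned used;
};

/* BEGIN opens a scope with "do {" and declares locals in it. */
#define TOY_BEGIN_BATCH(batch, n) do {                               \
   struct toy_batch *toy_b_ = (batch);                               \
   uint32_t *toy_cursor_ = &toy_b_->dwords[toy_b_->used];            \
   const unsigned toy_n_ = (n);                                      \
   assert(toy_b_->used + toy_n_ <= TOY_BATCH_DWORDS)

/* OUT writes through the local cursor declared by BEGIN. */
#define TOY_OUT_BATCH(value) (*toy_cursor_++ = (value))

/* ADVANCE checks the dword count and closes the scope. */
#define TOY_ADVANCE_BATCH()                                          \
   assert((unsigned)(toy_cursor_ - &toy_b_->dwords[toy_b_->used])    \
          == toy_n_);                                                \
   toy_b_->used += toy_n_;                                           \
} while (0)

static void
toy_emit_prim(struct toy_batch *batch, int gen)
{
   /* BEGIN sits outside the if/else so that the "do {" it opens is
    * closed by the single ADVANCE below. Putting BEGIN inside each
    * branch would leave the braces unbalanced and fail to compile.
    */
   TOY_BEGIN_BATCH(batch, gen >= 7 ? 3 : 2);

   if (gen >= 7) {
      TOY_OUT_BATCH(0x7000u);
      TOY_OUT_BATCH(0x1u);
      TOY_OUT_BATCH(0x2u);
   } else {
      TOY_OUT_BATCH(0x6000u);
      TOY_OUT_BATCH(0x1u);
   }

   TOY_ADVANCE_BATCH();
}
```

The dword counts (3 and 2) are arbitrary; the point is only that the opening and closing macro must sit at the same nesting level.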




Re: [Mesa-dev] [PATCH 1/2] glsl: Implement a SSBO load optimization pass

2015-10-21 Thread Iago Toral
On Wed, 2015-10-21 at 14:58 +0300, Francisco Jerez wrote:
> Iago Toral <ito...@igalia.com> writes:
> 
> > On Wed, 2015-10-21 at 13:00 +0300, Francisco Jerez wrote:
> >> Iago Toral <ito...@igalia.com> writes:
> >> 
> >> > Hi Curro,
> >> >
> >> > On Tue, 2015-10-20 at 14:18 +0300, Francisco Jerez wrote:
> >> >> Iago Toral <ito...@igalia.com> writes:
> >> >> 
> >> >> > On Tue, 2015-10-20 at 13:22 +0300, Francisco Jerez wrote:
> >> >> >> Iago Toral Quiroga <ito...@igalia.com> writes:
> >> >> >> 
> >> >> >> > This allows us to re-use the results of previous ssbo loads in 
> >> >> >> > situations
> >> >> >> > that are safe (i.e. when there are no stores, atomic operations or
> >> >> >> > memory barriers in between).
> >> >> >> >
> >> >> >> > This is particularly useful for things like matrix 
> >> >> >> > multiplications, where
> >> >> >> > for a mat4 buffer variable we cut the number of loads from 16 (4 
> >> >> >> > reads of
> >> >> >> > each column) down to 4 (1 read of each column).
> >> >> >> >
> >> >> >> > The pass can only cache ssbo loads that involve constant blocks and
> >> >> >> > offsets, but could be extended to compare sub-expressions for these
> >> >> >> > as well, similar to a CSE pass.
> >> >> >> >
> >> >> >> > The way the cache works is simple: ssbo loads with constant 
> >> >> >> > block/offset
> >> >> >> > are included in a cache as they are seen. Stores invalidate cache 
> >> >> >> > entries.
> >> >> >> > Stores with non-constant offset invalidate all cached loads for 
> >> >> >> > the block
> >> >> >> > and stores with non-constant block invalidate all cache entries. 
> >> >> >> > There is
> >> >> >> > room to improve this by using the actual variable name we are 
> >> >> >> > accessing to
> >> >> >> > limit the entries that should be invalidated. We also need to 
> >> >> >> > invalidate
> >> >> >> > cache entries when we exit the block in which they have been 
> >> >> >> > defined
> >> >> >> > (i.e. inside if/else blocks or loops).
> >> >> >> >
> >> >> >> > The cache optimization is built as a separate pass, instead of 
> >> >> >> > merging it
> >> >> >> > inside the lower_ubo_reference pass for a number of reasons:
> >> >> >> >
> >> >> >> > 1) The way we process assignments in visitors is that the LHS is
> >> >> >> > processed before the RHS. This creates a problem for an 
> >> >> >> > optimization
> >> >> >> > such as this when we do things like a = a + 1, since we would see 
> >> >> >> > the
> >> >> >> > store before the read when the actual execution order is reversed.
> >> >> >> > This could be fixed by re-implementing the logic in the visit_enter
> >> >> >> > method for ir_assignment in lower_ubo_reference and then returning
> >> >> >> > visit_continue_with_parent.
> >> >> >> >
> >> >> >> > 2) Some writes/reads need to be split into multiple smaller
> >> >> >> > writes/reads, and we need to handle caching for each one. This 
> >> >> >> > happens
> >> >> >> > deep inside the code that handles the lowering and some
> >> >> >> > of the information we need to do this is not available. This could 
> >> >> >> > also
> >> >> >> > be fixed by passing more data into the corresponding functions or 
> >> >> >> > by
> >> >> >> > making this data available as class members, but the current 
> >> >> >> > implementation
> >> >> >> > is already complex enough and  this would only contribute to the 
> >> >> >> > complexity.
> >> >> >> >
> >&

Re: [Mesa-dev] [PATCH] i965/fs: Disable CSE optimization for untyped & typed surface reads

2015-10-22 Thread Iago Toral
On Wed, 2015-10-21 at 23:24 -0700, Jordan Justen wrote:
> On 2015-10-20 00:43:13, Iago Toral wrote:
> > On Tue, 2015-10-20 at 00:12 -0700, Jordan Justen wrote:
> > > An untyped surface read is volatile because it might be affected by a
> > > write.
> > > 
> > > In the ES31-CTS.compute_shader.resources-max test, two back to back
> > > read/modify/writes of an SSBO variable looked something like this:
> > > 
> > >   r1 = untyped_surface_read(ssbo_float)
> > >   r2 = r1 + 1
> > >   untyped_surface_write(ssbo_float, r2)
> > >   r3 = untyped_surface_read(ssbo_float)
> > >   r4 = r3 + 1
> > >   untyped_surface_write(ssbo_float, r4)
> > > 
> > > And after CSE, we had:
> > > 
> > >   r1 = untyped_surface_read(ssbo_float)
> > >   r2 = r1 + 1
> > >   untyped_surface_write(ssbo_float, r2)
> > >   r4 = r1 + 1
> > >   untyped_surface_write(ssbo_float, r4)
> > 
> > Yeah, we cannot do CSE with SSBO loads. Patch looks good to me, but we
> > should do the same in the vec4 CSE pass.
> 
> Yeah, I checked vec4 CSE. It looks like is_expression will
> unconditionally return false for those opcodes.

Oh right.

> r-b?

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

FWIW, my ssbo load optimization pass is trying to "undo" this, since it
is all about doing CSE for ssbo loads that are safe to CSE, that is,
when we know that there are no stores/atomics that write to the same
offset and no memory barriers in between. I am trying to implement that
in NIR though, so we still need this to prevent i965 from trying to CSE
the remaining loads it sees, since those would not be safe to CSE.

Also, as I mentioned in another e-mail, the reason we did not notice
this issue earlier is that there are a couple of problems in i965 that
currently make it quite difficult for the CSE pass to identify identical
SSBO loads, but that is bound to change as soon as those things get
fixed.
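
The hazard from the commit message can be reproduced with a small simulation. This is just a sketch with a toy one-slot SSBO; toy_ssbo, toy_read, toy_write and the two toy_increment_* functions are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy SSBO holding a single float. */
struct toy_ssbo {
   float value;
};

static float
toy_read(const struct toy_ssbo *ssbo)
{
   assert(ssbo != NULL);
   return ssbo->value;
}

static void
toy_write(struct toy_ssbo *ssbo, float v)
{
   assert(ssbo != NULL);
   ssbo->value = v;
}

/* Two back-to-back read/modify/writes, as the shader wrote them. */
static float
toy_increment_twice(struct toy_ssbo *ssbo)
{
   float r1 = toy_read(ssbo);
   float r2 = r1 + 1;
   toy_write(ssbo, r2);
   float r3 = toy_read(ssbo);
   float r4 = r3 + 1;
   toy_write(ssbo, r4);
   return toy_read(ssbo);
}

/* The same sequence after the second read was wrongly merged with the
 * first one: r4 is computed from the stale r1.
 */
static float
toy_increment_twice_bad_cse(struct toy_ssbo *ssbo)
{
   float r1 = toy_read(ssbo);
   float r2 = r1 + 1;
   toy_write(ssbo, r2);
   float r4 = r1 + 1;
   toy_write(ssbo, r4);
   return toy_read(ssbo);
}
```

Starting from 0, the original sequence leaves 2 in the buffer while the merged one leaves 1, so the second read cannot be treated as a common subexpression of the first.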

Iago

> -Jordan
> 
> > 
> > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_fs_cse.cpp |  3 ++-
> > >  src/mesa/drivers/dri/i965/brw_shader.cpp | 14 ++
> > >  src/mesa/drivers/dri/i965/brw_shader.h   |  6 ++
> > >  3 files changed, 22 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp 
> > > b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> > > index c7628dc..3a28c8d 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
> > > @@ -93,7 +93,8 @@ is_expression(const fs_visitor *v, const fs_inst *const 
> > > inst)
> > > case SHADER_OPCODE_LOAD_PAYLOAD:
> > >return !inst->is_copy_payload(v->alloc);
> > > default:
> > > -  return inst->is_send_from_grf() && !inst->has_side_effects();
> > > +  return inst->is_send_from_grf() && !inst->has_side_effects() &&
> > > + !inst->is_volatile();
> > > }
> > >  }
> > >  
> > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> > > b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > > index 2324b56..be911ed 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > > @@ -969,6 +969,20 @@ backend_instruction::has_side_effects() const
> > > }
> > >  }
> > >  
> > > +bool
> > > +backend_instruction::is_volatile() const
> > > +{
> > > +   switch (opcode) {
> > > +   case SHADER_OPCODE_UNTYPED_SURFACE_READ:
> > > +   case SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL:
> > > +   case SHADER_OPCODE_TYPED_SURFACE_READ:
> > > +   case SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL:
> > > +  return true;
> > > +   default:
> > > +  return false;
> > > +   }
> > > +}
> > > +
> > >  #ifndef NDEBUG
> > >  static bool
> > >  inst_is_in_block(const bblock_t *block, const backend_instruction *inst)
> > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.h 
> > > b/src/mesa/drivers/dri/i965/brw_shader.h
> > > index b33b08f..35ee210 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_shader.h
> > > +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> > > @@ -115,6 +115,12 @@ struct backend_instruction : public exec_node {
> > >  * optimize these out unless you know what you are doing.
> > >  */
> > > bool has_side_effects() const;
> > > +
> > > +   /**
> > > +* True if the instruction might be affected by side effects of other
> > > +* instructions.
> > > +*/
> > > +   bool is_volatile() const;
> > >  #else
> > >  struct backend_instruction {
> > > struct exec_node link;
> > 
> > 
> 




Re: [Mesa-dev] [PATCH] i965/vec4: Initialize LOD to 0.0f for textureQueryLevels() and texture().

2015-10-22 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Wed, 2015-10-21 at 12:30 -0700, Matt Turner wrote:
> We implement textureQueryLevels (which takes no arguments, save the
> sampler) using the resinfo message (which takes an argument of LOD).
> Without initializing it, we'd generate a MOV from the null register to
> load the LOD argument.
> 
> Essentially the same logic applies to texture. A vertex shader cannot
> compute derivatives and so cannot produce an LOD, so TXL with an LOD of
> 0.0 is used.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index c39f97e..b8f90f2 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -882,6 +882,18 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
> uint32_t sampler,
> src_reg sampler_reg)
>  {
> +   /* The sampler can only meaningfully compute LOD for fragment shader
> +* messages. For all other stages, we change the opcode to TXL and 
> hardcode
> +* the LOD to 0.
> +*
> +* textureQueryLevels() is implemented in terms of TXS so we need to pass 
> a
> +* valid LOD argument.
> +*/
> +   if (op == ir_tex || op == ir_query_levels) {
> +  assert(lod.file == BAD_FILE);
> +  lod = src_reg(0.0f);
> +   }
> +
> enum opcode opcode;
> switch (op) {
> case ir_tex: opcode = SHADER_OPCODE_TXL; break;




Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads

2015-10-22 Thread Iago Toral
On Thu, 2015-10-22 at 09:39 -0400, Connor Abbott wrote:
> On Thu, Oct 22, 2015 at 7:21 AM, Iago Toral Quiroga <ito...@igalia.com> wrote:
> > I implemented this first as a separate optimization pass in GLSL IR [1], but
> > Curro pointed out that this being pretty much a restricted form of a CSE 
> > pass
> > it would probably make more sense to do it inside CSE (and we no longer have
> > a CSE pass in GLSL IR).
> >
> > Unlike other things we CSE in NIR, in the case of SSBO loads we need to make
> > sure that we invalidate previous entries in the set in the presence of
> > conflicting instructions (i.e. SSBO writes to the same block and offset) or
> > in the presence of memory barriers.
> >
> > If this is accepted I intend to extend this to also cover image reads, which
> > follow similar behavior.
> >
> > No regressions observed in piglit or dEQP's SSBO functional tests.
> >
> > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html
> >
> > Iago Toral Quiroga (2):
> >   nir/cse: invalidate SSBO loads in presence of ssbo writes or memory
> > barriers
> >   nir/instr_set: allow rewrite of SSBO loads
> >
> >  src/glsl/nir/nir_instr_set.c |  24 ++--
> >  src/glsl/nir/nir_opt_cse.c   | 142 
> > +++
> >  2 files changed, 162 insertions(+), 4 deletions(-)
> >
> > --
> > 1.9.1
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> NAK, this isn't going to work. NIR CSE is designed for operations
> which can be moved around freely as long as they're still dominated by
> the SSA values they use. It takes heavy advantage of this to avoid
> looking at the entire CFG and instead only at the current block and
> its parents in the dominance tree. For example, imagine you have
> something like:
> 
> A = load_ssbo 0
> if  (cond) {
>store_ssbo 0
> }
> B = load_ssbo 0
> 
> Then A and B can't be combined, but CSE will combine them anyways when
> it reaches B because it keeps a hash table of values dominating B and
> finds A as a match. It doesn't look at the if conditional at all
> because it doesn't dominate the load to B. This is great when you want
> to CSE pure things that don't depend on other side effects -- after
> all, this is the sort of efficiency that SSA is supposed to give us --
> but it means that as-is, it can't be used for e.g. SSBO's and images
> without completely changing how the pass works and making it less
> efficient.

Ugh! One would think that at least one of the 2000+ SSBO tests in dEQP
would catch something like this... I guess not :(.

> Now, that being said, I still think that we should definitely be doing
> this sort of thing in NIR now that we've finally added support for
> SSBO's and images. We've been trying to avoid adding new optimizations
> to GLSL, since we've been trying to move away from it. In addition,
> with SPIR-V on the way, anything added to GLSL IR now is something
> that we won't be able to use with SPIR-V shaders. Only doing it in FS
> doesn't sound so great either; we should be doing as much as possible
> at the midlevel, and combining SSBO loads is something that isn't
> FS-specific at all.

Yeah, agreed.

> There are two ways I can see support for this being added to NIR:
> 
> 1. Add an extra fake source/destination to intrinsics with side
> effects, and add a pass to do essentially a conversion to SSA that
> wires up these "token" sources/destinations, or perhaps extend the
> existing to-SSA pass.
> 
> 2. Add a special "load-combining" pass that does some dataflow
> analysis or similar (or, for now, only looks at things within a single
> block).
> 
> The advantage of #1 is that we get to use existing NIR passes, like
> CSE, DCE, and GCM "for free" on SSBO loads and stores, without having
> to do the equivalent thing using dataflow analysis. Also, doing store
> forwarding (i.e. replacing the result of an SSBO load with the value
> corresponding to a store, if we can figure out which store affects it)
> is going to be much easier. However, #1 is going to be much more of a
> research project. I've thought about how we could do it, but I'm still
> not sure how it could be done feasibly and still be correct.

Thanks for sharing these ideas. #1 looks like the best way to go in
terms of benefits (although it looks rather artificial!), however I am
not sure that my understanding of NIR at this moment is good enough to
pursue something like that. Also, I would really like to see some sort
of support 

Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads

2015-10-26 Thread Iago Toral
On Fri, 2015-10-23 at 09:26 -0700, Jason Ekstrand wrote:
> On Thu, Oct 22, 2015 at 11:13 PM, Iago Toral <ito...@igalia.com> wrote:
> > On Thu, 2015-10-22 at 09:09 -0700, Jason Ekstrand wrote:
> >> On Thu, Oct 22, 2015 at 4:21 AM, Iago Toral Quiroga <ito...@igalia.com> 
> >> wrote:
> >> > I implemented this first as a separate optimization pass in GLSL IR [1], 
> >> > but
> >> > Curro pointed out that this being pretty much a restricted form of a CSE 
> >> > pass
> >> > it would probably make more sense to do it inside CSE (and we no longer 
> >> > have
> >> > a CSE pass in GLSL IR).
> >> >
> >> > Unlike other things we CSE in NIR, in the case of SSBO loads we need to 
> >> > make
> >> > sure that we invalidate previous entries in the set in the presence of
> >> > conflicting instructions (i.e. SSBO writes to the same block and offset) 
> >> > or
> >> > in the presence of memory barriers.
> >> >
> >> > If this is accepted I intend to extend this to also cover image reads, 
> >> > which
> >> > follow similar behavior.
> >> >
> >> > No regressions observed in piglit or dEQP's SSBO functional tests.
> >> >
> >> > [1] 
> >> > http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html
> >>
> >> I think you've gotten enough NAK's that I don't need to chime in
> >> there.  Unfortunately, solving this in general is something of a
> >> research project that both Connor and I have been thinking about for
> >> quite some time.  I've been thinking off-and-on about how to add a
> >> proper memory model to lower_vars_to_ssa for almost a year now and
> >> still haven't come up with a good way to do it.  I don't know whether
> >> SSBO's would be simpler or not.  We need a proper memory model for
> >> both lower_vars_to_ssa and SSBO load/stores (and shared local
> >> variables) but it's a substantial research project.
> >>
> >> This isn't to say that you couldn't do it.  Just know what you're taking 
> >> on. ;-)
> >
> > Yeah, it does not make sense that I try to do this, you guys have
> > clearly given this much more thought than me and know much better how a
> > solution for this would fit in NIR than me.
> >
> >> That said, here's a suggestion for something that we *could* write
> >> today, wouldn't be very hard, and would solve a decent number of cases.
> >>
> >> For each block:
> >>
> >> 1) Create a new instruction set (don't use anything from any previous 
> >> blocks)
> >> 2) call add_or_rewrite on all ssbo load operations
> >> 3) If you ever see a barrier or ssbo store, destroy the entire
> >> instruction set and start again.
> >
> > Yep, this is what I was thinking for the load-combine pass that Connor
> > suggested. However, I think that in this case we do not need to destroy
> > the entire set when we find a store, only for memory barriers, right? I
> > mean, there should be nothing preventing us from checking the
> > offset/block of the store and compare it with the offset/block of the
> > loads in the set to decide which ones we need to remove (like I was
> > doing in my last patch)
> 
> That's where you get into the "special casing" I mentioned below.  If
> you have a direct store, you would have to throw away any indirect
> loads 

Yes.

> and then insert a fake direct load for the given offset.

Actually, what I am doing is a bit different:

When I see stores, I also insert them in the hash table (but I never
rewrite stores). Then, when I see a load, I check for a matching load:
if I have one, I use it; if not, I check whether I have a store to the
same offset, and if I do, I just use that, with no need to fake anything.
Of course, to check for a compatible store I have to traverse the hash
table looking for a match, but I think that should be okay in this
case, since the table only holds load/store operations, and only the
ones in the current block.

Does this seem like a reasonable alternative?
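
To make the idea concrete, here is a rough sketch of a per-block table that holds both loads and stores. It is not the actual patch: it uses a linear array instead of a hash table, handles only direct (constant) block/offset pairs, ignores atomics and writemasks, and every toy_* name is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_LOAD, TOY_STORE };

struct toy_entry {
   enum toy_kind kind;
   unsigned block;
   unsigned offset;
   int value_id;   /* value defined by the load, or written by the store */
};

#define TOY_MAX_ENTRIES 32

struct toy_table {
   struct toy_entry entries[TOY_MAX_ENTRIES];
   size_t count;
};

/* Called at the start of each block and on every memory barrier. */
static void
toy_table_reset(struct toy_table *t)
{
   t->count = 0;
}

static const struct toy_entry *
toy_find(const struct toy_table *t, enum toy_kind kind,
         unsigned block, unsigned offset)
{
   for (size_t i = 0; i < t->count; i++) {
      const struct toy_entry *e = &t->entries[i];
      if (e->kind == kind && e->block == block && e->offset == offset)
         return e;
   }
   return NULL;
}

static void
toy_add(struct toy_table *t, enum toy_kind kind,
        unsigned block, unsigned offset, int value_id)
{
   assert(t->count < TOY_MAX_ENTRIES);
   t->entries[t->count++] =
      (struct toy_entry){ kind, block, offset, value_id };
}

/* A store makes every older entry for the same location stale. */
static void
toy_visit_store(struct toy_table *t, unsigned block, unsigned offset,
                int value_id)
{
   size_t kept = 0;
   for (size_t i = 0; i < t->count; i++) {
      const struct toy_entry *e = &t->entries[i];
      if (e->block != block || e->offset != offset)
         t->entries[kept++] = *e;
   }
   t->count = kept;
   toy_add(t, TOY_STORE, block, offset, value_id);
}

/* Returns the value the load should use: an earlier load, the value of
 * an earlier store, or its own value_id when nothing matches.
 */
static int
toy_visit_load(struct toy_table *t, unsigned block, unsigned offset,
               int value_id)
{
   const struct toy_entry *e = toy_find(t, TOY_LOAD, block, offset);
   if (e == NULL)
      e = toy_find(t, TOY_STORE, block, offset);
   if (e != NULL)
      return e->value_id;

   toy_add(t, TOY_LOAD, block, offset, value_id);
   return value_id;
}
```

Stores are recorded but never rewritten, and a load that follows a store to the same location simply picks up the stored value, which is the store-forwarding behavior described above.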

>   If you
> have an indirect store, you would have to throw away everything and
> then insert a fake indirect load for the given offset.  So, yes, you
> can do it, but it'll take a little more work.

Yeah, mostly because there are also atomics to consider and then you
also have to check that the stores write to all the components we read
before we reuse them, etc.
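
The component check mentioned here could look roughly like the sketch below, assuming one bit per vector component (x = 1, y = 2, z = 4, w = 8). The helper name toy_store_covers_load is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the store wrote every component that the load wants to read,
 * so the stored value can stand in for the load.
 */
static bool
toy_store_covers_load(unsigned store_writemask, unsigned load_components)
{
   /* Both masks describe at most four components in this sketch. */
   assert((store_writemask & ~0xfu) == 0);
   assert((load_components & ~0xfu) == 0);

   return (store_writemask & load_components) == load_components;
}
```

A store that wrote only .x cannot be reused for a load of .xy, while a full vec4 store can be reused for any load at that offset.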

>   You'll also probabl

Re: [Mesa-dev] [PATCH 1/2] i965/vec4: Remove unnecessary #includes from the generator.

2015-10-27 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Sat, 2015-10-24 at 13:20 -0700, Matt Turner wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index 6569d81..d50184a 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -20,18 +20,10 @@
>   * IN THE SOFTWARE.
>   */
>  
> -#include 
>  #include "glsl/glsl_parser_extras.h"
>  #include "brw_vec4.h"
>  #include "brw_cfg.h"
>  
> -extern "C" {
> -#include "brw_eu.h"
> -#include "main/macros.h"
> -#include "program/prog_print.h"
> -#include "program/prog_parameter.h"
> -};
> -
>  using namespace brw;
>  
>  static void




Re: [Mesa-dev] [PATCH 3/3] nir: Store PatchInputsRead and PatchOutputsWritten in nir_shader_info.

2015-10-28 Thread Iago Toral
On Wed, 2015-10-28 at 03:37 -0700, Kenneth Graunke wrote:
> These tessellation shader related fields need plumbing through NIR.
> 
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 2 ++
>  src/glsl/nir/nir.h   | 5 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index 7275b48..3722ae0 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -160,6 +160,8 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
> shader->info.num_images = sh->NumImages;
> shader->info.inputs_read = sh->Program->InputsRead;
> shader->info.outputs_written = sh->Program->OutputsWritten;
> +   shader->info.patch_inputs_read = sh->Program->PatchInputsRead;
> +   shader->info.patch_outputs_written = sh->Program->PatchOutputsWritten;
> shader->info.system_values_read = sh->Program->SystemValuesRead;
> shader->info.uses_texture_gather = sh->Program->UsesGather;
> shader->info.uses_clip_distance_out =
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index ac42251..d053273 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1481,6 +1481,11 @@ typedef struct nir_shader_info {
> /* Which system values are actually read */
> uint64_t system_values_read;
>  
> +   /* Which patch inputs are actually read */
> +   uint64_t patch_inputs_read;
> +   /* Which patch outputs are actually written */
> +   uint64_t patch_outputs_written;

These two should be fine as uint32_t since we set them from a GLBitfield
which is 32-bit after all... or maybe you were thinking that we might
need to grow these bitfields in Mesa to 64-bits eventually.

All 3 patches are:
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

BTW, system_values_read should also be fine as 32-bit for the same
reason. I can send a patch for that if we care.
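
As a small illustration of why the width does not matter for correctness today, the sketch below copies a 32-bit bitfield into a 64-bit field. The types are stand-ins chosen for this example (toy_bitfield32 plays the role of a 32-bit GLbitfield, toy_shader_info the role of nir_shader_info); nothing here is taken from the Mesa headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_bitfield32;   /* stand-in for a 32-bit GLbitfield */

struct toy_shader_info {
   uint64_t patch_inputs_read;     /* 64-bit field, as in the patch */
};

static void
toy_set_patch_inputs(struct toy_shader_info *info, toy_bitfield32 bits)
{
   assert(info != NULL);
   /* The 32-bit value is zero-extended; the upper 32 bits stay 0. */
   info->patch_inputs_read = bits;
}

/* True if the value would survive being stored in a uint32_t. */
static bool
toy_fits_in_32_bits(uint64_t bits)
{
   return (bits >> 32) == 0;
}
```

The wider field only costs a few bytes, so keeping uint64_t is harmless if the bitfields are expected to grow later.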

Iago

> /* Whether or not this shader ever uses textureGather() */
> bool uses_texture_gather;
>  




Re: [Mesa-dev] [PATCH 1/3] mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex

2015-10-28 Thread Iago Toral
On Wed, 2015-10-28 at 09:11 +0100, Iago Toral wrote:
> Yeah, this makes things more consistent:
> Reviewed-by: Iago Toral Quiroga <ito...@igali.com>

I meant:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

> 
> On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote:
> > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> > Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> > Cc: Iago Toral Quiroga <ito...@igalia.com>
> > ---
> >  src/glsl/link_uniform_initializers.cpp |  2 +-
> >  src/glsl/linker.cpp| 16 
> >  src/glsl/standalone_scaffolding.cpp|  4 ++--
> >  src/mesa/main/mtypes.h | 11 ++-
> >  src/mesa/main/shader_query.cpp |  2 +-
> >  src/mesa/main/shaderobj.c  |  4 ++--
> >  src/mesa/main/uniforms.c   |  4 ++--
> >  7 files changed, 22 insertions(+), 21 deletions(-)
> > 
> > diff --git a/src/glsl/link_uniform_initializers.cpp 
> > b/src/glsl/link_uniform_initializers.cpp
> > index 682a4ee..58d21e5 100644
> > --- a/src/glsl/link_uniform_initializers.cpp
> > +++ b/src/glsl/link_uniform_initializers.cpp
> > @@ -178,7 +178,7 @@ set_block_binding(gl_shader_program *prog, const char 
> > *block_name, int binding)
> >  
> >/* This is a field of a UBO.  val is the binding index. */
> >for (int i = 0; i < MESA_SHADER_STAGES; i++) {
> > - int stage_index = prog->UniformBlockStageIndex[i][block_index];
> > + int stage_index = prog->InterfaceBlockStageIndex[i][block_index];
> >  
> >   if (stage_index != -1) {
> >  struct gl_shader *sh = prog->_LinkedShaders[i];
> > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > index cfd8f81..3db2fd3 100644
> > --- a/src/glsl/linker.cpp
> > +++ b/src/glsl/linker.cpp
> > @@ -1174,10 +1174,10 @@ interstage_cross_validate_uniform_blocks(struct 
> > gl_shader_program *prog)
> > for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> >struct gl_shader *sh = prog->_LinkedShaders[i];
> >  
> > -  prog->UniformBlockStageIndex[i] = ralloc_array(prog, int,
> > -max_num_uniform_blocks);
> > +  prog->InterfaceBlockStageIndex[i] = ralloc_array(prog, int,
> > +   
> > max_num_uniform_blocks);
> >for (unsigned int j = 0; j < max_num_uniform_blocks; j++)
> > -prog->UniformBlockStageIndex[i][j] = -1;
> > +prog->InterfaceBlockStageIndex[i][j] = -1;
> >  
> >if (sh == NULL)
> >  continue;
> > @@ -1194,7 +1194,7 @@ interstage_cross_validate_uniform_blocks(struct 
> > gl_shader_program *prog)
> > return false;
> >  }
> >  
> > -prog->UniformBlockStageIndex[i][index] = j;
> > +prog->InterfaceBlockStageIndex[i][index] = j;
> >}
> > }
> >  
> > @@ -2836,9 +2836,9 @@ check_resources(struct gl_context *ctx, struct 
> > gl_shader_program *prog)
> >}
> >  
> >for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) {
> > -if (prog->UniformBlockStageIndex[j][i] != -1) {
> > +if (prog->InterfaceBlockStageIndex[j][i] != -1) {
> >  struct gl_shader *sh = prog->_LinkedShaders[j];
> > -int stage_index = prog->UniformBlockStageIndex[j][i];
> > +int stage_index = prog->InterfaceBlockStageIndex[j][i];
> >  if (sh && 
> > sh->BufferInterfaceBlocks[stage_index].IsShaderStorage) {
> > shader_blocks[j]++;
> > total_shader_storage_blocks++;
> > @@ -2955,7 +2955,7 @@ check_image_resources(struct gl_context *ctx, struct 
> > gl_shader_program *prog)
> >   total_image_units += sh->NumImages;
> >  
> >   for (unsigned j = 0; j < prog->NumBufferInterfaceBlocks; j++) {
> > -int stage_index = prog->UniformBlockStageIndex[i][j];
> > +int stage_index = prog->InterfaceBlockStageIndex[i][j];
> >  if (stage_index != -1 && 
> > sh->BufferInterfaceBlocks[stage_index].IsShaderStorage)
> > total_shader_storage_blocks++;
> >   }
> > @@ -3734,7 +3734,7 @@ build_program_resource_list(struct gl_shader_program 
> > *shProg)
> >int block_index = shProg->UniformStorage[i].block_index;
> >if (block_index != -1) {
> >   for (unsigned j = 0; j 

Re: [Mesa-dev] [PATCH 1/3] mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex

2015-10-28 Thread Iago Toral
Yeah, this makes things more consistent:
Reviewed-by: Iago Toral Quiroga <ito...@igali.com>

On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> Cc: Iago Toral Quiroga <ito...@igalia.com>
> ---
>  src/glsl/link_uniform_initializers.cpp |  2 +-
>  src/glsl/linker.cpp| 16 
>  src/glsl/standalone_scaffolding.cpp|  4 ++--
>  src/mesa/main/mtypes.h | 11 ++-
>  src/mesa/main/shader_query.cpp |  2 +-
>  src/mesa/main/shaderobj.c  |  4 ++--
>  src/mesa/main/uniforms.c   |  4 ++--
>  7 files changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/src/glsl/link_uniform_initializers.cpp 
> b/src/glsl/link_uniform_initializers.cpp
> index 682a4ee..58d21e5 100644
> --- a/src/glsl/link_uniform_initializers.cpp
> +++ b/src/glsl/link_uniform_initializers.cpp
> @@ -178,7 +178,7 @@ set_block_binding(gl_shader_program *prog, const char 
> *block_name, int binding)
>  
>/* This is a field of a UBO.  val is the binding index. */
>for (int i = 0; i < MESA_SHADER_STAGES; i++) {
> - int stage_index = prog->UniformBlockStageIndex[i][block_index];
> + int stage_index = prog->InterfaceBlockStageIndex[i][block_index];
>  
>   if (stage_index != -1) {
>  struct gl_shader *sh = prog->_LinkedShaders[i];
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index cfd8f81..3db2fd3 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -1174,10 +1174,10 @@ interstage_cross_validate_uniform_blocks(struct 
> gl_shader_program *prog)
> for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>struct gl_shader *sh = prog->_LinkedShaders[i];
>  
> -  prog->UniformBlockStageIndex[i] = ralloc_array(prog, int,
> -  max_num_uniform_blocks);
> +  prog->InterfaceBlockStageIndex[i] = ralloc_array(prog, int,
> +   
> max_num_uniform_blocks);
>for (unsigned int j = 0; j < max_num_uniform_blocks; j++)
> -  prog->UniformBlockStageIndex[i][j] = -1;
> +  prog->InterfaceBlockStageIndex[i][j] = -1;
>  
>if (sh == NULL)
>continue;
> @@ -1194,7 +1194,7 @@ interstage_cross_validate_uniform_blocks(struct 
> gl_shader_program *prog)
>   return false;
>}
>  
> -  prog->UniformBlockStageIndex[i][index] = j;
> +  prog->InterfaceBlockStageIndex[i][index] = j;
>}
> }
>  
> @@ -2836,9 +2836,9 @@ check_resources(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>}
>  
>for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) {
> -  if (prog->UniformBlockStageIndex[j][i] != -1) {
> +  if (prog->InterfaceBlockStageIndex[j][i] != -1) {
>  struct gl_shader *sh = prog->_LinkedShaders[j];
> -int stage_index = prog->UniformBlockStageIndex[j][i];
> +int stage_index = prog->InterfaceBlockStageIndex[j][i];
>  if (sh && 
> sh->BufferInterfaceBlocks[stage_index].IsShaderStorage) {
> shader_blocks[j]++;
> total_shader_storage_blocks++;
> @@ -2955,7 +2955,7 @@ check_image_resources(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>   total_image_units += sh->NumImages;
>  
>   for (unsigned j = 0; j < prog->NumBufferInterfaceBlocks; j++) {
> -int stage_index = prog->UniformBlockStageIndex[i][j];
> +int stage_index = prog->InterfaceBlockStageIndex[i][j];
>  if (stage_index != -1 && 
> sh->BufferInterfaceBlocks[stage_index].IsShaderStorage)
> total_shader_storage_blocks++;
>   }
> @@ -3734,7 +3734,7 @@ build_program_resource_list(struct gl_shader_program 
> *shProg)
>int block_index = shProg->UniformStorage[i].block_index;
>if (block_index != -1) {
>   for (unsigned j = 0; j < MESA_SHADER_STAGES; j++) {
> - if (shProg->UniformBlockStageIndex[j][block_index] != -1)
> + if (shProg->InterfaceBlockStageIndex[j][block_index] != -1)
>  stageref |= (1 << j);
>   }
>}
> diff --git a/src/glsl/standalone_scaffolding.cpp 
> b/src/glsl/standalone_scaffolding.cpp
> index eccf094..fe1d820 100644
> --- a/src/glsl/standalone_scaffolding.cpp
> +++ b/src/glsl/standalone_scaffolding.cpp
> @@ -120,8 +120,8 @@ _mesa_clear_shader_program_data(s

Re: [Mesa-dev] [PATCH 3/3] mesa: Use UBO/SSBO indices during binding

2015-10-28 Thread Iago Toral
On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote:
> Previously we were treating the binding index for Uniform Buffer
> Objects and Shader Storage Buffer Objects as being part of the
> combined BufferInterfaceBlocks array.
> 
> Fixes ES31-CTS.compute_shader.resource-ubo on i965.
> 
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> Cc: Iago Toral Quiroga <ito...@igalia.com>
> ---
>  src/mesa/main/uniforms.c | 30 --
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/src/mesa/main/uniforms.c b/src/mesa/main/uniforms.c
> index 758ca24..47f80ce 100644
> --- a/src/mesa/main/uniforms.c
> +++ b/src/mesa/main/uniforms.c
> @@ -1002,10 +1002,10 @@ _mesa_UniformBlockBinding(GLuint program,
> if (!shProg)
>return;
>  
> -   if (uniformBlockIndex >= shProg->NumBufferInterfaceBlocks) {
> +   if (uniformBlockIndex >= shProg->NumUniformBlocks) {
>_mesa_error(ctx, GL_INVALID_VALUE,
> "glUniformBlockBinding(block index %u >= %u)",
> -   uniformBlockIndex, shProg->NumBufferInterfaceBlocks);
> +   uniformBlockIndex, shProg->NumUniformBlocks);
>return;
> }
>  
> @@ -1016,17 +1016,22 @@ _mesa_UniformBlockBinding(GLuint program,
>return;
> }
>  
> -   if (shProg->BufferInterfaceBlocks[uniformBlockIndex].Binding !=
> +   if (shProg->UniformBlocks[uniformBlockIndex]->Binding !=
> uniformBlockBinding) {
>int i;
>  
>FLUSH_VERTICES(ctx, 0);
>ctx->NewDriverState |= ctx->DriverFlags.NewUniformBuffer;
>  
> -  shProg->BufferInterfaceBlocks[uniformBlockIndex].Binding = 
> uniformBlockBinding;
> +  const int interface_block_index =
> + shProg->UboInterfaceBlockIndex[uniformBlockIndex];
> +
> +  shProg->BufferInterfaceBlocks[interface_block_index].Binding =
> + uniformBlockBinding;

FWIW, here you could just do:
shProg->UniformBlocks[uniformBlockIndex]->Binding = uniformBlockBinding;

since shProg->UniformBlocks holds pointers into the appropriate places
in shProg->BufferInterfaceBlocks.
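
To illustrate the aliasing point, here is a minimal sketch with made-up,
simplified types (struct block, struct program, collect_ubos and
set_ubo_binding are hypothetical names, not Mesa code). It assumes, as
described above, that the UBO array holds pointers into the combined array,
so a write through the UBO-space pointer shows up in the combined array:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the Mesa structures. */
struct block {
   int is_shader_storage;
   int binding;
};

struct program {
   struct block combined[4];   /* plays the role of BufferInterfaceBlocks */
   struct block *ubos[4];      /* plays the role of UniformBlocks */
   unsigned num_combined;
   unsigned num_ubos;
};

/* Fill ubos[] with pointers to the non-SSBO entries of combined[]. */
static void
collect_ubos(struct program *p)
{
   p->num_ubos = 0;
   for (unsigned i = 0; i < p->num_combined; i++) {
      if (!p->combined[i].is_shader_storage)
         p->ubos[p->num_ubos++] = &p->combined[i];
   }
}

/* Set the binding through the UBO-space pointer and return the matching
 * index in the combined array, so a caller can check that both views
 * refer to the same element.
 */
static int
set_ubo_binding(struct program *p, unsigned ubo_index, int binding)
{
   assert(ubo_index < p->num_ubos);
   p->ubos[ubo_index]->binding = binding;
   return (int)(p->ubos[ubo_index] - p->combined);
}
```

With blocks laid out as SSBO, UBO, SSBO, UBO, setting the binding of UBO 1
updates combined[3] and leaves combined[1] untouched.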

>for (i = 0; i < MESA_SHADER_STAGES; i++) {
> -  int stage_index = 
> shProg->InterfaceBlockStageIndex[i][uniformBlockIndex];
> +  int stage_index =
> +shProg->InterfaceBlockStageIndex[i][interface_block_index];

However, we still need this and so we still need to have the index in
the interface block space, so either way:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

>if (stage_index != -1) {
>   struct gl_shader *sh = shProg->_LinkedShaders[i];
> @@ -1054,10 +1059,10 @@ _mesa_ShaderStorageBlockBinding(GLuint program,
> if (!shProg)
>return;
>  
> -   if (shaderStorageBlockIndex >= shProg->NumBufferInterfaceBlocks) {
> +   if (shaderStorageBlockIndex >= shProg->NumShaderStorageBlocks) {
>_mesa_error(ctx, GL_INVALID_VALUE,
> "glShaderStorageBlockBinding(block index %u >= %u)",
> -   shaderStorageBlockIndex, shProg->NumBufferInterfaceBlocks);
> +   shaderStorageBlockIndex, shProg->NumShaderStorageBlocks);
>return;
> }
>  
> @@ -1069,17 +1074,22 @@ _mesa_ShaderStorageBlockBinding(GLuint program,
>return;
> }
>  
> -   if (shProg->BufferInterfaceBlocks[shaderStorageBlockIndex].Binding !=
> +   if (shProg->ShaderStorageBlocks[shaderStorageBlockIndex]->Binding !=
> shaderStorageBlockBinding) {
>int i;
>  
>FLUSH_VERTICES(ctx, 0);
>ctx->NewDriverState |= ctx->DriverFlags.NewShaderStorageBuffer;
>  
> -  shProg->BufferInterfaceBlocks[shaderStorageBlockIndex].Binding = 
> shaderStorageBlockBinding;
> +  const int interface_block_index =
> + shProg->SsboInterfaceBlockIndex[shaderStorageBlockIndex];
> +
> +  shProg->BufferInterfaceBlocks[interface_block_index].Binding =
> + shaderStorageBlockBinding;
>  
>for (i = 0; i < MESA_SHADER_STAGES; i++) {
> -  int stage_index = 
> shProg->InterfaceBlockStageIndex[i][shaderStorageBlockIndex];
> +  int stage_index =
> +shProg->InterfaceBlockStageIndex[i][interface_block_index];
>  
>if (stage_index != -1) {
>   struct gl_shader *sh = shProg->_LinkedShaders[i];


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/4] nir/instr_set: Add an allow_loads field

2015-10-28 Thread Iago Toral
On Tue, 2015-10-27 at 14:33 +0200, Pohjolainen, Topi wrote:
> On Tue, Oct 27, 2015 at 10:28:58AM +0100, Iago Toral Quiroga wrote:
> > We need this so we can configure different behaviors for passes that
> > cannot deal with side-effectful instructions (CSE) and passes that can
> > (we will add a load-combine pass shortly).
> > 
> > For now, when allow_loads is true, we let the instruction set rewrite
> > SSBO loads.
> > ---
> >  src/glsl/nir/nir_instr_set.c | 51 
> > 
> >  src/glsl/nir/nir_instr_set.h | 20 -
> >  src/glsl/nir/nir_opt_cse.c   |  4 ++--
> >  3 files changed, 50 insertions(+), 25 deletions(-)
> > 
> > diff --git a/src/glsl/nir/nir_instr_set.c b/src/glsl/nir/nir_instr_set.c
> > index d3f939f..583618f 100644
> > --- a/src/glsl/nir/nir_instr_set.c
> > +++ b/src/glsl/nir/nir_instr_set.c
> > @@ -398,6 +398,13 @@ dest_is_ssa(nir_dest *dest, void *data)
> > return dest->is_ssa;
> >  }
> >  
> > +static bool
> > +is_load(nir_intrinsic_instr *instr)
> > +{
> > +   return instr->intrinsic == nir_intrinsic_load_ssbo ||
> > +  instr->intrinsic == nir_intrinsic_load_ssbo_indirect;
> > +}
> > +
> >  /* This function determines if uses of an instruction can safely be 
> > rewritten
> >   * to use another identical instruction instead. Note that this function 
> > must
> >   * be kept in sync with hash_instr() and nir_instrs_equal() -- only
> > @@ -406,7 +413,7 @@ dest_is_ssa(nir_dest *dest, void *data)
> >   */
> >  
> >  static bool
> > -instr_can_rewrite(nir_instr *instr)
> > +instr_can_rewrite(nir_instr *instr, bool allow_loads)
> >  {
> > /* We only handle SSA. */
> > if (!nir_foreach_dest(instr, dest_is_ssa, NULL) ||
> > @@ -428,11 +435,15 @@ instr_can_rewrite(nir_instr *instr)
> >return true;
> > }
> > case nir_instr_type_intrinsic: {
> > +  nir_intrinsic_instr *intrinsic = nir_instr_as_intrinsic(instr);
> >const nir_intrinsic_info *info =
> > > - &nir_intrinsic_infos[nir_instr_as_intrinsic(instr)->intrinsic];
> > -  return (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) &&
> > - (info->flags & NIR_INTRINSIC_CAN_REORDER) &&
> > - info->num_variables == 0; /* not implemented yet */
> > > + &nir_intrinsic_infos[intrinsic->intrinsic];
> > +  bool can_eliminate_and_reorder =
> > + (info->flags & NIR_INTRINSIC_CAN_ELIMINATE) &&
> > + (info->flags & NIR_INTRINSIC_CAN_REORDER) &&
> > + info->num_variables == 0; /* not implemented yet */
> > +  return can_eliminate_and_reorder ?
> > + true: allow_loads && is_load(intrinsic);
> 
> Isn't this just?
> 
>  return can_eliminate_and_reorder ||
> (allow_loads && is_load(intrinsic));
> 
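
The two forms discussed above can be checked against each other
exhaustively. This is only a sketch: the function names are made up, and
the three booleans stand in for can_eliminate_and_reorder, allow_loads and
the result of is_load():

```c
#include <assert.h>
#include <stdbool.h>

/* Form used in the patch. */
static bool
rewrite_ternary(bool can_eliminate_and_reorder, bool allow_loads, bool load)
{
   return can_eliminate_and_reorder ? true : allow_loads && load;
}

/* Form suggested in the review. */
static bool
rewrite_or(bool can_eliminate_and_reorder, bool allow_loads, bool load)
{
   return can_eliminate_and_reorder || (allow_loads && load);
}

/* Count the input combinations (out of 8) on which both forms agree. */
static int
count_agreements(void)
{
   int agree = 0;

   for (int bits = 0; bits < 8; bits++) {
      bool a = bits & 1, b = bits & 2, c = bits & 4;

      if (rewrite_ternary(a, b, c) == rewrite_or(a, b, c))
         agree++;
   }
   return agree;
}
```

Since && binds tighter than ?:, the ternary parses as
a ? true : (b && c), so both forms should agree on every input.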

Re: [Mesa-dev] [PATCH 2/3] mesa: Map program UBOs and SSBOs to Interface Blocks

2015-10-28 Thread Iago Toral
On Tue, 2015-10-27 at 22:38 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Samuel Iglesias Gonsálvez <sigles...@igalia.com>
> Cc: Iago Toral Quiroga <ito...@igalia.com>
> ---
>  src/glsl/linker.cpp | 14 ++
>  src/glsl/standalone_scaffolding.cpp |  5 +
>  src/mesa/main/mtypes.h  |  7 +++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index 3db2fd3..d925393 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4448,6 +4448,20 @@ link_shaders(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>  &prog->ShaderStorageBlocks,
>  &prog->NumShaderStorageBlocks);
>  
> +   prog->UboInterfaceBlockIndex =
> +  ralloc_array(prog, int, prog->NumUniformBlocks);
> +   prog->SsboInterfaceBlockIndex =
> +  ralloc_array(prog, int, prog->NumShaderStorageBlocks);
> +   for (unsigned i = 0, u = 0, s = 0;
> +i < prog->NumBufferInterfaceBlocks;
> +i++) {
> +  if (prog->BufferInterfaceBlocks[i].IsShaderStorage) {
> + prog->SsboInterfaceBlockIndex[s++] = i;
> +  } else {
> + prog->UboInterfaceBlockIndex[u++] = i;
> +  }
> +   }

This loop is pretty much the same as the one we have in
split_ubos_and_ssbos, and the functionality it implements seems to belong
in that function too, so maybe it would be better to merge this loop into
that function. I am thinking that we could pass two more arguments to
split_ubos_and_ssbos: they would be NULL in the first call (the one we do
in the loop for all shader stages), and in the last call, the one we do
outside the loop for the shader program, we would pass
&prog->UboInterfaceBlockIndex and &prog->SsboInterfaceBlockIndex to get
these set up.

What do you think?
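
Roughly what I have in mind, as a sketch only: split_blocks and its
signature are hypothetical and much simpler than the real
split_ubos_and_ssbos (no allocation, and the caller is assumed to provide
index arrays that are large enough). The point is just that the index maps
are filled in the same pass and only when the caller asks for them:

```c
#include <assert.h>
#include <stddef.h>

struct block {
   int is_shader_storage;
};

/* Hypothetical split routine: walks the combined array once, counts UBOs
 * and SSBOs and, only when the caller passes non-NULL arrays, also records
 * each block's index in the combined array.
 */
static void
split_blocks(const struct block *blocks, unsigned num_blocks,
             unsigned *num_ubos, unsigned *num_ssbos,
             int *ubo_index_map, int *ssbo_index_map)
{
   unsigned u = 0, s = 0;

   for (unsigned i = 0; i < num_blocks; i++) {
      if (blocks[i].is_shader_storage) {
         if (ssbo_index_map)
            ssbo_index_map[s] = (int)i;
         s++;
      } else {
         if (ubo_index_map)
            ubo_index_map[u] = (int)i;
         u++;
      }
   }

   *num_ubos = u;
   *num_ssbos = s;
}
```

The per-stage calls would pass NULL for the last two arguments and the
program-level call would pass the two index arrays.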

> /* FINISHME: Assign fragment shader output locations. */
>  
>  done:
> diff --git a/src/glsl/standalone_scaffolding.cpp 
> b/src/glsl/standalone_scaffolding.cpp
> index fe1d820..5c76295 100644
> --- a/src/glsl/standalone_scaffolding.cpp
> +++ b/src/glsl/standalone_scaffolding.cpp
> @@ -124,6 +124,11 @@ _mesa_clear_shader_program_data(struct gl_shader_program 
> *shProg)
>shProg->InterfaceBlockStageIndex[i] = NULL;
> }
>  
> +   ralloc_free(shProg->UboInterfaceBlockIndex);
> +   shProg->UboInterfaceBlockIndex = NULL;
> +   ralloc_free(shProg->SsboInterfaceBlockIndex);
> +   shProg->SsboInterfaceBlockIndex = NULL;
> +
> ralloc_free(shProg->AtomicBuffers);
> shProg->AtomicBuffers = NULL;
> shProg->NumAtomicBuffers = 0;
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index cce959e..c1cbe96 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2725,6 +2725,13 @@ struct gl_shader_program
> int *InterfaceBlockStageIndex[MESA_SHADER_STAGES];
>  
> /**
> +* Indices into the BufferInterfaceBlocks[] array for Uniform Buffer
> +* Objects and Shader Storage Buffer Objects.
> +*/
> +   int *UboInterfaceBlockIndex;
> +   int *SsboInterfaceBlockIndex;
> +
> +   /**
>  * Map of active uniform names to locations
>  *
>  * Maps any active uniform that is not an array element to a location.




Re: [Mesa-dev] [PATCH] glsl: Add compute shader builtin variables for OpenGLES 3.1

2015-10-29 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Thu, 2015-10-29 at 00:46 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  src/glsl/builtin_variables.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/glsl/builtin_variables.cpp b/src/glsl/builtin_variables.cpp
> index a6ad105..00113d5 100644
> --- a/src/glsl/builtin_variables.cpp
> +++ b/src/glsl/builtin_variables.cpp
> @@ -710,7 +710,7 @@ builtin_variable_generator::generate_constants()
>}
> }
>  
> -   if (state->is_version(430, 0) || state->ARB_compute_shader_enable) {
> +   if (state->is_version(430, 310) || state->ARB_compute_shader_enable) {
>add_const("gl_MaxComputeAtomicCounterBuffers", 
> MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS);
>add_const("gl_MaxComputeAtomicCounters", MAX_COMPUTE_ATOMIC_COUNTERS);
>add_const("gl_MaxComputeImageUniforms", MAX_COMPUTE_IMAGE_UNIFORMS);




Re: [Mesa-dev] [PATCH] glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules

2015-10-29 Thread Iago Toral
On Thu, 2015-10-29 at 00:47 -0700, Jordan Justen wrote:
> The OpenGLES GLSL 3.1 specification uses the precision qualifier
> ordering rules from ARB_shading_language_420pack.

Maybe expand the commit log to make it explicit that this applies to
GLES 3.1 and to desktop GL since 4.2.

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>
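
For the commit log, the condition being added can be modelled like this.
It is a simplified sketch with hypothetical names (struct parse_state,
precision_may_come_first), and it assumes that is_version() picks the
desktop or ES requirement depending on the shader type and treats a
requirement of 0 as "never satisfied":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parser state. */
struct parse_state {
   bool es_shader;
   unsigned language_version;   /* e.g. 420 for GLSL 4.20, 310 for ES 3.10 */
   bool has_420pack;
};

/* Assumed semantics: compare against the ES requirement for ES shaders and
 * against the desktop requirement otherwise; 0 means "not available".
 */
static bool
is_version(const struct parse_state *st,
           unsigned required_glsl, unsigned required_es)
{
   unsigned required = st->es_shader ? required_es : required_glsl;

   return required != 0 && st->language_version >= required;
}

/* Mirrors the condition in the patch: the relaxed qualifier ordering is
 * accepted with 420pack, desktop GLSL >= 4.20 or GLSL ES >= 3.10.
 */
static bool
precision_may_come_first(const struct parse_state *st)
{
   return st->has_420pack || is_version(st, 420, 310);
}
```

Under these assumptions an ES 3.00 shader still gets the "precision
qualifiers must come last" error while an ES 3.10 shader does not.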

> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  src/glsl/glsl_parser.yy | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
> index 2f2e10d..4636435 100644
> --- a/src/glsl/glsl_parser.yy
> +++ b/src/glsl/glsl_parser.yy
> @@ -948,7 +948,8 @@ parameter_qualifier:
>if ($2.precision != ast_precision_none)
>   _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
>  
> -  if (!state->has_420pack() && $2.flags.i != 0)
> +  if (!(state->has_420pack() || state->is_version(420, 310)) &&
> +  $2.flags.i != 0)
>   _mesa_glsl_error(&@1, state, "precision qualifiers must come last");
>  
>$$ = $2;
> @@ -1847,7 +1848,8 @@ type_qualifier:
>if ($2.precision != ast_precision_none)
>   _mesa_glsl_error(&@1, state, "duplicate precision qualifier");
>  
> -  if (!state->has_420pack() && $2.flags.i != 0)
> +  if (!(state->has_420pack() || state->is_version(420, 310)) &&
> +  $2.flags.i != 0)
>   _mesa_glsl_error(&@1, state, "precision qualifiers must come last");
>  
>$$ = $2;




Re: [Mesa-dev] i965: Invalid memory accesses after resizing brw_codegen's store table

2015-10-29 Thread Iago Toral
On Wed, 2015-10-28 at 10:58 -0700, Kristian Høgsberg wrote:
> On Wed, Oct 28, 2015 at 10:01:40AM +0100, Samuel Iglesias Gonsálvez wrote:
> > There have been no opinions about this issue or reviews of the proposed
> > patch after one week.
> > 
> > This is just a reminder in case you have missed it :-)
> 
> Thanks for the reminder! How about something like this instead?

Yeah, that works too. I was a bit concerned that this same problem might
be affecting other places and that this would only address it for
brw_send_indirect_message, but after a quick review it looks like we don't
generally need to hold pointers to previous instructions. In the places
where we do, like brw_ENDIF or brw_WHILE, we are careful to create the
instructions we need before we look for pointers to others (which we do
using indices into the store anyway).

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

I'll push this patch tomorrow if nobody else objects.

Thanks Kristian!
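
For anyone following along, the pattern boils down to the sketch below. The
types and functions are simplified stand-ins (struct insn, struct codegen
and emit_pair are made up for illustration, and this next_insn is not the
i965 one); the only assumption is that the store grows with realloc(), so
any pointer taken before a later emit may dangle while an index stays
valid:

```c
#include <assert.h>
#include <stdlib.h>

struct insn {
   int opcode;
};

struct codegen {
   struct insn *store;
   unsigned nr_insn;
   unsigned store_size;
};

/* Append one instruction, growing the store when it is full. */
static struct insn *
next_insn(struct codegen *p, int opcode)
{
   if (p->nr_insn == p->store_size) {
      unsigned new_size = p->store_size ? p->store_size * 2 : 1;
      struct insn *grown = realloc(p->store, new_size * sizeof(*grown));

      if (!grown)
         abort();
      p->store = grown;
      p->store_size = new_size;
   }

   struct insn *insn = &p->store[p->nr_insn++];
   insn->opcode = opcode;
   return insn;
}

/* Emit two instructions and return the first one. The first instruction is
 * remembered by index, because emitting the second may move the store.
 */
static struct insn *
emit_pair(struct codegen *p, int setup_opcode, int send_opcode)
{
   unsigned setup = p->nr_insn;   /* index survives a realloc */

   next_insn(p, setup_opcode);
   next_insn(p, send_opcode);     /* may reallocate p->store */
   return &p->store[setup];
}
```

Keeping the pointer returned by the first next_insn() call across the
second call would be the bug; resolving the index only at the end avoids
it.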

> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index ebd811f..cd5c726 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -2511,12 +2511,20 @@ brw_send_indirect_message(struct brw_codegen *p,
>struct brw_reg desc)
>  {
> const struct brw_device_info *devinfo = p->devinfo;
> -   struct brw_inst *send, *setup;
> +   struct brw_inst *send;
> +   int setup;
>  
> assert(desc.type == BRW_REGISTER_TYPE_UD);
>  
> +   /* We hold on to the setup instruction (the SEND in the direct case, the 
> OR
> +* in the indirect case) by its index in the instruction store.  The
> +* pointer returned by next_insn() may become invalid if emitting the SEND
> +* in the indirect case reallocs the store.
> +*/
> +
> if (desc.file == BRW_IMMEDIATE_VALUE) {
> -  setup = send = next_insn(p, BRW_OPCODE_SEND);
> +  setup = p->nr_insn;
> +  send = next_insn(p, BRW_OPCODE_SEND);
>brw_set_src1(p, send, desc);
>  
> } else {
> @@ -2531,7 +2539,8 @@ brw_send_indirect_message(struct brw_codegen *p,
> * caller can specify additional descriptor bits with the usual
> * brw_set_*_message() helper functions.
> */
> -  setup = brw_OR(p, addr, desc, brw_imm_ud(0));
> +  setup = p->nr_insn;
> +  brw_OR(p, addr, desc, brw_imm_ud(0));
>  
>brw_pop_insn_state(p);
>  
> @@ -2543,7 +2552,7 @@ brw_send_indirect_message(struct brw_codegen *p,
> brw_set_src0(p, send, retype(payload, BRW_REGISTER_TYPE_UD));
>     brw_inst_set_sfid(devinfo, send, sfid);
>  
> -   return setup;
> +   return &p->store[setup];
>  }
>  
>  static struct brw_inst *
> 
> 
> > Sam
> > 
> > On 21/10/15 12:23, Iago Toral wrote:
> > > Hi,
> > > 
> > > The problem is with code like this (see brw_send_indirect_message):
> > > 
> > > setup = brw_OR(p, addr, desc, brw_imm_ud(0));
> > > send = next_insn(p, BRW_OPCODE_SEND);
> > > ...
> > > return setup;
> > > 
> > > If next_insn triggers a realloc of the instruction store, then the setup
> > > instruction pointer is no longer valid. Notice that this can happen
> > > anywhere where we keep pointers to previous instructions before creating
> > > new ones (!)
> > > 
> > > The patch from Samuel fixes this by special-casing this for SEND
> > > instructions only (since we know that the indirect versions can hit
> > > this, maybe there are more situations though). It does so by trying to
> > > make sure that we never realloc the store with a SEND instruction. For
> > > this, we realloc before we reach the end of the current store (32
> > > instructions before the limit) as long as the instruction is not a SEND
> > > (so that if it is a SEND we still have up to 32 opportunities to do the
> > > realloc without a different instruction before running out of space in
> > > the store).
> > > 
> > > Iago
> > > 
> > > On Wed, 2015-10-21 at 09:02 +0200, Samuel Iglesias Gonsalvez wrote:
> > >> Hello,
> > >>
> > > >> I have found several invalid memory accesses when running
> > > >> dEQP-GLES31.functional.ssbo.* tests on the i965 driver (gen7+).
> > > >> Those invalid memory accesses happened when generating the
> > > >> assembly instructions for SSBO stores for different compute shaders.
> > >>
> > >> However it looks like this problem could happen to other shaders and
> > >> situations. Because of that, I am going to explain the probl

Re: [Mesa-dev] [PATCH] i965/nir: Mark const index UBO surfaces as used

2015-10-29 Thread Iago Toral
On Thu, 2015-10-29 at 00:50 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp   | 6 --
>  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 --
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4950ba4..6d69e96 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -1438,8 +1438,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
> nir_intrinsic_instr *instr
>fs_reg surf_index;
>  
>if (const_index) {
> - surf_index = fs_reg(stage_prog_data->binding_table.ubo_start +
> - const_index->u[0]);
> + unsigned index = stage_prog_data->binding_table.ubo_start +
> +  const_index->u[0];
> + surf_index = fs_reg(index);
> + brw_mark_surface_used(prog_data, index);
>} else {
>   /* The block index is not a constant. Evaluate the index expression
>* per-channel and add the base UBO index; we have to select a value
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> index 0f04f65..efbdaa9 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> @@ -749,8 +749,10 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
> *instr)
>   /* The block index is a constant, so just emit the binding table 
> entry
>* as an immediate.
>*/
> - surf_index = src_reg(prog_data->base.binding_table.ubo_start +
> -  const_block_index->u[0]);
> + unsigned index = prog_data->base.binding_table.ubo_start +
> +  const_block_index->u[0];
> + surf_index = src_reg(index);
> + brw_mark_surface_used(&prog_data->base, index);
>} else {
>   /* The block index is not a constant. Evaluate the index expression
>* per-channel and add the base UBO index; we have to select a value

Is this necessary? As far as I can see, the generator opcodes will mark
surfaces for which we have a constant index as used, so this should not
be necessary.

Iago
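
To make the question concrete, here is a simplified model of what marking a
surface as used amounts to. It is an assumption on my side rather than a
copy of the driver code: the names are hypothetical and I am assuming the
helper only grows the recorded binding table size to cover the given index,
with 4 bytes per entry:

```c
#include <assert.h>

/* Hypothetical, simplified binding table bookkeeping. */
struct binding_table {
   unsigned size_bytes;
};

/* Assumed behaviour: make sure the table is large enough to hold the entry
 * at surf_index, and never shrink it.
 */
static void
mark_surface_used(struct binding_table *bt, unsigned surf_index)
{
   unsigned needed = (surf_index + 1) * 4;   /* one 4-byte entry each */

   if (bt->size_bytes < needed)
      bt->size_bytes = needed;
}
```

If that model is right, marking the same surface a second time from the
NIR visitor would be harmless but would not change anything either.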



Re: [Mesa-dev] [PATCH] main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS

2015-10-29 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Thu, 2015-10-29 at 00:47 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  src/mesa/main/get_hash_params.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/main/get_hash_params.py 
> b/src/mesa/main/get_hash_params.py
> index c295615..fbc7b8f 100644
> --- a/src/mesa/main/get_hash_params.py
> +++ b/src/mesa/main/get_hash_params.py
> @@ -460,6 +460,7 @@ descriptor=[
>[ "MAX_COMPUTE_UNIFORM_COMPONENTS", 
> "CONST(MAX_COMPUTE_UNIFORM_COMPONENTS), extra_ARB_compute_shader_es31" ],
>[ "MAX_COMPUTE_IMAGE_UNIFORMS", "CONST(MAX_COMPUTE_IMAGE_UNIFORMS), 
> extra_ARB_compute_shader_es31" ],
>[ "DISPATCH_INDIRECT_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0, 
> extra_ARB_compute_shader_es31" ],
> +  [ "MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS", 
> "CONTEXT_INT(Const.Program[MESA_SHADER_COMPUTE].MaxCombinedUniformComponents),
>  extra_ARB_compute_shader_es31" ],
>  
>  # GL_ARB_framebuffer_no_attachments / GLES 3.1
>["MAX_FRAMEBUFFER_WIDTH", "CONTEXT_INT(Const.MaxFramebufferWidth), 
> extra_ARB_framebuffer_no_attachments"],




Re: [Mesa-dev] [PATCH] i965: Setup pull constant state for compute programs

2015-10-29 Thread Iago Toral
On Thu, 2015-10-29 at 00:49 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_context.h  |  2 +-
>  src/mesa/drivers/dri/i965/brw_state.h|  1 +
>  src/mesa/drivers/dri/i965/brw_state_upload.c |  2 ++
>  src/mesa/drivers/dri/i965/gen7_cs_state.c| 32 
> 
>  4 files changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
> b/src/mesa/drivers/dri/i965/brw_context.h
> index 18c361e..887b57b 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1177,7 +1177,7 @@ struct brw_context
>  
> int num_atoms[BRW_NUM_PIPELINES];
> const struct brw_tracked_state render_atoms[60];
> -   const struct brw_tracked_state compute_atoms[8];
> +   const struct brw_tracked_state compute_atoms[9];
>  
> /* If (INTEL_DEBUG & DEBUG_BATCH) */
> struct {
> diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
> b/src/mesa/drivers/dri/i965/brw_state.h
> index dc2b941..2c7c2f3 100644
> --- a/src/mesa/drivers/dri/i965/brw_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_state.h
> @@ -49,6 +49,7 @@ extern const struct brw_tracked_state brw_clip_unit;
>  extern const struct brw_tracked_state brw_vs_pull_constants;
>  extern const struct brw_tracked_state brw_gs_pull_constants;
>  extern const struct brw_tracked_state brw_wm_pull_constants;
> +extern const struct brw_tracked_state brw_cs_pull_constants;
>  extern const struct brw_tracked_state brw_constant_buffer;
>  extern const struct brw_tracked_state brw_curbe_offsets;
>  extern const struct brw_tracked_state brw_invariant_state;
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index 79b8301..0344b8a 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -259,6 +259,7 @@ static const struct brw_tracked_state 
> *gen7_compute_atoms[] =
> _state_base_address,
> _cs_image_surfaces,
> _cs_push_constants,
> +   _cs_pull_constants,
> _cs_ubo_surfaces,
> _cs_abo_surfaces,
> _texture_surfaces,
> @@ -353,6 +354,7 @@ static const struct brw_tracked_state 
> *gen8_compute_atoms[] =
> _state_base_address,
> _cs_image_surfaces,
> _cs_push_constants,
> +   _cs_pull_constants,
> _cs_ubo_surfaces,
> _cs_abo_surfaces,
> _texture_surfaces,
> diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c 
> b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> index 6aeb0cb..da1d05f 100644
> --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> @@ -29,6 +29,7 @@
>  #include "brw_shader.h"
>  #include "intel_mipmap_tree.h"
>  #include "intel_batchbuffer.h"
> +#include "intel_buffer_objects.h"

I think you don't need this include. Other than that:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

>  #include "brw_state.h"
>  
>  static unsigned
> @@ -285,3 +286,34 @@ const struct brw_tracked_state gen7_cs_push_constants = {
> },
> .emit = gen7_upload_cs_push_constants,
>  };
> +
> +/**
> + * Creates a new CS constant buffer reflecting the current CS program's
> + * constants, if needed by the CS program.
> + */
> +static void
> +brw_upload_cs_pull_constants(struct brw_context *brw)
> +{
> +   struct brw_stage_state *stage_state = &brw->cs.base;
> +
> +   /* BRW_NEW_COMPUTE_PROGRAM */
> +   struct brw_compute_program *cp =
> +  (struct brw_compute_program *) brw->compute_program;
> +
> +   /* BRW_NEW_CS_PROG_DATA */
> +   const struct brw_stage_prog_data *prog_data = &brw->cs.prog_data->base;
> +
> +   /* _NEW_PROGRAM_CONSTANTS */
> +   brw_upload_pull_constants(brw, BRW_NEW_SURFACES, &cp->program.Base,
> + stage_state, prog_data, true);
> +}
> +
> +const struct brw_tracked_state brw_cs_pull_constants = {
> +   .dirty = {
> +  .mesa = _NEW_PROGRAM_CONSTANTS,
> +  .brw = BRW_NEW_BATCH |
> + BRW_NEW_COMPUTE_PROGRAM |
> + BRW_NEW_CS_PROG_DATA,
> +   },
> +   .emit = brw_upload_cs_pull_constants,
> +};




Re: [Mesa-dev] [PATCH 1/2] mesa/sso: Add MESA_VERBOSE=api trace support

2015-10-29 Thread Iago Toral
On Thu, 2015-10-29 at 00:52 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  src/mesa/main/pipelineobj.c | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
> index 51ee10f..c8c50fa 100644
> --- a/src/mesa/main/pipelineobj.c
> +++ b/src/mesa/main/pipelineobj.c
> @@ -230,6 +230,10 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield 
> stages, GLuint program)
> struct gl_shader_program *shProg = NULL;
> GLbitfield any_valid_stages;
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glUseProgramStages(%d, 0x%x, %d)\n",
> +  pipeline, stages, program);
> +

It probably does not matter much, but since pipeline and program are
unsigned we might want to use %u instead of %d here. The same goes for a
bunch of the other hunks in this patch.

Either way,
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>
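
A small sketch of the difference, using plain snprintf() rather than
_mesa_debug() (format_name is a made-up helper, and the example assumes a
32-bit unsigned int). With %u a large object name is printed as-is:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an unsigned object name the way an API trace line might. */
static int
format_name(char *buf, size_t size, unsigned name)
{
   return snprintf(buf, size, "glBindProgramPipeline(%u)", name);
}
```

With %d the same value would not match the argument type, and names above
INT_MAX would typically show up as negative numbers in the trace. Names
that large are unlikely in practice, which is why it probably does not
matter much.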

> if (!pipe) {
>_mesa_error(ctx, GL_INVALID_OPERATION, "glUseProgramStages(pipeline)");
>return;
> @@ -345,6 +349,9 @@ _mesa_ActiveShaderProgram(GLuint pipeline, GLuint program)
> struct gl_shader_program *shProg = NULL;
> struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, 
> pipeline);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glActiveShaderProgram(%d, %d)\n", pipeline, program);
> +
> if (program != 0) {
>shProg = _mesa_lookup_shader_program_err(ctx, program,
> 
> "glActiveShaderProgram(program)");
> @@ -380,6 +387,9 @@ _mesa_BindProgramPipeline(GLuint pipeline)
> GET_CURRENT_CONTEXT(ctx);
> struct gl_pipeline_object *newObj = NULL;
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glBindProgramPipeline(%d)\n", pipeline);
> +
> /* Rebinding the same pipeline object: no change.
>  */
> if (ctx->_Shader->Name == pipeline)
> @@ -467,6 +477,9 @@ _mesa_DeleteProgramPipelines(GLsizei n, const GLuint 
> *pipelines)
> GET_CURRENT_CONTEXT(ctx);
> GLsizei i;
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glDeleteProgramPipelines(%d, %p)\n", n, pipelines);
> +
> if (n < 0) {
>_mesa_error(ctx, GL_INVALID_VALUE, "glDeleteProgramPipelines(n<0)");
>return;
> @@ -551,6 +564,9 @@ _mesa_GenProgramPipelines(GLsizei n, GLuint *pipelines)
>  {
> GET_CURRENT_CONTEXT(ctx);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glGenProgramPipelines(%d, %p)\n", n, pipelines);
> +
> create_program_pipelines(ctx, n, pipelines, false);
>  }
>  
> @@ -559,6 +575,9 @@ _mesa_CreateProgramPipelines(GLsizei n, GLuint *pipelines)
>  {
> GET_CURRENT_CONTEXT(ctx);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glCreateProgramPipelines(%d, %p)\n", n, pipelines);
> +
> create_program_pipelines(ctx, n, pipelines, true);
>  }
>  
> @@ -574,6 +593,9 @@ _mesa_IsProgramPipeline(GLuint pipeline)
>  {
> GET_CURRENT_CONTEXT(ctx);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glIsProgramPipeline(%d)\n", pipeline);
> +
> struct gl_pipeline_object *obj = _mesa_lookup_pipeline_object(ctx, 
> pipeline);
> if (obj == NULL)
>return GL_FALSE;
> @@ -590,6 +612,10 @@ _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum 
> pname, GLint *params)
> GET_CURRENT_CONTEXT(ctx);
> struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, 
> pipeline);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glGetProgramPipelineiv(%d, %d, %p)\n",
> +  pipeline, pname, params);
> +
> /* Are geometry shaders available in this context?
>  */
> const bool has_gs = _mesa_has_geometry_shaders(ctx);
> @@ -857,6 +883,9 @@ _mesa_ValidateProgramPipeline(GLuint pipeline)
>  {
> GET_CURRENT_CONTEXT(ctx);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glValidateProgramPipeline(%d)\n", pipeline);
> +
> struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, 
> pipeline);
>  
> if (!pipe) {
> @@ -875,6 +904,10 @@ _mesa_GetProgramPipelineInfoLog(GLuint pipeline, GLsizei 
> bufSize,
>  {
> GET_CURRENT_CONTEXT(ctx);
>  
> +   if (MESA_VERBOSE & VERBOSE_API)
> +  _mesa_debug(ctx, "glGetProgramPipelineInfoLog(%d, %d, %p, %p)\n",
> +  pipeline, bufSize, length, infoLog);
> +
> struct gl_pipeline_object *pipe = _mesa_lookup_pipeline_object(ctx, 
> pipeline);
>  
> if (!pipe) {


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] mesa/sso: Add compute shader support

2015-10-29 Thread Iago Toral
On Thu, 2015-10-29 at 00:52 -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> ---
>  src/mesa/main/api_validate.c |  2 +-
>  src/mesa/main/pipelineobj.c  | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/api_validate.c b/src/mesa/main/api_validate.c
> index c59b6f3..46f39e7 100644
> --- a/src/mesa/main/api_validate.c
> +++ b/src/mesa/main/api_validate.c
> @@ -923,7 +923,7 @@ check_valid_to_compute(struct gl_context *ctx, const char 
> *function)
>  * "An INVALID_OPERATION error is generated if there is no active program
>  *  for the compute shader stage."
>  */
> -   prog = ctx->Shader.CurrentProgram[MESA_SHADER_COMPUTE];
> +   prog = ctx->_Shader->CurrentProgram[MESA_SHADER_COMPUTE];
> if (prog == NULL || prog->_LinkedShaders[MESA_SHADER_COMPUTE] == NULL) {
>_mesa_error(ctx, GL_INVALID_OPERATION,
>"%s(no active compute shader)",

This hunk won't apply on current master: there is no such comment before
this line. Maybe this is part of another series of patches?

For the rest of the patch:
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

> diff --git a/src/mesa/main/pipelineobj.c b/src/mesa/main/pipelineobj.c
> index c8c50fa..58730f4 100644
> --- a/src/mesa/main/pipelineobj.c
> +++ b/src/mesa/main/pipelineobj.c
> @@ -255,6 +255,8 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield 
> stages, GLuint program)
> if (_mesa_has_tessellation(ctx))
>any_valid_stages |= GL_TESS_CONTROL_SHADER_BIT |
>GL_TESS_EVALUATION_SHADER_BIT;
> +   if (_mesa_has_compute_shaders(ctx))
> +  any_valid_stages |= GL_COMPUTE_SHADER_BIT;
>  
> if (stages != GL_ALL_SHADER_BITS && (stages & ~any_valid_stages) != 0) {
>_mesa_error(ctx, GL_INVALID_VALUE, "glUseProgramStages(Stages)");
> @@ -336,6 +338,9 @@ _mesa_UseProgramStages(GLuint pipeline, GLbitfield 
> stages, GLuint program)
>  
> if ((stages & GL_TESS_EVALUATION_SHADER_BIT) != 0)
>_mesa_use_shader_program(ctx, GL_TESS_EVALUATION_SHADER, shProg, pipe);
> +
> +   if ((stages & GL_COMPUTE_SHADER_BIT) != 0)
> +  _mesa_use_shader_program(ctx, GL_COMPUTE_SHADER, shProg, pipe);
>  }
>  
>  /**
> @@ -669,6 +674,12 @@ _mesa_GetProgramPipelineiv(GLuint pipeline, GLenum 
> pname, GLint *params)
>*params = pipe->CurrentProgram[MESA_SHADER_FRAGMENT]
>   ? pipe->CurrentProgram[MESA_SHADER_FRAGMENT]->Name : 0;
>return;
> +   case GL_COMPUTE_SHADER:
> +  if (!_mesa_has_compute_shaders(ctx))
> + break;
> +  *params = pipe->CurrentProgram[MESA_SHADER_COMPUTE]
> + ? pipe->CurrentProgram[MESA_SHADER_COMPUTE]->Name : 0;
> +  return;
> default:
>break;
> }




Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads

2015-10-23 Thread Iago Toral
On Thu, 2015-10-22 at 09:09 -0700, Jason Ekstrand wrote:
> On Thu, Oct 22, 2015 at 4:21 AM, Iago Toral Quiroga <ito...@igalia.com> wrote:
> > I implemented this first as a separate optimization pass in GLSL IR [1], but
> > Curro pointed out that this being pretty much a restricted form of a CSE 
> > pass
> > it would probably make more sense to do it inside CSE (and we no longer have
> > a CSE pass in GLSL IR).
> >
> > Unlike other things we CSE in NIR, in the case of SSBO loads we need to make
> > sure that we invalidate previous entries in the set in the presence of
> > conflicting instructions (i.e. SSBO writes to the same block and offset) or
> > in the presence of memory barriers.
> >
> > If this is accepted I intend to extend this to also cover image reads, which
> > follow similar behavior.
> >
> > No regressions observed in piglit or dEQP's SSBO functional tests.
> >
> > [1] http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html
> 
> I think you've gotten enough NAK's that I don't need to chime in
> there.  Unfortunately, solving this in general is something of a
> research project that both Connor and I have been thinking about for
> quite some time.  I've been thinking off-and-on about how to add a
> proper memory model to lower_vars_to_ssa for almost a year now and
> still haven't come up with a good way to do it.  I don't know whether
> SSBO's would be simpler or not.  We need a proper memory model for
> both lower_vars_to_ssa and SSBO load/stores (and shared local
> variables) but it's a substantial research project.
> 
> This isn't to say that you couldn't do it.  Just know what you're taking on. 
> ;-)

Yeah, it does not make sense for me to attempt this: you guys have
clearly given this much more thought than I have and know much better how
a solution would fit into NIR.

> That said, here's a suggestion for something that we *could* write
> today, wouldn't be very hard, and would solve a decent number of cases.
> 
> For each block:
> 
> 1) Create a new instruction set (don't use anything from any previous blocks)
> 2) call add_or_rewrite on all ssbo load operations
> 3) If you ever see a barrier or ssbo store, destroy the entire
> instruction set and start again.

Yep, this is what I was thinking for the load-combine pass that Connor
suggested. However, I think that in this case we do not need to destroy
the entire set when we find a store, only when we find a memory barrier,
right? I mean, there should be nothing preventing us from checking the
offset/block of the store and comparing it with the offset/block of the
loads in the set to decide which ones we need to remove (like I was
doing in my last patch).
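
A minimal, self-contained sketch of that per-block scheme with selective
invalidation follows. It does not use NIR's real data structures:
load_cache, cache_load, cache_store and cache_barrier are hypothetical
names, and block indices and offsets are assumed to be known constants,
so indirect accesses and partially overlapping accesses (which a real
pass would have to treat conservatively) are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CACHED_LOADS 16

/* One remembered SSBO load: the block/offset it read and the id of the
 * (hypothetical) SSA value holding the result. */
struct cached_load {
   unsigned block;
   unsigned offset;
   int ssa_id;
};

/* Per-basic-block cache; it must be reset at the start of every block. */
struct load_cache {
   struct cached_load entries[MAX_CACHED_LOADS];
   size_t count;
};

static void
cache_reset(struct load_cache *c)
{
   assert(c != NULL);
   c->count = 0;
}

/* Returns the SSA id to use for a load of (block, offset): the id of an
 * earlier identical load if one is cached, otherwise new_ssa_id, which
 * is then recorded for later loads. */
static int
cache_load(struct load_cache *c, unsigned block, unsigned offset,
           int new_ssa_id)
{
   assert(c != NULL);

   for (size_t i = 0; i < c->count; i++) {
      if (c->entries[i].block == block && c->entries[i].offset == offset)
         return c->entries[i].ssa_id;
   }

   /* If the cache is full we just do not record the load; that is
    * always safe, it only misses an optimization opportunity. */
   if (c->count < MAX_CACHED_LOADS) {
      c->entries[c->count].block = block;
      c->entries[c->count].offset = offset;
      c->entries[c->count].ssa_id = new_ssa_id;
      c->count++;
   }
   return new_ssa_id;
}

/* A store only drops cached loads of the same block and offset. */
static void
cache_store(struct load_cache *c, unsigned block, unsigned offset)
{
   size_t kept = 0;

   assert(c != NULL);

   for (size_t i = 0; i < c->count; i++) {
      bool conflicts = c->entries[i].block == block &&
                       c->entries[i].offset == offset;
      if (!conflicts)
         c->entries[kept++] = c->entries[i];
   }
   c->count = kept;
}

/* A memory barrier drops everything. */
static void
cache_barrier(struct load_cache *c)
{
   cache_reset(c);
}
```

With this, a second load of the same block/offset reuses the first
result until a matching store or a barrier is seen, while loads of other
locations survive the store.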

> This is something you could put together fairly quickly and would
> handle a fair number of cases.  With a little special casing, you may
> also be able to handle store and then an immediate load of the same
> value or duplicate stores.  Anything much more complex than that is
> going to take a lot more thought.

Yes, I'll give this a try next. Thanks for all the comments and
suggestions!

Iago



Re: [Mesa-dev] [PATCH 0/2] Nir: Allow CSE of SSBO loads

2015-10-22 Thread Iago Toral
On Thu, 2015-10-22 at 16:38 +0200, Iago Toral wrote:
> On Thu, 2015-10-22 at 09:39 -0400, Connor Abbott wrote:
> > On Thu, Oct 22, 2015 at 7:21 AM, Iago Toral Quiroga <ito...@igalia.com> 
> > wrote:
> > > I implemented this first as a separate optimization pass in GLSL IR [1], 
> > > but
> > > Curro pointed out that this being pretty much a restricted form of a CSE 
> > > pass
> > > it would probably make more sense to do it inside CSE (and we no longer 
> > > have
> > > a CSE pass in GLSL IR).
> > >
> > > Unlike other things we CSE in NIR, in the case of SSBO loads we need to 
> > > make
> > > sure that we invalidate previous entries in the set in the presence of
> > > conflicting instructions (i.e. SSBO writes to the same block and offset) 
> > > or
> > > in the presence of memory barriers.
> > >
> > > If this is accepted I intend to extend this to also cover image reads, 
> > > which
> > > follow similar behavior.
> > >
> > > No regressions observed in piglit or dEQP's SSBO functional tests.
> > >
> > > [1] 
> > > http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html
> > >
> > > Iago Toral Quiroga (2):
> > >   nir/cse: invalidate SSBO loads in presence of ssbo writes or memory
> > > barriers
> > >   nir/instr_set: allow rewrite of SSBO loads
> > >
> > >  src/glsl/nir/nir_instr_set.c |  24 ++--
> > >  src/glsl/nir/nir_opt_cse.c   | 142 
> > > +++
> > >  2 files changed, 162 insertions(+), 4 deletions(-)
> > >
> > > --
> > > 1.9.1
> > >
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> > > NAK, this isn't going to work. NIR CSE is designed for operations
> > > which can be moved around freely as long as they're still dominated by
> > > the SSA values they use. It takes heavy advantage of this to avoid
> > looking at the entire CFG and instead only at the current block and
> > its parents in the dominance tree. For example, imagine you have
> > something like:
> > 
> > A = load_ssbo 0
> > if  (cond) {
> >store_ssbo 0
> > }
> > B = load_ssbo 0
> > 
> > Then A and B can't be combined, but CSE will combine them anyways when
> > it reaches B because it keeps a hash table of values dominating B and
> > finds A as a match. It doesn't look at the if conditional at all
> > because it doesn't dominate the load to B. This is great when you want
> > to CSE pure things that don't depend on other side effects -- after
> > all, this is the sort of efficiency that SSA is supposed to give us --
> > but it means that as-is, it can't be used for e.g. SSBO's and images
> > without completely changing how the pass works and making it less
> > efficient.
> 
> Ugh! One would think that at least one of the 2000+ SSBO tests in dEQP
> would catch something like this... I guess not :(.

However, I have just tested this and it works just fine. See:

buffer Fragments {
   vec4 v;
};
out vec4 color;

void main()
{
   vec4 tmp = v;
   if (tmp.x > 0) {
      v = vec4(0, 1, 0, 1);
   }
   color = v;
}

And the final NIR SSA form for this is:

impl main {
block block_0:
/* preds: */
vec1 ssa_0 = load_const (0x00000000 /* 0.000000 */)
vec4 ssa_1 = load_const (0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */)
vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (0)
vec1 ssa_3 = flt ssa_0, ssa_2
/* succs: block_1 block_2 */
if ssa_3 {
block block_1:
/* preds: block_0 */
intrinsic store_ssbo (ssa_1, ssa_0) () (0, 15)
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec4 ssa_4 = intrinsic load_ssbo (ssa_0) () (0)
intrinsic store_output (ssa_4) () (0)   /* color */
/* succs: block_4 */
block block_4:
}

What is going on here is that block 1 is in block0->dom_children, so the
CSE pass looks into that, sees the store and invalidates the first SSBO
load as I was initially hoping that it would.

I guess this behavior is not expected then?

Iago

> > Now, that being said, I still think that we should definitely be doing
> > this sort of thing

Re: [Mesa-dev] [PATCH 2/2] glsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsize arrays

2015-10-28 Thread Iago Toral
On Thu, 2015-10-22 at 11:01 +0200, Samuel Iglesias Gonsalvez wrote:
> From ARB_program_interface_query:
> 
> "For the property of BUFFER_DATA_SIZE, then the implementation-dependent
>  minimum total buffer object size, in basic machine units, required to hold
>  all active variables associated with an active uniform block, shader
>  storage block, or atomic counter buffer is written to <params>.  If the
>  final member of an active shader storage block is array with no declared
>  size, the minimum buffer size is computed assuming the array was declared
>  as an array with one element."
> 
> Fixes the following dEQP-GLES31 tests:
> 
> dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block
> dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block
> dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array
> 
> Signed-off-by: Samuel Iglesias Gonsalvez <sigles...@igalia.com>
> ---
>  src/glsl/link_uniform_blocks.cpp | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/src/glsl/link_uniform_blocks.cpp 
> b/src/glsl/link_uniform_blocks.cpp
> index 5285d8d..a10b44b 100644
> --- a/src/glsl/link_uniform_blocks.cpp
> +++ b/src/glsl/link_uniform_blocks.cpp
> @@ -130,13 +130,22 @@ private:
>  
>unsigned alignment = 0;
>unsigned size = 0;
> -
> +  /* From ARB_program_interface_query:
Add a blank line here
> +   * "If the final member of an active shader storage block is array with
> +   * no declared size, the minimum buffer size is computed assuming the
> +   * array was declared as an array with one element."

Align the two lines above with the "If" in the first line, and indent the
quoted block the same way as the similar comments just below this one in
the same function.

> +   *
> +   * For that reason, we use the base type of the unsized array to 
> calculate
> +   * its size.
> +   */

I was wondering if we should also explicitly check that this is the last
member (there is an unused bool parameter in this function that tells us
that). My understanding is that only SSBOs can have unsized arrays, and
the parser should ensure that they are the last member of the SSBO
definition, so maybe the check is redundant... If we don't want to add
it, then it is probably worth amending the comment to explain why (and
even in that case we may want to add an assert).
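
As an illustration only, the spec rule plus the suggested assert could
look roughly like the standalone sketch below. The member struct and
member_size_for_buffer are hypothetical stand-ins, not the glsl_type
API, and std140/std430 alignment and array stride are deliberately left
out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of one block member. */
struct member {
   unsigned element_size;  /* size of one element, in bytes */
   unsigned array_length;  /* 1 for non-arrays, 0 for an unsized array */
};

/* Size this member contributes to the minimum buffer size. */
static unsigned
member_size_for_buffer(const struct member *m, bool is_last_member)
{
   if (m->array_length == 0) {
      /* Only the last member of a shader storage block may be an
       * unsized array; the parser is expected to have enforced this. */
      assert(is_last_member);

      /* The spec says to compute the size as if the array had been
       * declared with exactly one element. */
      return m->element_size;
   }

   return m->element_size * m->array_length;
}
```

For a 16-byte element type, a sized array of 4 elements contributes 64
bytes, while an unsized array as the last member contributes 16.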

With these changes:
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

> +  const glsl_type *type_for_size =
> + type->is_unsized_array() ? type->without_array() : type;
>if (packing == GLSL_INTERFACE_PACKING_STD430) {
>   alignment = type->std430_base_alignment(v->RowMajor);
> - size = type->std430_size(v->RowMajor);
> + size = type_for_size->std430_size(v->RowMajor);
>} else {
>   alignment = type->std140_base_alignment(v->RowMajor);
> - size = type->std140_size(v->RowMajor);
> + size = type_for_size->std140_size(v->RowMajor);
>}
>  
>this->offset = glsl_align(this->offset, alignment);




Re: [Mesa-dev] [PATCH 3/9] i965: Don't consider control flow instructions to have sources.

2015-10-28 Thread Iago Toral
On Wed, 2015-10-21 at 15:58 -0700, Matt Turner wrote:
> And why did IFF have a destination?
> 
> I suspect that once upon a time the disassembler used this information
> to know which fields to find the jump targets in. The jump targets have
> moved, so the disassembler has to know how to handle these
> per-generation anyway.

Control-flow instructions are explicitly handled by the disassembler and
don't use .nsrc or .ndst at all, so I think this should be fine.

The only other place where we use .ndst is the check for compr4 in
gen < 6, but IFF would not fall in that case anyway.

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

> ---
>  src/mesa/drivers/dri/i965/brw_disasm.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
> b/src/mesa/drivers/dri/i965/brw_disasm.c
> index c2dac7c..29056ed 100644
> --- a/src/mesa/drivers/dri/i965/brw_disasm.c
> +++ b/src/mesa/drivers/dri/i965/brw_disasm.c
> @@ -90,20 +90,20 @@ const struct opcode_desc opcode_descs[128] = {
> [BRW_OPCODE_NOP]  = { .name = "nop", .nsrc = 0, .ndst = 0 },
> [BRW_OPCODE_NENOP]= { .name = "nenop",   .nsrc = 0, .ndst = 0 },
> [BRW_OPCODE_JMPI] = { .name = "jmpi",.nsrc = 0, .ndst = 0 },
> -   [BRW_OPCODE_IF]   = { .name = "if",  .nsrc = 2, .ndst = 0 },
> -   [BRW_OPCODE_IFF]  = { .name = "iff", .nsrc = 2, .ndst = 1 },
> -   [BRW_OPCODE_WHILE]= { .name = "while",   .nsrc = 2, .ndst = 0 },
> -   [BRW_OPCODE_ELSE] = { .name = "else",.nsrc = 2, .ndst = 0 },
> -   [BRW_OPCODE_BREAK]= { .name = "break",   .nsrc = 2, .ndst = 0 },
> -   [BRW_OPCODE_CONTINUE] = { .name = "cont",.nsrc = 1, .ndst = 0 },
> -   [BRW_OPCODE_HALT] = { .name = "halt",.nsrc = 1, .ndst = 0 },
> +   [BRW_OPCODE_IF]   = { .name = "if",  .nsrc = 0, .ndst = 0 },
> +   [BRW_OPCODE_IFF]  = { .name = "iff", .nsrc = 0, .ndst = 0 },
> +   [BRW_OPCODE_WHILE]= { .name = "while",   .nsrc = 0, .ndst = 0 },
> +   [BRW_OPCODE_ELSE] = { .name = "else",.nsrc = 0, .ndst = 0 },
> +   [BRW_OPCODE_BREAK]= { .name = "break",   .nsrc = 0, .ndst = 0 },
> +   [BRW_OPCODE_CONTINUE] = { .name = "cont",.nsrc = 0, .ndst = 0 },
> +   [BRW_OPCODE_HALT] = { .name = "halt",.nsrc = 0, .ndst = 0 },
> // [BRW_OPCODE_MSAVE]= { .name = "msave",   .nsrc = 1, .ndst = 1 },
> // [BRW_OPCODE_PUSH] = { .name = "push",.nsrc = 1, .ndst = 1 },
> // [BRW_OPCODE_MREST]= { .name = "mrest",   .nsrc = 1, .ndst = 1 },
> // [BRW_OPCODE_POP]  = { .name = "pop", .nsrc = 2, .ndst = 0 },
> [BRW_OPCODE_WAIT] = { .name = "wait",.nsrc = 1, .ndst = 0 },
> [BRW_OPCODE_DO]   = { .name = "do",  .nsrc = 0, .ndst = 0 },
> -   [BRW_OPCODE_ENDIF]= { .name = "endif",   .nsrc = 2, .ndst = 0 },
> +   [BRW_OPCODE_ENDIF]= { .name = "endif",   .nsrc = 0, .ndst = 0 },
>  };
>  
>  static bool




Re: [Mesa-dev] [PATCH 1/3] glsl: Drop exec_list argument to lower_ubo_reference

2015-11-09 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Wed, 2015-11-04 at 15:33 -0800, Kristian Høgsberg Kristensen wrote:
> We always pass in shader->ir and we already pass in the shader, so just
> drop the exec_list. Most passes either take just an exec_list or a
> shader, so this seems more consistent.
> 
> Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> ---
>  src/glsl/ir_optimization.h | 2 +-
>  src/glsl/lower_ubo_reference.cpp   | 4 ++--
>  src/mesa/drivers/dri/i965/brw_link.cpp | 2 +-
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 2 +-
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h
> index ce5c492..6d19a6c 100644
> --- a/src/glsl/ir_optimization.h
> +++ b/src/glsl/ir_optimization.h
> @@ -124,7 +124,7 @@ bool lower_const_arrays_to_uniforms(exec_list 
> *instructions);
>  bool lower_clip_distance(gl_shader *shader);
>  void lower_output_reads(unsigned stage, exec_list *instructions);
>  bool lower_packing_builtins(exec_list *instructions, int op_mask);
> -void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions);
> +void lower_ubo_reference(struct gl_shader *shader);
>  void lower_packed_varyings(void *mem_ctx,
> unsigned locations_used, ir_variable_mode mode,
> unsigned gs_input_vertices, gl_shader *shader);
> diff --git a/src/glsl/lower_ubo_reference.cpp 
> b/src/glsl/lower_ubo_reference.cpp
> index 57a242b..24806ac 100644
> --- a/src/glsl/lower_ubo_reference.cpp
> +++ b/src/glsl/lower_ubo_reference.cpp
> @@ -1270,7 +1270,7 @@ lower_ubo_reference_visitor::visit_enter(ir_call *ir)
>  } /* unnamed namespace */
>  
>  void
> -lower_ubo_reference(struct gl_shader *shader, exec_list *instructions)
> +lower_ubo_reference(struct gl_shader *shader)
>  {
> lower_ubo_reference_visitor v(shader);
>  
> @@ -1281,6 +1281,6 @@ lower_ubo_reference(struct gl_shader *shader, exec_list 
> *instructions)
>  */
> do {
>v.progress = false;
> -  visit_list_elements(&v, instructions);
> +  visit_list_elements(&v, shader->ir);
> } while (v.progress);
>  }
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
> b/src/mesa/drivers/dri/i965/brw_link.cpp
> index fc9bee4..f1e3860 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -157,7 +157,7 @@ process_glsl_ir(gl_shader_stage stage,
>   _mesa_shader_stage_to_abbrev(shader->Stage));
> }
>  
> -   lower_ubo_reference(shader, shader->ir);
> +   lower_ubo_reference(shader);
>  
> bool progress;
> do {
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index f481e89..ca00930 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5822,7 +5822,7 @@ st_link_shader(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>   (!ctx->Const.NativeIntegers ? INT_DIV_TO_MUL_RCP : 
> 0) |
>   (options->EmitNoSat ? SAT_TO_CLAMP : 0));
>  
> -  lower_ubo_reference(prog->_LinkedShaders[i], ir);
> +  lower_ubo_reference(prog->_LinkedShaders[i]);
>do_vec_index_to_cond_assign(ir);
>lower_vector_insert(ir, true);
>lower_quadop_vector(ir, false);




Re: [Mesa-dev] [PATCH 2/3] glsl: Lower UBO and SSBO access in glsl linker

2015-11-09 Thread Iago Toral
On Wed, 2015-11-04 at 15:33 -0800, Kristian Høgsberg Kristensen wrote:
> All GLSL IR consumers run this lowering pass so we can move it to the
> linker. This moves the pass up quite a bit, but that's the point: it
> needs to run before we throw away information about per-component vector
> access.
> 
> Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> ---
>  src/glsl/linker.cpp| 8 
>  src/mesa/drivers/dri/i965/brw_link.cpp | 2 --
>  src/mesa/drivers/dri/i965/brw_shader.cpp   | 2 ++
>  src/mesa/main/mtypes.h | 2 ++
>  src/mesa/state_tracker/st_extensions.c | 1 +
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 1 -
>  6 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> index c35d87a..ea6a3f3 100644
> --- a/src/glsl/linker.cpp
> +++ b/src/glsl/linker.cpp
> @@ -4449,6 +4449,14 @@ link_shaders(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>  
> /* FINISHME: Assign fragment shader output locations. */
>  
> +   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> +  if (prog->_LinkedShaders[i] == NULL)
> +  continue;
> +
> +  if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks)
> + lower_ubo_reference(prog->_LinkedShaders[i]);
> +   }
> +

It probably makes more sense to rewrite this loop as:

if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks) {
   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
  if (prog->_LinkedShaders[i] != NULL)
 lower_ubo_reference(prog->_LinkedShaders[i]);
   }
}

With that change, and assuming that this change is not responsible for
the shader-db regressions posted by Jason:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

>  done:
> for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>free(shader_list[i]);
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
> b/src/mesa/drivers/dri/i965/brw_link.cpp
> index f1e3860..2991173 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -157,8 +157,6 @@ process_glsl_ir(gl_shader_stage stage,
>   _mesa_shader_stage_to_abbrev(shader->Stage));
> }
>  
> -   lower_ubo_reference(shader);
> -
> bool progress;
> do {
>progress = false;
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> b/src/mesa/drivers/dri/i965/brw_shader.cpp
> index 4ea297a..5adc986 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> @@ -148,6 +148,8 @@ brw_compiler_create(void *mem_ctx, const struct 
> brw_device_info *devinfo)
>   compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true;
>  
>compiler->glsl_compiler_options[i].NirOptions = nir_options;
> +
> +  compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true;
> }
>  
> return compiler;
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index d6c1eb8..800ad81 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2874,6 +2874,8 @@ struct gl_shader_compiler_options
>  */
> GLboolean OptimizeForAOS;
>  
> +   GLboolean LowerBufferInterfaceBlocks; /**< Lower UBO and SSBO access to 
> intrinsics. */
> +
> const struct nir_shader_compiler_options *NirOptions;
>  };
>  
> diff --git a/src/mesa/state_tracker/st_extensions.c 
> b/src/mesa/state_tracker/st_extensions.c
> index bd7cbcc..bbb9027 100644
> --- a/src/mesa/state_tracker/st_extensions.c
> +++ b/src/mesa/state_tracker/st_extensions.c
> @@ -254,6 +254,7 @@ void st_init_limits(struct pipe_screen *screen,
>
> PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT);
>  
>options->LowerClipDistance = true;
> +  options->LowerBufferInterfaceBlocks = true;
> }
>  
> c->LowerTessLevel = true;
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index ca00930..9ee6f8f 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -5822,7 +5822,6 @@ st_link_shader(struct gl_context *ctx, struct 
> gl_shader_program *prog)
>   (!ctx->Const.NativeIntegers ? INT_DIV_TO_MUL_RCP : 
> 0) |
>   (options->EmitNoSat ? SAT_TO_CLAMP : 0));
>  
> -  lower_ubo_reference(prog->_LinkedShaders[i]);
>do_vec_index_to_cond_assign(ir);
>lower_vector_insert(ir, true);
>lower_quadop_vector(ir, false);




Re: [Mesa-dev] [PATCH 2/3] glsl: Lower UBO and SSBO access in glsl linker

2015-11-10 Thread Iago Toral
On Mon, 2015-11-09 at 16:52 +0100, Iago Toral wrote:
> On Wed, 2015-11-04 at 15:33 -0800, Kristian Høgsberg Kristensen wrote:
> > All GLSL IR consumers run this lowering pass so we can move it to the
> > linker. This moves the pass up quite a bit, but that's the point: it
> > needs to run before we throw away information about per-component vector
> > access.
> > 
> > Signed-off-by: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> > ---
> >  src/glsl/linker.cpp| 8 
> >  src/mesa/drivers/dri/i965/brw_link.cpp | 2 --
> >  src/mesa/drivers/dri/i965/brw_shader.cpp   | 2 ++
> >  src/mesa/main/mtypes.h | 2 ++
> >  src/mesa/state_tracker/st_extensions.c | 1 +
> >  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 1 -
> >  6 files changed, 13 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
> > index c35d87a..ea6a3f3 100644
> > --- a/src/glsl/linker.cpp
> > +++ b/src/glsl/linker.cpp
> > @@ -4449,6 +4449,14 @@ link_shaders(struct gl_context *ctx, struct 
> > gl_shader_program *prog)
> >  
> > /* FINISHME: Assign fragment shader output locations. */
> >  
> > +   for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> > +  if (prog->_LinkedShaders[i] == NULL)
> > +continue;
> > +
> > +  if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks)
> > + lower_ubo_reference(prog->_LinkedShaders[i]);
> > +   }
> > +
> 
> It probably makes more sense to rewrite this loop as:
> 
> if (ctx->Const.ShaderCompilerOptions[i].LowerBufferInterfaceBlocks) {
>for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
>   if (prog->_LinkedShaders[i] != NULL)
>  lower_ubo_reference(prog->_LinkedShaders[i]);
>}
> }
> 
> With that change, and assuming that this change is not responsible for
> the shader-db regressions posted by Jason:

Forget about that, I did not notice that LowerBufferInterfaceBlocks is
set by stage. You can keep the Rb for the original version.
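
To spell out why the check cannot be hoisted out of the loop: the option
is stored per stage, so it has to be read with the loop index. A minimal
standalone sketch, where NUM_STAGES, struct stage_options and
count_stages_to_lower are hypothetical stand-ins for MESA_SHADER_STAGES,
gl_shader_compiler_options and the actual lowering call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_STAGES 6

/* Hypothetical stand-in for the per-stage compiler options. */
struct stage_options {
   bool lower_buffer_interface_blocks;
};

/* Returns how many stages would get the lowering pass. The option is
 * indexed by stage, so it must be tested inside the loop. */
static unsigned
count_stages_to_lower(const struct stage_options *opts,
                      const bool *is_linked)
{
   unsigned lowered = 0;

   assert(opts != NULL && is_linked != NULL);

   for (unsigned i = 0; i < NUM_STAGES; i++) {
      /* Skip stages that have no linked shader. */
      if (!is_linked[i])
         continue;

      /* Only lower stages whose own options ask for it. */
      if (opts[i].lower_buffer_interface_blocks)
         lowered++;
   }

   return lowered;
}
```

A stage is only counted when it is both linked and has the option set
for that particular stage, which is what the original loop in the patch
does.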

Iago

> Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>
> 
> >  done:
> > for (unsigned i = 0; i < MESA_SHADER_STAGES; i++) {
> >free(shader_list[i]);
> > diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
> > b/src/mesa/drivers/dri/i965/brw_link.cpp
> > index f1e3860..2991173 100644
> > --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> > @@ -157,8 +157,6 @@ process_glsl_ir(gl_shader_stage stage,
> >   _mesa_shader_stage_to_abbrev(shader->Stage));
> > }
> >  
> > -   lower_ubo_reference(shader);
> > -
> > bool progress;
> > do {
> >progress = false;
> > diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
> > b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > index 4ea297a..5adc986 100644
> > --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
> > @@ -148,6 +148,8 @@ brw_compiler_create(void *mem_ctx, const struct 
> > brw_device_info *devinfo)
> >   compiler->glsl_compiler_options[i].EmitNoIndirectSampler = true;
> >  
> >compiler->glsl_compiler_options[i].NirOptions = nir_options;
> > +
> > +  compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true;
> > }
> >  
> > return compiler;
> > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> > index d6c1eb8..800ad81 100644
> > --- a/src/mesa/main/mtypes.h
> > +++ b/src/mesa/main/mtypes.h
> > @@ -2874,6 +2874,8 @@ struct gl_shader_compiler_options
> >  */
> > GLboolean OptimizeForAOS;
> >  
> > +   GLboolean LowerBufferInterfaceBlocks; /**< Lower UBO and SSBO access to 
> > intrinsics. */
> > +
> > const struct nir_shader_compiler_options *NirOptions;
> >  };
> >  
> > diff --git a/src/mesa/state_tracker/st_extensions.c 
> > b/src/mesa/state_tracker/st_extensions.c
> > index bd7cbcc..bbb9027 100644
> > --- a/src/mesa/state_tracker/st_extensions.c
> > +++ b/src/mesa/state_tracker/st_extensions.c
> > @@ -254,6 +254,7 @@ void st_init_limits(struct pipe_screen *screen,
> >
> > PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT);
> >  
> >options->LowerClipDistance = true;
> > +  options->LowerBufferInterfaceBlocks = true;
> > }
> >  
> > c->LowerTessLevel = true;

Re: [Mesa-dev] [PATCH 6/7] glsl: do not loose precision information when packing varyings

2015-11-10 Thread Iago Toral
s/loose/lose

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Thu, 2015-11-05 at 13:33 +0200, Tapani Pälli wrote:
> This information will be used by cross stage validation of varyings
> for pipeline objects.
> 
> Signed-off-by: Tapani Pälli <tapani.pa...@intel.com>
> ---
>  src/glsl/lower_packed_varyings.cpp | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/glsl/lower_packed_varyings.cpp 
> b/src/glsl/lower_packed_varyings.cpp
> index 5d66ca9..037c27d 100644
> --- a/src/glsl/lower_packed_varyings.cpp
> +++ b/src/glsl/lower_packed_varyings.cpp
> @@ -621,6 +621,7 @@ lower_packed_varyings_visitor::get_packed_varying_deref(
>packed_var->data.patch = unpacked_var->data.patch;
>packed_var->data.interpolation = unpacked_var->data.interpolation;
>packed_var->data.location = location;
> +  packed_var->data.precision = unpacked_var->data.precision;
>unpacked_var->insert_before(packed_var);
>this->packed_varyings[slot] = packed_var;
> } else {




Re: [Mesa-dev] [PATCH v5 5/7] glsl: Add precision information to ir_variable

2015-11-10 Thread Iago Toral
On Fri, 2015-11-06 at 14:03 +0200, Tapani Pälli wrote:
> From: Iago Toral Quiroga <ito...@igalia.com>
> 
> We will need this later on when we implement proper support for
> precision qualifiers in the drivers and also to do link time checks for
> uniforms as indicated by the spec.
> 
> This patch also adds compile-time checks for variables without precision
> information (currently, Mesa only checks that a default precision is set
> for floats in fragment shaders).
> 
> As indicated by Ian, the addition of the precision information to
> ir_variable has been done using a bitfield and pahole to identify an
> available hole so that memory requirements for ir_variable stay the
> same.
> 
> v2 (Ian):
>   - Avoid if-ladders by defining arrays of supported sampler names and
> indexing
> into them with type->sampler_array + 2 * type->sampler_shadow
>   - Make the code that selects the precision qualifier to use a utility
> function
>   - Fix a typo
> 
> v3 (Tapani):
>   - rebased
>   - squashed in "Precision qualifiers are not allowed on structs"
>   - fixed select_gles_precision for sampler arrays
>   - fixed precision_qualifier_allowed for arrays of structs
> 
> v4 (Tapani):
>   - add atomic_uint handling
>   - do not allow precision qualifier on images
>   (issues reported by Marta)
> 
> v5 (Tapani):
>   - support precision qualifier on image types
> ---
>  src/glsl/ast_to_hir.cpp | 296 
> 
>  src/glsl/ir.h   |  13 ++
>  src/glsl/nir/glsl_types.cpp |   4 +
>  src/glsl/nir/glsl_types.h   |  11 ++
>  4 files changed, 301 insertions(+), 23 deletions(-)
> 
> diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
> index b6d662b..1240615 100644
> --- a/src/glsl/ast_to_hir.cpp
> +++ b/src/glsl/ast_to_hir.cpp
> @@ -2189,10 +2189,10 @@ precision_qualifier_allowed(const glsl_type *type)
>  * From this, we infer that GLSL 1.30 (and later) should allow precision
>  * qualifiers on sampler types just like float and integer types.
>  */
> -   return type->is_float()
> +   return (type->is_float()
> || type->is_integer()
> -   || type->is_record()
> -   || type->contains_opaque();
> +   || type->contains_opaque())
> +   && !type->without_array()->is_record();
>  }
>  
>  const glsl_type *
> @@ -2210,31 +2210,268 @@ ast_type_specifier::glsl_type(const char **name,
> return type;
>  }
>  
> -const glsl_type *
> -ast_fully_specified_type::glsl_type(const char **name,
> -struct _mesa_glsl_parse_state *state) const
> +/**
> + * From the OpenGL ES 3.0 spec, 4.5.4 Default Precision Qualifiers:
> + *
> + * "The precision statement
> + *
> + *precision precision-qualifier type;
> + *
> + *  can be used to establish a default precision qualifier. The type field can
> + *  be either int or float or any of the sampler types, (...) If type is float,
> + *  the directive applies to non-precision-qualified floating point type
> + *  (scalar, vector, and matrix) declarations. If type is int, the directive
> + *  applies to all non-precision-qualified integer type (scalar, vector, signed,
> + *  and unsigned) declarations."
> + *
> + * We use the symbol table to keep the values of the default precisions for
> + * each 'type' in each scope and we use the 'type' string from the precision
> + * statement as key in the symbol table. When we want to retrieve the default
> + * precision associated with a given glsl_type we need to know the type string
> + * associated with it. This is what this function returns.
> + */
> +static const char *
> +get_type_name_for_precision_qualifier(const glsl_type *type)
>  {
> -   const struct glsl_type *type = this->specifier->glsl_type(name, state);
> -
> -   if (type == NULL)
> -  return NULL;
> +   switch (type->base_type) {
> +   case GLSL_TYPE_FLOAT:
> +  return "float";
> +   case GLSL_TYPE_UINT:
> +   case GLSL_TYPE_INT:
> +  return "int";
> +   case GLSL_TYPE_ATOMIC_UINT:
> +  return "atomic_uint";
> +   case GLSL_TYPE_IMAGE:
> +   /* fallthrough */

I think this is not correct. As far as I understand the spec, we can set
a default precision for any of the image types:

image2D
image3D
imageCube
image2DArray
iimage2D
iimage3D
iimageCube
iimage2DArray
uimage2D
uimage3D
uimageCube
uimage2DArray

but here you are re-using the precisions from samplers, so if we do
this:

precision lowp sampler2D;
precision highp image2D;

the latter statement is ignored, and the former affects
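
The lookup problem described above can be modelled with a small sketch. This is not the Mesa code: `DefaultPrecisions`, `key_shared_with_samplers` and `key_per_kind` are hypothetical names, and the only thing assumed from the patch is that image types fall through to the sampler names when the lookup key is chosen.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the default-precision entries kept in the
// symbol table; not the real Mesa data structure.
enum class Precision { None, Low, Medium, High };

struct DefaultPrecisions {
   std::map<std::string, Precision> table;

   // Models "precision <qualifier> <type>;": the key is the type string
   // exactly as it was written in the statement.
   void declare(const std::string &type_name, Precision p) {
      table[type_name] = p;
   }

   // Returns Precision::None when no default was declared for the key.
   Precision lookup(const std::string &key) const {
      auto it = table.find(key);
      return it == table.end() ? Precision::None : it->second;
   }
};

// Key selection when image types fall through to the sampler case:
// an image2D variable is looked up under the sampler's name.
std::string key_shared_with_samplers(bool is_image) {
   (void)is_image;
   return "sampler2D";
}

// Key selection when image types get their own names.
std::string key_per_kind(bool is_image) {
   return is_image ? "image2D" : "sampler2D";
}
```

With defaults declared as lowp for "sampler2D" and highp for "image2D", looking up an image2D variable through `key_shared_with_samplers` yields the lowp entry, while `key_per_kind` yields the highp one.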

Re: [Mesa-dev] nir/i965: Source modifiers on vecN opcodes

2015-11-10 Thread Iago Toral
On Mon, 2015-11-09 at 12:27 -0500, Connor Abbott wrote:
> On Mon, Nov 9, 2015 at 10:41 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >
> > On Nov 9, 2015 7:24 AM, "Connor Abbott" <cwabbo...@gmail.com> wrote:
> >>
> >> On Mon, Nov 9, 2015 at 6:55 AM, Iago Toral <ito...@igalia.com> wrote:
> >> > Hi,
> >> >
> >> > Currently, NIR defines vecN operations as unsigned (integer). The fp64
> >> > patches from Connor change this to float (I guess because we need to
> >> > know the case where we are packing vectors of 64-bit floats). However,
> >> > this makes it so that nir_lower_source_to_mods turns this:
> >> >
> >> > vec1 ssa_2 = fmov -ssa_1.y
> >> > vec3 ssa_3 = vec3 ssa_1, ssa_2, ssa_0
> >> >
> >> > into:
> >> >
> >> > vec3 ssa_2 = vec3 ssa_1, -ssa_1.y, ssa_0
> >> >
> >> > This only happens because the vec3 operation is defined as a float
> >> > operation now, otherwise it would not try to do this. It is not clear to
> >> > me whether this is by design, that is, whether this kind of thing should
> >> > only kick in for float/int opcodes, with vecN operations defined as
> >> > unsigned to avoid it for them.
> >> >
> >> > The problem comes later when we call nir_lower_vec_to_movs in the i965
> >> > vec4 backend. That pass generates a separate MOV for each component in
> >> > the vector, but to do that properly when a negate is involved it needs
> >> > to know if this is a float or an integer operand, which it does not
> >> > know at this point. The current code always emits an imov, which won't
> >> > work if the operand is a float.
> >> >
> >> > I can think of two solutions for this:
> >> >
> >> > 1) Change nir_lower_source_to_mods so it does not try to rewrite alu
> >> > operations where a source comes from a fmov with a negate, or at least
> >> > if the instruction we are trying to rewrite is a vecN operation (or
> >> > maybe allow this in scalar mode only?)
> >> >
> >> > 2) In nir_lower_vec_to_movs, if a source is negated, check its
> >> > parent_instr and try to guess its type from that (in this example, we
> >> > would see it came from fmov and we can say it is a float and emit fmov
> >> > instead of imov). Not sure if this would work in all possible scenarios
> >> > though.
> >> >
> >> > Opinions?
> >> >
> >> > Iago
> >> >
> >>
> >> The only reason I changed vecN to produce floats is to avoid producing
> >> 64-bit integer instructions, which at one point the constant folding
> >> infrastructure couldn't support (but now it can), so you can just
> >> revert the change. Ofc the i965 backend won't be able to express this
> >> directly, but for now you can silently change 64-bit integers to
> >> floats and assert that they only happen in things that copy data
> >> around.
> >
> > I would tend to agree.  We could also make it unsigned so no source
> > modifiers ever make sense.  Meh.
> 
> Oh yeah, I meant assert that we don't get e.g. a 64-bit iadd, so we
> remember to fix that later. When we get support for real 64-bit
> integers, we'll have to only map nir_type_int64/uint64 to DF on gen7.

Ok, sounds reasonable to me. I'll make vecN opcodes unsigned again and
add asserts in the driver to catch 64-bit integer ALU operations.

Thank you both for the suggestions!

Iago

> >
> > If we did want to keep vecN float, the thing to do would be to make
> > vec_to_movs lower it to fmovs rather than imovs.  But, like Connor said,
> > just asserting no source modifiers for the 64-bit version in the backend is
> > probably best.
> >
> > --Jason
> 
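
As an illustration of why the lowering pass needs to know the operand type before it can apply a negate modifier, here is a minimal standalone sketch. It is not NIR or i965 code; the helper names are hypothetical, and it assumes `float` is a 32-bit IEEE 754 value. Negating the same 32 bits as a float and as an integer gives different results:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Returns the raw bit pattern of a float.
uint32_t float_bits(float f) {
   uint32_t bits;
   std::memcpy(&bits, &f, sizeof bits);
   return bits;
}

// Negate with float semantics: only the IEEE 754 sign bit changes.
uint32_t negate_as_float(uint32_t bits) {
   float f;
   std::memcpy(&f, &bits, sizeof f);
   f = -f;
   std::memcpy(&bits, &f, sizeof bits);
   return bits;
}

// Negate with integer semantics: two's complement (0 - x), computed on
// unsigned values so the wraparound is well defined.
uint32_t negate_as_int(uint32_t bits) {
   return 0u - bits;
}
```

For the bit pattern of 1.0f (0x3F800000), the float negate gives the bits of -1.0f (0xBF800000), while the integer negate gives 0xC0800000, which reads back as -4.0f. This is the kind of mismatch an integer move with a negate would produce on a float operand.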




Re: [Mesa-dev] [PATCH] glsl: Correctly handle vector extract on function parameter

2015-11-12 Thread Iago Toral
Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

On Thu, 2015-11-12 at 00:44 -0800, Jordan Justen wrote:
> This commit accidentally used a '==' when '=' was intended.
> 
> commit 96b22fb080894ba1840af2372f28a46cc0f40c76
> Author: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> Date:   Wed Nov 4 14:58:54 2015 -0800
> 
> glsl: Use array deref for access to vector components
> 
> Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> Cc: Kristian Høgsberg Kristensen <k...@bitplanet.net>
> Cc: Matt Turner <matts...@gmail.com>
> ---
>  Ilia's suggestion seems like what we want.
> 
>  Passed jenkins. Then again, so did the original patch.
> 
>  src/glsl/ast_function.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
> index 5584470..466ece6 100644
> --- a/src/glsl/ast_function.cpp
> +++ b/src/glsl/ast_function.cpp
> @@ -368,8 +368,8 @@ fix_parameter(void *mem_ctx, ir_rvalue *actual, const glsl_type *formal_type,
>  
> ir_rvalue *lhs = actual;
> if (expr != NULL && expr->operation == ir_binop_vector_extract) {
> -  lhs == new(mem_ctx) ir_dereference_array(expr->operands[0]->clone(mem_ctx, NULL),
> -   expr->operands[1]->clone(mem_ctx, NULL));
> +  lhs = new(mem_ctx) ir_dereference_array(expr->operands[0]->clone(mem_ctx, NULL),
> +  expr->operands[1]->clone(mem_ctx, NULL));
> }
>  
> ir_assignment *const assignment_2 = new(mem_ctx) ir_assignment(lhs, rhs);
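
For reference, the effect of the typo can be reproduced in a few lines. This is a hypothetical reduction, not the code from ast_function.cpp: `Node`, `pick_lhs_buggy` and `pick_lhs_fixed` are made-up names. With `==` the statement only compares the pointers and discards the result, so `lhs` keeps pointing at the original rvalue:

```cpp
#include <cassert>

// Hypothetical stand-in for an IR node.
struct Node { int id; };

// Buggy shape: the comparison is evaluated and thrown away, so lhs is
// never updated. The (void) cast only silences the unused-value warning
// that a compiler may emit for the bare expression statement.
Node *pick_lhs_buggy(Node *actual, Node *replacement, bool is_vector_extract) {
   Node *lhs = actual;
   if (is_vector_extract) {
      (void)(lhs == replacement);
   }
   return lhs;
}

// Fixed shape: lhs is actually reassigned to the replacement node.
Node *pick_lhs_fixed(Node *actual, Node *replacement, bool is_vector_extract) {
   Node *lhs = actual;
   if (is_vector_extract) {
      lhs = replacement;
   }
   return lhs;
}
```

The buggy variant compiles cleanly unless unused-value warnings are enabled and treated seriously, which may be why the original patch passed testing.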



