Re: [Mesa-dev] [PATCH] radeonsi: don't call u_prims_for_vertices for patches and rectangles

2015-12-09 Thread Michel Dänzer
On 10.12.2015 06:58, Marek Olšák wrote:
> From: Marek Olšák 
> 
> Both caused a crash due to a division by zero in that function.
> This is an alternative fix.
> 
> Cc: 11.0 11.1 
> ---
>  src/gallium/drivers/radeonsi/si_state_draw.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
> index ee84a1f..e550011 100644
> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
> @@ -216,6 +216,18 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
>   radeon_emit(cs, tcs_out_layout | (num_tcs_output_cp << 26));
>  }
>  
> +static unsigned si_num_prims_for_vertices(const struct pipe_draw_info *info)
> +{
> + switch (info->mode) {
> + case PIPE_PRIM_PATCHES:
> + return info->count / info->vertices_per_patch;
> + case R600_PRIM_RECTANGLE_LIST:
> + return info->count / 3;
> + default:
> + return u_prims_for_vertices(info->mode, info->count);
> + }
> +}

I don't suppose it makes sense to handle PIPE_PRIM_PATCHES in
u_prims_for_vertices? Either way,

Reviewed-by: Michel Dänzer 
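
For illustration only, a minimal self-contained sketch of the dispatch the patch adds. The enum, the `draw_info` struct and `generic_prims_for_vertices` are hypothetical stand-ins, not Mesa's real definitions. The sketch only shows that patches and rectangle lists are divided by their own vertex counts before the generic helper is ever reached.

```cpp
#include <cassert>

// Hypothetical stand-ins for the gallium types; not Mesa's definitions.
enum prim_mode { PRIM_TRIANGLES, PRIM_PATCHES, PRIM_RECTANGLE_LIST };

struct draw_info {
   prim_mode mode;
   unsigned count;              // number of vertices in the draw
   unsigned vertices_per_patch; // only meaningful for PRIM_PATCHES
};

// Stand-in for the generic helper: in this sketch it only knows triangle
// lists, so any other mode reaching it is a bug.
static unsigned generic_prims_for_vertices(prim_mode mode, unsigned count)
{
   assert(mode == PRIM_TRIANGLES);
   return count / 3;
}

// Mirrors the switch in the patch: special-case the two modes the generic
// helper cannot handle, and fall through to it for everything else.
static unsigned num_prims_for_vertices(const draw_info &info)
{
   switch (info.mode) {
   case PRIM_PATCHES:
      assert(info.vertices_per_patch != 0);
      return info.count / info.vertices_per_patch;
   case PRIM_RECTANGLE_LIST:
      return info.count / 3;
   default:
      return generic_prims_for_vertices(info.mode, info.count);
   }
}
```

With these stand-ins, a 12-vertex draw with 4 vertices per patch yields 3 patches, and a 9-vertex rectangle list yields 3 rectangles.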


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] mesa/shader: return correct attribute location for double matrix arrays

2015-12-09 Thread Dave Airlie
From: Dave Airlie 

If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
dmat2[2] at 21 etc. The old code was returning 17,18,19.

I think this code is wrong for float matrices as well.

This partly fixes:
GL41-CTS.vertex_attrib_64bit.limits_test
---
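A small sketch of the arithmetic described above, for illustration only. The helper names are hypothetical, and the subtraction of VERT_ATTRIB_GENERIC0 done by the real code is left out so that the numbers match the commit message directly.

```cpp
#include <cassert>

// Hypothetical helper: slot of element array_index in a matrix array whose
// first element starts at base_location. Each element is stepped over by
// its column count, mirroring the stride the patch introduces.
static int matrix_array_location(int base_location, int array_index,
                                 int matrix_columns)
{
   assert(array_index >= 0 && matrix_columns >= 1);
   return base_location + array_index * matrix_columns;
}

// Pre-patch behaviour for comparison: the index was added with no stride.
static int old_matrix_array_location(int base_location, int array_index)
{
   return base_location + array_index;
}
```

For a dmat2[4] starting at 17, the strided helper gives 17, 19, 21 for indices 0, 1, 2, while the old form gives 17, 18, 19.
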
 src/mesa/main/shader_query.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index 5d15006..faaf08c 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -858,7 +858,7 @@ program_resource_location(struct gl_shader_program *shProg,
   && array_index >= RESOURCE_VAR(res)->type->length) {
  return -1;
   }
-  return RESOURCE_VAR(res)->data.location + array_index - VERT_ATTRIB_GENERIC0;
+  return RESOURCE_VAR(res)->data.location + (array_index * RESOURCE_VAR(res)->type->without_array()->matrix_columns) - VERT_ATTRIB_GENERIC0;
case GL_PROGRAM_OUTPUT:
   /* If the output is an array, fail if the index is out of bounds. */
   if (array_index > 0
-- 
2.5.0



[Mesa-dev] [PATCH 06/15] i965/fs: Get rid of the param_size array

2015-12-09 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 1 -
 src/mesa/drivers/dri/i965/brw_fs.h   | 2 --
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 9 -
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 ---
 4 files changed, 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 1add656..9b06ed2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1018,7 +1018,6 @@ fs_visitor::import_uniforms(fs_visitor *v)
this->push_constant_loc = v->push_constant_loc;
this->pull_constant_loc = v->pull_constant_loc;
this->uniforms = v->uniforms;
-   this->param_size = v->param_size;
 }
 
 fs_reg *
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index b55589f..b42c49d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -311,8 +311,6 @@ public:
struct brw_stage_prog_data *prog_data;
struct gl_program *prog;
 
-   int *param_size;
-
int *virtual_grf_start;
int *virtual_grf_end;
brw::fs_live_variables *live_intervals;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index bf239c3..22323b4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -174,15 +174,6 @@ fs_visitor::nir_setup_uniforms()
   return;
 
uniforms = nir->num_uniforms / 4;
-
-   nir_foreach_variable(var, &nir->uniforms) {
-  /* UBO's and atomics don't take up space in the uniform file */
-  if (var->interface_type != NULL || var->type->contains_atomic())
- continue;
-
-  if (type_size_scalar(var->type) > 0)
- param_size[var->data.driver_location / 4] = type_size_scalar(var->type);
-   }
 }
 
 static bool
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 68f2548..5616963 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1011,9 +1011,6 @@ fs_visitor::init()
 
this->spilled_any_registers = false;
this->do_dual_src = false;
-
-   if (dispatch_width == 8)
-  this->param_size = rzalloc_array(mem_ctx, int, stage_prog_data->nr_params);
 }
 
 fs_visitor::~fs_visitor()
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 08/15] i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push constants

2015-12-09 Thread Jason Ekstrand
This commit moves us to an instruction based model rather than a
register-based model for indirects.  This is more accurate anyway as we
have to emit instructions to resolve the reladdr.  It's also a lot simpler
because it gets rid of the recursive reladdr problem by design.

One side-effect of this is that we need a whole new algorithm in
move_uniform_array_access_to_pull_constants.  This new algorithm is much
more straightforward than the old one and is fairly similar to what we're
already doing in the FS backend.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4.h   |  3 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 10 +--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 86 --
 4 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index a697bdf..e4a405b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -775,7 +775,7 @@ vec4_visitor::move_push_constants_to_pull_constants()
 dst_reg temp = dst_reg(this, glsl_type::vec4_type);
 
 emit_pull_constant_load(block, inst, temp, inst->src[i],
-pull_constant_loc[uniform]);
+pull_constant_loc[uniform], src_reg());
 
 inst->src[i].file = temp.file;
  inst->src[i].nr = temp.nr;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index f2e5ce1..e6d6c82 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -293,7 +293,8 @@ public:
void emit_pull_constant_load(bblock_t *block, vec4_instruction *inst,
dst_reg dst,
src_reg orig_src,
-   int base_offset);
+   int base_offset,
+src_reg indirect);
void emit_pull_constant_load_reg(dst_reg dst,
 src_reg surf_index,
 src_reg offset,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index f965b39..58b6612 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -673,12 +673,14 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
  /* Offsets are in bytes but they should always be multiples of 16 */
  assert(const_offset->u[0] % 16 == 0);
  src.reg_offset = const_offset->u[0] / 16;
+
+ emit(MOV(dest, src));
   } else {
- src_reg tmp = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_D, 1);
- src.reladdr = new(mem_ctx) src_reg(tmp);
-  }
+ src_reg indirect = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_UD, 1);
 
-  emit(MOV(dest, src));
+ emit(SHADER_OPCODE_MOV_INDIRECT, dest, src,
+  indirect, brw_imm_ud(instr->const_index[1]));
+  }
   break;
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 7712d34..e7ab536 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1641,16 +1641,16 @@ vec4_visitor::move_grf_array_access_to_scratch()
 void
 vec4_visitor::emit_pull_constant_load(bblock_t *block, vec4_instruction *inst,
  dst_reg temp, src_reg orig_src,
- int base_offset)
+ int base_offset, src_reg indirect)
 {
int reg_offset = base_offset + orig_src.reg_offset;
const unsigned index = prog_data->base.binding_table.pull_constants_start;
 
src_reg offset;
-   if (orig_src.reladdr) {
+   if (indirect.file != BAD_FILE) {
   offset = src_reg(this, glsl_type::int_type);
 
-  emit_before(block, inst, ADD(dst_reg(offset), *orig_src.reladdr,
+  emit_before(block, inst, ADD(dst_reg(offset), indirect,
brw_imm_d(reg_offset * 16)));
} else if (devinfo->gen >= 8) {
   /* Store the offset in a GRF so we can send-from-GRF. */
@@ -1685,59 +1685,55 @@ vec4_visitor::move_uniform_array_access_to_pull_constants()
 {
int pull_constant_loc[this->uniforms];
memset(pull_constant_loc, -1, sizeof(pull_constant_loc));
-   bool nested_reladdr;
 
-   /* Walk through and find array access of uniforms.  Put a copy of that
-* uniform in the pull constant buffer.
-*
-* Note that we don't move constant-indexed accesses to arrays.  No
-* testing has been done of the performance impact of this choice.
+   /* First, walk through the instructions and determine which things need to
+* be pulled.  We mark something as needing to be pulled by setting
+* pull_constant_loc to 0.
 */
-   do {
-  

[Mesa-dev] [PATCH 14/15] i965/fs: Rename demote_pull_constants to lower_constant_loads

2015-12-09 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++--
 src/mesa/drivers/dri/i965/brw_fs.h   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index a2ec03e..f3cf129 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2016,7 +2016,7 @@ fs_visitor::assign_constant_locations()
  * or VARYING_PULL_CONSTANT_LOAD instructions which load values into VGRFs.
  */
 void
-fs_visitor::demote_pull_constants()
+fs_visitor::lower_constant_loads()
 {
const unsigned index = stage_prog_data->binding_table.pull_constants_start;
 
@@ -5033,7 +5033,7 @@ fs_visitor::optimize()
bld = fs_builder(this, 64);
 
assign_constant_locations();
-   demote_pull_constants();
+   lower_constant_loads();
 
validate();
 
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index b42c49d..eb6ecb2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -136,7 +136,7 @@ public:
void split_virtual_grfs();
bool compact_virtual_grfs();
void assign_constant_locations();
-   void demote_pull_constants();
+   void lower_constant_loads();
void invalidate_live_intervals();
void calculate_live_intervals();
void calculate_register_pressure();
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 03/15] i965/fs: Use MOV_INDIRECT for all indirect uniform loads

2015-12-09 Thread Jason Ekstrand
Instead of using reladdr, this commit changes the FS backend to emit a
MOV_INDIRECT whenever we need an indirect uniform load.  We also have to
rework some of the other bits of the backend to handle this new form of
uniform load.  The obvious change is that demote_pull_constants now acts
more like a lowering pass when it hits a MOV_INDIRECT.
---
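A minimal sketch, for illustration only, of the offset arithmetic used for the non-indirect pull loads in the hunk below: the scalar pull index is turned into a 16-byte-aligned fetch offset plus a component select (the `& ~15` and `set_smear(pull_index & 3)` steps). The struct and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical result type for the sketch.
struct pull_load {
   unsigned vec4_byte_offset; // 16-byte-aligned offset fetched from the buffer
   unsigned component;        // which of the four dwords is then selected
};

// pull_index counts 4-byte scalars in the pull constant buffer.
static pull_load split_pull_index(int pull_index)
{
   assert(pull_index >= 0);
   pull_load l;
   l.vec4_byte_offset = (unsigned)(pull_index * 4) & ~15u;
   l.component = (unsigned)pull_index & 3;
   return l;
}
```

For example, pull index 6 is byte 24, so the load fetches the vec4 at byte offset 16 and selects component 2.
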
 src/mesa/drivers/dri/i965/brw_fs.cpp | 72 +++-
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 53 ++-
 2 files changed, 86 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index bf446d2..7cc03c5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1945,8 +1945,8 @@ fs_visitor::assign_constant_locations()
  if (inst->src[i].file != UNIFORM)
 continue;
 
- if (inst->src[i].reladdr) {
-int uniform = inst->src[i].nr;
+ if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
+int uniform = inst->src[0].nr;
 
 /* If this array isn't already present in the pull constant buffer,
  * add it.
@@ -2028,49 +2028,63 @@ fs_visitor::assign_constant_locations()
 void
 fs_visitor::demote_pull_constants()
 {
-   foreach_block_and_inst (block, fs_inst, inst, cfg) {
+   const unsigned index = stage_prog_data->binding_table.pull_constants_start;
+
+   foreach_block_and_inst_safe (block, fs_inst, inst, cfg) {
+  /* Set up the annotation tracking for new generated instructions. */
+  const fs_builder ibld(this, block, inst);
+
   for (int i = 0; i < inst->sources; i++) {
 if (inst->src[i].file != UNIFORM)
continue;
 
- int pull_index;
+ /* We'll handle this case later */
+ if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0)
+continue;
+
  unsigned location = inst->src[i].nr + inst->src[i].reg_offset;
- if (location >= uniforms) /* Out of bounds access */
-pull_index = -1;
- else
-pull_index = pull_constant_loc[location];
+ if (location >= uniforms)
+continue; /* Out of bounds access */
+
+ int pull_index = pull_constant_loc[location];
 
  if (pull_index == -1)
continue;
 
- /* Set up the annotation tracking for new generated instructions. */
- const fs_builder ibld(this, block, inst);
- const unsigned index = stage_prog_data->binding_table.pull_constants_start;
- fs_reg dst = vgrf(glsl_type::float_type);
-
  assert(inst->src[i].stride == 0);
 
- /* Generate a pull load into dst. */
- if (inst->src[i].reladdr) {
-VARYING_PULL_CONSTANT_LOAD(ibld, dst,
-   brw_imm_ud(index),
-   *inst->src[i].reladdr,
-   pull_index * 4);
-inst->src[i].reladdr = NULL;
-inst->src[i].stride = 1;
- } else {
-const fs_builder ubld = ibld.exec_all().group(8, 0);
-struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & ~15);
-ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
-  dst, brw_imm_ud(index), offset);
-inst->src[i].set_smear(pull_index & 3);
- }
- brw_mark_surface_used(prog_data, index);
+ fs_reg dst = vgrf(glsl_type::float_type);
+ const fs_builder ubld = ibld.exec_all().group(8, 0);
+ struct brw_reg offset = brw_imm_ud((unsigned)(pull_index * 4) & ~15);
+ ubld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD,
+   dst, brw_imm_ud(index), offset);
 
  /* Rewrite the instruction to use the temporary VGRF. */
  inst->src[i].file = VGRF;
  inst->src[i].nr = dst.nr;
  inst->src[i].reg_offset = 0;
+ inst->src[i].set_smear(pull_index & 3);
+
+ brw_mark_surface_used(prog_data, index);
+  }
+
+  if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT &&
+  inst->src[0].file == UNIFORM) {
+
+ unsigned location = inst->src[0].nr + inst->src[0].reg_offset;
+ if (location >= uniforms)
+continue; /* Out of bounds access */
+
+ int pull_index = pull_constant_loc[location];
+ assert(pull_index >= 0); /* This had better be pull */
+
+ VARYING_PULL_CONSTANT_LOAD(ibld, inst->dst,
+brw_imm_ud(index),
+inst->src[1],
+pull_index * 4);
+ inst->remove(block);
+
+ brw_mark_surface_used(prog_data, index);
   }
}
invalidate_live_intervals();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 15d9b1c..bf239c3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ 

[Mesa-dev] [PATCH 01/15] nir: Add another index to load_uniform to specify the range read

2015-12-09 Thread Jason Ekstrand
---
 src/glsl/nir/nir_intrinsics.h | 7 +--
 src/glsl/nir/nir_lower_io.c   | 5 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index 63df21e..c329a82 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -236,6 +236,9 @@ SYSTEM_VALUE(helper_invocation, 1, 0)
  * of the start of the variable being loaded and and the offset source is a
  * offset into that variable.
  *
+ * Uniform load operations have a second index that specifies the size of the
+ * variable being loaded.  If const_index[1] == 0, then the size is unknown.
+ *
  * Some load operations such as UBO/SSBO load and per_vertex loads take an
  * additional source to specify which UBO/SSBO/vertex to load from.
  *
@@ -248,8 +251,8 @@ SYSTEM_VALUE(helper_invocation, 1, 0)
 #define LOAD(name, srcs, indices, flags) \
INTRINSIC(load_##name, srcs, ARR(1, 1, 1, 1), true, 0, 0, indices, flags)
 
-/* src[] = { offset }. const_index[] = { base } */
-LOAD(uniform, 1, 1, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
+/* src[] = { offset }. const_index[] = { base, size } */
+LOAD(uniform, 1, 2, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
 /* src[] = { buffer_index, offset }. No const_index */
 LOAD(ubo, 2, 0, NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
 /* src[] = { offset }. const_index[0] = { base } */
diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c
index a2723d5..b73ceec 100644
--- a/src/glsl/nir/nir_lower_io.c
+++ b/src/glsl/nir/nir_lower_io.c
@@ -216,6 +216,11 @@ nir_lower_io_block(nir_block *block, void *void_state)
  load->const_index[0] =
 intrin->variables[0]->var->data.driver_location;
 
+ if (load->intrinsic == nir_intrinsic_load_uniform) {
+load->const_index[1] =
+   state->type_size(intrin->variables[0]->var->type);
+ }
+
  if (per_vertex)
 load->src[0] = nir_src_for_ssa(vertex_index);
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 11/15] i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions

2015-12-09 Thread Jason Ekstrand
It should work fine without it and the visitor can set it if it wants.
---
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c25da07..d86eee1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -366,7 +366,6 @@ fs_generator::generate_mov_indirect(fs_inst *inst,
assert(inst->exec_size == 8 || devinfo->gen >= 8);
 
brw_MOV(p, addr, indirect_byte_offset);
-   brw_inst_set_mask_control(devinfo, brw_last_inst, BRW_MASK_DISABLE);
brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
 }
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 13/15] i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware

2015-12-09 Thread Jason Ekstrand
While we're at it, we also add support for the possibility that the
indirect is, in fact, a constant.  This shouldn't happen in the common case
(if it does, that means NIR failed to constant-fold something), but it's
possible so we should handle it.
---
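A small sketch, for illustration only, of the two pieces of arithmetic in this patch: folding a constant indirect into the register position, and the pre-Broadwell restriction quoted in the generator comment below. The names are hypothetical, and the 32-byte register size is an assumption taken to match the backend's REG_SIZE.

```cpp
#include <cassert>

// Assumption: one GRF register is 32 bytes (the backend's REG_SIZE).
static const unsigned REG_SIZE_BYTES = 32;

// Hypothetical register position: register number plus byte sub-offset.
struct reg_pos {
   unsigned nr;
   unsigned subnr;
};

// Constant-indirect case: fold the indirect into the immediate byte offset,
// then re-derive the register number and sub-register offset from it.
static reg_pos fold_constant_indirect(reg_pos reg, unsigned indirect_bytes)
{
   assert(reg.subnr < REG_SIZE_BYTES);
   unsigned imm_byte_offset =
      reg.nr * REG_SIZE_BYTES + reg.subnr + indirect_bytes;
   reg_pos out;
   out.nr = imm_byte_offset / REG_SIZE_BYTES;
   out.subnr = imm_byte_offset % REG_SIZE_BYTES;
   return out;
}

// Pre-Broadwell restriction as worded in the patch comment: the bottom five
// bits of the base offset and of the indirect must add to less than 32.
static bool split_add_is_safe(unsigned base_bytes, unsigned indirect_bytes)
{
   return (base_bytes & 31) + (indirect_bytes & 31) < 32;
}
```

Starting at register 2, byte 16, a constant indirect of 20 bytes lands at register 3, byte 4. The same pair (16, 20) fails the pre-Broadwell check because 16 + 20 crosses a register boundary, which is why the patch does the add itself on that hardware.
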
 src/mesa/drivers/dri/i965/brw_fs.cpp   |  4 ++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 51 +++---
 2 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9eaf8d0..a2ec03e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4424,6 +4424,10 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
case SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL:
   return 8;
 
+   case SHADER_OPCODE_MOV_INDIRECT:
+  /* Prior to Broadwell, we only have 8 address subregisters */
+  return devinfo->gen < 8 ? 8 : inst->exec_size;
+
default:
   return inst->exec_size;
}
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index d86eee1..7fa6d84 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -351,22 +351,47 @@ fs_generator::generate_mov_indirect(fs_inst *inst,
 
unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr;
 
-   /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
-   struct brw_reg addr = vec8(brw_address_reg(0));
+   if (indirect_byte_offset.file == BRW_IMMEDIATE_VALUE) {
+  imm_byte_offset += indirect_byte_offset.ud;
 
-   /* The destination stride of an instruction (in bytes) must be greater
-* than or equal to the size of the rest of the instruction.  Since the
-* address register is of type UW, we can't use a D-type instruction.
-* In order to get around this, re re-type to UW and use a stride.
-*/
-   indirect_byte_offset =
-  retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
+  reg.nr = imm_byte_offset / REG_SIZE;
+  reg.subnr = imm_byte_offset % REG_SIZE;
+  brw_MOV(p, dst, reg);
+   } else {
+  /* Prior to Broadwell, there are only 8 address registers. */
+  assert(inst->exec_size == 8 || devinfo->gen >= 8);
 
-   /* Prior to Broadwell, there are only 8 address registers. */
-   assert(inst->exec_size == 8 || devinfo->gen >= 8);
+  /* We use VxH indirect addressing, clobbering a0.0 through a0.7. */
+  struct brw_reg addr = vec8(brw_address_reg(0));
 
-   brw_MOV(p, addr, indirect_byte_offset);
-   brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
+  /* The destination stride of an instruction (in bytes) must be greater
+   * than or equal to the size of the rest of the instruction.  Since the
+   * address register is of type UW, we can't use a D-type instruction.
+   * In order to get around this, re re-type to UW and use a stride.
+   */
+  indirect_byte_offset =
+ retype(spread(indirect_byte_offset, 2), BRW_REGISTER_TYPE_UW);
+
+  if (devinfo->gen < 8) {
+ /* Prior to broadwell, we have a restriction that the bottom 5 bits
+  * of the base offset and the bottom 5 bits of the indirect must add
+  * to less than 32.  In other words, the hardware needs to be able to
+  * add the bottom five bits of the two to get the subnumber and add
+  * the next 7 bits of each to get the actual register number.  Since
+  * the indirect may cause us to cross a register boundary, this makes
+  * it almost useless.  We could try and do something clever where we
+  * use a actual base offset if base_offset % 32 == 0 but that would
+  * mean we were generating different code depending on the base
+  * offset.  Instead, for the sake of consistency, we'll just do the
+  * add ourselves.
+  */
+ brw_ADD(p, addr, indirect_byte_offset, brw_imm_uw(imm_byte_offset));
+ brw_MOV(p, dst, retype(brw_VxH_indirect(0, 0), dst.type));
+  } else {
+ brw_MOV(p, addr, indirect_byte_offset);
+ brw_MOV(p, dst, retype(brw_VxH_indirect(0, imm_byte_offset), dst.type));
+  }
+   }
 }
 
 void
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 07/15] i965/vec4: Inline get_pull_constant_offset

2015-12-09 Thread Jason Ekstrand
It's not really doing enough anymore to justify a helper function.
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  2 --
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 37 ++
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index ae5bf69..f2e5ce1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -284,8 +284,6 @@ public:
 
src_reg get_scratch_offset(bblock_t *block, vec4_instruction *inst,
  src_reg *reladdr, int reg_offset);
-   src_reg get_pull_constant_offset(bblock_t *block, vec4_instruction *inst,
-   src_reg *reladdr, int reg_offset);
void emit_scratch_read(bblock_t *block, vec4_instruction *inst,
  dst_reg dst,
  src_reg orig_src,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 443d0eb..7712d34 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1464,27 +1464,6 @@ vec4_visitor::get_scratch_offset(bblock_t *block, vec4_instruction *inst,
}
 }
 
-src_reg
-vec4_visitor::get_pull_constant_offset(bblock_t * block, vec4_instruction *inst,
-  src_reg *reladdr, int reg_offset)
-{
-   if (reladdr) {
-  src_reg index = src_reg(this, glsl_type::int_type);
-
-  emit_before(block, inst, ADD(dst_reg(index), *reladdr,
-   brw_imm_d(reg_offset * 16)));
-
-  return index;
-   } else if (devinfo->gen >= 8) {
-  /* Store the offset in a GRF so we can send-from-GRF. */
-  src_reg offset = src_reg(this, glsl_type::int_type);
-  emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset * 16)));
-  return offset;
-   } else {
-  return brw_imm_d(reg_offset * 16);
-   }
-}
-
 /**
  * Emits an instruction before @inst to load the value named by @orig_src
  * from scratch space at @base_offset to @temp.
@@ -1666,8 +1645,20 @@ vec4_visitor::emit_pull_constant_load(bblock_t *block, vec4_instruction *inst,
 {
int reg_offset = base_offset + orig_src.reg_offset;
const unsigned index = prog_data->base.binding_table.pull_constants_start;
-   src_reg offset = get_pull_constant_offset(block, inst, orig_src.reladdr,
- reg_offset);
+
+   src_reg offset;
+   if (orig_src.reladdr) {
+  offset = src_reg(this, glsl_type::int_type);
+
+  emit_before(block, inst, ADD(dst_reg(offset), *orig_src.reladdr,
+   brw_imm_d(reg_offset * 16)));
+   } else if (devinfo->gen >= 8) {
+  /* Store the offset in a GRF so we can send-from-GRF. */
+  offset = src_reg(this, glsl_type::int_type);
+  emit_before(block, inst, MOV(dst_reg(offset), brw_imm_d(reg_offset * 16)));
+   } else {
+  offset = brw_imm_d(reg_offset * 16);
+   }
 
emit_pull_constant_load_reg(temp,
brw_imm_ud(index),
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 02/15] i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD

2015-12-09 Thread Jason Ekstrand
This is to prevent shader-db regressions from D <-> UD conversions in deref
add+mul chains caused by using MOV_INDIRECT.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5e8acec..bf446d2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -174,7 +174,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
 * CSE can later notice that those loads are all the same and eliminate
 * the redundant ones.
 */
-   fs_reg vec4_offset = vgrf(glsl_type::int_type);
+   fs_reg vec4_offset = vgrf(glsl_type::uint_type);
bld.ADD(vec4_offset, varying_offset, brw_imm_ud(const_offset & ~0xf));
 
int scale = 1;
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 05/15] i965/fs: Stop relying on param_size in assign_constant_locations

2015-12-09 Thread Jason Ekstrand
Now that we have MOV_INDIRECT opcodes, we have all of the size information
we need directly in the opcode.  With a little restructuring of the
algorithm used in assign_constant_locations we don't need param_size
anymore.  The big thing to watch out for now, however, is that you can have
two ranges overlap where neither contains the other.  In order to deal with
this, we make the first pass just flag what needs pulling and defer
assigning pull constant locations until later.
---
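A self-contained sketch, for illustration only, of the two-pass scheme described above. The `uniform_access` and `constant_locations` types and the function name are hypothetical. In the real patch the range length comes from the MOV_INDIRECT's third source (in bytes, divided by 4), and out-of-bounds handling differs from the simple clamp used here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one uniform access: a direct access touches a
// single slot, an indirect one may touch any slot in [first, first + length).
struct uniform_access {
   unsigned first;
   unsigned length;
   bool indirect;
};

struct constant_locations {
   std::vector<int> push; // -1 when the slot is not a push constant
   std::vector<int> pull; // -1 when the slot is not a pull constant
};

static constant_locations
assign_locations(unsigned num_uniforms,
                 const std::vector<uniform_access> &accesses,
                 unsigned max_push_components)
{
   std::vector<bool> is_live(num_uniforms, false);
   std::vector<bool> needs_pull(num_uniforms, false);

   // Pass 1: only flag slots. Overlapping indirect ranges simply set the
   // same flags twice, so no range has to contain the other.
   for (const uniform_access &a : accesses) {
      assert(a.indirect || a.length == 1);
      for (unsigned j = 0; j < a.length; j++) {
         unsigned slot = a.first + j;
         if (slot >= num_uniforms)
            break; // sketch-only clamp for out-of-bounds ranges
         is_live[slot] = true;
         if (a.indirect)
            needs_pull[slot] = true;
      }
   }

   // Pass 2: hand out locations in slot order.
   constant_locations loc;
   loc.push.assign(num_uniforms, -1);
   loc.pull.assign(num_uniforms, -1);
   unsigned num_push = 0, num_pull = 0;
   for (unsigned i = 0; i < num_uniforms; i++) {
      if (!is_live[i])
         continue;
      if (!needs_pull[i] && num_push < max_push_components)
         loc.push[i] = (int)num_push++;
      else
         loc.pull[i] = (int)num_pull++;
   }
   return loc;
}
```

With indirect ranges covering slots 1..3 and 2..5 plus a direct access to slot 0, slot 0 becomes push constant 0 and slots 1..5 become pull constants 0..4, contiguous even though the two ranges only partially overlap.
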
 src/mesa/drivers/dri/i965/brw_fs.cpp | 44 ++--
 1 file changed, 17 insertions(+), 27 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 786c5fb..1add656 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1920,14 +1920,12 @@ fs_visitor::assign_constant_locations()
if (dispatch_width != 8)
   return;
 
-   unsigned int num_pull_constants = 0;
-
-   pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
-   memset(pull_constant_loc, -1, sizeof(pull_constant_loc[0]) * uniforms);
-
bool is_live[uniforms];
memset(is_live, 0, sizeof(is_live));
 
+   bool needs_pull[uniforms];
+   memset(needs_pull, 0, sizeof(needs_pull));
+
/* First, we walk through the instructions and do two things:
 *
 *  1) Figure out which uniforms are live.
@@ -1943,20 +1941,15 @@ fs_visitor::assign_constant_locations()
  if (inst->src[i].file != UNIFORM)
 continue;
 
- if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
-int uniform = inst->src[0].nr;
+ int constant_nr = inst->src[i].nr + inst->src[i].reg_offset;
 
-/* If this array isn't already present in the pull constant buffer,
- * add it.
- */
-if (pull_constant_loc[uniform] == -1) {
-   assert(param_size[uniform]);
-   for (int j = 0; j < param_size[uniform]; j++)
-  pull_constant_loc[uniform + j] = num_pull_constants++;
+ if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
+for (unsigned j = 0; j < inst->src[2].ud / 4; j++) {
+   is_live[constant_nr + j] = true;
+   needs_pull[constant_nr + j] = true;
 }
  } else {
 /* Mark the the one accessed uniform as live */
-int constant_nr = inst->src[i].nr + inst->src[i].reg_offset;
 if (constant_nr >= 0 && constant_nr < (int) uniforms)
is_live[constant_nr] = true;
  }
@@ -1973,26 +1966,23 @@ fs_visitor::assign_constant_locations()
 */
unsigned int max_push_components = 16 * 8;
unsigned int num_push_constants = 0;
+   unsigned int num_pull_constants = 0;
 
push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
+   pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
 
for (unsigned int i = 0; i < uniforms; i++) {
-  if (!is_live[i] || pull_constant_loc[i] != -1) {
- /* This UNIFORM register is either dead, or has already been demoted
-  * to a pull const.  Mark it as no longer living in the param[] array.
-  */
- push_constant_loc[i] = -1;
+  push_constant_loc[i] = -1;
+  pull_constant_loc[i] = -1;
+
+  if (!is_live[i])
  continue;
-  }
 
-  if (num_push_constants < max_push_components) {
- /* Retain as a push constant.  Record the location in the params[]
-  * array.
-  */
+  if (!needs_pull[i] && num_push_constants < max_push_components) {
+ /* Retain as a push constant */
  push_constant_loc[i] = num_push_constants++;
   } else {
- /* Demote to a pull constant. */
- push_constant_loc[i] = -1;
+ /* We have to pull it */
  pull_constant_loc[i] = num_pull_constants++;
   }
}
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 10/15] i965/fs: Add support for doing MOV_INDIRECT on uniforms

2015-12-09 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9b06ed2..de33c1d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -848,7 +848,10 @@ fs_inst::regs_read(int arg) const
  assert(src[2].file == IMM);
  unsigned region_length = src[2].ud;
 
- if (src[0].file == FIXED_GRF) {
+ if (src[0].file == UNIFORM) {
+assert(region_length % 4 == 0);
+return region_length / 4;
+ } else if (src[0].file == FIXED_GRF) {
 /* If the start of the region is not register aligned, then
  * there's some portion of the register that's technically
  * unread at the beginning.
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 09/15] i965/vec4: Get rid of the uniform_size array

2015-12-09 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  8 
 src/mesa/drivers/dri/i965/brw_vec4.h  |  2 --
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp|  9 -
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 11 ---
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |  1 -
 5 files changed, 31 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e4a405b..1304e23 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -466,11 +466,6 @@ vec4_visitor::split_uniform_registers()
 inst->src[i].reg_offset = 0;
   }
}
-
-   /* Update that everything is now vector-sized. */
-   for (int i = 0; i < this->uniforms; i++) {
-  this->uniform_size[i] = 1;
-   }
 }
 
 void
@@ -528,7 +523,6 @@ vec4_visitor::pack_uniform_registers()
 * push constants.
 */
for (int src = 0; src < uniforms; src++) {
-  assert(src < uniform_array_size);
   int size = chans_used[src];
 
   if (size == 0)
@@ -1588,8 +1582,6 @@ vec4_visitor::setup_uniforms(int reg)
 * matter what, or the GPU would hang.
 */
if (devinfo->gen < 6 && this->uniforms == 0) {
-  assert(this->uniforms < this->uniform_array_size);
-
   stage_prog_data->param =
  reralloc(NULL, stage_prog_data->param, const gl_constant_value *, 4);
   for (unsigned int i = 0; i < 4; i++) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index e6d6c82..0dc04ea 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -115,8 +115,6 @@ public:
 */
dst_reg output_reg[BRW_VARYING_SLOT_COUNT];
const char *output_reg_annotation[BRW_VARYING_SLOT_COUNT];
-   int *uniform_size;
-   int uniform_array_size; /*< Size of the uniform_size array */
int uniforms;
 
src_reg shader_start_time;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 58b6612..bafc9a5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -118,15 +118,6 @@ void
 vec4_visitor::nir_setup_uniforms()
 {
uniforms = nir->num_uniforms / 16;
-
-   nir_foreach_variable(var, &nir->uniforms) {
-  /* UBO's and atomics don't take up space in the uniform file */
-  if (var->interface_type != NULL || var->type->contains_atomic())
- continue;
-
-  if (type_size_vec4(var->type) > 0)
- uniform_size[var->data.driver_location / 16] = type_size_vec4(var->type);
-   }
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index e7ab536..138db7e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1786,17 +1786,6 @@ vec4_visitor::vec4_visitor(const struct brw_compiler *compiler,
this->max_grf = devinfo->gen >= 7 ? GEN7_MRF_HACK_START : BRW_MAX_GRF;
 
this->uniforms = 0;
-
-   /* Initialize uniform_array_size to at least 1 because pre-gen6 VS requires
-* at least one. See setup_uniforms() in brw_vec4.cpp.
-*/
-   this->uniform_array_size = 1;
-   if (prog_data) {
-  this->uniform_array_size =
- MAX2(DIV_ROUND_UP(stage_prog_data->nr_params, 4), 1);
-   }
-
-   this->uniform_size = rzalloc_array(mem_ctx, int, this->uniform_array_size);
 }
 
 vec4_visitor::~vec4_visitor()
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
index fd8be7d..205323c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp
@@ -261,7 +261,6 @@ void
 vec4_vs_visitor::setup_uniform_clipplane_values()
 {
for (int i = 0; i < key->nr_userclip_plane_consts; ++i) {
-  assert(this->uniforms < uniform_array_size);
   this->userplane[i] = dst_reg(UNIFORM, this->uniforms);
   this->userplane[i].type = BRW_REGISTER_TYPE_F;
   for (int j = 0; j < 4; ++j) {
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 15/15] i965/fs: Push small uniform arrays

2015-12-09 Thread Jason Ekstrand
Unfortunately, this also means that we need to use a slightly different
algorithm for assign_constant_locations.  The old algorithm worked based on
the assumption that each read of a uniform value read exactly one float.
If it encountered a MOV_INDIRECT, it would immediately bail and push the
whole thing.  Since we can now read ranges using MOV_INDIRECT, we need to
be able to push a series of floats without breaking them up.  To do this,
we use an algorithm similar to the one in split_virtual_grfs.
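
As a rough illustration of the chunking idea described above, here is a
minimal, self-contained sketch.  It is not the code from the patch: the
function name assign_locations() and its signature are made up for this
example, and the rule for when a chunk is pushed (it fits in max_chunk_size
and in the remaining push space) is an assumption about the intent.  Only
the two limits are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PUSH_COMPONENTS (16 * 8) /* limit quoted in the patch */
#define MAX_CHUNK_SIZE      16       /* limit quoted in the patch */

/* Hypothetical helper: give every live uniform slot either a push or a
 * pull location without ever splitting a run of slots that is marked
 * contiguous.  A location of -1 means "not assigned".
 */
static void
assign_locations(unsigned uniforms, const bool *is_live,
                 const bool *contiguous, int *push_loc, int *pull_loc,
                 unsigned *num_push, unsigned *num_pull)
{
   int chunk_start = -1;

   *num_push = 0;
   *num_pull = 0;

   for (unsigned u = 0; u < uniforms; u++) {
      push_loc[u] = -1;
      pull_loc[u] = -1;

      if (!is_live[u])
         continue;

      /* First live slot of a new chunk */
      if (chunk_start < 0)
         chunk_start = (int)u;

      /* Slot u must stay next to slot u + 1, so the chunk keeps growing */
      if (contiguous[u])
         continue;

      /* The chunk is [chunk_start, u]; push or pull it as a whole.
       * Assumed policy: push only if it is small and still fits.
       */
      unsigned chunk_size = u - (unsigned)chunk_start + 1;
      bool push = chunk_size <= MAX_CHUNK_SIZE &&
                  *num_push + chunk_size <= MAX_PUSH_COMPONENTS;

      for (unsigned j = (unsigned)chunk_start; j <= u; j++) {
         if (push)
            push_loc[j] = (int)(*num_push)++;
         else
            pull_loc[j] = (int)(*num_pull)++;
      }

      chunk_start = -1;
   }
}
```

With one scalar, a 4-float array, a 17-float array and another scalar, this
sketch pushes the two scalars and the small array and pulls the 17-float
array as a unit.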
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 76 +---
 1 file changed, 53 insertions(+), 23 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f3cf129..98f8336 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1911,9 +1911,7 @@ fs_visitor::compact_virtual_grfs()
  * maximum number of fragment shader uniform components (64).  If
  * there are too many of these, they'd fill up all of register space.
  * So, this will push some of them out to the pull constant buffer and
- * update the program to load them.  We also use pull constants for all
- * indirect constant loads because we don't support indirect accesses in
- * registers yet.
+ * update the program to load them.
  */
 void
 fs_visitor::assign_constant_locations()
@@ -1925,15 +1923,18 @@ fs_visitor::assign_constant_locations()
bool is_live[uniforms];
memset(is_live, 0, sizeof(is_live));
 
-   bool needs_pull[uniforms];
-   memset(needs_pull, 0, sizeof(is_live));
+   /* For each uniform slot, a value of true indicates that the given slot and
+* the next slot must remain contiguous.  This is used to keep us from
+* splitting arrays apart.
+*/
+   bool contiguous[uniforms];
+   memset(contiguous, 0, sizeof(contiguous));
 
/* First, we walk through the instructions and do two things:
 *
 *  1) Figure out which uniforms are live.
 *
-*  2) Find all indirect access of uniform arrays and flag them as needing
-* to go into the pull constant buffer.
+*  2) Mark any indirectly used ranges of registers as contiguous.
 *
 * Note that we don't move constant-indexed accesses to arrays.  No
 * testing has been done of the performance impact of this choice.
@@ -1946,12 +1947,16 @@ fs_visitor::assign_constant_locations()
  int constant_nr = inst->src[i].nr + inst->src[i].reg_offset;
 
  if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
-for (unsigned j = 0; j < inst->src[2].ud / 4; j++) {
-   is_live[constant_nr + j] = true;
-   needs_pull[constant_nr + j] = true;
+assert(inst->src[2].ud % 4 == 0);
+unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
+assert(last < uniforms);
+
+for (unsigned j = constant_nr; j < last; j++) {
+   is_live[j] = true;
+   contiguous[j] = true;
 }
+is_live[last] = true;
  } else {
-/* Mark the the one accessed uniform as live */
 if (constant_nr >= 0 && constant_nr < (int) uniforms)
is_live[constant_nr] = true;
  }
@@ -1966,26 +1971,49 @@ fs_visitor::assign_constant_locations()
 * If changing this value, note the limitation about total_regs in
 * brw_curbe.c.
 */
-   unsigned int max_push_components = 16 * 8;
+   const unsigned int max_push_components = 16 * 8;
+
+   /* We push small arrays, but no bigger than 16 floats.  This is big enough
+* for a vec4 but hopefully not large enough to push out other stuff.  We
+* should probably use a better heuristic at some point.
+*/
+   const unsigned int max_chunk_size = 16;
+
unsigned int num_push_constants = 0;
unsigned int num_pull_constants = 0;
 
push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
 
-   for (unsigned int i = 0; i < uniforms; i++) {
-  push_constant_loc[i] = -1;
-  pull_constant_loc[i] = -1;
+   int chunk_start = -1;
+   for (unsigned u = 0; u < uniforms; u++) {
+  push_constant_loc[u] = -1;
+  pull_constant_loc[u] = -1;
 
-  if (!is_live[i])
+  if (!is_live[u])
  continue;
 
-  if (!needs_pull[i] && num_push_constants < max_push_components) {
- /* Retain as a push constant */
- push_constant_loc[i] = num_push_constants++;
-  } else {
- /* We have to pull it */
- pull_constant_loc[i] = num_pull_constants++;
+  /* This is the first live uniform in the chunk */
+  if (chunk_start < 0)
+ chunk_start = u;
+
+  /* If this element does not need to be contiguous with the next, we
+   * split at this point and everything between chunk_start and u forms a
+   * single chunk.
+   */
+  if (!contiguous[u]) {
+ unsigned chunk_size = u - chunk_start + 1;
+
+ if 

[Mesa-dev] [PATCH 12/15] i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr

2015-12-09 Thread Jason Ekstrand
The subnr field is in bytes so we don't need to multiply by type_sz.
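
To make the effect of the unit mix-up concrete, here is a small sketch of
the arithmetic.  The helper names are hypothetical, and REG_SIZE is assumed
to be 32 bytes for this example; only the two expressions mirror the lines
changed by the patch.

```c
#include <assert.h>

#define REG_SIZE 32 /* bytes per register; assumed for this sketch */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Fixed computation: subnr is already a byte offset. */
static unsigned
regs_touched(unsigned region_length_bytes, unsigned subnr_bytes)
{
   return DIV_ROUND_UP(region_length_bytes + subnr_bytes, REG_SIZE);
}

/* Old computation: scales the byte offset by the type size again. */
static unsigned
regs_touched_old(unsigned region_length_bytes, unsigned subnr_bytes,
                 unsigned type_size)
{
   return DIV_ROUND_UP(region_length_bytes + subnr_bytes * type_size,
                       REG_SIZE);
}
```

For a 32-byte region starting 16 bytes into a register, the fixed version
reports 2 registers while the old one reports 3 for a 4-byte type.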
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index de33c1d..9eaf8d0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -865,7 +865,7 @@ fs_inst::regs_read(int arg) const
  * unread portion at the beginning.
  */
 if (src[0].subnr)
-   region_length += src[0].subnr * type_sz(src[0].type);
+   region_length += src[0].subnr;
 
 return DIV_ROUND_UP(region_length, REG_SIZE);
  } else {
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 04/15] i965/fs: Get rid of reladdr

2015-12-09 Thread Jason Ekstrand
We aren't using it anymore.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 7 +--
 src/mesa/drivers/dri/i965/brw_ir_fs.h | 5 +
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 7cc03c5..786c5fb 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -433,7 +433,6 @@ fs_reg::fs_reg(struct ::brw_reg reg) :
 {
this->reg_offset = 0;
this->subreg_offset = 0;
-   this->reladdr = NULL;
this->stride = 1;
if (this->file == IMM &&
(this->type != BRW_REGISTER_TYPE_V &&
@@ -448,7 +447,6 @@ fs_reg::equals(const fs_reg &r) const
 {
return (this->backend_reg::equals(r) &&
subreg_offset == r.subreg_offset &&
-   !reladdr && !r.reladdr &&
stride == r.stride);
 }
 
@@ -4716,9 +4714,7 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
  break;
   case UNIFORM:
  fprintf(file, "u%d", inst->src[i].nr + inst->src[i].reg_offset);
- if (inst->src[i].reladdr) {
-fprintf(file, "+reladdr");
- } else if (inst->src[i].subreg_offset) {
+ if (inst->src[i].subreg_offset) {
 fprintf(file, "+%d.%d", inst->src[i].reg_offset,
 inst->src[i].subreg_offset);
  }
@@ -4829,7 +4825,6 @@ fs_visitor::get_instruction_generating_reg(fs_inst *start,
 {
if (end == start ||
end->is_partial_write() ||
-   reg.reladdr ||
!reg.equals(end->dst)) {
   return NULL;
} else {
diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h 
b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index c3eec2e..e4f20f4 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -58,8 +58,6 @@ public:
 */
int subreg_offset;
 
-   fs_reg *reladdr;
-
/** Register region horizontal stride */
uint8_t stride;
 };
@@ -136,8 +134,7 @@ component(fs_reg reg, unsigned idx)
 static inline bool
 is_uniform(const fs_reg &reg)
 {
-   return (reg.stride == 0 || reg.is_null()) &&
-  (!reg.reladdr || is_uniform(*reg.reladdr));
+   return (reg.stride == 0 || reg.is_null());
 }
 
 /**
-- 
2.5.0.400.gff86faf



[Mesa-dev] [RFC] i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT

2015-12-09 Thread Jason Ekstrand
This is an initial implementation of the MOV_INDIRECT opcode in the vec4
backend.  Unfortunately, I haven't had a chance to test it in the wild yet,
but I think review would still be good.  In particular, I'd like feedback on
the approach I took to handling swizzles.

Unfortunately, the only indirect MOV instructions you can use in align16
mode have a uniform indirect.  This means that, in order to do an indirect
MOV, we need to either do two movs or use align1 mode.  The problem with
two MOVs is that, in order to force first/second half, you have to disable
writemasking so you might as well be in align1.  In align1 mode, we have
two options for indirects:  We could use height of 2 and make the hardware
grab two sets of 4 consecutive dwords for us or we could use a UV
immediate to add 0, 4, 8, and 12 to the four channels.  The second method
only works easily on SNB+ because we don't have UV immediates on ILK and
earlier.  However, the first method (use a height of 2) may have
interesting hardware implications if any of those sets of 4 dwords ever
crosses a register boundary.  I didn't want to count on software always
giving us vec4-aligned offsets, so I decided on using the UV immediate.

Ok, so swizzles.  One of the other advantages of adding a UV immediate is
that we can make the immediate be whatever we want.  It doesn't have to be
(0, 4, 8, 12); it could be anything.  So, I used it to implement swizzling.
Do we want swizzling?  Does it make sense to load a swizzled value starting
at an arbitrary offset?  Does it make sense to support swizzling but not
writemasking?  I don't know.  Thoughts?
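
To illustrate the UV-immediate idea from the paragraphs above, here is a
small CPU-side model of the per-channel address computation.  It is only a
sketch of the arithmetic, not generator code: channel_addresses() and
uv_for_swizzle() are hypothetical names, and treating a swizzle as "dword
index times 4" is an assumption about how swizzling could be layered on top.

```c
#include <assert.h>

/* Each of the four channels reads from
 *    indirect + imm_byte_offset + uv[c]
 * where uv[] plays the role of the UV immediate.
 */
static void
channel_addresses(unsigned indirect_bytes, unsigned imm_byte_offset,
                  const unsigned uv[4], unsigned addr[4])
{
   for (int c = 0; c < 4; c++)
      addr[c] = indirect_bytes + imm_byte_offset + uv[c];
}

/* Build the per-channel byte offsets for a swizzle.  Each swizzle
 * component is a dword index in 0..3, so the identity swizzle gives
 * (0, 4, 8, 12).
 */
static void
uv_for_swizzle(const unsigned swz[4], unsigned uv[4])
{
   for (int c = 0; c < 4; c++)
      uv[c] = swz[c] * 4;
}
```

With the identity swizzle the four channels read four consecutive dwords;
any other swizzle simply permutes or repeats the byte offsets.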

---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 45 
 1 file changed, 45 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index c3426dd..71a7f63 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1052,6 +1052,48 @@ generate_set_simd4x2_header_gen9(struct brw_codegen *p,
 }
 
 static void
+generate_mov_indirect(struct brw_codegen *p,
+  vec4_instruction *inst,
+  struct brw_reg dst, struct brw_reg reg,
+  struct brw_reg indirect, struct brw_reg length)
+{
+   assert(indirect.type == BRW_REGISTER_TYPE_UD);
+
+   unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr * (REG_SIZE / 2);
+
+   /* This instruction acts in align1 mode */
+   assert(inst->force_writemask_all || reg.writemask == 0xf);
+
+   brw_push_insn_state(p);
+   brw_set_default_access_mode(p, BRW_ALIGN_1);
+   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
+
+   struct brw_reg addr = vec2(brw_address_reg(0));
+
+   /* We need to move the indirect value into the address register.  In order
+* to make things make some sense, we want to respect at least the X
+* component of the swizzle.  In order to do that, we need to convert the
+* subnr (probably 0) to an align1 subnr and add in the swizzle.  We then
+* use a region of <8,4,0>:uw to pick off the first 2 bytes of the indirect
+* and splat it out to all four channels of the given half of a0.
+*/
+   assert(brw_is_single_value_swizzle(indirect.swizzle));
+   indirect.subnr = (indirect.subnr * 4 + BRW_GET_SWZ(indirect.swizzle, 0)) * 2;
+   indirect = stride(retype(indirect, BRW_REGISTER_TYPE_UW), 8, 4, 0);
+
+   brw_ADD(p, addr, indirect, brw_imm_uw(imm_byte_offset));
+
+   /* Use a <4,1> region Vx1 region*/
+   struct brw_reg src = brw_VxH_indirect(0, 0);
+   src.width = BRW_WIDTH_4;
+   src.hstride = BRW_HORIZONTAL_STRIDE_1;
+
+   brw_MOV(p, dst, retype(src, reg.type));
+
+   brw_pop_insn_state(p);
+}
+
+static void
 generate_code(struct brw_codegen *p,
   const struct brw_compiler *compiler,
   void *log_data,
@@ -1538,6 +1580,10 @@ generate_code(struct brw_codegen *p,
  break;
   }
 
+  case SHADER_OPCODE_MOV_INDIRECT:
+ generate_mov_indirect(p, inst, dst, src[0], src[1], src[2]);
+ break;
+
   default:
  unreachable("Unsupported opcode");
   }
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH 1/4] winsys/radeon: clear the buffer cache on allocation failure and try again

2015-12-09 Thread Michel Dänzer
On 10.12.2015 06:58, Marek Olšák wrote:
> From: Marek Olšák 

The series is

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH v3 05/12] nir/lower_io: Get rid of load/store_foo_indirect

2015-12-09 Thread Jason Ekstrand
On Tue, Dec 8, 2015 at 6:25 PM, Kenneth Graunke  wrote:
> On Tuesday, December 08, 2015 01:46:22 PM Jason Ekstrand wrote:
>> diff --git a/src/glsl/nir/nir_lower_io.c b/src/glsl/nir/nir_lower_io.c
>> index f64ac69..a2723d5 100644
>> --- a/src/glsl/nir/nir_lower_io.c
>> +++ b/src/glsl/nir/nir_lower_io.c
>> @@ -333,18 +311,18 @@ nir_lower_io(nir_shader *shader, nir_variable_mode 
>> mode,
>>   * Return the indirect source for a load/store indirect intrinsic.
>
> This comment could use updating.  How about:
>
> /**
>  * Return the offset source for a load/store intrinsic.
>  */

Good call.  Fixed locally.
--Jason

>>   */
>>  nir_src *
>> -nir_get_io_indirect_src(nir_intrinsic_instr *instr)
>> +nir_get_io_offset_src(nir_intrinsic_instr *instr)
>>  {
>> switch (instr->intrinsic) {
>> -   case nir_intrinsic_load_input_indirect:
>> -   case nir_intrinsic_load_output_indirect:
>> -   case nir_intrinsic_load_uniform_indirect:
>> +   case nir_intrinsic_load_input:
>> +   case nir_intrinsic_load_output:
>> +   case nir_intrinsic_load_uniform:
>>return &instr->src[0];
>> -   case nir_intrinsic_load_per_vertex_input_indirect:
>> -   case nir_intrinsic_load_per_vertex_output_indirect:
>> -   case nir_intrinsic_store_output_indirect:
>> +   case nir_intrinsic_load_per_vertex_input:
>> +   case nir_intrinsic_load_per_vertex_output:
>> +   case nir_intrinsic_store_output:
>>return &instr->src[1];
>> -   case nir_intrinsic_store_per_vertex_output_indirect:
>> +   case nir_intrinsic_store_per_vertex_output:
>>return &instr->src[2];
>> default:
>>return NULL;


Re: [Mesa-dev] [PATCH] mesa/shader: return correct attribute location for double matrix arrays

2015-12-09 Thread Timothy Arceri
On Thu, 2015-12-10 at 13:41 +1000, Dave Airlie wrote:
> From: Dave Airlie 
> 
> If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
> dmat2[2] at 21 etc. The old code was returning 17,18,19.
> 
> I think this code is also wrong for float matrices as well.

Would be good to have a piglit for the float case.

Reviewed-by: Timothy Arceri 

> 
> This partly fixes:
> GL41-CTS.vertex_attrib_64bit.limits_test
> ---
>  src/mesa/main/shader_query.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/shader_query.cpp
> b/src/mesa/main/shader_query.cpp
> index 5d15006..faaf08c 100644
> --- a/src/mesa/main/shader_query.cpp
> +++ b/src/mesa/main/shader_query.cpp
> @@ -858,7 +858,7 @@ program_resource_location(struct
> gl_shader_program *shProg,
>&& array_index >= RESOURCE_VAR(res)->type->length) {
>   return -1;
>}
> -  return RESOURCE_VAR(res)->data.location + array_index -
> VERT_ATTRIB_GENERIC0;
> +  return RESOURCE_VAR(res)->data.location + (array_index *
> RESOURCE_VAR(res)->type->without_array()->matrix_columns) -
> VERT_ATTRIB_GENERIC0;
> case GL_PROGRAM_OUTPUT:
>/* If the output is an array, fail if the index is out of
> bounds. */
>if (array_index > 0


Re: [Mesa-dev] [PATCH] mesa/shader: return correct attribute location for double matrix arrays

2015-12-09 Thread Tapani Pälli


On 12/10/2015 05:41 AM, Dave Airlie wrote:

From: Dave Airlie 

If we have a dmat2[4], then dmat2[0] is at 17, dmat2[1] at 19,
dmat2[2] at 21 etc. The old code was returning 17,18,19.

I think this code is also wrong for float matrices as well.

This partly fixes:
GL41-CTS.vertex_attrib_64bit.limits_test
---
  src/mesa/main/shader_query.cpp | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/shader_query.cpp b/src/mesa/main/shader_query.cpp
index 5d15006..faaf08c 100644
--- a/src/mesa/main/shader_query.cpp
+++ b/src/mesa/main/shader_query.cpp
@@ -858,7 +858,7 @@ program_resource_location(struct gl_shader_program *shProg,
&& array_index >= RESOURCE_VAR(res)->type->length) {
   return -1;
}
-  return RESOURCE_VAR(res)->data.location + array_index - 
VERT_ATTRIB_GENERIC0;
+  return RESOURCE_VAR(res)->data.location + (array_index * 
RESOURCE_VAR(res)->type->without_array()->matrix_columns) - VERT_ATTRIB_GENERIC0;


There are some lines that exceed 80 chars in this file but this seems 
way too long (?) You could put this case in braces and have a temporary 
for glsl_type or maybe matrix cols to make it fit. Otherwise seems 
correct to me. Looks like this has been broken for quite a long time, 
even before program interface query was introduced.
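
For reference, the indexing rule from the commit message can be sketched in
isolation like this.  array_element_slot() is a hypothetical helper, not a
Mesa function, and it leaves out the VERT_ATTRIB_GENERIC0 adjustment that
the real code applies; it only shows why the array index has to be scaled
by the number of matrix columns.

```c
#include <assert.h>

/* Slot of element array_index of a matrix array whose first element
 * starts at base_slot, assuming each matrix column takes one slot.
 */
static int
array_element_slot(int base_slot, unsigned array_index,
                   unsigned matrix_columns)
{
   return base_slot + (int)(array_index * matrix_columns);
}
```

With base_slot 17 and two columns this yields 17, 19, 21 for indices 0, 1,
2, matching the dmat2[4] example; with the scaling left out (columns = 1)
it yields the old 17, 18, 19.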


Reviewed-by: Tapani Pälli 


 case GL_PROGRAM_OUTPUT:
/* If the output is an array, fail if the index is out of bounds. */
if (array_index > 0



// Tapani


[Mesa-dev] [PATCH 00/15] i965: Rework uniform handling in the back-end

2015-12-09 Thread Jason Ekstrand
This series is kind of two series in one.  First, we rework the way we do
uniform handling all throughout both backends.  In particular, we stop
using reladdr and, instead, start using MOV_INDIRECT.  Doing this allows us
some nice simplifications:  First, we no longer have the recursive reladdr
problem; it's gone by design.  Second, we can now get rid of the extra
arrays of uniform sizes that we've been carrying around everywhere.  The
fact that they exist has bothered me for a while, but no more!

The second half of the series does a bunch of cleanups on the MOV_INDIRECT
instruction in the FS and then starts using it to do indirect push
constants.

Jason Ekstrand (15):
  nir: Add another index to load_uniform to specify the range read
  i965/fs: Use UD type for offsets in VARYING_PULL_CONSTANT_LOAD
  i965/fs: Use MOV_INDIRECT for all indirect uniform loads
  i965/fs: Get rid of reladdr
  i965/fs: Stop relying on param_size in assign_constant_locations
  i965/fs: Get rid of the param_size array
  i965/vec4: Inline get_pull_constant_offset
  i965/vec4: Use MOV_INDIRECT instead of reladdr for indirect push
constants
  i965/vec4: Get rid of the uniform_size array
  i965/fs: Add support for doing MOV_INDIRECT on uniforms
  i965/fs: Don't force MASK_DISABLE on INDIRECT_MOV instructions
  i965/fs: Fix regs_read() for MOV_INDIRECT with a non-zero subnr
  i965/fs: Add support for MOV_INDIRECT on pre-Broadwell hardware
  i965/fs: Rename demote_pull_constants to lower_constant_loads
  i965/fs: Push small uniform arrays

 src/glsl/nir/nir_intrinsics.h |   7 +-
 src/glsl/nir/nir_lower_io.c   |   5 +
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 189 +-
 src/mesa/drivers/dri/i965/brw_fs.h|   4 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp|  52 --
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  |  62 ---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp  |   3 -
 src/mesa/drivers/dri/i965/brw_ir_fs.h |   5 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  10 +-
 src/mesa/drivers/dri/i965/brw_vec4.h  |   7 +-
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp|  19 +--
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 130 ++-
 src/mesa/drivers/dri/i965/brw_vec4_vs_visitor.cpp |   1 -
 13 files changed, 267 insertions(+), 227 deletions(-)

-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [RFC] i965/vec4: Add support for SHADER_OPCODE_MOV_INDIRECT

2015-12-09 Thread Jason Ekstrand
On Wed, Dec 9, 2015 at 9:36 PM, Jason Ekstrand  wrote:
> This is an initial implementation of the MOV_INDIRECT opcode in the vec4
> backend.  Unfortunately, I haven't had a chance to test it in the wild yet,
> but I think review would still be good.  In particular, the approach I took
> to handling swizzles.
>
> Unfortunately, the only indirect MOV instructions you can use in align16
> mode have a uniform indirect.  This means that, in order to do an indirect
> MOV, we need to either do two movs or use align1 mode.  The problem with
> two MOVs is that, in order to force first/second half, you have to disable
> writemasking so you might as well be in align1.  In align1 mode, we have
> two options for indirects:  We could use height of 2 and make the hardware
> grab two sets of 4 consecutive dwords for us or we could use a UV
> immediate to add 0, 4, 8, and 12 to the four channels.  The second method
> only works easily on SNB+ because we don't have UV immediates on ILK and
> previous.  However, the first method (use a height of 2) may have
> interesting hardware implications if any of those sets of 4 dwords ever
> crosses a register boundary.  I didn't want to count software always giving
> us vec4-aligned offsets, so I decided on using the UV immediate.
>
> Ok, so swizzles.  One of the other advantages of adding a UV immediate is
> that we can make the immediate be whatever we want.  It doesn't have to be
> (0, 4, 8, 12); it could be anything.  So, I used it to implement swizzling.

Correction: This patch doesn't support swizzling.  But we could if we
wanted.  Do we want?

> Do we want swizzling?  Does it make sense to load a swizzled value starting
> at an arbitrary offset?  Does it make sense to support swizzling but not
> writemasking?  I don't know.  Thoughts?
>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 45 
> 
>  1 file changed, 45 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index c3426dd..71a7f63 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -1052,6 +1052,48 @@ generate_set_simd4x2_header_gen9(struct brw_codegen *p,
>  }
>
>  static void
> +generate_mov_indirect(struct brw_codegen *p,
> +  vec4_instruction *inst,
> +  struct brw_reg dst, struct brw_reg reg,
> +  struct brw_reg indirect, struct brw_reg length)
> +{
> +   assert(indirect.type == BRW_REGISTER_TYPE_UD);
> +
> +   unsigned imm_byte_offset = reg.nr * REG_SIZE + reg.subnr * (REG_SIZE / 2);
> +
> +   /* This instruction acts in align1 mode */
> +   assert(inst->force_writemask_all || reg.writemask == 0xf);
> +
> +   brw_push_insn_state(p);
> +   brw_set_default_access_mode(p, BRW_ALIGN_1);
> +   brw_set_default_mask_control(p, BRW_MASK_DISABLE);
> +
> +   struct brw_reg addr = vec2(brw_address_reg(0));
> +
> +   /* We need to move the indirect value into the address register.  In order
> +* to make things make some sense, we want to respect at least the X
> +* component of the swizzle.  In order to do that, we need to convert the
> +* subnr (probably 0) to an align1 subnr and add in the swizzle.  We then
> +* use a region of <8,4,0>:uw to pick off the first 2 bytes of the 
> indirect
> +* and splat it out to all four channels of the given half of a0.
> +*/
> +   assert(brw_is_single_value_swizzle(indirect.swizzle));
> +   indirect.subnr = (indirect.subnr * 4 + BRW_GET_SWZ(indirect.swizzle, 0)) 
> * 2;
> +   indirect = stride(retype(indirect, BRW_REGISTER_TYPE_UW), 8, 4, 0);
> +
> +   brw_ADD(p, addr, indirect, brw_imm_uw(imm_byte_offset));
> +
> +   /* Use a <4,1> region Vx1 region*/
> +   struct brw_reg src = brw_VxH_indirect(0, 0);
> +   src.width = BRW_WIDTH_4;
> +   src.hstride = BRW_HORIZONTAL_STRIDE_1;
> +
> +   brw_MOV(p, dst, retype(src, reg.type));
> +
> +   brw_pop_insn_state(p);
> +}
> +
> +static void
>  generate_code(struct brw_codegen *p,
>const struct brw_compiler *compiler,
>void *log_data,
> @@ -1538,6 +1580,9 @@ generate_code(struct brw_codegen *p,
>   break;
>}
>
> +  case SHADER_OPCODE_MOV_INDIRECT:
> + generate_mov_indirect(p, inst, dst, src[0], src[1], src[2]);
> +
>default:
>   unreachable("Unsupported opcode");
>}
> --
> 2.5.0.400.gff86faf
>


Re: [Mesa-dev] [PATCH] radeonsi: also print hexadecimal values for register fields in the IB parser

2015-12-09 Thread Michel Dänzer
On 10.12.2015 07:54, Marek Olšák wrote:
> From: Marek Olšák 
> 
> ---
>  src/gallium/drivers/radeonsi/si_debug.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
> b/src/gallium/drivers/radeonsi/si_debug.c
> index cce665e..034acf5 100644
> --- a/src/gallium/drivers/radeonsi/si_debug.c
> +++ b/src/gallium/drivers/radeonsi/si_debug.c
> @@ -61,13 +61,16 @@ static void print_spaces(FILE *f, unsigned num)
>  static void print_value(FILE *file, uint32_t value, int bits)
>  {
>   /* Guess if it's int or float */
> - if (value <= (1 << 15))
> - fprintf(file, "%u\n", value);
> - else {
> + if (value <= (1 << 15)) {
> + if (value <= 9)
> + fprintf(file, "%u\n", value);
> + else
> + fprintf(file, "%u (0x%0*x)\n", value, bits / 4, value);
> + } else {
>   float f = uif(value);
>  
>   if (fabs(f) < 10 && f*10 == floor(f*10))
> - fprintf(file, "%.1ff\n", f);
> + fprintf(file, "%.1ff (0x%0*x)\n", f, bits / 4, value);
>   else
>   /* Don't print more leading zeros than there are bits. 
> */
>   fprintf(file, "0x%0*x\n", bits / 4, value);
> 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[Mesa-dev] [PATCH] draw: fix clipping with linear interpolated values and gl_ClipVertex

2015-12-09 Thread sroland
From: Roland Scheidegger 

Discovered this when working on other clip code, apparently didn't work
correctly - the combination of linear interpolated values and using
gl_ClipVertex produced wrong values (failing all such combinations
in piglits glsl-1.30 interpolation tests).
Use the pre-clip-pos values when determining the interpolation factor to
fix this.
Unfortunately I have no idea what I'm doing here really, but it fixes all
these failures in piglit (all interpolation-noperspective-XXX-vertex, 10
tests in total). Albeit piglit coverage of clipping isn't great, so hopefully
someone can confirm this actually makes sense, and wouldn't cause failures
elsewhere...
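
As a standalone illustration of the computation being changed, here is a
sketch of the linear (noperspective) interpolation factor.  The function
nopersp_factor() and its plain float[4] parameters are made up for this
example; in the patch the inputs are the pre_clip_pos values of the in, out
and dst vertices.

```c
#include <assert.h>

/* Compute the screen-space interpolation factor from three positions
 * (x, y, z, w).  Falls back to the perspective factor t if in and out
 * agree in both x and y.
 */
static float
nopersp_factor(const float in_pos[4], const float out_pos[4],
               const float dst_pos[4], float t)
{
   float t_nopersp = t;

   /* find either in.x != out.x or in.y != out.y */
   for (int k = 0; k < 2; k++) {
      if (in_pos[k] != out_pos[k]) {
         /* do divide by W, then compute linear interpolation factor */
         float in_coord = in_pos[k] / in_pos[3];
         float out_coord = out_pos[k] / out_pos[3];
         float dst_coord = dst_pos[k] / dst_pos[3];

         t_nopersp = (dst_coord - out_coord) / (in_coord - out_coord);
         break;
      }
   }

   return t_nopersp;
}
```

For in = (0, 0, 0, 1), out = (4, 0, 0, 2) and a clipped vertex halfway
between them in clip space, dst = (2, 0, 0, 1.5), the factor comes out as
about 1/3 rather than 1/2.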
---
 src/gallium/auxiliary/draw/draw_pipe_clip.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_pipe_clip.c 
b/src/gallium/auxiliary/draw/draw_pipe_clip.c
index f2b56b0..7f22eef 100644
--- a/src/gallium/auxiliary/draw/draw_pipe_clip.c
+++ b/src/gallium/auxiliary/draw/draw_pipe_clip.c
@@ -192,11 +192,11 @@ static void interp(const struct clip_stage *clip,
   t_nopersp = t;
   /* find either in.x != out.x or in.y != out.y */
   for (k = 0; k < 2; k++) {
- if (in->clip[k] != out->clip[k]) {
+ if (in->pre_clip_pos[k] != out->pre_clip_pos[k]) {
 /* do divide by W, then compute linear interpolation factor */
-float in_coord = in->clip[k] / in->clip[3];
-float out_coord = out->clip[k] / out->clip[3];
-float dst_coord = dst->clip[k] / dst->clip[3];
+float in_coord = in->pre_clip_pos[k] / in->pre_clip_pos[3];
+float out_coord = out->pre_clip_pos[k] / out->pre_clip_pos[3];
+float dst_coord = dst->pre_clip_pos[k] / dst->pre_clip_pos[3];
 t_nopersp = (dst_coord - out_coord) / (in_coord - out_coord);
 break;
  }
-- 
2.1.4



Re: [Mesa-dev] [RFC PATCH 5/5] i965: Skip execution size adjustment for instructions of width 4

2015-12-09 Thread Iago Toral
On Wed, 2015-12-09 at 08:10 -0800, Jason Ekstrand wrote:
> 
> On Dec 9, 2015 4:16 AM, "Iago Toral Quiroga" 
> wrote:
> >
> > This code in brw_set_dest adjusts the execution size of any
> instruction
> > with a dst.width < 8. However, we don't want to do this with
> instructions
> > operating on doubles, since these will have a width of 4, but still
> > need an execution size of 8 (for SIMD8). Unfortunately, we can't
> just check
> > the size of the operands involved to detect if we are doing an
> operation on
> > doubles, because we can have instructions that do operations on
> double
> > operands interpreted as UD, operating on any of its 2 32-bit
> components.
> >
> > Previous commits have made it so we never emit instructions with a
> horizontal
> > width of 4 that don't have the correct execution size set for
> gen7/gen8, so
> > we can skip it in this case, avoiding the conflicts with fp64
> requirements.
> >
> > Expanding the same fix to other hardware generations requires many
> more
> > changes, but since we are not targeting fp64 support on them
> > we don't really care for now.
> > ---
> >  src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > index 78f2c8c..50a8771 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > @@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst
> *inst, struct brw_reg dest)
> > /* Generators should set a default exec_size of either 8
> (SIMD4x2 or SIMD8)
> >  * or 16 (SIMD16), as that's normally correct.  However, when
> dealing with
> >  * small registers, we automatically reduce it to match the
> register size.
> > +*
> > +* In platforms that support fp64 we can emit instructions with
> a width of
> > +* 4 that need two SIMD8 registers and an exec_size of 8 or 16.
> In these
> > +* cases we need to make sure that these instructions have their
> exec sizes
> > +* set properly when they are emitted and we can't rely on this
> code to fix
> > +* it.
> >  */
> > -   if (dest.width < BRW_EXECUTE_8)
> > +   bool fix_exec_size;
> > +   if (devinfo->gen == 7 || devinfo->gen == 8)
> 
> If we're going to take this approach, we definitely want to make it
> gen > 6 or something so we include future gens.  Really gen > 4 is
> probably doable since the only real problem is the legacy clipping
> code.

Strips and fans are also a problem, but it is certainly doable if we want
to do it.

Iago


> > +  fix_exec_size = dest.width < BRW_EXECUTE_4;
> > +   else
> > +  fix_exec_size = dest.width < BRW_EXECUTE_8;
> > +
> > +   if (fix_exec_size)
> >brw_inst_set_exec_size(devinfo, inst, dest.width);
> >  }
> >
> > --
> > 2.1.4
> >
> 
> 




Re: [Mesa-dev] [RFC PATCH 5/5] i965: Skip execution size adjustment for instructions of width 4

2015-12-09 Thread Jason Ekstrand
On Dec 9, 2015 11:47 PM, "Iago Toral"  wrote:
>
> On Wed, 2015-12-09 at 08:10 -0800, Jason Ekstrand wrote:
> >
> > On Dec 9, 2015 4:16 AM, "Iago Toral Quiroga" 
> > wrote:
> > >
> > > This code in brw_set_dest adjusts the execution size of any
> > instruction
> > > with a dst.width < 8. However, we don't want to do this with
> > instructions
> > > operating on doubles, since these will have a width of 4, but still
> > > need an execution size of 8 (for SIMD8). Unfortunately, we can't
> > just check
> > > the size of the operands involved to detect if we are doing an
> > operation on
> > > doubles, because we can have instructions that do operations on
> > double
> > > operands interpreted as UD, operating on any of its 2 32-bit
> > components.
> > >
> > > Previous commits have made it so we never emit instructions with a
> > horizontal
> > > width of 4 that don't have the correct execution size set for
> > gen7/gen8, so
> > > we can skip it in this case, avoiding the conflicts with fp64
> > requirements.
> > >
> > > Expanding the same fix to other hardware generations requires many
> > more
> > > changes, but since we are not targeting fp64 support on them
> > > we don't really care for now.
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_eu_emit.c | 14 +-
> > >  1 file changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > > index 78f2c8c..50a8771 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > > @@ -202,8 +202,20 @@ brw_set_dest(struct brw_codegen *p, brw_inst
> > *inst, struct brw_reg dest)
> > > /* Generators should set a default exec_size of either 8
> > (SIMD4x2 or SIMD8)
> > >  * or 16 (SIMD16), as that's normally correct.  However, when
> > dealing with
> > >  * small registers, we automatically reduce it to match the
> > register size.
> > > +*
> > > +* In platforms that support fp64 we can emit instructions with
> > a width of
> > > +* 4 that need two SIMD8 registers and an exec_size of 8 or 16.
> > In these
> > > +* cases we need to make sure that these instructions have their
> > exec sizes
> > > +* set properly when they are emitted and we can't rely on this
> > code to fix
> > > +* it.
> > >  */
> > > -   if (dest.width < BRW_EXECUTE_8)
> > > +   bool fix_exec_size;
> > > +   if (devinfo->gen == 7 || devinfo->gen == 8)
> >
> > If we're going to take this approach, we definitely want to make it
> > gen > 6 or something so we include future gens.  Really gen > 4 is
> > probably doable since the only real problem is the legacy clipping
> > code.
>
> Strips and fans are also a problem, but it is certainly doable if we want
> to do it.

Yeah, my primary point is that we should make it as little of an edge-case
as possible.  We could go back to at least gen6 and we should go forward.
That said, it'll take a little testing from the Intel side.

> Iago
>
>
> > > +  fix_exec_size = dest.width < BRW_EXECUTE_4;
> > > +   else
> > > +  fix_exec_size = dest.width < BRW_EXECUTE_8;
> > > +
> > > +   if (fix_exec_size)
> > >brw_inst_set_exec_size(devinfo, inst, dest.width);
> > >  }
> > >
> > > --
> > > 2.1.4
> > >
> >
> >
>
>
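
To make the suggestion concrete, here is a rough standalone sketch of the
decision with the generation cutoff pulled out as a parameter instead of the
hard-coded gen == 7 || gen == 8 test. Everything here is an assumption for
illustration: example_should_fix_exec_size and min_gen are hypothetical, and
the EXAMPLE_EXECUTE_* values are local stand-ins for the BRW_EXECUTE_*
encodings, not taken from the headers.

```c
#include <stdbool.h>

/* Local stand-ins for the execution size encodings (log2 of the size). */
enum example_exec_size {
   EXAMPLE_EXECUTE_1 = 0,
   EXAMPLE_EXECUTE_2 = 1,
   EXAMPLE_EXECUTE_4 = 2,
   EXAMPLE_EXECUTE_8 = 3,
   EXAMPLE_EXECUTE_16 = 4,
};

/*
 * Decide whether brw_set_dest-style code should shrink the exec size to
 * match the destination width.
 *
 * On generations at or above min_gen, instructions of width 4 are assumed
 * to already carry the right exec size (the fp64 case), so only widths
 * below 4 are adjusted. Older generations keep the original behavior and
 * adjust anything below 8.
 */
static bool
example_should_fix_exec_size(int gen, int min_gen, unsigned dest_width)
{
   if (gen >= min_gen)
      return dest_width < EXAMPLE_EXECUTE_4;

   return dest_width < EXAMPLE_EXECUTE_8;
}
```

Using an open-ended gen >= min_gen comparison means future generations get
the new behavior automatically; how far back min_gen can go (6, or even 5)
is exactly the part that needs testing.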


Re: [Mesa-dev] [PATCH 3/5] i965: Check base format to determine whether to use tiled memcpy

2015-12-09 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Thu, Nov 19, 2015 at 7:25 AM, Neil Roberts  wrote:
> The tiled memcpy doesn't work for copying from RGBX to RGBA because it
> doesn't override the alpha component to 1.0. Commit 2cebaac479d4 added
> a check to disable it for RGBX formats by looking at the TexFormat.
> However a lot of the rest of the code base is written with the
> assumption that an RGBA texture can be used internally to implement a
> GL_RGB texture. If that is done then this check breaks. This patch
> makes it instead check the base format of the texture which I think
> more directly matches the intention.
>
> Cc: Jason Ekstrand 
> ---
>  src/mesa/drivers/dri/i965/intel_pixel_read.c | 7 ---
>  src/mesa/drivers/dri/i965/intel_tex_image.c  | 7 ---
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> index 9bcbbd1..c8aef65 100644
> --- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
> +++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
> @@ -135,10 +135,11 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
>return false;
>
> /* We can't handle copying from RGBX or BGRX because the tiled_memcpy
> -* function doesn't set the last channel to 1.
> +* function doesn't set the last channel to 1. Note this checks BaseFormat
> +* rather than TexFormat in case the RGBX format is being simulated with an
> +* RGBA format.
>  */
> -   if (rb->Format == MESA_FORMAT_B8G8R8X8_UNORM ||
> -   rb->Format == MESA_FORMAT_R8G8B8X8_UNORM)
> +   if (rb->_BaseFormat == GL_RGB)
>return false;
>
> if (!intel_get_memcpy(rb->Format, format, type, &mem_copy, &cpp,
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c
> index 34b91e8..e3710da7 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_image.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
> @@ -399,10 +399,11 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
>return false;
>
> /* We can't handle copying from RGBX or BGRX because the tiled_memcpy
> -* function doesn't set the last channel to 1.
> +* function doesn't set the last channel to 1. Note this checks BaseFormat
> +* rather than TexFormat in case the RGBX format is being simulated with an
> +* RGBA format.
>  */
> -   if (texImage->TexFormat == MESA_FORMAT_B8G8R8X8_UNORM ||
> -   texImage->TexFormat == MESA_FORMAT_R8G8B8X8_UNORM)
> +   if (texImage->_BaseFormat == GL_RGB)
>return false;
>
> if (!intel_get_memcpy(texImage->TexFormat, format, type, &mem_copy, &cpp,
> --
> 1.9.3
>
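
As an illustration of why the check matters, here is a small standalone
sketch, not the i965 code. It assumes 4 bytes per pixel with alpha in the
last byte; the example_* names and the two-value base format enum are
hypothetical. A straight memcpy carries over whatever sits in the X channel,
so a GL_RGB base format has to take the path that writes alpha explicitly,
even when the storage format is really RGBA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the texture/renderbuffer base format. */
enum example_base_format {
   EXAMPLE_BASE_RGB,   /* alpha must read back as 1.0 */
   EXAMPLE_BASE_RGBA,  /* alpha is real data */
};

/* The fast path is only safe when the stored alpha is meaningful. */
static bool
example_can_use_memcpy(enum example_base_format base)
{
   return base != EXAMPLE_BASE_RGB;
}

/* Copy n_pixels of 4-byte pixels, forcing alpha to 1.0 for RGB bases. */
static void
example_read_pixels(uint8_t *dst, const uint8_t *src, size_t n_pixels,
                    enum example_base_format base)
{
   if (example_can_use_memcpy(base)) {
      memcpy(dst, src, n_pixels * 4);
      return;
   }

   /* Slow path: copy the color channels and override the last channel. */
   for (size_t i = 0; i < n_pixels; i++) {
      dst[i * 4 + 0] = src[i * 4 + 0];
      dst[i * 4 + 1] = src[i * 4 + 1];
      dst[i * 4 + 2] = src[i * 4 + 2];
      dst[i * 4 + 3] = 0xff;
   }
}
```

Keying the decision on the base format rather than on the storage format is
the point of the patch: the same bytes can back either a GL_RGB or a GL_RGBA
texture, and only the base format says which one the application asked for.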


Re: [Mesa-dev] [PATCH 3/8] glsl: use dual slot helper in the linker code.

2015-12-09 Thread Dave Airlie
On 9 December 2015 at 21:39, Timothy Arceri  wrote:
> On Wed, 2015-12-09 at 16:06 +1000, Dave Airlie wrote:
>> From: Dave Airlie 
>>
>> Signed-off-by: Dave Airlie 
>
> Great timing :) I was going to have to look into fixing this stuff for
> enhanced layouts.
>
> Patches 1 & 2 are:
> Reviewed-by: Timothy Arceri 
>
> I have a question about this patch. If these doubles only take up a
> single attribute then why do we even bother with this test? The spec
> says it's optional and you're fixing the counting up in later patches, so
> what does it do that's useful?

It's complicated.

ARB_gpu_shader_fp64 passes doubles between shaders; dual-slot ones
take two locations. So a dvec3[2] will consume locations 17/18, 19/20.
The limits are in number of locations.

ARB_vertex_attrib_64bit allows doubles as vertex inputs. Dual-slot ones
take up a single location; however, they consume two slots with respect
to the hw limits. So a dvec3[2] will consume locations 17, 18. However, it
will take 4 slots against the vertex attrib limits.

Dave.
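
The two counting rules can be written down as a small standalone sketch.
This is only an illustration of the arithmetic described in this mail, not
the linker code; the example_* names and the is_vertex_input flag are
hypothetical.

```c
#include <stdbool.h>

/* How much of each limit an array of attributes/varyings uses up. */
struct example_usage {
   unsigned locations; /* counted against location-based limits */
   unsigned slots;     /* counted against hw slot limits */
};

/*
 * Location of element `index` of an array starting at `base`.
 * Dual-slot types advance by two locations between shader stages, but by
 * only one location when they are vertex inputs.
 */
static unsigned
example_element_location(unsigned base, unsigned index,
                         bool dual_slot, bool is_vertex_input)
{
   unsigned stride = (dual_slot && !is_vertex_input) ? 2 : 1;
   return base + index * stride;
}

/* Total locations and slots consumed by an array of array_len elements. */
static struct example_usage
example_array_usage(unsigned array_len, bool dual_slot, bool is_vertex_input)
{
   struct example_usage u;
   u.locations = array_len * ((dual_slot && !is_vertex_input) ? 2 : 1);
   u.slots = array_len * (dual_slot ? 2 : 1);
   return u;
}
```

For a dvec3[2] starting at 17: as a varying, element 1 sits at location 19
and the array uses 4 locations; as a vertex input, element 1 sits at
location 18 and the array uses 2 locations but still 4 slots.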

