[Mesa-dev] [PATCH 3/3] glsl: fix typos in comments "transfor" -> "transform"

2018-11-21 Thread Jose Maria Casanova Crespo
---
 src/compiler/glsl/ir.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index e09f053b77c..c3f5f1f7b05 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -773,17 +773,17 @@ public:
   unsigned is_xfb_per_vertex_output:1;
 
   /**
-   * Was a transfor feedback buffer set in the shader?
+   * Was a transform feedback buffer set in the shader?
*/
   unsigned explicit_xfb_buffer:1;
 
   /**
-   * Was a transfor feedback offset set in the shader?
+   * Was a transform feedback offset set in the shader?
*/
   unsigned explicit_xfb_offset:1;
 
   /**
-   * Was a transfor feedback stride set in the shader?
+   * Was a transform feedback stride set in the shader?
*/
   unsigned explicit_xfb_stride:1;
 
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] glsl: TCS outputs can not be transform feedback candidates on GLES

2018-11-21 Thread Jose Maria Casanova Crespo
Fixes: 
KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

Cc: mesa-sta...@lists.freedesktop.org
---
I think this patch and the previous one should be squashed, or their order
interchanged, before landing. I'm sending them split because it allows
exposing the incorrect behaviour on GLES.

 src/compiler/glsl/link_varyings.cpp | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/compiler/glsl/link_varyings.cpp 
b/src/compiler/glsl/link_varyings.cpp
index 1964dcc0a22..8bb90de8072 100644
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -2502,10 +2502,31 @@ assign_varying_locations(struct gl_context *ctx,
 
  if (num_tfeedback_decls > 0) {
 tfeedback_candidate_generator g(mem_ctx, tfeedback_candidates);
-if (producer->Stage == MESA_SHADER_TESS_CTRL &&
-!output_var->data.patch)
-   output_var->data.is_xfb_per_vertex_output = true;
-g.process(output_var);
+
+/* From OpenGL 4.6 (Core Profile) spec, section 11.1.2.1
+ * ("Vertex Shader Variables / Output Variables")
+ *
+ * "Each program object can specify a set of output variables from
+ * one shader to be recorded in transform feedback mode (see
+ * section 13.3). The variables that can be recorded are those
+ * emitted by the first active shader, in order, from the
+ * following list:
+ *
+ *  * geometry shader
+ *  * tessellation evaluation shader
+ *  * tessellation control shader
+ *  * vertex shader"
+ *
+ * But in the OpenGL ES 3.2 spec, section 11.1.2.1 ("Vertex Shader
+ * Variables / Output Variables"), the tessellation control shader is
+ * not included in the list of stages.
+ */
+if (!prog->IsES || producer->Stage != MESA_SHADER_TESS_CTRL) {
+   if (producer->Stage == MESA_SHADER_TESS_CTRL &&
+   !output_var->data.patch)
+  output_var->data.is_xfb_per_vertex_output = true;
+   g.process(output_var);
+}
  }
 
  ir_variable *const input_var =
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] glsl: XFB TCS per-vertex output varyings match as not declared as arrays

2018-11-21 Thread Jose Maria Casanova Crespo
A recent change in the OpenGL CTS ("Use non-arrayed varying name for TCS blocks")
on the KHR-GL*.tessellation_shader.single.xfb_captures_data_from_correct_stage
tests changed how per-vertex Tessellation Control Shader output varyings
captured through an interface block are named in transform feedback:
"BLOCK_INOUT.value" rather than "BLOCK_INOUT[0].value".
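
As an illustration (hypothetical GL usage, not part of the patch; the block
and variable names are made up), with a TCS output interface block declared
in GLSL as "out BLOCK_INOUT { vec4 value; } outBlock[];", the test now
requests the captured varying with the non-arrayed name:

  /* "BLOCK_INOUT.value" instead of "BLOCK_INOUT[0].value" */
  const char *varyings[] = { "BLOCK_INOUT.value" };
  glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);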

So Tessellation Control Shader per-vertex output variables and blocks, which
are required to be declared as arrays (each element representing the output
values for a single vertex of a multi-vertex primitive), are expected to be
named as if they were not declared as arrays.

This patch adds a new is_xfb_per_vertex_output flag at the ir_variable level
to mark when an ir_variable is a per-vertex TCS output varying, so that for
XFB naming purposes it is treated as a non-array variable.

As we don't support NV_gpu_shader5, PATCHES is not accepted as the
primitiveMode parameter of BeginTransformFeedback, so the test expects a
failure and we can't use the XFB results.

This patch uncovers that we were passing the GLES version of the tests
because the candidate names didn't match, not because on GLES the
Tessellation Control stage varyings should never be XFB candidates. That
is addressed in the following patch.

Fixes: KHR-GL4*.tessellation_shader.single.xfb_captures_data_from_correct_stage

Cc: mesa-sta...@lists.freedesktop.org
---
 src/compiler/glsl/ir.cpp| 1 +
 src/compiler/glsl/ir.h  | 6 ++
 src/compiler/glsl/link_uniforms.cpp | 6 --
 src/compiler/glsl/link_varyings.cpp | 8 +++-
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/compiler/glsl/ir.cpp b/src/compiler/glsl/ir.cpp
index 1d1a56ae9a5..582111d71f5 100644
--- a/src/compiler/glsl/ir.cpp
+++ b/src/compiler/glsl/ir.cpp
@@ -1750,6 +1750,7 @@ ir_variable::ir_variable(const struct glsl_type *type, 
const char *name,
this->data.fb_fetch_output = false;
this->data.bindless = false;
this->data.bound = false;
+   this->data.is_xfb_per_vertex_output = false;
 
if (type != NULL) {
   if (type->is_interface())
diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index f478b29a6b5..e09f053b77c 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -766,6 +766,12 @@ public:
*/
   unsigned is_xfb_only:1;
 
+  /**
+   * Is this varying a TCS per-vertex output candidate for transform
+   * feedback?
+   */
+  unsigned is_xfb_per_vertex_output:1;
+
   /**
* Was a transfor feedback buffer set in the shader?
*/
diff --git a/src/compiler/glsl/link_uniforms.cpp 
b/src/compiler/glsl/link_uniforms.cpp
index 63e688b19a7..547da68e216 100644
--- a/src/compiler/glsl/link_uniforms.cpp
+++ b/src/compiler/glsl/link_uniforms.cpp
@@ -72,8 +72,10 @@ program_resource_visitor::process(ir_variable *var, bool 
use_std430_as_default)
  get_internal_ifc_packing(use_std430_as_default) :
   var->type->get_internal_ifc_packing(use_std430_as_default);
 
-   const glsl_type *t =
-  var->data.from_named_ifc_block ? var->get_interface_type() : var->type;
+   const glsl_type *t = var->data.from_named_ifc_block ?
+  (var->data.is_xfb_per_vertex_output ?
+   var->get_interface_type()->without_array() :
+   var->get_interface_type()) : var->type;
const glsl_type *t_without_array = t->without_array();
 
/* false is always passed for the row_major parameter to the other
diff --git a/src/compiler/glsl/link_varyings.cpp 
b/src/compiler/glsl/link_varyings.cpp
index 52e493cb599..1964dcc0a22 100644
--- a/src/compiler/glsl/link_varyings.cpp
+++ b/src/compiler/glsl/link_varyings.cpp
@@ -2150,7 +2150,10 @@ private:
   tfeedback_candidate *candidate
  = rzalloc(this->mem_ctx, tfeedback_candidate);
   candidate->toplevel_var = this->toplevel_var;
-  candidate->type = type;
+  if (this->toplevel_var->data.is_xfb_per_vertex_output)
+ candidate->type = type->without_array();
+  else
+ candidate->type = type;
   candidate->offset = this->varying_floats;
   _mesa_hash_table_insert(this->tfeedback_candidates,
   ralloc_strdup(this->mem_ctx, name),
@@ -2499,6 +2502,9 @@ assign_varying_locations(struct gl_context *ctx,
 
  if (num_tfeedback_decls > 0) {
 tfeedback_candidate_generator g(mem_ctx, tfeedback_candidates);
+if (producer->Stage == MESA_SHADER_TESS_CTRL &&
+!output_var->data.patch)
+   output_var->data.is_xfb_per_vertex_output = true;
 g.process(output_var);
  }
 
-- 
2.19.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-29 Thread Jose Maria Casanova Crespo
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation of these functions is to know the bytes read/written by the
instructions, in order to improve the liveness analysis for partial
reads/writes.
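
For reference, a small standalone sketch of the byte-pattern calculation:
the periodic_mask() helper below mirrors the one added by this patch, and
the values in the comments are derived from it (the main() driver exists
only for illustration):

  #include <stdio.h>
  #include <stdint.h>
  #include <limits.h>

  #define MIN2(a, b) ((a) < (b) ? (a) : (b))

  /* Repeat a run of "bits" set bits every "step" positions, "count" times,
   * then shift the result left by "offset" positions (one bit per register
   * byte). */
  static uint32_t
  periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
  {
     uint32_t m = count ? (1u << bits) - 1 : 0;
     const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);

     for (unsigned shift = step; shift < max; shift *= 2)
        m |= m << shift;

     return m << offset;
  }

  int main(void)
  {
     /* SIMD8, 16-bit type, stride 1: 16 contiguous bytes are touched. */
     printf("0x%08x\n", periodic_mask(8, 2, 2, 0));   /* 0x0000ffff */
     /* SIMD8, 16-bit type, stride 2: every other word is touched.    */
     printf("0x%08x\n", periodic_mask(8, 4, 2, 0));   /* 0x33333333 */
     return 0;
  }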

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
parameter, they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

v3: (Francisco Jerez)
- A SEND could be this->mlen != 0 or this->is_send_from_grf
- We only assume that a periodic mask with offset could be applied
  to reg_offset == 0.
- We can assure that for MOVs operations for any offset (Chema)

v4: (Francisco Jerez)
- We return 0 mask for reg_offset out of the region definition.
- We return periodic masks when access is in bounds for ALU opcodes.

v5: (Francisco Jerez)
- Mask can only be periodic when byte_offset < type_size * stride
  when reg_offset > 0.

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs.cpp  | 121 +
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 123 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..d790b080e53 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include 
 
 using namespace brw;
 
@@ -687,6 +688,126 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *  periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *  periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize, the stride
+ * of the destination register and the internal register byte offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are written for operations that don't write a full register. So we
+ * can identify in live range variable analysis if a partial write has
+ * completely defined the data used by a partial read.
+ *
+ * reg_offset identifies full registers starting at dst.reg with
+ * reg_offset == 0.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+
+   /* Instruction doesn't write out of bounds */
+   if (reg_offset >= regs_written(this))
+  return 0;
+
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0;
+
+   /* We assume that send destinations are completely defined */
+   if (this->is_send_from_grf() || this->mlen != 0)
+  return ~0u;
+
+   /* The byte pattern is calculated using a periodic mask for ALU
+* operations and reg_offset in bounds.
+*/
+   unsigned step = this->dst.stride * type_sz(this->dst.type);
+   unsigned byte_offset = this->dst.offset % REG_SIZE;
+   if (reg_offset == 0 || byte_offset < step) {
+  return periodic_mask(this->exec_size, step, type_sz(this->dst.type),
+   byte_offset);
+   }
+
+   /* We don't know what was written so return 0 as safest choice */
+   return 0;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and the internal register byte offset.
+ *
+ * The objective of this function 

[Mesa-dev] [PATCH v4 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-27 Thread Jose Maria Casanova Crespo
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation of these functions is to know the bytes read/written by the
instructions, in order to improve the liveness analysis for partial
reads/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
parameter, they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

v3: (Francisco Jerez)
- A SEND could be this->mlen != 0 or this->is_send_from_grf
- We only assume that a periodic mask with offset could be applied
  to reg_offset == 0.
- We can assure that for MOVs operations for any offset (Chema)

v4: (Francisco Jerez)
- We return 0 mask for reg_offset out of the region definition.
- We return periodic masks when access is in bounds for ALU opcodes.

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs.cpp  | 111 +
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 113 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..157b49c42d3 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include 
 
 using namespace brw;
 
@@ -687,6 +688,116 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *  periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *  periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize, the stride
+ * of the destination register and the internal register byte offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are written for operations that don't write a full register. So we
+ * can identify in live range variable analysis if a partial write has
+ * completely defined the data used by a partial read.
+ *
+ * reg_offset identifies full registers starting at dst.reg with
+ * reg_offset == 0.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+
+   /* Instruction doesn't write out of bounds */
+   if (reg_offset >= regs_written(this))
+  return 0;
+
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0;
+
+   /* We assume that send destinations are completely defined */
+   if (this->is_send_from_grf() || this->mlen != 0)
+  return ~0u;
+
+   /* The byte pattern is calculated using a periodic mask for ALU
+* operations and reg_offset in bounds.
+*/
+   return periodic_mask(this->exec_size,
+this->dst.stride * type_sz(this->dst.type),
+type_sz(this->dst.type),
+this->dst.offset % REG_SIZE);
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and the internal register byte offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are used for operations that don't read a full register.
+ *
+ * Parameter i identifies the instruction source number and reg_offset
+ * identifies full registers starting at src[i].reg with reg_offset == 0.
+ */
+unsigned

[Mesa-dev] [PATCH 2/2] intel/compiler: implement 8-bit constant load

2018-07-27 Thread Jose Maria Casanova Crespo
From: Iago Toral Quiroga 

---
 src/intel/compiler/brw_fs_nir.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 2c8595b9730..6e9a5829d3b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1587,6 +1587,11 @@ fs_visitor::nir_emit_load_const(const fs_builder ,
fs_reg reg = bld.vgrf(reg_type, instr->def.num_components);
 
switch (instr->def.bit_size) {
+   case 8:
+  for (unsigned i = 0; i < instr->def.num_components; i++)
+ bld.MOV(offset(reg, bld, i), setup_imm_b(bld, instr->value.i8[i]));
+  break;
+
case 16:
   for (unsigned i = 0; i < instr->def.num_components; i++)
  bld.MOV(offset(reg, bld, i), brw_imm_w(instr->value.i16[i]));
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] intel/compiler: Enable 8-bit constants

2018-07-27 Thread Jose Maria Casanova Crespo
New VK-GL-CTS tests that use the VK_KHR_8bit_storage extension use 32-bit
constants that are converted to 8-bit and then stored in a storage buffer.

Although 8-bit constants are not enabled by VK_KHR_8bit_storage,
nir_opt_constant_folding already optimizes the 32 -> 8 integer conversion
into an 8-bit constant.

So we enable them in the next two patches so that we don't assert in this
scenario.
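
A minimal sketch of the kind of shader that hits this path (an assumed shape
of such a test, written here as a Vulkan GLSL compute shader embedded in a
C string; the exact CTS sources may differ):

  /* The uint8_t(...) conversion of a 32-bit literal is what
   * nir_opt_constant_folding turns into an 8-bit constant. */
  static const char *comp_src =
     "#version 450\n"
     "#extension GL_EXT_shader_8bit_storage : require\n"
     "layout(local_size_x = 1) in;\n"
     "layout(set = 0, binding = 0) buffer SSBO { uint8_t v; };\n"
     "void main() { v = uint8_t(200u); }\n";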

Cc: Iago Toral Quiroga 
Cc: Jason Ekstrand 

Iago Toral Quiroga (2):
  intel/compiler: add setup_imm_(u)b helpers
  intel/compiler: implement 8-bit constant load

 src/intel/compiler/brw_fs.h   |  6 ++
 src/intel/compiler/brw_fs_nir.cpp | 21 +
 2 files changed, 27 insertions(+)

-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] intel/compiler: add setup_imm_(u)b helpers

2018-07-27 Thread Jose Maria Casanova Crespo
From: Iago Toral Quiroga 

The hardware doesn't support byte immediates, so similar to setup_imm_df()
for doubles, these helpers work by loading the constant value into a
VGRF.
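
A quick sketch of the intended use (an assumed call site; "bld" and "dest"
stand for the usual builder and destination register, and the real user is
added in the following patch):

  /* The byte value is emitted as a word MOV into a B-typed VGRF and the
   * resulting register is used where a byte immediate would be needed. */
  fs_reg byte_val = setup_imm_b(bld, -5);
  bld.MOV(dest, byte_val);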
---
 src/intel/compiler/brw_fs.h   |  6 ++
 src/intel/compiler/brw_fs_nir.cpp | 16 
 2 files changed, 22 insertions(+)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 8ccd1659075..d56e33715ee 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -540,6 +540,12 @@ fs_reg shuffle_for_32bit_write(const brw::fs_builder ,
 fs_reg setup_imm_df(const brw::fs_builder ,
 double v);
 
+fs_reg setup_imm_b(const brw::fs_builder &bld,
+   int8_t v);
+
+fs_reg setup_imm_ub(const brw::fs_builder &bld,
+   uint8_t v);
+
 enum brw_barycentric_mode brw_barycentric_mode(enum glsl_interp_mode mode,
nir_intrinsic_op op);
 
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index a41dc2a47b8..2c8595b9730 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5396,3 +5396,19 @@ setup_imm_df(const fs_builder , double v)
 
return component(retype(tmp, BRW_REGISTER_TYPE_DF), 0);
 }
+
+fs_reg
+setup_imm_b(const fs_builder &bld, int8_t v)
+{
+   const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_B);
+   bld.MOV(tmp, brw_imm_w(v));
+   return tmp;
+}
+
+fs_reg
+setup_imm_ub(const fs_builder &bld, uint8_t v)
+{
+   const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_UB);
+   bld.MOV(tmp, brw_imm_uw(v));
+   return tmp;
+}
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] intel/fs: Write multiple 8/16-bit components with byte_scattered_write

2018-07-25 Thread Jose Maria Casanova Crespo
We also pack in the same byte_scattered_write message the maximum
number of 8/16-bit components.

Comments have been rewritten to adapt them to the 8-bit case.
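
A worked example of the packing math (a sketch derived from the code below;
the variable names follow the patch):

  /* int8 vec3 store at an offset that is not 32-bit aligned */
  unsigned type_size = 1, num_components = 3;
  num_components = MIN2(4 / type_size, num_components);   /* still 3 */
  if (num_components == 3)
     num_components = 2;   /* no 24-bit message exists, write 2 components */
  /* first iteration:  one 16-bit byte_scattered_write (2 x int8)
   * next iteration:   one  8-bit byte_scattered_write (the remaining int8) */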
---
 src/intel/compiler/brw_fs_nir.cpp | 66 ++-
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index a1f946708ed..7259acb862e 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4263,6 +4263,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  fs_reg write_src = offset(val_reg, bld, first_component);
 
  nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
+ bool use_scattered_write = false;
 
  if (type_size > 4) {
 /* We can't write more than 2 64-bit components at once. Limit
@@ -4273,29 +4274,38 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
 write_src = shuffle_for_32bit_write(bld, write_src, 0,
 num_components);
  } else if (type_size < 4) {
-/* For 16-bit types we pack two consecutive values into a 32-bit
- * word and use an untyped write message. For single values or not
- * 32-bit-aligned we need to use byte-scattered writes because
- * untyped writes works with 32-bit components with 32-bit
- * alignment. byte_scattered_write messages only support one
- * 16-bit component at a time. As VK_KHR_relaxed_block_layout
- * could be enabled we can not guarantee that not constant offsets
- * to be 32-bit aligned for 16-bit types. For example an array, of
- * 16-bit vec3 with array element stride of 6.
+/* For 8/16-bit types we pack consecutive values into a 32-bit
+ * type and use an untyped write message. When the size is not a
+ * multiple of 4 bytes or the offset is not 32-bit aligned we need
+ * to use byte-scattered writes because they don't require 32-bit
+ * components or 32-bit offset alignment. We can pack multiple
+ * 8/16-bit components into one 8/16/32-bit component used by the
+ * byte_scattered_write message.
+ *
+ * As VK_KHR_relaxed_block_layout could be requested, and it is
+ * core in VK 1.1, we can not guarantee that offsets are 32-bit
+ * aligned for 8/16-bit types. For example a 16-bit vec3 could
+ * begin at offset 2 in a structure.
  *
  * In the case of 32-bit aligned constant offsets if there is
- * a 3-components vector we submit one untyped-write message
+ * a 16-bit vec3 we submit one untyped-write message
  * of 32-bit (first two components), and one byte-scattered
  * write message (the last component).
  */
-
-if ( !const_offset || ((const_offset->u32[0] +
-   type_size * first_component) % 4)) {
-   /* If we use a .yz writemask we also need to emit 2
-* byte-scattered write messages because of y-component not
-* being aligned to 32-bit.
+if (!const_offset || ((const_offset->u32[0] +
+   type_size * first_component) % 4) ||
+num_components * type_size < 4) {
+   /* If we don't have a constant offset, or the constant offset is
+* not 32-bit aligned, or we are writing less than 32 bits, then
+* we use byte_scattered_write with the maximum number of
+* components we can pack exactly into one 8/16/32-bit component.
+* So for an int8 vec3 we have to split it into two writes: one
+* 16-bit and another 8-bit.
 */
-   num_components = 1;
+   use_scattered_write = true;
+   num_components = MIN2(4 / type_size, num_components);
+   if (num_components == 3)
+  num_components = 2;
 } else if (num_components * type_size > 4 &&
(num_components * type_size % 4)) {
/* If the pending components size is not a multiple of 4 bytes
@@ -4303,13 +4313,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
 * length == 1 with byte_scattered_write.
 */
num_components -= (num_components * type_size % 4) / type_size;
-} else if (num_components * type_size < 4) {
-   num_components = 1;
 }
 /* For num_components == 1 we are also shuffling the component
- * because byte scattered writes of 16-bit need values to be dword
- * aligned. Shuffling only one component would be 

[Mesa-dev] [PATCH 1/2] intel/fs: Read multiple 8/16-bit components with byte_scattered_read

2018-07-25 Thread Jose Maria Casanova Crespo
We used the byte_scattered_read message because it allows reading from
offsets that are not 32-bit aligned. We were reading one component with
each message.

Using a 32-bit bitsize read with byte_scattered_read we can read up to two
16-bit components or four 8-bit components with only one message per
iteration.

The same applies to a 16-bit bitsize read for two 8-bit components. In
the case of an int8 vec3, we read it as 32-bit and ignore the padding.
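
The per-iteration math for the int8 vec3 case looks roughly like this
(a sketch using the helpers referenced by the patch):

  unsigned type_size = 1, num_components = 3;                     /* int8 vec3 */
  unsigned iters = DIV_ROUND_UP(type_size * num_components, 4);   /* 1 */
  unsigned iter_components = MIN2(4 / type_size, num_components); /* 3 */
  unsigned bitsize_read =
     util_next_power_of_two(8 * iter_components * type_size);     /* 32 */
  /* a single 32-bit byte_scattered_read covers all three bytes; the fourth
   * byte of the read result is padding and is ignored */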

Cc: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 9b11b5fbd01..a1f946708ed 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2415,24 +2415,34 @@ do_untyped_vector_read(const fs_builder ,
  num_components);
   } else {
  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
- for (unsigned i = 0; i < num_components; i++) {
-if (i == 0) {
+ unsigned iters = DIV_ROUND_UP(type_sz(dest.type) * num_components, 4);
+ for (unsigned it = 0; it < iters; it++) {
+if (it == 0) {
bld.MOV(read_offset, offset_reg);
 } else {
-   bld.ADD(read_offset, offset_reg,
-   brw_imm_ud(i * type_sz(dest.type)));
+   bld.ADD(read_offset, offset_reg, brw_imm_ud(4 * it));
 }
+unsigned iter_components = MIN2(4 / type_sz(dest.type),
+num_components);
+num_components -= iter_components;
+/* We adjust the bitsize_read to hold as many components as we can in
+ * the same read message. We use a 32-bit read for an 8-bit vec3 but we
+ * ignore the last padding component.
+ */
+unsigned bitsize_read = util_next_power_of_two(8 * iter_components 
*
+   type_sz(dest.type));
 /* Non constant offsets are not guaranteed to be aligned 32-bits
- * so they are read using one byte_scattered_read message
- * for each component.
+ * for 8/16-bit components. We use byte_scattered_read for
+ * one or multiple components, up to 4 bytes per iteration.
  */
 fs_reg read_result =
emit_byte_scattered_read(bld, surf_index, read_offset,
 1 /* dims */, 1,
-type_sz(dest.type) * 8 /* bit_size */,
+bitsize_read,
 BRW_PREDICATE_NONE);
-bld.MOV(offset(dest, bld, i),
-subscript (read_result, dest.type, 0));
+shuffle_from_32bit_read(bld, offset(dest, bld,
+it * 4 / type_sz(dest.type)),
+read_result, 0, iter_components);
  }
   }
} else if (type_sz(dest.type) == 4) {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-23 Thread Jose Maria Casanova Crespo
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation of these functions is to know the bytes read/written by the
instructions, in order to improve the liveness analysis for partial reads/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
parameter, they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

v3: (Francisco Jerez)
- A SEND could be this->mlen != 0 or this->is_send_from_grf
- We only assume that a periodic mask with offset could be applied
  to reg_offset == 0.
- We can assure that for MOVs operations for any offset (Chema)

Cc: Francisco Jerez 

---
 src/intel/compiler/brw_fs.cpp  | 119 +
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 121 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..4fa0f154c44 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include 
 
 using namespace brw;
 
@@ -687,6 +688,124 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *  periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *  periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the
+ * stride of the destination register.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are defined for operations that don't write a full register. So we
+ * can identify in live range variable analysis if a partial write has
+ * completely defined the data used by a partial read.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0u;
+   /* We assume that send destinations are completely defined */
+   if (this->is_send_from_grf() || this->mlen != 0) {
+  return ~0u;
+   }
+
+   /* The byte pattern is calculated using a periodic mask for reg_offset == 0
+* because the internal offset will match how the register is written.
+*
+* We can do this for any reg_offset on MOV operations. We could add other
+* opcodes in the future, but we don't include them until we have evidence
+* of them being used in partial write situations that ensure that the
+* pattern is repeated for any reg_offset.
+*/
+   if (reg_offset == 0 || this->opcode == BRW_OPCODE_MOV) {
+  return periodic_mask(this->exec_size,
+   this->dst.stride * type_sz(this->dst.type),
+   type_sz(this->dst.type),
+   this->dst.offset % REG_SIZE);
+   }
+   /* This shouldn't be reached in liveness range calculation, but if the
+* function is used in another context we know we write a complete register.
+*/
+   if (!this->is_partial_write())
+  return ~0u;
+
+   /* By default we don't know what is written */
+   return 0u;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and a register offset.
+ *
+ * The objective of this function is to identify 

[Mesa-dev] [PATCH v2 2/2] intel/fs: Improve liveness range calculation for partial writes

2018-07-19 Thread Jose Maria Casanova Crespo
We use the information from the registers' read/write patterns
to improve the variable liveness analysis, avoiding extending the
liveness range of a variable to the beginning of the block, which
would make it always reach the beginning of the shader.

This optimization analyses, inside each block, whether a partial write
completely defines the bytes used by a following instruction in the
block, so that we are not in the case of using an undefined value in
the block.
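
In other words (an illustration with assumed mask values for a SIMD8 16-bit
write and a later SIMD8 16-bit read of the same VGRF, following the
dst_write_pattern/src_read_pattern semantics of the previous patch):

  uint32_t defpartial   = 0x0000ffff; /* partial write defined bytes 0..15   */
  uint32_t read_pattern = 0x0000ffff; /* later read uses exactly those bytes */
  /* the read is fully screened off inside the block, so the variable is not
   * added to use[] and its live range is not extended to the block start */
  bool add_to_use = (defpartial & read_pattern) != read_pattern;  /* false */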

This avoids almost all the spilling that happens with 8-bit/16-bit
storage tests, without any compilation performance impact for shader-db
execution, as any cost is compensated by the spilling reductions.

At this moment we don't extend the logic to the intra-block calculations
of livein/liveout, so as not to hurt performance in the general case,
since we would not be taking advantage of BITWORD operations.

The execution time for running dEQP-VK.*8bit_storage.* tests is reduced
from 7m27.966s to 13.015s.

shader-db on SKL shows improvements, reducing spilling on
deus-ex-mankind-divided and dolphin without increasing execution time.

total instructions in shared programs: 14867218 -> 14863959 (-0.02%)
instructions in affected programs: 121570 -> 118311 (-2.68%)
helped: 38
HURT: 0

total cycles in shared programs: 537923248 -> 537720965 (-0.04%)
cycles in affected programs: 63154229 -> 62951946 (-0.32%)
helped: 61
HURT: 26

total loops in shared programs: 4828 -> 4828 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 7790 -> 7375 (-5.33%)
spills in affected programs: 2824 -> 2409 (-14.70%)
helped: 35
HURT: 0

total fills in shared programs: 10557 -> 10024 (-5.05%)
fills in affected programs: 3752 -> 3219 (-14.21%)
helped: 38
HURT: 0

v2: - Use functions dst_write_pattern and src_read_pattern
  introduced in previous patch at v2.
- Avoid calculating read_pattern if defpartial is 0

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs_live_variables.cpp | 61 
 src/intel/compiler/brw_fs_live_variables.h   | 13 -
 2 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
b/src/intel/compiler/brw_fs_live_variables.cpp
index 059f076fa51..d3559e3114f 100644
--- a/src/intel/compiler/brw_fs_live_variables.cpp
+++ b/src/intel/compiler/brw_fs_live_variables.cpp
@@ -54,9 +54,9 @@ using namespace brw;
 
 void
 fs_live_variables::setup_one_read(struct block_data *bd, fs_inst *inst,
-  int ip, const fs_reg )
+  int ip, int src, int reg_offset)
 {
-   int var = var_from_reg(reg);
+   int var = var_from_reg(inst->src[src]) + reg_offset;
assert(var < num_vars);
 
start[var] = MIN2(start[var], ip);
@@ -64,31 +64,48 @@ fs_live_variables::setup_one_read(struct block_data *bd, 
fs_inst *inst,
 
/* The use[] bitset marks when the block makes use of a variable (VGRF
 * channel) without having completely defined that variable within the
-* block.
+* block. We take into account that a partial write could have defined
+* completely the read bytes in the block.
 */
-   if (!BITSET_TEST(bd->def, var))
-  BITSET_SET(bd->use, var);
+   if (!BITSET_TEST(bd->def, var)) {
+  if (!bd->defpartial[var]) {
+ BITSET_SET(bd->use, var);
+  } else {
+ unsigned read_pattern = inst->src_read_pattern(src, reg_offset);
+ if ((bd->defpartial[var] & read_pattern) != read_pattern)
+BITSET_SET(bd->use, var);
+  }
+   }
 }
 
 void
 fs_live_variables::setup_one_write(struct block_data *bd, fs_inst *inst,
-   int ip, const fs_reg )
+   int ip, int reg_offset)
 {
-   int var = var_from_reg(reg);
+   int var = var_from_reg(inst->dst) + reg_offset;
assert(var < num_vars);
 
start[var] = MIN2(start[var], ip);
end[var] = MAX2(end[var], ip);
 
/* The def[] bitset marks when an initialization in a block completely
-* screens off previous updates of that variable (VGRF channel).
+* screens off previous updates of that variable (VGRF channel). If
+* we have a partial write now we store the write pattern so next
+* reads in the block can check if what they read was completely screened
+* off by this partial write.
 */
-   if (inst->dst.file == VGRF) {
-  if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
+   assert(inst->dst.file == VGRF);
+   if(!BITSET_TEST(bd->use, var)) {
+  if (!inst->is_partial_write()) {
  BITSET_SET(bd->def, var);
-
-  BITSET_SET(bd->defout, var);
+ bd->defpartial[var] = ~0u;
+  } else {
+ bd->defpartial[var] |= inst->dst_write_pattern(reg_offset);
+ if (bd->defpartial[var] == ~0u)
+BITSET_SET(bd->def, var);
+  }
}
+   BITSET_SET(bd->defout, var);
 }
 
 /**
@@ -115,14 +132,9 @@ fs_live_variables::setup_def_use()
   foreach_inst_in_block(fs_inst, inst, block) {
 /* Set use[] for this 

[Mesa-dev] [PATCH v2 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-19 Thread Jose Maria Casanova Crespo
These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation of these functions is to know the bytes read/written by the
instructions, in order to improve the liveness analysis for partial reads/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
parameter, they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs.cpp  | 87 ++
 src/intel/compiler/brw_ir_fs.h |  2 +
 2 files changed, 89 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..d06b057cdbf 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -687,6 +687,93 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *  periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *  periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, REG_SIZE);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   assert(offset + max - (step - bits) <= REG_SIZE);
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the
+ * stride of the destination register.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are defined for operations that don't write a full register. So we
+ * can identify in live range variable analysis if a partial write has
+ * completely defined the data used by a partial read.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0u;
+   /* We assume that send destinations are completely defined */
+   if (this->mlen > 0)
+  return ~0u;
+
+   return periodic_mask(this->exec_size,
+this->dst.stride * type_sz(this->dst.type),
+type_sz(this->dst.type),
+this->dst.offset % REG_SIZE);
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and a register offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are used for operations that don't read a full register.
+ */
+unsigned
+fs_inst::src_read_pattern(int i, unsigned reg_offset) const
+{
+   assert(src[i].file == VGRF);
+   /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
+* so the read pattern depends on the bitsize stored at src[4].
+*/
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && i == 1)
+  return periodic_mask(8, 4, this->src[4].ud / 8, 0);
+
+   /* As for byte_scattered_write_logical but we need to take into account
+* that data written in the payload(src[0]) are now on reg_offset 1 on SIMD8
+* and reg_offset 2 and 3 on SIMD16.
+*/
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && i == 0) {
+  if (DIV_ROUND_UP(reg_offset, (this->exec_size / 8)) == 1)
+ return periodic_mask(8, 4, this->src[2].ud / 8, 0);
+   }
+
+   return periodic_mask(this->exec_size,
+this->src[i].stride * type_sz(this->src[i].type),
+type_sz(this->src[i].type),
+this->src[i].offset % REG_SIZE);
+}
+
 unsigned
 fs_inst::components_read(unsigned i) const
 {
diff --git 

[Mesa-dev] [PATCH 2/2] intel/fs: Improve liveness range calculation for partial writes

2018-07-13 Thread Jose Maria Casanova Crespo
We use the information from the registers' read/write patterns
to improve the variable liveness analysis, avoiding extending the
liveness range of a variable to the beginning of the block, which
would make it always reach the beginning of the shader.

This optimization analyses, inside each block, whether a partial write
completely defines the bytes used by a following instruction in the
block, so that we are not in the case of using an undefined value in
the block.

This avoids almost all the spilling that happens with 8-bit/16-bit
storage tests, without any compilation performance impact for shader-db
execution, as any cost is compensated by the spilling reductions.

At this moment we don't extend the logic to the intra-block calculations
of livein/liveout, so as not to hurt performance in the general case,
since we would not be taking advantage of BITWORD operations.

The execution time for running dEQP-VK.*8bit_storage.* tests is reduced
from 25m20.643s to 0m57.015s

shader-db on SKL shows improvements, reducing spilling on
deus-ex-mankind-divided and dolphin without increasing execution time.

total instructions in shared programs: 14867218 -> 14863959 (-0.02%)
instructions in affected programs: 121570 -> 118311 (-2.68%)
helped: 38
HURT: 0

total cycles in shared programs: 537923248 -> 537720965 (-0.04%)
cycles in affected programs: 63154229 -> 62951946 (-0.32%)
helped: 61
HURT: 26

total loops in shared programs: 4828 -> 4828 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 7790 -> 7375 (-5.33%)
spills in affected programs: 2824 -> 2409 (-14.70%)
helped: 35
HURT: 0

total fills in shared programs: 10557 -> 10024 (-5.05%)
fills in affected programs: 3752 -> 3219 (-14.21%)
helped: 38
HURT: 0
---
 src/intel/compiler/brw_fs_live_variables.cpp | 32 +++-
 src/intel/compiler/brw_fs_live_variables.h   |  9 ++
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
b/src/intel/compiler/brw_fs_live_variables.cpp
index 059f076fa51..8947988df06 100644
--- a/src/intel/compiler/brw_fs_live_variables.cpp
+++ b/src/intel/compiler/brw_fs_live_variables.cpp
@@ -64,10 +64,15 @@ fs_live_variables::setup_one_read(struct block_data *bd, 
fs_inst *inst,
 
/* The use[] bitset marks when the block makes use of a variable (VGRF
 * channel) without having completely defined that variable within the
-* block.
+* block. We take into account that a partial write could have defined
+* completely the read bytes in the block.
 */
-   if (!BITSET_TEST(bd->def, var))
-  BITSET_SET(bd->use, var);
+   if (!BITSET_TEST(bd->def, var)) {
+  unsigned read_pattern = inst->register_byte_use_pattern(reg, false);
+  if ((bd->defpartial[var] & read_pattern) != read_pattern) {
+ BITSET_SET(bd->use, var);
+  }
+   }
 }
 
 void
@@ -81,14 +86,23 @@ fs_live_variables::setup_one_write(struct block_data *bd, 
fs_inst *inst,
end[var] = MAX2(end[var], ip);
 
/* The def[] bitset marks when an initialization in a block completely
-* screens off previous updates of that variable (VGRF channel).
+* screens off previous updates of that variable (VGRF channel). If
+* we have a partial write now we store the write pattern so next
+* reads in the block can check if what they read was completely screened
+* off by this partial write.
 */
-   if (inst->dst.file == VGRF) {
-  if (!inst->is_partial_write() && !BITSET_TEST(bd->use, var))
+   assert(inst->dst.file == VGRF);
+   if(!BITSET_TEST(bd->use, var)) {
+  if (!inst->is_partial_write()) {
  BITSET_SET(bd->def, var);
-
-  BITSET_SET(bd->defout, var);
+ bd->defpartial[var] = ~0u;
+  } else {
+ bd->defpartial[var] |= inst->register_byte_use_pattern(reg, true);
+ if (bd->defpartial[var] == ~0u)
+BITSET_SET(bd->def, var);
+  }
}
+   BITSET_SET(bd->defout, var);
 }
 
 /**
@@ -281,6 +295,7 @@ fs_live_variables::fs_live_variables(fs_visitor *v, const 
cfg_t *cfg)
bitset_words = BITSET_WORDS(num_vars);
for (int i = 0; i < cfg->num_blocks; i++) {
   block_data[i].def = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
+  block_data[i].defpartial = rzalloc_array(mem_ctx, unsigned, num_vars);
   block_data[i].use = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
   block_data[i].livein = rzalloc_array(mem_ctx, BITSET_WORD, bitset_words);
   block_data[i].liveout = rzalloc_array(mem_ctx, BITSET_WORD, 
bitset_words);
@@ -319,6 +334,7 @@ fs_visitor::invalidate_live_intervals()
 void
 fs_visitor::calculate_live_intervals()
 {
+
if (this->live_intervals)
   return;
 
diff --git a/src/intel/compiler/brw_fs_live_variables.h 
b/src/intel/compiler/brw_fs_live_variables.h
index 9e95e443170..17f1c8177a0 100644
--- a/src/intel/compiler/brw_fs_live_variables.h
+++ b/src/intel/compiler/brw_fs_live_variables.h
@@ -44,6 +44,15 @@ struct block_data {
 */
BITSET_WORD *def;
 
+   /**
+

[Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst

2018-07-13 Thread Jose Maria Casanova Crespo
For a register source/destination of an instruction, the function returns
the read/write byte pattern of a 32-byte register as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the stride, and whether the register is a source or a
destination.

The objective of the function is to help know the bytes read/written by
the instructions, in order to improve the liveness analysis for partial
reads/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because, depending on the bitsize
parameter, they have a different read pattern.
---
 src/intel/compiler/brw_fs.cpp  | 183 +
 src/intel/compiler/brw_ir_fs.h |   1 +
 2 files changed, 184 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 2b8363ca362..f3045c4ff6c 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -687,6 +687,189 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read/written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the register
+ * stride and the register is source or destination for the instruction.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are read or written for operations that don't read/write a full register.
+ * So we can identify in live range variable analysis if a partial write has
+ * completely defined the part of the register used by a partial read. So we
+ * avoid extending the liveness range because all the data read was already
+ * defined even though the register wasn't completely written.
+ */
+unsigned
+fs_inst::register_byte_use_pattern(const fs_reg &r, boolean is_dst) const
+{
+   if (is_dst) {
+  /* We don't know what is written so we return the worst case */
+  if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+ return 0;
+  /* We assume that send destinations are completely written */
+  if (this->is_send_from_grf())
+ return ~0u;
+   } else {
+  /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
+   * so the read pattern depends on the bitsize stored at src[4]
+   */
+  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL &&
+  this->src[1].nr == r.nr) {
+ switch (this->src[4].ud) {
+ case 32:
+return ~0u;
+ case 16:
+return 0x33333333;
+ case 8:
+return 0x11111111;
+ default:
+unreachable("Unsupported bitsize at byte_scattered_write_logical");
+ }
+  }
+  /* As for byte_scattered_write_logical but we need to take into account
+   * that data written are in the payload offset 32 with SIMD8 and offset
+   * 64 with SIMD16.
+   */
+  if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE &&
+  this->src[0].nr == r.nr) {
+ fs_reg payload = this->src[0];
+ payload.offset = REG_SIZE * this->exec_size / 8;
+ if (regions_overlap(r, REG_SIZE,
+ payload, REG_SIZE * this->exec_size / 8)) {
+switch (this->src[2].ud) {
+case 32:
+   return ~0u;
+case 16:
+   return 0x33333333;
+case 8:
+   return 0x11111111;
+default:
+   unreachable("Unsupported bitsize at byte_scattered_write");
+}
+ } else {
+return ~0u;
+ }
+  }
+   }
+
+   /* We define the most conservative value in order to calculate liveness
+* range. If it is a destination nothing is defined, and if it is a source
+* all the bytes of the register could be read. So for release builds
+* the unreachables would always have a safe return value. */
+   unsigned pattern =  is_dst ? 0 : ~0u;
+
+   /* In the general case we calculate the pattern for a specific register
+* based on the type_size and stride. We calculate the SIMD8 pattern
+* and then we adjust the pattern if needed for different exec_sizes
+* and offset
+*/
+   switch (type_sz(r.type)){
+   case 1:
+  switch (r.stride) {
+  case 0:
+ pattern = 0X1;
+ break;
+  case 1:
+ pattern = 0xff;
+ break;
+  case 2:
+ pattern = 0x5555;
+ break;
+  case 4:
+ pattern = 0x11111111;
+ break;
+  case 8:
+ pattern = 0x01010101;
+ break;
+  default:
+ unreachable("Unknown pattern unsupported 8-bit stride");
+  }
+  break;
+   case 2:
+  switch (r.stride) {
+  case 0:
+ pattern = 0X3;
+ break;
+  case 1:
+ pattern = 0xffff;
+ break;
+  case 2:
+ pattern = 0x33333333;
+ break;
+ 

[Mesa-dev] [PATCH 0/2] intel/fs: Liveness range improvements with partial writes

2018-07-13 Thread Jose Maria Casanova Crespo
This series deals with the main performance issue in shader compilation
when 8/16-bit types are used.

This series avoids extending the liveness range of variables that use only
part of a register. We track in the liveness analysis which bytes of the
register are defined, and which are used before their definition, inside
each block.

This helps to remove almost all the spilling generated spuriously because
of the overextended liveness ranges, which increased register pressure
unnecessarily. The 8bit_storage VK-CTS tests now execute 25x faster, as
almost all the spilling in those tests is removed. The same applies to
the 16bit_storage tests.

At the same time we get some nice improvements in shader-db without impact
on the execution time. See patch [2/2] for details.

In a follow-up I would like to extend this approach to the intra-block
livein/liveout calculation, but I haven't yet found a case where I see any
improvement in the generated code, and I still have to deal with an
important increase in compilation time in my WIP solution.

Jose Maria Casanova Crespo (2):
  intel/fs: New method for register_byte_use_pattern for fs_inst
  intel/fs: Improve liveness range calculation for partial writes

 src/intel/compiler/brw_fs.cpp| 183 +++
 src/intel/compiler/brw_fs_live_variables.cpp |  32 +++-
 src/intel/compiler/brw_fs_live_variables.h   |   9 +
 src/intel/compiler/brw_ir_fs.h   |   1 +
 4 files changed, 217 insertions(+), 8 deletions(-)

Cc: Francisco Jerez 
Cc: Jason Ekstrand 

-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: unspills shouldn't use grf127 as dest since Gen8+

2018-07-11 Thread Jose Maria Casanova Crespo
In 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+, although
the backend still uses MRFs internally for sends, they are finally assigned
to GRFs.

In the case of unspills, the backend directly assigns the destination as
the source because it is supposed to be available, so we always have a
source-destination overlap. If the register allocator assigns registers
that include grf127, we fail the validation rule that affects Gen8+:
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+, and if any
register has been spilled we add interferences between that node and the
destinations of the unspill operations.

Found by Caio Marcelo de Oliveira Filho

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends 
dest"
Cc: 18.1 
Cc: Caio Marcelo de Oliveira Filho 
Cc: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index 59e047483c0..3ea2e7547c6 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -549,7 +549,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
if (devinfo->gen >= 7)
   node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
int grf127_send_hack_node = node_count;
-   if (devinfo->gen >= 8 && dispatch_width == 8)
+   if (devinfo->gen >= 8)
   node_count ++;
struct ra_graph *g =
   ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, node_count);
@@ -656,7 +656,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
   }
}
 
-   if (devinfo->gen >= 8 && dispatch_width == 8) {
+   if (devinfo->gen >= 8) {
   /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
* subsection "EUISA Instructions", Send Message (page 990):
*
@@ -671,13 +671,25 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
spill_all)
* overlap between sources and destination.
*/
   ra_set_node_reg(g, grf127_send_hack_node, 127);
-  foreach_block_and_inst(block, fs_inst, inst, cfg) {
- if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
-ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
+  if (dispatch_width == 8) {
+ foreach_block_and_inst(block, fs_inst, inst, cfg) {
+if (inst->is_send_from_grf() && inst->dst.file == VGRF)
+   ra_add_node_interference(g, inst->dst.nr, 
grf127_send_hack_node);
+ }
+  }
+
+  if (spilled_any_registers) {
+ foreach_block_and_inst(block, fs_inst, inst, cfg) {
+if ((inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ ||
+inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) &&
+inst->dst.file ==VGRF) {
+   ra_add_node_interference(g, inst->dst.nr, 
grf127_send_hack_node);
+}
  }
   }
}
 
+
/* Debug of register spilling: Go spill everything. */
if (unlikely(spill_all)) {
   int reg = choose_spill_reg(g);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 9/9] anv: Enable SPV_KHR_8bit_storage and VK_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+,
using the VK_KHR_get_physical_device_properties2 functionality
to expose whether the extension is supported or not.

Reviewed-by: Jason Ekstrand 
---
 src/intel/vulkan/anv_device.c  | 11 +++
 src/intel/vulkan/anv_extensions.py |  1 +
 src/intel/vulkan/anv_pipeline.c|  1 +
 3 files changed, 13 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 7b3ddbb9501..a8b0bd2fc3e 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -896,6 +896,17 @@ void anv_GetPhysicalDeviceFeatures2(
  break;
   }
 
+  case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_8BIT_STORAGE_FEATURES_KHR: {
+ VkPhysicalDevice8BitStorageFeaturesKHR *features =
+(VkPhysicalDevice8BitStorageFeaturesKHR *)ext;
+ ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
+
+ features->storageBuffer8BitAccess = pdevice->info.gen >= 8;
+ features->uniformAndStorageBuffer8BitAccess = pdevice->info.gen >= 8;
+ features->storagePushConstant8 = pdevice->info.gen >= 8;
+ break;
+  }
+
   default:
  anv_debug_ignored_stype(ext->sType);
  break;
diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index 4179315a388..99529162781 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -72,6 +72,7 @@ MAX_API_VERSION = None # Computed later
 EXTENSIONS = [
 Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
 Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
+Extension('VK_KHR_8bit_storage',  1, 'device->info.gen 
>= 8'),
 Extension('VK_KHR_bind_memory2',  1, True),
 Extension('VK_KHR_create_renderpass2',1, True),
 Extension('VK_KHR_dedicated_allocation',  1, True),
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index b0c9c3422a5..1565fe7a7a3 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -153,6 +153,7 @@ anv_shader_compile_to_nir(struct anv_pipeline *pipeline,
  .subgroup_shuffle = true,
  .subgroup_vote = true,
  .stencil_export = device->instance->physicalDevice.info.gen >= 9,
+ .storage_8bit = device->instance->physicalDevice.info.gen >= 8,
   },
};
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/9] spirv/nir: Add support for SPV_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
Reviewed-by: Jason Ekstrand 
---
 src/compiler/shader_info.h| 1 +
 src/compiler/spirv/spirv_to_nir.c | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 8c58ee285ec..3b95d5962c0 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -58,6 +58,7 @@ struct spirv_supported_capabilities {
bool runtime_descriptor_array;
bool stencil_export;
bool atomic_storage;
+   bool storage_8bit;
 };
 
 typedef struct shader_info {
diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index fb4211193fb..80a35b1b750 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3498,6 +3498,12 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
  spv_check_supported(shader_viewport_index_layer, cap);
  break;
 
+  case SpvCapabilityStorageBuffer8BitAccess:
+  case SpvCapabilityUniformAndStorageBuffer8BitAccess:
+  case SpvCapabilityStoragePushConstant8:
+ spv_check_supported(storage_8bit, cap);
+ break;
+
   case SpvCapabilityInputAttachmentArrayDynamicIndexingEXT:
   case SpvCapabilityUniformTexelBufferArrayDynamicIndexingEXT:
   case SpvCapabilityStorageTexelBufferArrayDynamicIndexingEXT:
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/9] i965/fs: Enable store_ssbo for 8-bit types.

2018-07-08 Thread Jose Maria Casanova Crespo
v2: Update comment according to this patch. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 4155b2ed996..62ec0df8994 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4284,7 +4284,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
 write_src = shuffle_for_32bit_write(bld, write_src, 0,
 num_components);
  } else if (type_size < 4) {
-assert(type_size == 2);
 /* For 16-bit types we pack two consecutive values into a 32-bit
  * word and use an untyped write message. For single values or not
  * 32-bit-aligned we need to use byte-scattered writes because
@@ -4308,12 +4307,15 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
 * being aligned to 32-bit.
 */
num_components = 1;
-} else if (num_components > 2 && (num_components % 2)) {
-   /* If there is an odd number of consecutive components we left
-* the not paired component for a following emit of length == 1
-* with byte_scattered_write.
+} else if (num_components * type_size > 4 &&
+   (num_components * type_size % 4)) {
+   /* If the pending components size is not a multiple of 4 bytes
+* we leave the unaligned components for following emits of
+* length == 1 with byte_scattered_write.
 */
-   num_components --;
+   num_components -= (num_components * type_size % 4) / type_size;
+} else if (num_components * type_size < 4) {
+   num_components = 1;
 }
 /* For num_components == 1 we are also shuffling the component
  * because byte scattered writes of 16-bit need values to be dword
@@ -4337,7 +4339,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  }
 
  if (type_size < 4 && num_components == 1) {
-assert(type_size == 2);
 /* Untyped Surface messages have a fixed 32-bit size, so we need
  * to rely on byte scattered in order to write 16-bit elements.
  * The byte_scattered_write message needs that every written 16-bit
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] spirv: Include headers and grammar for SPV_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
Update to headers and grammar to ff684ffc6a35d2a58f0f63108877d0064ea33feb
---
 src/compiler/spirv/spirv.core.grammar.json | 44 ++
 src/compiler/spirv/spirv.h |  3 ++
 2 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/spirv.core.grammar.json 
b/src/compiler/spirv/spirv.core.grammar.json
index a03c024335c..cb641420d07 100644
--- a/src/compiler/spirv/spirv.core.grammar.json
+++ b/src/compiler/spirv/spirv.core.grammar.json
@@ -3914,7 +3914,7 @@
 { "kind" : "IdRef", "name" : "'Target'" },
 { "kind" : "Decoration" }
   ],
-  "extensions" : [ "SPV_GOOGLE_decorate_string" ],
+  "extensions" : [ "SPV_GOOGLE_decorate_string", 
"SPV_GOOGLE_hlsl_functionality1" ],
   "version" : "None"
 },
 {
@@ -3925,7 +3925,7 @@
 { "kind" : "LiteralInteger", "name" : "'Member'" },
 { "kind" : "Decoration" }
   ],
-  "extensions" : [ "SPV_GOOGLE_decorate_string" ],
+  "extensions" : [ "SPV_GOOGLE_decorate_string", 
"SPV_GOOGLE_hlsl_functionality1" ],
   "version" : "None"
 },
 {
@@ -3991,6 +3991,7 @@
 {
   "enumerant" : "ConstOffsets",
   "value" : "0x0020",
+  "capabilities" : [ "ImageGatherExtended" ],
   "parameters" : [
 { "kind" : "IdRef" }
   ]
@@ -5550,12 +5551,14 @@
   "enumerant" : "OverrideCoverageNV",
   "value" : 5248,
   "capabilities" : [ "SampleMaskOverrideCoverageNV" ],
+  "extensions" : [ "SPV_NV_sample_mask_override_coverage" ],
   "version" : "None"
 },
 {
   "enumerant" : "PassthroughNV",
   "value" : 5250,
   "capabilities" : [ "GeometryShaderPassthroughNV" ],
+  "extensions" : [ "SPV_NV_geometry_shader_passthrough" ],
   "version" : "None"
 },
 {
@@ -5568,6 +5571,7 @@
   "enumerant" : "SecondaryViewportRelativeNV",
   "value" : 5256,
   "capabilities" : [ "ShaderStereoViewNV" ],
+  "extensions" : [ "SPV_NV_stereo_view_rendering" ],
   "version" : "None",
   "parameters" : [
 { "kind" : "LiteralInteger", "name" : "'Offset'" }
@@ -5960,12 +5964,14 @@
   "enumerant" : "SecondaryPositionNV",
   "value" : 5257,
   "capabilities" : [ "ShaderStereoViewNV" ],
+  "extensions" : [ "SPV_NV_stereo_view_rendering" ],
   "version" : "None"
 },
 {
   "enumerant" : "SecondaryViewportMaskNV",
   "value" : 5258,
   "capabilities" : [ "ShaderStereoViewNV" ],
+  "extensions" : [ "SPV_NV_stereo_view_rendering" ],
   "version" : "None"
 },
 {
@@ -6043,17 +6049,23 @@
 {
   "enumerant" : "PartitionedReduceNV",
   "value" : 6,
-  "capabilities" : [ "GroupNonUniformPartitionedNV" ]
+  "capabilities" : [ "GroupNonUniformPartitionedNV" ],
+  "extensions" : [ "SPV_NV_shader_subgroup_partitioned" ],
+  "version" : "None"
 },
 {
   "enumerant" : "PartitionedInclusiveScanNV",
   "value" : 7,
-  "capabilities" : [ "GroupNonUniformPartitionedNV" ]
+  "capabilities" : [ "GroupNonUniformPartitionedNV" ],
+  "extensions" : [ "SPV_NV_shader_subgroup_partitioned" ],
+  "version" : "None"
 },
 {
   "enumerant" : "PartitionedExclusiveScanNV",
   "value" : 8,
-  "capabilities" : [ "GroupNonUniformPartitionedNV" ]
+  "capabilities" : [ "GroupNonUniformPartitionedNV" ],
+  "extensions" : [ "SPV_NV_shader_subgroup_partitioned" ],
+  "version" : "None"
 }
   ]
 },
@@ -6260,8 +6272,7 @@
 },
 {
   "enumerant" : "Int8",
-  "value" : 39,
-  "capabilities" : [ "Kernel" ]
+  "value" : 39
 },
 {
   "enumerant" : "InputAttachment",
@@ -6518,6 +6529,25 @@
   "extensions" : [ "SPV_KHR_post_depth_coverage" ],
   "version" : "None"
 },
+{
+  "enumerant" : "StorageBuffer8BitAccess",
+  "value" : 4448,
+  "extensions" : [ "SPV_KHR_8bit_storage" ],
+  "version" : "None"
+},
+{
+  "enumerant" : "UniformAndStorageBuffer8BitAccess",
+  "value" : 4449,
+  "capabilities" : [ "StorageBuffer8BitAccess" ],
+  "extensions" : [ "SPV_KHR_8bit_storage" ],
+  "version" : "None"
+},
+{
+  "enumerant" : "StoragePushConstant8",
+  "value" : 4450,
+  "extensions" : [ "SPV_KHR_8bit_storage" ],
+  "version" : "None"
+},
 {
   "enumerant" : "Float16ImageAMD",
   "value" : 5008,
diff --git a/src/compiler/spirv/spirv.h b/src/compiler/spirv/spirv.h
index e0a0330ba63..4c90c936ce0 100644
--- a/src/compiler/spirv/spirv.h
+++ 

[Mesa-dev] [PATCH 3/9] i965: Support for 8-bit base types in helper functions

2018-07-08 Thread Jose Maria Casanova Crespo
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 11 ++-
 src/intel/compiler/brw_nir.c  |  4 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 02ac92e62f1..83ed9575f80 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -303,10 +303,13 @@ brw_reg_type_from_bit_size(const unsigned bit_size,
   default:
  unreachable("Invalid bit size");
   }
+   case BRW_REGISTER_TYPE_B:
case BRW_REGISTER_TYPE_W:
case BRW_REGISTER_TYPE_D:
case BRW_REGISTER_TYPE_Q:
   switch(bit_size) {
+  case 8:
+ return BRW_REGISTER_TYPE_B;
   case 16:
  return BRW_REGISTER_TYPE_W;
   case 32:
@@ -316,10 +319,13 @@ brw_reg_type_from_bit_size(const unsigned bit_size,
   default:
  unreachable("Invalid bit size");
   }
+   case BRW_REGISTER_TYPE_UB:
case BRW_REGISTER_TYPE_UW:
case BRW_REGISTER_TYPE_UD:
case BRW_REGISTER_TYPE_UQ:
   switch(bit_size) {
+  case 8:
+ return BRW_REGISTER_TYPE_UB;
   case 16:
  return BRW_REGISTER_TYPE_UW;
   case 32:
@@ -1666,7 +1672,10 @@ fs_visitor::get_nir_dest(const nir_dest )
 {
if (dest.is_ssa) {
   const brw_reg_type reg_type =
- brw_reg_type_from_bit_size(dest.ssa.bit_size, BRW_REGISTER_TYPE_F);
+ brw_reg_type_from_bit_size(dest.ssa.bit_size,
+dest.ssa.bit_size == 8 ?
+BRW_REGISTER_TYPE_D :
+BRW_REGISTER_TYPE_F);
   nir_ssa_values[dest.ssa.index] =
  bld.vgrf(reg_type, dest.ssa.num_components);
   return nir_ssa_values[dest.ssa.index];
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 74b39ad80a2..5990427b731 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -887,6 +887,10 @@ brw_type_for_nir_type(const struct gen_device_info 
*devinfo, nir_alu_type type)
   return BRW_REGISTER_TYPE_W;
case nir_type_uint16:
   return BRW_REGISTER_TYPE_UW;
+   case nir_type_int8:
+  return BRW_REGISTER_TYPE_B;
+   case nir_type_uint8:
+  return BRW_REGISTER_TYPE_UB;
default:
   unreachable("unknown type");
}
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/9] intel/compiler: grf127 can not be dest when src and dest overlap in send

2018-07-08 Thread Jose Maria Casanova Crespo
Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):

"r127 must not be used for return address when there is a src and
dest overlap in send instruction."

v2: Style fixes (Matt Turner)

Reviewed-by: Matt Turner 
---
 src/intel/compiler/brw_eu_validate.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/intel/compiler/brw_eu_validate.c 
b/src/intel/compiler/brw_eu_validate.c
index d3189d1ef5e..29d1fe46f71 100644
--- a/src/intel/compiler/brw_eu_validate.c
+++ b/src/intel/compiler/brw_eu_validate.c
@@ -261,6 +261,17 @@ send_restrictions(const struct gen_device_info *devinfo,
   brw_inst_src0_da_reg_nr(devinfo, inst) < 112,
   "send with EOT must use g112-g127");
   }
+
+  if (devinfo->gen >= 8) {
+ ERROR_IF(!dst_is_null(devinfo, inst) &&
+  (brw_inst_dst_da_reg_nr(devinfo, inst) +
+   brw_inst_rlen(devinfo, inst) > 127) &&
+  (brw_inst_src0_da_reg_nr(devinfo, inst) +
+   brw_inst_mlen(devinfo, inst) >
+   brw_inst_dst_da_reg_nr(devinfo, inst)),
+  "r127 must not be used for return address when there is "
+  "a src and dest overlap");
+  }
}
 
return error_msg;
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/9] i965/fs: Enable conversions to 8-bit integers

2018-07-08 Thread Jose Maria Casanova Crespo
Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 83ed9575f80..4155b2ed996 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -834,6 +834,8 @@ fs_visitor::nir_emit_alu(const fs_builder , 
nir_alu_instr *instr)
case nir_op_u2u16:
case nir_op_i2f16:
case nir_op_u2f16:
+   case nir_op_i2i8:
+   case nir_op_u2u8:
   inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
   break;
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shouldn't use grf127 for sends dest

2018-07-08 Thread Jose Maria Casanova Crespo
Since Gen8+ the Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction by creating a new grf127_send_hack_node
in the register allocator. This node has a fixed assignment to grf127.

For vgrfs that are used as the destination of send messages we create node
interferences with the grf127_send_hack_node, so the register allocator
will never assign those vgrfs a register range that includes grf127.

If dispatch_width > 8 we don't create these interferences, because all
instructions already have node interferences between sources and destination.
That is enough to avoid the r127 restriction.
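
In outline, the approach amounts to the following (a condensed sketch of
the hunk below; the names match the patch, and the SIMD8-only check and
graph setup details are omitted):

   /* Reserve one extra RA node and pin it to grf127. The node must be
    * counted in node_count before the interference graph is allocated.
    */
   int grf127_send_hack_node = node_count++;
   ra_set_node_reg(g, grf127_send_hack_node, 127);

   /* Any vgrf used as a send destination must never land on grf127. */
   foreach_block_and_inst(block, fs_inst, inst, cfg) {
      if (inst->is_send_from_grf() && inst->dst.file == VGRF)
         ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
   }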

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignment to grf127 and create interferences to send
message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
(Jose Maria Casanova)
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index ec8e116cb38..59e047483c0 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
int first_mrf_hack_node = node_count;
if (devinfo->gen >= 7)
   node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
+   int grf127_send_hack_node = node_count;
+   if (devinfo->gen >= 8 && dispatch_width == 8)
+  node_count ++;
struct ra_graph *g =
   ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, node_count);
 
@@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
spill_all)
   }
}
 
+   if (devinfo->gen >= 8 && dispatch_width == 8) {
+  /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
+   * subsection "EUISA Instructions", Send Message (page 990):
+   *
+   * "r127 must not be used for return address when there is a src and
+   * dest overlap in send instruction."
+   *
+   * We are avoiding using grf127 as part of the destination of send
+   * messages adding a node interference to the grf127_send_hack_node.
+   * This node has a fixed assignment to grf127.
+   *
+   * We don't apply it to SIMD16 because previous code avoids any register
+   * overlap between sources and destination.
+   */
+  ra_set_node_reg(g, grf127_send_hack_node, 127);
+  foreach_block_and_inst(block, fs_inst, inst, cfg) {
+ if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
+ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
+ }
+  }
+   }
+
/* Debug of register spilling: Go spill everything. */
if (unlikely(spill_all)) {
   int reg = choose_spill_reg(g);
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] anv: Enable VK_KHR_8bit_storage

2018-07-08 Thread Jose Maria Casanova Crespo
This series enables support for the VK_KHR_8bit_storage Vulkan extension
in anv. It enables all the capabilities available for this extension,
including StorageBuffer8BitAccess, UniformAndStorageBuffer8BitAccess
and StoragePushConstant8.

8-bit read operations from UBOs, SSBOs and push constants are already
supported by the backend, so this series only implements the pending
write support for SSBOs and the conversions to/from 8-bit integers.
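
As a reference for how these capabilities surface to applications, a minimal
sketch of querying them through VK_KHR_get_physical_device_properties2 (the
physical_device handle and the loading of the KHR entry point are assumed to
be already set up by the application; this is illustrative, not part of the
series):

   VkPhysicalDevice8BitStorageFeaturesKHR storage_8bit = {};
   storage_8bit.sType =
      VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_8BIT_STORAGE_FEATURES_KHR;

   VkPhysicalDeviceFeatures2KHR features2 = {};
   features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2_KHR;
   features2.pNext = &storage_8bit;

   vkGetPhysicalDeviceFeatures2KHR(physical_device, &features2);

   if (storage_8bit.storageBuffer8BitAccess) {
      /* 8-bit SSBO access is available; chain the same struct into
       * VkDeviceCreateInfo::pNext to enable it at device creation.
       */
   }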

This series is applied on top of the VK_KHR_create_renderpass2 series
already sent by Jason, which updates the Vulkan XML and headers to 1.1.80.

Only patch 2 (fix for the register allocator to avoid grf127 overlaps) and
patch 7 (SPIR-V headers update) have pending review.

This series is organized as follows:

* Patches 1-2 were already submitted, but patch 2 has pending review. They
  implement the restriction of "r127 must not be used for return address when
  there is a src and dest overlap in send instruction." We need to fix this to
  avoid failing 2 CTS tests of this new extension.

* Patch 3 enables 8-bit support in some helpers.

* Patch 4 enables conversions to 8-bit integers.

* Patches 5-6 implement 8-bit write operations for SSBO. They also relax the
  requirements of brw_eu_validate to allow raw MOVs of bytes although there is
  a difference between exec size and dest size.

* Patches 7-9 enable the Vulkan and SPIR-V 8bit_storage extensions. SPIR-V
  headers are updated.


With this series we pass all CTS VK_KHR_8bit_storage tests:

   dEQP-VK.spirv_assembly.instruction.*.8bit_storage.*


Jose Maria Casanova Crespo (9):
  intel/compiler: grf127 can not be dest when src and dest overlap in
send
  i965/fs: Register allocator shouldn't use grf127 for sends dest
  i965: Support for 8-bit base types in helper functions
  i965/fs: Enable conversions to 8-bit integers
  intel/compiler: relax brw_eu_validate for byte raw movs
  i965/fs: Enable store_ssbo for 8-bit types.
  spirv: Include headers and grammar for SPV_KHR_8bit_storage
  spirv/nir: Add support for SPV_KHR_8bit_storage
  anv: Enable SPV_KHR_8bit_storage and VK_KHR_8bit_storage

 src/compiler/shader_info.h |  1 +
 src/compiler/spirv/spirv.core.grammar.json | 44 ++
 src/compiler/spirv/spirv.h |  3 ++
 src/compiler/spirv/spirv_to_nir.c  |  5 +++
 src/intel/compiler/brw_eu_validate.c   | 19 --
 src/intel/compiler/brw_fs_nir.cpp  | 28 ++
 src/intel/compiler/brw_fs_reg_allocate.cpp | 25 
 src/intel/compiler/brw_nir.c   |  4 ++
 src/intel/vulkan/anv_device.c  | 11 ++
 src/intel/vulkan/anv_extensions.py |  1 +
 src/intel/vulkan/anv_pipeline.c|  1 +
 11 files changed, 124 insertions(+), 18 deletions(-)

Cc: Jason Ekstrand 
Cc: Iago Toral 

-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/9] intel/compiler: relax brw_eu_validate for byte raw movs

2018-07-08 Thread Jose Maria Casanova Crespo
When the destination is a BYTE type, allow raw MOVs even if the
destination stride is not an exact multiple of the destination type
and exec type sizes; for byte destinations the execution type is Word
and its size is 2.

Until now this restriction only allowed stride==2 destinations
for 8-bit types.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_eu_validate.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_eu_validate.c 
b/src/intel/compiler/brw_eu_validate.c
index 29d1fe46f71..a25010b225c 100644
--- a/src/intel/compiler/brw_eu_validate.c
+++ b/src/intel/compiler/brw_eu_validate.c
@@ -472,9 +472,11 @@ general_restrictions_based_on_operand_types(const struct 
gen_device_info *devinf
   dst_type_size = 8;
 
if (exec_type_size > dst_type_size) {
-  ERROR_IF(dst_stride * dst_type_size != exec_type_size,
-   "Destination stride must be equal to the ratio of the sizes of "
-   "the execution data type to the destination type");
+  if (!(dst_type_is_byte && inst_is_raw_move(devinfo, inst))) {
+ ERROR_IF(dst_stride * dst_type_size != exec_type_size,
+  "Destination stride must be equal to the ratio of the sizes "
+  "of the execution data type to the destination type");
+  }
 
   unsigned subreg = brw_inst_dst_da1_subreg_nr(devinfo, inst);
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv: finish the binding_table_pool on destroyDevice when use_softpin

2018-06-28 Thread Jose Maria Casanova Crespo
Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run in isolation they always passed.

createDevice and destroyDevice were called before and after every
test. Because the binding_table_pool was never finished, we reached the
maximum number of open file descriptors (ulimit -n) and, once that
happened, every call to createDevice failed with
VK_ERROR_INITIALIZATION_FAILED.

Fixes: c7db0ed4e94dce563d722e1b098684fbd7315d51
  ("anv: Use a separate pool for binding tables when soft pinning")


Cc: Scott D Phillips 
Cc: Jason Ekstrand 

---
 src/intel/vulkan/anv_device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index ea24a0ad03d..5266b269244 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -1782,6 +1782,7 @@ void anv_DestroyDevice(
 const VkAllocationCallbacks*pAllocator)
 {
ANV_FROM_HANDLE(anv_device, device, _device);
+   struct anv_physical_device *physical_device = 
>instance->physicalDevice;
 
if (!device)
   return;
@@ -1808,6 +1809,8 @@ void anv_DestroyDevice(
if (device->info.gen >= 10)
   anv_gem_close(device, device->hiz_clear_bo.gem_handle);
 
+   if (physical_device->use_softpin)
+  anv_state_pool_finish(>binding_table_pool);
anv_state_pool_finish(>surface_state_pool);
anv_state_pool_finish(>instruction_state_pool);
anv_state_pool_finish(>dynamic_state_pool);
-- 
2.18.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/14] intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function (v2)

2018-06-14 Thread Jose Maria Casanova Crespo
This new function takes care of shuffling/unshuffling components of a
particular bit-size into components of a different bit-size.

If the source type size is smaller than the destination type size the
operation needed is a component shuffle. The opposite case would be an
unshuffle.

Component units are measured in terms of the smaller type between
source and destination, as we are un/shuffling the smaller components
from/into the bigger one.

The operation allows skipping first_component components from
the source.

Shuffle MOVs are retyped to integer types, avoiding problems with
denorms and float types when source and destination bit-sizes differ.
This allows simplifying users of the shuffle functions that were dealing
with these retypes individually.

There is now a new restriction: source and destination cannot overlap
when calling this shuffle function. The following patches that migrate
to this new function take care individually of avoiding source
and destination overlaps.

v2: (Jason Ekstrand)
- Rewrite overlap asserts.
- Manage the type_sz(src.type) == type_sz(dst.type) case using MOVs
  from source to dest. This covers 64-bit to 64-bit operations on
  Gen7, which doesn't support Q registers.
- Explain that component units are based on the smallest type.

Cc: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 100 ++
 1 file changed, 100 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 166da0aa6d7..9c5afc9c46f 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5362,6 +5362,106 @@ shuffle_16bit_data_for_32bit_write(const fs_builder 
,
}
 }
 
+/*
+ * This helper takes a source register and un/shuffles it into the destination
+ * register.
+ *
+ * If source type size is smaller than destination type size the operation
+ * needed is a component shuffle. The opposite case would be an unshuffle. If
+ * source/destination type size is equal a shuffle is done that would be
+ * equivalent to a simple MOV.
+ *
+ * For example, if source is a 16-bit type and destination is 32-bit. A 3
+ * components .xyz 16-bit vector on SIMD8 would be.
+ *
+ *|x1|x2|x3|x4|x5|x6|x7|x8|y1|y2|y3|y4|y5|y6|y7|y8|
+ *|z1|z2|z3|z4|z5|z6|z7|z8|  |  |  |  |  |  |  |  |
+ *
+ * This helper will return the following 2 32-bit components with the 16-bit
+ * values shuffled:
+ *
+ *|x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8|
+ *|z1   |z2   |z3   |z4   |z5   |z6   |z7   |z8   |
+ *
+ * For unshuffle, the example would be the opposite, a 64-bit type source
+ * and a 32-bit destination. A 2 component .xy 64-bit vector on SIMD8
+ * would be:
+ *
+ *| x1l   x1h | x2l   x2h | x3l   x3h | x4l   x4h |
+ *| x5l   x5h | x6l   x6h | x7l   x7h | x8l   x8h |
+ *| y1l   y1h | y2l   y2h | y3l   y3h | y4l   y4h |
+ *| y5l   y5h | y6l   y6h | y7l   y7h | y8l   y8h |
+ *
+ * The returned result would be the following 4 32-bit components unshuffled:
+ *
+ *| x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l |
+ *| x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h |
+ *| y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l |
+ *| y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h |
+ *
+ * - Source and destination register must not be overlapped.
+ * - components units are measured in terms of the smaller type between
+ *   source and destination because we are un/shuffling the smaller
+ *   components from/into the bigger ones.
+ * - first_component parameter allows skipping source components.
+ */
+void
+shuffle_src_to_dst(const fs_builder ,
+   const fs_reg ,
+   const fs_reg ,
+   uint32_t first_component,
+   uint32_t components)
+{
+   if (type_sz(src.type) == type_sz(dst.type)) {
+  assert(!regions_overlap(dst,
+ type_sz(dst.type) * bld.dispatch_width() * components,
+ offset(src, bld, first_component),
+ type_sz(src.type) * bld.dispatch_width() * components));
+  for (unsigned i = 0; i < components; i++) {
+ bld.MOV(retype(offset(dst, bld, i), src.type),
+ offset(src, bld, i + first_component));
+  }
+   } else if (type_sz(src.type) < type_sz(dst.type)) {
+  /* Source is shuffled into destination */
+  unsigned size_ratio = type_sz(dst.type) / type_sz(src.type);
+  assert(!regions_overlap(dst,
+ type_sz(dst.type) * bld.dispatch_width() *
+ DIV_ROUND_UP(components, size_ratio),
+ offset(src, bld, first_component),
+ type_sz(src.type) * bld.dispatch_width() * components));
+
+  brw_reg_type shuffle_type =
+ brw_reg_type_from_bit_size(8 * type_sz(src.type),
+BRW_REGISTER_TYPE_D);
+  for (unsigned i = 0; i < components; i++) {
+ fs_reg shuffle_component_i =
+subscript(offset(dst, bld, i / size_ratio),
+  

[Mesa-dev] [PATCH] intel/fs: use uint type for per_slot_offset at GS

2018-06-12 Thread Jose Maria Casanova Crespo
This helps us to compact the original instruction:

mul(8)  g3<1>D  g6<8,8,1>UD  0x0006UD { align1 1Q };

So now we emit:

mul(8)  g3<1>UD g6<8,8,1>UD  0x0006UD { align1 1Q compacted };
---
 src/intel/compiler/brw_fs_visitor.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_visitor.cpp 
b/src/intel/compiler/brw_fs_visitor.cpp
index a24808eac69..7159d78a86d 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/intel/compiler/brw_fs_visitor.cpp
@@ -597,7 +597,7 @@ fs_visitor::emit_urb_writes(const fs_reg _vertex_count)
  per_slot_offsets = brw_imm_ud(output_vertex_size_owords *
gs_vertex_count.ud);
   } else {
- per_slot_offsets = vgrf(glsl_type::int_type);
+ per_slot_offsets = vgrf(glsl_type::uint_type);
  bld.MUL(per_slot_offsets, gs_vertex_count,
  brw_imm_ud(output_vertex_size_owords));
   }
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 13/14] intel/compiler: use new shuffle_32bit_write for all 64-bit storage writes

2018-06-09 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_fs_nir.cpp | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 2521f3c001b..833fad4247a 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2839,8 +2839,7 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
 * for that.
 */
unsigned channel = iter * 2 + i;
-   fs_reg dest = shuffle_64bit_data_for_32bit_write(bld,
-  offset(value, bld, channel), 1);
+   fs_reg dest = shuffle_for_32bit_write(bld, value, channel, 1);
 
srcs[header_regs + (i + first_component) * 2] = dest;
srcs[header_regs + (i + first_component) * 2 + 1] =
@@ -3694,8 +3693,8 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder ,
   unsigned type_size = 4;
   if (nir_src_bit_size(instr->src[0]) == 64) {
  type_size = 8;
- val_reg = shuffle_64bit_data_for_32bit_write(bld,
-val_reg, instr->num_components);
+ val_reg = shuffle_for_32bit_write(bld, val_reg, 0,
+   instr->num_components);
   }
 
   unsigned type_slots = type_size / 4;
@@ -4236,8 +4235,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  * iteration handle the rest.
  */
 num_components = MIN2(2, num_components);
-write_src = shuffle_64bit_data_for_32bit_write(bld, write_src,
-   num_components);
+write_src = shuffle_for_32bit_write(bld, write_src, 0,
+num_components);
  } else if (type_size < 4) {
 assert(type_size == 2);
 /* For 16-bit types we pack two consecutive values into a 32-bit
@@ -4333,7 +4332,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   unsigned num_components = instr->num_components;
   unsigned first_component = nir_intrinsic_component(instr);
   if (nir_src_bit_size(instr->src[0]) == 64) {
- src = shuffle_64bit_data_for_32bit_write(bld, src, num_components);
+ src = shuffle_for_32bit_write(bld, src, 0, num_components);
  num_components *= 2;
   }
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 14/14] intel/compiler: shuffle_64bit_data_for_32bit_write is not used anymore

2018-06-09 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_fs.h   |  4 
 src/intel/compiler/brw_fs_nir.cpp | 32 ---
 2 files changed, 36 deletions(-)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 1f86f17ccbb..17b1368d522 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -499,10 +499,6 @@ private:
void *mem_ctx;
 };
 
-fs_reg shuffle_64bit_data_for_32bit_write(const brw::fs_builder ,
-  const fs_reg ,
-  uint32_t components);
-
 void shuffle_from_32bit_read(const brw::fs_builder ,
  const fs_reg ,
  const fs_reg ,
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 833fad4247a..f68fe5f1d1a 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5187,38 +5187,6 @@ fs_visitor::nir_emit_jump(const fs_builder , 
nir_jump_instr *instr)
}
 }
 
-/**
- * This helper does the inverse operation of
- * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
- *
- * We need to do this when we are going to use untyped write messsages that
- * operate with 32-bit components in order to arrange our 64-bit data to be
- * in the expected layout.
- *
- * Notice that callers of this function, unlike in the case of the inverse
- * operation, would typically need to call this with dst and src being
- * different registers, since they would otherwise corrupt the original
- * 64-bit data they are about to write. Because of this the function checks
- * that the src and dst regions involved in the operation do not overlap.
- */
-fs_reg
-shuffle_64bit_data_for_32bit_write(const fs_builder ,
-   const fs_reg ,
-   uint32_t components)
-{
-   assert(type_sz(src.type) == 8);
-
-   fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_D, 2 * components);
-
-   for (unsigned i = 0; i < components; i++) {
-  const fs_reg component_i = offset(src, bld, i);
-  bld.MOV(offset(dst, bld, 2 * i), subscript(component_i, dst.type, 0));
-  bld.MOV(offset(dst, bld, 2 * i + 1), subscript(component_i, dst.type, 
1));
-   }
-
-   return dst;
-}
-
 /*
  * This helper takes a source register and un/shuffles it into the destination
  * register.
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/14] intel/compiler: Use shuffle_from_32bit_read at VS load_input

2018-06-09 Thread Jose Maria Casanova Crespo
shuffle_from_32bit_read handles 32-bit reads to a 32-bit destination
in the same way as the previous loop, so now we just call the new
function for all bit-sizes, which also simplifies the 64-bit load_input.
---
 src/intel/compiler/brw_fs_nir.cpp | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 6abc7c0174d..fedf3bf5a83 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2483,16 +2483,8 @@ fs_visitor::nir_emit_vs_intrinsic(const fs_builder ,
   if (type_sz(dest.type) == 8)
  first_component /= 2;
 
-  for (unsigned j = 0; j < num_components; j++) {
- bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
-  }
-
-  if (type_sz(dest.type) == 8) {
- shuffle_32bit_load_result_to_64bit_data(bld,
- dest,
- retype(dest, 
BRW_REGISTER_TYPE_F),
- instr->num_components);
-  }
+  shuffle_from_32bit_read(bld, dest, retype(src, BRW_REGISTER_TYPE_D),
+  first_component, num_components);
   break;
}
 
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/14] intel/compiler: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES

2018-06-09 Thread Jose Maria Casanova Crespo
Previously, the shuffle function had a source/destination overlap that
needed to be avoided in order to use shuffle_from_32bit_read. We can now
use the destination of the removed MOVs as the shuffle destination.

This change also avoids the internal MOVs the previous shuffle did
to deal with possible overlaps.
---
 src/intel/compiler/brw_fs_nir.cpp | 22 --
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index fedf3bf5a83..11b707e57a8 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2662,13 +2662,10 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder 
,
   * or SSBOs.
   */
  if (type_sz(dst.type) == 8) {
-shuffle_32bit_load_result_to_64bit_data(
-   bld, dst, retype(dst, BRW_REGISTER_TYPE_F), num_components);
-
-for (unsigned c = 0; c < num_components; c++) {
-   bld.MOV(offset(orig_dst, bld, iter * 2 + c),
-   offset(dst, bld, c));
-}
+shuffle_from_32bit_read(bld,
+offset(orig_dst, bld, iter * 2),
+retype(dst, BRW_REGISTER_TYPE_D),
+0, num_components);
  }
 
  /* Copy the temporary to the destination to deal with writemasking.
@@ -3011,13 +3008,10 @@ fs_visitor::nir_emit_tes_intrinsic(const fs_builder 
,
  * or SSBOs.
  */
 if (type_sz(dest.type) == 8) {
-   shuffle_32bit_load_result_to_64bit_data(
-  bld, dest, retype(dest, BRW_REGISTER_TYPE_F), 
num_components);
-
-   for (unsigned c = 0; c < num_components; c++) {
-  bld.MOV(offset(orig_dest, bld, iter * 2 + c),
-  offset(dest, bld, c));
-   }
+   shuffle_from_32bit_read(bld,
+   offset(orig_dest, bld, iter * 2),
+   retype(dest, BRW_REGISTER_TYPE_D),
+   0, num_components);
 }
 
 /* If we are loading double data and we need a second read message
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/14] intel/compiler: shuffle_32bit_load_result_to_64bit_data is not used anymore

2018-06-09 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_fs.h   |  5 ---
 src/intel/compiler/brw_fs_nir.cpp | 53 ---
 2 files changed, 58 deletions(-)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index d72164ae0b6..1f86f17ccbb 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -499,11 +499,6 @@ private:
void *mem_ctx;
 };
 
-void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder ,
- const fs_reg ,
- const fs_reg ,
- uint32_t components);
-
 fs_reg shuffle_64bit_data_for_32bit_write(const brw::fs_builder ,
   const fs_reg ,
   uint32_t components);
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 7e0ef2f34a9..2521f3c001b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5188,59 +5188,6 @@ fs_visitor::nir_emit_jump(const fs_builder , 
nir_jump_instr *instr)
}
 }
 
-/**
- * This helper takes the result of a load operation that reads 32-bit elements
- * in this format:
- *
- * x x x x x x x x
- * y y y y y y y y
- * z z z z z z z z
- * w w w w w w w w
- *
- * and shuffles the data to get this:
- *
- * x y x y x y x y
- * x y x y x y x y
- * z w z w z w z w
- * z w z w z w z w
- *
- * Which is exactly what we want if the load is reading 64-bit components
- * like doubles, where x represents the low 32-bit of the x double component
- * and y represents the high 32-bit of the x double component (likewise with
- * z and w for double component y). The parameter @components represents
- * the number of 64-bit components present in @src. This would typically be
- * 2 at most, since we can only fit 2 double elements in the result of a
- * vec4 load.
- *
- * Notice that @dst and @src can be the same register.
- */
-void
-shuffle_32bit_load_result_to_64bit_data(const fs_builder ,
-const fs_reg ,
-const fs_reg ,
-uint32_t components)
-{
-   assert(type_sz(src.type) == 4);
-   assert(type_sz(dst.type) == 8);
-
-   /* A temporary that we will use to shuffle the 32-bit data of each
-* component in the vector into valid 64-bit data. We can't write directly
-* to dst because dst can be (and would usually be) the same as src
-* and in that case the first MOV in the loop below would overwrite the
-* data read in the second MOV.
-*/
-   fs_reg tmp = bld.vgrf(dst.type);
-
-   for (unsigned i = 0; i < components; i++) {
-  const fs_reg component_i = offset(src, bld, 2 * i);
-
-  bld.MOV(subscript(tmp, src.type, 0), component_i);
-  bld.MOV(subscript(tmp, src.type, 1), offset(component_i, bld, 1));
-
-  bld.MOV(offset(dst, bld, i), tmp);
-   }
-}
-
 /**
  * This helper does the inverse operation of
  * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 11/14] intel/compiler: use shuffle_from_32bit_read for 64-bit FS load_input

2018-06-09 Thread Jose Maria Casanova Crespo
The previous use of shuffle_32bit_load_result_to_64bit_data
had a source/destination overlap for 64-bit. Now a temporary destination
is used for the 64-bit cases so we can use shuffle_from_32bit_read, which
doesn't handle src/dst overlaps.
---
 src/intel/compiler/brw_fs_nir.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 11b707e57a8..7e0ef2f34a9 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3350,6 +3350,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
   unsigned base = nir_intrinsic_base(instr);
   unsigned comp = nir_intrinsic_component(instr);
   unsigned num_components = instr->num_components;
+  fs_reg orig_dest = dest;
   enum brw_reg_type type = dest.type;
 
   /* Special case fields in the VUE header */
@@ -3365,6 +3366,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
   */
  type = BRW_REGISTER_TYPE_F;
  num_components *= 2;
+ dest = bld.vgrf(type, num_components);
   }
 
   for (unsigned int i = 0; i < num_components; i++) {
@@ -3373,10 +3375,8 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
   }
 
   if (nir_dest_bit_size(instr->dest) == 64) {
- shuffle_32bit_load_result_to_64bit_data(bld,
- dest,
- retype(dest, type),
- instr->num_components);
+ shuffle_from_32bit_read(bld, orig_dest, dest, 0,
+ instr->num_components);
   }
   break;
}
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/14] intel/compiler: Use shuffle_from_32bit_read to read 16-bit SSBO

2018-06-09 Thread Jose Maria Casanova Crespo
Using shuffle_from_32bit_read instead of the 16-bit shuffle functions
avoids the need for a retype. At the same time, the new function is
ready for 8-bit type SSBO reads.
---
 src/intel/compiler/brw_fs_nir.cpp | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 1f684149fd5..ef7895262b8 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2372,10 +2372,8 @@ do_untyped_vector_read(const fs_builder ,
   1 /* dims */,
   num_components_32bit,
   BRW_PREDICATE_NONE);
- shuffle_32bit_load_result_to_16bit_data(bld,
-   retype(dest, BRW_REGISTER_TYPE_W),
-   retype(read_result, BRW_REGISTER_TYPE_D),
-   first_component, num_components);
+ shuffle_from_32bit_read(bld, dest, read_result, first_component,
+ num_components);
   } else {
  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
  for (unsigned i = 0; i < num_components; i++) {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 06/14] intel/compiler: remove old 16-bit shuffle/unshuffle functions

2018-06-09 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_fs.h   | 11 --
 src/intel/compiler/brw_fs_nir.cpp | 62 ---
 2 files changed, 73 deletions(-)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 779170ecc95..d72164ae0b6 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -508,17 +508,6 @@ fs_reg shuffle_64bit_data_for_32bit_write(const 
brw::fs_builder ,
   const fs_reg ,
   uint32_t components);
 
-void shuffle_32bit_load_result_to_16bit_data(const brw::fs_builder ,
- const fs_reg ,
- const fs_reg ,
- uint32_t first_component,
- uint32_t components);
-
-void shuffle_16bit_data_for_32bit_write(const brw::fs_builder ,
-const fs_reg ,
-const fs_reg ,
-uint32_t components);
-
 void shuffle_from_32bit_read(const brw::fs_builder ,
  const fs_reg ,
  const fs_reg ,
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index a54935f7049..7e738ade82e 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5263,40 +5263,6 @@ shuffle_32bit_load_result_to_64bit_data(const fs_builder 
,
}
 }
 
-void
-shuffle_32bit_load_result_to_16bit_data(const fs_builder ,
-const fs_reg ,
-const fs_reg ,
-uint32_t first_component,
-uint32_t components)
-{
-   assert(type_sz(src.type) == 4);
-   assert(type_sz(dst.type) == 2);
-
-   /* A temporary is used to un-shuffle the 32-bit data of each component in
-* into a valid 16-bit vector. We can't write directly to dst because it
-* can be the same register as src and in that case the first MOV in the
-* loop below would overwrite the data read in the second MOV.
-*/
-   fs_reg tmp = retype(bld.vgrf(src.type), dst.type);
-
-   for (unsigned i = 0; i < components; i++) {
-  const fs_reg component_i =
- subscript(offset(src, bld, (first_component + i) / 2), dst.type,
-   (first_component + i) % 2);
-
-  bld.MOV(offset(tmp, bld, i % 2), component_i);
-
-  if (i % 2) {
- bld.MOV(offset(dst, bld, i -1), offset(tmp, bld, 0));
- bld.MOV(offset(dst, bld, i), offset(tmp, bld, 1));
-  }
-   }
-   if (components % 2) {
-  bld.MOV(offset(dst, bld, components - 1), tmp);
-   }
-}
-
 /**
  * This helper does the inverse operation of
  * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
@@ -5329,34 +5295,6 @@ shuffle_64bit_data_for_32bit_write(const fs_builder ,
return dst;
 }
 
-void
-shuffle_16bit_data_for_32bit_write(const fs_builder ,
-   const fs_reg ,
-   const fs_reg ,
-   uint32_t components)
-{
-   assert(type_sz(src.type) == 2);
-   assert(type_sz(dst.type) == 4);
-
-   /* A temporary is used to shuffle the 16-bit data of each component in the
-* 32-bit data vector. We can't write directly to dst because it can be the
-* same register as src and in that case the first MOV in the loop below
-* would overwrite the data read in the second MOV.
-*/
-   fs_reg tmp = bld.vgrf(dst.type);
-
-   for (unsigned i = 0; i < components; i++) {
-  const fs_reg component_i = offset(src, bld, i);
-  bld.MOV(subscript(tmp, src.type, i % 2), component_i);
-  if (i % 2) {
- bld.MOV(offset(dst, bld, i / 2), tmp);
-  }
-   }
-   if (components % 2) {
-  bld.MOV(offset(dst, bld, components / 2), tmp);
-   }
-}
-
 /*
  * This helper takes a source register and un/shuffles it into the destination
  * register.
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/14] intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function

2018-06-09 Thread Jose Maria Casanova Crespo
This new function takes care of shuffling/unshuffling components of a
particular bit-size into components of a different bit-size.

If the source type size is smaller than the destination type size the
operation needed is a component shuffle. The opposite case would be an
unshuffle.

The operation allows skipping first_component components from
the source.

Shuffle MOVs are retyped to integer types, avoiding problems with denorms
and float types. This allows simplifying users of the shuffle functions
that were dealing with these retypes individually.

There is now a new restriction: source and destination cannot overlap
when calling this shuffle function. The following patches that migrate
to this new function take care individually of avoiding source
and destination overlaps.
---
 src/intel/compiler/brw_fs_nir.cpp | 92 +++
 1 file changed, 92 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 166da0aa6d7..1a9d3c41d1d 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5362,6 +5362,98 @@ shuffle_16bit_data_for_32bit_write(const fs_builder ,
}
 }
 
+/*
+ * This helper takes a source register and un/shuffles it into the destination
+ * register.
+ *
+ * If source type size is smaller than destination type size the operation
+ * needed is a component shuffle. The opposite case would be an unshuffle. If
+ * source/destination type size is equal a shuffle is done that would be
+ * equivalent to a simple MOV.
+ *
+ * For example, if source is a 16-bit type and destination is 32-bit. A 3
+ * components .xyz 16-bit vector on SIMD8 would be.
+ *
+ *|x1|x2|x3|x4|x5|x6|x7|x8|y1|y2|y3|y4|y5|y6|y7|y8|
+ *|z1|z2|z3|z4|z5|z6|z7|z8|  |  |  |  |  |  |  |  |
+ *
+ * This helper will return the following 2 32-bit components with the 16-bit
+ * values shuffled:
+ *
+ *|x1 y1|x2 y2|x3 y3|x4 y4|x5 y5|x6 y6|x7 y7|x8 y8|
+ *|z1   |z2   |z3   |z4   |z5   |z6   |z7   |z8   |
+ *
+ * For unshuffle, the example would be the opposite, a 64-bit type source
+ * and a 32-bit destination. A 2 component .xy 64-bit vector on SIMD8
+ * would be:
+ *
+ *| x1l   x1h | x2l   x2h | x3l   x3h | x4l   x4h |
+ *| x5l   x5h | x6l   x6h | x7l   x7h | x8l   x8h |
+ *| y1l   y1h | y2l   y2h | y3l   y3h | y4l   y4h |
+ *| y5l   y5h | y6l   y6h | y7l   y7h | y8l   y8h |
+ *
+ * The returned result would be the following 4 32-bit components unshuffled:
+ *
+ *| x1l | x2l | x3l | x4l | x5l | x6l | x7l | x8l |
+ *| x1h | x2h | x3h | x4h | x5h | x6h | x7h | x8h |
+ *| y1l | y2l | y3l | y4l | y5l | y6l | y7l | y8l |
+ *| y1h | y2h | y3h | y4h | y5h | y6h | y7h | y8h |
+ *
+ * - Source and destination register must not be overlapped.
+ * - first_component parameter allows skipping source components.
+ */
+void
+shuffle_src_to_dst(const fs_builder ,
+   const fs_reg ,
+   const fs_reg ,
+   uint32_t first_component,
+   uint32_t components)
+{
+   if (type_sz(src.type) <= type_sz(dst.type)) {
+  /* Source is shuffled into destination */
+  unsigned size_ratio = type_sz(dst.type) / type_sz(src.type);
+#ifndef NDEBUG
+  boolean src_dst_overlap = regions_overlap(dst,
+ type_sz(dst.type) * bld.dispatch_width() * components,
+ offset(src, bld, first_component * size_ratio),
+ type_sz(src.type) * bld.dispatch_width() * components * size_ratio);
+#endif
+  assert(!src_dst_overlap);
+
+  brw_reg_type shuffle_type =
+ brw_reg_type_from_bit_size(8 * type_sz(src.type),
+BRW_REGISTER_TYPE_D);
+  for (unsigned i = 0; i < components; i++) {
+ fs_reg shuffle_component_i =
+subscript(offset(dst, bld, i / size_ratio),
+  shuffle_type, i % size_ratio);
+ bld.MOV(shuffle_component_i,
+ retype(offset(src, bld, i + first_component), shuffle_type));
+  }
+   } else {
+  /* Source is unshuffled into destination */
+  unsigned size_ratio = type_sz(src.type) / type_sz(dst.type);
+#ifndef NDEBUG
+  boolean src_dst_overlap = regions_overlap(dst,
+ type_sz(dst.type) * bld.dispatch_width() * components,
+ offset(src, bld, first_component / size_ratio),
+ type_sz(src.type) * bld.dispatch_width() *
+ DIV_ROUND_UP(components + first_component % size_ratio, size_ratio));
+#endif
+  assert(!src_dst_overlap);
+  brw_reg_type shuffle_type =
+ brw_reg_type_from_bit_size(8 * type_sz(dst.type),
+BRW_REGISTER_TYPE_D);
+  for (unsigned i = 0; i < components; i++) {
+ fs_reg shuffle_component_i =
+subscript(offset(src, bld, (first_component + i) / size_ratio),
+  shuffle_type, (first_component + i) % size_ratio);
+ bld.MOV(retype(offset(dst, bld, i), 

[Mesa-dev] [PATCH 03/14] intel/compiler: use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD

2018-06-09 Thread Jose Maria Casanova Crespo
shuffle_from_32bit_read can manage the shuffle/unshuffle needed
for the different 8/16/32/64 bit-sizes at VARYING_PULL_CONSTANT_LOAD.
The first_component parameter is used to get the specific component.

With the previous 16-bit shuffle, the shuffle operation was
generating unneeded MOVs whose results were never used. This
behaviour passed unnoticed on SIMD16 because the dead_code_eliminate
pass removed the generated instructions, but on SIMD8 they couldn't be
removed because they are partial writes.
---
 src/intel/compiler/brw_fs.cpp | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index d67c0a41922..410768ef927 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -191,21 +191,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
 vec4_result, surf_index, vec4_offset);
inst->size_written = 4 * vec4_result.component_size(inst->exec_size);
 
-   fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
-   switch (type_sz(dst.type)) {
-   case 2:
-  shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 0, 1);
-  bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
-  break;
-   case 4:
-  bld.MOV(dst, retype(dw, dst.type));
-  break;
-   case 8:
-  shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
-  break;
-   default:
-  unreachable("Unsupported bit_size");
-   }
+   shuffle_from_32bit_read(bld, dst, vec4_result,
+   (const_offset & 0xf) / type_sz(dst.type), 1);
 }
 
 /**
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/14] intel/compiler: Use shuffle_for_32bit_write for 16-bit store_ssbo

2018-06-09 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_fs_nir.cpp | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index ef7895262b8..a54935f7049 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4297,11 +4297,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  * aligned. Shuffling only one component would be the same as
  * striding it.
  */
-fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_D,
-  DIV_ROUND_UP(num_components, 2));
-shuffle_16bit_data_for_32bit_write(bld, tmp, write_src,
-   num_components);
-write_src = tmp;
+write_src = shuffle_for_32bit_write(bld, write_src, 0,
+num_components);
  }
 
  fs_reg offset_reg;
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/14] intel/compiler: shuffle_from_32bit_read for 64-bit do_untyped_vector_read

2018-06-09 Thread Jose Maria Casanova Crespo
do_untyped_vector_read is used at load_ssbo and load_shared.

The previous MOVs are removed because shuffle_from_32bit_read
can handle storing the shuffle results in the expected destination
just by using the proper offset.
---
 src/intel/compiler/brw_fs_nir.cpp | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 7e738ade82e..780a9e228de 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2434,16 +2434,8 @@ do_untyped_vector_read(const fs_builder ,
 BRW_PREDICATE_NONE);
 
  /* Shuffle the 32-bit load result into valid 64-bit data */
- const fs_reg packed_result = bld.vgrf(dest.type, iter_components);
- shuffle_32bit_load_result_to_64bit_data(
-bld, packed_result, read_result, iter_components);
-
- /* Move each component to its destination */
- read_result = retype(read_result, BRW_REGISTER_TYPE_DF);
- for (int c = 0; c < iter_components; c++) {
-bld.MOV(offset(dest, bld, it * 2 + c),
-offset(packed_result, bld, c));
- }
+ shuffle_from_32bit_read(bld, offset(dest, bld, it * 2),
+ read_result, 0, iter_components);
 
  bld.ADD(read_offset, read_offset, brw_imm_ud(16));
   }
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/14] intel/compiler: enable shuffle_from_32bit_read at 64-bit gs_input_load

2018-06-09 Thread Jose Maria Casanova Crespo
This implementation avoids two unneeded MOVs for each 64-bit
component. One was done in the old shuffle to avoid cases of
src/dst overlap, but that is not the case here. The other MOV
was already being done by the shuffle itself.

Copy propagation wasn't able to remove them because the shuffle
destination values are defined with partial writes, as they
have stride == 2.
---
 src/intel/compiler/brw_fs_nir.cpp | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 780a9e228de..6abc7c0174d 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2305,11 +2305,11 @@ fs_visitor::emit_gs_input_load(const fs_reg ,
   }
 
   if (type_sz(dst.type) == 8) {
- shuffle_32bit_load_result_to_64bit_data(
-bld, tmp_dst, retype(tmp_dst, BRW_REGISTER_TYPE_F), 
num_components);
-
- for (unsigned c = 0; c < num_components; c++)
-bld.MOV(offset(dst, bld, iter * 2 + c), offset(tmp_dst, bld, c));
+ shuffle_from_32bit_read(bld,
+ offset(dst, bld, iter * 2),
+ retype(tmp_dst, BRW_REGISTER_TYPE_D),
+ 0,
+ num_components);
   }
 
   if (num_iterations > 1) {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 00/14] intel/compiler: unshuffle/shuffle functions refactoring

2018-06-09 Thread Jose Maria Casanova Crespo
64-bit types support required un/shuffle operations to deal with load/store
operations that use 32-bit components. The introduction of VK_KHR_16bit_storage
support also needed new dedicated shuffle/unshuffle functions.

This series generalizes the different shuffle/unshuffle operations in a new
helper shuffle_src_to_dst for the 64/32/16/8-bit cases and creates two specific
functions, shuffle_from_32bit_read and shuffle_for_32bit_write, to be used
after and before load/store operations.

I revisited all shuffle/unshuffle uses to avoid the emission of
unneeded instructions caused by previously allowing source/destination
overlaps in shuffles. The series refactors those cases to avoid them.

As most of the changes affect 64-bit uses, I used the piglit test shaders to
check the impact of the series with shader-db; the result is mainly positive.

Skylake
total instructions in shared programs: 1688394 -> 1661420 (-1.60%)
instructions in affected programs: 370480 -> 343506 (-7.28%)
helped: 1048
HURT: 0

total cycles in shared programs: 52697457 -> 52527275 (-0.32%)
cycles in affected programs: 1783581 -> 1613399 (-9.54%)
helped: 1004
HURT: 41

Cc: Jason Ekstrand 
Cc: Samuel Iglesias 
Cc: Iago Toral 

Jose Maria Casanova Crespo (14):
  intel/compiler: general 8/16/32/64-bit shuffle_src_to_dst function
  intel/compiler: new shuffle_for_32bit_write and shuffle_from_32bit_read
  intel/compiler: use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD
  intel/compiler: Use shuffle_from_32bit_read to read 16-bit SSBO
  intel/compiler: Use shuffle_for_32bit_write for 16-bit store_ssbo
  intel/compiler: remove old 16-bit shuffle/unshuffle functions
  intel/compiler: shuffle_from_32bit_read for 64-bit do_untyped_vector_read
  intel/compiler: enable shuffle_from_32bit_read at 64-bit gs_input_load
  intel/compiler: Use shuffle_from_32bit_read at VS load_input
  intel/compiler: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES
  intel/compiler: use shuffle_from_32bit_read for 64-bit FS load_input
  intel/compiler: shuffle_32bit_load_result_to_64bit_data is not used anymore
  intel/compiler: use new shuffle_32bit_write for all 64-bit storage writes
  intel/compiler: shuffle_64bit_data_for_32bit_write is not used anymore

 src/intel/compiler/brw_fs.cpp |  17 +-
 src/intel/compiler/brw_fs.h   |  29 +--
 src/intel/compiler/brw_fs_nir.cpp | 315 +-
 3 files changed, 147 insertions(+), 214 deletions(-)

-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/14] intel/compiler: new shuffle_for_32bit_write and shuffle_from_32bit_read

2018-06-09 Thread Jose Maria Casanova Crespo
These new shuffle functions deal with the shuffle/unshuffle operations
needed for read/write operations using 32-bit components when the
read/written components have a different bit-size (8, 16, 64-bits).
Shuffle from 32-bit to 32-bit becomes a simple MOV.

As the new function shuffle_src_to_dst takes care of doing a shuffle or an
unshuffle based on the different type_sz of source and destination, these
generic functions work with any source/destination, assuming that writes
use a 32-bit destination or reads use a 32-bit source.

To enable these new functions it is required that there is no
source/destination overlap in the case of shuffle_from_32bit_read.
That never happens with shuffle_for_32bit_write, as it allocates a new
destination register, just as shuffle_64bit_data_for_32bit_write did.
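
The following is a minimal standalone sketch (not the Mesa helpers themselves)
of the index arithmetic these wrappers rely on, assuming the shuffle_src_to_dst
mapping introduced in patch 01: destination component i of a smaller-than-32-bit
read comes from dword (first_component + i) / size_ratio, sub-slot
(first_component + i) % size_ratio of the 32-bit source.

    #include <cstdio>

    /* Model a read of "components" elements of size dst_sz bytes (2 for
     * 16-bit, 1 for 8-bit) starting at element "first_component" of
     * 32-bit read data. */
    static void model_from_32bit_read(unsigned dst_sz, unsigned first_component,
                                      unsigned components)
    {
       const unsigned size_ratio = 4 / dst_sz;   /* src dword / dst element */
       for (unsigned i = 0; i < components; i++)
          std::printf("dst[%u] <- src dword %u, %u-bit slot %u\n",
                      i, (first_component + i) / size_ratio,
                      dst_sz * 8, (first_component + i) % size_ratio);
    }

    int main()
    {
       /* Three 16-bit components starting at the second one: dwords 0-1. */
       model_from_32bit_read(2, 1, 3);
       /* A 64-bit read is the same copy after doubling both parameters. */
       return 0;
    }

shuffle_for_32bit_write is the mirror image, writing into a freshly allocated
32-bit destination register.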
---
 src/intel/compiler/brw_fs.h   | 11 +
 src/intel/compiler/brw_fs_nir.cpp | 38 +++
 2 files changed, 49 insertions(+)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index faf51568637..779170ecc95 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -519,6 +519,17 @@ void shuffle_16bit_data_for_32bit_write(const 
brw::fs_builder ,
 const fs_reg ,
 uint32_t components);
 
+void shuffle_from_32bit_read(const brw::fs_builder &bld,
+ const fs_reg &dst,
+ const fs_reg &src,
+ uint32_t first_component,
+ uint32_t components);
+
+fs_reg shuffle_for_32bit_write(const brw::fs_builder &bld,
+   const fs_reg &src,
+   uint32_t first_component,
+   uint32_t components);
+
 fs_reg setup_imm_df(const brw::fs_builder ,
 double v);
 
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 1a9d3c41d1d..1f684149fd5 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -5454,6 +5454,44 @@ shuffle_src_to_dst(const fs_builder ,
}
 }
 
+void
+shuffle_from_32bit_read(const fs_builder &bld,
+const fs_reg &dst,
+const fs_reg &src,
+uint32_t first_component,
+uint32_t components)
+{
+   assert(type_sz(src.type) == 4);
+
+   if (type_sz(dst.type) > 4) {
+  assert(type_sz(dst.type) == 8);
+  first_component *= 2;
+  components *= 2;
+   }
+
+   shuffle_src_to_dst(bld, dst, src, first_component, components);
+}
+
+fs_reg
+shuffle_for_32bit_write(const fs_builder &bld,
+const fs_reg &src,
+uint32_t first_component,
+uint32_t components)
+{
+   fs_reg dst = bld.vgrf(BRW_REGISTER_TYPE_D,
+ DIV_ROUND_UP (components * type_sz(src.type), 4));
+
+   if (type_sz(src.type) > 4) {
+  assert(type_sz(src.type) == 8);
+  first_component *= 2;
+  components *= 2;
+   }
+
+   shuffle_src_to_dst(bld, dst, src, first_component, components);
+
+   return dst;
+}
+
 fs_reg
 setup_imm_df(const fs_builder , double v)
 {
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 7.5/18] intel/compiler: support negate and abs of half float immediates

2018-05-02 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_shader.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_shader.cpp 
b/src/intel/compiler/brw_shader.cpp
index 284c2e8233c..537defd05d9 100644
--- a/src/intel/compiler/brw_shader.cpp
+++ b/src/intel/compiler/brw_shader.cpp
@@ -605,7 +605,8 @@ brw_negate_immediate(enum brw_reg_type type, struct brw_reg 
*reg)
case BRW_REGISTER_TYPE_V:
   assert(!"unimplemented: negate UV/V immediate");
case BRW_REGISTER_TYPE_HF:
-  assert(!"unimplemented: negate HF immediate");
+  reg->ud ^= 0x80008000;
+  return true;
case BRW_REGISTER_TYPE_NF:
   unreachable("no NF immediates");
}
@@ -651,7 +652,8 @@ brw_abs_immediate(enum brw_reg_type type, struct brw_reg 
*reg)
case BRW_REGISTER_TYPE_V:
   assert(!"unimplemented: abs V immediate");
case BRW_REGISTER_TYPE_HF:
-  assert(!"unimplemented: abs HF immediate");
+  reg->ud &= ~0x80008000;
+  return true;
case BRW_REGISTER_TYPE_NF:
   unreachable("no NF immediates");
}
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] intel/compiler: fix 16-bit int brw_negate_immediate and brw_abs_immediate

2018-05-02 Thread Jose Maria Casanova Crespo
From Intel Skylake PRM, vol 07, "Immediate" section (page 768):

"For a word, unsigned word, or half-float immediate data,
software must replicate the same 16-bit immediate value to both
the lower word and the high word of the 32-bit immediate field
in a GEN instruction."

This fixes the int16/uint16 negate and abs immediates that weren't
taking into account the replication in lower and upper words.
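
For example, for a W immediate holding 3, reg->ud = 0x00030003:

    negate: value = (uint16_t)-3 = 0xfffd  ->  reg->ud = 0xfffdfffd
    abs (of -3, reg->ud = 0xfffdfffd): value = 3  ->  reg->ud = 0x00030003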

v2: Integer cases are different to Float cases. (Jason Ekstrand)
Included reference to PRM (Jose Maria Casanova)
v3: Make explicit uint32_t casting for left shift (Jason Ekstrand)
Split half float implementation. (Jason Ekstrand)
Fix brw_abs_immediate (Jose Maria Casanova)

Cc: "18 . 0 18 . 1" 
---
 src/intel/compiler/brw_shader.cpp | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_shader.cpp 
b/src/intel/compiler/brw_shader.cpp
index 9cdf9fcb23d..284c2e8233c 100644
--- a/src/intel/compiler/brw_shader.cpp
+++ b/src/intel/compiler/brw_shader.cpp
@@ -580,9 +580,11 @@ brw_negate_immediate(enum brw_reg_type type, struct 
brw_reg *reg)
   reg->d = -reg->d;
   return true;
case BRW_REGISTER_TYPE_W:
-   case BRW_REGISTER_TYPE_UW:
-  reg->d = -(int16_t)reg->ud;
+   case BRW_REGISTER_TYPE_UW: {
+  uint16_t value = -(int16_t)reg->ud;
+  reg->ud = value | (uint32_t)value << 16;
   return true;
+   }
case BRW_REGISTER_TYPE_F:
   reg->f = -reg->f;
   return true;
@@ -618,9 +620,11 @@ brw_abs_immediate(enum brw_reg_type type, struct brw_reg 
*reg)
case BRW_REGISTER_TYPE_D:
   reg->d = abs(reg->d);
   return true;
-   case BRW_REGISTER_TYPE_W:
-  reg->d = abs((int16_t)reg->ud);
+   case BRW_REGISTER_TYPE_W: {
+  uint16_t value = abs((int16_t)reg->ud);
+  reg->ud = value | (uint32_t)value << 16;
   return true;
+   }
case BRW_REGISTER_TYPE_F:
   reg->f = fabsf(reg->f);
   return true;
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v3] intel/compiler: fix brw_imm_w for negative 16-bit integers

2018-05-02 Thread Jose Maria Casanova Crespo
16-bit immediates need to replicate the 16-bit immediate value
in both words of the 32-bit value. This needs to be careful
to avoid sign-extension, which the previous implementation was
not handling properly.

For example, with the previous implementation, storing the value
-3 would generate imm.d = 0xfffffffd due to signed integer sign
extension, which is not correct. Instead, we should cast to
uint16_t, which gives us the correct result: imm.ud = 0xfffdfffd.
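
Step by step for w = -3:

    old: imm.d  = w | (w << 16)
                = 0xfffffffd | 0xfffd0000 = 0xfffffffd   (w sign-extended to int)
    new: imm.ud = (uint16_t)w | (uint32_t)(uint16_t)w << 16
                = 0x0000fffd | 0xfffd0000 = 0xfffdfffd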

We only had a couple of cases hitting this path in the driver
until now, one with value -1, which would work since all bits are
one in this case, and another with value -2 in brw_clip_tri(),
which would hit the aforementioned issue (this case only affects
gen4 although we are not aware of whether this was causing an
actual bug somewhere).

v2: Make explicit uint32_t casting for left shift (Jason Ekstrand)

Reviewed-by: Jason Ekstrand 

Cc: "18 . 0 18 . 1" 
---
 src/intel/compiler/brw_reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_reg.h b/src/intel/compiler/brw_reg.h
index dff9b970b2f..ac12ab3d2dd 100644
--- a/src/intel/compiler/brw_reg.h
+++ b/src/intel/compiler/brw_reg.h
@@ -705,7 +705,7 @@ static inline struct brw_reg
 brw_imm_w(int16_t w)
 {
struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_W);
-   imm.d = w | (w << 16);
+   imm.ud = (uint16_t)w | (uint32_t)(uint16_t)w << 16;
return imm;
 }
 
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965/fs: Register allocator shouldn't use grf127 for sends dest (v2)

2018-04-18 Thread Jose Maria Casanova Crespo
Since Gen8+, the Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction by creating a new grf127_send_hack_node
in the register allocator. This node has a fixed assignment to grf127.

For vgrfs that are used as the destination of send messages we create node
interferences with the grf127_send_hack_node. So the register allocator
will never assign to these vgrfs a register range that involves grf127.

If dispatch_width > 8 we don't create these interferences, because all
instructions already have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:
  
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_16struct_to_32struct.uniform_buffer_block_vert
  
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_16struct_to_32struct.uniform_buffer_block_tessc

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
with a fixed assignment to grf127 and create interferences to send
message vgrf destinations. (Eric Anholt)
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index ec8e116cb38..59e047483c0 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -548,6 +548,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
int first_mrf_hack_node = node_count;
if (devinfo->gen >= 7)
   node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
+   int grf127_send_hack_node = node_count;
+   if (devinfo->gen >= 8 && dispatch_width == 8)
+  node_count ++;
struct ra_graph *g =
   ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, node_count);
 
@@ -653,6 +656,28 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
spill_all)
   }
}
 
+   if (devinfo->gen >= 8 && dispatch_width == 8) {
+  /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
+   * subsection "EUISA Instructions", Send Message (page 990):
+   *
+   * "r127 must not be used for return address when there is a src and
+   * dest overlap in send instruction."
+   *
+   * We are avoiding using grf127 as part of the destination of send
+   * messages adding a node interference to the grf127_send_hack_node.
+   * This node has a fixed asignment to grf127.
+   *
+   * We don't apply it to SIMD16 because previous code avoids any register
+   * overlap between sources and destination.
+   */
+  ra_set_node_reg(g, grf127_send_hack_node, 127);
+  foreach_block_and_inst(block, fs_inst, inst, cfg) {
+ if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
+ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
+ }
+  }
+   }
+
/* Debug of register spilling: Go spill everything. */
if (unlikely(spill_all)) {
   int reg = choose_spill_reg(g);
-- 
2.17.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: retype offset_reg to UD at load_ssbo

2018-04-18 Thread Jose Maria Casanova Crespo
All operations with offset_reg at do_untyped_vector_read are done
with UD type, so copy propagation was not working through
the generated MOVs:

mov(8) vgrf9:UD, vgrf7:D

This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.
---
 src/intel/compiler/brw_fs_nir.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 6c4bcd1c113..0ebaab96634 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4142,7 +4142,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   if (const_offset) {
  offset_reg = brw_imm_ud(const_offset->u32[0]);
   } else {
- offset_reg = get_nir_src(instr->src[1]);
+ offset_reg = retype(get_nir_src(instr->src[1]), BRW_REGISTER_TYPE_UD);
   }
 
   /* Read the vector */
-- 
2.17.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] i965/fs: Register allocator shouldn't use grf127 for sends dest

2018-04-11 Thread Jose Maria Casanova Crespo
Since Gen8+, the Intel PRM states that "r127 must not be used for
return address when there is a src and dest overlap in send
instruction."

This patch implements this restriction by creating new register allocator
classes that are copies of the normal classes. These new classes
exclude from their set of registers the last one of the original classes
(the only one that includes grf127).

So vgrfs that are used as the destination of send messages sent from a grf are
re-assigned to one of these new classes based on their size. So the register
allocator will never assign to these vgrfs a register that involves grf127.

If dispatch_width > 8 we don't re-assign to the new classes because all
instructions have a node interference between source and destination, and
that is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:
  
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_16struct_to_32struct.uniform_buffer_block_vert
  
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.uniform_16struct_to_32struct.uniform_buffer_block_tessc

Shader-db results on Skylake:

total instructions in shared programs: 7686798 -> 7686790 (<.01%)
instructions in affected programs: 1476 -> 1468 (-0.54%)
helped: 4
HURT: 0

total cycles in shared programs: 337092322 -> 337095944 (<.01%)
cycles in affected programs: 765861 -> 769483 (0.47%)
helped: 167
HURT: 161

Shader-db results on Broadwell:

total instructions in shared programs: 7658574 -> 7658561 (<.01%)
instructions in affected programs: 2355 -> 2342 (-0.55%)
helped: 5
HURT: 0

total cycles in shared programs: 340694553 -> 340689774 (<.01%)
cycles in affected programs: 1200517 -> 1195738 (-0.40%)
helped: 204
HURT: 267

total spills in shared programs: 4300 -> 4299 (-0.02%)
spills in affected programs: 72 -> 71 (-1.39%)
helped: 1
HURT: 0

total fills in shared programs: 5370 -> 5369 (-0.02%)
fills in affected programs: 58 -> 57 (-1.72%)
helped: 1
HURT: 0

As expected Shader-db reports no changes on previous generations.

Cc: Jason Ekstrand 
Cc: Francisco Jerez 
---
 src/intel/compiler/brw_compiler.h  |  7 ++-
 src/intel/compiler/brw_fs_reg_allocate.cpp | 70 +++---
 2 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/src/intel/compiler/brw_compiler.h 
b/src/intel/compiler/brw_compiler.h
index d3ae6499b91..572d373ff0c 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -61,9 +61,12 @@ struct brw_compiler {
 
   /**
* Array of the ra classes for the unaligned contiguous register
-   * block sizes used, indexed by register size.
+   * block sizes used, indexed by register size. Classes starting at
+   * index 16 are classes for registers that should be used for
+   * send destination registers. They are equivalent to 0-15 classes
+   * but not including grf127.
*/
-  int classes[16];
+  int classes[32];
 
   /**
* Mapping from classes to ra_reg ranges.  Each of the per-size
diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index ec8e116cb38..66e4a342d0d 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -102,19 +102,31 @@ brw_alloc_reg_set(struct brw_compiler *compiler, int 
dispatch_width)
 * Additionally, on gen5 we need aligned pairs of registers for the PLN
 * instruction, and on gen4 we need 8 contiguous regs for workaround simd16
 * texturing.
+*
+* For Gen8+ we duplicate classes with a new type of classes for vgrfs when
+* they are used as destination of SEND messages. The only difference is
+* that these classes don't include grf127 to implement the restriction of
+* not overlaping source and destination when register grf127 is the
+* destination of a SEND message. So 0-15 are the normal classes indexed by
+* size and 16-31 classes reuse the same registers but don't include
+* grf127.
 */
-   const int class_count = MAX_VGRF_SIZE;
-   int class_sizes[MAX_VGRF_SIZE];
-   for (unsigned i = 0; i < MAX_VGRF_SIZE; i++)
-  class_sizes[i] = i + 1;
+   const int class_types = devinfo->gen >= 8 ? 2 : 1;
+   const int class_count = class_types * MAX_VGRF_SIZE;
+   int class_sizes[class_count];
+   for (int i = 0; i < class_count; i++)
+  class_sizes[i] = (i % MAX_VGRF_SIZE) + 1;
 
memset(compiler->fs_reg_sets[index].class_to_ra_reg_range, 0,
   sizeof(compiler->fs_reg_sets[index].class_to_ra_reg_range));
int *class_to_ra_reg_range = 
compiler->fs_reg_sets[index].class_to_ra_reg_range;
 
-   /* Compute the total number of registers across all classes. */
+   /* Compute the total number of registers across all classes. The duplicated
+* classes to 

[Mesa-dev] [PATCH 1/2] intel/compiler: grf127 can not be dest when src and dest overlap in send

2018-04-11 Thread Jose Maria Casanova Crespo
Implement in brw_eu_validate the restriction from the Intel Broadwell PRM, vol 07,
section "Instruction Set Reference", subsection "EUISA Instructions", Send
Message (page 990):

"r127 must not be used for return address when there is a src and dest overlap
in send instruction."
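
For example, a send with dst = g126 and rlen = 2 (response covering g126-g127)
whose payload starts at src0 = g126 with mlen = 1 overlaps source and
destination while the return address reaches r127, so the check below
rejects it.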

Cc: Jason Ekstrand 
Cc: Matt Turner 
---
 src/intel/compiler/brw_eu_validate.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/intel/compiler/brw_eu_validate.c 
b/src/intel/compiler/brw_eu_validate.c
index d3189d1ef5e..0d711501303 100644
--- a/src/intel/compiler/brw_eu_validate.c
+++ b/src/intel/compiler/brw_eu_validate.c
@@ -261,6 +261,15 @@ send_restrictions(const struct gen_device_info *devinfo,
   brw_inst_src0_da_reg_nr(devinfo, inst) < 112,
   "send with EOT must use g112-g127");
   }
+  if (devinfo->gen >= 8) {
+ ERROR_IF(!dst_is_null(devinfo, inst) &&
+  (brw_inst_dst_da_reg_nr(devinfo, inst) +
+   brw_inst_rlen(devinfo, inst) > 127 ) &&
+  (brw_inst_src0_da_reg_nr(devinfo, inst) +
+   brw_inst_mlen(devinfo, inst) >
+   brw_inst_dst_da_reg_nr(devinfo, inst)),
+  "r127 can not be dest when src and dest overlap in send");
+  }
}
 
return error_msg;
-- 
2.16.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: Include 8 and 16-bit support in construct_value

2018-03-01 Thread Jose Maria Casanova Crespo
---
 src/compiler/nir/nir_search.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index c7c52ae320d..28b36b2b863 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -525,6 +525,9 @@ construct_value(const nir_search_value *value,
   case nir_type_float:
  load->def.name = ralloc_asprintf(load, "%f", c->data.d);
  switch (bitsize->dest_size) {
+ case 16:
+load->value.u16[0] = _mesa_float_to_half(c->data.d);
+break;
  case 32:
 load->value.f32[0] = c->data.d;
 break;
@@ -539,6 +542,12 @@ construct_value(const nir_search_value *value,
   case nir_type_int:
  load->def.name = ralloc_asprintf(load, "%" PRIi64, c->data.i);
  switch (bitsize->dest_size) {
+ case 8:
+load->value.i8[0] = c->data.i;
+break;
+ case 16:
+load->value.i16[0] = c->data.i;
+break;
  case 32:
 load->value.i32[0] = c->data.i;
 break;
@@ -553,6 +562,12 @@ construct_value(const nir_search_value *value,
   case nir_type_uint:
  load->def.name = ralloc_asprintf(load, "%" PRIu64, c->data.u);
  switch (bitsize->dest_size) {
+ case 8:
+load->value.u8[0] = c->data.u;
+break;
+ case 16:
+load->value.u16[0] = c->data.u;
+break;
  case 32:
 load->value.u32[0] = c->data.u;
 break;
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/8] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout (v3)

2018-02-28 Thread Jose Maria Casanova Crespo
16-bit load_ubo/ssbo operations that call do_untyped_vector_read don't
guarantee that offsets are a multiple of 4 bytes, as required by the
untyped_read message. This happens for example in the case of f16mat3x3 when
VK_KHR_relaxed_block_layout is enabled.

Vector reads with non-constant offsets are implemented with multiple
byte_scattered_read messages that don't require 32-bit aligned offsets.

Now for all constant offsets we can use the untyped_read_surface message.
In the case of constant offsets not aligned to 32 bits, we calculate a
32-bit aligned start offset and use the shuffle_32bit_load_result_to_16bit_data
function and the first_component parameter to skip the copy of the unneeded
component.
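
As a worked example of the constant-offset path below, a 16-bit vec3 at
constant offset 6 (a 2-byte aligned, non-dword-aligned offset, as in the
f16mat3x3 case above) gives:

    start                = 6 & ~3                = 4
    end                  = ALIGN(6 + 3 * 2, 4)   = 12
    num_components_32bit = (end - start) / 4     = 2
    first_component      = (6 & 3) / 2           = 1

so the untyped read fetches two dwords starting at byte 4 and the shuffle
skips one 16-bit component, leaving the three halfwords at bytes 6, 8 and 10.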

v2: (Jason Ekstrand)
Use untyped_read_surface messages whenever we have constant offsets.

v3: (Jason Ekstrand)
Simplify loop for reads with non constant offsets.
Use end - start to calculate the number of 32-bit components to read with
constant offsets.
---
 src/intel/compiler/brw_fs_nir.cpp | 51 ---
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 0d1ab5b01c..3f077b3c91 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2304,28 +2304,51 @@ do_untyped_vector_read(const fs_builder ,
 {
if (type_sz(dest.type) <= 2) {
   assert(dest.stride == 1);
+  boolean is_const_offset = offset_reg.file == BRW_IMMEDIATE_VALUE;
 
-  if (num_components > 1) {
- /* Pairs of 16-bit components can be read with untyped read, for 
16-bit
-  * vec3 4th component is ignored.
+  if (is_const_offset) {
+ uint32_t start = offset_reg.ud & ~3;
+ uint32_t end = offset_reg.ud + num_components * type_sz(dest.type);
+ end = ALIGN(end, 4);
+ assert (end - start <= 16);
+
+ /* At this point we have 16-bit component/s that have constant
+  * offset aligned to 4-bytes that can be read with untyped_reads.
+  * untyped_read message requires 32-bit aligned offsets.
   */
+ unsigned first_component = (offset_reg.ud & 3) / type_sz(dest.type);
+ unsigned num_components_32bit = (end - start) / 4;
+
  fs_reg read_result =
-emit_untyped_read(bld, surf_index, offset_reg,
-  1 /* dims */, DIV_ROUND_UP(num_components, 2),
+emit_untyped_read(bld, surf_index, brw_imm_ud(start),
+  1 /* dims */,
+  num_components_32bit,
   BRW_PREDICATE_NONE);
  shuffle_32bit_load_result_to_16bit_data(bld,
retype(dest, BRW_REGISTER_TYPE_W),
retype(read_result, BRW_REGISTER_TYPE_D),
-   0, num_components);
+   first_component, num_components);
   } else {
- assert(num_components == 1);
- /* scalar 16-bit are read using one byte_scattered_read message */
- fs_reg read_result =
-emit_byte_scattered_read(bld, surf_index, offset_reg,
- 1 /* dims */, 1,
- type_sz(dest.type) * 8 /* bit_size */,
- BRW_PREDICATE_NONE);
- bld.MOV(dest, subscript(read_result, dest.type, 0));
+ fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
+ for (unsigned i = 0; i < num_components; i++) {
+if (i == 0) {
+   bld.MOV(read_offset, offset_reg);
+} else {
+   bld.ADD(read_offset, offset_reg,
+   brw_imm_ud(i * type_sz(dest.type)));
+}
+/* Non constant offsets are not guaranteed to be aligned 32-bits
+ * so they are read using one byte_scattered_read message
+ * for each component.
+ */
+fs_reg read_result =
+   emit_byte_scattered_read(bld, surf_index, read_offset,
+1 /* dims */, 1,
+type_sz(dest.type) * 8 /* bit_size */,
+BRW_PREDICATE_NONE);
+bld.MOV(offset(dest, bld, i),
+subscript (read_result, dest.type, 0));
+ }
   }
} else if (type_sz(dest.type) == 4) {
   fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
-- 
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 7/8] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned

2018-02-27 Thread Jose Maria Casanova Crespo
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets can be a multiple of 2 bytes. Some assertions are
updated so offsets only need to be a multiple of the size of the base type,
but we can not always assume even that, as doubles aren't necessarily
aligned to 8 bytes.

For 16-bit types, the push constant offset takes into account the
internal offset in the 32-bit uniform bucket, adding 2 bytes when we access
elements that are not 32-bit aligned. In all 32-bit aligned cases it just becomes 0.
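
For example, a 16-bit element with const_index[0] = 6 and a constant source
offset of 0 ends up reading UNIFORM slot 1 (6 / 4) with src.offset =
0 + (6 % 4) = 2, i.e. the second halfword of that slot; for 32-bit aligned
accesses the added (const_index[0] % 4) term is just 0.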

v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand 
---
 src/compiler/spirv/vtn_variables.c  |  2 --
 src/intel/compiler/brw_fs_nir.cpp   | 15 ++-
 src/intel/vulkan/anv_nir_lower_push_constants.c |  2 --
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 105b33a5671..7e8a090adde 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -753,8 +753,6 @@ _vtn_load_store_tail(struct vtn_builder *b, 
nir_intrinsic_op op, bool load,
}
 
if (op == nir_intrinsic_load_push_constant) {
-  vtn_assert(access_offset % 4 == 0);
-
   nir_intrinsic_set_base(instr, access_offset);
   nir_intrinsic_set_range(instr, access_size);
}
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 72b721c094d..3402c541dc8 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3885,16 +3885,21 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   break;
 
case nir_intrinsic_load_uniform: {
-  /* Offsets are in bytes but they should always be multiples of 4 */
-  assert(instr->const_index[0] % 4 == 0);
+  /* Offsets are in bytes but they should always aligned to
+   * the type size
+   */
+  assert(instr->const_index[0] % 4 == 0 ||
+ instr->const_index[0] % type_sz(dest.type) == 0);
 
   fs_reg src(UNIFORM, instr->const_index[0] / 4, dest.type);
 
   nir_const_value *const_offset = nir_src_as_const_value(instr->src[0]);
   if (const_offset) {
- /* Offsets are in bytes but they should always be multiples of 4 */
- assert(const_offset->u32[0] % 4 == 0);
- src.offset = const_offset->u32[0];
+ assert(const_offset->u32[0] % type_sz(dest.type) == 0);
+ /* For 16-bit types we add the module of the const_index[0]
+  * offset to access to not 32-bit aligned element
+  */
+ src.offset = const_offset->u32[0] + instr->const_index[0] % 4;
 
  for (unsigned j = 0; j < instr->num_components; j++) {
 bld.MOV(offset(dest, bld, j), offset(src, bld, j));
diff --git a/src/intel/vulkan/anv_nir_lower_push_constants.c 
b/src/intel/vulkan/anv_nir_lower_push_constants.c
index b66552825b9..ad60d0c824e 100644
--- a/src/intel/vulkan/anv_nir_lower_push_constants.c
+++ b/src/intel/vulkan/anv_nir_lower_push_constants.c
@@ -41,8 +41,6 @@ anv_nir_lower_push_constants(nir_shader *shader)
 if (intrin->intrinsic != nir_intrinsic_load_push_constant)
continue;
 
-assert(intrin->const_index[0] % 4 == 0);
-
 /* We just turn them into uniform loads */
 intrin->intrinsic = nir_intrinsic_load_uniform;
  }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 8/8] anv: Enable VK_KHR_16bit_storage for PushConstant

2018-02-27 Thread Jose Maria Casanova Crespo
Enables the storagePushConstant16 feature of VK_KHR_16bit_storage for Gen8+.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a7b586c79c7..7c8b768c589 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -787,7 +787,7 @@ void anv_GetPhysicalDeviceFeatures2KHR(
 
  features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
- features->storagePushConstant16 = false;
+ features->storagePushConstant16 = pdevice->info.gen >= 8;
  features->storageInputOutput16 = false;
  break;
   }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 6/8] spirv: Calculate properly 16-bit vector sizes

2018-02-27 Thread Jose Maria Casanova Crespo
The range of a 16-bit push constant load was being calculated
wrongly, using 4 bytes per element instead of the 2 bytes it
should be.
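
For example, a f16vec3 now reports 3 * (16 / 8) = 6 bytes, where the old
code returned 3 * 4 = 12.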

v2: Use glsl_get_bit_size instead of if statement
(Jason Ekstrand)

Reviewed-by: Jason Ekstrand 
---
 src/compiler/spirv/vtn_variables.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 9eb85c24e95..105b33a5671 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -683,12 +683,9 @@ vtn_type_block_size(struct vtn_builder *b, struct vtn_type 
*type)
   if (cols > 1) {
  vtn_assert(type->stride > 0);
  return type->stride * cols;
-  } else if (base_type == GLSL_TYPE_DOUBLE ||
-base_type == GLSL_TYPE_UINT64 ||
-base_type == GLSL_TYPE_INT64) {
- return glsl_get_vector_elements(type->type) * 8;
   } else {
- return glsl_get_vector_elements(type->type) * 4;
+ unsigned type_size = glsl_get_bit_size(type->type) / 8;
+ return glsl_get_vector_elements(type->type) * type_size;
   }
}
 
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/8] i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components

2018-02-27 Thread Jose Maria Casanova Crespo
This helper, used to load 16-bit components from 32-bit reads, now allows
skipping components with the new first_component parameter. The semantics
now skip components until we reach first_component, and then read the
number of components passed to the function.

All previous uses of the helper are updated to use 0 as first_component.
This will allow reading 16-bit components when the first one is not 32-bit
aligned, enabling more uses of untyped reads with 16-bit types.
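
For example, with first_component = 1 and components = 2, destination
component 0 is read from subscript(offset(src, bld, 0), dst.type, 1) and
component 1 from subscript(offset(src, bld, 1), dst.type, 0).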
---
 src/intel/compiler/brw_fs.cpp | 2 +-
 src/intel/compiler/brw_fs.h   | 3 ++-
 src/intel/compiler/brw_fs_nir.cpp | 8 +---
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index bed632d21b9..e961b76ab61 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -194,7 +194,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
switch (type_sz(dst.type)) {
case 2:
-  shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
+  shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1, 0);
   bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
   break;
case 4:
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 63373580ee4..52dd5e1d6bb 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -503,7 +503,8 @@ fs_reg shuffle_64bit_data_for_32bit_write(const 
brw::fs_builder ,
 void shuffle_32bit_load_result_to_16bit_data(const brw::fs_builder ,
  const fs_reg ,
  const fs_reg ,
- uint32_t components);
+ uint32_t components,
+ uint32_t first_component);
 
 void shuffle_16bit_data_for_32bit_write(const brw::fs_builder ,
 const fs_reg ,
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 4aa411d149f..5567433a19e 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2316,7 +2316,7 @@ do_untyped_vector_read(const fs_builder ,
  shuffle_32bit_load_result_to_16bit_data(bld,
retype(dest, BRW_REGISTER_TYPE_W),
retype(read_result, BRW_REGISTER_TYPE_D),
-   num_components);
+   num_components, 0);
   } else {
  assert(num_components == 1);
  /* scalar 16-bit are read using one byte_scattered_read message */
@@ -4908,7 +4908,8 @@ void
 shuffle_32bit_load_result_to_16bit_data(const fs_builder ,
 const fs_reg ,
 const fs_reg ,
-uint32_t components)
+uint32_t components,
+uint32_t first_component)
 {
assert(type_sz(src.type) == 4);
assert(type_sz(dst.type) == 2);
@@ -4922,7 +4923,8 @@ shuffle_32bit_load_result_to_16bit_data(const fs_builder 
,
 
for (unsigned i = 0; i < components; i++) {
   const fs_reg component_i =
- subscript(offset(src, bld, i / 2), dst.type, i % 2);
+ subscript(offset(src, bld, (first_component + i) / 2), dst.type,
+   (first_component + i) % 2);
 
   bld.MOV(offset(tmp, bld, i % 2), component_i);
 
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 5/8] anv: Enable VK_KHR_16bit_storage for SSBO and UBO

2018-02-27 Thread Jose Maria Casanova Crespo
Enables the storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess
features of VK_KHR_16bit_storage for Gen8+.
---
 src/intel/vulkan/anv_device.c  | 5 +++--
 src/intel/vulkan/anv_extensions.py | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index cedeed56219..a7b586c79c7 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -783,9 +783,10 @@ void anv_GetPhysicalDeviceFeatures2KHR(
   case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES_KHR: {
  VkPhysicalDevice16BitStorageFeaturesKHR *features =
 (VkPhysicalDevice16BitStorageFeaturesKHR *)ext;
+ ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
 
- features->storageBuffer16BitAccess = false;
- features->uniformAndStorageBuffer16BitAccess = false;
+ features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
+ features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->storagePushConstant16 = false;
  features->storageInputOutput16 = false;
  break;
diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index 581921e62a1..2999b3406fd 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -49,7 +49,7 @@ class Extension:
 # and dEQP-VK.api.info.device fail due to the duplicated strings.
 EXTENSIONS = [
 Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
-Extension('VK_KHR_16bit_storage', 1, False),
+Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
 Extension('VK_KHR_bind_memory2',  1, True),
 Extension('VK_KHR_dedicated_allocation',  1, True),
 Extension('VK_KHR_descriptor_update_template',1, True),
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/8] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit

2018-02-27 Thread Jose Maria Casanova Crespo
The surfaces that back the GPU buffers have a boundary check that
considers accesses to partial dwords to be out-of-bounds.
For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the
last two bytes would always be read as 0 or have their writes ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-byte multiples so that partial dwords can be read/written.
Adding an unconditional +2 to the size of buffers whose size is not a
multiple of 4 solves this issue for the general cases of UBO or SSBO.

But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = isl_align(buffer_size, 4) +
   (isl_align(buffer_size, 4) - buffer_size)

So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)
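
For example, for a buffer holding three 16-bit elements:

    buffer_size  = 6
    surface_size = isl_align(6, 4) + (isl_align(6, 4) - 6) = 8 + 2 = 10
    buffer_size  = (10 & ~3) - (10 & 3) = 8 - 2 = 6

so the surface covers the whole last dword while the original 6-byte size
can still be recovered for the unsized array length calculation.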

These buffer requirements are also exposed when robust buffer access
is enabled, so these buffer sizes are recommended to be a multiple of 4.

v2: (Jason Ekstrand)
Move padding logic from anv to isl_surface_state
Move calculation of original size from spirv to driver backend
v3: (Jason Ekstrand)
Rename some variables and use a similar expression when calculating
padding as when obtaining the original buffer size.
Avoid use of unnecessary component call at brw_fs_nir.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 27 ++-
 src/intel/isl/isl_surface_state.c | 22 +-
 src/intel/vulkan/anv_device.c | 11 +++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9d..4aa411d149f 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   inst->mlen = 1;
   inst->size_written = 4 * REG_SIZE;
 
-  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
+  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and Faulting:
+   *
+   * "Out-of-bounds checking is always performed at a DWord granularity. If
+   * any part of the DWord is out-of-bounds then the whole DWord is
+   * considered out-of-bounds."
+   *
+   * This implies that types with size smaller than 4-bytes need to be
+   * padded if they don't complete the last dword of the buffer. But as we
+   * need to maintain the original size we need to reverse the padding
+   * calculation to return the correct size to know the number of elements
+   * of an unsized array. As we stored in the last two bits of the size of
+   * the buffer the needed padding we calculate here:
+   *
+   * buffer_size = resinfo_size & ~3 - resinfo_size & 3
+   */
+
+  fs_reg size_aligned4 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+
+  ubld.AND(size_padding, ret_payload, brw_imm_ud(3));
+  ubld.AND(size_aligned4, ret_payload, brw_imm_ud(~3));
+  ubld.ADD(buffer_size, size_aligned4, negate(size_padding));
+
+  bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
+
   brw_mark_surface_used(prog_data, index);
   break;
}
diff --git a/src/intel/isl/isl_surface_state.c 
b/src/intel/isl/isl_surface_state.c
index bfb27fa4a44..c205b3d2c0b 100644
--- a/src/intel/isl/isl_surface_state.c
+++ b/src/intel/isl/isl_surface_state.c
@@ -673,7 +673,27 @@ void
 isl_genX(buffer_fill_state_s)(void *state,
   const struct isl_buffer_fill_state_info 
*restrict info)
 {
-   uint32_t num_elements = info->size / info->stride;
+   uint64_t buffer_size = info->size;
+
+   /* Uniform and Storage buffers need to have surface size not less that the
+* aligned 32-bit size of the buffer. To calculate the array lenght on
+* unsized arrays in StorageBuffer the last 2 bits store the padding size
+* added to the surface, so we can calculate latter the original buffer
+* size to know the number of elements.
+*
+*  surface_size = isl_align(buffer_size, 4) +
+* (isl_align(buffer_size) - buffer_size)
+*
+*  buffer_size = (surface_size & ~3) - (surface_size & 3)
+*/
+   if (info->format == ISL_FORMAT_RAW  ||
+   info->stride < isl_format_get_layout(info->format)->bpb / 8) {
+  assert(info->stride == 1);
+  uint64_t aligned_size = isl_align(buffer_size, 4);
+  buffer_size = aligned_size + (aligned_size - buffer_size);
+   }
+
+   uint32_t num_elements = buffer_size / info->stride;
 
if (GEN_GEN >= 7) {
   /* From the IVB PRM, 

[Mesa-dev] [PATCH v2 0/8] anv: VK_KHR_16bit_storage enabling SSBO/UBO/PushConstant

2018-02-27 Thread Jose Maria Casanova Crespo
This v2 series includes several fixes to allow enabling the VK_KHR_16bit_storage
features in ANV that have already landed but are currently disabled.

Main differences with V1 [1] are:

   * Now UBO/SSBO padding for buffer sizes not multiple of 4 [1/8] is done
     in isl, and the calculation to get the original buffer size before
     padding is done in the backend.
   * Now load_ubo/ssbo [3/8] at constant offsets use untyped_surface_read
 in all cases. A new patch [2/8] enables the
 shuffle_32bit_load_result_to_16bit_data to skip components.
   * vtn_type_block_size [6/8] has been simplified using glsl_get_bit_size.

Patches 2 and 3 and the re-enablement of features 5 and 8 are the ones with
pending review.

The series includes the following fixes:

   * [1] Fixes issues in UBO/SSBO support when the buffer size is not a multiple
     of 4. The patch adds padding so the size will always include the last DWord
     completely. For unsized SSBO arrays there is some bit arithmetic to
     allow recalculating the original size without the padding, to calculate the
     number of elements correctly.
   * [2-4] Fixes the behaviour when VK_KHR_relaxed_block_layout is enabled, when
 we can not guarantee that the surface read/write offsets are multiple of 4.
   * [5] Enables VK_KHR_16bit_storage for SSBO and UBO.
   * [6-8] Enables 16-bit push constants, removing/changing asserts that don't
     apply anymore to the 16-bit case, and a fix in the calculation of the size
     to be read.

To catch these issues several new tests were developed; they will be included
upstream in VK-GL-CTS.

This new version of this fixup series creates some conflicts with the re-submitted
V5 series with the 16-bit Input/Output support that is still pending review.
An updated version including both series has been force-pushed at [2]

[1] https://lists.freedesktop.org/archives/mesa-dev/2018-February/186544.html
[2] https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc5

Cc: Jason Ekstrand <jason.ekstr...@intel.com>

Jose Maria Casanova Crespo (8):
  isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of
32-bit
  i965/fs: shuffle_32bit_load_result_to_16bit_data now skips components
  i965/fs: Support 16-bit do_read_vector with
VK_KHR_relaxed_block_layout
  i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
  anv: Enable VK_KHR_16bit_storage for SSBO and UBO
  spirv: Calculate properly 16-bit vector sizes
  spirv/i965/anv: Relax push constant offset assertions being 32-bit
aligned
  anv: Enable VK_KHR_16bit_storage for PushConstant

 src/compiler/spirv/vtn_variables.c  |   9 +-
 src/intel/compiler/brw_fs.cpp   |   2 +-
 src/intel/compiler/brw_fs.h |   3 +-
 src/intel/compiler/brw_fs_nir.cpp   | 124 ++--
 src/intel/isl/isl_surface_state.c   |  22 -
 src/intel/vulkan/anv_device.c   |  18 +++-
 src/intel/vulkan/anv_extensions.py  |   2 +-
 src/intel/vulkan/anv_nir_lower_push_constants.c |   2 -
 8 files changed, 137 insertions(+), 45 deletions(-)

-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/8] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout

2018-02-27 Thread Jose Maria Casanova Crespo
16-bit load_ubo/ssbo operations that call do_untyped_vector_read don't
guarantee that offsets are a multiple of 4 bytes, as required by the
untyped_read message. This happens for example in the case of f16mat3x3 when
VK_KHR_relaxed_block_layout is enabled.

Vector reads with non-constant offsets are implemented with multiple
byte_scattered_read messages that don't require 32-bit aligned offsets.

Now for all constant offsets we can use the untyped_read_surface message.
In the case of constant offsets not aligned to 32 bits, we calculate a
32-bit aligned start offset and use the shuffle_32bit_load_result_to_16bit_data
function and the first_component parameter to skip the copy of the unneeded
component.

v2: Use untyped_read_surface messages whenever we have constant offsets.
(Jason Ekstrand)
---
 src/intel/compiler/brw_fs_nir.cpp | 54 +--
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 5567433a19e..affb242668a 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2304,28 +2304,54 @@ do_untyped_vector_read(const fs_builder ,
 {
if (type_sz(dest.type) <= 2) {
   assert(dest.stride == 1);
+  boolean is_const_offset = offset_reg.file == BRW_IMMEDIATE_VALUE;
 
-  if (num_components > 1) {
- /* Pairs of 16-bit components can be read with untyped read, for 
16-bit
-  * vec3 4th component is ignored.
+  if (is_const_offset) {
+ uint32_t start = offset_reg.ud & ~3;
+ uint32_t end = offset_reg.ud + num_components * type_sz(dest.type);
+ end = ALIGN(end, 4);
+ assert (end - start <= 16);
+
+ /* At this point we have 16-bit component/s that have constant
+  * offset aligned to 4-bytes that can be read with untyped_reads.
+  * untyped_read message requires 32-bit aligned offsets.
   */
+ unsigned first_component = (offset_reg.ud & 3) / type_sz(dest.type);
+ unsigned num_components_32bit =
+DIV_ROUND_UP(first_component + num_components, 4 / 
type_sz(dest.type));
+
  fs_reg read_result =
-emit_untyped_read(bld, surf_index, offset_reg,
-  1 /* dims */, DIV_ROUND_UP(num_components, 2),
+emit_untyped_read(bld, surf_index, brw_imm_ud(start),
+  1 /* dims */,
+  num_components_32bit,
   BRW_PREDICATE_NONE);
  shuffle_32bit_load_result_to_16bit_data(bld,
retype(dest, BRW_REGISTER_TYPE_W),
retype(read_result, BRW_REGISTER_TYPE_D),
-   num_components, 0);
+   num_components, first_component);
   } else {
- assert(num_components == 1);
- /* scalar 16-bit are read using one byte_scattered_read message */
- fs_reg read_result =
-emit_byte_scattered_read(bld, surf_index, offset_reg,
- 1 /* dims */, 1,
- type_sz(dest.type) * 8 /* bit_size */,
- BRW_PREDICATE_NONE);
- bld.MOV(dest, subscript(read_result, dest.type, 0));
+ fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
+ bld.MOV(read_offset, offset_reg);
+ unsigned first_component = 0;
+ unsigned pending_components = num_components;
+ while (pending_components > 0) {
+/* Non constant offsets are not guaranteed to be aligned 32-bits
+ * so they are read using one byte_scattered_read message
+ * for each component.
+ */
+fs_reg read_result =
+   emit_byte_scattered_read(bld, surf_index, read_offset,
+1 /* dims */, 1,
+type_sz(dest.type) * 8 /* bit_size */,
+BRW_PREDICATE_NONE);
+shuffle_32bit_load_result_to_16bit_data(bld,
+   retype(offset(dest, bld, first_component), BRW_REGISTER_TYPE_W),
+   retype(read_result, BRW_REGISTER_TYPE_D),
+   1, 0);
+pending_components--;
+first_component ++;
+bld.ADD(read_offset, offset_reg, brw_imm_ud(2 * first_component));
+ }
   }
} else if (type_sz(dest.type) == 4) {
   fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 4/8] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout

2018-02-27 Thread Jose Maria Casanova Crespo
Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that offset is multiple
of 4.

Taking into account that VK_KHR_relaxed_block_layout is available
in ANV, we can only guarantee that when we have a constant offset
that is a multiple of 4. For non-constant offsets we will always use
byte_scattered_write.

v2: (Jason Ekstrand)
- Assert offset_reg to be multiple of 4 if it is immediate.

Reviewed-by: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index affb242668a..72b721c094d 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4133,6 +4133,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  unsigned num_components = ffs(~(writemask >> first_component)) - 1;
  fs_reg write_src = offset(val_reg, bld, first_component);
 
+ nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
+
  if (type_size > 4) {
 /* We can't write more than 2 64-bit components at once. Limit
  * the num_components of the write to what we can do and let the 
next
@@ -4148,14 +4150,19 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  * 32-bit-aligned we need to use byte-scattered writes because
  * untyped writes works with 32-bit components with 32-bit
  * alignment. byte_scattered_write messages only support one
- * 16-bit component at a time.
+ * 16-bit component at a time. As VK_KHR_relaxed_block_layout
+ * could be enabled we can not guarantee that not constant offsets
+ * to be 32-bit aligned for 16-bit types. For example an array, of
+ * 16-bit vec3 with array element stride of 6.
  *
- * For example, if there is a 3-components vector we submit one
- * untyped-write message of 32-bit (first two components), and one
- * byte-scattered write message (the last component).
+ * In the case of 32-bit aligned constant offsets if there is
+ * a 3-components vector we submit one untyped-write message
+ * of 32-bit (first two components), and one byte-scattered
+ * write message (the last component).
  */
 
-if (first_component % 2) {
+if ( !const_offset || ((const_offset->u32[0] +
+   type_size * first_component) % 4)) {
/* If we use a .yz writemask we also need to emit 2
 * byte-scattered write messages because of y-component not
 * being aligned to 32-bit.
@@ -4181,7 +4188,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  }
 
  fs_reg offset_reg;
- nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
+
  if (const_offset) {
 offset_reg = brw_imm_ud(const_offset->u32[0] +
 type_size * first_component);
@@ -4220,7 +4227,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  } else {
 assert(num_components * type_size <= 16);
 assert((num_components * type_size) % 4 == 0);
-assert((first_component * type_size) % 4 == 0);
+assert(offset_reg.file != BRW_IMMEDIATE_VALUE ||
+   offset_reg.ud % 4 == 0);
 unsigned num_slots = (num_components * type_size) / 4;
 
 emit_untyped_write(bld, surf_index, offset_reg,
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] isl/i965/fs: SSBO/UBO buffers need size padding if not multiple of 32-bit (v2)

2018-02-26 Thread Jose Maria Casanova Crespo
The surfaces that back the GPU buffers have a boundary check that
considers accesses to partial dwords to be out-of-bounds.
For example, buffers with 1 or 3 16-bit elements have size 2 or 6, and the
last two bytes would always be read as 0 or have their writes ignored.

The introduction of 16-bit types implies that we need to align the size
to 4-byte multiples so that partial dwords can be read/written.
Adding an unconditional +2 to the size of buffers whose size is not a
multiple of 4 solves this issue for the general cases of UBO or SSBO.

But when unsized arrays of 16-bit elements are used it is not possible
to know whether the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:

surface_size = 2 * align_u64(buffer_size, 4) - buffer_size

So when we calculate the buffer_size backwards in the backend we
update the resinfo return value with:

buffer_size = (surface_size & ~3) - (surface_size & 3)

This buffer requirement is also exposed when robust buffer access
is enabled, so these buffer sizes are recommended to be a multiple of 4.
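
The round trip can be sanity-checked with plain arithmetic. A minimal
standalone sketch (the helper names are illustrative, not the actual Mesa
functions):

   #include <assert.h>
   #include <stdint.h>

   /* Pad a buffer size so partial dwords stay in-bounds, encoding the
    * added padding in the low 2 bits, as described above.
    */
   static uint64_t pad_surface_size(uint64_t buffer_size)
   {
      uint64_t aligned = (buffer_size + 3) & ~3ull;   /* align_u64(size, 4) */
      return 2 * aligned - buffer_size;
   }

   /* Recover the original buffer size from the padded surface size. */
   static uint64_t unpad_surface_size(uint64_t surface_size)
   {
      return (surface_size & ~3ull) - (surface_size & 3ull);
   }

   int main(void)
   {
      /* A 3-element 16-bit array: 6 bytes -> surface size 10 -> back to 6. */
      assert(unpad_surface_size(pad_surface_size(6)) == 6);
      assert(unpad_surface_size(pad_surface_size(2)) == 2);
      assert(unpad_surface_size(pad_surface_size(8)) == 8);
      return 0;
   }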

v2: (Jason Ekstrand)
Move padding logic from anv to isl_surface_state
Move calculation of the original size from spirv to the driver backend
---
 src/intel/compiler/brw_fs_nir.cpp | 27 ++-
 src/intel/isl/isl_surface_state.c | 21 -
 src/intel/vulkan/anv_device.c | 11 +++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9d..d017af040b4 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4290,7 +4290,32 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   inst->mlen = 1;
   inst->size_written = 4 * REG_SIZE;
 
-  bld.MOV(retype(dest, ret_payload.type), component(ret_payload, 0));
+  /* SKL PRM, vol07, 3D Media GPGPU Engine, Bounds Checking and Faulting:
+   *
+   * "Out-of-bounds checking is always performed at a DWord granularity. If
+   * any part of the DWord is out-of-bounds then the whole DWord is
+   * considered out-of-bounds."
+   *
+   * This implies that types with a size smaller than 4 bytes (16 bits) need
+   * to be padded if they don't complete the last dword of the buffer. But
+   * as we need to maintain the original size we need to reverse the padding
+   * calculation to return the correct size and know the number of elements
+   * of an unsized array. As the needed padding is stored in the last two
+   * bits of the buffer size, we calculate here:
+   *
+   * buffer_size = (resinfo_size & ~3) - (resinfo_size & 3)
+   */
+
+  fs_reg size_aligned32 = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg size_padding = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+  fs_reg buffer_size = ubld.vgrf(BRW_REGISTER_TYPE_UD);
+
+  ubld.AND(size_padding, component(ret_payload, 0), brw_imm_ud(3));
+  ubld.AND(size_aligned32, component(ret_payload, 0), brw_imm_ud(~3));
+  ubld.ADD(buffer_size, size_aligned32, negate (size_padding));
+
+  bld.MOV(retype(dest, ret_payload.type), component(buffer_size, 0));
+
   brw_mark_surface_used(prog_data, index);
   break;
}
diff --git a/src/intel/isl/isl_surface_state.c 
b/src/intel/isl/isl_surface_state.c
index bfb27fa4a44..ddc9eb53c96 100644
--- a/src/intel/isl/isl_surface_state.c
+++ b/src/intel/isl/isl_surface_state.c
@@ -673,7 +673,26 @@ void
 isl_genX(buffer_fill_state_s)(void *state,
   const struct isl_buffer_fill_state_info 
*restrict info)
 {
-   uint32_t num_elements = info->size / info->stride;
+   uint64_t buffer_size = info->size;
+
+   /* Uniform and Storage buffers need to have a surface size not less than the
+* 32-bit aligned size of the buffer. To calculate the array length of
+* unsized arrays in StorageBuffer the last 2 bits store the padding size
+* added to the surface, so we can later calculate the original buffer
+* size to know the number of elements.
+*
+*  surface_size = 2 * align_u64(buffer_size, 4) - buffer_size
+*
+*  array_size = (surface_size & ~3) - (surface_size & 3)
+*/
+   if (buffer_size % 4 &&
+   (info->format == ISL_FORMAT_RAW  ||
+info->stride < isl_format_get_layout(info->format)->bpb / 8)) {
+  assert(info->stride == 1);
+  buffer_size = 2 * isl_align(buffer_size, 4) - buffer_size;
+   }
+
+   uint32_t num_elements = buffer_size / info->stride;
 
if (GEN_GEN >= 7) {
   /* From the IVB PRM, SURFACE_STATE::Height,
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a83b7a39f6a..cedeed56219 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
 
pMemoryRequirements->size = buffer->size;

[Mesa-dev] [PATCH 6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned (v2)

2018-02-26 Thread Jose Maria Casanova Crespo
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets can be a multiple of 2 bytes. Some assertions are
updated so offsets only need to be a multiple of the base type size,
although we can not always assume even that, as doubles are not always
aligned to 8 bytes.

For 16-bit types, the push constant offset takes into account the
internal offset within the 32-bit uniform bucket, adding 2 bytes when we
access elements that are not 32-bit aligned. For all 32-bit aligned
accesses it is simply 0.
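
For instance (purely illustrative numbers): a 16-bit value with a base of
const_index[0] = 6 and a constant source offset of 0 is read from uniform
slot 6 / 4 = 1, and the extra 6 % 4 = 2 bytes added to src.offset select
the upper half of that 32-bit slot; for any 32-bit aligned base the added
term is simply 0.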

v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand)
---
 src/compiler/spirv/vtn_variables.c  |  2 --
 src/intel/compiler/brw_fs_nir.cpp   | 15 ++-
 src/intel/vulkan/anv_nir_lower_push_constants.c |  2 --
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 105b33a5671..7e8a090adde 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -753,8 +753,6 @@ _vtn_load_store_tail(struct vtn_builder *b, 
nir_intrinsic_op op, bool load,
}
 
if (op == nir_intrinsic_load_push_constant) {
-  vtn_assert(access_offset % 4 == 0);
-
   nir_intrinsic_set_base(instr, access_offset);
   nir_intrinsic_set_range(instr, access_size);
}
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 03d0bba24a8..6f721ac3a43 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3887,16 +3887,21 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   break;
 
case nir_intrinsic_load_uniform: {
-  /* Offsets are in bytes but they should always be multiples of 4 */
-  assert(instr->const_index[0] % 4 == 0);
+  /* Offsets are in bytes but they should always be aligned to
+   * the type size.
+   */
+  assert(instr->const_index[0] % 4 == 0 ||
+ instr->const_index[0] % type_sz(dest.type) == 0);
 
   fs_reg src(UNIFORM, instr->const_index[0] / 4, dest.type);
 
   nir_const_value *const_offset = nir_src_as_const_value(instr->src[0]);
   if (const_offset) {
- /* Offsets are in bytes but they should always be multiples of 4 */
- assert(const_offset->u32[0] % 4 == 0);
- src.offset = const_offset->u32[0];
+ assert(const_offset->u32[0] % type_sz(dest.type) == 0);
+ /* For 16-bit types we add the modulo of the const_index[0]
+  * offset to access elements that are not 32-bit aligned.
+  */
+ src.offset = const_offset->u32[0] + instr->const_index[0] % 4;
 
  for (unsigned j = 0; j < instr->num_components; j++) {
 bld.MOV(offset(dest, bld, j), offset(src, bld, j));
diff --git a/src/intel/vulkan/anv_nir_lower_push_constants.c 
b/src/intel/vulkan/anv_nir_lower_push_constants.c
index b66552825b9..ad60d0c824e 100644
--- a/src/intel/vulkan/anv_nir_lower_push_constants.c
+++ b/src/intel/vulkan/anv_nir_lower_push_constants.c
@@ -41,8 +41,6 @@ anv_nir_lower_push_constants(nir_shader *shader)
 if (intrin->intrinsic != nir_intrinsic_load_push_constant)
continue;
 
-assert(intrin->const_index[0] % 4 == 0);
-
 /* We just turn them into uniform loads */
 intrin->intrinsic = nir_intrinsic_load_uniform;
  }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/7] spirv: Calculate properly 16-bit vector sizes (v2)

2018-02-23 Thread Jose Maria Casanova Crespo
The range of 16-bit push constant loads was being calculated
incorrectly, using 4 bytes per element instead of the 2 bytes it
should be.
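
As a quick example of the fix: an f16vec3 push constant is now sized as
3 * (16 / 8) = 6 bytes, where the old code computed 3 * 4 = 12 bytes.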

v2: Use glsl_get_bit_size instead of if statement
(Jason Ekstrand)
---
 src/compiler/spirv/vtn_variables.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 78adab3ed2..e68551a336 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -683,12 +683,9 @@ vtn_type_block_size(struct vtn_builder *b, struct vtn_type 
*type)
   if (cols > 1) {
  vtn_assert(type->stride > 0);
  return type->stride * cols;
-  } else if (base_type == GLSL_TYPE_DOUBLE ||
-base_type == GLSL_TYPE_UINT64 ||
-base_type == GLSL_TYPE_INT64) {
- return glsl_get_vector_elements(type->type) * 8;
   } else {
- return glsl_get_vector_elements(type->type) * 4;
+ unsigned type_size = glsl_get_bit_size(type->type) / 8;
+ return glsl_get_vector_elements(type->type) * type_size;
   }
}
 
-- 
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 04/14] anv/cmd_buffer: Add a padding to the vertex buffer

2018-02-23 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro <apinhe...@igalia.com>

As we are using 32-bit surface formats with 16-bit elements we can end
up in a situation where a vertex element pokes over the buffer by 2
bytes. To avoid that we add a padding when flushing the state.
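
For example (illustrative): a three-component 16-bit attribute occupies
6 bytes, but fetching it through a two-component 32-bit format reads
8 bytes, i.e. 2 bytes past the end of the attribute and possibly past the
end of the buffer, hence the padding.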

This is similar to what the i965 drivers prior to Haswell do, as they
use 4-component formats to fake 3-component formats, and add a padding
there too. See commit:
   7c8dfa78b98a12c1c5f74d11433c8554d4c90657

v2: (Jason Ekstrand)
Increase by 2 the size returned by GetBufferMemoryRequirements
when robust buffer access is enabled in a vertex buffer.
Renamed half_inputs_read to inputs_read_16bit.
v3: Rebase minor changes (Chema Casanova)

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinhe...@igalia.com>
---
 src/intel/vulkan/anv_device.c  |  9 +
 src/intel/vulkan/genX_cmd_buffer.c | 20 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 7c8b768c58..1756cf5324 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -2115,6 +2115,15 @@ void anv_GetBufferMemoryRequirements(
 buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
   pMemoryRequirements->size = align_u64(buffer->size, 4);
 
+   /* Vertex buffers with 16-bit values need a 2 byte padding in some cases
+* because they are read as 32-bit components. By adding 2 bytes to the memory
+* requirements size when robust buffer access is enabled, the padding we
+* read would be outside of the VkBuffer but would not be outside "the
+* memory range(s) bound to the buffer".
+*/
+   if (device->robust_buffer_access && (buffer->usage & 
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT))
+  pMemoryRequirements->size += 2;
+
pMemoryRequirements->memoryTypeBits = memory_types;
 }
 
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index ce546249b3..a6aeb1daf1 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2378,6 +2378,11 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 {
struct anv_pipeline *pipeline = cmd_buffer->state.gfx.base.pipeline;
uint32_t *p;
+#if GEN_GEN >= 8
+   const struct brw_vs_prog_data *vs_prog_data = get_vs_prog_data(pipeline);
+   const uint64_t inputs_read_16bit = vs_prog_data->inputs_read_16bit;
+   const uint32_t elements_16bit = inputs_read_16bit >> VERT_ATTRIB_GENERIC0;
+#endif
 
uint32_t vb_emit = cmd_buffer->state.gfx.vb_dirty & pipeline->vb_used;
 
@@ -2390,6 +2395,17 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
if (vb_emit) {
   const uint32_t num_buffers = __builtin_popcount(vb_emit);
   const uint32_t num_dwords = 1 + num_buffers * 4;
+  /* ISL 16-bit formats do a 16-bit to 32-bit float conversion, so we need
+   * to use ISL 32-bit formats to avoid such a conversion in order to properly
+   * support 16-bit formats. This means that the vertex element may poke
+   * over the end of the buffer by 2 bytes.
+   */
+  const unsigned padding =
+#if GEN_GEN >= 8
+ (elements_16bit > 0) * 2;
+#else
+  0;
+#endif
 
   p = anv_batch_emitn(_buffer->batch, num_dwords,
   GENX(3DSTATE_VERTEX_BUFFERS));
@@ -2419,9 +2435,9 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 .BufferStartingAddress = { buffer->bo, buffer->offset + offset },
 
 #if GEN_GEN >= 8
-.BufferSize = buffer->size - offset
+.BufferSize = buffer->size - offset + padding,
 #else
-.EndAddress = { buffer->bo, buffer->offset + buffer->size - 1},
+.EndAddress = { buffer->bo, buffer->offset + buffer->size + 
padding - 1},
 #endif
  };
 
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 06/14] i965/fs: Support 16-bit types at load_input and store_output

2018-02-23 Thread Jose Maria Casanova Crespo
Enables support for 16-bit types in the load_input and
store_output intrinsics between stages.

The approach is based on re-using the 32-bit URB reads
and writes between stages, shuffling pairs of 16-bit values into
32-bit values at the store_output intrinsic and un-shuffling the values
at load_input.
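
Conceptually (a standalone, scalar-level sketch only; the real helpers,
added in a separate patch, operate on whole SIMD registers), the packing
between 16-bit components and 32-bit URB dwords looks like:

   #include <stdint.h>

   /* Pack two 16-bit components into one 32-bit URB dword
    * (assuming the first component goes in the low half).
    */
   static uint32_t pack_pair(uint16_t c0, uint16_t c1)
   {
      return (uint32_t)c0 | ((uint32_t)c1 << 16);
   }

   /* Un-shuffle the pair again on the load_input side. */
   static void unpack_pair(uint32_t dword, uint16_t *c0, uint16_t *c1)
   {
      *c0 = dword & 0xffff;
      *c1 = dword >> 16;
   }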

v2: Minor changes after rebase against recent master (Jose Maria
Casanova)

v3: - Remove unnecessary retypes (Topi Pohjolainen)
- Rebase needed changes as now get_nir_src doesn't returns a 32-bit
  type, it returns a bitsized integer. Previous implementation of this
  patch assumed 32-bit type for get_nir_src. (Jose María Casanova)
- Move 32-16 shuffle-unshuffle helpers to independent patch.
  (Jose María Casanova)
---
 src/intel/compiler/brw_fs_nir.cpp | 69 +--
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index eb45b5df27..b85aa17114 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2209,12 +2209,17 @@ fs_visitor::emit_gs_input_load(const fs_reg ,
   first_component = first_component / 2;
}
 
+   if (type_sz(dst.type) == 2) {
+  num_components = DIV_ROUND_UP(num_components, 2);
+  tmp_dst = bld.vgrf(BRW_REGISTER_TYPE_F, num_components);
+   }
+
for (unsigned iter = 0; iter < num_iterations; iter++) {
   if (offset_const) {
  /* Constant indexing - use global offset. */
  if (first_component != 0) {
 unsigned read_components = num_components + first_component;
-fs_reg tmp = bld.vgrf(dst.type, read_components);
+fs_reg tmp = bld.vgrf(tmp_dst.type, read_components);
 inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, tmp, icp_handle);
 inst->size_written = read_components *
  tmp.component_size(inst->exec_size);
@@ -2264,6 +2269,11 @@ fs_visitor::emit_gs_input_load(const fs_reg ,
 bld.MOV(offset(dst, bld, iter * 2 + c), offset(tmp_dst, bld, c));
   }
 
+  if (type_sz(dst.type) == 2) {
+ shuffle_32bit_load_result_to_16bit_data(bld, dst, tmp_dst,
+ orig_num_components);
+  }
+
   if (num_iterations > 1) {
  num_components = orig_num_components - 2;
  if(offset_const) {
@@ -2605,6 +2615,11 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
  dst = tmp;
   }
 
+  if (type_sz(dst.type) == 2) {
+ num_components = DIV_ROUND_UP(num_components, 2);
+ dst = bld.vgrf(BRW_REGISTER_TYPE_F, num_components);
+  }
+
   for (unsigned iter = 0; iter < num_iterations; iter++) {
  if (indirect_offset.file == BAD_FILE) {
 /* Constant indexing - use global offset. */
@@ -2660,6 +2675,11 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
 }
  }
 
+ if (type_sz(orig_dst.type) == 2) {
+shuffle_32bit_load_result_to_16bit_data(
+   bld, orig_dst, dst, instr->num_components);
+ }
+
  /* Copy the temporary to the destination to deal with writemasking.
   *
   * Also attempt to deal with gl_PointSize being in the .w component.
@@ -2750,6 +2770,8 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
   fs_reg value = get_nir_src(instr->src[0]);
   bool is_64bit = (instr->src[0].is_ssa ?
  instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size) == 64;
+  bool is_16bit = (instr->src[0].is_ssa ?
+ instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size) == 16;
   fs_reg indirect_offset = get_indirect_offset(instr);
   unsigned imm_offset = instr->const_index[0];
   unsigned mask = instr->const_index[1];
@@ -2779,6 +2801,11 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
 num_iterations = 2;
 iter_components = 2;
  }
+  } else {
+ if (is_16bit) {
+iter_components = DIV_ROUND_UP(num_components, 2);
+value = retype (value, BRW_REGISTER_TYPE_D);
+ }
   }
 
   mask = mask << first_component;
@@ -2824,6 +2851,13 @@ fs_visitor::nir_emit_tcs_intrinsic(const fs_builder ,
continue;
 
 if (!is_64bit) {
+   if (is_16bit) {
+  shuffle_16bit_data_for_32bit_write(bld,
+ retype(offset(value,bld, i), BRW_REGISTER_TYPE_F),
+ retype(offset(value,bld, i), BRW_REGISTER_TYPE_HF),
+ 2);
+  value = retype (value, BRW_REGISTER_TYPE_D);
+   }
srcs[header_regs + i + first_component] = offset(value, bld, i);
 } else {
/* We need to shuffle the 64-bit data to match the layout
@@ -2967,6 +3001,11 @@ fs_visitor::nir_emit_tes_intrinsic(const fs_builder ,
 dest = tmp;
 

[Mesa-dev] [PATCH v5 14/14] i965/fs: Enable 16-bit render target write on SKL and CHV

2018-02-23 Thread Jose Maria Casanova Crespo
Once the infrastructure to support Render Target Messages with a 16-bit
payload is available, this patch enables it on SKL and CHV platforms.

Enabling it allows 16-bit payloads that use half of the register on
SIMD16 and avoids the spurious conversion from 16-bit to 32-bit needed
on BDW, just to be converted again to 16-bit.

In the case of CHV there is no support for UINT, so there the
half precision data format is not enabled and the 32-bit payload
fallback is used.

From PRM CHV, vol 07, section "Pixel Data Port" page 260:

"Half Precision Render Target Write messages do not support UNIT
formats." where UNIT is a typo for UINT.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Eduardo Lima <el...@igalia.com>
---
 src/intel/compiler/brw_fs_nir.cpp | 46 +++
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 1688a9a3d8..beb1caabbe 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -54,19 +54,24 @@ fs_visitor::nir_setup_outputs()
   return;
 
if (stage == MESA_SHADER_FRAGMENT) {
-  /*
+  /* On HW that doesn't support half-precision render-target-write
+   * messages (e.g, some gen8 HW like Broadwell), we need a workaround
+   * to support 16-bit outputs from pixel shaders.
+   *
* The following code uses the outputs map to save the variable's
* original output type, so later we can retrieve it and retype
* the output accordingly while emitting the FS 16-bit outputs.
*/
-  nir_foreach_variable(var, >outputs) {
- const enum glsl_base_type base_type =
-glsl_get_base_type(var->type->without_array());
-
- if (glsl_base_type_is_16bit(base_type)) {
-outputs[var->data.driver_location] =
-   retype(outputs[var->data.driver_location],
-  brw_type_for_base_type(var->type));
+  if (devinfo->gen == 8) {
+ nir_foreach_variable(var, >outputs) {
+const enum glsl_base_type base_type =
+   glsl_get_base_type(var->type->without_array());
+
+if (glsl_base_type_is_16bit(base_type)) {
+   outputs[var->data.driver_location] =
+  retype(outputs[var->data.driver_location],
+ brw_type_for_base_type(var->type));
+}
  }
   }
   return;
@@ -3353,14 +3358,27 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
   const unsigned location = nir_intrinsic_base(instr) +
  SET_FIELD(const_offset->u32[0], BRW_NIR_FRAG_OUTPUT_LOCATION);
 
+  /* This flag discriminates HW where we have support for half-precision
+   * render target write messages (aka, the data-format bit), so 16-bit
+   * render target payloads can be used. It is available since skylake
+   * and cherryview. In the case of cherryview there is no support for
+   * UINT formats.
+   */
+  bool enable_hp_rtw = is_16bit &&
+ (devinfo->gen >= 9 || (devinfo->is_cherryview &&
+outputs[location].type != 
BRW_REGISTER_TYPE_UW));
+
   if (is_16bit) {
- /* The outputs[location] should already have the original output type
-  * stored from nir_setup_outputs.
+ /* outputs[location] should already have the original output type
+  * stored from nir_setup_outputs, in case the HW doesn't support
+  * half-precision RTW messages.
+  * If HP RTW is enabled we just use HF to copy 16-bit values.
   */
- src = retype(src, outputs[location].type);
+ src = retype(src, enable_hp_rtw ?
+  BRW_REGISTER_TYPE_HF : outputs[location].type);
   }
 
-  fs_reg new_dest = retype(alloc_frag_output(this, location, false),
+  fs_reg new_dest = retype(alloc_frag_output(this, location, 
enable_hp_rtw),
src.type);
 
   /* This is a workaround to support 16-bits outputs on HW that doesn't
@@ -3370,7 +3388,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
* render target with a 16-bit surface format will force the correct
* conversion of the 32-bit output values to 16-bit.
*/
-  if (is_16bit) {
+  if (is_16bit && !enable_hp_rtw) {
  new_dest.type = brw_reg_type_from_bit_size(32, src.type);
   }
   for (unsigned j = 0; j < instr->num_components; j++)
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 08/14] anv: Enable VK_KHR_16bit_storage for input/output

2018-02-23 Thread Jose Maria Casanova Crespo
Enables storageInputOutput16 feature of VK_KHR_16bit_storage
for Gen8+.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 1756cf5324..c183ea8437 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -788,7 +788,7 @@ void anv_GetPhysicalDeviceFeatures2KHR(
  features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->storagePushConstant16 = pdevice->info.gen >= 8;
- features->storageInputOutput16 = false;
+ features->storageInputOutput16 = pdevice->info.gen >= 8;
  break;
   }
 
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 07/14] i965/fs: Enable Render Target Write for 16-bit outputs

2018-02-23 Thread Jose Maria Casanova Crespo
Broadwell doesn't support half precision data formats on render
target write (RTW) messages. So the solution for writing 16-bit outputs
is to rely on the conversion from 32-bit to 16-bit that happens when
writing 32-bit values to a 16-bit format surface such as R16_FLOAT.

Half-precision outputs are converted HF->F, W->D and UW->UD. This
requires knowing the GLSL base type used in NIR to define the shader
output. We store the 16-bit register types at nir_setup_outputs.

This conversion will be used for all 16-bit types on BDW and, on
Cherryview, for UINT16, which has no RTW support with half precision
data formats.

It is important to note that in these cases the payload has a 32-bit
format, different from the one used when the half precision data format
is enabled on SKL and Cherryview in the following patches.

v2: By default 16-bit sources should be packed (Jason Ekstrand)
Remove not necessary alignment operation for 16-bit to
32-bit conversion (Chema Casanova)

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Eduardo Lima <el...@igalia.com>
---
 src/intel/compiler/brw_fs_nir.cpp | 48 +++
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index b85aa17114..03ee1d1e09 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -50,9 +50,28 @@ fs_visitor::emit_nir_code()
 void
 fs_visitor::nir_setup_outputs()
 {
-   if (stage == MESA_SHADER_TESS_CTRL || stage == MESA_SHADER_FRAGMENT)
+   if (stage == MESA_SHADER_TESS_CTRL)
   return;
 
+   if (stage == MESA_SHADER_FRAGMENT) {
+  /*
+   * The following code uses the outputs map to save the variable's
+   * original output type, so later we can retrieve it and retype
+   * the output accordingly while emitting the FS 16-bit outputs.
+   */
+  nir_foreach_variable(var, >outputs) {
+ const enum glsl_base_type base_type =
+glsl_get_base_type(var->type->without_array());
+
+ if (glsl_base_type_is_16bit(base_type)) {
+outputs[var->data.driver_location] =
+   retype(outputs[var->data.driver_location],
+  brw_type_for_base_type(var->type));
+ }
+  }
+  return;
+   }
+
unsigned vec4s[VARYING_SLOT_TESS_MAX] = { 0, };
 
/* Calculate the size of output registers in a separate pass, before
@@ -3322,14 +3341,35 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
}
 
case nir_intrinsic_store_output: {
-  const fs_reg src = get_nir_src(instr->src[0]);
+  fs_reg src = get_nir_src(instr->src[0]);
+  bool is_16bit = (instr->src[0].is_ssa ?
+ instr->src[0].ssa->bit_size : instr->src[0].reg.reg->bit_size) == 16;
+
   const nir_const_value *const_offset = 
nir_src_as_const_value(instr->src[1]);
   assert(const_offset && "Indirect output stores not allowed");
   const unsigned location = nir_intrinsic_base(instr) +
  SET_FIELD(const_offset->u32[0], BRW_NIR_FRAG_OUTPUT_LOCATION);
-  const fs_reg new_dest = retype(alloc_frag_output(this, location),
- src.type);
 
+  if (is_16bit) {
+ /* The outputs[location] should already have the original output type
+  * stored from nir_setup_outputs.
+  */
+ src = retype(src, outputs[location].type);
+  }
+
+  fs_reg new_dest = retype(alloc_frag_output(this, location),
+   src.type);
+
+  /* This is a workaround to support 16-bits outputs on HW that doesn't
+   * support half-precision render-target-write (RTW) messages. In these
+   * cases, we construct a 32-bit payload with the result of the
+   * conversion of the output values from 16-bit to 32-bit. Later on, a
+   * render target with a 16-bit surface format will force the correct
+   * conversion of the 32-bit output values to 16-bit.
+   */
+  if (is_16bit) {
+ new_dest.type = brw_reg_type_from_bit_size(32, src.type);
+  }
   for (unsigned j = 0; j < instr->num_components; j++)
  bld.MOV(offset(new_dest, bld, nir_intrinsic_component(instr) + j),
  offset(src, bld, j));
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 09/14] i965/fs: Include support for SEND data_format bit for Render Targets

2018-02-23 Thread Jose Maria Casanova Crespo
From the Intel Skylake PRM, vol 07, section "EU Overview", subsection
"Send Message" (page 905):

   "Bit 30: Data format. This field specifies the width of data read
from sampler or written to render target. Format = U1 0
Single Precision (32b), 1 Half Precision (16b)"

Also present on vol 02d, "Message Descriptor - Render Target Write"
(page 326).

It is worth noting that this bit is also present on
Cherryview/Braswell but not on Broadwell, both Gen8, so we can't check
for the presence of the bit based only on the gen (for example, in
brw_inst.h).

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Eduardo Lima <el...@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinhe...@igalia.com>
---
 src/intel/compiler/brw_eu.h   |  6 --
 src/intel/compiler/brw_eu_emit.c  | 25 -
 src/intel/compiler/brw_fs.cpp |  1 +
 src/intel/compiler/brw_fs_generator.cpp   |  3 ++-
 src/intel/compiler/brw_fs_surface_builder.cpp |  3 ++-
 src/intel/compiler/brw_inst.h |  1 +
 src/intel/compiler/brw_shader.h   |  7 +++
 src/intel/compiler/brw_vec4_generator.cpp |  3 ++-
 8 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/src/intel/compiler/brw_eu.h b/src/intel/compiler/brw_eu.h
index 2d0f56f793..35adb47684 100644
--- a/src/intel/compiler/brw_eu.h
+++ b/src/intel/compiler/brw_eu.h
@@ -251,7 +251,8 @@ void brw_set_dp_write_message(struct brw_codegen *p,
  unsigned last_render_target,
  unsigned response_length,
  unsigned end_of_thread,
- unsigned send_commit_msg);
+ unsigned send_commit_msg,
+ unsigned data_format);
 
 void brw_urb_WRITE(struct brw_codegen *p,
   struct brw_reg dest,
@@ -303,7 +304,8 @@ void brw_fb_WRITE(struct brw_codegen *p,
   unsigned response_length,
   bool eot,
   bool last_render_target,
-  bool header_present);
+  bool header_present,
+  unsigned data_format);
 
 brw_inst *gen9_fb_READ(struct brw_codegen *p,
struct brw_reg dst,
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index c25d8d6eda..456ff32712 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -520,7 +520,8 @@ brw_set_dp_write_message(struct brw_codegen *p,
 unsigned last_render_target,
 unsigned response_length,
 unsigned end_of_thread,
-unsigned send_commit_msg)
+unsigned send_commit_msg,
+unsigned data_format)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned sfid = (devinfo->gen >= 6 ? target_cache :
@@ -532,6 +533,16 @@ brw_set_dp_write_message(struct brw_codegen *p,
brw_inst_set_binding_table_index(devinfo, insn, binding_table_index);
brw_inst_set_dp_write_msg_type(devinfo, insn, msg_type);
brw_inst_set_dp_write_msg_control(devinfo, insn, msg_control);
+   if (data_format) {
+  /* data_format is supported since Cherryview. So we can't just set
+   * any data_format value, because it would trigger an assertion in
+   * brw_inst_set_data_format on previous hw if we tried to set it to
+   * zero. And we don't add a generation assert because, as mentioned,
+   * brw_inst_set_data_format already does that.
+   */
+  brw_inst_set_data_format(devinfo, insn, data_format);
+   }
+
brw_inst_set_rt_last(devinfo, insn, last_render_target);
if (devinfo->gen < 7) {
   brw_inst_set_dp_write_commit(devinfo, insn, send_commit_msg);
@@ -2063,7 +2074,8 @@ void brw_oword_block_write_scratch(struct brw_codegen *p,
   0, /* not a render target */
   send_commit_msg, /* response_length */
   0, /* eot */
-  send_commit_msg);
+  send_commit_msg,
+  0 /* data_format */);
}
 }
 
@@ -2257,7 +2269,8 @@ void brw_fb_WRITE(struct brw_codegen *p,
   unsigned response_length,
   bool eot,
   bool last_render_target,
-  bool header_present)
+  bool header_present,
+  unsigned data_format)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned target_cache =
@@ -2305,7 +2318,8 @@ void brw_fb_WRITE(struct brw_codegen *p,
last_render_target,
response_length,
eot,
-  

[Mesa-dev] [PATCH v5 02/14] i965/compiler: includes 16-bit vertex input

2018-02-23 Thread Jose Maria Casanova Crespo
Include the info about 16-bit vertex inputs coming from NIR in the brw VS
prog data, as we already do for 64-bit vertex inputs.

v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
 src/intel/compiler/brw_compiler.h | 1 +
 src/intel/compiler/brw_vec4.cpp   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/intel/compiler/brw_compiler.h 
b/src/intel/compiler/brw_compiler.h
index b1086bbcee..fab79028d6 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -960,6 +960,7 @@ struct brw_vs_prog_data {
 
GLbitfield64 inputs_read;
GLbitfield64 double_inputs_read;
+   GLbitfield64 inputs_read_16bit;
 
unsigned nr_attribute_slots;
 
diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index e95886349d..58fa35612b 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -2770,6 +2770,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
 
prog_data->inputs_read = shader->info.inputs_read;
prog_data->double_inputs_read = shader->info.vs.double_inputs;
+   prog_data->inputs_read_16bit = shader->info.vs.inputs_read_16bit;
 
brw_nir_lower_vs_inputs(shader, key->gl_attrib_wa_flags);
brw_nir_lower_vue_outputs(shader, is_scalar);
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 05/14] i965/fs: Unpack 16-bit from 32-bit components in VS load_input

2018-02-23 Thread Jose Maria Casanova Crespo
The VS load input for 16-bit values receives pairs of 16-bit values
packed in 32-bit values, because of the adjusted formats used in:

 anv/pipeline: Use 32-bit surface formats for 16-bit formats

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: Fix coding style and typo (Topi Pohjolainen)
Simplify unshuffle 32-bit to 16-bit using helper function
(Jason Ekstrand)
v4: Rebase minor changes (Chema Casanova)
---
 src/intel/compiler/brw_fs_nir.cpp | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 27611a21d0..eb45b5df27 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2450,8 +2450,26 @@ fs_visitor::nir_emit_vs_intrinsic(const fs_builder ,
   if (type_sz(dest.type) == 8)
  first_component /= 2;
 
-  for (unsigned j = 0; j < num_components; j++) {
- bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
+  if (type_sz(dest.type) == 2) {
+ /* The VS load input for 16-bit values receives pairs of 16-bit
+  * values packed in 32-bit values. This is an example on SIMD8:
+  *
+  * xy xy xy xy xy xy xy xy
+  * zw zw zw zw zw zw zw zw
+  *
+  * We need to format it to something like:
+  *
+  * xx xx xx xx yy yy yy yy
+  * zz zz zz zz ww ww ww ww
+  */
+
+ shuffle_32bit_load_result_to_16bit_data(bld,
+ dest,
+ retype(src, 
BRW_REGISTER_TYPE_F),
+ num_components);
+  } else {
+ for (unsigned j = 0; j < num_components; j++)
+bld.MOV(offset(dest, bld, j), offset(src, bld, j + 
first_component));
   }
 
   if (type_sz(dest.type) == 8) {
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 12/14] i965/fs: 16-bit source payloads always use 1 register

2018-02-23 Thread Jose Maria Casanova Crespo
Render Target Message payloads for 16-bit values fit in only one
register.

From Intel PRM vol07, page 249 "Render Target Messages" / "Message
Data Payloads"

   "The half precision Render Target Write messages have data payloads
that can pack a full SIMD16 payload into 1 register instead of
two. The half-precision packed format is used for RGBA and Source
0 Alpha, but Source Depth data payload is always supplied in full
precision."

So when 16-bit data is uploaded to the payload it will use 1 register
regardless of whether it is SIMD16 or SIMD8.

This change implies that we need to replicate the approach in the
copy propagation of the load_payload operations.
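
For reference, the register arithmetic behind this (assuming the usual
32-byte GRF): a SIMD16 half-precision component is 16 * 2 = 32 bytes,
exactly one register, while a full-precision SIMD16 component needs
16 * 4 = 64 bytes, i.e. two registers. That is why lower_load_payload and
the copy propagation step the destination offset by a whole REG_SIZE for
16-bit sources.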

v2: By default 16-bit sources should be packed (Jason Ekstrand)
Include changes in in copy_propagation of load_payload (Chema Casanova)
---
 src/intel/compiler/brw_fs.cpp  | 5 -
 src/intel/compiler/brw_fs_copy_propagation.cpp | 4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 449588c484..9d0b30e6e8 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3523,7 +3523,10 @@ fs_visitor::lower_load_payload()
   for (uint8_t i = inst->header_size; i < inst->sources; i++) {
  if (inst->src[i].file != BAD_FILE)
 ibld.MOV(retype(dst, inst->src[i].type), inst->src[i]);
- dst = offset(dst, ibld, 1);
+ if (type_sz(inst->src[i].type) == 2)
+dst = byte_offset(dst, REG_SIZE);
+ else
+dst = offset(dst, ibld, 1);
   }
 
   inst->remove(block);
diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index 92cc0a8de5..b714182fec 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -829,7 +829,7 @@ fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, 
bblock_t *block,
  int offset = 0;
  for (int i = 0; i < inst->sources; i++) {
 int effective_width = i < inst->header_size ? 8 : inst->exec_size;
-assert(effective_width * type_sz(inst->src[i].type) % REG_SIZE == 
0);
+assert(effective_width * MAX2(4, type_sz(inst->src[i].type)) % 
REG_SIZE == 0);
 const unsigned size_written = effective_width *
   type_sz(inst->src[i].type);
 if (inst->src[i].file == VGRF) {
@@ -845,7 +845,7 @@ fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, 
bblock_t *block,
   ralloc_free(entry);
}
 }
-offset += size_written;
+offset += type_sz(inst->src[i].type) == 2 ? REG_SIZE : 
size_written;
  }
   }
}
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 03/14] anv/pipeline: Use 32-bit surface formats for 16-bit formats

2018-02-23 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro <apinhe...@igalia.com>

From Vulkan 1.0.50 spec, Section 3.30.1. Format Definition:
VK_FORMAT_R16G16_SFLOAT

A two-component, 32-bit signed floating-point format that has a
16-bit R component in bytes 0..1, and a 16-bit G component in
bytes 2..3.

Vertex data and other inputs have always been expected to be up-converted
from 16-bit to 32-bit. But when we use the 16-bit input in the shader
without any conversion we use a 32-bit uint format (this also applies to
the 2/3/4 component cases).

In the SKL PRM, vol 07, section FormatConversion, page 445 there is
a table showing that *16*FLOAT formats are converted to FLOAT, which
in that context is a 32-bit float. This is similar to the
*64*FLOAT formats, which convert 64-bit floats to 32-bit floats.

Unfortunately, while for 64-bit floats we have the alternative of using
*64*PASSTHRU formats, that is not the case for 16-bit.

This issue happens too with 16-bit int surface formats.

As a workaround, if a shader location uses a 16-bit type, we
use 32-bit uint formats to avoid the conversion, and fix up reading the
proper content later. Note that as we are using 32-bit formats, we
can use formats with fewer components (for example, *R32* for *R16G16*).

v2: Always use UINT surface format variants. (Topi Pohjolainen)
Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
Reword commit log (Jason Ekstrand)
v3: Rebase minor changes (Chema Casanova)

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinhe...@igalia.com>
---
 src/intel/vulkan/genX_pipeline.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 89cbe293b8..9a60c286cc 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -83,6 +83,31 @@ vertex_element_comp_control(enum isl_format format, unsigned 
comp)
}
 }
 
+#if GEN_GEN >= 8
+static enum isl_format
+adjust_16bit_format(enum isl_format format)
+{
+   switch(format) {
+   case ISL_FORMAT_R16_UINT:
+   case ISL_FORMAT_R16_SINT:
+   case ISL_FORMAT_R16_FLOAT:
+   case ISL_FORMAT_R16G16_UINT:
+   case ISL_FORMAT_R16G16_SINT:
+   case ISL_FORMAT_R16G16_FLOAT:
+  return ISL_FORMAT_R32_UINT;
+   case ISL_FORMAT_R16G16B16_UINT:
+   case ISL_FORMAT_R16G16B16_SINT:
+   case ISL_FORMAT_R16G16B16_FLOAT:
+   case ISL_FORMAT_R16G16B16A16_UINT:
+   case ISL_FORMAT_R16G16B16A16_SINT:
+   case ISL_FORMAT_R16G16B16A16_FLOAT:
+  return ISL_FORMAT_R32G32_UINT;
+   default:
+  return format;
+   }
+}
+#endif
+
 static void
 emit_vertex_input(struct anv_pipeline *pipeline,
   const VkPipelineVertexInputStateCreateInfo *info)
@@ -95,6 +120,10 @@ emit_vertex_input(struct anv_pipeline *pipeline,
assert((inputs_read & ((1 << VERT_ATTRIB_GENERIC0) - 1)) == 0);
const uint32_t elements = inputs_read >> VERT_ATTRIB_GENERIC0;
const uint32_t elements_double = double_inputs_read >> VERT_ATTRIB_GENERIC0;
+#if GEN_GEN >= 8
+   const uint64_t inputs_read_16bit = vs_prog_data->inputs_read_16bit;
+   const uint32_t elements_16bit = inputs_read_16bit >> VERT_ATTRIB_GENERIC0;
+#endif
const bool needs_svgs_elem = vs_prog_data->uses_vertexid ||
 vs_prog_data->uses_instanceid ||
 vs_prog_data->uses_basevertex ||
@@ -125,6 +154,11 @@ emit_vertex_input(struct anv_pipeline *pipeline,
   VK_IMAGE_ASPECT_COLOR_BIT,
   VK_IMAGE_TILING_LINEAR);
 
+#if GEN_GEN >= 8
+  if ((elements_16bit & (1 << desc->location)) != 0) {
+ format = adjust_16bit_format(format);
+  }
+#endif
   assert(desc->binding < MAX_VBS);
 
   if ((elements & (1 << desc->location)) == 0)
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 11/14] i965/fs: Mark 16-bit outputs on FS store_output

2018-02-23 Thread Jose Maria Casanova Crespo
On SKL the render target write operations allow 16-bit format
output. This marks output registers as 16-bit using
BRW_REGISTER_TYPE_HF on the proper output targets.

This allows recognising when the 16-bit data_format should be
enabled on render_target_write messages.

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Eduardo Lima <el...@igalia.com>
---
 src/intel/compiler/brw_fs_nir.cpp | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 03ee1d1e09..1688a9a3d8 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3250,13 +3250,16 @@ emit_coherent_fb_read(const fs_builder , const 
fs_reg , unsigned target)
 }
 
 static fs_reg
-alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n)
+alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n,
+bool is_16bit)
 {
if (n && regs[0].file != BAD_FILE) {
   return regs[0];
 
} else {
-  const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, size);
+  const brw_reg_type type =
+ is_16bit ? BRW_REGISTER_TYPE_HF : BRW_REGISTER_TYPE_F;
+  const fs_reg tmp = bld.vgrf(type, size);
 
   for (unsigned i = 0; i < n; i++)
  regs[i] = tmp;
@@ -3266,7 +3269,7 @@ alloc_temporary(const fs_builder , unsigned size, 
fs_reg *regs, unsigned n)
 }
 
 static fs_reg
-alloc_frag_output(fs_visitor *v, unsigned location)
+alloc_frag_output(fs_visitor *v, unsigned location, bool is_16bit)
 {
assert(v->stage == MESA_SHADER_FRAGMENT);
const brw_wm_prog_key *const key =
@@ -3275,26 +3278,26 @@ alloc_frag_output(fs_visitor *v, unsigned location)
const unsigned i = GET_FIELD(location, BRW_NIR_FRAG_OUTPUT_INDEX);
 
if (i > 0 || (key->force_dual_color_blend && l == FRAG_RESULT_DATA1))
-  return alloc_temporary(v->bld, 4, >dual_src_output, 1);
+  return alloc_temporary(v->bld, 4, >dual_src_output, 1, is_16bit);
 
else if (l == FRAG_RESULT_COLOR)
   return alloc_temporary(v->bld, 4, v->outputs,
- MAX2(key->nr_color_regions, 1));
+ MAX2(key->nr_color_regions, 1),
+ is_16bit);
 
else if (l == FRAG_RESULT_DEPTH)
-  return alloc_temporary(v->bld, 1, >frag_depth, 1);
+  return alloc_temporary(v->bld, 1, >frag_depth, 1, is_16bit);
 
else if (l == FRAG_RESULT_STENCIL)
-  return alloc_temporary(v->bld, 1, >frag_stencil, 1);
+  return alloc_temporary(v->bld, 1, >frag_stencil, 1, is_16bit);
 
else if (l == FRAG_RESULT_SAMPLE_MASK)
-  return alloc_temporary(v->bld, 1, >sample_mask, 1);
+  return alloc_temporary(v->bld, 1, >sample_mask, 1, is_16bit);
 
else if (l >= FRAG_RESULT_DATA0 &&
 l < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS)
   return alloc_temporary(v->bld, 4,
- >outputs[l - FRAG_RESULT_DATA0], 1);
-
+ >outputs[l - FRAG_RESULT_DATA0], 1, is_16bit);
else
   unreachable("Invalid location");
 }
@@ -3357,7 +3360,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
  src = retype(src, outputs[location].type);
   }
 
-  fs_reg new_dest = retype(alloc_frag_output(this, location),
+  fs_reg new_dest = retype(alloc_frag_output(this, location, false),
src.type);
 
   /* This is a workaround to support 16-bits outputs on HW that doesn't
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 13/14] i965/fs: Use half_precision data_format on 16-bit fb writes

2018-02-23 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

---
 src/intel/compiler/brw_fs_visitor.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/compiler/brw_fs_visitor.cpp 
b/src/intel/compiler/brw_fs_visitor.cpp
index 7a5f6451f2..c3bc024095 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/intel/compiler/brw_fs_visitor.cpp
@@ -439,6 +439,12 @@ fs_visitor::emit_fb_writes()
   inst = emit_single_fb_write(abld, this->outputs[target],
   this->dual_src_output, src0_alpha, 4);
   inst->target = target;
+
+  /* Enables half-precision data_format for 16-bit outputs on
+   * Render Target Write Messages. Supported since cherry-view and
+   * Skylake.
+   */
+  inst->data_format = type_sz(this->outputs[target].type) == 2;
}
 
prog_data->dual_src_blend = (this->dual_src_output.file != BAD_FILE);
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 10/14] i965/disasm: Show half-precision data_format on rt_writes

2018-02-23 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_disasm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 429ed78140..2def79f1d5 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1676,6 +1676,10 @@ brw_disassemble_inst(FILE *file, const struct 
gen_device_info *devinfo,
   brw_inst_rt_message_type(devinfo, inst), );
if (devinfo->gen >= 6 && brw_inst_rt_slot_group(devinfo, inst))
   string(file, " Hi");
+   if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
+   brw_inst_data_format(devinfo, inst)) {
+  string(file, " HP");
+   }
if (brw_inst_rt_last(devinfo, inst))
   string(file, " LastRT");
if (devinfo->gen < 7 && brw_inst_dp_write_commit(devinfo, inst))
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 01/14] compiler: Mark when input/output attribute at VS uses 16-bit

2018-02-23 Thread Jose Maria Casanova Crespo
New shader attribute to mark when a location has a 16-bit
value. This patch includes support in mesa glsl and nir.

v2: Remove use of is_half_slot as is a duplicate of is_16bit
(Topi Pohjolainen)
Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
 src/compiler/glsl_types.h  | 15 +++
 src/compiler/nir/nir_gather_info.c | 10 +++---
 src/compiler/shader_info.h |  3 +++
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h
index ab0b263764..0a9a3d61ec 100644
--- a/src/compiler/glsl_types.h
+++ b/src/compiler/glsl_types.h
@@ -100,6 +100,13 @@ static inline bool glsl_base_type_is_integer(enum 
glsl_base_type type)
   type == GLSL_TYPE_IMAGE;
 }
 
+static inline bool glsl_base_type_is_16bit(enum glsl_base_type type)
+{
+   return type == GLSL_TYPE_FLOAT16 ||
+  type == GLSL_TYPE_UINT16 ||
+  type == GLSL_TYPE_INT16;
+}
+
 enum glsl_sampler_dim {
GLSL_SAMPLER_DIM_1D = 0,
GLSL_SAMPLER_DIM_2D,
@@ -574,6 +581,14 @@ public:
   return glsl_base_type_is_64bit(base_type);
}
 
+   /**
+* Query whether or not a type is 16-bit
+*/
+   bool is_16bit() const
+   {
+  return glsl_base_type_is_16bit(base_type);
+   }
+
/**
 * Query whether or not a type is a non-array boolean type
 */
diff --git a/src/compiler/nir/nir_gather_info.c 
b/src/compiler/nir/nir_gather_info.c
index 743f968035..d661a9c89b 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -55,9 +55,12 @@ set_io_mask(nir_shader *shader, nir_variable *var, int 
offset, int len,
 shader->info.inputs_read |= bitfield;
 
  /* double inputs read is only for vertex inputs */
- if (shader->info.stage == MESA_SHADER_VERTEX &&
- glsl_type_is_dual_slot(glsl_without_array(var->type)))
-shader->info.vs.double_inputs_read |= bitfield;
+ if (shader->info.stage == MESA_SHADER_VERTEX) {
+if (glsl_type_is_dual_slot(glsl_without_array(var->type)))
+   shader->info.vs.double_inputs_read |= bitfield;
+else if (glsl_get_bit_size(glsl_without_array(var->type)) == 16)
+   shader->info.vs.inputs_read_16bit |= bitfield;
+ }
 
  if (shader->info.stage == MESA_SHADER_FRAGMENT) {
 shader->info.fs.uses_sample_qualifier |= var->data.sample;
@@ -380,6 +383,7 @@ nir_shader_gather_info(nir_shader *shader, 
nir_function_impl *entrypoint)
if (shader->info.stage == MESA_SHADER_VERTEX) {
   shader->info.vs.double_inputs = 0;
   shader->info.vs.double_inputs_read = 0;
+  shader->info.vs.inputs_read_16bit = 0;
}
if (shader->info.stage == MESA_SHADER_FRAGMENT) {
   shader->info.fs.uses_sample_qualifier = false;
diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index e7fd7dbe62..645f05cd1b 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -113,6 +113,9 @@ typedef struct shader_info {
 
  /* Which inputs are actually read and are double */
  uint64_t double_inputs_read;
+
+ /* Which inputs are actually read and are 16-bit type */
+ uint64_t inputs_read_16bit;
   } vs;
 
   struct {
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5 00/14] VK_KHR_16bit_storage input/output support for gen8+

2018-02-23 Thread Jose Maria Casanova Crespo
Hello,

This is a re-send of the rebased V5 series with the implementation of
the storageInputOutput16 feature for VK_KHR_16bit_storage.

This series, including the related SSBO/UBO/PushConstant patches
sent today, is also available at:

https://github.com/Igalia/mesa/tree/wip/VK_KHR_16bit_storage-rc5

If patches 1-8 could be reviewed, the implementation would pass the CTS
tests; patches 9-14 mainly take advantage of the Half Precision render
target support available since SKL and BSW/CHV.

Finally an updated overview of the patches:

Patches 1-5 implement 16-bit vertex attribute input support on
i965. These include changes in anv. This was needed because 16-bit
surface formats do an implicit conversion to 32-bit. To work around this,
we override the 16-bit surface formats and use 32-bit ones.

Patch 6 implements load_input and store_output between pipeline
stages.

Patch 7 implements 16-bit store_output support for fragment
shaders on i965. This implementation uses a 16-bit -> 32-bit
conversion and takes advantage of the output format conversion to
write with a 16-bit format. This general solution is needed for
generations that lack support for half precision render
targets for all formats (BDW) or for some formats (BSW/CHV).

Patch 8 enables VK_KHR_16bit_storage input/output support for gen8+,
as the previous patch enables FS outputs.

Patches 9-14 implement the Half Precision Render Target support,
which can be used on Gen9+ and in some cases on BSW/CHV, to avoid
the 32-bit -> 16-bit conversions introduced in patch 7.

Cc: Jason Ekstrand <jason.ekstr...@intel.com>
Cc: Topi Pohjolainen <topi.pohjolai...@intel.com>

Alejandro Piñeiro (3):
  anv/pipeline: Use 32-bit surface formats for 16-bit formats
  anv/cmd_buffer: Add a padding to the vertex buffer
  i965/fs: Use half_precision data_format on 16-bit fb writes

Jose Maria Casanova Crespo (11):
  compiler: Mark when input/output attribute at VS uses 16-bit
  i965/compiler: includes 16-bit vertex input
  i965/fs: Unpack 16-bit from 32-bit components in VS load_input
  i965/fs: Support 16-bit types at load_input and store_output
  i965/fs: Enable Render Target Write for 16-bit outputs
  anv: Enable VK_KHR_16bit_storage for input/output
  i965/fs: Include support for SEND data_format bit for Render Targets
  i965/disasm: Show half-precision data_format on rt_writes
  i965/fs: Mark 16-bit outputs on FS store_output
  i965/fs: 16-bit source payloads always use 1 register
  i965/fs: Enable 16-bit render target write on SKL and CHV

 src/compiler/glsl_types.h  |  15 +++
 src/compiler/nir/nir_gather_info.c |  10 +-
 src/compiler/shader_info.h |   3 +
 src/intel/compiler/brw_compiler.h  |   1 +
 src/intel/compiler/brw_disasm.c|   4 +
 src/intel/compiler/brw_eu.h|   6 +-
 src/intel/compiler/brw_eu_emit.c   |  25 +++-
 src/intel/compiler/brw_fs.cpp  |   6 +-
 src/intel/compiler/brw_fs_copy_propagation.cpp |   4 +-
 src/intel/compiler/brw_fs_generator.cpp|   3 +-
 src/intel/compiler/brw_fs_nir.cpp  | 180 ++---
 src/intel/compiler/brw_fs_surface_builder.cpp  |   3 +-
 src/intel/compiler/brw_fs_visitor.cpp  |   6 +
 src/intel/compiler/brw_inst.h  |   1 +
 src/intel/compiler/brw_shader.h|   7 +
 src/intel/compiler/brw_vec4.cpp|   1 +
 src/intel/compiler/brw_vec4_generator.cpp  |   3 +-
 src/intel/vulkan/anv_device.c  |  11 +-
 src/intel/vulkan/genX_cmd_buffer.c |  20 ++-
 src/intel/vulkan/genX_pipeline.c   |  34 +
 20 files changed, 306 insertions(+), 37 deletions(-)

-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/7] anv: Enable VK_KHR_16bit_storage for SSBO and UBO

2018-02-23 Thread Jose Maria Casanova Crespo
Enables the storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess
features of VK_KHR_16bit_storage for Gen8+.
---
 src/intel/vulkan/anv_device.c  | 5 +++--
 src/intel/vulkan/anv_extensions.py | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index cedeed5621..a7b586c79c 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -783,9 +783,10 @@ void anv_GetPhysicalDeviceFeatures2KHR(
   case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES_KHR: {
  VkPhysicalDevice16BitStorageFeaturesKHR *features =
 (VkPhysicalDevice16BitStorageFeaturesKHR *)ext;
+ ANV_FROM_HANDLE(anv_physical_device, pdevice, physicalDevice);
 
- features->storageBuffer16BitAccess = false;
- features->uniformAndStorageBuffer16BitAccess = false;
+ features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
+ features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->storagePushConstant16 = false;
  features->storageInputOutput16 = false;
  break;
diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index 581921e62a..2999b3406f 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -49,7 +49,7 @@ class Extension:
 # and dEQP-VK.api.info.device fail due to the duplicated strings.
 EXTENSIONS = [
 Extension('VK_ANDROID_native_buffer', 5, 'ANDROID'),
-Extension('VK_KHR_16bit_storage', 1, False),
+Extension('VK_KHR_16bit_storage', 1, 'device->info.gen 
>= 8'),
 Extension('VK_KHR_bind_memory2',  1, True),
 Extension('VK_KHR_dedicated_allocation',  1, True),
 Extension('VK_KHR_descriptor_update_template',1, True),
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/7] anv: Enable VK_KHR_16bit_storage for PushConstant

2018-02-23 Thread Jose Maria Casanova Crespo
Enables the storagePushConstant16 feature of VK_KHR_16bit_storage for Gen8+.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a7b586c79c..7c8b768c58 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -787,7 +787,7 @@ void anv_GetPhysicalDeviceFeatures2KHR(
 
  features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
- features->storagePushConstant16 = false;
+ features->storagePushConstant16 = pdevice->info.gen >= 8;
  features->storageInputOutput16 = false;
  break;
   }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/7] i965/fs: Support 16-bit do_read_vector with VK_KHR_relaxed_block_layout

2018-02-23 Thread Jose Maria Casanova Crespo
16-bit load_ubo/ssbo operations that call do_untyped_vector_read cannot
guarantee that offsets are a multiple of 4 bytes, as required by the
untyped_read message. This happens, for example, with 16-bit scalar arrays
and with f16vec3 when VK_KHR_relaxed_block_layout is enabled.

Vector reads with non-constant offsets are implemented with multiple
byte_scattered_read messages, which do not require 32-bit aligned offsets.
The same applies to constant offsets that are not 32-bit aligned.

The untyped surface read message is used when there is a 32-bit aligned
constant offset and more than one component to read.
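
As a rough illustration of the message split described above, here is a
minimal standalone C sketch (not the fs_nir code; plan_reads and its
arguments are hypothetical, and it only models compile-time constant
offsets) that prints which messages a 16-bit load would use:

#include <stdio.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

static void
plan_reads(unsigned offset_bytes, unsigned components)
{
   unsigned first = 0;
   printf("offset %u, %u components:", offset_bytes, components);
   while (components > 0 &&
          (components == 1 || (offset_bytes + first * 2) % 4)) {
      printf(" byte_scattered@%u", offset_bytes + first * 2);
      first++;
      components--;
   }
   if (components > 0)
      printf(" untyped_read@%u (%u dwords)",
             offset_bytes + first * 2, DIV_ROUND_UP(components, 2));
   printf("\n");
}

int main(void)
{
   plan_reads(0, 2);   /* aligned f16vec2: one untyped read */
   plan_reads(6, 3);   /* f16vec3 at stride 6: scattered read, then untyped */
   plan_reads(4, 1);   /* 16-bit scalar: always byte_scattered */
   return 0;
}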
---
 src/intel/compiler/brw_fs_nir.cpp | 60 ---
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 8efec34cc9..45b8e8b637 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2305,27 +2305,55 @@ do_untyped_vector_read(const fs_builder ,
if (type_sz(dest.type) <= 2) {
   assert(dest.stride == 1);
 
-  if (num_components > 1) {
- /* Pairs of 16-bit components can be read with untyped read, for 
16-bit
-  * vec3 4th component is ignored.
+  unsigned pending_components = num_components;
+  unsigned first_component = 0;
+  boolean is_const_offset = offset_reg.file == BRW_IMMEDIATE_VALUE;
+  fs_reg read_offset;
+  if (is_const_offset)
+ read_offset = offset_reg;
+  else {
+ read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
+ bld.MOV(read_offset, offset_reg);
+  }
+  while (pending_components > 0 &&
+ (pending_components == 1 ||
+  !is_const_offset ||
+  (offset_reg.ud + first_component * 2) % 4)) {
+ /* Non-constant offsets, 16-bit scalars and constant offsets not
+  * aligned to 32 bits are read using one byte_scattered_read message
+  * for each component, as untyped_read requires 32-bit aligned offsets.
   */
  fs_reg read_result =
-emit_untyped_read(bld, surf_index, offset_reg,
-  1 /* dims */, DIV_ROUND_UP(num_components, 2),
-  BRW_PREDICATE_NONE);
- shuffle_32bit_load_result_to_16bit_data(bld,
-   retype(dest, BRW_REGISTER_TYPE_W),
-   retype(read_result, BRW_REGISTER_TYPE_D),
-   num_components);
-  } else {
- assert(num_components == 1);
- /* scalar 16-bit are read using one byte_scattered_read message */
- fs_reg read_result =
-emit_byte_scattered_read(bld, surf_index, offset_reg,
+emit_byte_scattered_read(bld, surf_index, read_offset,
  1 /* dims */, 1,
  type_sz(dest.type) * 8 /* bit_size */,
  BRW_PREDICATE_NONE);
- bld.MOV(dest, subscript(read_result, dest.type, 0));
+ shuffle_32bit_load_result_to_16bit_data(bld,
+   retype(offset(dest, bld, first_component), BRW_REGISTER_TYPE_W),
+   retype(read_result, BRW_REGISTER_TYPE_D),
+   1);
+ pending_components--;
+ first_component ++;
+ if (is_const_offset)
+read_offset.ud += 2;
+ else
+bld.ADD(read_offset, offset_reg, brw_imm_ud(2 * first_component));
+  }
+  assert(pending_components != 1);
+  if (pending_components > 1) {
+ assert (is_const_offset &&
+ (offset_reg.ud + first_component * 2) % 4 == 0);
+ /* At this point we have multiple 16-bit components that have constant
+  * offset multiple of 4-bytes that can be read with untyped_reads.
+  */
+ fs_reg read_result =
+emit_untyped_read(bld, surf_index, read_offset,
+  1 /* dims */, DIV_ROUND_UP(pending_components, 
2),
+  BRW_PREDICATE_NONE);
+ shuffle_32bit_load_result_to_16bit_data(bld,
+   retype(offset(dest,bld,first_component), BRW_REGISTER_TYPE_W),
+   retype(read_result, BRW_REGISTER_TYPE_D),
+   pending_components);
   }
} else if (type_sz(dest.type) == 4) {
   fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/7] spirv/i965/anv: Relax push constant offset assertions being 32-bit aligned

2018-02-23 Thread Jose Maria Casanova Crespo
The introduction of 16-bit types with VK_KHR_16bit_storage implies that
push constant offsets can be multiples of 2 bytes. Some assertions are
relaxed so offsets can be a multiple of 4 bytes or a multiple of the size
of the base type.

For 16-bit types, the push constant offset takes into account the internal
offset within the 32-bit uniform bucket, adding 2 bytes when we access
elements that are not 32-bit aligned. In all 32-bit aligned cases it is
simply 0.
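
A minimal standalone C sketch of the offset arithmetic described above
(hypothetical variable names, not the i965 code); it shows how a 16-bit
element at byte offset 6 lands in uniform slot 1 with a 2-byte
intra-dword offset:

#include <stdio.h>

int main(void)
{
   unsigned base = 6;          /* instr->const_index[0]: 16-bit element at byte 6 */
   unsigned const_offset = 0;  /* constant source offset, in bytes */

   unsigned uniform_slot = base / 4;                /* 4-byte uniform bucket */
   unsigned byte_in_slot = const_offset + base % 4; /* 0 or 2 for 16-bit types */

   printf("slot %u, byte offset %u\n", uniform_slot, byte_in_slot);
   return 0;
}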
---
 src/compiler/spirv/vtn_variables.c  |  1 -
 src/intel/compiler/brw_fs_nir.cpp   | 16 +++-
 src/intel/vulkan/anv_nir_lower_push_constants.c |  2 --
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 81658afbd9..87236d0abd 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -760,7 +760,6 @@ _vtn_load_store_tail(struct vtn_builder *b, 
nir_intrinsic_op op, bool load,
}
 
if (op == nir_intrinsic_load_push_constant) {
-  vtn_assert(access_offset % 4 == 0);
 
   nir_intrinsic_set_base(instr, access_offset);
   nir_intrinsic_set_range(instr, access_size);
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index abf9098252..27611a21d0 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3887,16 +3887,22 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
   break;
 
case nir_intrinsic_load_uniform: {
-  /* Offsets are in bytes but they should always be multiples of 4 */
-  assert(instr->const_index[0] % 4 == 0);
+  /* Offsets are in bytes but they should always be multiple of 4
+   * or multiple of the size of the destination type. 2 for 16-bits
+   * types.
+   */
+  assert(instr->const_index[0] % 4 == 0 ||
+ instr->const_index[0] % type_sz(dest.type) == 0);
 
   fs_reg src(UNIFORM, instr->const_index[0] / 4, dest.type);
 
   nir_const_value *const_offset = nir_src_as_const_value(instr->src[0]);
   if (const_offset) {
- /* Offsets are in bytes but they should always be multiples of 4 */
- assert(const_offset->u32[0] % 4 == 0);
- src.offset = const_offset->u32[0];
+ assert(const_offset->u32[0] % 4 == 0 ||
+const_offset->u32[0] % type_sz(dest.type) == 0);
+ /* For 16-bit types we add the modulo of the const_index[0]
+  * offset to access elements that are not 32-bit aligned. */
+ src.offset = const_offset->u32[0] + instr->const_index[0] % 4;
 
  for (unsigned j = 0; j < instr->num_components; j++) {
 bld.MOV(offset(dest, bld, j), offset(src, bld, j));
diff --git a/src/intel/vulkan/anv_nir_lower_push_constants.c 
b/src/intel/vulkan/anv_nir_lower_push_constants.c
index b66552825b..ad60d0c824 100644
--- a/src/intel/vulkan/anv_nir_lower_push_constants.c
+++ b/src/intel/vulkan/anv_nir_lower_push_constants.c
@@ -41,8 +41,6 @@ anv_nir_lower_push_constants(nir_shader *shader)
 if (intrin->intrinsic != nir_intrinsic_load_push_constant)
continue;
 
-assert(intrin->const_index[0] % 4 == 0);
-
 /* We just turn them into uniform loads */
 intrin->intrinsic = nir_intrinsic_load_uniform;
  }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/7] i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout

2018-02-23 Thread Jose Maria Casanova Crespo
Restrict the use of untyped_surface_write with 16-bit pairs in
SSBOs to the cases where we can guarantee that the offset is a
multiple of 4.

Taking into account that VK_KHR_relaxed_block_layout is available
in ANV, we can only guarantee that when we have a constant offset
that is a multiple of 4. For non-constant offsets we will always use
byte_scattered_write.
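
The alignment check can be illustrated with a small standalone C sketch
(use_byte_scattered is a hypothetical helper, not the driver code):

#include <stdbool.h>
#include <stdio.h>

/* const_offset < 0 stands in for "offset is not a compile-time constant". */
static bool
use_byte_scattered(int const_offset, unsigned first_component,
                   unsigned type_size /* 2 for 16-bit */)
{
   return const_offset < 0 ||
          (const_offset + type_size * first_component) % 4 != 0;
}

int main(void)
{
   printf("%d\n", use_byte_scattered(-1, 0, 2)); /* non-constant: scattered   */
   printf("%d\n", use_byte_scattered(6, 0, 2));  /* offset 6: scattered       */
   printf("%d\n", use_byte_scattered(8, 0, 2));  /* offset 8: untyped is fine */
   return 0;
}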
---
 src/intel/compiler/brw_fs_nir.cpp | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 45b8e8b637..abf9098252 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -4135,6 +4135,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  unsigned num_components = ffs(~(writemask >> first_component)) - 1;
  fs_reg write_src = offset(val_reg, bld, first_component);
 
+ nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
+
  if (type_size > 4) {
 /* We can't write more than 2 64-bit components at once. Limit
  * the num_components of the write to what we can do and let the 
next
@@ -4150,14 +4152,19 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  * 32-bit-aligned we need to use byte-scattered writes because
  * untyped writes works with 32-bit components with 32-bit
  * alignment. byte_scattered_write messages only support one
- * 16-bit component at a time.
+ * 16-bit component at a time. As VK_KHR_relaxed_block_layout
+ * could be enabled, we can not guarantee that non-constant offsets
+ * are 32-bit aligned for 16-bit types; for example, an array of
+ * 16-bit vec3 with an array element stride of 6.
  *
- * For example, if there is a 3-components vector we submit one
- * untyped-write message of 32-bit (first two components), and one
- * byte-scattered write message (the last component).
+ * In the case of 32-bit aligned constant offsets if there is
+ * a 3-components vector we submit one untyped-write message
+ * of 32-bit (first two components), and one byte-scattered
+ * write message (the last component).
  */
 
-if (first_component % 2) {
+if ( !const_offset || ((const_offset->u32[0] +
+   type_size * first_component) % 4)) {
/* If we use a .yz writemask we also need to emit 2
 * byte-scattered write messages because of y-component not
 * being aligned to 32-bit.
@@ -4183,7 +4190,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  }
 
  fs_reg offset_reg;
- nir_const_value *const_offset = nir_src_as_const_value(instr->src[2]);
+
  if (const_offset) {
 offset_reg = brw_imm_ud(const_offset->u32[0] +
 type_size * first_component);
@@ -4222,7 +4229,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , 
nir_intrinsic_instr *instr
  } else {
 assert(num_components * type_size <= 16);
 assert((num_components * type_size) % 4 == 0);
-assert((first_component * type_size) % 4 == 0);
+assert(!const_offset ||
+   (const_offset->u32[0] + type_size * first_component) % 4 == 
0);
 unsigned num_slots = (num_components * type_size) / 4;
 
 emit_untyped_write(bld, surf_index, offset_reg,
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/7] spirv: Calculate properly 16-bit vector sizes

2018-02-23 Thread Jose Maria Casanova Crespo
The range of 16-bit push constant loads was being calculated
incorrectly, using 4 bytes per element instead of the 2 bytes it
should be.
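
A tiny standalone C sketch of the size computation, assuming 2 bytes per
element for 16-bit types (vec_size_bytes is hypothetical, not the vtn code):

#include <stdio.h>

static unsigned
vec_size_bytes(unsigned bit_size, unsigned components)
{
   return components * (bit_size / 8);
}

int main(void)
{
   printf("f16vec3: %u bytes\n", vec_size_bytes(16, 3)); /* 6, not 12 */
   printf("vec3:    %u bytes\n", vec_size_bytes(32, 3)); /* 12 */
   return 0;
}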
---
 src/compiler/spirv/vtn_variables.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 78adab3ed2..81658afbd9 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -687,6 +687,10 @@ vtn_type_block_size(struct vtn_builder *b, struct vtn_type 
*type)
 base_type == GLSL_TYPE_UINT64 ||
 base_type == GLSL_TYPE_INT64) {
  return glsl_get_vector_elements(type->type) * 8;
+  } else if (base_type == GLSL_TYPE_FLOAT16 ||
+ base_type == GLSL_TYPE_UINT16 ||
+ base_type == GLSL_TYPE_INT16){
+ return glsl_get_vector_elements(type->type) * 2;
   } else {
  return glsl_get_vector_elements(type->type) * 4;
   }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/7] anv/spirv: SSBO/UBO buffers needs padding size is not multiple of 32-bits

2018-02-23 Thread Jose Maria Casanova Crespo
The surfaces that back the GPU buffers have a boundary check that
treats accesses to partial dwords as out-of-bounds. This shows up in
basic 16-bit cases such as buffers of size 2 or 6, where the last two
bytes will always be read as 0 or have their writes ignored.

The introduction of 16-bit types implies that we need to align the size
to multiples of 4 bytes so that partial dwords can be read/written.
Adding an unconditional +2 to the size of buffers that are not a
multiple of 4 solves this issue for the general UBO and SSBO cases.

But when unsized arrays of 16-bit elements are used, it is not possible
to know whether the size was padded or not. To solve this, the SSBO
implementation calculates the needed surface size as suggested by Jason:

surface_size = 2 * align_u64(buffer_size, 4) - buffer_size

This lets us calculate SpvOpArrayLength backwards with a NIR expression
when the array stride is not a multiple of 4:

array_size = (surface_size & ~3) - (surface_size & 3)

This buffer requirement is also exposed when robust buffer access is
enabled, so these buffer sizes are recommended to be a multiple of 4.
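
As a sanity check of the arithmetic above, here is a minimal standalone C
sketch (align_u64 is redefined locally; this is not the anv code) showing
that the surface size round-trips back to the original buffer size:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t
align_u64(uint64_t v, uint64_t a)
{
   return (v + a - 1) & ~(a - 1);
}

int main(void)
{
   for (uint64_t buffer_size = 1; buffer_size <= 64; buffer_size++) {
      /* Size programmed into the surface so partial dwords stay in bounds. */
      uint64_t surface_size = 2 * align_u64(buffer_size, 4) - buffer_size;
      /* What the shader recomputes before the SpvOpArrayLength division. */
      uint64_t recovered = (surface_size & ~3ull) - (surface_size & 3ull);
      assert(recovered == buffer_size);
   }
   printf("surface size encoding round-trips\n");
   return 0;
}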
---

I have some doubts about whether vtn_variables.c is the best place for
this calculation of the real buffer size, as it seems quite HW dependent
and other drivers besides ANV may not need this kind of solution.

 src/compiler/spirv/vtn_variables.c| 14 ++
 src/intel/vulkan/anv_descriptor_set.c | 16 
 src/intel/vulkan/anv_device.c | 11 +++
 3 files changed, 41 insertions(+)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 9eb85c24e9..78adab3ed2 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -2113,6 +2113,20 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
   nir_builder_instr_insert(&b->nb, &instr->instr);
   nir_ssa_def *buf_size = &instr->dest.ssa;
 
+  /* Calculate the real length if padding was done to align the buffer
+   * to 32 bits. This can only happen if the stride is not a multiple
+   * of 4. Introduced to support unsized arrays of 16-bit types in anv.
+   */
+  if (stride % 4) {
+ buf_size = nir_isub(&b->nb,
+ nir_iand(&b->nb,
+  buf_size,
+  nir_imm_int(&b->nb, ~3)),
+ nir_iand(&b->nb,
+   buf_size,
+   nir_imm_int(&b->nb, 3)));
+  }
+
   /* array_length = max(buffer_size - offset, 0) / stride */
   nir_ssa_def *array_length =
  nir_idiv(&b->nb,
diff --git a/src/intel/vulkan/anv_descriptor_set.c 
b/src/intel/vulkan/anv_descriptor_set.c
index edb829601e..a97f2f37dc 100644
--- a/src/intel/vulkan/anv_descriptor_set.c
+++ b/src/intel/vulkan/anv_descriptor_set.c
@@ -704,6 +704,22 @@ anv_descriptor_set_write_buffer(struct anv_descriptor_set 
*set,
   bview->offset = buffer->offset + offset;
   bview->range = anv_buffer_get_range(buffer, offset, range);
 
+  /* Uniform and storage buffers need to have a surface size
+   * not less than the 32-bit aligned size of the buffer.
+   * To calculate the array length of unsized arrays in a
+   * StorageBuffer, the last 2 bits store the padding size
+   * added to the surface, so we can later recover the original
+   * buffer size and the number of elements.
+   *
+   *  surface_size = 2 * align_u64(buffer_size, 4) - buffer_size
+   *
+   *  array_size = (surface_size & ~3) - (surface_size & 3)
+   */
+  if (type == VK_DESCRIPTOR_TYPE_STORAGE_BUFFER)
+ bview->range = 2 * align_u64(bview->range, 4) - bview->range;
+  else if (type == VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER)
+ bview->range = align_u64(bview->range, 4);
+
   /* If we're writing descriptors through a push command, we need to
* allocate the surface state from the command buffer. Otherwise it will
* be allocated by the descriptor pool when calling
diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index a83b7a39f6..cedeed5621 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -2103,6 +2103,17 @@ void anv_GetBufferMemoryRequirements(
 
pMemoryRequirements->size = buffer->size;
pMemoryRequirements->alignment = alignment;
+
+   /* Storage and uniform buffers should have their size aligned to
+* 32 bits to avoid boundary checks when the last DWord is not complete.
+* This ensures that no internal padding is needed for
+* 16-bit types.
+*/
+   if (device->robust_buffer_access &&
+   (buffer->usage & VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT ||
+buffer->usage & VK_BUFFER_USAGE_STORAGE_BUFFER_BIT))
+  pMemoryRequirements->size = align_u64(buffer->size, 4);
+
pMemoryRequirements->memoryTypeBits = memory_types;
 }
 

[Mesa-dev] [PATCH 0/7] anv: VK_KHR_16bit_storage enabling SSBO/UBO/PushConstant

2018-02-23 Thread Jose Maria Casanova Crespo
This series includes several fixes to allow enabling the VK_KHR_16bit_storage
features in ANV that have already landed but are currently disabled.

The series includes the following fixes:

   * [1] Fixes issues in UBO/SSBO support when the buffer size is not a multiple
 of 4. The patch adds padding so the size always includes the last DWord
 completely. For unsized SSBO arrays there is some bit arithmetic to allow
 recalculating the original size without the padding, so the number of
 elements is computed correctly.
   * [2-3] Fixes the behaviour when VK_KHR_relaxed_block_layout is enabled, as
 we can not guarantee that the surface read/write offsets are a multiple of 4.
   * [4] Enables VK_KHR_16bit_storage for SSBO and UBO.
   * [5-7] Enables 16-bit push constants, removing/changing asserts that no
 longer apply to the 16-bit case, and fixes the calculation of the size to
 be read.

To catch these issues several new tests were developed; they will be submitted
upstream to VK-GL-CTS as soon as possible.

I will re-submit a rebased V5 series with the 16-bit Input/Output support that
is still pending review.

Cc: Jason Ekstrand <jason.ekstr...@intel.com>
Cc: Topi Pohjolainen <topi.pohjolai...@intel.com>

Jose Maria Casanova Crespo (7):
  anv/spirv: SSBO/UBO buffers needs padding size is not multiple of
32-bits
  i965/fs: Support 16-bit do_read_vector with
VK_KHR_relaxed_block_layout
  i965/fs: Support 16-bit store_ssbo with VK_KHR_relaxed_block_layout
  anv: Enable VK_KHR_16bit_storage for SSBO and UBO
  spirv: Calculate properly 16-bit vector sizes
  spirv/i965/anv: Relax push constant offset assertions being 32-bit
aligned
  anv: Enable VK_KHR_16bit_storage for PushConstant

 src/compiler/spirv/vtn_variables.c  | 19 -
 src/intel/compiler/brw_fs_nir.cpp   | 98 ++---
 src/intel/vulkan/anv_descriptor_set.c   | 16 
 src/intel/vulkan/anv_device.c   | 18 -
 src/intel/vulkan/anv_extensions.py  |  2 +-
 src/intel/vulkan/anv_nir_lower_push_constants.c |  2 -
 6 files changed, 120 insertions(+), 35 deletions(-)

-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 23/44] i965/fs: Enables 16-bit load_ubo with sampler (v2)

2017-12-05 Thread Jose Maria Casanova Crespo
From: Jason Ekstrand <jason.ekstr...@intel.com>

load_ubo uses 32-bit loads, as uniform surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle the two 16-bit components out of each
32-bit component.

Using the sampler avoids the use of the byte_scattered_read message,
which needs one message per component and is expected to be
slower.

v2: (Jason Ekstrand)
- Simplify component selection and unshuffling for different bitsizes
- Remove SKL optimization of reading only two 32-bit components when
  reading 16-bit types.
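
A small standalone C sketch of the indexing used above (not the fs backend
code): it prints which dword of the pulled vec4, and which 16-bit half of
it, a given constant offset selects:

#include <stdio.h>

int main(void)
{
   for (unsigned const_offset = 0; const_offset < 16; const_offset += 2) {
      unsigned dword = (const_offset & 0xf) / 4;  /* which dword of the vec4 */
      unsigned half  = (const_offset / 2) & 1;    /* low or high 16 bits     */
      printf("offset %2u -> dword %u, half %u\n", const_offset, dword, half);
   }
   return 0;
}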

Reviewed-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
---
 src/intel/compiler/brw_fs.cpp | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 91399c6c1d..93bb6b4673 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -191,14 +191,21 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
,
 vec4_result, surf_index, vec4_offset);
inst->size_written = 4 * vec4_result.component_size(inst->exec_size);
 
-   if (type_sz(dst.type) == 8) {
-  shuffle_32bit_load_result_to_64bit_data(
- bld, retype(vec4_result, dst.type), vec4_result, 2);
+   fs_reg dw = offset(vec4_result, bld, (const_offset & 0xf) / 4);
+   switch (type_sz(dst.type)) {
+   case 2:
+  shuffle_32bit_load_result_to_16bit_data(bld, dst, dw, 1);
+  bld.MOV(dst, subscript(dw, dst.type, (const_offset / 2) & 1));
+  break;
+   case 4:
+  bld.MOV(dst, retype(dw, dst.type));
+  break;
+   case 8:
+  shuffle_32bit_load_result_to_64bit_data(bld, dst, dw, 1);
+  break;
+   default:
+  unreachable("Unsupported bit_size");
}
-
-   vec4_result.type = dst.type;
-   bld.MOV(dst, offset(vec4_result, bld,
-   (const_offset & 0xf) / type_sz(vec4_result.type)));
 }
 
 /**
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 28/44] i965/fs: Use untyped_surface_read for 16-bit load_ssbo (v2)

2017-12-05 Thread Jose Maria Casanova Crespo
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit sized components. byte_scattered messages can only
operate on one component at a time, so we needed to emit as many
messages as components.

But for 16-bit vec2 and vec4, being multiples of 32 bits, we can use the
untyped_surface_read message to read pairs of 16-bit components using
only one message. Once each pair is read, it is unshuffled to return the
proper 16-bit components. The vec3 case is handled like vec4 but the 4th
component is ignored.

16-bit scalars are read using one byte_scattered_read message.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types instead of HF and F in shuffle to avoid rounding
errors (Jason Ekstrand)
Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
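
A scalar model of the unshuffle step, as a minimal standalone C sketch
(unshuffle_16_from_32 is hypothetical; the real code works on registers,
not arrays):

#include <stdint.h>
#include <stdio.h>

static void
unshuffle_16_from_32(const uint32_t *in, uint16_t *out, unsigned components)
{
   for (unsigned i = 0; i < components; i++)
      out[i] = (uint16_t)(in[i / 2] >> ((i & 1) * 16));
}

int main(void)
{
   /* Two dwords as read by untyped_read: each packs two 16-bit components. */
   const uint32_t dwords[2] = { 0x3c000000u, 0x00004000u };
   uint16_t comps[3];

   unshuffle_16_from_32(dwords, comps, 3); /* 16-bit vec3: 4th half ignored */
   printf("%04x %04x %04x\n", comps[0], comps[1], comps[2]);
   return 0;
}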

CC: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_nir.cpp | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index e11e75e6332..8deec082d59 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2303,16 +2303,31 @@ do_untyped_vector_read(const fs_builder ,
unsigned num_components)
 {
if (type_sz(dest.type) <= 2) {
-  fs_reg read_offset = bld.vgrf(BRW_REGISTER_TYPE_UD);
-  bld.MOV(read_offset, offset_reg);
-  for (unsigned i = 0; i < num_components; i++) {
- fs_reg read_reg =
-emit_byte_scattered_read(bld, surf_index, read_offset,
+  assert(dest.stride == 1);
+
+  if (num_components > 1) {
+ /* Pairs of 16-bit components can be read with untyped read, for 
16-bit
+  * vec3 4th component is ignored.
+  */
+ fs_reg read_result =
+emit_untyped_read(bld, surf_index, offset_reg,
+  1 /* dims */, DIV_ROUND_UP(num_components, 2),
+  BRW_PREDICATE_NONE);
+ shuffle_32bit_load_result_to_16bit_data(bld,
+   retype(dest, BRW_REGISTER_TYPE_W),
+   retype(read_result, BRW_REGISTER_TYPE_D),
+   num_components);
+  } else {
+ assert(num_components == 1);
+ /* scalar 16-bit are read using one byte_scattered_read message */
+ fs_reg read_result =
+emit_byte_scattered_read(bld, surf_index, offset_reg,
  1 /* dims */, 1,
  type_sz(dest.type) * 8 /* bit_size */,
  BRW_PREDICATE_NONE);
- bld.MOV(offset(dest, bld, i), subscript(read_reg, dest.type, 0));
- bld.ADD(read_offset, read_offset, brw_imm_ud(type_sz(dest.type)));
+ read_result.type = dest.type;
+ read_result.stride = 2;
+ bld.MOV(dest, read_result);
   }
} else if (type_sz(dest.type) == 4) {
   fs_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
-- 
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 41/44] i965/fs: Use half_precision data_format on 16-bit fb writes

2017-11-29 Thread Jose Maria Casanova Crespo
From: Alejandro Piñeiro 

---
 src/intel/compiler/brw_fs_visitor.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/compiler/brw_fs_visitor.cpp 
b/src/intel/compiler/brw_fs_visitor.cpp
index 481d9c51e7..01e75ff7fc 100644
--- a/src/intel/compiler/brw_fs_visitor.cpp
+++ b/src/intel/compiler/brw_fs_visitor.cpp
@@ -439,6 +439,12 @@ fs_visitor::emit_fb_writes()
   inst = emit_single_fb_write(abld, this->outputs[target],
   this->dual_src_output, src0_alpha, 4);
   inst->target = target;
+
+  /* Enables half-precision data_format for 16-bit outputs on
+   * Render Target Write Messages. Supported since cherry-view and
+   * Skylake.
+   */
+  inst->data_format = type_sz(this->outputs[target].type) == 2;
}
 
prog_data->dual_src_blend = (this->dual_src_output.file != BAD_FILE);
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 44/44] anv: Enable VK_KHR_16bit_storage for push_constant

2017-11-29 Thread Jose Maria Casanova Crespo
Enables storagePushConstant16 feature of VK_KHR_16bit_storage
for Gen8+.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 26c0ace1ca..5b6032d794 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -733,7 +733,7 @@ void anv_GetPhysicalDeviceFeatures2KHR(
 
  features->storageBuffer16BitAccess = pdevice->info.gen >= 8;
  features->uniformAndStorageBuffer16BitAccess = pdevice->info.gen >= 8;
- features->storagePushConstant16 = false;
+ features->storagePushConstant16 = pdevice->info.gen >= 8;
  features->storageInputOutput16 = pdevice->info.gen >= 8;
  break;
   }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 42/44] i965/fs: Enable 16-bit render target write on SKL and CHV

2017-11-29 Thread Jose Maria Casanova Crespo
Once the infrastructure to support Render Target Messages with 16-bit
payloads is available, this patch enables it on SKL and CHV platforms.

Enabling it allows 16-bit payloads that use half of the registers on
SIMD16 and avoids the spurious conversion from 16-bit to 32-bit needed
on BDW, just to be converted back to 16-bit.

In the case of CHV there is no support for UINT, so in that case the
half-precision data format is not enabled and the 32-bit payload
fallback is used.

From PRM CHV, vol 07, section "Pixel Data Port" page 260:

"Half Precision Render Target Write messages do not support UNIT
formats." where UNIT is a typo for UINT.

v2: Removed use of stride = 2 on sources (Jason Ekstrand)

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Eduardo Lima <el...@igalia.com>
---
 src/intel/compiler/brw_fs_nir.cpp | 46 +++
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 04d1e3bbf7..f4a1dd644b 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -54,19 +54,24 @@ fs_visitor::nir_setup_outputs()
   return;
 
if (stage == MESA_SHADER_FRAGMENT) {
-  /*
+  /* On HW that doesn't support half-precision render-target-write
+   * messages (e.g, some gen8 HW like Broadwell), we need a workaround
+   * to support 16-bit outputs from pixel shaders.
+   *
* The following code uses the outputs map to save the variable's
* original output type, so later we can retrieve it and retype
* the output accordingly while emitting the FS 16-bit outputs.
*/
-  nir_foreach_variable(var, >outputs) {
- const enum glsl_base_type base_type =
-glsl_get_base_type(var->type->without_array());
-
- if (glsl_base_type_is_16bit(base_type)) {
-outputs[var->data.driver_location] =
-   retype(outputs[var->data.driver_location],
-  brw_type_for_base_type(var->type));
+  if (devinfo->gen == 8) {
+ nir_foreach_variable(var, >outputs) {
+const enum glsl_base_type base_type =
+   glsl_get_base_type(var->type->without_array());
+
+if (glsl_base_type_is_16bit(base_type)) {
+   outputs[var->data.driver_location] =
+  retype(outputs[var->data.driver_location],
+ brw_type_for_base_type(var->type));
+}
  }
   }
   return;
@@ -3341,14 +3346,27 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
   const unsigned location = nir_intrinsic_base(instr) +
  SET_FIELD(const_offset->u32[0], BRW_NIR_FRAG_OUTPUT_LOCATION);
 
+  /* This flag discriminates HW where we have support for half-precision
+   * render target write messages (aka, the data-format bit), so 16-bit
+   * render target payloads can be used. It is available since skylake
+   * and cherryview. In the case of cherryview there is no support for
+   * UINT formats.
+   */
+  bool enable_hp_rtw = is_16bit &&
+ (devinfo->gen >= 9 || (devinfo->is_cherryview &&
+outputs[location].type != 
BRW_REGISTER_TYPE_UW));
+
   if (is_16bit) {
- /* The outputs[location] should already have the original output type
-  * stored from nir_setup_outputs.
+ /* outputs[location] should already have the original output type
+  * stored from nir_setup_outputs, in case the HW doesn't support
+  * half-precision RTW messages.
+  * If HP RTW is enabled we just use HF to copy 16-bit values.
   */
- src = retype(src, outputs[location].type);
+ src = retype(src, enable_hp_rtw ?
+  BRW_REGISTER_TYPE_HF : outputs[location].type);
   }
 
-  fs_reg new_dest = retype(alloc_frag_output(this, location, false),
+  fs_reg new_dest = retype(alloc_frag_output(this, location, 
enable_hp_rtw),
src.type);
 
   /* This is a workaround to support 16-bits outputs on HW that doesn't
@@ -3358,7 +3376,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
* render target with a 16-bit surface format will force the correct
* conversion of the 32-bit output values to 16-bit.
*/
-  if (is_16bit) {
+  if (is_16bit && !enable_hp_rtw) {
  new_dest.type = brw_reg_type_from_bit_size(32, src.type);
   }
   for (unsigned j = 0; j < instr->num_components; j++)
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 43/44] i965/fs: Support push constants of 16-bit types

2017-11-29 Thread Jose Maria Casanova Crespo
We enable the use of 16-bit values in push constants by
modifying the assign_constant_locations function to work
with 16-bit types.

The Vulkan API accesses buffers using multiples of 4 bytes for
offsets and sizes. The current accounting of uniforms based on 4-byte
slots will work for 16-bit values if they are allowed to use 32-bit
slots. For that, we replace the division by 4 with a DIV_ROUND_UP, so
2-byte elements will use 1 slot instead of 0.

We align the 16-bit locations after assigning the 32-bit
ones.
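
A tiny standalone C sketch of the slot accounting change (not the i965
code), comparing plain division by 4 with DIV_ROUND_UP:

#include <stdio.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

int main(void)
{
   unsigned type_sizes[] = { 2, 4, 8 };

   for (unsigned i = 0; i < 3; i++) {
      unsigned sz = type_sizes[i];
      printf("type_sz %u: plain division %u slot(s), DIV_ROUND_UP %u slot(s)\n",
             sz, sz / 4, DIV_ROUND_UP(sz, 4));
   }
   return 0;
}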

v2: Minor changes after rebase against recent master
(José María Casanova)

v3: Rebase needs compiler->supports_pull_constants at
set_push_pull_constant_loc call. (José María Casanova)
---
 src/intel/compiler/brw_fs.cpp | 31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index b1e548fd93..650ddff09e 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -1948,8 +1948,9 @@ set_push_pull_constant_loc(unsigned uniform, int 
*chunk_start,
if (!contiguous) {
   /* If bitsize doesn't match the target one, skip it */
   if (*max_chunk_bitsize != target_bitsize) {
- /* FIXME: right now we only support 32 and 64-bit accesses */
- assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8);
+ assert(*max_chunk_bitsize == 4 ||
+*max_chunk_bitsize == 8 ||
+*max_chunk_bitsize == 2);
  *max_chunk_bitsize = 0;
  *chunk_start = -1;
  return;
@@ -2038,8 +2039,9 @@ fs_visitor::assign_constant_locations()
  int constant_nr = inst->src[i].nr + inst->src[i].offset / 4;
 
  if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) {
-assert(inst->src[2].ud % 4 == 0);
-unsigned last = constant_nr + (inst->src[2].ud / 4) - 1;
+assert(type_sz(inst->src[i].type) == 2 ?
+   (inst->src[2].ud % 2 == 0) : (inst->src[2].ud % 4 == 0));
+unsigned last = constant_nr + DIV_ROUND_UP(inst->src[2].ud, 4) - 1;
 assert(last < uniforms);
 
 for (unsigned j = constant_nr; j < last; j++) {
@@ -2051,8 +2053,8 @@ fs_visitor::assign_constant_locations()
 bitsize_access[last] = MAX2(bitsize_access[last], 
type_sz(inst->src[i].type));
  } else {
 if (constant_nr >= 0 && constant_nr < (int) uniforms) {
-   int regs_read = inst->components_read(i) *
-  type_sz(inst->src[i].type) / 4;
+   int regs_read = DIV_ROUND_UP(inst->components_read(i) *
+type_sz(inst->src[i].type), 4);
assert(regs_read <= 2);
if (regs_read == 2)
   contiguous[constant_nr] = true;
@@ -2116,7 +2118,7 @@ fs_visitor::assign_constant_locations()
 
}
 
-   /* Then push the rest of uniforms */
+   /* Then push the 32-bit uniforms */
const unsigned uniform_32_bit_size = type_sz(BRW_REGISTER_TYPE_F);
for (unsigned u = 0; u < uniforms; u++) {
   if (!is_live[u])
@@ -2136,6 +2138,21 @@ fs_visitor::assign_constant_locations()
  stage_prog_data);
}
 
+   const unsigned uniform_16_bit_size = type_sz(BRW_REGISTER_TYPE_HF);
+   for (unsigned u = 0; u < uniforms; u++) {
+  if (!is_live[u])
+ continue;
+
+  set_push_pull_constant_loc(u, _start, _chunk_bitsize,
+ contiguous[u], bitsize_access[u],
+ uniform_16_bit_size,
+ push_constant_loc, pull_constant_loc,
+ _push_constants, _pull_constants,
+ max_push_components, max_chunk_size,
+ compiler->supports_pull_constants,
+ stage_prog_data);
+   }
+
/* Add the CS local thread ID uniform at the end of the push constants */
if (subgroup_id_index >= 0)
   push_constant_loc[subgroup_id_index] = num_push_constants++;
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 39/44] i965/fs: Mark 16-bit outputs on FS store_output

2017-11-29 Thread Jose Maria Casanova Crespo
On SKL the render target write operations allow 16-bit format
output. This patch marks output registers as 16-bit using
BRW_REGISTER_TYPE_HF on the proper output targets.

This allows recognizing when the 16-bit data_format should be
enabled on render_target_write messages.

Signed-off-by: Jose Maria Casanova Crespo <jmcasan...@igalia.com>
Signed-off-by: Eduardo Lima <el...@igalia.com>
---
 src/intel/compiler/brw_fs_nir.cpp | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index fb138de76a..04d1e3bbf7 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3238,13 +3238,16 @@ emit_coherent_fb_read(const fs_builder , const 
fs_reg , unsigned target)
 }
 
 static fs_reg
-alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n)
+alloc_temporary(const fs_builder , unsigned size, fs_reg *regs, unsigned n,
+bool is_16bit)
 {
if (n && regs[0].file != BAD_FILE) {
   return regs[0];
 
} else {
-  const fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, size);
+  const brw_reg_type type =
+ is_16bit ? BRW_REGISTER_TYPE_HF : BRW_REGISTER_TYPE_F;
+  const fs_reg tmp = bld.vgrf(type, size);
 
   for (unsigned i = 0; i < n; i++)
  regs[i] = tmp;
@@ -3254,7 +3257,7 @@ alloc_temporary(const fs_builder , unsigned size, 
fs_reg *regs, unsigned n)
 }
 
 static fs_reg
-alloc_frag_output(fs_visitor *v, unsigned location)
+alloc_frag_output(fs_visitor *v, unsigned location, bool is_16bit)
 {
assert(v->stage == MESA_SHADER_FRAGMENT);
const brw_wm_prog_key *const key =
@@ -3263,26 +3266,26 @@ alloc_frag_output(fs_visitor *v, unsigned location)
const unsigned i = GET_FIELD(location, BRW_NIR_FRAG_OUTPUT_INDEX);
 
if (i > 0 || (key->force_dual_color_blend && l == FRAG_RESULT_DATA1))
-  return alloc_temporary(v->bld, 4, >dual_src_output, 1);
+  return alloc_temporary(v->bld, 4, >dual_src_output, 1, is_16bit);
 
else if (l == FRAG_RESULT_COLOR)
   return alloc_temporary(v->bld, 4, v->outputs,
- MAX2(key->nr_color_regions, 1));
+ MAX2(key->nr_color_regions, 1),
+ is_16bit);
 
else if (l == FRAG_RESULT_DEPTH)
-  return alloc_temporary(v->bld, 1, >frag_depth, 1);
+  return alloc_temporary(v->bld, 1, >frag_depth, 1, is_16bit);
 
else if (l == FRAG_RESULT_STENCIL)
-  return alloc_temporary(v->bld, 1, >frag_stencil, 1);
+  return alloc_temporary(v->bld, 1, >frag_stencil, 1, is_16bit);
 
else if (l == FRAG_RESULT_SAMPLE_MASK)
-  return alloc_temporary(v->bld, 1, >sample_mask, 1);
+  return alloc_temporary(v->bld, 1, >sample_mask, 1, is_16bit);
 
else if (l >= FRAG_RESULT_DATA0 &&
 l < FRAG_RESULT_DATA0 + BRW_MAX_DRAW_BUFFERS)
   return alloc_temporary(v->bld, 4,
- >outputs[l - FRAG_RESULT_DATA0], 1);
-
+ >outputs[l - FRAG_RESULT_DATA0], 1, is_16bit);
else
   unreachable("Invalid location");
 }
@@ -3345,7 +3348,7 @@ fs_visitor::nir_emit_fs_intrinsic(const fs_builder ,
  src = retype(src, outputs[location].type);
   }
 
-  fs_reg new_dest = retype(alloc_frag_output(this, location),
+  fs_reg new_dest = retype(alloc_frag_output(this, location, false),
src.type);
 
   /* This is a workaround to support 16-bits outputs on HW that doesn't
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 40/44] i965/fs: 16-bit source payloads always use 1 register

2017-11-29 Thread Jose Maria Casanova Crespo
Render Target Message payloads for 16-bit values fit in only one
register.

From Intel PRM vol07, page 249 "Render Target Messages" / "Message
Data Payloads"

   "The half precision Render Target Write messages have data payloads
that can pack a full SIMD16 payload into 1 register instead of
two. The half-precision packed format is used for RGBA and Source
0 Alpha, but Source Depth data payload is always supplied in full
precision."

So when 16-bit data is uploaded to the payload, it will use 1 register
regardless of whether it is SIMD16 or SIMD8.

This change implies that we need to replicate the approach in the
copy propagation of the load_payload operations.
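
A minimal standalone C sketch of the register accounting implied by the
quote above (payload_regs is hypothetical, not the driver code):

#include <stdio.h>

#define REG_SIZE 32 /* bytes per GRF */

static unsigned
payload_regs(unsigned exec_size, unsigned type_sz)
{
   if (type_sz == 2)
      return 1;                            /* packed half precision */
   return exec_size * type_sz / REG_SIZE;  /* e.g. SIMD16 x 4 bytes = 2 regs */
}

int main(void)
{
   printf("SIMD16 float: %u reg(s)\n", payload_regs(16, 4)); /* 2 */
   printf("SIMD16 half:  %u reg(s)\n", payload_regs(16, 2)); /* 1 */
   printf("SIMD8  half:  %u reg(s)\n", payload_regs(8, 2));  /* 1 */
   return 0;
}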

v2: By default 16-bit sources should be packed (Jason Ekstrand)
Include changes in in copy_propagation of load_payload (Chema Casanova)
---
 src/intel/compiler/brw_fs.cpp  | 5 -
 src/intel/compiler/brw_fs_copy_propagation.cpp | 4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index b695508823..b1e548fd93 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3485,7 +3485,10 @@ fs_visitor::lower_load_payload()
   for (uint8_t i = inst->header_size; i < inst->sources; i++) {
  if (inst->src[i].file != BAD_FILE)
 ibld.MOV(retype(dst, inst->src[i].type), inst->src[i]);
- dst = offset(dst, ibld, 1);
+ if (type_sz(inst->src[i].type) == 2)
+dst = byte_offset(dst, REG_SIZE);
+ else
+dst = offset(dst, ibld, 1);
   }
 
   inst->remove(block);
diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp 
b/src/intel/compiler/brw_fs_copy_propagation.cpp
index d4d01d783c..470eaeec4f 100644
--- a/src/intel/compiler/brw_fs_copy_propagation.cpp
+++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
@@ -800,7 +800,7 @@ fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, 
bblock_t *block,
  int offset = 0;
  for (int i = 0; i < inst->sources; i++) {
 int effective_width = i < inst->header_size ? 8 : inst->exec_size;
-assert(effective_width * type_sz(inst->src[i].type) % REG_SIZE == 
0);
+assert(effective_width * MAX2(4, type_sz(inst->src[i].type)) % 
REG_SIZE == 0);
 const unsigned size_written = effective_width *
   type_sz(inst->src[i].type);
 if (inst->src[i].file == VGRF) {
@@ -816,7 +816,7 @@ fs_visitor::opt_copy_propagation_local(void *copy_prop_ctx, 
bblock_t *block,
   ralloc_free(entry);
}
 }
-offset += size_written;
+offset += type_sz(inst->src[i].type) == 2 ? REG_SIZE : 
size_written;
  }
   }
}
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 38/44] i965/disasm: Show half-precision data_format on rt_writes

2017-11-29 Thread Jose Maria Casanova Crespo
---
 src/intel/compiler/brw_disasm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 1a94ed3954..c752e15331 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1676,6 +1676,10 @@ brw_disassemble_inst(FILE *file, const struct 
gen_device_info *devinfo,
   brw_inst_rt_message_type(devinfo, inst), );
if (devinfo->gen >= 6 && brw_inst_rt_slot_group(devinfo, inst))
   string(file, " Hi");
+   if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
+   brw_inst_data_format(devinfo, inst)) {
+  string(file, " HP");
+   }
if (brw_inst_rt_last(devinfo, inst))
   string(file, " LastRT");
if (devinfo->gen < 7 && brw_inst_dp_write_commit(devinfo, inst))
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 33/44] i965/fs: Unpack 16-bit from 32-bit components in VS load_input

2017-11-29 Thread Jose Maria Casanova Crespo
The VS load input for 16-bit values receives pairs of 16-bit values
packed in 32-bit values, because of the adjusted format used in:

 anv/pipeline: Use 32-bit surface formats for 16-bit formats

v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: Fix coding style and typo (Topi Pohjolainen)
Simplify unshuffle 32-bit to 16-bit using helper function
(Jason Ekstrand)
---
 src/intel/compiler/brw_fs_nir.cpp | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 57e79853ef..0f1a428242 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -2430,8 +2430,26 @@ fs_visitor::nir_emit_vs_intrinsic(const fs_builder ,
   assert(const_offset && "Indirect input loads not allowed");
   src = offset(src, bld, const_offset->u32[0]);
 
-  for (unsigned j = 0; j < num_components; j++) {
- bld.MOV(offset(dest, bld, j), offset(src, bld, j + first_component));
+  if (type_sz(type) == 2) {
+ /* The VS load input for 16-bit values receives pairs of 16-bit
+  * values packed in 32-bit values. This is an example on SIMD8:
+  *
+  * xy xy xy xy xy xy xy xy
+  * zw zw zw zw zw zw zw zw
+  *
+  * We need to format it to something like:
+  *
+  * xx xx xx xx yy yy yy yy
+  * zz zz zz zz ww ww ww ww
+  */
+
+ shuffle_32bit_load_result_to_16bit_data(bld,
+ dest,
+ retype(src, 
BRW_REGISTER_TYPE_F),
+ num_components);
+  } else {
+ for (unsigned j = 0; j < num_components; j++)
+bld.MOV(offset(dest, bld, j), offset(src, bld, j + 
first_component));
   }
 
   if (type == BRW_REGISTER_TYPE_DF) {
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 30/44] i965/compiler: includes 16-bit vertex input

2017-11-29 Thread Jose Maria Casanova Crespo
Include the info about 16-bit vertex inputs coming from NIR in the brw VS
prog data, as we already do for 64-bit vertex inputs.

v2: Renamed half_inputs_read to inputs_read_16bit (Jason Ekstrand)
---
 src/intel/compiler/brw_compiler.h | 1 +
 src/intel/compiler/brw_vec4.cpp   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/intel/compiler/brw_compiler.h 
b/src/intel/compiler/brw_compiler.h
index 28aed83324..191dc8bd1d 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -961,6 +961,7 @@ struct brw_vs_prog_data {
 
GLbitfield64 inputs_read;
GLbitfield64 double_inputs_read;
+   GLbitfield64 inputs_read_16bit;
 
unsigned nr_attribute_slots;
 
diff --git a/src/intel/compiler/brw_vec4.cpp b/src/intel/compiler/brw_vec4.cpp
index 73c40ad600..d32b1e3302 100644
--- a/src/intel/compiler/brw_vec4.cpp
+++ b/src/intel/compiler/brw_vec4.cpp
@@ -2771,6 +2771,7 @@ brw_compile_vs(const struct brw_compiler *compiler, void 
*log_data,
 
prog_data->inputs_read = shader->info.inputs_read;
prog_data->double_inputs_read = shader->info.double_inputs_read;
+   prog_data->inputs_read_16bit = shader->info.inputs_read_16bit;
 
brw_nir_lower_vs_inputs(shader, use_legacy_snorm_formula,
key->gl_attrib_wa_flags);
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

