[Mesa-dev] [PATCH v5 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-07-29 Thread Jose Maria Casanova Crespo
These new methods return for a instruction register source/destination
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation of this functions if to know the read/written bytes
of the instructions to improve the liveness analysis for partial
read/writes.

We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
parameter they have a different read pattern.

v2: (Francisco Jerez)
- Split original register_byte_use_pattern into one read and other
  write.
- Check for send like instructions using this->mlen != 0
- Pass functions src number and offset.
- Use periodic_mask function with code written by Francisco Jerez
  to simplify pattern generation.
- Avoid breaking silently if source straddles multiple GRFs.

v3: (Francisco Jerez)
- A SEND could be this->mlen != 0 or this->is_send_from_grf
- We only assume that a periodic mask with offset could be applied
  to reg_offset == 0.
- We can assure that for MOVs operations for any offset (Chema)

v4: (Francisco Jerez)
- We return 0 mask for reg_offset out of the region definition.
- We return periodic masks when access is in bounds for ALU opcodes.

v5: (Francisco Jerez)
- Mask can only be periodic when byte_offset < type_size * stride
  when reg_offset > 0.

Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs.cpp  | 121 +
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 123 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..d790b080e53 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include 
 
 using namespace brw;
 
@@ -687,6 +688,126 @@ fs_inst::is_partial_write() const
this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x
+ *  periodic_mask(8, 4, 2, 2)  would return 0x
+ *  periodic_masc(8, 2, 2, 16) would return 0x
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+  m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize, the stride
+ * of the destination register and the internal register byte offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are written for operations that don't write a full register. So we
+ * we can identify in live range variable analysis if a partial write has
+ * completelly defined the data used by a partial read.
+ *
+ * reg_offset identifies full registers starting at dst.reg with
+ * reg_offset == 0.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+
+   /* Instruction doesn't write out of bounds */
+   if (reg_offset >= regs_written(this))
+  return 0;
+
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+  return 0;
+
+   /* We assume that send destinations are completelly defined */
+   if (this->is_send_from_grf() || this->mlen != 0)
+  return ~0u;
+
+   /* The byte pattern is calculated using a periodic mask for ALU
+* operations and reg_offset in bounds.
+*/
+   unsigned step = this->dst.stride * type_sz(this->dst.type);
+   unsigned byte_offset = this->dst.offset % REG_SIZE;
+   if (reg_offset == 0 || byte_offset < step) {
+  return periodic_mask(this->exec_size, step, type_sz(this->dst.type),
+   byte_offset);
+   }
+
+   /* We don't know what was written so return 0 as safest choice */
+   return 0;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and the internal register byte offset.
+ *
+ * The objective of this function is

Re: [Mesa-dev] [PATCH v5 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst

2018-08-21 Thread Francisco Jerez
Jose Maria Casanova Crespo  writes:

> These new methods return for a instruction register source/destination
> the read/write byte pattern of the 32-byte GRF as an unsigned int.
>
> The returned pattern takes into account the exec_size of the instruction,
> the type bitsize, the register stride and a relative offset inside the
> register.
>
> The motivation of this functions if to know the read/written bytes
> of the instructions to improve the liveness analysis for partial
> read/writes.
>
> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL
> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize
> parameter they have a different read pattern.
>
> v2: (Francisco Jerez)
> - Split original register_byte_use_pattern into one read and other
>   write.
> - Check for send like instructions using this->mlen != 0
> - Pass functions src number and offset.
> - Use periodic_mask function with code written by Francisco Jerez
>   to simplify pattern generation.
> - Avoid breaking silently if source straddles multiple GRFs.
>
> v3: (Francisco Jerez)
> - A SEND could be this->mlen != 0 or this->is_send_from_grf
> - We only assume that a periodic mask with offset could be applied
>   to reg_offset == 0.
> - We can assure that for MOVs operations for any offset (Chema)
>
> v4: (Francisco Jerez)
> - We return 0 mask for reg_offset out of the region definition.
> - We return periodic masks when access is in bounds for ALU opcodes.
>
> v5: (Francisco Jerez)
> - Mask can only be periodic when byte_offset < type_size * stride
>   when reg_offset > 0.
>
> Cc: Francisco Jerez 
> ---
>  src/intel/compiler/brw_fs.cpp  | 121 +
>  src/intel/compiler/brw_ir_fs.h |   2 +
>  2 files changed, 123 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 7ddbd285fe2..d790b080e53 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -39,6 +39,7 @@
>  #include "compiler/glsl_types.h"
>  #include "compiler/nir/nir_builder.h"
>  #include "program/prog_parameter.h"
> +#include 
>

I don't think you need to include this.

>  using namespace brw;
>  
> @@ -687,6 +688,126 @@ fs_inst::is_partial_write() const
> this->dst.offset % REG_SIZE != 0);
>  }
>  
> +/**
> + * Returns a periodic mask that is repeated "count" times with a "step"
> + * size and consecutive "bits" finally shifted "offset" bits to the left.
> + *

With 'size' you probably mean periodicity.  And with 'consecutive
"bits"' you probably mean there are "bits" consecutive ones per period.

> + * This helper is used to calculate the representations of byte read/write
> + * register patterns
> + *
> + * Example: periodic_mask(8, 4, 2, 0)  would return 0x
> + *  periodic_mask(8, 4, 2, 2)  would return 0x
> + *  periodic_masc(8, 2, 2, 16) would return 0x
> + */
> +static inline uint32_t
> +periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
> +{
> +   uint32_t m = (count ? (1 << bits) - 1 : 0);
> +   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
> +
> +   for (unsigned shift = step; shift < max; shift *= 2)
> +  m |= m << shift;
> +
> +   return m << offset;

I don't think there's any benefit from passing the offset as argument of
this function at this point.  The comma is less self-documenting than a
left-shift operator applied on the result.

> +}
> +
> +/**
> + * Returns a 32-bit uint whose bits represent if the associated register byte
> + * has been written by the instruction. The returned pattern takes into
> + * account the exec_size of the instruction, the type bitsize, the stride
> + * of the destination register and the internal register byte offset.
> + *
> + * The objective of this function is to identify which parts of the register
> + * are written for operations that don't write a full register. So we
> + * we can identify in live range variable analysis if a partial write has
> + * completelly defined the data used by a partial read.
> + *
> + * reg_offset identifies full registers starting at dst.reg with
> + * reg_offset == 0.

Last sentence seems misleading because the dst.offset also has an
influence on which GRF is referred to by a given reg_offset.  Maybe
write something along the lines of "reg_offset selects the GRF the mask
is calculated for, relative to the first GRF written by the
instruction".

> + */
> +unsigned
> +fs_inst::dst_write_pattern(unsigned reg_offset) const
> +{
> +   assert(this->dst.file == VGRF);
> +
> +   /* Instruction doesn't write out of bounds */
> +   if (reg_offset >= regs_written(this))
> +  return 0;
> +
> +   /* We don't know what is written so we return the worst case */
> +   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
> +  return 0;
> +
> +   /* We assume that send destinations are completelly defined */

complete