[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

--- Comment #2 from Michel Dänzer  ---
Created attachment 127204
  --> https://bugs.freedesktop.org/attachment.cgi?id=127204&action=edit
Work with a local reference of so->fence

Does this patch help?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows it

2016-10-11 Thread Nicolai Hähnle

Reviewed-by: Nicolai Hähnle 

On 10.10.2016 13:25, Marek Olšák wrote:

From: Marek Olšák 

The kernel patch has been sent to amd-gfx.
---
 src/gallium/drivers/radeonsi/si_compute.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_compute.c b/src/gallium/drivers/radeonsi/si_compute.c
index 1d1df2f..8a803c9 100644
--- a/src/gallium/drivers/radeonsi/si_compute.c
+++ b/src/gallium/drivers/radeonsi/si_compute.c
@@ -244,21 +244,26 @@ static void si_initialize_compute(struct si_context *sctx)
}

/* Set the pointer to border colors. */
bc_va = sctx->border_color_buffer->gpu_address;

if (sctx->b.chip_class >= CIK) {
radeon_set_uconfig_reg_seq(cs, R_030E00_TA_CS_BC_BASE_ADDR, 2);
radeon_emit(cs, bc_va >> 8);  /* R_030E00_TA_CS_BC_BASE_ADDR */
radeon_emit(cs, bc_va >> 40); /* R_030E04_TA_CS_BC_BASE_ADDR_HI */
} else {
-   radeon_set_config_reg(cs, R_00950C_TA_CS_BC_BASE_ADDR, bc_va >> 8);
+   if (sctx->screen->b.info.drm_major == 3 ||
+   (sctx->screen->b.info.drm_major == 2 &&
+sctx->screen->b.info.drm_minor >= 48)) {
+   radeon_set_config_reg(cs, R_00950C_TA_CS_BC_BASE_ADDR,
+ bc_va >> 8);
+   }
}

sctx->cs_shader_state.emitted_program = NULL;
sctx->cs_shader_state.initialized = true;
 }

 static bool si_setup_compute_scratch_buffer(struct si_context *sctx,
 struct si_shader *shader,
 struct si_shader_config *config)
 {
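
To make the version gate in the patch above easier to read in isolation, here is a minimal standalone sketch in C. The struct drm_version_info and the function kernel_allows_ta_cs_bc_base_addr() are hypothetical names introduced only for this illustration; the real code reads drm_major and drm_minor from sctx->screen->b.info. Only the condition itself is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two DRM version fields the patch reads;
 * this is not the real Mesa structure. */
struct drm_version_info {
    unsigned drm_major;
    unsigned drm_minor;
};

/* Same condition as in the patch: the SI-only register write is emitted
 * when the DRM major version is 3, or when it is 2 and the minor version
 * is at least 48. */
bool kernel_allows_ta_cs_bc_base_addr(const struct drm_version_info *info)
{
    assert(info != NULL);
    return info->drm_major == 3 ||
           (info->drm_major == 2 && info->drm_minor >= 48);
}
```

With this predicate, DRM 2.47 skips the register write, while 2.48 and 3.x emit it.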




[Mesa-dev] [PATCH v2 014/103] i965/disasm: align16 DF source regions have a width of 2

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
b/src/mesa/drivers/dri/i965/brw_disasm.c
index 5e51be7..1d2a4d2 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -942,7 +942,10 @@ src_da16(FILE *file,
   format(file, ".%d", 16 / reg_type_size[_reg_type]);
string(file, "<");
err |= control(file, "vert stride", vert_stride, _vert_stride, NULL);
-   string(file, ",4,1>");
+   if (reg_type_size[_reg_type] == 8)
+  string(file, ",2,1>");
+   else
+  string(file, ",4,1>");
err |= src_swizzle(file, BRW_SWIZZLE4(swz_x, swz_y, swz_z, swz_w));
err |= control(file, "src da16 reg type", reg_encoding, _reg_type, NULL);
return err;
-- 
2.7.4



[Mesa-dev] [PATCH v2 000/103] i965 Haswell ARB_gpu_shader_fp64 / OpenGL 4.0

2016-10-11 Thread Iago Toral Quiroga
It's been some time since we sent the first version of the patches, so here is
a v2, which adds:

1. Feedback from Curro to v1. I think the only thing missing is the suggestion
to change the semantics of the offset() helper in vec4 to match those in the
scalar backend. I sent this as a separate series [1] that is still awaiting
review. Once that is good to land we should adapt this series accordingly.

2. Adaptations to the sub-register offsets work done by Curro in master.

3. Some rudimentary support for 64-bit spilling. This is quite limited at the
moment, since it skips spilling of fp64 data in a number of cases where it
is not safe to do it at present. I guess we can look for ways to improve this
going forward, but I would rather do that after we land the bulk of fp64, since the
series is already quite big as it is.

4. Avoid scalarizing a number of swizzle combinations that we can support
natively.

5. Many other small clean-ups and fixes.

The series is available for testing in the 'i965-fp64-gen7-scalar-vec4-rc2'
branch of our github repository [2].

This series implements the bulk of the fp64 align16 backend support and creates
the infrastructure to implement vertex attrib 64bit as well, so once this lands
in master we plan to send additional series that add VA64 for Haswell, and then
fp64 and VA64 for IvyBridge.

[1] https://lists.freedesktop.org/archives/mesa-dev/2016-October/130459.html
[2] https://github.com/Igalia/mesa/tree/i965-fp64-gen7-scalar-vec4-rc2

Connor Abbott (6):
  i965/vec4/nir: simplify glsl_type_for_nir_alu_type()
  i965/vec4/nir: allocate two registers for dvec3/dvec4
  i965/vec4/nir: set the right type for 64-bit registers
  i965/vec4: add support for printing DF immediates
  i965: add brw_vecn_grf()
  i965/vec4: don't constant propagate 64-bit immediates

Iago Toral Quiroga (92):
  i965/vec4/nir: Add bit-size information to types
  i965/vec4/nir: support doubles in ALU operations
  i965/vec4/nir: fix emitting 64-bit immediates
  i965/vec4: add double/float conversion pseudo-opcodes
  i965/vec4: translate d2f/f2d
  i965: fix subnr overflow in suboffset()
  i965/vec4: set correct register regions for 32-bit and 64-bit
  i965/disasm: align16 DF source regions have a width of 2
  i965/vec4: We only support 32-bit integer ALU operations for now
  i965/vec4: add dst_null_df()
  i965/vec4: add VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT opcodes
  i965/vec4: add VEC4_OPCODE_SET_{LOW,HIGH}_32BIT opcodes
  i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW,HIGH}_32BIT
  i965/vec4: don't copy propagate vector opcodes that operate in align1 mode
  i965/vec4: implement double unpacking
  i965/vec4: implement double packing
  i965/vec4/nir: implement double comparisons
  i965/vec4: fix base offset for nir_registers with doubles
  i965/vec4: fix indentation in get_nir_src()
  i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations
  i965/vec4: make opt_vector_float ignore doubles
  i965/vec4: fix register allocation for 64-bit undef sources
  i965/vec4: Rename DF to/from F generator opcodes
  i965/vec4: add helpers for conversions to/from doubles
  i965/vec4: implement hardware workaround for align16 double to float conversion
  i965/vec4: implement d2i, d2u, i2d and u2d
  i965/vec4: implement d2b
  i965/vec4: implement fsign() for doubles
  i965/vec4: fix optimize predicate for doubles
  i965/vec4: add a helper function to create double immediates
  i965: move exec_size from fs_instruction to backend_instruction
  i965/vec4: fix size_written for doubles
  i965/vec4: fix regs_read() for doubles
  i965/vec4: use the IR's execution size
  i965/vec4: dump the instruction execution size
  i965/vec4: add a horiz_offset() helper
  i965: move the group field from fs_inst to backend_instruction.
  i965/vec4: add a SIMD lowering pass
  i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions
  i965/vec4: dump NibCtrl for instructions with execsize != 8
  i965/disasm: print NibCtrl for instructions with execsize < 8
  i965/vec4: teach CSE about exec_size, group and doubles
  i965/vec4: teach cmod propagation about different execution sizes
  i965/vec4: split double-precision bcsel
  i965/vec4: add a scalarization pass for double-precision instructions
  i965/vec4: translate 64-bit swizzles to 32-bit
  i965/vec4: implement access to DF source components Z/W
  i965/disasm: fix subreg for dst in Align16 mode
  i965/vec4: teach register coalescing about 64-bit
  i965/vec4: fix pack_uniform_registers for doubles
  i965/vec4: fix indentation in pack_uniform_registers
  i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands
  i965/vec4/nir: do not emit 64-bit MAD
  i965/vec4: do not emit 64-bit MAD
  i965/vec4: support multiple dispatch widths and groups in the IR builder.
  i965/vec4: Add a shuffle_64bit_data helper
  i965/vec4: Fix UBO loads for 64-bit data
  i965/vec4: Fix SSBO loads for 64-bit data
  i965/vec4: Fix SSBO stores for 64-bit data
  i965/vec4: prevent co

[Mesa-dev] [PATCH 2/3] radv/winsys: Move a 'default:' to the end of case stmt

2016-10-11 Thread Edward O'Callaghan
Shift this down and maintain the exact same behaviour as the
current code.

Signed-off-by: Edward O'Callaghan 
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c 
b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
index 7319a98..3f41778 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c
@@ -242,10 +242,11 @@ static unsigned radv_eg_tile_split_rev(unsigned eg_tile_split)
case 128:   return 1;
case 256:   return 2;
case 512:   return 3;
-   default:
case 1024:  return 4;
case 2048:  return 5;
case 4096:  return 6;
+   default:
+   return 4;
}
 }
 
-- 
2.7.4
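
As a quick check of the "exact same behaviour" claim, the two forms of the switch can be compared side by side. This is a minimal sketch in C that reproduces only the cases visible in the hunk above; the function names tile_split_rev_before() and tile_split_rev_after() are hypothetical, and any cases outside the hunk are deliberately left out.

```c
#include <assert.h>

/* Old form: 'default:' shares the label of 'case 1024:', so any
 * unlisted value falls into the same 'return 4'. */
unsigned tile_split_rev_before(unsigned eg_tile_split)
{
    switch (eg_tile_split) {
    case 128:   return 1;
    case 256:   return 2;
    case 512:   return 3;
    default:
    case 1024:  return 4;
    case 2048:  return 5;
    case 4096:  return 6;
    }
}

/* New form: 'default:' moved to the end with an explicit 'return 4'. */
unsigned tile_split_rev_after(unsigned eg_tile_split)
{
    switch (eg_tile_split) {
    case 128:   return 1;
    case 256:   return 2;
    case 512:   return 3;
    case 1024:  return 4;
    case 2048:  return 5;
    case 4096:  return 6;
    default:
        return 4;
    }
}
```

Both functions return the same value for every input; the second form just makes the fallback explicit instead of relying on a label in the middle of the case list.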



[Mesa-dev] [PATCH v2 017/103] i965/vec4: add VEC4_OPCODE_PICK_{LOW, HIGH}_32BIT opcodes

2016-10-11 Thread Iago Toral Quiroga
These opcodes pick the low/high 32 bits of each 64-bit data element
using Align1 mode. We will use this, for example, to implement
unpackDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in terms of 64-bit logical swizzles for DF operands
all the way up to codegen.

v2:
 - use suboffset() instead of get_element_ud()
 - no need to set the width on the dst
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp |  4 
 src/mesa/drivers/dri/i965/brw_vec4.cpp   |  4 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 25 
 4 files changed, 35 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 79b96a4..8ffb50c 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1100,6 +1100,8 @@ enum opcode {
VEC4_OPCODE_UNPACK_UNIFORM,
VEC4_OPCODE_DOUBLE_TO_FLOAT,
VEC4_OPCODE_FLOAT_TO_DOUBLE,
+   VEC4_OPCODE_PICK_LOW_32BIT,
+   VEC4_OPCODE_PICK_HIGH_32BIT,
 
FS_OPCODE_DDX_COARSE,
FS_OPCODE_DDX_FINE,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index b063f77..b2f3a56 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -321,6 +321,10 @@ brw_instruction_name(const struct gen_device_info 
*devinfo, enum opcode op)
   return "double_to_float";
case VEC4_OPCODE_FLOAT_TO_DOUBLE:
   return "float_to_double";
+   case VEC4_OPCODE_PICK_LOW_32BIT:
+  return "pick_low_32bit";
+   case VEC4_OPCODE_PICK_HIGH_32BIT:
+  return "pick_high_32bit";
 
case FS_OPCODE_DDX_COARSE:
   return "ddx_coarse";
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 40f8702..4fd04f1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -255,6 +255,8 @@ vec4_instruction::can_do_writemask(const struct 
gen_device_info *devinfo)
case SHADER_OPCODE_GEN4_SCRATCH_READ:
case VEC4_OPCODE_DOUBLE_TO_FLOAT:
case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_PICK_LOW_32BIT:
+   case VEC4_OPCODE_PICK_HIGH_32BIT:
case VS_OPCODE_PULL_CONSTANT_LOAD:
case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9:
@@ -510,6 +512,8 @@ vec4_visitor::opt_reduce_swizzle()
 
   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+  case VEC4_OPCODE_PICK_LOW_32BIT:
+  case VEC4_OPCODE_PICK_HIGH_32BIT:
  swizzle = brw_swizzle_for_size(4);
  break;
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 6f4c438..b8778c4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1940,6 +1940,31 @@ generate_code(struct brw_codegen *p,
  break;
   }
 
+  case VEC4_OPCODE_PICK_LOW_32BIT:
+  case VEC4_OPCODE_PICK_HIGH_32BIT: {
+ /* Stores the low/high 32-bit of each 64-bit element in src[0] into
+  * dst using ALIGN1 mode and a <8,4,2>:UD region on the source.
+  */
+ assert(type_sz(src[0].type) == 8);
+ assert(type_sz(dst.type) == 4);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+ dst = retype(dst, BRW_REGISTER_TYPE_UD);
+ dst.hstride = BRW_HORIZONTAL_STRIDE_1;
+
+ src[0] = retype(src[0], BRW_REGISTER_TYPE_UD);
+ if (inst->opcode == VEC4_OPCODE_PICK_HIGH_32BIT)
+src[0] = suboffset(src[0], 1);
+ src[0].vstride = BRW_VERTICAL_STRIDE_8;
+ src[0].width = BRW_WIDTH_4;
+ src[0].hstride = BRW_HORIZONTAL_STRIDE_2;
+ brw_MOV(p, dst, src[0]);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_16);
+ break;
+  }
+
   case VEC4_OPCODE_PACK_BYTES: {
  /* Is effectively:
   *
-- 
2.7.4
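
The effect of the two PICK opcodes can be modelled on the CPU. The following is a minimal sketch in C, not driver code: pick_32bit() is a hypothetical helper, and it uses shifts instead of the strided <8,4,2>:UD register region that the generator sets up. Reading every second dword of the source starting at dword 0 (low) or dword 1 (high, the suboffset(src[0], 1) case) gives the same result on the hardware's little-endian register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU model of VEC4_OPCODE_PICK_{LOW,HIGH}_32BIT:
 * for each 64-bit element in src, store its low 32 bits (high == 0)
 * or its high 32 bits (high != 0) into the matching slot of dst. */
void pick_32bit(uint32_t *dst, const uint64_t *src, size_t count, int high)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = (uint32_t)(src[i] >> (high ? 32 : 0));
}
```

This is the per-component behaviour that an unpackDouble2x32-style operation needs: one call yields the low halves, the other the high halves.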



[Mesa-dev] [PATCH 1/3] radv/winsys: Trivial style and readability fixups

2016-10-11 Thread Edward O'Callaghan
Drop/add a few newlines where appropriate and drop a couple of
unnecessary braces.

Signed-off-by: Edward O'Callaghan 
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c  | 16 ++--
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h  |  2 +-
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c |  8 
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c  |  3 ++-
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c 
b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
index dedc778..330b59b 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
@@ -84,6 +84,7 @@ static bool radv_amdgpu_fence_wait(struct radeon_winsys *_ws,
unsigned flags = absolute ? AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE : 0;
int r;
uint32_t expired = 0;
+
/* Now use the libdrm query. */
r = amdgpu_cs_query_fence_status(fence,
 timeout,
@@ -95,16 +96,16 @@ static bool radv_amdgpu_fence_wait(struct radeon_winsys 
*_ws,
return false;
}
 
-   if (expired) {
+   if (expired)
return true;
-   }
-   return false;
 
+   return false;
 }
 
 static void radv_amdgpu_cs_destroy(struct radeon_winsys_cs *rcs)
 {
struct radv_amdgpu_cs *cs = radv_amdgpu_cs(rcs);
+
if (cs->ib_buffer)
cs->ws->base.buffer_destroy(cs->ib_buffer);
else
@@ -112,6 +113,7 @@ static void radv_amdgpu_cs_destroy(struct radeon_winsys_cs 
*rcs)
 
for (unsigned i = 0; i < cs->num_old_ib_buffers; ++i)
cs->ws->base.buffer_destroy(cs->old_ib_buffers[i]);
+
free(cs->old_ib_buffers);
free(cs->handles);
free(cs->priorities);
@@ -121,9 +123,9 @@ static void radv_amdgpu_cs_destroy(struct radeon_winsys_cs 
*rcs)
 static boolean radv_amdgpu_init_cs(struct radv_amdgpu_cs *cs,
   enum ring_type ring_type)
 {
-   for (int i = 0; i < ARRAY_SIZE(cs->buffer_hash_table); ++i) {
+   for (int i = 0; i < ARRAY_SIZE(cs->buffer_hash_table); ++i)
cs->buffer_hash_table[i] = -1;
-   }
+
return true;
 }
 
@@ -297,7 +299,7 @@ static int radv_amdgpu_cs_find_buffer(struct radv_amdgpu_cs 
*cs,
if (index == -1)
return -1;
 
-   if(cs->handles[index] == bo)
+   if (cs->handles[index] == bo)
return index;
 
for (unsigned i = 0; i < cs->num_buffers; ++i) {
@@ -306,6 +308,7 @@ static int radv_amdgpu_cs_find_buffer(struct radv_amdgpu_cs 
*cs,
return i;
}
}
+
return -1;
 }
 
@@ -455,6 +458,7 @@ static int radv_amdgpu_create_bo_list(struct 
radv_amdgpu_winsys *ws,
free(handles);
free(priorities);
}
+
return r;
 }
 
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h 
b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h
index b4482fc..affee95 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.h
@@ -36,8 +36,8 @@
 #include 
 
 #include "radv_radeon_winsys.h"
-
 #include "radv_amdgpu_winsys.h"
+
 struct radv_amdgpu_ctx {
struct radv_amdgpu_winsys *ws;
amdgpu_context_handle ctx;
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c 
b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
index a3c2411..31927ec 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c
@@ -27,12 +27,14 @@
  */
 
 #include 
+
 #include "radv_private.h"
 #include "addrlib/addrinterface.h"
 #include "util/bitset.h"
 #include "radv_amdgpu_winsys.h"
 #include "radv_amdgpu_surface.h"
 #include "sid.h"
+
 #ifndef NO_ENTRIES
 #define NO_ENTRIES 32
 #endif
@@ -194,9 +196,8 @@ static int radv_compute_level(ADDR_HANDLE addrlib,
ret = AddrComputeSurfaceInfo(addrlib,
 AddrSurfInfoIn,
 AddrSurfInfoOut);
-   if (ret != ADDR_OK) {
+   if (ret != ADDR_OK)
return ret;
-   }
 
surf_level = is_stencil ? &surf->stencil_level[level] : 
&surf->level[level];
surf_level->offset = align64(surf->bo_size, AddrSurfInfoOut->baseAlign);
@@ -340,8 +341,7 @@ static int radv_amdgpu_winsys_surface_init(struct 
radeon_winsys *_ws,
default:
assert(0);
}
-   }
-   else {
+   } else {
AddrDccIn.bpp = AddrSurfInfoIn.bpp = surf->bpe * 8;
}
 
diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c 
b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
index 9450536..0ce44ac 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
@@ -37,6 +37,7 @@
 #include "radv_amdgpu_cs.h"
 #include "radv_a

[Mesa-dev] [PATCH v2 013/103] i965/vec4: set correct register regions for 32-bit and 64-bit

2016-10-11 Thread Iago Toral Quiroga
For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources so we should really set a width of 4 (we were setting 8).

For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a row. Also, Curro suggested that
the hardware is probably fixing the width to 2 for 64-bit instructions
anyway, so just go with that and use <2,2,1>.

v2:
 - No need to explicitly set the vertical stride of 64-bit regions to 2,
   brw_vecn_grf with a width of 2 will do that for us.
 - No need to adjust the width of dst registers.

Signed-off-by: Connor Abbott 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 32c04b2..40f8702 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1873,20 +1873,24 @@ vec4_visitor::convert_to_hw_regs()
  struct src_reg &src = inst->src[i];
  struct brw_reg reg;
  switch (src.file) {
- case VGRF:
-reg = byte_offset(brw_vec8_grf(src.nr, 0), src.offset);
+ case VGRF: {
+unsigned type_size = type_sz(src.type);
+unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
+reg = byte_offset(brw_vecn_grf(width, src.nr, 0), src.offset);
 reg.type = src.type;
 reg.swizzle = src.swizzle;
 reg.abs = src.abs;
 reg.negate = src.negate;
 break;
+ }
 
- case UNIFORM:
+ case UNIFORM: {
+unsigned width = REG_SIZE / 2 / MAX2(4, type_sz(src.type));
 reg = stride(byte_offset(brw_vec4_grf(
 prog_data->base.dispatch_grf_start_reg 
+
 src.nr / 2, src.nr % 2 * 4),
  src.offset),
- 0, 4, 1);
+ 0, width, 1);
 reg.type = src.type;
 reg.swizzle = src.swizzle;
 reg.abs = src.abs;
@@ -1895,6 +1899,7 @@ vec4_visitor::convert_to_hw_regs()
 /* This should have been moved to pull constants. */
 assert(!src.reladdr);
 break;
+ }
 
  case ARF:
  case FIXED_GRF:
-- 
2.7.4
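
The width expression in the patch can be checked with a few lines of arithmetic. This is a minimal sketch in C; region_width_for_type_size() and max2() are hypothetical helper names, and REG_SIZE is assumed to be 32 bytes (one GRF register) as in the i965 backend.

```c
#include <assert.h>

/* Assumption: one GRF register is 32 bytes, matching REG_SIZE in i965. */
#define REG_SIZE 32

/* Hypothetical local equivalent of Mesa's MAX2() macro. */
unsigned max2(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Same expression as in convert_to_hw_regs():
 *   width = REG_SIZE / 2 / MAX2(4, type_size)
 * Half a register holds one vec4 of 32-bit data, so 32-bit (and smaller)
 * types get a width of 4 and 64-bit types get a width of 2. */
unsigned region_width_for_type_size(unsigned type_size)
{
    assert(type_size > 0);
    return REG_SIZE / 2 / max2(4, type_size);
}
```

For a 4-byte type this gives 32 / 2 / 4 = 4, and for an 8-byte type 32 / 2 / 8 = 2, which matches the <4,4,1> and <2,2,1> regions described in the commit message.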



[Mesa-dev] Various radv fixups, style + one mem leak fix

2016-10-11 Thread Edward O'Callaghan
Nothing major here, patch 3 is the only interesting one.

Edward O'Callaghan (3):
 [PATCH 1/3] radv/winsys: Trivial style and readability fixups
 [PATCH 2/3] radv/winsys: Move a 'default:' to the end of case stmt
 [PATCH 3/3] radv/winsys: Fix mem leak at failed do_winsys_init() call


[Mesa-dev] [PATCH v2 022/103] i965/vec4: implement double packing

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 2631bf3..37c3d7c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1538,6 +1538,17 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
}
 
+   case nir_op_pack_double_2x32_split: {
+  dst_reg result = dst_reg(this, glsl_type::dvec4_type);
+  dst_reg tmp = dst_reg(this, glsl_type::uvec4_type);
+  emit(MOV(tmp, retype(op[0], BRW_REGISTER_TYPE_UD)));
+  emit(VEC4_OPCODE_SET_LOW_32BIT, result, src_reg(tmp));
+  emit(MOV(tmp, retype(op[1], BRW_REGISTER_TYPE_UD)));
+  emit(VEC4_OPCODE_SET_HIGH_32BIT, result, src_reg(tmp));
+  emit(MOV(dst, src_reg(result)));
+  break;
+   }
+
case nir_op_unpack_double_2x32_split_x:
case nir_op_unpack_double_2x32_split_y: {
   enum opcode oper = (instr->op == nir_op_unpack_double_2x32_split_x) ?
-- 
2.7.4
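
For reference, the value computed per component by the pack_double_2x32_split case can be written as plain integer arithmetic. This is a minimal sketch in C; pack_2x32_split() is a hypothetical name, and the sketch models only the bit layout (first operand in the low 32 bits, second operand in the high 32 bits), not the Align1 register regions used by the SET_LOW/SET_HIGH opcodes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU model of packing two 32-bit values into one 64-bit
 * value: 'low' corresponds to op[0] (SET_LOW_32BIT) and 'high' to op[1]
 * (SET_HIGH_32BIT) in the patch. */
uint64_t pack_2x32_split(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```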



[Mesa-dev] [PATCH v2 032/103] i965/vec4: implement d2i, d2u, i2d and u2d

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0170d21..cc10247 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1166,6 +1166,20 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 BRW_REGISTER_TYPE_F);
   break;
 
+   case nir_op_d2i:
+   case nir_op_d2u:
+  emit_double_to_single(dst, op[0], instr->dest.saturate,
+instr->op == nir_op_d2i ? BRW_REGISTER_TYPE_D :
+  BRW_REGISTER_TYPE_UD);
+  break;
+
+   case nir_op_i2d:
+   case nir_op_u2d:
+  emit_single_to_double(dst, op[0], instr->dest.saturate,
+instr->op == nir_op_i2d ? BRW_REGISTER_TYPE_D :
+  BRW_REGISTER_TYPE_UD);
+  break;
+
case nir_op_iadd:
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
case nir_op_fadd:
-- 
2.7.4



[Mesa-dev] [PATCH v2 029/103] i965/vec4: Rename DF to/from F generator opcodes

2016-10-11 Thread Iago Toral Quiroga
The opcodes are not specific for conversions to/from float since we need
the same for conversions to/from other 32-bit types. Rename the opcodes
accordingly and change the asserts to check the size of the types involved
instead.
---
 src/mesa/drivers/dri/i965/brw_defines.h |  4 ++--
 src/mesa/drivers/dri/i965/brw_shader.cpp|  8 
 src/mesa/drivers/dri/i965/brw_vec4.cpp  |  8 
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp| 12 ++--
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp  |  6 +++---
 6 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index 35d638c..b137fb4 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1098,8 +1098,8 @@ enum opcode {
VEC4_OPCODE_MOV_BYTES,
VEC4_OPCODE_PACK_BYTES,
VEC4_OPCODE_UNPACK_UNIFORM,
-   VEC4_OPCODE_DOUBLE_TO_FLOAT,
-   VEC4_OPCODE_FLOAT_TO_DOUBLE,
+   VEC4_OPCODE_DOUBLE_TO_SINGLE,
+   VEC4_OPCODE_SINGLE_TO_DOUBLE,
VEC4_OPCODE_PICK_LOW_32BIT,
VEC4_OPCODE_PICK_HIGH_32BIT,
VEC4_OPCODE_SET_LOW_32BIT,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 153bd43..df43509 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -317,10 +317,10 @@ brw_instruction_name(const struct gen_device_info 
*devinfo, enum opcode op)
   return "pack_bytes";
case VEC4_OPCODE_UNPACK_UNIFORM:
   return "unpack_uniform";
-   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
-  return "double_to_float";
-   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
-  return "float_to_double";
+   case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+  return "double_to_single";
+   case VEC4_OPCODE_SINGLE_TO_DOUBLE:
+  return "single_to_double";
case VEC4_OPCODE_PICK_LOW_32BIT:
   return "pick_low_32bit";
case VEC4_OPCODE_PICK_HIGH_32BIT:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 675b7fc..75a8473 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -253,8 +253,8 @@ vec4_instruction::can_do_writemask(const struct 
gen_device_info *devinfo)
 {
switch (opcode) {
case SHADER_OPCODE_GEN4_SCRATCH_READ:
-   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
-   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+   case VEC4_OPCODE_SINGLE_TO_DOUBLE:
case VEC4_OPCODE_PICK_LOW_32BIT:
case VEC4_OPCODE_PICK_HIGH_32BIT:
case VEC4_OPCODE_SET_LOW_32BIT:
@@ -513,8 +513,8 @@ vec4_visitor::opt_reduce_swizzle()
  swizzle = brw_swizzle_for_size(2);
  break;
 
-  case VEC4_OPCODE_FLOAT_TO_DOUBLE:
-  case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+  case VEC4_OPCODE_SINGLE_TO_DOUBLE:
+  case VEC4_OPCODE_DOUBLE_TO_SINGLE:
   case VEC4_OPCODE_PICK_LOW_32BIT:
   case VEC4_OPCODE_PICK_HIGH_32BIT:
   case VEC4_OPCODE_SET_LOW_32BIT:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index d0045a7..49920c2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -286,8 +286,8 @@ static bool
 is_align1_opcode(unsigned opcode)
 {
switch (opcode) {
-   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
-   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+   case VEC4_OPCODE_SINGLE_TO_DOUBLE:
case VEC4_OPCODE_PICK_LOW_32BIT:
case VEC4_OPCODE_PICK_HIGH_32BIT:
case VEC4_OPCODE_SET_LOW_32BIT:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 120797b..4d05fcd 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1896,9 +1896,9 @@ generate_code(struct brw_codegen *p,
  break;
   }
 
-  case VEC4_OPCODE_DOUBLE_TO_FLOAT: {
- assert(src[0].type == BRW_REGISTER_TYPE_DF);
- assert(dst.type == BRW_REGISTER_TYPE_F);
+  case VEC4_OPCODE_DOUBLE_TO_SINGLE: {
+ assert(type_sz(src[0].type) == 8);
+ assert(type_sz(dst.type) == 4);
 
  brw_set_default_access_mode(p, BRW_ALIGN_1);
 
@@ -1917,9 +1917,9 @@ generate_code(struct brw_codegen *p,
  break;
   }
 
-  case VEC4_OPCODE_FLOAT_TO_DOUBLE: {
- assert(src[0].type == BRW_REGISTER_TYPE_F);
- assert(dst.type == BRW_REGISTER_TYPE_DF);
+  case VEC4_OPCODE_SINGLE_TO_DOUBLE: {
+ assert(type_sz(src[0].type) == 4);
+ assert(type_sz(dst.type) == 8);
 
  brw_set_default_access_mode(p, BRW_ALIGN_1);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 4dffd76..5

[Mesa-dev] [PATCH v2 008/103] i965/vec4: add support for printing DF immediates

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott 

Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 6aa9102..c29cfb5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1517,6 +1517,9 @@ vec4_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
  case BRW_REGISTER_TYPE_F:
 fprintf(file, "%fF", inst->src[i].f);
 break;
+ case BRW_REGISTER_TYPE_DF:
+fprintf(file, "%fDF", inst->src[i].df);
+break;
  case BRW_REGISTER_TYPE_D:
 fprintf(file, "%dD", inst->src[i].d);
 break;
-- 
2.7.4



[Mesa-dev] [PATCH 3/3] radv/winsys: Fix mem leak at failed do_winsys_init() call site

2016-10-11 Thread Edward O'Callaghan
This is probably unlikely; however, ensure we don't leak a heap
allocation on the failure path.

Signed-off-by: Edward O'Callaghan 
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c 
b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
index 0ce44ac..ded5ed7 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.c
@@ -356,5 +356,6 @@ radv_amdgpu_winsys_create(int fd)
 
return &ws->base;
 fail:
+   free(ws);
return NULL;
 }
-- 
2.7.4
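
The pattern being fixed is the usual "goto fail" cleanup. Below is a minimal sketch in C under stated assumptions: struct winsys, init_winsys() and winsys_create() are hypothetical stand-ins for the radv structures and for do_winsys_init(), and the init function here fails simply when given a negative file descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the winsys object. */
struct winsys {
    int fd;
};

/* Hypothetical stand-in for do_winsys_init(): fails on a negative fd. */
bool init_winsys(struct winsys *ws, int fd)
{
    if (fd < 0)
        return false;
    ws->fd = fd;
    return true;
}

/* Allocates the object, initializes it, and frees it again on the
 * failure path so the allocation is not leaked. */
struct winsys *winsys_create(int fd)
{
    struct winsys *ws = calloc(1, sizeof(*ws));
    if (!ws)
        return NULL;

    if (!init_winsys(ws, fd))
        goto fail;

    return ws;

fail:
    free(ws); /* the call the patch adds */
    return NULL;
}
```

Without the free() under the fail label, every failed initialization would leak one allocation; with it, the caller sees NULL and nothing is left behind.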



[Mesa-dev] [PATCH v2 007/103] i965/vec4/nir: fix emitting 64-bit immediates

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 05e7f29..ce95c8d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -352,8 +352,15 @@ vec4_visitor::get_indirect_offset(nir_intrinsic_instr 
*instr)
 void
 vec4_visitor::nir_emit_load_const(nir_load_const_instr *instr)
 {
-   dst_reg reg = dst_reg(VGRF, alloc.allocate(1));
-   reg.type =  BRW_REGISTER_TYPE_D;
+   dst_reg reg;
+
+   if (instr->def.bit_size == 64) {
+  reg = dst_reg(VGRF, alloc.allocate(2));
+  reg.type = BRW_REGISTER_TYPE_DF;
+   } else {
+  reg = dst_reg(VGRF, alloc.allocate(1));
+  reg.type = BRW_REGISTER_TYPE_D;
+   }
 
unsigned remaining = brw_writemask_for_size(instr->def.num_components);
 
@@ -368,13 +375,20 @@ vec4_visitor::nir_emit_load_const(nir_load_const_instr 
*instr)
  continue;
 
   for (unsigned j = i; j < instr->def.num_components; j++) {
- if (instr->value.u32[i] == instr->value.u32[j]) {
+ if ((instr->def.bit_size == 32 &&
+  instr->value.u32[i] == instr->value.u32[j]) ||
+ (instr->def.bit_size == 64 &&
+  instr->value.f64[i] == instr->value.f64[j])) {
 writemask |= 1 << j;
  }
   }
 
   reg.writemask = writemask;
-  emit(MOV(reg, brw_imm_d(instr->value.i32[i])));
+  if (instr->def.bit_size == 64) {
+ emit(MOV(reg, brw_imm_df(instr->value.f64[i])));
+  } else {
+ emit(MOV(reg, brw_imm_d(instr->value.i32[i])));
+  }
 
   remaining &= ~writemask;
}
-- 
2.7.4
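
The loop in this patch groups components that hold the same constant so that a single MOV with a combined writemask can load them. The following is a minimal sketch in C of that grouping for the 64-bit case; group_equal_components() is a hypothetical name, the outer loop is reconstructed from the visible parts of the hunk, and the initial mask is assumed to be (1 << num_components) - 1.

```c
#include <assert.h>

/* Hypothetical model of the writemask grouping in nir_emit_load_const():
 * for each component not yet written, collect every later component with
 * an equal value into one writemask. Each stored mask corresponds to one
 * emitted MOV. Returns the number of MOVs. */
unsigned group_equal_components(const double *values,
                                unsigned num_components,
                                unsigned *writemasks)
{
    assert(num_components <= 4);
    unsigned remaining = (1u << num_components) - 1;
    unsigned num_movs = 0;

    for (unsigned i = 0; i < num_components; i++) {
        unsigned writemask = 1u << i;

        /* Skip components already covered by an earlier MOV. */
        if ((remaining & writemask) == 0)
            continue;

        for (unsigned j = i; j < num_components; j++) {
            if (values[i] == values[j])
                writemask |= 1u << j;
        }

        writemasks[num_movs++] = writemask;
        remaining &= ~writemask;
    }
    return num_movs;
}
```

For the constant (1.0, 2.0, 1.0, 2.0) this yields two MOVs, one with writemask XZ (0x5) and one with YW (0xA), instead of four separate loads.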



[Mesa-dev] [PATCH v2 010/103] i965/vec4: translate d2f/f2d

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ce95c8d..b75337c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -,6 +,30 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   inst = emit(MOV(dst, op[0]));
   break;
 
+   case nir_op_d2f: {
+  dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
+  emit(MOV(temp, op[0]));
+
+  dst_reg temp2 = dst_reg(this, glsl_type::dvec4_type);
+  temp2 = retype(temp2, BRW_REGISTER_TYPE_F);
+  emit(VEC4_OPCODE_DOUBLE_TO_FLOAT, temp2, src_reg(temp))
+ ->size_written = 2 * REG_SIZE;
+
+  vec4_instruction *inst = emit(MOV(dst, src_reg(temp2)));
+  inst->saturate = instr->dest.saturate;
+  break;
+   }
+
+   case nir_op_f2d: {
+  dst_reg tmp_dst = dst_reg(src_reg(this, glsl_type::dvec4_type));
+  src_reg tmp_src = src_reg(this, glsl_type::vec4_type);
+  emit(MOV(dst_reg(tmp_src), retype(op[0], BRW_REGISTER_TYPE_F)));
+  emit(VEC4_OPCODE_FLOAT_TO_DOUBLE, tmp_dst, tmp_src);
+  vec4_instruction *inst = emit(MOV(dst, src_reg(tmp_dst)));
+  inst->saturate = instr->dest.saturate;
+  break;
+   }
+
case nir_op_fadd:
   /* fall through */
case nir_op_iadd:
-- 
2.7.4



[Mesa-dev] [PATCH v2 002/103] i965/vec4/nir: simplify glsl_type_for_nir_alu_type()

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott 

Less duplication, one less case to handle for doubles and support
for sized NIR types.

v2: Fix call to get_instance by swapping rows and columns params (Iago)

Signed-off-by: Iago Toral Quiroga 
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 1d834a4..ddeff2d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1784,20 +1784,8 @@ const glsl_type *
 glsl_type_for_nir_alu_type(nir_alu_type alu_type,
unsigned components)
 {
-   switch (alu_type) {
-   case nir_type_float:
-  return glsl_type::vec(components);
-   case nir_type_int:
-  return glsl_type::ivec(components);
-   case nir_type_uint:
-  return glsl_type::uvec(components);
-   case nir_type_bool:
-  return glsl_type::bvec(components);
-   default:
-  return glsl_type::error_type;
-   }
-
-   return glsl_type::error_type;
+   return glsl_type::get_instance(brw_glsl_base_type_for_nir_type(alu_type),
+  components, 1);
 }
 
 void
-- 
2.7.4



[Mesa-dev] [PATCH v2 026/103] i965/vec4: fix get_nir_dest() to use DF type for 64-bit destinations

2016-10-11 Thread Iago Toral Quiroga
v2: Make dst_reg_for_nir_reg() handle this for nir_register since we
want to have the correct type set before we call offset().
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index c825aeb..fdd3cba 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -259,8 +259,10 @@ dst_reg_for_nir_reg(vec4_visitor *v, nir_register *nir_reg,
dst_reg reg;
 
reg = v->nir_locals[nir_reg->index];
-   if (nir_reg->bit_size == 64)
+   if (nir_reg->bit_size == 64) {
   base_offset *= 2;
+  reg.type = BRW_REGISTER_TYPE_DF;
+   }
reg = offset(reg, base_offset);
if (indirect) {
   reg.reladdr =
@@ -277,6 +279,8 @@ vec4_visitor::get_nir_dest(const nir_dest &dest)
if (dest.is_ssa) {
   dst_reg dst =
  dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(dest.ssa.bit_size, 32)));
+  if (dest.ssa.bit_size == 64)
+ dst.type = BRW_REGISTER_TYPE_DF;
   nir_ssa_values[dest.ssa.index] = dst;
   return dst;
} else {
-- 
2.7.4



[Mesa-dev] [PATCH v2 003/103] i965/vec4/nir: allocate two registers for dvec3/dvec4

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott 

v2 (Curro):
  - Do not special-case for a bit-size of 64, divide the bit_size by 32
instead.
  - Use DIV_ROUND_UP so we can handle sub-32-bit types.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index ddeff2d..af76730 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -140,8 +140,8 @@ vec4_visitor::nir_emit_impl(nir_function_impl *impl)
foreach_list_typed(nir_register, reg, node, &impl->registers) {
   unsigned array_elems =
  reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
-
-  nir_locals[reg->index] = dst_reg(VGRF, alloc.allocate(array_elems));
+  unsigned num_regs = array_elems * DIV_ROUND_UP(reg->bit_size, 32);
+  nir_locals[reg->index] = dst_reg(VGRF, alloc.allocate(num_regs));
}
 
nir_ssa_values = ralloc_array(mem_ctx, dst_reg, impl->ssa_alloc);
@@ -270,7 +270,8 @@ dst_reg
 vec4_visitor::get_nir_dest(const nir_dest &dest)
 {
if (dest.is_ssa) {
-  dst_reg dst = dst_reg(VGRF, alloc.allocate(1));
+  dst_reg dst =
+ dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(dest.ssa.bit_size, 32)));
   nir_ssa_values[dest.ssa.index] = dst;
   return dst;
} else {
-- 
2.7.4


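For reference, the register-count arithmetic from the dvec3/dvec4 allocation
patch above can be sketched on its own. This is only an illustration: the
helper names div_round_up() and vgrfs_for_nir_reg() are hypothetical and not
part of the driver, and the sketch assumes one VGRF holds a vec4 of 32-bit
values.

```cpp
#include <cassert>

// Round-up integer division, mirroring what DIV_ROUND_UP does in the patch.
inline unsigned div_round_up(unsigned n, unsigned d)
{
   assert(d != 0);
   return (n + d - 1) / d;
}

// Hypothetical helper: number of VGRFs needed for a NIR register with the
// given bit size. num_array_elems == 0 means "not an array", as in the patch.
inline unsigned vgrfs_for_nir_reg(unsigned bit_size, unsigned num_array_elems)
{
   const unsigned array_elems = num_array_elems == 0 ? 1 : num_array_elems;
   return array_elems * div_round_up(bit_size, 32);
}
```

With this, a 64-bit register takes two VGRFs, while 32-bit and sub-32-bit
registers still take one, which is why the v2 review preferred rounding up
over special-casing a bit size of 64.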

[Mesa-dev] [PATCH v2 012/103] i965: add brw_vecn_grf()

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott 

Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_reg.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index 8907c9c..1fa2595 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -722,6 +722,12 @@ brw_vec16_grf(unsigned nr, unsigned subnr)
return brw_vec16_reg(BRW_GENERAL_REGISTER_FILE, nr, subnr);
 }
 
+static inline struct brw_reg
+brw_vecn_grf(unsigned width, unsigned nr, unsigned subnr)
+{
+   return brw_vecn_reg(width, BRW_GENERAL_REGISTER_FILE, nr, subnr);
+}
+
 
 static inline struct brw_reg
 brw_uw8_grf(unsigned nr, unsigned subnr)
-- 
2.7.4



[Mesa-dev] [PATCH v2 034/103] i965/vec4: implement fsign() for doubles

2016-10-11 Thread Iago Toral Quiroga
v2: use a predicated MOV instead of a CMP, like we do in d2b, to skip
loading a double immediate.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 64 +++---
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 69f11ff..c0cb141 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1773,24 +1773,58 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   unreachable("not reached: should have been lowered");
 
case nir_op_fsign:
-  /* AND(val, 0x80000000) gives the sign bit.
-   *
-   * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
-   * zero.
-   */
-  emit(CMP(dst_null_f(), op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ));
+  if (type_sz(op[0].type) < 8) {
+ /* AND(val, 0x80000000) gives the sign bit.
+  *
+  * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
+  * zero.
+  */
+ emit(CMP(dst_null_f(), op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ));
 
-  op[0].type = BRW_REGISTER_TYPE_UD;
-  dst.type = BRW_REGISTER_TYPE_UD;
-  emit(AND(dst, op[0], brw_imm_ud(0x80000000u)));
+ op[0].type = BRW_REGISTER_TYPE_UD;
+ dst.type = BRW_REGISTER_TYPE_UD;
+ emit(AND(dst, op[0], brw_imm_ud(0x80000000u)));
 
-  inst = emit(OR(dst, src_reg(dst), brw_imm_ud(0x3f800000u)));
-  inst->predicate = BRW_PREDICATE_NORMAL;
-  dst.type = BRW_REGISTER_TYPE_F;
+ inst = emit(OR(dst, src_reg(dst), brw_imm_ud(0x3f800000u)));
+ inst->predicate = BRW_PREDICATE_NORMAL;
+ dst.type = BRW_REGISTER_TYPE_F;
+
+ if (instr->dest.saturate) {
+inst = emit(MOV(dst, src_reg(dst)));
+inst->saturate = true;
+ }
+  } else {
+ /* For doubles we do the same but we need to consider:
+  *
+  * - We use a predicated MOV instead of a CMP so that we can skip
+  *   loading a 0.0 immediate. We use a source modifier on the source
+  *   of the MOV so that we flush denormalized values to 0. Since we
+  *   want to compare against 0, this won't alter the result.
+  * - We need to extract the high 32-bit of each DF where the sign
+  *   is stored.
+  * - We need to produce a DF result.
+  */
+
+ /* Check for zero */
+ src_reg value = op[0];
+ value.abs = true;
+ inst = emit(MOV(dst_null_df(), value));
+ inst->conditional_mod = BRW_CONDITIONAL_NZ;
+
+ /* AND each high 32-bit channel with 0x80000000u */
+ dst_reg tmp = dst_reg(this, glsl_type::uvec4_type);
+ emit(VEC4_OPCODE_PICK_HIGH_32BIT, tmp, op[0]);
+ emit(AND(tmp, src_reg(tmp), brw_imm_ud(0x80000000u)));
+
+ /* Add 1.0 to each channel, predicated to skip the cases where the
+  * channel's value was 0
+  */
+ inst = emit(OR(tmp, src_reg(tmp), brw_imm_ud(0x3f800000u)));
+ inst->predicate = BRW_PREDICATE_NORMAL;
 
-  if (instr->dest.saturate) {
- inst = emit(MOV(dst, src_reg(dst)));
- inst->saturate = true;
+ /* Now convert the result from float to double */
+ emit_single_to_double(dst, src_reg(tmp), instr->dest.saturate,
+   BRW_REGISTER_TYPE_F);
   }
   break;
 
-- 
2.7.4


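The bit manipulation behind the fsign() patch above can be modelled on the
CPU. The following is a minimal sketch, not the driver code: fsign32() and
fsign64() are hypothetical names, the constants are the IEEE 754 binary32
sign bit (0x80000000) and the encoding of 1.0f (0x3f800000), and the
denormal flushing mentioned in the patch comment is not modelled.

```cpp
#include <cstdint>
#include <cstring>

// 32-bit case: keep only the sign bit, then OR in the bits of 1.0f when the
// input is non-zero (the predicated OR in the patch).
inline float fsign32(float val)
{
   uint32_t bits;
   std::memcpy(&bits, &val, sizeof(bits));
   uint32_t result = bits & 0x80000000u;   // sign bit only
   if (val != 0.0f)                        // stands in for the NZ predicate
      result |= 0x3f800000u;               // 1.0f
   float out;
   std::memcpy(&out, &result, sizeof(out));
   return out;
}

// 64-bit case: the sign lives in the high 32 bits of the double, so the same
// AND/OR is done on the high dword and the float result is then widened.
inline double fsign64(double val)
{
   uint64_t bits;
   std::memcpy(&bits, &val, sizeof(bits));
   const uint32_t high = static_cast<uint32_t>(bits >> 32);
   uint32_t result = high & 0x80000000u;
   if (val != 0.0)
      result |= 0x3f800000u;
   float as_float;
   std::memcpy(&as_float, &result, sizeof(as_float));
   return static_cast<double>(as_float);   // float-to-double conversion
}
```

The double path produces a single-precision +/-1.0 or 0.0 first and converts
it at the end, which matches the "convert the result from float to double"
step in the patch.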

[Mesa-dev] [PATCH v2 018/103] i965/vec4: add VEC4_OPCODE_SET_{LOW, HIGH}_32BIT opcodes

2016-10-11 Thread Iago Toral Quiroga
These opcodes will set the low/high 32-bit in each 64-bit data element
using Align1 mode. We will use this to implement packDouble2x32.

We use Align1 mode because in order to implement this in Align16 mode
we would need to use 32-bit logical swizzles (XZ for low, YW for high),
but the IR works in terms of 64-bit logical swizzles for DF operands
all the way up to codegen.

v2:
 - use suboffset() instead of get_element_ud()
 - no need to set the width on the dst
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp |  4 
 src/mesa/drivers/dri/i965/brw_vec4.cpp   |  4 
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 25 
 4 files changed, 35 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 8ffb50c..35d638c 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1102,6 +1102,8 @@ enum opcode {
VEC4_OPCODE_FLOAT_TO_DOUBLE,
VEC4_OPCODE_PICK_LOW_32BIT,
VEC4_OPCODE_PICK_HIGH_32BIT,
+   VEC4_OPCODE_SET_LOW_32BIT,
+   VEC4_OPCODE_SET_HIGH_32BIT,
 
FS_OPCODE_DDX_COARSE,
FS_OPCODE_DDX_FINE,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index b2f3a56..153bd43 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -325,6 +325,10 @@ brw_instruction_name(const struct gen_device_info *devinfo, enum opcode op)
   return "pick_low_32bit";
case VEC4_OPCODE_PICK_HIGH_32BIT:
   return "pick_high_32bit";
+   case VEC4_OPCODE_SET_LOW_32BIT:
+  return "set_low_32bit";
+   case VEC4_OPCODE_SET_HIGH_32BIT:
+  return "set_high_32bit";
 
case FS_OPCODE_DDX_COARSE:
   return "ddx_coarse";
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 4fd04f1..06fa38f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -257,6 +257,8 @@ vec4_instruction::can_do_writemask(const struct gen_device_info *devinfo)
case VEC4_OPCODE_FLOAT_TO_DOUBLE:
case VEC4_OPCODE_PICK_LOW_32BIT:
case VEC4_OPCODE_PICK_HIGH_32BIT:
+   case VEC4_OPCODE_SET_LOW_32BIT:
+   case VEC4_OPCODE_SET_HIGH_32BIT:
case VS_OPCODE_PULL_CONSTANT_LOAD:
case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9:
@@ -514,6 +516,8 @@ vec4_visitor::opt_reduce_swizzle()
   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
   case VEC4_OPCODE_PICK_LOW_32BIT:
   case VEC4_OPCODE_PICK_HIGH_32BIT:
+  case VEC4_OPCODE_SET_LOW_32BIT:
+  case VEC4_OPCODE_SET_HIGH_32BIT:
  swizzle = brw_swizzle_for_size(4);
  break;
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index b8778c4..120797b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1965,6 +1965,31 @@ generate_code(struct brw_codegen *p,
  break;
   }
 
+  case VEC4_OPCODE_SET_LOW_32BIT:
+  case VEC4_OPCODE_SET_HIGH_32BIT: {
+ /* Reads consecutive 32-bit elements from src[0] and writes
+  * them to the low/high 32-bit of each 64-bit element in dst.
+  */
+ assert(type_sz(src[0].type) == 4);
+ assert(type_sz(dst.type) == 8);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+ dst = retype(dst, BRW_REGISTER_TYPE_UD);
+ if (inst->opcode == VEC4_OPCODE_SET_HIGH_32BIT)
+dst = suboffset(dst, 1);
+ dst.hstride = BRW_HORIZONTAL_STRIDE_2;
+
+ src[0] = retype(src[0], BRW_REGISTER_TYPE_UD);
+ src[0].vstride = BRW_VERTICAL_STRIDE_4;
+ src[0].width = BRW_WIDTH_4;
+ src[0].hstride = BRW_HORIZONTAL_STRIDE_1;
+ brw_MOV(p, dst, src[0]);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_16);
+ break;
+  }
+
   case VEC4_OPCODE_PACK_BYTES: {
  /* Is effectively:
   *
-- 
2.7.4


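The data movement done by the SET_{LOW,HIGH}_32BIT opcodes above (a dword
destination with horizontal stride 2, offset by one dword for the high
variant) can be pictured with plain arrays. This is a sketch under
assumptions: set_32bit_halves() is a hypothetical name, the destination is
viewed as a flat array of dwords, and the low dword of each 64-bit element
is taken to come first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Writes src[i] into the low (or high) dword of the i-th 64-bit element of
// dst, leaving the other half of each element untouched.
inline void set_32bit_halves(std::vector<uint32_t> &dst_dwords,
                             const std::vector<uint32_t> &src, bool high)
{
   const std::size_t first = high ? 1 : 0;   // like suboffset(dst, 1)
   for (std::size_t i = 0; i < src.size(); i++) {
      const std::size_t idx = 2 * i + first; // destination stride of 2 dwords
      if (idx >= dst_dwords.size())
         break;
      dst_dwords[idx] = src[i];
   }
}
```

Calling it once with high == false and once with high == true on the same
destination interleaves the two sources, which is the packDouble2x32 use
case the commit message mentions.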

[Mesa-dev] [PATCH v2 004/103] i965/vec4/nir: Add bit-size information to types

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index af76730..5048c4e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -325,7 +325,7 @@ src_reg
 vec4_visitor::get_nir_src(const nir_src &src, unsigned num_components)
 {
/* if type is not specified, default to signed int */
-   return get_nir_src(src, nir_type_int, num_components);
+   return get_nir_src(src, nir_type_int32, num_components);
 }
 
 src_reg
@@ -747,7 +747,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   const nir_intrinsic_info *info = &nir_intrinsic_infos[instr->intrinsic];
 
   /* Get the arguments of the atomic intrinsic. */
-  src_reg offset = get_nir_src(instr->src[0], nir_type_int,
+  src_reg offset = get_nir_src(instr->src[0], nir_type_int32,
instr->num_components);
   const src_reg surface = brw_imm_ud(surf_index);
   const src_reg src0 = (info->num_srcs >= 2
@@ -793,7 +793,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   * from any live channel.
   */
  surf_index = src_reg(this, glsl_type::uint_type);
- emit(ADD(dst_reg(surf_index), get_nir_src(instr->src[0], nir_type_int,
+ emit(ADD(dst_reg(surf_index), get_nir_src(instr->src[0], nir_type_int32,
instr->num_components),
   brw_imm_ud(prog_data->base.binding_table.ubo_start)));
  surf_index = emit_uniformize(surf_index);
@@ -811,7 +811,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   if (const_offset) {
  offset = brw_imm_ud(const_offset->u32[0] & ~15);
   } else {
- offset = get_nir_src(instr->src[1], nir_type_int, 1);
+ offset = get_nir_src(instr->src[1], nir_type_uint32, 1);
   }
 
   src_reg packed_consts = src_reg(this, glsl_type::vec4_type);
-- 
2.7.4



[Mesa-dev] [PATCH v2 011/103] i965: fix subnr overflow in suboffset()

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_reg.h | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index 3b46d27..8907c9c 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -520,14 +520,6 @@ sechalf(struct brw_reg reg)
 }
 
 static inline struct brw_reg
-suboffset(struct brw_reg reg, unsigned delta)
-{
-   reg.subnr += delta * type_sz(reg.type);
-   return reg;
-}
-
-
-static inline struct brw_reg
 offset(struct brw_reg reg, unsigned delta)
 {
reg.nr += delta;
@@ -544,6 +536,11 @@ byte_offset(struct brw_reg reg, unsigned bytes)
return reg;
 }
 
+static inline struct brw_reg
+suboffset(struct brw_reg reg, unsigned delta)
+{
+   return byte_offset(reg, delta * type_sz(reg.type));
+}
 
 /** Construct unsigned word[16] register */
 static inline struct brw_reg
-- 
2.7.4


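To illustrate why routing suboffset() through byte_offset() avoids the subnr
overflow fixed above, here is a toy model. Everything in it is an
assumption made for illustration: toy_reg is not struct brw_reg, REG_SIZE is
taken to be 32 bytes, and byte_offset() is assumed to renormalise the
register number and sub-register offset as shown.

```cpp
// Assumed size of one GRF in bytes.
constexpr unsigned REG_SIZE = 32;

// Toy stand-in for a hardware register reference.
struct toy_reg {
   unsigned nr;       // register number
   unsigned subnr;    // byte offset inside the register, must stay < REG_SIZE
   unsigned type_sz;  // size in bytes of one element of the register's type
};

// Advance by a byte count, carrying any overflow of subnr into nr.
inline toy_reg byte_offset(toy_reg reg, unsigned bytes)
{
   const unsigned total = reg.nr * REG_SIZE + reg.subnr + bytes;
   reg.nr = total / REG_SIZE;
   reg.subnr = total % REG_SIZE;
   return reg;
}

// Advance by a number of elements, as in the patched suboffset().
inline toy_reg suboffset(toy_reg reg, unsigned delta)
{
   return byte_offset(reg, delta * reg.type_sz);
}
```

With the old "subnr += delta * type_sz" form, a start of subnr 16 plus six
dwords would leave subnr at 40, past the end of the register; here it
becomes the next register with subnr 8.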

[Mesa-dev] [PATCH v2 005/103] i965/vec4/nir: support doubles in ALU operations

2016-10-11 Thread Iago Toral Quiroga
Basically, this involves considering the bit-size information to set
the appropriate type on both operands and destination.

v2 (Curro)
  - Don't use two temporaries (and write one of them twice) to obtain
the nir_alu_type.

Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 5048c4e..0d4c8f5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1055,14 +1055,17 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 {
vec4_instruction *inst;
 
-   dst_reg dst = get_nir_dest(instr->dest.dest,
-  nir_op_infos[instr->op].output_type);
+   nir_alu_type dst_type = (nir_alu_type) (nir_op_infos[instr->op].output_type |
+   nir_dest_bit_size(instr->dest.dest));
+   dst_reg dst = get_nir_dest(instr->dest.dest, dst_type);
dst.writemask = instr->dest.write_mask;
 
src_reg op[4];
for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
-  op[i] = get_nir_src(instr->src[i].src,
-  nir_op_infos[instr->op].input_types[i], 4);
+  nir_alu_type src_type = (nir_alu_type)
+ (nir_op_infos[instr->op].input_types[i] |
+  nir_src_bit_size(instr->src[i].src));
+  op[i] = get_nir_src(instr->src[i].src, src_type, 4);
   op[i].swizzle = brw_swizzle_for_nir_swizzle(instr->src[i].swizzle);
   op[i].abs = instr->src[i].abs;
   op[i].negate = instr->src[i].negate;
-- 
2.7.4


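The "base type | bit size" trick used in the ALU patch above works because
the bit sizes (8, 16, 32, 64) never touch the low bits that hold the base
type. A small sketch of that encoding follows; the enum values, the mask
and all toy_ names are hypothetical and are not claimed to match the real
nir_alu_type values.

```cpp
// Hypothetical base types stored in the low three bits.
enum toy_alu_type : unsigned {
   toy_type_float = 1,
   toy_type_int = 2,
   toy_type_uint = 3,
   toy_type_bool = 4,
};

constexpr unsigned TOY_BASE_TYPE_MASK = 0x7;

// Combine an unsized base type with a bit size, as the patch does with '|'.
constexpr unsigned toy_sized_type(toy_alu_type base, unsigned bit_size)
{
   return static_cast<unsigned>(base) | bit_size;
}

// Recover the two halves again.
constexpr unsigned toy_base_type(unsigned sized)
{
   return sized & TOY_BASE_TYPE_MASK;
}

constexpr unsigned toy_bit_size(unsigned sized)
{
   return sized & ~TOY_BASE_TYPE_MASK;
}
```

Since every supported bit size is a multiple of 8, the OR is lossless and
both parts can be recovered with a mask.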

[Mesa-dev] [PATCH v2 028/103] i965/vec4: fix register allocation for 64-bit undef sources

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index fdd3cba..4dffd76 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -2085,7 +2085,8 @@ vec4_visitor::nir_emit_texture(nir_tex_instr *instr)
 void
 vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
 {
-   nir_ssa_values[instr->def.index] = dst_reg(VGRF, alloc.allocate(1));
+   nir_ssa_values[instr->def.index] =
+  dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(instr->def.bit_size, 32)));
 }
 
 }
-- 
2.7.4



[Mesa-dev] [PATCH v2 021/103] i965/vec4: implement double unpacking

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 04f70ef..2631bf3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1538,6 +1538,18 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
}
 
+   case nir_op_unpack_double_2x32_split_x:
+   case nir_op_unpack_double_2x32_split_y: {
+  enum opcode oper = (instr->op == nir_op_unpack_double_2x32_split_x) ?
+ VEC4_OPCODE_PICK_LOW_32BIT : VEC4_OPCODE_PICK_HIGH_32BIT;
+  dst_reg tmp = dst_reg(this, glsl_type::dvec4_type);
+  emit(MOV(tmp, op[0]));
+  dst_reg tmp2 = dst_reg(this, glsl_type::uvec4_type);
+  emit(oper, tmp2, src_reg(tmp));
+  emit(MOV(dst, src_reg(tmp2)));
+  break;
+   }
+
case nir_op_unpack_half_2x16:
   /* As NIR does not guarantee that we have a correct swizzle outside the
    * boundaries of a vector, and the implementation of emit_unpack_half_2x16
-- 
2.7.4



[Mesa-dev] [PATCH v2 006/103] i965/vec4/nir: set the right type for 64-bit registers

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott 

---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0d4c8f5..05e7f29 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -142,6 +142,9 @@ vec4_visitor::nir_emit_impl(nir_function_impl *impl)
  reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
   unsigned num_regs = array_elems * DIV_ROUND_UP(reg->bit_size, 32);
   nir_locals[reg->index] = dst_reg(VGRF, alloc.allocate(num_regs));
+
+  if (reg->bit_size == 64)
+ nir_locals[reg->index].type = BRW_REGISTER_TYPE_DF;
}
 
nir_ssa_values = ralloc_array(mem_ctx, dst_reg, impl->ssa_alloc);
-- 
2.7.4



[Mesa-dev] [PATCH v2 035/103] i965/vec4: fix optimize predicate for doubles

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index c0cb141..088ed13 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1010,8 +1010,10 @@ vec4_visitor::optimize_predicate(nir_alu_instr *instr,
src_reg op[2];
assert(nir_op_infos[cmp_instr->op].num_inputs == 2);
for (unsigned i = 0; i < 2; i++) {
-  op[i] = get_nir_src(cmp_instr->src[i].src,
-  nir_op_infos[cmp_instr->op].input_types[i], 4);
+  nir_alu_type type = nir_op_infos[cmp_instr->op].input_types[i];
+  unsigned bit_size = nir_src_bit_size(cmp_instr->src[i].src);
+  type = (nir_alu_type) (((unsigned) type) | bit_size);
+  op[i] = get_nir_src(cmp_instr->src[i].src, type, 4);
   unsigned base_swizzle =
  brw_swizzle_for_nir_swizzle(cmp_instr->src[i].swizzle);
   op[i].swizzle = brw_compose_swizzle(size_swizzle, base_swizzle);
-- 
2.7.4



[Mesa-dev] [PATCH v2 067/103] i965/vec4: Fix SSBO stores for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
In this case we need to shuffle the 64-bit data before we write it
to memory, source from reg_offset + 1 to write components Z and W
and consider that each DF channel is twice as big.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 40 --
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 001a62f..60a8425 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -510,7 +510,7 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   }
 
   /* Value */
-  src_reg val_reg = get_nir_src(instr->src[0], 4);
+  src_reg val_reg = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_F, 4);
 
   /* Writemask */
   unsigned write_mask = instr->const_index[0];
@@ -556,24 +556,47 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   const vec4_builder bld = vec4_builder(this).at_end()
.annotate(current_annotation, base_ir);
 
-  int swizzle[4] = { 0, 0, 0, 0};
+  unsigned type_slots = nir_src_bit_size(instr->src[0]) / 32;
+  if (type_slots == 2) {
+ dst_reg tmp = dst_reg(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(tmp, retype(val_reg, tmp.type), true);
+ val_reg = src_reg(retype(tmp, BRW_REGISTER_TYPE_F));
+  }
+
+  uint8_t swizzle[4] = { 0, 0, 0, 0};
   int num_channels = 0;
   unsigned skipped_channels = 0;
   int num_components = instr->num_components;
   for (int i = 0; i < num_components; i++) {
+ /* Read components Z/W of a dvec from the appropriate place. We will
+  * also have to adjust the swizzle (we do that with the '% 4' below)
+  */
+ if (i == 2 && type_slots == 2)
+val_reg = offset(val_reg, 1);
+
  /* Check if this channel needs to be written. If so, record the
   * channel we need to take the data from in the swizzle array
   */
  int component_mask = 1 << i;
  int write_test = write_mask & component_mask;
- if (write_test)
-swizzle[num_channels++] = i;
+ if (write_test) {
+/* If we are writing doubles we have to write 2 channels worth of
+ * data (64 bits) for each double component.
+ */
+swizzle[num_channels++] = (i * type_slots) % 4;
+if (type_slots == 2)
+   swizzle[num_channels++] = (i * type_slots + 1) % 4;
+ }
 
  /* If we don't have to write this channel it means we have a gap in the
   * vector, so write the channels we accumulated until now, if any. Do
-  * the same if this was the last component in the vector.
+  * the same if this was the last component in the vector, if we have
+  * enough channels for a full vec4 write or if we have processed
+  * components XY of a dvec (since components ZW are not in the same
+  * SIMD register)
   */
- if (!write_test || i == num_components - 1) {
+ if (!write_test || i == num_components - 1 || num_channels == 4 ||
+ (i == 1 && type_slots == 2)) {
 if (num_channels > 0) {
/* We have channels to write, so update the offset we need to
 * write at to skip the channels we skipped, if any.
@@ -607,8 +630,9 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
num_channels = 0;
 }
 
-/* We did not write the current channel, so increase skipped count */
-skipped_channels++;
+/* If we didn't write the channel, increase skipped count */
+if (!write_test)
+   skipped_channels += type_slots;
  }
   }
 
-- 
2.7.4


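The channel grouping in the SSBO store patch above is easier to follow when
pulled out of the visitor. The sketch below only models which source
channels end up in each write, following the flush conditions in the diff;
plan_ssbo_writes() and write_group are hypothetical names, and the offset
bookkeeping (skipped_channels) and the actual message emission are left out.

```cpp
#include <cstdint>
#include <vector>

using write_group = std::vector<uint8_t>;

// Returns, for each emitted write, the list of 32-bit source channels
// (swizzle entries) it takes its data from.
inline std::vector<write_group>
plan_ssbo_writes(unsigned write_mask, int num_components, unsigned type_slots)
{
   std::vector<write_group> groups;
   write_group swizzle;

   for (int i = 0; i < num_components; i++) {
      const bool write_test = (write_mask & (1u << i)) != 0;

      if (write_test) {
         // A double occupies two 32-bit channels; '% 4' wraps components
         // Z/W of a dvec back to channels 0..3 of the second register.
         swizzle.push_back(static_cast<uint8_t>((i * type_slots) % 4));
         if (type_slots == 2)
            swizzle.push_back(static_cast<uint8_t>((i * type_slots + 1) % 4));
      }

      // Flush on a gap, on the last component, on a full vec4 worth of
      // channels, or after components XY of a dvec.
      if (!write_test || i == num_components - 1 || swizzle.size() == 4 ||
          (i == 1 && type_slots == 2)) {
         if (!swizzle.empty()) {
            groups.push_back(swizzle);
            swizzle.clear();
         }
      }
   }
   return groups;
}
```

For a fully written dvec3 this yields two writes: channels 0-3 for XY and
then channels 0-1 for Z, the latter read from the second register.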

[Mesa-dev] [PATCH v2 044/103] i965/vec4: add a horiz_offset() helper

2016-10-11 Thread Iago Toral Quiroga
This will come in handy when we implement a simd lowering pass in a
follow-up patch.
---
 src/mesa/drivers/dri/i965/brw_ir_vec4.h | 41 +
 1 file changed, 41 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index 7451f44..e271fe1 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -69,6 +69,40 @@ offset(src_reg reg, unsigned delta)
return reg;
 }
 
+static inline void
+add_horiz_offset(backend_reg *reg, unsigned delta)
+{
+   switch (reg->file) {
+   case BAD_FILE:
+  break;
+   case MRF:
+   case VGRF:
+   case ATTR:
+   case UNIFORM: {
+  reg->offset += delta * type_sz(reg->type);
+  assert(reg->offset % 16 == 0);
+  break;
+   }
+   case ARF:
+   case FIXED_GRF: {
+  const unsigned suboffset = reg->subnr + delta * type_sz(reg->type);
+  reg->nr += suboffset / REG_SIZE;
+  reg->subnr = suboffset % REG_SIZE;
+  assert(reg->subnr % 16 == 0);
+  break;
+   }
+   default:
+  assert(delta == 0);
+   }
+}
+
+static inline src_reg
+horiz_offset(src_reg reg, unsigned delta)
+{
+   add_horiz_offset(&reg, delta);
+   return reg;
+}
+
 /**
  * Reswizzle a given source register.
  * \sa brw_swizzle().
@@ -139,6 +173,13 @@ offset(dst_reg reg, unsigned delta)
 }
 
 static inline dst_reg
+horiz_offset(dst_reg reg, unsigned delta)
+{
+   add_horiz_offset(&reg, delta);
+   return reg;
+}
+
+static inline dst_reg
 writemask(dst_reg reg, unsigned mask)
 {
assert(reg.file != IMM);
-- 
2.7.4



[Mesa-dev] [PATCH v2 041/103] i965/vec4: use the IR's execution size

2016-10-11 Thread Iago Toral Quiroga
In the vec4 backend the generator sets the execution size to 8 for all
instructions by default. However, to implement 64-bit floating-point we
will need to split certain instructions into smaller sizes, so we need the
IR to convey this information like we do in the scalar backend. This patch
uses the execution size from the vec4 IR.

We will use this feature in a later patch when we implement a SIMD
splitting pass.

v2:
  - Drop the assertion on the execution size being 8 or 4 (Curro)
  - Use exec_size from backend_instruction (Curro)
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 4d05fcd..e4e2742 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1498,6 +1498,7 @@ generate_code(struct brw_codegen *p,
   brw_set_default_saturate(p, inst->saturate);
   brw_set_default_mask_control(p, inst->force_writemask_all);
   brw_set_default_acc_write_control(p, inst->writes_accumulator);
+  brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
 
   assert(inst->base_mrf + inst->mlen <= BRW_MAX_MRF(devinfo->gen));
   assert(inst->mlen <= BRW_MAX_MSG_LENGTH);
-- 
2.7.4


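The "cvt(inst->exec_size) - 1" expression in the patch above turns a channel
count into the encoded execution-size field. A sketch of that mapping,
assuming cvt() maps 1, 2, 4, 8, 16, 32 to 1..6 and that the encoded field is
the base-2 logarithm of the channel count; toy_cvt() and
toy_exec_size_encoding() are hypothetical stand-ins, not the driver's
functions.

```cpp
// Assumed behaviour of cvt(): 0 for 0, otherwise log2(val) + 1 for the
// power-of-two sizes the hardware supports.
inline unsigned toy_cvt(unsigned val)
{
   switch (val) {
   case 0:  return 0;
   case 1:  return 1;
   case 2:  return 2;
   case 4:  return 3;
   case 8:  return 4;
   case 16: return 5;
   case 32: return 6;
   }
   return 0;
}

// Encoded execution size for a given number of channels (must be a
// supported non-zero power of two for the result to be meaningful).
inline unsigned toy_exec_size_encoding(unsigned exec_size)
{
   return toy_cvt(exec_size) - 1;
}
```

Under these assumptions the default of 8 channels encodes as 3, and a
4-channel instruction produced by a later splitting pass encodes as 2.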

[Mesa-dev] [PATCH v2 051/103] i965/vec4: teach cmod propagation about different execution sizes

2016-10-11 Thread Iago Toral Quiroga
We can't propagate the conditional modifier from one instruction to
another of a different execution size / group, since that would change
the channels affected by the conditional.
---
 src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
index c531fba..4454cdb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cmod_propagation.cpp
@@ -76,7 +76,9 @@ opt_cmod_propagation_local(bblock_t *block)
  scan_inst->dst.writemask != WRITEMASK_XYZW) ||
 (scan_inst->dst.writemask == WRITEMASK_XYZW &&
  inst->src[0].swizzle != BRW_SWIZZLE_XYZW) ||
-(inst->dst.writemask & ~scan_inst->dst.writemask) != 0) {
+(inst->dst.writemask & ~scan_inst->dst.writemask) != 0 ||
+scan_inst->exec_size != inst->exec_size ||
+scan_inst->group != inst->group) {
break;
 }
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 074/103] i965/vec4: Do not use DepCtrl with 64-bit instructions

2016-10-11 Thread Iago Toral Quiroga
The BDW PRM says that it is not supported, but it seems that gen7 is also
affected, since doing DepCtrl on double-float instructions leads to
GPU hangs in some cases, which is probably not surprising knowing that
this is not supported in new hardware iterations. The SKL PRMs do not
mention this restriction, so it is probably fine.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7f6acc3..f60334f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -908,12 +908,16 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst)
(reg.type == BRW_REGISTER_TYPE_UD || \
 reg.type == BRW_REGISTER_TYPE_D)
 
+#define IS_64BIT(reg) (reg.file != BAD_FILE && type_sz(reg.type) == 8)
+
/* From the Cherryview and Broadwell PRMs:
 *
 * "When source or destination datatype is 64b or operation is integer DWord
 * multiply, DepCtrl must not be used."
 *
-* SKL PRMs don't include this restriction though.
+* SKL PRMs don't include this restriction, however, gen7 seems to be
+* affected, at least by the 64b restriction, since DepCtrl with double
+* precision instructions seems to produce GPU hangs in some cases.
 */
if (devinfo->gen == 8) {
   if (inst->opcode == BRW_OPCODE_MUL &&
@@ -921,6 +925,14 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction 
*inst)
  IS_DWORD(inst->src[1]))
  return true;
}
+
+   if (devinfo->gen >= 7 && devinfo->gen <= 8) {
+  if (IS_64BIT(inst->dst) || IS_64BIT(inst->src[0]) ||
+  IS_64BIT(inst->src[1]) || IS_64BIT(inst->src[2]))
+  return true;
+   }
+
+#undef IS_64BIT
 #undef IS_DWORD
 
if (devinfo->gen >= 8) {
-- 
2.7.4
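
For readers following along, the new check can be sketched in isolation as below. This is only a minimal illustration: `Reg`, `Inst`, `is_64bit` and `dep_ctrl_unsafe_for_64bit` are hypothetical stand-ins, not the real i965 types, and the DWord-multiply restriction from the same function is left out.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the i965 register and instruction
// types; they are not the real Mesa definitions.
enum RegFile { BAD_FILE, VGRF };

struct Reg {
   RegFile file;
   unsigned type_size;   // bytes per component
};

struct Inst {
   Reg dst;
   Reg src[3];
};

// Mirrors the IS_64BIT macro from the patch: unused operands never count.
bool is_64bit(const Reg &r)
{
   return r.file != BAD_FILE && r.type_size == 8;
}

// DepCtrl is treated as unsafe on gen7 and gen8 when any operand is 64-bit.
bool dep_ctrl_unsafe_for_64bit(int gen, const Inst &inst)
{
   assert(gen > 0);
   if (gen < 7 || gen > 8)
      return false;

   return is_64bit(inst.dst) || is_64bit(inst.src[0]) ||
          is_64bit(inst.src[1]) || is_64bit(inst.src[2]);
}
```

Note that an operand in BAD_FILE is ignored even if its type happens to be 8 bytes wide, which is why the macro tests the file first.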



[Mesa-dev] [PATCH v2 019/103] i965/vec4: Fix DCE for VEC4_OPCODE_SET_{LOW, HIGH}_32BIT

2016-10-11 Thread Iago Toral Quiroga
These align1 opcodes do partial writes of 64-bit data. We use both of them
on the same register to implement packDouble2x32. From the point of view of
DCE, both opcodes write to the same register, so it considers only the last
one live and eliminates the first, which is not correct. Prevent this from
happening.

v2: Make a helper in vec4_instruction to know if the instruction is an
align1 partial write. This will come in handy when we implement a
simd splitting pass in a later patch.
---
 src/mesa/drivers/dri/i965/brw_ir_vec4.h| 6 ++
 src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp | 3 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index a8e5f4a..7451f44 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -232,6 +232,12 @@ public:
bool can_change_types() const;
bool has_source_and_destination_hazard() const;
 
+   bool is_align1_partial_write()
+   {
+  return opcode == VEC4_OPCODE_SET_LOW_32BIT ||
+ opcode == VEC4_OPCODE_SET_HIGH_32BIT;
+   }
+
bool reads_flag()
{
   return predicate || opcode == VS_OPCODE_UNPACK_FLAGS_SIMD4X2;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
index 50706a9..950c6c8 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
@@ -109,7 +109,8 @@ vec4_visitor::dead_code_eliminate()
 }
  }
 
- if (inst->dst.file == VGRF && !inst->predicate) {
+ if (inst->dst.file == VGRF && !inst->predicate &&
+ !inst->is_align1_partial_write()) {
 for (unsigned i = 0; i < regs_written(inst); i++) {
for (int c = 0; c < 4; c++) {
   if (inst->dst.writemask & (1 << c)) {
-- 
2.7.4
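
The failure mode described above can be reproduced with a toy backward liveness walk. This is a sketch under simplifying assumptions (one block, whole-register liveness, a hypothetical `Inst` struct); the real pass tracks liveness per component and also handles flags.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature IR: each instruction writes `dst` and optionally
// reads `src` (-1 means no source).
struct Inst {
   int dst;
   int src;
   bool partial_write;   // e.g. SET_LOW_32BIT / SET_HIGH_32BIT
   bool dead;
};

// Backward walk over one block. `live` holds the registers that are read
// after the block. Returns how many instructions were marked dead.
int eliminate_dead(std::vector<Inst> &insts, std::vector<bool> live,
                   bool honor_partial_writes)
{
   int removed = 0;

   for (auto it = insts.rbegin(); it != insts.rend(); ++it) {
      assert(it->dst >= 0 && (unsigned) it->dst < live.size());

      if (!live[it->dst]) {
         it->dead = true;
         removed++;
         continue;
      }

      // Only a full write kills the previous value of the register. A
      // partial write leaves the rest of the old contents visible.
      if (!(honor_partial_writes && it->partial_write))
         live[it->dst] = false;

      if (it->src >= 0)
         live[it->src] = true;
   }

   return removed;
}
```

With two partial writes to the same live register, passing `honor_partial_writes = false` drops the first write, while `true` keeps both, which is the behaviour the patch is after.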



[Mesa-dev] [PATCH v2 077/103] i965/vec4: fix scratch reads for 64bit data

2016-10-11 Thread Iago Toral Quiroga
v2: Setup for a 64-bit scratch read by checking the type size of the
correct register (Iago)
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 5d47f83..44e6709 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1499,7 +1499,17 @@ vec4_visitor::emit_scratch_read(bblock_t *block, 
vec4_instruction *inst,
src_reg index = get_scratch_offset(block, inst, orig_src.reladdr,
   reg_offset);
 
-   emit_before(block, inst, SCRATCH_READ(temp, index));
+   if (type_sz(orig_src.type) < 8) {
+  emit_before(block, inst, SCRATCH_READ(temp, index));
+   } else {
+  dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type);
+  dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F);
+  emit_before(block, inst, SCRATCH_READ(shuffled_float, index));
+  index = get_scratch_offset(block, inst, orig_src.reladdr, reg_offset + 1);
+  vec4_instruction *last_read = SCRATCH_READ(offset(shuffled_float, 1), index);
+  emit_before(block, inst, last_read);
+  shuffle_64bit_data(temp, src_reg(shuffled), false, block, last_read);
+   }
 }
 
 /**
@@ -1565,7 +1575,8 @@ vec4_visitor::emit_resolve_reladdr(int scratch_loc[], 
bblock_t *block,
 
/* Now handle scratch access on src */
if (src.file == VGRF && scratch_loc[src.nr] != -1) {
-  dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+  dst_reg temp = dst_reg(this, type_sz(src.type) == 8 ?
+ glsl_type::dvec4_type : glsl_type::vec4_type);
   emit_scratch_read(block, inst, temp, src, scratch_loc[src.nr]);
   src.nr = temp.nr;
   src.offset %= REG_SIZE;
-- 
2.7.4



[Mesa-dev] [PATCH v2 033/103] i965/vec4: implement d2b

2016-10-11 Thread Iago Toral Quiroga
v2 (Curro):
  - Generate the flag register with a predicated MOV instead of a CMP
instruction, which has the benefit that we can skip loading a DF
0.0 constant.
  - Avoid the PICK_LOW_32BIT + MOV by using the flag result and a
SEL to set the boolean result.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index cc10247..69f11ff 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1547,6 +1547,24 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   emit(CMP(dst, op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ));
   break;
 
+   case nir_op_d2b: {
+  /* We use a predicated MOV to check if the provided value is 0.0. We want
+   * this to flush denormalized numbers to zero, so we set a source modifier
+   * on the source operand to trigger this, as source modifiers don't
+   * affect the result of the testing against 0.0.
+   */
+  src_reg value = op[0];
+  value.abs = true;
+  vec4_instruction *inst = emit(MOV(dst_null_df(), value));
+  inst->conditional_mod = BRW_CONDITIONAL_NZ;
+
+  src_reg one = src_reg(this, glsl_type::ivec4_type);
+  emit(MOV(dst_reg(one), brw_imm_d(~0)));
+  inst = emit(BRW_OPCODE_SEL, dst, one, brw_imm_d(0));
+  inst->predicate = BRW_PREDICATE_NORMAL;
+  break;
+   }
+
case nir_op_i2b:
   emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ));
   break;
-- 
2.7.4



[Mesa-dev] [PATCH v2 053/103] i965/vec4: add a scalarization pass for double-precision instructions

2016-10-11 Thread Iago Toral Quiroga
The hardware only supports 32-bit swizzles, which means that we can
directly access only channels XY of a DF. This makes access to channels ZW
more difficult, especially considering the various regioning restrictions
imposed by the hardware. The combination of both things makes handling
random swizzles on DF operands rather difficult, as there are many
combinations that can't be represented at all, at least not without
some work and some level of instruction splitting depending on the case.

Writemasks are 64-bit in general. However, XY and ZW writemasks also work
in 32-bit, which means these writemasks can't be represented natively,
adding to the complexity.

For now, we decided to try and simplify things as much as possible to
avoid dealing with all this from the get go by adding a scalarization
pass that runs after the main optimization loop. By fully scalarizing
DF instructions in align16 we avoid most of the complexity introduced
by the aforementioned hardware restrictions and we have an easier path
to an initial fully functional version for the vector backend in Haswell
and IvyBridge.

Later, we can improve the implementation so we don't necessarily
scalarize everything, iteratively adding more complexity and building
on top of a framework that is already working. Curro drafted some ideas
for how this could be done here:
https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82

v2:
  - Use a copy constructor for the scalar instructions so we copy all
relevant instruction fields from the original instruction.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 91 ++
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 2 files changed, 92 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 48816be..b15fcee 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2137,6 +2137,95 @@ vec4_visitor::lower_simd_width()
return progress;
 }
 
+static bool
+is_align1_df(vec4_instruction *inst)
+{
+   switch (inst->opcode) {
+  case VEC4_OPCODE_DOUBLE_TO_SINGLE:
+  case VEC4_OPCODE_SINGLE_TO_DOUBLE:
+  case VEC4_OPCODE_PICK_LOW_32BIT:
+  case VEC4_OPCODE_PICK_HIGH_32BIT:
+  case VEC4_OPCODE_SET_LOW_32BIT:
+  case VEC4_OPCODE_SET_HIGH_32BIT:
+ return true;
+  default:
+ return false;
+   }
+}
+
+static brw_predicate
+scalarize_predicate(brw_predicate predicate, unsigned writemask)
+{
+   if (predicate != BRW_PREDICATE_NORMAL)
+  return predicate;
+
+   switch (writemask) {
+   case WRITEMASK_X:
+  return BRW_PREDICATE_ALIGN16_REPLICATE_X;
+   case WRITEMASK_Y:
+  return BRW_PREDICATE_ALIGN16_REPLICATE_Y;
+   case WRITEMASK_Z:
+  return BRW_PREDICATE_ALIGN16_REPLICATE_Z;
+   case WRITEMASK_W:
+  return BRW_PREDICATE_ALIGN16_REPLICATE_W;
+   default:
+  unreachable("invalid writemask");
+   }
+}
+
+bool
+vec4_visitor::scalarize_df()
+{
+   bool progress = false;
+
+   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
+  /* Skip DF instructions that operate in Align1 mode */
+  if (is_align1_df(inst))
+ continue;
+
+  /* Check if this is a double-precision instruction */
+  bool is_double = type_sz(inst->dst.type) == 8;
+  for (int arg = 0; !is_double && arg < 3; arg++) {
+ is_double = inst->src[arg].file != BAD_FILE &&
+ type_sz(inst->src[arg].type) == 8;
+  }
+
+  if (!is_double)
+ continue;
+
+  /* Generate scalar instructions for each enabled channel */
+  for (unsigned chan = 0; chan < 4; chan++) {
+ unsigned chan_mask = 1 << chan;
+ if (!(inst->dst.writemask & chan_mask))
+continue;
+
+ vec4_instruction *scalar_inst = new(mem_ctx) vec4_instruction(*inst);
+
+ for (unsigned i = 0; i < 3; i++) {
+unsigned swz = BRW_GET_SWZ(inst->src[i].swizzle, chan);
+scalar_inst->src[i].swizzle = BRW_SWIZZLE4(swz, swz, swz, swz);
+ }
+
+ scalar_inst->dst.writemask = chan_mask;
+
+ if (inst->predicate != BRW_PREDICATE_NONE) {
+scalar_inst->predicate =
+   scalarize_predicate(inst->predicate, chan_mask);
+ }
+
+ inst->insert_before(block, scalar_inst);
+  }
+
+  inst->remove(block);
+  progress = true;
+   }
+
+   if (progress)
+  invalidate_live_intervals();
+
+   return progress;
+}
+
 bool
 vec4_visitor::run()
 {
@@ -2236,6 +2325,8 @@ vec4_visitor::run()
if (failed)
   return false;
 
+   OPT(scalarize_df);
+
setup_payload();
 
if (unlikely(INTEL_DEBUG & DEBUG_SPILL_VEC4)) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 3f7045e..03c7345 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -163,6 +163,7 @@ public:
void convert_to_hw_regs();
 
bool lower_simd_width();
+   bool scal
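
The per-channel split performed by the pass can be illustrated on its own. The sketch below assumes the usual two-bits-per-channel swizzle encoding with X in the low bits; `swizzle4`, `get_swz`, `ScalarOp` and `scalarize` are hypothetical names for illustration, not the Mesa macros or types.

```cpp
#include <cassert>
#include <vector>

// Assumed swizzle encoding: two bits per channel, X in the lowest bits.
constexpr unsigned swizzle4(unsigned x, unsigned y, unsigned z, unsigned w)
{
   return x | (y << 2) | (z << 4) | (w << 6);
}

constexpr unsigned get_swz(unsigned swizzle, unsigned chan)
{
   return (swizzle >> (chan * 2)) & 0x3;
}

// One scalar instruction produced by the split: a single-channel writemask
// and a source swizzle that replicates the selected component.
struct ScalarOp {
   unsigned writemask;
   unsigned src_swizzle;
};

// Emit one scalar op per enabled destination channel, as the pass does.
std::vector<ScalarOp> scalarize(unsigned writemask, unsigned src_swizzle)
{
   assert(writemask <= 0xf);
   std::vector<ScalarOp> out;

   for (unsigned chan = 0; chan < 4; chan++) {
      const unsigned chan_mask = 1u << chan;
      if (!(writemask & chan_mask))
         continue;

      // The component this channel reads, replicated into all four slots.
      const unsigned swz = get_swz(src_swizzle, chan);
      out.push_back({ chan_mask, swizzle4(swz, swz, swz, swz) });
   }

   return out;
}
```

For example, a write to channels X and Z with a WZYX source swizzle becomes two ops: one writing X from a replicated W, and one writing Z from a replicated Y.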

[Mesa-dev] [PATCH v2 097/103] i965/vec4: run scalarize_df() after spilling

2016-10-11 Thread Iago Toral Quiroga
Spilling of 64-bit data requires data shuffling for the corresponding
scratch read/write messages. This produces unsupported swizzle regions
and writemasks that we need to scalarize.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index edb8a84..29ac2d6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2632,6 +2632,12 @@ vec4_visitor::run()
   return false;
 
OPT(translate_64bit_mad_to_mul_add);
+
+   /* Run this before payload setup because tessellation shaders
+* rely on it to prevent cross dvec2 regioning on DF attributes
+* that are set up so that XY are in the second half of a register and
+* ZW are in the first half of the next.
+*/
OPT(scalarize_df);
 
setup_payload();
@@ -2647,6 +2653,12 @@ vec4_visitor::run()
 continue;
  spill_reg(i);
   }
+
+  /* We want to run this after spilling because 64-bit (un)spills need to
+   * emit code to shuffle 64-bit data for the 32-bit scratch read/write
+   * messages that can produce unsupported 64-bit swizzle regions.
+   */
+  OPT(scalarize_df);
}
 
bool allocated_without_spills = reg_allocate();
@@ -2662,6 +2674,12 @@ vec4_visitor::run()
  if (failed)
 return false;
   }
+
+  /* We want to run this after spilling because 64-bit (un)spills need to
+   * emit code to shuffle 64-bit data for the 32-bit scratch read/write
+   * messages that can produce unsupported 64-bit swizzle regions.
+   */
+  OPT(scalarize_df);
}
 
opt_schedule_instructions();
-- 
2.7.4



[Mesa-dev] [PATCH v2 039/103] i965/vec4: fix size_written for doubles

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 619e010..4e7515c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -46,7 +46,6 @@ vec4_instruction::vec4_instruction(enum opcode opcode, const 
dst_reg &dst,
this->predicate = BRW_PREDICATE_NONE;
this->predicate_inverse = false;
this->target = 0;
-   this->size_written = (dst.file == BAD_FILE ? 0 : REG_SIZE);
this->shadow_compare = false;
this->ir = NULL;
this->urb_write_flags = BRW_URB_WRITE_NO_FLAGS;
@@ -56,6 +55,8 @@ vec4_instruction::vec4_instruction(enum opcode opcode, const 
dst_reg &dst,
this->base_mrf = 0;
this->offset = 0;
this->exec_size = 8;
+   this->size_written = (dst.file == BAD_FILE ?
+ 0 : this->exec_size * type_sz(dst.type));
this->annotation = NULL;
 }
 
-- 
2.7.4
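
As a quick sanity check of the new formula, here is the arithmetic worked through in a standalone sketch. It assumes a 32-byte register (the value of REG_SIZE in this driver); `size_written` below is an illustrative free function, not the constructor code itself.

```cpp
#include <cassert>

// Assumed register size in bytes.
constexpr unsigned REG_SIZE = 32;

// Bytes written by an instruction: zero without a destination, otherwise
// one element of the destination type per execution channel.
unsigned size_written(bool has_dst, unsigned exec_size, unsigned type_size)
{
   assert(exec_size > 0 && type_size > 0);
   return has_dst ? exec_size * type_size : 0;
}
```

With an exec_size of 8, a 4-byte type still writes exactly one register (32 bytes), while an 8-byte type writes two (64 bytes), which the old hard-coded REG_SIZE missed.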



[Mesa-dev] [PATCH v2 016/103] i965/vec4: add dst_null_df()

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 1505ba6..86e58f3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -79,6 +79,11 @@ public:
   return dst_reg(brw_null_reg());
}
 
+   dst_reg dst_null_df()
+   {
+  return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_DF));
+   }
+
dst_reg dst_null_d()
{
   return dst_reg(retype(brw_null_reg(), BRW_REGISTER_TYPE_D));
-- 
2.7.4



[Mesa-dev] [PATCH v2 084/103] i965/vec4: fix attribute setup for doubles

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 426faf0..56a46ad 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1659,12 +1659,19 @@ vec4_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
 
 
 static inline struct brw_reg
-attribute_to_hw_reg(int attr, bool interleaved)
+attribute_to_hw_reg(int attr, brw_reg_type type, bool interleaved)
 {
-   if (interleaved)
-  return stride(brw_vec4_grf(attr / 2, (attr % 2) * 4), 0, 4, 1);
-   else
-  return brw_vec8_grf(attr, 0);
+   struct brw_reg reg;
+
+   unsigned width = REG_SIZE / 2 / MAX2(4, type_sz(type));
+   if (interleaved) {
+  reg = stride(brw_vecn_grf(width, attr / 2, (attr % 2) * 4), 0, width, 1);
+   } else {
+  reg = brw_vecn_grf(width, attr, 0);
+   }
+
+   reg.type = type;
+   return reg;
 }
 
 
@@ -1698,9 +1705,9 @@ vec4_visitor::lower_attributes_to_hw_regs(const int 
*attribute_map,
   */
  assert(grf != 0);
 
- struct brw_reg reg = attribute_to_hw_reg(grf, interleaved);
+ struct brw_reg reg =
+attribute_to_hw_reg(grf, inst->src[i].type, interleaved);
  reg.swizzle = inst->src[i].swizzle;
- reg.type = inst->src[i].type;
  if (inst->src[i].abs)
 reg = brw_abs(reg);
  if (inst->src[i].negate)
-- 
2.7.4
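
The width computation in attribute_to_hw_reg() is easy to check by hand. The sketch below reproduces just that expression, assuming a 32-byte register; `attribute_width` is a hypothetical helper name used only here.

```cpp
#include <algorithm>
#include <cassert>

// Assumed register size in bytes.
constexpr unsigned REG_SIZE = 32;

// Elements per half register for a given type size, with the type size
// clamped to at least 4 bytes as in the patch.
unsigned attribute_width(unsigned type_size)
{
   assert(type_size > 0);
   return REG_SIZE / 2 / std::max(4u, type_size);
}
```

This gives a width of 4 for 32-bit types, matching the previous hard-coded vec4 region, and a width of 2 for doubles.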



[Mesa-dev] [PATCH v2 037/103] i965/vec4: use the new helper function to create double immediates

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez 

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 4d5fa96..1da7c85 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -391,7 +391,7 @@ vec4_visitor::nir_emit_load_const(nir_load_const_instr 
*instr)
 
   reg.writemask = writemask;
   if (instr->def.bit_size == 64) {
- emit(MOV(reg, brw_imm_df(instr->value.f64[i])));
+ emit(MOV(reg, setup_imm_df(instr->value.f64[i])));
   } else {
  emit(MOV(reg, brw_imm_d(instr->value.i32[i])));
   }
-- 
2.7.4



[Mesa-dev] [PATCH v2 047/103] i965/vec4: make the generator set correct NibCtrl for SIMD4 DF instructions

2016-10-11 Thread Iago Toral Quiroga
From the HSW PRM, Command Reference, QtrCtrl:

   "NibCtrl is only allowed for SIMD4 instructions with a DF (Double Float)
source or destination type."

v2: Assert that the type is DF (Samuel)
v3: Don't set the default group to 0 and then set it only for 4-wide
instructions. Instead, assert that exec size and group are always
a correct match and then always set the default group from the
instruction. (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index e4e2742..33071f2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1500,6 +1500,15 @@ generate_code(struct brw_codegen *p,
   brw_set_default_acc_write_control(p, inst->writes_accumulator);
   brw_set_default_exec_size(p, cvt(inst->exec_size) - 1);
 
+  assert(inst->group % inst->exec_size == 0);
+  assert(inst->group % 8 == 0 ||
+ inst->dst.type == BRW_REGISTER_TYPE_DF ||
+ inst->src[0].type == BRW_REGISTER_TYPE_DF ||
+ inst->src[1].type == BRW_REGISTER_TYPE_DF ||
+ inst->src[2].type == BRW_REGISTER_TYPE_DF);
+  if (!inst->force_writemask_all)
+ brw_set_default_group(p, inst->group);
+
   assert(inst->base_mrf + inst->mlen <= BRW_MAX_MRF(devinfo->gen));
   assert(inst->mlen <= BRW_MAX_MSG_LENGTH);
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 024/103] i965/vec4: fix base offset for nir_registers with doubles

2016-10-11 Thread Iago Toral Quiroga
v2: do this inside dst_reg_for_nir_reg() instead of its callers
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 815082e..860ec51 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -259,6 +259,8 @@ dst_reg_for_nir_reg(vec4_visitor *v, nir_register *nir_reg,
dst_reg reg;
 
reg = v->nir_locals[nir_reg->index];
+   if (nir_reg->bit_size == 64)
+  base_offset *= 2;
reg = offset(reg, base_offset);
if (indirect) {
   reg.reladdr =
-- 
2.7.4



[Mesa-dev] [PATCH v2 009/103] i965/vec4: add double/float conversion pseudo-opcodes

2016-10-11 Thread Iago Toral Quiroga
These need to be emitted as align1 MOVs, since they need to have a
stride of 2 on the float register (whether src or dest) so that data
from another thread doesn't cross the middle of a SIMD8 register.

v2 (Iago):
- The float-to-double needs to align 32-bit data to 64-bit before doing the
conversion. This was doable in align16 when we tried to use an execsize
of 4, but with an execsize of 8 we would need another align1 opcode to do
that (since we need data to cross the middle of a SIMD register). Just
making the opcode handle this internally seems more practical than adding
another opcode just for this purpose and having the caller know about this
before converting.
- The double-to-float conversion produces 32-bit elements aligned to 64-bit
so we make the opcode re-pack the result to 32-bit and fit in one register,
as expected by SIMD4x2 operation. This still requires that callers reserve
two registers for the float data destination because we need to produce
64-bit aligned data first, and repack it later on the same destination
register, but it saves the need for a re-pack opcode only to achieve this
making the operation complete in a single opcode. Hopefully that is worth
the weirdness of the double register allocation...

Signed-off-by: Connor Abbott 
Signed-off-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_defines.h  |  2 ++
 src/mesa/drivers/dri/i965/brw_shader.cpp |  4 +++
 src/mesa/drivers/dri/i965/brw_vec4.cpp   |  8 +
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 44 
 4 files changed, 58 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
index c4e0f27..79b96a4 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1098,6 +1098,8 @@ enum opcode {
VEC4_OPCODE_MOV_BYTES,
VEC4_OPCODE_PACK_BYTES,
VEC4_OPCODE_UNPACK_UNIFORM,
+   VEC4_OPCODE_DOUBLE_TO_FLOAT,
+   VEC4_OPCODE_FLOAT_TO_DOUBLE,
 
FS_OPCODE_DDX_COARSE,
FS_OPCODE_DDX_FINE,
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
b/src/mesa/drivers/dri/i965/brw_shader.cpp
index ed81563..b063f77 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -317,6 +317,10 @@ brw_instruction_name(const struct gen_device_info 
*devinfo, enum opcode op)
   return "pack_bytes";
case VEC4_OPCODE_UNPACK_UNIFORM:
   return "unpack_uniform";
+   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+  return "double_to_float";
+   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+  return "float_to_double";
 
case FS_OPCODE_DDX_COARSE:
   return "ddx_coarse";
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index c29cfb5..32c04b2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -253,6 +253,8 @@ vec4_instruction::can_do_writemask(const struct 
gen_device_info *devinfo)
 {
switch (opcode) {
case SHADER_OPCODE_GEN4_SCRATCH_READ:
+   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
case VS_OPCODE_PULL_CONSTANT_LOAD:
case VS_OPCODE_PULL_CONSTANT_LOAD_GEN7:
case VS_OPCODE_SET_SIMD4X2_HEADER_GEN9:
@@ -505,6 +507,12 @@ vec4_visitor::opt_reduce_swizzle()
   case BRW_OPCODE_DP2:
  swizzle = brw_swizzle_for_size(2);
  break;
+
+  case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+  case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+ swizzle = brw_swizzle_for_size(4);
+ break;
+
   default:
  swizzle = brw_swizzle_for_mask(inst->dst.writemask);
  break;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 163cf9d..6f4c438 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1896,6 +1896,50 @@ generate_code(struct brw_codegen *p,
  break;
   }
 
+  case VEC4_OPCODE_DOUBLE_TO_FLOAT: {
+ assert(src[0].type == BRW_REGISTER_TYPE_DF);
+ assert(dst.type == BRW_REGISTER_TYPE_F);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+ dst.hstride = BRW_HORIZONTAL_STRIDE_2;
+ dst.width = BRW_WIDTH_4;
+ src[0].vstride = BRW_VERTICAL_STRIDE_4;
+ src[0].width = BRW_WIDTH_4;
+ brw_MOV(p, dst, src[0]);
+
+ struct brw_reg dst_as_src = dst;
+ dst.hstride = BRW_HORIZONTAL_STRIDE_1;
+ dst.width = BRW_WIDTH_8;
+ brw_MOV(p, dst, dst_as_src);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_16);
+ break;
+  }
+
+  case VEC4_OPCODE_FLOAT_TO_DOUBLE: {
+ assert(src[0].type == BRW_REGISTER_TYPE_F);
+ assert(dst.type == BRW_REGISTER_TYPE_DF);
+
+ brw_set_default_access_mode(p, BRW_ALIGN_1);
+
+ struct brw_reg tmp = retype(dst, src[0].type);
+ tmp.hstride = BRW_HORIZONTAL_STRIDE_2;
+ tmp.wid

[Mesa-dev] [PATCH v2 079/103] i965/vec4: fix move_uniform_array_access_to_pull_constant() for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index b0b5f39..f12a114 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1817,8 +1817,23 @@ 
vec4_visitor::move_uniform_array_access_to_pull_constants()
 
   assert(inst->src[0].swizzle == BRW_SWIZZLE_NOOP);
 
-  emit_pull_constant_load(block, inst, inst->dst, inst->src[0],
-  pull_constant_loc[uniform_nr], inst->src[1]);
+  if (type_sz(inst->src[0].type) != 8) {
+ emit_pull_constant_load(block, inst, inst->dst, inst->src[0],
+ pull_constant_loc[uniform_nr], inst->src[1]);
+  } else {
+ dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type);
+ dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F);
+
+ emit_pull_constant_load(block, inst, shuffled_float, inst->src[0],
+ pull_constant_loc[uniform_nr], inst->src[1]);
+ emit_pull_constant_load(block, inst, offset(shuffled_float, 1),
+ offset(inst->src[0], 1),
+ pull_constant_loc[uniform_nr], inst->src[1]);
+
+ shuffle_64bit_data(retype(inst->dst, BRW_REGISTER_TYPE_DF),
+src_reg(shuffled), false, block, inst);
+  }
+
   inst->remove(block);
}
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 063/103] i965/vec4: support multiple dispatch widths and groups in the IR builder.

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_builder.h | 39 ++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_builder.h 
b/src/mesa/drivers/dri/i965/brw_vec4_builder.h
index dab6e03..8352542 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4_builder.h
@@ -52,8 +52,9 @@ namespace brw {
   /**
* Construct a vec4_builder that inserts instructions into \p shader.
*/
-  vec4_builder(backend_shader *shader) :
+  vec4_builder(backend_shader *shader, unsigned dispatch_width = 8) :
  shader(shader), block(NULL), cursor(NULL),
+ _dispatch_width(dispatch_width), _group(0),
  force_writemask_all(false),
  annotation()
   {
@@ -67,6 +68,7 @@ namespace brw {
*/
   vec4_builder(backend_shader *shader, bblock_t *block, instruction *inst) :
  shader(shader), block(block), cursor(inst),
+ _dispatch_width(inst->exec_size), _group(inst->group),
  force_writemask_all(inst->force_writemask_all)
   {
  annotation.str = inst->annotation;
@@ -99,6 +101,25 @@ namespace brw {
   }
 
   /**
+   * Construct a builder specifying the default SIMD width and group of
+   * channel enable signals, inheriting other code generation parameters
+   * from this.
+   *
+   * \p n gives the default SIMD width, \p i gives the slot group used for
+   * predication and control flow masking in multiples of \p n channels.
+   */
+  vec4_builder
+  group(unsigned n, unsigned i) const
+  {
+ assert(force_writemask_all ||
+(n <= dispatch_width() && i < dispatch_width() / n));
+ vec4_builder bld = *this;
+ bld._dispatch_width = n;
+ bld._group += i * n;
+ return bld;
+  }
+
+  /**
* Construct a builder with per-channel control flow execution masking
* disabled if \p b is true.  If control flow execution masking is
* already disabled this has no effect.
@@ -130,7 +151,16 @@ namespace brw {
   unsigned
   dispatch_width() const
   {
- return 8;
+ return _dispatch_width;
+  }
+
+  /**
+   * Get the channel group in use.
+   */
+  unsigned
+  group() const
+  {
+ return _group;
   }
 
   /**
@@ -281,7 +311,10 @@ namespace brw {
   instruction *
   emit(instruction *inst) const
   {
+ inst->exec_size = dispatch_width();
+ inst->group = group();
  inst->force_writemask_all = force_writemask_all;
+ inst->size_written = inst->exec_size * type_sz(inst->dst.type);
  inst->annotation = annotation.str;
  inst->ir = annotation.ir;
 
@@ -587,6 +620,8 @@ namespace brw {
   bblock_t *block;
   exec_node *cursor;
 
+  unsigned _dispatch_width;
+  unsigned _group;
   bool force_writemask_all;
 
   /** Debug annotation info. */
-- 
2.7.4
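
To make the group() arithmetic concrete, here is a stripped-down sketch of a builder that carries only the width and group state. `Builder` and its field names are hypothetical; the real vec4_builder holds much more and emits instructions.

```cpp
#include <cassert>

// Hypothetical cut-down builder: only the state touched by group().
struct Builder {
   unsigned dispatch_width = 8;
   unsigned channel_group = 0;
   bool force_writemask_all = false;

   // Returns a copy that emits n-wide instructions for the i-th group of
   // n channels, relative to the group this builder already targets.
   Builder group(unsigned n, unsigned i) const
   {
      assert(force_writemask_all ||
             (n <= dispatch_width && i < dispatch_width / n));
      Builder bld = *this;
      bld.dispatch_width = n;
      bld.channel_group += i * n;
      return bld;
   }
};
```

Splitting an 8-wide builder with group(4, 0) and group(4, 1) yields two 4-wide builders targeting channels 0..3 and 4..7, which is the shape a SIMD-splitting pass needs.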



[Mesa-dev] [PATCH v2 093/103] i965/vec4: split instructions that read 64-bit interleaved attributes

2016-10-11 Thread Iago Toral Quiroga
Stages that use interleaved attributes generate regions with a vstride=0
that can hit the gen7 hardware decompression bug.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 33a8c52..9b9bef1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2033,6 +2033,20 @@ vec4_visitor::convert_to_hw_regs()
}
 }
 
+bool
+stage_uses_interleaved_attributes(unsigned stage,
+  enum shader_dispatch_mode dispatch_mode)
+{
+   switch (stage) {
+  case MESA_SHADER_TESS_EVAL:
+ return true;
+  case MESA_SHADER_GEOMETRY:
+ return dispatch_mode != DISPATCH_MODE_4X2_DUAL_OBJECT;
+  default:
+ return false;
+   }
+}
+
 /**
  * Get the closest native SIMD width supported by the hardware for instruction
  * \p inst.  The instruction will be left untouched by
@@ -2041,7 +2055,8 @@ vec4_visitor::convert_to_hw_regs()
  */
 static unsigned
 get_lowered_simd_width(const struct gen_device_info *devinfo,
-   const vec4_instruction *inst)
+   enum shader_dispatch_mode dispatch_mode,
+   unsigned stage, const vec4_instruction *inst)
 {
/* Do not split some instructions that require special handling */
switch (inst->opcode) {
@@ -2076,6 +2091,14 @@ get_lowered_simd_width(const struct gen_device_info 
*devinfo,
 continue;
  if (inst->size_read(i) <= REG_SIZE)
 lowered_width = MIN2(lowered_width, 4);
+
+ /* Interleaved attribute setups use a vertical stride of 0, which
+  * makes them hit the associated instruction decompression bug in gen7.
+  * Split them to prevent this.
+  */
+ if (inst->src[i].file == ATTR &&
+ stage_uses_interleaved_attributes(stage, dispatch_mode))
+lowered_width = MIN2(lowered_width, 4);
   }
}
 
@@ -2117,7 +2140,8 @@ vec4_visitor::lower_simd_width()
bool progress = false;
 
foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
-  const unsigned lowered_width = get_lowered_simd_width(devinfo, inst);
+  const unsigned lowered_width =
+ get_lowered_simd_width(devinfo, prog_data->dispatch_mode, stage, inst);
   assert(lowered_width <= inst->exec_size);
   if (lowered_width == inst->exec_size)
  continue;
-- 
2.7.4
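
The decision made by this patch can be sketched separately from the pass. The enums and the `clamp_width_for_attr_read` helper below are hypothetical stand-ins; in the patch the check sits inside the per-source loop of get_lowered_simd_width(), alongside other conditions that this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the Mesa enums; only the cases named in the
// patch are modelled.
enum Stage { STAGE_VERTEX, STAGE_TESS_EVAL, STAGE_GEOMETRY };
enum DispatchMode {
   MODE_4X1_SINGLE,
   MODE_4X2_DUAL_INSTANCE,
   MODE_4X2_DUAL_OBJECT
};

bool uses_interleaved_attributes(Stage stage, DispatchMode mode)
{
   switch (stage) {
   case STAGE_TESS_EVAL:
      return true;
   case STAGE_GEOMETRY:
      return mode != MODE_4X2_DUAL_OBJECT;
   default:
      return false;
   }
}

// Clamp the execution width to 4 when a source reads an interleaved
// attribute, so the vstride=0 region is never used 8-wide.
unsigned clamp_width_for_attr_read(unsigned exec_size, bool reads_attr,
                                   Stage stage, DispatchMode mode)
{
   assert(exec_size == 4 || exec_size == 8);
   unsigned width = exec_size;
   if (reads_attr && uses_interleaved_attributes(stage, mode))
      width = std::min(width, 4u);
   return width;
}
```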



[Mesa-dev] [PATCH v2 075/103] i965/vec4: do not split scratch read/write opcodes

2016-10-11 Thread Iago Toral Quiroga
64-bit scratch reads/writes require shuffling data around, so we need
to have access to the full 64-bit data. We will do the right thing
for these when we emit the messages.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index f60334f..75e47f9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2034,6 +2034,15 @@ static unsigned
 get_lowered_simd_width(const struct gen_device_info *devinfo,
const vec4_instruction *inst)
 {
+   /* Do not split some instructions that require special handling */
+   switch (inst->opcode) {
+   case SHADER_OPCODE_GEN4_SCRATCH_READ:
+   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
+  return inst->exec_size;
+   default:
+  break;
+   }
+
unsigned lowered_width = MIN2(16, inst->exec_size);
 
/* We need to split some cases of double-precision instructions that write
-- 
2.7.4



[Mesa-dev] [PATCH v2 083/103] i965/vec4: fix indentation in lower_attributes_to_hw_regs()

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e732bf4..426faf0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1686,8 +1686,8 @@ vec4_visitor::lower_attributes_to_hw_regs(const int 
*attribute_map,
 {
foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
   for (int i = 0; i < 3; i++) {
-if (inst->src[i].file != ATTR)
-   continue;
+ if (inst->src[i].file != ATTR)
+continue;
 
  int grf = attribute_map[inst->src[i].nr +
  inst->src[i].offset / REG_SIZE];
@@ -1698,13 +1698,13 @@ vec4_visitor::lower_attributes_to_hw_regs(const int 
*attribute_map,
   */
  assert(grf != 0);
 
-struct brw_reg reg = attribute_to_hw_reg(grf, interleaved);
-reg.swizzle = inst->src[i].swizzle;
+ struct brw_reg reg = attribute_to_hw_reg(grf, interleaved);
+ reg.swizzle = inst->src[i].swizzle;
  reg.type = inst->src[i].type;
-if (inst->src[i].abs)
-   reg = brw_abs(reg);
-if (inst->src[i].negate)
-   reg = negate(reg);
+ if (inst->src[i].abs)
+reg = brw_abs(reg);
+ if (inst->src[i].negate)
+reg = negate(reg);
 
  inst->src[i] = reg;
   }
-- 
2.7.4



[Mesa-dev] [PATCH v2 052/103] i965/vec4: split double-precision bcsel

2016-10-11 Thread Iago Toral Quiroga
There is a hardware bug affecting compressed double-precision bcsel
instructions in Align16 mode: they do not read the predication mask
properly. The bug does not affect other predicated instructions,
and it does not affect bcsel in Align1 mode either. This was found
empirically and verified by Curro in the simulator.

Fix this by splitting double-precision bcsel in Align16 mode to use an
execution size of 4.

v2: Check that the dst type is 64-bit, since we can have 16-wide single
precision bcsel instructions that also write 2 registers.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 69fdb1e..48816be 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1997,6 +1997,12 @@ get_lowered_simd_width(const struct gen_device_info 
*devinfo,
 * only hardware that implements fp64 in Align16.
 */
if (devinfo->gen == 7 && inst->size_written > REG_SIZE) {
+  /* Align16 8-wide double-precision bcsel does not work well. Verified
+   * empirically.
+   */
+  if (inst->opcode == BRW_OPCODE_SEL && type_sz(inst->dst.type) == 8)
+ lowered_width = MIN2(lowered_width, 4);
+
   /* HSW PRM, 3D Media GPGPU Engine, Region Alignment Rules for Direct
* Register Addressing:
*
-- 
2.7.4



[Mesa-dev] [PATCH v2 042/103] i965/vec4: dump the instruction execution size

2016-10-11 Thread Iago Toral Quiroga
Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 2bde628..3191eab 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1437,7 +1437,8 @@ vec4_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
   pred_ctrl_align16[inst->predicate]);
}
 
-   fprintf(file, "%s", brw_instruction_name(devinfo, inst->opcode));
+   fprintf(file, "%s(%d)", brw_instruction_name(devinfo, inst->opcode),
+   inst->exec_size);
if (inst->saturate)
   fprintf(file, ".sat");
if (inst->conditional_mod) {
-- 
2.7.4



[Mesa-dev] [PATCH v2 068/103] i965/vec4: don't constant propagate 64-bit immediates

2016-10-11 Thread Iago Toral Quiroga
From: Connor Abbott 

v2: Also check if the instruction source target is 64-bit. (Samuel)

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 49920c2..7b53aed 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -151,6 +151,13 @@ try_constant_propagate(const struct gen_device_info 
*devinfo,
if (value.file != IMM)
   return false;
 
+   /* 64-bit types can't be used except for one-source instructions, which
+* higher levels should have constant folded away, so there's no point in
+* propagating immediates here.
+*/
+   if (type_sz(value.type) == 8 || type_sz(inst->src[arg].type) == 8)
+  return false;
+
if (value.type == BRW_REGISTER_TYPE_VF) {
   /* The result of bit-casting the component values of a vector float
* cannot in general be represented as an immediate.
-- 
2.7.4



[Mesa-dev] [PATCH v2 043/103] i965/vec4: handle 32 and 64 bit channels in liveness analysis

2016-10-11 Thread Iago Toral Quiroga
From: "Juan A. Suarez Romero" 

Our current data flow analysis does not take into account that channels
on 64-bit operands are 64-bit. This is a problem when the same register
is accessed using both 64-bit and 32-bit channels. This is very common
in operations where we need to access 64-bit data in 32-bit chunks,
such as the double packing and unpacking operations.

This patch changes the analysis by checking the bits that each source
or destination datatype needs. Actually, rather than bits, we use
blocks of 32 bits, which is the minimum channel size.

Because a vgrf can contain a dvec4 (256 bits), we reserve 8
32-bit blocks to map the channels.
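
To illustrate the idea (this is only a simplified sketch, not the
var_from_reg() code from the patch; blocks_for_channel is a hypothetical
name and the register is modeled as 8 consecutive 32-bit blocks for a
single vertex), the blocks touched by one channel could be computed as:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: which 32-bit blocks (0..7) of a register does
// channel c of an operand with the given type size occupy?
std::vector<unsigned>
blocks_for_channel(unsigned c, unsigned type_bytes)
{
   assert(c < 4);
   assert(type_bytes == 4 || type_bytes == 8);

   if (type_bytes == 4)
      return { c };              // 32-bit channel: a single block

   return { 2 * c, 2 * c + 1 };  // 64-bit channel: two consecutive blocks
}
```

In this model a 64-bit Y channel covers blocks 2 and 3, which are the
same blocks a 32-bit access would name Z and W. That overlap is what a
per-channel analysis with only 4 entries per register cannot express.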

v2 (Curro):
  - Simplify code by making the var_from_reg helpers take an extra
argument with the register component we want.
  - Fix a couple of cases where we had to update the code to the new
way of representing live variables.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp |  2 +-
 .../dri/i965/brw_vec4_dead_code_eliminate.cpp  | 25 +
 .../drivers/dri/i965/brw_vec4_live_variables.cpp   | 32 +++---
 .../drivers/dri/i965/brw_vec4_live_variables.h | 15 ++
 5 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 3191eab..34cab04 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1140,7 +1140,7 @@ vec4_visitor::opt_register_coalesce()
   /* Can't coalesce this GRF if someone else was going to
* read it later.
*/
-  if (var_range_end(var_from_reg(alloc, dst_reg(inst->src[0])), 4) > ip)
+  if (var_range_end(var_from_reg(alloc, dst_reg(inst->src[0])), 8) > ip)
 continue;
 
   /* We need to check interference with the final destination between this
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index 1b91db9..bef897a 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
@@ -246,7 +246,7 @@ vec4_visitor::opt_cse_local(bblock_t *block)
  * more -- a sure sign they'll fail operands_match().
  */
 if (src->file == VGRF) {
-   if (var_range_end(var_from_reg(alloc, dst_reg(*src)), 4) < ip) {
+   if (var_range_end(var_from_reg(alloc, dst_reg(*src)), 8) < ip) {
   entry->remove();
   ralloc_free(entry);
   break;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
index 950c6c8..6a80810 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_dead_code_eliminate.cpp
@@ -57,12 +57,13 @@ vec4_visitor::dead_code_eliminate()
  if ((inst->dst.file == VGRF && !inst->has_side_effects()) ||
  (inst->dst.is_null() && inst->writes_flag())){
 bool result_live[4] = { false };
-
 if (inst->dst.file == VGRF) {
-   for (unsigned i = 0; i < regs_written(inst); i++) {
-  for (int c = 0; c < 4; c++)
- result_live[c] |= BITSET_TEST(
-live, var_from_reg(alloc, offset(inst->dst, i), c));
+   for (unsigned i = 0; i < 2 * regs_written(inst); i++) {
+  for (int c = 0; c < 4; c++) {
+ const unsigned v =
+var_from_reg(alloc, inst->dst, c, i);
+ result_live[c] |= BITSET_TEST(live, v);
+  }
}
 } else {
for (unsigned c = 0; c < 4; c++)
@@ -111,11 +112,12 @@ vec4_visitor::dead_code_eliminate()
 
  if (inst->dst.file == VGRF && !inst->predicate &&
  !inst->is_align1_partial_write()) {
-for (unsigned i = 0; i < regs_written(inst); i++) {
+for (unsigned i = 0; i < 2 * regs_written(inst); i++) {
for (int c = 0; c < 4; c++) {
   if (inst->dst.writemask & (1 << c)) {
- BITSET_CLEAR(live, var_from_reg(alloc,
- offset(inst->dst, i), c));
+ const unsigned v =
+var_from_reg(alloc, inst->dst, c, i);
+ BITSET_CLEAR(live, v);
   }
}
 }
@@ -133,10 +135,11 @@ vec4_visitor::dead_code_eliminate()
 
  for (int i = 0; i < 3; i++) {
 if (inst->src[i].file == VGRF) {
-   for (unsigned j = 0; j < regs_read(inst, i); j++) {
+   for (unsigned j = 0; j < 2 * regs_read(inst, i); j++) {
   for (int c = 0; c < 4; c++) {
- BITSET_SET(live, var_from_reg(alloc,
-   

[Mesa-dev] [PATCH v2 059/103] i965/vec4: fix indentation in pack_uniform_registers

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b79fd5e..45d49e9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -660,25 +660,25 @@ vec4_visitor::pack_uniform_registers()
   int dst;
   /* Find the lowest place we can slot this uniform in. */
   for (dst = 0; dst < src; dst++) {
-if (chans_used[dst] + size <= 4)
-   break;
+ if (chans_used[dst] + size <= 4)
+break;
   }
 
   if (src == dst) {
-new_loc[src] = dst;
-new_chan[src] = 0;
+ new_loc[src] = dst;
+ new_chan[src] = 0;
   } else {
-new_loc[src] = dst;
-new_chan[src] = chans_used[dst];
+ new_loc[src] = dst;
+ new_chan[src] = chans_used[dst];
 
-/* Move the references to the data */
-for (int j = 0; j < size; j++) {
-   stage_prog_data->param[dst * 4 + new_chan[src] + j] =
-  stage_prog_data->param[src * 4 + j];
-}
+ /* Move the references to the data */
+ for (int j = 0; j < size; j++) {
+stage_prog_data->param[dst * 4 + new_chan[src] + j] =
+   stage_prog_data->param[src * 4 + j];
+ }
 
-chans_used[dst] += size;
-chans_used[src] = 0;
+ chans_used[dst] += size;
+ chans_used[src] = 0;
   }
 
   new_uniform_count = MAX2(new_uniform_count, dst + 1);
@@ -691,8 +691,8 @@ vec4_visitor::pack_uniform_registers()
   for (int i = 0 ; i < 3; i++) {
  int src = inst->src[i].nr;
 
-if (inst->src[i].file != UNIFORM)
-   continue;
+ if (inst->src[i].file != UNIFORM)
+continue;
 
  inst->src[i].nr = new_loc[src];
  inst->src[i].swizzle += BRW_SWIZZLE4(new_chan[src], new_chan[src],
-- 
2.7.4



[Mesa-dev] [PATCH v2 095/103] i965/vec4/scalarize_df: support more swizzles via vstride=0

2016-10-11 Thread Iago Toral Quiroga
By exploiting gen7's hardware decompression bug with vstride=0 we gain the
capacity to support additional swizzle combinations.

This also fixes ZW writes from X/Y channels like in:

mov r2.z:df r0.xxxx:df

Because DF regions use 2-wide rows with a vstride of 2, the region generated
for the source would be r0<2,2,1>.xyxy:DF, which is equivalent to r0.xxzz, so
we end up writing r0.z in r2.z instead of r0.x. Using a vertical stride of 0
in these cases we get to replicate the XX swizzle and write what we want.
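
A rough model of the above, for a single vertex only (a sketch under
assumptions: effective_df_components is a hypothetical name, the 32-bit
swizzle is encoded as two pair selectors where 0 means an "xy" pair and
1 means a "zw" pair, and the second vertex of the SIMD4x2 execution is
ignored):

```cpp
#include <array>
#include <cassert>

// Returns, for each of the 4 destination channels, the logical DF
// component (0=X .. 3=W) that ends up being read from the source.
// sel[i] is 0 for a 32-bit "xy" pair and 1 for a "zw" pair.
std::array<unsigned, 4>
effective_df_components(const std::array<unsigned, 2> &sel, unsigned vstride)
{
   assert(vstride == 0 || vstride == 2);
   assert(sel[0] < 2 && sel[1] < 2);

   std::array<unsigned, 4> out{};
   for (unsigned c = 0; c < 4; c++) {
      const unsigned row = c / 2;           // 2-wide rows of DF elements
      out[c] = row * vstride + sel[c % 2];  // the row start moves by vstride
   }
   return out;
}
```

With sel = {0, 0} (32-bit XYXY) a vstride of 2 yields XXZZ, while a
vstride of 0 yields XXXX, matching the description above.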
---
 src/mesa/drivers/dri/i965/brw_reg.h|  2 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 68 --
 src/mesa/drivers/dri/i965/brw_vec4.h   |  2 +-
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index 39cc25a..f849f42 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -81,11 +81,13 @@ struct gen_device_info;
 #define BRW_SWIZZLE_ZZZZ  BRW_SWIZZLE4(2,2,2,2)
 #define BRW_SWIZZLE_WWWW  BRW_SWIZZLE4(3,3,3,3)
 #define BRW_SWIZZLE_XYXY  BRW_SWIZZLE4(0,1,0,1)
+#define BRW_SWIZZLE_YXYX  BRW_SWIZZLE4(1,0,1,0)
 #define BRW_SWIZZLE_XZXZ  BRW_SWIZZLE4(0,2,0,2)
 #define BRW_SWIZZLE_YZXW  BRW_SWIZZLE4(1,2,0,3)
 #define BRW_SWIZZLE_YWYW  BRW_SWIZZLE4(1,3,1,3)
 #define BRW_SWIZZLE_ZXYW  BRW_SWIZZLE4(2,0,1,3)
 #define BRW_SWIZZLE_ZWZW  BRW_SWIZZLE4(2,3,2,3)
+#define BRW_SWIZZLE_WZWZ  BRW_SWIZZLE4(3,2,3,2)
 #define BRW_SWIZZLE_WZYX  BRW_SWIZZLE4(3,2,1,0)
 #define BRW_SWIZZLE_XXZZ  BRW_SWIZZLE4(0,0,2,2)
 #define BRW_SWIZZLE_YYWW  BRW_SWIZZLE4(1,1,3,3)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 438dce1..d33fb65 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2259,18 +2259,33 @@ scalarize_predicate(brw_predicate predicate, unsigned 
writemask)
}
 }
 
+/* Gen7 has a hardware decompression bug that we can exploit to represent
+ * a handful of additional swizzles natively.
+ */
+static bool
+is_gen7_supported_64bit_swizzle(vec4_instruction *inst, unsigned arg)
+{
+   switch (inst->src[arg].swizzle) {
+   case BRW_SWIZZLE_XXXX:
+   case BRW_SWIZZLE_YYYY:
+   case BRW_SWIZZLE_ZZZZ:
+   case BRW_SWIZZLE_WWWW:
+   case BRW_SWIZZLE_XYXY:
+   case BRW_SWIZZLE_YXYX:
+   case BRW_SWIZZLE_ZWZW:
+   case BRW_SWIZZLE_WZWZ:
+  return true;
+   default:
+  return false;
+   }
+}
+
 /* 64-bit sources use regions with a width of 2. These 2 elements in each row
  * can be addressed using 32-bit swizzles (which is what the hardware supports)
  * but it also means that the swizzle we apply on the first two components of a
  * dvec4 is coupled with the swizzle we use for the last 2. In other words,
  * only some specific swizzle combinations can be natively supported.
  *
- * FIXME: We can also exploit the vstride 0 decompression bug in gen7 to
- *implement some more swizzles via simple translations. For
- *example: XXXX as XYXY, YYYY as ZWZW (same for ZZZZ and WWWW by
- *using subnr), XYXY as XYZW, YXYX as ZWXY (same for ZWZW and
- *WZWZ using subnr).
- *
  * FIXME: we can go an step further and implement even more swizzle
  *variations using only partial scalarization.
  *
@@ -2278,8 +2293,9 @@ scalarize_predicate(brw_predicate predicate, unsigned 
writemask)
  * https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82
  */
 bool
-vec4_visitor::is_supported_64bit_region(src_reg src)
+vec4_visitor::is_supported_64bit_region(vec4_instruction *inst, unsigned arg)
 {
+   const src_reg &src = inst->src[arg];
assert(type_sz(src.type) == 8);
 
/* Uniform regions have a vstride=0. Because we use 2-wide rows with
@@ -2301,7 +2317,7 @@ vec4_visitor::is_supported_64bit_region(src_reg src)
case BRW_SWIZZLE_YXWZ:
   return true;
default:
-  return false;
+  return devinfo->gen == 7 && is_gen7_supported_64bit_swizzle(inst, arg);
}
 }
 
@@ -2340,8 +2356,7 @@ vec4_visitor::scalarize_df()
  for (unsigned i = 0; i < 3; i++) {
 if (inst->src[i].file == BAD_FILE || type_sz(inst->src[i].type) < 8)
continue;
-skip_lowering = skip_lowering &&
-is_supported_64bit_region(inst->src[i]);
+skip_lowering = skip_lowering && is_supported_64bit_region(inst, i);
  }
   }
 
@@ -2455,9 +2470,10 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg 
*hw_reg,
 
/* Take the 64-bit logical swizzle channel and translate it to 32-bit */
assert(brw_is_single_value_swizzle(reg.swizzle) ||
-  is_supported_64bit_region(reg));
+  is_supported_64bit_region(inst, arg));
 
-   if (is_supported_64bit_region(reg)) {
+   if (is_supported_64bit_region(inst, arg) &&
+   !is_gen7_supported_64bit_swizzle(inst, arg)) {
   /* Supported 64-bit swizzles are those such 

[Mesa-dev] [PATCH v2 027/103] i965/vec4: make opt_vector_float ignore doubles

2016-10-11 Thread Iago Toral Quiroga
The pass does not support doubles in its current form.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 06fa38f..675b7fc 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -393,6 +393,7 @@ vec4_visitor::opt_vector_float()
  inst->src[0].file == IMM &&
  inst->predicate == BRW_PREDICATE_NONE &&
  inst->dst.writemask != WRITEMASK_XYZW &&
+ type_sz(inst->src[0].type) < 8 &&
  (inst->src[0].type == inst->dst.type || inst->src[0].d == 0)) {
 
 vf = brw_float_to_vf(inst->src[0].d);
-- 
2.7.4



[Mesa-dev] [PATCH v2 048/103] i965/vec4: dump NibCtrl for instructions with execsize != 8

2016-10-11 Thread Iago Toral Quiroga
v2: do it in the same fashion as the FS backend for consistency (Curro)
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 490cbae..69fdb1e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1612,6 +1612,9 @@ vec4_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
if (inst->force_writemask_all)
   fprintf(file, " NoMask");
 
+   if (inst->exec_size != 8)
+  fprintf(file, " group%d", inst->group);
+
fprintf(file, "\n");
 }
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 030/103] i965/vec4: add helpers for conversions to/from doubles

2016-10-11 Thread Iago Toral Quiroga
Use these helpers to implement d2f and f2d. We will reuse these helpers when
we implement things like d2i or i2d as well.
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  5 +++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 54 +++---
 2 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 86e58f3..0111966 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -317,6 +317,11 @@ public:
 
bool optimize_predicate(nir_alu_instr *instr, enum brw_predicate *predicate);
 
+   void emit_double_to_single(dst_reg dst, src_reg src, bool saturate,
+  brw_reg_type single_type);
+   void emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
+  brw_reg_type single_type);
+
virtual void emit_nir_code();
virtual void nir_setup_uniforms();
virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 502a290..94d0161 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1074,6 +1074,34 @@ emit_find_msb_using_lzd(const vec4_builder &bld,
 }
 
 void
+vec4_visitor::emit_double_to_single(dst_reg dst, src_reg src, bool saturate,
+brw_reg_type single_type)
+{
+   dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
+   emit(MOV(temp, src));
+
+   dst_reg temp2 = dst_reg(this, glsl_type::dvec4_type);
+   temp2 = retype(temp2, single_type);
+   emit(VEC4_OPCODE_DOUBLE_TO_SINGLE, temp2, src_reg(temp))
+  ->size_written = 2 * REG_SIZE;
+
+   vec4_instruction *inst = emit(MOV(dst, src_reg(temp2)));
+   inst->saturate = saturate;
+}
+
+void
+vec4_visitor::emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
+brw_reg_type single_type)
+{
+   dst_reg tmp_dst = dst_reg(src_reg(this, glsl_type::dvec4_type));
+   src_reg tmp_src = retype(src_reg(this, glsl_type::vec4_type), single_type);
+   emit(MOV(dst_reg(tmp_src), retype(src, single_type)));
+   emit(VEC4_OPCODE_SINGLE_TO_DOUBLE, tmp_dst, tmp_src);
+   vec4_instruction *inst = emit(MOV(dst, src_reg(tmp_dst)));
+   inst->saturate = saturate;
+}
+
+void
 vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 {
vec4_instruction *inst;
@@ -1117,29 +1145,15 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   inst = emit(MOV(dst, op[0]));
   break;
 
-   case nir_op_d2f: {
-  dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
-  emit(MOV(temp, op[0]));
-
-  dst_reg temp2 = dst_reg(this, glsl_type::dvec4_type);
-  temp2 = retype(temp2, BRW_REGISTER_TYPE_F);
-  emit(VEC4_OPCODE_DOUBLE_TO_SINGLE, temp2, src_reg(temp))
- ->regs_written = 2 * REG_SIZE;
-
-  vec4_instruction *inst = emit(MOV(dst, src_reg(temp2)));
-  inst->saturate = instr->dest.saturate;
+   case nir_op_d2f:
+  emit_double_to_single(dst, op[0], instr->dest.saturate,
+BRW_REGISTER_TYPE_F);
   break;
-   }
 
-   case nir_op_f2d: {
-  dst_reg tmp_dst = dst_reg(src_reg(this, glsl_type::dvec4_type));
-  src_reg tmp_src = src_reg(this, glsl_type::vec4_type);
-  emit(MOV(dst_reg(tmp_src), retype(op[0], BRW_REGISTER_TYPE_F)));
-  emit(VEC4_OPCODE_SINGLE_TO_DOUBLE, tmp_dst, tmp_src);
-  vec4_instruction *inst = emit(MOV(dst, src_reg(tmp_dst)));
-  inst->saturate = instr->dest.saturate;
+   case nir_op_f2d:
+  emit_single_to_double(dst, op[0], instr->dest.saturate,
+BRW_REGISTER_TYPE_F);
   break;
-   }
 
case nir_op_iadd:
   assert(nir_dest_bit_size(instr->dest.dest) < 64);
-- 
2.7.4



[Mesa-dev] [PATCH v2 064/103] i965/vec4: Add a shuffle_64bit_data helper

2016-10-11 Thread Iago Toral Quiroga
SIMD4x2 64bit data is stored in register space like this:

r0.0:DF  x0 y0 z0 w0
r0.1:DF  x1 y1 z1 w1

When we need to write data such as this to memory using 32-bit write
messages we need to shuffle it in this fashion:

r0.0:DF  x0 y0 x1 y1
r0.1:DF  z0 w0 z1 w1

and emit two 32-bit write messages, one for r0.0 at base_offset
and another one for r0.1 at base_offset+16.

We also need to do the inverse operation when we read using 32-bit messages
to produce valid SIMD4x2 64bit data from the data read. We can achieve this
by applying the exact same shuffling to the data read, although we need to
apply different channel enables since the layout of the data is reversed.

This helper implements the data shuffling logic and we will use it in
various places where we read and write 64bit data from/to memory.
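
The shuffle can be illustrated on plain arrays. This is a minimal
sketch and not the driver code: Reg, RegPair and shuffle_64bit are
hypothetical names, each register is modeled as four 64-bit components,
and the channel enables that differ between the read and write cases
are not modeled at all.

```cpp
#include <array>
#include <cassert>
#include <string>

// One register modeled as four 64-bit components (hypothetical types).
using Reg = std::array<std::string, 4>;
using RegPair = std::array<Reg, 2>;

// dst+0.XY = src+0.XY, dst+0.ZW = src+1.XY,
// dst+1.XY = src+0.ZW, dst+1.ZW = src+1.ZW
RegPair
shuffle_64bit(const RegPair &src)
{
   RegPair dst;
   dst[0] = Reg{ src[0][0], src[0][1], src[1][0], src[1][1] };
   dst[1] = Reg{ src[0][2], src[0][3], src[1][2], src[1][3] };
   return dst;
}
```

In this model the same permutation applied twice gives back the
original layout, which is why a single helper can serve both the read
and the write paths.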

v2 (Curro):
  - Use the writemask helper and don't assert on the original writemask
being XYZW.
  - Use the Vec4 IR builder to simplify the implementation.
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  5 +++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 70 ++
 2 files changed, 75 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 0af55c5..6942918 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -330,6 +330,11 @@ public:
 
src_reg setup_imm_df(double v);
 
+   vec4_instruction *shuffle_64bit_data(dst_reg dst, src_reg src,
+bool for_write,
+bblock_t *block = NULL,
+vec4_instruction *ref = NULL);
+
virtual void emit_nir_code();
virtual void nir_setup_uniforms();
virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 0b8c808..04e95a7 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -2227,4 +2227,74 @@ vec4_visitor::nir_emit_undef(nir_ssa_undef_instr *instr)
   dst_reg(VGRF, alloc.allocate(DIV_ROUND_UP(instr->def.bit_size, 32)));
 }
 
+/* SIMD4x2 64bit data is stored in register space like this:
+ *
+ * r0.0:DF  x0 y0 z0 w0
+ * r0.1:DF  x1 y1 z1 w1
+ *
+ * When we need to write data such as this to memory using 32-bit write
+ * messages we need to shuffle it in this fashion:
+ *
+ * r0.0:DF  x0 y0 x1 y1 (to be written at base offset)
+ * r0.1:DF  z0 w0 z1 w1 (to be written at base offset + 16)
+ *
+ * We need to do the inverse operation when we read using 32-bit messages,
+ * which we can do by applying the same exact shuffling on the 64-bit data
+ * read, only that because the data for each vertex is positioned differently
+ * we need to apply different channel enables.
+ *
+ * This function takes 64bit data and shuffles it as explained above.
+ *
+ * The @for_write parameter is used to specify if the shuffling is being done
+ * for proper SIMD4x2 64-bit data that needs to be shuffled prior to a 32-bit
+ * write message (for_write = true), or instead we are doing the inverse
+ * operation and we have just read 64-bit data using 32-bit messages that we
+ * need to shuffle to create valid SIMD4x2 64-bit data (for_write = false).
+ *
+ * If @block and @ref are non-NULL, then the shuffling is done after @ref,
+ * otherwise the instructions are emitted normally at the end. The function
+ * returns the last instruction inserted.
+ *
+ * Notice that @src and @dst cannot be the same register.
+ */
+vec4_instruction *
+vec4_visitor::shuffle_64bit_data(dst_reg dst, src_reg src, bool for_write,
+ bblock_t *block, vec4_instruction *ref)
+{
+   assert(type_sz(src.type) == 8);
+   assert(type_sz(dst.type) == 8);
+   assert(!regions_overlap(dst, 2 * REG_SIZE, src, 2 * REG_SIZE));
+   assert(!ref == !block);
+
+   const vec4_builder bld = !ref ? vec4_builder(this).at_end() :
+   vec4_builder(this).at(block, ref->next);
+
+   /* Resolve swizzle in src */
+   vec4_instruction *inst;
+   if (src.swizzle != BRW_SWIZZLE_XYZW) {
+  dst_reg data = dst_reg(this, glsl_type::dvec4_type);
+  inst = bld.MOV(data, src);
+  src = src_reg(data);
+   }
+
+   /* dst+0.XY = src+0.XY */
+   inst = bld.group(4, 0).MOV(writemask(dst, WRITEMASK_XY), src);
+
+   /* dst+0.ZW = src+1.XY */
+   inst = bld.group(4, for_write ? 1 : 0).
+MOV(writemask(dst, WRITEMASK_ZW),
+swizzle(offset(src, 1), BRW_SWIZZLE_XYXY));
+
+   /* dst+1.XY = src+0.ZW */
+   inst = bld.group(4, for_write ? 0 : 1).
+MOV(writemask(offset(dst, 1), WRITEMASK_XY),
+swizzle(src, BRW_SWIZZLE_ZWZW));
+
+   /* dst+1.ZW = src+1.ZW */
+   inst = bld.group(4, 1).
+MOV(writemask(offset(dst, 1), WRITEMASK_ZW), offset(src, 1));
+
+   return inst;
+}
+
 }
-- 
2.7.4


[Mesa-dev] [PATCH v2 055/103] i965/vec4: implement access to DF source components Z/W

2016-10-11 Thread Iago Toral Quiroga
The general idea is that with 32-bit swizzles we cannot address DF
components Z/W directly, so instead we select the region that starts
at the 16B offset into the register and use X/Y swizzles.

The above, however, has the caveat that we can't do that without
violating register region restrictions unless we do some sort of
SIMD splitting.

Alternatively, we can accomplish what we need without SIMD splitting
by exploiting the gen7 hardware decompression bug for instructions
with a vstride=0. For example, an instruction like this:

mov(8) r2.x:DF r0.2<0>xyzw:DF

Activates the hardware bug and produces this region:

Component: x0   y0   z0   w0   x1   y1   z1   w1
Register:  r0.2 r0.3 r0.2 r0.3 r1.2 r1.3 r1.2 r1.3

Where r0.2 and r0.3 are r0.z:DF for the first vertex of the SIMD4x2
execution and r1.2 and r1.3 are the same for the second vertex.

Using this to our advantage we can select r0.z:DF by doing
r0.2<0,2,1>.xyxy and r0.w by doing r0.2<0,2,1>.zwzw without needing
to split the instruction.

Of course, this only works for gen7, but that is the only hardware
platform where we implement align16/fp64 at the moment.
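
The translation for a single-value 64-bit swizzle can be sketched as
follows. This is an illustration only: HwSource and
translate_df_component are hypothetical names, and the struct just
records the three things the patch changes (the sub-register offset,
whether a vertical stride of 0 is requested, and the 32-bit swizzle).

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the few brw_reg fields that matter here.
struct HwSource {
   unsigned subnr_bytes;             // byte offset into the register
   bool vstride0;                    // true if vstride=0 is requested
   std::array<unsigned, 4> swz32;    // 32-bit swizzle, 0=x .. 3=w
};

// comp is the logical DF component being read: 0=X, 1=Y, 2=Z, 3=W.
HwSource
translate_df_component(unsigned comp)
{
   assert(comp < 4);
   HwSource hw{ 0, false, {} };

   // Z/W live in the second 16B half, addressed there as X/Y.
   if (comp >= 2) {
      hw.subnr_bytes = 16;
      comp -= 2;
   }

   // A source starting at the 16B offset needs a vertical stride of 0.
   hw.vstride0 = (hw.subnr_bytes % 32 == 16);

   // Each DF component maps to a pair of 32-bit channels.
   hw.swz32 = { comp * 2, comp * 2 + 1, comp * 2, comp * 2 + 1 };
   return hw;
}
```

For example, reading Z gives a 16B offset, vstride=0 and a 32-bit XYXY
swizzle, while reading W gives the same offset with ZWZW.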

v2: Adapted to the fact that we now do this after converting to
hardware registers (Iago)
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b37dd59..c728e38 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2264,7 +2264,28 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg 
*hw_reg,
 */
assert(brw_is_single_value_swizzle(reg.swizzle));
 
+   /* To gain access to Z/W components we need to select the second half
+* of the register and then use a X/Y swizzle to select Z/W respectively.
+*/
unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0);
+
+   if (swizzle >= 2) {
+  *hw_reg = suboffset(*hw_reg, 2);
+  swizzle -= 2;
+   }
+
+   /* Any 64-bit source with an offset at 16B is intended to address the
+* second half of a register and needs a vertical stride of 0 so we:
+*
+* 1. Don't violate register region restrictions.
+* 2. Activate the gen7 instruction decompression bug exploit when
+*execsize > 4
+*/
+   if (hw_reg->subnr % REG_SIZE == 16) {
+  assert(devinfo->gen == 7);
+  hw_reg->vstride = BRW_VERTICAL_STRIDE_0;
+   }
+
hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
   swizzle * 2, swizzle * 2 + 1);
 }
-- 
2.7.4



[Mesa-dev] [PATCH v2 094/103] i965/vec4/scalarize_df: do not scalarize swizzles that we can support natively

2016-10-11 Thread Iago Toral Quiroga
Certain swizzles like XYZW can be supported by translating only the first two
64-bit swizzle channels to 32-bit channels. This happens with swizzles such
that the first two logical components, when translated to 32-bit channels and
replicated across the second dvec2 row, select the same channels specified by
the 3rd and 4th logical swizzle components.
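
One way to read the rule above is as a predicate on the logical swizzle
(a sketch only; is_natively_supported is a hypothetical name and this
is not the switch statement used in the patch): the first two
components must stay within the first dvec2 row, and the last two must
be the same selection shifted to the second row.

```cpp
#include <array>
#include <cassert>

// swz holds the four logical 64-bit swizzle components, 0=X .. 3=W.
bool
is_natively_supported(const std::array<unsigned, 4> &swz)
{
   for (unsigned c : swz)
      assert(c < 4);

   return swz[0] < 2 && swz[1] < 2 &&
          swz[2] == swz[0] + 2 &&
          swz[3] == swz[1] + 2;
}
```

Under this reading XYZW, XXZZ, YYWW and YXWZ pass, while something like
XYXY or WZYX does not and would still need to be scalarized.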

Notice that this opens up the possibility that some instructions are not
scalarized and can end up with XY or ZW 32-bit writemasks. Make sure we always
scalarize in such cases.
---
 src/mesa/drivers/dri/i965/brw_reg.h|   3 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 133 ++---
 src/mesa/drivers/dri/i965/brw_vec4.h   |   1 +
 3 files changed, 112 insertions(+), 25 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h 
b/src/mesa/drivers/dri/i965/brw_reg.h
index 1fa2595..39cc25a 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -87,6 +87,9 @@ struct gen_device_info;
 #define BRW_SWIZZLE_ZXYW  BRW_SWIZZLE4(2,0,1,3)
 #define BRW_SWIZZLE_ZWZW  BRW_SWIZZLE4(2,3,2,3)
 #define BRW_SWIZZLE_WZYX  BRW_SWIZZLE4(3,2,1,0)
+#define BRW_SWIZZLE_XXZZ  BRW_SWIZZLE4(0,0,2,2)
+#define BRW_SWIZZLE_YYWW  BRW_SWIZZLE4(1,1,3,3)
+#define BRW_SWIZZLE_YXWZ  BRW_SWIZZLE4(1,0,3,2)
 
 #define BRW_SWZ_COMP_INPUT(comp) (BRW_SWIZZLE_XYZW >> ((comp)*2))
 #define BRW_SWZ_COMP_OUTPUT(comp) (BRW_SWIZZLE_XYZW << ((comp)*2))
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 9b9bef1..438dce1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2259,6 +2259,52 @@ scalarize_predicate(brw_predicate predicate, unsigned 
writemask)
}
 }
 
+/* 64-bit sources use regions with a width of 2. These 2 elements in each row
+ * can be addressed using 32-bit swizzles (which is what the hardware supports)
+ * but it also means that the swizzle we apply on the first two components of a
+ * dvec4 is coupled with the swizzle we use for the last 2. In other words,
+ * only some specific swizzle combinations can be natively supported.
+ *
+ * FIXME: We can also exploit the vstride 0 decompression bug in gen7 to
+ *implement some more swizzles via simple translations. For
+ *example:  as XYXY,  as ZWZW (same for  and  by
+ *using subnr), XYXY as XYZW, YXYX as ZWXY (same for ZWZW and
+ *WZWZ using subnr).
+ *
+ * FIXME: we can go a step further and implement even more swizzle
+ *variations using only partial scalarization.
+ *
+ * For more details see:
+ * https://bugs.freedesktop.org/show_bug.cgi?id=92760#c82
+ */
+bool
+vec4_visitor::is_supported_64bit_region(src_reg src)
+{
+   assert(type_sz(src.type) == 8);
+
+   /* Uniform regions have a vstride=0. Because we use 2-wide rows with
+* 64-bit regions it means that we cannot access components Z/W, so
+* return false for any such case. Interleaved attributes will also be
+* mapped to GRF registers with a vstride of 0, so apply the same
+* treatment.
+*/
+   if ((is_uniform(src) ||
+(stage_uses_interleaved_attributes(stage, prog_data->dispatch_mode) &&
+ src.file == ATTR)) &&
+   (brw_mask_for_swizzle(src.swizzle) & 12))
+  return false;
+
+   switch (src.swizzle) {
+   case BRW_SWIZZLE_XYZW:
+   case BRW_SWIZZLE_XXZZ:
+   case BRW_SWIZZLE_YYWW:
+   case BRW_SWIZZLE_YXWZ:
+  return true;
+   default:
+  return false;
+   }
+}
+
 bool
 vec4_visitor::scalarize_df()
 {
@@ -2279,6 +2325,29 @@ vec4_visitor::scalarize_df()
   if (!is_double)
  continue;
 
+  /* Skip the lowering for specific regioning scenarios that we can
+   * support natively.
+   */
+  bool skip_lowering = true;
+
+  /* XY and ZW writemasks operate in 32-bit, which means that they don't
+   * have a native 64-bit representation and they should always be split.
+   */
+  if (inst->dst.writemask == WRITEMASK_XY ||
+  inst->dst.writemask == WRITEMASK_ZW) {
+ skip_lowering = false;
+  } else {
+ for (unsigned i = 0; i < 3; i++) {
+if (inst->src[i].file == BAD_FILE || type_sz(inst->src[i].type) < 8)
+   continue;
+skip_lowering = skip_lowering &&
+is_supported_64bit_region(inst->src[i]);
+ }
+  }
+
+  if (skip_lowering)
+ continue;
+
   /* Generate scalar instructions for each enabled channel */
   for (unsigned chan = 0; chan < 4; chan++) {
  unsigned chan_mask = 1 << chan;
@@ -2384,35 +2453,49 @@ vec4_visitor::apply_logical_swizzle(struct brw_reg 
*hw_reg,
   return;
}
 
-   /* Otherwise we should have scalarized the instruction, so take the single
-* 64-bit logical swizzle channel and translate it to 32-bit
-*/
-   assert(brw_is_single_value_swizzle(reg.swizzle));
+   /* Take the 64-bit logical swizzle c

[Mesa-dev] [PATCH v2 069/103] i965/vec4: prevent copy-propagation from values with a different type size

2016-10-11 Thread Iago Toral Quiroga
The swizzles and writemasks involved have a different meaning for each type
size, so replacing the source would lead to different semantics.
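
As a rough illustration of why the type size matters, the sketch below maps a
logical channel to the bytes it covers within one vertex's data. The
channel_bytes() helper and the byte_range struct are hypothetical and not
part of the driver.

```cpp
#include <cassert>

struct byte_range {
   unsigned first;
   unsigned last;
};

// Hypothetical helper: bytes covered by logical channel `chan` (0 = X,
// 1 = Y, ...) when each channel is `type_size` bytes wide.
inline byte_range channel_bytes(unsigned chan, unsigned type_size)
{
   return { chan * type_size, chan * type_size + type_size - 1 };
}
```

With this model, channel Y of a 32-bit value covers bytes 4..7 while channel
Y of a 64-bit value covers bytes 8..15, so carrying a swizzle over from one
type size to the other would select different data.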
---
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 7b53aed..08da96d 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -324,6 +324,13 @@ try_copy_propagate(const struct gen_device_info *devinfo,
value.file != ATTR)
   return false;
 
+   /* If the type of the copy value is different from the type of the
+* instruction then the swizzles and writemasks involved don't have the same
+* meaning and simply replacing the source would produce different semantics.
+*/
+   if (type_sz(value.type) != type_sz(inst->src[arg].type))
+  return false;
+
if (devinfo->gen >= 8 && (value.negate || value.abs) &&
is_logic_op(inst->opcode)) {
   return false;
-- 
2.7.4


[Mesa-dev] [PATCH v2 092/103] i965/vec4: dump subnr for FIXED_GRF

2016-10-11 Thread Iago Toral Quiroga
This came in handy when debugging the payload setup for Tess Eval,
since it prints the correct subnr for attributes that can be loaded
in the second half of a register.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 56a46ad..33a8c52 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1561,7 +1561,7 @@ vec4_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
  fprintf(file, "vgrf%d", inst->src[i].nr);
  break;
   case FIXED_GRF:
- fprintf(file, "g%d", inst->src[i].nr);
+ fprintf(file, "g%d.%d", inst->src[i].nr, inst->src[i].subnr);
  break;
   case ATTR:
  fprintf(file, "attr%d", inst->src[i].nr);
-- 
2.7.4


[Mesa-dev] [PATCH v2 099/103] i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access

2016-10-11 Thread Iago Toral Quiroga
When 64-bit registers are (un)spilled, we need to execute data shuffling
code before writing to or after reading from memory. If we have instructions
that operate on 64-bit data via 32-bit instructions, (un)spills for the
register produced by 32-bit instructions will not do data shuffling at all
(because we only see a normal 32-bit instruction seemingly operating on
32-bit data). This means that subsequent reads with that register using DF
access will unshuffle data read from memory that was never adequately
shuffled when it was written.

Fixing this would require identifying which 32-bit instructions write
64-bit data, emitting spill instructions only when the full 64-bit
data has been written (by multiple 32-bit instructions writing to different
offsets of the same register), and always emitting 64-bit unspills whenever
64-bit data is read, even when the instruction uses a 32-bit type to read
it.
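
The detection used instead of that fix can be sketched on its own: remember
the first type size seen for each register and flag the register as soon as
an access with a different size shows up. This is a simplified model, not the
patch itself. The access struct and find_mixed_size_registers() are
hypothetical names, and std::vector stands in for the ralloc'ed array.

```cpp
#include <cassert>
#include <vector>

// One register access: which virtual register and with what type size
// (in bytes) an instruction reads or writes it.
struct access {
   unsigned reg;
   unsigned type_size;
};

// Hypothetical helper: returns one flag per register, set when the register
// is accessed with more than one type size and so must not be spilled.
inline std::vector<bool>
find_mixed_size_registers(unsigned reg_count,
                          const std::vector<access> &accesses)
{
   std::vector<unsigned> seen_size(reg_count, 0); // 0 means "not seen yet"
   std::vector<bool> no_spill(reg_count, false);

   for (const access &a : accesses) {
      if (seen_size[a.reg] == 0)
         seen_size[a.reg] = a.type_size;
      else if (seen_size[a.reg] != a.type_size)
         no_spill[a.reg] = true;
   }

   return no_spill;
}
```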
---
 .../drivers/dri/i965/brw_vec4_reg_allocate.cpp | 24 ++
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index 7aff2d8..79951e2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
@@ -374,9 +374,13 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, 
bool *no_spill)
 {
float loop_scale = 1.0;
 
+   unsigned *reg_type_size = (unsigned *)
+  ralloc_size(NULL, this->alloc.count * sizeof(unsigned));
+
for (unsigned i = 0; i < this->alloc.count; i++) {
   spill_costs[i] = 0.0;
   no_spill[i] = alloc.sizes[i] != 1 && alloc.sizes[i] != 2;
+  reg_type_size[i] = 0;
}
 
/* Calculate costs for spilling nodes.  Call it a cost of 1 per
@@ -406,6 +410,15 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, 
bool *no_spill)
if (type_sz(inst->src[i].type) == 8 && inst->exec_size != 8)
   no_spill[inst->src[i].nr] = true;
 }
+
+/* We can't spill registers that mix 32-bit and 64-bit access (that
+ * contain 64-bit data that is operated on via 32-bit instructions)
+ */
+unsigned type_size = type_sz(inst->src[i].type);
+if (reg_type_size[inst->src[i].nr] == 0)
+   reg_type_size[inst->src[i].nr] = type_size;
+else if (reg_type_size[inst->src[i].nr] != type_size)
+   no_spill[inst->src[i].nr] = true;
  }
   }
 
@@ -422,6 +435,15 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, 
bool *no_spill)
   */
  if (type_sz(inst->dst.type) == 8 && inst->exec_size != 8)
 no_spill[inst->dst.nr] = true;
+
+ /* We can't spill registers that mix 32-bit and 64-bit access (that
+  * contain 64-bit data that is operated on via 32-bit instructions)
+  */
+ unsigned type_size = type_sz(inst->dst.type);
+ if (reg_type_size[inst->dst.nr] == 0)
+reg_type_size[inst->dst.nr] = type_size;
+ else if (reg_type_size[inst->dst.nr] != type_size)
+no_spill[inst->dst.nr] = true;
   }
 
   switch (inst->opcode) {
@@ -448,6 +470,8 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool 
*no_spill)
  break;
   }
}
+
+   ralloc_free(reg_type_size);
 }
 
 int
-- 
2.7.4


[Mesa-dev] [PATCH v2 065/103] i965/vec4: Fix UBO loads for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
We need to emit 2 32-bit load messages to load a full dvec4. If only
1 or 2 double components are needed dead-code-elimination will remove
the second one.

We also need to shuffle the result of the 32-bit messages to form
valid 64-bit SIMD4x2 data.
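
The offset arithmetic involved can be sketched as follows, assuming a
constant byte offset. The ubo_load_plan struct and plan_ubo_load() are
hypothetical and only model the numbers, not the emitted instructions.

```cpp
#include <cassert>

struct ubo_load_plan {
   unsigned first_load_offset;  // 16-byte aligned offset of the first load
   unsigned second_load_offset; // offset of the second load (64-bit only)
   unsigned load_count;         // number of 32-bit vec4 load messages
   unsigned first_component;    // component index within the loaded data
};

// Hypothetical helper: const_offset is the byte offset into the UBO and
// type_size is 4 for 32-bit data or 8 for 64-bit data.
inline ubo_load_plan plan_ubo_load(unsigned const_offset, unsigned type_size)
{
   ubo_load_plan plan;
   plan.first_load_offset = const_offset & ~15u;
   plan.second_load_offset = plan.first_load_offset + 16;
   plan.load_count = type_size == 8 ? 2 : 1;
   plan.first_component = const_offset % 16 / type_size;
   return plan;
}
```

For example, a double at byte offset 40 would be loaded from offsets 32 and
48 and read as component 1 (Y) of the shuffled result.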
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 46 +-
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 04e95a7..f234e65 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -829,31 +829,49 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
nir->info.num_ubos - 1);
   }
 
-  src_reg offset;
+  src_reg offset_reg;
   nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
   if (const_offset) {
- offset = brw_imm_ud(const_offset->u32[0] & ~15);
+ offset_reg = src_reg(this, glsl_type::uint_type);
+ emit(MOV(dst_reg(offset_reg), brw_imm_ud(const_offset->u32[0] & ~15)));
   } else {
- offset = get_nir_src(instr->src[1], nir_type_uint32, 1);
+ offset_reg = get_nir_src(instr->src[1], nir_type_uint32, 1);
   }
 
-  src_reg packed_consts = src_reg(this, glsl_type::vec4_type);
-  packed_consts.type = dest.type;
+  src_reg packed_consts;
+  if (nir_dest_bit_size(instr->dest) == 32) {
+ packed_consts = src_reg(this, glsl_type::vec4_type);
+ emit_pull_constant_load_reg(dst_reg(packed_consts),
+ surf_index,
+ offset_reg,
+ NULL, NULL /* before_block/inst */);
+  } else {
+ src_reg temp = src_reg(this, glsl_type::dvec4_type);
+ src_reg temp_float = retype(temp, BRW_REGISTER_TYPE_F);
+
+ emit_pull_constant_load_reg(dst_reg(temp_float),
+ surf_index, offset_reg, NULL, NULL);
 
-  emit_pull_constant_load_reg(dst_reg(packed_consts),
-  surf_index,
-  offset,
-  NULL, NULL /* before_block/inst */);
+ emit(ADD(dst_reg(offset_reg), offset_reg, brw_imm_ud(16u)));
+ emit_pull_constant_load_reg(dst_reg(offset(temp_float, 1)),
+ surf_index, offset_reg, NULL, NULL);
+
+ packed_consts = src_reg(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(dst_reg(packed_consts), temp, false);
+  }
 
   packed_consts.swizzle = brw_swizzle_for_size(instr->num_components);
   if (const_offset) {
- packed_consts.swizzle += BRW_SWIZZLE4(const_offset->u32[0] % 16 / 4,
-   const_offset->u32[0] % 16 / 4,
-   const_offset->u32[0] % 16 / 4,
-   const_offset->u32[0] % 16 / 4);
+ unsigned type_size = type_sz(dest.type);
+ packed_consts.swizzle +=
+BRW_SWIZZLE4(const_offset->u32[0] % 16 / type_size,
+ const_offset->u32[0] % 16 / type_size,
+ const_offset->u32[0] % 16 / type_size,
+ const_offset->u32[0] % 16 / type_size);
   }
 
-  emit(MOV(dest, packed_consts));
+  emit(MOV(dest, retype(packed_consts, dest.type)));
+
   break;
}
 
-- 
2.7.4


[Mesa-dev] [PATCH v2 091/103] i965/vec4/tes: consider register offsets during attribute setup

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index c8fa2ca..a1aa672 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
@@ -84,8 +84,8 @@ vec4_tes_visitor::setup_payload()
 
  bool is_64bit = type_sz(inst->src[i].type) == 8;
 
- struct brw_reg grf =
-brw_vec4_grf(reg + inst->src[i].nr / 2, 4 * (inst->src[i].nr % 2));
+ unsigned slot = inst->src[i].nr + inst->src[i].offset / 16;
+ struct brw_reg grf = brw_vec4_grf(reg + slot / 2, 4 * (slot % 2));
  grf = stride(grf, 0, is_64bit ? 2 : 4, 1);
  grf.swizzle = inst->src[i].swizzle;
  grf.type = inst->src[i].type;
-- 
2.7.4


[Mesa-dev] [PATCH v2 089/103] i965/vec4/tes: fix input loading for 64bit data types

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 72 +++---
 1 file changed, 55 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index 226dcb4..f2a4507 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
@@ -177,10 +177,12 @@ vec4_tes_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
case nir_intrinsic_load_input:
case nir_intrinsic_load_per_vertex_input: {
   src_reg indirect_offset = get_indirect_offset(instr);
-  dst_reg dst = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_D);
   unsigned imm_offset = instr->const_index[0];
-  unsigned first_component = nir_intrinsic_component(instr);
   src_reg header = input_read_header;
+  bool is_64bit = nir_dest_bit_size(instr->dest) == 64;
+  unsigned first_component = nir_intrinsic_component(instr);
+  if (is_64bit)
+ first_component /= 2;
 
   if (indirect_offset.file != BAD_FILE) {
  header = src_reg(this, glsl_type::uvec4_type);
@@ -192,31 +194,67 @@ vec4_tes_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
   */
  const unsigned max_push_slots = 24;
  if (imm_offset < max_push_slots) {
-src_reg src = src_reg(ATTR, imm_offset, glsl_type::ivec4_type);
+const glsl_type *src_glsl_type =
+   is_64bit ? glsl_type::dvec4_type : glsl_type::ivec4_type;
+src_reg src = src_reg(ATTR, imm_offset, src_glsl_type);
 src.swizzle = BRW_SWZ_COMP_INPUT(first_component);
 
-emit(MOV(dst, src));
+const brw_reg_type dst_reg_type =
+   is_64bit ? BRW_REGISTER_TYPE_DF : BRW_REGISTER_TYPE_D;
+emit(MOV(get_nir_dest(instr->dest, dst_reg_type), src));
+
 prog_data->urb_read_length =
MAX2(prog_data->urb_read_length,
-DIV_ROUND_UP(imm_offset + 1, 2));
+DIV_ROUND_UP(imm_offset + (is_64bit ? 2 : 1), 2));
 break;
  }
   }
 
-  dst_reg temp(this, glsl_type::ivec4_type);
-  vec4_instruction *read =
- emit(VEC4_OPCODE_URB_READ, temp, src_reg(header));
-  read->offset = imm_offset;
-  read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
+  if (!is_64bit) {
+ dst_reg temp(this, glsl_type::ivec4_type);
+ vec4_instruction *read =
+emit(VEC4_OPCODE_URB_READ, temp, src_reg(header));
+ read->offset = imm_offset;
+ read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
 
-  src_reg src = src_reg(temp);
-  src.swizzle = BRW_SWZ_COMP_INPUT(first_component);
+ src_reg src = src_reg(temp);
+ src.swizzle = BRW_SWZ_COMP_INPUT(first_component);
 
-  /* Copy to target.  We might end up with some funky writemasks landing
-   * in here, but we really don't want them in the above pseudo-ops.
-   */
-  dst.writemask = brw_writemask_for_size(instr->num_components);
-  emit(MOV(dst, src));
+ /* Copy to target.  We might end up with some funky writemasks landing
+  * in here, but we really don't want them in the above pseudo-ops.
+  */
+ dst_reg dst = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_D);
+ dst.writemask = brw_writemask_for_size(instr->num_components);
+ emit(MOV(dst, src));
+  } else {
+ /* For 64-bit we need to load twice as many 32-bit components, and for
+  * dvec3/4 we need to emit 2 URB Read messages
+  */
+ dst_reg temp(this, glsl_type::dvec4_type);
+ dst_reg temp_d = retype(temp, BRW_REGISTER_TYPE_D);
+
+ vec4_instruction *read =
+emit(VEC4_OPCODE_URB_READ, temp_d, src_reg(header));
+ read->offset = imm_offset;
+ read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
+
+ if (instr->num_components > 2) {
+read =
+   emit(VEC4_OPCODE_URB_READ, offset(temp_d, 1), src_reg(header));
+read->offset = imm_offset + 1;
+read->urb_write_flags = BRW_URB_WRITE_PER_SLOT_OFFSET;
+ }
+
+ src_reg temp_as_src = src_reg(temp);
+ temp_as_src.swizzle = BRW_SWZ_COMP_INPUT(first_component);
+
+ dst_reg shuffled(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(shuffled, temp_as_src, false);
+
+ dst_reg dst = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_DF);
+ dst.writemask = brw_writemask_for_size(instr->num_components);
+ emit(MOV(dst, src_reg(shuffled)));
+  }
   break;
}
default:
-- 
2.7.4


[Mesa-dev] [PATCH v2 070/103] i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions

2016-10-11 Thread Iago Toral Quiroga
In gen < 8 instructions that write more than one register need to read
more than one register too. Make sure we don't break that restriction
by copy propagating from a uniform.
---
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 08da96d..116287e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -324,6 +324,13 @@ try_copy_propagate(const struct gen_device_info *devinfo,
value.file != ATTR)
   return false;
 
+   /* In gen < 8 instructions that write 2 registers also need to read 2
+* registers. Make sure we don't break that restriction by copy
+* propagating from a uniform.
+*/
+   if (devinfo->gen < 8 && inst->size_written > REG_SIZE && is_uniform(value))
+  return false;
+
/* If the type of the copy value is different from the type of the
 * instruction then the swizzles and writemasks involved don't have the same
 * meaning and simply replacing the source would produce different 
semantics.
-- 
2.7.4


[Mesa-dev] [PATCH v2 071/103] i965/vec4: don't propagate single-precision uniforms into 4-wide instructions

2016-10-11 Thread Iago Toral Quiroga
Otherwise we end up producing code that violates the register region
restriction that says that when execsize == width and hstride != 0
the vstride can't be 0.
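
The restriction can be written as a small predicate. The region struct and
violates_vstride_restriction() are hypothetical, and the strides are plain
element counts here rather than hardware encodings.

```cpp
#include <cassert>

// A source region <vstride; width, hstride>, in elements.
struct region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

// Hypothetical check for the restriction quoted above: when
// execsize == width and hstride != 0, vstride must not be 0.
inline bool violates_vstride_restriction(unsigned exec_size, const region &r)
{
   return exec_size == r.width && r.hstride != 0 && r.vstride == 0;
}
```

A uniform source of the form <0;4,1> on an instruction with an execution
size of 4 is the case this patch avoids creating.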
---
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 116287e..4f7b844 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -331,6 +331,17 @@ try_copy_propagate(const struct gen_device_info *devinfo,
if (devinfo->gen < 8 && inst->size_written > REG_SIZE && is_uniform(value))
   return false;
 
+   /* There is a regioning restriction such that if execsize == width
+* and hstride != 0 then the vstride can't be 0. When we split instructions
+* that take a single-precision source (like F->DF conversions) we end up
+* with a 4-wide source on an instruction with an execution size of 4.
+* If we then copy-propagate the source from a uniform we also end up with a
+* vstride of 0 and we violate the restriction.
+*/
+   if (inst->exec_size == 4 && value.file == UNIFORM &&
+   type_sz(value.type) == 4)
+  return false;
+
/* If the type of the copy value is different from the type of the
 * instruction then the swizzles and writemasks involved don't have the same
 * meaning and simply replacing the source would produce different 
semantics.
-- 
2.7.4


[Mesa-dev] [PATCH v2 066/103] i965/vec4: Fix SSBO loads for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
Same requirements as for UBO loads.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 31 +-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index f234e65..001a62f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -645,7 +645,8 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
   src_reg offset_reg;
   nir_const_value *const_offset = nir_src_as_const_value(instr->src[1]);
   if (const_offset) {
- offset_reg = brw_imm_ud(const_offset->u32[0]);
+ offset_reg = src_reg(this, glsl_type::uint_type);
+ emit(MOV(dst_reg(offset_reg), brw_imm_ud(const_offset->u32[0])));
   } else {
  offset_reg = get_nir_src(instr->src[1], 1);
   }
@@ -654,14 +655,34 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
   const vec4_builder bld = vec4_builder(this).at_end()
  .annotate(current_annotation, base_ir);
 
-  src_reg read_result = emit_untyped_read(bld, surf_index, offset_reg,
-  1 /* dims */, 4 /* size*/,
-  BRW_PREDICATE_NONE);
+  src_reg read_result;
   dst_reg dest = get_nir_dest(instr->dest);
+  if (type_sz(dest.type) < 8) {
+ read_result = emit_untyped_read(bld, surf_index, offset_reg,
+ 1 /* dims */, 4 /* size*/,
+ BRW_PREDICATE_NONE);
+  } else {
+ src_reg shuffled = src_reg(this, glsl_type::dvec4_type);
+
+ src_reg temp;
+ temp = emit_untyped_read(bld, surf_index, offset_reg,
+  1 /* dims */, 4 /* size*/,
+  BRW_PREDICATE_NONE);
+ emit(MOV(dst_reg(retype(shuffled, temp.type)), temp));
+
+ emit(ADD(dst_reg(offset_reg), offset_reg, brw_imm_ud(16)));
+ temp = emit_untyped_read(bld, surf_index, offset_reg,
+  1 /* dims */, 4 /* size*/,
+  BRW_PREDICATE_NONE);
+ emit(MOV(dst_reg(retype(offset(shuffled, 1), temp.type)), temp));
+
+ read_result = src_reg(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(dst_reg(read_result), shuffled, false);
+  }
+
   read_result.type = dest.type;
   read_result.swizzle = brw_swizzle_for_size(instr->num_components);
   emit(MOV(dest, read_result));
-
   break;
}
 
-- 
2.7.4


[Mesa-dev] [PATCH v2 001/103] i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez 

max_vector_size is used in the vec4 backend to pad out the uniform
components to match a size that is a multiple of a vec4. Double and dvec2
uniforms only require a single vec4 slot, not two.

Signed-off-by: Samuel Iglesias Gonsálvez 
Signed-off-by: Iago Toral Quiroga 

Reviewed-by: Timothy Arceri 
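
The padding rule after this change can be sketched as below. The
uniform_layout struct and layout_for() are hypothetical and only reproduce
the arithmetic, with sizes counted in 32-bit components.

```cpp
#include <cassert>

struct uniform_layout {
   unsigned vector_size;     // 32-bit components actually used
   unsigned max_vector_size; // 32-bit components the uniform is padded to
};

// Hypothetical helper: `components` is the number of logical components of
// one vector (1 to 4), is_double tells whether each one is 64-bit.
inline uniform_layout layout_for(unsigned components, bool is_double)
{
   unsigned vector_size = components;
   unsigned max_vector_size = 4;

   if (is_double) {
      vector_size *= 2;
      if (vector_size > 4)
         max_vector_size = 8;
   }

   return { vector_size, max_vector_size };
}
```

Under this model a double or dvec2 is padded to 4 components (one vec4 slot),
while a dvec3 or dvec4 is padded to 8 (two slots).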
---
 src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp 
b/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp
index b752ad5..e3ce5f9 100644
--- a/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp
+++ b/src/mesa/drivers/dri/i965/brw_nir_uniforms.cpp
@@ -107,7 +107,8 @@ brw_nir_setup_glsl_uniform(gl_shader_stage stage, 
nir_variable *var,
  unsigned max_vector_size = 4;
  if (storage->type->base_type == GLSL_TYPE_DOUBLE) {
 vector_size *= 2;
-max_vector_size *= 2;
+if (vector_size > 4)
+   max_vector_size = 8;
  }
 
  for (unsigned s = 0; s < vector_count; s++) {
-- 
2.7.4


[Mesa-dev] [PATCH v2 085/103] i965/vec4: fix store output for 64-bit types

2016-10-11 Thread Iago Toral Quiroga
We need to shuffle the data before it is written to the URB. Also,
dvec3/4 need two vec4 slots.
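
How the components of a generic varying end up split across slots can be
sketched as follows. The slot_split struct and split_output() are
hypothetical and only model the component counts, counted in 32-bit units.

```cpp
#include <cassert>

struct slot_split {
   unsigned first_slot_components;  // 32-bit components in the first slot
   unsigned second_slot_components; // 32-bit components in the next slot
};

// Hypothetical helper: num_components is the logical component count of the
// stored value, is_64bit tells whether each component is 64-bit.
inline slot_split split_output(unsigned num_components, bool is_64bit)
{
   unsigned n = is_64bit ? num_components * 2 : num_components;
   assert(n <= 8);

   unsigned first = n < 4 ? n : 4;
   return { first, n - first };
}
```

With these assumptions a dvec3 fills the first slot with 4 components and
leaves 2 for the next one, while a dvec2 fits in a single slot.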
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 60a8425..dfe2740 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -437,16 +437,39 @@ vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
 
   int varying = instr->const_index[0] + const_offset->u32[0];
 
-  src = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_F,
-instr->num_components);
+  bool is_64bit = nir_src_bit_size(instr->src[0]) == 64;
+  if (is_64bit) {
+ src_reg data;
+ src = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_DF,
+   instr->num_components);
+ data = src_reg(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(dst_reg(data), src, true);
+ src = retype(data, BRW_REGISTER_TYPE_F);
+  } else {
+ src = get_nir_src(instr->src[0], BRW_REGISTER_TYPE_F,
+   instr->num_components);
+  }
 
   if (varying >= VARYING_SLOT_VAR0) {
  unsigned c = nir_intrinsic_component(instr);
  unsigned v = varying - VARYING_SLOT_VAR0;
+
+ unsigned num_components = instr->num_components;
+ if (is_64bit)
+num_components *= 2;
+
  output_generic_reg[v][c] = dst_reg(src);
- output_generic_num_components[v][c] = instr->num_components;
+ output_generic_num_components[v][c] = MIN2(4, num_components);
+
+ if (is_64bit && num_components > 4) {
+assert(num_components <= 8);
+output_generic_reg[v + 1][c] = offset(dst_reg(src), 1);
+output_generic_num_components[v + 1][c] = num_components - 4;
+ }
   } else {
  output_reg[varying] = dst_reg(src);
+ if (is_64bit && instr->num_components > 2)
+output_reg[varying + 1] = offset(dst_reg(src), 1);
   }
   break;
}
-- 
2.7.4


[Mesa-dev] [PATCH v2 060/103] i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

2016-10-11 Thread Iago Toral Quiroga
We make scalar sources in 3src instructions use subnr instead of
swizzles because they don't really use swizzles.

With doubles it is more complicated because we use vstride=0 in
more scenarios in which they don't produce scalar regions. Also
RepCtrl=1 is not allowed with 64-bit operands, so we should avoid
this.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 45d49e9..190581e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1961,9 +1961,12 @@ vec4_visitor::convert_to_hw_regs()
   if (inst->is_3src(devinfo)) {
  /* 3-src instructions with scalar sources support arbitrary subnr,
   * but don't actually use swizzles.  Convert swizzle into subnr.
+  * Skip this for double-precision instructions: RepCtrl=1 is not
+  * allowed for them and they need special handling.
   */
  for (int i = 0; i < 3; i++) {
-if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0) {
+if (inst->src[i].vstride == BRW_VERTICAL_STRIDE_0 &&
+type_sz(inst->src[i].type) < 8) {
assert(brw_is_single_value_swizzle(inst->src[i].swizzle));
inst->src[i].subnr += 4 * BRW_GET_SWZ(inst->src[i].swizzle, 0);
 }
-- 
2.7.4


[Mesa-dev] [PATCH v2 090/103] i965/vec4/tes: fix setup_payload() for 64bit data types

2016-10-11 Thread Iago Toral Quiroga
Use a width of 2 with 64-bit attributes.

Also, if we have a dvec3/4 attribute that gets split across two registers
such that components XY are stored in the second half of a register and
components ZW are stored in the first half of the next, we need to fix
regioning for any instruction that reads components Z/W of the attribute.
Notice this also means that we can't support sources that read cross-dvec2
swizzles (like XZ for example).

v2: don't assert that we have a single channel swizzle in the case that we
have to fix up Z/W access on the first half of the next register. We
can handle any swizzle that does not cross dvec2 boundaries, which
the double scalarization pass should have prevented anyway.
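
A swizzle that crosses dvec2 boundaries can be detected as sketched below.
The helpers are hypothetical stand-ins that assume a 2-bits-per-channel
swizzle encoding. mask_for_swizzle() here only collects the channels a
swizzle reads and is not the driver's own function.

```cpp
#include <cassert>

// Stand-in for a 4-channel swizzle packed with 2 bits per channel.
inline unsigned swizzle4(unsigned a, unsigned b, unsigned c, unsigned d)
{
   return a | (b << 2) | (c << 4) | (d << 6);
}

inline unsigned get_swz(unsigned swz, unsigned idx)
{
   return (swz >> (idx * 2)) & 0x3;
}

// Bitmask of the source channels read by the swizzle (bit 0 = X ... bit 3 = W).
inline unsigned mask_for_swizzle(unsigned swz)
{
   unsigned mask = 0;
   for (unsigned i = 0; i < 4; i++)
      mask |= 1u << get_swz(swz, i);
   return mask;
}

// Hypothetical check: true when the swizzle reads from both the XY half
// (mask 0x3) and the ZW half (mask 0xc) of a dvec4.
inline bool crosses_dvec2_boundary(unsigned swz)
{
   unsigned mask = mask_for_swizzle(swz);
   return (mask & 0x3) != 0 && (mask & 0xc) != 0;
}
```

For instance, a swizzle reading X and Z crosses the boundary, while XYXY and
ZWZW each stay within one half.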
---
 src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index f2a4507..c8fa2ca 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
@@ -82,14 +82,33 @@ vec4_tes_visitor::setup_payload()
  if (inst->src[i].file != ATTR)
 continue;
 
+ bool is_64bit = type_sz(inst->src[i].type) == 8;
+
  struct brw_reg grf =
 brw_vec4_grf(reg + inst->src[i].nr / 2, 4 * (inst->src[i].nr % 2));
- grf = stride(grf, 0, 4, 1);
+ grf = stride(grf, 0, is_64bit ? 2 : 4, 1);
  grf.swizzle = inst->src[i].swizzle;
  grf.type = inst->src[i].type;
  grf.abs = inst->src[i].abs;
  grf.negate = inst->src[i].negate;
 
+ /* For 64-bit attributes we can end up with components XY in the
+  * second half of a register and components ZW in the first half
+  * of the next. Fix it up here.
+  */
+ if (is_64bit && grf.subnr > 0) {
+/* We can't do swizzles that mix XY and ZW channels in this case.
+ * Such cases should have been handled by the scalarization pass.
+ */
+assert((brw_mask_for_swizzle(grf.swizzle) & 0x3) ^
+   (brw_mask_for_swizzle(grf.swizzle) & 0xc));
+if (brw_mask_for_swizzle(grf.swizzle) & 0xc) {
+   grf.subnr = 0;
+   grf.nr++;
+   grf.swizzle -= BRW_SWIZZLE_;
+}
+ }
+
  inst->src[i] = grf;
   }
}
-- 
2.7.4


[Mesa-dev] [PATCH v2 082/103] i965/vec4: make emit_pull_constant_load support 64-bit loads

2016-10-11 Thread Iago Toral Quiroga
This way callers don't need to know about 64-bit particularities and
we reuse some code.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 ++-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 81 ++
 2 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b0bc2d5..e732bf4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -884,24 +884,12 @@ vec4_visitor::move_push_constants_to_pull_constants()
 
  int uniform = inst->src[i].nr;
 
- dst_reg temp;
- if (type_sz(inst->src[i].type) != 8) {
-temp = dst_reg(this, glsl_type::vec4_type);
-emit_pull_constant_load(block, inst, temp, inst->src[i],
-pull_constant_loc[uniform], src_reg());
- } else {
-dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type);
-dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F);
-
-emit_pull_constant_load(block, inst, shuffled_float, inst->src[i],
-pull_constant_loc[uniform], src_reg());
-emit_pull_constant_load(block, inst, offset(shuffled_float, 1),
-offset(inst->src[i], 1),
-pull_constant_loc[uniform], src_reg());
+ const glsl_type *temp_type = type_sz(inst->src[i].type) == 8 ?
+glsl_type::dvec4_type : glsl_type::vec4_type;
+ dst_reg temp = dst_reg(this, temp_type);
 
-temp = dst_reg(this, glsl_type::dvec4_type);
-shuffle_64bit_data(temp, src_reg(shuffled), false, block, inst);
- }
+ emit_pull_constant_load(block, inst, temp, inst->src[i],
+ pull_constant_loc[uniform], src_reg());
 
  inst->src[i].file = temp.file;
  inst->src[i].nr = temp.nr;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index f12a114..0177f68 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1718,33 +1718,57 @@ vec4_visitor::move_grf_array_access_to_scratch()
  */
 void
 vec4_visitor::emit_pull_constant_load(bblock_t *block, vec4_instruction *inst,
- dst_reg temp, src_reg orig_src,
+  dst_reg temp, src_reg orig_src,
   int base_offset, src_reg indirect)
 {
assert(orig_src.offset % 16 == 0);
-   int reg_offset = base_offset + orig_src.offset / 16;
const unsigned index = prog_data->base.binding_table.pull_constants_start;
 
-   src_reg offset;
-   if (indirect.file != BAD_FILE) {
-  offset = src_reg(this, glsl_type::uint_type);
-
-  emit_before(block, inst, ADD(dst_reg(offset), indirect,
-   brw_imm_ud(reg_offset * 16)));
-   } else if (devinfo->gen >= 8) {
-  /* Store the offset in a GRF so we can send-from-GRF. */
-  offset = src_reg(this, glsl_type::uint_type);
-  emit_before(block, inst, MOV(dst_reg(offset), brw_imm_ud(reg_offset * 16)));
-   } else {
-  offset = brw_imm_d(reg_offset * 16);
+   /* For 64bit loads we need to emit two 32-bit load messages and we also
+* need to shuffle the 32-bit data result into proper 64-bit data. To do
+* that we emit the 32-bit loads into a temporary and we shuffle the result
+* into the original destination.
+*/
+   dst_reg orig_temp = temp;
+   bool is_64bit = type_sz(orig_src.type) == 8;
+   if (is_64bit) {
+  assert(type_sz(temp.type) == 8);
+  dst_reg temp_df = dst_reg(this, glsl_type::dvec4_type);
+  temp = retype(temp_df, BRW_REGISTER_TYPE_F);
}
 
-   emit_pull_constant_load_reg(temp,
-   brw_imm_ud(index),
-   offset,
-   block, inst);
+   src_reg src = orig_src;
+   for (int i = 0; i < (is_64bit ? 2 : 1); i++) {
+  int reg_offset = base_offset + src.offset / 16;
+
+  src_reg byte_offset;
+  if (indirect.file != BAD_FILE) {
+ byte_offset = src_reg(this, glsl_type::uint_type);
+ emit_before(block, inst, ADD(dst_reg(byte_offset), indirect,
+  brw_imm_ud(reg_offset * 16)));
+  } else if (devinfo->gen >= 8) {
+ /* Store the offset in a GRF so we can send-from-GRF. */
+ byte_offset = src_reg(this, glsl_type::uint_type);
+ emit_before(block, inst, MOV(dst_reg(byte_offset),
+  brw_imm_ud(reg_offset * 16)));
+  } else {
+ byte_offset = brw_imm_d(reg_offset * 16);
+  }
+
+  emit_pull_constant_load_reg(offset(temp, i),
+  brw_imm_ud(index),
+  byte_offset,
+  

[Mesa-dev] [PATCH v2 086/103] i965/vec4/gs: fix input loading for 64bit data

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez 

v2 (Iago):
   - Adapt 64-bit path to component packing changes.

Signed-off-by: Samuel Iglesias Gonsálvez 
Signed-off-by: Iago Toral Quiroga 
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 51 ++-
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
index 16d2410..ed8c03b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
@@ -64,23 +64,40 @@ vec4_gs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
* be constant.  We should handle indirects someday.
*/
   nir_const_value *vertex = nir_src_as_const_value(instr->src[0]);
-  nir_const_value *offset = nir_src_as_const_value(instr->src[1]);
-
-  /* Make up a type...we have no way of knowing... */
-  const glsl_type *const type = glsl_type::ivec(instr->num_components);
-
-  src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u32[0] +
-  instr->const_index[0] + offset->u32[0],
-type);
-  src.swizzle = BRW_SWZ_COMP_INPUT(nir_intrinsic_component(instr));
-
-  /* gl_PointSize is passed in the .w component of the VUE header */
-  if (instr->const_index[0] == VARYING_SLOT_PSIZ)
- src.swizzle = BRW_SWIZZLE_;
-
-  dest = get_nir_dest(instr->dest, src.type);
-  dest.writemask = brw_writemask_for_size(instr->num_components);
-  emit(MOV(dest, src));
+  nir_const_value *offset_reg = nir_src_as_const_value(instr->src[1]);
+
+  if (nir_dest_bit_size(instr->dest) == 64) {
+ src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u32[0] +
+   instr->const_index[0] + offset_reg->u32[0],
+   glsl_type::dvec4_type);
+
+ dst_reg tmp = dst_reg(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(tmp, src, false);
+
+ src = src_reg(tmp);
+ src.swizzle = BRW_SWZ_COMP_INPUT(nir_intrinsic_component(instr) / 2);
+
+ /* Write to dst reg taking into account original writemask */
+ dest = get_nir_dest(instr->dest, BRW_REGISTER_TYPE_DF);
+ dest.writemask = brw_writemask_for_size(instr->num_components);
+ emit(MOV(dest, src));
+  } else {
+ /* Make up a type...we have no way of knowing... */
+ const glsl_type *const type = glsl_type::ivec(instr->num_components);
+
+ src = src_reg(ATTR, BRW_VARYING_SLOT_COUNT * vertex->u32[0] +
+   instr->const_index[0] + offset_reg->u32[0],
+   type);
+ src.swizzle = BRW_SWZ_COMP_INPUT(nir_intrinsic_component(instr));
+
+ /* gl_PointSize is passed in the .w component of the VUE header */
+ if (instr->const_index[0] == VARYING_SLOT_PSIZ)
+src.swizzle = BRW_SWIZZLE_;
+
+ dest = get_nir_dest(instr->dest, src.type);
+ dest.writemask = brw_writemask_for_size(instr->num_components);
+ emit(MOV(dest, src));
+  }
   break;
}
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 057/103] i965/vec4: teach register coalescing about 64-bit

2016-10-11 Thread Iago Toral Quiroga
Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.

Also, if we are coalescing a register from another MOV, we should check that
we are reading the same amount of data written by that MOV; otherwise it
might not be safe to eliminate it. This can happen, for example, when we have
split fp64 MOVs with an exec size of 4 that only write one register each,
followed by a MOV with an exec size of 8 that reads both. We want to prevent
the pass from thinking that it can coalesce from the first split MOV alone.
Ideally we would like the pass to see that it can coalesce from both split
MOVs instead, but for now we keep it simple.

Finally, the pass doesn't support coalescing of multiple registers but in the
case of normal SIMD4x2 double-precision instructions they naturally write two
registers (one per vertex) and there is no reason why we should not allow
coalescing in this case. Change the restriction to bail if we see instructions
that write more than 8 channels, where the channels can be 32-bit or 64-bit.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index c728e38..e5391b9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1191,6 +1191,19 @@ vec4_visitor::opt_register_coalesce()
   scan_inst->dst.type == scan_inst->src[0].type))
break;
 
+/* Only allow coalescing between registers of the same type size.
+ * Otherwise we would need to make the pass aware of the fact that
+ * channel sizes are different for single and double precision.
+ */
+if (type_sz(inst->src[0].type) != type_sz(scan_inst->src[0].type))
+   break;
+
+/* Check that scan_inst writes at least the same amount of data
+ * that we read in the instruction
+ */
+if (scan_inst->size_written < inst->size_read(0))
+   break;
+
 /* If we can't handle the swizzle, bail. */
 if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask,
   inst->src[0].swizzle,
@@ -1198,10 +1211,12 @@ vec4_visitor::opt_register_coalesce()
break;
 }
 
-/* This only handles coalescing of a single register starting at
- * the source offset of the copy instruction.
+/* This only handles coalescing writes of 8 channels (1 register
+ * for single-precision and 2 registers for double-precision)
+ * starting at the source offset of the copy instruction.
  */
-if (scan_inst->size_written > REG_SIZE ||
+if (DIV_ROUND_UP(scan_inst->size_written,
+ type_sz(scan_inst->dst.type)) > 8 ||
 scan_inst->dst.offset != inst->src[0].offset)
break;
 
-- 
2.7.4
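
The relaxed size restriction in this patch can be illustrated with a small standalone sketch. It assumes a 32-byte register (REG_SIZE in the i965 backend); div_round_up() and writes_too_many_channels() are hypothetical names used only for this illustration and are not part of Mesa.

```cpp
#include <cassert>

// Hypothetical stand-in for Mesa's DIV_ROUND_UP macro.
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
   return (n + d - 1) / d;
}

// Returns true when a write of `size_written` bytes, made of channels that
// are `type_size` bytes wide, covers more than 8 channels. This mirrors the
// condition under which the coalescing pass bails out.
constexpr bool writes_too_many_channels(unsigned size_written,
                                        unsigned type_size)
{
   return div_round_up(size_written, type_size) > 8;
}

// One 32-byte register of floats: 8 channels, still eligible.
static_assert(!writes_too_many_channels(32, 4), "8 float channels");
// Two registers of doubles: also 8 channels, still eligible.
static_assert(!writes_too_many_channels(64, 8), "8 double channels");
// Two registers of floats: 16 channels, the pass bails.
static_assert(writes_too_many_channels(64, 4), "16 float channels");
```

Counting channels rather than bytes is what lets a normal SIMD4x2 double-precision write (two registers) pass the check while a two-register single-precision write is still rejected.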



[Mesa-dev] [PATCH v2 020/103] i965/vec4: don't copy propagate vector opcodes that operate in align1 mode

2016-10-11 Thread Iago Toral Quiroga
Basically, ALIGN1 mode will ignore swizzles on the input vectors so we don't
want the copy propagation pass to mess with them.
---
 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 24 ++
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 545f4c7..d0045a7 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -283,6 +283,22 @@ try_constant_propagate(const struct gen_device_info *devinfo,
 }
 
 static bool
+is_align1_opcode(unsigned opcode)
+{
+   switch (opcode) {
+   case VEC4_OPCODE_DOUBLE_TO_FLOAT:
+   case VEC4_OPCODE_FLOAT_TO_DOUBLE:
+   case VEC4_OPCODE_PICK_LOW_32BIT:
+   case VEC4_OPCODE_PICK_HIGH_32BIT:
+   case VEC4_OPCODE_SET_LOW_32BIT:
+   case VEC4_OPCODE_SET_HIGH_32BIT:
+  return true;
+   default:
+  return false;
+   }
+}
+
+static bool
 try_copy_propagate(const struct gen_device_info *devinfo,
vec4_instruction *inst, int arg,
const copy_entry *entry, int attributes_per_reg)
@@ -326,6 +342,14 @@ try_copy_propagate(const struct gen_device_info *devinfo,
 
unsigned composed_swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
value.swizzle);
+
+   /* Instructions that operate on vectors in ALIGN1 mode will ignore swizzles
+* so copy-propagation won't be safe if the composed swizzle is anything
+* other than the identity.
+*/
+   if (is_align1_opcode(inst->opcode) && composed_swizzle != BRW_SWIZZLE_XYZW)
+  return false;
+
if (inst->is_3src(devinfo) &&
(value.file == UNIFORM ||
 (value.file == ATTR && attributes_per_reg != 1)) &&
-- 
2.7.4
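
The identity-swizzle check in this patch can be sketched with a simplified model. The Swizzle type below is a hypothetical array of four channel indices (0 = X .. 3 = W), not Mesa's packed encoding, and compose() is only an illustration of composing two swizzles; it is not brw_compose_swizzle().

```cpp
#include <array>
#include <cassert>

// Hypothetical model: entry i names the source channel read by output
// channel i (0 = X, 1 = Y, 2 = Z, 3 = W).
using Swizzle = std::array<unsigned, 4>;

const Swizzle kIdentity = {{0, 1, 2, 3}};   // XYZW

// Apply `outer` on top of `inner`: output channel i reads inner[outer[i]].
inline Swizzle compose(const Swizzle &outer, const Swizzle &inner)
{
   Swizzle result{};
   for (unsigned i = 0; i < 4; i++)
      result[i] = inner[outer[i]];
   return result;
}

// An ALIGN1 instruction ignores swizzles, so propagating a copy into it is
// only safe when the composed swizzle is the identity.
inline bool safe_for_align1(const Swizzle &inst_swz, const Swizzle &value_swz)
{
   return compose(inst_swz, value_swz) == kIdentity;
}
```

Note that two non-identity swizzles may still compose to the identity (for example, swapping X and Y twice), which is why the check is made on the composed swizzle rather than on each operand separately.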



[Mesa-dev] [PATCH v2 050/103] i965/vec4: teach CSE about exec_size, group and doubles

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 31 +++---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index bef897a..229d7b2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
@@ -130,6 +130,8 @@ instructions_match(vec4_instruction *a, vec4_instruction *b)
   a->dst.writemask == b->dst.writemask &&
   a->force_writemask_all == b->force_writemask_all &&
   a->size_written == b->size_written &&
+  a->exec_size == b->exec_size &&
+  a->group == b->group &&
   operands_match(a, b);
 }
 
@@ -181,9 +183,17 @@ vec4_visitor::opt_cse_local(bblock_t *block)
   regs_written(entry->generator)),
NULL), inst->dst.type);
 
-   for (unsigned i = 0; i < regs_written(entry->generator); ++i) {
-  vec4_instruction *copy = MOV(offset(entry->generator->dst, i),
-   offset(entry->tmp, i));
+   unsigned type_scale = DIV_ROUND_UP(type_sz(entry->tmp.type), 4);
+   unsigned regs_per_mov =
+   DIV_ROUND_UP(type_scale * entry->generator->exec_size, 8);
+   unsigned num_copy_movs =
+  DIV_ROUND_UP(regs_written(entry->generator), regs_per_mov);
+   for (unsigned i = 0; i < num_copy_movs; ++i) {
+  vec4_instruction *copy =
+ MOV(offset(entry->generator->dst, i * regs_per_mov),
+offset(entry->tmp, i * regs_per_mov));
+  copy->exec_size = entry->generator->exec_size;
+  copy->group = entry->generator->group;
   copy->force_writemask_all =
  entry->generator->force_writemask_all;
   entry->generator->insert_after(block, copy);
@@ -195,10 +205,17 @@ vec4_visitor::opt_cse_local(bblock_t *block)
 /* dest <- temp */
 if (!inst->dst.is_null()) {
assert(inst->dst.type == entry->tmp.type);
-
-   for (unsigned i = 0; i < regs_written(inst); ++i) {
-  vec4_instruction *copy = MOV(offset(inst->dst, i),
-   offset(entry->tmp, i));
+   unsigned type_scale = DIV_ROUND_UP(type_sz(inst->dst.type), 4);
+   unsigned regs_per_mov =
+   DIV_ROUND_UP(type_scale * inst->exec_size, 8);
+   unsigned num_copy_movs =
+  DIV_ROUND_UP(regs_written(inst), regs_per_mov);
+   for (unsigned i = 0; i < num_copy_movs; ++i) {
+  vec4_instruction *copy =
+ MOV(offset(inst->dst, i * regs_per_mov),
+ offset(entry->tmp, i * regs_per_mov));
+  copy->exec_size = inst->exec_size;
+  copy->group = inst->group;
   copy->force_writemask_all = inst->force_writemask_all;
   inst->insert_before(block, copy);
}
-- 
2.7.4
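
The copy-MOV arithmetic introduced by this patch can be checked with a small worked example. This is a sketch only: CopyPlan and plan_copies() are hypothetical names, and the function merely repeats the three DIV_ROUND_UP computations from the diff with plain integers.

```cpp
#include <cassert>

// Hypothetical stand-in for Mesa's DIV_ROUND_UP macro.
constexpr unsigned div_round_up(unsigned n, unsigned d)
{
   return (n + d - 1) / d;
}

struct CopyPlan {
   unsigned regs_per_mov;    // registers written by each copy MOV
   unsigned num_copy_movs;   // number of copy MOVs to emit
};

// type_size: bytes per channel (4 for float, 8 for double).
// exec_size: execution size of the instruction being copied.
// regs_written: registers written by that instruction.
inline CopyPlan plan_copies(unsigned type_size, unsigned exec_size,
                            unsigned regs_written)
{
   const unsigned type_scale = div_round_up(type_size, 4);
   const unsigned regs_per_mov = div_round_up(type_scale * exec_size, 8);
   const unsigned num_copy_movs = div_round_up(regs_written, regs_per_mov);
   return {regs_per_mov, num_copy_movs};
}
```

With these formulas an 8-wide double-precision instruction writing 2 registers is copied by a single MOV covering both registers, whereas an 8-wide single-precision instruction writing 2 registers still needs one MOV per register.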



[Mesa-dev] [PATCH v2 015/103] i965/vec4: We only support 32-bit integer ALU operations for now

2016-10-11 Thread Iago Toral Quiroga
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.

Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 71 ++
 1 file changed, 53 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index b75337c..04f70ef 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1135,9 +1135,9 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
}
 
-   case nir_op_fadd:
-  /* fall through */
case nir_op_iadd:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+   case nir_op_fadd:
   inst = emit(ADD(dst, op[0], op[1]));
   inst->saturate = instr->dest.saturate;
   break;
@@ -1148,6 +1148,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
 
case nir_op_imul: {
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
   if (devinfo->gen < 8) {
  nir_const_value *value0 = nir_src_as_const_value(instr->src[0].src);
  nir_const_value *value1 = nir_src_as_const_value(instr->src[1].src);
@@ -1183,6 +1184,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 
case nir_op_imul_high:
case nir_op_umul_high: {
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
   struct brw_reg acc = retype(brw_acc_reg(8), dst.type);
 
   if (devinfo->gen >= 8)
@@ -1221,6 +1223,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 
case nir_op_idiv:
case nir_op_udiv:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
   emit_math(SHADER_OPCODE_INT_QUOTIENT, dst, op[0], op[1]);
   break;
 
@@ -1230,6 +1233,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
* appears that our hardware just does the right thing for signed
* remainder.
*/
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
   emit_math(SHADER_OPCODE_INT_REMAINDER, dst, op[0], op[1]);
   break;
 
@@ -1283,6 +1287,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
 
case nir_op_uadd_carry: {
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
   struct brw_reg acc = retype(brw_acc_reg(8), BRW_REGISTER_TYPE_UD);
 
   emit(ADDC(dst_null_ud(), op[0], op[1]));
@@ -1291,6 +1296,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
}
 
case nir_op_usub_borrow: {
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
   struct brw_reg acc = retype(brw_acc_reg(8), BRW_REGISTER_TYPE_UD);
 
   emit(SUBB(dst_null_ud(), op[0], op[1]));
@@ -1358,16 +1364,18 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
}
 
-   case nir_op_fmin:
case nir_op_imin:
case nir_op_umin:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+   case nir_op_fmin:
   inst = emit_minmax(BRW_CONDITIONAL_L, dst, op[0], op[1]);
   inst->saturate = instr->dest.saturate;
   break;
 
-   case nir_op_fmax:
case nir_op_imax:
case nir_op_umax:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+   case nir_op_fmax:
   inst = emit_minmax(BRW_CONDITIONAL_GE, dst, op[0], op[1]);
   inst->saturate = instr->dest.saturate;
   break;
@@ -1380,26 +1388,30 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
case nir_op_fddy_fine:
   unreachable("derivatives are not valid in vertex shaders");
 
-   case nir_op_flt:
case nir_op_ilt:
case nir_op_ult:
-   case nir_op_fge:
case nir_op_ige:
case nir_op_uge:
-   case nir_op_feq:
case nir_op_ieq:
-   case nir_op_fne:
case nir_op_ine:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  /* Fallthrough */
+   case nir_op_flt:
+   case nir_op_fge:
+   case nir_op_feq:
+   case nir_op_fne:
   emit(CMP(dst, op[0], op[1],
brw_conditional_for_nir_comparison(instr->op)));
   break;
 
-   case nir_op_ball_fequal2:
case nir_op_ball_iequal2:
-   case nir_op_ball_fequal3:
case nir_op_ball_iequal3:
-   case nir_op_ball_fequal4:
-   case nir_op_ball_iequal4: {
+   case nir_op_ball_iequal4:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  /* Fallthrough */
+   case nir_op_ball_fequal2:
+   case nir_op_ball_fequal3:
+   case nir_op_ball_fequal4: {
   unsigned swiz =
  brw_swizzle_for_size(nir_op_infos[instr->op].input_sizes[0]);
 
@@ -1411,12 +1423,14 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   break;
}
 
-   case nir_op_bany_fnequal2:
case nir_op_bany_inequal2:
-   case nir_op_bany_fnequal3:
case nir_op_bany_inequal3:
-   case nir_op_bany_fnequal4:
-   case nir_op_bany_inequal4: {
+   case nir_op_bany_inequal4:
+  assert(nir_dest_bit_size(instr->dest.dest) < 64);
+  /* Fallthrough */
+   case nir_op_bany_fnequal2:
+   case nir_op_bany_fnequal3:
+   case nir_op_bany_fnequal4: {
   unsigned swiz =
  brw_swizzle_for_size(nir_op_infos[inst

[Mesa-dev] [PATCH v2 078/103] i965/vec4: fix scratch writes for 64bit data

2016-10-11 Thread Iago Toral Quiroga
Mostly the same stuff as usual: we need to shuffle the data before we
write and we need to emit two 32-bit write messages (with appropriate
32-bit writemask channels set) for a full dvec4 scratch write.
---
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++
 1 file changed, 55 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 44e6709..b0b5f39 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -1534,17 +1534,63 @@ vec4_visitor::emit_scratch_write(bblock_t *block, vec4_instruction *inst,
 * weren't initialized, it will confuse live interval analysis, which will
 * make spilling fail to make progress.
 */
-   const src_reg temp = swizzle(retype(src_reg(this, glsl_type::vec4_type),
+   bool is_64bit = type_sz(inst->dst.type) == 8;
+   const glsl_type *alloc_type =
+  is_64bit ? glsl_type::dvec4_type : glsl_type::vec4_type;
+   const src_reg temp = swizzle(retype(src_reg(this, alloc_type),
inst->dst.type),
 brw_swizzle_for_mask(inst->dst.writemask));
-   dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0),
-  inst->dst.writemask));
-   vec4_instruction *write = SCRATCH_WRITE(dst, temp, index);
-   if (inst->opcode != BRW_OPCODE_SEL)
-  write->predicate = inst->predicate;
-   write->ir = inst->ir;
-   write->annotation = inst->annotation;
-   inst->insert_after(block, write);
+
+   if (!is_64bit) {
+  dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0),
+ inst->dst.writemask));
+  vec4_instruction *write = SCRATCH_WRITE(dst, temp, index);
+  if (inst->opcode != BRW_OPCODE_SEL)
+ write->predicate = inst->predicate;
+  write->ir = inst->ir;
+  write->annotation = inst->annotation;
+  inst->insert_after(block, write);
+   } else {
+  dst_reg shuffled = dst_reg(this, alloc_type);
+  vec4_instruction *last =
+ shuffle_64bit_data(shuffled, temp, true, block, inst);
+  src_reg shuffled_float = src_reg(retype(shuffled, BRW_REGISTER_TYPE_F));
+
+  uint8_t mask = 0;
+  if (inst->dst.writemask & WRITEMASK_X)
+ mask |= WRITEMASK_XY;
+  if (inst->dst.writemask & WRITEMASK_Y)
+ mask |= WRITEMASK_ZW;
+  if (mask) {
+ dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0), mask));
+
+ vec4_instruction *write = SCRATCH_WRITE(dst, shuffled_float, index);
+ if (inst->opcode != BRW_OPCODE_SEL)
+write->predicate = inst->predicate;
+ write->ir = inst->ir;
+ write->annotation = inst->annotation;
+ last->insert_after(block, write);
+  }
+
+  mask = 0;
+  if (inst->dst.writemask & WRITEMASK_Z)
+ mask |= WRITEMASK_XY;
+  if (inst->dst.writemask & WRITEMASK_W)
+ mask |= WRITEMASK_ZW;
+  if (mask) {
+ dst_reg dst = dst_reg(brw_writemask(brw_vec8_grf(0, 0), mask));
+
+ src_reg index = get_scratch_offset(block, inst, inst->dst.reladdr,
+reg_offset + 1);
+ vec4_instruction *write =
+SCRATCH_WRITE(dst, offset(shuffled_float, 1), index);
+ if (inst->opcode != BRW_OPCODE_SEL)
+write->predicate = inst->predicate;
+ write->ir = inst->ir;
+ write->annotation = inst->annotation;
+ last->insert_after(block, write);
+  }
+   }
 
inst->dst.file = temp.file;
inst->dst.nr = temp.nr;
-- 
2.7.4
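
The writemask translation used for the two 32-bit scratch writes can be sketched in isolation. The constants below are local copies of the usual writemask bits (X = 1, Y = 2, Z = 4, W = 8); ScratchMasks and split_dvec4_writemask() are hypothetical names for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the writemask bits, for illustration.
constexpr uint8_t MASK_X = 0x1, MASK_Y = 0x2, MASK_Z = 0x4, MASK_W = 0x8;
constexpr uint8_t MASK_XY = MASK_X | MASK_Y;
constexpr uint8_t MASK_ZW = MASK_Z | MASK_W;

struct ScratchMasks {
   uint8_t first;    // 32-bit writemask of the first scratch write
   uint8_t second;   // 32-bit writemask of the second scratch write
};

// Each 64-bit channel occupies two 32-bit channels after shuffling:
// X and Y land in the first register, Z and W in the second one.
inline ScratchMasks split_dvec4_writemask(uint8_t mask64)
{
   ScratchMasks m = {0, 0};
   if (mask64 & MASK_X) m.first |= MASK_XY;
   if (mask64 & MASK_Y) m.first |= MASK_ZW;
   if (mask64 & MASK_Z) m.second |= MASK_XY;
   if (mask64 & MASK_W) m.second |= MASK_ZW;
   return m;
}
```

A mask of zero for either half means the corresponding scratch write can be skipped, which matches the `if (mask)` guards in the patch.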



[Mesa-dev] [PATCH v2 072/103] i965/vec4: don't copy propagate misaligned registers

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez 

Otherwise we would copy propagate partial reads or writes, which can affect
the result.

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 4f7b844..db2b317 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -354,6 +354,9 @@ try_copy_propagate(const struct gen_device_info *devinfo,
   return false;
}
 
+   if (inst->src[arg].offset % REG_SIZE || value.offset % REG_SIZE)
+  return false;
+
bool has_source_modifiers = value.negate || value.abs;
 
/* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
-- 
2.7.4
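
As a quick illustration of the alignment test added here, the following sketch assumes a 32-byte register; kRegSize and offsets_allow_copy_propagation() are hypothetical names and only restate the condition from the diff.

```cpp
#include <cassert>

// Bytes per register; mirrors REG_SIZE in the i965 backend (assumption
// stated for this sketch).
constexpr unsigned kRegSize = 32;

// Copy propagation is only attempted when both the instruction source and
// the propagated value start on a register boundary.
constexpr bool offsets_allow_copy_propagation(unsigned src_offset,
                                              unsigned value_offset)
{
   return src_offset % kRegSize == 0 && value_offset % kRegSize == 0;
}

static_assert(offsets_allow_copy_propagation(0, 0), "both aligned");
static_assert(!offsets_allow_copy_propagation(16, 0), "misaligned source");
```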



[Mesa-dev] [PATCH v2 046/103] i965/vec4: add a SIMD lowering pass

2016-10-11 Thread Iago Toral Quiroga
Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, which is why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain hardware bugs and limitations
that we need to work around by splitting the instructions so we only
write to 1 register at a time. This patch implements a SIMD splitting
pass similar to the one in the scalar backend.

Because we only use double-precision instructions in Align16 mode
in gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64)
the pass should be a no-op on any other generation.

For now the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2 registers.
This affects double-precision instructions reading uniforms, for
example. Later patches will extend the lowering pass adding a few
more cases.

v2:
 - Move the simd lowering pass after the main optimization loop and
   run copy-propagation and dce if it reports progress (Curro)
 - Compute number of registers written instead of fixing it to 1 (Iago)
 - Use group from backend_instruction (Iago)
 - Drop assertion that checked that we only split 8-wide instructions
   into 4-wide. (Curro)
 - Don't assume that instructions can only be 8-wide, we might want
   to use 16-wide instructions in the future too (Curro)
 - Wrap gen7 workarounds in a conditional to ease adding workarounds
   for other gens in the future (Curro)
 - Handle dst/src overlap hazard (Curro)
 - Use the horiz_offset() helper to simplify the implementation (Curro)
 - Drop the assertion that checks that each split instruction writes
   exactly one register (Curro)
 - Use the copy constructor to generate split instructions with all
   the relevant fields initialized to the values in the original
   instruction instead of copying only a handful of them manually (Curro)
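
The gen7 rule described above (an instruction that writes 2 registers must also read 2 registers from every source) can be sketched as a standalone function. This is a simplified model, not the pass itself: lowered_simd_width() takes plain integers, `src_bytes_read` is a hypothetical list of the bytes read by each valid source, and a 32-byte register is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr unsigned kRegSize = 32;   // bytes per register (assumption)

// Returns the execution width an instruction should be split to.
// A result equal to exec_size means no splitting is needed.
inline unsigned lowered_simd_width(int gen, unsigned exec_size,
                                   unsigned size_written,
                                   const std::vector<unsigned> &src_bytes_read)
{
   unsigned width = std::min(16u, exec_size);

   // Only gen7 runs double-precision instructions in Align16 mode.
   if (gen == 7 && size_written > kRegSize) {
      for (unsigned bytes : src_bytes_read) {
         // The destination spans two registers but this source does not:
         // split so each half writes a single register.
         if (bytes <= kRegSize)
            width = std::min(width, 4u);
      }
   }
   return width;
}
```

For example, an 8-wide double-precision instruction that writes 64 bytes and reads a uniform source (32 bytes) is lowered to width 4, while the same instruction with two 64-byte sources is left at width 8.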
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 156 +
 src/mesa/drivers/dri/i965/brw_vec4.h   |   2 +
 2 files changed, 158 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 34cab04..490cbae 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1977,6 +1977,157 @@ vec4_visitor::convert_to_hw_regs()
}
 }
 
+/**
+ * Get the closest native SIMD width supported by the hardware for instruction
+ * \p inst.  The instruction will be left untouched by
+ * vec4_visitor::lower_simd_width() if the returned value matches the
+ * instruction's original execution size.
+ */
+static unsigned
+get_lowered_simd_width(const struct gen_device_info *devinfo,
+   const vec4_instruction *inst)
+{
+   unsigned lowered_width = MIN2(16, inst->exec_size);
+
+   /* We need to split some cases of double-precision instructions that write
+* 2 registers. We only need to care about this in gen7 because that is the
+* only hardware that implements fp64 in Align16.
+*/
+   if (devinfo->gen == 7 && inst->size_written > REG_SIZE) {
+  /* HSW PRM, 3D Media GPGPU Engine, Region Alignment Rules for Direct
+   * Register Addressing:
+   *
+   *"When destination spans two registers, the source MUST span two
+   * registers."
+   */
+  for (unsigned i = 0; i < 3; i++) {
+ if (inst->src[i].file == BAD_FILE)
+continue;
+ if (inst->size_read(i) <= REG_SIZE)
+lowered_width = MIN2(lowered_width, 4);
+  }
+   }
+
+   return lowered_width;
+}
+
+static bool
+dst_src_regions_overlap(vec4_instruction *inst)
+{
+   if (inst->size_written == 0)
+  return false;
+
+   unsigned dst_start = inst->dst.offset;
+   unsigned dst_end = dst_start + inst->size_written - 1;
+   for (int i = 0; i < 3; i++) {
+  if (inst->src[i].file == BAD_FILE)
+ continue;
+
+  if (inst->dst.file != inst->src[i].file ||
+  inst->dst.nr != inst->src[i].nr)
+ continue;
+
+  unsigned src_start = inst->src[i].offset;
+  unsigned src_end = src_start + inst->size_read(i) - 1;
+
+  if ((dst_start >= src_start && dst_start <= src_end) ||
+  (dst_end >= src_start && dst_end <= src_end) ||
+  (dst_start <= src_start && dst_end >= src_end)) {
+ return true;
+  }
+   }
+
+   return false;
+}
+
+bool
+vec4_visitor::lower_simd_width()
+{
+   bool progress = false;
+
+   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
+  const unsigned lowered_width = get_lowered_simd_width(devinfo, inst);
+  assert(lowered_width <= inst->exec_size);
+  if (lowered_width == inst->exec_size)
+ continue;
+
+  /* We need to deal with source / destination overlaps when splitting.
+   * The hardware supports reading from and writing to the same register
+   * in the same instruction, but we need to be caref

[Mesa-dev] [PATCH v2 025/103] i965/vec4: fix indentation in get_nir_src()

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 860ec51..c825aeb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -308,8 +308,8 @@ vec4_visitor::get_nir_src(const nir_src &src, enum brw_reg_type type,
   reg = nir_ssa_values[src.ssa->index];
}
else {
- reg = dst_reg_for_nir_reg(this, src.reg.reg, src.reg.base_offset,
-   src.reg.indirect);
+  reg = dst_reg_for_nir_reg(this, src.reg.reg, src.reg.base_offset,
+src.reg.indirect);
}
 
reg = retype(reg, type);
-- 
2.7.4



[Mesa-dev] [PATCH v2 040/103] i965/vec4: fix regs_read() for doubles

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 75a8473..2bde628 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -228,8 +228,8 @@ vec4_instruction::size_read(unsigned arg) const
case UNIFORM:
   return 4 * type_sz(src[arg].type);
default:
-  /* XXX - Represent actual execution size and vertical stride. */
-  return 8 * type_sz(src[arg].type);
+  /* XXX - Represent actual vertical stride. */
+  return exec_size * type_sz(src[arg].type);
}
 }
 
-- 
2.7.4
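
The effect of using exec_size instead of a fixed 8 can be seen with a few numbers. The sketch below models only the two cases visible in the diff context; File and size_read_bytes() are hypothetical names, not the real vec4_instruction API.

```cpp
#include <cassert>

// Hypothetical reduction of the register files to the two cases shown in
// the diff context.
enum class File { Uniform, Other };

// Bytes read from one source: uniforms read 4 channels, everything else
// reads one channel per execution lane.
constexpr unsigned size_read_bytes(File file, unsigned exec_size,
                                   unsigned type_size)
{
   return file == File::Uniform ? 4 * type_size : exec_size * type_size;
}

// An 8-wide double source spans two 32-byte registers...
static_assert(size_read_bytes(File::Other, 8, 8) == 64, "two registers");
// ...but a 4-wide (split) double source only reads one.
static_assert(size_read_bytes(File::Other, 4, 8) == 32, "one register");
```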



[Mesa-dev] [PATCH v2 073/103] i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7af65ab..7f6acc3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -908,11 +908,14 @@ vec4_visitor::is_dep_ctrl_unsafe(const vec4_instruction *inst)
(reg.type == BRW_REGISTER_TYPE_UD || \
 reg.type == BRW_REGISTER_TYPE_D)
 
-   /* "When source or destination datatype is 64b or operation is integer DWord
+   /* From the Cherryview and Broadwell PRMs:
+*
+* "When source or destination datatype is 64b or operation is integer DWord
 * multiply, DepCtrl must not be used."
-* May apply to future SoCs as well.
+*
+* SKL PRMs don't include this restriction though.
 */
-   if (devinfo->is_cherryview) {
+   if (devinfo->gen == 8) {
   if (inst->opcode == BRW_OPCODE_MUL &&
  IS_DWORD(inst->src[0]) &&
  IS_DWORD(inst->src[1]))
-- 
2.7.4



[Mesa-dev] [PATCH v2 036/103] i965/vec4: add a helper function to create double immediates

2016-10-11 Thread Iago Toral Quiroga
Gen7 hardware does not support double immediates, so these need
to be moved in 32-bit chunks to a regular vgrf. Rather than
doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.

v2 (Curro):
  - Use swizzle() and writemask() helpers and make tmp const.

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 38 ++
 2 files changed, 40 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 0111966..6063bee 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -322,6 +322,8 @@ public:
void emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
   brw_reg_type single_type);
 
+   src_reg setup_imm_df(double v);
+
virtual void emit_nir_code();
virtual void nir_setup_uniforms();
virtual void nir_setup_system_value_intrinsic(nir_intrinsic_instr *instr);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 088ed13..4d5fa96 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1114,6 +1114,44 @@ vec4_visitor::emit_single_to_double(dst_reg dst, src_reg src, bool saturate,
inst->saturate = saturate;
 }
 
+src_reg
+vec4_visitor::setup_imm_df(double v)
+{
+   assert(devinfo->gen >= 7);
+
+   if (devinfo->gen >= 8)
+  return brw_imm_df(v);
+
+   /* gen7 does not support DF immediates */
+   union {
+  double d;
+  struct {
+ uint32_t i1;
+ uint32_t i2;
+  };
+   } di;
+
+   di.d = v;
+
+   /* Write the low 32-bit of the constant to the X:UD channel and the
+* high 32-bit to the Y:UD channel to build the constant in a VGRF.
+* We have to do this twice (offset 0 and offset 1), since a DF VGRF takes
+* two SIMD8 registers in SIMD4x2 execution. Finally, return a swizzle
+*  so any access to the VGRF only reads the constant data in these
+* channels.
+*/
+   const dst_reg tmp =
+  retype(dst_reg(VGRF, alloc.allocate(2)), BRW_REGISTER_TYPE_UD);
+   for (int n = 0; n < 2; n++) {
+  emit(MOV(writemask(offset(tmp, n), WRITEMASK_X), brw_imm_ud(di.i1)))
+ ->force_writemask_all = true;
+  emit(MOV(writemask(offset(tmp, n), WRITEMASK_Y), brw_imm_ud(di.i2)))
+ ->force_writemask_all = true;
+   }
+
+   return swizzle(src_reg(retype(tmp, BRW_REGISTER_TYPE_DF)), BRW_SWIZZLE_);
+}
+
 void
 vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
 {
-- 
2.7.4
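
As a standalone illustration of the "32-bit chunks" idea in this patch, the
sketch below splits a double into its low and high 32-bit halves and joins
them again. It is not Mesa code: DoubleHalves, split_double() and
join_double() are hypothetical names, and it uses memcpy plus shifts rather
than the anonymous-struct union from the patch, so it does not depend on the
host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type (not part of Mesa): the two halves of a double.
struct DoubleHalves {
   uint32_t lo;   // bits 0..31, would go to the X:UD channel
   uint32_t hi;   // bits 32..63, would go to the Y:UD channel
};

DoubleHalves split_double(double v)
{
   static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");
   uint64_t bits;
   std::memcpy(&bits, &v, sizeof(bits));   // copy the raw bit pattern
   return { static_cast<uint32_t>(bits), static_cast<uint32_t>(bits >> 32) };
}

double join_double(DoubleHalves h)
{
   const uint64_t bits = (static_cast<uint64_t>(h.hi) << 32) | h.lo;
   double v;
   std::memcpy(&v, &bits, sizeof(v));
   return v;
}
```

For example, assuming IEEE 754 doubles, split_double(1.0) gives lo = 0 and
hi = 0x3FF00000, which are the two UD immediates a gen7 MOV pair would write.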



[Mesa-dev] [PATCH v2 023/103] i965/vec4/nir: implement double comparisons

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 37c3d7c..815082e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1399,10 +1399,25 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
case nir_op_flt:
case nir_op_fge:
case nir_op_feq:
-   case nir_op_fne:
-  emit(CMP(dst, op[0], op[1],
-   brw_conditional_for_nir_comparison(instr->op)));
+   case nir_op_fne: {
+  enum brw_conditional_mod conditional_mod =
+ brw_conditional_for_nir_comparison(instr->op);
+  if (nir_src_bit_size(instr->src[0].src) < 64) {
+ emit(CMP(dst, op[0], op[1], conditional_mod));
+  } else {
+ /* Produce a 32-bit boolean result from the DF comparison by selecting
+  * only the low 32-bit in each DF produced. Do this in a temporary
+  * so we can then move from there to the result using align16 again
+  * to honor the original writemask.
+  */
+ dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
+ emit(CMP(temp, op[0], op[1], conditional_mod));
+ dst_reg result = dst_reg(this, glsl_type::bvec4_type);
+ emit(VEC4_OPCODE_PICK_LOW_32BIT, result, src_reg(temp));
+ emit(MOV(dst, src_reg(result)));
+  }
   break;
+   }
 
case nir_op_ball_iequal2:
case nir_op_ball_iequal3:
-- 
2.7.4
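
The comment in this patch describes narrowing each 64-bit comparison result
to a 32-bit boolean by keeping only its low half. A toy model of that step
follows. It is a sketch, not driver code: compare_less_df() and
pick_low_32bit() are hypothetical names, and the model assumes a true
comparison fills the whole 64-bit channel with ones.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Wide4 = std::array<uint64_t, 4>;   // one 64-bit result per channel
using Bool4 = std::array<uint32_t, 4>;   // one 32-bit boolean per channel

// Assumed behaviour: a true comparison sets all 64 bits of the channel.
Wide4 compare_less_df(const std::array<double, 4> &a,
                      const std::array<double, 4> &b)
{
   Wide4 r{};
   for (unsigned c = 0; c < 4; c++)
      r[c] = a[c] < b[c] ? ~uint64_t(0) : 0;
   return r;
}

// Keep only the low 32 bits of every 64-bit channel.
Bool4 pick_low_32bit(const Wide4 &wide)
{
   Bool4 r{};
   for (unsigned c = 0; c < 4; c++)
      r[c] = static_cast<uint32_t>(wide[c]);
   return r;
}
```

The final MOV in the patch would then copy such a 32-bit vector to the real
destination so that the original writemask is honored.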



[Mesa-dev] [PATCH v2 049/103] i965/disasm: print NibCtrl for instructions with execsize < 8

2016-10-11 Thread Iago Toral Quiroga
v2 (Curro):
  - Print it also for execsize < 4.
  - QtrCtrl is still in effect, so print 2 * qtr_ctl + nib_ctl + 1
  - Do not read the nib ctl from the instruction in gen < 7,
the field only exists in gen7+.
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index 1d2a4d2..0c43217 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -1193,7 +1193,11 @@ qtr_ctrl(FILE *file, const struct gen_device_info *devinfo, brw_inst *inst)
int qtr_ctl = brw_inst_qtr_control(devinfo, inst);
int exec_size = 1 << brw_inst_exec_size(devinfo, inst);
 
-   if (exec_size == 8) {
+   if (exec_size < 8) {
+  const unsigned nib_ctl = devinfo->gen < 7 ? 0 :
+   brw_inst_nib_control(devinfo, inst);
+  format(file, " %dN", qtr_ctl * 2 + nib_ctl + 1);
+   } else if (exec_size == 8) {
   switch (qtr_ctl) {
   case 0:
  string(file, " 1Q");
-- 
2.7.4
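
The label arithmetic from the v2 notes (2 * qtr_ctl + nib_ctl + 1, with the
nibble field ignored before gen7) can be tried in isolation. The function
below is a sketch with a hypothetical name, nibble_label(); it only mirrors
the formula in the hunk above and returns the string instead of printing it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the " <n>N" suffix for execsize < 8.
std::string nibble_label(int gen, unsigned qtr_ctl, unsigned nib_ctl_field)
{
   assert(qtr_ctl < 4 && nib_ctl_field < 2);
   // The NibCtrl field only exists on gen7+, so treat it as 0 before that.
   const unsigned nib_ctl = gen < 7 ? 0 : nib_ctl_field;
   return " " + std::to_string(qtr_ctl * 2 + nib_ctl + 1) + "N";
}
```

With qtr_ctl = 1 and nib_ctl = 1 on gen7 this yields " 4N"; the same inputs
on gen6 yield " 3N" because the nibble bit is not read.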



[Mesa-dev] [PATCH v2 080/103] i965/vec4: fix indentation in move_push_constants_to_pull_constants()

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 60 +-
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 75e47f9..0788ba2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -842,34 +842,34 @@ vec4_visitor::move_push_constants_to_pull_constants()
   pull_constant_loc[i / 4] = -1;
 
   if (i >= max_uniform_components) {
-const gl_constant_value **values = &stage_prog_data->param[i];
+ const gl_constant_value **values = &stage_prog_data->param[i];
 
-/* Try to find an existing copy of this uniform in the pull
- * constants if it was part of an array access already.
- */
-for (unsigned int j = 0; j < stage_prog_data->nr_pull_params; j += 4) {
-   int matches;
+ /* Try to find an existing copy of this uniform in the pull
+  * constants if it was part of an array access already.
+  */
+ for (unsigned int j = 0; j < stage_prog_data->nr_pull_params; j += 4) {
+int matches;
 
-   for (matches = 0; matches < 4; matches++) {
-  if (stage_prog_data->pull_param[j + matches] != values[matches])
- break;
-   }
+for (matches = 0; matches < 4; matches++) {
+   if (stage_prog_data->pull_param[j + matches] != values[matches])
+  break;
+}
 
-   if (matches == 4) {
-  pull_constant_loc[i / 4] = j / 4;
-  break;
-   }
-}
+if (matches == 4) {
+   pull_constant_loc[i / 4] = j / 4;
+   break;
+}
+ }
 
-if (pull_constant_loc[i / 4] == -1) {
-   assert(stage_prog_data->nr_pull_params % 4 == 0);
-   pull_constant_loc[i / 4] = stage_prog_data->nr_pull_params / 4;
+ if (pull_constant_loc[i / 4] == -1) {
+assert(stage_prog_data->nr_pull_params % 4 == 0);
+pull_constant_loc[i / 4] = stage_prog_data->nr_pull_params / 4;
 
-   for (int j = 0; j < 4; j++) {
-  stage_prog_data->pull_param[stage_prog_data->nr_pull_params++] =
+for (int j = 0; j < 4; j++) {
+   stage_prog_data->pull_param[stage_prog_data->nr_pull_params++] =
   values[j];
-   }
-}
+}
+ }
   }
}
 
@@ -878,21 +878,21 @@ vec4_visitor::move_push_constants_to_pull_constants()
 */
foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
   for (int i = 0 ; i < 3; i++) {
-if (inst->src[i].file != UNIFORM ||
+ if (inst->src[i].file != UNIFORM ||
  pull_constant_loc[inst->src[i].nr] == -1)
-   continue;
+continue;
 
  int uniform = inst->src[i].nr;
 
-dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+ dst_reg temp = dst_reg(this, glsl_type::vec4_type);
 
-emit_pull_constant_load(block, inst, temp, inst->src[i],
-pull_constant_loc[uniform], src_reg());
+ emit_pull_constant_load(block, inst, temp, inst->src[i],
+ pull_constant_loc[uniform], src_reg());
 
-inst->src[i].file = temp.file;
+ inst->src[i].file = temp.file;
  inst->src[i].nr = temp.nr;
-inst->src[i].offset %= 16;
-inst->src[i].reladdr = NULL;
+ inst->src[i].offset %= 16;
+ inst->src[i].reladdr = NULL;
   }
}
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 062/103] i965/vec4: do not emit 64-bit MAD

2016-10-11 Thread Iago Toral Quiroga
The previous patch made sure that we do not generate MAD instructions
for NIR's 64-bit ffma, but nothing prevents i965 from producing MAD
instructions as a result of lowering or optimization passes. This patch
makes sure that any 64-bit MAD produced inside the driver after
translating from NIR is also converted to MUL+ADD before we generate
code.

v2:
  - Use a copy constructor to copy all relevant instruction fields from
the original mad into the add and mul instructions
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 44 ++
 src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
 2 files changed, 45 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 190581e..7af65ab 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2255,6 +2255,49 @@ vec4_visitor::scalarize_df()
return progress;
 }
 
+bool
+vec4_visitor::translate_64bit_mad_to_mul_add()
+{
+   bool progress = false;
+
+   foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
+  if (inst->opcode != BRW_OPCODE_MAD)
+ continue;
+
+  if (type_sz(inst->dst.type) != 8)
+ continue;
+
+  dst_reg mul_dst = dst_reg(this, glsl_type::dvec4_type);
+
+  /* Use the copy constructor so we copy all relevant instruction fields
+   * from the original mad into the add and mul instructions
+   */
+  vec4_instruction *mul = new(mem_ctx) vec4_instruction(*inst);
+  mul->opcode = BRW_OPCODE_MUL;
+  mul->dst = mul_dst;
+  mul->src[0] = inst->src[1];
+  mul->src[1] = inst->src[2];
+  mul->src[2].file = BAD_FILE;
+
+  vec4_instruction *add = new(mem_ctx) vec4_instruction(*inst);
+  add->opcode = BRW_OPCODE_ADD;
+  add->src[0] = src_reg(mul_dst);
+  add->src[1] = inst->src[0];
+  add->src[2].file = BAD_FILE;
+
+  inst->insert_before(block, mul);
+  inst->insert_before(block, add);
+  inst->remove(block);
+
+  progress = true;
+   }
+
+   if (progress)
+  invalidate_live_intervals();
+
+   return progress;
+}
+
 /* The align16 hardware can only do 32-bit swizzle channels, so we need to
  * translate the logical 64-bit swizzle channels that we use in the Vec4 IR
  * to 32-bit swizzle channels in hardware registers.
@@ -2414,6 +2457,7 @@ vec4_visitor::run()
if (failed)
   return false;
 
+   OPT(translate_64bit_mad_to_mul_add);
OPT(scalarize_df);
 
setup_payload();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 7e51c41..0af55c5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -164,6 +164,7 @@ public:
 
bool lower_simd_width();
bool scalarize_df();
+   bool translate_64bit_mad_to_mul_add();
void apply_logical_swizzle(struct brw_reg *hw_reg,
   vec4_instruction *inst, int arg);
 
-- 
2.7.4
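
To make the operand shuffling in translate_64bit_mad_to_mul_add() easier to
follow, here is a toy version of the rewrite on a made-up instruction list.
Everything in it (Op, Inst, lower_64bit_mad(), plain integers as register
numbers) is hypothetical; only the operand order, MUL(src1, src2) followed
by ADD(mul_dst, src0), is taken from the patch.

```cpp
#include <cassert>
#include <vector>

enum class Op { MAD, MUL, ADD };

// Toy instruction: integers stand in for registers, -1 means "unused".
struct Inst {
   Op op;
   unsigned type_size;   // bytes per component, 8 for doubles
   int dst;
   int src[3];
};

std::vector<Inst> lower_64bit_mad(const std::vector<Inst> &prog,
                                  int first_free_reg)
{
   std::vector<Inst> out;
   int next_reg = first_free_reg;

   for (const Inst &inst : prog) {
      if (inst.op != Op::MAD || inst.type_size != 8) {
         out.push_back(inst);   // leave everything else untouched
         continue;
      }

      const int mul_dst = next_reg++;

      Inst mul = inst;          // copy so all other fields carry over
      mul.op = Op::MUL;
      mul.dst = mul_dst;
      mul.src[0] = inst.src[1];
      mul.src[1] = inst.src[2];
      mul.src[2] = -1;

      Inst add = inst;          // keeps the original destination
      add.op = Op::ADD;
      add.src[0] = mul_dst;
      add.src[1] = inst.src[0];
      add.src[2] = -1;

      out.push_back(mul);
      out.push_back(add);
   }
   return out;
}
```

A 64-bit MAD with sources (1, 2, 3) and destination 10 becomes a MUL of
2 and 3 into a fresh temporary, then an ADD of that temporary and 1 into 10;
a 32-bit MAD passes through unchanged.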



[Mesa-dev] [PATCH v2 058/103] i965/vec4: fix pack_uniform_registers for doubles

2016-10-11 Thread Iago Toral Quiroga
We need to consider the fact that dvec3/4 require two vec4 slots.
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e5391b9..b79fd5e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -610,13 +610,20 @@ vec4_visitor::pack_uniform_registers()
  if (inst->src[i].file != UNIFORM)
 continue;
 
+ assert(type_sz(inst->src[i].type) % 4 == 0);
+ unsigned channel_size = type_sz(inst->src[i].type) / 4;
+
  int reg = inst->src[i].nr;
  for (int c = 0; c < 4; c++) {
 if (!(readmask & (1 << c)))
continue;
 
-chans_used[reg] = MAX2(chans_used[reg],
-   BRW_GET_SWZ(inst->src[i].swizzle, c) + 1);
+unsigned channel = BRW_GET_SWZ(inst->src[i].swizzle, c) + 1;
+unsigned used = MAX2(chans_used[reg], channel * channel_size);
+if (used <= 4)
+   chans_used[reg] = used;
+else
+   chans_used[reg + 1] = used - 4;
  }
   }
 
-- 
2.7.4
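
The "two vec4 slots" remark can be checked with a little arithmetic: a
64-bit component occupies two 32-bit channels, so the third and fourth
components of a dvec3/dvec4 spill into the next vec4 slot. The sketch below
is a simplified model, not the bookkeeping from pack_uniform_registers();
SlotUsage and channels_needed() are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result type: 32-bit channels needed in two consecutive slots.
struct SlotUsage {
   unsigned first;    // channels used in the uniform's own vec4 slot
   unsigned second;   // channels used in the following vec4 slot
};

SlotUsage channels_needed(unsigned swizzle_component, unsigned type_size_bytes)
{
   assert(swizzle_component < 4);
   assert(type_size_bytes % 4 == 0);

   // 1 for 32-bit types, 2 for 64-bit types.
   const unsigned channel_size = type_size_bytes / 4;
   const unsigned used = (swizzle_component + 1) * channel_size;

   if (used <= 4)
      return { used, 0 };
   return { 4, used - 4 };
}
```

Reading the .z component of a double vector (component 2, 8 bytes) needs
six 32-bit channels in total: four in the first slot and two in the second.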



[Mesa-dev] [PATCH v2 088/103] i965/vec4/tcs: fix outputs for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 31 --
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index f62dc9c..914396c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
@@ -443,13 +443,40 @@ vec4_tcs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
 
   unsigned first_component = nir_intrinsic_component(instr);
   if (first_component) {
+ if (nir_src_bit_size(instr->src[0]) == 64)
+first_component /= 2;
  assert(swiz == BRW_SWIZZLE_XYZW);
  swiz = BRW_SWZ_COMP_OUTPUT(first_component);
  mask = mask << first_component;
   }
 
-  emit_urb_write(swizzle(value, swiz), mask,
- imm_offset, indirect_offset);
+  if (nir_src_bit_size(instr->src[0]) == 64) {
+ /* For 64-bit data we need to shuffle the data before we write and
+  * emit two messages. Also, since each channel is twice as large we
+  * need to fix the writemask in each 32-bit message to account for it.
+  */
+ value = swizzle(retype(value, BRW_REGISTER_TYPE_DF), swiz);
+ dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type);
+ shuffle_64bit_data(shuffled, value, true);
+ src_reg shuffled_float = src_reg(retype(shuffled, BRW_REGISTER_TYPE_F));
+
+ for (int n = 0; n < 2; n++) {
+unsigned fixed_mask = 0;
+if (mask & WRITEMASK_X)
+   fixed_mask |= WRITEMASK_XY;
+if (mask & WRITEMASK_Y)
+   fixed_mask |= WRITEMASK_ZW;
+emit_urb_write(shuffled_float, fixed_mask,
+   imm_offset, indirect_offset);
+
+shuffled_float = offset(shuffled_float, 1);
+mask >>= 2;
+imm_offset++;
+ }
+  } else {
+ emit_urb_write(swizzle(value, swiz), mask,
+imm_offset, indirect_offset);
+  }
   break;
}
 
-- 
2.7.4
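
The writemask fix-up in the loop above can be exercised on its own. The
sketch below reproduces just that part: each 64-bit channel in the logical
mask expands to a pair of 32-bit channels, and the mask is shifted by two
between the two URB writes. split_64bit_writemask() is a hypothetical name;
the WRITEMASK_* constants are redefined locally with the usual X=1, Y=2,
Z=4, W=8 bit layout so the example is self-contained.

```cpp
#include <array>
#include <cassert>

constexpr unsigned WRITEMASK_X = 0x1;
constexpr unsigned WRITEMASK_Y = 0x2;
constexpr unsigned WRITEMASK_XY = 0x3;
constexpr unsigned WRITEMASK_ZW = 0xc;

// Returns the 32-bit writemasks for the first and second URB write.
std::array<unsigned, 2> split_64bit_writemask(unsigned mask)
{
   assert(mask <= 0xf);
   std::array<unsigned, 2> fixed{};

   for (unsigned n = 0; n < 2; n++) {
      if (mask & WRITEMASK_X)
         fixed[n] |= WRITEMASK_XY;   // first double of this message
      if (mask & WRITEMASK_Y)
         fixed[n] |= WRITEMASK_ZW;   // second double of this message
      mask >>= 2;                    // move on to the logical Z/W channels
   }
   return fixed;
}
```

A logical mask of W only (0x8) therefore writes nothing in the first
message and ZW (0xc) in the second one.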



[Mesa-dev] [PATCH v2 056/103] i965/disasm: fix subreg for dst in Align16 mode

2016-10-11 Thread Iago Toral Quiroga
There is a single bit for this, so it is a binary 0 or 1 meaning
offset 0B or 16B respectively.

v2:
  - Since brw_inst_dst_da16_subreg_nr() is known to be 1, remove it
from the expression (Curro)

Reviewed-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_disasm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c b/src/mesa/drivers/dri/i965/brw_disasm.c
index 0c43217..e439ec4 100644
--- a/src/mesa/drivers/dri/i965/brw_disasm.c
+++ b/src/mesa/drivers/dri/i965/brw_disasm.c
@@ -772,7 +772,7 @@ dest(FILE *file, const struct gen_device_info *devinfo, brw_inst *inst)
  if (err == -1)
 return 0;
  if (brw_inst_dst_da16_subreg_nr(devinfo, inst))
-format(file, ".%"PRIu64, brw_inst_dst_da16_subreg_nr(devinfo, inst) /
+format(file, ".%u", 16 /
reg_type_size[brw_inst_dst_reg_type(devinfo, inst)]);
  string(file, "<1>");
  err |= control(file, "writemask", writemask,
-- 
2.7.4
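
Since the Align16 destination subregister field is a single bit meaning
0B or 16B, the printed element index is simply 16 divided by the element
size when the bit is set. A small sketch of that computation, with a
hypothetical function name and the type size passed in directly instead of
being looked up in reg_type_size[]:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the ".<n>" suffix printed after the register number.
std::string da16_dst_subreg_suffix(bool subreg_bit, unsigned type_size)
{
   assert(type_size > 0 && 16 % type_size == 0);
   if (!subreg_bit)
      return "";                              // offset 0B: nothing printed
   return "." + std::to_string(16 / type_size);   // offset 16B in elements
}
```

For a 4-byte type the set bit prints ".4", and for an 8-byte type it prints
".2".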



[Mesa-dev] [PATCH v2 081/103] i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 0788ba2..b0bc2d5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -884,10 +884,24 @@ vec4_visitor::move_push_constants_to_pull_constants()
 
  int uniform = inst->src[i].nr;
 
- dst_reg temp = dst_reg(this, glsl_type::vec4_type);
+ dst_reg temp;
+ if (type_sz(inst->src[i].type) != 8) {
+temp = dst_reg(this, glsl_type::vec4_type);
+emit_pull_constant_load(block, inst, temp, inst->src[i],
+pull_constant_loc[uniform], src_reg());
+ } else {
+dst_reg shuffled = dst_reg(this, glsl_type::dvec4_type);
+dst_reg shuffled_float = retype(shuffled, BRW_REGISTER_TYPE_F);
+
+emit_pull_constant_load(block, inst, shuffled_float, inst->src[i],
+pull_constant_loc[uniform], src_reg());
+emit_pull_constant_load(block, inst, offset(shuffled_float, 1),
+offset(inst->src[i], 1),
+pull_constant_loc[uniform], src_reg());
 
- emit_pull_constant_load(block, inst, temp, inst->src[i],
- pull_constant_loc[uniform], src_reg());
+temp = dst_reg(this, glsl_type::dvec4_type);
+shuffle_64bit_data(temp, src_reg(shuffled), false, block, inst);
+ }
 
  inst->src[i].file = temp.file;
  inst->src[i].nr = temp.nr;
-- 
2.7.4



[Mesa-dev] [PATCH v2 045/103] i965: move the group field from fs_inst to backend_instruction.

2016-10-11 Thread Iago Toral Quiroga
Just like the exec_size, we are going to need this in the vec4 backend
when we implement a simd splitting pass.
---
 src/mesa/drivers/dri/i965/brw_ir_fs.h  | 9 -
 src/mesa/drivers/dri/i965/brw_shader.h | 9 +
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 +
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h b/src/mesa/drivers/dri/i965/brw_ir_fs.h
index c569bd4..cad3712 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h
@@ -367,15 +367,6 @@ public:
 
uint8_t sources; /**< Number of fs_reg sources. */
 
-   /**
-* Channel group from the hardware execution and predication mask that
-* should be applied to the instruction.  The subset of channel enable
-* signals (calculated from the EU control flow and predication state)
-* given by [group, group + exec_size) will be used to mask GRF writes and
-* any other side effects of the instruction.
-*/
-   uint8_t group;
-
bool eot:1;
bool pi_noperspective:1;   /**< Pixel interpolator noperspective flag */
 };
diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
index aca26dc..0c8f296 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.h
+++ b/src/mesa/drivers/dri/i965/brw_shader.h
@@ -140,6 +140,15 @@ struct backend_instruction {
 */
uint8_t exec_size;
 
+   /**
+* Channel group from the hardware execution and predication mask that
+* should be applied to the instruction.  The subset of channel enable
+* signals (calculated from the EU control flow and predication state)
+* given by [group, group + exec_size) will be used to mask GRF writes and
+* any other side effects of the instruction.
+*/
+   uint8_t group;
+
uint32_t offset; /**< spill/unspill offset or texture offset bitfield */
uint8_t mlen; /**< SEND message length */
int8_t base_mrf; /**< First MRF in the SEND message, if mlen is nonzero. */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 4e7515c..75c60a0 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -55,6 +55,7 @@ vec4_instruction::vec4_instruction(enum opcode opcode, const dst_reg &dst,
this->base_mrf = 0;
this->offset = 0;
this->exec_size = 8;
+   this->group = 0;
this->size_written = (dst.file == BAD_FILE ?
  0 : this->exec_size * type_sz(dst.type));
this->annotation = NULL;
-- 
2.7.4
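
The doc comment moved by this patch describes the half-open channel range
[group, group + exec_size). One way to picture it is as a window over a
32-bit execution mask. The sketch below is only a mental model with a
hypothetical function name; the real masking is done by the hardware, not
by code like this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: keep only the channel-enable bits that fall inside
// [group, group + exec_size) of a 32-bit execution mask.
uint32_t channel_enable_window(uint32_t exec_mask, unsigned group,
                               unsigned exec_size)
{
   assert(exec_size >= 1 && group + exec_size <= 32);

   // Avoid shifting a 32-bit value by 32, which is undefined in C++.
   const uint32_t window = exec_size == 32
      ? 0xFFFFFFFFu
      : ((1u << exec_size) - 1u) << group;

   return exec_mask & window;
}
```

With group = 8 and exec_size = 8, only bits 8..15 of the mask survive, which
corresponds to the second quarter of a SIMD32 dispatch.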



[Mesa-dev] [PATCH v2 031/103] i965/vec4: implement hardware workaround for align16 double to float conversion

2016-10-11 Thread Iago Toral Quiroga
From the BDW PRM, Workarounds chapter:

   "DF->f format conversion for Align16 has wrong emask calculation when
source is immediate."

So detect the case and move the immediate source to a VGRF before we attempt
the conversion.

Notice that Broadwell and later are strictly scalar at the moment though, so
this is not really necessary.
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 94d0161..0170d21 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1077,6 +1077,17 @@ void
 vec4_visitor::emit_double_to_single(dst_reg dst, src_reg src, bool saturate,
 brw_reg_type single_type)
 {
+   /* BDW PRM vol 15 - workarounds:
+* DF->f format conversion for Align16 has wrong emask calculation when
+* source is immediate.
+*/
+   if (devinfo->gen == 8 && single_type == BRW_REGISTER_TYPE_F &&
+   src.file == BRW_IMMEDIATE_VALUE) {
+  dst_reg fixed_src = dst_reg(this, glsl_type::dvec4_type);
+  emit(MOV(fixed_src, src));
+  src = src_reg(fixed_src);
+   }
+
dst_reg temp = dst_reg(this, glsl_type::dvec4_type);
emit(MOV(temp, src));
 
-- 
2.7.4



[Mesa-dev] [PATCH v2 054/103] i965/vec4: translate 64-bit swizzles to 32-bit

2016-10-11 Thread Iago Toral Quiroga
The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we let
the bulk of the vec4 backend ignore this fact and we fix the swizzles right
at codegen time.

At the moment the pass only needs to handle single value swizzles thanks to
the scalarization pass that runs before it.

Notice that this only works for X/Y swizzles. We will add support for Z/W
swizzles in the next patch, since they need a bit more work.

v2 (Sam):
  - Do not expand swizzle of 64-bit immediate values.

v3:
  - Do this after translation to hardware registers instead of doing it right
before so we don't need the force_vstride0 flag (Curro).
  - Squashed patch that included FIXED_GRF in the list of register files that
need this translation (Iago).
  - Remove swizzle assignments for VGRF and UNIFORM files in
convert_to_hw_regs(), they will be set by apply_logical_swizzle() (Iago).

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/mesa/drivers/dri/i965/brw_vec4.cpp | 49 +++---
 src/mesa/drivers/dri/i965/brw_vec4.h   |  2 ++
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index b15fcee..b37dd59 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1891,7 +1891,6 @@ vec4_visitor::convert_to_hw_regs()
 unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
 reg = byte_offset(brw_vecn_grf(width, src.nr, 0), src.offset);
 reg.type = src.type;
-reg.swizzle = src.swizzle;
 reg.abs = src.abs;
 reg.negate = src.negate;
 break;
@@ -1905,7 +1904,6 @@ vec4_visitor::convert_to_hw_regs()
  src.offset),
  0, width, 1);
 reg.type = src.type;
-reg.swizzle = src.swizzle;
 reg.abs = src.abs;
 reg.negate = src.negate;
 
@@ -1914,8 +1912,13 @@ vec4_visitor::convert_to_hw_regs()
 break;
  }
 
- case ARF:
  case FIXED_GRF:
+if (type_sz(src.type) == 8) {
+   reg = src.as_brw_reg();
+   break;
+}
+/* fallthrough */
+ case ARF:
  case IMM:
 continue;
 
@@ -1929,6 +1932,7 @@ vec4_visitor::convert_to_hw_regs()
 unreachable("not reached");
  }
 
+ apply_logical_swizzle(&reg, inst, i);
  src = reg;
   }
 
@@ -2226,6 +2230,45 @@ vec4_visitor::scalarize_df()
return progress;
 }
 
+/* The align16 hardware can only do 32-bit swizzle channels, so we need to
+ * translate the logical 64-bit swizzle channels that we use in the Vec4 IR
+ * to 32-bit swizzle channels in hardware registers.
+ *
+ * @inst and @arg identify the original vec4 IR source operand we need to
+ * translate the swizzle for and @hw_reg is the hardware register where we
+ * will write the hardware swizzle to use.
+ *
+ * This pass assumes that Align16/DF instructions have been fully scalarized
+ * previously so there is just one 64-bit swizzle channel to deal with for any
+ * given Vec4 IR source.
+ */
+void
+vec4_visitor::apply_logical_swizzle(struct brw_reg *hw_reg,
+vec4_instruction *inst, int arg)
+{
+   src_reg reg = inst->src[arg];
+
+   if (reg.file == BAD_FILE || reg.file == BRW_IMMEDIATE_VALUE)
+  return;
+
+   /* If this is not a 64-bit operand or this is a scalar instruction we don't
+* need to do anything about the swizzles.
+*/
+   if(type_sz(reg.type) < 8 || is_align1_df(inst)) {
+  hw_reg->swizzle = reg.swizzle;
+  return;
+   }
+
+   /* Otherwise we should have scalarized the instruction, so take the single
+* 64-bit logical swizzle channel and translate it to 32-bit
+*/
+   assert(brw_is_single_value_swizzle(reg.swizzle));
+
+   unsigned swizzle = BRW_GET_SWZ(reg.swizzle, 0);
+   hw_reg->swizzle = BRW_SWIZZLE4(swizzle * 2, swizzle * 2 + 1,
+  swizzle * 2, swizzle * 2 + 1);
+}
+
 bool
 vec4_visitor::run()
 {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 03c7345..7e51c41 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -164,6 +164,8 @@ public:
 
bool lower_simd_width();
bool scalarize_df();
+   void apply_logical_swizzle(struct brw_reg *hw_reg,
+  vec4_instruction *inst, int arg);
 
vec4_instruction *emit(vec4_instruction *inst);
 
-- 
2.7.4
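
The swizzle translation at the end of apply_logical_swizzle() maps one
64-bit logical channel to the pair of 32-bit channels that hold it. A
minimal sketch of that mapping, using a plain array instead of the packed
BRW_SWIZZLE4 encoding; logical_to_hw_swizzle() is a hypothetical name and,
like this patch, it only covers the X and Y logical channels.

```cpp
#include <array>
#include <cassert>

// Channel indices: 0 = X, 1 = Y, 2 = Z, 3 = W.
std::array<unsigned, 4> logical_to_hw_swizzle(unsigned logical_channel)
{
   // Z/W are out of scope here, as in the patch this sketch follows.
   assert(logical_channel < 2);

   const unsigned lo = logical_channel * 2;       // low 32 bits of the double
   const unsigned hi = logical_channel * 2 + 1;   // high 32 bits of the double
   return { lo, hi, lo, hi };
}
```

Logical X becomes the 32-bit swizzle XYXY and logical Y becomes ZWZW.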


