From: Iago Toral Quiroga <ito...@igalia.com>

In fp64 we can produce code like this:

mov(16) vgrf2<2>:UD, vgrf3<2>:UD

That our simd lowering pass would typically split in instructions with a
width of 8, writing to two consecutive registers each. Unfortunately, gen7
hardware has a bug affecting execution masking and as a result, the
second GRF register write won't work properly. Curro verified this:

"The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1
 (where QtrCtrl is the 8-bit quarter of the execution mask signals
 specified in the instruction control fields) for the second
 compressed half of any single-precision instruction (for
 double-precision instructions it's hardwired to use NibCtrl+1),
 which means that the EU will apply the wrong execution controls
 for the second sequential GRF write if the number of channels per
 GRF is not exactly eight in single-precision mode (or four in
 double-float mode)."

In practice, this means that we cannot write more than one
consecutive GRF in a single instruction if the number of channels
per GRF is not exactly eight in single-precision mode (or four
in double-float mode).

This patch makes our SIMD lowering pass split this kind of instructions
so that the split versions only write to a single register. In the
example above this means that we split the write in 4 instructions, each
one writing 4 UD elements (width = 4) to a single register.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 2f473cc..caf88d1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4677,6 +4677,26 @@ static unsigned
 get_fpu_lowered_simd_width(const struct brw_device_info *devinfo,
                            const fs_inst *inst)
 {
+   /* Pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is
+    * the 8-bit quarter of the execution mask signals specified in the
+    * instruction control fields) for the second compressed half of any
+    * single-precision instruction (for double-precision instructions
+    * it's hardwired to use NibCtrl+1), which means that the EU will
+    * apply the wrong execution controls for the second sequential GRF
+    * write if the number of channels per GRF is not exactly eight in
+    * single-precision mode (or four in double-float mode).
+    *
+    * In this situation we calculate the maximum size of the split
+    * instructions so they only ever write to a single register.
+    */
+   unsigned type_size = type_sz(inst->dst.type);
+   unsigned channels_per_grf = inst->exec_size / inst->regs_written;
+   assert(channels_per_grf > 0);
+   if (devinfo->gen < 8 && inst->regs_written > 1 &&
+       channels_per_grf != REG_SIZE / type_size) {
+      return channels_per_grf;
+   }
+
    /* Maximum execution size representable in the instruction controls. */
    unsigned max_width = MIN2(32, inst->exec_size);
 
-- 
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to