From: Iago Toral Quiroga <ito...@igalia.com> In fp64 we can produce code like this:
mov(16) vgrf2<2>:UD, vgrf3<2>:UD That our simd lowering pass would typically split in instructions with a width of 8, writing to two consecutive registers each. Unfortunately, gen7 hardware has a bug affecting execution masking and as a result, the second GRF register write won't work properly. Curro verified this: "The problem is that pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is the 8-bit quarter of the execution mask signals specified in the instruction control fields) for the second compressed half of any single-precision instruction (for double-precision instructions it's hardwired to use NibCtrl+1), which means that the EU will apply the wrong execution controls for the second sequential GRF write if the number of channels per GRF is not exactly eight in single-precision mode (or four in double-float mode)." In practice, this means that we cannot write more than one consecutive GRF in a single instruction if the number of channels per GRF is not exactly eight in single-precision mode (or four in double-float mode). This patch makes our SIMD lowering pass split this kind of instructions so that the split versions only write to a single register. In the example above this means that we split the write in 4 instructions, each one writing 4 UD elements (width = 4) to a single register. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 2f473cc..caf88d1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -4677,6 +4677,26 @@ static unsigned get_fpu_lowered_simd_width(const struct brw_device_info *devinfo, const fs_inst *inst) { + /* Pre-Gen8 EUs are hardwired to use the QtrCtrl+1 (where QtrCtrl is + * the 8-bit quarter of the execution mask signals specified in the + * instruction control fields) for the second compressed half of any + * single-precision instruction (for double-precision instructions + * it's hardwired to use NibCtrl+1), which means that the EU will + * apply the wrong execution controls for the second sequential GRF + * write if the number of channels per GRF is not exactly eight in + * single-precision mode (or four in double-float mode). + * + * In this situation we calculate the maximum size of the split + * instructions so they only ever write to a single register. + */ + unsigned type_size = type_sz(inst->dst.type); + unsigned channels_per_grf = inst->exec_size / inst->regs_written; + assert(channels_per_grf > 0); + if (devinfo->gen < 8 && inst->regs_written > 1 && + channels_per_grf != REG_SIZE / type_size) { + return channels_per_grf; + } + /* Maximum execution size representable in the instruction controls. */ unsigned max_width = MIN2(32, inst->exec_size); -- 2.7.4 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev