We use Align16 mode for this, since it is more convenient, but the PRM for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region restrictions', Section '1. Special Restrictions':
"In Align16 mode, the channel selects and channel enables apply to a pair of half-floats, because these parameters are defined for DWord elements ONLY. This is applicable when both source and destination are half-floats." This means that we cannot select individual HF elements using swizzles like we do with 32-bit floats so we can't implement the required regioning for this. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in gen9 or gen10, where the Align16 implementation seems to work just fine. Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> --- src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp index d0cc4a6d231..4310f0b7fdc 100644 --- a/src/intel/compiler/brw_fs_generator.cpp +++ b/src/intel/compiler/brw_fs_generator.cpp @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst *inst, const uint32_t type_size = type_sz(src.type); if (inst->opcode == FS_OPCODE_DDY_FINE) { - /* produce accurate derivatives */ - if (devinfo->gen >= 11) { + /* produce accurate derivatives. We can do this easily in Align16 + * but this is not supported in gen11+ and gen8 Align16 swizzles + * for Half-Float operands work in units of 32-bit and always + * select pairs of consecutive half-float elements, so we can't use + * use it for this. + */ + if (devinfo->gen >= 11 || + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) { src = stride(src, 0, 2, 1); struct brw_reg src_0 = byte_offset(src, 0 * type_size); struct brw_reg src_2 = byte_offset(src, 2 * type_size); -- 2.17.1 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev