On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <ito...@igalia.com> wrote: > > We use ALign16 mode for this, since it is more convenient, but the PRM > for Broadwell states in Volume 3D Media GPGPU, Chapter 'Register region > restrictions', Section '1. Special Restrictions': > > "In Align16 mode, the channel selects and channel enables apply to a > pair of half-floats, because these parameters are defined for DWord > elements ONLY. This is applicable when both source and destination > are half-floats." > > This means that we cannot select individual HF elements using swizzles > like we do with 32-bit floats so we can't implement the required > regioning for this. > > Use the gen11 path for this instead, which uses Align1 mode. > > The restriction is not present in gen9 or gen10, where the Align16 > implementation seems to work just fine. > > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > --- > src/intel/compiler/brw_fs_generator.cpp | 10 ++++++++-- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/src/intel/compiler/brw_fs_generator.cpp > b/src/intel/compiler/brw_fs_generator.cpp > index d0cc4a6d231..4310f0b7fdc 100644 > --- a/src/intel/compiler/brw_fs_generator.cpp > +++ b/src/intel/compiler/brw_fs_generator.cpp > @@ -1339,8 +1339,14 @@ fs_generator::generate_ddy(const fs_inst *inst, > const uint32_t type_size = type_sz(src.type); > > if (inst->opcode == FS_OPCODE_DDY_FINE) { > - /* produce accurate derivatives */ > - if (devinfo->gen >= 11) { > + /* produce accurate derivatives. We can do this easily in Align16 > + * but this is not supported in gen11+ and gen8 Align16 swizzles > + * for Half-Float operands work in units of 32-bit and always > + * select pairs of consecutive half-float elements, so we can't use > + * use it for this. > + */
Let's break this comment up and include (or move) the BSpec text from the commit message here. I wouldn't mention the "this is not supported in gen11+" because it's slightly unclear whether you're talking about "accurate derivatives" or "Align16". How about: /* produce accurate derivatives. * * From the Broadwell PRM, Volume 7 (3D-Media-GPGPU) * "Register Region Restrictions", Section "1. Special Restrictions": * * "In Align16 mode, the channel selects and channel enables apply to * a pair of half-floats, because these parameters are defined for * DWord elements ONLY. This is applicable when both source and * destination are half-floats." * * So for half-float operations we use the Gen11+ Align1 path. CHV * inherits its FP16 hardware from SKL, so it is not affected. */ > + if (devinfo->gen >= 11 || > + (devinfo->gen == 8 && src.type == BRW_REGISTER_TYPE_HF)) { The docs are bad about telling us whether a BDW restriction applies to CHV as well, but in this case I suspect the answer is no. CHV seems to inherit its FP16 hw from SKL, which doesn't have the restriction as you say. So I suspect you want devinfo->is_broadwell instead of devinfo->gen == 8. With that (and confirmation that CHV isn't affected), this patch is Reviewed-by: Matt Turner <matts...@gmail.com> _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev