On Thu, 2017-01-19 at 15:00 -0800, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > On Wed, 2017-01-18 at 12:44 -0800, Francisco Jerez wrote: > > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > > > > > On Tue, 2017-01-17 at 13:26 -0800, Francisco Jerez wrote: > > > > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > > > > > > > > > From: "Juan A. Suarez Romero" <jasua...@igalia.com> > > > > > > > > > > > > When converting a DF to F, we set dst stride to 2, to > > > > > > fulfil > > > > > > alignment > > > > > > restrictions. > > > > > > > > > > > > But in IVB/BYT, this is not necessary, as each DF > > > > > > conversion > > > > > > already > > > > > > writes 2 F, the first one the real value, and the second > > > > > > one a > > > > > > 0. > > > > > > That > > > > > > is, IVB/BYT already set stride = 2 implicitly, so we must > > > > > > set > > > > > > it to > > > > > > 1 > > > > > > explicitly to avoid ending up with stride = 4. > > > > > > > > > > > > v2: > > > > > > - Fix typo (Matt) > > > > > > --- > > > > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 > > > > > > ++++++++++ > > > > > > 1 file changed, 10 insertions(+) > > > > > > > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > > > > index 45881e3ec95..487f2e90224 100644 > > > > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > > > > @@ -1629,6 +1629,16 @@ fs_generator::generate_code(const > > > > > > cfg_t > > > > > > *cfg, int dispatch_width) > > > > > > inst->src[i].type != BRW_REGISTER_TYPE_UD > > > > > > || > > > > > > !inst->src[i].negate); > > > > > > } > > > > > > + /* When converting from DF->F, we set destination's > > > > > > stride > > > > > > as 2 as an > > > > > > + * alignment requirement. But in IVB/BYT, each DF > > > > > > implicitly > > > > > > writes 2 F, > > > > > > + * being the first one the converted value. So we > > > > > > don't > > > > > > need > > > > > > to > > > > > > + * explicitly set stride 2, but 1. > > > > > > + */ > > > > > > + if (devinfo->gen == 7 && !devinfo->is_haswell && > > > > > > + type_sz(inst->src[0].type) > type_sz(inst- > > > > > > > dst.type)) { > > > > > > > > > > This should be exec_type_size(inst) rather than > > > > > type_sz(inst->src[0].type). > > > > > > > > > > > > > Actually it is going to be the same as this is going to catch > > > > only > > > > DF > > > > -> 32-bit data type conversions. But yeah, you are right. > > > > > > > > > > Are you sure? AFAICT the few lines of this patch were going to > > > be > > > executed for *all* instructions, you didn't even have a check for > > > the > > > execution type of the instruction being FP64... > > > > > > > Actually this is an error, I fixed it locally before sending this > > email > > by checking if it is a DF to 32-bit data type conversion and > > forgot to > > mention it. > > > > Yes, not checking for FP64 was an error, but applying this workaround > regardless of the instruction opcode seemed reasonable, that way > you'd > avoid having to take this into account for every combination of > double > precision opcode (apparently two in this series) and destination type > you end up using. >
Right. > > > > > > + assert(inst->dst.stride == 2 || inst->dst.stride > > > > > > == > > > > > > 1); > > > > > > + inst->dst.stride = 1; > > > > > > + } > > > > > > > > > > This is modifying the IR, please don't. > > > > > > > > > > > > > Right, I am going to do the change in brw_eu_emit.c inside the > > > > brw_MOV() function that Matt added in other patch and also when > > > > emitting the MOV indirect. > > > > > > > > > Also I don't think the above has the same semantics as a > > > > > destination > > > > > region with stride 2... AFAIUI IVB will just write garbage > > > > > into > > > > > the > > > > > odd > > > > > channels when the destination type is narrower than a DF > > > > > which is > > > > > really > > > > > not what a strided move is supposed to do. If that's the > > > > > case it > > > > > would > > > > > probably be safer to add a new F64TO32 virtual opcode for > > > > > type > > > > > conversions and assert(inst->dst.stride == 1) here... > > > > > > > > > > > > > I think adding a virtual opcode for this change is likely too > > > > much. > > > > I > > > > will keep the aforementioned changes. However, I don't have a > > > > strong > > > > opinion against it: if you prefer the virtual opcode, we can > > > > add it > > > > now > > > > or even later as a follow-up patch. > > > > > > > > > > TBH I'm not particularly fond of the idea of adding a virtual > > > conversion > > > opcode for IVB either, but if you don't, I hope you have some > > > other > > > plan > > > to deal with the subtle kind of breakage this could > > > cause... E.g. > > > some > > > future, seemingly unrelated and obviously correct (because it's > > > the > > > codegen pass that will be breaking a contract with this patch) > > > optimizer > > > improvement could rely on the (seemingly obvious) assumption that > > > strided destination regions don't corrupt any of the skipped > > > components > > > (honestly I'm not fully convinced that we don't rely on this > > > already), > > > which would probably work fine except on IVB for this > > > infrequently > > > used > > > feature, and even then the kind of failure mode is likely to be > > > non-deterministic and may not be caught by the simplest test- > > > cases. > > > > > > > I see the problem now, thanks for the explanation. However this > > problem > > affects all generations that support doubles as this is not > > specific to > > IVB. From the Broadwell PRM, 3D Media GPGPU, "Double Precision > > Float to > > Single Precision Float": > > > > "The upper Dword of every Qword will be written with > > undefined value > > when converting DF to F." > > > > Shoot, that sucks, but makes the matter even more compelling. > > > What makes special to IVB is to use stride 1 instead of 2 in the > > destination region. > > > > Then, the virtual conversion code makes more sense now to me > > because > > this way we identify the cases where we have corrupted skipped > > components. > > > > > Some alternatives to adding a virtual opcode: > > > > > > - Continue using regular moves with destination stride 2 as > > > you're > > > doing here, but call the contents of the skipped components > > > undefined > > > after the instruction is executed. Add some simple code to > > > the FS > > > validator pass to warn us if we ever break this assumption -- > > > The > > > most naive implementation of this could just make sure that > > > the > > > destination VGRF of any strided narrowing conversion is only > > > ever > > > written once on IVB (which could conceivably give false > > > positives > > > but > > > never false negatives). > > > > > > - Same as above, but instead of the validation pass, write a > > > legalization pass to fix up invalid strided conversions to use > > > a > > > temporary instead of the real destination which would then > > > copied > > > into the real register (this may have benefits of its own > > > because > > > the > > > same logic could potentially by used to get rid of an amount > > > of > > > cruft > > > from the compiler back-end intended to address execution type > > > alignment restrictions of the destination and sources of > > > various > > > instructions, which gets particularly annoying on CHV/BXT). > > > > > > > Actually, such legalization is done for d2x conversions in > > fs_visitor::lower_d2x(), then the added virtual opcode will > > identify > > the temporary conversion to avoid the problem with the assumption > > done > > by future optimizations that you mentioned. > > > > I have a couple of patches, one to add the virtual opcode for the > > conversions done by the d2x pass [0] and another virtual opcode to > > fix > > a couple of mov indirects we generated [1]. > > > > I don't think the FS_OPCODE_FROM_DOUBLE opcode as you defined it in > [0] > would substantially improves the situation, because you still use a > strided 32-bit destination so you'll be breaking the exact same > assumption MOV instructions were breaking. > > That said, because you have a legalization pass in place already, the > virtual opcode seems unnecessary to isolate the optimizer from this > hardware bogosity. Let's just make sure that the pass kicks in in > all > cases where this could be a problem, e.g. for any double-precision > instructions where the destination type is narrower than the > execution > type. > OK. Thanks! Sam > > As we are blocking the release and there more patches for review, > > another possibility is to land this patch [2] (replacing this > > patch) > > and then fix it later to not block Mesa 17.0 release more than > > needed. > > > > What do you think? > > > > Sam > > > > > > [0] https://github.com/Igalia/mesa/commit/2e98b6238996756ee489adcf9 > > 8377 > > d9314de2cc9 > > As we have {VEC4,FS}_OPCODE_FROM_DOUBLE, I renamed them later to > > SHADER_OPCODE_FROM_DOUBLE https://github.com/Igalia/mesa/commit/57e > > 2a2d > > 1fbbe2464a42f590589826fe46a341e9d > > > > [1] https://github.com/Igalia/mesa/commit/2307ace353423713c5a9d193f > > c200 > > 6ca738a841e > > > > [2] Notice this is a temporary patch before doing actual one: > > https://g > > ithub.com/samuelig/mesa/commit/61cd28175178a0c240be1492f9e535c30d90 > > 8c77 > > > > > > P.S: you can see changes [0] and [1] together in our v3 development > > branch: > > > > https://github.com/Igalia/mesa/tree/i965-fp64-gen7-ivb-scalar-vec4- > > rc3 > > > > > > Sam > > > > > > > > > > dst = brw_reg_from_fs_reg(compiler->devinfo, inst, > > > > > > &inst->dst, compressed); > > > > > > > > > > > > -- > > > > > > 2.11.0 > > > > > > > > > > > > _______________________________________________ > > > > > > mesa-dev mailing list > > > > > > mesa-dev@lists.freedesktop.org > > > > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: This is a digitally signed message part
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev