Jason Ekstrand <ja...@jlekstrand.net> writes: > On Sat, May 26, 2018 at 3:36 PM, Francisco Jerez <curroje...@riseup.net> > wrote: > >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> > On Sat, May 26, 2018 at 2:03 PM, Francisco Jerez <curroje...@riseup.net> >> > wrote: >> > >> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >> >> > On Sat, May 26, 2018 at 12:21 PM, Francisco Jerez < >> curroje...@riseup.net >> >> > >> >> > wrote: >> >> > >> >> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >> >> >> >> > On Sat, May 26, 2018 at 11:10 AM, Francisco Jerez < >> >> curroje...@riseup.net >> >> >> > >> >> >> > wrote: >> >> >> > >> >> >> >> Jason Ekstrand <ja...@jlekstrand.net> writes: >> >> >> >> >> >> >> >> > From: Francisco Jerez <curroje...@riseup.net> >> >> >> >> > >> >> >> >> > When we don't have PLN (gen4 and gen11+), we implement LINTERP >> as >> >> >> either >> >> >> >> > LINE+MAC or a pair of MADs. In both cases, the accumulator is >> >> written >> >> >> >> > by the first of the two instructions and read by the second. >> Even >> >> >> >> > though the accumulator value isn't actually ever used from a >> >> logical >> >> >> >> > instruction perspective, it is trashed so we need to make the >> >> >> scheduler >> >> >> >> > aware. Otherwise, the scheduler could end up re-ordering >> >> instructions >> >> >> >> > and putting a LINTERP between another an instruction which >> writes >> >> the >> >> >> >> > accumulator and another which tries to use that result. >> >> >> >> > >> >> >> >> > Cc: mesa-sta...@lists.freedesktop.org >> >> >> >> > --- >> >> >> >> > src/intel/compiler/brw_shader.cpp | 3 ++- >> >> >> >> > 1 file changed, 2 insertions(+), 1 deletion(-) >> >> >> >> > >> >> >> >> > diff --git a/src/intel/compiler/brw_shader.cpp >> >> >> b/src/intel/compiler/brw_ >> >> >> >> shader.cpp >> >> >> >> > index 141b64e..dfd2c5c 100644 >> >> >> >> > --- a/src/intel/compiler/brw_shader.cpp >> >> >> >> > +++ b/src/intel/compiler/brw_shader.cpp >> >> >> >> > @@ -984,7 +984,8 @@ backend_instruction::writes_ >> >> >> accumulator_implicitly(const >> >> >> >> struct gen_device_info >> >> >> >> > return writes_accumulator || >> >> >> >> > (devinfo->gen < 6 && >> >> >> >> > ((opcode >= BRW_OPCODE_ADD && opcode < >> BRW_OPCODE_NOP) >> >> || >> >> >> >> > - (opcode >= FS_OPCODE_DDX_COARSE && opcode <= >> >> >> >> FS_OPCODE_LINTERP))); >> >> >> >> > + (opcode >= FS_OPCODE_DDX_COARSE && opcode <= >> >> >> >> FS_OPCODE_LINTERP))) || >> >> >> >> > + (opcode == FS_OPCODE_LINTERP && !devinfo->has_pln); >> >> >> >> >> >> >> >> The devinfo->has_pln condition is technically inaccurate, because >> >> even >> >> >> >> SNB will fall back to the non-PLN path which overwrites the >> >> accumulator >> >> >> >> for certain valid IR, which yeah I'm aware is not *typically* >> >> >> >> encountered before this series, >> >> >> > >> >> >> > >> >> >> > I'm pretty sure it's impossible before this series because, without >> >> >> SIMD32, >> >> >> > the barycentric coordinates always start at g2 and get incremented >> by >> >> 2 >> >> >> > every time. The only other way to get something into the >> coordinates >> >> >> > source of the PLN is with a pixel interpolator message. For that, >> the >> >> >> > register allocator has a workaround to ensure that it's assigned an >> >> even >> >> >> > register on SNB. One of the early patches in my series replaces >> the >> >> >> broken >> >> >> > gen4.5-6 PLN fall-back (it didn't work in SIMD16 because it assumed >> >> the >> >> >> > wrong register layout for coordinates) with an assert and Jenkins >> is >> >> just >> >> >> > fine with the assert. >> >> >> > >> >> >> >> >> >> I know the conditions for the non-PLN fall-back are not typically >> >> >> encountered on Gen5-6, but it's still valid IR, so this >> implementation >> >> >> of writes_accumulator_implicitly() relies on the behavior of the >> >> >> register allocator, the NIR-to-i965 translation pass and the rest of >> the >> >> >> visitor being exactly what you expect in every possible codepath for >> >> >> correctness. >> >> > >> >> > >> >> > It's already relied on for correctness because the LINE+MAC fallback >> that >> >> > we had before is wrong in SIMD16 (see patch 33). >> >> > >> >> >> >> Yes, I had to do basically the same fix on my SIMD32 branch in order for >> >> it to pass piglit on Gen4-5. Still hardly an excuse to make more code >> >> rely on the same broken assumption which wasn't doing it before. >> >> >> > >> > I'm very confused by what you mean here. The old code assumed the >> LINE+MAC >> > layout register layout which is different from the PLN layout. This >> means >> > that the fallback is broken for g45-gen6 because the hardware supports >> PLN >> > so we use the PLN layout which arranges thing x1y1x2y2 but the LINE+MAC >> > fallback assumes x1x2y1y2. The reason why we need so much more code is >> > because we have to split it all the way down to 8-wide in order to handle >> > the PLN layout with LINE+MAC. Looking at the code in your branch, I have >> > no idea how it could possibly work correctly. >> > >> >> It worked correctly with the LINE/MAC path (and in fact it passed piglit >> with SIMD32 enabled on Gen4-5) in exactly the same way PATCH 33 of this >> series does, but it used a loop instead of the 3x hand-unrolled code of >> PATCH 33, and it took advantage of the two accumulator registers for >> better pipelining. > > > I dug around a bit more in your your branch and I think I know what's going > on now. In your patch "i965/fs: Fix LINTERP instruction codegen for > misaligned regs on pre-Gen7 hw", you changed the LINE+MAC emit code to use > a loop to generate 1, 2, or 4 8-wide instructions. This implicitly > switches the assumed layout of barycentric coordinates in the fall-back > code to the PLN convention (x1y1x2y2) and breaks SIMD16 on gen4 which lays > the barycentric coordinates out as x1x2y1y2. In "i965/fs: Fix Gen4-5 > interpolation setup for SIMD32", you silently change the barycentric > coordinate layout convention on gen4 to be the PLN convention and it gets > fixed again.
Yes, that's right, it relies on the other patch that makes the payload barycentric layout consistent across generations. And yes, I didn't spend any effort documenting the changes nor ordering things in a way that would allow bisection, since that branch wasn't meant to be reviewed nor merged upstream yet. > I'm also very confused about accumulators since the SNB documentation > says that we have 16 floats worth (two accumulators) and your branch > definitely does 4 8-wide LINE instructions followed by 4 8-wide MAC > instructions. I agree that taking advantage of the two accumulators > would be good, but the patch I see in your branch seems to be > incorrect. > No, my branch wasn't doing 4 8-wide instructions in a row, that would require four accumulators -- It was relying on SIMD lowering to split things up into 16-wide LINTERP instructions, just like your patch is. > All in all, changing the gen4 barycentric coordinate convention doesn't > seem like all that bad of a thing. It's a bit odd that we lay them out > differently on exactly one platform. I suppose, though, that we might be > able to generate SIMD16 MADs on gen11 if we started using the LINE > convention again. Not sure if it's worth it. > > >> I can send you the patch as a replacement for PATCH >> 33 if you care at all about performance of this corner case on Gen4-6... >> >> > My point is that the odd register fallback on g4x-gen6 was never used and >> > didn't work even if it was. >> > >> >> My point was that it doesn't matter for correctness of this patch >> whether another codepath in the compiler is buggy under approximately >> the same conditions. If you think there is some sort of benefit from >> committing knowingly broken code just so you can fix it in the next >> commit, feel free, but please drop my name from the author tag since I >> didn't intend it to. >> >> > --Jason >> > >> > >> >> > >> >> >> That wasn't the case in my original patch, nor in your >> >> >> series after PATCHv2 33 because this inaccuracy actually becomes a >> >> >> problem. Instead of introducing code which we know is dubiously >> >> >> correct, CC'ing mesa-stable, and then fixing it immediately, why >> don't >> >> >> we just do the obviously correct thing from the start? >> >> >> >> >> > >> >> > Sure, we can squash them together if you really want. But if you're >> >> > concerned about this restriction being valid in live code, then 33/53 >> >> also >> >> > needs to go to stable. I'm fine with that if you're unconvinced by my >> >> > argument that LINTERP with an odd coordinate never occurs. >> >> > >> >> > --Jason >> >> > >> >> > >> >> >> > >> >> >> >> but why make things more inaccurate than >> >> >> >> the original only to revert back to a devinfo->gen based check in >> >> >> >> PATCHv2 33? >> >> >> >> >> >> >> > >> >> >> > See above. >> >> >> > >> >> >> > >> >> >> >> I think I'd squash the last hunk of PATCHv2 33 into this one which >> >> would >> >> >> >> give you something functionally equivalent to v1 but updated to >> >> handle >> >> >> >> Gen11 correctly. >> >> >> >> >> >> >> >> > } >> >> >> >> > >> >> >> >> > bool >> >> >> >> > -- >> >> >> >> > 2.5.0.400.gff86faf >> >> >> >> >> >> >> >> >> >>
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev