fs: Mark LINTERP opcode as writing accumulator on platforms without PLN

Francisco Jerez Sat, 26 May 2018 18:50:14 -0700

Jason Ekstrand <ja...@jlekstrand.net> writes:

> On Sat, May 26, 2018 at 3:36 PM, Francisco Jerez <curroje...@riseup.net>
> wrote:
>
>> Jason Ekstrand <ja...@jlekstrand.net> writes:
>>
>> > On Sat, May 26, 2018 at 2:03 PM, Francisco Jerez <curroje...@riseup.net>
>> > wrote:
>> >
>> >> Jason Ekstrand <ja...@jlekstrand.net> writes:
>> >>
>> >> > On Sat, May 26, 2018 at 12:21 PM, Francisco Jerez <
>> curroje...@riseup.net
>> >> >
>> >> > wrote:
>> >> >
>> >> >> Jason Ekstrand <ja...@jlekstrand.net> writes:
>> >> >>
>> >> >> > On Sat, May 26, 2018 at 11:10 AM, Francisco Jerez <
>> >> curroje...@riseup.net
>> >> >> >
>> >> >> > wrote:
>> >> >> >
>> >> >> >> Jason Ekstrand <ja...@jlekstrand.net> writes:
>> >> >> >>
>> >> >> >> > From: Francisco Jerez <curroje...@riseup.net>
>> >> >> >> >
>> >> >> >> > When we don't have PLN (gen4 and gen11+), we implement LINTERP
>> as
>> >> >> either
>> >> >> >> > LINE+MAC or a pair of MADs.  In both cases, the accumulator is
>> >> written
>> >> >> >> > by the first of the two instructions and read by the second.
>> Even
>> >> >> >> > though the accumulator value isn't actually ever used from a
>> >> logical
>> >> >> >> > instruction perspective, it is trashed so we need to make the
>> >> >> scheduler
>> >> >> >> > aware.  Otherwise, the scheduler could end up re-ordering
>> >> instructions
>> >> >> >> > and putting a LINTERP between another an instruction which
>> writes
>> >> the
>> >> >> >> > accumulator and another which tries to use that result.
>> >> >> >> >
>> >> >> >> > Cc: mesa-sta...@lists.freedesktop.org
>> >> >> >> > ---
>> >> >> >> >  src/intel/compiler/brw_shader.cpp | 3 ++-
>> >> >> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >> >> >> >
>> >> >> >> > diff --git a/src/intel/compiler/brw_shader.cpp
>> >> >> b/src/intel/compiler/brw_
>> >> >> >> shader.cpp
>> >> >> >> > index 141b64e..dfd2c5c 100644
>> >> >> >> > --- a/src/intel/compiler/brw_shader.cpp
>> >> >> >> > +++ b/src/intel/compiler/brw_shader.cpp
>> >> >> >> > @@ -984,7 +984,8 @@ backend_instruction::writes_
>> >> >> accumulator_implicitly(const
>> >> >> >> struct gen_device_info
>> >> >> >> >     return writes_accumulator ||
>> >> >> >> >            (devinfo->gen < 6 &&
>> >> >> >> >             ((opcode >= BRW_OPCODE_ADD && opcode <
>> BRW_OPCODE_NOP)
>> >> ||
>> >> >> >> > -            (opcode >= FS_OPCODE_DDX_COARSE && opcode <=
>> >> >> >> FS_OPCODE_LINTERP)));
>> >> >> >> > +            (opcode >= FS_OPCODE_DDX_COARSE && opcode <=
>> >> >> >> FS_OPCODE_LINTERP))) ||
>> >> >> >> > +          (opcode == FS_OPCODE_LINTERP && !devinfo->has_pln);
>> >> >> >>
>> >> >> >> The devinfo->has_pln condition is technically inaccurate, because
>> >> even
>> >> >> >> SNB will fall back to the non-PLN path which overwrites the
>> >> accumulator
>> >> >> >> for certain valid IR, which yeah I'm aware is not *typically*
>> >> >> >> encountered before this series,
>> >> >> >
>> >> >> >
>> >> >> > I'm pretty sure it's impossible before this series because, without
>> >> >> SIMD32,
>> >> >> > the barycentric coordinates always start at g2 and get incremented
>> by
>> >> 2
>> >> >> > every time.  The only other way to get something into the
>> coordinates
>> >> >> > source of the PLN is with a pixel interpolator message.  For that,
>> the
>> >> >> > register allocator has a workaround to ensure that it's assigned an
>> >> even
>> >> >> > register on SNB.  One of the early patches in my series replaces
>> the
>> >> >> broken
>> >> >> > gen4.5-6 PLN fall-back (it didn't work in SIMD16 because it assumed
>> >> the
>> >> >> > wrong register layout for coordinates) with an assert and Jenkins
>> is
>> >> just
>> >> >> > fine with the assert.
>> >> >> >
>> >> >>
>> >> >> I know the conditions for the non-PLN fall-back are not typically
>> >> >> encountered on Gen5-6, but it's still valid IR, so this
>> implementation
>> >> >> of writes_accumulator_implicitly() relies on the behavior of the
>> >> >> register allocator, the NIR-to-i965 translation pass and the rest of
>> the
>> >> >> visitor being exactly what you expect in every possible codepath for
>> >> >> correctness.
>> >> >
>> >> >
>> >> > It's already relied on for correctness because the LINE+MAC fallback
>> that
>> >> > we had before is wrong in SIMD16 (see patch 33).
>> >> >
>> >>
>> >> Yes, I had to do basically the same fix on my SIMD32 branch in order for
>> >> it to pass piglit on Gen4-5.  Still hardly an excuse to make more code
>> >> rely on the same broken assumption which wasn't doing it before.
>> >>
>> >
>> > I'm very confused by what you mean here.  The old code assumed the
>> LINE+MAC
>> > layout register layout which is different from the PLN layout.  This
>> means
>> > that the fallback is broken for g45-gen6 because the hardware supports
>> PLN
>> > so we use the PLN layout which arranges thing x1y1x2y2 but the LINE+MAC
>> > fallback assumes x1x2y1y2.  The reason why we need so much more code is
>> > because we have to split it all the way down to 8-wide in order to handle
>> > the PLN layout with LINE+MAC.  Looking at the code in your branch, I have
>> > no idea how it could possibly work correctly.
>> >
>>
>> It worked correctly with the LINE/MAC path (and in fact it passed piglit
>> with SIMD32 enabled on Gen4-5) in exactly the same way PATCH 33 of this
>> series does, but it used a loop instead of the 3x hand-unrolled code of
>> PATCH 33, and it took advantage of the two accumulator registers for
>> better pipelining.
>
>
> I dug around a bit more in your your branch and I think I know what's going
> on now.  In your patch "i965/fs: Fix LINTERP instruction codegen for
> misaligned regs on pre-Gen7 hw", you changed the LINE+MAC emit code to use
> a loop to generate 1, 2, or 4 8-wide instructions.  This implicitly
> switches the assumed layout of barycentric coordinates in the fall-back
> code to the PLN convention (x1y1x2y2) and breaks SIMD16 on gen4 which lays
> the barycentric coordinates out as x1x2y1y2.  In "i965/fs: Fix Gen4-5
> interpolation setup for SIMD32", you silently change the barycentric
> coordinate layout convention on gen4 to be the PLN convention and it gets
> fixed again.


Yes, that's right, it relies on the other patch that makes the payload
barycentric layout consistent across generations.  And yes, I didn't
spend any effort documenting the changes nor ordering things in a way
that would allow bisection, since that branch wasn't meant to be
reviewed nor merged upstream yet.

> I'm also very confused about accumulators since the SNB documentation
> says that we have 16 floats worth (two accumulators) and your branch
> definitely does 4 8-wide LINE instructions followed by 4 8-wide MAC
> instructions.  I agree that taking advantage of the two accumulators
> would be good, but the patch I see in your branch seems to be
> incorrect.
>
No, my branch wasn't doing 4 8-wide instructions in a row, that would
require four accumulators -- It was relying on SIMD lowering to split
things up into 16-wide LINTERP instructions, just like your patch is.

> All in all, changing the gen4 barycentric coordinate convention doesn't
> seem like all that bad of a thing.  It's a bit odd that we lay them out
> differently on exactly one platform.  I suppose, though, that we might be
> able to generate SIMD16 MADs on gen11 if we started using the LINE
> convention again.  Not sure if it's worth it.
>
>
>> I can send you the patch as a replacement for PATCH
>> 33 if you care at all about performance of this corner case on Gen4-6...
>>
>> > My point is that the odd register fallback on g4x-gen6 was never used and
>> > didn't work even if it was.
>> >
>>
>> My point was that it doesn't matter for correctness of this patch
>> whether another codepath in the compiler is buggy under approximately
>> the same conditions.  If you think there is some sort of benefit from
>> committing knowingly broken code just so you can fix it in the next
>> commit, feel free, but please drop my name from the author tag since I
>> didn't intend it to.
>>
>> > --Jason
>> >
>> >
>> >> >
>> >> >> That wasn't the case in my original patch, nor in your
>> >> >> series after PATCHv2 33 because this inaccuracy actually becomes a
>> >> >> problem.  Instead of introducing code which we know is dubiously
>> >> >> correct, CC'ing mesa-stable, and then fixing it immediately, why
>> don't
>> >> >> we just do the obviously correct thing from the start?
>> >> >>
>> >> >
>> >> > Sure, we can squash them together if you really want.  But if you're
>> >> > concerned about this restriction being valid in live code, then 33/53
>> >> also
>> >> > needs to go to stable.  I'm fine with that if you're unconvinced by my
>> >> > argument that LINTERP with an odd coordinate never occurs.
>> >> >
>> >> > --Jason
>> >> >
>> >> >
>> >> >> >
>> >> >> >> but why make things more inaccurate than
>> >> >> >> the original only to revert back to a devinfo->gen based check in
>> >> >> >> PATCHv2 33?
>> >> >> >>
>> >> >> >
>> >> >> > See above.
>> >> >> >
>> >> >> >
>> >> >> >> I think I'd squash the last hunk of PATCHv2 33 into this one which
>> >> would
>> >> >> >> give you something functionally equivalent to v1 but updated to
>> >> handle
>> >> >> >> Gen11 correctly.
>> >> >> >>
>> >> >> >> >  }
>> >> >> >> >
>> >> >> >> >  bool
>> >> >> >> > --
>> >> >> >> > 2.5.0.400.gff86faf
>> >> >> >>
>> >> >>
>> >>
>>

signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v2 32/53] intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN

Reply via email to