Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
On 1/25/19 6:46 PM, Eric Anholt wrote: > Eduardo Lima Mitev writes: > >> ir3 compiler has an integer multiply-add instruction (IMAD_S24) >> that is used for different offset calculations in the backend. >> Since we intend to move some of these calculations to NIR, we need >> a new ALU op that can represent it. >> --- >> src/compiler/nir/nir_opcodes.py | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/src/compiler/nir/nir_opcodes.py >> b/src/compiler/nir/nir_opcodes.py >> index d32005846a6..b61845fd514 100644 >> --- a/src/compiler/nir/nir_opcodes.py >> +++ b/src/compiler/nir/nir_opcodes.py >> @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, >> src3_size, const_expr): >> [tuint, tuint, tuint], "", const_expr) >> >> triop("ffma", tfloat, "src0 * src1 + src2") >> +triop("imad", tint, "src0 * src1 + src2") > > The constant-folding expression should be corrected to what the backend > operation actually does, and you should probably call it imad24 or > something in that case. > Roger. Got similar feedback from Ilia. Will fix that. Eduardo ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
Eduardo Lima Mitev writes: > ir3 compiler has an integer multiply-add instruction (IMAD_S24) > that is used for different offset calculations in the backend. > Since we intend to move some of these calculations to NIR, we need > a new ALU op that can represent it. > --- > src/compiler/nir/nir_opcodes.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py > index d32005846a6..b61845fd514 100644 > --- a/src/compiler/nir/nir_opcodes.py > +++ b/src/compiler/nir/nir_opcodes.py > @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, > src3_size, const_expr): > [tuint, tuint, tuint], "", const_expr) > > triop("ffma", tfloat, "src0 * src1 + src2") > +triop("imad", tint, "src0 * src1 + src2") The constant-folding expression should be corrected to what the backend operation actually does, and you should probably call it imad24 or something in that case. signature.asc Description: PGP signature ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
The specification in NIR has to be exact. Otherwise it will constant-fold in a way that doesn't reflect what the hardware would do, leading to subtle bugs. On Fri, Jan 25, 2019 at 11:06 AM Eduardo Lima Mitev wrote: > > On 1/25/19 5:01 PM, Ilia Mirkin wrote: > > On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin wrote: > >> > >> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called > >> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely > >> had this, and I think maxwell+ has a variant of this implemented by > >> XMAD): > >> > >> (src0 * src1) & 0xff + src2 > > > > And of course even that's wrong... the 24th bit has to get > > sign-extended on that. Can express it with shifts. > > > > IMAD_S24 is what is currently used in > ir3_compiler_nir::get_image_offset(), so the pass doesn't change > anything regarding computations. > > I agree that the nir opcode should hint at the bit limit, so probably > nir_op_imad24. That is one of the open questions. > > Thanks, > > Eduardo > > ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
On 1/25/19 5:01 PM, Ilia Mirkin wrote: > On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin wrote: >> >> IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called >> imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely >> had this, and I think maxwell+ has a variant of this implemented by >> XMAD): >> >> (src0 * src1) & 0xff + src2 > > And of course even that's wrong... the 24th bit has to get > sign-extended on that. Can express it with shifts. > IMAD_S24 is what is currently used in ir3_compiler_nir::get_image_offset(), so the pass doesn't change anything regarding computations. I agree that the nir opcode should hint at the bit limit, so probably nir_op_imad24. That is one of the open questions. Thanks, Eduardo ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
On Fri, Jan 25, 2019 at 10:58 AM Ilia Mirkin wrote: > > IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called > imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely > had this, and I think maxwell+ has a variant of this implemented by > XMAD): > > (src0 * src1) & 0xff + src2 And of course even that's wrong... the 24th bit has to get sign-extended on that. Can express it with shifts. ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno
Re: [Freedreno] [Mesa-dev] [RFC 2/4] nir: Add a new ALU nir_op_imad
IMAD_S24 isn't src0 * src1 + src2 though. I think this could be called imad24, which I suspect exits on many GPUs (nv50-era NVIDIA definitely had this, and I think maxwell+ has a variant of this implemented by XMAD): (src0 * src1) & 0xff + src2 Cheers, -ilia On Fri, Jan 25, 2019 at 10:49 AM Eduardo Lima Mitev wrote: > > ir3 compiler has an integer multiply-add instruction (IMAD_S24) > that is used for different offset calculations in the backend. > Since we intend to move some of these calculations to NIR, we need > a new ALU op that can represent it. > --- > src/compiler/nir/nir_opcodes.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py > index d32005846a6..b61845fd514 100644 > --- a/src/compiler/nir/nir_opcodes.py > +++ b/src/compiler/nir/nir_opcodes.py > @@ -754,6 +754,7 @@ def triop_horiz(name, output_size, src1_size, src2_size, > src3_size, const_expr): > [tuint, tuint, tuint], "", const_expr) > > triop("ffma", tfloat, "src0 * src1 + src2") > +triop("imad", tint, "src0 * src1 + src2") > > triop("flrp", tfloat, "src0 * (1 - src2) + src1 * src2") > > -- > 2.20.1 > > ___ > mesa-dev mailing list > mesa-...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ Freedreno mailing list Freedreno@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/freedreno