Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On 11:21 AM - Apr 18 2016, Hans de Goede wrote: > Hi, > > On 17-04-16 22:27, Pierre Moreau wrote: > >On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote: > >>On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau> >>wrote: > >>>Ping :-) > >>> > >>>On 10:56 PM - Mar 19 2016, Pierre Moreau wrote: > Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the > GPU > unhappy. Instead, manually handle the conversion between 64-bit and 32-bit > values, and use `cvt` to convert between the original target (resp. > source) > and 32-bit value. This happens to be the behaviour of NVIDIA's driver. > > Signed-off-by: Pierre Moreau > --- > .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 > ++ > .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + > 2 files changed, 60 insertions(+) > > diff --git > a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > index 2719f2c..c419a68 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) > return true; > } > > +bool > +NVC0LoweringPass::handleCVT(Instruction *i) > +{ > + if (isFloatType(i->dType) || isFloatType(i->sType) || > + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) > >> > >>I know pre-C89 features are cool, but let's avoid using them. I know > >>characters like ^ were uncommon on the 1960's and 1970's teletypes, > >>but I think we're past those days now. > > > >Yeah… Will fix that. > > So "xor" or "^" is bitwise not logical, since isSignedIntType() returns > a bool, which when cast to an int is guaranteed to be 0 or 1, this > should work fine. > > And being a bitwise op its presedence means it will get evaluated > before the "||" operators in your condition which I believe is what > we want here, but can we please have a pair of parenthesis around the > "^" and its operands to make this more clear ? Sure, I’ll add a pair of parenthesis around it. Regards, Pierre > > Regards, > > Hans > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
Hi, On 17-04-16 22:27, Pierre Moreau wrote: On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote: On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreauwrote: Ping :-) On 10:56 PM - Mar 19 2016, Pierre Moreau wrote: Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU unhappy. Instead, manually handle the conversion between 64-bit and 32-bit values, and use `cvt` to convert between the original target (resp. source) and 32-bit value. This happens to be the behaviour of NVIDIA's driver. Signed-off-by: Pierre Moreau --- .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 ++ .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + 2 files changed, 60 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp index 2719f2c..c419a68 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) return true; } +bool +NVC0LoweringPass::handleCVT(Instruction *i) +{ + if (isFloatType(i->dType) || isFloatType(i->sType) || + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) I know pre-C89 features are cool, but let's avoid using them. I know characters like ^ were uncommon on the 1960's and 1970's teletypes, but I think we're past those days now. Yeah… Will fix that. So "xor" or "^" is bitwise not logical, since isSignedIntType() returns a bool, which when cast to an int is guaranteed to be 0 or 1, this should work fine. And being a bitwise op its presedence means it will get evaluated before the "||" operators in your condition which I believe is what we want here, but can we please have a pair of parenthesis around the "^" and its operands to make this more clear ? Regards, Hans ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On Sun, Apr 17, 2016 at 4:27 PM, Pierre Moreauwrote: > On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote: >> On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreau wrote: >> > Ping :-) >> > >> > On 10:56 PM - Mar 19 2016, Pierre Moreau wrote: >> >> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the >> >> GPU >> >> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit >> >> values, and use `cvt` to convert between the original target (resp. >> >> source) >> >> and 32-bit value. This happens to be the behaviour of NVIDIA's driver. >> >> >> >> Signed-off-by: Pierre Moreau >> >> --- >> >> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 >> >> ++ >> >> .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + >> >> 2 files changed, 60 insertions(+) >> >> >> >> diff --git >> >> a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> >> index 2719f2c..c419a68 100644 >> >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> >> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) >> >> return true; >> >> } >> >> >> >> +bool >> >> +NVC0LoweringPass::handleCVT(Instruction *i) >> >> +{ >> >> + if (isFloatType(i->dType) || isFloatType(i->sType) || >> >> + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) >> >> I know pre-C89 features are cool, but let's avoid using them. I know >> characters like ^ were uncommon on the 1960's and 1970's teletypes, >> but I think we're past those days now. > > Yeah… Will fix that. > >> >> >> + return false; >> >> + >> >> + if (typeSizeof(i->sType) == 8) { >> >> + Value *toSplit = i->getSrc(0); >> >> + if (i->saturate) { >> >> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul); >> >> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX); >> >> + if (isSignedType(i->sType)) { >> >> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN); >> >> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX); >> >> + } >> >> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), >> >> toSplit, >> >> +minValue); >> >> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes, >> >> + maxValue); >> >> Aren't you assuming that i->dType == 4 here? It could be an unsigned >> <-> signed conversion, at 64-bit. So the clamp values would be > > I am assuming `i->dType <= 4`: remember the ^ from before! ;-) But, it > could be a U64 <=> U64 or S64 <=> S64 conversion, which would then fail… Oh I see. And the == 4 out is to avoid the extra cvt... which we don't optimize out? I thought we should, but maybe not. Some asserts could be in order, like assert(typeSizeof(i->sType) != typeSizeof(i->dType)) or something like that above, to clarify what you mean. -ilia > >> different. Handling ALL the cases is quite annoying... can you figure >> out what the hw doesn't support and just handle that? I doubt it'll be >> any slower, and definitely simpler. > > I don’t remember if I checked U64 <=> S64 conversions… Will need to refresh my > memory and log which combinations fail. > > Thanks! > Pierre > >> >> -ilia >> >> >> + } >> >> + >> >> + Value *value32[2] = { bld.getSSA(), bld.getSSA() }; >> >> + bld.mkSplit(value32, 4, toSplit); >> >> + if (typeSizeof(i->dType) == 4) { >> >> + bld.mkMov(i->getDef(0), value32[0], i->dType); >> >> + delete_Instruction(prog, i); >> >> + return true; >> >> + } >> >> + >> >> + i->setSrc(0, bld.getSSA()); >> >> + i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; >> >> + bld.mkMov(i->getSrc(0), value32[0], i->sType); >> >> + } else if (typeSizeof(i->dType) == 8) { >> >> + bld.setPosition(i, true); >> >> + Value *res = i->getDef(0); >> >> + Value *high32 = bld.loadImm(bld.getSSA(), >> >> + isSignedType(i->sType) ? UINT32_MAX : >> >> 0u); >> >> + Value *low32 = i->getSrc(0); >> >> + DataType resType = i->dType; >> >> + >> >> + if (typeSizeof(i->sType) <= 2) { >> >> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; >> >> + low32 = bld.getSSA(); >> >> + i->setDef(0, low32); >> >> + } else if (typeSizeof(i->sType) == 4) { >> >> + delete_Instruction(prog, i); >> >> + } >> >> + >> >> + Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), >> >> low32, >> >> + high32); >> >> + bld.mkMov(res, merged64, resType); >> >> + } >> >> + >> >> + return true; >> >> +} >> >> + >> >> // Generate a binary predicate if an instruction is predicated by >>
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On 04:17 PM - Apr 17 2016, Ilia Mirkin wrote: > On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreauwrote: > > Ping :-) > > > > On 10:56 PM - Mar 19 2016, Pierre Moreau wrote: > >> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the > >> GPU > >> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit > >> values, and use `cvt` to convert between the original target (resp. source) > >> and 32-bit value. This happens to be the behaviour of NVIDIA's driver. > >> > >> Signed-off-by: Pierre Moreau > >> --- > >> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 > >> ++ > >> .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + > >> 2 files changed, 60 insertions(+) > >> > >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > >> index 2719f2c..c419a68 100644 > >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > >> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) > >> return true; > >> } > >> > >> +bool > >> +NVC0LoweringPass::handleCVT(Instruction *i) > >> +{ > >> + if (isFloatType(i->dType) || isFloatType(i->sType) || > >> + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) > > I know pre-C89 features are cool, but let's avoid using them. I know > characters like ^ were uncommon on the 1960's and 1970's teletypes, > but I think we're past those days now. Yeah… Will fix that. > > >> + return false; > >> + > >> + if (typeSizeof(i->sType) == 8) { > >> + Value *toSplit = i->getSrc(0); > >> + if (i->saturate) { > >> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul); > >> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX); > >> + if (isSignedType(i->sType)) { > >> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN); > >> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX); > >> + } > >> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), > >> toSplit, > >> +minValue); > >> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes, > >> + maxValue); > > Aren't you assuming that i->dType == 4 here? It could be an unsigned > <-> signed conversion, at 64-bit. So the clamp values would be I am assuming `i->dType <= 4`: remember the ^ from before! ;-) But, it could be a U64 <=> U64 or S64 <=> S64 conversion, which would then fail… > different. Handling ALL the cases is quite annoying... can you figure > out what the hw doesn't support and just handle that? I doubt it'll be > any slower, and definitely simpler. I don’t remember if I checked U64 <=> S64 conversions… Will need to refresh my memory and log which combinations fail. Thanks! Pierre > > -ilia > > >> + } > >> + > >> + Value *value32[2] = { bld.getSSA(), bld.getSSA() }; > >> + bld.mkSplit(value32, 4, toSplit); > >> + if (typeSizeof(i->dType) == 4) { > >> + bld.mkMov(i->getDef(0), value32[0], i->dType); > >> + delete_Instruction(prog, i); > >> + return true; > >> + } > >> + > >> + i->setSrc(0, bld.getSSA()); > >> + i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > >> + bld.mkMov(i->getSrc(0), value32[0], i->sType); > >> + } else if (typeSizeof(i->dType) == 8) { > >> + bld.setPosition(i, true); > >> + Value *res = i->getDef(0); > >> + Value *high32 = bld.loadImm(bld.getSSA(), > >> + isSignedType(i->sType) ? UINT32_MAX : > >> 0u); > >> + Value *low32 = i->getSrc(0); > >> + DataType resType = i->dType; > >> + > >> + if (typeSizeof(i->sType) <= 2) { > >> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > >> + low32 = bld.getSSA(); > >> + i->setDef(0, low32); > >> + } else if (typeSizeof(i->sType) == 4) { > >> + delete_Instruction(prog, i); > >> + } > >> + > >> + Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), > >> low32, > >> + high32); > >> + bld.mkMov(res, merged64, resType); > >> + } > >> + > >> + return true; > >> +} > >> + > >> // Generate a binary predicate if an instruction is predicated by > >> // e.g. an f32 value. > >> void > >> @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i) > >>checkPredicate(i); > >> > >> switch (i->op) { > >> + case OP_CVT: > >> + return handleCVT(i); > >> case OP_TEX: > >> case OP_TXB: > >> case OP_TXL: > >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > >> index 6eb8aff..9fc24d9 100644 > >> ---
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On Sun, Apr 17, 2016 at 4:07 PM, Pierre Moreauwrote: > Ping :-) > > On 10:56 PM - Mar 19 2016, Pierre Moreau wrote: >> Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU >> unhappy. Instead, manually handle the conversion between 64-bit and 32-bit >> values, and use `cvt` to convert between the original target (resp. source) >> and 32-bit value. This happens to be the behaviour of NVIDIA's driver. >> >> Signed-off-by: Pierre Moreau >> --- >> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 >> ++ >> .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + >> 2 files changed, 60 insertions(+) >> >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> index 2719f2c..c419a68 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp >> @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) >> return true; >> } >> >> +bool >> +NVC0LoweringPass::handleCVT(Instruction *i) >> +{ >> + if (isFloatType(i->dType) || isFloatType(i->sType) || >> + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) I know pre-C89 features are cool, but let's avoid using them. I know characters like ^ were uncommon on the 1960's and 1970's teletypes, but I think we're past those days now. >> + return false; >> + >> + if (typeSizeof(i->sType) == 8) { >> + Value *toSplit = i->getSrc(0); >> + if (i->saturate) { >> + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul); >> + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX); >> + if (isSignedType(i->sType)) { >> +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN); >> +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX); >> + } >> + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), >> toSplit, >> +minValue); >> + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes, >> + maxValue); Aren't you assuming that i->dType == 4 here? It could be an unsigned <-> signed conversion, at 64-bit. So the clamp values would be different. Handling ALL the cases is quite annoying... can you figure out what the hw doesn't support and just handle that? I doubt it'll be any slower, and definitely simpler. -ilia >> + } >> + >> + Value *value32[2] = { bld.getSSA(), bld.getSSA() }; >> + bld.mkSplit(value32, 4, toSplit); >> + if (typeSizeof(i->dType) == 4) { >> + bld.mkMov(i->getDef(0), value32[0], i->dType); >> + delete_Instruction(prog, i); >> + return true; >> + } >> + >> + i->setSrc(0, bld.getSSA()); >> + i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; >> + bld.mkMov(i->getSrc(0), value32[0], i->sType); >> + } else if (typeSizeof(i->dType) == 8) { >> + bld.setPosition(i, true); >> + Value *res = i->getDef(0); >> + Value *high32 = bld.loadImm(bld.getSSA(), >> + isSignedType(i->sType) ? UINT32_MAX : 0u); >> + Value *low32 = i->getSrc(0); >> + DataType resType = i->dType; >> + >> + if (typeSizeof(i->sType) <= 2) { >> + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; >> + low32 = bld.getSSA(); >> + i->setDef(0, low32); >> + } else if (typeSizeof(i->sType) == 4) { >> + delete_Instruction(prog, i); >> + } >> + >> + Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32, >> + high32); >> + bld.mkMov(res, merged64, resType); >> + } >> + >> + return true; >> +} >> + >> // Generate a binary predicate if an instruction is predicated by >> // e.g. an f32 value. >> void >> @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i) >>checkPredicate(i); >> >> switch (i->op) { >> + case OP_CVT: >> + return handleCVT(i); >> case OP_TEX: >> case OP_TXB: >> case OP_TXL: >> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h >> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h >> index 6eb8aff..9fc24d9 100644 >> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h >> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h >> @@ -96,6 +96,7 @@ protected: >> bool handleMOD(Instruction *); >> bool handleSQRT(Instruction *); >> bool handlePOW(Instruction *); >> + bool handleCVT(Instruction *); >> bool handleTEX(TexInstruction *); >> bool handleTXD(TexInstruction *); >> bool handleTXQ(TexInstruction *); >> -- >> 2.7.4 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >>
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
Ping :-) On 10:56 PM - Mar 19 2016, Pierre Moreau wrote: > Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU > unhappy. Instead, manually handle the conversion between 64-bit and 32-bit > values, and use `cvt` to convert between the original target (resp. source) > and 32-bit value. This happens to be the behaviour of NVIDIA's driver. > > Signed-off-by: Pierre Moreau> --- > .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 > ++ > .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + > 2 files changed, 60 insertions(+) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > index 2719f2c..c419a68 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) > return true; > } > > +bool > +NVC0LoweringPass::handleCVT(Instruction *i) > +{ > + if (isFloatType(i->dType) || isFloatType(i->sType) || > + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) > + return false; > + > + if (typeSizeof(i->sType) == 8) { > + Value *toSplit = i->getSrc(0); > + if (i->saturate) { > + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul); > + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX); > + if (isSignedType(i->sType)) { > +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN); > +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX); > + } > + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), toSplit, > +minValue); > + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes, > + maxValue); > + } > + > + Value *value32[2] = { bld.getSSA(), bld.getSSA() }; > + bld.mkSplit(value32, 4, toSplit); > + if (typeSizeof(i->dType) == 4) { > + bld.mkMov(i->getDef(0), value32[0], i->dType); > + delete_Instruction(prog, i); > + return true; > + } > + > + i->setSrc(0, bld.getSSA()); > + i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > + bld.mkMov(i->getSrc(0), value32[0], i->sType); > + } else if (typeSizeof(i->dType) == 8) { > + bld.setPosition(i, true); > + Value *res = i->getDef(0); > + Value *high32 = bld.loadImm(bld.getSSA(), > + isSignedType(i->sType) ? UINT32_MAX : 0u); > + Value *low32 = i->getSrc(0); > + DataType resType = i->dType; > + > + if (typeSizeof(i->sType) <= 2) { > + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > + low32 = bld.getSSA(); > + i->setDef(0, low32); > + } else if (typeSizeof(i->sType) == 4) { > + delete_Instruction(prog, i); > + } > + > + Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32, > + high32); > + bld.mkMov(res, merged64, resType); > + } > + > + return true; > +} > + > // Generate a binary predicate if an instruction is predicated by > // e.g. an f32 value. > void > @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i) >checkPredicate(i); > > switch (i->op) { > + case OP_CVT: > + return handleCVT(i); > case OP_TEX: > case OP_TXB: > case OP_TXL: > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > index 6eb8aff..9fc24d9 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > @@ -96,6 +96,7 @@ protected: > bool handleMOD(Instruction *); > bool handleSQRT(Instruction *); > bool handlePOW(Instruction *); > + bool handleCVT(Instruction *); > bool handleTEX(TexInstruction *); > bool handleTXD(TexInstruction *); > bool handleTXQ(TexInstruction *); > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On 06:41 PM - Mar 19 2016, Ilia Mirkin wrote: > On Sat, Mar 19, 2016 at 6:36 PM, Pierre Moreauwrote: > > On 06:26 PM - Mar 19 2016, Ilia Mirkin wrote: > >> On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreau > >> wrote: > >> > However, you could have some `long bar; char foo = > >> > convert_char_sat(bar);` in > >> > the OpenCL kernel. > >> > >> Sure, but the SPIR-V -> nv50/ir converter could be smarter about when > >> it generates the converts, no? > > > > It should be possible, but then I'm unsure what ends up in the SPIR-V -> > > nv50/ir converter and what ends up in the nv50/ir backend. Should I also do > > constant folding in the converter? I was assuming the backend would take > > care > > of the optimisations, removing useless converts, but maybe my assumptions > > were > > wrong and I have to take care of more things than just translating from > > SPIR-V > > to nv50/ir? > > Well, the nv50 ir is not a well-specified IR. So it's kind of up to us > what to do. However if the hw hates converts with src/dst types like > that, I'm perfectly happy to decree that such converts shall never > make it into the IR. That said, if you feel strongly about it, we can > leave it as a fix-up pass. What about nv50, need the same logic there > too right? If such converts are not permitted in nv50/ir, then that means each converter to nv50/ir will need to do the fixup themselves, resulting in, most likely, duplicate code between them, as they will handle it more or less the same way. (Well, there aren't many converters to nv50/ir, so not really a big issue here.) Whereas if we have it as a fixup pass, we need the code only once, and have it handled for all existing (and future) converters. But you have way more experience with nv50/ir and compilers than I have, so, your call. Most likely, but I haven't tried it. I should probably have this code in an earlier pass then, that is not family dependent. Pierre > > -ilia > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On Sat, Mar 19, 2016 at 6:36 PM, Pierre Moreauwrote: > On 06:26 PM - Mar 19 2016, Ilia Mirkin wrote: >> On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreau wrote: >> > However, you could have some `long bar; char foo = convert_char_sat(bar);` >> > in >> > the OpenCL kernel. >> >> Sure, but the SPIR-V -> nv50/ir converter could be smarter about when >> it generates the converts, no? > > It should be possible, but then I'm unsure what ends up in the SPIR-V -> > nv50/ir converter and what ends up in the nv50/ir backend. Should I also do > constant folding in the converter? I was assuming the backend would take care > of the optimisations, removing useless converts, but maybe my assumptions were > wrong and I have to take care of more things than just translating from SPIR-V > to nv50/ir? Well, the nv50 ir is not a well-specified IR. So it's kind of up to us what to do. However if the hw hates converts with src/dst types like that, I'm perfectly happy to decree that such converts shall never make it into the IR. That said, if you feel strongly about it, we can leave it as a fix-up pass. What about nv50, need the same logic there too right? -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On 06:26 PM - Mar 19 2016, Ilia Mirkin wrote: > On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreauwrote: > > However, you could have some `long bar; char foo = convert_char_sat(bar);` > > in > > the OpenCL kernel. > > Sure, but the SPIR-V -> nv50/ir converter could be smarter about when > it generates the converts, no? It should be possible, but then I'm unsure what ends up in the SPIR-V -> nv50/ir converter and what ends up in the nv50/ir backend. Should I also do constant folding in the converter? I was assuming the backend would take care of the optimisations, removing useless converts, but maybe my assumptions were wrong and I have to take care of more things than just translating from SPIR-V to nv50/ir? Pierre > > -ilia > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On Sat, Mar 19, 2016 at 6:26 PM, Pierre Moreauwrote: > However, you could have some `long bar; char foo = convert_char_sat(bar);` in > the OpenCL kernel. Sure, but the SPIR-V -> nv50/ir converter could be smarter about when it generates the converts, no? -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
On 06:06 PM - Mar 19 2016, Ilia Mirkin wrote: > Where are these coming from? Could you perhaps not generate them in > the first place? Those are coming from the generated SPIR-V, of the following kernel for example: __kernel void global_id(__global int * out) { unsigned id = get_global_id(0); out[id] = id; } But I don't see any reason why there should be cvt generated in this case. I'll have to investigate the SPIR-V generation. However, you could have some `long bar; char foo = convert_char_sat(bar);` in the OpenCL kernel. > > On Sat, Mar 19, 2016 at 5:56 PM, Pierre Moreauwrote: > > Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the > > GPU > > unhappy. Instead, manually handle the conversion between 64-bit and 32-bit > > values, and use `cvt` to convert between the original target (resp. source) > > and 32-bit value. This happens to be the behaviour of NVIDIA's driver. > > > > Signed-off-by: Pierre Moreau > > --- > > .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 > > ++ > > .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + > > 2 files changed, 60 insertions(+) > > > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > > index 2719f2c..c419a68 100644 > > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > > @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) > > return true; > > } > > > > +bool > > +NVC0LoweringPass::handleCVT(Instruction *i) > > +{ > > + if (isFloatType(i->dType) || isFloatType(i->sType) || > > + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) > > + return false; > > + > > + if (typeSizeof(i->sType) == 8) { > > + Value *toSplit = i->getSrc(0); > > + if (i->saturate) { > > + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul); > > + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX); > > + if (isSignedType(i->sType)) { > > +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN); > > +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX); > > + } > > + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), > > toSplit, > > +minValue); > > + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes, > > + maxValue); > > + } > > + > > + Value *value32[2] = { bld.getSSA(), bld.getSSA() }; > > + bld.mkSplit(value32, 4, toSplit); > > + if (typeSizeof(i->dType) == 4) { > > + bld.mkMov(i->getDef(0), value32[0], i->dType); > > + delete_Instruction(prog, i); > > + return true; > > + } > > + > > + i->setSrc(0, bld.getSSA()); > > + i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > > + bld.mkMov(i->getSrc(0), value32[0], i->sType); > > + } else if (typeSizeof(i->dType) == 8) { > > + bld.setPosition(i, true); > > + Value *res = i->getDef(0); > > + Value *high32 = bld.loadImm(bld.getSSA(), > > + isSignedType(i->sType) ? UINT32_MAX : > > 0u); > > + Value *low32 = i->getSrc(0); > > + DataType resType = i->dType; > > + > > + if (typeSizeof(i->sType) <= 2) { > > + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > > + low32 = bld.getSSA(); > > + i->setDef(0, low32); > > + } else if (typeSizeof(i->sType) == 4) { > > + delete_Instruction(prog, i); > > + } > > + > > + Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32, > > + high32); > > + bld.mkMov(res, merged64, resType); > > + } > > + > > + return true; > > +} > > + > > // Generate a binary predicate if an instruction is predicated by > > // e.g. an f32 value. > > void > > @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i) > >checkPredicate(i); > > > > switch (i->op) { > > + case OP_CVT: > > + return handleCVT(i); > > case OP_TEX: > > case OP_TXB: > > case OP_TXL: > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > > index 6eb8aff..9fc24d9 100644 > > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > > @@ -96,6 +96,7 @@ protected: > > bool handleMOD(Instruction *); > > bool handleSQRT(Instruction *); > > bool handlePOW(Instruction *); > > + bool handleCVT(Instruction *); > > bool handleTEX(TexInstruction *); > > bool handleTXD(TexInstruction *); > > bool handleTXQ(TexInstruction *); > > -- > > 2.7.4 > > > >
Re: [Mesa-dev] [PATCH] nvc0/lowering: Handle conversions to U64/S64 manually
Where are these coming from? Could you perhaps not generate them in the first place? On Sat, Mar 19, 2016 at 5:56 PM, Pierre Moreauwrote: > Generating a `cvt u32 $r0 u64 $r1d` or a `cvt u64 $r0d u32 $r2` makes the GPU > unhappy. Instead, manually handle the conversion between 64-bit and 32-bit > values, and use `cvt` to convert between the original target (resp. source) > and 32-bit value. This happens to be the behaviour of NVIDIA's driver. > > Signed-off-by: Pierre Moreau > --- > .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 59 > ++ > .../nouveau/codegen/nv50_ir_lowering_nvc0.h| 1 + > 2 files changed, 60 insertions(+) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > index 2719f2c..c419a68 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp > @@ -1859,6 +1859,63 @@ NVC0LoweringPass::handleOUT(Instruction *i) > return true; > } > > +bool > +NVC0LoweringPass::handleCVT(Instruction *i) > +{ > + if (isFloatType(i->dType) || isFloatType(i->sType) || > + isSignedIntType(i->dType) xor isSignedIntType(i->sType)) > + return false; > + > + if (typeSizeof(i->sType) == 8) { > + Value *toSplit = i->getSrc(0); > + if (i->saturate) { > + Value *minValue = bld.loadImm(bld.getSSA(8), 0ul); > + Value *maxValue = bld.loadImm(bld.getSSA(8), UINT32_MAX); > + if (isSignedType(i->sType)) { > +minValue = bld.loadImm(bld.getSSA(8), INT32_MIN); > +maxValue = bld.loadImm(bld.getSSA(8), INT32_MAX); > + } > + Value *minRes = bld.mkOp2v(OP_MAX, i->sType, bld.getSSA(8), toSplit, > +minValue); > + toSplit = bld.mkOp2v(OP_MIN, i->sType, bld.getSSA(8), minRes, > + maxValue); > + } > + > + Value *value32[2] = { bld.getSSA(), bld.getSSA() }; > + bld.mkSplit(value32, 4, toSplit); > + if (typeSizeof(i->dType) == 4) { > + bld.mkMov(i->getDef(0), value32[0], i->dType); > + delete_Instruction(prog, i); > + return true; > + } > + > + i->setSrc(0, bld.getSSA()); > + i->sType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > + bld.mkMov(i->getSrc(0), value32[0], i->sType); > + } else if (typeSizeof(i->dType) == 8) { > + bld.setPosition(i, true); > + Value *res = i->getDef(0); > + Value *high32 = bld.loadImm(bld.getSSA(), > + isSignedType(i->sType) ? UINT32_MAX : 0u); > + Value *low32 = i->getSrc(0); > + DataType resType = i->dType; > + > + if (typeSizeof(i->sType) <= 2) { > + i->dType = isSignedIntType(i->dType) ? TYPE_S32 : TYPE_U32; > + low32 = bld.getSSA(); > + i->setDef(0, low32); > + } else if (typeSizeof(i->sType) == 4) { > + delete_Instruction(prog, i); > + } > + > + Value *merged64 = bld.mkOp2v(OP_MERGE, resType, bld.getSSA(8), low32, > + high32); > + bld.mkMov(res, merged64, resType); > + } > + > + return true; > +} > + > // Generate a binary predicate if an instruction is predicated by > // e.g. an f32 value. > void > @@ -1894,6 +1951,8 @@ NVC0LoweringPass::visit(Instruction *i) >checkPredicate(i); > > switch (i->op) { > + case OP_CVT: > + return handleCVT(i); > case OP_TEX: > case OP_TXB: > case OP_TXL: > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > index 6eb8aff..9fc24d9 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.h > @@ -96,6 +96,7 @@ protected: > bool handleMOD(Instruction *); > bool handleSQRT(Instruction *); > bool handlePOW(Instruction *); > + bool handleCVT(Instruction *); > bool handleTEX(TexInstruction *); > bool handleTXD(TexInstruction *); > bool handleTXQ(TexInstruction *); > -- > 2.7.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev